Saturday 14 September 2013

Bad data cleanup steps in data warehousing

Step-1: Analysis
Step-2: Identify the bad data and check whether it also exists in related tables
Step-3: Back up the bad data you are deleting from the original tables
Step-4: Prepare and run the delete scripts
Step-5: Reload the data you deleted

Step-1:

Analysis:

We need to analyse the data, i.e. find out where the bad data exists based on the business rules, for example missing relationships or unwanted data.

Step-2:

Once the analysis is done, take the bad data into a separate table and compare those records with the data in the other tables, so that we can find whether the same bad data exists in other tables as well.

Ex:

Assume that in the person (parent) table and the person_inst_assignment (child) table we found bad data, i.e. there are orphan records (the parent record is missing but the child record exists) for a particular source system.

Tables: person, person_inst_assignment, etc.
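
As a quick way to confirm the orphans, here is a minimal sketch (column names follow the example; adjust them to your schema):

-- child rows in person_inst_assignment whose parent row is missing from person
SELECT c.*
  FROM person_inst_assignment c
 WHERE NOT EXISTS (SELECT 1
                     FROM person p
                    WHERE p.person_id = c.person_id);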

CREATE TABLE bad_person AS
SELECT *
  FROM person
 WHERE source_system_name = 'ABC';

Check the remaining tables to see whether the same bad data exists there too:

SELECT *
  FROM person_inst_assignment b
 WHERE b.person_id IN (SELECT a.person_id
                         FROM bad_person a);

This returns the matching data, if any, present in person_inst_assignment. In the same way, you need to check all the related tables.
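
To enumerate the related tables worth checking, one option is to ask the Oracle data dictionary for every child table whose foreign key references person (a sketch, assuming the relationships are declared as foreign key constraints):

-- child constraints (type 'R') that reference the PERSON table's key
SELECT c.owner, c.table_name, c.constraint_name
  FROM all_constraints c
  JOIN all_constraints p
    ON p.owner = c.r_owner
   AND p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'
   AND p.table_name = 'PERSON';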

Step-3: Take a backup, into new tables, of the data you are deleting from the original table.

Ex: taking a backup of person_inst_assignment:

CREATE TABLE BKP_person_inst_assignment AS
SELECT *
  FROM person_inst_assignment b
 WHERE b.person_id IN (SELECT a.person_id
                         FROM bad_person a);

The above query creates the backup table containing the data we want to delete (i.e. the bad data).

If any other tables also hold bad data, follow the same steps as in the query above.

Step-4: Prepare the script for data deletion

Ex:

SELECT *
  FROM person_inst_assignment b
 WHERE EXISTS
          (SELECT 'X'
             FROM BKP_person_inst_assignment a
            WHERE a.person_id = b.person_id);

  
DELETE
  FROM person_inst_assignment b
 WHERE EXISTS
          (SELECT 'X'
             FROM BKP_person_inst_assignment a
            WHERE a.person_id = b.person_id);


The row counts returned by the two queries above must match; only then is the delete script correct. In the same way, prepare scripts for every table that contains bad data.
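
As a concrete pre-check, a small sketch that compares the backup row count against the rows the DELETE would touch (table names follow the example above):

-- both counts must be equal before committing the DELETE
SELECT COUNT(*) FROM BKP_person_inst_assignment;

SELECT COUNT(*)
  FROM person_inst_assignment b
 WHERE EXISTS (SELECT 'X'
                 FROM BKP_person_inst_assignment a
                WHERE a.person_id = b.person_id);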

Step-5: From Informatica you can reload the deleted data based on the Source Qualifier query.
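
For illustration, a sketch of a Source Qualifier SQL override that limits the reload to the affected source system (person_src is a hypothetical source table name; use the source tables from your own mapping):

-- SQL override in the Source Qualifier: pull back only the deleted slice
SELECT *
  FROM person_src
 WHERE source_system_name = 'ABC'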


Friday 13 September 2013

ERROR: ORA-02292: integrity constraint () violated - child record found, in Informatica

ERROR: ORA-02292: integrity constraint (FDS.SYS_C00740002) violated - child record found


Reason: You are trying to delete a parent record that still has child records.

Solution:

Execute the query below with the constraint name:

SELECT * FROM all_constraints WHERE constraint_name = 'constraint name';

Ex:

SELECT * FROM all_constraints WHERE constraint_name = 'SYS_C00740002';

You can find the table name in the result set. For that table you need to disable the constraint, or, if the table is not needed, you can drop it.
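
For example, a sketch of disabling the constraint once the owning table is known (child_table is a placeholder for the table_name returned by the query above):

-- temporarily disable the foreign key so the parent rows can be deleted
ALTER TABLE child_table DISABLE CONSTRAINT SYS_C00740002;

-- re-enable it once the cleanup is finished
ALTER TABLE child_table ENABLE CONSTRAINT SYS_C00740002;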

Sunday 28 July 2013

Database Error: Failed to connect to database using user [ ] and connection string [Postgres Liferay Source].

solution:

The following parameters were changed in the odbc.ini file, and the jobs succeeded subsequently:

EnableDescribeParam=0 (was set to 1 earlier)

EncryptionMethod=1 (was set to 0 earlier)

Primary checks before making the above changes:

1. Check whether you are able to connect to that database. If you can connect but the job still gives the error, then check the parameter settings above.

2. If you are not able to connect to the database, then the problem is on the database side.

3. We may also need to check the ConnectionRetryCount in the ODBC driver. The failure can occur because of network latency, where the client is not able to send the credentials within the 5-second window. Try changing the ODBC parameter "ConnectionRetryCount" from 0 to 3.
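
Putting the changed parameters together, here is a sketch of the relevant odbc.ini DSN entry (the driver path and connection details are placeholders; only EnableDescribeParam, EncryptionMethod, and ConnectionRetryCount come from the fix above):

[Postgres Liferay Source]
; driver path and host details are site-specific placeholders
Driver=/path/to/postgres_odbc_driver.so
HostName=dbhost.example.com
PortNumber=5432
Database=liferay
; parameters changed to resolve the connection failure
EnableDescribeParam=0
EncryptionMethod=1
; retry instead of failing immediately on network latency
ConnectionRetryCount=3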

ERROR: could not send data to client: Broken pipe, in Informatica

postgres[20199]: [4-1] ERROR: could not send data to client: Broken pipe


Solution:

We found that trace was enabled at the ODBC level, which was eating up resources and causing the process to choke and fail. All we did was disable the trace for the particular DSN.
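
As a sketch, with unixODBC the trace switch usually lives in the [ODBC] section of the driver-manager configuration file (the file name and location vary by setup; verify the keys below against your driver's documentation):

[ODBC]
; turn off per-call ODBC tracing -- tracing every call starves the session
Trace=No
TraceFile=/tmp/odbc.trace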

If the above does not work, check the following:


0. Try running a batch load (load the source data in batches). If the batch load succeeds, then in most cases there is no data issue; otherwise it is a data issue.

1. The source metadata may have changed (column data types and sizes), so check the source table. If so, re-import the source by deleting the existing source definition, then refresh the mapping and session.

2. If there is no issue with the source metadata, check for data discrepancies, i.e. check the source data for unsupported values, meaning special characters coming from the source in any column (see the sketch after this list). If that is the case, identify the character, change the code page compatibility in Informatica accordingly, and try the load again.

3. If the source/target table contains a CLOB data type, try changing it and loading once.

4. There may be an issue with the DTM buffer size or commit interval; increase or decrease these according to the data volume.

5. If none of the above works, try creating a new mapping with the same business logic. If that still fails, contact the Informatica admin team; they can make Informatica server-level changes.
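
For point 2, a minimal sketch to flag rows carrying non-ASCII characters on the PostgreSQL side (source_table and source_column are hypothetical names; substitute your own):

-- rows whose column contains at least one character outside the ASCII range
SELECT *
  FROM source_table
 WHERE source_column ~ '[^[:ascii:]]';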






Tuesday 30 April 2013

Informatica 9.5 features



Informatica PowerCenter Standard Edition:

Universal Data Access

  • Enables a single approach to accessing data in a wide variety of systems and formats
  • Provides access to a wide variety of data systems and formats (when complemented by Informatica PowerExchange), including structured, unstructured, and semistructured data; relational data; mainframe data; file data; standards-based data; data in packaged applications; NoSQL big data stores such as Hadoop HDFS; social data such as Facebook, LinkedIn, and Twitter; and message queue data

Data Integration Services

  • Accesses, transforms, and delivers data across the enterprise without generating code or having to hand-code data integration procedures
  • Scales to support large volumes of data over multiple concurrent sessions

Metadata-Driven Services

  • Defines, modifies, and enables reuse of data integration procedures by storing and managing data models, transformations, workflows, and other artifacts
  • Integrates data by combining metadata-driven data integration logic with a high-performance transformation engine

Integration Workflows

  • Manages and tracks the complex flow of data across transactional systems, databases, and other data sources through visual tools
  • Provides a built-in, time- or event-based scheduler for automated, “lights-out” operation and for IT teams to schedule and execute data integration that doesn’t intrude on business operations

Robust Development Tools

  • A Web-based, global administration console designed for distributed environments that centralizes all administrative, security, operational, and service-monitoring tasks
  • Reusable mapping templates for designing business/transformation logic in a modeling tool, such as Visio, and automatically generating data integration mappings
  • A codeless, object-based, visual development environment that supplies built-in parallel processing transformations, a robust expression language, and extensible interfaces
  • A comprehensive library of transformation objects for creating and extending context-independent, reusable integration objects
  • An extensive expression library for expanding the existing set of predefined expressions and creating user-defined functions
  • A robust visual debugger that speeds iterative development and minimizes error recovery logic for design and run-time environments

Reference Table Manager 

  • Provides an easy-to-use interface for business analysts and other subject matter experts to create, edit, and delete reference tables
  • Creates an audit trail documenting the changes to reference tables

Mapping Architect for Visio 

  • Automatically generates data integration mappings from best-practice templates and reverse-engineers existing mappings into reusable templates
  • Accelerates and automates design using a data-flow design tool based on Microsoft Visio and mapping templates based on Informatica’s implementation methodology, Informatica Velocity

Mapping Architect for Excel

  • Automatically generates data integration mappings from source and target metadata imported directly from the PowerCenter repository
  • Enables business and IT users to build specifications and generate mappings in one easy step using Microsoft Excel and a rich client interface

LDAP Integration

  • Supports secure authentication of users and groups through an enterprise directory system
  • Provides a choice between either PowerCenter or LDAP for authentication

Enterprise-Grade Security

  • Provides granular privileges and flexible permissions management to ensure security, particularly across large, globally distributed teams
  • Limits user access to only the minimal functionality and data they need, reducing the risk of potential security breaches

Operating System Profiles

  • Ensures secure access to file system resources through PowerCenter workflows executed by different departments and users
  • Tracks and monitors usage by specific groups within a shared environment, such as an Integration Competency Center (ICC), to facilitate departmental charge backs

Extended Variables and Parameterization

  • Provides flexible access to design and run-time metadata within the data mapping, session, and workflow logic

Informatica PowerCenter Advanced Edition:

Metadata Manager

  • Collects metadata across a data integration environment and provides a graphical view of the data flows and the tools to manage change within that environment
  • Shows developers an impact analysis of what data objects will be impacted by a proposed change before it is implemented

Business Glossary

  • Provides a business context to technical artifacts related to data integration, making it possible to catalog, govern, and use data consistently and efficiently
  • Provides collaboration features to speed up and improve the quality of communication between business and IT—reducing project delivery times and improving data integrity

Dynamic Partitioning

  • Enables concurrent data processing to enhance the performance and scalability of data integration environments
  • Supports multiple partition schemes, including parallelization through such multiple mechanisms as key range, hash algorithm-based, round robin, or file partitions

Workflow on Grid

  • Distributes workload across multiple servers and optimizes resource usage through adaptive load balancing to meet increased project demands
  • Allows for unlimited scalability through incremental investments in processing capacity

Team-Based Development

  • Fosters collaboration across different IT functions—such as development, quality assurance, and production administration—or across geographically disparate teams, whether onshore or offshore
  • Empowers data integration developers with version control and configuration management capabilities

Universal Data Access

  • Enables a single approach to accessing data in a wide variety of systems and formats
  • Provides access to a wide variety of data systems and formats (when complemented by Informatica PowerExchange), including structured, unstructured, and semistructured data; relational data; mainframe data; file data; standards-based data; and message queue data

Data Integration Services

  • Accesses, transforms, and delivers data across the enterprise without generating code or requiring hand-coded data integration procedures
  • Scales to support large volumes of data over multiple concurrent sessions

Metadata-Driven Services

  • Defines, modifies, and enables reuse of data integration procedures by storing and managing data models, transformations, workflows, and other artifacts
  • Integrates data by combining metadata-driven data integration logic with a high-performance transformation engine

Integration Workflows

  • Manages and tracks the complex flow of data across transactional systems, databases, and other data sources through visual tools
  • Provides a built-in, time- or event-based scheduler for automated, “lights-out” operation and for IT teams to schedule and execute data integration that doesn’t intrude on business operations

Re-entrant Integration Workflows

  • Enables IT teams to run a single instance of a data integration workflow multiple times to maximize limited computing capacity
  • Dynamically scales Web services deployments on a single grid node or multiple grid nodes to support thousands of concurrent service requests

Robust Development Tools

  • A Web-based, global administration console designed for distributed environments that centralizes all administrative, security, operational, and service-monitoring tasks
  • Reusable mapping templates for designing business/transformation logic in a modeling tool, such as Visio, and automatically generating data integration mappings
  • A codeless, object-based, visual development environment that supplies built-in parallel processing transformations, a robust expression language, and extensible interfaces
  • A comprehensive library of transformation objects for creating and extending context-independent, reusable integration objects
  • An extensive expression library for expanding the existing set of predefined expressions and creating user-defined functions
  • A robust visual debugger that speeds iterative development and minimizes error recovery logic for design and run-time environments

Reference Table Manager

  • Provides an easy-to-use interface for business analysts and other subject matter experts to create, edit, and delete reference tables
  • Creates an audit trail documenting the changes to the reference tables

Mapping Architect for Visio 

  • Automatically generates data integration mappings from best-practice templates and reverse-engineers existing mappings into reusable templates
  • Accelerates and automates design using a data-flow design tool based on Microsoft Visio and mapping templates based on Informatica’s implementation methodology, Informatica Velocity

Mapping Architect for Excel

  • Automatically generates data integration mappings from source and target metadata imported directly from the Informatica PowerCenter repository
  • Enables business and IT users to build specifications and generate mappings in one easy step using Microsoft Excel and a rich client interface

Support for Mixed Versions of Application Services

  • Supports mixed versions of the repository service, integration service, and Web services hub in a single Informatica PowerCenter domain
  • Simplifies upgrade of application services to newer versions incrementally and administration of multiple application services from a single console

LDAP Integration

  • Supports secure authentication of users and groups through an enterprise directory system
  • Provides a choice between either Informatica PowerCenter or LDAP for authentication

Enterprise-Grade Security

  • Provides granular privileges and flexible permissions management to ensure security, particularly across large, globally distributed teams
  • Limits user access to only the minimal functionality and data they need, reducing the risk of potential security breaches

Operating System Profiles

  • Ensures secure access to file system resources through Informatica PowerCenter workflows executed by different departments and users
  • Tracks and monitors usage by specific groups within a shared environment, such as an Integration Competency Center (ICC), to facilitate departmental charge backs

Extended Variables and Parameterization

  • Provides flexible access to design and run-time metadata within the data mapping, session, and workflow logic

Informatica PowerCenter Real Time Edition:


Change Data Capture for Relational Data Sources

  • Captures changes in data sources as they happen for processing and routing to multiple target applications
  • Ensures referential data integrity of real-time change data streams

Zero-Latency Processing

  • Enables continuous data processing with a zero-latency engine
  • Supports batch, trickle-feed, and real-time data integration latencies in a single optimized environment

Reliable, Scalable Data Processing and Delivery

  • Horizontally scales message processing and delivery to handle massive real-time data volumes
  • Prevents data loss and data duplication even in system failure scenarios, and ensures referential data integrity for real-time change data streams

Dynamic Partitioning

  • Divides data processing into subsets, which run in parallel and are spread among available CPUs in a multiprocessor system
  • Provides visual controls for a wide range of partition schemes

Built-In Web Services Support

  • Leverages Web services protocols, including XML, WSDL, and SOAP, to access data, metadata, and data integration workflows within an SOA
  • Provides a Web services hub for centralized monitoring and reporting, along with built-in support for WS-Security, a built-in WSDL editor, and a Web services testing tool

Dynamically Scalable Data Services

  • Enables existing batch and real-time data transformation logic to be exposed and reused as data services
  • Automatically scales Web services to support a large number of concurrent workloads, typical of a shared services Integration Competency Center (ICC) environment

Universal Data Access

  • Enables a single approach to accessing data in a wide variety of systems and formats
  • Provides access to a wide variety of data systems and formats (when complemented by Informatica PowerExchange), including structured, unstructured, and semistructured data; relational data; mainframe data; file data; standards-based data; and message queue data

Data Integration Services

  • Accesses, transforms, and delivers data across the enterprise without generating code or having to hand-code data integration procedures
  • Scales to support large volumes of data over multiple concurrent sessions

Global Metadata Services

  • Defines, modifies, and enables reuse of data integration procedures by storing and managing data models, transformations, workflows, and other artifacts
  • Integrates data by combining metadata-driven data integration logic with a high-performance transformation engine

Integration Workflows

  • Manages and tracks the complex flow of data across transactional systems, databases, and other data sources through visual tools
  • Provides a built-in, time- or event-based scheduler for automated, “lights-out” operation and for IT teams to schedule and execute data integration that doesn’t intrude on business operations

Integration with Messaging Systems

  • Supports reading from and writing to such messaging systems as JMS (e.g., Informatica, Oracle, Sun, TIBCO, IBM, JBoss), webMethods X-Broker, IBM WebSphere MQ, and Microsoft MQ.

Robust Development Tools

  • A Web-based, global administration console designed for distributed environments that centralizes all administrative, security, operational, and service-monitoring tasks
  • Reusable mapping templates for designing business/transformation logic in a modeling tool, such as Visio, and automatically generating data integration mappings
  • A codeless, object-based, visual development environment that supplies built-in parallel processing transformations, a robust expression language, and extensible interfaces
  • A comprehensive library of transformation objects for creating and extending context-independent, reusable integration objects
  • An extensive expression library for expanding the existing set of predefined expressions and creating user-defined functions
  • A robust visual debugger that speeds iterative development and minimizes error recovery logic for design and run-time environments

Reference Table Manager

  • Provides an easy-to-use interface for business analysts and other subject matter experts to create, edit, and delete reference tables
  • Creates an audit trail documenting the changes to the reference tables

Mapping Architect for Visio

  • Automatically generates data integration mappings from best-practice templates and reverse-engineers existing mappings into reusable templates
  • Accelerates and automates design using a data-flow design tool based on Microsoft Visio and mapping templates based on Informatica’s implementation methodology, Informatica Velocity

Mapping Architect for Excel

  • Automatically generates data integration mappings from source and target metadata imported directly from the Informatica PowerCenter repository
  • Enables business and IT users to build specifications and generate mappings in one easy step using Microsoft Excel and a rich client interface

Support for Mixed Versions of Application Services

  • Supports mixed versions of the repository service, integration service, and Web services hubs in a single Informatica PowerCenter domain
  • Simplifies upgrade of application services to newer versions incrementally and administration of multiple application services from a single console

LDAP Integration

  • Supports secure authentication of users and groups through an enterprise directory system
  • Provides a choice between either Informatica PowerCenter or LDAP for authentication

Enterprise-Grade Security

  • Provides granular privileges and flexible permissions management to ensure security, particularly across large, globally distributed teams
  • Limits user access to only the minimal functionality and data they need, reducing the risk of potential security breaches

Operating System Profiles

  • Ensures secure access to file system resources through Informatica PowerCenter workflows executed by different departments and users
  • Tracks and monitors usage by specific groups within a shared environment, such as an Integration Competency Center (ICC), to facilitate departmental charge backs

Extended Variables and Parameterization

  • Provides flexible access to design and run-time metadata within the data mapping, session, and workflow logic

Informatica PowerCenter Big Data Edition:

No-Code Productivity

  • Replaces hand-coding within Hadoop through the visual Informatica development environment
  • Facilitates reuse and collaboration with a common integrated development environment (IDE)

Universal Data Access

  • Provides access to big transaction data, including RDBMS, OLTP, OLAP, ERP, CRM, mainframe, cloud, and others
  • Provides access to big interaction data, including social media data, log files, machine sensor data, Web sites, blogs, documents, emails, and other unstructured or multi-structured data

High-Speed Data Ingestion and Extraction

  • Accesses, loads, replicates, transforms, and extracts big data between source and target systems or directly into Hadoop or your data warehouse
  • Ingests and extracts data at high speeds with high-performance connectivity through native APIs to source and target systems with parallel processing

Unlimited Scalability

  • Processes all types of big data at any scale—from terabytes to petabytes
  • Eliminates having to develop specialized code on distributed computing platforms such as Hadoop

Optimized Performance for Lowest Cost

  • Deploys big data processing on the highest-performance and most cost-effective data processing platforms (based on data volumes, data type, latency requirements, and available hardware)
  • Maximizes current investments and capacity whether you deploy data processing on SMP machines, traditional grid clusters, distributed computing platforms like Hadoop, or data warehouse appliances

ETL on Hadoop

  • Provides an extensive library of prebuilt transformation capabilities on Hadoop, including data type conversions and string manipulations, high-performance cache-enabled lookups, joiners, sorters, routers, and aggregations
  • Includes a codeless graphical development environment where big data flows on Hadoop can be developed rapidly

Profiling on Hadoop

  • Profiles data on Hadoop through the Informatica developer tool and a browser-based analyst tool
  • Enables developers, analysts, and data scientists to understand data, identify data quality issues, collaborate on data flow specifications, and validate mapping transformation and rules logic

Design Once and Deploy Anywhere

  • Allows ETL developers to focus on data and transformation logic without having to worry where the ETL process is deployed—on Hadoop or traditional data processing platforms
  • Enables developers to design once, without any specialized knowledge of Hadoop concepts and languages, and easily deploy data flows on Hadoop or traditional systems

Complex Data Parsing on Hadoop

  • Makes it easy to access and parse complex, multi-structured, unstructured, and industry-standard data such as Web logs, JSON, XML, and machine device data
  • Prebuilt parsers for market data and industry standards like FIX, SWIFT, ACORD, HL7, HIPAA, and EDI are also available and licensed separately

Entity Extraction and Data Classification on Hadoop

  • Easily extracts and classifies entities related to your customers and products from unstructured data such as emails, social media data, and documents
  • Enriches master data with insights into customer behavior or product information such as competitive pricing

Mixed Workflows

  • Enables your IT team to easily coordinate, schedule, monitor, and manage all interrelated big data processes and workflows across your traditional and Hadoop environments
  • Lets you drill down into individual Hadoop jobs

High Availability

  • Provides 24x7 high availability with seamless failover, flexible recovery, and connection resilience
  • Ensures that new products and services developed using big data insights can scale and be available 24x7 for mission-critical operations

Informatica PowerCenter Data Virtualization Edition:

Role-Based Tools that Share Common Metadata for Instant Collaboration

  • A Web-based self-service analyst tool for analysts to define rules, profile data, and directly access the data they need, without IT intervention
  • A powerful Eclipse-based development environment for IT to graphically discover, access, merge, profile, transform, optimize, and deploy reusable data services
  • A Web-based, global administration console designed for distributed environments that centralizes all administrative, security, operational, and service-monitoring tasks
  • Reusable mapping templates for designing business/transformation logic in a modeling tool, such as Visio, and automatically generating data integration mappings
  • A codeless, object-based, visual development environment that supplies built-in parallel processing transformations, a robust expression language, and extensible interfaces
  • A comprehensive library of transformation objects for creating and extending context-independent, reusable integration objects
  • An extensive expression library for expanding the existing set of predefined expressions and creating user-defined functions
  • A robust visual debugger that speeds iterative development and minimizes error recovery logic for design and run-time environments

Integrated, Any-Stage Data Profiling of Federated Data

  • Profiles federated data and integration logic at any stage, and documents them with valuable metadata
  • Enables sharing of profiling results from data sources and integration logic with business users to ensure requirements are being met early and throughout the process

Integrated, On-the-Fly Data Quality on Federated Data

  • Leverages pre-built native data quality rules (e.g., cleansing, enrichment, matching, and standardization) that can be reused across projects without rework
  • Executes data quality rules as a service on the fly and in real time to federated data, without pre-processing, staging, or physically consolidating the data

Reusable Logical Data Objects

  • Builds reusable logical data objects (LDO) based on business entities (e.g., CUSTOMER, PRODUCT, and ORDER) using an open, top-down and bottom-up model-based and metadata-driven approach
  • Quickly delivers a standards-based data abstraction layer to insulate consuming applications from changes in underlying data sources

Single Environment for Data Integration and Data Federation

  • Leverages a query-plan interface as well as numerous optimization and caching capabilities typically used in data federation
  • Develops and reuses the same data integration logic for data federation, ETL, or Web services – instantly and without any rework

Integrated Data Federation

  • Sources and combines data from multiple heterogeneous data sources into a single virtualized view in real time, so that it appears as one data source to consuming applications
  • Leverages a high-performance data federation engine, which supports advanced transformations, including data quality in real time, along with the sophisticated caching techniques, optimizations, and security typically needed in data federation

Built-In Web Services Support

  • Leverages Web services protocols, including XML, WSDL, SOAP, and REST, to access data, metadata, and data integration workflows within an SOA
  • Provides a Web services hub for centralized monitoring and reporting, along with built-in support for WS-Security, a built-in WSDL editor, and a Web services testing tool

Dynamically Scalable Data Services

  • Enables existing batch and real-time data transformation logic to be exposed and reused as data services
  • Automatically scales Web services to support a large number of concurrent workloads, typical of a shared services Integration Competency Center (ICC) environment

Universal Data Access

  • Enables a single approach to accessing data in a wide variety of systems and formats
  • Provides access to a wide variety of data systems and formats (when complemented by Informatica PowerExchange), including structured, unstructured, and semistructured data; relational data; mainframe data; file data; and standards-based data

Data Integration Services

  • Accesses, transforms, and delivers data across the enterprise without generating code or having to hand-code data integration procedures
  • Scales to support large volumes of data over multiple concurrent sessions

Global Metadata Services

  • Defines, modifies, and enables reuse of data integration procedures by storing and managing data models, transformations, workflows, and other artifacts
  • Integrates data by combining metadata-driven data integration logic with a high-performance transformation engine

Integration Workflows

  • Manages and tracks the complex flow of data across transactional systems, databases, and other data sources through visual tools
  • Provides a built-in, time- or event-based scheduler for automated, “lights-out” operation and for IT teams to schedule and execute data integration that doesn’t intrude on business operations

Centralized Data Security and Governance

  • Creates policies and rules for data security, privacy, quality, and data freshness
  • Centrally enforces and manages policies and rules across federated data for all data services in a single, unified environment

Reference Table Manager

  • Provides an easy-to-use interface for business analysts and other subject matter experts to create, edit, and delete reference tables
  • Creates an audit trail documenting the changes to the reference tables

Mapping Architect for Visio

  • Automatically generates data integration mappings from best-practice templates and reverse-engineers existing mappings into reusable templates
  • Accelerates and automates design using a data-flow design tool based on Microsoft Visio and mapping templates based on Informatica’s implementation methodology, Informatica Velocity

LDAP Integration

  • Supports secure authentication of users and groups through an enterprise directory system
  • Provides a choice between either Informatica PowerCenter or LDAP for authentication

Enterprise-Grade Security

  • Provides granular privileges and flexible permissions management to ensure security, particularly across large, globally distributed teams
  • Limits user access to only the minimal functionality and data they need, reducing the risk of potential security breaches