Informatica 10.1.0 includes the following new capabilities:
Big Data Management
- PAM
  - HDP 2.3.x, HDP 2.4.x
  - CDH 5.5.x
  - MapR 5.1.x
  - HDInsight 3.3
  - IBM BigInsights 4.1.x
- Functionality
  - File Name option: ability to retrieve the file name and path location from complex files, HDFS files, and flat files.
  - Parallel workflow tasks
  - Run-time DDL generation
  - New Hive datatypes: Varchar/Char datatypes in MapReduce mode
  - BDM UTIL: full Kerberos automation
  - Developer tool enhancements
    - Generate a mapplet from connected transformations
    - Copy, paste, and replace ports from/to Excel
    - Search with auto-suggest in ports
    - "Create DDL" SQL enhancements, including parameterization
  - Reuse
    - SQL to Mapping: convert ANSI SQL with functions to a BDM mapping
    - Import/export framework gaps: Teradata/Netezza adapter conversions from PowerCenter to BDM
    - Reuse report
- Connectivity
  - Sqoop: full integration with Sqoop in MapReduce mode.
  - Teradata and Netezza partitioning: Teradata read/write and Netezza read/write partitioning support, including the Blaze mode of execution.
  - Complex files: native support for Avro and Parquet through complex files.
  - Cloud connectors: Azure DW, Azure Blob, and Redshift connectors in MapReduce mode.
- Performance
  - Blaze: Blaze 2.0 delivers significant performance improvements and adds more connectors and transformations that can run on Blaze. New Blaze features include:
    - Performance
      - Map-side join with persistence cache (a sketch of the idea appears at the end of this section)
      - Map-side aggregator
    - Transformations
      - Unconnected lookup
      - Normalizer
      - Sequence generator
      - Aggregator pass-through ports
      - Data Quality
      - Data Masking
      - Data Processor
      - Joiner with a relaxed join condition for map-side joins; previously, only equijoins were supported
    - Connectivity
      - Teradata
      - Netezza
      - Complex file reader/writer for limited cases
      - Compressed Hive sources/targets
    - Recovery
      - Partial recovery is supported, though it is not enabled by default
  - Spark: Informatica BDM now fully supports Spark 1.5.1 on Cloudera and Hortonworks.
- Security
  - Security integration: the following features support infrastructure security on BDM:
    - Integration with Sentry and Ranger for the Blaze mode of execution.
    - Transparent Encryption support.
    - Kerberos: automation through BDM UTIL.
    - OS profiles: secure multi-tenancy on the Hadoop cluster.
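A map-side join avoids the shuffle phase by loading the smaller input into an in-memory lookup table on each mapper and streaming the larger input against it; the persistence cache keeps that lookup table on disk so repeated tasks can reuse it. The following Python sketch illustrates the general technique only; the function names and cache layout are illustrative and are not Blaze internals.

```python
# Illustrative sketch of a map-side (broadcast hash) join: the small side is
# built into a hash table once, persisted, and the large side is streamed
# through it with no shuffle/reduce phase. Names are illustrative only.
import pickle

def build_small_side_cache(small_rows, key, cache_path="small_side.cache"):
    """Build a lookup table for the small input and persist it so later
    mapper tasks can reuse it instead of rebuilding (the persistence cache)."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    with open(cache_path, "wb") as f:
        pickle.dump(lookup, f)

def map_side_join(large_rows, key, cache_path="small_side.cache"):
    """Stream the large input and join each row against the cached small side."""
    with open(cache_path, "rb") as f:
        lookup = pickle.load(f)
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**row, **match}

# Tiny usage example.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Grace"}]
orders = [{"cust_id": 1, "amount": 10.0}, {"cust_id": 2, "amount": 7.5}]
build_small_side_cache(customers, "cust_id")
print(list(map_side_join(orders, "cust_id")))
```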
Platform
- License compliance enhancements
  - License expiration warning messages to renew the license proactively
  - License core over-usage warning messages for compliance
- Monitoring enhancements
  - Domain-level resource usage trending
  - Click through from the summarized job run statistics reports to the actual job execution details
  - Historical run statistics of a job
- Scheduler enhancements
  - Schedule profile and scorecard jobs
  - Schedules are now time-zone aware
- Security enhancements
  - OS profile support for the BDM, DQ, and IDL products: security and isolation for job execution
  - Application management permissions: fine-grained permissions on application, mapping, and workflow objects
PowerCenter
Functionality
- Drag and drop a target definition into the Source Analyzer to create a source definition in the Designer
- Enhancements to address display issues when using long names and client tools with dual monitors
- SQL to Mapping: use the Developer tool to convert ANSI SQL with functions to a PowerCenter mapping
- Command line enhancement to assign an Integration Service to workflows
- Command line enhancement to support adding FTP connections
- Pushdown optimization support for Greenplum
New connectors
New Certifications
- Oracle 12cR1
- MS Excel 2013
- MS Access 2013
Mainframe and CDC
- New functionality
  - z/OS 2.2
  - z/OS CICS/TS 5.3
  - z/OS IMS V14 (batch & CDC)
  - OpenLDAP support to extend security capabilities over more Linux, UNIX, and Windows platforms
- Improved or extended functionality
  - i5/OS SQL generation for source/target objects
  - z/OS DB2 enhancements
    - IFI 306 interest filtering
    - Offloading support for DB2 compressed image copy processing
    - Support for DB2 FlashCopy images
  - Oracle enhancements
    - Support for Direct Path Load options for Express CDC
    - Support for drop partition DDL to prevent CDC failures
    - Processing of archived REDO log copies
  - Intelligent metadata generation (createdatamaps)
  - Ability to apply record filtering by matching metadata with physical data
Metadata Manager
- Universal Connectivity Framework (UCF)
  - Connects to a wide range of metadata sources; the list of metadata sources is provided in the Administrator Guide
  - A new bridge to a metadata source can be deployed easily; documentation is provided to aid with the deployment
  - Linking, lineage, and impact summaries remain intact with native connectivity
  - Connection-based linking is available for any metadata source created via UCF
  - Rule-based and enumerated linking is available for connectors created via UCF
- Incremental load support for Oracle and Teradata
  - Improved load performance by extracting only changed artifacts from the relational sources Oracle and Teradata
  - Less load on metadata source databases compared to a full extraction
  - An XConnect can run in full or incremental mode; logs contain more details about extracted artifacts in incremental mode
- Enhanced summary lineage (a sketch of the idea appears at the end of this section)
  - A simplified lineage view for the business user, without any mapping assets or stage assets in the flow
  - Users can drill down from the enhanced summary lineage to the technical lineage
  - Users can go back to the summary view from the detailed lineage view
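The summary view can be thought of as a graph reduction: take the detailed lineage graph and bypass the intermediate assets (mappings, stage tables) so that only the remaining endpoints stay connected. Below is a minimal Python sketch of that idea with a made-up graph; it is an analogy for the concept, not Metadata Manager's implementation.

```python
# Illustrative sketch of deriving a summary lineage view: collapse a detailed
# lineage graph by bypassing intermediate assets (e.g. mappings, stage tables)
# so only business-relevant endpoints remain connected. The graph and the
# "intermediate" classification are made up for illustration.
def summarize(edges, intermediate):
    """edges: list of (source, target); intermediate: set of nodes to bypass."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)

    def endpoints(node, seen):
        """Follow edges through intermediate nodes until non-intermediate nodes are reached."""
        result = set()
        for nxt in adjacency.get(node, ()):
            if nxt in seen:
                continue
            if nxt in intermediate:
                result |= endpoints(nxt, seen | {nxt})
            else:
                result.add(nxt)
        return result

    summary = set()
    for node in adjacency:
        if node in intermediate:
            continue
        for target in endpoints(node, {node}):
            summary.add((node, target))
    return sorted(summary)

detailed = [("CRM.CUSTOMERS", "m_load_stage"), ("m_load_stage", "STG.CUSTOMERS"),
            ("STG.CUSTOMERS", "m_load_dw"), ("m_load_dw", "DW.DIM_CUSTOMER")]
print(summarize(detailed, intermediate={"m_load_stage", "STG.CUSTOMERS", "m_load_dw"}))
# [('CRM.CUSTOMERS', 'DW.DIM_CUSTOMER')]
```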
Profiling and Discovery
- Avro/Parquet profiling
  - Profile Avro/Parquet files directly, without creating a logical data object for each of them
  - Profile a single file or a folder of files, within the big data environment or within the native file system
  - Support for common Parquet compression mechanisms, including Snappy
  - Support for common Avro compression mechanisms, including Snappy and Deflate
  - Profiling of Avro/Parquet files can execute in Native, Hive, and Blaze modes
- Operational dashboards
  - The operational dashboard provides separate views of:
    - Number of scorecards
    - Data objects tracked by scorecards
    - Cumulative scorecard trend (acceptable/unacceptable elements)
    - Scorecard runs summary
  - Analyst users can view the operational dashboard in the scorecard workspace
- Scheduling support for profiles and scorecards
  - Ability to schedule single or multiple profiles, scorecards, and enterprise profiles
  - Performed from the UI in the Administrator Console
- Profiling/scorecards on Blaze
  - Use the big data infrastructure for profiling jobs
  - Running profiling on Blaze is supported from both Analyst and Developer
  - The following jobs are supported in Blaze mode:
    - Column profiling
    - Rule profiling
    - Domain discovery
    - Enterprise profiling (column and domain)
  - All sampling options are available in Blaze mode: First N, Random N, Auto Random, and All
- Data domain discovery enhancements
  - Ability to provide a number of records as a domain match criterion, which allows detecting domain matches even when only a few records match the criteria; especially useful when trying to match secure domains (a sketch of the idea appears at the end of this section)
  - Additional option to exclude NULL values from the computation of the inference percentage
- OS profile support
  - Provides execution resource isolation for profiles and scorecards
  - Configuration is similar to OS profiles for PowerCenter and the platform
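To make the two new domain discovery criteria concrete, the sketch below flags a column as matching a data domain either when the inference percentage crosses a threshold or when an absolute number of records match, optionally excluding NULLs from the percentage. The pattern, thresholds, and data are invented for illustration only.

```python
# Illustrative sketch of domain-match criteria: a column matches a data domain
# either when the inference percentage crosses a threshold or when at least a
# minimum number of records match (useful for rare or secure domains). NULLs
# can optionally be excluded from the percentage computation.
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")   # made-up "domain" rule

def domain_match(values, pattern, min_percent=30.0, min_records=10, ignore_nulls=True):
    """Return (matched, inference percentage, matching record count)."""
    non_null = [v for v in values if v is not None]
    matches = sum(1 for v in non_null if pattern.match(v))
    # With ignore_nulls=False, NULLs stay in the denominator and lower the percentage.
    denominator = len(non_null) if ignore_nulls else len(values)
    percent = 100.0 * matches / denominator if denominator else 0.0
    return (percent >= min_percent or matches >= min_records), percent, matches

# Mostly free-text column in which only a handful of records contain SSNs.
column = ["123-45-6789"] * 12 + ["n/a"] * 988
matched, percent, count = domain_match(column, SSN_PATTERN)
print(matched, round(percent, 1), count)   # True 1.2 12 -- flagged by the record-count criterion
```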
Data Transformation 10.1.0
- New 'Relational to Hierarchical' transformation in Developer
- New REST API for executing DT services (see the sketch below)
- Optimizations for reading and writing complex Avro and Parquet files
PAM (Platform – PC/Mercury)
- Database support update (added):
  - Oracle 12cR1
  - IBM DB2 10.5 Fix Pack 7
- Web browser update
- Tomcat support update: v7.0.68
- JVM support update (updated):
  - Oracle Java 1.8.0_77
  - IBM JDK 8.0
Enterprise Information Catalog
- New connectivity
  - File scanner for HDFS (Cloudera, Hortonworks) and Amazon S3: catalog supported files and fields in the data lake. Supported for CSV, XML, and JSON file formats.
  - Informatica Cloud: extract lineage metadata from Informatica Cloud mappings.
  - MicroStrategy: support for metadata extraction from MicroStrategy.
  - Amazon Redshift: support for metadata extraction from Amazon Redshift.
  - Hive: added multi-schema lineage support for Hive.
  - Custom lineage scanner: manually add links and link properties to existing objects in the catalog; document lineage from unsupported ETL tools and hand-coded integrations.
- Intelligence
  - Semantic search: object type detection from search queries for targeted search results.
  - Enhanced domain discovery: granular controls in domain discovery, such as record match and ignore NULLs, for more accurate domain matching.
- User experience improvements
  - Enhanced catalog home page: new reports for the top 50 assets in the organization, trending searches, and assets recently viewed by the user.
  - Enhanced administrator home page: new dashboard with widgets for task monitoring, resource views, and unassigned connections.
- Performance enhancements
  - Profiling on Blaze: run profiling and domain discovery jobs on Hadoop for big data sets.
  - Incremental profiling support: scanner jobs identify whether a table has changed since the last discovery run and run profiling jobs only on the changed tables, for selected sources (Oracle, DB2, SQL Server, and HDFS files). A sketch of the idea appears at the end of this section.
  - ~4x performance improvement when scanning PowerCenter resources.
  - ~30x performance improvements in search, search auto-suggest, and sort.
- Deployment
  - Added support for backup, restore, and upgrade.
  - Added Kerberos support for the embedded cluster.
  - Intelligent email alerts: help administrators proactively take care of potential stability issues in the catalog setup.
- EIC PAM
  - RHEL 7 support added
  - New versions added for existing scanners:
    - Tableau 9.x
    - MapR 5.1 Hive scanner
    - SAP BusinessObjects 4.1 SP4 through SP6
  - New scanners:
    - Amazon Redshift
    - Amazon S3
    - Informatica Cloud R25
    - MicroStrategy 10.x, 9.4.1, 9.3.1
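The idea behind incremental profiling is straightforward change detection: compare a per-table change indicator (such as a last-modified timestamp) against what was recorded at the previous discovery run and profile only the tables whose indicator moved. A minimal sketch of that idea follows; the table metadata, state file, and profile_table function are placeholders for illustration, not the scanner's actual mechanism.

```python
# Illustrative sketch of incremental profiling: profile only tables whose
# change indicator differs from the previous discovery run. The state file,
# table metadata, and profile_table() are placeholders for illustration.
import json
import os

STATE_FILE = "last_discovery_state.json"   # placeholder for persisted run state

def load_previous_state():
    if not os.path.exists(STATE_FILE):
        return {}
    with open(STATE_FILE) as f:
        return json.load(f)

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def profile_table(name):
    print(f"profiling {name} ...")          # stand-in for the real profiling job

def incremental_run(current_tables):
    """current_tables maps table name -> change indicator (e.g. last DDL/DML timestamp)."""
    previous = load_previous_state()
    for table, indicator in current_tables.items():
        if previous.get(table) != indicator:  # new table, or changed since last run
            profile_table(table)
    save_state(current_tables)

incremental_run({"SALES.ORDERS": "2016-05-01T10:00:00",
                 "SALES.CUSTOMERS": "2016-03-12T08:30:00"})
```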
Intelligent Data Lake (IDL)
In version 10.1, Informatica introduces a new product, Intelligent Data Lake, to help customers derive more value from their Hadoop-based data lakes and democratize the data for use by everyone in the organization.
Intelligent Data Lake is a collaborative, self-service big data discovery and preparation solution for data analysts and data scientists to rapidly discover raw data and turn it into insights with quality and governance, especially in a data lake environment.
This allows analysts to spend more time on analysis and less time on finding and preparing data, while IT can ensure quality, visibility, and governance.
Intelligent Data Lake provides the following benefits:
- Data analysts can quickly and easily find and explore trusted enterprise data assets, within the data lake as well as outside it, using semantic search, knowledge graphs, and smart recommendations.
- Data analysts can transform, cleanse, and enrich data in the data lake using an Excel-like spreadsheet interface in a self-service manner, without the need for coding skills.
- Data analysts can publish and share data and knowledge with the rest of the community and analyze the data using their choice of BI or analytic tools.
- IT and governance staff can monitor user activity related to data usage in the lake.
- IT can track data lineage to verify that data is coming from the right sources and going to the right targets.
- IT can enforce appropriate security and governance on the data lake.
- IT can operationalize the work done by data analysts into a data delivery process that can be repeated and scheduled.
Intelligent Data Lake has the following features:
- Search:
  - Find data in the lake, as well as in other enterprise systems, using smart search and inference-based results.
  - Filter assets based on dynamic facets, using system attributes and custom-defined classifications.
- Explore:
  - Get an overview of assets, including custom attributes, profiling statistics for quality, data domains for business content, and usage information.
  - Add business context information by crowd-sourcing metadata enrichment and tagging.
  - Preview sample data to get a sense of the data asset, subject to user credentials.
  - Get the lineage of assets to understand where data is coming from and where it is going, to build trust.
  - Know how a data asset is related to other assets in the enterprise, based on associations with other tables/views, users, reports, data domains, etc.
  - Discover previously unknown assets through progressive discovery with lineage and relationship views.
- Acquire:
  - Upload personal delimited files to the lake using a wizard-based interface.
  - Hive tables are automatically created for the uploads in the most optimal format.
  - Create new assets, or append to or overwrite existing assets, with uploaded data.
- Collaborate:
  - Organize work by adding data assets to projects.
  - Add collaborators to projects with different roles, such as co-owner, editor, and viewer, for different privileges.
- Recommendations:
  - Improve productivity by using recommendations based on other users' behavior and by reusing knowledge.
  - Get recommendations for alternate assets that can be used in a project instead of the ones already added.
  - Get recommendations for additional assets that can be used alongside what is already in the project.
  - Recommendations change based on what is in the project.
- Prepare:
  - Use an Excel-like environment to interactively specify transformations using sample data.
  - See sheet-level and column-level overviews, including value distributions and numeric/date distributions.
  - Add transformations in the form of recipe steps and immediately see the result on the sheets.
  - Perform column-level data cleansing and data transformation using string, math, date, and logical operations.
  - Perform sheet-level operations such as combine, merge, aggregate, and filter.
  - Refresh the sample in the worksheets if the data in the underlying tables changes.
  - Derive sheets from existing sheets and get alerts when parent sheets change.
  - All transformation steps are stored in the recipe, which can be played back interactively (a sketch of the recipe idea appears at the end of this section).
- Publish:
  - Use the power of the underlying Hadoop system to run large-scale data transformations without coding or scripting.
  - Run the data preparation steps on the actual large data sets in the lake to create new data assets.
  - Publish data in the lake as a Hive table in the desired database.
  - Create new assets, or append to or overwrite existing assets, with published data.
- Data asset operations:
  - Export data from the lake to a CSV file.
  - Copy data into another database or table.
  - Delete a data asset, if allowed by user credentials.
- My Activities:
  - Keep track of upload activities and their status.
  - Keep track of publications and their status.
  - View log files in case of errors, and share them with IT administrators if needed.
- IT monitoring:
  - Keep track of user, data asset, and project activities by building reports on top of the audit database.
  - Answer questions such as top active users, top data sets by size, last update, most reused assets, most active projects, etc.
- IT operationalization:
  - Operationalize the ad hoc work done by analysts.
  - Use the Informatica Developer tool to customize and optimize the Informatica BDM mappings translated from the recipes that analysts created.
  - Deploy, schedule, and monitor the Informatica BDM mappings to ensure data assets are delivered at the right time to the right destinations.
  - Make sure the entitlements in the data lake for access to various databases and tables comply with security policies.
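To illustrate the recipe idea in general terms, the sketch below records each preparation step as a named operation over a pandas DataFrame so the sequence can be replayed against a fresh sample or a larger data set. The column names, steps, and data are invented; this is an analogy for the concept, not the product's implementation.

```python
# Illustrative sketch of a "recipe": preparation steps recorded as an ordered
# list of operations over a pandas DataFrame, so they can be replayed on new
# data. An analogy for the concept, not the product's implementation.
import pandas as pd

recipe = []   # ordered list of (description, function) pairs

def step(description):
    """Decorator that registers a preparation step in the recipe."""
    def register(fn):
        recipe.append((description, fn))
        return fn
    return register

@step("trim whitespace in 'customer' column")
def trim_customer(df):
    df["customer"] = df["customer"].str.strip()
    return df

@step("filter out rows with non-positive amounts")
def drop_bad_amounts(df):
    return df[df["amount"] > 0]

@step("aggregate amount by customer")
def total_by_customer(df):
    return df.groupby("customer", as_index=False)["amount"].sum()

def play_back(df):
    """Replay every recorded step, in order, on the given data."""
    for description, fn in recipe:
        print("applying:", description)
        df = fn(df)
    return df

sample = pd.DataFrame({"customer": [" acme ", "globex", "acme"],
                       "amount": [10.0, -5.0, 2.5]})
print(play_back(sample))
```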
Informatica Data Quality
Exception management
- Data type-based search-and-replace enhancements
- Non-default schema for exception tables, for greater security and flexibility
- Task IDs for better reporting
Address Validation
- IDQ is now integrated with AV 5.8.1
- Ireland: support for Eircode postal codes
- France: SNA Hexaligne 3 data support
- UK: rooftop geocoding
Execution:
- IDQ transformations can execute on Blaze
- Workflows: parallel execution for enhanced performance
Visualization:
- Scorecard dashboard for a single, high-level view of scorecards in the repository