Thursday 8 December 2016

Informatica 10.1 New Features / Informatica 10.1 Transformation Guide

Informatica 10.1.0 includes the following new capabilities:

Big Data Management
  • PAM
    • HDP 2.3.x, HDP 2.4.x
    • CDH 5.5.x
    • MapR 5.1.x
    • HDInsight 3.3
    • IBM BigInsights 4.1.x
  • Functionality
    • File Name Option: Ability to retrieve the file name and path from complex files, HDFS, and flat files.
    • Parallel Workflow tasks
    • Run Time DDL Generation
    • New Hive Datatypes: Varchar/char datatypes in map-reduce mode
    • BDM UTIL: Full Kerberos automation
    • Developer Tool enhancements
      • Generate a mapplet from connected transformations
      • Copy, paste, and replace ports from/to Excel
      • Search with auto-suggest in ports
      • "Create DDL" SQL enhancements, including parameterization
  • Reuse
    • SQL To Mapping: Convert ANSI SQL with functions to BDM Mapping
    • Import/Export framework gaps: Teradata/Netezza adapter conversions from PowerCenter to BDM
    • Reuse Report
  • Connectivity
    • SQOOP: Full integration with Sqoop in map-reduce mode.
    • Teradata and Netezza partitioning: Read/write partitioning support for Teradata and Netezza, plus the Blaze mode of execution.
    • Complex Files: Native support for Avro and Parquet using complex files.
    • Cloud Connectors: Azure DW, Azure Blob, and Redshift connectors in map-reduce mode.

  • Performance
  • Blaze: Blaze 2.0 delivers significant performance improvements and adds more connectors and transformations that run on Blaze. New features on Blaze include:
    • Performance
      • Map-side join with persistent cache (a conceptual sketch follows this Performance section)
      • Map-side aggregator
    • Transformations
      • Unconnected lookup
      • Normalizer
      • Sequence generator
      • Aggregator pass-through ports
      • Data Quality
      • Data Masking
      • Data Processor
      • Joiner with relaxed join conditions for map-side joins (previously, only equijoins were supported)
    • Connectivity
      • Teradata
      • Netezza
      • Complex file reader/writer for limited cases
      • Compressed Hive source/target
    • Recovery
      • Partial recovery is supported, though it is not enabled by default
  • Spark: Informatica BDM now fully supports Spark 1.5.1 on Cloudera and Hortonworks.
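
For context on the map-side join noted above, here is a minimal, conceptual Python sketch of the technique with hypothetical data; it is not Informatica code. The small side of the join is loaded once into an in-memory lookup (the role a persistent cache plays), so the large side can stream through the map phase with no shuffle.

# Conceptual sketch of a map-side (broadcast) join; not Informatica code.
# The small dimension table is cached in memory once, so each record of the
# large fact stream is joined locally, with no reduce-side shuffle.

def build_cache(dimension_rows):
    # Built once and reused across tasks, like a persistent lookup cache.
    return {row["id"]: row for row in dimension_rows}

def map_side_join(fact_stream, cache):
    for fact in fact_stream:
        dim = cache.get(fact["dim_id"])
        if dim is not None:  # inner-join semantics
            yield {**fact, "region": dim["region"]}

dims = [{"id": 1, "region": "EMEA"}, {"id": 2, "region": "APAC"}]
facts = [{"order": 101, "dim_id": 1}, {"order": 102, "dim_id": 2}]
print(list(map_side_join(facts, build_cache(dims))))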

  • Security
    • Security Integration: The following features are added to support infrastructure security on BDM:
      • Integration with Sentry and Ranger for the Blaze mode of execution.
      • Transparent encryption support.
      • Kerberos: Automation through BDM UTIL.
    • OS Profiles: Secure multi-tenancy on the Hadoop cluster.


Platform


  • License compliance enhancements
    • License expiration warning messages to renew the license proactively
    • License core over-usage warning messages for compliance
  • Monitoring enhancements
    • Domain level resource usage trending
    • Click through to the actual job execution details from the summarized job run statistics reports
    • Historical run statistics of a job
  • Scheduler enhancements
    • Schedule Profile and Scorecard jobs
    • Schedules are now time zone aware
  • Security enhancements
    • OS Profiles support for BDM, DQ, IDL products - Security and isolation for job execution
    • Application management permissions – fine-grained permissions on application, mapping, and workflow objects

PowerCenter

Functionality
  • Drag and drop Target definition into Source Analyzer to create source definition in Designer
  • Enhancements to address display issues when using long names and client tools with dual monitors
  • SQL To Mapping: Use Developer tool to convert ANSI SQL with functions to PowerCenter Mapping
  • Command line enhancement to assign Integration service to workflows
  • Command line enhancement to support adding FTP connection
  • Pushdown optimization support for Greenplum

New connectors
  • Azure DW
  • Azure Blob
New Certifications
  • Oracle 12cR1
  • MS Excel 2013
  • MS Access 2013
Mainframe and CDC
  • New functionality
    • z/OS 2.2
    • z/OS CICS/TS 5.3
    • z/OS IMS V14 (Batch & CDC)
    • OpenLDAP to extend security capabilities over more Linux, UNIX, and Windows platforms
  • Improved or Extended functionality
    • i5/OS SQL generation for source/target objects
    • z/OS DB2 Enhancements
      • IFI 306 Interest Filtering
      • Offloading support for DB2 Compressed Image Copy processing
      • Support for DB2 Flash Copy Images
    • Oracle Enhancements
      • Support for Direct Path Load options for Express CDC
      • Support for drop partition DDL to prevent CDC failures
      • Process archived REDO log copies
    • Intelligent Metadata Generation (createdatamaps)
      • Ability to apply record filtering by matching metadata with physical data

Metadata Manager


  • Universal Connectivity Framework (UCF)
    • Connects to a wide range of metadata sources. The list of metadata sources is provided in the Administrator Guide.
    • A new bridge to a metadata source can be easily deployed. Documentation is provided to aid with the deployment
    • Linking, lineage, and impact summary remain intact with native connectivity
    • Connection based linking is available for any metadata source created via UCF
    • Rule Based and Enumerated linking is available for connectors created via UCF
  • Incremental Load Support for Oracle and Teradata
    • Improved load performance by extracting only changed artifacts from relational sources: Oracle and Teradata
    • Lower load on metadata source databases compared to a full extraction
    • XConnect can run in full or incremental mode. Logs contain more details about extracted artifacts in incremental mode (see the sketch below)
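
As a rough illustration of the incremental mode described above (not Metadata Manager's actual implementation), an extract can keep a high-water mark from the last successful run and fetch only artifacts changed since then. The table and column names here are hypothetical.

# Hypothetical watermark-based incremental extraction; illustrative only.
import sqlite3

def incremental_extract(conn, watermark):
    # Fetch only artifacts modified since the previous successful load.
    cur = conn.execute(
        "SELECT name, modified_at FROM catalog_objects WHERE modified_at > ?",
        (watermark,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog_objects (name TEXT, modified_at TEXT)")
conn.execute("INSERT INTO catalog_objects VALUES ('ORDERS', '2016-11-30')")
conn.execute("INSERT INTO catalog_objects VALUES ('CUSTOMERS', '2016-12-05')")

# Only CUSTOMERS changed after the stored watermark, so only it is re-extracted.
print(incremental_extract(conn, "2016-12-01"))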
  • Enhanced Summary Lineage
    • A simplified lineage view for the business user, without any mapping assets or stage assets in the flow
    • Users can drill down from the enhanced summary lineage to get to technical lineage
    • Users can go back to the summary view from the detailed lineage view


Profiling and Discovery


  • AVRO/Parquet Profiling
    • Profile Avro/Parquet files directly without creating Logical Data Objects for each of them
    • Profile a file or a folder of files, within a Big Data environment or the native file system
    • Support common Parquet compression mechanisms including Snappy
    • Support common Avro compression mechanisms including Snappy and Deflate
    • Profiling of Avro/Parquet files can run in Native, Hive, and Blaze execution modes (see the sketch below)
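
To make the formats above concrete, here is a small standalone Python sketch (using the pyarrow and fastavro libraries, outside Informatica) that writes a Snappy-compressed Parquet file and a Deflate-compressed Avro file of the kind such profiles read, then computes a trivial column profile.

# Standalone sketch of the file formats involved; requires pyarrow and fastavro.
import pyarrow as pa
import pyarrow.parquet as pq
import fastavro

rows = [{"id": 1, "city": "Dublin"}, {"id": 2, "city": "Lyon"}]

# Parquet with Snappy compression (one of the supported codecs above).
table = pa.Table.from_pylist(rows)
pq.write_table(table, "sample.parquet", compression="snappy")

# Avro with Deflate compression.
schema = {
    "name": "Sample",
    "type": "record",
    "fields": [{"name": "id", "type": "int"}, {"name": "city", "type": "string"}],
}
with open("sample.avro", "wb") as out:
    fastavro.writer(out, fastavro.parse_schema(schema), rows, codec="deflate")

# A trivial "column profile" over the Parquet file: null and distinct counts.
data = pq.read_table("sample.parquet").to_pylist()
for col in ("id", "city"):
    values = [r[col] for r in data]
    print(col, "nulls:", values.count(None), "distinct:", len(set(values)))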
  • Operational Dashboards
    • The operational dashboard provides separate views of:
      • Number of scorecards
      • Data objects tracked by scorecards
      • Cumulative scorecard trend (acceptable/unacceptable) elements
      • Scorecard runs summary
    • Analyst users can view the operational dashboard in the scorecard workspace
  • Scheduling Support for Profiles/Scorecards
    • Ability to schedule single or multiple profiles, scorecards, and enterprise profiles
    • Performed from the Administrator Console UI
  • Profiling/Scorecards on Blaze
    • Use Big Data infrastructure for Profiling Jobs
    • Running profiling on Blaze is supported from both the Analyst and Developer tools
    • The following jobs are supported in Blaze mode:
      • Column Profiling
      • Rule Profiling
      • Domain Discovery
      • Enterprise Profiling (Column and Domain)
    • Ability to use all sampling options when working in the Blaze mode: First N, Random N, Auto Random, All
  • Data Domain Discovery Enhancements
    • Ability to specify the number of matching records as a domain match criterion, which allows domain matches to be detected even when only a few records match; this is especially useful when trying to match secure domains
    • Additional option to exclude NULL values from the computation of the inference percentage (see the sketch below)
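
A minimal sketch of the two options above, with illustrative logic rather than Informatica's implementation: a minimum-record-count match criterion, and exclusion of NULLs from the inference percentage.

# Illustrative domain-discovery logic; thresholds and data are hypothetical.
def domain_match(values, predicate, min_records=None, min_pct=None,
                 exclude_nulls=True):
    # min_records: match when at least this many rows conform; useful for
    #   sparse but sensitive domains such as secure data.
    # min_pct: match when the conformance percentage reaches this level.
    # exclude_nulls: drop NULLs from the denominator of the inference %.
    non_null = [v for v in values if v is not None]
    matches = sum(1 for v in non_null if predicate(v))
    denominator = len(non_null) if exclude_nulls else len(values)
    pct = 100.0 * matches / denominator if denominator else 0.0
    if min_records is not None and matches >= min_records:
        return True
    return min_pct is not None and pct >= min_pct

def looks_like_ssn(v):
    return isinstance(v, str) and len(v) == 11 and v[3] == "-" and v[6] == "-"

column = ["123-45-6789", None, "n/a", None, "987-65-4321"]
print(domain_match(column, looks_like_ssn, min_records=2))  # True: 2 rows match
print(domain_match(column, looks_like_ssn, min_pct=50))     # True: 2 of 3 non-NULLs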
  • OS Profile Support
    • Provides execution resource isolation for profiles and scorecards
    • Configuration is similar to PowerCenter/Platform OS Profiles

Data Transformation 10.1.0


  • New 'Relational to Hierarchical' transformation in Developer
  • New REST API for executing DT services (a hedged usage sketch follows this list)
  • Optimizations for reading and writing complex Avro and Parquet files
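
Because the REST API above suggests programmatic invocation, the following Python sketch (using the requests library) shows what a client call could look like. The endpoint URL, parameter names, and payload shape are hypothetical placeholders, not the documented DT API; consult the Data Transformation documentation for the actual contract.

# Hypothetical client for a REST-exposed DT service; URL and payload are
# placeholders, not the documented API.
import requests

DT_ENDPOINT = "http://dt-host:8080/services/run"   # hypothetical

def run_dt_service(service_name, input_doc):
    # POST an input document to a named DT service and return the output.
    resp = requests.post(
        DT_ENDPOINT,
        params={"service": service_name},
        data=input_doc.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

# Example usage (assumes a deployed service named "InvoiceToXml"):
# print(run_dt_service("InvoiceToXml", "<invoice>...</invoice>"))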

PAM (Platform – PC/Mercury)


  • Database Support Update: Added
    • Oracle 12cR1
    • IBM DB2 10.5 Fix Pack 7
  • Web Browser Update:
    • Safari 9.0
    • Chrome 51.x
  • Tomcat Support Update: v7.0.68
  • JVM Support Update: Updated
    • Oracle Java 1.8.0_77
    • IBM JDK 8.0


Enterprise Information Catalog
  • New Connectivity
    • File Scanner for HDFS (Cloudera, Hortonworks) and Amazon S3: Catalog supported files and fields in the data lake. Supported for CSV, XML, and JSON file formats.
    • Informatica Cloud: Extract lineage metadata from Informatica Cloud mappings.
    • MicroStrategy: Support for metadata extraction from MicroStrategy.
    • Amazon Redshift: Support for metadata extraction from Amazon Redshift.
    • Hive: Added multi-schema lineage support for Hive.
    • Custom Lineage Scanner: Manually add links and link properties to existing objects in the catalog; document lineage from unsupported ETL tools and hand-coded integrations.
  • Intelligence
    • Semantic Search: Object type detection from search queries for targeted search results (see the sketch below).
    • Enhanced Domain Discovery: Granular controls in domain discovery, such as record match and ignore NULLs, for accurate domain matching.
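
A toy sketch of the object-type detection idea above (not EIC's implementation): recognize a type keyword in the query and turn it into a type filter for targeted results. The keyword map is illustrative.

# Toy object-type detection for search queries; keyword map is illustrative.
TYPE_KEYWORDS = {
    "table": "Table", "tables": "Table",
    "column": "Column", "columns": "Column",
    "report": "Report", "reports": "Report",
}

def parse_query(query):
    # Split a search query into an optional object-type filter and terms.
    terms, detected_type = [], None
    for token in query.lower().split():
        if detected_type is None and token in TYPE_KEYWORDS:
            detected_type = TYPE_KEYWORDS[token]
        else:
            terms.append(token)
    return detected_type, " ".join(terms)

print(parse_query("customer tables"))   # ('Table', 'customer')
print(parse_query("revenue report"))    # ('Report', 'revenue')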
  • User Experience Improvements
    • Enhanced Catalog Home Page: Added new reports for Top 50 Assets in the organization, Trending Searches, and Recently Viewed Assets by the user.
    • Enhanced Administrator Home Page: New dashboard with widgets for task monitoring, resource views, and unassigned connections.
  • Performance Enhancements
    • Profiling on Blaze: Run profiling and domain discovery jobs on Hadoop for big data sets.
    • Incremental Profiling Support: Scanner jobs identify whether a table has changed since the last discovery run and run profiling jobs only on the changed tables for selected sources (Oracle, DB2, SQL Server, and HDFS files). A conceptual sketch follows this list.
    • ~4x performance improvement in scanning PowerCenter resources.
    • ~30x performance improvements in search, search auto-suggest, and sort.
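
As promised above, a conceptual sketch of the change check behind incremental profiling; the signature scheme is illustrative, not the product's logic.

# Illustrative change detection for incremental profiling.
def tables_to_profile(current_stats, last_run_stats):
    # A signature here is a (row_count, last_modified) pair; a real scanner
    # may use source-specific change indicators instead.
    changed = []
    for table, signature in current_stats.items():
        if last_run_stats.get(table) != signature:
            changed.append(table)
    return changed

last_run = {"ORDERS": (1000, "2016-12-01"), "CUSTOMERS": (200, "2016-11-20")}
current = {"ORDERS": (1000, "2016-12-01"), "CUSTOMERS": (250, "2016-12-06")}
print(tables_to_profile(current, last_run))   # only CUSTOMERS is re-profiled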
  • Deployment
    • Added support for backup, restore, and upgrade.
    • Added Kerberos support for the embedded cluster.
    • Intelligent Email Alerts: Help administrators proactively address potential stability issues in the Catalog setup.
  • EIC PAM
    • RHEL 7 support added
    • New versions added for existing scanners:
      • Tableau 9.x
      • MapR 5.1 Hive Scanner
      • SAP BusinessObjects 4.1 SP4 through SP6
    • New Scanners:
      • Amazon Redshift
      • Amazon S3
      • Informatica Cloud R25
      • MicroStrategy 10.x, 9.4.1, 9.3.1

Intelligent Data Lake (IDL)


In version 10.1, Informatica introduces a new product, Intelligent Data Lake, to help customers derive more value from their Hadoop-based data lake and democratize the data for use by everyone in the organization.
Intelligent Data Lake is a collaborative, self-service big data discovery and preparation solution for data analysts and data scientists. It lets them rapidly discover and turn raw data into insights with quality and governance, especially in a data lake environment.
This allows analysts to spend more time on analysis and less time on finding and preparing data. At the same time, IT can ensure quality, visibility, and governance.

Intelligent Data Lake provides the following benefits.
  • Data Analysts can quickly and easily find and explore trusted enterprise data assets within the data lake as well as outside the data lake using semantic search, knowledge graphs and smart recommendations.
  • Data Analysts can transform, cleanse, and enrich data in the data lake using an Excel-like spreadsheet interface in a self-service manner, without the need for coding skills.
  • Data Analysts can publish and share data, as well as knowledge, with the rest of the community and analyze the data using their choice of BI or analytic tools.
  • IT and governance staff can monitor user activity related to data usage in the lake.
  • IT can track data lineage to verify that data is coming from the right sources and going to the right targets.
  • IT can enforce appropriate security and governance on the data lake.
  • IT can operationalize the work done by data analysts into a data delivery process that can be repeated and scheduled.

Intelligent Data Lake has the following features.
  • Search:
    • Find data in the lake, as well as in other enterprise systems, using smart search and inference-based results.
    • Filter assets based on dynamic facets using system attributes and custom-defined classifications.
  • Explore:
    • Get an overview of assets, including custom attributes, profiling stats for quality, data domains for business content, and usage information.
    • Add business context information by crowd-sourcing metadata enrichment and tagging.
    • Preview sample data to get a sense of the data asset based on user credentials.
    • Get lineage of assets to understand where data is coming from and where it is going, to build trust.
    • Know how the data asset is related to other assets in the enterprise based on associations with other tables/views, users, reports, data domains etc.
    • Discover previously unknown assets through progressive discovery with lineage and relationship views.
  • Acquire:
    • Upload personal delimited files to the lake using a wizard based interface.
    • Hive tables are automatically created for the uploads in an optimal format (see the sketch below).
    • Create new assets, or append to or overwrite existing assets, for uploaded data.
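
To illustrate the upload flow above, here is a hedged sketch that infers coarse column types from a delimited file and emits the kind of Hive DDL that could back the upload. The type inference and the choice of Parquet are simplifications, not IDL's actual behavior.

# Simplified type inference + Hive DDL generation for an uploaded CSV.
import csv, io

def infer_type(values):
    # Map sample values to a coarse Hive type; intentionally simplistic.
    for caster, hive_type in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for v in values:
                caster(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"

def hive_ddl(table, csv_text):
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = [
        "  `%s` %s" % (name, infer_type([r[i] for r in data]))
        for i, name in enumerate(header)
    ]
    # Parquet as a columnar, compressed "optimal" storage format.
    return "CREATE TABLE %s (\n%s\n) STORED AS PARQUET" % (table, ",\n".join(cols))

print(hive_ddl("sales_upload", "region,amount\nEMEA,10.5\nAPAC,20.0"))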
  • Collaborate:
    • Organize work by adding data assets to Projects.
    • Add collaborators to Projects with different roles, such as Co-owner, Editor, or Viewer, each carrying different privileges.
  • Recommendations:
    • Improve productivity by using recommendations based on other users' behavior and by reusing knowledge.
    • Get recommendations for alternate assets that can be used in the Project instead of the ones already added.
    • Get recommendations for additional assets that can be used alongside what is already in the project.
    • Recommendations change based on what is in the project.
  • Prepare:
    • Use an Excel-like environment to interactively specify transformations on sample data.
    • See sheet-level and column-level overviews, including value distributions and numeric/date distributions.
    • Add transformations in the form of recipe steps and immediately see the results on the sheets.
    • Perform column-level data cleansing and data transformation using string, math, date, and logical operations.
    • Perform sheet-level operations such as Combine, Merge, Aggregate, and Filter.
    • Refresh sample in the worksheets if the data in the underlying tables changes.
    • Derive sheets from existing sheets and get alerts when parent sheets change.
    • All the transformation steps are stored in the recipe, which can be played back interactively (see the sketch below).
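
A minimal sketch of the recipe idea above, conceptual rather than IDL code: transformation steps are recorded in order and replayed over the data, which is also what lets publication later apply the same steps at scale.

# Conceptual recipe: an ordered list of named steps replayed over rows.
recipe = []

def add_step(name, fn):
    # Record a transformation step so it can be replayed later.
    recipe.append((name, fn))

def play_back(rows):
    for name, fn in recipe:          # apply steps in the recorded order
        rows = [fn(dict(r)) for r in rows]
    return rows

def trim_name(row):
    row["name"] = row["name"].strip()
    return row

def add_full_price(row):
    row["full_price"] = row["price"] * (1 + row["tax_rate"])
    return row

add_step("trim name", trim_name)
add_step("derive full_price", add_full_price)

sample = [{"name": " anvil ", "price": 100.0, "tax_rate": 0.2}]
print(play_back(sample))   # replays the recipe on the (refreshed) sample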
  • Publish:
    • Use the power of underlying Hadoop system to run large scale data transformation without coding/scripting.
    • Run data preparation steps on actual large data sets in the lake to create new data assets.
    • Publish the data in the lake as a Hive table in the desired database.
    • Create new, append or overwrite existing assets for published data.
  • Data asset operations:
    • Export data from the lake to a CSV file.
    • Copy data into another database or table.
    • Delete the data asset if allowed by user credentials.
  • My Activities:
    • Keep track of my upload activities and their status.
    • Keep track of publications and their status.
    • View log files in case of errors. Share with IT Administrators if needed.
  • IT Monitoring:
    • Keep track of user, data asset, and project activities by building reports on top of the audit database.
    • Answer questions such as: top active users, top datasets by size, last update, most reused assets, most active projects, etc.
  • IT Operationalization:
    • Operationalize the ad-hoc work done by Analysts.
    • Use the Informatica Developer tool to customize and optimize the Informatica BDM mappings translated from the recipes that analysts created.
    • Deploy, schedule and monitor the Informatica BDM mappings to ensure data assets are delivered at the right time to the right destinations.
    • Make sure the entitlements in the data lake for access to various databases and tables conform to security policies.


Informatica Data Quality

Exception management

  • Data-type-based search and replace enhancements
  • Non-default schema for exception tables for greater security and flexibility
  • Task ID for better reporting

Address Validation

  • IDQ is now integrated with AV 5.8.1
  • Ireland – Support for Eircode (postal codes)
  • France – SNA Hexaligne 3 data support
  • UK – Rooftop geocoding

Execution:
  • IDQ transformations can execute on Blaze
  • Workflow - parallel execution for enhanced performance

Visualization:

  • Scorecard dashboard for a single, high-level view of scorecards in the repository

Click here to get the 10.1 New Features guide.