Informatica 10.1.0 includes the following new capabilities:
Big Data Management
- PAM
  - HDP 2.3.x, HDP 2.4.x
  - CDH 5.5.x
  - MapR 5.1.x
  - HDInsight 3.3
  - IBM BigInsights 4.1.x
- Functionality
  - File Name option: ability to retrieve the file name and path location from complex files, HDFS files, and flat files.
  - Parallel workflow tasks
  - Run-time DDL generation
  - New Hive datatypes: Varchar/Char datatypes in MapReduce mode
  - BDM UTIL: full Kerberos automation
  - Developer tool enhancements
    - Generate a mapplet from connected transformations
    - Copy, paste, and replace ports from/to Excel
    - Search with auto-suggest in ports
    - "Create DDL" SQL enhancements, including parameterization
  - Reuse
    - SQL to Mapping: convert ANSI SQL with functions to a BDM mapping
    - Import/export framework gaps: Teradata/Netezza adapter conversions from PowerCenter to BDM
    - Reuse report
- Connectivity
  - Sqoop: full integration with Sqoop in MapReduce mode.
  - Teradata and Netezza partitioning: Teradata read/write and Netezza read/write partitioning support, including the Blaze mode of execution.
  - Complex files: native support for Avro and Parquet through complex files.
  - Cloud connectors: Azure DW, Azure Blob, and Redshift connectors in MapReduce mode.
- Performance
  - Blaze: Blaze 2.0 delivers significant performance improvements and adds more connectors and transformations that can run on Blaze. New Blaze features include:
    - Performance
      - Map-side join with persistence cache (a sketch of the idea appears at the end of this section)
      - Map-side aggregator
    - Transformations
      - Unconnected lookup
      - Normalizer
      - Sequence generator
      - Aggregator pass-through ports
      - Data Quality
      - Data Masking
      - Data Processor
      - Joiner with a relaxed join condition for map-side joins; previously, only equijoins were supported
    - Connectivity
      - Teradata
      - Netezza
      - Complex file reader/writer for limited cases
      - Compressed Hive sources/targets
    - Recovery
      - Partial recovery is supported, though it is not enabled by default
  - Spark: Informatica BDM now fully supports Spark 1.5.1 on Cloudera and Hortonworks.
- Security
  - Security integration: the following features support infrastructure security on BDM:
    - Integration with Sentry and Ranger for the Blaze mode of execution.
    - Transparent Encryption support.
    - Kerberos: automation through BDM UTIL.
    - OS profiles: secure multi-tenancy on the Hadoop cluster.
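A map-side join avoids the shuffle phase by loading the smaller input into an in-memory lookup table on each mapper and streaming the larger input against it; the persistence cache keeps that lookup table on disk so repeated tasks can reuse it. The following Python sketch illustrates the general technique only; the function names and cache layout are illustrative and are not Blaze internals.

```python
# Illustrative sketch of a map-side (broadcast hash) join: the small side is
# built into a hash table once, persisted, and the large side is streamed
# through it with no shuffle/reduce phase. Names are illustrative only.
import pickle

def build_small_side_cache(small_rows, key, cache_path="small_side.cache"):
    """Build a lookup table for the small input and persist it so later
    mapper tasks can reuse it instead of rebuilding (the persistence cache)."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    with open(cache_path, "wb") as f:
        pickle.dump(lookup, f)

def map_side_join(large_rows, key, cache_path="small_side.cache"):
    """Stream the large input and join each row against the cached small side."""
    with open(cache_path, "rb") as f:
        lookup = pickle.load(f)
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**row, **match}

# Tiny usage example.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Grace"}]
orders = [{"cust_id": 1, "amount": 10.0}, {"cust_id": 2, "amount": 7.5}]
build_small_side_cache(customers, "cust_id")
print(list(map_side_join(orders, "cust_id")))
```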
Platform
- License compliance enhancements
  - License expiration warning messages to renew the license proactively
  - License core over-usage warning messages for compliance
- Monitoring enhancements
  - Domain-level resource usage trending
  - Click through from the summarized job run statistics reports to the actual job execution details
  - Historical run statistics of a job
- Scheduler enhancements
  - Schedule profile and scorecard jobs
  - Schedules are now time-zone aware
- Security enhancements
  - OS profile support for the BDM, DQ, and IDL products: security and isolation for job execution
  - Application management permissions: fine-grained permissions on application, mapping, and workflow objects
PowerCenter
Functionality
- Drag and drop a target definition into the Source Analyzer to create a source definition in the Designer
- Enhancements to address display issues when using long names and client tools with dual monitors
- SQL to Mapping: use the Developer tool to convert ANSI SQL with functions to a PowerCenter mapping
- Command line enhancement to assign an Integration Service to workflows
- Command line enhancement to support adding FTP connections
- Pushdown optimization support for Greenplum
New connectors
New Certifications
- Oracle 12cR1
- MS Excel 2013
- MS Access 2013
Mainframe and CDC
- New functionality
  - z/OS 2.2
  - z/OS CICS/TS 5.3
  - z/OS IMS V14 (batch & CDC)
  - OpenLDAP support to extend security capabilities over more Linux, UNIX, and Windows platforms
- Improved or extended functionality
  - i5/OS SQL generation for source/target objects
  - z/OS DB2 enhancements
    - IFI 306 interest filtering
    - Offloading support for DB2 compressed image copy processing
    - Support for DB2 FlashCopy images
  - Oracle enhancements
    - Support for Direct Path Load options for Express CDC
    - Support for drop partition DDL to prevent CDC failures
    - Processing of archived REDO log copies
  - Intelligent metadata generation (createdatamaps)
  - Ability to apply record filtering by matching metadata with physical data
Metadata Manager
- Universal Connectivity Framework (UCF)
  - Connects to a wide range of metadata sources; the list of metadata sources is provided in the Administrator Guide
  - A new bridge to a metadata source can be deployed easily; documentation is provided to aid with the deployment
  - Linking, lineage, and impact summaries remain intact with native connectivity
  - Connection-based linking is available for any metadata source created via UCF
  - Rule-based and enumerated linking is available for connectors created via UCF
- Incremental load support for Oracle and Teradata
  - Improved load performance by extracting only changed artifacts from the relational sources Oracle and Teradata
  - Less load on metadata source databases compared to a full extraction
  - An XConnect can run in full or incremental mode; logs contain more details about extracted artifacts in incremental mode
- Enhanced summary lineage (a sketch of the idea appears at the end of this section)
  - A simplified lineage view for the business user, without any mapping assets or stage assets in the flow
  - Users can drill down from the enhanced summary lineage to the technical lineage
  - Users can go back to the summary view from the detailed lineage view
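The summary view can be thought of as a graph reduction: take the detailed lineage graph and bypass the intermediate assets (mappings, stage tables) so that only the remaining endpoints stay connected. Below is a minimal Python sketch of that idea with a made-up graph; it is an analogy for the concept, not Metadata Manager's implementation.

```python
# Illustrative sketch of deriving a summary lineage view: collapse a detailed
# lineage graph by bypassing intermediate assets (e.g. mappings, stage tables)
# so only business-relevant endpoints remain connected. The graph and the
# "intermediate" classification are made up for illustration.
def summarize(edges, intermediate):
    """edges: list of (source, target); intermediate: set of nodes to bypass."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)

    def endpoints(node, seen):
        """Follow edges through intermediate nodes until non-intermediate nodes are reached."""
        result = set()
        for nxt in adjacency.get(node, ()):
            if nxt in seen:
                continue
            if nxt in intermediate:
                result |= endpoints(nxt, seen | {nxt})
            else:
                result.add(nxt)
        return result

    summary = set()
    for node in adjacency:
        if node in intermediate:
            continue
        for target in endpoints(node, {node}):
            summary.add((node, target))
    return sorted(summary)

detailed = [("CRM.CUSTOMERS", "m_load_stage"), ("m_load_stage", "STG.CUSTOMERS"),
            ("STG.CUSTOMERS", "m_load_dw"), ("m_load_dw", "DW.DIM_CUSTOMER")]
print(summarize(detailed, intermediate={"m_load_stage", "STG.CUSTOMERS", "m_load_dw"}))
# [('CRM.CUSTOMERS', 'DW.DIM_CUSTOMER')]
```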
Profiling and Discovery
- Avro/Parquet profiling
  - Profile Avro/Parquet files directly, without creating a logical data object for each of them
  - Profile a single file or a folder of files, within the big data environment or within the native file system
  - Support for common Parquet compression mechanisms, including Snappy
  - Support for common Avro compression mechanisms, including Snappy and Deflate
  - Profiling of Avro/Parquet files can execute in Native, Hive, and Blaze modes
- Operational dashboards
  - The operational dashboard provides separate views of:
    - Number of scorecards
    - Data objects tracked by scorecards
    - Cumulative scorecard trend (acceptable/unacceptable elements)
    - Scorecard runs summary
  - Analyst users can view the operational dashboard in the scorecard workspace
- Scheduling support for profiles and scorecards
  - Ability to schedule single or multiple profiles, scorecards, and enterprise profiles
  - Performed from the UI in the Administrator Console
- Profiling/scorecards on Blaze
  - Use the big data infrastructure for profiling jobs
  - Running profiling on Blaze is supported from both Analyst and Developer
  - The following jobs are supported in Blaze mode:
    - Column profiling
    - Rule profiling
    - Domain discovery
    - Enterprise profiling (column and domain)
  - All sampling options are available in Blaze mode: First N, Random N, Auto Random, and All
- Data domain discovery enhancements
  - Ability to provide a number of records as a domain match criterion, which allows detecting domain matches even when only a few records match the criteria; especially useful when trying to match secure domains (a sketch of the idea appears at the end of this section)
  - Additional option to exclude NULL values from the computation of the inference percentage
- OS profile support
  - Provides execution resource isolation for profiles and scorecards
  - Configuration is similar to OS profiles for PowerCenter and the platform
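To make the two new domain discovery criteria concrete, the sketch below flags a column as matching a data domain either when the inference percentage crosses a threshold or when an absolute number of records match, optionally excluding NULLs from the percentage. The pattern, thresholds, and data are invented for illustration only.

```python
# Illustrative sketch of domain-match criteria: a column matches a data domain
# either when the inference percentage crosses a threshold or when at least a
# minimum number of records match (useful for rare or secure domains). NULLs
# can optionally be excluded from the percentage computation.
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")   # made-up "domain" rule

def domain_match(values, pattern, min_percent=30.0, min_records=10, ignore_nulls=True):
    """Return (matched, inference percentage, matching record count)."""
    non_null = [v for v in values if v is not None]
    matches = sum(1 for v in non_null if pattern.match(v))
    # With ignore_nulls=False, NULLs stay in the denominator and lower the percentage.
    denominator = len(non_null) if ignore_nulls else len(values)
    percent = 100.0 * matches / denominator if denominator else 0.0
    return (percent >= min_percent or matches >= min_records), percent, matches

# Mostly free-text column in which only a handful of records contain SSNs.
column = ["123-45-6789"] * 12 + ["n/a"] * 988
matched, percent, count = domain_match(column, SSN_PATTERN)
print(matched, round(percent, 1), count)   # True 1.2 12 -- flagged by the record-count criterion
```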
Data Transformation 10.1.0
- New 'Relational to Hierarchical' transformation in Developer
- New REST API for executing DT services (see the sketch below)
- Optimizations for reading and writing complex Avro and Parquet files
PAM (Platform – PC/Mercury)
- Database support update (added):
  - Oracle 12cR1
  - IBM DB2 10.5 Fix Pack 7
- Web browser update
- Tomcat support update: v7.0.68
- JVM support update (updated):
  - Oracle Java 1.8.0_77
  - IBM JDK 8.0
Enterprise Information Catalog
- New connectivity
  - File scanner for HDFS (Cloudera, Hortonworks) and Amazon S3: catalog supported files and fields in the data lake. Supported for CSV, XML, and JSON file formats.
  - Informatica Cloud: extract lineage metadata from Informatica Cloud mappings.
  - MicroStrategy: support for metadata extraction from MicroStrategy.
  - Amazon Redshift: support for metadata extraction from Amazon Redshift.
  - Hive: added multi-schema lineage support for Hive.
  - Custom lineage scanner: manually add links and link properties to existing objects in the catalog; document lineage from unsupported ETL tools and hand-coded integrations.
- Intelligence
  - Semantic search: object type detection from search queries for targeted search results.
  - Enhanced domain discovery: granular controls in domain discovery, such as record match and ignore NULLs, for more accurate domain matching.
- User experience improvements
  - Enhanced catalog home page: new reports for the top 50 assets in the organization, trending searches, and assets recently viewed by the user.
  - Enhanced administrator home page: new dashboard with widgets for task monitoring, resource views, and unassigned connections.
- Performance enhancements
  - Profiling on Blaze: run profiling and domain discovery jobs on Hadoop for big data sets.
  - Incremental profiling support: scanner jobs identify whether a table has changed since the last discovery run and run profiling jobs only on the changed tables, for selected sources (Oracle, DB2, SQL Server, and HDFS files). A sketch of the idea appears at the end of this section.
  - ~4x performance improvement when scanning PowerCenter resources.
  - ~30x performance improvements in search, search auto-suggest, and sort.
- Deployment
  - Added support for backup, restore, and upgrade.
  - Added Kerberos support for the embedded cluster.
  - Intelligent email alerts: help administrators proactively take care of potential stability issues in the catalog setup.
- EIC PAM
  - RHEL 7 support added
  - New versions added for existing scanners:
    - Tableau 9.x
    - MapR 5.1 Hive scanner
    - SAP BusinessObjects 4.1 SP4 through SP6
  - New scanners:
    - Amazon Redshift
    - Amazon S3
    - Informatica Cloud R25
    - MicroStrategy 10.x, 9.4.1, 9.3.1
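The idea behind incremental profiling is straightforward change detection: compare a per-table change indicator (such as a last-modified timestamp) against what was recorded at the previous discovery run and profile only the tables whose indicator moved. A minimal sketch of that idea follows; the table metadata, state file, and profile_table function are placeholders for illustration, not the scanner's actual mechanism.

```python
# Illustrative sketch of incremental profiling: profile only tables whose
# change indicator differs from the previous discovery run. The state file,
# table metadata, and profile_table() are placeholders for illustration.
import json
import os

STATE_FILE = "last_discovery_state.json"   # placeholder for persisted run state

def load_previous_state():
    if not os.path.exists(STATE_FILE):
        return {}
    with open(STATE_FILE) as f:
        return json.load(f)

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def profile_table(name):
    print(f"profiling {name} ...")          # stand-in for the real profiling job

def incremental_run(current_tables):
    """current_tables maps table name -> change indicator (e.g. last DDL/DML timestamp)."""
    previous = load_previous_state()
    for table, indicator in current_tables.items():
        if previous.get(table) != indicator:  # new table, or changed since last run
            profile_table(table)
    save_state(current_tables)

incremental_run({"SALES.ORDERS": "2016-05-01T10:00:00",
                 "SALES.CUSTOMERS": "2016-03-12T08:30:00"})
```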
Intelligent Data Lake (IDL)
In version 10.1, Informatica introduces a new product, Intelligent Data Lake, to help customers derive more value from their Hadoop-based data lakes and democratize the data for use by everyone in the organization.
Intelligent Data Lake is a collaborative, self-service big data discovery and preparation solution for data analysts and data scientists to rapidly discover raw data and turn it into insights with quality and governance, especially in a data lake environment.
This allows analysts to spend more time on analysis and less time on finding and preparing data, while IT can ensure quality, visibility, and governance.
Intelligent Data Lake provides the following benefits:
- Data analysts can quickly and easily find and explore trusted enterprise data assets, within the data lake as well as outside it, using semantic search, knowledge graphs, and smart recommendations.
- Data analysts can transform, cleanse, and enrich data in the data lake using an Excel-like spreadsheet interface in a self-service manner, without the need for coding skills.
- Data analysts can publish and share data and knowledge with the rest of the community and analyze the data using their choice of BI or analytic tools.
- IT and governance staff can monitor user activity related to data usage in the lake.
- IT can track data lineage to verify that data is coming from the right sources and going to the right targets.
- IT can enforce appropriate security and governance on the data lake.
- IT can operationalize the work done by data analysts into a data delivery process that can be repeated and scheduled.
Intelligent Data Lake has the following features:
- Search:
  - Find data in the lake, as well as in other enterprise systems, using smart search and inference-based results.
  - Filter assets based on dynamic facets, using system attributes and custom-defined classifications.
- Explore:
  - Get an overview of assets, including custom attributes, profiling statistics for quality, data domains for business content, and usage information.
  - Add business context information by crowd-sourcing metadata enrichment and tagging.
  - Preview sample data to get a sense of the data asset, subject to user credentials.
  - Get the lineage of assets to understand where data is coming from and where it is going, to build trust.
  - Know how a data asset is related to other assets in the enterprise, based on associations with other tables/views, users, reports, data domains, etc.
  - Discover previously unknown assets through progressive discovery with lineage and relationship views.
- Acquire:
  - Upload personal delimited files to the lake using a wizard-based interface.
  - Hive tables are automatically created for the uploads in the most optimal format.
  - Create new assets, or append to or overwrite existing assets, with uploaded data.
- Collaborate:
  - Organize work by adding data assets to projects.
  - Add collaborators to projects with different roles, such as co-owner, editor, and viewer, for different privileges.
- Recommendations:
  - Improve productivity by using recommendations based on other users' behavior and by reusing knowledge.
  - Get recommendations for alternate assets that can be used in a project instead of the ones already added.
  - Get recommendations for additional assets that can be used alongside what is already in the project.
  - Recommendations change based on what is in the project.
- Prepare:
  - Use an Excel-like environment to interactively specify transformations using sample data.
  - See sheet-level and column-level overviews, including value distributions and numeric/date distributions.
  - Add transformations in the form of recipe steps and immediately see the result on the sheets.
  - Perform column-level data cleansing and data transformation using string, math, date, and logical operations.
  - Perform sheet-level operations such as combine, merge, aggregate, and filter.
  - Refresh the sample in the worksheets if the data in the underlying tables changes.
  - Derive sheets from existing sheets and get alerts when parent sheets change.
  - All transformation steps are stored in the recipe, which can be played back interactively (a sketch of the recipe idea appears at the end of this section).
- Publish:
  - Use the power of the underlying Hadoop system to run large-scale data transformations without coding or scripting.
  - Run the data preparation steps on the actual large data sets in the lake to create new data assets.
  - Publish data in the lake as a Hive table in the desired database.
  - Create new assets, or append to or overwrite existing assets, with published data.
- Data asset operations:
  - Export data from the lake to a CSV file.
  - Copy data into another database or table.
  - Delete a data asset, if allowed by user credentials.
- My Activities:
  - Keep track of upload activities and their status.
  - Keep track of publications and their status.
  - View log files in case of errors, and share them with IT administrators if needed.
- IT monitoring:
  - Keep track of user, data asset, and project activities by building reports on top of the audit database.
  - Answer questions such as top active users, top data sets by size, last update, most reused assets, most active projects, etc.
- IT operationalization:
  - Operationalize the ad hoc work done by analysts.
  - Use the Informatica Developer tool to customize and optimize the Informatica BDM mappings translated from the recipes that analysts created.
  - Deploy, schedule, and monitor the Informatica BDM mappings to ensure data assets are delivered at the right time to the right destinations.
  - Make sure the entitlements in the data lake for access to various databases and tables comply with security policies.
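To illustrate the recipe idea in general terms, the sketch below records each preparation step as a named operation over a pandas DataFrame so the sequence can be replayed against a fresh sample or a larger data set. The column names, steps, and data are invented; this is an analogy for the concept, not the product's implementation.

```python
# Illustrative sketch of a "recipe": preparation steps recorded as an ordered
# list of operations over a pandas DataFrame, so they can be replayed on new
# data. An analogy for the concept, not the product's implementation.
import pandas as pd

recipe = []   # ordered list of (description, function) pairs

def step(description):
    """Decorator that registers a preparation step in the recipe."""
    def register(fn):
        recipe.append((description, fn))
        return fn
    return register

@step("trim whitespace in 'customer' column")
def trim_customer(df):
    df["customer"] = df["customer"].str.strip()
    return df

@step("filter out rows with non-positive amounts")
def drop_bad_amounts(df):
    return df[df["amount"] > 0]

@step("aggregate amount by customer")
def total_by_customer(df):
    return df.groupby("customer", as_index=False)["amount"].sum()

def play_back(df):
    """Replay every recorded step, in order, on the given data."""
    for description, fn in recipe:
        print("applying:", description)
        df = fn(df)
    return df

sample = pd.DataFrame({"customer": [" acme ", "globex", "acme"],
                       "amount": [10.0, -5.0, 2.5]})
print(play_back(sample))
```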
Informatica Data Quality
Exception management
- Data type-based search-and-replace enhancements
- Non-default schema for exception tables, for greater security and flexibility
- Task IDs for better reporting
Address Validation
- IDQ is now integrated with AV 5.8.1
- Ireland: support for Eircode postal codes
- France: SNA Hexaligne 3 data support
- UK: rooftop geocoding
Execution:
- IDQ transformations can execute on Blaze
- Workflows: parallel execution for enhanced performance
Visualization:
- Scorecard dashboard for a single, high-level view of scorecards in the repository