Ab Initio is one of the popular ETL tools on the market.
The ETL process in Ab Initio is represented by graphs. A graph is built from components (taken from the standard component library or custom-built), flows (data streams between components), and parameters.
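The relationship between components, flows, and parameters can be pictured with a small Python model. This is only an illustrative sketch, not Ab Initio code or its API: the `Graph` class, the component names, and the `INPUT_PATH` parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Graph:
    """Toy model of an ETL graph: named components joined by flows."""
    parameters: dict = field(default_factory=dict)
    components: set = field(default_factory=set)
    flows: list = field(default_factory=list)  # (source, target) pairs

    def add_flow(self, source, target):
        # A flow is a data stream from one component to another.
        self.components.update((source, target))
        self.flows.append((source, target))

    def dependency_order(self):
        # Map each component to the set of components that feed it,
        # then list components so that every source precedes its targets.
        deps = {name: set() for name in self.components}
        for source, target in self.flows:
            deps[target].add(source)
        return list(TopologicalSorter(deps).static_order())


# Hypothetical three-component graph: read, transform, write.
graph = Graph(parameters={"INPUT_PATH": "/data/in.dat"})
graph.add_flow("read_input", "reformat")
graph.add_flow("reformat", "write_output")
order = graph.dependency_order()
print(order)
```

The dependency order only shows how flows connect components. In a running graph, connected components would normally be active at the same time rather than one after another.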
The Co>Operating System is a program provided by Ab Initio that runs on top of the native operating system and is the base for all Ab Initio processes.
It can be installed on a variety of environments, such as Unix, HP-UX, Linux, IBM AIX, and Windows, and it provides additional command-line features known as air commands. The Co>Operating System provides the following features:
- Managing and running Ab Initio graphs and controlling ETL processes
- Ab Initio extensions to the operating system
- Monitoring and debugging of ETL processes
- Metadata management and interaction with the Enterprise Meta>Environment (EME)
The Graphical Development Environment (GDE) is a graphical application that developers use to design and run Ab Initio graphs. It provides:
- A user-friendly frontend for designing Ab Initio ETL graphs
- The ability to run and debug Ab Initio jobs and to trace execution logs
- Graph compilation: compiling a graph in the GDE generates a UNIX shell script that can be executed on a machine without the GDE installed
The Enterprise Meta>Environment (EME) is the Ab Initio repository and environment for storing and managing metadata.
It can store both business and technical metadata. EME metadata can be accessed from the GDE, from a web browser, or from the Co>Operating System command line (air commands).
Conduct>It is an environment for building enterprise Ab Initio data integration systems. Its main role is to create Ab Initio plans, a special type of graph composed of other graphs and scripts. Ab Initio provides both graphical and command-line interfaces to Conduct>It.
The Data Profiler is a graphical data analysis tool that runs on top of the Co>Operating System. It can be used to characterize the range, scope, distribution, variance, and quality of data.
Ab Initio implements parallelism in three main ways:
Data parallelism – the data is divided into partitions, which together form a multifile. During processing, the partitions are processed in parallel.
Component parallelism – multiple components are run in parallel. Components execute simultaneously on different branches of a graph.
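Pipeline parallelism – one component processes a record while a downstream component is still processing an earlier record, so connected components work on the same flow at the same time. Operations such as sorting and aggregation break pipeline parallelism, because they generally have to read all of their input before they can emit any output.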
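The difference between data and pipeline parallelism can be sketched in plain Python. This is an analogy under stated assumptions, not Ab Initio code: the function names, the modulo partitioning rule, and the sample records are all hypothetical. Python generators also only interleave work record by record rather than running truly simultaneously, so the pipeline part shows the streaming structure that makes pipeline parallelism possible, not real concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(records, n_partitions):
    """Data parallelism: split records into partitions by key."""
    partitions = [[] for _ in range(n_partitions)]
    for key, value in records:
        partitions[key % n_partitions].append((key, value))
    return partitions


def reformat(records):
    """Streaming step: yields each record as soon as it is transformed."""
    for key, value in records:
        yield key, value * 2


def keep_large(records, threshold):
    """Streaming step: consumes the upstream output one record at a time."""
    for key, value in records:
        if value >= threshold:
            yield key, value


def sort_by_value(records):
    """Blocking step: needs the whole input before emitting any output."""
    return sorted(records, key=lambda record: record[1])


def run_partition(partition):
    # reformat -> keep_large streams record by record;
    # the sort at the end breaks the pipeline.
    return sort_by_value(keep_large(reformat(partition), threshold=10))


def run_all(records, n_partitions=2):
    # Each partition goes through the same steps on its own worker.
    partitions = partition_by_key(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        return list(pool.map(run_partition, partitions))


records = [(1, 3), (2, 8), (3, 5), (4, 9), (5, 7), (6, 2)]
result = run_all(records)
print(result)  # [[(2, 16), (4, 18)], [(3, 10), (5, 14)]]
```

Component parallelism is not shown here. It would correspond to submitting two independent branches of work to the pool at the same time.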
Company Website: www.abinitio.com