Here are some common interview questions related to the ETL tool Ab Initio.
What is a Graph in Ab Initio?
The ETL process in Ab Initio is represented by Ab Initio graphs. Graphs are formed by components (from the standard components library or custom), flows (data streams) and parameters.
What is Co>Operating System in Ab Initio?
Co>Operating System is a program provided by Ab Initio which runs on top of the operating system and is the base for all Ab Initio processes.
It provides a command-line interface known as air commands and can be installed on a variety of system environments such as Unix, HP-UX, Linux, IBM AIX and Windows systems.
The Ab Initio Co>Operating System provides the following features:
- Manages and runs Ab Initio graphs and controls the ETL processes
- Provides Ab Initio extensions to the operating system
- Monitors and debugs ETL processes
- Manages metadata and interacts with the EME
What is Ab Initio GDE (Graphical Development Environment)?
GDE is a graphical application for developers, used for designing and running Ab Initio graphs.
It also provides:
- A user-friendly frontend for designing Ab Initio ETL graphs
- Ability to run and debug Ab Initio jobs and trace execution logs
- Compiling an Ab Initio graph in the GDE generates a UNIX shell script which may be executed on a machine without the GDE installed.
What is Ab Initio EME?
Enterprise Meta>Environment (EME) is an Ab Initio repository and environment for storing and managing metadata.
It provides the capability to store both business and technical metadata. EME metadata can be accessed from the Ab Initio GDE, a web browser or the Ab Initio Co>Operating System command line (air commands).
What is Conduct>It in Ab Initio?
Conduct>It is an environment for creating enterprise Ab Initio data integration systems. Its main role is to create Ab Initio Plans, a special type of graph constructed from other graphs and scripts. Ab Initio provides both a graphical and a command-line interface to Conduct>It.
What is a data profiler in Ab Initio?
The Data Profiler is a graphical data analysis tool which runs on top of the Co>Operating System. It can be used to characterize data range, scope, distribution, variance, and quality.
What kinds of parallelism are supported by Ab Initio?
Ab Initio implements parallelism in three main ways:
Data parallelism – data is divided among many partitions known as multi-files. During processing, each partition is processed in parallel.
Component parallelism – multiple components are run in parallel. Components execute simultaneously on different branches of a graph.
Pipeline parallelism – one record is processed in a component while the previous record is still being processed in another component. Operations like sorting and aggregation break pipeline parallelism because they must consume all of their input before producing any output.
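The contrast between pipeline-friendly and pipeline-breaking components can be sketched with Python generators (a cooperative, single-threaded stand-in for true parallelism; the component names below are illustrative, not Ab Initio APIs):

```python
# Each "component" is a generator that passes records downstream
# as soon as it is done with them -- the pipeline analogue.
def reader(records):
    for r in records:
        yield r

def reformat(stream):
    # Emits record n downstream while upstream may already be
    # producing record n+1: pipeline parallelism.
    for r in stream:
        yield r * 2

def sort_component(stream):
    # Must consume the ENTIRE input before emitting anything,
    # which is why sorting breaks pipeline parallelism.
    yield from sorted(stream)

out = list(sort_component(reformat(reader([3, 1, 2]))))
print(out)  # -> [2, 4, 6]
```

Note that `reformat` can emit each record immediately, while `sort_component` acts as a barrier in the pipeline.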
Explain what a lookup is.
- A lookup is basically a specific dataset which is keyed. It can be used to map values based on the data present in a particular file (serial or multifile).
- The dataset can be static or dynamic (dynamic when the lookup file is generated in a previous phase and used as a lookup file in the current phase).
- Sometimes hash joins can be replaced by a reformat with a lookup if one of the inputs to the join contains a small number of records with a short record length.
- Ab Initio has built-in functions to retrieve values from the lookup by key.
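Conceptually, a keyed lookup behaves like a hash map from key to record. A minimal Python sketch (the field names and the `lookup` helper are made-up illustrations; Ab Initio's actual DML lookup functions differ):

```python
# A keyed lookup dataset modelled as a dict: key -> record.
lookup_file = {
    10: {"dept_id": 10, "dept_name": "Sales"},
    20: {"dept_id": 20, "dept_name": "Finance"},
}

def lookup(key):
    # Analogue of retrieving a record from a lookup file by key;
    # returns None when the key is absent.
    return lookup_file.get(key)

# Mapping a value onto an incoming record, as a reformat would.
record = {"emp": "Ann", "dept_id": 20}
match = lookup(record["dept_id"])
enriched = {**record, "dept_name": match["dept_name"] if match else None}
```

This is also why a reformat-plus-lookup can stand in for a hash join when one side is small: the small side fits in memory as the keyed table.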
What is the difference between lookup file and lookup?
- A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a key parameter.
- A lookup file is the physical file where the data for the lookup is stored.
What is a local lookup?
If the lookup file is a multifile partitioned/sorted on a particular key, then a local lookup function can be used instead of the regular lookup function call. It searches only the partition that is local to the calling component, as determined by the key.
A lookup file consists of data records which can be held in main memory. This lets the transform function retrieve records much faster than reading them from disk, and allows the transform component to process records from multiple files quickly.
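The idea behind a local lookup can be sketched as follows: when the data and the lookup file are partitioned on the same key, each partition only needs to search its own slice. (The modulo partitioning scheme and names here are illustrative assumptions, not Ab Initio's actual implementation.)

```python
NUM_PARTITIONS = 2

def partition_of(key):
    # Illustrative key-based partitioning (like a partitioned multifile).
    return key % NUM_PARTITIONS

# Build per-partition lookup tables from one keyed dataset.
all_rows = {1: "red", 2: "green", 3: "blue", 4: "yellow"}
local_tables = [
    {k: v for k, v in all_rows.items() if partition_of(k) == p}
    for p in range(NUM_PARTITIONS)
]

def lookup_local(partition, key):
    # Searches only the local partition's slice of the data,
    # never touching the other partitions.
    return local_tables[partition].get(key)

color = lookup_local(partition_of(3), 3)  # -> "blue"
```

Because each partition's table is smaller than the whole dataset, it is more likely to fit in memory, which is what makes the local variant attractive on large multifiles.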