Q: What is a Data warehouse?
A: A Data warehouse is a repository of integrated information, available for queries and analysis.
Data and information are extracted from heterogeneous sources and stored in a database for easy and more efficient way to run queries and create reports.
A data warehouse is a logical collection of information gathered from many different operational databases used to create business intelligence that supports business analysis activities and decision-making tasks, primarily, a record of an enterprise’s past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP)”.
Q: What is data mining?
A: Data mining is a process of extracting hidden trends within a datawarehouse. For example an insurance data warehouse can be used to mine data for the most high risk people to insure in a certain geographial area.
Q: What are Data Marts?
A: Data Marts are subset of the corporate-wide data that is of value to a specific group of users.
There are two types of Data Marts:
1.Independent data marts – sources from data captured form OLTP system, external providers or from data generated locally within a particular department or geographic area.
2.Dependent data mart – sources directly from enterprise data warehouses.
Q: What is OLTP?
A: OnLine Transactional Processing.
Q: What is OLAP?
A: OnLine Analatical Processing.
Q: What are the differences between OLTP and OLAP?
A: Main Differences between OLTP and OLAP are:-
1. User and System Orientation
OLTP: customer-oriented, used for data analysis and querying by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers( managers, executives, analysis).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.
3. Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented database design.
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores
Q: What is real time data-warehousing?
A: Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
Q: What are the steps to build the datawarehouse ?
- Gathering business requirements.
- Identifying Sources
- Identifying Facts
- Defining Dimensions
- Define Attributes
- Redefine Dimensions & Attributes
- Organize Attribute Hierarchy & Define Relationship
- Assign Unique Identifiers
- Additional conventions:Cardinality/Adding ratios
Q: What is a CUBE in data warehousing concept?
A: Cubes are logical representation of multidimensional data.The edge of the cube contains dimension members and the body of the cube contains data values.
Q: What is a linked cube?
A: Linked cube is a cube, in which a sub-set of the data can be analysed into greater detail. The linking ensures that the data in the cubes remain consistent.
Q: What is the main difference between Inmon and Kimball philosophies of data warehousing?
A: Both differed in the concept of building the datawarehosue.
Kimball views data warehousing as a constituency of Data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtain from the dimension modeling on a local departmental level.
Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise.
Kimball–First DataMarts–Combined way —Datawarehouse
Q: What is Hierarchy in data warehouse terms?
A: Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies–one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Q: What are the differnces between a RDBMS schema and a data warehouse schema?
* Used for OLTP systems
* Highly Normalized
* Difficult to understand and navigate
* Difficult to extract and solve complex problems
* Used for OLAP systems
* Easy to understand and navigate
* Relatively easier in extracting the data and solving complex problems
Q:What is meant by metadata in the context of a Data warehouse?
A: Meta data is the data about data; Business Analyst or data modeler usually capture information about data – the source (where and how the data is originated), nature of data (char, varchar, nullable, existance, valid values etc) and behavior of data (how it is modified / derived and the life cycle ) in data dictionary a.k.a metadata. Metadata is also presented at the Datamart level, subsets, fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis / DSS.