Architecture, in the data warehousing world, is the concept and design of the database and of the technologies used to load data into it. A good architecture enables scalability, high performance and easy maintenance.
Data warehouse architecture consists of the following interconnected layers:
Operational database layer
The operational database layer serves as the source for the data warehouse. It may include the Operational Data Store (ODS) and other similar sources (e.g., flat files).
Data access layer
The data access layer is responsible for extracting data from multiple sources, cleansing and transforming it, and loading it into the warehouse.
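The extract, cleanse, transform and load steps can be sketched with the Python standard library. This is a minimal illustration, not a production ETL tool: the two sources (a CSV flat file and rows standing in for an ODS query), the `dim_customer` table, and the cleansing rules are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical sources: a flat-file extract and rows returned by an ODS query.
FLAT_FILE = "customer_id,name,country\n1, Alice ,us\n2,Bob,DE\n2,Bob,DE\n"
ODS_ROWS = [{"customer_id": "3", "name": "Carol", "country": "fr"},
            {"customer_id": "4", "name": "", "country": "us"}]


def extract():
    """Read raw records from every source into one list of dicts."""
    return list(csv.DictReader(io.StringIO(FLAT_FILE))) + ODS_ROWS


def cleanse(records):
    """Trim whitespace, drop records without a name, and drop duplicate keys."""
    seen, clean = set(), []
    for rec in records:
        rec = {k: v.strip() for k, v in rec.items()}
        if not rec["name"] or rec["customer_id"] in seen:
            continue
        seen.add(rec["customer_id"])
        clean.append(rec)
    return clean


def transform(records):
    """Convert types and standardize country codes to upper case."""
    return [(int(r["customer_id"]), r["name"], r["country"].upper()) for r in records]


def load(conn, rows):
    """Write the prepared rows into a (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS dim_customer "
                 "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract())))
print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
```

Keeping each step in its own function mirrors the layer's responsibilities: new sources only change `extract`, and new business rules only change `cleanse` or `transform`.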
Metadata layer
The data dictionary. This is usually more detailed than an operational system's data dictionary. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.
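A warehouse-wide data dictionary and a tool-specific one can both be generated from the same catalog. The sketch below assumes a SQLite warehouse and reads column names and types with SQLite's `PRAGMA table_info`; the `DESCRIPTIONS` mapping, the table names, and `build_data_dictionary` are hypothetical illustrations.

```python
import sqlite3

# Hypothetical business descriptions maintained alongside the schema.
DESCRIPTIONS = {
    ("dim_customer", "country"): "ISO country code, upper case",
    ("fact_sales", "amount"): "Sale amount in the reporting currency",
}


def build_data_dictionary(conn, tables=None):
    """Return one entry per column: table, column, type and description.

    If `tables` is given, only those tables are described, e.g. the subset
    exposed to one reporting tool. Table names are assumed to be trusted.
    """
    if tables is None:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    entries = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, column, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
            entries.append({
                "table": table,
                "column": column,
                "type": col_type,
                "description": DESCRIPTIONS.get((table, column), ""),
            })
    return entries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, country TEXT)")
conn.execute("CREATE TABLE fact_sales (customer_id INTEGER, amount REAL)")
warehouse_dictionary = build_data_dictionary(conn)
tool_dictionary = build_data_dictionary(conn, tables=["fact_sales"])
```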
Informational access layer
This layer contains the data accessed for reporting and analysis, together with the tools used to report on and analyze it. Business intelligence (BI) tools fall into this layer.
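Most of what a reporting or BI tool does against the warehouse comes down to aggregate queries over fact and dimension tables. A minimal sketch, assuming a SQLite warehouse with a hypothetical `fact_sales` fact table and `dim_product` dimension filled with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, sale_date TEXT, amount REAL);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO fact_sales VALUES
  (1, '2024-01-05', 20.0), (2, '2024-01-06', 60.0), (1, '2024-02-01', 15.0);
""")


def sales_by_category(conn):
    """The kind of aggregate query a BI or reporting tool would issue."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY total DESC
    """).fetchall()


# Print a simple two-column report.
for category, total in sales_by_category(conn):
    print(f"{category:<8}{total:>8.2f}")
```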
Ralph Kimball and Bill Inmon, two well-known authors and data warehouse experts, propose two different design methodologies for building a data warehouse.
Kimball's approach is a bottom-up design: data marts are created first for specific subject or business areas, each with its own reporting and analysis capability. These data marts are then combined to create the data warehouse. This approach makes data available more quickly for individual subjects or business areas. The major task in this design is keeping the dimensions consistent across multiple data marts.
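Consistent, shared dimensions are what allow separately built data marts to be combined. The sketch below, assuming SQLite and hypothetical table names and rows, builds a sales mart and an inventory mart that both reference one `dim_product` table, then runs a "drill-across" query: each mart is aggregated on its own and the results are joined on the shared dimension key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension: one definition used by every data mart.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');

-- Sales data mart (built first, for the sales department).
CREATE TABLE fact_sales (product_id INTEGER REFERENCES dim_product, units_sold INTEGER);
INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);

-- Inventory data mart (built later, for operations).
CREATE TABLE fact_inventory (product_id INTEGER REFERENCES dim_product, units_on_hand INTEGER);
INSERT INTO fact_inventory VALUES (1, 40), (2, 3);
""")

# Drill-across: aggregate each mart separately, then join on the shared key.
# This only works because both marts use the same product_id values.
DRILL_ACROSS = """
SELECT p.name, s.sold, i.on_hand
FROM dim_product AS p
JOIN (SELECT product_id, SUM(units_sold) AS sold
      FROM fact_sales GROUP BY product_id) AS s ON s.product_id = p.product_id
JOIN (SELECT product_id, SUM(units_on_hand) AS on_hand
      FROM fact_inventory GROUP BY product_id) AS i ON i.product_id = p.product_id
ORDER BY p.name
"""
rows = conn.execute(DRILL_ACROSS).fetchall()
```

If the inventory mart had been built with its own, differently keyed product table, this join would silently drop or mismatch rows, which is why maintaining the dimensions across marts is the main task in this design.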
Inmon defines the data warehouse as a centralized repository for the entire enterprise, designed using a normalized enterprise data model. Data at the lowest level of detail is stored in the data warehouse. Dimensional data marts containing the data needed for specific business processes or specific departments are then created from the data warehouse.
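The top-down flow can be illustrated in miniature: detailed data lives in normalized warehouse tables, and a department's dimensional mart is derived from them. This sketch assumes SQLite; the `region`, `customer` and `order_line` tables, the `mart_` tables, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized enterprise warehouse: each fact stored once, at the lowest detail.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       region_id INTEGER REFERENCES region);
CREATE TABLE order_line (order_id INTEGER, customer_id INTEGER REFERENCES customer,
                         order_date TEXT, amount REAL);
INSERT INTO region VALUES (1, 'North'), (2, 'South');
INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2), (12, 'Initech', 1);
INSERT INTO order_line VALUES
  (100, 10, '2024-03-01', 250.0), (101, 11, '2024-03-02', 100.0),
  (102, 12, '2024-03-02', 50.0);

-- Dimensional data mart for a sales department, derived from the warehouse.
-- The customer dimension is denormalized: region_name is copied onto it.
CREATE TABLE mart_dim_customer AS
  SELECT c.customer_id, c.name, r.region_name
  FROM customer AS c JOIN region AS r ON r.region_id = c.region_id;
CREATE TABLE mart_fact_sales AS
  SELECT customer_id, order_date, amount FROM order_line;
""")

# Department users query the mart, not the normalized warehouse tables.
regional_sales = conn.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM mart_fact_sales AS f JOIN mart_dim_customer AS d USING (customer_id)
    GROUP BY d.region_name ORDER BY d.region_name
""").fetchall()
```

Because every mart would be generated from the same `customer` and `region` tables, two marts built this way cannot disagree about which region a customer belongs to.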
Inmon states that the data warehouse is subject-oriented, integrated, time-variant and non-volatile.
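Non-volatile means that data, once loaded, is not updated or deleted in place; each new load adds rows instead. One simple way to honor this, sketched below under the assumption of a SQLite table and a hypothetical `load_snapshot` helper, is to append a dated snapshot on every load so that earlier states remain queryable.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_history "
             "(customer_id INTEGER, region TEXT, load_date TEXT)")


def load_snapshot(conn, rows, load_date):
    """Append a snapshot of source rows; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        [(cid, region, load_date.isoformat()) for cid, region in rows],
    )
    conn.commit()


# Two loads of the same hypothetical source: customer 1 moves from North to South.
load_snapshot(conn, [(1, "North"), (2, "South")], date(2024, 1, 31))
load_snapshot(conn, [(1, "South"), (2, "South")], date(2024, 2, 29))

# The earlier load is still present, so the customer's history is preserved.
history = conn.execute(
    "SELECT region, load_date FROM customer_history "
    "WHERE customer_id = 1 ORDER BY load_date").fetchall()
```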
Inmon's top-down methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost of implementing a data warehouse using the top-down methodology is significant, and the time from the start of the project to the point at which end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.
Newer methodologies, commonly called hybrid designs, combine these two approaches to provide a more comprehensive and robust data warehouse design.