Introduction to Data Warehouse

Definition & Core Concept

A Data Warehouse (DWH) is a complete repository of historical corporate data extracted from various operational transaction systems. It is fundamentally an information system designed to support Intelligent Decision-Making and Business Intelligence (BI) for knowledge workers (like executives, analysts, and managers) by allowing fast, ad-hoc access and sophisticated analysis.

A more formal definition, often attributed to the "father of Data Warehousing," Bill Inmon, is:

A Data Warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Key Characteristics

A Data Warehouse is fundamentally different from a regular operational database (OLTP) due to its four distinct characteristics:

The Driving Force

Businesses demand a DWH because they seek to move beyond simple operational questions and delve into strategic intelligence. The need is driven by:

Data Warehouse vs OLTP

Feature Data Warehouse (DWH) OLTP (Online Transaction Processing)
Primary Goal Decision Support / Analysis (Business Intelligence) Day-to-Day Operations (Running the business)
Data Perspective Historical, Summarized, and Detailed data Operational (Current/Live) data
Normalization Hardly Denormalized (Dimensional or Normalized models) Fully Normalized (To minimize redundancy)
Scope Enterprise-wide (Application-Neutral single source of truth) Application-specific (Multiple databases with repetition)
Time Factor Time-variant (Historical data, typical availability $6\times12$ or as needed) Real-time (Current data, typical availability $24\times7$)
Unit of Work Complex Queries (Analysis, reporting, data mining) Simple Transactions (Insert, Update, Delete)
Query Complexity Low selectivity (Returns many rows, e.g., total sales for the last 5 years) High selectivity (Returns few rows, e.g., a single customer's current order)
Indexing Indexing on primary index is Non-unique Indexing on primary key is Unique
Data Update Non-Volatile (Data is loaded and remains static, only updated through scheduled processes) Volatile (Frequent updates/changes due to daily transactions)
Results Returned Number of results returned in thousands/millions Number of results returned in hundreds or less

What is a Data Mart

A Data Mart is a specialized, scaled-down version of an Enterprise Data Warehouse (DWH) designed to serve the analytical and reporting needs of a specific business function, department, or user group (like Sales, Marketing, or Finance). It is essentially a subset of the overall corporate data that is tailored and structured using simpler models (often a Star Schema) to provide rapid, targeted access to relevant information, making it easier and faster for a specific team to gain insights and make tactical decisions without having to query the entire, massive Data Warehouse.