A Data Warehouse (DWH) is a complete repository of historical corporate data extracted from various operational transaction systems. It is fundamentally an information system designed to support Intelligent Decision-Making and Business Intelligence (BI) for knowledge workers (like executives, analysts, and managers) by allowing fast, ad-hoc access and sophisticated analysis.
A more formal definition, often attributed to the "father of Data Warehousing," Bill Inmon, is:
A Data Warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
A Data Warehouse is fundamentally different from a regular operational database (OLTP) due to its four distinct characteristics:
Businesses demand a DWH because they seek to move beyond simple operational questions and delve into strategic intelligence. The need is driven by:
| Feature | Data Warehouse (DWH) | OLTP (Online Transaction Processing) |
|---|---|---|
| Primary Goal | Decision Support / Analysis (Business Intelligence) | Day-to-Day Operations (Running the business) |
| Data Perspective | Historical, Summarized, and Detailed data | Operational (Current/Live) data |
| Normalization | Hardly Denormalized (Dimensional or Normalized models) | Fully Normalized (To minimize redundancy) |
| Scope | Enterprise-wide (Application-Neutral single source of truth) | Application-specific (Multiple databases with repetition) |
| Time Factor | Time-variant (Historical data, typical availability $6\times12$ or as needed) | Real-time (Current data, typical availability $24\times7$) |
| Unit of Work | Complex Queries (Analysis, reporting, data mining) | Simple Transactions (Insert, Update, Delete) |
| Query Complexity | Low selectivity (Returns many rows, e.g., total sales for the last 5 years) | High selectivity (Returns few rows, e.g., a single customer's current order) |
| Indexing | Indexing on primary index is Non-unique | Indexing on primary key is Unique |
| Data Update | Non-Volatile (Data is loaded and remains static, only updated through scheduled processes) | Volatile (Frequent updates/changes due to daily transactions) |
| Results Returned | Number of results returned in thousands/millions | Number of results returned in hundreds or less |
A Data Mart is a specialized, scaled-down version of an Enterprise Data Warehouse (DWH) designed to serve the analytical and reporting needs of a specific business function, department, or user group (like Sales, Marketing, or Finance). It is essentially a subset of the overall corporate data that is tailored and structured using simpler models (often a Star Schema) to provide rapid, targeted access to relevant information, making it easier and faster for a specific team to gain insights and make tactical decisions without having to query the entire, massive Data Warehouse.