What is a DataWarehouse? And why would you need one?
A data warehouse is a central repository that aggregates data from all transactional and other data sources within a firm, to create a historical archive of all of the firm’s data even when transactional systems have hard data retention constraints.
It provides for the following capabilities:
- Aggregates data from disparate data sources into a single DB; hence a single query engine can be used to query, join, transform/transpose and present the data.
- Mitigate the problem of database isolation level lock contention in transactional systems due to running of large analytical queries
- Maintain data history even when source systems do not and provide a temporal view of the data
- Ability to create trend reports comparing yoy (year over year) or qoq (quarter over quarter) performance for senior management
- Improve data quality and drive a consistency in organization information – consistent code/description/ref names/values etc. Allows for flagging and fixing of data
- Provide a single data model for all data regardless of source
- Restructure data so that it makes sense to the business users
- Restructure data to improve query performance
- Add contextual value to operational systems and enterprise apps like CRMs or ERPs.
What is an Operational Data Store (ODS)?
An ODS is a database designed to integrate data from multiple sources. It allows for cleaning, resolving redundancy and integrity checking before additional operations. The data is then passed back to the operational systems and to the DWH for storage and reporting. It is usually designed to contain atomic or low level data such as transactions and prices and also has limited history which is captured real time or near real time. Much greater volume of data is stored in the DWH generally on a less frequent basis.
Why do we add Data/Strategic Marts to most modern data management platforms?
Data marts are fit for purpose access layers that support specific reporting use for individual teams or use cases for e.g. a sales and operations data mart, or a marketing strategy data mart. Usually a subset of the DWH, and very focused on the elements needed for the purpose it is designed for. The usual reasons to create data marts are –
- Easy access to frequently needed data with contentions
- Creates a collective view for a group of users
- Improves end user response times
- Ease of creation and lower cost than a DWH
- Potential users are well defined than in a full DWH
- Less cluttered as it contains only business essential data
And finally what are Data Lakes and Swamps?
A single store of all the data in the Enterprise in its raw form. It is a method of storing data within a system or repository in its natural format and facilitates the colocation of data in various schemas, structured and unstructured in files or object blobs or data bases. A deteriorated data lake, inaccessible to its intended users and of no value is called a “data swamp”.