Data Warehousing

GUPTA, Gagan       Posted by GUPTA, Gagan
      Published: June 4, 2021
        |  

Enjoy listening to this Blog while you are working with something else !

   

Data Warehousing Made Simple

What is Data Warehouse.

A data warehouse is a type of electronic data management system that is designed to enable and support business intelligence (BI) activities, especially analytics, by a business or other organization. Data warehouses are solely intended to create a trove of historical data that can be retrieved and analyzed to provide useful insight into the organization's operations.

Data warehousing is a vital component of business intelligence. That wider term encompasses the information infrastructure that modern businesses use to track their past successes and failures and inform their decisions for the future.



-Data warehousing is the storage of information over time by a business or other organization.
-New data is periodically added by people in various key departments such as marketing and sales.
-The warehouse becomes a library of historical data that can be retrieved and analyzed in order to inform decision-making in the business.
-The key factors in building an effective data warehouse include defining the information that is critical to the organization and identifying the sources of the information.
-A database is designed to supply real-time information. A data warehouse is designed as an archive of historical information

How Data Warehouse Works.

-The need to warehouse data evolved as businesses began relying on computer systems to create, file, and retrieve important business documents. The concept of data warehousing was introduced in 1988 by IBM researchers Barry Devlin and Paul Murphy.

-Data warehousing is designed to enable the analysis of historical data. Comparing data consolidated from multiple heterogeneous sources can provide insight into the performance of a company. A data warehouse is designed to allow its users to run queries and analyses on historical data derived from transactional sources.

-Data added to the warehouse do not change and cannot be altered. The warehouse is the source that is used to run analytics on past events, with a focus on changes over time. Warehoused data must be stored in a manner that is secure, reliable, easy to retrieve, and easy to manage. -Maintaining the Data Warehouse.

-There are certain steps that are taken to maintain a data warehouse. One step is data extraction, which involves gathering large amounts of data from multiple source points. After a set of data has been compiled, it goes through data cleaning, the process of combing through it for errors and correcting or excluding any that are found.

-The cleaned-up data are then converted from a database format to a warehouse format. Once stored in the warehouse, the data goes through sorting, consolidating, and summarizing, so that it will be easier to use. Over time, more data are added to the warehouse as the various data sources are updated.

-A key book on data warehousing is W. H. Inmon's "Building the Data Warehouse," a practical guide that was first published in 1990 and has been reprinted several times.

-Today, businesses can invest in cloud-based data warehouse software services from companies including Microsoft, Google, Amazon, and Oracle, among others.

Data warehouse system (DWH) is also known with other names as:
-Decision Support System (DSS)
-Executive Information System (EIS)
-Management Information System (MIS)
-Business Intelligence Solution (BIS)

Our On-Premise Corporate Classroom Training is designed for your immediate training needs

Data Warehousing
Data Warehousing

Key steps in Data warehouse implementation along with its deliverables.

Step Tasks Deliverables
1 Need to define project scope Scope Definition
2 Need to determine business needs Logical Data Model
3 Define Operational Datastore requirements Operational Data Store Model
4 Acquire or develop Extraction tools Extract tools and Software
5 Define Data Warehouse Data requirements Transition Data Model
6 Document missing data To Do Project List
7 Maps Operational Data Store to Data Warehouse D/W Data Integration Map
8 Develop Data Warehouse Database design D/W Database Design
9 Extract Data from Operational Data Store Integrated D/W Data Extracts
10 Load Data Warehouse Initial Data Load
11 Maintain Data Warehouse On-going Data Access and Subsequent Loads

Our On-Premise Corporate Classroom Training is designed for your immediate training needs

Database vs. Data Warehouse Comparison

The main difference is that databases are organized collections of stored data. Data warehouses are information systems built from multiple data sources - they are used to analyze data. That to say, OLTP Solutions are best used with a database, where data warehouses are best suited for OLAP solutions.


OLTP vs. OLAP

OLTP (online transaction processing) is a term for a data processing system that focuses on transactions. This is usually the dominant paradigm for databases that contain information used by a business on a day-to-day basis. Employees need fast, efficient queries and information that's up-to-date and accurate, which OLTP is specifically designed to enable.

OLAP (online analytical processing) is a term for a data processing system that focuses on data analysis and decision-making, rather than performance and day-to-day use. Many OLAP systems are connected with business intelligence (BI) solutions that make it easier for non-technical managers and executives to get answers to their questions.

Businesses that need an OLTP solution for fast data access typically make use of a database. Meanwhile, data warehouse systems are better suited for an OLAP solution that can aggregate current data as well as historical information.

Below are some more distinctions that further differentiate databases and data systems at a high level.
Property Database Data Warehouse
Processing Method OnLine Transaction Processing (OLTP) OnLine Analytical Processing (OLAP)
Optimization Deletes, inserts, replaces and updates large numbers of short online transactions quickly. Rapidly analyze massive volumes of data and provide different viewpoints for analysts.
Data structure Highly normalized data structure with many different tables containing no redundant data.

Thus, data is more accurate but slow to retrieve.
Denormalized data structure with few tables containing repeat data.

Thus, data is potentially less accurate but fast to retrieve.
Data timeline Current, real-time data for one part of the business Historical data for all parts of the business
Data analysis Analysis is slow and painful due to the large number of table joins needed and the small time frame of data available. Analysis is fast and easy due to the small number of table joins needed and the extensive time frame of data available.
Concurrent users Thousands of concurrent users supported.

However, only one user can modify each piece of data at a time.
Small number of concurrent users.
ACID compliance Records data in an ACID-compliant manner to ensure the highest levels of integrity. Not always ACID-compliant though some companies do offer it.
Uptime 99.99% uptime Downtime is built-in to accommodate periodic uploads of new data
Storage Limited to a single data source from a particular business function All data sources from all business functions
Query type Simple transactional queries Complex queries for in-depth analysis
Data summary Highly granular and precise As granular and precise as you want it to be

Support our effort by subscribing to our youtube channel. Update yourself with our latest videos on Data Science.

Looking forward to see you soon, till then Keep Learning !

Our On-Premise Corporate Classroom Training is designed for your immediate training needs

Data Warehousing
                         



Corporate Scholarship Career Courses