Search This Blog

Thursday, March 14, 2024

DW

 


1. Data warehouse - Intro

2. Table Types

2.1. Dimension Table

2.1.1. Types of Dimensions

2.1.1.1. Conform / Reusable Dim

2.1.1.2. Junk Dim

2.1.1.3. Degenerated Dim

2.1.1.4. Slowly Changing Dim (SCD) (3 Types)

2.2. Fact Table

2.2.1. Additive Fact - Star Schema ???

2.2.2. Semi Additive Fact - Snowflake Schema???

2.2.3. Non Additive Fact

2.2.4. Fact Less Fact

3. Schema Types

3.1. Star Schema

3.2. Snowflake Schema

4. Approach

4.1. Top Down

4.2. Bottom Up

5. Data Mart

5.1. Types of Data Mart

5.1.1. Dependent Data Mart

5.1.2. Independent Data Mart

5.1.3. Hybrid Data Mart

6. Data Modeling

6.1. Types of data modeling

6.1.1. Relational:

6.1.2. Dimensional:

6.1.3. Entity-Rich (E-R):

6.2. Three levels of data abstraction

6.2.1. Conceptual Model Design

6.2.2. Logical

6.2.3. Physical

6.3. Critical Relationship in a Data Model

6.3.1. Identifying

6.3.2. Non-identifying

6.3.3. Self-recursive.

7. Comparison

7.1. OLTP Vs OLAP

7.2. Data Warehouse Vs Data Marts


Data Warehouse Tutorial For Beginners - Intellipaat-PL

Terminology: DW, Data warehouse, ETL, Cube, Data Mart, Data Mining, Reporting, OLAP, OLTP, Metadata (Data Catalog, Data Directory, DWH Roadmap, Nerve Center), Data Drilling (Drill Up, Drill Through, Drill Across, Drill Down),  


Data Source > [ ODS (Operational Data Source) ], Staging Area > Warehouse > Data Marts > Presentation

Reference: https://learndatamodeling.com/blog/time-dimension/

https://www.1keydata.com/datawarehousing/junk-dimension.html



1. Data warehouse - Intro

  • Data comes or extract from multiple sources - clean up and Load the data - Generate the report from that - from the report management take decision

  • Supports Multidimensional Queries

  • Enable users to view the same data in different ways

  • Each aspect of information represents a different dimension

Abbrevations

  • Operational data store (ODS)



2. Table Types 

2.1. Dimension Table 

  • Primary Key and Master Data (non measurable data)

2.1.1. Types of Dimensions 

2.1.1.1. Conform / Reusable Dim

  • A conformed dimension can exist as a single dimension table that relates to multiple fact tables within the same data warehouse, or as identical dimension tables in separate data marts.

  • Date is a common conformed dimension because its attributes (day, week, month, quarter, year, etc.) have the same meaning when joined to any fact table.

2.1.1.2. Junk Dim

2.1.1.3. Degenerated Dim

  • Sometimes a dimension is defined that has no content except for its primary key. For example, when an invoice has multiple line items, the line item fact rows inherit all the descriptive dimension foreign keys of the invoice, and the invoice is left with no unique content. But the invoice number remains a valid dimension key for fact tables at the line item level. This degenerate dimension is placed in the fact table with the explicit acknowledgment that there is no associated dimension table. Degenerate dimensions are most common with transaction and accumulating snapshot fact tables.

2.1.1.4. Slowly Changing Dim (SCD) (3 Types)

  • SCD Type 1 - Latest / current data

    • New record will be inserted - No history maintained

    • Updated data will be updated without maintaining history

  • SCD Type 2 - Historical Data

    • Earlier record’s UpdatedData as system date

    • Update will be inserted as New record with future updated date

  • SCD Type 3 - Partial History

    • Only one previous data based on table design available - need to define which column we need to maintain the partial history

2.2. Fact Table

  • PKs of DIMs / FKs and Measurable Columns/data

2.2.1. Additive Fact - Star Schema ???

  • can be summed up across all dimensions in the fact table, such as sales

  • All dim tables primary key connected with Fact Table

2.2.2. Semi Additive Fact - Snowflake Schema???

  • can be summed up for some, but not all, dimensions in the fact table, such as a headcount

  • Some of the Dim primary keys connected with Fact table and some connected with other DIM tables

2.2.3. Non Additive Fact

  • Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

2.2.4. Fact Less Fact

3. Schema Types

3.1. Star Schema

  • Fact connected with all dimensions

3.2. Snowflake Schema

  • Fact table only few of the dim keys connected

4. Approach 

4.1. Top Down

  • Data will flow from DW to Data Mart

4.2. Bottom Up

  • Data Mart to Data warehouse

5. Data Mart

  • Small set of subject data area

5.1. Types of Data Mart

5.1.1. Dependent Data Mart

5.1.2. Independent Data Mart

5.1.3. Hybrid Data Mart

6. Data Modeling

  • Data modeling is the process of creating a visual representation of either a whole information system or parts of it to communicate connections between data points and structures.

  • Data models are used to describe how the data is stored, accessed, and updated in a DBMS. A set of symbols and text is used to represent them so that all the members of an organization can understand how the data is organized.

6.1. Types of data modeling 

6.1.1. Relational: 

  • Although “older” in approach, the most common database model still in use today is relational, which stores the data in fixed-format records and arranges data in tables with rows and columns. The most basic type of data model has two elements: measures and dimensions. Measures are numeric values, such as quantities and revenue, used in mathematical calculations like sum or average. Dimensions can be text or numeric. They are not used in calculations and include descriptions or locations. The raw data is defined as a measure or a dimension. Other terminology used in relational database design includes “relations” (the table with rows and columns), “attributes” (columns), “tuples” (rows), and “domain” (set of values allowed in a column). While there are additional terms and structural requirements that define a relational database, the important factor is the relationships defined within that structure. Common data elements (or keys) link tables and data sets together. Tables can also be related explicitly, like parent and child relationships including one-to-one, one-to-many, or many-to-many.


6.1.2. Dimensional: 

  • Less rigid and structured, the dimensional approach favors a contextual data structure that is more related to the business use or context. This database structure is optimized for online queries and data warehousing tools. Critical data elements, like a transaction quantity for example, are called “facts” and are accompanied by reference information called “dimensions,” be that product ID, unit price, or transaction date. A fact table is a primary table in a dimensional model. Retrieval can be quick and efficient – with data for a specific type of activity stored together – but the lack of relationship links can complicate analytical retrieval and use of the data. Since the data structure is tied to the business function that produces and uses the data, combining data produced by dissimilar systems (in a data warehouse, for instance) can be problematic.


6.1.3. Entity-Rich (E-R): 

An E-R model represents a business data structure in graphical form containing boxes of various shapes to represent activities, functions, or “entities” and lines to represent associations, dependencies, or “relationships.” The E-R model is then used to create a relational database with each row representing an entity and the fields in that row contain attributes. As in all relational databases, “key” data elements are used to link tables together.

6.2. Three levels of data abstraction

6.2.1. Conceptual Model Design


6.2.2. Logical 


6.2.3. Physical





6.3. Critical Relationship in a Data Model 

6.3.1. Identifying

  • A relationship line normally connects parent and child tables. But if a child table’s reference column is part of the table’s primary key, the tables are connected by a thick line, signifying an identifying relationship.

6.3.2. Non-identifying

  • Non-identifying. If a child table’s reference column is NOT a part of the table’s primary key, the tables are connected by a dotted line, signifying a no-identifying relationship.


6.3.3. Self-recursive.

  • Self-recursive. A recursive relationship is a standalone column in a table connected to the primary key in the same table.

7. Comparison 

7.1. OLTP Vs OLAP 


OLTP

OLAP

Online Transaction Procession Operational System

Online Analytical Processing System

Focuses on small transactions - huge transactions

Low transactions - More data (historical also)

Uses Operational Data

Uses consolidated Data




7.2. Data Warehouse Vs Data Marts


Data Warehouse

Data Marts

Enterprise wide data

Department wide data

Multiple subject areas

Single subject area

Multiple data sources

Limited data sources

Occupies large memory

Occupies limited memory

Long time to implement

Shorter time to implement




No comments:

Post a Comment