Nuts & Bolts of DataStage: Dimension Tables and Their Types in a Data Warehouse

A dimension is a structure that categorizes data in order to enable users to answer business questions. It contains attributes that describe the records in the fact table. Some of these attributes provide descriptive information; others specify how fact table data should be summarized to give the analyst useful information. Dimension tables contain hierarchies of attributes that aid in summarization, so calculations on the fact table are performed by grouping, filtering, and rolling up along these dimension attributes.

Dimension table fields

A dimension table should contain at least the fields listed below, together with the fields used to group data when the database is queried.

There are three types of fields:

The primary key (surrogate key) - joins with the foreign key in the fact table, connecting the two tables and ensuring referential integrity. It has no business meaning and is used to maintain a hierarchy.
The natural key (domain key) - a descriptor of the data, based on attributes that exist in the source. The relationship between the surrogate key and the natural key may be one-to-one, or many-to-one (in a slowly changing dimension).
Descriptive attributes - textual or numeric.
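To make the three kinds of fields concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table and column names (`dim_customer`, `customer_sk`, `customer_id`, `fact_sales`, and so on) are hypothetical; the point is only to show which column plays which role and how the fact table is summarized through the surrogate-key join.

```python
import sqlite3

# Hypothetical customer dimension: one surrogate key, one natural key,
# and a few descriptive attributes. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk   INTEGER PRIMARY KEY,   -- surrogate key: no business meaning
    customer_id   TEXT NOT NULL,         -- natural (domain) key from the source
    customer_name TEXT,                  -- descriptive attribute (textual)
    city          TEXT,                  -- descriptive attribute used for grouping
    credit_limit  REAL                   -- descriptive attribute (numeric)
);
CREATE TABLE fact_sales (
    customer_sk INTEGER NOT NULL REFERENCES dim_customer(customer_sk),
    amount      REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?)",
    [(1, "C-100", "Acme", "Pune", 5000.0),
     (2, "C-200", "Globex", "Mumbai", 8000.0)],
)
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0)])

# Summarize fact data by a descriptive attribute via the surrogate-key join.
totals = dict(conn.execute("""
    SELECT d.city, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d ON f.customer_sk = d.customer_sk
    GROUP BY d.city
""").fetchall())
print(totals)
```

The fact table stores only the meaningless `customer_sk`; everything a report groups by (here `city`) lives in the dimension.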

Types of dimension tables

Slowly Changing Dimensions:

Some attributes of a dimension undergo changes over time. Whether the history of changes to a particular attribute should be preserved in the data warehouse depends on the business requirement. Such an attribute is called a Slowly Changing Attribute, and a dimension containing such an attribute is called a Slowly Changing Dimension.
e.g. - the location of a company
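One common way to preserve history for the company-location example is to close the current row and add a new version with a new surrogate key (often labeled a "Type 2" change). The sketch below assumes that approach, with hypothetical `valid_from`/`valid_to` columns; it also shows the many-to-one relationship between surrogate keys and the natural key mentioned earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CompanyRow:
    company_sk: int           # surrogate key: a new one for every version
    company_id: str           # natural key: the same across versions
    location: str             # slowly changing attribute
    valid_from: date
    valid_to: Optional[date]  # None marks the current version

def apply_location_change(dim: List[CompanyRow], company_id: str,
                          new_location: str, change_date: date) -> int:
    """Close the current row and append a new version (Type 2 style)."""
    current = next(r for r in dim
                   if r.company_id == company_id and r.valid_to is None)
    if current.location == new_location:
        return current.company_sk  # nothing changed, keep the current key
    current.valid_to = change_date
    new_sk = max(r.company_sk for r in dim) + 1
    dim.append(CompanyRow(new_sk, company_id, new_location, change_date, None))
    return new_sk

dim_company = [CompanyRow(1, "ACME", "Chennai", date(2010, 1, 1), None)]
apply_location_change(dim_company, "ACME", "Bangalore", date(2013, 6, 1))
for row in dim_company:
    print(row.company_sk, row.company_id, row.location, row.valid_to)
# 1 ACME Chennai 2013-06-01
# 2 ACME Bangalore None
```

Fact rows loaded before the move keep pointing at key 1 and fact rows loaded after it get key 2, so reports by location stay historically correct.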

Rapidly Changing Dimensions:

A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don't need to track the changes, a Rapidly Changing Attribute poses no problem; but if you do, using a standard Slowly Changing Dimension technique can hugely inflate the size of the dimension. One solution is to move the attribute into its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.
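A minimal sketch of moving the attribute into its own dimension (a design often called a mini-dimension). The `score_band` attribute and all table names are hypothetical. The customer dimension keeps one row, the new dimension grows only with distinct values, and each fact row carries a separate foreign key to it.

```python
# Hypothetical example: a customer's "score_band" changes almost every month.
customer_dim = {1: {"customer_id": "C-100", "name": "Acme"}}  # stable attributes only

# The rapidly changing attribute lives in its own small dimension:
# one row per distinct value, not one row per change.
score_band_dim = {}

def score_band_key(band):
    """Return the surrogate key for a band, adding it on first sight."""
    if band not in score_band_dim:
        score_band_dim[band] = len(score_band_dim) + 1
    return score_band_dim[band]

fact_rows = []
monthly = [("2014-01", "A", 100.0), ("2014-02", "B", 90.0),
           ("2014-03", "A", 110.0), ("2014-04", "C", 70.0)]
for month, band, amount in monthly:
    # Each fact row carries a separate foreign key to the new dimension.
    fact_rows.append({"month": month, "customer_sk": 1,
                      "score_band_sk": score_band_key(band), "amount": amount})

print(len(customer_dim), len(score_band_dim), len(fact_rows))  # 1 3 4
```

Tracking the same four changes with new customer versions would have produced four customer rows; here the history is carried by the fact rows instead.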

Junk Dimensions:

A junk dimension is a single table that combines different, unrelated attributes to avoid having a large number of foreign keys in the fact table. It is a collection of miscellaneous transactional codes, flags, and/or text attributes that are unrelated to any particular dimension. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.

e.g. - Assume that we have a gender dimension and a marital status dimension. The fact table would need two keys referring to these dimensions. Instead, create a junk dimension that holds every combination of gender and marital status (cross join the gender and marital status tables to build the junk table). Now only one key needs to be maintained in the fact table.

Key   Gender   Married
A0    F        Y
A1    F        N
A2    M        Y
A3    M        N

The Gender and Marital Status columns in the fact table can then be replaced by this single key column.
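The cross join described above can be sketched with `sqlite3`, assuming Y/N in the marital status column means married/not married. The table names `gender` and `marital_status` and the A0..A3 key-assignment rule are illustrative; the ordering is chosen only to reproduce the key table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gender (gender TEXT);
INSERT INTO gender VALUES ('F'), ('M');
CREATE TABLE marital_status (married TEXT);
INSERT INTO marital_status VALUES ('Y'), ('N');
""")

# Cross join the two code tables: every combination becomes one junk row.
combos = conn.execute("""
    SELECT g.gender, m.married
    FROM gender g CROSS JOIN marital_status m
    ORDER BY g.gender, m.married DESC
""").fetchall()

# Assign keys A0, A1, ... in the same order as the key table.
junk_dim = {f"A{i}": combo for i, combo in enumerate(combos)}
lookup = {combo: key for key, combo in junk_dim.items()}

# During the fact load, two codes collapse into one foreign key.
print(lookup[("M", "Y")])  # A2
```

With more flag columns the junk table grows as the product of their distinct values, so this works best when each attribute has only a handful of values.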

Inferred Dimensions:

While fact records are being loaded, the matching dimension record may not be available yet. One solution is to generate a surrogate key for it and set all the other attributes to NULL. This should technically be called an inferred member, but it is often called an inferred dimension.
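A minimal sketch of creating an inferred member during the fact load. The `dim_product` dictionary and the `inferred` flag are hypothetical; the flag is an illustrative addition so that a later dimension load knows the row is a placeholder whose NULL attributes should be filled in. Keys are taken as `len + 1`, which assumes rows are never deleted.

```python
# Hypothetical product dimension, indexed by natural key.
dim_product = {
    "P-1": {"product_sk": 1, "product_id": "P-1", "name": "Bolt", "inferred": False},
}

def lookup_or_infer(product_id):
    """Return the surrogate key, creating an inferred member if needed."""
    row = dim_product.get(product_id)
    if row is None:
        # Dimension record not loaded yet: new surrogate key,
        # NULL (None) for every other attribute.
        row = {"product_sk": len(dim_product) + 1, "product_id": product_id,
               "name": None, "inferred": True}
        dim_product[product_id] = row
    return row["product_sk"]

fact_rows = [{"product_sk": lookup_or_infer(pid), "qty": qty}
             for pid, qty in [("P-1", 5), ("P-9", 2)]]
print(fact_rows)  # [{'product_sk': 1, 'qty': 5}, {'product_sk': 2, 'qty': 2}]
```

The fact row for `P-9` is loaded with a valid key immediately instead of being rejected or held back until the dimension catches up.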

Conformed Dimensions:

A dimension that is used in multiple places is called a conformed dimension. A conformed dimension may be shared by multiple fact tables in a single database, or across multiple data marts or data warehouses.
e.g. - a Time dimension that is joined to multiple fact tables
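A minimal sketch of why conformance matters: two hypothetical fact tables (`fact_sales`, `fact_returns`) reference the same `dim_date`, so each can be aggregated separately and the results joined on the shared `month` attribute (a "drill-across" query). All names and values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_sk INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
INSERT INTO dim_date VALUES
    (1, '2014-01-10', '2014-01'), (2, '2014-01-20', '2014-01'), (3, '2014-02-05', '2014-02');

-- Two different fact tables reference the same (conformed) date dimension.
CREATE TABLE fact_sales   (date_sk INTEGER, amount REAL);
CREATE TABLE fact_returns (date_sk INTEGER, amount REAL);
INSERT INTO fact_sales   VALUES (1, 100), (2, 50), (3, 70);
INSERT INTO fact_returns VALUES (2, 10), (3, 5);
""")

# Aggregate each fact separately, then join on the shared month labels.
rows = conn.execute("""
    WITH s AS (SELECT d.month, SUM(f.amount) AS sales
               FROM fact_sales f JOIN dim_date d USING (date_sk) GROUP BY d.month),
         r AS (SELECT d.month, SUM(f.amount) AS returns
               FROM fact_returns f JOIN dim_date d USING (date_sk) GROUP BY d.month)
    SELECT s.month, s.sales, r.returns
    FROM s LEFT JOIN r ON s.month = r.month
    ORDER BY s.month
""").fetchall()
print(rows)  # [('2014-01', 150.0, 10.0), ('2014-02', 70.0, 5.0)]
```

If each fact table had its own, differently defined date table, the month labels would not be guaranteed to line up.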

Degenerate/Empty Dimensions:

A degenerate dimension is a dimension attribute that is stored as part of the fact table rather than in a separate dimension table. These are essentially dimension keys that have no other attributes. In a data warehouse they are often used as the result of a drill-through query, to analyze the source of an aggregated number in a report; you can use these values to trace back to the transactions in the OLTP system.
These dimensions are used when fact tables represent transactional data. If a dimension would contain nothing but a single identifier (such as an invoice number), it is unnecessary and inconvenient to create it as a separate table. Instead, keep it in the fact table as a degenerate dimension, with no foreign key.
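A minimal sketch, assuming a hypothetical line-level sales fact where the invoice number is the degenerate dimension: it sits on the fact row with no `dim_invoice` table behind it, yet it can still be used to group and trace an aggregate back to its transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- invoice_no is a degenerate dimension: kept on the fact row itself,
-- with no dimension table and no foreign key.
CREATE TABLE fact_sales_line (
    invoice_no TEXT,
    product_sk INTEGER,
    amount     REAL
);
INSERT INTO fact_sales_line VALUES
    ('INV-1001', 1, 20.0), ('INV-1001', 2, 15.0), ('INV-1002', 1, 40.0);
""")

# Drill through an aggregated number back to the transactions behind it.
per_invoice = conn.execute("""
    SELECT invoice_no, COUNT(*), SUM(amount)
    FROM fact_sales_line GROUP BY invoice_no ORDER BY invoice_no
""").fetchall()
print(per_invoice)  # [('INV-1001', 2, 35.0), ('INV-1002', 1, 40.0)]
```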

Role Playing Dimensions:

A role-playing dimension is one where the same dimension key, along with its associated attributes, can be joined to more than one foreign key in the fact table.
e.g. - a fact table may include foreign keys for both Ship Date and Delivery Date. The same date dimension attributes apply to each foreign key, so you can join the same dimension table to both. Here the date dimension plays multiple roles, mapping both the ship date and the delivery date, hence the name role-playing dimension.
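The Ship Date / Delivery Date example can be sketched in SQL by joining one physical date table twice under different aliases, one per role. Table and column names (`fact_shipment`, `ship_date_sk`, `delivery_date_sk`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_sk INTEGER PRIMARY KEY, full_date TEXT, weekday TEXT);
INSERT INTO dim_date VALUES
    (1, '2014-02-17', 'Monday'), (2, '2014-02-20', 'Thursday');

CREATE TABLE fact_shipment (
    order_no         TEXT,
    ship_date_sk     INTEGER,  -- role 1: Ship Date
    delivery_date_sk INTEGER   -- role 2: Delivery Date
);
INSERT INTO fact_shipment VALUES ('SO-1', 1, 2);
""")

# The single dim_date table is joined twice, once per role.
row = conn.execute("""
    SELECT f.order_no, ship.weekday, delivery.weekday
    FROM fact_shipment f
    JOIN dim_date AS ship     ON f.ship_date_sk     = ship.date_sk
    JOIN dim_date AS delivery ON f.delivery_date_sk = delivery.date_sk
""").fetchone()
print(row)  # ('SO-1', 'Monday', 'Thursday')
```

Many warehouses also expose each role as a view (for example a "ship date" view over the date table) so report users see distinct names; that is a convenience, not a requirement.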

Shrunken Dimensions:

A shrunken dimension is a subset of another dimension.
e.g. - the Orders fact table may include a foreign key for Product, while the Target fact table includes a foreign key only for ProductCategory, which is an attribute in the Product table but is much less granular. Creating a smaller dimension table with ProductCategory as its primary key is one way of dealing with this heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate ProductCategory table, which can serve as the shrunken dimension.
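A minimal sketch of the Orders/Target example, with hypothetical table names and values. The shrunken `dim_product_category` is derived from `dim_product` so the category values stay consistent, Orders are rolled up to category grain, and the two facts meet through the shrunken dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, name TEXT, category TEXT);
INSERT INTO dim_product VALUES
    (1, 'Hex bolt', 'Fasteners'), (2, 'Wing nut', 'Fasteners'), (3, 'Spanner', 'Tools');

-- Shrunken dimension: one row per category, derived from dim_product.
CREATE TABLE dim_product_category (category TEXT PRIMARY KEY);
INSERT INTO dim_product_category SELECT DISTINCT category FROM dim_product;

-- Orders are at product grain; Targets only at category grain.
CREATE TABLE fact_orders (product_sk INTEGER, amount REAL);
INSERT INTO fact_orders VALUES (1, 200), (2, 150), (3, 320);
CREATE TABLE fact_target (category TEXT REFERENCES dim_product_category(category),
                          target_amount REAL);
INSERT INTO fact_target VALUES ('Fasteners', 500), ('Tools', 300);
""")

# Roll Orders up to category grain, then compare with Targets.
rows = conn.execute("""
    WITH actual AS (
        SELECT p.category, SUM(o.amount) AS actual_amount
        FROM fact_orders o JOIN dim_product p ON o.product_sk = p.product_sk
        GROUP BY p.category)
    SELECT c.category, t.target_amount, a.actual_amount
    FROM dim_product_category c
    JOIN fact_target t ON t.category = c.category
    JOIN actual a      ON a.category = c.category
    ORDER BY c.category
""").fetchall()
print(rows)  # [('Fasteners', 500.0, 350.0), ('Tools', 300.0, 320.0)]
```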

Static Dimensions:

Static dimensions are not extracted from the original data source but are created within the context of the data warehouse. A static dimension can be loaded manually (for example, with status codes) or generated by a procedure, as with a Date or Time dimension.
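A minimal sketch of a procedurally generated Date dimension: nothing is read from a source system, and every row is derived from the calendar. The function name and column set are hypothetical; real date dimensions usually carry many more attributes (fiscal periods, holidays, and so on).

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one row per calendar day between start and end, inclusive."""
    rows = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        rows.append({
            "date_sk": offset + 1,              # plain sequential surrogate key
            "full_date": day.isoformat(),
            "weekday": day.strftime("%A"),      # locale-dependent name
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
    return rows

dim_date = build_date_dimension(date(2014, 2, 17), date(2014, 2, 23))
print(len(dim_date), dim_date[3]["full_date"], dim_date[3]["weekday"])
```

Because the table is generated rather than extracted, it can be built once for a wide range of years and reused by every fact table that needs a date.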

Thursday, February 20, 2014
