Nuts & Bolts of DataStage: DataWareHouse (ETL) Designing Steps

Sunday, February 23, 2014

DataWareHouse (ETL) Designing Steps

A) Planning and Designing Steps

i) Requirement and Realities Gathering

1. Business Needs

2. Data Profiling and other Data source realities

3. Compliance Requirement

4. Security Requirement

5. Data Integration

6. Data Latency

7. Archiving and Lineage

8. End user delivery interfaces

9. Available Development Skills

10. Available management Skills

11. Legacy licenses

ii) Architecture

1. Hand coded v/s ETL vender tools

2. Batch v/s Streaming data flow

3. Horizontal v/s Vertical task dependency

4. Scheduler automation

5. Exception Handling

6. Quality Handling

7. Recovery and Restart

8. Metadata

9. Security

iii) System Implementation

1. Hardware

2. Software

3. Coding Practices

4. Documentation Practices

5. Specific quality checks

iv) Test and Release

1. Development System

2. Test System

3. Production System

4. Handoff Procedure

5. Update propagation approach

6. System snapshot and rollback procedure

7. Performance tuning

B) The Data Flow design

i) Extract

1. Reading source data models

2. Connecting to and accessing data

3. Scheduling the source system intercepting notification and demons

4. Capturing changed data

5. Staging the extracted data to disk

ii) Clean

1. Enforcing column properties

2. Enforcing structure

3. Enforcing data and value rules

4. Enforcing complex business rules

5. Building a meta data foundation to describe data quality

6. Staging a cleaned data to disk

iii) Conform

1. Conforming business rules ( in dimension)

2. Conforming business metrics and performance indicator ( in fact )

3. De-duplicating

4. Householding

5. Internationalizing

6. Staging the conformed data to disk

iv) Deliver

1. Loading flat/snowflaked dimensions

2. Generating time dimensions

3. Loading degenerated dimensions

4. Loading sub-dimension

5. Loading type 1,2 and 3 SCD

6. Conforming dimensions and conforming flats

7. Handling late arriving dimensions and facts

8. Loading multivalued dimensions and facts

9. Loading ragged hierarchy dimensions

10. Loading text fact in dimensions

11. Running the surrogate key pipeline for fact tables

12. Loading three fundamental fact table grain

13. Loading and updating aggregation

14. Staging the delivered data to disk

Ref : The Data Warehouse Etl Toolkit

Nuts & Bolts of DataStage

Sunday, February 23, 2014

DataWareHouse (ETL) Designing Steps

No comments :

Post a Comment