We have moved to www.dataGenX.net. Keep learning with us.

Thursday, November 28, 2013

Tail Stage in DataStage

The Tail Stage is another stage from the Development/Debug category. It can have a single input link and a single output link.
The Tail Stage selects the last N records from each partition of an input data set and copies the selected records to an output data set.
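The idea of "last N records from each partition" can be sketched in plain Python (this is only an illustration of the concept, not DataStage code; the partition layout is assumed):

```python
def tail_stage(partitions, n):
    """Keep the last n rows of every partition, as the Tail Stage does."""
    return [part[-n:] for part in partitions]

# Two partitions of a data set, as DataStage might distribute the rows.
partitions = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
print(tail_stage(partitions, 2))  # [[4, 5], [8, 9]]
```

Note that the selection happens per partition, so a 2-node job with N=2 emits up to 4 rows in total.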

a) Job Design :

Wednesday, November 27, 2013

Head Stage in DataStage

Welcome to the Basic Intro with Stages series. We are going to look into the HEAD stage (Development/Debug category). It can have a single input link and a single output link.
The Head Stage selects the first N rows from each partition of an input data set and copies the selected rows to an output data set. You determine which rows are copied by setting properties which allow you to specify:
  • The number of rows to copy
  • The partition from which the rows are copied
  • The location of the rows to copy
  • The number of rows to skip before the copying operation begins.
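The properties above can be mimicked in a small Python sketch (illustrative only, not DataStage code; the parameter names here are my own, chosen to mirror the stage's options):

```python
def head_stage(partitions, rows=10, skip=0, part_nums=None):
    """Copy `rows` rows from each selected partition, skipping `skip`
    rows first — mirroring the Head Stage properties listed above."""
    out = []
    for i, part in enumerate(partitions):
        # If specific partitions were requested, only copy from those.
        if part_nums is not None and i not in part_nums:
            continue
        out.append(part[skip:skip + rows])
    return out

partitions = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(head_stage(partitions, rows=2, skip=1))  # [[2, 3], [7, 8]]
```

As with the Tail Stage, N is applied per partition, not to the data set as a whole.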

Friday, November 22, 2013

ETL Job Design Standards - 2

Part 1 --> ETL Job Design Standards - 1

Parameter Management Standards
This section defines standards to manage job parameters across environments. Jobs should use parameters liberally to avoid hard coding as much as possible. Some categories of parameters include: 

  • Environmental parameters, such as directory names, file names, etc.
  • Database connection parameters
  • Notification email addresses
  • Processing options, such as degree of parallelism
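In DataStage, a stage property references a job parameter as #ParamName#. As a rough sketch of how such placeholders resolve at run time (illustrative Python, with made-up parameter values, not the engine's actual implementation):

```python
import re

def resolve_params(template, params):
    """Substitute #Name# references with their parameter values."""
    return re.sub(r"#(\w+)#", lambda m: str(params[m.group(1)]), template)

# Hypothetical environmental parameters for one environment.
params = {"SrcDir": "/data/landing", "FileName": "orders.txt"}
print(resolve_params("#SrcDir#/#FileName#", params))  # /data/landing/orders.txt
```

Keeping values like these in parameter sets or environment variables, rather than hard-coded in stages, is what lets the same job run unchanged across DEV, TEST, and PROD.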

Thursday, November 14, 2013

DataStage Server Hang Issues & Resolution

A server hang issue can occur when:

1) The metadata repository database detects a deadlock condition and chooses the failing job as the victim of the deadlock.
2) Log maintenance is ignored.
3) Temp folders are not maintained periodically.

I will try to explain the above three points in detail below:

Wednesday, November 13, 2013

Interview Questions : DataStage - self-2

48    Why can’t we use sequential file as a lookup?
49    What is data warehouse?
50    What is ‘Star-Schema’?
51    What is ‘Snowflake-Schema’?
52    What is difference between Star-Schema and Snowflake-Schema?
53    What is meant by a surrogate key?
54    What is ‘Conformed Dimension’?

Tuesday, November 12, 2013

ETL Job Design Standards - 1

When using an off-the-shelf ETL tool, the principles of software development do not change: we want our code to be reusable, robust, flexible, and manageable. To assist development, a set of best practices should be created for the implementation to follow. Failure to implement these practices usually results in problems further down the track, such as a higher cost of future development, increased time spent on administration tasks, and problems with reliability.
Although these standards are listed as taking place in ETL Physical Design, ideally they should be established before the prototype if possible. Once established, they can be re-used for future increments and only need to be reviewed.

Listed below are some standard best practice categories that should be identified on a typical project. 

Monday, November 11, 2013

DataStage Scenario - Problem4



Saturday, November 09, 2013

DB2 commands Cheat Sheet

DB2 System Commands

    DB2LEVEL -- checks version of DB2 installed.
    DB2ILIST -- lists all instances installed
    DB2CMD -- opens a command line processor
    DB2CC -- opens db2 control center
    DB2LICM -l -- shows the installed DB2 product and license information.

Command Line Processor Commands

Friday, November 08, 2013

Conductor Node in Datastage

Below is a sample APT_CONFIG_FILE; the pools "conductor" entry is what marks the conductor node.

{
  node "node0"
  {
    fastname "server1"
    pools "conductor"
    resource disk "/datastage/Ascential/DataStage/Datasets/node0" {pools "conductor"}
    resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node0" {pools ""}
  }
}

Thursday, November 07, 2013

Why is Entire partitioning used in the LOOKUP stage?

Entire partitioning copies the complete reference data set to every node. For the lookup to match records correctly, each node processing stream rows must be able to see all reference data, so the full reference set should be present on every node.
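A tiny Python sketch of the idea (illustrative only, not DataStage code; the node and data layout here are assumed): with Entire partitioning every node holds the whole reference set, so each node can match whatever stream keys happen to land on it.

```python
# Reference data to be looked up (key -> value).
reference = {1: "A", 2: "B", 3: "C"}

# Entire partitioning: each of the 2 nodes gets a full copy of the reference.
nodes = [dict(reference) for _ in range(2)]

# Stream rows are distributed across nodes (e.g. by hash); any key can land anywhere.
stream_per_node = [[1, 3], [2]]
matches = [[nodes[i][k] for k in keys] for i, keys in enumerate(stream_per_node)]
print(matches)  # [['A', 'C'], ['B']]
```

Had the reference been hash-partitioned instead, key 3's reference row might live on a different node than the stream row carrying key 3, and the match would fail.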

Wednesday, November 06, 2013

DataStage Scenario - Problem3

Scenario :  Get the next column value in current row

Input file :

Sq, No
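Since the input listing above is incomplete, here is a hedged sketch of the logic in Python with assumed sample data (in a DataStage job you would typically achieve this with a sort plus a Transformer that offsets rows by one):

```python
def next_value(rows):
    """Pair each row's value with the value from the next row
    (None for the last row, which has no successor)."""
    return [(cur, rows[i + 1] if i + 1 < len(rows) else None)
            for i, cur in enumerate(rows)]

print(next_value([10, 20, 30]))  # [(10, 20), (20, 30), (30, None)]
```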

Tuesday, November 05, 2013

Dummy Data Generation using Row Generator in DataStage - 2

By default the Row Generator stage runs sequentially, generating data in a single partition. You can, however, configure it to run in parallel and to generate meaningful data.

We are using the same job design as in Dummy Data Generation using Row Generator in DataStage - 1

a) Job Design :

Sunday, November 03, 2013

Wishing You & your Loved ones a Happy Diwali !!!!!

Dummy Data Generation using Row Generator in DataStage - 1

How do you generate input data for your dummy jobs and practice?

In DataStage, there is a stage called "Row Generator" under the "Development/Debug Stages" category. We will use this stage to generate the dummy data.

So, We are going to create a job which generates dummy data.

Saturday, November 02, 2013

Interview Questions : DataStage - self-1

Sharing a collection of interview questions. Try these to rank your knowledge :-)

1    Types of Stages in DS? Explain with Examples
2    What are active stages and passive stages?
3    Can you filter data in hashed file? (No)
4    Difference between sequential and hashed file?
5    How do you populate time dimension?
6    Can we use target hashed file as lookup? (Yes)
7    What is Merge Stage?
8    What is Job Sequencer?
9    What are stages in sequences?
10    How do you pass parameters?
11    What parameters you used in your project?