We have moved to www.dataGenX.net. Keep learning with us.

Tuesday, January 29, 2013

Pivot stage made easy


Many people have the following misconceptions about the Pivot stage:
1) It converts rows into columns.
2) Using a Pivot stage, we can convert 10 rows into 100 columns and 100 columns into 10 rows.
3) You can add more points here!!



Let me first tell you that a Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else. Some DS professionals refer to this as NORMALIZATION. Another fact about the Pivot stage is that it's irreplaceable, i.e. no other stage offers this built-in functionality of converting columns into rows!!! So, that makes it unique, doesn't it!!!
Let's see exactly how it does it...

For example, let's take a file with the following fields: Item, Quantity1, Quantity2, Quantity3 (delimited by ~):
Item~Quantity1~Quantity2~Quantity3
ABC~100~1000~10000
DEF~200~2000~20000
GHI~300~3000~30000
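If you want to play with this sample outside DataStage, here is a minimal Python sketch (plain Python, not DataStage) that reads it roughly the way a file stage configured with ~ as the delimiter and a header row would. The names SAMPLE and read_records are my own, for illustration only, and every value comes back as a string because no column types are declared.

```python
import csv
import io

# The sample file from above: a header row, then "~"-delimited records.
SAMPLE = """Item~Quantity1~Quantity2~Quantity3
ABC~100~1000~10000
DEF~200~2000~20000
GHI~300~3000~30000
"""

def read_records(text):
    """Parse "~"-delimited text into a list of dicts keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text), delimiter="~")
    return [dict(row) for row in reader]

records = read_records(SAMPLE)
print(records[0])
```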

Basically, you would use a Pivot stage when you need to convert those 3 Quantity fields into a single field that holds one Quantity value per row, i.e. you would need the following output:

Item~Quantity
ABC~100
ABC~1000
ABC~10000
DEF~200
DEF~2000
DEF~20000
GHI~300
GHI~3000
GHI~30000
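Conceptually, the pivot emits one output row per Quantity column for each input row, so 3 input rows with 3 Quantity columns become 3 x 3 = 9 output rows. The Python sketch below shows just that logic (pivot_columns_to_rows is a hypothetical name, not a DataStage API) and prints the output listed above:

```python
# Each input row: (Item, Quantity1, Quantity2, Quantity3)
rows = [
    ("ABC", 100, 1000, 10000),
    ("DEF", 200, 2000, 20000),
    ("GHI", 300, 3000, 30000),
]

def pivot_columns_to_rows(rows):
    """Emit one (Item, Quantity) row for every Quantity column of every input row."""
    out = []
    for item, *quantities in rows:
        # The Item value is repeated on each row produced from the same input row.
        for qty in quantities:
            out.append((item, qty))
    return out

for item, qty in pivot_columns_to_rows(rows):
    print(f"{item}~{qty}")
```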


How do we achieve the above output in DataStage???

In this case our source is a flat file. Read it using a file stage such as the Sequential File stage (or a File Set or Data Set stage if your data is stored in those formats). Define the 4 columns on the Columns tab of the stage's Output page.
Now connect a Pivot stage from the tool palette to the above output link, and create an output link for the Pivot stage itself (this enables the Output tab of the Pivot stage).

Unlike other stages, a Pivot stage doesn't use the generic stage GUI; it has a stage page of its own. By default its Output Columns tab has no fields, so you need to type them in manually. In this case, just type in the 2 field names: Item and Quantity. However, typing the columns by hand becomes tedious when there are many fields. In that case you can use the metadata Save/Load feature: go to the Input Columns tab of the Pivot stage, save the table definition, and load it in the Output Columns tab. This is the way I use it!!!

Now you have the following fields in the Output Columns tab: Item and Quantity. Here comes the tricky part, i.e. you need to specify the DERIVATION. If a field name in the Output Columns tab is the same as in the Input tab, you need not specify any derivation; in this case, the Item field needs none. But if the Output Columns tab has new field names, you need to specify a derivation or you will get a RUN-TIME error for free...

For our example, you need to type the Derivation for the Quantity field as

Column name    Derivation
Item           Item (or you can leave this blank)
Quantity       Quantity1, Quantity2, Quantity3
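To make the derivation rules concrete, here is a hypothetical Python model of the Output Columns grid (plain Python, not DataStage's API). It assumes that a blank derivation means "take the input column with the same name", that a comma-separated derivation lists the columns to pivot, that every pivoted derivation lists the same number of columns, and that a name with no matching input column raises an error, standing in for the run-time error described above.

```python
# Hypothetical model of the Output Columns grid: output column -> derivation.
OUTPUT_COLUMNS = {
    "Item": "",                                      # same name as input: no derivation needed
    "Quantity": "Quantity1, Quantity2, Quantity3",  # columns pivoted into rows
}

def resolve(derivations, input_names):
    """Turn each derivation into the list of input columns it reads from."""
    resolved = {}
    for out_name, derivation in derivations.items():
        if derivation.strip():
            sources = [name.strip() for name in derivation.split(",")]
        else:
            sources = [out_name]  # blank derivation: same-named input column
        missing = [s for s in sources if s not in input_names]
        if missing:
            # Stand-in for the run-time error DataStage reports in this situation
            raise ValueError(f"{out_name}: no input column named {missing}")
        resolved[out_name] = sources
    return resolved

def pivot(rows, derivations):
    """Emit one output row per pivoted column for every input row."""
    if not rows:
        return []
    spec = resolve(derivations, rows[0].keys())
    counts = {len(s) for s in spec.values() if len(s) > 1}
    if len(counts) > 1:
        raise ValueError("pivoted derivations must list the same number of columns")
    n = counts.pop() if counts else 1
    out = []
    for row in rows:
        for i in range(n):
            # Pivoted columns advance with i; single-source columns repeat.
            out.append({
                name: row[sources[i] if len(sources) > 1 else sources[0]]
                for name, sources in spec.items()
            })
    return out

rows = [
    {"Item": "ABC", "Quantity1": 100, "Quantity2": 1000, "Quantity3": 10000},
    {"Item": "DEF", "Quantity1": 200, "Quantity2": 2000, "Quantity3": 20000},
    {"Item": "GHI", "Quantity1": 300, "Quantity2": 3000, "Quantity3": 30000},
]
for r in pivot(rows, OUTPUT_COLUMNS):
    print(f"{r['Item']}~{r['Quantity']}")
```

Adding a new output column such as "Qty" with a blank derivation makes resolve() raise ValueError, which is this sketch's analogue of the run-time error.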

Just attach another file stage and view your output!!! So, objective met!!!