
Monday, February 03, 2014

DataStage Scenario - Design 2 - job1


DataStage Scenario Problem: DataStage Scenario - Problem2

Solution Design :


a) Job Design :

The following design produces the required output: the job reads a sequential file as input, then passes the data through an Aggregator stage and a Filter stage.
 



b) Aggregator Stage Properties

The input data contains only one column, "No". In the Aggregator stage, we group the data on the "No" column and count the rows for each key (No).


With the "Count Rows" aggregation type, the stage generates a new column that contains the row count for each key (No). Here we have named that column "count" and mapped it to the output.
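For readers who think in code, the grouping and counting step can be sketched in Python. This is only an analogy for what the Aggregator stage does, not DataStage code; the function name `count_rows_per_key` and the sample values are made up for illustration.

```python
from collections import Counter


def count_rows_per_key(values):
    """Group on the key and count rows, similar to Aggregator with Count Rows."""
    counts = Counter(values)
    # One output row per distinct key, carrying the key ("No") and its "count".
    return [{"No": no, "count": n} for no, n in counts.items()]


# Hypothetical sample input for the single "No" column.
sample = [1, 2, 2, 3, 4, 4, 4]
rows = count_rows_per_key(sample)
for row in rows:
    print(row)
```

Note that the result has one row per distinct key, so a value that occurs three times in the input appears once with count 3.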



c) Filter Stage Properties

In the Filter stage, we define two where conditions, count=1 and count>1, and route each condition to a different output file.


On the output tab, we map the data (column No) to the output.
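The routing done by the two where conditions can be pictured with a small Python sketch. Again this is an analogy rather than DataStage syntax; `route_by_count` and the sample rows are hypothetical names and data chosen for the example.

```python
def route_by_count(rows):
    """Send each aggregated row to one of two outputs, like the two where clauses."""
    unique_link, dup_link = [], []
    for row in rows:
        if row["count"] == 1:
            # where count=1: only the "No" column is passed on
            unique_link.append(row["No"])
        elif row["count"] > 1:
            # where count>1
            dup_link.append(row["No"])
    return unique_link, dup_link


# Hypothetical rows as they might leave the Aggregator stage.
aggregated = [
    {"No": 10, "count": 1},
    {"No": 20, "count": 3},
    {"No": 30, "count": 1},
]
unique_link, dup_link = route_by_count(aggregated)
```

Because the two conditions cannot both be true for the same row, every row lands in exactly one output.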




d) Output File

The job produces two outputs:

i) Rows where count=1 (values that are unique in the input)
ii) Rows where count>1 (values that are duplicated in the input)
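To see what the two outputs look like for a concrete input, here is a minimal end-to-end sketch in Python. It assumes a one-column text input with one value per line; the in-memory `source` stands in for the sequential file, and the function name and sample values are invented for illustration.

```python
import io
from collections import Counter


def split_unique_and_duplicates(lines):
    """Return (values seen once, values seen more than once)."""
    # Read the single column, skipping blank lines.
    values = [line.strip() for line in lines if line.strip()]
    counts = Counter(values)
    uniques = [v for v, n in counts.items() if n == 1]
    dups = [v for v, n in counts.items() if n > 1]
    return uniques, dups


# Hypothetical file content: one "No" value per line.
source = io.StringIO("1\n2\n2\n3\n4\n4\n4\n")
uniques, dups = split_unique_and_duplicates(source)
print(uniques)  # ['1', '3']
print(dups)     # ['2', '4']
```

Each duplicated value is listed once in the second output, since the counting step emits one row per key.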



