
Monday, July 02, 2012

Development/Debug Stage in DataStage


Head stage
The Head Stage is a Development/Debug stage. It can have a single input link and a single output link.
It is one of a number of stages that InfoSphere DataStage provides to help you sample data.
The Head Stage selects the first N rows from each partition of an input data set and copies the selected
rows to an output data set. You determine which rows are copied by setting properties which allow you
to specify:
· The number of rows to copy
· The partition from which the rows are copied
· The location of the rows to copy
· The number of rows to skip before the copying operation begins
This stage is helpful in testing and debugging applications with large data sets. For example, the Partition
property lets you see data from a single partition to determine if the data is being partitioned as you
want it to be. The Skip property lets you access a certain portion of a data set.
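To make the row selection concrete, here is a minimal Python sketch of the behaviour described above. It is not DataStage code: the function name and the parameters n_rows, skip and partition_number are hypothetical stand-ins for the stage properties, and each partition is modelled as a plain list.

```python
from itertools import islice

def head_per_partition(partitions, n_rows=10, skip=0, partition_number=None):
    """Return the first n_rows rows of each partition, after skipping `skip` rows.

    If partition_number is given, only that partition is read (hypothetical
    stand-in for the stage's Partition property).
    """
    selected = {}
    for index, rows in enumerate(partitions):
        if partition_number is not None and index != partition_number:
            continue
        # islice skips `skip` rows, then takes the next n_rows rows.
        selected[index] = list(islice(rows, skip, skip + n_rows))
    return selected

partitions = [[1, 2, 3, 4, 5], [10, 20, 30], [100, 200, 300, 400]]
head_result = head_per_partition(partitions, n_rows=2, skip=1)
single_partition = head_per_partition(partitions, n_rows=1, partition_number=1)
```

Looking at one partition at a time, as in single_partition, is the kind of check the Partition property is meant for.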




Tail stage
The Tail Stage is a Development/Debug stage. It can have a single input link and a single output link. It
is one of a number of stages that InfoSphere DataStage provides to help you sample data.
The Tail Stage selects the last N records from each partition of an input data set and copies the selected
records to an output data set. You determine which records are copied by setting properties which allow
you to specify:
· The number of records to copy
· The partition from which the records are copied
This stage is helpful in testing and debugging applications with large data sets. For example, the Partition
property lets you see data from a single partition to determine if the data is being partitioned as you
want it to be.
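The same idea can be sketched for the Tail stage. Again, this is plain Python used only to illustrate the selection rule, with hypothetical names (tail_per_partition, n_rows, partition_number) and partitions modelled as lists.

```python
from collections import deque

def tail_per_partition(partitions, n_rows=10, partition_number=None):
    """Return the last n_rows records of each partition."""
    selected = {}
    for index, records in enumerate(partitions):
        if partition_number is not None and index != partition_number:
            continue
        # A bounded deque keeps only the most recent n_rows records.
        selected[index] = list(deque(records, maxlen=n_rows))
    return selected

partitions = [[1, 2, 3, 4, 5], [10, 20, 30], [100, 200, 300, 400]]
tail_result = tail_per_partition(partitions, n_rows=2)
```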



Sample stage
The Sample stage is a Development/Debug stage. It can have a single input link and any number of
output links when operating in percent mode, or a single input and single output link when operating
in period mode. It is one of a number of stages that InfoSphere DataStage provides to help you sample
data.
The Sample stage samples an input data set. It operates in two modes. In Percent mode, it extracts rows,
selecting them by means of a random number generator, and writes a given percentage of these to each
output data set. You specify the number of output data sets, the percentage written to each, and a seed
value to start the random number generator. You can reproduce a given distribution by repeating the
same seed value and the same number of outputs.
In Period mode, it extracts every Nth row from each partition, where N is the period, which you supply.
In this case all rows will be output to a single data set, so the stage used in this mode can only have a
single output link.
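The two modes can be illustrated with a short Python sketch. It rests on two assumptions that the text above does not confirm: in Percent mode each row goes to at most one output, and in Period mode the Nth, 2Nth, ... row of each partition is the one kept. The function names are hypothetical.

```python
import random

def sample_percent(rows, percentages, seed):
    """Percent mode: output i receives roughly percentages[i] percent of the rows."""
    if sum(percentages) > 100:
        raise ValueError("percentages must not add up to more than 100")
    rng = random.Random(seed)  # same seed gives the same distribution
    outputs = [[] for _ in percentages]
    for row in rows:
        draw = rng.random() * 100
        upper = 0.0
        for i, pct in enumerate(percentages):
            upper += pct
            if draw < upper:
                outputs[i].append(row)
                break  # assumption: a row lands in at most one output
    return outputs

def sample_period(partitions, period):
    """Period mode: every Nth row of each partition, all written to one output."""
    return [row for rows in partitions for row in rows[period - 1::period]]

percent_outputs = sample_percent(range(1000), [10, 20], seed=42)
period_output = sample_period([[1, 2, 3, 4, 5, 6], [10, 20, 30]], period=3)
```

Calling sample_percent again with the same seed and the same list of percentages returns the same outputs, which is the reproducibility the seed value is there for.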



Peek stage
The Peek stage is a Development/Debug stage. It can have a single input link and any number of output
links.
The Peek stage lets you print record column values either to the job log or to a separate output link as
the stage copies records from its input data set to one or more output data sets.
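As a rough analogy in Python, a peek is a pass-through that prints some column values while copying every record unchanged. The names peek, n_rows and columns are hypothetical, records are modelled as dictionaries, and a plain list stands in for the job log.

```python
def peek(records, log, n_rows=10, columns=None):
    """Log the chosen columns of the first n_rows records, yield every record as-is."""
    for count, record in enumerate(records):
        if count < n_rows:
            # With no column list, show every column of the record.
            shown = {name: record[name] for name in (columns or record)}
            log(f"peek row {count}: {shown}")
        yield record

log_lines = []  # stands in for the job log
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
copied = list(peek(records, log_lines.append, n_rows=2, columns=["id"]))
```

Only the first two records are logged here, but all three reach the output.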




Row Generator stage
The Row Generator stage is a Development/Debug stage. It has no input links, and a single output link.
The Row Generator stage produces a set of mock data fitting the specified meta data. This is useful
where you want to test your job but have no real data available to process.
The meta data you specify on the output link determines the columns you are generating.
For decimal values the Row Generator stage uses dfloat. As a result, the generated values are subject to
the approximate nature of floating point numbers. Not all of the values in the valid range of a floating
point number are representable. The further a value is from zero, the wider the gaps between
representable values, so values with many significant digits are the most likely to be rounded.
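The widening gaps are easy to see with any double-precision float. The Python sketch below assumes that dfloat behaves like a standard 64-bit IEEE 754 double and uses math.ulp (Python 3.9 or later) to measure the distance from a value to the next representable one.

```python
import math

# Gap between each value and the next representable double.
gaps = {value: math.ulp(value) for value in (1.0, 1e6, 1e15, 1e17)}

for value, gap in gaps.items():
    print(f"{value:>8.0e}  gap to next double: {gap}")

# Near 1e17 the gap is larger than 1, so adding 1 changes nothing.
unchanged = (1e17 + 1.0 == 1e17)
```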



Column Generator stage
The Column Generator stage is a Development/Debug stage. It can have a single input link and a single
output link.
The Column Generator stage adds columns to incoming data and generates mock data for these columns
for each data row processed. The new data set is then output. (See also the Row Generator stage which
allows you to generate complete sets of mock data.)
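In Python terms, the stage's effect is roughly the following: every incoming row is copied and gets extra columns filled with generated values. This is only an illustration; add_generated_columns and the two generators are hypothetical and do not correspond to DataStage settings.

```python
from itertools import count, cycle

def add_generated_columns(rows, generators):
    """Yield a copy of each row with one generated value per new column."""
    for row in rows:
        new_row = dict(row)  # leave the incoming row untouched
        for column_name, generator in generators.items():
            new_row[column_name] = next(generator)
        yield new_row

incoming = [{"id": 1}, {"id": 2}, {"id": 3}]
generators = {"seq": count(100), "flag": cycle("YN")}
with_mock_columns = list(add_generated_columns(incoming, generators))
```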



Write Range Map stage
The Write Range Map stage is a Development/Debug stage. It allows you to write data to a range map.
The stage can have a single input link. It can only run in sequential mode.
The Write Range Map stage takes an input data set produced by sampling and sorting a data set and
writes it to a file in a form usable by the range partitioning method. The range partitioning method uses
the sampled and sorted data set to determine partition boundaries.
A typical use for the Write Range Map stage would be in a job which used the Sample stage to sample a
data set, the Sort stage to sort it and the Write Range Map stage to write the range map which can then
be used with the range partitioning method to write the original data set to a file set.
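The sample, sort, write and partition flow can be sketched in Python as below. How DataStage actually picks boundaries and stores the range map is not described here, so the evenly spaced boundary choice and the function names are assumptions made for illustration.

```python
import bisect
import random

def build_range_map(sorted_sample, n_partitions):
    """Pick n_partitions - 1 boundary keys at evenly spaced positions of the sample."""
    step = len(sorted_sample) / n_partitions
    return [sorted_sample[int(i * step)] for i in range(1, n_partitions)]

def range_partition(keys, boundaries):
    """Send each key to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    return partitions

data = list(range(100))
rng = random.Random(7)
sample = sorted(rng.sample(data, 20))          # Sample stage, then Sort stage
boundaries = build_range_map(sample, 4)        # what the range map would hold
parts = range_partition(data, boundaries)      # range partitioning of the full data
```

Because the boundaries come from a sample, the partitions are only approximately equal in size, but every key in one partition sorts before every key in the next.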




Enjoy the simplicity...
Atul Singh