Wednesday, April 09, 2014
Some more design tips for DataStage Job Development
• Common information like home directory, system date, username, password should be initialized in a global variable and then variable should be referred everywhere.
• Stage Variables allow you to hold data from a previous record when the next record, allowing you to compare between previous and current records. Stage variables also allow you return multiple errors for a record of information. By being able to evaluate all data in a record and not just error on the first exception that is found, the cleanup of data is more efficient and requires less iteration.
• Nulls are a curse when it comes to using functions/routines or normal equality type expressions.
E.g. NULL = NULL doesn’t work; neither does concatenation when one of the fields is null. Changing the nulls to 0 or “” before performing operations is recommended to avoid erroneous outcomes.
• Ensure that job does not look complex. If there are more stages (more than 10) in a job divide into two or more jobs on functional basis.
• Use containers where stages in the jobs can be grouped together.
• Use Annotations for describing steps done at stages. Use Description Annotation as job title; as Description Annotation also appears in Job properties>Short Job Description and also in the Job Report when generated.
• When using String functions on decimal always use Trim function to avoid as String functions interpret an extra Space used for sign in decimal.
• When you need to get a substring (e.g. first 2 characters from the left) of a character field:
Use <Field Name>[1,2]
Similarly for a decimal field then:
Use Trim(<Field Name>)[1,2]
• Always use Hash Partition in Join and Aggregator stages. The hash key should be the same as the key used to join/aggregate.
If Join/Aggregator stages do not produce desirable results, try running in
sequential mode (verify results; if still incorrect problem is with data/logic) and then run in parallel using Hash partition.
• Use Column Generator stage to create sequence numbers or adding columns having hard coded values.
• In Job sequences; always use “Reset if required, then run” option in Job Activity stages. (Note: This is not a default option)
• When mapping a decimal field to a char field or vice versa , it is always better to convert the value in the field using the ‘Type Conversion’ functions “DecimalToString” or “StringToDecimal” as applicable while mapping.
• “Clean-up on failure” property in sequential files must be enabled (enabled by default)
for more -->