We have moved to www.dataGenX.net, Keep Learning with us.

Thursday, July 31, 2014

Oracle SQL Tuning Tips - 2

  • Don’t forget to tune views. Views are SELECT statements and can be tuned in exactly the same way as any other SELECT statement. All tuning techniques applicable to SQL statements in general are equally applicable to views.
  • Avoid including a HAVING clause in SELECT statements. The HAVING clause filters selected rows only after all rows have been fetched, whereas a WHERE clause filters rows earlier and so reduces the overhead of sorting, summing, etc. A HAVING clause should only be used to restrict columns that have summary operations applied to them. Use:
    SELECT city FROM country WHERE city != 'Vancouver' AND city != 'Toronto' GROUP BY city;
    Rather than:
    SELECT city FROM country GROUP BY city HAVING city != 'Vancouver' AND city != 'Toronto';

Tuesday, July 29, 2014

Oracle SQL Tuning Tips - 1

The first consideration when writing an SQL statement is that it returns a correct result. The second is that it be the most efficient for a given situation. You can use many different SQL statements to achieve the same result, but it is often the case that only one of them will be the most efficient choice in a given situation.

Remember that processing SQL is a sequence of Parse (syntax check and object resolution), Execution (required reads and writes), and Fetch (row results retrieved, listed, sorted, and returned). SQL “tuning” consists, quite simply, of reducing the cost of one or more of these phases.

Note: Parse is generally the greatest time and resource hog. Parse overhead can be minimized by using Procedures, Functions, Packages, Views, etc.

Friday, July 25, 2014

DataStage Custom Routine to Get a File Size

Below is a DataStage custom transform routine to get the size of a file. The full path of the file is passed in as a parameter called "Filename".
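As a hedged sketch, the OS-level size lookup such a routine typically performs (for example via a DSExecute call) can be reproduced directly at the shell — the filename below is just a sample path for illustration:

```shell
# Example only: the size lookup a DataStage routine would typically
# shell out to. The path is a sample, not an assumed server layout.
Filename="/tmp/sample_data.txt"
printf 'hello world\n' > "$Filename"     # create a 12-byte test file

# wc -c prints the byte count; reading via redirection keeps the
# filename itself out of the command's output
FileSize=$(wc -c < "$Filename")
echo "Size of $Filename is $FileSize bytes"

rm "$Filename"
```

The routine would simply return the captured count as its result (`Ans` in DataStage BASIC terms).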

Thursday, July 24, 2014

Connect to Oracle DB with Linux Shell Script

In this post, we will see how to connect to an Oracle DB from a Linux shell script. There are some prerequisites: the Oracle DB driver must be installed on the Linux server from which you want to connect. First, check whether you can invoke SQL*Plus from the command line:
$ sqlplus
If you cannot, check whether the Oracle path is set in your .profile file. Let's assume the Oracle DB driver is installed on the server and the sqlplus path is set.
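If the path is missing, entries like the following in .profile make sqlplus reachable — the install path here is purely illustrative; substitute the one from your own environment:

```shell
# Example .profile entries -- the ORACLE_HOME path below is illustrative,
# not an assumed standard location; adjust it to your installation.
export ORACLE_HOME=/opt/oracle/product/11.2.0/client_1
export PATH=$PATH:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
```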

Now, the basic steps to connect to an Oracle DB are:
a) Connect to Oracle DB with UserID and Password
b) Execute the SQLs
c) Store the output in a file or send to standard output (terminal)
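The steps above can be sketched as a small script. The credentials, SID, and query below are placeholders, not real values; the sketch writes the script to a file (run_query.sh) rather than executing it, since sqlplus must already be installed for it to run:

```shell
# A minimal sketch saved to run_query.sh. DB_USER/DB_PASS/DB_SID and the
# SELECT are placeholders -- substitute your own credentials and SQL.
cat > run_query.sh <<'SCRIPT'
#!/bin/sh
DB_USER=scott
DB_PASS=tiger
DB_SID=ORCL

# -s = silent mode; the heredoc supplies the SQL (step b), and the
# redirection stores the result in a file (step c)
sqlplus -s "$DB_USER/$DB_PASS@$DB_SID" <<EOF > query_output.txt
SET PAGESIZE 0 FEEDBACK OFF HEADING OFF VERIFY OFF
SELECT COUNT(*) FROM employees;
EXIT;
EOF
SCRIPT

chmod +x run_query.sh
```

Dropping the `> query_output.txt` redirection sends the result to standard output (the terminal) instead.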

Monday, July 21, 2014

Navigating the many paths of metadata for DataStage 8

Source : Navigating the many paths of metadata for DataStage 8

Looking at the methods for importing metadata table definitions into DataStage 8 ETL jobs.
All of the metadata import methods of DataStage 7 are present in DataStage 8 and execute in the same way.  Developers familiar with previous versions will be right at home!  What is tricky to come to terms with are all the new ways to get metadata into DataStage 8 and the Metadata Server.  The Metadata Server can provide reporting on metadata from products outside of DataStage, such as BI tools, so in some cases you might be importing the metadata for reporting and not for ETL.
The list that follows covers just the techniques for importing metadata to be used by DataStage jobs.

Friday, July 18, 2014

Some DataStage Job design

Updating the Master Reports List (MRL) in Datastage

Create a job with a Row Generator stage linked to an ODBC or DB2 stage. Set the Row Generator stage's properties to output zero rows. Write the SQL that updates the MRL table as 'After SQL' in the target ODBC or DB2 stage.

Thursday, July 17, 2014

DataStage Scenario - Problem20

Goal : Reverse the Column

Input is like this :


Tuesday, July 15, 2014

Handling Filename with Spaces in Linux

Spaces in a Unix/Linux filename are not a good habit, as the space is usually used as a delimiter between field values rather than appearing in the filename itself. In Linux, SPACE is part of the IFS (Internal Field Separator), which is used for word splitting after expansion and for splitting lines into words with the `read` built-in command. The default value of IFS is SPACE, TAB, NEWLINE.

     This is why, whenever we do something with a filename containing spaces in Linux, we get an error like:
: No such file or directory

   What if you get files with spaces in their names from a source server? Here are some simple ways to deal with them.
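A few of these ways, sketched with a sample filename:

```shell
# create a sample file whose name contains a space
touch "my file.txt"

# 1) quote the variable: "$f" stays one argument, while a bare $f
#    would be split by IFS into two words ("my" and "file.txt")
f="my file.txt"
ls -l "$f" > /dev/null && echo "quoted access OK"

# 2) escape the space with a backslash instead of quoting
ls -l my\ file.txt > /dev/null && echo "escaped access OK"

# 3) for batches of files, NUL-delimit the names so spaces never
#    split them (find -print0 paired with xargs -0)
find . -maxdepth 1 -name "my file.txt" -print0 | xargs -0 ls -l > /dev/null
```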

Wednesday, July 09, 2014

Column Import Stage - Read a Fixed width File

The Column Import stage is a restructure stage. It can have a single input link, a single output link, and a single rejects link. It is typically used to divide data arriving in a single column into multiple columns.
    Here, I have tried to explain how to use it for importing data...

a) Design :

 As you can see, we have followed this design to demonstrate the stage: we read a file, import the columns with the Column Import stage, and then load them into a Sequential File stage.

Monday, July 07, 2014

DataStage Scenario - Design9 - job1

DataStage Scenario Problem -->  DataStage Scenario - Problem9

Solution Design :

a) Job Design :   
Below design will achieve the output as per our requirement.

Saturday, July 05, 2014

Count Rows and Calculate Sum in same Aggregator

Generally, we use the fork-and-join method (dividing the data into two links, each going to its own Aggregator) to calculate the sum and count the rows, because the Aggregator stage provides only one type of aggregation at a time: you can count the rows or perform some other aggregation, but not both.
            Here, I describe a way to count rows and calculate a sum in a single Aggregator stage, with some limitations :-)

Design :

As per the design, we use a Column Generator and an Aggregator stage to calculate the sum and count the rows.