We have moved to www.dataGenX.net. Keep learning with us.

Monday, December 31, 2012

Wish you a very Happy New Year 2013 .. :-)


Wishing you a very Happy New Year!
May God turn all your dreams into an even more beautiful reality!!! And may the
new year come full of blessings, peace, health and prosperity!!!!




till then.....
njoy the simplicity.......

Wednesday, December 26, 2012

How to find duplicate values in a table?



With the SQL statement below you can find duplicate values in any table: just replace tablefield with the column you want to search and table with the name of the table you need to search.
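The statement itself is not reproduced on this page. As a hedged sketch, the usual pattern groups rows by the column and keeps only the groups that contain more than one row (GROUP BY ... HAVING COUNT(*) > 1). The small shell helper below just prints that query for a given column and table; `dup_sql` is a hypothetical name, and tablefield/table are the placeholders from the text above.

```shell
#!/bin/sh
# dup_sql COLUMN TABLE
# Prints a query listing every value of COLUMN that occurs more than once
# in TABLE, together with how many times it occurs.
# dup_sql is a hypothetical helper, not part of any database client.
dup_sql() {
    col=$1
    tbl=$2
    printf 'SELECT %s, COUNT(*) AS cnt FROM %s GROUP BY %s HAVING COUNT(*) > 1;\n' \
        "$col" "$tbl" "$col"
}

# Example: dup_sql customer_id customers
```

Paste the printed statement into your SQL client; to see the complete duplicate rows, join the table back to this result on the column.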

Monday, December 24, 2012

The Unix "SCRIPT" command : a command recorder




script is a standard Unix command that records a transcript of your interaction with the Unix system. Once started, it works "in the background": you continue to work normally while the script session copies everything that appears on your screen (more or less) into a file. It would probably be better named carbon-copy.

The most common use of script is to document a terminal session. By running script you log all the information displayed on your terminal. You can then print the log file or view it with an editor. In a way, script is a specialized tee for the shell.
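A minimal sketch of a wrapper that starts script on a fresh, timestamped log file so that sessions never overwrite each other. The function names are hypothetical; the only assumption about script itself is the portable `script FILE` form, and you stop recording by typing exit (or pressing Ctrl-D).

```shell
#!/bin/sh
# session_log_name [PREFIX]
# Builds a timestamped file name such as session_20121224_101530.log.
session_log_name() {
    prefix=${1:-session}
    printf '%s_%s.log\n' "$prefix" "$(date +%Y%m%d_%H%M%S)"
}

# record_session [PREFIX]
# Starts `script` on a new log file. Everything shown on the terminal is
# copied into it until you type `exit` (or press Ctrl-D).
record_session() {
    log=$(session_log_name "${1:-session}")
    echo "Recording terminal session to $log (type 'exit' to stop)"
    script "$log"
}
```

For example, `record_session upgrade` would write to something like upgrade_20121224_101530.log, which you can later print or open in an editor.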

Thursday, December 20, 2012

How to Find the changes made by particular USER in DataStage :


Here is how -

  • Go to DataStage Designer
  • From the menu bar, choose Tools --> Advanced Find

Wednesday, December 19, 2012

How to set Default OS in Windows


Hi there...
Most of us have two operating systems installed on our system and want Windows as the default, but usually the OS installed last becomes the default, so we have to select the OS at boot time.

Here I am sharing how you can set Windows back as the default OS....

Tuesday, December 18, 2012

Specifying C++ compiler settings for DataStage in Windows





Today I am sharing how to specify the C++ compiler settings for DataStage in a Windows environment. For the last few days I have faced a lot of issues with this :P so I thought I should share it with you all.
For: DataStage 8.x

Monday, December 17, 2012

FileSet in DataStage


DataStage can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are together called a file set. The data files themselves are ordinary Unix files, which can be stored in several different places, and they are human-readable.

Friday, December 14, 2012

DataStage Naming Conventions



DataStage Naming Conventions follow the guidelines of ETL Naming Conventions.

Contents

  • 1 Job Name Prefixes
  • 2 Stage Names
  • 3 Link Names

Thursday, December 13, 2012

DataStage Environment variables




1 Buffering
1.1 APT_BUFFER_FREE_RUN
1.2 APT_BUFFER_MAXIMUM_MEMORY
1.3 APT_BUFFER_MAXIMUM_TIMEOUT
1.4 APT_BUFFER_DISK_WRITE_INCREMENT
1.5 APT_BUFFERING_POLICY
1.6 APT_SHARED_MEMORY_BUFFERS

2 Building Custom Stages
2.1 DS_OPERATOR_BUILDOP_DIR
2.2 OSH_BUILDOP_CODE
2.3 OSH_BUILDOP_HEADER
2.4 OSH_BUILDOP_OBJECT
2.5 OSH_BUILDOP_XLC_BIN
2.6 OSH_CBUILDOP_XLC_BIN

Wednesday, December 12, 2012

Encrypting Data using "tar" and "openssl" in Nix



The following shows an example of writing the contents of "tapetest" to tape:

        $ tar zcvf - tapetest | openssl des3 -salt -k secretpassword | dd of=/dev/st0

Reading the data back:
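The read-back command is not included on this page. As a hedged sketch, reading the data back reverses each stage of the pipeline: dd reads the raw stream, `openssl des3 -d` decrypts it with the same password, and `tar zxf -` extracts it. The two functions below (hypothetical names) take the device as a parameter so you can rehearse on an ordinary file before touching a tape; /dev/st0 and secretpassword come from the example above. Note that -k puts the password on the command line, where other users can see it in the process list, and recent OpenSSL versions print a warning about the old key derivation.

```shell
#!/bin/sh
# encrypt_dir_to DIR DEVICE PASSWORD
# tar + gzip DIR, encrypt the stream with Triple DES, write it to DEVICE.
encrypt_dir_to() {
    tar zcf - "$1" | openssl des3 -salt -k "$3" | dd of="$2" 2>/dev/null
}

# decrypt_from DEVICE PASSWORD
# Read the stream back, decrypt it, and extract it into the current directory.
decrypt_from() {
    dd if="$1" 2>/dev/null | openssl des3 -d -k "$2" | tar zxf -
}

# Tape usage, mirroring the example above:
#   encrypt_dir_to tapetest /dev/st0 secretpassword
#   decrypt_from /dev/st0 secretpassword
```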

Tuesday, December 11, 2012

InfoSphere DataStage Jobstatus returned Codes from dsjob



Equ DSJS.RUNNING To 0
 This is the only status that means the job is actually running
Equ DSJS.RUNOK To 1
 Job finished a normal run with no warnings
Equ DSJS.RUNWARN To 2
 Job finished a normal run with warnings
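When dsjob is called from a shell script, the numeric status usually has to drive a decision. A minimal sketch covering only the three codes listed above and treating everything else as something to investigate; the function names are hypothetical, and how you obtain the code (for example `dsjob -run -jobstatus`, which waits for the job and returns its status as the exit code) should be checked against the documentation for your release.

```shell
#!/bin/sh
# describe_job_status CODE
# Maps a DataStage job status code to the meaning listed above.
describe_job_status() {
    case "$1" in
        0) echo "RUNNING: job is still running" ;;
        1) echo "RUNOK: finished with no warnings" ;;
        2) echo "RUNWARN: finished with warnings" ;;
        *) echo "OTHER($1): not RUNOK/RUNWARN, check the job log" ;;
    esac
}

# job_succeeded CODE
# Succeeds (returns 0) when the job finished OK or with warnings only.
job_succeeded() {
    [ "$1" -eq 1 ] || [ "$1" -eq 2 ]
}
```

Whether RUNWARN should count as success is a site decision; drop the `-eq 2` test if warnings must fail the batch.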


Monday, December 10, 2012

DataSet in DataStage


Inside an InfoSphere DataStage parallel job, data is moved around in data sets. These carry metadata with them: both column definitions and information about the configuration that was in effect when the data set was created. If, for example, you have a stage that limits execution to a subset of available nodes, and the data set was created by a stage using all nodes, InfoSphere DataStage can detect that the data will need repartitioning.

Friday, December 07, 2012

A Discussion Group for you

Hi there,
I've created some groups; be a part of them and join the discussion.....

LinkedIn
Google+
IBM DeveloperWorks
Facebook





Thursday, December 06, 2012

Make a File "immutable" or "unalterable" in Nix


An immutable file cannot be changed or deleted, even by root. Note that this works on ext2/ext3 filesystems.
And yes, root can delete the file once the attribute has been removed again.

     As root:

       $ chattr +i filename
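To check or undo the flag, `lsattr filename` lists the attributes (a lowercase i in the flags field marks the file immutable) and `chattr -i filename` removes it. A small sketch of a check, assuming the Linux e2fsprogs lsattr output format (a flags field followed by the file name); the function names are hypothetical.

```shell
#!/bin/sh
# has_immutable_flag FLAGS
# FLAGS is the first field printed by lsattr, e.g. "----i---------e-----".
# A lowercase "i" anywhere in it means the immutable attribute is set.
has_immutable_flag() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# is_immutable FILE
# Reads FILE's attributes with lsattr (Linux ext2/ext3/ext4 only).
is_immutable() {
    flags=$(lsattr -d "$1" 2>/dev/null | awk '{print $1}')
    [ -n "$flags" ] && has_immutable_flag "$flags"
}

# As root:
#   chattr +i filename                      # set immutable
#   is_immutable filename && echo "locked"
#   chattr -i filename                      # clear it so the file can change
```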

Tuesday, December 04, 2012

Alphabetical list of stages in DataStage



The following tables contain an alphabetized list of all the stages that are available in IBM InfoSphere Information Server, Version 8.5, as included with the base installation or with add-on installations.

Whether a particular stage is displayed on the DataStage Designer client palette depends on the type of job you are creating and the products and add-ons that you installed. On the palette, stages are organized by category. In this document, stages are organized alphabetically.

Sequence job stages


The following stages are available when you create a sequence job.



Stages that are available in sequence jobs

 

Parallel job stages and server job stages [ A - F ]

The stages in this document are included and available with IBM InfoSphere DataStage unless otherwise noted in the following key.

Palette category    Description
Data Quality (1)    Available only with IBM InfoSphere QualityStage
Data Quality (2)    Available as a separately licensed add-on module to InfoSphere QualityStage
Application         Available as an add-on pack to IBM InfoSphere DataStage and QualityStage

The following key includes additional notes about stages in the Designer client.

Stages in parallel jobs or server jobs [ G - P ]


Stages in parallel jobs or server jobs [ Q - Z ]


Saturday, December 01, 2012

star schema

 
In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.

  

Friday, November 30, 2012

How do we know whether we are running DataStage jobs on an SMP or MPP system?


Look in the configuration file: if "fastname" has the same value for all nodes, you are running on SMP; if the nodes have different fastnames, the job runs across several machines (MPP).
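The check can be scripted by counting the distinct fastname values. A minimal sketch, assuming each entry sits on its own line in the form `fastname "host"`, as in typical configuration files; the function names are hypothetical, and $APT_CONFIG_FILE is simply the usual place to point it at.

```shell
#!/bin/sh
# count_fastnames CONFIG_FILE
# Prints how many distinct host names ("fastname" values) the file uses.
count_fastnames() {
    awk '$1 == "fastname" { gsub(/"/, "", $2); print $2 }' "$1" |
        sort -u | wc -l | tr -d ' '
}

# smp_or_mpp CONFIG_FILE
# One distinct fastname: every node is on the same machine (SMP).
# More than one: the nodes span several machines (MPP/cluster).
smp_or_mpp() {
    n=$(count_fastnames "$1")
    if [ "$n" -eq 0 ]; then
        echo UNKNOWN
    elif [ "$n" -eq 1 ]; then
        echo SMP
    else
        echo MPP
    fi
}

# Example: smp_or_mpp "$APT_CONFIG_FILE"
```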

Tuesday, November 27, 2012

Saving Shell Script with executable permissions: tips & tricks



In our daily tech routine we usually need to write small scripts to get tasks done easily. Sometimes it's irritating to create/edit a script, save it, and then give it executable permissions.

What if a file had executable permissions as soon as we created it? umask cannot do this: it only removes permission bits, and most programs create regular files with mode 666, so new files never get the execute bit that way (and a umask change would affect every file anyway). So here is the solution….
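The original solution is not reproduced here. One common approach (a sketch, not necessarily the one the post uses) is a pair of shell functions that create the file with a shebang line, set the execute bit once, and then open your editor. The names are hypothetical, and falling back to vi when $EDITOR is unset is an assumption.

```shell
#!/bin/sh
# make_script_file FILE
# Creates FILE with a "#!/bin/sh" first line (unless it already exists)
# and gives the owner execute permission.
make_script_file() {
    [ -n "$1" ] || { echo "usage: make_script_file FILE" >&2; return 2; }
    [ -e "$1" ] || printf '#!/bin/sh\n\n' > "$1"
    chmod u+x "$1"
}

# newscript FILE
# Same as above, then opens FILE in $EDITOR (vi if unset).
newscript() {
    make_script_file "$1" || return
    "${EDITOR:-vi}" "$1"
}
```

Put both functions in your ~/.profile (or shell rc file); `newscript cleanup.sh` then gives you a ready-to-run script as soon as you save it.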

Friday, November 23, 2012

fun!! - Your Cute dog....


Here is your number one companion – cute, talented and loyal.  He can sit, lie down and roll over.  Try giving him a pet and watch how he responds.  Keep your dog entertained by playing ball or giving him a bone.  Reward him by giving him a treat.  To get him to sit, double-click your mouse on the ground close to him.  Double-click again to get him to lie down.  Then hold your mouse button down and make a circular motion to tell him to roll over.


General hardware, software, and environment information to collect for Solaris, HP-UX and Windows


Whenever we report an issue to the support team, they usually ask for these details:

·         Operating system version
·         Operating system patches or maintenance level
·         Operating system environment variables
·         Operating system configuration & error reporting
·         Operating system kernel parameters
·         Operating system processes
·         Mounted file systems and disk resources
·         Hostname information and network information

Wednesday, November 21, 2012

DataStage Documentation Best Practices - 2

Introduction

This document contains DataStage best practices and recommendations that can be used to improve the quality of DataStage jobs. It will be further enhanced to cover specific DataStage problems and their troubleshooting.

 
Recommendations

Tuesday, November 20, 2012

Stopping DataStage from batch[bat] file in Windows

Hi there..
I am going to share a small bat (batch) script that can be used to stop the DataStage server in a Windows environment.


Monday, November 19, 2012

Starting DataStage from batch[bat] file in Windows

Hi there..
I am going to share a small bat (batch) script that can be used to start the DataStage server in a Windows environment.

Put this bat file in DSHOME, or add a line to the script that changes directory (cd) to DSHOME.

Saturday, November 17, 2012

XMeta DB : Datastage Repository - 2

Prior to DataStage version 8, the DataStage repository information was stored in files (uv). From version 8, the repository information is also stored in a database. Since the information is available in a database, it is easier to write queries and find the details. During the installation you have the option to install the repository either in DB2 or in Oracle.

Usually, the DataStage repository is built on a DB2 database, because DB2 comes with the Information Server software by default.


A word of caution: Do not try to alter the XMETA repository contents as it may have adverse effects.


Here is Part 1 of this tutorial: XMeta DB : Datastage Repository - 1



Wednesday, November 14, 2012

SSH Alternatives for Windows & Linux

Here I am going to share the tools used for SSH connectivity.
Please have a look...

SSH Alternatives for Linux


 

SSH Alternatives for Windows



 Putty – the best
Putty is a Windows SSH client/suite. Putty includes an ssh agent, and command line file transfer tools. Simple GUI interface for virtually all settings. A simple, lightweight client for connecting to a Linux shell from your Windows machine!

Tuesday, November 13, 2012

Saturday, November 10, 2012

Data transformation in DataStage

Data transformation and movement is the process by which source data is selected, converted, and mapped to the format required by targeted systems.
The process manipulates data to bring it into compliance with business, domain, and integrity rules and with other data in the target environment. Transformation can take some of the following forms:

Wednesday, November 07, 2012

DataStage job status log values

These values show what can be used when designing job sequences, for example when defining triggers or nested conditions. They are also helpful when using the DataStage "dsjob" utility from the operating system command line.

Tuesday, November 06, 2012

Failed to connect to DataStage Server


When trying to log on to the InfoSphere DataStage clients (Designer, Director or Administrator), logon fails with the error:
Failed to connect to DataStage server: <servername> , project: UV
(User name and/or password incorrect (80011))
The same user can successfully log on to the Web Console.

Monday, November 05, 2012

General hardware, software, and environment information for AIX and Linux

Whenever we report an issue to the support team, they usually ask for these details:

·         Operating system version
·         Operating system patches or maintenance level
·         Operating system environment variables
·         Operating system configuration & error reporting
·         Operating system kernel parameters
·         Operating system processes
·         Mounted file systems and disk resources
·         Hostname information and network information

Friday, November 02, 2012

uniq : a cool nix filter



Hi guys,
Yesterday I was sitting at my desk doing my usual boring tasks ;-). Then I got a seq file containing more than 3 million records and 5 columns, and I had to fetch the duplicate records based on the first column. It's quite easy in a Nix environment without the help of any tool.

So I thought I should share it with you too. Here it comes.....
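The post's own command is not reproduced here, but a typical approach has two parts: `cut | sort | uniq -d` lists the duplicated keys, and a two-pass awk prints the complete records. A sketch, assuming a comma-delimited file (change `-d,` and `-F,` for your delimiter); the function names are hypothetical.

```shell
#!/bin/sh
# dup_keys FILE
# Prints each first-column value that appears more than once (one per line).
dup_keys() {
    cut -d, -f1 "$1" | sort | uniq -d
}

# dup_records FILE
# Prints every complete record whose first column is duplicated, in the
# original file order. awk reads the file twice: the first pass (NR == FNR)
# counts each key, the second pass prints records whose key count is > 1.
dup_records() {
    awk -F, 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1"
}
```

For 3 million records awk only keeps one counter per distinct key in memory, so this stays fast on an ordinary server.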

Tuesday, October 30, 2012

ETL process and concepts




ETL is an abbreviation of the three words Extract, Transform and Load. An ETL process extracts data, mostly from different types of systems, transforms it into a structure that's more appropriate for reporting and analysis, and finally loads it into the database and/or cube(s).
ETL is a process that involves the following tasks:

Monday, October 29, 2012

Oracle DataBase books



1)  O'Reilly - Oracle PL/SQL Programming, 4th Edition
2)  Oracle PL/SQL Best Practices
3)  O'Reilly - Oracle Language Pocket Reference
4)  Oracle 9i SQL Reference
5)  OCA Oracle Database 11g Administration (1Z0-051 and 1Z0-052)

Sunday, October 28, 2012

Hamster - have fun !!!



This lively pet hamster will keep you company throughout the day. Watch him run on his wheel, drink water, and eat the food you feed him by clicking your mouse. Click the center of the wheel to make him get back on it.
It's my fav…..


Friday, October 26, 2012

DB2 query to select first or last N rows

There may be instances when you wish to select the first or last N rows.
You can use the following query to limit the number of rows retrieved by a SELECT statement.


First N rows
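The queries themselves are not shown here. In DB2 the standard way to limit rows is the FETCH FIRST n ROWS ONLY clause; "last N" only has a meaning relative to some ordering, so you reverse the ORDER BY. A hedged sketch with two helpers that print both forms; the helper names and the employee/empno example are hypothetical.

```shell
#!/bin/sh
# first_n_sql TABLE ORDER_COLUMN N : first N rows in ascending order
first_n_sql() {
    printf 'SELECT * FROM %s ORDER BY %s FETCH FIRST %d ROWS ONLY\n' "$1" "$2" "$3"
}

# last_n_sql TABLE ORDER_COLUMN N : "last" N rows = first N in descending order
last_n_sql() {
    printf 'SELECT * FROM %s ORDER BY %s DESC FETCH FIRST %d ROWS ONLY\n' "$1" "$2" "$3"
}

# Example, after connecting from a DB2 command window:
#   db2 "$(first_n_sql employee empno 10)"
```

The statements are printed without a trailing semicolon because the DB2 command line processor only accepts one when it is started with the -t option.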

Thursday, October 25, 2012

Special Nix Commands




The following are a set of special commands that the shell provides as stand-alone statements. Input and output redirection is permitted for all of these commands. Redirection is not limited to simple commands, though: a whole while loop can be redirected as well, for example while read line; do ...; done < file.

  • The colon (:) does nothing! It returns a zero exit code. It can be used to stand in for a command, although I must admit I have not found a real use for it.
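Even though it does nothing, `:` does have a few common uses in scripts: triggering a `${VAR:=default}` assignment, truncating a file, filling a branch that must contain a command, and acting as an always-true loop condition. A short sketch; the variable, file and function names are hypothetical.

```shell
#!/bin/sh
# 1. Give a variable a default value: ":" ignores its arguments, but the
#    shell still performs the ${VAR:=...} assignment while expanding them.
: "${LOG_DIR:=/tmp}"

# 2. Truncate (or create) a file without running any real command.
truncate_file() {
    : > "$1"
}

# 3. Placeholder in a branch that must contain at least one command.
is_even() {
    if [ $(( $1 % 2 )) -eq 0 ]; then
        :               # nothing to do yet; ":" returns 0
    else
        return 1
    fi
}

# 4. An endless loop: ":" always succeeds, so only "break" ends it.
count_to() {
    i=0
    while :; do
        i=$((i + 1))
        [ "$i" -ge "$1" ] && break
    done
    echo "$i"
}
```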

Monday, October 22, 2012

WebSphere Application Server [WAS] log files



IBM WebSphere Application Server creates the following log files: trace.log, SystemOut.log, SystemErr.log, activity.log, startServer.log, stopServer.log, native_stdout.log and native_stderr.log.
Let us look at these log files in detail.

Sunday, October 21, 2012

Friday, October 19, 2012

Newton’s Cradle - have fun !!!


For every action there is an equal and opposite reaction
try this......

Thursday, October 18, 2012

DataStage Configuration file : Explained - 3




Below is the sample diagram for 1 node and 4 node resource allocation:


 

DataStage Configuration file : Explained - 2




1.    When configuring an MPP, you specify the physical nodes in your system on which the parallel engine will run your parallel jobs. This is called the conductor node. For other nodes, you do not need to specify the physical node. Also, you need to copy the (.apt) configuration file only to the nodes from which you start parallel engine applications. It is possible that the conductor node is not connected to the high-speed network switches, while the other nodes are connected to each other through very high-speed network switches. How do you configure your system so that you achieve optimized parallelism?

1.    Make sure that none of the stages are specified to be run on the conductor node.
2.    Use conductor node just to start the execution of parallel job.
3.    Make sure that conductor node is not the part of the default pool.

DataStage Configuration file : Explained - 1



The DataStage configuration file is a master control file (a text file that sits on the server side) for jobs, which describes the parallel system resources and architecture. The configuration file provides the hardware configuration for architectures such as SMP (a single machine with multiple CPUs, shared memory and disk), Grid, Cluster or MPP (multiple CPUs, multiple nodes and dedicated memory per node). DataStage understands the architecture of the system through this file.

This is one of the biggest strengths of DataStage. If you change your processing configuration, or change servers or platform, you do not have to redesign your jobs, because all jobs depend on this configuration file for execution. DataStage jobs determine which node to run a process on, where to store temporary data and where to store dataset data based on the entries provided in the configuration file. A default configuration file is available whenever the server is installed.

Tuesday, October 16, 2012

How to convert a (multiple) space separated text into Tab delimited (with WORD)


1) Paste the columnar text into an empty Word document
2) Start the search/replace function
3) In the search field, enter 2 (two) spaces
4) In the replace field, enter ^t (the character t preceded by ^ means a TAB)
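On a Unix box the same conversion is a single awk call. A sketch (hypothetical function name) that replaces every run of two or more spaces with one tab; single spaces, such as those inside a name, are left alone, and unlike replacing pairs of spaces a run of any length becomes exactly one tab.

```shell
#!/bin/sh
# spaces_to_tabs [FILE...]
# Replaces each run of two or more spaces with a single tab character.
# Reads standard input when no file is given.
spaces_to_tabs() {
    awk '{ gsub(/  +/, "\t"); print }' "$@"
}

# Example: spaces_to_tabs report.txt > report.tsv
```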

Monday, October 15, 2012

Input/Output data buffering (on Link) in DataStage


To improve performance and resolve bottlenecks, you can specify how input and output data is buffered. Although the size and operation of the buffer are usually the same for all links on all stages, you can modify the settings for specific links.
By default, data is buffered so that no deadlocks occur. Be careful when changing data buffering settings because specifying inappropriate values might create a deadlock.
Any changes that you make to the properties on the Advanced tab are automatically reflected on the Advanced tab of the stage at the other end of the link.

Saturday, October 13, 2012

Using Configuration Files in Data Stage Best Practices & Performance Tuning


The configuration file tells DataStage Enterprise Edition how to exploit underlying system resources (processing, temporary storage, and dataset storage). In more advanced environments, the configuration file can also define other resources such as databases and buffer storage. At runtime, EE first reads the configuration file to determine what system resources are allocated to it, and then distributes the job flow across these resources.

When you modify the system, by adding or removing nodes or disks, you must modify the DataStage EE configuration file accordingly. Since EE reads the configuration file every time it runs a job, it automatically scales the application to fit the system without having to alter the job design.

Thursday, October 11, 2012

Download for Powershell v2 for Windows 7? No need... It's already there!



A while back, Microsoft announced the release of PowerShell v2 for Windows XP, Windows Server 2003, Windows Vista, and Windows Server 2008 (see http://go.microsoft.com/fwlink/?LinkID=151321).
However, it is not clear to everyone that Powershell v2 is already part of Windows 7 and Windows Server 2008 R2.

Wednesday, October 10, 2012

Environment Variable for Data Stage Best Practices and Performance Tuning



DataStage provides a number of environment variables to control how jobs operate on a UNIX system.  In addition to providing required information, environment variables can be used to enable or disable various DataStage features, and to tune performance settings. 


Data Stage Environment Variable Settings for All Jobs


Sunday, October 07, 2012

Interview Questions : DataStage - Part 1



How did you handle reject data?
Ans:
 Typically a reject link is defined and the rejected data is loaded back into the data warehouse. A reject link has to be defined for every output link from which you wish to collect rejected data. Rejected data is typically bad data, such as duplicate primary keys or null rows where data is expected.


If you have worked with DS 6.0 and later versions, what are the Link Partitioner and Link Collector used for?
Ans: Link Partitioner - Used for partitioning the data.
Link Collector - Used for collecting the partitioned data.

Saturday, October 06, 2012

To Release Job locks in Datastage


There are three methods to unlock DataStage jobs:

  • Using the DataStage Administrator tool
  • Using the UV utility
  • Using DataStage Director

Wednesday, October 03, 2012

Difference between scratch disk and resource scratch disk



The only difference is:
  • Scratch Disk is for Temporary storage (Like RAM in our PC)

Tuesday, October 02, 2012

DataSet, FileSet and Seq File in DataStage


Seq File:
Extract/load from/to a seq file: max 2 GB (this depends on the OS; most operating systems now support files larger than 2 GB)
When used as a source, it is converted from ASCII into native format at compile time
Does not support null values
A seq file can only be accessed on one node.


Saturday, September 29, 2012

Interview Questions : Linux/Unix : Part - 4

1) What is piping?
Piping, represented by the pipe character "|", is used to combine two or more commands together. The output of the first command serves as input to the next command, and so on.
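For example, the pipeline below counts how many distinct login shells a passwd-format file uses: each command's output becomes the next command's input. The function name is hypothetical.

```shell
#!/bin/sh
# count_shells FILE
#   cut     -> keep only field 7 (the login shell) of each passwd-style line
#   sort -u -> sort the shells and drop duplicates
#   wc -l   -> count the remaining lines
count_shells() {
    cut -d: -f7 "$1" | sort -u | wc -l | tr -d ' '
}

# Example: count_shells /etc/passwd
```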

2) What is a superuser?
A superuser is a special type of user who has open access to all files and commands on a system. Note that the superuser's login is usually root, and it is protected by a so-called root password.

Friday, September 28, 2012

Data Modeling - Conceptual, Logical, And Physical Data Models


The three levels of data modeling (conceptual, logical and physical data models) were discussed in prior sections. Here we compare these three types of data models. The table below compares their different features:


Thursday, September 27, 2012

SSH Login without Password in Nix



You want to use Linux/AIX/Unix and OpenSSH to automate your tasks. Therefore you need an automatic login from hostA as user a to hostB as the same user. You don't want to enter any passwords because, for example, you want to call ssh from within a shell script.

First log in on hostA as user a and generate a pair of authentication keys. Do not enter a passphrase:
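The usual OpenSSH steps are: generate a key pair with an empty passphrase on hostA, append the public key to ~/.ssh/authorized_keys on hostB, and keep the permissions strict. A sketch that wraps those steps in one function, assuming an RSA key at the default path ~/.ssh/id_rsa; `setup_key_login` is a hypothetical name, and where `ssh-copy-id` is installed it can do the copy step for you.

```shell
#!/bin/sh
# setup_key_login USER HOST
# Run on hostA as the user who needs password-less access to USER@HOST.
setup_key_login() {
    if [ $# -ne 2 ]; then
        echo "usage: setup_key_login USER HOST" >&2
        return 2
    fi
    # 1. Create the key pair once, with an empty passphrase (-N "").
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" || return 1
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" || return 1
    fi
    # 2. Append the public key to authorized_keys on the remote host.
    #    You type the remote password here for the last time.
    ssh "$1@$2" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh &&
        cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
        < "$HOME/.ssh/id_rsa.pub"
}

# Afterwards this should run without a password prompt:
#   ssh a@hostB hostname
```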


Tuesday, September 25, 2012

Enable NLS Support in IBM Information Server (Datastage) 8.1 after it's installed


This process describes how to enable NLS support in IBM Information Server (DataStage) 8.1 after it has been installed with the NLS disabled option. This avoids the need to uninstall and reinstall DataStage with NLS enabled.

1. Stop the DS Engine.

# cd /IBM/InformationServer/Server/DSEngine

Sunday, September 23, 2012

Dimensional Data Model



The dimensional data model is most often used in data warehousing systems. It is different from the third normal form (3NF), commonly used for transactional (OLTP) systems. As you can imagine, the same data is therefore stored differently in a dimensional model than in a 3NF model.
To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:

Saturday, September 22, 2012

Useful Perl Scripts : Part-1


1.   To Replace a character in a string


#!/usr/bin/perl
use strict;
use warnings;

my $string = "This is string";
$string =~ tr/s/a/;    # tr replaces every "s" with "a"
print "$string\n";     # prints "Thia ia atring"

Something about CRON in Nix

cron is a Nix utility that allows tasks to be run automatically in the background at regular intervals by the cron daemon. These tasks are often termed cron jobs in Nix. Crontab (CRON TABle) is a file that contains the schedule of cron entries to be run at specified times.

This document covers the following aspects of Unix cron jobs:
  • Crontab Restrictions
  • Crontab Commands
  • Crontab file – syntax
  • Crontab Example
  • Crontab Environment
  • Disable Email
  • Generate log file for crontab activity
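As an illustration of the syntax, logging and e-mail points: a crontab line has five time fields (minute, hour, day of month, month, day of week) followed by the command, and redirecting both stdout and stderr to a log file records every run while leaving cron nothing to mail. A sketch of a helper that builds such a line; the function name and paths are hypothetical.

```shell
#!/bin/sh
# cron_entry SCHEDULE COMMAND LOGFILE
# SCHEDULE is the five time fields: minute hour day-of-month month day-of-week.
# Output (stdout and stderr) is appended to LOGFILE, so each run leaves a
# trace in the log and cron has no output to e-mail.
cron_entry() {
    printf '%s %s >> %s 2>&1\n' "$1" "$2" "$3"
}

# Install it while keeping the existing entries:
#   ( crontab -l 2>/dev/null;
#     cron_entry "30 2 * * *" /home/dsadm/backup.sh /tmp/backup.log ) | crontab -
```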


Thursday, September 20, 2012

Creating the Virtual Machine on Windows and Sharing folder with Guest OS[Linux]




I recommend Oracle VirtualBox for hosting a guest OS.

1) Download and install VirtualBox

2) Install the matching VirtualBox Extension Pack

You can download it from here

Sunday, September 16, 2012

Configuring Oracle database connectivity in a parallel environment in DataStage


Configuring access to Oracle databases includes granting the appropriate access level to users.

Granting access to the Oracle parallel server
To access the Oracle parallel server (OPS), users must have SELECT access to the sys.gv_$instance and sys.v_$cache tables.


Thursday, September 13, 2012

Configuring TeraData database connectivity in a parallel environment in DataStage


Installing the Teradata tools and utilities

You must install Teradata Tools and Utilities on all nodes that run parallel jobs. See the installation instructions supplied by Teradata for complete information.
Ensure that the Teradata Parallel Transporter is installed and that the following environment variables are set in the /etc/profile file:

Wednesday, September 12, 2012

Unable to use Information Server due to expired passwords for WAS, xmeta or isadmin users


The following commands can be used to reset the passwords. Make sure they match the new passwords for the particular user. These commands should be run as the dsadm/isadmin user:

Windows:

Replace the WAS administrator user/password:
<Info Server>\ASBServer\bin\AppServerAdmin.bat -was -user <new user name> -password <new password>

Tuesday, September 11, 2012

Interview Questions : Linux/Unix : Part-2


1: What are the 3 standard streams in Linux?
Input stream (stdin), represented as 0; output stream (stdout), represented as 1; and error stream (stderr), represented as 2.


2: I want to read all input to the command from file1, direct all output to file2 and errors to file3. How can I achieve this?
command <file1 1>file2 2>file3
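A quick way to see the three descriptors at work: the hypothetical function below reads a line from stdin (0), writes one line to stdout (1) and one to stderr (2), so after the redirection file2 and file3 each receive one line.

```shell
#!/bin/sh
# show_streams: reads one line from fd 0, writes to fd 1 and to fd 2.
show_streams() {
    read -r line
    echo "stdout got: $line"
    echo "stderr got: $line" >&2
}

# show_streams <file1 1>file2 2>file3
```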

Clear the Job log in MetaData Repository

Sometimes having too many logs stored in the metadata repository can be the reason behind a job's slow performance.

Periodically clearing the logs not only improves performance but also keeps the metadata database healthy.

Monday, September 10, 2012

Stopping and starting the DataStage server engine


Restart the IBM InfoSphere Information Server engine whenever you complete certain tasks such as editing the dsenv file on Linux or UNIX or modifying the uvconfig file.
  • Unix :
1.    Log in to the engine tier computer as the IBM InfoSphere DataStage® administrator (typically dsadm).
2.    Change to the engine directory and set the environment.
    cd $DSHOME
    . ./dsenv
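After the environment is set, the engine is typically stopped and started with `bin/uv -admin -stop` and `bin/uv -admin -start` from $DSHOME. A hedged sketch of a restart wrapper; confirm the exact commands in the documentation for your release, make sure no jobs or client sessions are active first, and treat the function name as hypothetical.

```shell
#!/bin/sh
# restart_ds_engine
# Stops and restarts the DataStage server engine. Run as dsadm (or root)
# with DSHOME pointing at the DSEngine directory. The body runs in a
# subshell "( ... )", so the cd and the sourced dsenv do not leak into
# your login shell, and "exit" only leaves that subshell.
restart_ds_engine() (
    if [ -z "${DSHOME:-}" ] || [ ! -d "$DSHOME" ]; then
        echo "DSHOME is not set or is not a directory" >&2
        exit 1
    fi
    cd "$DSHOME" || exit 1
    . ./dsenv                        # set the engine environment
    bin/uv -admin -stop || exit 1    # stop the engine
    bin/uv -admin -start             # start it again
)
```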

Friday, September 07, 2012

DataStage Installation in Silent mode



To start the installation of IBM InfoSphere DataStage in silent mode:

1.    Log in with administrator rights.
2.    Browse to the <InfoSphere Information server install_image_directory>/is-suite directory.
3.    Run the following command to start the installation program:

Thursday, September 06, 2012

Starting and stopping WebSphere Application Server profiles


Starting a WebSphere Application Server profile


On Windows

To start a WebSphere Application Server profile, use any of these methods:
·         Go to Start > Programs > IBM WebSphere > Application Server V7.0 > Profiles > profile_name > Start the server.
·         Start the Windows service associated with the IBM WebSphere Application Server V7.0 profile.
·         Run this command: was_install_dir\bin\startServer.bat server1 -profileName profile
·         Run this command: profile_dir\bin\startServer.bat server1

Managing profiles on WebSphere Application Server V7



Listing WebSphere Application Server profiles

To list all profiles on a server:
  • On Windows: was_install_dir\bin\manageprofiles.bat -listProfiles
  • On UNIX/Linux: was_install_dir/bin/manageprofiles.sh -listProfiles