We have moved to www.dataGenX.net. Keep learning with us.

Friday, August 30, 2013

Datastage Job Scheduler


DataStage offers a scheduling option, but it does not have its own scheduler; it leverages the underlying operating system. On UNIX that means cron for recurring schedules and at for one-off schedules, so checking the crontab entries of the scheduling user will show what you need. On Windows, DataStage uses Windows Scheduled Tasks.

From the operating system command line, logged in as the scheduling user, a "crontab -l" will list the scheduled jobs.
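To see only the DataStage-related entries, the crontab listing can be filtered. Below is a minimal bash sketch. It assumes the scheduled entries can be recognized by the command they run. The default pattern `dsjob` (the name of DataStage's command-line job control tool) is an assumption, so check what your schedules actually invoke. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the active (non-comment, non-blank) cron entries, keeping
# only lines that contain a given string. The default "dsjob" is an
# assumption; adjust it to whatever your DataStage schedules call.

# Reads a crontab listing on stdin and prints the matching active entries.
filter_cron_entries() {
  local pattern="${1:-dsjob}"
  # Drop comment lines and blank lines, then keep fixed-string matches.
  grep -v -E '^[[:space:]]*(#|$)' | grep -F -- "$pattern"
}

# Convenience wrapper around "crontab -l" for the logged-in scheduling user.
list_scheduled_jobs() {
  crontab -l 2>/dev/null | filter_cron_entries "$@"
}
```

Running `list_scheduled_jobs` uses the default pattern. Running `list_scheduled_jobs myproject` filters on a different (hypothetical) string instead.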

Wednesday, August 28, 2013

14 design tips for better performance in Datastage


1) Avoid unnecessary type conversions: set the OSH_PRINT_SCHEMAS environment variable to verify that run time schemas match the job design column definitions. If you are using stage variables on a Transformer stage, ensure that their data types match the expected result types.

2) Use Transformer stages sparingly and wisely. Transformer stages can slow down your job. Do not use multiple stages where the functionality could be incorporated into a single stage, and use other stage types to perform simple transformation operations.

Tuesday, August 27, 2013

Stage to Operator in DataStage - the Real Driver


Parallel job stages are built on operators: operators are the individual parallel engine components through which the data actually flows. In a typical job flow, operators are the end points and data sets are the links between them.

Each operator listed in the DUMP SCORE runs as a number of processes. That number depends on:
  • the job's configuration file (APT_CONFIG_FILE)
  • any node pool constraints
  • the operator's configuration in the parallel engine code
  • several environment variables, such as APT_DISABLE_COMBINATION, being set or unset.
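The sketch below is a rough illustration of how the configuration file drives the process count. It counts the nodes in a configuration file and applies the usual conductor, section leader and player breakdown: one conductor, one section leader per node, and one player per operator per node. It makes two assumptions. First, every operator runs on every node, with no pool constraints and no combination. Second, each node is declared on its own line as `node "name"`. The counts you actually see in the DUMP SCORE will often be lower. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count logical node definitions in a parallel configuration file.
# Assumes each node is declared on its own line as: node "name"
count_nodes() {
  local config="${1:-$APT_CONFIG_FILE}"
  grep -c -E '^[[:space:]]*node[[:space:]]+"' "$config"
}

# Rough upper bound on processes:
#   1 conductor + 1 section leader per node + 1 player per operator per node.
# Assumes every operator runs on every node (no node pool constraints,
# no operator combination).
estimate_processes() {
  local operators="$1" nodes="$2"
  echo $(( 1 + nodes + operators * nodes ))
}
```

For example, a job whose score shows 4 operators running on a 2-node configuration gives `estimate_processes 4 2`, which is 1 + 2 + 8 = 11.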

Friday, August 23, 2013

DataStage BASIC functions


These functions can be used in a job control routine, which is defined as part of a job's properties and allows other jobs to be run and controlled from the first job. Some of the functions can also be used to get status information on the current job; these are useful in active stage expressions and in before- and after-stage subroutines.


Specify the job you want to control: DSAttachJob

Thursday, August 22, 2013

How to split a source column into multiple target columns (full name to first and last name)


Approach:

CREATE SET TABLE test12
(
fullname VARCHAR(30)
);


INSERT INTO test12 VALUES ('nitin raj');
INSERT INTO test12 VALUES ('nitin agarwal');
INSERT INTO test12 VALUES ('abhishek gupta');
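The split query itself is not shown above. In SQL it is typically written with string functions that find the first space and take the substrings on either side of it. As an illustration only, here is the same first-space logic sketched in bash. The function name `split_name` and the `|` output separator are hypothetical.

```shell
#!/usr/bin/env bash
# Split a full name at the first space: the part before it is the first
# name, the part after it is the last name. A name with no space is
# returned as the first name with an empty last name.
split_name() {
  local full="$1" first last
  first="${full%% *}"          # remove everything from the first space on
  if [[ "$full" == *" "* ]]; then
    last="${full#* }"          # remove everything up to the first space
  else
    last=""
  fi
  printf '%s|%s\n' "$first" "$last"
}
```

Names with more than one space keep everything after the first space in the last name, so `split_name 'a b c'` prints `a|b c`.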

Wednesday, August 21, 2013

Converting .rpm Packages To Debian/Ubuntu .deb Format With Alien




This article shows how you can convert .rpm packages to .deb packages with a tool called alien so that you can easily install them on Debian and Ubuntu systems. Sometimes this is quite convenient as not all software projects release their software as Debian/Ubuntu packages.
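When there are several packages to convert, alien's `--to-deb` option can be applied in a loop. Below is a minimal sketch, assuming the `.rpm` files sit in one directory. The function name and the `DRY_RUN` variable are hypothetical conveniences for previewing the commands.

```shell
#!/usr/bin/env bash
# Convert every .rpm file in a directory to a .deb package with alien.
# Set DRY_RUN=1 to only print the commands instead of running them.
# alien should be run as root so the generated package gets correct
# file ownership and permissions.
convert_rpms() {
  local dir="${1:-.}" rpm
  for rpm in "$dir"/*.rpm; do
    [ -e "$rpm" ] || continue   # no .rpm files: the glob stays unexpanded
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "alien --to-deb $rpm"
    else
      alien --to-deb "$rpm"
    fi
  done
}
```

alien writes each generated `.deb` into the current directory. You can then install it with `sudo dpkg -i <package>.deb`.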

However, you should keep in mind what the alien man page says:

Tuesday, August 20, 2013

Oracle Interview Questions - Part-3


51. What is a database instance? Explain.
A database instance (server) is a set of memory structures and background processes that access a set of database files. The processes can be shared by all users. The memory structures store the most frequently queried data from the database, which improves performance by reducing the amount of I/O performed against the data files.

52. What is Parallel Server?
Multiple instances, usually running on separate nodes of a cluster, accessing the same database.

Monday, August 19, 2013

Extracting font information from PDF file


In Adobe Acrobat Professional
  1. Open your PDF.
  2. Go to TOOLS -> Advanced Editing and select the "TouchUp Text Tool".
  3. Click on the text that you wish to extract the typeface from and a bounding box should appear.

Saturday, August 17, 2013

Search the string in the file : Linux/Unix


Here I am going to share a small shell utility that asks for the source file name and the pattern to search for, then prints formatted output showing each matching line with its line number. You can choose how many lines to display before and after each match.
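The utility itself is not reproduced here. Below is a minimal bash sketch of the same idea, built on grep's `-n` option (line numbers) and its `-B`/`-A` options (context lines before and after). The function names are hypothetical, and the pattern is treated as a fixed string (`-F`).

```shell
#!/usr/bin/env bash
# Print lines containing a fixed-string pattern with their line numbers,
# plus a chosen number of context lines before and after each match.
search_in_file() {
  local file="$1" pattern="$2" before="${3:-0}" after="${4:-0}"
  if [ ! -r "$file" ]; then
    echo "Cannot read file: $file" >&2
    return 2
  fi
  grep -n -F -B "$before" -A "$after" -- "$pattern" "$file"
}

# Interactive front end: asks for the inputs, then calls search_in_file.
search_interactive() {
  local file pattern before after
  read -r -p "Source file name: " file
  read -r -p "Pattern to search: " pattern
  read -r -p "Lines before match [0]: " before
  read -r -p "Lines after match [0]: " after
  search_in_file "$file" "$pattern" "${before:-0}" "${after:-0}"
}
```

For example, `search_in_file app.log ERROR 2 3` prints each line containing ERROR with two lines before and three after. In grep's output, matching lines are prefixed `N:` and context lines `N-`.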

Friday, August 16, 2013

Big Data - The Hadoop Data Warehouse - Part 1



Big data is going to change the way you do things in the future: how you gain insight and how you make decisions. These videos help you get up to speed quickly on this technology and show you the unique things IBM is doing to turn freely available open source big data technology into a big data platform. There is a major difference: the platform leverages the open source technologies (without ever forking them) and marries them to enterprise capabilities provided by a technology leader that understands the benefits a platform can provide.

Wednesday, August 14, 2013

Schema File in Datastage


Schema files and partial Schemas:

You can also specify the metadata for a stage in a plain text file known as a schema file. This file is not stored in the Repository, but you could, for example, keep it in a document management or source code control system, or publish it on an intranet site.
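For illustration, here is a minimal schema file in the parallel engine's record syntax, written out by a small bash function. The function name, column names and types are hypothetical examples.

```shell
#!/usr/bin/env bash
# Write a simple schema file describing a two-column record.
# Column names and types are hypothetical examples.
write_schema() {
  local target="$1"
  cat > "$target" <<'EOF'
record
(
  customer_id: int32;
  customer_name: nullable string[max=30];
)
EOF
}
```

The resulting file can then be referenced from a stage that accepts a schema file, for example through the Sequential File stage's Schema File option.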

Tuesday, August 13, 2013

Error 39202 when logging into IBM DataStage


DataStage Client programs (Designer, Manager, Director) experience a connection failure, and the message displayed to the user is:

Failed to connect to host: xxxxx, project: UV
(Internal Error (39202))

Monday, August 12, 2013

WinSCP - Save all configurations in an INI file


You can configure WinSCP to save configurations to an INI file (instead of Windows Registry):
  • Open WinSCP and go to the Preferences section.
  • Next to "Other general options" click on the "Preferences" button.

Friday, August 09, 2013

DataStage Project Name With Space


This is a common problem that can cause rework, so remember not to give a project a name containing spaces during installation. If the name contains spaces, the project will not be usable.
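A simple guard against this mistake is to check the name before using it. The minimal bash sketch below rejects empty names and names containing whitespace; the function name is hypothetical. It checks only for spaces, not for any other characters the installer may disallow.

```shell
#!/usr/bin/env bash
# Return 0 if the proposed project name is non-empty and contains no
# whitespace; return 1 otherwise.
valid_project_name() {
  local name="$1"
  [ -n "$name" ] || return 1
  case "$name" in
    *[[:space:]]*) return 1 ;;
    *) return 0 ;;
  esac
}
```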



Thursday, August 08, 2013

Interview Questions : DataWareHouse - Part-1




What is Data Warehousing?
A data warehouse is the main repository of an organization’s historical data, its corporate memory. It contains the raw material for management’s decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. A data warehouse is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. It is a repository of integrated information, available for queries and analysis.

Wednesday, August 07, 2013

Logical Data Model



A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:
  • Includes all entities and relationships among them.
  • All attributes for each entity are specified.
  • The primary key for each entity is specified.
  • Foreign keys (keys identifying the relationship between different entities) are specified.
  • Normalization occurs at this level.

Monday, August 05, 2013

Orchadmin Command : DataStage


orchadmin is a command-line utility provided by DataStage for inspecting and managing data sets.

The general syntax is: orchadmin <command> [options] [descriptor file]

1. Before using orchadmin, make sure that either the working directory or $APT_ORCHHOME/etc contains the file “config.apt”, or that the environment variable $APT_CONFIG_FILE is defined for your session.
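The prerequisite in step 1 can be checked before orchadmin is called. Below is a minimal bash sketch with hypothetical wrapper functions. The order of the checks (APT_CONFIG_FILE first) is an assumption made for the sketch.

```shell
#!/usr/bin/env bash
# Succeed if orchadmin can find a configuration file: APT_CONFIG_FILE points
# to a readable file, or config.apt exists in the working directory or in
# $APT_ORCHHOME/etc.
orchadmin_ready() {
  if [ -n "${APT_CONFIG_FILE:-}" ] && [ -r "$APT_CONFIG_FILE" ]; then
    return 0
  fi
  [ -r ./config.apt ] && return 0
  [ -n "${APT_ORCHHOME:-}" ] && [ -r "$APT_ORCHHOME/etc/config.apt" ] && return 0
  return 1
}

# Run orchadmin with the given arguments only when a configuration file
# can be found; otherwise print a hint and fail.
run_orchadmin() {
  if ! orchadmin_ready; then
    echo "No configuration file found: set APT_CONFIG_FILE first" >&2
    return 1
  fi
  orchadmin "$@"
}
```

A typical call is `run_orchadmin describe /data/customers.ds`, where the data set path is hypothetical.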