DataStage Tool Tutorial and PDF Training Guides


DataStage is a tool set for designing, developing and running applications that populate one or more tables in a data warehouse or data mart. It has three added benefits:

  1. It gives researchers a private area where information can be viewed only by themselves and their group leaders, alongside areas where they can save or share files with the whole research group.
  2. It lets users annotate their files and access them away from their home computers.
  3. It offers the option to send data on for permanent storage.

DataStage is one of the most widely used extraction, transformation and loading (ETL) tools in the data warehousing industry. It can extract information from dissimilar sources, carry out transformations according to business requirements and load the data into the chosen data warehouses. It is widely used for the development and maintenance of data warehouses and data marts.
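The extract, transform and load steps described above can be sketched in plain Python. This is only an illustration of the ETL idea and not DataStage code: the sample data, the `sales` table and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read from files or databases.
SOURCE_CSV = "id,name,amount\n1,alice,10.5\n2,bob,20.0\n"


def extract(text):
    """Extract: parse rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply a sample business rule (upper-case names, typed columns)."""
    return [(int(r["id"]), r["name"].upper(), float(r["amount"])) for r in rows]


def load(rows, conn):
    """Load: write the transformed rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps in order and return what ended up in the target."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Calling `run_etl()` returns the rows as they were loaded into the target table, which makes it easy to check that the transformation rule was applied.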

A corporation can use DataStage in any of the following ways:

  • Integration of data from different sources
  • Development and maintenance of data marts and data warehouses
  • Data migration from various sources

DataStage is a centralised filestore with three added advantages:

  1. Security controls that give each researcher a “private” area accessible only to themselves and the leader of the group, as well as “shared” and “collaborative” areas for files of use to the entire research group.
  2. A web interface that lets users annotate their files and reach their data from computers other than their “home” machine.
  3. An option to transfer data to a repository for long-term storage.

DataStage has been reduced to the bare essentials so that it is as unobtrusive as possible. There is no “client” software to download, there are few required metadata fields, and the file system builds on formats the user already knows.

Whatever your discipline (Computer Science, Chemistry, Mongolian Studies, Fine Art), DataStage lets you save, find and retrieve your data without disrupting your work.

DataStage Parallel Extender (PX) processes data using a parallel architecture. The two major types of parallelism applied in DataStage PX are pipeline parallelism and partition parallelism. Processing data in parallel speeds up data processing considerably.
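The difference between the two kinds of parallelism can be sketched in plain Python. This is a conceptual illustration only, not how DataStage PX is implemented: the function names, the round-robin split and the use of threads are all assumptions for the example. In partition parallelism the same logic runs on separate subsets of the rows; in pipeline parallelism a downstream step starts consuming rows while the upstream step is still producing them.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def clean(row):
    """The per-row logic shared by both examples."""
    return row.strip().lower()


def partition_parallel(rows, partitions=2):
    """Partition parallelism: split the rows and run the same logic on each subset."""
    chunks = [rows[i::partitions] for i in range(partitions)]  # round-robin split
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(lambda chunk: [clean(r) for r in chunk], chunks))
    # Collect the partitions back into one sorted result.
    return sorted(r for chunk in results for r in chunk)


_DONE = object()  # sentinel marking the end of the stream


def pipeline_parallel(rows):
    """Pipeline parallelism: the second step runs while the first is still producing."""
    q = queue.Queue()
    out = []

    def producer():
        for r in rows:
            q.put(clean(r))  # upstream step
        q.put(_DONE)

    def consumer():
        while True:
            item = q.get()
            if item is _DONE:
                break
            out.append(item.upper())  # downstream step

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Python threads show the structure of the two approaches rather than a real speed-up; an engine built for parallel processing would spread this work across processes or machines.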

DataStage Parallel Extender uses a variety of stages through which source data is processed and loaded into target databases. The data volumes involved can run into terabytes. Besides stages, DataStage PX uses containers to reuse parts of jobs, and sequences to run and schedule multiple jobs simultaneously.
The commonly used stages in DataStage Parallel Extender are the following:

  • Transformer
  • Aggregator
  • Data set
  • Copy
  • Change apply
  • Modify
  • Filter
  • Join
  • Merge
  • Lookup

DataStage provides a graphical user interface (GUI) for carrying out Extract, Transform, Load work.

The ETL work is carried out through jobs. A DataStage job is an executable unit of work that can be compiled and executed individually or as part of a data flow stream.

A job is made up of various stages that are connected via links.

Stages serve many purposes: database stages connect to source and target systems, processing stages carry out data transformations, file stages connect to file systems, and so on.

Links connect the stages in a job and describe the flow of data between them.
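The relationship between jobs, stages and links can be pictured with a small Python model. This is a hypothetical sketch to illustrate the concepts, not the DataStage object model: the `Job` class, its methods and the three sample stages are invented for the example, and the runner assumes a simple linear flow with one outgoing link per stage.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A job: named stages plus the links that connect them."""
    stages: dict = field(default_factory=dict)  # stage name -> function(rows) -> rows
    links: list = field(default_factory=list)   # (upstream, downstream) name pairs

    def add_stage(self, name, func):
        self.stages[name] = func

    def add_link(self, upstream, downstream):
        self.links.append((upstream, downstream))

    def run(self, first_stage, rows=None):
        """Follow the links from first_stage, passing each output downstream."""
        next_stage = dict(self.links)  # assumes one outgoing link per stage
        name = first_stage
        while name is not None:
            rows = self.stages[name](rows)
            name = next_stage.get(name)
        return rows


def build_example_job():
    """Source stage -> processing stage -> target stage, joined by two links."""
    job = Job()
    job.add_stage("source", lambda _: [1, 2, 3, 4])                      # reads data
    job.add_stage("keep_even", lambda rows: [r for r in rows if r % 2 == 0])
    job.add_stage("target", lambda rows: list(rows))                     # writes data
    job.add_link("source", "keep_even")
    job.add_link("keep_even", "target")
    return job
```

Running `build_example_job().run("source")` pushes the rows through each stage in the order given by the links.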


End users can connect to DataStage as a mapped drive from a Mac, Linux or Windows machine, or view their files through a web interface. Whichever department you work in, DataStage helps you store, find and retrieve your data without getting in your way. Files are kept in three basic types of area:

  • PRIVATE: files that can be viewed only by their owner or by the administrator responsible for them.
  • SHARED: files visible to all group members in read-only form; members cannot edit them.
  • COLLABORATIVE: files that all group members can both read and edit.
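The access rules for the three area types can be written down as a short Python sketch. It is a reading of the description above and not DataStage's actual permission code: the function names are hypothetical, and it assumes that the owner belongs to the group and that only the owner can change files in the PRIVATE and SHARED areas.

```python
from enum import Enum


class Area(Enum):
    PRIVATE = "private"
    SHARED = "shared"
    COLLABORATIVE = "collaborative"


def can_read(area, user, owner, leader, members):
    """PRIVATE is readable by the owner and group leader; the rest by all members."""
    if area is Area.PRIVATE:
        return user in (owner, leader)
    return user in members


def can_write(area, user, owner, members):
    """COLLABORATIVE is editable by all members; otherwise only by the owner (assumed)."""
    if area is Area.COLLABORATIVE:
        return user in members
    return user == owner
```

For example, with a group of `{"ann", "bob", "lee"}` where `ann` owns a file and `lee` leads the group, `bob` can read the file in the SHARED area but can edit it only in the COLLABORATIVE area.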

PDF Tutorials

Also check the DataStage interview questions.