Datastage Interview Q & A


Part 1:

1) Define Data Stage?

DataStage is a tool used to design, develop, and execute applications that populate tables in a data warehouse or data marts. It extracts data from source databases, transforms it, and loads it into the warehouse. It is an essential part of the IBM WebSphere Data Integration suite.

2) Explain how a source file is populated?

We can populate a source file in many ways, such as by creating a SQL query in Oracle or by using the row generator extract tool.

3) Name the command line functions to import and export the DS jobs?

To import the DS jobs, dsimport.exe is used and to export the DS jobs, dsexport.exe is used.
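For illustration, typical invocations from a client machine might look like the following; the domain, host, credentials, project name, and file path are placeholder values, and the exact flags vary by DataStage version:

    rem import a .dsx export file into a project
    dsimport.exe /D=domain:9080 /H=services_host /U=dsadm /P=secret MyProject C:\export\MyJob.dsx

    rem export a single job from a project to a .dsx file
    dsexport.exe /D=domain:9080 /H=services_host /U=dsadm /P=secret /JOB=MyJob MyProject C:\export\MyJob.dsx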

4) What is the difference between Data stage 7.5 and 7.0?

In DataStage 7.5, many new stages were added for more robustness and smoother performance, such as the Procedure stage, the Command stage, and Generate Report.

5) In DataStage, how can you fix the truncated data error?

The truncated data error can be fixed by setting the environment variable IMPORT_REJECT_STRING_FIELD_OVERRUN.

6) Define Merge?

Merge means to join two or more tables. The tables are joined on the basis of the primary key columns in both tables.

7) Differentiate between data file and descriptor file?

As the name implies, data files contain the data, and the descriptor file contains the description/information about the data in the data files.

8) Differentiate between data stage and informatica?

In DataStage, there is a concept of partitioning and parallelism for node configuration, while Informatica has no concept of partitioning and parallelism for node configuration. Also, Informatica is more scalable than DataStage, whereas DataStage is more user-friendly than Informatica.

9) Define Routines and their types?

Routines are collections of functions defined in the DS Manager and called from the Transformer stage. There are three types of routines: parallel routines, mainframe routines, and server routines.

10) How can you write parallel routines in Datastage PX?

We can write parallel routines in C or C++. Such routines are compiled externally, defined in the DS Manager, and can be called from the Transformer stage.
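As a minimal sketch (the file name, function name, and compile command are hypothetical), a parallel routine is an ordinary C-linkage function compiled into an object file with the same compiler used by the DataStage engine; the object file is then registered as a Parallel Routine definition of type External Function and referenced from a Transformer derivation:

    // add_tax.cpp - compile to an object file, e.g. g++ -c add_tax.cpp
    // extern "C" keeps the symbol name unmangled so the routine definition can find it
    extern "C" double add_tax(double amount, double rate)
    {
        // return the amount increased by the given tax rate
        return amount * (1.0 + rate);
    }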

11) What is the method of removing duplicates, without the remove duplicate stage?

Duplicates can be removed by using the Sort stage with its Allow Duplicates option set to false.

12) What steps should be taken to improve Datastage jobs?

In order to improve the performance of DataStage jobs, we first have to establish baselines. Secondly, we should not use only one flow for performance testing. Thirdly, we should work in increments. Then we should evaluate data skew, isolate and solve the problems one by one, and distribute the file systems to remove any bottlenecks. Also, we should not include the RDBMS at the start of the testing phase. Last but not least, we should understand and assess the available tuning knobs.

13) Differentiate between Join, Merge and Lookup stage?

All three stages differ in how they use memory, in their input requirements, and in how they treat various records. Join and Merge need less memory than the Lookup stage.

14) Explain Quality stage?

Quality stage is also known as Integrity stage. It assists in integrating different types of data from various sources.

15) Define Job control?

Job control is best performed by using the Job Control Language (JCL), which is used to execute multiple jobs simultaneously without using any kind of loop.

16) Differentiate between Symmetric Multiprocessing and Massive Parallel Processing?

In Symmetric Multiprocessing, the hardware resources are shared by the processors, which run one operating system and communicate through shared memory. In Massive Parallel Processing, each processor accesses the hardware resources exclusively. This type of processing is also known as Shared Nothing, since nothing is shared between processors, and it is faster than Symmetric Multiprocessing.

17) What are the steps required to kill the job in Datastage?

To kill a job in DataStage, we have to kill the respective processing ID.

18) Differentiate between validated and Compiled in the Datastage?

In DataStage, validating a job means executing it. While validating, the DataStage engine verifies whether all the required properties are provided. In contrast, while compiling a job, the engine verifies whether all the given properties are valid.

19) How to manage date conversion in Datastage?

We can use the date conversion function for this purpose, i.e., Oconv(Iconv(Fieldname, "Existing Date Format"), "Another Date Format").
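As a hedged worked example (the literal value and conversion codes are illustrative), Iconv first parses the string into the internal date format, and Oconv then renders it in the target format:

    * convert 31/12/2023 (day/month/year) into 2023-12-31 (year-month-day)
    Result = Oconv(Iconv("31/12/2023", "D/DMY[2,2,4]"), "D-YMD[4,2,2]")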

20) Why do we use exception activity in Datastage?

All the stages after the exception activity in DataStage are executed whenever an unknown error occurs while the job sequencer is running.

21) Define APT_CONFIG in Datastage?

APT_CONFIG (more precisely, APT_CONFIG_FILE) is the environment variable used to identify the *.apt configuration file in DataStage. This file stores the node information, disk storage information, and scratch information.
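For illustration, a minimal one-node configuration file (the host name and directory paths are hypothetical) looks roughly like this:

    {
      node "node1"
      {
        fastname "etl_server"
        pools ""
        resource disk "/opt/ds/data1" {pools ""}
        resource scratchdisk "/opt/ds/scratch1" {pools ""}
      }
    }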

22) Name the different types of Lookups in Datastage?

There are two types of lookups in DataStage: normal lookup and sparse lookup. In a normal lookup, the reference data is loaded into memory first and the lookup is performed there. In a sparse lookup, the lookup is fired directly against the database for each input row, which can be faster than a normal lookup when the input rows are few compared to the reference data.

23) How can a server job be converted to a parallel job?

We can convert a server job into a parallel job by using the IPC stage and the Link Collector.

24) Define Repository tables in Datastage?

In Datastage, the Repository is another name for a data warehouse. It can be centralized as well as distributed.

25) Define OConv () and IConv () functions in Datastage?

In DataStage, the OConv() and IConv() functions are used to convert data from one format to another, e.g., conversions of roman numerals, time, date, radix, and numeral ASCII. IConv() is used to convert formats for the system to understand, while OConv() is used to convert formats for users to understand.

26) Explain Usage Analysis in Datastage?

In DataStage, Usage Analysis is performed in a few clicks: launch the DataStage Manager, right-click the job, and select Usage Analysis.

27) How do you find the number of rows in a sequential file?

To find the number of rows in a sequential file, we can use the system variable @INROWNUM.
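As a small illustration (the column name is hypothetical), add a Transformer after the Sequential File stage and bind the derivation to an output column; the value emitted on the last row equals the total row count:

    output column: ROW_NUM
    derivation:    @INROWNUM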

28) Differentiate between Hash file and Sequential file?

The difference between a hash file and a sequential file is that the hash file stores data using a hashing algorithm keyed on a hash key value, while a sequential file has no key value for saving data. Because of the hash key, searching in a hash file is faster than in a sequential file.

29) How to clean the Datastage repository?

We can clean the Datastage repository by using the Cleanup Resources functionality in the Datastage Manager.

30) How is a routine called in a Datastage job?

In DataStage, routines are of two types, i.e., before-subroutines and after-subroutines. We can call a routine from the Transformer stage in DataStage.

31) Differentiate between an Operational Data Store (ODS) and a data warehouse?

We can say, ODS is a mini data warehouse. An ODS doesn’t contain information for more than 1 year while a data warehouse contains detailed information regarding the entire business.

32) What does NLS stand for in Datastage?

NLS means National Language Support. It can be used to incorporate other languages, such as French, German, and Spanish, into the data required for processing by the data warehouse. These languages use scripts similar to English.

33) Can you explain how one could drop the index before loading the data in the target in Datastage?

In DataStage, we can drop the index before loading the data into the target by using the Direct Load functionality of the SQL*Loader utility.

34) Does Datastage support slowly changing dimensions?

Yes. Version 8.5 and later support slowly changing dimensions.

35) How can one find bugs in job sequence?

We can find bugs in job sequence by using DataStage Director.

36) How are complex jobs implemented in Datastage to improve performance?

In order to improve performance in DataStage, it is recommended not to use more than 20 stages in a single job. If more than 20 stages are needed, it is better to move them into another job.

37) Name the third party tools that can be used in Datastage?

The third-party tools that can be used with DataStage are Autosys, TNG, and Event Co-ordinator. I have worked with these tools and possess hands-on experience with them.

38) Define Project in Datastage?

Whenever we launch the DataStage client, we are asked to connect to a DataStage project. A DataStage project contains DataStage jobs, built-in components, and user-defined components.

39) How many types of hash files are there?

There are two types of hash files in DataStage: the static hash file and the dynamic hash file. A static hash file is used when a limited amount of data is to be loaded into the target database; a dynamic hash file is used when the amount of data from the source is unknown.

40) Define Meta Stage?

In Datastage, Meta Stage is used to save metadata that is helpful for data lineage and data analysis.

41) Have you ever worked in a UNIX environment, and why is it useful in Datastage?

Yes, I have worked in UNIX environment. This knowledge is useful in Datastage because sometimes one has to write UNIX programs such as batch programs to invoke batch processing etc.

42) Differentiate between Datastage and Datastage TX?

Datastage is a tool from ETL (Extract, Transform and Load) and Datastage TX is a tool from EAI (Enterprise Application Integration).

43) What do transaction size and array size mean in Datastage?

Transaction size means the number of rows written before committing the records to a table. Array size means the number of rows read from or written to the table at a time.

44) How many types of views are there in a Datastage Director?

There are three types of views in a Datastage Director i.e. Job View, Log View and Status View.

45) Why we use surrogate key?

In DataStage, we use a surrogate key instead of a unique natural key. The surrogate key is mostly used for retrieving data faster, and it uses an index to perform the retrieval operation.

46) How are rejected rows managed in Datastage?

In DataStage, rejected rows are managed through constraints in the Transformer. We can either route the rejected rows via the properties of the Transformer or create temporary storage for rejected rows with the help of the REJECTED command.
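As an illustrative sketch (the link and column names are hypothetical), a constraint expression on the link designated for rejects might route rows whose key is missing:

    constraint: IsNull(lnk_in.CUSTOMER_ID) Or Trim(lnk_in.CUSTOMER_ID) = ""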

47) Differentiate between ODBC and DRS stage?

The DRS stage is faster than the ODBC stage because it uses native database drivers for connectivity.

48) Define Orabulk and BCP stages?

The Orabulk stage is used to load a large amount of data into one target table of an Oracle database. The BCP stage is used to load a large amount of data into one target table of Microsoft SQL Server.

49) Define DS Designer?

The DS Designer is used to design the work area and add various links to it.

50) Why do we use Link Partitioner and Link Collector in Datastage?

In DataStage, the Link Partitioner is used to divide data into different parts through certain partitioning methods. The Link Collector is used to gather data from the various partitions/segments into a single stream and save it to the target table.

 

 

Part 2:

  1. Mention DataStage characteristics.

  • Support for Big Data Hadoop: access Big Data on a distributed file system, JSON support, and a JDBC integrator.
  • Ease of use: improved speed, flexibility, and efficacy for data integration.
  • Deployment: on-premise or cloud, as the need dictates.

  2. What is IBM DataStage?

DataStage is an extract, transform, and load (ETL) tool that is part of the IBM InfoSphere suite. It is used for working with large data warehouses and data marts to create and maintain a data repository.

  3. How is a DataStage source file filled?

We can develop a SQL query or we can use a row generator extract tool through which we can fill the source file in DataStage.

  4. How is merging done in DataStage?

In DataStage, merging is done when two or more tables are expected to be combined based on their primary key column.

  5. What are data and descriptor files?

Both files serve different purposes in DataStage. A descriptor file contains all the information or description, while a data file just contains the data.

  6. How is DataStage different from Informatica?

DataStage and Informatica are both powerful ETL tools, but there are a few differences between the two. DataStage has parallelism and partition concepts for node configuration; whereas in Informatica, there is no support for parallelism in node configuration. Also, DataStage is simpler to use as compared to Informatica.

  7. What is a routine in DataStage?

DataStage Manager defines a collection of functions within a routine. There are basically three types of routines in DataStage, namely, job control routine, before/after subroutine, and transform function.

  8. What is the process for removing duplicates in DataStage?

Duplicates in DataStage can be removed using the Sort stage. While configuring the sort, we set the option that allows duplicates to false.

  9. What is the difference between join, merge, and lookup stages?

The fundamental difference between these three stages is the amount of memory they take; how they treat the input requirement and the various records also differentiates them. The lookup stage uses a large amount of memory, because the reference data is held in memory, whereas the join and merge stages use much less.

  10. Explain how a source file is populated?

We can populate a source file in several ways like by using a row generator extract tool, or creating a SQL query in Oracle, etc.

  11. How to convert a server job to a parallel job in DataStage?

Using the IPC stage and the Link Collector stage, we can convert a server job to a parallel job.

  12. What is an HBase connector?

An HBase connector in DataStage is a tool used to connect databases and tables present in the HBase database. It is used to perform tasks like:

  1. Reading data in parallel mode.
  2. Reading and writing data from and to the HBase database.
  3. Using HBase as a view table.

  13. What is the quality stage in DataStage?

The quality stage (QualityStage) is used for cleansing data with the DataStage tool. It is a client-server software tool provided as part of IBM Information Server.

  14. What is job control in DataStage?

This tool is used for controlling a job or executing multiple jobs in a parallel manner. It is deployed using the Job Control Language within the IBM DataStage tool.

  15. How to do DataStage jobs performance tuning?

First, we have to select the right configuration files. Then we need to select the right partitioning and buffer memory, deal with the sorting of data, and handle null values. We should try to use the Modify, Copy, or Filter stages instead of the Transformer, and reduce the propagation of unnecessary metadata between stages.

  16. What is a repository table in DataStage?

The term ‘repository’ is another name for a data warehouse. It can be centralized or distributed. The repository table is used for answering ad-hoc, historical, analytical, or complex queries.

  17. Compare massive parallel processing with symmetric multiprocessing.

In massive parallel processing, many computers are present in the same chassis, while in symmetric multiprocessing, many processors share the same hardware resources. Massive parallel processing is called ‘shared nothing’ because nothing is shared between the computers, and it is faster than symmetric multiprocessing.

  18. How can we kill a DataStage job?

To kill a DataStage job, we need to kill its processing ID; this ensures that the job is stopped.

  19. How do we compare the Validated OK with the Compiled Process in DataStage?

The Compiled Process ensures that the important stage parameters are mapped and correct, so that an executable job is created. With Validated OK, we make sure that the connections are valid.

  20. Explain the feature of data type conversion in DataStage.

If we want to do data conversion in DataStage, we can use the data conversion function. For it to execute successfully, the data types of the operator's input and output must match, and the record schema needs to be compatible with the operator.

  21. What is the significance of the exception activity in DataStage?

Whenever an unfamiliar error occurs while executing the job sequencer, all the stages after the exception activity are run. This is what makes the exception activity so important in DataStage.

  22. What is the difference between Datastage 7.5 and 7.0?

Datastage 7.5 is more robust and performs more smoothly due to the many new stages that were added, such as the Procedure stage, Generate Report, and the Command stage.

  23. Describe the DataStage architecture briefly.

IBM DataStage follows a client-server model as its architecture and has different types of architecture for its various versions. The different components of the client-server architecture are:

  • Client components 
  • Servers
  • Stages 
  • Table definitions
  • Containers
  • Projects
  • Jobs
  24. What are the main features of the Flow Designer?

The main features of the Flow Designer are: 

  1. There is no need to migrate jobs to use the Flow Designer.
  2. It is very useful for jobs with a large number of stages.
  3. We can use the provided palette to add and remove connectors and operators on the designer canvas using the drag-and-drop feature.

  25. Name the command-line functions to import and export the DS jobs.

The dsimport.exe function is used to import the DS jobs, and to export the DS jobs, dsexport.exe is used.

  26. What are the various types of lookups in DataStage?

There are different types of lookups in DataStage. These include normal, sparse, range, and caseless lookups.

  27. How can we run a job using the command line in DataStage?

The command for running a job from the command line in DataStage is: dsjob -run -jobstatus <project name> <job name>
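For illustration (the project, job, and parameter names are hypothetical, and dsjob must be run from a shell with the DataStage environment sourced):

    # run the job, pass a job parameter, and wait so the exit status reflects the job result
    dsjob -run -jobstatus -param INPUT_FILE=/data/in/customers.txt MyProject MyJob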

  28. When do we use a parallel job and a server job?

Using a parallel job or a server job depends on the processing need, functionality, time to implement, and cost. A server job usually runs on a single node; it executes on the DataStage Server Engine and handles small volumes of data. A parallel job runs on multiple nodes; it executes on the DataStage Parallel Engine and handles large volumes of data.

  29. What is Usage Analysis in DataStage?

If we want to check whether a certain job is part of a sequence, we right-click the job in the Manager and choose Usage Analysis.

  30. How to find the number of rows in a sequential file?

For counting the number of rows in a sequential file, we should use the @INROWNUM variable.

  31. What is the difference between a sequential file and a hash file?

The hash file is based on a hash algorithm and can be used with a key value, whereas the sequential file does not have any key-value column. The hash file can be used as a reference for a lookup, while a sequential file cannot. Due to the presence of the hash key, the hash file is faster to search than a sequential file.

  32. How do we clean a DataStage repository?

For cleaning a DataStage repository, we have to go to DataStage Manager > Job in the menu bar > Clean up Resources.

If we want to further remove the logs, then we need to go to the respective jobs and clean up the log files.

  33. How do we call a routine in DataStage?

Routines are stored in the Routine branch of the DataStage repository. This is where we can create, view, or edit all the Routines. The Routines in DataStage could be the following: Job Control Routine, Before-after Subroutine, and Transform function.

  34. What is the difference between an operational data store and a data warehouse?

An operational data store (ODS) can be considered a staging area for real-time analysis and user processing; it is thus a temporary repository. The data warehouse, in contrast, is used for long-term data storage and holds the complete data of the entire business.

  35. What does NLS mean in DataStage?

NLS means National Language Support. This means we can use this IBM DataStage tool in various languages like multi-byte character languages (Chinese or Japanese). We can read and write in any language and process it as per the requirement.

  36. In DataStage, how can you fix the truncated data error?

The truncated data error can be fixed by setting the environment variable IMPORT_REJECT_STRING_FIELD_OVERRUN.

  37. What is a Hive connector?

A Hive connector is a tool to support partition modes while reading the data. This can be done in two ways:

  1. modulus partition mode
  2. minimum-maximum partition mode

 

Part 3:

Q 1) What is Datastage?

Ans: Datastage is an ETL tool provided by IBM that uses a GUI to design data integration solutions. It was the first ETL tool to introduce the parallelism concept.

 

It is available in following 3 different editions

  • Server Edition
  • Enterprise Edition
  • MVS Edition

Q 2) Highlight the main features of Datastage?

Ans: The main features of Datastage are highlighted below:

  • It is the data integration component of the IBM Info sphere information server.
  • It is a GUI based tool. We just need to drag and drop the Datastage objects and we can convert it to Datastage code.
  • It is used to perform the ETL operations (Extract, Transform, Load)
  • It provides connectivity to multiple sources & multiple targets at the same time
  • It provides partitioning and parallel processing techniques that enable Datastage jobs to process huge volumes of data quite fast.
  • It has enterprise-level connectivity.

 

Q 3) What are the primary usages of the Datastage tool?

Ans: Datastage is an ETL tool that is primarily used for extracting data from source systems, transforming that data and finally loading it to target systems.

 

Q 4) What are the main differences you have observed between 7.x and 8.x version of DataStage?

Ans: Here are the main differences between both the versions

  • 7.x was platform dependent, while 8.x is platform independent.
  • 7.x has a 2-tier architecture, where Datastage is built on top of a UNIX server. 8.x has a 3-tier architecture: the UNIX server database at the bottom, then the XMETA database acting as the repository, and Datastage on top.
  • 7.x has no concept of a parameter set. In 8.x, we have parameter sets, which can be used anywhere in the project.
  • In 7.x, we had Designer and Manager as two separate clients. In 8.x, the Manager client was merged into the Designer client.
  • In 7.x, we had to manually search for jobs. In 8.x, we have a quick find option in the repository where we can easily search for jobs.

 

 

Q 5) Can you highlight the main features of the IBM Info sphere information server?

Ans: The main features of IBM Info sphere information server suite are:

  • It provides a single platform for data integration. It has the capability to connect to multiple source systems as well as write to multiple target systems.
  • It is based on centralized layers. All the components of the suite are able to share the baseline architecture of the suite.
  • It has layers for the unified repository, for integrated metadata services and a common parallel engine.
  • It provides tools for analysis, cleansing, monitoring, transforming and delivering data.
  • It has massively parallel processing capabilities, which makes processing very fast.

 

Q 6) What are the different layers in the information server architecture?

Ans: Below are the different layers of information server architecture

  • Unified user interface
  • Common services
  • Unified parallel processing
  • Unified Metadata
  • Common connectivity

 

Q 7) What could be a data source system?

Ans: It could be a database table, a flat file, or even an external application like PeopleSoft.

 

Q 8) On which interface will you be working as a developer?

Ans: As a Datastage developer, we work on the Datastage client interface known as the Datastage Designer, which needs to be installed on the local system. In the backend, it is connected to the Datastage server.

 

Q 9) What are the different common services in Datastage?

Ans: Below is the list of common services in Datastage:

  • Metadata services
  • Unified service deployment
  • Security services
  • Logging and reporting services

 

Q 10) How do you start developing a Datastage project?

Ans: The very first step is to create a Datastage job on the Datastage server. All the Datastage objects that we create are stored in the Datastage project. A Datastage project is a separate environment on the server for jobs, tables, definitions, and routines.


Q 11) What is a DataStage job?

Ans: A Datastage job is simply the DataStage code that we create as developers. It contains different stages linked together to define the data and process flow.

Stages are nothing but the functionalities that get implemented.

For Example: Let’s assume that I want to do a sum of the sales amount. This can be a ‘group by’ operation that will be performed by one stage.

Now, I want to write the result to a target file. So, this operation will be performed by another stage. Once, I have defined both the stages, I need to define the data flow from my ‘group by’ stage to the target file stage. This data flow is defined by DataStage links.


Q 12) What are DataStage sequences?

Ans: A Datastage sequence connects DataStage jobs in a logical flow.

 

Q 13) If you want to use the same piece of code in different jobs, how will you achieve this?

Ans: This can be done by using shared containers. We have shared containers for reusability. A shared container is a reusable job element consisting of stages and links.  We can call a shared container in different Datastage jobs.

 

Q 14) Where do the Datastage jobs get stored?

Ans: The Datastage jobs get stored in the repository. We have various folders in which we can store the Datastage jobs.

 

Q 15) Where do you see different stages in the designer?

Ans: All the stages are available within a window called Palette. It has various categories depending upon the kind of function that the stage provides.

The various categories of stages in the Palette are – General, Data Quality, Database, Development, File, Processing, etc.

Q 16) What are the Processing stages?

Ans: The processing stages allow us to apply the actual data transformation.

For example, the ‘aggregator’ stage under the Processing category allows us to apply all the ‘group by’ operations. Similarly, we have other stages in Processing like the ‘Join’ stage that allows us to join together the data coming from two different input streams.

 

Q 17) What are the steps needed to create a simple basic Datastage job?

Ans: Click on the File -> Click on New -> Select Parallel Job and hit Ok. A parallel job window will open up. In this Parallel job, we can put together different stages and define the data flow between them.  The simplest DataStage job is an ETL job.

In this, we first need to extract the data from the source system for which we can use either a file stage or database stage because my source system can either be a database table or a file.

Suppose we are reading data from a text file. In this case, we will drag and drop the ‘Sequential File’ stage to the parallel job window. Now, we need to perform some transformation on top of this data. We will use the ‘Transformer’ stage which is available under the Processing category. We can write any logic under the Transformer stage.
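As a small illustrative derivation (the link and column names are hypothetical), a Transformer output column can carry an expression such as:

    output column: CUSTOMER_NAME
    derivation:    UpCase(Trim(lnk_src.CUSTOMER_NAME))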

Finally, we need to load the processed data into some target table. Let’s say my target database is DB2. For this, we will select the DB2 connector stage. Then we will connect these stages through links.

After this, we need to configure the stages so that they point to the correct filesystem or database.

For example, For the Sequential file stage, we need to define the mandatory parameters like the file name, file location, column metadata.

 

Then we need to compile the Datastage job. Compiling the job checks for the syntax of the job and creates an executable file for the Datastage job that can be executed at run time.

Q 18) Name the different sorting methods in Datastage.

Ans: There are two methods available:

  • Link sort
  • Inbuilt Datastage Sort

 

Q 19) If a job in a batch fails midway and you want to restart the batch from that particular job rather than from scratch, what will you do?

Ans: In Datastage, there is an option in the job sequence – ‘Add checkpoints so the sequence is restartable on failure’. If this option is checked, we can rerun the job sequence from the point where it failed.

 

Q 20) How do you import and export the Datastage jobs?

Ans: We use the below command-line functions for this:

  • Import: dsimport.exe
  • Export: dsexport.exe

 

Q 21) What are routines in Datastage? Enlist the various types of routines.

Ans: A routine is a set of functions defined by the DS manager. It is invoked via the transformer stage.

There are 3 kinds of routines:

  • Parallel routines
  • Mainframe routines
  • Server routines

 

Q 22) How do you remove duplicate values in DataStage?

Ans: There are two ways to handle duplicate values

  • We can use the Remove Duplicates stage to eliminate duplicates.
  • We can use the Sort stage to remove duplicates. The Sort stage has a property called ‘allow duplicates’; when this property is set to false, we won’t get duplicate values in the sort output.

 

Q 23) What are the different kinds of views available in a Datastage director?

Ans: There are 3 kinds of views available in the Datastage director. They are:

  • Log view
  • Status view
  • Job view

 

Q 24) Distinguish between Informatica & Datastage. Which one would you choose and why?

Ans: Both Informatica and DataStage are powerful ETL tools.

Enlisted points differentiate between both the tools:

 

  • Parallel processing: Informatica does not support parallel processing. In contrast, Datastage provides a mechanism for parallel processing.
  • Implementing SCDs: it is quite simple to implement SCDs (slowly changing dimensions) in Informatica, but complex in Datastage; Datastage supports SCDs merely through custom scripts.
  • Version control: Informatica supports version control through check-in and check-out of objects; Datastage does not have this functionality.
  • Available transformations: fewer transformations are available in Informatica, while Datastage offers a wider variety of transformations.
  • Power of lookup: Informatica provides a very powerful dynamic cache lookup; Datastage has nothing similar.

In my personal opinion, I would go with Informatica over Datastage. The reason being I have found Informatica more systematic and user-friendly than DataStage.

Another strong reason is that debugging and error handling are much better in Informatica than in Datastage, so fixing issues becomes easier in Informatica. Datastage does not provide complete error-handling support.

Q 25) Give an idea of system variables.

Ans: System variables are the read-only variables beginning with ‘@’ which can be read by either the transformer stage or routine. They are used to get the system information.
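For illustration, two commonly cited system variables as they might appear in derivations (the column names are hypothetical, and the availability of individual variables differs between server and parallel jobs):

    LOAD_DATE  derivation: Oconv(@DATE, "D-YMD[4,2,2]")   * current date rendered as YYYY-MM-DD
    LOADED_BY  derivation: @LOGNAME                       * user name under which the job runs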

 

Q 26) What is the difference between a passive stage and an active stage?

Ans: Passive stages are utilized for extraction and loading whereas active stages are utilized for transformation.

 

Q 27) What are the various kinds of containers available in Datastage?

Ans: We have below 2 containers in Datastage:

  • Local container
  • Shared container

 

Q 28) Is the value of a staging variable stored temporarily or permanently?

Ans: Temporarily. It is a temporary variable.

 

Q 29) What are the different types of jobs in Datastage?

Ans: We have two types of jobs in Datastage:

  • Server jobs (They run in a sequential manner)
  • Parallel jobs (They get executed in a parallel way)

 

Q 30) What is the use of the Datastage director?

Ans: Through Datastage director, we can schedule a job, validate the job, execute the job and monitor the job.

 

Q 31) What are the various kinds of hash files?

Ans: We have 2 types of hash files:

  • Static hash file
  • Dynamic hash file

 

Q 32) What is a quality stage?

Ans: The quality stage (also called the integrity stage) is a stage that aids in combining data coming from different sources.

 

Part 4:

  1. Explain Data Stage?

Ans:

DataStage is simply a tool used to design, develop, and execute many applications that fill various tables in a data warehouse or data marts.

 

  2. Tell How A Source File Is Populated?

Ans:

We can generate a source file in various ways, such as by creating a SQL query in Oracle or by using a row generator extract tool.

 

  3. Write The Command Line Functions To Import And Export The DS Jobs?

Ans:

To import the DS jobs, dsimport.exe is used, and to export the DS jobs, dsexport.exe is used.

 

  4. Differentiate Between Datastage 7.5 And 7.0?

Ans:

In Datastage 7.5, various new stages were added for more robustness and smoother performance, such as the Procedure stage, the Command stage, etc.

 

  5. Explain Merge?

Ans:

Merge means to join two or more tables. The two tables are merged on the basis of the primary key columns in both tables.

 

  6. Differentiate Between Data File And Descriptor File?

Ans:

As the name says, data files contain the data, and the descriptor file contains the information about the data in the data files.

 

  7. Differentiate Between Data Stage And Informatica?

Ans:

In Datastage, there is a concept of partitioning and parallelism for node configuration, while there is no concept of partitioning and parallelism in Informatica for node configuration. Also, Informatica is more scalable than Datastage, and Datastage is easier to use as compared to Informatica.

 

  8. Explain Routines And Their Types?

Ans:

Routines are basically groups of functions that are defined by the DS manager. They can be called through the transformer stage. Routines are of three types: parallel routines, server routines, and mainframe routines.

 

  9. How Can We Write Parallel Routines In Data Stage PX?

Ans:

We can write parallel routines in C or C++. Such routines are also developed in the DS manager and can be called from the transformer stage.

 

  10. What Is The Procedure Of Removing Duplicates, Without The Remove Duplicate Stage?

Ans:

Duplicates can be removed by using the Sort stage, with the option allow duplicates = false.

 

  11. What Steps Should Be Taken To Improve The Performance Of Datastage Jobs?

Ans:

In order to improve the performance of Datastage jobs, we have to first create the baselines. Secondly, we should not use only one flow for performance testing. Thirdly, we should work in increments. Then, we should appraise data skews, isolate and solve the problems one by one, and distribute the file systems to remove bottlenecks, if any. Also, we should not include the RDBMS at the start of the testing phase. Last but not least, we should understand and evaluate the available tuning knobs.

 

  12. Compare And Contrast Between Join, Merge And Lookup Stage?

Ans:

All three differ from each other in the way they use memory storage, in their input requirements, and in how they treat records. Join and Merge need less memory as compared to the Lookup stage.

 

  13. Describe Quality Stage?

Ans:

The Quality stage is also called the Integrity stage. It assists in integrating various types of data from different sources.

 

  14. Describe Job Control?

Ans:

Job control can best be performed by using the Job Control Language (JCL). This tool is used to execute multiple jobs concurrently, without using any kind of loop.

 

  15. Contrast between Symmetric Multiprocessing and Massive Parallel Processing?

Ans:

In Symmetric Multiprocessing, the hardware resources are shared by the processors. The processors run one operating system and communicate through shared memory. In Massive Parallel Processing, each processor accesses the hardware resources exclusively. This type of processing is also called Shared Nothing, as nothing is shared. It is faster than Symmetric Multiprocessing.

 

  16. Write The Steps Required To Kill The Job In Datastage?

Ans:

To kill the job in Datastage, we have to kill the respective processing ID.

 

  17. Contrast Between Validated And Compiled In The Datastage?

Ans:

In Datastage, validating a job means executing it. While validating, the Datastage engine checks whether all the necessary properties are provided. In the other case, while compiling a job, the Datastage engine checks whether all the given properties are valid.

 

  18. How Can We Run Date Conversion In Datastage?

Ans:

We can use the date conversion function for this purpose, i.e., Oconv(Iconv(Fieldname, “Existing Date Format”), “Another Date Format”).

 

  19. What Is The Need Of Exception Activity In Datastage?

Ans:

All the stages after the exception activity in Datastage are run if any unfamiliar error occurs while executing the job sequencer.

 

  20. Explain APT_CONFIG In Datastage?

Ans:

It is the environment variable used to identify the *.apt configuration file in Datastage. It is also used to keep the node information, scratch information, and disk storage information.

 

  21. Write The Different Types Of Lookups In Datastage?

Ans:

There are two types of Lookups in Datastage i.e. Normal lookup and sparse lookup.

 

  22. How Can We Convert A Server Job To A Parallel Job?

Ans:

We can convert a server job into a parallel job by using the IPC stage and the Link Collector.

 

  23. Explain Repository Tables In Datastage?

Ans:

In Datastage, the Repository is another name for a data warehouse. It can be centralized as well as distributed.

 

  24. Describe Oconv () and Iconv () Functions In Datastage?

Ans:

In Datastage, the OConv() and IConv() functions are used to convert formats from one format to another, i.e., conversions of time, roman numerals, radix, date, numeral ASCII, etc. IConv() is mostly used to convert formats for the system to understand, while OConv() is used to convert formats for users to understand.

 

  25. Define Usage Analysis In Datastage?

Ans:

In Datastage, Usage Analysis is done within a few clicks: launch the Datastage Manager, right-click the job, and then select Usage Analysis.

 

  26. How Can We Find The Number Of Rows In A Sequential File?

Ans:

To find the number of rows in a sequential file, we can use the system variable @INROWNUM.

 

  27. Contrast between Hash File and Sequential File?

Ans:

The only difference between the hash file and the sequential file is that the hash file stores data using a hash algorithm and a hash key value, while the sequential file doesn’t have any key value to save the data. Because of this hash key feature, searching in a hash file is faster than in a sequential file.

 

  28. How Can We Clean The Datastage Repository?

Ans:

We can clean the Datastage repository via the Cleanup Resources functionality in the Datastage Manager.

 

  29. How Can We Call A Routine In A Datastage Job?

Ans:

We can call a routine from the transformer stage in Datastage job.

 

  30. Differentiate Between An Operational Data Store (ODS) And A Data Warehouse?

Ans:

We can say that an ODS is a small data warehouse. An ODS doesn’t hold information for more than 1 year, while a data warehouse has detailed information about the entire business.

 

  31. What Does NLS Stand For In Datastage?

Ans:

NLS stands for National Language Support. It can be used to integrate various languages, such as French, German, and Spanish, into the data required for processing by the data warehouse.

 

  32. Can You Explain How One Could Drop The Index Before Loading The Data In The Target In Datastage?

Ans:

In Datastage, we can drop the index before loading the data into the target by using the Direct Load functionality of the SQL*Loader utility.

 

  33. Does Datastage Support Slowly Changing Dimensions?

Ans:

Yes, version 8.5 and later support this feature in Datastage.

 

  34. How Are Complex Jobs Implemented In Datastage To Improve Performance?

Ans:

In order to improve performance in Datastage, it is suggested not to use more than 20 stages in a single job. If you need more than 20 stages, it is advisable to use another job for those stages.

 

  35. Name The Third Party Tools That Can Be Used In Datastage?

Ans:

The third-party tools that can be used in Datastage are Autosys, TNG, and Event Co-ordinator.

 

  36. Describe Project In Datastage?

Ans:

Whenever we launch the Datastage client, we are asked to connect to a Datastage project. A Datastage project contains Datastage jobs, built-in components, and user-defined components.

 

  37. What Types Of Hash Files Are There?

Ans:

There are two types of hash files: the static hash file and the dynamic hash file.

 

  38. Describe Meta Stage?

Ans:

In Datastage, MetaStage is used to store metadata that is beneficial for data lineage and data analysis.

 

  39. Why Is The UNIX Environment Useful In Datastage?

Ans:

It is useful in Datastage because sometimes one has to write UNIX programs, such as batch programs, to invoke batch processing.

 

  40. Contrast between Datastage and Datastage TX?

Ans:

Datastage is an ETL (Extract, Transform and Load) tool, while Datastage TX is an EAI (Enterprise Application Integration) tool.

 

  41. What Is The Size Of A Transaction And An Array In Datastage?

Ans:

Transaction size means the number of rows written before committing the records in a table. Array size means the number of rows written to or read from the table at a time.

 

  42. Name The Various Types Of Views In A Datastage Director?

Ans:

There are three types of views in a Datastage Director i.e. Log View, Job View and Status View.

 

  43. What Is The Use Of A Surrogate Key?

Ans:

A surrogate key is mostly used for retrieving data faster. It uses an index to perform the retrieval operation.

 

  44. How Are Rejected Rows Processed In Datastage?

Ans:

In Datastage, the rejected rows are managed by constraints in the transformer. We can either place the rejected rows in the properties of a transformer or create temporary storage for rejected rows with the help of the REJECTED command.

 

  45. Contrast between ODBC and DRS Stage?

Ans:

The DRS stage is faster than the ODBC stage because it uses native database drivers for connectivity.

 

  46. Describe Orabulk And BCP Stages?

Ans:

The Orabulk stage is used to load a large amount of data into one target table of an Oracle database. The BCP stage is used to load a large amount of data into one target table of Microsoft SQL Server.

 

  47. Describe DS Designer?

Ans:

The DS Designer is used to design the work area and add various links to it.

  48. What Is The Need Of Link Partitioner And Link Collector In Datastage?

Ans:

In Datastage, the Link Partitioner is used to split data into various parts using certain partitioning methods. The Link Collector is used to collect data from many partitions into a single stream and save it in the target table.

 

 
