Part 1:

  1. What is ETL?

ETL stands for Extract-Transform-Load. It is the process by which data is moved from source systems into a data warehouse. Data is extracted from a source database and transformed to match the data warehouse schema. It is then loaded into the data warehouse database in the form of dimension and fact tables.
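The flow can be illustrated with a minimal sketch in the SAP Data Services script language. The datastore names (DS_SOURCE, DS_DWH), table names, and columns below are hypothetical, and in practice the transform step is usually modeled in a data flow rather than in a script:

    # Extract: pull a value from a (hypothetical) source datastore
    $MaxDate = sql('DS_SOURCE', 'select max(order_date) from orders');

    # Transform: derive a simple status flag from the extracted value
    $LoadFlag = ifthenelse($MaxDate IS NULL, 'EMPTY', 'READY');

    # Load: write an audit row into the warehouse; {$var} substitutes the
    # variable value wrapped in quotes
    sql('DS_DWH', 'insert into etl_audit (load_flag) values ({$LoadFlag})');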

  1. What is SAP BODS?

SAP BODS is an ETL tool for extracting data from disparate systems, transforming it into meaningful information, and loading it into a data warehouse. It is designed to deliver enterprise-class solutions for data integration, data quality, data processing, and data profiling. The full form of SAP BODS is Business Objects Data Services.

History of SAP BODS

Here are notable milestones in the history of SAP BODS:

  • In 2002, BusinessObjects rebranded the two software products of Acta as BusinessObjects Data Quality tool and BusinessObjects Data Integration tool.
  • Business Objects was acquired by SAP in 2007, and both of these products were renamed SAP BODQ and SAP BODI.
  • In 2008, SAP merged both of these products into a single tool and renamed it SAP Business Objects Data Services (BODS).

SAP BODS Architecture

Here are the basic components of the SAP BODS architecture:

SAP BODS Architecture Diagram

Repository:

A repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. You can set up repositories on an open client/server platform, which helps you share metadata with other enterprise tools. Each local repository is associated with one or more Job Servers, which run the jobs you create.

Management Console

The SAP BO Data Services Management Console is a web-based application with the following features:

  • Impact and Lineage Analysis
  • Administration
  • Auto Documentation
  • Operational Dashboard
  • Data Validation
  • Data Quality Reports

Job server:

The Job Server component starts the data movement engine. It allows you to integrate data from multiple sources, perform complex data transformations, and manage transactions and extractions from ERP systems and other sources.

The Data Integrator Job Server moves data in either batch or real-time mode, delivering high data throughput and scalability. While designing a job, you can also run it from the Designer, which tells the Job Server to execute it. The Job Server retrieves the job from its associated SAP BODS repository.

Data Services Designer

The Data Services Designer tool offers an easy-to-use graphical user interface that helps you define transformations, data mappings, and control logic.

This component helps you to build applications containing data and workflows. This component also allows you to manage metadata stored in a repository.

Engines:

After an SAP BusinessObjects Data Services job is executed, the Job Server starts the engine to perform data extraction, transformation, and movement. The engine uses parallel processing and in-memory data transformation to offer high data connectivity, quality, and scalability.

Access server:

The Access Server offers real-time request-reply message handling: it collects message requests, routes them to a real-time service, and delivers the message reply within a specified time frame. The Access Server queues messages and sends them to the next available real-time service across numerous computing resources.

Address Server:

The next component is the Address Server. It offers address validation and correction. The Address Server must be started before processing data flows that contain the Global Address Cleanse or Global Suggestion List transforms with the EMEA engine enabled.

Important terms in SAP BODS

Datastore

A Datastore offers a connection to a data source such as a database. It is the linking interface between the actual backend database and Data Services. A datastore also allows Data Services to import the description of the data source, such as its metadata.

CMC (Central Management Console)

CMC is a web-based administration tool for BODS. It is used to perform essential functions such as repository registration and user management.

Data Integrator Web Administrator

Data Integrator Web Administrator is also known as DI Web Admin. It helps you maintain repositories in Data Services. The SAP BODS services included in DI Web Admin are the metadata repository, the central repository, job services, and web services.

SAP Data Services Advantages

Here are a few pros of BODS in SAP:

  • SAP Business Objects offers strong data profiling capabilities, built up through its many acquisitions of other companies.
  • This ETL tool has a short learning curve and is easy to use.
  • It offers an easy-to-use UI for performing data manipulation tasks.
  • The objects and functions of BODS in SAP allow you to perform manipulations and transformations of data very efficiently.
  • There are system-provided objects and functions that you can easily drag and drop.
  • SAP BODS allows you to implement various data transformations using the Data Integrator language.
  • SAP BODS helps you perform complex data transformations and build customized functions.
  • Data Integrator Designer allows you to store real-time and batch jobs and new projects in the repository.

Disadvantages of SAP BODS

Here are a few drawbacks of using SAP BODS:

  • It is an expensive tool, so the initial purchase cost is very high.
  • Business Objects may need a number of servers and extensive hardware.
  • Business Objects is a developer platform which means organizations that want to use Business Objects will require the support of a highly skilled development team.

 

Part 2:

SAP BO Data Services is an ETL tool used for data integration, data quality, data profiling, and data processing. It allows you to integrate and transform trusted data into data warehouse systems for analytical reporting.

BO Data Services consists of a UI development interface, a metadata repository, data connectivity to source and target systems, and a management console for scheduling jobs.

Data Integration & Data Management

SAP BO Data Services is a data integration and management tool and consists of Data Integrator Job Server and Data Integrator Designer.

Key Features

  • You can apply various data transformations using the Data Integrator language, including complex data transformations and customized functions.
  • Data Integrator Designer is used to store real-time and batch jobs and new projects in repositories.
  • DI Designer also provides an option for team-based ETL development through a central repository with all basic functionality.
  • Data Integrator job server is responsible for processing jobs that are created using DI Designer.

Web Administrator

Data Integrator Web Administrator is used by system administrators and database administrators to maintain repositories in Data Services. Data Services includes a metadata repository, a central repository for team-based development, a Job Server, and web services.

Key functions of DI Web Administrator

  • It is used to schedule, monitor and execute batch jobs.
  • It is used to configure, start, and stop real-time servers.
  • It is used for configuring Job Server, Access Server, and repository usage.
  • It is used for configuring adapters.
  • It is used for configuring and controlling all the tools in BO Data Services.

The Data Management function emphasizes data quality. It involves cleansing, enhancing, and consolidating data to get correct data into the DW system.

  1. What is the use of BusinessObjects Data Services?

Ans:

BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location.

  1. Define Data Services components?

Ans:

Data Services includes the following standard components:

  • Designer
  • Repository
  • Job Server
  • Engines
  • Access Server
  • Adapters
  • Real-time Services
  • Address Server
  • Cleansing Packages, Dictionaries, and Directories
  • Management Console

  1. What are the steps included in the Data integration process?

Ans:

  • Stage data in an operational data store, data warehouse, or data mart.
  • Update staged data in batch or real-time modes.
  • Create a single environment for developing, testing, and deploying the entire data integration platform.
  • Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.

  1. Define the terms Job, Workflow, and Dataflow?

Ans:

A job is the smallest unit of work that you can schedule independently for execution.

A workflow defines the decision-making process for executing data flows.

Data flows extract, transform, and load data. Everything having to do with data, including reading sources, transforming data, and loading targets, occurs inside a data flow.

  1. Arrange these objects in order by their hierarchy: Dataflow, Job, Project, and Workflow?

Ans:

Project, Job, Workflow, Dataflow.

  1. What are reusable objects in DataServices?

Ans:

Job, Workflow, Dataflow.

  1. What is a transform?

Ans:

A transform enables you to control how datasets change in a dataflow.

  1. What is a Script?

Ans:

A script is a single-use object that is used to call functions and assign values to variables in a workflow.
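For illustration, here is a minimal script sketch in the Data Services script language; the global variable name is a hypothetical convention:

    # Assign a value to a (hypothetical) global variable
    $G_LoadDate = sysdate();

    # Call built-in functions and write a trace message to the job log
    print('Load started at: ' || to_char($G_LoadDate, 'yyyy.mm.dd hh24:mi:ss'));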

  1. What is a real-time Job?

Ans:

Real-time jobs “extract” data from the body of the real-time message received and from any secondary sources used in the job.

  1. What is an Embedded Dataflow?

Ans:

An Embedded Dataflow is a dataflow that is called from inside another dataflow.

  1. What is the difference between a data store and a database?

Ans:

A datastore is a connection to a database.

  1. How many types of datastores are present in Data services?

Ans:

Three.

Database Datastores: provide a simple way to import metadata directly from an RDBMS.

Application Datastores: let users easily import metadata from most Enterprise Resource Planning (ERP) systems.

Adapter Datastores: can provide access to an application’s data and metadata or just metadata.

  1. What is the use of a Compact repository?

Ans:

Remove redundant and obsolete objects from the repository tables.

  1. What are Memory Datastores?

Ans:

Data Services also allows you to create a database datastore using Memory as the Database type. Memory Datastores are designed to enhance processing performance of data flows executing in real-time jobs.

  1. What are file formats?

Ans:

A file format is a set of properties describing the structure of a flat-file (ASCII). File formats describe the metadata structure. File format objects can describe files in:

Delimited format — Characters such as commas or tabs separate each field.

Fixed width format — The column width is specified by the user.

SAP ERP and R/3 format.

  1. Which is NOT a datastore type?

Ans:

File Format

  1. What is the repository? List the types of repositories?

Ans:

The DataServices repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. There are 3 types of repositories.

A local repository

A central repository

A profiler repository

  1. What is the difference between a Repository and a Datastore?

Ans:

A Repository is a set of tables that hold system objects, source and target metadata, and transformation rules. A Datastore is an actual connection to a database that holds data.

  1. What is the difference between a Parameter and a Variable?

Ans:

A parameter is an expression that passes a piece of information to a workflow, data flow, or custom function when it is called in a job. A variable is a symbolic placeholder for values.

  1. When would you use a global variable instead of a local variable?

Ans:

When the variable will need to be used multiple times within a job.

When you want to reduce the development time required for passing values between job components.

When you need to create a dependency between a job-level global variable name and job components.

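As a rough sketch of the distinction in the Data Services script language, with hypothetical names ($G_RunDate as a job-level global variable, $P_RunDate as a data flow parameter):

    # A global variable defined at job level is visible to all job components
    # and can be set directly in a script.
    $G_RunDate = to_date('2024.01.31', 'yyyy.mm.dd');
    print('Run date: ' || to_char($G_RunDate, 'yyyy.mm.dd'));

    # A parameter such as $P_RunDate, by contrast, would be declared on a
    # specific data flow and mapped from $G_RunDate at the point of call.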

  1. What is a Substitution Parameter?

Ans:

A value that is constant in one environment but may change when a job is migrated to another environment.
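Substitution parameters are named with a $$ prefix and are typically referenced in square brackets; a sketch, with $$SourceDir as a hypothetical parameter whose value differs per environment configuration (exact referencing rules may vary by context):

    # [$$SourceDir] resolves to, e.g., /data/dev on development and
    # /data/prod on production, without changing the job itself.
    $FilePath = '[$$SourceDir]/customers.csv';
    print('Reading input file from: ' || $FilePath);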

  1. List some reasons why a job might fail to execute?

Ans:

Incorrect syntax, Job Server not running, port numbers for Designer and Job Server not matching.

  1. List factors you consider when determining whether to run workflows or data flows serially or in parallel?

Ans:

Consider the following:

Whether or not the flows are independent of each other

Whether or not the server can handle the processing requirements of flows running at the same time (in parallel)

  1. What does a lookup function do? How do the different variations of the lookup function differ?

Ans:

All lookup functions return one row for each row in the source. They differ in how they choose which of several matching rows to return.
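As a rough sketch, a call to the basic lookup() function in a Query column mapping could look like the following; the datastore, owner, table, and column names are hypothetical, and the richer variants (lookup_ext, lookup_seq) take additional arguments:

    # Return cust_name from dim_customer where dim_customer.cust_id matches
    # the incoming cust_id; 'UNKNOWN' is the default when no row matches, and
    # 'PRE_LOAD_CACHE' caches the lookup table before the data flow runs.
    lookup(DS_DWH.dbo.dim_customer, cust_name, 'UNKNOWN',
           'PRE_LOAD_CACHE', cust_id, cust_id)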

  1. List the three types of input formats accepted by the Address Cleanse transform?

Ans:

Discrete, multiline, and hybrid.

  1. Name the transform that you would use to combine incoming data sets to produce a single output data set with the same schema as the input data sets?

Ans:

The Merge transform.

  1. What are Adapters?

Ans:

Adapters are additional Java-based programs that can be installed on the Job Server to provide connectivity to other systems such as Salesforce.com or the Java Messaging Queue. There is also a software development kit (SDK) that allows customers to create adapters for custom applications.

  1. List the data integrator transforms?

Ans:

  • Data_Transfer
  • Date_Generation
  • Effective_Date
  • Hierarchy_Flattening
  • History_Preserving
  • Key_Generation
  • Map_CDC_Operation
  • Pivot and Reverse_Pivot
  • Table_Comparison
  • XML_Pipeline

  1. List the Data Quality Transforms?

Ans:

  • Global_Address_Cleanse
  • Data_Cleanse
  • Match
  • Associate
  • Country_ID
  • USA_Regulatory_Address_Cleanse


  1. What are Cleansing Packages?

Ans:

These are packages that enhance the ability of Data Cleanse to accurately process various forms of global data by including language-specific reference data and parsing rules.

 

  1. What is Data Cleanse?

Ans:

The Data Cleanse transform identifies and isolates specific parts of mixed data, and standardizes your data based on information stored in the parsing dictionary, business rules defined in the rule file, and expressions defined in the pattern file.

  1. What is the difference between a Dictionary and Directory?

Ans:

Directories provide information on addresses from postal authorities. Dictionary files are used to identify, parse, and standardize data such as names, titles, and firm data.

  1. Give some examples of how data can be enhanced through the data cleanse transform, and describe the benefit of those enhancements?

Ans:

  • Gender Codes: determine gender distributions and target marketing campaigns.
  • Match Standards: provide fields for improving matching results.

  1. A project requires the parsing of names into given and family, validating address information, and finding duplicates across several systems. Name the transforms needed and the task they will perform?

Ans:

  • Data Cleanse: Parse names into given and family.
  • Address Cleanse: Validate address information.
  • Match: Find duplicates.

  1. Describe when to use the USA Regulatory and Global Address Cleanse transforms?

Ans:

Use the USA Regulatory Address Cleanse transform if USPS certification and/or additional options such as DPV and GeoCode are required. Use the Global Address Cleanse transform when processing multi-country data.

  1. Give two examples of how the Data Cleanse transform can enhance (append) data?

Ans:

The Data Cleanse transform can generate name match standards and greetings. It can also assign gender codes and prenames such as Mr. and Mrs.

  1. What are name match standards and how are they used?

Ans:

Name match standards illustrate the multiple ways a name can be represented. They are used in the match process to greatly increase match results.

  1. What are the different strategies you can use to avoid duplicate rows of data when reloading a job?

Ans:

  • Using the auto-correct load option in the target table.
  • Including the Table Comparison transform in the data flow.
  • Designing the data flow to completely replace the target table during each execution.
  • Including a preload SQL statement to execute before the table loads (see the sketch after this list).
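For the preload SQL strategy, a hedged sketch in the Data Services script language; the datastore, table, and variable names are hypothetical:

    # Remove any rows for the date being reloaded so the subsequent load
    # cannot produce duplicates; {$G_LoadDate} substitutes the variable's
    # value wrapped in quotes.
    sql('DS_DWH', 'delete from fact_sales where load_date = {$G_LoadDate}');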

  1. What is the use of AutoCorrect Load?

Ans:

It does not allow duplicate data to enter the target table. It works like a Type 1 insert-else-update: non-matching rows are inserted and matching rows are updated.

  1. What is the use of Array fetch size?

Ans:

Array fetch size indicates the number of rows retrieved in a single request to a source database. The default value is 1000. Higher numbers reduce the number of requests, lowering network traffic and possibly improving performance; for example, fetching 1,000,000 rows takes 1,000 round trips at the default size but only 200 at the maximum. The maximum value is 5000.

  1. What is the use of Case Transform?

Ans:

Use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to split a data set into smaller sets based on logical branches.

  1. What must you define in order to audit a data flow?

Ans:

You must define audit points and audit rules when you want to audit a data flow.

  1. What is the difference between OLTP and a Data warehouse?

Ans:

Indexes − An OLTP system has only a few indexes, while an OLAP system has many indexes for performance optimization.

Joins − In an OLTP system there are a large number of joins and data is normalized, whereas in an OLAP system there are fewer joins and data is de-normalized.

Aggregation − In an OLTP system data is not aggregated, while in an OLAP database more aggregations are used.

  1. Why do we need a staging area in an ETL process?

Ans:

A staging area is required during an ETL load for several reasons −

As source systems are only available for a specific period of time to extract data, and this time may be less than the total data load time, the staging area allows you to extract the data from the source system and keep it in the staging area before the time slot ends.

A staging area is required when you want to bring data from multiple data sources together, or to join two or more systems. For example, you will not be able to perform a SQL query joining two tables from two physically different databases.

Data extraction time slots for different systems vary by time zone and operational hours.

Data extracted from source systems can be used in multiple data warehouse systems, operational data stores, etc.

During ETL you can perform complex transformations, which require extra area to store the data.
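To illustrate the cross-database point: once both extracts land in one staging schema, they can be joined with a single query. A minimal sketch in the Data Services script language, with hypothetical datastore and table names (in practice the staging tables would be loaded by data flows, not by scripts):

    # Clear the (hypothetical) staging tables before the extracts run.
    sql('DS_STAGE', 'truncate table stg_orders');
    sql('DS_STAGE', 'truncate table stg_customers');

    # After two data flows have staged the extracts from the two physically
    # separate source databases, they can be joined in one place.
    $JoinedRows = sql('DS_STAGE', 'select count(*) from stg_orders o join stg_customers c on o.cust_id = c.cust_id');
    print('Joined rows staged: ' || $JoinedRows);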

  1. What is SAP Data services?

Ans:

SAP BO Data Services is an ETL tool used for data integration, data quality, data profiling, and data processing. It allows you to integrate and transform trusted data into data warehouse systems for analytical reporting.

BO Data Services consists of a UI development interface, a metadata repository, data connectivity to source and target systems, and a management console for scheduling jobs.

  1. Explain the architecture of BODS with Job Server, Repository Manager, and Management Console?

Ans:

You can divide the BODS architecture into the following layers −

Web Application Layer, Database Server Layer, Data Services Service Layer.

Architecture of BODS

  1. What is a repository in BODS? What are the different types of Repositories in BODS?

Ans:

A repository is used to store the metadata of objects used in BO Data Services. Each repository should be registered in the Central Management Console (CMC) and is linked with one or more Job Servers, which are responsible for executing the jobs you create.

There are three types of Repositories −

Local Repository −

It is used to store the metadata of all objects created in Data Services Designer, like projects, jobs, data flows, workflows, etc.

Central Repository −

It is used to control version management of objects and is used for multiuser development. The central repository stores all versions of an application object, so it allows you to move back to previous versions.

Profiler Repository −

This is used to manage all the metadata related to profiler tasks performed in the SAP BODS Designer. The CMS repository stores metadata of all tasks performed in the CMC on the BI platform. The Information Steward repository stores all the metadata of profiling tasks and objects created in Information Steward.

  1. What are single-use objects and reusable objects in Data services?

Ans:

Reusable Objects −

Most of the objects stored in a repository can be reused. When a reusable object is defined and saved in the local repository, you can reuse it by creating calls to its definition. Each reusable object has only one definition, and all calls to that object refer to that definition. If the definition of an object is changed in one place, you are changing the object definition in all places where that object appears.

An object library contains object definitions; when an object is dragged and dropped from the library, a new reference to the existing object is created.

Single Use Objects −

All objects that are defined specifically for a job or data flow are called single-use objects, for example a specific transformation used in a data load.

  1. What is a Data Store in Data Services Designer, and what are the different types of Data Stores?

Ans:

Datastores are used to set up a connection between an application and a database. You can create a datastore directly or with the help of adapters. A datastore allows an application or software to read metadata from an application or database, and to read from and write to that database or application.


  1. You want to set up a new repository in BODS. How do you create it?

Ans:

To create a BODS repository, you need a database installed. You can use SQL Server, Oracle, MySQL, SAP HANA, Sybase, etc. You have to create the required users in the database while installing BODS and creating repositories; these users are needed to log in to the different servers, such as the CMS Server and Audit Server. To create a new repository, you have to log in to the Repository Manager.

  1. What is a real time Job?

Ans:

Real-time jobs “extract” data from the body of the real-time message received and from any secondary sources used in the job.

  1. How do you manage object versions in BODS?

Ans:

The central repository is used to control version management of objects and is used for multiuser development. It stores all versions of an application object, so it allows you to move back to previous versions.

  1. You want to generate quality reports, data validation, and documentation in the DS system. Where can you see these?

Ans:

Data Services Management Console

  1. What is the template table?

Ans:

In Data Services, you can create a template table to move data to a target system; it has the same structure and data types as the source table.

  1. How do you check the execution history of a job or a data flow?

Ans:

DS Management Console → Job Execution History

  1. What is SAP Data Services Designer? What are main ETL functions that can be performed in Designer tool?

Ans:

It is a developer tool used to create objects consisting of data mappings, transformations, and logic. It is GUI based and works as the designer for Data Services.

You can create various objects using Data Services Designer like Project, Jobs, Work Flow, Data Flow, mapping, transformations, etc.

  1. How do you check existing objects in DS repository?

Ans:

In the Object Library in DS Designer.

  1. How do you improve the performance of data flows using memory datastore?

Ans:

You can create a datastore using Memory as the database type. Memory datastores are used to improve the performance of data flows in real-time jobs, as they store data in memory to facilitate quick access and do not require going back to the original data source.

A memory datastore is used to store memory table schemas in the repository. These memory tables get data from tables in a relational database or from hierarchical data files like XML messages and IDocs.

The memory tables remain alive only while the job executes, and the data in memory tables cannot be shared between different real-time jobs.

  1. What is a linked datastore? Explain with an example?

Ans:

Various database vendors provide only a one-way communication path from one database to another; these paths are known as database links. In SQL Server, a linked server allows a one-way communication path from one database to another.

Example −

Consider a local database server named “Product” that stores a database link to access information on a remote database server called “Customer”. Users connected to the remote server Customer cannot use the same link to access data on the server Product; users connected to “Customer” need a separate link in the data dictionary of their server to access the data on the Product database server.

This communication path between two databases is called a database link, and datastores created between these linked database relationships are known as linked datastores.

It is also possible to connect a datastore to another datastore by importing an external database link as an option of the datastore.

  1. You want to import application metadata into the repository. How can you perform this?

Ans:

An Adapter datastore allows you to import application metadata into the repository. You can also access application metadata and move batch and real-time data between different applications and software.

  1. What are the different types of files can be used as source and target file format?

Ans:

  • Delimited
  • SAP Transport
  • Unstructured Text
  • Unstructured Binary
  • Fixed Width
  1. You want to extract data from an Excel work book. How you can do this?

Ans:

You can use a Microsoft Excel workbook as a data source using file formats in Data Services. The Excel workbook should be available on a Windows or Unix file system.

  1. What is the use of data flow in DS?

Ans:

A data flow is used to extract, transform, and load data from a source to a target system. All transformations, loading, and formatting occur in a data flow.

  1. What are different objects that you can add to a dataflow?

Ans:

  • Source
  • Target
  • Transforms
  1. What are the different properties that you can set for a data flow?

Ans:

  • Execute once
  • Parallelism
  • Database links
  • Cache
  1. Why do you use workflows in DS?

Ans:

Workflows are used to determine the process for executing data flows. The main purpose of a workflow is to prepare for executing the data flows and to set the state of the system once the data flow execution is completed.

  1. What are the different objects that you can add to workflow?

Ans:

  • Workflow
  • Data flow
  • Scripts
  • Loops
  • Conditions
  • Try or Catch Blocks
  1. Is it possible for a workflow to call itself in a Data Services job?

Ans:

Yes

  1. Give an example of workflow in production?

Ans:

There is a fact table that you want to update, and you have created a data flow with the transformation. If you want to move the data from the source system, you have to check the last modification date for the fact table so that you extract only the rows added after the last update.

To achieve this, you create a script which determines the last update date and then passes it as an input parameter to the data flow (a sketch follows below).

You also have to check whether the data connection to the particular fact table is active or not. If it is not active, you need to set up a catch block which automatically sends an email to the administrator to notify them about the problem.
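A hedged sketch of such a script in the Data Services script language; the datastore, table, and variable names are hypothetical, and smtp_to() requires SMTP to be configured on the Job Server:

    # Determine the last update date of the fact table and hand it to the
    # data flow via a (hypothetical) global variable.
    $G_LastUpdate = sql('DS_DWH', 'select max(last_modified) from fact_sales');
    print('Extracting rows modified after: ' || $G_LastUpdate);

    # Inside a catch block, notify the administrator that the load failed;
    # the last two arguments are the number of trace and error log lines
    # to include in the mail.
    smtp_to('admin@example.com', 'Fact table load failed',
            'The connection check or load for fact_sales failed.', 0, 0);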

  1. What is the use of conditionals?

Ans:

You can also add conditionals to a workflow. This allows you to implement If-Then-Else logic on the workflows.
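A conditional is defined in the Designer with a Boolean If expression plus Then and Else branches; the expression itself is an ordinary Data Services expression. A hypothetical configuration, written here as script-language comments since the object is assembled graphically:

    # Hypothetical conditional on a workflow:
    #   If:   $G_LoadType = 'FULL'
    #   Then: run WF_Full_Load   (drop and reload the target)
    #   Else: run WF_Delta_Load  (load only rows changed since the last run)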

  1. What is a transformation in Data Services?

Ans:

Transforms are used to manipulate data sets as inputs and to create one or multiple outputs. There are various transforms that can be used in Data Services.

  1. What are the common transformations that are available in Data Services?

Ans:

  • Data Integration
  • Data Quality
  • Platform
  • Merge
  • Query
  • Text data processing
  1. What are different transformations under data integration?

Ans:

  • Date_Generation
  • Data_Transfer
  • Effective_Date
  • Hierarchy_Flattening
  • Table_Comparison, etc.
  1. What is the use of query transformation?

Ans:

This is the most common transformation used in Data Services, and you can perform the below functions −

  • Data filtering from sources
  • Joining data from multiple sources
  • Perform functions and transformations on data
  • Column mapping from input to output schemas
  • Assigning Primary keys
  • Add new columns, schemas, and function results to output schemas

As the Query transform is the most commonly used transformation, a shortcut is provided for it in the tool palette.
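Column mappings in a Query transform are ordinary Data Services expressions entered per output column; a few hypothetical examples, written as script-language comments since the transform is configured graphically:

    # Hypothetical Query transform output mappings:
    #   CUST_NAME  ->  upper(src.cust_name)                    (standardize case)
    #   FULL_NAME  ->  src.first_name || ' ' || src.last_name  (concatenate)
    #   LOAD_DATE  ->  sysdate()                               (audit column)
    # WHERE clause:   src.status = 'ACTIVE'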

  1. What is text data processing transformation?

Ans:

This allows you to extract the specific information from a large volume of text. You can search for facts and entities like customer, product, and financial facts specific to an organization.

This transform also checks the relationship between entities and allows the extraction.

The data extracted using text data processing can be used in Business Intelligence, Reporting, query, and analytics.

  1. What is the difference between text data processing and data cleansing?

Ans:

Text data processing is used for finding relevant information from unstructured text data however data cleansing is used for standardization and cleansing structured data.

  1. What is a real time job in Data Services?

Ans:

You can create real-time jobs to process real-time messages in the Data Services Designer. Like a batch job, a real-time job extracts, transforms, and loads data.

Each real time job can extract data from a single message or you can also extract data from other sources like tables or files.

  1. Explain the difference between real time and batch jobs in Data Services?

Ans:

Transforms like branches and control logic are used more often in real-time jobs than in batch jobs.

Real-time jobs are not executed in response to a schedule or internal trigger, unlike batch jobs.

  1. What is an embedded data flow?

Ans:

An embedded data flow is a data flow that is called from another data flow in the design. An embedded data flow can contain multiple sources and targets, but only one input or output passes data to the main data flow.

  1. What are the different types of embedded data flow?

Ans:

One Input − the embedded data flow is added at the end of a data flow.

One Output − the embedded data flow is added at the beginning of a data flow.

No input or output − replicates an existing data flow.

  1. What are local and global variables in a Data services job?

Ans:

Local variables in data services are restricted to the object in which they are created.

Global variables are restricted to jobs in which they are created. Using global variables, you can change values for default global variables at run time.

  1. How are variables different from parameters in a Data Services job?

Ans:

Expressions that are used in workflows and data flows are called parameters.

All the variables and parameters in workflows and data flows are shown in the Variables and Parameters window.

  1. What are the different recovery mechanisms that can be used in failed jobs?

Ans:

Automatic Recovery − This allows you to run unsuccessful jobs in recovery mode.

Manual Recovery − This allows you to rerun the job from the beginning, without considering the partial rerun from the previous time.

  1. What is the use of Data Profiling?

Ans:

Data Services Designer provides a Data Profiling feature to ensure and improve the quality and structure of source data. The Data Profiler allows you to examine −

  • Anomalies in source data, for validation, corrective action, and assessing the quality of source data.
  • The structure and relationships of source data, for better execution of jobs, workflows, and data flows.
  • The content of the source and target systems, to determine that your job returns the expected results.

  1. Explain the different performance optimization techniques in BODS?

Ans:

The performance of an ETL job depends on the system on which you are using the Data Services software, the number of data movements, etc. Other factors that contribute to the performance of an ETL task include −

  • Source Database
  • Source Operating System
  • Target Database
  • Target Operating System
  • Network
  • Job Server OS
  • BODS Repository Database

  1. What do you understand by multi-user development in BODS? How do you manage multi-user development?

Ans:

SAP BO Data Services supports multi-user development, where each user can work on an application in their own local repository. Each team uses a central repository to save the main copy of an application and all versions of the objects in the application.

  1. You want to perform multi-user migration in SAP BODS. How can you perform this?

Ans:

In SAP Data Services, job migration can be applied at different levels − application level, repository level, and upgrade level.

To copy the contents of one central repository to another central repository, you cannot do it directly; you need to make use of a local repository.

The first step is to get the latest version of all objects from the central repository into the local repository. Then activate the central repository into which you want to copy the contents.

Then add all the objects you want to copy from the local repository to the central repository.

  1. Suppose you have updated the version of the Data Services software. Is it required to update the repository version?

Ans:

If you update the version of SAP Data Services, you need to update the version of the repository. The below points should be considered when migrating a central repository to an upgraded version −

Point 1

Take a backup of all central repository tables and objects.

Point 2

To maintain versions of objects in Data Services, maintain a central repository for each version. Create a new central repository with the new version of the Data Services software and copy all objects to this repository.

Point 3

It is always recommended that when you install a new version of Data Services, you upgrade your central repository to the new version of objects.

Point 4

Also upgrade your local repository to the same version, as different versions of the central and local repository may not work at the same time.

Point 5

Before migrating the central repository, check in all objects. Since you do not upgrade the central and local repository simultaneously, all objects need to be checked in; once your central repository is upgraded to the new version, you will not be able to check in objects from a local repository that has the older version of Data Services.

  1. What is a slowly changing dimension?

Ans:

SCDs are dimensions that have data that changes over time.

  1. How do you manage slowly changing dimensions? What are the fields required in managing different types of SCD?

Ans:

  • SCD Type 1 − No history preservation; a natural consequence of normalization.
  • SCD Type 2 − Preserves all history by generating new rows: new rows are created for significant changes, you need to use a unique key, new fields are generated to store the history data, and you need to manage an Effective_Date field.
  • SCD Type 3 − Limited history preservation: only two states of data are preserved, current and old.

  1. Is the file format in Data Services a type of datastore?

Ans:

No, a file format is not a datastore type.

  1. If you want to find immediate answers to business questions, which tool you would go with?

Ans:

We would go with Business Objects Explorer in this case. It acts like a search engine for business objects: we can search for anything through a keyword search box in the Explorer, and all the information spaces where that keyword exists appear in the results.

  1. In BO 3.1, we had a ‘dashboard builder’. What is the nomenclature for the same in BO BI 4.0?

Ans:

It is called ‘BI workspaces’ in BO BI 4.0 platform.

  1. Do you have any idea about the Live Office?

Ans:

There are many users who are comfortable using only Microsoft Office products and need business reports in tools like Microsoft Excel. For those users, we have the SAP BO Live Office tool.

  1. We have a lot of tools available under the business objects platform. Will a company have to purchase the entire suite or does the SAP provide a way for organizations to buy only the tools which are needed?

Ans:

No, it is not necessary for a company to buy all the tools that come under the SAP BO platform. SAP provides customization packages under which a company can buy only the tools it needs.

  1. What can be different data sources for BO reports?

Ans:

The different data sources are SAP BW, OLAP, application databases, customer databases, text files, XML files, and web services.

  1. What is the difference between UDT and IDT?

Ans:

The universes which are designed in UDT are UNV universes whereas the universes which are designed in IDT are UNX universes. In UDT we don’t have multi-resource universes enabled whereas in IDT we have this option. IDT is enhanced and more organized as compared to UDT.

  1. Can crystal reports be built on top of UDT?

Ans:

No. Tools like Crystal Reports for Enterprise, Dashboards, and BO Explorer do not support the UNV universe (designed in UDT); they support only the UNX universe (designed in IDT). Web Intelligence is the only tool that supports both universes − UNV and UNX.

  1. Mention what are the major benefits of reporting with BW over R/3?

Ans:

Business Warehouse uses data warehouse and OLAP concepts for storing and analyzing data, while R/3 was intended for transaction processing. You can get the same analysis out of R/3, but it would be easier from a BW.

  1. Mention the two types of services that are used to deal with communication?

Ans:

To deal with communication, you can use two types of services

  • Message Service: In order to exchange short internal messages, this service is used by the application servers
  • Gateway Service: This service allows communication between R/3 and external applications using CPI-C protocol.
  1. Mention what are reason codes used in Account Receivable?

Ans:

“Reason Codes” are tags that can be allocated to describe under/overpayments during the allocation of incoming customer payments. They should not be mixed up with “void reason codes” used when outgoing cheques are produced.

  1. Mention what protocol does the SAP Gateway process use?

Ans:

The SAP gateway process uses TCP/IP protocol to communicate with the clients.

  1. Mention what is pooled tables?

Ans:

Pooled tables are used to store control data. Several pooled tables can be combined to form a table pool. The table pool is a physical table on the database in which all the records of the allocated pooled tables are stored.

  1. Explain what is an update type with reference to a match code ID?

Ans:

If the data in one of the base tables of a matchcode ID changes, the matchcode data has to be updated. The update type stipulates when and how the matchcode has to be updated. The update type also defines which method is to be used for building matchcodes.

  1. Explain what the .sca files and mention their importance?

Ans:

.sca stands for SAP Component Archive. It is used to deploy Java components, patches, and other Java developments in the form of .sca, .sda, .war, and .jar files.

  1. Explain what is meant by “Business Content” in SAP?

Ans:

Business Content in SAP refers to pre-configured and pre-defined models of information contained in the SAP warehouse, which can be used directly or with the desired modifications in different industries.

  1. Explain what a dispatcher is?

Ans:

A dispatcher is a component that takes requests from client systems and stores them in a queue.

 
