Sunday, June 29, 2008

Real-time FAQs

28. What is the difference between a Sequential File and a Data Set? When should you use the Copy stage?
• A Sequential File stage handles relatively small amounts of data in plain text files with any extension (for example .txt), whereas a Data Set stores large amounts of data in DataStage's internal parallel format and is opened only with the .ds extension.
• The Copy stage copies a single input data set to a number of output data sets. Each record of the input data set is copied to every output data set. Records can be copied without modification, or you can drop or change the order of columns.
29. How do you find errors in a job sequence?
• Using DataStage Director you can examine the log of the sequence and of the jobs it calls to find the errors.
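Besides the Director GUI, the same log can be read from the command line with the dsjob utility. A minimal sketch, assuming a project called dwproj and a sequence called seq_load (both names are placeholders):

    #!/bin/sh
    # Summarise warning entries from the log of the sequence.
    # -logsum prints a summary of log entries; -type restricts the entry type.
    $DSHOME/bin/dsjob -logsum -type WARNING -max 50 dwproj seq_load

    # Report general information about the last run, including its finishing status.
    $DSHOME/bin/dsjob -jobinfo dwproj seq_load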
30. What is job control? How is it used? Explain with steps.
• Job control (JCL stands for Job Control Language) is used to run a number of jobs at a time, with or without loops. Steps: click Edit in the menu bar, select 'Job Properties', and enter the parameters, for example:
Parameter   Prompt    Type
STEP_ID     STEP_ID   String
Source      SRC       String
DSN         DSN       String
Username    unm       String
Password    pwd       String
After defining the parameters, use the Job control tab to select the jobs from the list box and run the job.
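The Job control tab generates code that attaches and runs the selected jobs in turn. Broadly the same effect can be obtained from a Unix shell with the dsjob command line. A rough sketch, assuming a project dwproj and two jobs job_extract and job_load, with parameter names taken from the table above (all names are placeholders):

    #!/bin/sh
    PROJECT=dwproj

    # -jobstatus waits for the job to finish and returns an exit code based on
    # the job's finishing status, so the script can stop if a step fails.
    $DSHOME/bin/dsjob -run -param STEP_ID=1 -param SRC=/data/in/cust.txt \
        -jobstatus "$PROJECT" job_extract || exit 1

    $DSHOME/bin/dsjob -run -param STEP_ID=2 -param DSN=DWH -param unm=etl_user \
        -jobstatus "$PROJECT" job_load || exit 1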
31. How do you implement a Type 2 slowly changing dimension in DataStage? Give an example.
• A slowly changing dimension is a common problem in data warehousing. For example: a customer called Lisa of company ABC lives in New York. Later she moves to Florida, so the company must update her address. In general there are three ways to solve this problem:
Type 1: the new record replaces the original record, leaving no trace of the old record at all.
Type 2: a new record is added to the customer dimension table, so the customer is essentially treated as two different people.
Type 3: the original record is modified to reflect the change.
In Type 1 the new value overwrites the existing one, which means no history is maintained; the record of where the person previously lived is lost. It is simple to use.
In Type 2 a new record is added, so both the original and the new record are present, and the new record gets its own primary key. The advantage of Type 2 is that historical information is maintained, but the dimension table grows, so storage and performance can become a concern. Type 2 should only be used if the data warehouse needs to track historical changes.
In Type 3 there are two columns: one holds the original value and the other the current value. For example, a new column is added that shows the original address as New York and the current address as Florida. This keeps part of the history without increasing the table size, but when the customer later moves from Florida to Texas the New York information is lost, so Type 3 should only be used if changes will occur a finite number of times.
32. Where are flat files actually stored? What is the path?
• Normally flat files are stored on FTP servers or in local folders; common flat file formats include .CSV, .XLS and .TXT.
• Flat files store the data, and the path is specified on the General tab of the Sequential File stage.
33. How does DataStage handle user security?
• Create users in DataStage Administrator and give them the necessary privileges.
34. How can I convert server jobs into parallel jobs?
• You cannot convert a server job into a parallel job directly; you have to rebuild the whole job.
• One approach that can save a lot of time: convert the server job into a server shared container, which can then be used in parallel jobs as a shared container.
35. If I add a new environment variable in Windows, how can I access it in DataStage?
• You can reference it in Designer under the job's Job Properties, where you can add a new environment variable or use an existing one.
• You can view all the environment variables in Designer under Job Properties; environment variables can be added and accessed from there.
36. What is a Data Set and what is a File Set?
• I assume you are referring to the Lookup File Set, which is used only by Lookup stages. Data Set: DataStage parallel extender jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set. The Data Set stage allows you to store the data being operated on in a persistent form, which can then be used by other DataStage jobs. File Set: DataStage can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are called a file set. This capability is useful because some operating systems impose a 2 GB limit on the size of a file, and you need to distribute files among nodes to prevent overruns.
• File Set stage: it allows you to read data from or write data to a file set. The stage can have a single input link, a single output link, and a single reject link, and it executes only in parallel mode. As above, the data files and the file that lists them are together called a file set, which works around operating-system file-size limits by distributing the data across several files and nodes.
Data sets are used to bring data into parallel jobs in much the same way ODBC is used in server jobs.
37. How does a hashed file perform a lookup in server jobs? How does it compare the key values?
A hashed file is used for two purposes: 1. to remove duplicate records, and 2. as a reference lookup. Each record in a hashed file has three parts: a hashed key, a key header, and a data portion. Because the key value is located with a hashing algorithm, the lookup is fast.
38. There are three different types of user-created stages available for PX. What are they? Which would you use? What are the disadvantages of using each type?
The three types are: i) Custom, ii) Build, iii) Wrapped.
39. If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?
• The data would have to be repartitioned on key 2 before the aggregation; otherwise rows with the same value of key 2 could land on different partitions and produce incorrect aggregate results, and the extra repartitioning adds execution time.
40. How can I specify a filter command for processing data while defining Sequential File output data?
• There are before-job and after-job subroutines with which Unix commands can be executed; in addition, the Sequential File stage offers a filter option where a Unix command (for example sort or grep) can be given and the data is piped through it. A few illustrative commands are sketched below.
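Illustrative Unix commands of the kind that might go in such a filter; the delimiters and column positions here are made-up examples:

    # Keep only lines that are not comments:
    grep -v '^#'

    # Sort the rows on the first comma-separated column:
    sort -t',' -k1,1

    # Keep only rows whose third comma-separated column equals 'FL':
    awk -F',' '$3 == "FL"'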
41. What is the meaning of the following?

1) If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.

2) Tuning should occur on a job-by-job basis.

3) Use the power of the DBMS.
• The question is not entirely clear, but in general: if you have SMP machines you can use the IPC, Link Collector and Link Partitioner stages for performance tuning; if you have cluster or MPP machines you can use parallel jobs. Splitting a large input file and running the pieces through parallel job invocations follows the same idea (see the sketch after this answer).
• The third point is about tuning job performance: 'use the power of the DBMS' means you can improve the performance of a job by using database features such as analyzing tables, creating indexes and creating partitions, which improves the performance of the SQL used in the jobs.
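For the first point, a large input file can be cut into pieces with the Unix split command and each piece fed to its own invocation of the job. A rough sketch, assuming a file bigfile.txt, a project dwproj and a multi-instance job job_load with a SRC parameter (all names are placeholders):

    #!/bin/sh
    # Split the input into chunks of 1,000,000 lines: bigfile_aa, bigfile_ab, ...
    split -l 1000000 bigfile.txt bigfile_

    # Start one invocation of the job per chunk, in the background.
    # job_load.<invocation-id> assumes the job is defined to allow multiple instances.
    for f in bigfile_*; do
        $DSHOME/bin/dsjob -run -param SRC="$PWD/$f" dwproj "job_load.$f" &
    done
    wait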
42. How can you implement complex jobs in DataStage?
• What do you mean by complex jobs? If a job uses more than about 15 stages, or around 10 lookup tables, it can be called a complex job.
• A complex design means one with many joins and lookups. Any complex design can be implemented in DataStage by following a few simple guidelines that also help performance. There is no hard limit on the number of stages in a job, but for better performance use at most about 20 stages per job; if a design exceeds 20 stages, split it into another job. Use no more than 7 lookups per Transformer; beyond that, add another Transformer.
• A job with involved logic, in other words one covering complex scenarios, can also be considered a complex job.
43. How can I extract data from DB2 (on IBM iSeries) into the data warehouse using DataStage as the ETL tool? Do I first need to use ODBC to create connectivity, and use an adapter for the extraction and transformation of data?
• You would need to install ODBC drivers to connect to the DB2 instance (they do not come with the regular drivers; use the CD provided with the DB2 installation, which contains the ODBC drivers for DB2) and then connect from DataStage.
• If the source system is a mainframe, you can use the LOAD and UNLOAD utilities: unload the records from the mainframe system and then export them to your own system (Windows).
44. How can we pass parameters to a job using a file?
• You can do this by reading the parameter values from a Unix file and then invoking the DataStage job; the job has the parameters defined, and their values are supplied from Unix.
• You can create a Unix shell script that passes the parameters to the job and also writes a log of the whole run of the job (see the sketch below).
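A minimal sketch of such a script, assuming a plain-text file job.params holding one name=value pair per line, a project dwproj and a job job_load (all names are placeholders):

    #!/bin/sh
    # job.params might contain, for example:
    #   SRC=/data/in/cust.txt
    #   DSN=DWH

    PARAM_ARGS=""
    for pair in $(grep -v '^#' job.params); do   # skip comment lines
        PARAM_ARGS="$PARAM_ARGS -param $pair"    # values must not contain spaces
    done

    # Run the job with the collected parameters and capture the whole run in a log.
    $DSHOME/bin/dsjob -run $PARAM_ARGS -jobstatus dwproj job_load > job_load.run.log 2>&1
    echo "dsjob finished with exit status $?" >> job_load.run.log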
45. What is DataStage Administrator used for? Did you use it?
The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.
46. What about system variables?
DataStage provides a set of variables containing useful system information that you can access from a transform or routine. System variables are read-only.

@DATE The internal date when the program started. See the Date function.

@DAY The day of the month extracted from the value in @DATE.

@FALSE The compiler replaces the value with 0.

@FM A field mark, Char(254).

@IM An item mark, Char(255).

@INROWNUM Input row counter. For use in constraints and derivations in Transformer stages.

@OUTROWNUM Output row counter (per link). For use in derivations in Transformer stages.

@LOGNAME The user login name.

@MONTH The current month extracted from the value in @DATE.

@NULL The null value.

@NULL.STR The internal representation of the null value, Char(128).

@PATH The pathname of the current DataStage project.

@SCHEMA The schema name of the current DataStage project.

@SM A subvalue mark (a delimiter used in UniVerse files), Char(252).

@SYSTEM.RETURN.CODE
Status codes returned by system processes or commands.

@TIME The internal time when the program started. See the Time function.

@TM A text mark (a delimiter used in UniVerse files), Char(251).

@TRUE The compiler replaces the value with 1.

@USERNO The user number.

@VM A value mark (a delimiter used in UniVerse files), Char(253).

@WHO The name of the current DataStage project directory.

@YEAR The current year extracted from @DATE.

REJECTED Can be used in the constraint expression of an output link of a Transformer stage. REJECTED is initially TRUE, but is set to FALSE whenever an output link is successfully written.
47. What are job parameters?
• These parameters are used to provide administrative control over a job and to change its run-time values.
Edit > Job Properties > Parameters
On the Parameters tab you can define the name, prompt, type and default value of each parameter.

