Saturday, October 18, 2008

DataStage FAQs

6. How do you read data from Excel files? Explain with steps.
To read data from an Excel file:
* Save the file as .csv (comma-separated values).
* Add a flat file (Sequential File) stage to the job in the DataStage job panel.
* Double-click the flat file stage and set its input file to the .csv file you saved.
* Import the metadata for the file. (Once you have imported or typed the metadata, click View Data to check the data values.)
Then do the rest of the transformation as needed.
-Debasis
· Alternatively, create a new DSN for the Excel driver and choose the workbook from which you want data.
Then select the ODBC stage and access the Excel workbook through it, i.e., import the Excel sheet using the new DSN created for Excel.
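Before pointing the flat file stage at the exported .csv, it can help to confirm that the file parses into the columns you expect. Here is a minimal Python sketch, assuming a comma-delimited export with a header row. The sample data and the function name preview_csv are hypothetical and not part of DataStage.

```python
import csv
import io


def preview_csv(text, max_rows=5):
    """Return (header, first rows) parsed from comma-separated text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)          # first line holds the column names
    rows = []
    for row in reader:
        if len(rows) >= max_rows:
            break
        rows.append(row)
    return header, rows


# Hypothetical contents of a workbook saved as .csv
sample = 'id,name,city\n1,Asha,Pune\n2,"Rao, K",Delhi\n'
header, rows = preview_csv(sample)
print(header)
for row in rows:
    print(row)
```

A quoted field that contains a comma stays in one column here, which is the case most likely to trip up a delimiter-based import.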
7. How can we generate a surrogate key in server/parallel jobs?
· In parallel jobs, we can use the Surrogate Key Generator stage.
· In server jobs, we can use a built-in routine called KeyMgtGetNextValue.
· You can also generate the surrogate key in the database using a sequence generator.
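Whichever option is used, the idea is the same: each new row receives the next unused integer, and the last value handed out is remembered between runs. Here is a conceptual Python sketch of that behaviour. The class KeySequence and its methods are hypothetical and do not reflect how KeyMgtGetNextValue or the surrogate key stage is actually implemented.

```python
class KeySequence:
    """Hands out increasing integer keys, one counter per sequence name."""

    def __init__(self, state=None):
        # state maps sequence name -> last key issued (e.g. loaded from a file)
        self.state = dict(state or {})

    def next_value(self, name):
        value = self.state.get(name, 0) + 1
        self.state[name] = value
        return value


seq = KeySequence({"CUSTOMER": 100})   # resume after key 100
rows = [{"name": "Asha"}, {"name": "Ravi"}]
for row in rows:
    row["customer_sk"] = seq.next_value("CUSTOMER")
print(rows)
```

Keeping one counter per sequence name lets several target tables draw keys independently.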
8. What is an environment variable?
· An environment variable is a predefined variable that we can use while creating a DS job. We can set it at either the project level or the job level. Once a variable is set, it is available throughout that project or job.
We can also define new environment variables. To do so, go to DS Admin.
For further details, refer to the DS Admin guide.
· These are the variables used at the project or job level. We can use them to configure the job, i.e., we can associate the configuration file (without this you cannot run your job) or increase the sequential or dataset read/write buffer.
Ex: $APT_CONFIG_FILE
There are many environment variables like the one above. Go to job properties, click the Parameters tab, then click "Add Environment Variable" to see most of them.
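As an illustration, a wrapper script could confirm that $APT_CONFIG_FILE is set before a job is started. Here is a minimal Python sketch. The function name and the example path are hypothetical, and the environment is passed in as a plain mapping so the check is easy to try.

```python
import os


def config_file_from(env):
    """Return the configuration file path from env, or raise if it is unset."""
    path = env.get("APT_CONFIG_FILE")
    if not path:
        raise RuntimeError("APT_CONFIG_FILE is not set")
    return path


# Hypothetical path, for illustration only
print(config_file_from({"APT_CONFIG_FILE": "/tmp/example.apt"}))

# Against the real process environment:
# config_file_from(os.environ)
```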
9. How can we create environment variables in DataStage?
· We can create environment variables by using DataStage Administrator.
· This mostly comes under the Administrator part. As a designer, we can add one directly through Designer > View > Job Properties > Parameters > Add Environment Variable > User Defined, then add it.
10. I have a few questions:
1. What are the various processes that start when the DataStage engine starts?
2. What changes need to be made on the database side if I have to use the DB2 stage?
3. Is the DataStage engine responsible for compilation, execution, or both?
· Three processes start when the DataStage engine starts:
1. DSRPC
2. DataStage Engine Resources
3. DataStage Telnet Services
11. How do you write and execute routines for PX jobs in C++?
· You define and store the routines in the DataStage repository (e.g., in the Routines folder). These routines are compiled with a C++ compiler.
· You have to write the routine in C++ (g++ on Unix), then create an object file, and provide the path to this object file in your routine definition.
12. How do you eliminate duplicate rows in DataStage?
· You can remove duplicate rows in more than one way:
1. DS has a stage called "Remove Duplicates" in which you can specify the key.
2. Alternatively, you can specify the key in the stage you are already using, so that the stage itself removes duplicate rows based on the key during processing.
· By using the Hashed File stage in DS Server, we can eliminate duplicates.
· Using a Sort stage, set the property Allow Duplicates to False.
OR
In any stage, on the Input tab, choose hash partitioning, specify the key, and check the Unique checkbox. If you are working with server jobs, you can use a hashed file to eliminate duplicate rows.
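All of these approaches reduce to keeping one row per key value. Here is a conceptual Python sketch of the two common outcomes: keeping the first row per key, or letting later rows overwrite earlier ones, as a keyed write does. The function names and sample rows are hypothetical, and this is not DataStage code.

```python
def keep_first(rows, key):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    result = []
    for row in rows:
        k = row[key]
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


def keep_last(rows, key):
    """Later rows overwrite earlier ones with the same key value."""
    by_key = {}
    for row in rows:
        by_key[row[key]] = row
    return list(by_key.values())


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 1, "city": "Mumbai"},
]
print(keep_first(rows, "id"))
print(keep_last(rows, "id"))
```

Which row survives matters when duplicates differ in their non-key columns, so it is worth deciding which one you want before choosing a method.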
