Sunday, June 29, 2008

Remove duplicates in server job

61.Will DataStage consider the second constraint in the transformer once the first constraint is satisfied (if link ordering is given)?
Ans:Yes. In a Transformer, each output link's constraint is evaluated independently, so a row that satisfies the first constraint is still tested against the second, and a row can flow down several links at once. Link ordering only controls the order of evaluation, which matters mainly for an "Otherwise/Log" (reject) link, since that link receives the rows that satisfied none of the earlier constraints.
62.what are the environment variables in datastage?give some examples?
Ans:These are variables defined at the project or job level. We use them to configure how a job runs, e.g. to associate the configuration file (without which a parallel job cannot run) or to increase the sequential-file or dataset read/write buffer size.
ex: $APT_CONFIG_FILE
Like above we have so many environment variables. Please go to job properties and click on "add environment variable" to see most of the environment variables.
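$APT_CONFIG_FILE points the job at a parallel configuration file. A minimal sketch of what such a file looks like (the host name and paths below are illustrative, not real values):

```
{
    node "node1"
    {
        fastname "etl-host"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
```

Adding more node entries to this file is how you tell the parallel engine to run a job across more partitions, without changing the job design itself.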
63.What are constraints and derivation?* Explain the process of taking backup in DataStage?*What are the different types of lookups available in DataStage?
Ans:Constraints check a condition and filter the data. Example: if Cust_Id<>0 is set as a constraint, only records meeting this condition are processed further.
A derivation is an expression used to derive a field's value, for example when you need a SUM, AVG, etc.
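Outside DataStage, the same two ideas can be sketched in a few lines of Python (the column names and sample rows below are made up for illustration):

```python
# Minimal sketch of a constraint (filter) and a derivation (computed value).
rows = [
    {"Cust_Id": 0,   "Amount": 50.0},   # fails the constraint below
    {"Cust_Id": 101, "Amount": 120.0},
    {"Cust_Id": 102, "Amount": 80.0},
]

# Constraint: Cust_Id <> 0 -- only matching rows pass downstream.
passed = [r for r in rows if r["Cust_Id"] != 0]

# Derivation: compute aggregates (SUM, AVG) over the rows that passed.
total = sum(r["Amount"] for r in passed)
average = total / len(passed)
print(len(passed), total, average)   # 2 200.0 100.0
```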
64.what is job control?how it is developed?explain with steps?
Ans:Controlling DataStage jobs from another DataStage job. Example: consider two jobs, XXX and YYY. Job YYY can be executed from job XXX by using DataStage BASIC functions in a routine.
To execute one job from another, follow these steps in the routine:
1. Attach the job using the DSAttachJob function.
2. Run it using the DSRunJob function.
3. Stop it, if necessary, using the DSStopJob function.
65.how to find the number of rows in a sequential file?
Ans:Use the row-count system variable (@INROWNUM/@OUTROWNUM in a Transformer), or check the link row counts in the Director monitor.
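Outside a job, a quick sanity check of a sequential (text) file's row count takes only a few lines of Python (the file name below is illustrative; the script creates its own small sample file):

```python
# Count the rows (lines) in a sequential file.
from pathlib import Path

path = Path("customers.txt")              # illustrative file name
path.write_text("row1\nrow2\nrow3\n")     # create a small sample file

with path.open() as f:
    row_count = sum(1 for _ in f)         # one increment per line
print(row_count)   # 3
```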
66.How do you implement routines in DataStage?
Ans:There are three kinds of routines in DataStage:
1. Server routines, used in server jobs; these are written in DataStage BASIC.
2. Parallel routines, used in parallel jobs; these are written in C/C++.
3. Mainframe routines, used in mainframe jobs.
68.How can you implement Complex Jobs in datastage
Ans:A complex design is one with many joins and lookups; such a job is called a complex job. Any complex design can be implemented in DataStage, and following a few simple tips also improves performance. There is no hard limit on the number of stages in a job, but for better performance use at most about 20 stages per job; if a design exceeds that, split it into another job. Likewise, use no more than about 7 lookups per Transformer; beyond that, add another Transformer.
69.What are Stage Variables, Derivations and Constants?
Ans:A stage variable is a variable that is evaluated locally within the stage. A constraint is a filter condition that limits which input records pass through, according to the business rule. A derivation is an expression used to modify or derive values from the input columns.
The execution order is:
1. Stage variables
2. Constraints
3. Derivations
70.What is the flow of loading data into fact & dimensional tables?
Ans:Here is the sequence for loading a data warehouse:
1. The source data is first loaded into the staging area, where data cleansing takes place.
2. The data from the staging area is then loaded into the dimension/lookup tables.
3. Finally, the fact tables are loaded from the corresponding staging-area tables.
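The three steps above can be sketched in Python; the table layouts, sample data, and surrogate-key scheme are made up for illustration:

```python
# Step 1: staging -- raw source rows, then cleansing (trim, fix case).
staging = [
    {"cust_name": "acme ", "amount": 100.0},
    {"cust_name": "Acme",  "amount": 250.0},
    {"cust_name": "Beta",  "amount": 75.0},
]
cleansed = [{**r, "cust_name": r["cust_name"].strip().title()} for r in staging]

# Step 2: load the dimension, assigning a surrogate key per distinct customer.
dim_customer = {}
for r in cleansed:
    if r["cust_name"] not in dim_customer:
        dim_customer[r["cust_name"]] = len(dim_customer) + 1  # surrogate key

# Step 3: load the fact table, looking up each row's dimension key.
fact_sales = [
    {"cust_key": dim_customer[r["cust_name"]], "amount": r["amount"]}
    for r in cleansed
]
print(dim_customer)  # {'Acme': 1, 'Beta': 2}
print(fact_sales)
```

Loading in this order matters: the fact load in step 3 can only resolve its foreign keys because the dimension rows (and their surrogate keys) already exist from step 2.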
71.Differentiate Primary Key and Partition Key?
Ans:A primary key is defined on a table column or set of columns (a composite PK) to ensure that every row in the table is unique.

A partition key is the key used to partition a table (in the database) or to partition the source records during processing (in an ETL tool). The partitioning method should be chosen based on the stages (in DataStage) or transformations (in Informatica) used in the job or mapping. Partitioning is used to improve the target load process.
For more detail, see the database, DataStage, or Informatica documentation.
72.What are the difficulties faced in using DataStage ? or what are the constraints in using DataStage ?
Ans:* The issue I faced with DataStage is that it was very difficult to diagnose errors from the error code, since the error table did not state the reason for the error, and as a fresher I did not know what the error codes stood for. :)

* Another issue is that the built-in help was not of much use, since it was general rather than specific.

* I don't know about other tools, since this is the only one I have used so far, but it was simple to use, so I liked it in spite of the issues above.
OR
1. I feel the most difficult part is understanding the error messages in the DataStage Director job log; they are not presented in a readable form.
2. There are not as many date functions available as in Informatica or traditional relational databases.

3. DataStage is peculiar in its function names compared with other ETL tools. For example, most databases and ETL tools use UPPER to convert lower case to upper case, whereas DataStage uses UCASE.

Other than that, I don't see any issues with DataStage.

73.How do you execute datastage job from command line prompt?
Ans:u can use dsjob executable command from unix or command line.
Using "dsjob" command as follows.
dsjob -run -jobstatus projectname jobname
74.Types of Parallel Processing?
Ans:Hardware wise there are 3 types of parallel processing systems available:
1. SMP (symmetric multiprocessing): multiple CPUs, shared memory, single OS.
2. MPP (massively parallel processing): multiple CPUs, each with its own set of resources (memory, OS, etc.), but physically housed in the same machine.
3. Clusters: the same as MPP, but physically dispersed (not in the same box, connected via high-speed networks).

DS offers 2 types of parallelism to take advantage of the above hardware:
1. Pipeline Parallelism
2. Partition Parallelism
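Partition parallelism can be illustrated with a small hash-partitioning sketch in Python. The three "nodes" are simulated lists, not real processes; in DataStage the node count actually comes from the configuration file:

```python
# Hash-partition rows across N "nodes" so each node processes its own subset.
rows = [{"cust_id": i, "amount": i * 10} for i in range(1, 10)]
num_nodes = 3
partitions = [[] for _ in range(num_nodes)]

for r in rows:
    node = hash(r["cust_id"]) % num_nodes   # same key always lands on same node
    partitions[node].append(r)

# Each node can now work on its partition in parallel;
# partial results are collected and combined afterwards.
totals = [sum(r["amount"] for r in p) for p in partitions]
print(sum(totals))   # grand total equals the total over all rows
```

Hashing on the key is what makes key-based operations (joins, aggregations, remove-duplicates) safe under partition parallelism: all rows with the same key end up on the same node.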

75.What is Modulus and Splitting in Dynamic Hashed File?
Ans:In a dynamic hashed file, the modulus is the current number of groups (buckets) in the file. Splitting is how the file grows: when the load on the file passes the split threshold, one group is split into two and the modulus is incremented, so the file resizes itself incrementally as records are added (groups are merged again as records are deleted).
76.What is the default cache size? How do you change the cache size if needed?
Ans:The default cache size is 128 MB. You can change it for a particular job in its Job Properties, or for all jobs in the project under the Tunables tab in the Administrator.
77.Compare and Contrast ODBC and Plug-In stages?
Ans:ODBC : a) Poor performance.
b) Can be used with a variety of databases.
c) Can handle stored procedures.

Plug-in: a) Good performance.
b) Database-specific (only one database).
c) Cannot handle stored procedures.

78.How to run a Shell Script within the scope of a Data stage job?
Ans:Open Edit -> Job Properties from the menu bar, choose a Before-job or After-job subroutine, and select ExecSH, passing the shell script (with its path) as the input value.
79.Did you Parameterize the job or hard-coded the values in the jobs?
Ans:Best practice is to parameterize the jobs (file paths, database connection details, dates, etc.) rather than hard-coding values, so the same job can be promoted between environments and rerun with different values without modification.
80.What are OConv () and Iconv () functions and where are they used?
Ans:Iconv converts a value such as a date from its external format into DataStage's internal format, which only DataStage understands.
Example: a date arrives in mm/dd/yyyy format; DataStage converts it into an internal day number such as 740.
You can then use Oconv to output that internal value in any format you want.
Suppose you want to change mm/dd/yyyy to dd/mm/yyyy; use Iconv and Oconv together:
Oconv(Iconv(DateFromInput, "D/MDY[2,2,4]"), "D/DMY[2,2,4]")
(Check the exact conversion codes in the Help; the "D..." codes above are the usual date conversions.)
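The internal-date idea can be mimicked in Python, assuming the UniVerse convention that internal day 0 is 31 December 1967 (under that assumption, the internal number 740 mentioned above corresponds to 9 January 1970):

```python
from datetime import date, datetime, timedelta

EPOCH = date(1967, 12, 31)   # UniVerse internal day 0 (assumed convention)

def iconv_date(external, fmt="%m/%d/%Y"):
    """External date string -> internal day number (like Iconv)."""
    return (datetime.strptime(external, fmt).date() - EPOCH).days

def oconv_date(internal, fmt="%d/%m/%Y"):
    """Internal day number -> external date string (like Oconv)."""
    return (EPOCH + timedelta(days=internal)).strftime(fmt)

n = iconv_date("01/09/1970")     # mm/dd/yyyy in
print(n)                         # 740
print(oconv_date(n))             # 09/01/1970 -- dd/mm/yyyy out
```

Chaining the two functions, as in the Oconv(Iconv(...)) expression above, is what reformats a date: Iconv normalizes to the internal number, Oconv renders it back out in the target format.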
