Sunday, June 29, 2008

DS Parallel Extender

41. What is the difference between the OCI stage and the ODBC stage?
Ans: Oracle OCI:
• We can write the source query in this stage, but we cannot write a lookup query in it; instead we use a hash file stage for the lookup.
• We have the facility to write queries that run before (Oracle OCI/Output/SQL/Before) or after (Oracle OCI/Output/SQL/After) the actual query (Oracle OCI/Output/SQL).
• We do not have a multi-row lookup facility in this stage.
ODBC:
• We can write both the source query and the lookup query in this stage itself.
• We do not have the facility to write before/after queries in this stage.
• We have a multi-row lookup facility in this stage.
42. For what purpose are .dsx files used in DataStage?
Ans: .dsx is the standard file extension for exported DataStage jobs. Whenever we export a job or a sequence, the file is exported in the .dsx format. A typical use: we develop the job in our test environment and, after testing, export it and save it as x.dsx. This can be done using DataStage Manager.
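Exports can also be scripted from the client machine's command prompt with the dscmdexport tool; a minimal sketch with placeholder host, user, password, project, and path values (the exact flag syntax varies between DataStage versions, so treat it as an assumption to verify against your client's documentation):

dscmdexport /H=hostname /U=username /P=password myproject C:\exports\x.dsx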
43. How do you remove duplicates without using the Remove Duplicates stage?
Ans: Use a Sort stage and set the property Allow Duplicates to false.
or
Just hash-partition the input data and enable the Sort and Unique options.
44. How do you run a job from the command prompt in Unix?
Ans: dsjob -run -jobstatus projectname jobname
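A slightly fuller sketch of the same idea; the dsenv path, the project, job, and parameter names are all placeholders, not from the original answer:

#!/bin/sh
# Source the DataStage environment first; this path is an assumption, adjust for your install.
. $DSHOME/dsenv
# -jobstatus waits for the job to complete and sets the exit status from the job's finishing status.
dsjob -run -param RunDate=2008-06-29 -jobstatus myproject myjob
echo "dsjob exit status: $?"   # per the DSJS status codes: 1 = finished OK, 2 = finished with warnings, 3 = aborted
# Summarize the job's log entries afterwards:
dsjob -logsum myproject myjob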
45. What is the difference between a Transform and a Routine in DataStage?
Ans: A Transform converts data from one form to another, whereas a Routine implements business logic.
46. How do you clean the DataStage repository?
Ans: Remove log files periodically, or run CLEAR.FILE &PH& (from the Administrator client's command window).
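For the log-file side, a minimal Unix cleanup sketch; the project path and the 7-day threshold are assumptions, and it should only be run when no jobs are active:

# The &PH& directory holds a project's phantom (background process) log files.
find "/opt/Ascential/DataStage/Projects/myproject/&PH&" -type f -mtime +7 -exec rm {} \;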
47. Give one real-time situation where the Link Partitioner stage is used.
Ans: If we want to send data from the source to the targets more quickly, we use the Link Partitioner stage in server jobs; it can create a maximum of 64 partitions, and it is an active stage. Normally we cannot connect two active stages, but for this stage it is accepted to connect it to a Transformer or Aggregator stage. The data sent from the Link Partitioner is collected by the Link Collector, again with a maximum of 64 partitions. The Link Collector is also an active stage, so to avoid connecting an active Transformer directly to the Link Collector we use an Inter-Process Communication (IPC) stage. As the IPC stage is passive, the data can then be collected by the Link Collector. But we can use inter-process communication only when the target is a passive stage.
48. What does the Separation option in a static hash file mean?
Ans: The different hashing algorithms are designed to distribute records evenly among the groups of the file, based on characters and their position in the record IDs.
When a hashed file is created, Separation and Modulo respectively specify the group buffer size and the number of buffers allocated for the file. When a static hash file is created, DataStage creates a file that contains the number of groups specified by Modulo.
Size of hash file = Modulo (number of groups) × Separation (buffer size)
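For a worked example, assuming the common convention that Separation is expressed in 512-byte units: a static hash file created with Modulo 211 and Separation 4 reserves about 211 × 4 × 512 = 432,128 bytes (roughly 422 KB) of group space.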
49. How do you implement slowly changing dimensions in DataStage?
Ans: Slowly changing dimensions are a DWH concept. DataStage is a tool for ETL purposes, not for slowly changing dimensions as such.
or
In Informatica PowerCenter there is a wizard for implementing slowly changing dimensions. DataStage does not have that type of wizard; SCD has to be implemented with manual logic.
50. How do you improve the performance of a hash file?
Ans: You can improve the performance of a hashed file by:
1. Preloading the hash file into memory: enable the preload option in the hash file output stage.
2. Enabling write caching: data is written into a cache before being flushed to disk. This ensures that hash file rows are written to the cache in order before being flushed to disk, rather than in the order in which individual rows arrive.
3. Preallocating: estimate the approximate size of the hash file so that the file does not need to be split too often during write operations.
51. How do you know the number of records in a sequential file before running a server job?
Ans: If your environment is Unix, you can check with the wc -l filename command.
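A small sketch of such a pre-job check; the file name and the header-row handling are assumptions:

# Count lines in the input file before the job runs.
records=`wc -l < /data/input/customers.dat`
echo "customers.dat has $records lines"
# If the first line is a header, subtract one to get the record count:
# records=`expr $records - 1`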
52. How do you implement a Type 2 slowly changing dimension in DataStage? Give an example.
Ans: Slowly changing dimensions are a common problem in data warehousing. For example: there is a customer called Lisa at a company ABC, and she lives in New York. Later she moves to Florida, so the company must update her address. In general there are three ways to solve this problem.
Type 1: The new record replaces the original record; no trace of the old record remains. Type 2: A new record is added to the customer dimension table, so the customer is treated essentially as two different people. Type 3: The original record is modified to reflect the change.
In Type 1 the new value overwrites the existing one, which means no history is maintained; the record of where she previously lived is lost. It is simple to use.
In Type 2 a new record is added, so both the original and the new record are present, and the new record gets its own primary key. The advantage of Type 2 is that historical information is maintained, but the size of the dimension table grows, so storage and performance can become a concern.
Type 2 should only be used if it is necessary for the data warehouse to track historical changes.
In Type 3 there are two columns, one to hold the original value and the other to hold the current value. For example, a new column is added which shows the original address as New York and the current address as Florida. This keeps some part of the history without increasing the table size, but when the customer later moves from Florida to Texas the New York information is lost. So Type 3 should only be used if changes will occur only a finite number of times.
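A hypothetical Type 2 result for the Lisa example (the surrogate key, date, and flag columns are illustrative names, not prescribed by DataStage):

CUST_SK  CUST_ID  NAME  CITY      EFF_DATE    END_DATE    CURRENT
1001     C100     Lisa  New York  2001-01-01  2005-06-30  N
1002     C100     Lisa  Florida   2005-07-01  9999-12-31  Y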
53. How can we call a routine in a DataStage job? Explain the steps.
Ans: Routines are used for implementing business logic. They are of two types: 1) before subroutines and 2) after subroutines. Steps: double-click the Transformer stage, right-click on any one of the mapping fields, and select the DS Routines option; in the edit window, supply the business logic and select either of the options (before or after subroutine).
54. What are the most important aspects that a beginner must consider when doing his first DS project?
Ans: Apart from DWH concepts and knowledge of the different stages, learn to use the Director to find errors and to tune performance. Knowledge of Unix shell scripting will also be very helpful.
55. What are the different types of lookups in DataStage?
Ans: Look-up file stage - generally used with the Lookup stage
- Hash lookup
- You can also implement a "look up" using the Merge stage
or
There are two types of lookups: the Lookup stage and the Lookup File Set. Lookup: references another stage or database to get data from it and transforms it to the other database. Lookup File Set: allows you to create a lookup file set or reference one for a lookup. The stage can have a single input link or a single output link. The output link must be a reference link. The stage can be configured to execute in parallel or sequential mode when used with an input link. When creating lookup file sets, one file is created for each partition. The individual files are referenced by a single descriptor file, which by convention has the suffix .fs.
56. What is the difference between the Merge stage and the Join stage?
Ans: It is sometimes said that Join does not support more than two inputs while Merge supports two or more (one master and one or more update links). That is highly incomplete information. In fact Join does support two or more input links (left, right, and possibly intermediate links). But yes, if you are talking about a full outer join, then more than two links are not supported.
Coming back to the main question of the difference between the Join and Merge stages, the other significant differences I have noticed are:
1) Number of reject links
(Join) does not support reject links.
(Merge) has as many reject links as update links (if there are n input links, 1 is the master link and n-1 are update links).
2) Data selection
(Join) There are various ways in which data is selected, e.g. the different types of joins: inner, outer (left, right, full), cross join, etc. So you have different selection criteria for dropping/selecting a row.
(Merge) Data in the master record and update records is merged only when both have the same value for the merge key columns.
57. What is the difference between "Validated OK" and "Compiled" in DataStage?
Ans: When you compile a job, it ensures that basic things like the important stage parameters and mappings are correct, and then it creates an executable job.
You validate a compiled job to make sure that all the connections are valid, all the job parameters are set, and a valid output can be expected after running the job. It is like a dry run where you don't actually touch the live data, but you gain confidence that things will work.
or
When we say "Validating a Job", we are talking about running the Job in the "check only" mode. The following checks are made :
- Connections are made to the data sources or data warehouse.
- SQL SELECT statements are prepared.
- Files are opened. Intermediate files in Hashed File, UniVerse, or ODBC stages that use the local data source are created, if they do not already exist
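Validation can also be run from the Unix command line using dsjob's VALIDATE mode; a minimal sketch with placeholder project and job names:

# Run the job in check-only (validate) mode rather than a normal run.
dsjob -run -mode VALIDATE -jobstatus myproject myjob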
58. What is the purpose of keys, and what is the difference between surrogate keys and natural keys?
Ans: We use keys to provide relationships between entities (tables). By using primary and foreign key relationships, we can maintain the integrity of the data.
The natural key is the one coming from the OLTP system.
The surrogate key is an artificial key that we create in the target DW. We can use these surrogate keys instead of the natural key. In SCD2 scenarios, surrogate keys play a major role.
59. How do you remove duplicates in a server job?
Ans: It depends on which stages you are using in the server job. If you are using an ODBC stage, you can write a user-defined query (for example, one using SELECT DISTINCT) in the source stage.
60. How do you do usage analysis in DataStage?
Ans: 1. If you want to know whether some job is part of a sequence, then in the Manager right-click the job and select Usage Analysis. It will show all the job's dependents.
2. To find how many jobs are using a particular table.
3. To find how many jobs are using a particular routine.
Like this, you can find all the dependents of a particular object.
It is nested: you can move forward and backward and see all the dependents.
