Saturday, June 28, 2008

Datastage Beginners

DataStage tip for beginners: developer short cuts

Import Export
* When you do an export cut and paste the export file name. When you go to your project and run an import paste the file name instead of having to browse for it. While export and import independently remember the last file name used they do not share that name between each other.
* When you switch export type between category and individual job it is quick to switch the type, close the export form and open it again. That way the job name or category you have highlighted will be automatically picked.
There is an Export option to export by individual job name or export by category name. This is on the second tab in the export form. Often when you go to export something it is on the wrong option, eg. you want a job but it is showing the category. You switch from category export to individual job export but back on tab 1 your job is still not highlighted.
* When you do an export there is a "View" button, click this to open the export file and run any type of search and replace on job parameter values when moving between dev and test.
* If you want to export several jobs that are not in the same category use the append option. Highlight and export the first job. Close the export window, find and highlight the second job, in the export form click the "Append" option to add to the file. Continue until all jobs have been selected and exported.
* On the Options tab is a check box to include in the export "Referenced shared containers" to also export those.

Things you could easily miss
You could use DataStage for months and not see some of these time savers, done this myself.
* There is an automap button in a lot of stages, especially the transformers, maps fields with the same names.
* When you add a shared container into your job you need to map the columns of the container to your job link. What you might miss is the extra option you get on the Columns tab "Load" button. In addition to the normal column load you get "Load from Container" which is a quick way to load the container metadata into your job.
* Don't create a job from an empty canvas. Always copy and use an existing job. Don't create shared containers from a blank canvas, always build and test a full job and then turn part of it into a container.
* If you want to copy and paste settings between jobs, for example database login values or transformer functions, open each job in a seperate Designer session. Most property windows in DataStage are modal and you can only have one property window open per Designer session, by opening two Designers you can have two property windows open at the same time and copy or compare them more easily.
* You can load metadata into a stage by using the "Load" button on the column tab or by dragging and dropping a table definition from the Designer repository window onto a link in your job. For sequential file stages the drag and drop is faster as it loads both the column names and the format values in one go. If you used the load button you would need to load the column names and then the format details seperately.
* Can't get a Modify function or Transformer function working correctly? Trial and error is often the only way to work out the syntax of a function. If you do this in a large and complex job it can be time consuming to debug due to job startup times. Consider have a couple test jobs in your dev project with a row generator, a modify or transformer stage and a peek stage. Have a column of each type in this test job. Use this throughout your project as a quick way to test a function or conversion.
* You can put job parameters into stage properties text boxes, eg. #filedir#/#filename# but you may not know that you can put macros into property text boxes. #filedir#/#filename#_#DSJobName#. In this example the first two are job parameters and the third value is not, it's a DataStage macro.

Sequence Jobs
My most annoying Sequence job "feature" is the constant need to enter job parameters over and over and over again. If you have 10-20 parameters per job (as I normally do) it becomes very repetitive and is open to manual coding errors.
Under version 7.1 and earlier you could copy and paste a job activity stage, change the job name and retain most of the parameter settings. Under 7.5.x when you change the job name all the parameter settings get wiped.

You need to set the parameters for every flippin job activity stage, even though they are likely to have the same or similar parameter lists and settings. A faster way is to do the job renaming in an export file.

* In an empty sequence job add your first job activity stage and set all parameter values or copy one in from an existing job.
* Copy and paste as many copies of this job activity as you need for your sequence.
* Close the sequence job and export the job and click the View button.
* Open the sequence job, you need it to retrieve the stage names.
* Copy the name of the last job activity stage name and search for it in the export file. When the cursor is on that part of the export file search and replace the old job name with the new job name. Make sure you only replace to the bottom of the file, most text editors should have this option. This will rename the job of the last activity stage.
* Repeat this for the second last job activity, then the third last etc until you have replaced all job names back to the second job activity stage.
* Import the job into your project.

This should give you the same set of job activity stages but with each one pointing at a different job and with the full set of job parameters set.

Debugging
* In parallel jobs I use the copy stage a lot to debug errors that do not indicate which stage caused the error. I create a copy of the job and start removing output stages, replacing them with a copy stage. I progressively remove stages until I locate the one with the error.

No comments:

Search 4 DataStage