Sunday, June 29, 2008

Tutorials for DataStage

1. Create custom operators for WebSphere DataStage

This is my favourite tutorial because this is a misunderstood and underutilised part of the product. It’s written by Blayne Chard, a software engineer from IBM. Unlike the other tutorials on this page, this one requires an IBM ID to access. If you are like me, it usually takes a few goes before you can remember your login!

In this tutorial you learn:

1. How to write a simple DataStage operator (see the sketch after this list)
2. How to set up the development environment to compile and run a DataStage operator
3. The basics of the Orchestrate Shell (OSH) scripting language for DataStage jobs
4. How to load your operator into the DataStage Designer so you can use it on any job you create
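To give a flavour of the first point, here is a rough sketch of the shape a simple pass-through operator takes, based on the publicly documented APT_Operator interface. The class name, interface schemas and osh name below are my own illustrative choices rather than the tutorial’s code, so treat it as a sketch, not a drop-in implementation.

#include <apt_framework/orchestrate.h>

// A do-nothing parallel operator that copies records from its single input
// to its single output. The interesting parts are describeOperator() (the
// operator's contract with the framework) and runLocally() (the
// per-partition record loop).
class HelloWorldOp : public APT_Operator
{
    APT_DECLARE_RTTI(HelloWorldOp);
    APT_DECLARE_PERSISTENT(HelloWorldOp);
public:
    HelloWorldOp() {}
protected:
    virtual APT_Status describeOperator();
    virtual APT_Status runLocally();
    virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
                                           APT_Operator::InitializeContext context);
};

APT_IMPLEMENT_RTTI_ONEBASE(HelloWorldOp, APT_Operator);
APT_IMPLEMENT_PERSISTENT(HelloWorldOp);
// Give the operator an osh name so the engine can find it; the empty
// argument-description string is an assumption (this sketch takes no options).
APT_DEFINE_OSH_NAME(HelloWorldOp, helloWorld, APT_UString("{}"));

void HelloWorldOp::serialize(APT_Archive &archive, APT_UInt8)
{
    // Nothing to serialize - the operator carries no member state.
}

APT_Status HelloWorldOp::initializeFromArgs_(const APT_PropertyList &args,
                                             APT_Operator::InitializeContext context)
{
    return APT_StatusOk;   // no operator options to process in this sketch
}

APT_Status HelloWorldOp::describeOperator()
{
    setKind(APT_Operator::eParallel);                // run on every partition
    setInputDataSets(1);
    setOutputDataSets(1);
    setInputInterfaceSchema("record (inRec:*)", 0);  // accept any record layout
    setOutputInterfaceSchema("record (outRec:*)", 0);
    declareTransfer("inRec", "outRec", 0, 0);        // pass whole records through
    return APT_StatusOk;
}

APT_Status HelloWorldOp::runLocally()
{
    APT_InputCursor inCur;
    APT_OutputCursor outCur;
    setupInputCursor(&inCur, 0);
    setupOutputCursor(&outCur, 0);
    while (inCur.getRecord())   // one record at a time on this partition
    {
        transfer(0);            // copy the input record to the output record
        outCur.putRecord();
    }
    return APT_StatusOk;
}

Once something like this compiles into a shared library the parallel engine can load, the operator can be called by name from an osh script, and the tutorial goes on to show how to register it so it appears in the Designer.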

The DataStage Transformer lets you call a custom routine instead of the standard functions on the right-click menu. This lets you take code that looks complex in the Transformer and put it into a module to simplify testing and re-use.

Custom routines used to be huge in DataStage; I’ve been to sites with hundreds of them. But they were for DataStage Server Edition, which supported routines written in UniVerse BASIC via a routine editor in the DataStage Manager tool. They were easy to write, easy to test, and easy to re-use between jobs.

DataStage Enterprise Edition made things a bit harder. Custom routines are C++ components built and compiled externally to DataStage. There is a lot more fiddling to get the routines recognised by a DataStage Transformer, and C++ tends to be a more challenging language than BASIC.
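To make that concrete: a parallel Transformer routine is just an ordinary function with C linkage, compiled outside DataStage into an object file and then registered in the Designer as a Parallel Routine (external function) that points at that object. A minimal sketch follows; the function name and business logic are invented for illustration.

// add_gst.cpp - hypothetical external function for a parallel Transformer routine.
// Compile it with the platform C++ compiler into an object file, register the
// object as a Parallel Routine of type External Function, and the function
// becomes available in the Transformer expression editor.
extern "C" long addGst(long amountCents)
{
    // Illustrative only: add 10% tax to an amount held in cents.
    return amountCents + amountCents / 10;
}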

This is a comprehensive tutorial taking you through the steps of creating and using a routine. Fantastic.

Compile your operator

Compiling your operator is a straightforward process once the environment has been set up. To compile the operator, make sure that you have run setup.bat in the current command window, then type the compile commands given in the tutorial.




2. Modify Stage Video Tutorial

The Modify Stage is a tricky parallel stage that converts fields from one metadata definition to another. It does not have any of the fancy GUI help of the Transformer, such as right-click menus or syntax checking. You need to look up the functions in the Parallel Job Developers Guide, guess at the syntax, and use trial and error half a dozen times until you get it working!

This tutorial saves you a lot of the trial and error.

The tutorial does require premium membership to the dsxchange at US$99 a year, but if it saves you a couple of days of work that should be enough to convince your employer to splash out for membership.

3. WebSphere DataStage Parallel Job Tutorial Version 8

The mother of all DataStage tutorials - this is the one that comes on the Information Server installation CDs (or is that DVDs? Or Blu-rays?). It’s available from the IBM publications centre, and you can download it free of charge as a 1.05MB PDF file.

The IBM WebSphere DataStage Parallel Job Tutorial demonstrates how to design and run IBM WebSphere DataStage® parallel jobs.

The exercises, sample data, and sample import files for the parallel job tutorial are in the TutorialData\DataStage\parallel_tutorial folder on the suite CD, DVD, or downloaded installation image.

It’s a simple tutorial covering opening and running a job, designing a job, using the Transformer, loading a target table, and processing in parallel. Not of much interest to experienced DataStage users but useful to a newcomer.

4. Configure DB2 remote connectivity with WebSphere DataStage Enterprise Edition

I also reviewed this tutorial previously in DataStage Tip: Extracting database data 250% faster.

As with XML, DB2 connectivity can be a tricky beast to get running. Remote DB2 connections are even trickier. This tutorial will save you a lot of trouble and swearing as you get the wee DB2 beastie running.

5. Transform and integrate data using WebSphere DataStage XML and Web services packs

Getting started with XML processing in DataStage usually involves a lot of fumbling around before you even get a job running. By the end of it you are usually happy never to see another XML file as long as you live. The trick is that DataStage needs to read and validate the entire XML document before it starts processing any part of it, because it needs to know how to flatten it into relational data. This is different to database and sequential file sources, where DataStage reads and validates one row at a time.

As you can imagine, this initial read and validation of the entire file puts a limit on the volume DataStage can handle. It can process a very large number of small XML files, but not one very large XML file; the platform will eventually hit a RAM or resource limit.

This tutorial is invaluable for getting started with XML files in DataStage, covering importing XML metadata and using XML as a source or target.

DataStage XML Job Example

6. A flexible data integration architecture using WebSphere DataStage and WebSphere Federation Server

IBM wants us to combine the different products on the Information Server for a best-of-breed data integration platform. One of the themes of the new analytical warehouse concepts, such as the IBM Dynamic Warehouse, is the use of unstructured or semi-structured text. DataStage and the Federation Server are a good combination for bringing this into a warehouse.

IBM are calling this a T-ETL architecture, where the Federation Server acts as a data pre-processor.

The T-ETL architecture proposed uses federation to join, aggregate, and filter data before it enters WebSphere DataStage, with WebSphere DataStage using its parallel engine to perform more complex transformations and the maintenance of the target.

Federation DataStage Job Example

This is a generous tutorial with a lot of use cases for T-ETL, ETL and ELT.


The strength of WebSphere DataStage as a data consolidation tool lies in its flexibility -- it is able to support the many different consolidation scenarios and flavors of ETL, including ETL, ELT, and TEL (against a single data source). Combining WebSphere DataStage with WebSphere Federation Server opens up a whole new area of consolidation scenarios, namely T-ETL, against homogenous or heterogeneous data sources.

7. Access application data using WebSphere Federation Server and WebSphere DataStage

I reviewed this tutorial in a previous blog post: DataStage Tip: Free IBM DataStage SAP Tutorial

As with the last tutorial, it uses a combination of DataStage and the Federation Server to access atypical enterprise data, in this case SAP.

