Wednesday, February 14, 2007

ETL design and methods revisited

The theme is ETL tools and ETL design patterns:

I. ETL methodology
Current data warehouse and ETL architects and developers have devised numerous theories and practices covering ETL design and support. Kimball offers an excellent analysis of the ETL process from the architect's perspective in his text, The Data Warehouse ETL Toolkit. In a nutshell, load performance issues can be addressed in two basic ways (both are sketched in the example after this list):

1. Drop or disable primary key constraints and indexes before the load and transformations run, then rebuild them once the load completes.
2. Separate UPDATE and INSERT transformations so each can be batched and optimized on its own.
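To make these two techniques concrete, here is a minimal sketch using Python's built-in sqlite3 module; the fact table, index name, and staged rows are hypothetical, and a production warehouse on Oracle or SQL Server would use that platform's own index-maintenance and bulk-load facilities instead.

```python
import sqlite3

# Hypothetical warehouse table, index, and staged rows, for illustration only.
conn = sqlite3.connect("warehouse.db")
cur = conn.cursor()

cur.execute("""CREATE TABLE IF NOT EXISTS fact_sales (
                   sale_id INTEGER PRIMARY KEY,
                   cust_id INTEGER,
                   amount  REAL)""")
cur.execute("CREATE INDEX IF NOT EXISTS ix_fact_sales_cust ON fact_sales (cust_id)")

staged = [(1, 100, 25.00), (2, 101, 40.00), (3, 102, 30.00)]  # (sale_id, cust_id, amount)

# 1. Drop the secondary index before the load so rows are not indexed one at a time,
#    then rebuild it in a single pass after the load.
cur.execute("DROP INDEX IF EXISTS ix_fact_sales_cust")

# 2. Split the staged rows into UPDATEs (keys already in the fact table) and INSERTs,
#    so each statement type can be batched with executemany.
existing = {row[0] for row in cur.execute("SELECT sale_id FROM fact_sales")}
updates = [(cust, amt, sid) for sid, cust, amt in staged if sid in existing]
inserts = [(sid, cust, amt) for sid, cust, amt in staged if sid not in existing]

cur.executemany("UPDATE fact_sales SET cust_id = ?, amount = ? WHERE sale_id = ?", updates)
cur.executemany("INSERT INTO fact_sales (sale_id, cust_id, amount) VALUES (?, ?, ?)", inserts)

# Rebuild the index once, after the bulk work is done.
cur.execute("CREATE INDEX ix_fact_sales_cust ON fact_sales (cust_id)")
conn.commit()
conn.close()
```

On the first run everything lands as an INSERT; rerunning the script against the same database file exercises the UPDATE branch instead.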


To build on the last discussion concerning ETL design methods and tools, I would like to analyze the various offerings, from home-grown scripts to vendor tools. The major players in the current ETL market include:

Informatica
GoldenGate
Business Objects
MicroStrategy

In my last discourse on ETL design, I played devil's advocate and offered the pros and cons of home-grown ETL (i.e., roll your own) versus implementation with a robust ETL tool such as Informatica PowerCenter. Given the reality of heterogeneous environments with many different source systems, home-grown ETL is not realistic unless you either have a very large and highly skilled staff of developers and DBAs, or you run a single platform such as Oracle on UNIX or Microsoft SQL Server on Windows (the latter ships with a built-in ETL tool, DTS). Most shops would actually be better off using a modern, time-tested, and robust tool such as Informatica, Business Objects, or GoldenGate.
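For contrast, the kind of home-grown ETL being traded off here is sketched below, assuming a hypothetical flat-file source and staging table; a real version must also handle logging, restartability, rejected rows, and a separate variant for every source system, which is exactly where the large, skilled staff becomes necessary.

```python
import csv
import sqlite3

# Hypothetical source file and target staging table; names are illustrative only.
SOURCE_FILE = "daily_sales.csv"   # extract: flat file dropped by the source system
TARGET_DB = "warehouse.db"

def transform(row):
    """Basic cleansing: cast types, trim and normalize strings, strip currency symbols."""
    return (int(row["sale_id"]),
            row["customer"].strip().upper(),
            float(row["amount"].replace("$", "")))

def run():
    conn = sqlite3.connect(TARGET_DB)
    conn.execute("""CREATE TABLE IF NOT EXISTS stg_sales (
                        sale_id  INTEGER,
                        customer TEXT,
                        amount   REAL)""")
    # Extract and transform the whole file, then load it in one batch.
    with open(SOURCE_FILE, newline="") as f:
        rows = [transform(r) for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO stg_sales (sale_id, customer, amount) VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    run()
```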
