Integrating data and applications throughout the enterprise and presenting them in a unified view is a complex proposition. Not only are there broad disparities in technologies, data structures, and application functionality, but there are also fundamental differences in integration architectures. Many organizations address these diverse needs with a broad palette of tools and technologies, resulting in disjointed integration projects with no leverage or unity between them. This white paper provides a technical overview of a data integration tool that meets the performance, flexibility, productivity, modularity and hot-pluggability requirements of an integration platform.
Integrating data and applications throughout the enterprise, and presenting them in a unified view is a complex proposition. Not only are there broad disparities in technologies, data structures, and application functionality, but there are also fundamental differences in integration architectures. Some integration needs are Data Oriented, especially those involving large data volumes. Other integration projects lend themselves to an Event Driven Architecture (EDA) or a Service Oriented Architecture (SOA), for asynchronous or synchronous integration.
Many organizations address these diverse needs with a broad palette of tools and technologies, resulting in disjointed integration projects with no leverage or unity between them. These tools do not meet all performance, flexibility, and modularity requirements.
Oracle Data Integrator Enterprise Edition (ODI-EE) features an active integration platform that includes all styles of data integration: data-based, event-based and service-based. Capable of transforming large volumes of data efficiently, processing events in real time through its advanced Changed Data Capture (CDC) capability, or providing data services to the Oracle SOA Suite, ODI-EE unifies silos of integration. It also provides robust data integrity controls features, assuring the consistency and correctness of data. With powerful core differentiators - heterogeneous E-LT, Declarative Design and Knowledge Modules - ODI-EE meets the performance, flexibility, productivity, modularity and hot-pluggability requirements of an integration platform.
Traditional ETL tools operate by first Extracting the data from various sources, Transforming the data on a proprietary, middle-tier ETL engine, and then Loading the transformed data onto the target data warehouse or integration server. Hence the term ?ETL? represents both the names and the order of the operations performed.
The data transformation step of the ETL process is by far the most compute-intensive, and is performed entirely by the proprietary ETL engine on a dedicated server. The ETL engine performs data transformations (and sometimes data quality checks) on a row-by-row basis, and hence, can easily become the bottleneck in the overall process. In addition, the data must be moved over the network twice - once between the sources and the ETL server, and again between the ETL server and the target data warehouse. Moreover, if one wants to ensure referential integrity by comparing data flow references against values from the target data warehouse, the
Oracle Data Integrator Technical Overview Page 2 referenced data must be downloaded from the target to the engine, thus further increasing network traffic, download time, and leading to additional performance issues.
Let's consider, for example, how a traditional ETL job would look up values from the target database to enrich data coming from source systems. To perform such a job, a traditional ETL tool could be used in one of the following three ways: ? Load look-up tables into memory: The entire look-up table is retrieved from the target server and loaded into the engine's memory. Matching (or joining) this look-up data with source records is done in memory before the resulting transformed data is written back to the target server. If the look-up table is large, the operation will require a large amount of memory and a long time to download its data and re-index it in the engine. ? Perform row-by-row look-ups ?on the fly?: For every row, the ETL engine sends a query to the look-up table located on the target server. The query returns a single row that is matched (or joined) to the current row of the flow. If the look-up table contains, for example, 500,000 rows, the ETL engine will send 500,000 queries. This will dramatically slow down the data integration process and add significant overhead to the target system. ? Use manual coding within the ETL job: Use the ETL engine only for loading source data to the target RDBMS and manually write SQL code to join this data to the target look-up table. This raises the question: why buy an ETL tool that requires man... [download for more]