GIS and Spatial ETL: Complementary Spatial Data Technologies

Safe Software
By : Safe Software
INFORMATION
Published : Nov 07, 2005
Length : 4
Type : White Paper
Overview :
Geography, or GIS for that matter, has never been closer to our lives. GIS interoperability, rightly called the holy grail of GIS, has never been so widely discussed within the geospatial community. This paper defines the function of spatial ETL and expands on the role it plays in, and the impact it has on, spatial data processes and GIS systems.
There are two complementary technologies within the geographic spatial data community: Geographic Information Systems (GIS) that provide the backbone for spatial data processing, and Spatial Extract, Transform and Load (spatial ETL, or simply SETL) that populates and makes data available to GIS.

While GIS has been around for several decades, spatial ETL is only now coming of age, driven by the unprecedented volume of GIS data being collected and distributed and by the multiple formats involved. Its maturity is being hastened as GIS moves out of the hands of specialists doing project-based work and into those of the Information Systems (IS) department for business decision support. Before any GIS can actually use data, however, it must be interoperable with that data. Interoperability can be achieved either through format-to-format translation or through direct read capabilities. The role of spatial ETL is to make spatial data available to GIS applications. To understand the complementary role of these twin technologies, we must first examine the non-spatial world of Management Information Systems (MIS) and the role ETL plays in that domain.

MIS and ETL
Within the non-spatial world of MIS, ETL tools are now produced by a mature industry comprising many different players: Informatica, Pervasive Software (formerly Data Junction), IBM and Oracle, to name but four. Unsurprisingly, ETL tools followed the advent of MIS and are best regarded as "information pipes" that connect two systems. Extending this analogy, these "information pipes" can be used in two ways:

- Format translation: The "information pipe" is used to move data from one or more source data tanks to one or more destination data tanks. A data tank is simply a dataset in a particular format or system. Once the destination data tanks are full, the target system is run.
- Direct access: The "information pipe" is used to move data directly from one or more source data tanks to the system that needs the data.

Both of the above are necessary: there are times when direct access is essential or desirable, and other times when it is not. The functions of ETL tools include, but are not restricted to, the following:

- Supporting legacy applications: Often, when an organization moves to new technology, older legacy applications continue to be supported instead of being immediately replaced. The task of the ETL application is to move data back and forth between the old and new systems, ensuring that the data is structured and presented as each system requires.
- Initial data loading: When an organization moves to a newly integrated system, its data is often scattered across multiple older systems. The ETL tool's job is to migrate this data into the new system.
- Multi-vendor solutions: Many organizations have solutions that cross vendor boundaries or have requirements for data to be shared between systems.
- Data sharing: Some organizations may need to share data with other organizations. These organizations may be suppliers, customers or business partners. ETL tools ensure that the data sharing between these different systems occurs easily and accurately.
- Quality Assurance: This is not typically thought of as an ETL function, but it is often easier to have the ETL tool identify data issues before the data is moved to the new system.
- Direct Read: Providing a unified interface so that applications can directly access data from other systems can have great impact. Here, the "T" (Transform) in ETL can be immensely valuable, as applications typically require that data be presented in a specific schema (or view) before they can use it.

As stated earlier, MIS/database systems form the destination for data, and ETL is nothing more than the pipe through which it is moved.

The 4 Rs
The function of ETL can be summed up with the 4 Rs: getting the right data to the right systems, in the right structure, at the right time. Looking at each in turn:
- Right Data: The ETL tool must be able to access data from a wide variety of systems. Indeed, retrieving the right data might require getting data from multiple systems to satisfy a single ETL operation.
- Right Systems: The ETL tool must be able to write data to many different systems, since a single ETL operation might require that multiple systems be updated.
- Right Structure: The ETL tool must be able to restructure the data so that when it is provided to the destination system, it is directly usable by the applications that require it. Merely dumping the data into the "right system" and then requiring applications on that system to run for the sole purpose of data preparation is not considered ETL. ETL tools must be able to perform operations like schema mapping, data calculations, and other types of data restructuring and selection.
- Right Time: The ETL tool must be efficient and able to run in batch mode or as part of some scheduled/triggered operation. This is where ETL becomes part of a system instead of being used merely to migrate data from one system to another. For some systems, direct access is the only way to provide data at the right time, since the system needs to see the foreign system live.
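
The 4 Rs can be illustrated with a minimal sketch in Python. It is not taken from any particular ETL product: the sample data, the field names (parcel_id, AREA_M2 and so on) and the function names are all hypothetical, chosen only to show data being read from two sources, restructured through schema mapping and a calculation, and written to two destinations by a single operation that a scheduler could invoke.

```python
import csv
import io
import json

# Hypothetical source "data tanks": the same parcels described in two formats.
PARCELS_CSV = "parcel_id,width_m,depth_m\nP1,10,20\nP2,15,30\n"
OWNERS_JSON = '[{"id": "P1", "owner": "Ada"}, {"id": "P2", "owner": "Grace"}]'


def extract():
    """Right data: read from more than one source system."""
    parcels = list(csv.DictReader(io.StringIO(PARCELS_CSV)))
    owners = {o["id"]: o["owner"] for o in json.loads(OWNERS_JSON)}
    return parcels, owners


def transform(parcels, owners):
    """Right structure: schema mapping, a calculation, and a join."""
    rows = []
    for p in parcels:
        rows.append({
            "PARCEL": p["parcel_id"],                              # renamed field
            "AREA_M2": float(p["width_m"]) * float(p["depth_m"]),  # calculated field
            "OWNER": owners.get(p["parcel_id"], "UNKNOWN"),        # from second source
        })
    return rows


def load(rows):
    """Right systems: write the same rows to two destination formats."""
    as_json = json.dumps(rows)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["PARCEL", "AREA_M2", "OWNER"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return as_json, buf.getvalue()


def run_etl():
    """Right time: one callable that a batch job or trigger could invoke."""
    parcels, owners = extract()
    return load(transform(parcels, owners))


if __name__ == "__main__":
    json_out, csv_out = run_etl()
    print(json_out)
    print(csv_out)
```

In this sketch the destinations are in-memory strings rather than real systems; in practice each of the three stages would be bound to whatever readers and writers the source and target systems require.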