Spatial data warehouses are becoming more common as government agencies, municipalities, utilities, telcos and other spatial data users start to share their data. This paper illustrates some of the issues that arise when undertaking data replication and data sharing.
Safe Software Inc. Suite 2017 7445 - 132nd St. Surrey, B.C., CANADA V3W 1J8 Telephone: 604-501-9985 Fax: 604-501-9965
Data Replication and Data Sharing - Integrating Heterogeneous Spatial Databases

Mark Stoakes and Katherine Irwin
Professional Services, Safe Software Inc.

Abstract

Spatial data warehouses are becoming more common as government agencies, municipalities, utilities, telcos and other spatial data users start to share their data. Data sharing is driven by the need to maintain more accurate and up-to-date spatial databases, but at the same time reduce data acquisition and maintenance costs. In other cases, organizations may maintain identical databases at different locations in order to reduce network loads and improve response times for the data users who are spread over a wide area. In this case, data replication is used to ensure all users are working from identical and most current data. This paper illustrates some of the issues that arise when undertaking data replication and data sharing.

Introduction

Location and spatial data is becoming a core part of business databases and decision-making. This growth in the use of spatial data has increased the need to share data with other organizations. Data sharing is driven by the need to maintain more accurate and up-to-date spatial databases, but at the same time reduce data acquisition and maintenance costs. In other cases, organizations may maintain identical databases at different locations in order to reduce network loads and improve response times for the data users who are spread over a wide area. In these cases, data replication is used to ensure all users are working from the most current data. Data may also be shared by linking several heterogeneous spatial databases through a common data access portal over a LAN or intranet. In all cases, the goal is to improve the accessibility of the spatial data, improve data quality and reduce the cost of maintaining the datasets involved.

The three broad approaches to sharing data are:

- Data Sharing. Data sharing is a data warehousing approach to making data available to a wider range of users.
Data is acquired from several data owners and loaded into a centralized warehouse. Data can then be distributed to members of the data-sharing consortium through web-based data viewers, or delivered in different formats to the various data users.

- Data Replication. Data replication is generally used where large numbers of data users, spread over a wide geographic area, require real-time access to the same data. To reduce network loads and improve data access performance, the data is replicated over several databases at different locations. The databases are synchronized on a regular basis, usually nightly.

- Distributed Data Access. In this case the data warehouse simply acts as a node for data distribution. Data is held on the data owner's server, and the data warehouse acts as a live link to the data provider's datasets across a LAN, WAN or intranet. Since the different databases may be in different formats (ESRI ArcSDE, Oracle Spatial, etc.), the spatial ETL tool must be capable of reading all the formats to be accessed and served. There is no need to maintain multiple copies of the data, as is the case in data replication and data sharing.

This paper illustrates some of the issues that arise when undertaking data replication, data sharing or distributed data access.

The Challenge of Sharing Data and Replication

Once organizations agree to share or replicate their spatial data, they face the challenge of maintaining up-to-date datasets. Spatial data changes continuously as new infrastructure and subdivisions are built or more accurate data is collected. To maintain up-to-date databases, the various data "owners" must exchange their most current datasets with their data-sharing partners. This can be done in one of two ways:

- Complete data load. This is the most straightforward approach. The current dataset is removed and completely replaced with the new dataset. However, this approach is often impractical due to the volume of data, which may be difficult to distribu...
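
The complete data load approach described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the method used by any particular spatial ETL product: SQLite stands in for the spatial database, and the `parcels` table, its columns, the WKT geometry strings and the `complete_load` helper are all hypothetical names chosen for the example. The one design point it shows is that the removal and the reload run inside a single transaction, so a failed load leaves the previous dataset in place.

```python
import sqlite3


def complete_load(conn, rows):
    """Replace the entire hypothetical 'parcels' table with `rows`.

    The delete and the inserts share one transaction: `with conn` commits
    if the block succeeds and rolls back if any statement raises.
    """
    with conn:
        conn.execute("DELETE FROM parcels")
        conn.executemany(
            "INSERT INTO parcels (feature_id, name, wkt) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]


# In-memory database standing in for the target spatial database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (feature_id INTEGER PRIMARY KEY, name TEXT, wkt TEXT)"
)

# Initial dataset received from the data owner.
complete_load(conn, [
    (1, "Lot A", "POINT (0 0)"),
    (2, "Lot B", "POINT (1 1)"),
])

# A newer dataset arrives: the old one is removed and replaced in full.
loaded = complete_load(conn, [
    (1, "Lot A", "POINT (0 0)"),
    (3, "Lot C", "POINT (2 2)"),
    (4, "Lot D", "POINT (3 3)"),
])
print(loaded, "features now in the table")
```

Every load rewrites every feature, including those that did not change, so the cost of this sketch grows with the size of the whole dataset rather than with the size of the update.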