DataPump ETL

DataPump, StatMap’s server-side ETL (Extract, Transform, Load) software, enables users to pull and push data between different locations, including databases and standalone file structures – meaning that you can output the data you need in the form you need it.

It provides the means for the eVO Platform to integrate your back-office system databases with the eVO Spatial Data Repository (SQL Server 2008+, PostGIS, or Oracle 10g+).

Extract

Transform

Load

DataPump is provided as part of the standard eVO Platform application server installation, enabling the Spatial Data Repository to become your enterprise spatial data warehouse, pulling data from all business systems and file servers within your organisation.  It also integrates your Spatial Data Repository with external databases and data services (e.g. WFS and JSON / GeoJSON).

The rich and powerful DataPump ETL capabilities include the following:

  • Publish DataPump ETL workflows to the eVO server from the Earthlight eVO internet client interface, either as scheduled routines which run at specified times or as routines which run in response to a trigger or event;
  • Manipulate the structure of data: change data from one format to another, apply calculations to field values, merge fields, etc.;
  • Change and transform projections: choose any one of the EPSG-defined coordinate reference systems;
  • Apply pre-built transformations, or design your own: DataPump is configurable enough for you to build workflows from scratch, or to re-use existing transformation workflows, for extracting, transforming and loading data from one system or format to another;
  • Consume and translate complex real-time data from published data source feeds, or publish data from your Spatial Data Repository.
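
As an illustration of the "merge fields" and "apply calculations" style of transformation listed above, here is a minimal Python sketch. It is not DataPump's own workflow syntax, which this page does not describe; the field names (street, town, area_sqm) and the transform function are hypothetical.

```python
def transform(record):
    """Return a copy of the record with one merged and one calculated field."""
    out = dict(record)
    # Merge two source fields into one (hypothetical field names).
    out["full_address"] = ", ".join(
        part.strip() for part in (record["street"], record["town"]) if part.strip()
    )
    # Apply a calculation to a field value: square metres to hectares.
    out["area_ha"] = round(float(record["area_sqm"]) / 10_000, 4)
    return out


rows = [{"street": " 1 High Street", "town": "Guildford ", "area_sqm": "12500"}]
transformed = [transform(row) for row in rows]
```

Each source row is left untouched and a new record is produced, which keeps the extract and load stages independent of the transformation itself.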

Gazetteers: OS AddressBase & LLPG Management

DataPump fully handles all varieties of gazetteer, including Open Gazetteers, OS AddressBase and LLPG / NLPG.

DataPump supports the management of all three Ordnance Survey AddressBase products:

  • AddressBase™
  • AddressBase Plus™
  • AddressBase Premium™

With DataPump, there is no need to worry about maintaining AddressBase and LLPG gazetteers: the process can be fully automated to keep your eVO gazetteers up-to-date, removing the need for manual intervention.

Using .csv full and update files from the Ordnance Survey, DataPump loads the raw files directly into the eVO Spatial Data Repository: either SQL Server 2008 (and later), PostGIS 2.0 (and later) or Oracle 10g (and later).

When loading both OS AddressBase and LLPG gazetteers, DataPump creates a geometry field and populates it with the geographical location of the address.
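
The idea of deriving a geometry field from an address record's coordinates can be sketched in Python as follows. This is an assumption-laden illustration rather than DataPump's implementation: the column names (UPRN, X_COORDINATE, Y_COORDINATE), the sample rows, and the choice of PostGIS-style EWKT text with SRID 27700 (British National Grid) are all assumed for the example.

```python
import csv
import io

# Hypothetical two-row extract; the column names are assumptions.
SAMPLE = """UPRN,X_COORDINATE,Y_COORDINATE
100000000001,530000.00,180000.00
100000000002,,
"""


def with_geometry(row, srid=27700):
    """Add a 'geometry' field holding an EWKT point, or None if coordinates are missing."""
    x, y = row.get("X_COORDINATE"), row.get("Y_COORDINATE")
    if not x or not y:
        return {**row, "geometry": None}
    return {**row, "geometry": f"SRID={srid};POINT({float(x)} {float(y)})"}


records = [with_geometry(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```

Rows without coordinates are kept with an empty geometry rather than dropped, so they can be reported on later instead of silently disappearing.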

Multiple Gazetteers

It is very simple and fast to use any of the AddressBase products as your enterprise gazetteer, or as one of a number of gazetteers. eVO Platform products can support one or more gazetteers – each of which published applications can use.

Geocoding

When matched with eVO Platform’s geocoding capabilities, AddressBase and the LLPG become a very powerful means of integrating all of your address-based business systems with the eVO Platform.

Single Point of Truth: Spatial Data Warehouse

The server-side DataPump ETL tools can be used to automate data flows between system databases, ensuring that data is synchronised between them and the spatial data repository database – in effect, creating an enterprise spatial data warehouse. In this way, you can provide a central ‘point of truth’, realising the full data warehouse concept – enabling analysis to be undertaken upon cross-organisation business data.

The unified view of spatial data – including all address-based systems, such as social care and educational management systems – creates a repository ripe for spatial analysis across business data, helping to identify underlying trends and patterns.

Open Standard & 3rd Party Applications

Because DataPump reads and writes to open standards, the data it imports into the spatial data repository can be used by 3rd party applications, such as MapInfo Professional, ESRI ArcGIS, and QGIS.

Similarly, exported data can be used by 3rd party systems that read OGC-compliant and other widely supported formats (e.g. .shp, .tab, .kml).

Public View

It is then simple to publish views of the business data, or summary data, to the general public in order to improve communication and enable self-service to reduce FOI requests and the time taken to respond to them.

Import / Export

DataPump can import data from, and export data to, both databases and flat files. When importing data, it works in one of three modes:

(i)   Scheduled – set to run at specified times on the application server;
(ii)  File Watcher – triggered when files change within a watched location on a file server;
(iii) Ad-hoc – run when manually triggered by an administrator.
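
The File Watcher mode can be pictured with the following Python sketch, which detects new or modified files by comparing two directory snapshots. How DataPump actually watches a location is not described here, so the polling approach and the function names (snapshot, changed_files) are assumptions made for illustration.

```python
import os
import tempfile


def snapshot(folder):
    """Map each file name in the folder to its last-modified time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(folder)
        if entry.is_file()
    }


def changed_files(before, after):
    """Return the names that are new or modified between two snapshots."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


# Demonstration in a temporary folder standing in for the watched location.
with tempfile.TemporaryDirectory() as folder:
    before = snapshot(folder)
    with open(os.path.join(folder, "update.csv"), "w") as handle:
        handle.write("UPRN\n")
    detected = changed_files(before, snapshot(folder))
```

In a long-running process the snapshot would be taken at a regular interval, and each name returned by changed_files would start an import.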

Web Feature Service (WFS)

DataPump consumes published Web Feature Services (WFS), which ensures that your spatial data warehouse is synchronised with externally published datasets – such as the WFS feeds published by public bodies, e.g. Natural England and the British Geological Survey (BGS).
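
For readers unfamiliar with WFS, a feed is typically consumed by sending a GetFeature request to the service. The Python sketch below builds such a request URL using the standard WFS 2.0.0 key-value parameters; the endpoint (https://example.org/wfs) and feature type name (ns:sites) are placeholders, and this is not a description of how DataPump issues its requests.

```python
from urllib.parse import urlencode


def getfeature_url(base_url, type_name, count=None):
    """Build a WFS 2.0.0 GetFeature request URL using key-value parameters."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
    }
    if count is not None:
        params["count"] = count  # limit the number of features returned
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and feature type name.
url = getfeature_url("https://example.org/wfs", "ns:sites", count=100)
```

The response to such a request is a feature collection, which can then be transformed and loaded like any other source.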

Gazetteers

For public sector organisations, DataPump provides you with the ability to consume Full and Change Only Updates from your local land and property gazetteer (LLPG) management software, using the standard DTF exchange formats. eVO Platform products – including Earthlight and Aurora – use the designated gazetteers as the basis for their lightning fast address search and query facilities.


Data Quality Assessment / Quality Control

From a business perspective, high quality data can be relied upon, enabling decisions to be made with confidence, reducing the financial risk of using faulty data and optimising organisational performance.

From a management perspective, it promotes effective data stewardship, drives increased usage of GIS and maximises productivity. For Knowledge Workers, it gives them more confidence in using GIS to make decisions they would otherwise not be able to make, and convinces them of its value in their day-to-day work.

eVO Platform implements QA/QC at three levels:

  •  Rules-based validation;
  •  Interactive tools;
  •  Error tracking.

Maintaining and assessing data quality and controlling what is stored within the Spatial Data Repository is vital to ensuring the integrity of your enterprise data.  Open Geospatial Consortium (OGC) standards for geometry and geographical representation are vitally important in ensuring that data can be shared freely between software and systems.

Whilst the sophisticated and powerful data editing and maintenance capabilities of eVO Platform products ensure that data is kept OGC-compliant, data from other sources can contain errors and can fall short of the quality standards required by organisations.

Rule-based Quality Controls

DataPump ETL provides Quality Assurance and Control reporting capabilities to ensure that the quality of your enterprise data is continually checked. DataPump reports upon records whose geometry is not compliant, or whose business attribute values fall outside the ranges which you have set for each dataset within your DataPump QA/QC checking routines.

As and when external datasets are imported, the QA/QC routines in DataPump can be triggered to run immediately post importation, or can be scheduled to run at chosen intervals.  These generate text reports which can be used to identify and correct the errors inherent in the imported data.
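
A rule-based attribute check that produces a text report might look like the following Python sketch. The rule format, the field name (diameter_mm), its allowed range, and the report wording are all hypothetical and are not taken from DataPump.

```python
def check_ranges(records, rules):
    """Yield one report line per value that is missing or outside its allowed range."""
    for record in records:
        for field, (low, high) in rules.items():
            value = record.get(field)
            if value is None:
                yield f"{record['id']}: {field} is missing"
            elif not low <= value <= high:
                yield f"{record['id']}: {field}={value} outside {low}..{high}"


# Hypothetical rule: field name mapped to (minimum, maximum).
rules = {"diameter_mm": (15, 1200)}
records = [
    {"id": "P1", "diameter_mm": 100},
    {"id": "P2", "diameter_mm": 5000},
    {"id": "P3"},
]
report = "\n".join(check_ranges(records, rules))
```

Each line of the report names the offending record, so the errors can be located and corrected in the source data.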

DataPump undertakes all checking and QA/QC for data either within or being loaded into the Spatial Data Repository database. The broad areas of Quality Assurance and Quality Control cover Spatial, Geometry and Attribute value domains.

Business Rules are implemented, and the data is validated against these rules.  The business rules can be used to enforce a wide range of QA/QC checks, including enforcement of industry standards, such as those published by the Open Geospatial Consortium (OGC).

Subject Matter / Domain Business Rules

Industry / Domain experts know and understand what rules to set and enforce in order to ensure that their data accurately records what is in place.  As an example, water supply companies know what diameter and types of supply pipes can join together and what types of connectors can be used to connect them.  QA/QC Business Rules can be utilised in automated checks to ensure that connectors used are appropriate for the pipe work types being connected.
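
The pipe and connector example can be expressed as a small lookup-based check, sketched here in Python. The rule table is invented purely for illustration (it is not engineering guidance), and the record layout and function name are assumptions rather than DataPump's rule syntax.

```python
# Hypothetical rule table: connector type -> pipe materials it may join.
ALLOWED = {
    "compression": {"copper", "mdpe"},
    "electrofusion": {"mdpe", "hdpe"},
}


def connector_errors(joints):
    """Return a message for each joint whose connector does not suit both pipes."""
    errors = []
    for joint in joints:
        allowed = ALLOWED.get(joint["connector"], set())
        unsuitable = sorted({joint["pipe_a"], joint["pipe_b"]} - allowed)
        if unsuitable:
            errors.append(
                f"{joint['id']}: {joint['connector']} not valid for {', '.join(unsuitable)}"
            )
    return errors


joints = [
    {"id": "J1", "connector": "compression", "pipe_a": "copper", "pipe_b": "mdpe"},
    {"id": "J2", "connector": "electrofusion", "pipe_a": "copper", "pipe_b": "mdpe"},
]
errors = connector_errors(joints)
```

Keeping the rules in a table of this kind means a domain expert can change what is permitted without touching the checking logic.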

eVO Platform QA/QC enables industry and domain experts to implement their own rules to ensure that their data is of a high and dependable quality.

Rule Enforcement and Topology

To complement the powerful and highly configurable Network Connectivity within Earthlight Galactic, DataPump offers the ability to run tasks which can be set to split lines where the implemented rules state that edges should be joined – via a junction feature – where they cross, irrespective of whether there are coincident vertices or end nodes on each of the edges.
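
To make the line-splitting idea concrete, the Python sketch below finds the point where two straight edges cross and splits both edges there, returning the crossing as a junction. It handles only two-point edges that cross in their interiors; it is a geometric illustration under those assumptions, not DataPump's algorithm, and the function names are hypothetical.

```python
def crossing_point(a, b, c, d):
    """Return the point where segment a-b properly crosses segment c-d, else None."""
    r = (b[0] - a[0], b[1] - a[1])
    s = (d[0] - c[0], d[1] - c[1])
    denom = r[0] * s[1] - r[1] * s[0]
    if denom == 0:
        return None  # parallel or collinear edges never properly cross
    qp = (c[0] - a[0], c[1] - a[1])
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom  # position along a-b
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom  # position along c-d
    if 0 < t < 1 and 0 < u < 1:
        return (a[0] + t * r[0], a[1] + t * r[1])
    return None


def split_at_crossing(edge1, edge2):
    """Split two two-point edges at their crossing; return (junction, edges)."""
    point = crossing_point(*edge1, *edge2)
    if point is None:
        return None, [edge1, edge2]
    return point, [
        (edge1[0], point), (point, edge1[1]),
        (edge2[0], point), (point, edge2[1]),
    ]


junction, edges = split_at_crossing(((0, 0), (2, 2)), ((0, 2), (2, 0)))
```

Here the two diagonal edges cross at (1.0, 1.0), so the result is one junction and four shorter edges that all meet at it. Neither input edge has a vertex at the crossing, which is the case the text describes.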

This enables topology for network data sets to be created and then Quality Assessed via automated or manually invoked processes.