Introduction
Reaching the full potential of GIS requires you to manage the different GIS resources inside your organization well. These include GIS data, spatial databases, and the applications where GIS data is visualized and analyzed. Beyond a strategy for managing these resources, GIS data management requires a clear plan for continuously monitoring the quality of the GIS data itself, so that it meets user expectations and helps your organization reach its goals.
This article is a best practices guide for GIS data management. We’ll explain what makes GIS data management hard and how to avoid common pitfalls with proven strategies that will result in better decision-making and efficiency in your organization.
GIS data management challenges
GIS data management is not an easy task:
- GIS technology, data, and user requirements inside your organization may change over time.
- New GIS data technology is released continuously in the form of new data formats, applications, databases, and the platforms they’re running on.
- Changing technology means users will have to change their habits in consuming GIS data and applications, and it’s not always clear beforehand who will win the battle between advancing technology and clinging to habits from the past.
It’s the task of a GIS data manager to facilitate new ways of working with data once it’s become clear that change is inevitable.
At the same time, GIS data managers have to listen to what their users want and accommodate their requests to make GIS a success. If GIS data and applications are not used, GIS adds no value: it becomes an expense and will inevitably disappear.
As if that’s not enough, GIS software and data are part of an organization’s overall IT infrastructure, which means that managing GIS separately from other IT is not always possible.
New data legislation might also go into effect, obliging your organization to conform to new data or quality standards. This will most likely mean adjusting your current data management strategy.
💡 Use accurate location data in your GIS Data Management strategy. We offer the most comprehensive and up-to-date postal code data worldwide, ensuring precise and reliable information. Browse GeoPostcodes datasets for free and download a sample here.
GIS Data Management done the right way
There are several practices for data management that enhance decision-making and improve efficiency in your organization:
Establish a GIS data governance framework in your organization
As a GIS data manager, it’s your job to maintain data integrity. End users of GIS data require that the data is current, accurate, and trustworthy. GIS is a tool that underpins far-reaching spatial policies, such as spatial planning or opening a new business location. It is also used to understand how areas change over time through data stored in spatial databases or image collections.
End users of GIS data shouldn’t need to worry about data quality in their daily work. They expect their data to comply with existing quality standards and to come with metadata describing what the data represents, how it was collected, and when. In short, high-quality GIS data is a business-critical asset, and its value depends largely on how it is managed during its lifecycle.
The importance of defining data standards, schemas, and policies
Managing GIS data is easier with a common framework that defines its underlying structure, so that data automatically conforms to quality standards upon creation. Data goes through different stages, from its initial creation to its archiving or deletion at the end of its existence. Data management covers this entire so-called data lifecycle, so data has to conform to the quality standards your end users require from the very beginning.
Automating data quality checks
Many standards can be automated so that software takes care of them. Database schemas, for example, prescribe how data is stored upon creation. Such schemas can include smart checks that prevent errors, double entries, or unrealistic values: setting minimum and maximum values for a numeric field rejects negative or implausibly large values.
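As a minimal sketch of such schema-level checks, the example below uses Python’s built-in `sqlite3` module. The `survey_points` table, its columns, and the elevation range are hypothetical choices for illustration; server databases such as PostgreSQL/PostGIS support the same kind of `CHECK` and `UNIQUE` constraints, though their spatial column types and syntax details differ.

```python
import sqlite3

# Hypothetical table: every constraint below is enforced by the database itself,
# so any application that writes to it inherits the same quality rules.
SCHEMA = """
CREATE TABLE survey_points (
    point_id    TEXT NOT NULL UNIQUE,                              -- no double entries
    longitude   REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180),
    latitude    REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    elevation_m REAL CHECK (elevation_m BETWEEN -500 AND 9000)     -- assumed plausible range
);
"""


def try_insert(conn, row):
    """Insert one row; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO survey_points VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

print(try_insert(conn, ("P1", 4.35, 50.85, 60.0)))     # valid point: accepted
print(try_insert(conn, ("P1", 4.36, 50.86, 61.0)))     # duplicate ID: rejected
print(try_insert(conn, ("P2", 200.0, 50.85, 60.0)))    # longitude out of range: rejected
print(try_insert(conn, ("P3", 4.35, 50.85, -9999.0)))  # -9999 "no data" sentinel: rejected
```

Note that in SQLite a `CHECK` constraint passes when the value is `NULL`, so `elevation_m` can still be left empty; add `NOT NULL` if the field is mandatory.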
However, software automation has its limits. Focusing solely on data as an entity by itself misses the point: how your users interact with it and how it meets their needs should be key. Ask yourself and your stakeholders why GIS is used inside your organization. This will give a good indication of how to define a data policy that meets both internal GIS data quality standards and the data requirements of your users. A data policy endorsed by multiple stakeholders is more likely to succeed.
A single source of truth
To keep data consistent, it is recommended to use a single source of truth for GIS data, meaning a single authoritative copy of each dataset. Far too often, datasets are copied locally and start a life hidden from the GIS data manager’s sight. Central databases, servers, and cloud applications for storing GIS data have made local data copies unnecessary, but some habits are hard to break.
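One way to track down copies that have escaped the single source of truth is to hash file contents: identical digests mean byte-for-byte identical files. The sketch below uses only the Python standard library. The file extensions it scans are an assumption to adjust to your environment, and it only catches exact copies, so a local copy that has since been edited will not be flagged.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed set of single-file GIS formats to scan; adjust for your organization.
GIS_SUFFIXES = {".gpkg", ".geojson", ".tif", ".parquet"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_copies(root, suffixes=GIS_SUFFIXES):
    """Group files under `root` whose contents are identical.

    Returns a list of groups; each group is a sorted list of paths (as strings)
    that share the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            by_digest[file_digest(path)].append(str(path))
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    # Self-contained demo: a dataset and a stray local copy in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        content = '{"type": "FeatureCollection", "features": []}'
        (root / "roads.geojson").write_text(content)
        (root / "local").mkdir()
        (root / "local" / "roads_copy.geojson").write_text(content)
        for group in find_duplicate_copies(root):
            print(group)
```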
It is in everyone’s interest that data policies and strategies are complied with. This is possible when your users take ownership of the data they use daily. It is the task of a data manager to build support for a strategy that promotes data ownership.
A single source of truth doesn’t mean there has to be a single GIS database. An organization can have many GIS databases as long as there’s a single version of each dataset: it’s completely normal to separate GIS data based on file formats, data types, or applications. A smart GIS data management strategy takes advantage of the many options available for storing, accessing, and processing data. This brings us to the next topic: organizing GIS data, applications, and their usage.
Organize GIS data, applications, and their usage
GIS applications are part of your organization’s broader IT infrastructure, which includes hardware and software components that may affect your choice of GIS technology. For instance, if your organization uses an Oracle database with spatial capabilities, this could limit alternatives like PostGIS, an open-source extension to PostgreSQL.
Open-Source vs. Proprietary GIS
You have flexibility in choosing between open-source and proprietary GIS solutions. Open-source options can complement proprietary GIS in areas where the latter falls short. For example:
- QGIS is highly regarded for its raster capabilities
- ArcGIS Pro excels in vector capabilities
Feature | Open-Source GIS | Proprietary GIS |
---|---|---|
Cost | Free to use, no licensing fees | Requires licensing fees or subscriptions |
Customization | Highly customizable, access to source code | Limited customization, depends on vendor |
Community Support | Large community, often quick support | Dedicated support, but may require additional fees |
Updates | Frequent updates from the community | Regular updates, but depend on vendor schedules |
Functionality | Extensive with plugins, but may require configuration | Often comprehensive out-of-the-box features |
Examples | QGIS, GRASS GIS, GeoServer | ArcGIS, MapInfo, ERDAS IMAGINE |
Data Compatibility | Supports various formats, often interoperable | Good compatibility, especially with proprietary formats |
Learning Curve | May require technical skills for setup | Generally user-friendly, but varies by software |
Security | Public code can expose vulnerabilities, but is widely reviewed | Typically robust with dedicated security support |
Scalability | Scalable, but may need tuning | Often designed for enterprise scalability |
Desktop vs. Cloud-Based GIS
Desktop GIS remains valuable for individual use due to its speed and simplicity. However, as organizations grow, cloud-based GIS solutions offer more robust options for data storage, sharing, and analysis directly via web browsers, making them ideal for teams or enterprise use.
Feature | Desktop GIS | Cloud-Based GIS |
---|---|---|
Accessibility | Limited to installed device | Accessible from any internet-connected device |
Data Storage | Local storage, user-managed | Cloud storage, managed by provider |
Performance | Dependent on local hardware | Scalable, often faster for large datasets |
Collaboration | Limited, requires file sharing | Easy collaboration, real-time sharing |
Cost | One-time purchase or license fee | Subscription-based, pay-as-you-go |
Updates | Manual updates by user | Automatic updates managed by provider |
Data Security | User-controlled, secure on local device | Provider-managed, may have cloud-specific security concerns |
Customization | High, depending on software | Limited, often relies on provider’s interface |
Examples | ArcGIS Desktop, QGIS | ArcGIS Online, Google Earth Engine |
Offline Capability | Full offline functionality | Requires internet connection, some offline capabilities available |
Data Processing Power | Depends on device hardware capabilities | Leverages cloud processing power for intensive tasks |
Enterprise GIS for Large Organizations
For larger organizations with multiple users and stringent data security needs, enterprise GIS solutions provide clear benefits, such as:
- Role-Based Access: Assign distinct roles, like data managers for administrative tasks or analysts for data visualization and specific analysis.
- Enhanced Security: Enterprise GIS includes advanced security features to protect sensitive data and control access.
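The role-based access idea can be sketched in a few lines of Python. The role names and permissions below are hypothetical and only mirror the example roles above; enterprise GIS platforms manage roles through their own administration tools rather than code like this.

```python
from enum import Flag, auto


class Permission(Flag):
    """Individual rights that can be combined with the | operator."""
    VIEW = auto()
    ANALYZE = auto()
    EDIT = auto()
    ADMINISTER = auto()


# Hypothetical role definitions loosely based on the examples in the text.
ROLES = {
    "data_manager": Permission.VIEW | Permission.ANALYZE | Permission.EDIT | Permission.ADMINISTER,
    "analyst": Permission.VIEW | Permission.ANALYZE,
    "viewer": Permission.VIEW,
}


def is_allowed(role, needed):
    """Return True only if the role grants every permission in `needed`."""
    granted = ROLES.get(role, Permission(0))  # unknown roles get no rights
    return needed in granted


print(is_allowed("analyst", Permission.ANALYZE))  # analysts may run analyses
print(is_allowed("analyst", Permission.EDIT))     # but may not edit source data
```

Keeping permissions as combinable flags makes it easy to grant a role several rights at once and to check a request that needs more than one right.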
On-premises and Hybrid Solutions
For organizations that cannot use cloud storage due to regulatory or security concerns, on-premises and hybrid GIS solutions are viable alternatives. These options allow organizations to maintain control over both data and applications while gaining some flexibility in deployment. Smaller organizations may find that server-based GIS solutions offer the control they need, though they require a strong in-house data security strategy.
The benefits of cloud-native file formats
Over time, working in the cloud has matured. The same goes for cloud GIS: new data formats, applications, and databases have emerged that take advantage of cloud computing benefits. Think of COG files (Cloud Optimized GeoTIFF files), or other cloud-native file formats for handling spatial data such as GeoParquet.
More generally, working with large spatial datasets has become easier in both server-based and desktop-based GIS. Vector tiles, for example, let you view vector data in a browser by loading only the parts you need instead of the entire dataset.
Format | Pros | Cons |
---|---|---|
COG (Cloud Optimized GeoTIFF) | Efficient access to large raster datasets; supports HTTP range requests for partial data retrieval; backward compatible with standard GeoTIFF readers | Primarily designed for raster data, not suitable for vector data; requires specific tools for optimal creation and usage |
GeoParquet | Efficient storage and retrieval of large vector datasets; columnar storage format enables efficient querying; compatible with many big data tools and cloud data warehouses | Still under development, so some tools may lack support; limited support for spatial indexing and overviews |
Vector Tiles | Efficiently load and render vector data in web applications; allow dynamic styling and interactivity; reduce bandwidth usage by loading only necessary data | Require preprocessing to generate tiles; may not be suitable for complex analytical tasks; limited support for certain data types and attributes |
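To make the "load only the parts you need" idea concrete: most web map and vector tile services split the world into a grid of square tiles per zoom level, and the client works out which tiles overlap the current view. The sketch below assumes the widely used XYZ ("slippy map") tiling scheme in Web Mercator, as used by OpenStreetMap; it is not tied to any particular tile server.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ/Web Mercator tile containing lon/lat."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 and latitudes beyond the Mercator limit (about 85.05 degrees)
    # still map to a tile inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """List (zoom, x, y) for every tile a client must request to cover a bounding box."""
    x_min, y_min = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x_max, y_max = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


print(tiles_for_bbox(4.30, 50.80, 4.45, 50.90, 14))
```

For a city-sized bounding box, the client requests only the tiles listed above instead of the 4**14 (268,435,456) tiles that make up zoom level 14.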
Don’t forget your stakeholders’ needs
With multiple GIS applications in a single organization, think about how they’re connected to serve the needs of your stakeholders. Overseeing and optimizing data streams between applications is an important data management task that keeps everything running smoothly and prevents bottlenecks. These data streams are also where you can build in quality checks so that data quality stays consistent throughout your organization.
Guarantee consistent GIS data quality with organizational data quality procedures
You want data quality to stay consistent over time. As with the preceding best practices, combining organizational and technological strategies will do the trick. We’ve already covered the advantages of adopting and applying data quality standards for your organization’s GIS data.
Automation and data management tools will help you, but periodically checking your data quality takes more than tools. In many organizations, regular data quality audits are common practice to measure whether data quality is high enough. If your organization is ISO-certified, these audits are mandatory and are performed by external, specialized audit teams.
Matching your GIS data with other data
Another way to check your data quality is to compare internal datasets to see how well they match. Different spatial datasets covering the same area can be compared or even overlaid to check how well they align. Similarly, authoritative datasets from other organizations can serve as reference material for your own spatial data.
The purpose is to see if your spatial data is accurate, truthful, and uniform. The higher the data quality, the better the results of your stakeholders’ GIS work will be.
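A simple version of such a comparison can be scripted. The sketch below compares two hypothetical lookups that map postal codes to a representative point (latitude, longitude), one internal and one from a reference source, and reports codes missing on either side plus codes whose points lie further apart than a tolerance. The haversine distance on a spherical Earth (radius 6,371 km) is an approximation, and the 1 km tolerance and sample values are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def compare_to_reference(internal, reference, tolerance_km=1.0):
    """Compare two {code: (lat, lon)} mappings and list the discrepancies."""
    shared = internal.keys() & reference.keys()
    return {
        "missing_in_reference": sorted(internal.keys() - reference.keys()),
        "missing_internally": sorted(reference.keys() - internal.keys()),
        "displaced": sorted(
            code for code in shared
            if haversine_km(*internal[code], *reference[code]) > tolerance_km
        ),
    }


# Made-up sample data for illustration only.
internal = {"1000": (50.846, 4.352), "1050": (50.827, 4.372), "9999": (50.000, 4.000)}
reference = {"1000": (50.846, 4.352), "1050": (50.900, 4.372), "1210": (50.853, 4.372)}

print(compare_to_reference(internal, reference))
```

A code in `missing_in_reference` may be a typo or a retired code in your own data, while `displaced` codes are good candidates for a closer look on a map overlay.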
One great source to compare your data with is GeoPostcodes, a comprehensive location database that includes worldwide postal data as well as boundary, address, and population databases. You can browse GeoPostcodes data for free here.
Data review software tools
GIS vendors provide software that automates these processes, with internal checks to review data quality. An example is Esri’s ArcGIS Data Reviewer for GIS data quality management and validation. Safe Software’s Feature Manipulation Engine (FME) offers automated data manipulation tools, including data review tools.
Data quality audits and strategies take time and effort. In an organization where time is money, such quality checks may not be your first priority. Still, performing audits periodically benefits everyone in the organization. Data managers might need to spend time and effort raising awareness for this cause by stressing the benefits of high-quality spatial data.
Conclusion
In this article, we’ve offered some best practices for GIS data management that will benefit your organization. We’ve pointed out what makes data management hard and what the most common pitfalls are, such as focusing solely on technical solutions.
However, GIS data management is not just about IT, tools, and data; it’s about the people who use them to make decisions and derive insights using GIS data and tools. GIS data managers need to keep their users happy without forgetting to keep data quality high at all times.
With constantly changing technology, both GIS users and data managers need to adapt to new tools, platforms, and workflows. These may conflict with current data quality practices and the geospatial data strategy. It is the task of a data manager to build support for adapting to change, which is the only constant in the dynamic IT industry.
FAQ
What is GIS data management?
GIS data management involves how geographic data is collected, stored, and used inside an organization.
GIS data management includes the technological infrastructure required to unlock the potential of GIS data so that users can access and use it for their daily work.
It also requires a clear strategy to maintain data quality over time.
What does a GIS data manager do?
GIS data managers monitor spatial data quality during the entire data lifecycle, which includes data capture, processing, storage, management, and analysis.
They make sure data meets all user requirements, quality norms, and standards that guarantee it is trustworthy, accurate, and current.
What are the three types of data in GIS?
The three types of data in Geographic Information Systems are spatial, attribute, and metadata.
Spatial data includes vector, raster, imagery, and other spatial entities.
Attribute data are the non-spatial characteristics of points, lines, and polygons.
Metadata describes the information contained in a spatial dataset, such as its scale, projection, and datum.
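To illustrate how the three kinds of data fit together, the sketch below models a small dataset in Python and writes it out as GeoJSON. The class names, field names, and sample values are hypothetical. GeoJSON (RFC 7946) defines no standard metadata member, so the metadata is stored in a foreign member called `metadata`, which general-purpose GeoJSON readers may ignore.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    geometry: dict    # spatial data: where the feature is (GeoJSON geometry)
    properties: dict  # attribute data: non-spatial characteristics of the feature


@dataclass
class Metadata:
    crs: str                     # coordinate reference system, e.g. "EPSG:4326"
    source: str                  # how the data was collected
    collected_on: str            # when, as an ISO 8601 date
    scale: Optional[str] = None  # nominal scale, if meaningful


@dataclass
class Dataset:
    metadata: Metadata
    features: List[Feature] = field(default_factory=list)

    def to_geojson(self) -> str:
        """Serialize as a GeoJSON FeatureCollection with metadata as a foreign member."""
        return json.dumps({
            "type": "FeatureCollection",
            "metadata": asdict(self.metadata),
            "features": [
                {"type": "Feature", "geometry": f.geometry, "properties": f.properties}
                for f in self.features
            ],
        })


# Hypothetical sample dataset with one feature.
wells = Dataset(
    metadata=Metadata(crs="EPSG:4326", source="GPS field survey (hypothetical)",
                      collected_on="2024-05-14"),
    features=[
        Feature(geometry={"type": "Point", "coordinates": [4.352, 50.846]},  # lon, lat
                properties={"name": "Well 12", "depth_m": 40}),
    ],
)
print(wells.to_geojson())
```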
What is the role of geospatial data management in spatial analysis?
Geospatial data management is crucial in spatial analysis, as it involves organizing, storing, and maintaining spatially referenced data to ensure accuracy and accessibility during analysis.
Proper management enhances the reliability of spatial insights.
What are raster data and relational database data in GIS?
Raster data, typically used in GIS for images and maps, consists of grid cells or pixels, each holding specific values.
In contrast, relational databases organize existing GIS data in tables, facilitating structured queries and relationships among various data types.
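The grid-of-cells idea behind raster data can be shown with a tiny example. A raster stores values in rows and columns, and a georeferenced raster adds an affine transform that maps cell indices to map coordinates. The sketch below follows the six-coefficient geotransform order documented by GDAL, but uses plain Python without GDAL; the grid values, origin, and 10 m cell size are made up.

```python
import math

# Geotransform in GDAL's coefficient order:
# (top-left x, cell width, row rotation, top-left y, column rotation, cell height)
# Cell height is negative because row numbers increase southwards.
GEOTRANSFORM = (500000.0, 10.0, 0.0, 6000000.0, 0.0, -10.0)

# Made-up elevation values (metres): 3 rows x 4 columns.
ELEVATION = [
    [12, 13, 15, 16],
    [11, 14, 18, 19],
    [10, 12, 17, 21],
]


def cell_center(gt, row, col):
    """Map coordinates of the centre of cell (row, col)."""
    x = gt[0] + (col + 0.5) * gt[1] + (row + 0.5) * gt[2]
    y = gt[3] + (col + 0.5) * gt[4] + (row + 0.5) * gt[5]
    return x, y


def cell_index(gt, x, y):
    """(row, col) of the cell containing map point (x, y); north-up rasters only."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    col = math.floor((x - gt[0]) / gt[1])
    row = math.floor((y - gt[3]) / gt[5])
    return row, col


row, col = cell_index(GEOTRANSFORM, 500025.0, 5999985.0)
print(row, col, ELEVATION[row][col])
print(cell_center(GEOTRANSFORM, 0, 0))
```

The sketch does not check whether a point falls inside the grid; a real lookup should verify that the returned row and column are within the raster's dimensions.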
Why is primary data capture essential in geospatial data management?
Primary data capture involves collecting original data directly from sources, which is vital for accurate and up-to-date geospatial data management.
This process helps ensure that the spatial data reflects real-world conditions without relying solely on existing data.