In this post we take a look at the new spatial data type support in SQL server 2008. First, I will make a case for why you would want to integrate spatial support in your applications. Next we will take a look at the two core spatial data types in SQL Server: the Geometry and Geography types. We will explain when to use each type, and we will take a look at the main application domains for the two types.
The spatial types are implemented as CLR types in the database engine. Since some database folks might be unfamiliar with CLR types, we will make a brief detour into SQL Server and CLR types. We will talk about the differences between static and instance methods, and explain the TSQL calling notation for each.
Both the Geometry and the Geography types are really just the top-level types of a rich object hierarchy. We will take a detailed look at the different classes in this hierarchy, and we will explore the methods and properties of each class by means of a number of TSQL scripts. A part of this exploration we will take a look at the three data formats that can be used to represent the spatial types: the SQL Server-native Well-Know-Binary (WKB) format, and the OGC standard data types: the Well-Known-Text (WKT) and the Geography Markup Language (GML) . We will use TSQL scripts for each object to illustrate the different notations.
As always, a picture is worth a thousand words, and nowhere this is more the case as for spatial information. Therefore, I will use both the "Spatial Results" tab in SQL server and a number of third-party rendering tools such as SpatialViewer and GeoQuery to present a spatial query result.
After we have a good understanding of the spatial data types, we will put them to some practical use. We will use Virtual Earth to create applications in which we create mashups of Spatial data with a variety of business data. A large volume of GIS data is available on the public domain (some good sources are the US Census and USGS Web sites), but the format of this data is typically not compatible with SQL Server. Therefore, as part of this series we will create a library that will enable us to convert the "traditional" formats into a SQL Server 2008-compatible format.
The above abstract covers a wide variety of formats, therefore I am planing to spread out this article over a number of different posts. This first post will cover the need for spatial support in our applications, the basics of the Geometry and Geography data types and a quick OO primer.
Why Spatial Data?
These days it is hard to find any data that DOES NOT have a spatial aspect. A number of applications attempt to answer questions like the following:
- Where are my customers located?
- What cities have the highest accident rates?
- What Florida counties have the highest flood risk? (OK, that one's easy to answer: "All of them"!)
As GPS devices become more prevalent, more and more data is geo-tagged. For example, a lot of modern mobile phones have both a camera and a GPS chip build in.
While the above applications use spatial data only as part of their overall data set, there are a number of applications that use mapping and spatial data as their primary output:
- Consumer products such as Microsoft's Virtual Earth or Google maps.
- The government publishes the census results as spatial data.
- Utilities use mapping tools such as ESRI Server to plot the layout of electrical grid lines or underground gas lines.
While the above examples are pretty self-evident, spatial data also plays an import role in applications that we might not think about right away:
- When a warehousing application generates a pallet "pick run", it will use geospatial information to calculate the most optimal route.
- When an interior architect uses a CAD tool such as AutoDesk to perform modeling of an interior space, he/she is using spatial data quite extensively.
- Computer-aided manufacturing tools use spatial data to layout out parts on a piece of sheet metal.
- The program in a municipal kiosk uses spatial data and geometric algorithms to predict the arrival times of buses and trains.
- A multi-player computer game use spatial coordinates to keep track of the current locations of all game participants.
From the above discussion it is clear that a large number of application have a need to work with spatial data. Some common requirements for spatial data support are:
- The ability to store spatial coordinates directly in the database, preferably in the same tables as its associated data.
- The data types used for this spatial data should go beyond simple point coordinates. Most geospatial data consists out of a mixture of points, lines (both single-segment and multi segment), and closed shapes (typically referred to as polygons in the literature).
- The ability to perform a multitude of operations on this geospatial data. For example, a chip design application might want to assure that certain routes on a chip do not cross, other applications will have a need to calculate the area of a complex polygon etc.
In the next section will take a look at how SQL Server 2008 addresses the above requirements.
Spatial Data Support in SQL Server 2008
Types of Spatial Data
At the highest level, we recognize two major classes of spatial data:
- Vector Data. Vector data is data expressed by a set of vertices and their relationship to one another. Common spatial features represented by vector data include:
- Lines (where a line can have one or more segments)
- Polygons. Polygons are typically used to represent areas and regions.
- Raster Data. Raster data is data expressed as a matrix of cells. We typically recognize raster data as images. Within a spatial or GIS context, we see raster data manifested as:
- Satellite images.
- The Virtual Earth bird's eye images.
- Google "street level" images.
An example of each type of spatial data is shown below
Sample Raster Image (a sample satellite heat map)
|Sample Vector (in this case a simple line vector) |
| || |
SQL Server 2008 does focus exclusively on Spatial Vector data.
The SQL Server 2008 Spatial Data Types
SQL Server 2008 introduces two new data types:
- Geometry. The geometry data type is based on the Cartesian coordinate system, based upon a "flat earth" representation model. In this model, a point is represented by an X, Y and optionally a Z coordinate. This usage domain of this data type is either:
- The representation of simple coordinates in a two or three dimensional space. A example is the precise location of a pallet in a warehouse.
- The representation of map coordinates, where distances are limited enough so that they are not affected by the round earth model.
- Geography. The geography data point can store points, lines, polygon and collections of each of these, using a "round earth" model as opposed to the "flat earth" model used by the Geometry data type. Instead of using X and Y coordinates, the geography data type will use a latitude/longitude combination to represent a single point. Most of the GIS data available on the Web is latitude/longitude based, so the Geography data type is the data type you should used in most of you GIS applications, especially when you are dealing with longer distances where the shape of the earth becomes relevant.
|Geometry : Cartesian coordinate system, "flat earth" model ||Geography: Latitude/longitude coordinates, "round earth" model |
| || |
The geometry data type conforms to the Open Geospatial Consortium (OGC) Simple features for SQL specification version 1.1.0. The OGC is a non-profit, voluntary concensus standards organization which is the leading consortium when it comes to the drafting and ratification of standards for geospatial and location based services. One of the standards published by the OGC is the Well-Know-Text specification (WKT) for spatial data types. We will investigate this format in detail in part 2 of the series.
CLR Objects in the Database Engine and the Database Developer
Both the geometry and geography data types are implemented as user-defined types (UDT's) in the database engine. These UDT's are implemented as .NET Common Language Runtime (CLR) types. Before we take a more detailed look how to use these types, it is important that we make sure that the reader is familiar with some basic OO principles. If you are already familiar with object-orientation and .NET types, please feel free to skip ahead to the next part in this series.
The core abstraction in the object-oriented world is the class. A class can be seen as the blueprint for a particular type. From this blueprint, a client can create any number of actual object instances. A class hides away its implementation details behind a set of publicly accessible properties and methods. This principle is called encapsulation or information hiding. For example, a Car class will encapsulate the behavior of an automobile. It might expose methods such as Start, Stop, SlowDown and SpeedUp, and it might provide properties such as CurrentSpeed and DaysTillNextOilChange. The Car class will hide the details of how it executes these methods and exposes these properties, enabling the client to be "blissfully unaware" of the implementation details of the Car class.
Classes can be related to one another. At a high-level we can recognize the following types of relationships:
- The "has-a" relationship. For example, a Car class might have a SteeringWheel and a GasTank. This type of relationship is sometimes referred to as a "uses" relationship, for example the Car class "uses" the SteeringWheel class and "uses" the GasTank class.
- The "is-a" relationship. To stay with our Car example, a SportsCar is a specialized type of Car, which is specially equipped. It might have a Spoiler, a HighPerformanceEngine etc.. When classes have such a relationship they will use an OOP concept called inheritance to implement this relationship. In an inheritance implementation, the class from which we inherit is called the base class and the more specialized class is called the sub class. The sub class gets all of the functionality of the base class "for free", so it only needs to worry about implementing it own specialized behavior on top of the functionality of the base class. For example, the SportsCar class only implements the additional functionality that makes it a sports car.
An example of a "uses" ("has a") and an inheritance ("is a") relationship is shown in the figure below:
Some base classes only define a general abstract concepts or entities. These types of classes cannot be directly instantiated by the client, since they are not "feature complete". Such a class is called an abstract class The features defined by an abstract class are implemented by a sub class of the abstract class. Such a sub class that can be instantiated is called a concrete class. We will see in the next section that the geometry and geography classes are indeed defined as abstract classes in the database engine.
In the next part of this series, we will take a look at the class diagrams for both the geography and geometry classes, and we'll start writing some TSQL code!