in

Foo Theory

Partners in Community - serving up some ice cold Kool-Aid!
Welcome to footheory.com.  The bloggers and contributing members on this site are consultants, project/program managers and software architects working across the US.  Our community will focus on Microsoft technologies, .NET architecture, software patterns & practices and just plain stream of consciousness.

Bennie's Weblog

October 2008 - Posts

  • Investigating the new Spatial Types in SQL Server 2008 - Part 2

    Introduction

    In the previous part of this series, we talked about the importance of spatial data in our everyday lives. We talked about vector and raster data, and we explained that the current version of SQL Server 2008 focuses on two-dimensional vector data. Next, we introduced the new geography and geometry data types in SQL server 2008, and we took a quick detour into some important object-oriented principles.

    In this article, we start out by taking a look at the class hierarchies for both the geometry and the geography data types in SQL Server 2008. We will focus on the concrete classes in the class diagram which are:

    1. Point. 
    2. LineString
    3. Polygon

    We will also take a look at the collection classes in the object hierarchy, which allow us to store multiple instances of each type.

    Once we have a good grasp on the geometry and geography class hierarchy we will dive into the details of each concrete class. We will start our investigation by looking into the details of the Point and LineString class, and we will write a T-SQL script that highlights the most important features. In the process we will take a look at the three different data formats that can be used to represent spatial data:

    1. The Well-Known-Binary  (WKB) format
    2. The Well-Known-Text (WKT) format
    3. The Geography Markup Language (GML) data format.

    All of these formats are published as a standard by the Open Geospatial Consortium (OGC), which is the leading standards body for geospatial and location based services.

    The Geometry class hierarchy

    Overview

    A class diagram showing the Geometry class and all of its sub-classes is shown below (note that the Geography class diagram is basically identical, with the Geography class at the root of the tree):

    Geometry Class Diagram

    The darker-blue shaded classes are abstract base classes, so they cannot be directly instantiated in the database engine. The light-blue shaded classes are the concrete classes that we can use directly in our T-SQL code:

    1. Point
    2. LineString
    3. Polygon

    Each of these single-instance classes has a collection equivalent that can contain multiple instances:

    1. MultiPoint
    2. MultiLineString
    3. MultiPolygon

    The GeomCollection class is an additional collection class that can contain a mixture of instances of any type (Points, LineStrings and Polygons).

    Supported Data Formats

    All of the classes in the geometry and the geography class hierarchy can be represented by three different formats:

    1. The Well-Known Binary (WKB) format. This is a byte-stream (binary) representation for a geography instance. This format is the preferred serialization format for those applications that need to store geospatial information in a compact, self-contained format.
    2. The Well-Known Text (WKT) format. This is a compact, easy to read representation. Because of its user-friendly format it is the most commonly used representation for ad-hoc geospatial queries.
    3. The Geography Markup Language (GML) format. This is an XML-based representation, which is best suited for including geospatial information in an XML document. This format is very useful for those applications that need to exchange geospatial information by means of an XML Web service.

    All of the above formats are published as standards by the OGC. In the code samples that follow, we will be showing multiple examples of each format.

    The Point Class

    Basic Representation

    A Point is a zero-dimensional object which represents a single, exact location. It always contains an X and Y coordinate, and can optionally can contain a Z (elevation) value, and a M (measure) value. The measure value is an arbitrary floating point measurement value that you can associate with the Point instance.

    The sample T-SQL script (GeometryPoint.sql)illustrates some of the capabilities of the Point class. Throughout all of our demos, we will be working with a very simple table called GeometryDemo, which allows us to associate a Geometry instance with an integer ID:

       1: CREATE TABLE GeometryDemo
       2: (
       3:     ID    INT NOT NULL,
       4:     Value GEOMETRY NOT NULL
       5: );

    In the following code sample, we insert a Point instance, and then perform a select to retrieve the point instance in a number of formats:

       1: -- Insert a simple Point with an
       2: -- X and Y coordinate
       3: INSERT INTO GeometryDemo (ID, Value)
       4: VALUES    (1, 'POINT(50 65)');
       5:  
       6: -- Get the X and Y coordinates and 
       7: -- the most relevant methods
       8: SELECT ID, Value.ToString() AS WKT, 
       9:            Value.STX AS [X Coordinate],                        
      10:            Value.STY AS [Y Coordinate],
      11:            Value.STAsBinary() as [Binary Representation],
      12:            Value.STasText() as [Text Representation]
      13: FROM GeometryDemo
      14: WHERE ID = 1;

    In lines 3 and 4, we use the WKT format to insert a Point with an X coordinate of 50 and a Y coordinate of 65. The WKT format for a Point simply requires you to to specify the X and Y coordinates. Note that you should NOT use a comma separator between the two, which might seem counterintuitive at first.

    In lines 8 through 14 we select the instance that we just inserted. Since Value is a geometry class instance, we need to use the  dot (".") notation to access properties and invoke methods on the object:

    • In line 8 we invoke the ToString() method. This method will return the Well-Known Text (WKT) representation of the instance.
    • In lines 9 and 10 we invoke the STX and STY properties to get the X and Y coordinates of our Point.
    • In line 11 we invoke the STAsBinary() method to get the binary representation of the Point.
    • Finally, in line 12 we invoke the STAsText() method, which will again return the WKT format of the Point.

    One important note: although TSQL by itself is not case sensitive, the CLR methods and properties on the spatial data types are, so make sure that you spell the property and method names correctly! For example if you would spell "Value.STAsText()" as "Value.STasText()" you would get the following error message:

       1: Msg 6506, Level 16, State 10, Line 34
       2: Could not find method 'STasText' for type 'Microsoft.SqlServer.Types.SqlGeometry' in assembly 'Microsoft.SqlServer.Types'
    The output of the select query is shown below:

    OutputSimplePoint

    From the above output, we see that the WKT representation for the Point is returned by both the ToString() and the STAsText() methods. We also see the binary WKB representation and the X and Y coordinates of the Point instance.

    Using the Z and M coordinates

    As we mentioned earlier, a Point can optionally contain a Z (Elevation) coordinate and a M (Measurement) value. Note that both the Z and M values should be floats. Below is a T-SQL example:

       1: -- Insert a Point with an X, Y, Z and
       2: -- M coordinate
       3: INSERT INTO GeometryDemo (ID, Value)
       4: VALUES (2, 'POINT(55 70 100.2 50)');
       5:  
       6: SELECT ID, Value.ToString() AS WKT, 
       7:            Value.STX AS [X Coordinate],                        
       8:            Value.STY AS [Y Coordinate],
       9:            Value.Z AS [Elevation],
      10:            Value.M AS [Measure]
      11: FROM GeometryDemo
      12: WHERE ID = 2;
    As you can see, we simply added the Z and M values in the WKT format. The output of the select query is shown below:

    OutputAdvancedPoint2

    OGC-compliant methods and Microsoft Extensions

    The attentive observer probably noticed that some method and properties have an ST prefix (such as STX, STY and STAsText()), while other properties and methods do NOT have this prefix (for example: Z, M and ToString()). All methods and properties that are prefixed with ST are OGC-compliant methods, while the other methods and properties are Microsoft extensions to the OGC standard.

    In the documentation you will notice that the OGC standard methods and the Microsoft extensions are cleanly separated:

    OGCMethodsAndExtensions

    In the above diagram we also see sections for static methods (both OGC and Microsoft extensions). We will take a look at static methods in a later section of this post.

    Geography Markup Language (GML)

    The only format that we have not worked with so far is the Geographic Markup Language (GML), let's write a query to retrieve both points as GML:

       1: --    Select both Points as GML
       2: --  Note that we are NOT retrieving the Z and M
       3: --  coordinates of the second point
       4: --  because they are NOT OGC compliant!
       5: SELECT ID, Value.AsGml() as [GML]
       6: FROM GeometryDemo;
    The AsGml() method can be used to retrieve a geometry instance in GML format.

    The output of the query is shown below:

    OutputPointGML

    The standard XML namespace for GML will have the URN: http://www.opengis.net/gml.

    There is also one important note that we can make from the above output: the Z and M coordinates that we used for the second point are NOT returned in the GML! That is because they are proprietary Microsoft extensions to the OGC standard.

    Using the static Point methods

    The methods and properties we have used so far were invoked on an instance of the Point class, so we used the "dot" (.) notation:

    instanceName.Property

    -or-

    instanceName.MethodCall(... method arguments here ...)

    Instance properties and methods are always tied to a particular object instance, in this case a Point instance. Besides instance methods, the spatial data types in SQL Server 2008 also use static methods and properties. Static methods and properties are tied to the class itself, and can thus be invoked without having an instance around. You use the "::" notation in combination with the class name to invoke a static method or property in T-SQL:

    className::Property

    -or-

    className::MethodCall()

    One frequent usage pattern for a static method is the Factory Pattern. The Factory Pattern returns a new instance of the class through a static method. This pattern is used in both geospatial types in SQL Server 2008. Below is a code example from the GeometryPoint.sql T-SQL script:

       1: /* ========================================================================= 
       2:    ======= Using the Static methods to create a Point Instance =============
       3:    ========================================================================= */
       4: DECLARE @Pnt1 Geometry, @Pnt2 Geometry, @Pnt3 Geometry, @Pnt4 Geometry, @Pnt5 Geometry
       5: DECLARE @xmlSnippet xml;
       6:  
       7: --    1. Create a Point from an X and Y coordinate and an SRID
       8: SET @Pnt1 = geometry::Point(30, 20, 0);
       9:  
      10: -- 2. Create a Point using the Parse method
      11: SET @Pnt2 = geometry::Parse('POINT( 90 23 56000 231)');
      12:  
      13: -- 3. Create a Point using the OGC STPointFromText method
      14: -- Notice the SRID at the end
      15: SET @Pnt3 = geometry::STPointFromText('POINT(20 25)', 0)
      16:  
      17: -- 4. Create a Point from GML 
      18: set @xmlSnippet = 
      19:     '<Point xmlns="http://www.opengis.net/gml"> <pos>34 23</pos> </Point>';
      20: SET @Pnt4 = geometry::GeomFromGml(@xmlSnippet, 0);
      21:  
      22: -- 5. Null Point
      23: SET @Pnt5 = geometry::[Null];
      24:  
      25: -- Now select all points
      26: SELECT @Pnt1.ToString() AS [From Arguments],
      27:        @Pnt2.ToString() AS [From Parse],
      28:        @Pnt3.ToString() as [From STPointFromText],
      29:        @Pnt4.ToString() AS [From Xml],
      30:        @Pnt5 AS [Null Point];
    Following is a discussion of each static method:
    • In line 8, we use the static geometry::Point() method to create a Point instance by passing in the X, Y, Z and M coordinates.
    • In line 11, we use the geometry::Parse() method to create a Point instance from Well-Known Text (WKT).
    • In line 15, we use the geometry::STPointFromText() method to create a Point instance from Well-Known Text. The is the OGC equivalent of the geometry::Parse() method.
    • In line 18, we have an example of how to create a Point instance from a Geographic Markup Language (GML) snippet, using the static geometry::GeomFromGml() method.
    • Finally, in line 23 we create a NULL Point instance with the static geometry::[Null] property.

    The output of the select statement is shown below:

    StaticPointMethods

    As we can conclude from the above output, the expected Point instances were created by means of the static methods.

    The Spatial Reference Identifier

    In the previous code sample, you might have noticed the "0" after the @xmlSnippet parameter:

       1: set @xmlSnippet = 
       2:     '<Point xmlns="http://www.opengis.net/gml"> <pos>34 23</pos> </Point>';
       3: SET @Pnt4 = geometry::GeomFromGml(@xmlSnippet, 0);

    This argument is the SRID, or the Spatial Reference Identifier. The SRID corresponds to a spatial reference system based upon the specific ellipsoid used for either flat-earth or round-earth mapping. Different projection systems exist, some of you might be familiar with the Mercator Projection that was used to create the maps that were in most classrooms in recent times. A good discussion of map projections can be found at: http://maps.unomaha.edu/Peterson/gis/notes/MapProjCoord.html.

    Each projection is uniquely defined by its own SRID. You can use any projection you want, but you need to be aware that when you are comparing spatial instances using any of the methods (such as STIntersection()) you need to make sure that all objects have the same SRID otherwise you will get a run-time error!

    That concludes part 2 of this series. In part three, we take a look at the LineString and the Polygon class, and we will use some visualization tools such as GeoQuery, Spatial Viewer, and SQL Server Management Studio's very own "Spatial Tab" to get a visual representation of our spatial data!

  • Investigating the new Spatial Types in SQL Server 2008 - Part 1

    Series Abstract

    In this post we take a look at the new spatial data type support in SQL server 2008. First, I will make a case for why you would want to integrate spatial support in your applications. Next we will take a look at the two core spatial data types in SQL Server:  the Geometry and Geography types. We will explain when to use each type, and we will take a look at the main application domains for the two types.

    The spatial types are implemented as CLR types in the database engine. Since some database folks might be unfamiliar with CLR types, we will make a brief detour into SQL Server and CLR types. We will talk about the differences between static and instance methods, and explain the TSQL calling notation for each.

    Both the Geometry and the Geography types are really just the top-level types of a rich object hierarchy. We will take a detailed look at the different classes in this hierarchy, and we will explore the methods and properties of each class by means of a number of TSQL scripts. A part of this exploration we will take a look at the three data formats that can be used to represent the spatial types: the SQL Server-native  Well-Know-Binary (WKB) format, and the OGC standard data types: the Well-Known-Text (WKT) and the Geography Markup Language (GML) . We will use TSQL scripts for each object to illustrate the different notations.

    As always, a picture is worth a thousand words, and nowhere this is more the case as for spatial information. Therefore, I will use both the "Spatial Results" tab in SQL server and a number of third-party rendering tools such as SpatialViewer and GeoQuery to present a spatial query result.

    After we have a good understanding of the spatial data types, we will put them to some practical use. We will use Virtual Earth to create applications in which we create mashups of Spatial data with a variety of business data. A large volume of GIS data is available on the public domain (some good sources are the US Census and USGS Web sites), but the format of this data is typically not compatible with SQL Server. Therefore, as part of this series we will create a library that will enable us to convert the "traditional" formats into a SQL Server 2008-compatible format.

    The above abstract covers a wide variety of formats, therefore I am planing to spread out this article over a number of different posts. This first post will cover the need for spatial support in our applications, the basics of the Geometry and Geography data types and a quick OO primer.

    Why Spatial Data?

    These days it is hard to find any data that DOES NOT have a spatial aspect. A number of applications attempt to answer questions like the following:

    1. Where are my customers located?
    2. What cities have the highest accident rates?
    3. What Florida counties have the highest flood risk? (OK, that one's easy to answer: "All of them"!)

    As GPS devices become more prevalent, more and more data is geo-tagged. For example, a lot of modern mobile phones have both a camera and a GPS chip build in.

    While the above applications use spatial data only as part of their overall data set, there are a number of applications that use mapping and spatial data as their primary output:

    1. Consumer products such as Microsoft's Virtual Earth or Google maps.
    2. The government publishes the census results as spatial data.
    3. Utilities use mapping tools such as ESRI Server to plot the layout of electrical grid lines or underground gas lines.

    While the above examples are pretty self-evident, spatial data also plays an import role in applications that we might not think about right away:

    • When a warehousing application generates a pallet "pick run", it will use geospatial information to calculate the most optimal route.
    • When an interior architect uses a CAD tool such as AutoDesk to perform modeling of an interior space, he/she is using spatial data quite extensively.
    • Computer-aided manufacturing tools use spatial data to layout out parts on a piece of sheet metal.
    • The program in a municipal kiosk uses spatial data and geometric algorithms to predict the arrival times of buses and trains.
    • A multi-player computer game use spatial coordinates to keep track of the current locations of all game participants.

    From the above discussion it is clear that a large number of application have a need to work with spatial data. Some common requirements for spatial data support are:

    • The ability to store spatial coordinates directly in the database, preferably in the same tables as its associated data.
    • The data types used for this spatial data should go beyond simple point coordinates. Most geospatial data consists out of a mixture of points, lines (both single-segment and multi segment), and closed shapes (typically referred to as polygons in the literature).
    • The ability to perform a multitude of operations on this geospatial data. For example, a chip design application might want to assure that certain routes on a chip do not cross, other applications will have a need to calculate the area of a complex polygon etc.

    In the next section will take a look at how SQL Server 2008 addresses the above requirements.

    Spatial Data Support in SQL Server 2008

    Types of Spatial Data

    At the highest level, we recognize two major classes of spatial data:

    1. Vector Data. Vector data is data expressed by a set of vertices and their relationship to one another. Common spatial features represented by vector data include:
      • Points.
      • Lines (where a line can have one or more segments)
      • Polygons. Polygons are typically used to represent areas and regions.
    2. Raster Data. Raster data is data expressed as a matrix of cells. We typically recognize raster data as images. Within a spatial or GIS context, we see raster data manifested as:
      • Satellite images.
      • The Virtual Earth bird's eye images.
      • Google "street level" images.

    An example of each type of spatial data is shown below

    Sample Raster Image (a sample satellite heat map)

    Sample Vector (in this case a simple line vector)
    RasterImageElevation vectorGeom1

    SQL Server 2008 does focus exclusively on Spatial Vector data.

    The SQL Server 2008 Spatial Data Types

    SQL Server 2008 introduces two new data types:

    1. Geometry. The geometry data type is based on the Cartesian coordinate system,  based upon a "flat earth" representation model. In this model, a point is represented by an X, Y and optionally a Z coordinate. This usage domain of this data type is either:
      • The representation of simple coordinates in a two or three dimensional space. A example is the precise location of a pallet in a warehouse.
      • The representation of map coordinates, where distances are limited enough so that they are not affected by the round earth model.
    2. Geography. The geography data point can store points, lines, polygon and collections of each of these, using a "round earth" model as opposed to the "flat earth" model used by the Geometry data type. Instead of using X and Y coordinates, the geography data type will use a latitude/longitude combination to represent a single point. Most of the GIS data available on the Web is latitude/longitude based, so the Geography data type is the data type you should used in most of you GIS applications, especially when you are dealing with longer distances where the shape of the earth becomes relevant.
    Geometry : Cartesian coordinate system, "flat earth" model Geography: Latitude/longitude coordinates, "round earth" model
    CartesianCoordinateSystem LATITUDE_LONGITUDE_MERIDIA

    The geometry data type conforms to the Open Geospatial Consortium (OGC) Simple features for SQL specification version 1.1.0. The OGC is a non-profit, voluntary concensus standards organization which is the leading consortium when it comes to the drafting and ratification of standards for geospatial and location based services. One of the standards published by the OGC is the Well-Know-Text specification (WKT) for spatial data types. We will investigate this format in detail in part 2 of the series.

    CLR Objects in the Database Engine and the Database Developer

    Both the geometry and geography data types are implemented as user-defined types (UDT's) in the database engine. These UDT's are implemented as .NET Common Language Runtime (CLR) types. Before we take a more detailed look how to use these types, it is important that we make sure that the reader is familiar with some basic OO principles. If you are already familiar with object-orientation and .NET types, please feel free to skip ahead to the next part in this series.

    The core abstraction in the object-oriented world is the class. A class can be seen as the blueprint for a particular type. From this blueprint, a client can create any number of actual object instances. A class hides away its implementation details behind a set of publicly accessible properties and methods. This principle is called encapsulation or information hiding. For example, a Car class will encapsulate the behavior of an automobile. It might expose methods such as Start, Stop, SlowDown and SpeedUp, and it might provide properties such as CurrentSpeed and DaysTillNextOilChange. The Car class will hide the details of how it executes these methods and exposes these properties, enabling the client to be "blissfully unaware" of the implementation details of the Car class.

    Classes can be related to one another. At a high-level we can recognize the following types of relationships:

    1. The "has-a" relationship. For example, a Car class might have a SteeringWheel and a GasTank. This type of relationship is sometimes referred to as a "uses" relationship, for example the Car class "uses" the SteeringWheel class and "uses" the GasTank class.
    2. The "is-a" relationship. To stay with our Car example, a SportsCar is a specialized type of Car,  which is specially equipped. It might have a Spoiler, a HighPerformanceEngine etc.. When classes have such a relationship they will use an OOP concept called inheritance to implement this relationship. In an inheritance implementation, the class from which we inherit is called the base class and the more specialized class is called the sub class. The sub class gets all of the functionality of the base class "for free", so it only needs to worry about implementing it own specialized behavior on top of the functionality of the base class. For example, the SportsCar class only implements the additional functionality that makes it a sports car.

    An example of a "uses" ("has a") and an inheritance ("is a") relationship is shown in the figure below:

    OOLatest

    Some base classes only define a general abstract concepts or entities. These types of classes cannot be directly instantiated by the client, since they are not "feature complete". Such a class is called an abstract class The features defined by an abstract class are implemented by a sub class of the abstract class. Such a sub class that can be instantiated is called a concrete class. We will see in the next section that the geometry and geography classes are indeed defined as abstract classes in the database engine.

    In the next part of this series, we will take a look at the class diagrams for both the geography and geometry classes, and we'll start writing some TSQL code!

Copyright ASIQS Corporation © 2006, All rights reserved.
Powered by Community Server (Commercial Edition), by Telligent Systems