in

Foo Theory

Partners in Community - serving up some ice cold Kool-Aid!
Welcome to footheory.com.  The bloggers and contributing members on this site are consultants, project/program managers and software architects working across the US.  Our community will focus on Microsoft technologies, .NET architecture, software patterns & practices and just plain stream of consciousness.

Bennie's Weblog

  • Investigating the new Spatial Types in SQL Server 2008 - Part 2

    Introduction

    In the previous part of this series, we talked about the importance of spatial data in our everyday lives. We talked about vector and raster data, and we explained that the current version of SQL Server 2008 focuses on two-dimensional vector data. Next, we introduced the new geography and geometry data types in SQL server 2008, and we took a quick detour into some important object-oriented principles.

    In this article, we start out by taking a look at the class hierarchies for both the geometry and the geography data types in SQL Server 2008. We will focus on the concrete classes in the class diagram which are:

    1. Point. 
    2. LineString
    3. Polygon

    We will also take a look at the collection classes in the object hierarchy, which allow us to store multiple instances of each type.

    Once we have a good grasp on the geometry and geography class hierarchy we will dive into the details of each concrete class. We will start our investigation by looking into the details of the Point and LineString class, and we will write a T-SQL script that highlights the most important features. In the process we will take a look at the three different data formats that can be used to represent spatial data:

    1. The Well-Known-Binary  (WKB) format
    2. The Well-Known-Text (WKT) format
    3. The Geography Markup Language (GML) data format.

    All of these formats are published as a standard by the Open Geospatial Consortium (OGC), which is the leading standards body for geospatial and location based services.

    The Geometry class hierarchy

    Overview

    A class diagram showing the Geometry class and all of its sub-classes is shown below (note that the Geography class diagram is basically identical, with the Geography class at the root of the tree):

    Geometry Class Diagram

    The darker-blue shaded classes are abstract base classes, so they cannot be directly instantiated in the database engine. The light-blue shaded classes are the concrete classes that we can use directly in our T-SQL code:

    1. Point
    2. LineString
    3. Polygon

    Each of these single-instance classes has a collection equivalent that can contain multiple instances:

    1. MultiPoint
    2. MultiLineString
    3. MultiPolygon

    The GeomCollection class is an additional collection class that can contain a mixture of instances of any type (Points, LineStrings and Polygons).

    Supported Data Formats

    All of the classes in the geometry and the geography class hierarchy can be represented by three different formats:

    1. The Well-Known Binary (WKB) format. This is a byte-stream (binary) representation for a geography instance. This format is the preferred serialization format for those applications that need to store geospatial information in a compact, self-contained format.
    2. The Well-Known Text (WKT) format. This is a compact, easy to read representation. Because of its user-friendly format it is the most commonly used representation for ad-hoc geospatial queries.
    3. The Geography Markup Language (GML) format. This is an XML-based representation, which is best suited for including geospatial information in an XML document. This format is very useful for those applications that need to exchange geospatial information by means of an XML Web service.

    All of the above formats are published as standards by the OGC. In the code samples that follow, we will be showing multiple examples of each format.

    The Point Class

    Basic Representation

    A Point is a zero-dimensional object which represents a single, exact location. It always contains an X and Y coordinate, and can optionally can contain a Z (elevation) value, and a M (measure) value. The measure value is an arbitrary floating point measurement value that you can associate with the Point instance.

    The sample T-SQL script (GeometryPoint.sql)illustrates some of the capabilities of the Point class. Throughout all of our demos, we will be working with a very simple table called GeometryDemo, which allows us to associate a Geometry instance with an integer ID:

       1: CREATE TABLE GeometryDemo
       2: (
       3:     ID    INT NOT NULL,
       4:     Value GEOMETRY NOT NULL
       5: );

    In the following code sample, we insert a Point instance, and then perform a select to retrieve the point instance in a number of formats:

       1: -- Insert a simple Point with an
       2: -- X and Y coordinate
       3: INSERT INTO GeometryDemo (ID, Value)
       4: VALUES    (1, 'POINT(50 65)');
       5:  
       6: -- Get the X and Y coordinates and 
       7: -- the most relevant methods
       8: SELECT ID, Value.ToString() AS WKT, 
       9:            Value.STX AS [X Coordinate],                        
      10:            Value.STY AS [Y Coordinate],
      11:            Value.STAsBinary() as [Binary Representation],
      12:            Value.STasText() as [Text Representation]
      13: FROM GeometryDemo
      14: WHERE ID = 1;

    In lines 3 and 4, we use the WKT format to insert a Point with an X coordinate of 50 and a Y coordinate of 65. The WKT format for a Point simply requires you to to specify the X and Y coordinates. Note that you should NOT use a comma separator between the two, which might seem counterintuitive at first.

    In lines 8 through 14 we select the instance that we just inserted. Since Value is a geometry class instance, we need to use the  dot (".") notation to access properties and invoke methods on the object:

    • In line 8 we invoke the ToString() method. This method will return the Well-Known Text (WKT) representation of the instance.
    • In lines 9 and 10 we invoke the STX and STY properties to get the X and Y coordinates of our Point.
    • In line 11 we invoke the STAsBinary() method to get the binary representation of the Point.
    • Finally, in line 12 we invoke the STAsText() method, which will again return the WKT format of the Point.

    One important note: although TSQL by itself is not case sensitive, the CLR methods and properties on the spatial data types are, so make sure that you spell the property and method names correctly! For example if you would spell "Value.STAsText()" as "Value.STasText()" you would get the following error message:

       1: Msg 6506, Level 16, State 10, Line 34
       2: Could not find method 'STasText' for type 'Microsoft.SqlServer.Types.SqlGeometry' in assembly 'Microsoft.SqlServer.Types'
    The output of the select query is shown below:

    OutputSimplePoint

    From the above output, we see that the WKT representation for the Point is returned by both the ToString() and the STAsText() methods. We also see the binary WKB representation and the X and Y coordinates of the Point instance.

    Using the Z and M coordinates

    As we mentioned earlier, a Point can optionally contain a Z (Elevation) coordinate and a M (Measurement) value. Note that both the Z and M values should be floats. Below is a T-SQL example:

       1: -- Insert a Point with an X, Y, Z and
       2: -- M coordinate
       3: INSERT INTO GeometryDemo (ID, Value)
       4: VALUES (2, 'POINT(55 70 100.2 50)');
       5:  
       6: SELECT ID, Value.ToString() AS WKT, 
       7:            Value.STX AS [X Coordinate],                        
       8:            Value.STY AS [Y Coordinate],
       9:            Value.Z AS [Elevation],
      10:            Value.M AS [Measure]
      11: FROM GeometryDemo
      12: WHERE ID = 2;
    As you can see, we simply added the Z and M values in the WKT format. The output of the select query is shown below:

    OutputAdvancedPoint2

    OGC-compliant methods and Microsoft Extensions

    The attentive observer probably noticed that some method and properties have an ST prefix (such as STX, STY and STAsText()), while other properties and methods do NOT have this prefix (for example: Z, M and ToString()). All methods and properties that are prefixed with ST are OGC-compliant methods, while the other methods and properties are Microsoft extensions to the OGC standard.

    In the documentation you will notice that the OGC standard methods and the Microsoft extensions are cleanly separated:

    OGCMethodsAndExtensions

    In the above diagram we also see sections for static methods (both OGC and Microsoft extensions). We will take a look at static methods in a later section of this post.

    Geography Markup Language (GML)

    The only format that we have not worked with so far is the Geographic Markup Language (GML), let's write a query to retrieve both points as GML:

       1: --    Select both Points as GML
       2: --  Note that we are NOT retrieving the Z and M
       3: --  coordinates of the second point
       4: --  because they are NOT OGC compliant!
       5: SELECT ID, Value.AsGml() as [GML]
       6: FROM GeometryDemo;
    The AsGml() method can be used to retrieve a geometry instance in GML format.

    The output of the query is shown below:

    OutputPointGML

    The standard XML namespace for GML will have the URN: http://www.opengis.net/gml.

    There is also one important note that we can make from the above output: the Z and M coordinates that we used for the second point are NOT returned in the GML! That is because they are proprietary Microsoft extensions to the OGC standard.

    Using the static Point methods

    The methods and properties we have used so far were invoked on an instance of the Point class, so we used the "dot" (.) notation:

    instanceName.Property

    -or-

    instanceName.MethodCall(... method arguments here ...)

    Instance properties and methods are always tied to a particular object instance, in this case a Point instance. Besides instance methods, the spatial data types in SQL Server 2008 also use static methods and properties. Static methods and properties are tied to the class itself, and can thus be invoked without having an instance around. You use the "::" notation in combination with the class name to invoke a static method or property in T-SQL:

    className::Property

    -or-

    className::MethodCall()

    One frequent usage pattern for a static method is the Factory Pattern. The Factory Pattern returns a new instance of the class through a static method. This pattern is used in both geospatial types in SQL Server 2008. Below is a code example from the GeometryPoint.sql T-SQL script:

       1: /* ========================================================================= 
       2:    ======= Using the Static methods to create a Point Instance =============
       3:    ========================================================================= */
       4: DECLARE @Pnt1 Geometry, @Pnt2 Geometry, @Pnt3 Geometry, @Pnt4 Geometry, @Pnt5 Geometry
       5: DECLARE @xmlSnippet xml;
       6:  
       7: --    1. Create a Point from an X and Y coordinate and an SRID
       8: SET @Pnt1 = geometry::Point(30, 20, 0);
       9:  
      10: -- 2. Create a Point using the Parse method
      11: SET @Pnt2 = geometry::Parse('POINT( 90 23 56000 231)');
      12:  
      13: -- 3. Create a Point using the OGC STPointFromText method
      14: -- Notice the SRID at the end
      15: SET @Pnt3 = geometry::STPointFromText('POINT(20 25)', 0)
      16:  
      17: -- 4. Create a Point from GML 
      18: set @xmlSnippet = 
      19:     '<Point xmlns="http://www.opengis.net/gml"> <pos>34 23</pos> </Point>';
      20: SET @Pnt4 = geometry::GeomFromGml(@xmlSnippet, 0);
      21:  
      22: -- 5. Null Point
      23: SET @Pnt5 = geometry::[Null];
      24:  
      25: -- Now select all points
      26: SELECT @Pnt1.ToString() AS [From Arguments],
      27:        @Pnt2.ToString() AS [From Parse],
      28:        @Pnt3.ToString() as [From STPointFromText],
      29:        @Pnt4.ToString() AS [From Xml],
      30:        @Pnt5 AS [Null Point];
    Following is a discussion of each static method:
    • In line 8, we use the static geometry::Point() method to create a Point instance by passing in the X, Y, Z and M coordinates.
    • In line 11, we use the geometry::Parse() method to create a Point instance from Well-Known Text (WKT).
    • In line 15, we use the geometry::STPointFromText() method to create a Point instance from Well-Known Text. The is the OGC equivalent of the geometry::Parse() method.
    • In line 18, we have an example of how to create a Point instance from a Geographic Markup Language (GML) snippet, using the static geometry::GeomFromGml() method.
    • Finally, in line 23 we create a NULL Point instance with the static geometry::[Null] property.

    The output of the select statement is shown below:

    StaticPointMethods

    As we can conclude from the above output, the expected Point instances were created by means of the static methods.

    The Spatial Reference Identifier

    In the previous code sample, you might have noticed the "0" after the @xmlSnippet parameter:

       1: set @xmlSnippet = 
       2:     '<Point xmlns="http://www.opengis.net/gml"> <pos>34 23</pos> </Point>';
       3: SET @Pnt4 = geometry::GeomFromGml(@xmlSnippet, 0);

    This argument is the SRID, or the Spatial Reference Identifier. The SRID corresponds to a spatial reference system based upon the specific ellipsoid used for either flat-earth or round-earth mapping. Different projection systems exist, some of you might be familiar with the Mercator Projection that was used to create the maps that were in most classrooms in recent times. A good discussion of map projections can be found at: http://maps.unomaha.edu/Peterson/gis/notes/MapProjCoord.html.

    Each projection is uniquely defined by its own SRID. You can use any projection you want, but you need to be aware that when you are comparing spatial instances using any of the methods (such as STIntersection()) you need to make sure that all objects have the same SRID otherwise you will get a run-time error!

    That concludes part 2 of this series. In part three, we take a look at the LineString and the Polygon class, and we will use some visualization tools such as GeoQuery, Spatial Viewer, and SQL Server Management Studio's very own "Spatial Tab" to get a visual representation of our spatial data!

  • Investigating the new Spatial Types in SQL Server 2008 - Part 1

    Series Abstract

    In this post we take a look at the new spatial data type support in SQL server 2008. First, I will make a case for why you would want to integrate spatial support in your applications. Next we will take a look at the two core spatial data types in SQL Server:  the Geometry and Geography types. We will explain when to use each type, and we will take a look at the main application domains for the two types.

    The spatial types are implemented as CLR types in the database engine. Since some database folks might be unfamiliar with CLR types, we will make a brief detour into SQL Server and CLR types. We will talk about the differences between static and instance methods, and explain the TSQL calling notation for each.

    Both the Geometry and the Geography types are really just the top-level types of a rich object hierarchy. We will take a detailed look at the different classes in this hierarchy, and we will explore the methods and properties of each class by means of a number of TSQL scripts. A part of this exploration we will take a look at the three data formats that can be used to represent the spatial types: the SQL Server-native  Well-Know-Binary (WKB) format, and the OGC standard data types: the Well-Known-Text (WKT) and the Geography Markup Language (GML) . We will use TSQL scripts for each object to illustrate the different notations.

    As always, a picture is worth a thousand words, and nowhere this is more the case as for spatial information. Therefore, I will use both the "Spatial Results" tab in SQL server and a number of third-party rendering tools such as SpatialViewer and GeoQuery to present a spatial query result.

    After we have a good understanding of the spatial data types, we will put them to some practical use. We will use Virtual Earth to create applications in which we create mashups of Spatial data with a variety of business data. A large volume of GIS data is available on the public domain (some good sources are the US Census and USGS Web sites), but the format of this data is typically not compatible with SQL Server. Therefore, as part of this series we will create a library that will enable us to convert the "traditional" formats into a SQL Server 2008-compatible format.

    The above abstract covers a wide variety of formats, therefore I am planing to spread out this article over a number of different posts. This first post will cover the need for spatial support in our applications, the basics of the Geometry and Geography data types and a quick OO primer.

    Why Spatial Data?

    These days it is hard to find any data that DOES NOT have a spatial aspect. A number of applications attempt to answer questions like the following:

    1. Where are my customers located?
    2. What cities have the highest accident rates?
    3. What Florida counties have the highest flood risk? (OK, that one's easy to answer: "All of them"!)

    As GPS devices become more prevalent, more and more data is geo-tagged. For example, a lot of modern mobile phones have both a camera and a GPS chip build in.

    While the above applications use spatial data only as part of their overall data set, there are a number of applications that use mapping and spatial data as their primary output:

    1. Consumer products such as Microsoft's Virtual Earth or Google maps.
    2. The government publishes the census results as spatial data.
    3. Utilities use mapping tools such as ESRI Server to plot the layout of electrical grid lines or underground gas lines.

    While the above examples are pretty self-evident, spatial data also plays an import role in applications that we might not think about right away:

    • When a warehousing application generates a pallet "pick run", it will use geospatial information to calculate the most optimal route.
    • When an interior architect uses a CAD tool such as AutoDesk to perform modeling of an interior space, he/she is using spatial data quite extensively.
    • Computer-aided manufacturing tools use spatial data to layout out parts on a piece of sheet metal.
    • The program in a municipal kiosk uses spatial data and geometric algorithms to predict the arrival times of buses and trains.
    • A multi-player computer game use spatial coordinates to keep track of the current locations of all game participants.

    From the above discussion it is clear that a large number of application have a need to work with spatial data. Some common requirements for spatial data support are:

    • The ability to store spatial coordinates directly in the database, preferably in the same tables as its associated data.
    • The data types used for this spatial data should go beyond simple point coordinates. Most geospatial data consists out of a mixture of points, lines (both single-segment and multi segment), and closed shapes (typically referred to as polygons in the literature).
    • The ability to perform a multitude of operations on this geospatial data. For example, a chip design application might want to assure that certain routes on a chip do not cross, other applications will have a need to calculate the area of a complex polygon etc.

    In the next section will take a look at how SQL Server 2008 addresses the above requirements.

    Spatial Data Support in SQL Server 2008

    Types of Spatial Data

    At the highest level, we recognize two major classes of spatial data:

    1. Vector Data. Vector data is data expressed by a set of vertices and their relationship to one another. Common spatial features represented by vector data include:
      • Points.
      • Lines (where a line can have one or more segments)
      • Polygons. Polygons are typically used to represent areas and regions.
    2. Raster Data. Raster data is data expressed as a matrix of cells. We typically recognize raster data as images. Within a spatial or GIS context, we see raster data manifested as:
      • Satellite images.
      • The Virtual Earth bird's eye images.
      • Google "street level" images.

    An example of each type of spatial data is shown below

    Sample Raster Image (a sample satellite heat map)

    Sample Vector (in this case a simple line vector)
    RasterImageElevation vectorGeom1

    SQL Server 2008 does focus exclusively on Spatial Vector data.

    The SQL Server 2008 Spatial Data Types

    SQL Server 2008 introduces two new data types:

    1. Geometry. The geometry data type is based on the Cartesian coordinate system,  based upon a "flat earth" representation model. In this model, a point is represented by an X, Y and optionally a Z coordinate. This usage domain of this data type is either:
      • The representation of simple coordinates in a two or three dimensional space. A example is the precise location of a pallet in a warehouse.
      • The representation of map coordinates, where distances are limited enough so that they are not affected by the round earth model.
    2. Geography. The geography data point can store points, lines, polygon and collections of each of these, using a "round earth" model as opposed to the "flat earth" model used by the Geometry data type. Instead of using X and Y coordinates, the geography data type will use a latitude/longitude combination to represent a single point. Most of the GIS data available on the Web is latitude/longitude based, so the Geography data type is the data type you should used in most of you GIS applications, especially when you are dealing with longer distances where the shape of the earth becomes relevant.
    Geometry : Cartesian coordinate system, "flat earth" model Geography: Latitude/longitude coordinates, "round earth" model
    CartesianCoordinateSystem LATITUDE_LONGITUDE_MERIDIA

    The geometry data type conforms to the Open Geospatial Consortium (OGC) Simple features for SQL specification version 1.1.0. The OGC is a non-profit, voluntary concensus standards organization which is the leading consortium when it comes to the drafting and ratification of standards for geospatial and location based services. One of the standards published by the OGC is the Well-Know-Text specification (WKT) for spatial data types. We will investigate this format in detail in part 2 of the series.

    CLR Objects in the Database Engine and the Database Developer

    Both the geometry and geography data types are implemented as user-defined types (UDT's) in the database engine. These UDT's are implemented as .NET Common Language Runtime (CLR) types. Before we take a more detailed look how to use these types, it is important that we make sure that the reader is familiar with some basic OO principles. If you are already familiar with object-orientation and .NET types, please feel free to skip ahead to the next part in this series.

    The core abstraction in the object-oriented world is the class. A class can be seen as the blueprint for a particular type. From this blueprint, a client can create any number of actual object instances. A class hides away its implementation details behind a set of publicly accessible properties and methods. This principle is called encapsulation or information hiding. For example, a Car class will encapsulate the behavior of an automobile. It might expose methods such as Start, Stop, SlowDown and SpeedUp, and it might provide properties such as CurrentSpeed and DaysTillNextOilChange. The Car class will hide the details of how it executes these methods and exposes these properties, enabling the client to be "blissfully unaware" of the implementation details of the Car class.

    Classes can be related to one another. At a high-level we can recognize the following types of relationships:

    1. The "has-a" relationship. For example, a Car class might have a SteeringWheel and a GasTank. This type of relationship is sometimes referred to as a "uses" relationship, for example the Car class "uses" the SteeringWheel class and "uses" the GasTank class.
    2. The "is-a" relationship. To stay with our Car example, a SportsCar is a specialized type of Car,  which is specially equipped. It might have a Spoiler, a HighPerformanceEngine etc.. When classes have such a relationship they will use an OOP concept called inheritance to implement this relationship. In an inheritance implementation, the class from which we inherit is called the base class and the more specialized class is called the sub class. The sub class gets all of the functionality of the base class "for free", so it only needs to worry about implementing it own specialized behavior on top of the functionality of the base class. For example, the SportsCar class only implements the additional functionality that makes it a sports car.

    An example of a "uses" ("has a") and an inheritance ("is a") relationship is shown in the figure below:

    OOLatest

    Some base classes only define a general abstract concepts or entities. These types of classes cannot be directly instantiated by the client, since they are not "feature complete". Such a class is called an abstract class The features defined by an abstract class are implemented by a sub class of the abstract class. Such a sub class that can be instantiated is called a concrete class. We will see in the next section that the geometry and geography classes are indeed defined as abstract classes in the database engine.

    In the next part of this series, we will take a look at the class diagrams for both the geography and geometry classes, and we'll start writing some TSQL code!

  • An Introduction to LINQ to XML

    Note: This post is a work in progress... not yet completed...

    Overview

    When Visual Studio 2008 and the .NET Framework 3.5 shipped in late 2007, the feature that got by far the most attention was Language Integrated Query (LINQ). This article will provide a quick overview of the LINQ technologies, and will then focus in on LINQ to XML, which is a subset of the LINQ technologies.

    This article is structured as follows:

    1. The next section will provide a short overview of LINQ in general. We take a look at what the main drivers were behind the creation of LINQ, and we will start looking at some of the specifics of LINQ to XML.
    2. Once we have established a basic understanding of LINQ to XML, we will take a look at its object model. LINQ to XML introduces an easy to use, flattened object model, especially when compared to the XML DOM object model.
    3. Next, we will take a look at what is in my opinion one of the most important features of LINQ to XML, and that is Functional Construction. Functional construction enables us to create an XML document in a very straightforward way. No longer do we need to use the elaborate XML DOM object model and write 100 lines of complex code, just to create a simple XML document that is just a few lines long.
    4. After we have an understanding of how to create an XML document, we take a look at how we can use the features on the new LINQ to XML object model to query, iterate, manipulate and validate an XML document.
    5. To illustrate how easy it is to leverage the LINQ to XML features in a real-world application, we will take a look at an example that  performs two-way binding of a rather complex XML document to a WPF tree control.
    6. To round out our discussion, we will take a look at some practical tips and tricks, which enables us to take full advantage of LINQ to XML, and we will also mentioned some common pitfalls, and inform you how to save time not making the same mistakes as your truly ;-)!

    Prerequisites

    If you want to run the code samples that are associated with this article, you will need the following pre-requisites:

    • Since we will be using LINQ, you obviously need to have a version of Visual Studio 2008 installed. If you install Visual Studio, you will automatically also install .NET 3.5 and the C# 3.0 compiler.
    • We will be showing how you can use LINQ to SQL to create XML documents, so you will need to have SQL Server 2005 or the SQL Server 2008 beta installed. If you have the express edition installed, I recommend you also download SQL Server Management Studio express.
    • We will be using the AdventureWorks sample database to illustrate how we can create an XML document from a SQL Server database, so you should have the latest version of the AdventureWorks database installed. If you currently don't have this database installed, you can navigate to: http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004, and download the appropriate version of the AdventureWorks*.msi file, and follow the instructions to install the database on your SQL server.

    LINQ Overview

    Background - Problem Description

    Most of us write business applications to earn our living. All business  applications that I ever worked with deal with some type of data, and quite a number of them involved multiple heterogeneous data sources. 

    In the early days (1980-1990) all of the data for an application was centralized in one file, and the data access capabilities were directly built into the language (I know that at this point, my friend Pete Miller is thinking back nostalgically about his FoxPro and DBASE days... ;-). The upside of this was that data querying and data manipulation were a core part of the programming experience, but the downside was that every platform (DBASE, FoxPro, FileMaker) would integrate these data access features in a completely different way, tailored to the capabilities of the underlying data access tool. As a result, programmers faced a steep learning curve when moving from one platform to another. Another drawback of these first-generation data access systems was scalability. All platforms were file-based, and therefore had a hard time scaling to multiple users and large data sizes.

    In response to this and other issues, (1990s  through early 2000s) relational database systems (RDBMS) were created. All of these databases used an emerging query and DML language called SQL to access the data, and database programs migrated from being strictly tied to a particular data access system dialect, to using industry-standard SQL. The query and data manipulation features became again external to the core programming languages, and were typically made available through a set of external libraries (db-Library anyone?). This process was accelerated by the creation of ODBC, which provides a standard means of accessing any database that provides an ODBC driver.

    In recent years, we have seen the emergence of Object-Relational Mapping (ORM) tools. The main philosophy behind the ORM movement was the fact that designers and programmers are dealing with objects to represent their data, but each time they have to persist or load these objects from a data store, they have to make a paradigm shift back to SQL and the relational model,  and the specific details associated with the database access libraries that are being used. An ORM tool, such as NHibernate takes over the responsibility of persisting and loading objects to the database in a transparent fashion.

    While both SQL and their associated data access technologies (such as ADO.NET) and/or the newer ORM tools have simplified things significantly, we still have a number of challenges that remain:

    1. Most real-world applications out there deal with other data types besides relational data. Indeed, in this connected world, we access a variety of XML data sources, we connect to RSS feeds, leverage Web Services using the SOAP or REST protocols, access data in Active Directory or some other LDAP-based data store, and so on. Each of these data sources have their own data access paradigms.
    2. When working with "plain old objects", we have to use a very different API to  sort, filter, group our otherwise manipulate our objects, as compared to relational data. Wouldn't it be nice if we could have one standard API for these common tasks?
    3. Often we have to perform complex data transformations and/or data shaping. The way in which these transformations are performed is often dependent on the type of the data. For XML we use XSLT, for relational data we use views or complex joins, for objects we use manual code etc. Again, it would be nice to have access to one standard approach for performing transformations.
    4. What if we want to access data from a dynamic language, such as Ruby, IronPython, PowerShell, F# or any of the future DLR-based languages? Will these languages be able to use the same data access tool as the statically-typed .NET languages such as C# or VB.NET?

    Microsoft's Answer: LINQ

    At the core, LINQ is really a set of constructs, built into the language, which allow us to work with any type of data, be it relational, XML or plain old objects. LINQ is supported in both C# 3.0 and VB 9.0, which are compilers that shipped with Visual Studio 2008. In a way, LINQ brings us "back to the future", making querying and manipulating data  a core programming concept again. The main different with the "old school" languages such as FoxPro is that LINQ is fully independent of the type of data that is being accessed (object, relational data, XML, etc.), and the specific implementation of the data source (SQL Server, Oracle etc.).

    The LINQ features in the C# 3.0 compiler are built on top of a number of other language enhancements such as:

    1. Anonymous types
    2. Anonymous methods
    3. Type inference
    4. Lambda expressions
    5. Expression trees
    6. Extension methods
    7. Instance and collection initializers
    8. Partial methods

    A number of these features were addressed in some of my previous posts. Please refer to the blog archives for more information.

    The .NET 3.5 Framework also provides LINQ support through a number of types, available in the System.*.LINQ namespaces. These types provide additional support on top of the compiler features.

    A graphical overview of LINQ is shown below:

    LINQ Overview

    LINQ to XML

    LINQ to XML is portion of LINQ that allows us to:

    • Construct
    • Traverse
    • Manipulate
    • Query
    • Search

    XML documents and fragments, using the standard LINQ API.

    One of the main goals of LINQ to XML was to address the main shortcomings in the W3C XML DOM API, as implemented in the System.Xml.* .NET 2.0 namespaces, with a focus on the following areas:

    • Simplify XML tree construction with functional construction.
    • Eliminate document centricity in favor of element centricity.
    • Simplify naming by eliminating prefixes from the API.
    • Simplify Node value extraction.

    The above topics have always been a hard area to deal with for any programmer working with XML documents. XML DOM code is unnecessary complex and bloated, and often unintentionally obfuscated. For example, it is not easy to imply the structure of the created XML document, when reading XML DOM document creation code.

    Other issues that Microsoft wanted to address with LINQ to XML is to allow a developer to quickly move data extracted from a relational model to an XML representation, or from an object graph to an XML document. This current DOM API does not support constructs such as projections inside a XQuery, LINQ to XML provides an elegant solution to this problem.

    In the .NET 3.5 framework, LINQ to XML is implemented in the System.Xml.Linq.dll assembly, and exposed through the System.Xml.Linq namespace.

    The dependencies of the System.Xml.Linq.dll assembly is show in the figure below:

    System.XML.Linq.Overview

    LINQ to XML Object Model

    LINQ to XML was developed with Language-Integrated Query over XML in mind from the onset. It takes advantage of the standard query operators and adds query extensions specific to XML. Just as significant as the Language-Integrated Query capabilities of LINQ to XML is the fact that LINQ represents a consistent query experience across all LINQ enabled APIs and allows us to combine XML queries from other data sources. So, with one query, you can access data from:

    1. Local objects in memory
    2. An XML Data Source
    3. One or more SQL Server data sources

    The core classes that make up the LINQ to XML object model are listed below:

    LinqToXMLCoreClasses

    As you can conclude from the object model above, the number of classes involved has been dramatically reduced, resulting in a reduced learning curve. Actually I think that the biggest challenge in working with LINQ to XML is to unlearn some of the bad practices that we had to burn into our brain to make the W3C XML DOM work for us.

    Some key issues regarding this object model are listed below:    

    1. You can now work in a "document free mode" if you would like to do so. In some scenarios, you simply want to create or load some XML, manipulate and query it, and save it back. With the W3C DOM, you would be forced to create an XML document. In LINQ to XML, this is no longer the case. To perform the task listed, you could simply:
      • Create XElements directly (without having an XDocument involved at all)
      • Manipulate the XElements or XAttributes directly.
      • Save the resulting XML tree directly to a writer.
    2. XML names have been greatly simplified. LINQ to XML goes out of its way to make XML names as straightforward as possible. One can say that the complexity of XML names does not originate in namespaces, but from XML prefixes. XML prefixes can be used for reducing the keystrokes required when inputting XML or making XML easier to read, however prefixes are just shortcuts for using the full XML namespace. On input LINQ to XML resolves all prefixes to their corresponding XML Namespace and prefixes are  not exposed at all in the programming API. In LINQ to XML, a XName represents a full XML name consisting of an XNamespace and the local name. Developers will usually find it more convenient to use the XNamespace rather than the namespace URI string.
    3. An attribute (modeled by means of the XAttribute class) is no longer a subclass of the node class). It is now simply a XName-value pair, which is what it always should have been.

    Functional Construction

     

    Querying

     

    WPF Data Binding

     

     

    Tips and Tricks

     

    Conclusion

     

     

     

     

    Notes: Make sure how to construct an XML document from a LINQ database query

  • Creating Application Shortcuts with Google Chrome

    Introduction

    Unless you have been living under a rock for the last week, you probably know that Google shipped the first beta-version of it's new browser, named "Google Chrome". I am not planning to start yet another debate here about why I think Google decided at this point that it needed to get in the Web browser business. You can read plenty of interesting comments on various blog sites, for example Paul Thurrot, from the Windows Supersite had some very interesting feedback on the subject in his Windows Weekly Podcast with Leo Laporte.

    Most of the  features introduced by Chrome have an equivalent in Microsoft's upcoming IE 8 browser (beta 2 just shipped a couple of weeks ago). I do want to highlight one interesting feature of Chrome and that is the ability to create application shortcuts.

    What is  a Chrome Application Shortcut?

    A Chrome Application Shortcut is basically a quick, streamlined way to access a Web Application's functionality in it's own window. You can create a shortcut on either the desktop, the "Start bar", or the "Quick Launch" toolbar. Once the short cut is created, you can simply double-click the shortcut to launch the application in it's own window. Using shortcuts is a nice alternative to launching the full browser window, and entering the URL (either manually or through a "favorites" entry).

    Also, if you use Google Gears, you optionally have the ability to access the application's functionality if offline mode (if so desired).

    Walkthrough

    To create a shortcut, first navigate to the URL of the Web Application. Next, click the Page menu next to the address bar, and select the "Create application shortcuts.." menu item, as is shown below:

    CreateChromeShortcut

    In the Google Gears dialog box that comes up, you can select the location(s) for your shortcut. So in my case, I elected to create a shortcut to Gmail, and I opted to have a shortcut on both my desktop and the "Quick Launch" bar.

    ChromeShortcutLocations

    Note that Chrome will use the appropriate icon for your shortcut.

    After creating your shortcut, I can simply double-click the shortcut, to open the selected Web Application as show below:

    ChromeShortcutInQuickLaunch

    The application opens up in its own Chrome window, without any toolbar or address bar clutter, as shown below:

    ChromAppInOwnWindow

  • Software as a Service - A high level overview

    Table of Contents

    Software as a service is a term is frequently making headlines in a number of trade journals. In all articles on the subject, Software as a Service is mentioned as the "next big thing", which is going to revolutionize the software industry as we know it.

    This article will provide a high-level overview of the subject. The article structured as follows:

    1. In the first section we introduce Software as a Service (know by its acronym Saas), provide a definition and discuss some fundamental topics that form the basis for the technology. Also, since the concept of a "tenant" and "multi-tenancy" is closely associated with Saas, we will include a definition for these terms in the same section.
    2. The next section will address how Saas can mitigate the software acquisition risks for an organization.
    3. Often, a (false) rumor has been spread throughout the industry, that SaaS eliminates the need for an client organization's IT staff. In our next section, we clarify this misconception, and show how SaaS can empower an IT organization to focus on new initiatives that contribute measurable, immediate value to the client's organization.
    4. Next, we discuss the three core attributes that are essential for an effective single instance, multi-tenant application architecture.
    5. Four different maturity models have been defined for a multi-tenant SaaS application, our next section discusses these different maturity levels and their specifics.
    6. In the last section of this article, I provide the reader with some guidance of how the select the appropriate maturity level for their organization's  Saas application(s).

    This article is mostly non-technical is nature. However, I am planning to follow-up this article with a number of technical articles, where we dive into some of the more challenging areas that make it possible to create a highly scalable, configurable and multi-tenant-efficient SaaS solution.

    Introduction

    When you ask a number of software professionals for a definition of the term "Software as a Service", you will get many different, and sometimes conflicting answers. Slowly tough, more and more experts are starting to agree on the fundamental principles that define Software as a Service, and the concepts that differentiate the technology from traditional, on-premise packaged software solutions.

    The current, widely-accepted definition for Software as a Service can best be phrased as follows:

    Saas is software deployed as a multi-tenant hosted service and accessed over the Internet.

    Saas should not be confused with the services provided by traditional Application Service Providers (ASPs). An ASP typically provides a "shrink-wrapped" application to business users over  the Internet, and therefore can be better compared with a traditional on-premise IT application which has been deployed and hosted by a third-party. These applications were architected as a single-tenant application, and were not designed to share data and business process logic with other applications. Since we used the term tenant here I feel like another definition is in order:

    A tenant is a client organization, which is using an application that has been deployed as a SaaS Application.

    Multitenancy refers to the architectural principle, in which a single instance of the software runs on as Software as a Service (SaaS) vendor's servers, servicing multiple client organizations (tenants). Multitenancy can be contrasted with a multi-instance architecture, where separate software instances (or hardware systems) are set up for different client organizations. With a multitenant architecture, a software application is designed to virtually partition its data and configuration so that each client organization works with a different virtual application instance.

    A typical SaaS application is offered either directly by the vendor or by an intermediary party called an aggregator, which bundles SaaS offerings from different vendors and offers them as part of a unified application platform.

    I strongly believe that Saas is going to have a major impact on the software industry, because software as a service will change the way people build, sell, buy and use software. For this to happen though, software vendors need resources and information about developing Saas applications effectively.

    When you take a look at our definition for Saas, it should become immediately clear that it does not prescribe any specific application architecture; it does not say anything about

    Saas and Software Acquisition Risk Mitigation

    Another area in which Saas applications tend to be different as compared to on-premise software is in their licensing model. On-Premise software applications typically have a high cost of entry, either because the application has to be developed from scratch by the internal or external IT staff, or in the case of a purchased application, because of the high up-front licensing fees. In contrast, SaaS application access is typically sold based upon a subscription model. The different subscription models offered by SaaS vendors are:

    • A flat time-based fee (monthly, yearly etc.). This time-based fee will typically be much lower than the large up-front licensing fees for purchased on-premise applications.
    • Usage based upon the metered activity by the customer, where 'usage activity' can be measured in a variety of ways, including number of business transactions, number of orders processed etc.

    Some vendors will use a combination of both models, charging a flat time based fee in addition to some usage-based premium. In addition, most Saas vendors offer a "try before you buy" option, where a potential tenant has the opportunity to try out the application for a period of time before needing to make a commitment towards becoming a full-fledged customer.

    In the traditional model, deploying large scale business-critical software systems (SAP anyone ? ;-) has been a major undertaking. As we mentioned earlier, deploying these systems across a large enterprise can cost hundred of thousands of dollars in up-front licensing cost, and usually requires a small (or sometimes large  ;-) army of IT personnel and consultants to customize and integrate it with with the organization's other systems and data. The time, staff and budget requirements of a deployment of this magnitude represents a significant risk for an organization of any size, and often puts such software out of reach of smaller organizations that would otherwise be able to derive great benefits  from the applications.

    The subscription-based delivery model of SaaS applications changes this whole mindset. SaaS applications don't require the deployment of a large infrastructure at the customer's location, which eliminates or drastically reduces the upfront commitment of resources. With no significant initial investment to amortize, an enterprise that deploys a Saas application that turns out to produce disappointing results can walk away and pursue a different direction, without having to abandon an expensive on-premise infrastructure.

    Does Saas eliminate the need for an IT organization?

    One (wrong) conclusion that some people draw from the deployment of SaaS-based application is that it eliminates, or drastically reduces the need for an internal IT staff of an tenant's organization. Indeed with SaaS, the job of deploying  an application and keeping it running from day to day - testing and installing patches, managing upgrades, monitoring performance, ensuring high availability, and so forth - is handled by the provider.

    But, rather than eliminating the need for an internal IT staff, by transferring the responsibility for the above-mentioned "overhead" activities to a third-party, the IT department can focus more on high-value activities that align with and support the business goals of the enterprise. Instead of being primarily reactive and operations-focused, the chief information offices (CIO) and IT staff can more efficiently function as technology strategists to the rest of the company, working with the business units to understand their business needs and advise them on how to best use technology to accomplish their objectives.

    So, instead of being made obsolete by SaaS, the IT department has an opportunity to contribute to the success of the enterprise more directly than ever before. For example, based upon business requirements, the IT staff could create "composite applications" and "executive dashboards", which act as integrators and collectors of the valuable data contained in the different Saas applications. A successful service-centric IT organization can directly produce more value for the business by providing services that draw from both internal and external sources and align closely with the overall business goals of the organization.

    The Three Attributes of a Single Instance Multi-Tenant Architecture

    From an application's architect point of view, there are three key differentiators that separate a well-design SaaS application from a poorly designed one. A well-designed SaaS application is:

    1. Scaleable
    2. Multi-Tenant-efficient
    3. Configurable

    A graphical representation of these three important differentiators is shown below:

    SaasDifferentiators

    Scaling the application means achieving maximum  concurrently and using application resources more efficiently - for example, optimizing locking duration, statelessness, efficiently leveraging sharing pooled resources  such as thread and network connections, caching frequently used reference data, and partitioning large databases. Note that I am not stating here that a standard single-tenant application should NOT be designed  in a scaleable manner, but because of the resource-intensive nature of a multi-tenant application, these requirements are more amplified, and achieving maximum scalability is therefore even more important.

    Multi-tenancy may be the most significant paradigm shift that an architect accustomed to designing isolated, single-tenant applications has to make. For example, when a user at one company accesses customer information by using a CRM application (e.g. SalesForce.com), service, the application instance that the user connects to may be accommodating users from dozens, or even hundreds, of other companies - all completely unbeknownst to any of the users. This requires an architecture that maximizes the sharing of resources across tenants, but in such a manner that the application is still able to differentiate data belonging to different customers. For example, in a multi-tenant environment, it might be prohibitive to allocate an expensive resource, such as a WIN32 process to a single tenant (we will have more to say about this topic in the next section). Also, the architect needs to ensure 100% data privacy for each tenant while maintaining an acceptable performance for each tenant.

    Of course, if a single application instance on a single server has to accommodate users from several different companies at once, you can't simply write custom code to customize the end-user experience; anything you do to customize the application for one customer will change the application for other customers as well. Instead of customizing the application in the traditional sense, each customer uses metadata to configure the way the application appears and behaves for its users. The challenge for the SaaS Architect is to ensure that the task of configuring applications is simple and easy for the customers, without incurring extra development or operation costs for each configuration. Note that configuration is this context can apply to a number of different areas:

    1. Allowing tenants to add custom fields or tables to the database.
    2. Providing a mechanism by which a tenant can customize the user interface and the look and feel of the application (for example, in the case of a web application, the user should be able to upload custom a style sheet for the application).
    3. Allowing tenants to customize the business process workflows, by including some type of process editor in the tenant's administrator user interface.

    Saas as a Service-Oriented Architecture Maturity Model

    In the above sections, we have discussed the business and technical benefits of Saas. The biggest impact of SaaS tough might be the fact that SaaS finally provides an organization with the right incentives for moving IT towards  a services-centric model or in other words: a "real", applied SOA strategy with concrete,  tangible benefits.

    Broadly speaking, SaaS service maturity can be expressed using a model with four distinct levels. Each level is distinguished from the previous one by the addition of one of the three attributes listed above. A graphical level with the different maturity levels is shown below:

    Saas Overview

    Maturity Level I : Ad Hoc / Custom

    The first level of maturity is similar to the traditional application service provider (ASP) model of software delivery, dating back to the 1990s. At this level, each customer (or tenant) has its own customized version of the hosted application, and runs its own instance of the application on the hosts's servers. Architecturally, software at this maturity level is very similar to traditionally-sold LOB (Line-of-Business) software, in that different clients within the organization connect to a single ins