U2U Article: "Using SQL Server 2005 to store and query serialized objects" Copyright © 2004 by U2U nv/sa, Belgium. All rights reserved.
Please address any questions or suggestions to U2U.

Using SQL Server 2005 to store and query serialized objects

Introduction

.NET offers the possibility to serialize and deserialize objects into a binary or XML representation. Serializing into XML is a requirement in order to create and consume webservices, binary serialization can be used for remoting. But besides moving objects between different applications, can serialization also be used to ‘materialize’ objects, and give them a persistent storage. This can be done to file, but databases offer a more robust storage (transactional consistency, backup,…). But there is more to it. Since Sql Server 2005 offers XML querying functionality (XPath as well as XQuery), this allows us to query the materialized objects, and even alter them. In this article, we demonstrate this idea by building a small application in which we store persons in a listbox. We then store the state of our listbox items (including the Person objects) serialized as XML in the SQL Server 2005 database, and deserialize this. Since we can store multiple serialized listboxes, we extend our load method so that we can query our database. In our case, we allow the user to enter the name of a person, and when we then click load, the first materialized listbox in which a person with this name occurs is loaded.

In this article, we will take multiple steps to come to this solution:

1.       Create a Sql Server 2005 database to store the materialized objects

2.       Serialize and store the listbox items

3.       Deserialize XML into a .NET object

4.       Query the database so that we can select which item to deserialize

5.       Improve robustness and efficiency of the code by using XSD schemas and XML indexes.

 

Creating the database

From within Visual Studio, we can use the Server Explorer to manage our SQL Server 2005 database server. Start Visual Studio, and open a new Windows application. Next, open up the Server Explorer (usually docked on the left hand side, if this window is not open, choose the View menu, and select Server Explorer). In this walk-through, we store the objects in a new database, which we call Demo. So, right click the Data Connections in the Server Explorer and select Create New SQL Server Database.

Provide the correct information for connecting the server, and specify Demo as the name. Next, a new data connection appears in the Server Explorer. Unfold this connection, and right click on the Tables folder. Select Add New Table, and create a table as shown in the next screenshot. Notice that column 1 is an AutoIncrement column: This column is the identification number for the object, and will later be used in index creation. The other column is of type Xml, which is a new native type in Sql Server 2005.

Also, we make the ObjectID column a primary key by right clicking this column as select Primary Key:

Serializing the listbox items

In order to demonstrate the serialization, we create a Person class, which just stores the Age and Name of a person, offers two constructors (one taking both states as parameters, and one taking no parameters), and overrides the ToString method. Right click the project in the Solution Explorer, choose Add -> Class, Type Person as class name and add the following code:

C#

public class Person

{

  string name;

  int age;

 

  public Person() { }

  public Person(string name, int age)

  {

    Name = name;

    Age = age;

  }

 

  public string Name

  {

    get { return name; }

    set { name = value; }

  }

 

  public int Age

  {

    get { return age; }

    set { age = value; }

  }

 

  public override string ToString()

  {

    return name + ", " + age;

  }

}

VB.NET

Public Class Person

  Private NameValue As String

  Private AgeValue As Integer

  Public Sub New()

 

  End Sub

 

  Public Sub New(ByVal name As String, ByVal age As Integer)

    NameValue = name

    AgeValue = age

  End Sub

 

  Public Property Name() As String

    Get

      Return NameValue

    End Get

    Set(ByVal value As String)

      NameValue = value

    End Set

  End Property

 

  Public Property Age() As Integer

    Get

      Return AgeValue

    End Get

    Set(ByVal value As Integer)

      AgeValue = value

    End Set

  End Property

 

  Public Overrides Function ToString() As String

    Return Name & ", " & Age

  End Function

End Class

Next, we go back to our WinForm, and add the listboxes, labels, textboxes and buttons so that we get the GUI shown in the previous section. Double click the Add button and add the following Event handler, which stores a new Person object in the ListBox

C#

private void Add_Click(object sender, EventArgs e)

{

  listBox1.Items.Add(new Person(textBoxName.Text, Convert.ToInt32(textBoxAge.Text)));

}

VB.NET

Private Sub Add_Click(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles Add.Click

  ListBox1.Items.Add(New Person(TextBox1.Text, Convert.ToInt32(TextBox2.Text)))

End Sub

The next step is writing the code which serializes the collection that stores our Person objects (accessible via the Items property of the listbox object) into Xml. In our implementation, we serialize our object into a StringBuilder. The resulting string will then be passed on to a SqlCommand for insertion into the database. We add this code to the Save_Click event handler (double click the Save button for generating this handler):

C#

private void Save_Click(object sender, EventArgs e)

{

  StringBuilder sb = new StringBuilder();

  TextWriter tw = new StringWriter(sb);

  System.Xml.Serialization.XmlSerializer xs =

    new System.Xml.Serialization.XmlSerializer(listBox1.Items.GetType(), new Type[] { typeof(Person) });

  xs.Serialize(tw, listBox1.Items);

 

  SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=Demo;Integrated Security=True");

  SqlCommand cmd = con.CreateCommand();

  cmd.CommandText = "insert into dbo.XmlDemo values (@xmlObject)";

  cmd.Parameters.AddWithValue("@xmlObject", sb.ToString());

  try

  {

    con.Open();

    cmd.ExecuteNonQuery();

    con.Close();

  }

  catch (Exception ex)

  {

    MessageBox.Show("Error writing to database: " + ex.Message);

  }

}

VB.NET

Imports System

Imports System.Text

Imports System.IO

Imports System.Data.SqlClient

Imports System.Data.SqlTypes

 

‘…

Private Sub Save_Click(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles Save.Click

  Dim sb As New StringBuilder

  Dim tw As TextWriter = New StringWriter(sb)

  Dim xs As New System.Xml.Serialization.XmlSerializer(_

  ListBox1.Items.GetType(), New Type() {GetType(Person)})

  xs.Serialize(tw, ListBox1.Items)

 

  Dim con As New SqlConnection( _

  "Data Source=.;Initial Catalog=demo;Integrated Security=True")

  Dim cmd As SqlCommand = con.CreateCommand()

  cmd.CommandText = "INSERT INTO dbo.XmlDemo VALUES (@xmlObject)"

  cmd.Parameters.AddWithValue("@xmlObject", sb.ToString())

  Try

    con.Open()

    cmd.ExecuteNonQuery()

    con.Close()

  Catch ex As Exception

    MessageBox.Show("Error while writing to database: " & ex.Message)

  End Try

End Sub

In this code, we create an XmlSerializer. Such a serializer takes as a parameter the type of object it is going to serialize, but in this case, we also need to provide the type of our Person class, since we will be serializing a collection and the serializer needs to know about the Person type. After the serialization, we set up a connection to the Sql Server 2005 database in the traitional ADO.NET way, by creating a SqlConnection and a SqlCommand. Notice that we add the xmlObject parameter as a string: SQL Server will check if the received parameter is well formed XML anyway, so there is no real overhead in passing this parameter as a string. SQL Server handles the conversion from string to Xml in an implicit way.

This shows how simple it is to serialize an object to a database. The additional advantage of storing the XML serialization as XML instead of as string, is that the database checks that well formed XML gets inserted into the database (else the ExecuteNonQuery would cause an exception).

Deserializing

The next step is of course to reverse the serialization process. Afer we restart our application, we want to be able to fill up our listbox again by deserializing the object from our database. In order to do so, we once more set up a traditional SqlConnection, and set as a command the instruction to retrieve the first object in the database. We could have done this via an ExecuteScalar method on our SqlCommand, but in this code we do this via a SqlDataReader to illustrate the new GetSqlXml statement. This returns a SqlXml object, from which we can get a XmlReader. In that way, this Sql type helps us to link the Sql Server Xml storage to the .NET Xml capabilities, without the need to retrieve the data as string and parse this string again into Xml.

Something we have to pay attention to in the deserialization is the type information we give to the XmlSerializer object. If we would use the same object as the one we used during serialization, we run into a problem: ListBox.ObjectCollection, the type of the Items collection in the listbox, does not offer a public constructor. Because of that, we decide to deserialize our objects into an array of objects: a listbox items collection can be filled with the AddRange method based on an array of objects.

The code for the Load button then becomes as follows

C#

private void Load_Click(object sender, EventArgs e)

{

  SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=demo;Integrated Security=T

  SqlCommand cmd = con.CreateCommand();

  SqlXml x = null;

  cmd.CommandText = "SELECT Object from dbo.XmlDemo WHERE ObjectId = 1";

  try

  {

    con.Open();

    SqlDataReader rdr =cmd.ExecuteReader();

    if (rdr.Read())

    {

       x = rdr.GetSqlXml(0);

    }

    rdr.Close();

    con.Close();

  }

  catch (Exception ex)

  {

    MessageBox.Show("Error reading from database: " + ex.Message);

  }

  System.Xml.Serialization.XmlSerializer xs =

    new System.Xml.Serialization.XmlSerializer( typeof(object[]), new Type[] { typeof(Person) });

  object o = null;

  try

  {

    o = xs.Deserialize(x.CreateReader());

  }

  catch (Exception ex)

  {

    MessageBox.Show("Error deserializing: " + ex.Message);

  }

  listBox1.Items.Clear();

  listBox1.Items.AddRange((object[])o);

  listBox1.Refresh();

}

VB.NET

Private Sub LoadButton_Click(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles LoadButton.Click

  Dim con As New SqlConnection( _

  "Data Source=.;Initial Catalog=demo;Integrated Security=True")

  Dim cmd As SqlCommand = con.CreateCommand()

  Dim x As SqlXml = Nothing

  cmd.CommandText = "SELECT Object FROM dbo.XmlDemo WHERE ObjectID = 1"

  Try

    con.Open()

    Dim rdr As SqlDataReader = cmd.ExecuteReader()

    If rdr.Read() Then

      x = rdr.GetSqlXml(0)

    End If

    rdr.Close()

    con.Close()

  Catch ex As Exception

    MessageBox.Show("Error while reading from database: " & ex.Message)

  End Try

  Dim xs As New System.Xml.Serialization.XmlSerializer(GetType(Object()), _

  New Type() {GetType(Persoon)})

  Dim o As Object = Nothing

  Try

    o = xs.Deserialize(x.CreateReader())

  Catch ex As Exception

    MessageBox.Show("Error while deserializing: " & ex.Message)

  End Try

  ListBox1.Items.Clear()

  ListBox1.Items.AddRange(CType(o, Object()))

  ListBox1.Refresh()

End Sub

Querying the serialized objects

The main advantage of serializing an object into Xml instead of a binary representation is that we can query and even manipulate the serialized object without the need of deserializing it first. Sql Server 2005 offers great possibilities in this, since it supports XPath as well as a large subset of the XQuery language. Besides this, Sql Server 2005 also offers a data manipulation language for Xml, which makes it easy to alter Xml within the database.

In our sample application, we illustrate this by selecting which serialized object we will deserialize (note that with our current implementation, we can serialize our listbox as often as we want, but we can only  deserialize the first serialized listbox). In order to do so, we will make use of the exist function in Sql Server 2005. This function checks if a given Xml document or fragment contains at least one node which satisfies the XPath expression given as a parameter to the exist function.

We alter the event handler for our load button. Instead of loading the first serialized object, we check to see if our search textbox has a value. If so, we use the exist function to discover the id of the oldest serialized object which has this string as the name of one of the serialized objects, and deserialize that Xml document. As we already pointed out, the exist function takes as a parameter an XPath expression. However, in this expression, we have to insert the name we read via a textbox. To prevent sql injection attacks, we have to pass this name as a parameter; if we would just build the command string using concatenation, we would risk that other commands would be sneaked into our command string, with potential disasterous consequences to our database. In order to use a parameter in the XPath expression, we must use a special function named sql:variable(“@variablename”). Note that it is not allowed to put the complete XPath expression in a string variable; SQL Server syntax does not allow this.

In this way, we get the following code:

C#

private void Load_Click(object sender, EventArgs e)

{

  SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=demo;Integrated Security=True");

  SqlCommand cmd = con.CreateCommand();

  SqlXml x = null;

  int position = 1;

  if (textBox1.Text.Length != 0)

  {

    cmd.CommandText = "SELECT min(ObjectID) FROM dbo.XmlDemo " +

      "WHERE Object.exist('//anyType[Name= sql:variable(\"@name\")]') = 1";

    cmd.Parameters.AddWithValue("@name", textBox1.Text);

    con.Open();

    position = Convert.ToInt32(cmd.ExecuteScalar());

    con.Close();

  }

  cmd.CommandText = "SELECT Object from dbo.XmlDemo WHERE ObjectId = " + position;

  try

  {

    con.Open();

    SqlDataReader rdr =cmd.ExecuteReader();

    if (rdr.Read())

    {

       x = rdr.GetSqlXml(0);

    }

    rdr.Close();

    con.Close();

  }

  catch (Exception ex)

  {

    MessageBox.Show("Error reading from database: " + ex.Message);

  }

  System.Xml.Serialization.XmlSerializer xs =

    new System.Xml.Serialization.XmlSerializer( typeof(object[]), new Type[] { typeof(Person) });

  object o = null;

  try

  {

    o = xs.Deserialize(x.CreateReader());

  }

  catch (Exception ex)

  {

    MessageBox.Show("Error deserializing: " + ex.Message);

  }

  listBox1.Items.Clear();

  listBox1.Items.AddRange((object[])o);

  listBox1.Refresh();

}

VB.NET

Private Sub LoadButton_Click(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles LoadButton.Click

  Dim con As New SqlConnection( _

  "Data Source=.;Initial Catalog=demo;Integrated Security=True")

  Dim cmd As SqlCommand = con.CreateCommand()

  Dim x As SqlXml = Nothing

  Dim position As Integer = 1

  If (TextBox3.Text.Length > 0) Then

    cmd.CommandText = "SELECT min(ObjectID) FROM dbo.XmlDemo " & _

    "WHERE Object.exist('//anyType[Name = sql:variable(""@name"")]') = 1"

    cmd.Parameters.AddWithValue("@name", TextBox3.Text)

    con.Open()

    position = Convert.ToInt32(cmd.ExecuteScalar())

    con.Close()

  End If

  cmd.CommandText = "SELECT Object FROM dbo.XmlDemo WHERE ObjectId = " & _

  position

  Try

    con.Open()

    Dim rdr As SqlDataReader = cmd.ExecuteReader()

    If rdr.Read() Then

      x = rdr.GetSqlXml(0)

    End If

    rdr.Close()

    con.Close()

  Catch ex As Exception

    MessageBox.Show("Error while reading from database: " & ex.Message)

  End Try

  Dim xs As New System.Xml.Serialization.XmlSerializer(GetType(Object()), _

  New Type() {GetType(Persoon)})

  Dim o As Object = Nothing

  Try

    o = xs.Deserialize(x.CreateReader())

  Catch ex As Exception

    MessageBox.Show("Error while deserializing: " & ex.Message)

  End Try

  ListBox1.Items.Clear()

  ListBox1.Items.AddRange(CType(o, Object()))

  ListBox1.Refresh()

End Sub

Now, our application is ready to use. Build and run this application, enter a few persons, save, clear, enter a few other persons, save again, load without providing a name, load again, this time providing the name of a person that appears in the second set but not in the first set.

WARNING. Notice that the application as it is right now, is not well implemented: there are still pieces of code that are very likely to cause exceptions, which we do not catch. For instance, when we query our database with the exist function, we do not check if there is at least one query that satisfies our search criterion, and hence we can get an InvalidCastException. As an exercise, you can scan the current code for these types of problems and fix them.

Improving reliability and performance

In the above implementation, we can improve the Xml storage and retrieval in two ways. The first way is by storing typed Xml (Xml for which an xsd schema exists). This has two advantages. First advantage is that we are sure that we store in our database only Xml files which can always be deserialized – with our current implementation it is possible that we try to read ‘random’ xml from the database, which will cause a deserialization exception. The second advantage is that Sql Server can store and process the xml files more compact and efficient. This is due to the fact that for untyped xml, all node and attribute values need to be stored as strings, while with types xml, we know that certain values are integers, doubles, dates,…, and we can store them as such.

The second way in which we can improve the performance is by creating xml indexes. These are special indexes which speed up the querying of xml documents. But, as all indexes, they have a downside as well: They take additional CPU time for creating and maintaining the indexes, and the indexes take additional storage. So, if we use our application to do many inserts and we only very rarely query, update or delete the data, indexes won’t be any good. However, most applications query more than they insert, and for those applications indexes will give a speedup.

Typed Xml

If we want to store typed data, we run into a problem with our application as it is right now. In our code, we are serializing a collection of objects, but all objects in it are of runtime type Person. Because of that, .NET serializes this information as follows:

<ArrayOfAnyType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <anyType xsi:type="Person">

    <Name>nico</Name>

    <Age>31</Age>

  </anyType>

  <anyType xsi:type="Person">

    <Name>Siska</Name>

    <Age>30</Age>

  </anyType>

</ArrayOfAnyType>

The collection is serialized as an ArrayOfAnyType element, with each person serialized as an anyType element. With the type="Person" attribute, .NET tells us that this anyType element in .NET is a Person. Based on this Xml file we can let Visual Studio generate an Xsd schema.

<?xml version="1.0" encoding="utf-8"?>

<xsd:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">

  <xs:element name="ArrayOfAnyType">

    <xs:complexType>

      <xs:sequence>

        <xs:element maxOccurs="unbounded" name="anyType">

          <xs:complexType>

            <xs:sequence>

              <xs:element name="Name" type="xs:string" />

              <xs:element name="Age" type="xs:unsignedByte" />

            </xs:sequence>

          </xs:complexType>

        </xs:element>

      </xs:sequence>

    </xs:complexType>

  </xs:element>

</xsd:schema>

And here we run into an inconsistency: If we use Visual Studio or .NET to check if the above Xml file satisfies the Xsd schema, it will succeed:

C#

XmlReaderSettings xrs = new XmlReaderSettings();

xrs.Schemas.Add(null, @"c:\PersonList.xsd");

XmlReader xr = XmlReader.Create(@"c:\PersonList.xml", xrs);

try

{

  while (xr.Read()) { }

}

catch (XmlException xe)

{

  Console.WriteLine(xe.Message);

}

Console.WriteLine("valid!");

VB.NET

Imports System.Xml

 

Module Module1

 

  Sub Main()

    Dim xrs As New XmlReaderSettings

    xrs.Schemas.Add(Nothing, "c:\PersonList.xsd")

    Dim xr As XmlReader = XmlReader.Create("c:\PersonList.xml", xrs)

    Try

      While xr.Read

 

      End While

    Catch ex As XmlException

      Console.WriteLine("Invalid: " & ex.Message)

    End Try

    Console.WriteLine("Valid")

  End Sub

 

End Module

However, if we use the Xsd schema to type an Xml column in SQL Server 2005, it will reject the above Xml with this warning:

Error writing to database: XML Validation: Invalid cast for element 'anyType' from type 'ANONYMOUS' to type 'Person'. Location: /*:ArrayOfAnyType[1]/*:anyType[1]

The problem is the type attribute. A possible solution to this problem is avoiding that .NET generates these attributes, the other solution is rewriting the generated xsd schema so that it handles these type attributes correctly. In this article, we take the first approach, and solve the problem by not serializing the ObjectCollection in the listbox, but by serializing an array of Person objects. Since we have no typed ListBox collection, this will require an explicit casting when copying the ObjectCollection into the array. As a consequence of this, our XmlSerializer object now only gets as a type an array of Person references. Our serialization code becomes as follows:

C#

private void Save_Click(object sender, EventArgs e)

{

  StringBuilder sb = new StringBuilder();

  TextWriter tw = new StringWriter(sb);

  System.Windows.Forms.ListBox.ObjectCollection oc = listBox1.Items;

 

  Person[] family = new Person[oc.Count];

  for (int i = 0; i < oc.Count; i++)

  {

    family[i] = (Person)oc[i];

  }

 

  System.Xml.Serialization.XmlSerializer xs =

    new System.Xml.Serialization.XmlSerializer(family.GetType());

  xs.Serialize(tw, family);

 

  SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=demo;Integrated Security=True");

  SqlCommand cmd = con.CreateCommand();

  cmd.CommandText = "insert into dbo.XmlDemo values (@xmlObject)";

  cmd.Parameters.AddWithValue("@xmlObject", sb.ToString());

  try

  {

    con.Open();

    cmd.ExecuteNonQuery();

    con.Close();

  }

  catch (Exception ex)

  {

    MessageBox.Show("Error writing to database: " + ex.Message);

  }

}

VB.NET

Private Sub Save_Click(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles Save.Click

  Dim sb As New StringBuilder

  Dim tw As TextWriter = New StringWriter(sb)

  Dim oc As Windows.Forms.ListBox.ObjectCollection = ListBox1.Items

 

  Dim family As Persoon() = New Persoon(oc.Count - 1) {}

  For i As Integer = 0 To (oc.Count - 1)

    family(i) = CType(oc(i), Persoon)

  Next

  Dim xs As New System.Xml.Serialization.XmlSerializer(family.GetType())

  xs.Serialize(tw, family)

 

  Dim con As New SqlConnection( _

  "Data Source=.;Initial Catalog=demo;Integrated Security=True")

  Dim cmd As SqlCommand = con.CreateCommand()

  cmd.CommandText = "INSERT INTO dbo.XmlDemo VALUES (@xmlObject)"

  cmd.Parameters.AddWithValue("@xmlObject", sb.ToString())

  Try

    con.Open()

    cmd.ExecuteNonQuery()

    con.Close()

  Catch ex As Exception

    MessageBox.Show("Error while writing to database: " & ex.Message)

  End Try

End Sub

We can now also simplify our deserialisation. The declaration of the serializer object in the Load_Click event handler becomes:

C#

System.Xml.Serialization.XmlSerializer xs =

        new System.Xml.Serialization.XmlSerializer( typeof(Person[]));

VB.NET

Dim xs As New System.Xml.Serialization.XmlSerializer(GetType(Persoon()))

Next, we can test our application. This is necessary because in the next step, we need an Xml document generated by our altered code: In order to assign an Xsd schema on our Object column in our SQL Server database, we must first create such an Xsd schema. This can be done by hand or by one of the many tools. We will use  Visual Studio, which can derive an Xsd schema from an Xml file. So, first we will save one of the serialized Xml files.

We will use Visual Studio for this. Open up in the Server Explorer the table where our data is stored, right click this table and choose Show table data.

Next, right click any of the cells containing the Xml serialization stored by running the latest version of our application, and choose Copy. Now, right click the project in the solution explorer, and choose Add New Item. In the dialog box, pick Xml File (not Xsd file!). In the new file, paste the Xml code you copied after the single header line which is default in an Xml file. The resulting Xml file should resemble the following (except for the actual names and ages):

<?xml version="1.0" encoding="utf-8" ?>

<ArrayOfPerson xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <Person>

    <Name>Nico</Name>

    <Age>31</Age>

  </Person>

  <Person>

    <Name>Siska</Name>

    <Age>30</Age>

  </Person>

</ArrayOfPerson>

In the Xml menu select Create Schema. A new tab appears with the XSD schema. Verify that the schema matches this schema:

<?xml version="1.0" encoding="utf-8"?>

<xsd:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">

  <xs:element name="ArrayOfPerson">

    <xs:complexType>

      <xs:sequence>

        <xs:element maxOccurs="unbounded" name="Person">

          <xs:complexType>

            <xs:sequence>

              <xs:element name="Name" type="xs:string" />

              <xs:element name="Age" type="xs:unsignedByte" />

            </xs:sequence>

          </xs:complexType>

        </xs:element>

      </xs:sequence>

    </xs:complexType>

  </xs:element>

</xsd:schema>

Be careful with this step! The system derives a schema, but manual inspection is still needed. For instance, in our case, the system created a schema where age is an unsignedByte. This makes sense for the age of humans, expressed in years. But if we would express age in days, a larger type is needed. Also, the xsd schema allows for an ‘unbounded’ number of persons in a collection. If we want to limit this, we can do so by manually altering the Xsd schema.

This schema must now by added to our Sql Server 2005. Unfortunate, there is no easy graphical way to do this from Visual Studio 2005, so we will switch to the Sql Server 2005 Management Studio, the tool which more or less replaces the Sql Server 2000 Enterprise Manager and Query Analyzer. This program is part of the SQL Server 2005 client tools, which are not selected in a default SQL Server 2005 installation; you have to select these explicit on the installation screen. Of course, you can always install them afterwards by running the setup process again. Another problem we have to deal with is that we cannot alter our table to typed xml, since not all entries satisfy the xml schema. To solve that problem we will truncate our table, in that way removing all entries.

Open up the SQL Server 2005 Management Studio (from the start menu, or by typing sqlwb on the prompt. Provide the credentials for connecting to the database server, and in the object explorer unfold the databases. Now, right click the demo database, and choose New Query. First, we truncate our table by typing and executing the following statement:

truncate table dbo.XmlDemo

Next, we create a new Xml Schema Collection (which is the xsd schema stored in the database) with the following SQL statement:

create xml schema collection PersonList as '<?xml version="1.0" encoding="utf-8"?>

<xsd:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">

  <xs:element name="ArrayOfPerson">

    <xs:complexType>

      <xs:sequence>

        <xs:element maxOccurs="unbounded" name="Person">

          <xs:complexType>

            <xs:sequence>

              <xs:element name="Name" type="xs:string" />

              <xs:element name="Age" type="xs:unsignedByte" />

            </xs:sequence>

          </xs:complexType>

        </xs:element>

      </xs:sequence>

    </xs:complexType>

  </xs:element>

</xsd:schema>'

After executing this statement, the Xml Schema Collection is permanently stored in the database, and we can continue working in Visual Studio, where we open our Server Explorer and browse to the XmlDemo table in our demo database. When we right click this table, we select Open Table definition. In the definition, we click on the column that stores our xml documents. If we then look at the column properties under Xml, we see the option for specifying the Xml schema collection. After saving your solution, click on the combobox and select dbo.PersonList. We save first because in some of the beta editions, this operation can cause Visual Studio to go down. We also set the option Is Xml Document to true, else sql server only checks if one enters well formed XML fragments, but with this option on, the database only accepts XML documents in this column.

With these changes, our application works again, but now the storage of our Xml documents is more compact, and typed.

Xml Indexes

If we would save large and/or many Xml documents in our database, the querying with the exist function can be improved significantly by creating special Xml indexes. These indexes speed up searches involving XPath expressions. There are primary and secondary Xml indexes.

A primary index roughly stores for each node (element, attribute, …) in the Xml document the type of node, the value, the path from this node to the Xml document root, and the (non-xml) primary key of the row in which this Xml document appears. This means that an Xml Primary index can only be created after a normal primary key has been defined on the table. A primary Xml index can be seen as a stored shredded Xml document. Without such an index, each call to the exist function would cause each Xml document in the table being shredded into nodes, and the nodes being tested on the XPath criterion. With a primary Xml index, this shredding is no longer needed, and the system can directly start to scan the table and (in our case) compare the stored paths with the XPath expression. Sql Server takes a clever approach in this, because it stores paths from deepest node to root node (the reverse of how we often write paths), because in that way XPath expressions containing wildcards (such as // or *) can still use the index as long as the wildcard is not the last node in the XPath expression (which is rarely the case).

To build such an index, we go back to our Sql Server 2005 Management Studio. In the Object Explorer, unfold the XmlDemo table, right click the Indexes folder, and choose New Index…Provide a name for this index, and select as inde type Primary Xml. Click Add… to add a column to this index and choose the Object column, then click OK to add this index.

While this primary Xml index will result in a speedup of the exist function compared with no indexes at all, we are still not dealing with our Xml data optimally. The reason is that for each XPath expression, all the Xml paths in our index need to be checked sequentially. In fact, what we want in to index (part of) our primary Xml index, and this is exactly what the secondary Xml indexes allow us to do. This also makes clear that we cannot build secondary Xml indexes without first building a primary Xml index.

There are three types of secondary Xml indexes: PATH, VALUE and PROPERTY. A path index indexes the paths and node values, and is mainly intended for queries where the full path without wildcards (or the descendant axis) is used. A value index indexes the node values and path, and are because of that suited for XPath queries where we search on a value, with only minimal path specifications (as is the case for the query we are using in our project). Finally, a property index is useful if you have many value queries, where you retrieve a single scalar value out of a Xml document. A property index is based on the primary key, the path and the node value, and can because of that be used to get scalar values out of the Xml document soley from the index, without the need to go to the primary Xml index.

In our case, we can use a value index, so we create one with the following T-SQL statement (there is unfortunately no graphical way for constructing these secondary indexes):

CREATE XML INDEX SIXMLV_XmlDemo_Object

ON dbo.XmlDemo(Object)

USING XML INDEX PIXML_XmlDemo_Object

FOR VALUE

With this, we obtain our final database.

 

Dr. Nico Jacobs

is trainer and consultant at U2U (www.u2u.net). His focus is on Microsoft SQL Server and ADO.NET. You can reach Nico at nico@u2u.be.

Contact me Contact


Contact me Receive U2U Newsletter.
Looking for a challenging job Download Brochure On Site Training Looking for a challenging job
Favorites Favorites

Copyright © 1999-2010 by U2U