U2U Article: "Using SQL Server 2005 to store and
query serialized objects" Copyright © 2004 by U2U nv/sa, Belgium. All rights reserved.
Please address any questions or suggestions to U2U.
Using SQL Server 2005 to store and
query serialized objects
Introduction
.NET offers the possibility to serialize
and deserialize objects into a binary or XML representation. Serializing into
XML is a requirement in order to create and consume webservices, binary
serialization can be used for remoting. But besides moving objects between
different applications, can serialization also be used to ‘materialize’
objects, and give them a persistent storage. This can be done to file, but
databases offer a more robust storage (transactional consistency, backup,…).
But there is more to it. Since Sql Server 2005 offers XML querying
functionality (XPath as well as XQuery), this allows us to query the
materialized objects, and even alter them. In this article, we demonstrate this
idea by building a small application in which we store persons in a listbox. We
then store the state of our listbox items (including the Person objects)
serialized as XML in the SQL Server 2005 database, and deserialize this. Since
we can store multiple serialized listboxes, we extend our load method so that
we can query our database. In our case, we allow the user to enter the name of
a person, and when we then click load, the first materialized listbox in which
a person with this name occurs is loaded.

In this article, we will take multiple
steps to come to this solution:
1.
Create a Sql Server 2005 database to store the
materialized objects
2.
Serialize and store the listbox items
3.
Deserialize XML into a .NET object
4.
Query the database so that we can select which
item to deserialize
5.
Improve robustness and efficiency of the code by
using XSD schemas and XML indexes.
From within Visual Studio, we can use the
Server Explorer to manage our SQL Server 2005 database server. Start Visual Studio, and open a new Windows application. Next, open up the Server Explorer
(usually docked on the left hand side, if this window is not open, choose the View menu, and
select Server Explorer). In this walk-through, we store the objects in a new database,
which we call Demo. So, right click the Data Connections in the Server Explorer
and select Create New SQL Server
Database.

Provide the correct information for
connecting the server, and specify Demo as the name. Next, a new data
connection appears in the Server Explorer. Unfold this connection, and right click
on the Tables folder. Select Add
New Table, and create a table as shown in the
next screenshot. Notice that column 1 is an AutoIncrement column: This column
is the identification number for the object, and will later be used in index creation. The other column is of type Xml, which is a new native type in Sql Server
2005.

Also, we make the ObjectID column a primary
key by right clicking this column as select Primary Key:

In order to demonstrate the serialization,
we create a Person class, which just stores the Age and Name of a person, offers two constructors (one taking both states as parameters, and one
taking no parameters), and overrides the ToString method. Right click the
project in the Solution Explorer, choose Add -> Class, Type Person as class
name and add the following code:
C#
public class Person
{
string name;
int age;
public Person() { }
public Person(string name, int age)
{
Name = name;
Age = age;
}
public string Name
{
get { return name; }
set { name = value; }
}
public int Age
{
get { return age; }
set { age = value; }
}
public override string ToString()
{
return name + ", " + age;
}
}
VB.NET
Public Class Person
Private NameValue As String
Private AgeValue As Integer
Public Sub New()
End Sub
Public Sub New(ByVal name As String, ByVal
age As Integer)
NameValue = name
AgeValue = age
End Sub
Public Property Name() As String
Get
Return NameValue
End Get
Set(ByVal value As String)
NameValue = value
End Set
End Property
Public Property Age() As Integer
Get
Return AgeValue
End Get
Set(ByVal value As Integer)
AgeValue = value
End Set
End Property
Public Overrides Function ToString() As
String
Return Name & ", " & Age
End Function
End Class
Next, we go back to our WinForm, and add
the listboxes, labels, textboxes and buttons so that we get the GUI shown in
the previous section. Double click the Add button and add the following Event
handler, which stores a new Person object in the ListBox
C#
private void Add_Click(object sender, EventArgs
e)
{
listBox1.Items.Add(new
Person(textBoxName.Text, Convert.ToInt32(textBoxAge.Text)));
}
VB.NET
Private Sub Add_Click(ByVal sender As
System.Object, _
ByVal e As
System.EventArgs) Handles Add.Click
ListBox1.Items.Add(New Person(TextBox1.Text,
Convert.ToInt32(TextBox2.Text)))
End Sub
The next step is writing the code which
serializes the collection that stores our Person objects (accessible via the
Items property of the listbox object) into Xml. In our implementation, we
serialize our object into a StringBuilder. The resulting string will then be
passed on to a SqlCommand for insertion into the database. We add this code to the
Save_Click event handler (double click the Save button for generating this
handler):
C#
private void Save_Click(object sender,
EventArgs e)
{
StringBuilder sb = new StringBuilder();
TextWriter tw = new StringWriter(sb);
System.Xml.Serialization.XmlSerializer xs =
new
System.Xml.Serialization.XmlSerializer(listBox1.Items.GetType(), new Type[] {
typeof(Person) });
xs.Serialize(tw, listBox1.Items);
SqlConnection con = new SqlConnection("Data
Source=.;Initial Catalog=Demo;Integrated Security=True");
SqlCommand cmd = con.CreateCommand();
cmd.CommandText = "insert into
dbo.XmlDemo values (@xmlObject)";
cmd.Parameters.AddWithValue("@xmlObject", sb.ToString());
try
{
con.Open();
cmd.ExecuteNonQuery();
con.Close();
}
catch (Exception ex)
{
MessageBox.Show("Error writing to
database: " + ex.Message);
}
}
VB.NET
Imports System
Imports System.Text
Imports System.IO
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
‘…
Private Sub Save_Click(ByVal sender As System.Object,
_
ByVal e As System.EventArgs) Handles Save.Click
Dim sb As New StringBuilder
Dim tw As TextWriter = New StringWriter(sb)
Dim xs As New
System.Xml.Serialization.XmlSerializer(_
ListBox1.Items.GetType(), New Type()
{GetType(Person)})
xs.Serialize(tw, ListBox1.Items)
Dim con As New SqlConnection( _
"Data Source=.;Initial
Catalog=demo;Integrated Security=True")
Dim cmd As SqlCommand = con.CreateCommand()
cmd.CommandText = "INSERT INTO
dbo.XmlDemo VALUES (@xmlObject)"
cmd.Parameters.AddWithValue("@xmlObject",
sb.ToString())
Try
con.Open()
cmd.ExecuteNonQuery()
con.Close()
Catch ex As Exception
MessageBox.Show("Error while writing
to database: " & ex.Message)
End Try
End Sub
In this code, we create an XmlSerializer.
Such a serializer takes as a parameter the type of object it is going to
serialize, but in this case, we also need to provide the type of our Person
class, since we will be serializing a collection and the serializer needs to
know about the Person type. After the serialization, we set up a connection to
the Sql Server 2005 database in the traitional ADO.NET way, by creating a
SqlConnection and a SqlCommand. Notice that we add the xmlObject parameter as a
string: SQL Server will check if the received parameter is well formed XML
anyway, so there is no real overhead in passing this parameter as a string. SQL
Server handles the conversion from string to Xml in an implicit way.
This shows how simple it is to serialize an
object to a database. The additional advantage of storing the XML serialization
as XML instead of as string, is that the database checks that well formed XML
gets inserted into the database (else the ExecuteNonQuery would cause an
exception).
The next step is of course to reverse the
serialization process. Afer we restart our application, we want to be able to
fill up our listbox again by deserializing the object from our database. In
order to do so, we once more set up a traditional SqlConnection, and set as a
command the instruction to retrieve the first object in the database. We could
have done this via an ExecuteScalar method on our SqlCommand, but in this code
we do this via a SqlDataReader to illustrate the new GetSqlXml statement. This
returns a SqlXml object, from which we can get a XmlReader. In that way, this
Sql type helps us to link the Sql Server Xml storage to the .NET Xml
capabilities, without the need to retrieve the data as string and parse this
string again into Xml.
Something we have to pay attention to in the
deserialization is the type information we give to the XmlSerializer object. If
we would use the same object as the one we used during serialization, we run
into a problem: ListBox.ObjectCollection, the type of the Items collection in
the listbox, does not offer a public constructor. Because of that, we decide to
deserialize our objects into an array of objects: a listbox items collection
can be filled with the AddRange method based on an array of objects.
The code for the Load button then becomes
as follows
C#
private void Load_Click(object sender,
EventArgs e)
{
SqlConnection con = new
SqlConnection("Data Source=.;Initial Catalog=demo;Integrated Security=T
SqlCommand cmd = con.CreateCommand();
SqlXml x = null;
cmd.CommandText = "SELECT Object from
dbo.XmlDemo WHERE ObjectId = 1";
try
{
con.Open();
SqlDataReader rdr =cmd.ExecuteReader();
if (rdr.Read())
{
x = rdr.GetSqlXml(0);
}
rdr.Close();
con.Close();
}
catch (Exception ex)
{
MessageBox.Show("Error reading from
database: " + ex.Message);
}
System.Xml.Serialization.XmlSerializer xs =
new System.Xml.Serialization.XmlSerializer(
typeof(object[]), new Type[] { typeof(Person) });
object o = null;
try
{
o = xs.Deserialize(x.CreateReader());
}
catch (Exception ex)
{
MessageBox.Show("Error deserializing:
" + ex.Message);
}
listBox1.Items.Clear();
listBox1.Items.AddRange((object[])o);
listBox1.Refresh();
}
VB.NET
Private Sub LoadButton_Click(ByVal sender As
System.Object, _
ByVal e As System.EventArgs) Handles
LoadButton.Click
Dim con As New SqlConnection( _
"Data Source=.;Initial
Catalog=demo;Integrated Security=True")
Dim cmd As SqlCommand = con.CreateCommand()
Dim x As SqlXml = Nothing
cmd.CommandText = "SELECT Object FROM
dbo.XmlDemo WHERE ObjectID = 1"
Try
con.Open()
Dim rdr As SqlDataReader =
cmd.ExecuteReader()
If rdr.Read() Then
x = rdr.GetSqlXml(0)
End If
rdr.Close()
con.Close()
Catch ex As Exception
MessageBox.Show("Error while reading
from database: " & ex.Message)
End Try
Dim xs As New
System.Xml.Serialization.XmlSerializer(GetType(Object()), _
New Type() {GetType(Persoon)})
Dim o As Object = Nothing
Try
o = xs.Deserialize(x.CreateReader())
Catch ex As Exception
MessageBox.Show("Error while
deserializing: " & ex.Message)
End Try
ListBox1.Items.Clear()
ListBox1.Items.AddRange(CType(o, Object()))
ListBox1.Refresh()
End Sub
The main advantage of serializing an object
into Xml instead of a binary representation is that we can query and even
manipulate the serialized object without the need of deserializing it first.
Sql Server 2005 offers great possibilities in this, since it supports XPath as
well as a large subset of the XQuery language. Besides this, Sql Server 2005
also offers a data manipulation language for Xml, which makes it easy to alter
Xml within the database.
In our sample application, we illustrate
this by selecting which serialized object we will deserialize (note that with
our current implementation, we can serialize our listbox as often as we want,
but we can only deserialize the first serialized listbox). In order to do so,
we will make use of the exist function in Sql Server 2005. This function checks if a given Xml
document or fragment contains at least one node which satisfies the XPath
expression given as a parameter to the exist function.
We alter the event handler for our load
button. Instead of loading the first serialized object, we check to see if our
search textbox has a value. If so, we use the exist function to discover the id
of the oldest serialized object which has this string as the name of one of the
serialized objects, and deserialize that Xml document. As we already pointed
out, the exist function takes as a parameter an XPath expression. However, in
this expression, we have to insert the name we read via a textbox. To prevent
sql injection attacks, we have to pass this name as a parameter; if we would
just build the command string using concatenation, we would risk that other
commands would be sneaked into our command string, with potential disasterous
consequences to our database. In order to use a parameter in the XPath
expression, we must use a special function named sql:variable(“@variablename”).
Note that it is not allowed to put the complete XPath expression in a string
variable; SQL Server syntax does not allow this.
In this way, we get the following code:
C#
private void Load_Click(object sender,
EventArgs e)
{
SqlConnection con =
new SqlConnection("Data Source=.;Initial Catalog=demo;Integrated
Security=True");
SqlCommand cmd =
con.CreateCommand();
SqlXml x = null;
int position = 1;
if (textBox1.Text.Length != 0)
{
cmd.CommandText = "SELECT
min(ObjectID) FROM dbo.XmlDemo " +
"WHERE Object.exist('//anyType[Name=
sql:variable(\"@name\")]') = 1";
cmd.Parameters.AddWithValue("@name", textBox1.Text);
con.Open();
position =
Convert.ToInt32(cmd.ExecuteScalar());
con.Close();
}
cmd.CommandText = "SELECT Object from
dbo.XmlDemo WHERE ObjectId = " + position;
try
{
con.Open();
SqlDataReader rdr =cmd.ExecuteReader();
if (rdr.Read())
{
x = rdr.GetSqlXml(0);
}
rdr.Close();
con.Close();
}
catch (Exception ex)
{
MessageBox.Show("Error reading from
database: " + ex.Message);
}
System.Xml.Serialization.XmlSerializer xs =
new System.Xml.Serialization.XmlSerializer(
typeof(object[]), new Type[] { typeof(Person) });
object o = null;
try
{
o = xs.Deserialize(x.CreateReader());
}
catch (Exception ex)
{
MessageBox.Show("Error deserializing:
" + ex.Message);
}
listBox1.Items.Clear();
listBox1.Items.AddRange((object[])o);
listBox1.Refresh();
}
VB.NET
Private Sub LoadButton_Click(ByVal sender As
System.Object, _
ByVal e As System.EventArgs) Handles
LoadButton.Click
Dim con As New SqlConnection( _
"Data Source=.;Initial
Catalog=demo;Integrated Security=True")
Dim cmd As SqlCommand = con.CreateCommand()
Dim x As SqlXml = Nothing
Dim position As Integer = 1
If (TextBox3.Text.Length > 0) Then
cmd.CommandText = "SELECT
min(ObjectID) FROM dbo.XmlDemo " & _
"WHERE Object.exist('//anyType[Name =
sql:variable(""@name"")]') = 1"
cmd.Parameters.AddWithValue("@name", TextBox3.Text)
con.Open()
position =
Convert.ToInt32(cmd.ExecuteScalar())
con.Close()
End If
cmd.CommandText = "SELECT Object FROM
dbo.XmlDemo WHERE ObjectId = " & _
position
Try
con.Open()
Dim rdr As SqlDataReader = cmd.ExecuteReader()
If rdr.Read() Then
x = rdr.GetSqlXml(0)
End If
rdr.Close()
con.Close()
Catch ex As Exception
MessageBox.Show("Error while reading
from database: " & ex.Message)
End Try
Dim xs As New System.Xml.Serialization.XmlSerializer(GetType(Object()),
_
New Type() {GetType(Persoon)})
Dim o As Object = Nothing
Try
o = xs.Deserialize(x.CreateReader())
Catch ex As Exception
MessageBox.Show("Error while
deserializing: " & ex.Message)
End Try
ListBox1.Items.Clear()
ListBox1.Items.AddRange(CType(o, Object()))
ListBox1.Refresh()
End Sub
Now, our application is ready to use. Build
and run this application, enter a few persons, save, clear, enter a few other
persons, save again, load without providing a name, load again, this time
providing the name of a person that appears in the second set but not in the
first set.
WARNING. Notice that the application as it
is right now, is not well implemented: there are still pieces of code that are
very likely to cause exceptions, which we do not catch. For instance, when we
query our database with the exist function, we do not check if there is at
least one query that satisfies our search criterion, and hence we can get an
InvalidCastException. As an exercise, you can scan the current code for these
types of problems and fix them.
In the above implementation, we can improve
the Xml storage and retrieval in two ways. The first way is by storing typed
Xml (Xml for which an xsd schema exists). This has two advantages. First
advantage is that we are sure that we store in our database only Xml files
which can always be deserialized – with our current implementation it is
possible that we try to read ‘random’ xml from the database, which will cause a
deserialization exception. The second advantage is that Sql Server can store
and process the xml files more compact and efficient. This is due to the fact
that for untyped xml, all node and attribute values need to be stored as
strings, while with types xml, we know that certain values are integers,
doubles, dates,…, and we can store them as such.
The second way in which we can improve the
performance is by creating xml indexes. These are special indexes which speed
up the querying of xml documents. But, as all indexes, they have a downside as
well: They take additional CPU time for creating and maintaining the indexes, and the indexes take additional storage. So, if we use our application to do many
inserts and we only very rarely query, update or delete the data, indexes won’t be any good. However, most applications query more than they insert, and for
those applications indexes will give a speedup.
Typed Xml
If we want to store typed data, we run into
a problem with our application as it is right now. In our code, we are
serializing a collection of objects, but all objects in it are of runtime type
Person. Because of that, .NET serializes this information as follows:
<ArrayOfAnyType
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<anyType xsi:type="Person">
<Name>nico</Name>
<Age>31</Age>
</anyType>
<anyType xsi:type="Person">
<Name>Siska</Name>
<Age>30</Age>
</anyType>
</ArrayOfAnyType>
The collection is serialized as an ArrayOfAnyType
element, with each person serialized as an anyType element. With the type="Person" attribute, .NET tells us that this anyType element in .NET is a Person.
Based on this Xml file we can let Visual Studio generate an Xsd schema.
<?xml version="1.0"
encoding="utf-8"?>
<xsd:schema
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified" elementFormDefault="qualified">
<xs:element name="ArrayOfAnyType">
<xs:complexType>
<xs:sequence>
<xs:element
maxOccurs="unbounded" name="anyType">
<xs:complexType>
<xs:sequence>
<xs:element
name="Name" type="xs:string" />
<xs:element
name="Age" type="xs:unsignedByte" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xsd:schema>
And here we run into an inconsistency: If
we use Visual Studio or .NET to check if the above Xml file satisfies the Xsd
schema, it will succeed:
C#
XmlReaderSettings xrs = new
XmlReaderSettings();
xrs.Schemas.Add(null,
@"c:\PersonList.xsd");
XmlReader xr =
XmlReader.Create(@"c:\PersonList.xml", xrs);
try
{
while (xr.Read()) { }
}
catch (XmlException xe)
{
Console.WriteLine(xe.Message);
}
Console.WriteLine("valid!");
VB.NET
Imports System.Xml
Module Module1
Sub Main()
Dim xrs As New XmlReaderSettings
xrs.Schemas.Add(Nothing,
"c:\PersonList.xsd")
Dim xr As XmlReader =
XmlReader.Create("c:\PersonList.xml", xrs)
Try
While xr.Read
End While
Catch ex As XmlException
Console.WriteLine("Invalid: "
& ex.Message)
End Try
Console.WriteLine("Valid")
End Sub
End Module
However, if we use the Xsd schema to type
an Xml column in SQL Server 2005, it will reject the above Xml with this
warning:
Error writing to database: XML Validation:
Invalid cast for element 'anyType' from type 'ANONYMOUS' to type 'Person'. Location: /*:ArrayOfAnyType[1]/*:anyType[1]
The problem is the type attribute.
A possible solution to this problem is avoiding that .NET generates these
attributes, the other solution is rewriting the generated xsd schema so that it
handles these type attributes correctly. In this article, we take the first
approach, and solve the problem by not serializing the ObjectCollection in the
listbox, but by serializing an array of Person objects. Since we have no typed
ListBox collection, this will require an explicit casting when copying the ObjectCollection into the array. As a consequence of this, our XmlSerializer object now
only gets as a type an array of Person references. Our serialization code
becomes as follows:
C#
private void Save_Click(object sender,
EventArgs e)
{
StringBuilder sb = new StringBuilder();
TextWriter tw = new StringWriter(sb);
System.Windows.Forms.ListBox.ObjectCollection
oc = listBox1.Items;
Person[] family = new Person[oc.Count];
for (int i = 0; i
< oc.Count; i++)
{
family[i] = (Person)oc[i];
}
System.Xml.Serialization.XmlSerializer xs =
new System.Xml.Serialization.XmlSerializer(family.GetType());
xs.Serialize(tw, family);
SqlConnection con = new
SqlConnection("Data Source=.;Initial Catalog=demo;Integrated
Security=True");
SqlCommand cmd =
con.CreateCommand();
cmd.CommandText = "insert into
dbo.XmlDemo values (@xmlObject)";
cmd.Parameters.AddWithValue("@xmlObject", sb.ToString());
try
{
con.Open();
cmd.ExecuteNonQuery();
con.Close();
}
catch (Exception ex)
{
MessageBox.Show("Error writing to
database: " + ex.Message);
}
}
VB.NET
Private Sub Save_Click(ByVal sender As
System.Object, _
ByVal e As System.EventArgs) Handles Save.Click
Dim sb As New StringBuilder
Dim tw As TextWriter = New StringWriter(sb)
Dim oc As
Windows.Forms.ListBox.ObjectCollection = ListBox1.Items
Dim family As Persoon() = New
Persoon(oc.Count - 1) {}
For i As Integer = 0 To (oc.Count - 1)
family(i) = CType(oc(i), Persoon)
Next
Dim xs As New System.Xml.Serialization.XmlSerializer(family.GetType())
xs.Serialize(tw, family)
Dim con As New SqlConnection( _
"Data Source=.;Initial
Catalog=demo;Integrated Security=True")
Dim cmd As SqlCommand = con.CreateCommand()
cmd.CommandText = "INSERT INTO
dbo.XmlDemo VALUES (@xmlObject)"
cmd.Parameters.AddWithValue("@xmlObject", sb.ToString())
Try
con.Open()
cmd.ExecuteNonQuery()
con.Close()
Catch ex As Exception
MessageBox.Show("Error while writing
to database: " & ex.Message)
End Try
End Sub
We can now also simplify our
deserialisation. The declaration of the serializer object in the Load_Click
event handler becomes:
C#
System.Xml.Serialization.XmlSerializer xs =
new
System.Xml.Serialization.XmlSerializer( typeof(Person[]));
VB.NET
Dim xs As New
System.Xml.Serialization.XmlSerializer(GetType(Persoon()))
Next, we can test our application. This is
necessary because in the next step, we need an Xml document generated by our
altered code: In order to assign an Xsd schema on our Object column in our SQL
Server database, we must first create such an Xsd schema. This can be done by
hand or by one of the many tools. We will use Visual Studio, which can derive
an Xsd schema from an Xml file. So, first we will save one of the serialized
Xml files.
We will use Visual Studio for this. Open up
in the Server Explorer the table where our data is stored, right click this
table and choose Show table data.

Next, right click any of the cells
containing the Xml serialization stored by running the latest version of our
application, and choose Copy. Now, right click the project in the solution
explorer, and choose Add New Item. In the dialog box, pick Xml File (not Xsd file!). In
the new file, paste the Xml code you copied after the single header line which
is default in an Xml file. The resulting Xml file should resemble the following
(except for the actual names and ages):
<?xml version="1.0"
encoding="utf-8" ?>
<ArrayOfPerson
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Person>
<Name>Nico</Name>
<Age>31</Age>
</Person>
<Person>
<Name>Siska</Name>
<Age>30</Age>
</Person>
</ArrayOfPerson>
In the Xml menu select Create Schema. A new tab appears with the XSD schema.
Verify that the schema matches this schema:
<?xml version="1.0"
encoding="utf-8"?>
<xsd:schema
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified"
elementFormDefault="qualified">
<xs:element
name="ArrayOfPerson">
<xs:complexType>
<xs:sequence>
<xs:element
maxOccurs="unbounded" name="Person">
<xs:complexType>
<xs:sequence>
<xs:element name="Name"
type="xs:string" />
<xs:element
name="Age" type="xs:unsignedByte" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xsd:schema>
Be careful with this step! The system derives a schema, but
manual inspection is still needed. For instance, in our case, the system
created a schema where age is an unsignedByte. This makes sense for the age of
humans, expressed in years. But if we would express age in days, a larger type
is needed. Also, the xsd schema allows for an ‘unbounded’ number of persons in
a collection. If we want to limit this, we can do so by manually altering the
Xsd schema.
This schema must now by added to our Sql Server 2005.
Unfortunate, there is no easy graphical way to do this from Visual Studio 2005,
so we will switch to the Sql Server 2005 Management Studio,
the tool which more or less replaces the Sql Server 2000 Enterprise Manager and
Query Analyzer. This program is part of the SQL Server 2005 client tools, which
are not selected in a default SQL Server 2005 installation; you have to select
these explicit on the installation screen. Of course, you can always install
them afterwards by running the setup process again. Another problem we have to
deal with is that we cannot alter our table to typed xml, since not all entries
satisfy the xml schema. To solve that problem we will truncate our table, in
that way removing all entries.
Open up the SQL Server 2005 Management
Studio (from the start menu, or by typing sqlwb on the prompt. Provide the
credentials for connecting to the database server, and in the object explorer
unfold the databases. Now, right click the demo database, and choose New Query.
First, we truncate our table by typing and executing the following statement:
truncate table dbo.XmlDemo
Next, we create a new Xml Schema Collection
(which is the xsd schema stored in the database) with the following SQL
statement:
create xml schema collection PersonList as '<?xml
version="1.0" encoding="utf-8"?>
<xsd:schema
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified" elementFormDefault="qualified">
<xs:element name="ArrayOfPerson">
<xs:complexType>
<xs:sequence>
<xs:element
maxOccurs="unbounded" name="Person">
<xs:complexType>
<xs:sequence>
<xs:element
name="Name" type="xs:string" />
<xs:element name="Age"
type="xs:unsignedByte" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xsd:schema>'
After executing this statement, the Xml Schema Collection is permanently stored in the database, and we can continue working in Visual Studio, where we open our Server Explorer and browse to the XmlDemo table in our demo
database. When we right click this table, we select Open Table definition. In the definition, we click on
the column that stores our xml documents. If we then look at the column
properties under Xml, we see the option for specifying the Xml schema
collection. After saving your solution, click on the combobox and select
dbo.PersonList. We save first because in some of the beta editions, this
operation can cause Visual Studio to go down. We also set the option Is Xml
Document to true, else sql server only checks if one enters well formed XML
fragments, but with this option on, the database only accepts XML documents in
this column.

With these changes, our application works again, but now the
storage of our Xml documents is more compact, and typed.
Xml Indexes
If we would save large and/or many Xml documents in our
database, the querying with the exist function can be improved significantly by
creating special Xml indexes. These indexes speed up searches involving XPath expressions.
There are primary and secondary Xml indexes.
A primary index roughly stores for each node (element,
attribute, …) in the Xml document the type of node, the value, the path from
this node to the Xml document root, and the (non-xml) primary key of the row in
which this Xml document appears. This means that an Xml Primary index can only be created after a normal primary key has been defined on the table. A
primary Xml index can be seen as a stored shredded Xml document. Without such
an index, each call to the exist function would cause each Xml document in the
table being shredded into nodes, and the nodes being tested on the XPath
criterion. With a primary Xml index, this shredding is no longer needed, and
the system can directly start to scan the table and (in our case) compare the
stored paths with the XPath expression. Sql Server takes a clever approach in
this, because it stores paths from deepest node to root node (the reverse of
how we often write paths), because in that way XPath expressions containing
wildcards (such as // or *) can still use the index as long as the wildcard is
not the last node in the XPath expression (which is rarely the case).
To build such an index, we go back to our Sql Server 2005
Management Studio. In the Object Explorer, unfold the XmlDemo table, right
click the Indexes folder, and choose New Index…Provide a name for this index, and select as inde type Primary Xml. Click Add… to add a column to this index and
choose the Object column, then click OK to add this index.

While this primary Xml index will result in a speedup of the
exist function compared with no indexes at all, we are still not dealing with
our Xml data optimally. The reason is that for each XPath expression, all the
Xml paths in our index need to be checked sequentially. In fact, what we want
in to index (part of) our primary Xml index, and this is exactly what the
secondary Xml indexes allow us to do. This also makes clear that we cannot
build secondary Xml indexes without first building a primary Xml index.
There are three types of secondary Xml indexes: PATH, VALUE
and PROPERTY. A path index indexes the paths and node values, and is mainly
intended for queries where the full path without wildcards (or the descendant
axis) is used. A value index indexes the node values and path, and are because
of that suited for XPath queries where we search on a value, with only minimal
path specifications (as is the case for the query we are using in our project).
Finally, a property index is useful if you have many value queries, where you
retrieve a single scalar value out of a Xml document. A property index is based on the primary key, the path and the node value, and can because of that be
used to get scalar values out of the Xml document soley from the index, without the need to go to the primary Xml index.
In our case, we can use a value index, so we create one with
the following T-SQL statement (there is unfortunately no graphical way for
constructing these secondary indexes):
CREATE XML INDEX SIXMLV_XmlDemo_Object
ON dbo.XmlDemo(Object)
USING XML INDEX PIXML_XmlDemo_Object
FOR VALUE
With this, we obtain our final database.
Dr. Nico Jacobs
is trainer and consultant at U2U
(www.u2u.net). His focus is on Microsoft SQL Server and ADO.NET. You can reach Nico at nico@u2u.be.