U2U Article: "Regular Expressions
using .NET Common Language Runtime integration in SQL Server 2005" Copyright © 2004 by U2U nv/sa, Belgium. All rights reserved.
Please address any questions or suggestions to U2U.
Regular Expressions
using .NET Common Language Runtime integration in SQL Server 2005
Nico Jacobs
U2U nv/sa
Introduction
SQL Server 2005 offers the possibility to
write functions, stored procedures, aggregate functions, triggers and user
defined types in a .NET language such as Visual Basic .NET or C#. One of the
great advantages of this functionality, is that we can ‘pull’ functionality
that is available in the .NET class libraries into the database. In this
article, we illustrate this idea by ‘pulling’ the .NET support for regular
expressions into SQL Server 2005. We do so by writing a scalar function in .NET
which takes as input a string and a regular expression, and we return a boolean:
true when the regular expression occurred in the string, false if it did not
occur.
We implement this using Visual Studio 2005
Beta 2 and SQL Server 2005 Community Technical Preview IDW 14 (April 2005
build), though we use no deprecated features, meaning this code sample should
also work on the release edition of Visual Studio 2005 and SQL Server 2005.
The first step consists of loading the
correct template for building a .NET scalar function in SQL Server 2005. This
is done by starting Visual Studio 2005 and selecting a C# or Visual Basic
project. Within that folder, we select SQL Server Project. We name our project RegularExpression and click OK.

Visual Studio then
asks to which database we want to connect. This will be the database where we
will deploy the function we are going to write. In our case, we want to deploy
to the AdventureWorks database on our local machine (localhost). Notice that a
default installation of SQL Server 2005 does not install the demonstration
databases (AdventureWorks and AdventureWorksDW), if you did not install these
demonstration databases, then use a different demo database, or rerun the SQL
Server 2005 installation and select the advanced button to select individual items, where you then select the AdventureWorks database.
We click Add New reference and then we type
localhost as server and select AdventureWorks as database. Click the Test Connection
button to verify that all settings are correct. In the Add Database Reference, click the newly created reference and click OK.

Next, in the Solution Explorer (right side
of the screen), right click the project name (RegularExpression) and select Add
-> User-Defined Function. Type ContainsRegExp as function name.

We then get the following code:
C#
using System;
using System.Data;
using System.Data.Sql;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString ContainsRegExp()
{
// Put your code here
return new
SqlString("Hello");
}
};
VB.NET
Imports System
Imports System.Data
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Partial Public Class UserDefinedFunctions
<Microsoft.SqlServer.Server.SqlFunction()>
_
Public Shared Function ContainsRegExp() As
SqlString
' Add your code here
Return New SqlString("Hello")
End Function
End Class
This is a partial class so that we may
construct multiple functions, and we can place each function in a separate
file. Also notice that the ContainsRegExp() function is preceded with the Microsoft.SqlServer.Server.SqlFunction
attribute. In this way, Visual Studio knows that when we deploy this project,
it must not only build the assembly and deploy this assembly to the SQL Server,
but that it must also create a scalar function in the database, which points to
this function in the assembly.
We also see that by default, Visual Studio implements the
traditional Hello World program to illustrate how to program a .NET scalar
function. In the next step, we will change this implementation to suit our
needs.
We now change our code to reflect the idea
we had in mind: given a string and a regular expression, return true if the
regular expression occurs at least once in the string, else return false.
However, while writing this, we have to keep in mind that the .NET code we
write, will execute on a SQL Server, taking its input from SQL Server. As a
consequence, we have to be carefull when we work with .NET types. For instance,
in .NET a boolean can take two values: true or false. But in a database, a boolean (in SQL Server in fact a bit)
can take three possible values: true, false and NULL. Because of that, .NET contains special types to write .NET code to
run in the SQL Server database. These types are implemented in the System.Data.SqlTypes namespace. Here we find for instance a SqlBoolean type, of which
the objects can represent a NULL as well, by setting the IsNull property to
true. With this knowledge, we can change the signature of our function to
reflect the fact that we take two strings as input, and return a Boolean:
C#
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBoolean ContainsRegExp(SqlString
text, SqlString pattern)
VB.NET
<Microsoft.SqlServer.Server.SqlFunction()> _
Public Shared Function ContainsRegExp(ByVal text As
SqlString, _
ByVal pattern As SqlString) As SqlBoolean
The remainder of the code is simply using
the classes offered by the .NET framework: We use the RegEx class in
the System.Text.RegularExpressions namespace. The only thing special is the conversion of the SqlString
objects into String objects. The full code becomes:
C#
using System;
using System.Data;
using System.Data.Sql;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBoolean
ContainsRegExp(SqlString text, SqlString pattern)
{
if (text.IsNull || pattern.IsNull)
{
return (new SqlBoolean(false));
}
else
{
return new SqlBoolean(Regex.IsMatch(text.Value,pattern.Value));
}
}
};
VB.NET
Imports System
Imports System.Data
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Partial Public Class UserDefinedFunctions
<Microsoft.SqlServer.Server.SqlFunction()>
_
Public Shared Function ContainsRegExp(ByVal
text As SqlString, _
ByVal pattern As SqlString) As SqlBoolean
If (text.IsNull OrElse
pattern.IsNull) Then
Return New SqlBoolean(False)
Else
Return New
SqlBoolean(Regex.IsMatch(text.Value, pattern.Value))
End If
End Function
End Class
Next, we build and deploy this project.
First, right click the project in the Solution Explorer and select Build. After a
succesfull build, right click the project again and select Deploy. This
should take a few seconds to connect to the SQL Server, store the assembly (the
.dll file) in the database, and add the function to the list of available
functions.
To use the function we just created, we
open the SQL Server 2005 Management Studio, and we connect to the server on
localhost. To see if the assembly and the function are really there, we unfold
the AdventureWorks database in the object explorer, unfold Programmability, and
then we should see under Assemblies an entry for our RegularExpression
assembly. Under Functions, Scalar-valued Functions, we see a function
dbo.ContainsRegExp.
Next, we open up a query window on the
AdventureWorks database.

Type the following SQL query, looking for
all products containing the substring cup or cep:
select * from Production.ProductDescription
where
dbo.ContainsRegExp(Description,'c[u,e]p')=1
You should get a list of 11 items:

Speed up using Attributes
Though the previous code runs correctly,
Sql Server 2005 will not have the possibility for scheduling the function call
as efficient as possible. The reason for this is that we did not provide enough
information to the database for this. For instance, we did not tell Sql Server
that the function we just wrote is deterministic, meaning that if we run it
many times with the same input parameter, it will return the same result, this
. Another thing that is handy for Sql Server to know is whether this function accesses
the database for fetching information from system tables or regular tables.
Finally, the Sql Server query optimizer would like to know if the function is
precise, meaning that the result of the calculation is not dependant on the
underlying hardware on which the machine is running (e.g. the result of a float
operation 1 / 3 is not the same on all possible hardware configurations on
which Sql Server runs). In all these, Sql Server takes a worst-case scenario,
meaning that if no information is specified about these topics, he assumes the
most pessimistic case from a query optimization point of view. But in our case,
we have written a function that does not access the database at all, nor does
non-deterministic or imprecise operations, so we can inform Sql Server of this,
in that way allowing the system to produce more performant query plans. We
communicate this information by means of named parameters in the SqlFuction
attribute as follows:
C#
[Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.None,
IsDeterministic=true, IsPrecise=true,
SystemDataAccess=SystemDataAccessKind.None)]
public static SqlBoolean ContainsRegExp(SqlString
text, SqlString pattern)
VB.NET
<Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True,
_
DataAccess:=DataAccessKind.None,
SystemDataAccess:=SystemDataAccessKind.None, _
IsPrecise:=True)> _
Public Shared Function ContainsRegExp(ByVal text
As SqlString, _
ByVal pattern As SqlString) As SqlBoolean
Conclusions
In this article we illustrated how easily
.NET functionality can be ‘ported’ into the Sql Server database by creating a
Sql Server CLR scalar functionwhich ‘wraps’ the .NET functionality. We
illustrated this by wrapping the Regular Expresion functionality, but the same
approach can be used to wrap nearly any functionality offered by the extensive
.NET library. Finally notice that this technique is not limited to predefined
.NET functionality, but can of course also being used on any .NET functions you
created yourself.
Dr. Nico Jacobs
is trainer and consultant at U2U
(www.u2u.net). His focus is on Microsoft SQL Server and ADO.NET. You can reach Nico at nico@u2u.be.