U2U Article: "Regular Expressions using .NET Common Language Runtime integration in SQL Server 2005" Copyright © 2004 by U2U nv/sa, Belgium. All rights reserved.
Please address any questions or suggestions to U2U.

Regular Expressions using .NET Common Language Runtime integration in SQL Server 2005

Nico Jacobs
U2U nv/sa

Introduction

SQL Server 2005 offers the possibility to write functions, stored procedures, aggregate functions, triggers and user defined types in a .NET language such as Visual Basic .NET or C#. One of the great advantages of this functionality, is that we can ‘pull’ functionality that is available in the .NET class libraries into the database. In this article, we illustrate this idea by ‘pulling’ the .NET support for regular expressions into SQL Server 2005. We do so by writing a scalar function in .NET which takes as input a string and a regular expression, and we return a boolean: true when the regular expression occurred in the string, false if it did not occur.

We implement this using Visual Studio 2005 Beta 2 and SQL Server 2005 Community Technical Preview IDW 14 (April 2005 build), though we use no deprecated features, meaning this code sample should also work on the release edition of Visual Studio 2005 and SQL Server 2005.

Starting a database project

The first step consists of loading the correct template for building a .NET scalar function in SQL Server 2005. This is done by starting Visual Studio 2005 and selecting a C# or Visual Basic project. Within that folder, we select SQL Server Project. We name our project RegularExpression and click OK.

Visual Studio then asks to which database we want to connect. This will be the database where we will deploy the function we are going to write. In our case, we want to deploy to the AdventureWorks database on our local machine (localhost). Notice that a default installation of SQL Server 2005 does not install the demonstration databases (AdventureWorks and AdventureWorksDW), if you did not install these demonstration databases, then use a different demo database, or rerun the SQL Server 2005 installation and select the advanced button to select individual items, where you then select the AdventureWorks database.

We click Add New reference and then we type localhost as server and select AdventureWorks as database. Click the Test Connection button to verify that all settings are correct. In the Add Database Reference, click the newly created reference and click OK.

Next, in the Solution Explorer (right side of the screen), right click the project name (RegularExpression) and select Add -> User-Defined Function. Type ContainsRegExp as function name.

We then get the following code:

C#

using System;

using System.Data;

using System.Data.Sql;

using System.Data.SqlTypes;

using Microsoft.SqlServer.Server;

 

public partial class UserDefinedFunctions

{

      [Microsoft.SqlServer.Server.SqlFunction]

      public static SqlString ContainsRegExp()

      {

            // Put your code here

            return new SqlString("Hello");

      }

};

VB.NET

Imports System

Imports System.Data

Imports System.Data.Sql

Imports System.Data.SqlTypes

Imports Microsoft.SqlServer.Server

 

Partial Public Class UserDefinedFunctions

  <Microsoft.SqlServer.Server.SqlFunction()> _

  Public Shared Function ContainsRegExp() As SqlString

    ' Add your code here

    Return New SqlString("Hello")

  End Function

End Class

This is a partial class so that we may construct multiple functions, and we can place each function in a separate file. Also notice that the ContainsRegExp() function is preceded with the Microsoft.SqlServer.Server.SqlFunction attribute. In this way, Visual Studio knows that when we deploy this project, it must not only build the assembly and deploy this assembly to the SQL Server, but that it must also create a scalar function in the database, which points to this function in the assembly.

We also see that by default, Visual Studio implements the traditional Hello World program to illustrate how to program a .NET scalar function. In the next step, we will change this implementation to suit our needs.

Implementing the ContainsRegExp function

We now change our code to reflect the idea we had in mind: given a string and a regular expression, return true if the regular expression occurs at least once in the string, else return false. However, while writing this, we have to keep in mind that the .NET code we write, will execute on a SQL Server, taking its input from SQL Server. As a consequence, we have to be carefull when we work with .NET types. For instance, in .NET a boolean can take two values: true or false. But in a database, a boolean (in SQL Server in fact a bit) can take three possible values: true, false and NULL. Because of that, .NET contains special types to write .NET code to run in the SQL Server database. These types are implemented in the System.Data.SqlTypes namespace. Here we find for instance a SqlBoolean type, of which the objects can represent a NULL as well, by setting the IsNull property to true. With this knowledge, we can change the signature of our function to reflect the fact that we take two strings as input, and return a Boolean:

C#

[Microsoft.SqlServer.Server.SqlFunction]

public static SqlBoolean ContainsRegExp(SqlString text, SqlString pattern)

VB.NET

<Microsoft.SqlServer.Server.SqlFunction()> _

Public Shared Function ContainsRegExp(ByVal text As SqlString, _

ByVal pattern As SqlString) As SqlBoolean

The remainder of the code is simply using the classes offered by the .NET framework: We use the RegEx class in the System.Text.RegularExpressions namespace. The only thing special is the conversion of the SqlString objects into String objects. The full code becomes:

C#

using System;

using System.Data;

using System.Data.Sql;

using System.Data.SqlTypes;

using Microsoft.SqlServer.Server;

using System.Text.RegularExpressions;

 

public partial class UserDefinedFunctions

{

      [Microsoft.SqlServer.Server.SqlFunction]

      public static SqlBoolean ContainsRegExp(SqlString text, SqlString pattern)

      {

    if (text.IsNull || pattern.IsNull)

    {

      return (new SqlBoolean(false));

    }

    else

    {

      return new SqlBoolean(Regex.IsMatch(text.Value,pattern.Value));

    }

  }

};

VB.NET

Imports System

Imports System.Data

Imports System.Data.Sql

Imports System.Data.SqlTypes

Imports Microsoft.SqlServer.Server

Imports System.Text.RegularExpressions

 

Partial Public Class UserDefinedFunctions

  <Microsoft.SqlServer.Server.SqlFunction()> _

  Public Shared Function ContainsRegExp(ByVal text As SqlString, _

ByVal pattern As SqlString) As SqlBoolean

    If (text.IsNull OrElse pattern.IsNull) Then

      Return New SqlBoolean(False)

    Else

      Return New SqlBoolean(Regex.IsMatch(text.Value, pattern.Value))

    End If

  End Function

End Class

Next, we build and deploy this project. First, right click the project in the Solution Explorer and select Build. After a succesfull build, right click the project again and select Deploy. This should take a few seconds to connect to the SQL Server, store the assembly (the .dll file) in the database, and add the function to the list of available functions.

Using the function

To use the function we just created, we open the SQL Server 2005 Management Studio, and we connect to the server on localhost. To see if the assembly and the function are really there, we unfold the AdventureWorks database in the object explorer, unfold Programmability, and then we should see under Assemblies an entry for our RegularExpression assembly. Under Functions, Scalar-valued Functions, we see a function dbo.ContainsRegExp.

Next, we open up a query window on the AdventureWorks database.

Type the following SQL query, looking for all products containing the substring cup or cep:

select * from Production.ProductDescription

where dbo.ContainsRegExp(Description,'c[u,e]p')=1

You should get a list of 11 items:

Speed up using Attributes

Though the previous code runs correctly, Sql Server 2005 will not have the possibility for scheduling the function call as efficient as possible. The reason for this is that we did not provide enough information to the database for this. For instance, we did not tell Sql Server that the function we just wrote is deterministic, meaning that if we run it many times with the same input parameter, it will return the same result, this . Another thing that is handy for Sql Server to know is whether this function accesses the database for fetching information from system tables or regular tables. Finally, the Sql Server query optimizer would like to know if the function is precise, meaning that the result of the calculation is not dependant on the underlying hardware on which the machine is running (e.g. the result of a float operation 1 / 3 is not the same on all possible hardware configurations on which Sql Server runs). In all these, Sql Server takes a worst-case scenario, meaning that if no information is specified about these topics, he assumes the most pessimistic case from a query optimization point of view. But in our case, we have written a function that does not access the database at all, nor does non-deterministic or imprecise operations, so we can inform Sql Server of this, in that way allowing the system to produce more performant query plans. We communicate this information by means of named parameters in the SqlFuction attribute as follows:

C#

[Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.None,

  IsDeterministic=true, IsPrecise=true,

  SystemDataAccess=SystemDataAccessKind.None)]

public static SqlBoolean ContainsRegExp(SqlString text, SqlString pattern)

VB.NET

<Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, _

 DataAccess:=DataAccessKind.None, SystemDataAccess:=SystemDataAccessKind.None, _

 IsPrecise:=True)> _

Public Shared Function ContainsRegExp(ByVal text As SqlString, _

ByVal pattern As SqlString) As SqlBoolean

Conclusions

In this article we illustrated how easily .NET functionality can be ‘ported’ into the Sql Server database by creating a Sql Server CLR scalar functionwhich ‘wraps’ the .NET functionality. We illustrated this by wrapping the Regular Expresion functionality, but the same approach can be used to wrap nearly any functionality offered by the extensive .NET library. Finally notice that this technique is not limited to predefined .NET functionality, but can of course also being used on any .NET functions you created yourself.

Dr. Nico Jacobs

is trainer and consultant at U2U (www.u2u.net). His focus is on Microsoft SQL Server and ADO.NET. You can reach Nico at nico@u2u.be.

Contact me Contact


Contact me Receive U2U Newsletter.
Looking for a challenging job Download Brochure On Site Training Looking for a challenging job
Favorites Favorites

Copyright © 1999-2010 by U2U