Francesco's blog

 Thursday, December 01, 2005

One of the .NET Framework features that fascinate me most is regular expressions, which I often use to simplify and speed up my applications. Well, at least this is what I believed until some time ago, when I was busy writing the forthcoming Programming Microsoft Visual Basic 2005: The Language (due in mid-January). This book is a core reference on the VB language and includes a section on the Like operator, which in recent years has been overlooked in favor of regexes. I (mistakenly) assumed that the Like operator used the Regex classes internally and would therefore surely be slower. After all these years, I should have learned never to jump to conclusions without testing and benchmarking my code accurately.

Let's say that you must check that a string has 9 characters, the first of which must be an uppercase "A" and the last four chars must be digits. This is how you'd perform this test with a regex:

Dim re As New Regex("^A....\d\d\d\d$")
If re.IsMatch(teststring) Then match = True

and here's the version that uses the Like operator:

If teststring Like "A????####" Then Match = True
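In case the Like pattern syntax is rusty: ? stands for exactly one character and # for exactly one digit, so "A????####" can only match nine-character strings. Here is a minimal sketch of how the pattern behaves; the module name and the sample strings other than "ABCDE1234" are mine, chosen just for illustration, and I'm assuming the default Option Compare Binary setting.

```vb
Option Compare Binary

Module LikeDemo
    Sub Main()
        ' ? matches exactly one character, # matches exactly one digit (0-9)
        Console.WriteLine("ABCDE1234" Like "A????####")    ' True
        Console.WriteLine("ABCDE123X" Like "A????####")    ' False: last char is not a digit
        Console.WriteLine("ABCDE12345" Like "A????####")   ' False: ten characters, not nine
        Console.WriteLine("aBCDE1234" Like "A????####")    ' False: binary comparison is case-sensitive
    End Sub
End Module
```

Note that under Option Compare Text the last comparison would succeed, so the two operators are equivalent to the regex only when the comparison mode is binary.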

Surprise! Putting this code in a loop (but leaving the creation of the regex out of the loop) and using a string that makes the test succeed (e.g. "ABCDE1234"), the Like operator is about 4 times faster than the regular expression. Not bad, huh? But the biggest surprise came when I benchmarked the same test based exclusively on methods of the System.Char class:

If teststring.Length = 9 AndAlso teststring.Chars(0) = "A"c AndAlso _
   Char.IsDigit(teststring.Chars(5)) AndAlso Char.IsDigit(teststring.Chars(6)) AndAlso _
   Char.IsDigit(teststring.Chars(7)) AndAlso Char.IsDigit(teststring.Chars(8)) Then match = True

Despite its length, this last test is about five times faster than the Like operator, and therefore about 20 times faster than the regex! The gap narrows if you use compiled regexes, but the System.Char approach is by far the fastest of the lot.
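If you want to repeat the comparison yourself, something along these lines should do. Take it as a rough sketch rather than the exact harness I used: the module name, the iteration count, and the use of the Stopwatch class are assumptions made for this example, and it makes no attempt to account for JIT warm-up or other noise.

```vb
Option Strict On
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module LikeVsRegexBenchmark
    ' Hypothetical iteration count; pick whatever gives stable timings on your machine.
    Const Iterations As Integer = 1000000

    Sub Main()
        Dim teststring As String = "ABCDE1234"
        Dim match As Boolean
        Dim sw As Stopwatch

        ' Both regexes are created once, outside the timed loops.
        Dim re As New Regex("^A....\d\d\d\d$")
        Dim reCompiled As New Regex("^A....\d\d\d\d$", RegexOptions.Compiled)

        ' 1. Interpreted regex
        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If re.IsMatch(teststring) Then match = True
        Next
        Console.WriteLine("Regex (interpreted): {0} ms", sw.ElapsedMilliseconds)

        ' 2. Compiled regex
        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If reCompiled.IsMatch(teststring) Then match = True
        Next
        Console.WriteLine("Regex (compiled):    {0} ms", sw.ElapsedMilliseconds)

        ' 3. Like operator
        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If teststring Like "A????####" Then match = True
        Next
        Console.WriteLine("Like operator:       {0} ms", sw.ElapsedMilliseconds)

        ' 4. String and Char methods only
        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If teststring.Length = 9 AndAlso teststring.Chars(0) = "A"c AndAlso _
               Char.IsDigit(teststring.Chars(5)) AndAlso Char.IsDigit(teststring.Chars(6)) AndAlso _
               Char.IsDigit(teststring.Chars(7)) AndAlso Char.IsDigit(teststring.Chars(8)) Then match = True
        Next
        Console.WriteLine("Char methods:        {0} ms", sw.ElapsedMilliseconds)

        ' Use the result so the tests are not trivially dead code.
        Console.WriteLine("Last result: {0}", match)
    End Sub
End Module
```

Run it in a Release build outside the debugger; absolute numbers will vary with the machine and the .NET version, so only the ratios between the four lines are worth reading.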

The bottom line: (1) if you write VB code, use the Like operator instead of regexes when the condition isn't too complex, and (2) regardless of the language you work with, if you really want the highest performance and the search operation isn't too complex, use the methods of the String and Char types.

 