Francesco's blog

 Tuesday, August 29, 2006

I know, I know, there are soooo many regular expression tester tools available on the 'Net, but I couldn't help creating my own. It's very simple, yet it supports all the basic features you'd expect from a tool of its kind, including code generation (VB and C#) and compilation to stand-alone assemblies. Best of all, it comes with source code. You can download it from the home page of my Visual Basic 2005 book (together with all other code samples in the book) or directly from here:

Executable (requires .NET Framework 2.0): RegexTester.zip (75.57 KB)
Source code (VB2005): RegexTester source.zip (88.15 KB)

Using the tool is quite simple. The main window is divided into three panes: (a) the pane where you enter the regex, (b) the pane where you load the text the regex must be applied to, (c) the result pane. A fourth pane appears when you select the Replace item from the Commands menu, and it's where you enter the replace pattern. As you can see in the image below, you can enter most regex patterns by selecting them from the context menu:

You select the kind of command (Find, Replace, Split) from the Commands menu and you select one or more regex options from the Options menu:

After selecting the proper options, you press the F5 key (the Run item in the Commands menu) to execute the regex. Results are displayed in the bottom pane in a variety of formats and sort orders, which you can select from the Results menu, and the status bar displays the number of matches, the execution time, and the properties of the result currently highlighted in the result pane:

Alternatively, you can set all these options from the Properties dialog box (the Properties command in the File menu, or just press the F4 key):

Assigning a name to the current regex is important because you can save it on disk in a file with .regex extension, for later retrieval.

The Commands menu contains a couple of other interesting items. First, you can generate the C# or VB code for the current regular expression and copy it to the Clipboard:

Second, and more interesting, you can compile one or more regular expressions (including saved .regex projects) into a compiled assembly, which you can later reference from any .NET application. Using such compiled regexes is obviously faster than defining them in code, because you skip the parsing step:
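For readers who want to see what compiling a regex to an assembly looks like in code, here is a minimal sketch based on the Regex.CompileToAssembly method of the .NET Framework. It is not taken from the tool's source code, and the pattern, the ZipCodeRegex class name and the MyRegexes namespace and assembly name are hypothetical, chosen only for this example.

```vbnet
Imports System
Imports System.Reflection
Imports System.Text.RegularExpressions

Module CompileRegexDemo
    Sub Main()
        ' Hypothetical pattern, class name and namespace, used only for illustration
        Dim info As New RegexCompilationInfo( _
            "^\d{5}$", RegexOptions.None, "ZipCodeRegex", "MyRegexes", True)

        ' The assembly name determines the name of the DLL that is created
        Dim asmName As New AssemblyName("MyRegexes")

        ' Creates MyRegexes.dll in the current directory (.NET Framework only)
        Regex.CompileToAssembly(New RegexCompilationInfo() {info}, asmName)
    End Sub
End Module
```

An application that references the resulting DLL should then be able to instantiate MyRegexes.ZipCodeRegex like any other Regex-derived class and call IsMatch or Match on it.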

 

That's it. You can use the YART tool for your own use, study its source code, and modify and expand it as you like. If you find any major problems or add some noteworthy feature, just let me know.

C# | Regex | Tools | Visual Basic
8/29/2006 8:00:02 PM (GMT Daylight Time, UTC+01:00)
 Monday, March 06, 2006

I am reorganizing my MP3 collection and found that I needed to rename a large quantity of files. Of course, there are many free utilities that allow this operation - and that can use MP3 tags in the process - but I thought that I might write one myself. Thanks to regular expressions, the task shouldn't be that hard. In fact, in a few minutes I came up with the following console application. As you see, most of the code is used to extract and validate arguments on the command line:

Imports System.Text.RegularExpressions
Imports System.IO

Module Renx

   Function Main(ByVal args() As String) As Integer
      Console.WriteLine("RENX (C) Francesco Balena / Code Architects Srl")

      Dim recurse As Boolean = False
      Dim renameMode As Boolean = False
      Dim oldNamePattern As String = Nothing
      Dim newNamePattern As String = Nothing

      ' analyze each argument
      For Each arg As String In args
         Select Case arg.ToLower()
            Case "/s", "-s"
               recurse = True
            Case "/r", "-r"
               renameMode = True
            Case "/h", "-h"
               Return ShowHelp(0)
            Case Else
               If oldNamePattern Is Nothing Then
                  oldNamePattern = "^" & arg & "$"
               ElseIf newNamePattern Is Nothing Then
                  newNamePattern = arg
               Else
                  Return ShowHelp(1)
               End If
         End Select
      Next

      ' check that we have both mandatory arguments
      If oldNamePattern Is Nothing OrElse newNamePattern Is Nothing Then
         Return ShowHelp(1)
      End If

      ' create the regex and check that pattern syntax is ok
      Dim reSearch As Regex
      Try
         reSearch = New Regex(oldNamePattern, RegexOptions.IgnoreCase)
         ' test the replace pattern as well
         Dim tmp As String = reSearch.Replace("a dummy string", newNamePattern)
      Catch ex As Exception
         Console.WriteLine("SYNTAX ERROR: {0}", ex.Message)
         Return 3
      End Try
      Console.WriteLine()

      ' iterate over all files in current directory (and its subdirectories, if recurse mode)
      Dim searchOpt As SearchOption = SearchOption.TopDirectoryOnly
      If recurse Then searchOpt = SearchOption.AllDirectories

      Dim parsedFilesCount As Integer = 0
      Dim renamedFilesCount As Integer = 0
      Dim errorsCount As Integer = 0
      For Each oldFile As String In Directory.GetFiles(Directory.GetCurrentDirectory(), "*.*", searchOpt)
         parsedFilesCount += 1
         ' the regex applies to name only
         Dim oldName As String = Path.GetFileName(oldFile)
         Dim ma As Match = reSearch.Match(oldName)
         If ma.Success Then
            ' this is the new name
            Dim newName As String = ma.Result(newNamePattern)
            Console.WriteLine(oldFile)
            Console.Write(" => {0}", newName)
            renamedFilesCount += 1
            ' proceed with rename only if not in simulation mode
            If renameMode Then
               Try
                  Dim dirName As String = Path.GetDirectoryName(oldFile)
                  Dim newFile As String = Path.Combine(dirName, newName)
                  File.Move(oldFile, newFile)
               Catch ex As Exception
                  Console.Write(" -- ERROR: {0}", ex.Message)
                  errorsCount += 1
               End Try
            End If
            Console.WriteLine()
         End If
      Next

      ' Display a report
      If renameMode Then
         Console.WriteLine("Summary: {0} parsed files, {1} renamed files, {2} errors", parsedFilesCount, renamedFilesCount, errorsCount)
      Else
         Console.WriteLine("Summary: {0} parsed files, {1} files affected", parsedFilesCount, renamedFilesCount)
         Console.WriteLine()
         Console.WriteLine("NOTE: Running in simulation mode. Specify the /R option to actually rename files.")
      End If

      ' Return an error code
      If errorsCount = 0 Then
         Return 0
      Else
         Return 2
      End If
   End Function

   Function ShowHelp(ByVal exitCode As Integer) As Integer
      Console.WriteLine()
      Console.WriteLine("Syntax: RENX <oldnamepattern> <newnamepattern> [/R] [/S] [/H]")
      Console.WriteLine(" oldnamepattern : regex that selects the files to be renamed")
      Console.WriteLine(" newnamepattern : regex that specifies how files must be renamed")
      Console.WriteLine(" /R : rename files")
      Console.WriteLine(" /S : iterate over subdirectories")
      Console.WriteLine(" /H : display this help")
      Console.WriteLine("NOTE: By default the program runs in simulation mode, and just displays how files would be renamed.")
      Console.WriteLine(" You must specify the /R option to actually rename the files.")
      Return exitCode
   End Function

End Module

At the very minimum, the RENX utility requires two arguments: a regex that specifies which files in the current directory (and its subdirectories, if you add the /S option) must be renamed, and a second regex that specifies how to rename the files that are matched by the first regex. The power of RENX is the fact that the first regex can (actually, must) specify one or more groups of characters, and these groups are then referenced in the second regex. For example, let's suppose that I have a folder with the following files:

        01 Speak to Me.mp3
        02 On the Run.mp3
        03 Time.mp3
        04 The Great Gig in the Sky.vb3
        05 Money.mp3
        06 Us and Them.mp3
        07 Any Colour You Like.vbr
        08 Brain Damage.mp3
        09 Eclipse.vb3

and that I want to rename them as follows:

        01 - Speak to Me - The Dark Side of the Moon.mp3
        02 - On the Run - The Dark Side of the Moon.mp3
        03 - Time - The Dark Side of the Moon.mp3
        04 - The Great Gig in the Sky - The Dark Side of the Moon.vb3
        05 - Money - The Dark Side of the Moon.mp3
        06 - Us and Them - The Dark Side of the Moon.mp3
        07 - Any Colour You Like - The Dark Side of the Moon.vbr
        08 - Brain Damage - The Dark Side of the Moon.mp3
        09 - Eclipse - The Dark Side of the Moon.vb3

Here's the RENX command that does it:

        RENX "(\d\d) (.+?)(\..+)"    "${1} - ${2} - The Dark Side of the Moon${3}"

Notice that the first regex creates three groups by enclosing them in parentheses: (\d\d) matches the song number, (.+?) matches the song title, and (\..+) matches the file extension, dot included. The second argument can then reorder these three groups, using the ${N} syntax, where N is the position of the group as specified in the first regex. It is therefore easy to insert a dash after the song number, and the album name after the song title.
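If you want to check how the groups and the ${N} placeholders interact before touching any file, the same two calls that RENX uses (Regex.Match and Match.Result) can be tried on a single name. This is only a small sketch: the GroupDemo module name is hypothetical, and the ^ and $ anchors are added by hand here because RENX adds them to the first argument.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        ' Same pattern as the RENX example, wrapped in ^...$ as RENX does
        Dim re As New Regex("^(\d\d) (.+?)(\..+)$", RegexOptions.IgnoreCase)
        Dim ma As Match = re.Match("01 Speak to Me.mp3")
        If ma.Success Then
            ' Group 3 already contains the dot, so none is typed before ${3}
            Console.WriteLine(ma.Result("${1} - ${2} - The Dark Side of the Moon${3}"))
            ' prints: 01 - Speak to Me - The Dark Side of the Moon.mp3
        End If
    End Sub
End Module
```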

Because the RENX utility is quite dangerous, by default it does NOT rename the files, and it just lists how files would be renamed. To actually proceed with the rename operation, you must specify the /R option:

        RENX "(\d\d) (.+?)(\..+)"    "${1} - ${2} - The Dark Side of the Moon${3}" /R

That's all. You can play with the source code to extend the RENX utility as you prefer, and maybe turn it into a Windows Form application, or you can download the binary version from this link: Renx.zip (5.51 KB)

3/6/2006 5:30:10 PM (GMT Standard Time, UTC+00:00)
 Monday, January 16, 2006

If I could get instantaneous results for the following simple two-question survey

  1. Is Visual Studio the application that you use most often?
  2. Did you ever use regex searches in the VS Find dialog box?

I'd bet that 80% of you would answer YES to the first question, but 99% of you would answer NO to the second question, which would be a rather weird result. Regex searches are among the most powerful VS features, yet few developers use them or even know that they exist.

IMHO, the real problem is that the VS regex syntax is completely different from the syntax you use with the Regex class, therefore using this feature requires that you learn yet another regex dialect. This is a bit too much for most developers. Microsoft should allow the standard regex syntax in this dialog: they could implement this change very easily and in a short time, without caring about backward compatibility issues.

While waiting for Microsoft to offer this little-big innovation, you can have fun with what you have today. Here are a few examples, excerpted from my new book Programming Microsoft Visual Basic 2005: The Language:

:i = :z   Search assignments of an integer constant (:z) to a variable (:i). In VB, but more rarely in C#, it can deliver false matches, when the = operator is used in an expression.

:i = :q   Search assignments of a quoted string constant (:q) to a variable.

(Dim|Private|Public) :i As String   Search for variable declarations of string type (VB only). You can easily adapt it to other data types.

Dim <(:Lu(:Ll)*)+> As   Search for local VB variables that use a PascalCase naming convention and therefore violate Microsoft's guidelines. (Local variables should use the camelCase convention.)

^:b*'.+\n   Search for comment lines in VB, that is, lines that begin with an apostrophe. (It doesn't consider the REM keyword.) You can replace the apostrophe with // to use this search pattern in C# as well.

Dim {:i} As (.|\n)#<\1>    Highlights the portion of code between the declaration of a local variable and the first occurrence of that variable in the method. You can repeat this search for all the local variables in a method and check whether you should refactor your code by moving the declaration closer to where the variable is used for the first time. (See effect in figure below.)
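For comparison with the Regex class dialect, the first pattern in the list (:i = :z) can be approximated in standard .NET syntax. The translation below is only my reading of the VS shorthand (:i taken as an identifier, :z as an unsigned integer), so treat it as an assumption and not as an exact equivalent; the VsPatternDemo module name and the sample text are made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module VsPatternDemo
    Sub Main()
        ' Assumed .NET approximation of the VS Find pattern  :i = :z
        Dim re As New Regex("[A-Za-z_]\w* = \d+")

        ' Two sample assignments: one integer constant, one quoted string
        Dim code As String = "count = 10" & Environment.NewLine & "name = ""Joe"""

        For Each m As Match In re.Matches(code)
            Console.WriteLine(m.Value)   ' prints: count = 10
        Next
    End Sub
End Module
```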

1/16/2006 5:13:02 PM (GMT Standard Time, UTC+00:00)
 Thursday, December 01, 2005

One of the .NET Framework features that fascinate me most is regular expressions, which I often use to simplify and speed up my applications. Well, at least this is what I believed until some time ago, when I was busy writing the forthcoming Programming Microsoft Visual Basic 2005: The Language (due in mid-January). This book is a core reference on the VB language and includes a section on the Like operator, which in recent years has been overlooked in favor of regexes. I (mistakenly) assumed that the Like operator internally used the Regex classes, and therefore that it would surely be slower. After all these years, I should have learned never to jump to conclusions without testing and benchmarking my code accurately.

Let's say that you must check that a string has 9 characters, the first of which must be an uppercase "A" and the last four chars must be digits. This is how you'd perform this test with a regex:

Dim re As New Regex("^A....\d\d\d\d$")
If re.IsMatch(teststring) Then match = True

and here's the version that uses the Like operator:

If teststring Like "A????####" Then Match = True

Surprise! Putting this code in a loop (but leaving the creation of the regex out of the loop) and using a string that makes the test succeed (e.g. "ABCDE1234"), the Like operator is about 4 times faster than the regular expression. Not bad, uh? But the biggest surprise came when I benchmarked the same test based on methods of the System.Char class exclusively:

If teststring.Length = 9 AndAlso teststring.Chars(0) = "A"c _
   AndAlso Char.IsDigit(teststring.Chars(5)) AndAlso Char.IsDigit(teststring.Chars(6)) _
   AndAlso Char.IsDigit(teststring.Chars(7)) AndAlso Char.IsDigit(teststring.Chars(8)) Then match = True

Despite its length, this last test is about five times faster than the Like operator, and therefore about 20 times faster than the regexes! The gap narrows if you use compiled regexes, but the System.Char approach is by far the fastest of the lot.
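If you want to repeat the comparison on your own machine, a timing loop along these lines should do. It is a rough sketch and not the code I used for the figures above: the LikeBenchmark module, the CheckWithChars helper and the iteration count are hypothetical, and the ratios you get will depend on your hardware and .NET version.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module LikeBenchmark
    ' Hypothetical iteration count; raise it if the timings are too small to compare
    Const Iterations As Integer = 1000000

    ' Same test as above, based on String and Char members only
    Function CheckWithChars(ByVal s As String) As Boolean
        Return s.Length = 9 AndAlso s.Chars(0) = "A"c _
            AndAlso Char.IsDigit(s.Chars(5)) AndAlso Char.IsDigit(s.Chars(6)) _
            AndAlso Char.IsDigit(s.Chars(7)) AndAlso Char.IsDigit(s.Chars(8))
    End Function

    Sub Main()
        Dim testString As String = "ABCDE1234"
        ' The regex is created once, outside the timed loop
        Dim re As New Regex("^A....\d\d\d\d$")
        Dim hits As Integer = 0

        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If re.IsMatch(testString) Then hits += 1
        Next
        Console.WriteLine("Regex: {0} ms", sw.ElapsedMilliseconds)

        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If testString Like "A????####" Then hits += 1
        Next
        Console.WriteLine("Like:  {0} ms", sw.ElapsedMilliseconds)

        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            If CheckWithChars(testString) Then hits += 1
        Next
        Console.WriteLine("Char:  {0} ms", sw.ElapsedMilliseconds)

        ' Every test succeeds on every pass, so hits ends up at 3 * Iterations
        Console.WriteLine("Total hits: {0}", hits)
    End Sub
End Module
```

Run it in a Release build, outside the debugger, and repeat it a few times before trusting the numbers.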

The bottom line: (1) if you write VB code, use the Like operator instead of regexes if the condition isn't too complex, and (2) regardless of the language you work with, if you really want the highest performance, use the methods of the String and Char types, if the search operation isn't too complex.

12/1/2005 10:56:56 AM (GMT Standard Time, UTC+00:00)
 