Francesco's blog

 Saturday, December 22, 2007

When you migrate code from VB6 - regardless of whether you are using an automatic conversion tool - chances are that string-intensive code will actually run slower under VB.NET, if it uses a lot of string concatenation operations. For example, this code takes 2.8 seconds when it runs in VB6 and 27 seconds after its conversion to VB.NET on my 3GHz system:

Dim s As String = ""
For i As Integer = 1 To 100000
    s = s + "*"
Next

The solution is of course trivial: just replace the string variable with a StringBuilder object. Alas, this fix requires that you completely revise your source code, because you need to replace all + and & operators with the Append method, not to mention cases where the StringBuilder is used as an argument to string functions such as Trim or Len.

Is there a way to speed up the previous code with a minimal impact on the code itself? The answer is yes and the solution is actually very simple: you just need to create a VB.NET class that wraps the System.Text.StringBuilder object, that redefines the + and & operators, and that supports the implicit conversion to and from the System.String type. Authoring such a StringBuilder6 class is a matter of minutes:

' a wrapper for the StringBuilder object, with support for + and & operators

Public Class StringBuilder6

    Private buffer As New System.Text.StringBuilder

    ' return the inner string

    Public Overrides Function ToString() As String
        Return buffer.ToString()
    End Function

    Public Shared Operator +(ByVal op1 As StringBuilder6, ByVal op2 As String) As StringBuilder6
        op1.buffer.Append(op2)
        Return op1
    End Operator

    Public Shared Operator &(ByVal op1 As StringBuilder6, ByVal op2 As String) As StringBuilder6
        op1.buffer.Append(op2)
        Return op1
    End Operator

    ' convert to string

    Public Shared Widening Operator CType(ByVal op As StringBuilder6) As String
        Return op.ToString()
    End Operator

    ' convert from string

    Public Shared Widening Operator CType(ByVal str As String) As StringBuilder6
        Dim op As New StringBuilder6()
        op.buffer.Append(str)
        Return op
    End Operator

End Class

Once the StringBuilder6 class is in place, you just need to replace the type of the String variable:

Dim s As StringBuilder6 = ""

After this edit, the loop runs in 0.008 seconds, that is about 2000 times faster!!! Not bad, for such a simple fix :-)

Regardless of whether you are migrating code from VB6 or have written VB.NET or C# code from scratch, the StringBuilder6 class gives you a quick and simple way to check whether your string concatenations can be optimized by resorting to a StringBuilder.
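For C# code the same idea can be sketched as follows. This is only a minimal illustration and not part of the original class: the C# port of StringBuilder6 and the StringBuilder6Demo helper are my own assumptions, and C# has no & operator for strings, so only + is redefined.

```csharp
using System;
using System.Text;

// A C# sketch of the StringBuilder6 wrapper (hypothetical port of the VB class).
public sealed class StringBuilder6
{
    private readonly StringBuilder buffer = new StringBuilder();

    public override string ToString()
    {
        return buffer.ToString();
    }

    // Appends in place and returns the same instance, like the VB version.
    public static StringBuilder6 operator +(StringBuilder6 left, string right)
    {
        left.buffer.Append(right);
        return left;
    }

    // convert to string
    public static implicit operator string(StringBuilder6 value)
    {
        return value.ToString();
    }

    // convert from string
    public static implicit operator StringBuilder6(string value)
    {
        StringBuilder6 result = new StringBuilder6();
        result.buffer.Append(value);
        return result;
    }
}

// Hypothetical helper used only to exercise the wrapper.
public static class StringBuilder6Demo
{
    public static string Repeat(string piece, int count)
    {
        StringBuilder6 s = "";        // implicit conversion from String
        for (int i = 0; i < count; i++)
            s = s + piece;            // calls the overloaded + operator
        return s;                     // implicit conversion back to String
    }

    public static void Main()
    {
        Console.WriteLine(Repeat("*", 100000).Length);
    }
}
```

Keep in mind that the + operator modifies its left operand instead of creating a new object, so an assignment such as t = s + "x" leaves t and s pointing to the same buffer. This is fine for the accumulation loops the class is meant for, but it differs from the behavior of real strings.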

 

12/22/2007 11:40:06 AM (GMT Standard Time, UTC+00:00)
 Wednesday, April 11, 2007

I like the ability to extend the power of my applications by simply adding a reference to an assembly that contains the functions or the controls that I need. I like much less, however, the need to distribute and deploy many DLLs together with my executables. In this post I show a technique that I use to compress (nearly) all the DLLs of a Windows Forms application and "merge" them with the main EXE.

All the files you need are in this ZIP archive, which contains the AsmZip.exe utility (which you run from the command prompt) and two source files, Unzipper.cs and Unzipper.vb. I suggest that you copy the AsmZip utility into a directory listed on the system path, so that you can run it easily.

Step-by-step
These are the steps you must follow to implement the technique.

1) Add either the Unzipper.vb or the Unzipper.cs file to the main project of your application, depending on the language you've used.

2) In the Main method, add a statement that initializes the AssemblyUnzipper class (which is defined in the Unzipper file you added in step 1).
       ' (Visual Basic 2005)
       CodeArchitects.AssemblyUnzipper.Initialize()
       // Visual C# 2005
       CodeArchitects.AssemblyUnzipper.Initialize();
It is essential that this statement runs before any other statement in the application, particularly before showing a form that contains a control implemented in one of the DLLs you want to compress. If you are working with VB and the application has a startup form (and therefore you don't have a Sub Main method), you should initialize the AssemblyUnzipper class from inside the startup form's static constructor:
      Shared Sub New()
         CodeArchitects.AssemblyUnzipper.Initialize()
      End Sub

3) Compile the project, obviously in Release mode. (You should use this technique just before delivering the executable to your customers.)

4) Open a command prompt window from inside the application's \bin directory, and run the AsmZip utility as follows:
             AsmZip main.exe *.dll
where main.exe is the name of the main executable. The above command compresses *all* the DLLs in the directory and appends the compressed data to the main.exe file. If you want to compress just a subset of the DLLs that the application uses, you should specify their names, as in this example:
             AsmZip main.exe CodeArchitects*.dll Microsoft*.dll
There can be a few good reasons not to compress some of the DLLs used by the applications, as I'll explain shortly.

5) You can now delete all the DLLs that you have compressed, because the application - thanks to the AssemblyUnzipper class - is able to find them at the end of its executable file, to decompress them, and to load them in memory.

Pros
Before proceeding with an explanation of the technique's inner details, let's summarize its advantages:

a) simplified deployment: you need to distribute fewer files (often just the main EXE)
b) more robust applications: end users can't break the application by accidentally deleting one of its DLLs
c) fewer bytes on disk: all DLLs are compressed and appended to the main EXE file
d) the ability to "hide" some of your trade secrets, for example which 3rd party controls you've used
e) a slightly better protection of your intellectual property: compressed DLLs can't be decompiled, at least not as easily as uncompressed DLLs.

The last two points offer no real protection against even an inexperienced attacker who is determined to peek into your application. To do so, he or she would just need to decompile the main EXE, understand how the AssemblyUnzipper class works, and write a short program that works similarly but saves the uncompressed assemblies to disk. In other words, don't rely on this technique to protect your code from reverse engineering.

The AsmZip tool relies on the GZipStream class to compress the original DLLs, therefore the compression factor that you achieve with this technique is lower than the one you can obtain with WinZip or WinRar, but it is usually more than adequate, as the following figure shows.


How it works
This technique relies on the AssemblyResolve event of the AppDomain object. This event fires when the CLR fails to locate an assembly that the running application needs. By handling this event you can perform some nice tricks that wouldn't be possible otherwise. For example, you might load satellite assemblies from a network share or from a binary field in a database.

The AssemblyUnzipper class uses this event to search for the required assembly in a compressed stream that has been appended to the application's main EXE file:
      // the handler for AssemblyResolve event
      static Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs e)
      {
         // find the assembly with given name, cause error if not found
         AssemblyInfo info = null;
         if ( AsmInfos.TryGetValue(e.Name, out info) )
            return ExtractAssembly(info);
         // signal error (info is null here, so report the requested name)
         Debug.WriteLine("Failed to uncompress assembly " + e.Name);
         return null;
      }
Each AssemblyInfo object keeps track of where, in the main EXE file, the compressed data for each DLL is located. The AsmInfos dictionary enables the code to quickly locate the information associated with a DLL with given name. This dictionary is created inside the Initialize method, when the application is launched, and is then used each time the application attempts to load an assembly. For more details, see comments in either the VB or the C# source code.
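To make the mechanism more concrete, here is a minimal sketch of the same idea. It is not the AssemblyUnzipper source: the EmbeddedAssemblySketch class, its Register method, and the in-memory dictionary are assumptions made for illustration, whereas the real class reads the compressed data from the end of the EXE file. The only framework pieces it relies on are the AssemblyResolve event, GZipStream, and Assembly.Load.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Reflection;

public static class EmbeddedAssemblySketch
{
    // Hypothetical store: full assembly name -> GZip-compressed assembly image.
    private static readonly Dictionary<string, byte[]> Compressed =
        new Dictionary<string, byte[]>();

    public static void Register(string fullName, byte[] compressedImage)
    {
        Compressed[fullName] = compressedImage;
    }

    // Call once, before any type defined in a compressed DLL is used.
    public static void Initialize()
    {
        AppDomain.CurrentDomain.AssemblyResolve += Resolve;
    }

    // Invoked only when the CLR can't find an assembly through normal probing.
    public static Assembly Resolve(object sender, ResolveEventArgs e)
    {
        byte[] image;
        if (Compressed.TryGetValue(e.Name, out image))
            return Assembly.Load(Decompress(image));
        return null; // unknown assembly: let the load fail as usual
    }

    public static byte[] Compress(byte[] raw)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(raw, 0, raw.Length);
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.Read(chunk, 0, chunk.Length)) > 0)
                output.Write(chunk, 0, read);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        Initialize();
        byte[] packed = Compress(new byte[1000]);
        Console.WriteLine(Decompress(packed).Length);
    }
}
```

Returning null from the handler simply tells the CLR that this handler can't supply the assembly, so assemblies that were not compressed are unaffected.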

Limitations
I tried this technique with several Windows Forms apps, without any problem. The main issue is that compressed assemblies loaded programmatically have their Location property set to null/Nothing, but if you don't use reflection to explore the assembly's features you might never realize that the assembly was loaded in a nonstandard way. Another case to watch for: if your app dynamically loads all the assemblies in a given directory, for instance to explore their attributes, it clearly won't work as intended if these DLLs have been compressed and then deleted. In such cases, you should exclude these DLLs from compression.

The AssemblyUnzipper class works only with Windows Forms applications. As far as I know, it is possible to use the AssemblyResolve event inside ASP.NET applications, but it isn't possible to use the AssemblyUnzipper in that context. However, the problems that this technique solves aren't considered real issues under ASP.NET, therefore I don't think it makes sense to use it in web applications.

The only other limitation is that this technique works with DLLs but not with the main EXE. If you have a large EXE that uses some small DLLs, you won't achieve an interesting compression factor. In such a case, you might want to move your forms from the main EXE into a DLL and then compress the DLL with AsmZip. Even better, the main EXE might contain only the splash screen (if you have one) and it should load the startup form from the DLL that contains the actual application. Using this approach it is often possible to achieve an overall compression factor near or above 60 percent.

Note: in the first implementation of this technique I managed to successfully compress even the main EXE and used a small “stub” executable whose only job was to decompress and launch the actual EXE. After some tests, however, I found that the technique wasn’t very stable and I fell back to the technique described in this article.

4/11/2007 5:07:41 PM (GMT Daylight Time, UTC+01:00)
 Thursday, December 08, 2005

Last spring I co-authored this book, Practical Guidelines and Best Practices for Microsoft Visual Basic .NET and Visual C# Developers, arguably the longest title in Microsoft Press's history. The book is a reasoned list of guidelines that all .NET developers should follow, and is by far the largest collection of its kind you can find anywhere. It covers language syntax, memory usage, Windows Forms and ASP.NET applications, security, and more.

Unlike most other similar collections, though, we clearly divide the "rules" into guidelines (naming guidelines, comment usage, etc.) and best practices. The difference is subtle but important: most guidelines are primarily a style matter, whereas best practices impact the scalability, the speed, or the robustness of your application. This means that our guidelines are arbitrary and in fact we often offer alternate rules and clearly explain the pros and cons of each style.

You can learn more about the principles we used in the book's Introduction and in John Robbins's Foreword. (Unlike most foreword writers, John actually read each and every page in the manuscript and gave us some great advice about improving it.) Or click the figure to jump to the book's home page, where you can read three sample chapters and download the book's source code.

Today I have uploaded a 30-page Word document that contains a summary of all the rules covered in the book, grouped by topic and with a reference to where in the book each rule is explained. You can edit this document as you see fit, delete or edit the guidelines you aren't interested in, and so forth. We routinely use this document in internal code reviews or when we consult at customers' places, so we hope it will be useful to you as well.

P.S. You must register to access this material. We swear we'll never send you anything that even vaguely resembles spam, just 100% technical content!

12/8/2005 2:50:28 PM (GMT Standard Time, UTC+00:00)
 Thursday, December 01, 2005

One of the .NET Framework features that fascinate me most is regular expressions, which I often use to simplify and speed up my applications. Well, at least this is what I believed until some time ago, when I was busy writing the forthcoming Programming Microsoft Visual Basic 2005: The Language (due in mid-January). This book is a core reference on the VB language and includes a section on the Like operator, which in recent years has been overlooked in favor of regexes. I (mistakenly) assumed that the Like operator internally used the Regex classes, and therefore would surely be slower. After all these years, I should have learned never to jump to conclusions without testing and benchmarking my code accurately.

Let's say that you must check that a string has 9 characters, the first of which must be an uppercase "A" and the last four chars must be digits. This is how you'd perform this test with a regex:

Dim re As New Regex("^A....\d\d\d\d$")

and here's the version that uses the Like operator:

If teststring Like "A????####" Then Match = True

Surprise! Putting this code in a loop (but leaving the creation of the regex out of the loop) and using a string that makes the test succeed (e.g. "ABCDE1234"), the Like operator is about 4 times faster than the regular expression. Not bad, uh? But the biggest surprise came when I benchmarked the same test based on methods of the System.Char class exclusively:

If teststring.Length = 9 AndAlso teststring.Chars(0) = "A"c AndAlso Char.IsDigit(teststring.Chars(5)) _
   AndAlso Char.IsDigit(teststring.Chars(6)) AndAlso Char.IsDigit(teststring.Chars(7)) _
   AndAlso Char.IsDigit(teststring.Chars(8)) Then match = True

Despite its length, this last test is about five times faster than the Like operator, and therefore about 20 times faster than the regex! The gap narrows if you use compiled regexes, but the System.Char approach is by far the fastest of the lot.

The bottom line: (1) if you write VB code, use the Like operator instead of regexes if the condition isn't too complex, and (2) regardless of the language you work with, if you really want the highest performance, use the methods of the String and Char types, if the search operation isn't too complex.
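If you work in C#, where the Like operator isn't available, a rough sketch of the regex and Char-based checks might look like the following. The PatternCheck class and its method names are made up for this example, and the timing loop in Main is only a starting point for your own measurements; I make no claim about the numbers it prints on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper: 9 chars, first is "A", last four are digits.
public static class PatternCheck
{
    // Created once, outside any loop, as in the benchmark described above.
    private static readonly Regex Pattern = new Regex(@"^A....\d\d\d\d$");

    public static bool MatchesWithRegex(string s)
    {
        return Pattern.IsMatch(s);
    }

    public static bool MatchesWithChars(string s)
    {
        return s.Length == 9 && s[0] == 'A'
            && char.IsDigit(s[5]) && char.IsDigit(s[6])
            && char.IsDigit(s[7]) && char.IsDigit(s[8]);
    }

    public static void Main()
    {
        const int Iterations = 1000000;
        string test = "ABCDE1234";
        int hits = 0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            if (MatchesWithRegex(test)) hits++;
        Console.WriteLine("Regex: " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            if (MatchesWithChars(test)) hits++;
        Console.WriteLine("Char:  " + sw.ElapsedMilliseconds + " ms");

        Console.WriteLine("Matches counted: " + hits);
    }
}
```

The two methods agree on ordinary input, but they aren't identical in every corner case: for instance, the regex accepts a string followed by a trailing newline, whereas the length test rejects it.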

12/1/2005 10:56:56 AM (GMT Standard Time, UTC+00:00)
 Friday, November 25, 2005

Every now and then I get an email from a reader or a customer who asks for clarifications on object finalization and disposal. As far as I know, the best article on this topic is this essay by Joe Duffy. It's over 25 pages long, covers both .NET 1.1 and 2.0, and includes comments from gurus such as Jeffrey Richter and Chris Brumme. This is easily the definitive article on this topic and I urge you to read it if you haven't already.

The Dispose-Finalize pattern is objectively a complex matter. However, in most cases it can be simplified significantly if you use the following approach: (1) the class with the Dispose/Finalize method should wrap only one single unmanaged resource, and (2) this finalizable class should be private and nested inside another disposable (but not finalizable) type. The outer class is the only class that can use the finalizable class.

This simple trick enables the GC to immediately release all the memory used by the wrapper (disposable) class even in the worst case - that is, if the client code omits to invoke the Dispose method - and simplifies the structure of the type that uses the unmanaged resource. A listing is worth one thousand words, thus here is the C# version of what I mean:

// the class that clients use to work with the unmanaged resource
class WinResource : IDisposable
{
  // private field that creates a wrapper for the unmanaged resource
  private UnmanagedResourceWrapper wrapper = null;
  // this is true if the object has been disposed of
  bool disposed = false;

  public WinResource(string someData)
  {
    // allocate the unmanaged resource here
    wrapper = new UnmanagedResourceWrapper(someData);
  }

  // a public method that clients call to work with the unmanaged resource
  public void DoSomething()
  {
    // throw if the object has been already disposed of
    if ( disposed )
      throw new ObjectDisposedException("");

    // this code can pass the wrapper.Handle value to API calls.
    // ...
  }

  public void Dispose()
  {
    // avoid issues when multiple threads call Dispose at the same time.
    lock ( this )
    {
      // do nothing if already disposed of
      if ( disposed )
        return;
      // dispose of all the disposable objects used by this instance
      // including the one that wraps the unmanaged resource
      // ...
      wrapper.Dispose();
      // remember this object has been disposed of
      disposed = true;
    }
  }
 
  // the nested private class that allocates and releases the unmanaged resource
  private sealed class UnmanagedResourceWrapper : IDisposable
  {
    // an invalid handle value, that the wrapper class can use to check
    // whether the handle is valid
    public static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // a public field, but accessible only from inside the WinResource class 
    public IntPtr Handle = InvalidHandle;

    // the constructor takes some data and allocates the unmanaged resource (eg a file)
    public UnmanagedResourceWrapper(string someData)
    {
      // this is just a demo...
      this.Handle = new IntPtr(12345);
    }

    // the Dispose method can be invoked only by WinResource class
    public void Dispose()
    {
      Dispose(true);
      GC.SuppressFinalize(this);
    }

    // the finalizer
    ~UnmanagedResourceWrapper()
    {
      Dispose(false);
    }

    // This is where the unmanaged resource is actually disposed of.
    // Notice that it takes an argument only for compliance with .NET coding standards
    // but the disposing argument is never used, because in all cases this class
    // can access and release only the single unmanaged resource it wraps.
    private void Dispose(bool disposing)
    {
      // exit now if this object didn't complete its constructor correctly
      if ( this.Handle == InvalidHandle )
        return;
  
      // release the unmanaged resource 
      // eg. CloseHandle(Handle);
      
      // finally, invalidate the handle
      this.Handle = InvalidHandle;
    }
  }
}

Notice that, if the unmanaged resource must interact with other fields, this interaction should be taken care of inside the WinResource class, not in the nested class. The UnmanagedResourceWrapper works only as a wrapper for the handle and shouldn't contain other fields or methods, besides those shown in the above listing. The code in the WinResource class must coordinate all the resources being used, both managed and unmanaged ones, and must release all of them in its Dispose method. But if the client code omits to call the Dispose method, the destructor in the nested class will orderly release the unmanaged resource during the next garbage collection.

Let's see all the advantages of this simplified approach.

  • The requirement that you shouldn't access reference fields from inside the Finalize method is automatically satisfied, because the only field of the UnmanagedResourceWrapper type is a handle (a value type).
  • If the client code omits to invoke the WinResource.Dispose method before the WinResource object goes out of scope, the WinResource object is removed from the heap anyway at the first GC; only the few bytes used by the UnmanagedResourceWrapper object survive in the heap and will be promoted to generation 1 or 2. Therefore this technique is more efficient than writing a single finalizable object that allocates both managed and unmanaged resources.
  • The UnmanagedResourceWrapper class is private and you can't inherit from it, therefore you can mark it as sealed. This means that you never have to worry about the Dispose/Finalize pattern in derived classes - a topic on which tons of digital ink has been spilled. It is possible to inherit from WinResource as you'd do with any disposable class, therefore there are no limitations in this respect. (It's exactly like when you inherit from other disposable classes such as FileStream.)
  • The UnmanagedResourceWrapper is private and nested in another type and it isn't possible to obtain a reference to one of its instances; therefore, a client can't "resurrect" an UnmanagedResourceWrapper object during the finalization step, a technique that is rarely useful and often dangerous. (Even though I show in my Programming Visual Basic .NET book how you can use it to implement an object pool.)
  • The UnmanagedResourceWrapper constructor performs a single "atomic" action; if this action fails, the value of the handle is still equal to InvalidHandle, therefore the code in the Finalize method can detect this special value and do nothing in that case. There are only two cases: either the unmanaged resource has been correctly allocated or an exception prevented it from being created, and you don't have to worry about an object that has been built only partially because of an exception in its constructor.
  • Many other recommendations related to the Dispose/Finalize pattern become moot, such as those that dictate that you should neither write finalizers inside structures nor call virtual methods from inside the finalizer. In fact, the UnmanagedResourceWrapper class is sealed and has no virtual methods. Nor do you have to worry about versioning issues.
  • Another advantage: the UnmanagedResourceWrapper class is so simple and generic that you can often reuse it as-is (or with minor edits) inside other classes, by means of a plain copy-and-paste action. Because it is a nested class, you don't even need to change its name to avoid name collisions.

I am sure that in some cases this simplified pattern can't be used, though it always worked well in my applications. I believe that it's quite odd that this simplified approach is rarely mentioned in articles and books on this topic.


Technical matters aside, I think that another kind of consideration about the Dispose/Finalize pattern is in order.

In my opinion, it is essential to put a lot of emphasis on the fact that the Dispose/Finalize pattern should be used only when your type invokes unmanaged code that allocates unmanaged resources (including unmanaged memory) and returns a handle that you must eventually use to release the resource. If the unmanaged resource is already wrapped by a .NET object (e.g. a FileStream or a SqlConnection) or a COM object, the .NET class that uses the resource must implement IDisposable, but not a finalizer. And you must implement the Finalize method only if your code assigns the handle to a class-level field. If the handle is assigned to a local variable and the unmanaged resource is released before exiting the method - possibly in the Finally section of a Try block - you don't even have to implement the IDisposable interface. One of the few exceptions to this rule is when your managed code allocates unmanaged memory directly, by means of the System.Runtime.InteropServices.Marshal type.

I talked to many developers who believe that, when in doubt, they should always implement the Finalize method, just in case. This is a common mistake. Defining a finalizable class without a real reason to do so can hurt performance, because the CLR takes slightly longer to allocate finalizable objects: it has to register them in the finalization queue. And in the worst case - that is, if the caller omits to call the Dispose method - a finalizable object can have even more impact on performance, because it will be promoted to a higher generation without any real reason for such an overhead.

To recap: when do you really need to implement the Finalize method? Thinking of all the commercial apps I worked on in these years, I'd say that I used this pattern no more than 4 or 5 times. For sure, I used it more frequently in books and articles than in the real world. :-)

11/25/2005 10:37:38 AM (GMT Standard Time, UTC+00:00)
 Friday, November 18, 2005

Let's consider the following code, which represents a typical situation: you are inside a nested loop and you want to exit both loops when a condition is true:

For i As Integer = 1 To 10
   Dim exiting As Boolean = False
   For j As Integer = 1 To 20
      ' If the Evaluate function returns zero you want to exit both loops
      If Evaluate(i, j) = 0 Then
         exiting = True
         Exit For
      End If
      ' Do something here
   Next
   If exiting Then Exit For
Next

It isn't important to understand what the Evaluate function does, just consider that when this function returns zero you must exit both loops. The above code isn't optimized, because it repeatedly tests the exiting variable. You might optimize the loop by using a Goto statement that points to a label following the second Next keyword, but educated programmers don't use Gotos, right? So, the question is simple: how can you simplify this code and optimize it at the same time by dropping the exiting variable?

The solution is simple, and is based on the fact that Visual Basic supports as many as three different kinds of loops: For, Do, and While. Each kind of loop supports a corresponding Exit keyword (Exit For, Exit Do, and Exit While), thus you can rewrite the code as follows:

Dim i As Integer = 1
Do While i <= 10
   For j As Integer = 1 To 20
      If Evaluate(i, j) = 0 Then Exit Do
      ' Do something here
   Next
   i += 1
Loop

You can use the same technique when you have up to three nested loops.

Incidentally, you can't adopt this technique in C#, because its break statement doesn't have the same "semantic power" as the Exit keyword in VB: it always leaves only the innermost loop or switch block.
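In C#, one common way to get a similar result without a flag variable or a goto is to move the nested loops into a method of their own and leave both loops with a return statement. The sketch below is only an illustration: the NestedLoopExit class, the Run method, and this particular Evaluate function are hypothetical stand-ins.

```csharp
using System;

public static class NestedLoopExit
{
    // Hypothetical stand-in for the Evaluate function: returns zero for one (i, j) pair.
    public static int Evaluate(int i, int j)
    {
        return (i == 3 && j == 7) ? 0 : 1;
    }

    // Returns the number of inner iterations completed before leaving both loops.
    public static int Run()
    {
        int processed = 0;
        for (int i = 1; i <= 10; i++)
        {
            for (int j = 1; j <= 20; j++)
            {
                // A return statement leaves both loops at once.
                if (Evaluate(i, j) == 0)
                    return processed;
                // Do something here
                processed++;
            }
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```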

11/18/2005 9:09:12 AM (GMT Standard Time, UTC+00:00)
 Monday, November 07, 2005

Here's a non-orthodox but quite effective technique I sometimes use to detect and avoid recursive calls to a method. You typically detect recursive calls by defining a boolean class-level field and testing it on entry to a method. This technique is often used in event handlers, for example in TextChanged handlers that modify the Text property of a control and that would therefore trigger an endless recursion:

Dim insideTextChanged As Boolean

Private Sub TextBox1_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles TextBox1.TextChanged
   ' Exit if this is a recursive call.
   If insideTextChanged Then Exit Sub
   ' Forbid recursive calls from now on.
   insideTextChanged = True
   ' ...
   TextBox1.Text = TextBox1.Text & " "
   ' Permit recursive calls.
   insideTextChanged = False
End Sub

This approach works well, but it requires a lot of code and forces you to define a distinct boolean field for each event handler. If you have many handlers, it quickly becomes a nuisance. In addition, if there is any chance that the method throws an exception, you must wrap all the code in a Try block, so that you can reset the insideTextChanged field to False in the Finally section. Wouldn't it be great if you could use a method that allows you to test whether you are inside a recursive call? I am thinking of something like this:

Private Sub TextBox1_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles TextBox1.TextChanged
   ' Exit if this is a recursive call.
   If IsRecursive() Then Exit Sub
   ' ...
   TextBox1.Text = TextBox1.Text & " "
End Sub

Here's how you can implement the IsRecursive method:

<System.Runtime.CompilerServices.MethodImpl(Runtime.CompilerServices.MethodImplOptions.NoInlining)> _
Public Shared Function IsRecursive() As Boolean
   Dim st As New StackTrace
   ' Check whether any method in the call stack is the same as the immediate caller.
   For n As Integer = 2 To st.FrameCount - 1
      If st.GetFrame(1).GetMethod() Is st.GetFrame(n).GetMethod() Then Return True
   Next
   Return False
End Function

Here's the C# version:

[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
public static bool IsRecursive()
{
   StackTrace st = new StackTrace();
   // Check whether any method in the call stack is the same as the immediate caller.
   for ( int n = 2; n < st.FrameCount; n++ )
   {
      if ( st.GetFrame(1).GetMethod() == st.GetFrame(n).GetMethod() )
         return true;
   }
   return false;
}

The IsRecursive method compares the immediate caller - that is, st.GetFrame(1).GetMethod() - with all the other methods on the call stack and returns True if it finds a match. It is essential that the IsRecursive method is decorated with the MethodImpl attribute and its NoInlining option, to ensure that the JIT compiler never inlines it in its caller's body; if it were inlined, frame 1 would no longer be the immediate caller. In .NET 1.1 inlining should never happen anyway, because the JIT compiler never inlines methods that contain loops, but I haven't checked under .NET 2.0 and obviously I can't make promises about future versions, therefore this attribute is your best defence.
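Here is a small console sketch that exercises the technique without a form. The RecursionGuardDemo class, the OnChanged method, and the two counters are hypothetical names introduced for this demo; OnChanged calls itself to simulate a handler that triggers its own event. I also marked OnChanged as NoInlining so that it keeps its own stack frame, which is an extra precaution on my part.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class RecursionGuardDemo
{
    public static int BodyRuns;    // how many times the guarded body started
    public static int Completed;   // how many times it ran to the end

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool IsRecursive()
    {
        StackTrace st = new StackTrace();
        // Frame 0 is IsRecursive itself, frame 1 is its immediate caller.
        MethodBase caller = st.GetFrame(1).GetMethod();
        for (int n = 2; n < st.FrameCount; n++)
        {
            if (st.GetFrame(n).GetMethod() == caller)
                return true;
        }
        return false;
    }

    // Simulates an event handler whose body raises the same event again.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void OnChanged()
    {
        // Exit if this is a recursive call.
        if (IsRecursive())
            return;
        BodyRuns++;
        OnChanged();   // re-entrant call, stopped by the guard above
        Completed++;   // runs after the nested call returns
    }

    public static void Main()
    {
        OnChanged();
        Console.WriteLine("Body runs: " + BodyRuns);
    }
}
```

Building a StackTrace on every call is far more expensive than testing a boolean field, so this approach is better suited to handlers that fire occasionally than to code that runs in a tight loop.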

11/7/2005 8:38:20 AM (GMT Standard Time, UTC+00:00)
 Friday, November 04, 2005

Have a look at this simple Visual Basic code snippet:

' The version that does NOT cache the value type in a reference variable.
Dim start As Date = Now
For i As Integer = 1 To 1000
   For j As Integer = 1 To 100000
      GetObject(i, j)
   Next
Next
Console.WriteLine("Version 1: " & Now.Subtract(start).ToString)
GC.Collect() : GC.WaitForPendingFinalizers()

' The version that caches the value type in a reference variable.
start = Now
For i As Integer = 1 To 1000
   ' Cache the value type in an Object variable.
   Dim o As Object = i
   For j As Integer = 1 To 100000
      GetObject(o, j)
   Next
Next
Console.WriteLine("Version 2: " & Now.Subtract(start).ToString)

GetObject is a very simple routine, that takes two objects and therefore causes a box operation if they are value types:

Private Function GetObject(ByVal o As Object, ByVal o2 As Object) As Object
   Return o
End Function

As you can read in comments, the second portion caches the boxed version of the i variable in an Object variable, because this value doesn't change inside the innermost loop. You'd expect that this second version would run faster, even if by a little, and in fact this is what happens with Visual Basic .NET 2003. However, if you try this code with VB 2005 you'll be surprised to see that - as counterintuitive as it sounds - the version that caches the boxed value is 30-40% slower!

You need ILDASM to understand what happens behind the scenes. Visual Basic calls the GetObjectValue static method of the RuntimeHelpers type (in the System.Runtime.CompilerServices namespace) before passing an object variable to an object argument, and this extra call explains the overhead just observed. The weird thing is that this extra call is generated by the VB2003 compiler as well, however it doesn't nullify our manual optimization based on the cached variable. I am doing the benchmark with the RTM version, therefore this overhead is real (in other words, it isn't caused by pieces of the CLR compiled in debug mode), therefore I can only conclude that the 2.0 version of the GetObjectValue method is less efficient than the 1.1 version.

This is what the GetObjectValue method does. (Thanks to Adrian Florea, who found this note in Rotor's source code.)

"GetObjectValue is intended to allow value classes to be manipulated as Object but have aliasing behavior of a value class. The intent is that you would use this function just before an assignment to a variable of type Object. If the value being assigned is a mutable value class, then a shallow copy is returned (because value classes have copy semantics), but otherwise the object itself is returned.

Note: VB calls this method when they're about to assign to an Object or pass it as a parameter. The goal is to make sure that boxed value types work identical to unboxed value types - ie, they get cloned when you pass them around, and are always passed by value. Of course, reference types are not cloned."
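If you want to see the behavior described in that note for yourself, here is a minimal sketch in C#. It is only an illustration, not the code from the benchmark: the Point2 struct and the GetObjectValueDemo class are hypothetical names I made up, and the only framework call is RuntimeHelpers.GetObjectValue. According to the note above, a reference type should come back unchanged, while a boxed mutable value type should come back as a separate copy.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical mutable value type, defined only for this illustration.
public struct Point2
{
    public int X;
    public int Y;
}

public static class GetObjectValueDemo
{
    // A reference type is passed through: the very same object is returned.
    public static bool ReferenceTypeIsReturnedAsIs()
    {
        object original = new StringBuilder("abc");
        object result = RuntimeHelpers.GetObjectValue(original);
        return Object.ReferenceEquals(original, result);
    }

    // A boxed mutable struct is expected to come back as a different boxed object.
    public static bool BoxedStructIsCopied()
    {
        object boxed = new Point2();
        object result = RuntimeHelpers.GetObjectValue(boxed);
        return !Object.ReferenceEquals(boxed, result);
    }

    public static void Main()
    {
        Console.WriteLine("Reference type returned as-is: " + ReferenceTypeIsReturnedAsIs());
        Console.WriteLine("Boxed struct copied: " + BoxedStructIsCopied());
    }
}
```

The copy made for the boxed struct is the extra work that VB asks for on every such call, which is consistent with the overhead measured above.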

11/4/2005 7:49:08 AM (GMT Standard Time, UTC+00:00)
 Monday, October 31, 2005

The VB.NET and C# compilers manage string constants in a rather smart way: all strings with the same value are stored in a common area known as the string intern pool. The following code snippet shows this compiler feature in action:

' VB.NET
Dim s1 As String = "ABCDE"
Dim s2 As String = "ABC" & "DE"
' Prove that s1 and s2 point to the same element in the intern pool
Console.WriteLine(s1 Is s2) ' => True

// C#
string s1 = "ABCDE";
string s2 = "ABC" + "DE";
// Prove that s1 and s2 point to the same element in the intern pool
Console.WriteLine(String.ReferenceEquals(s1, s2)); // => True

This optimization technique doesn't really have any impact on the amount of memory used by most client applications, but it makes a difference if used inside types that are instantiated thousands of times, as often happens in server applications. The problem is that this optimization is applied only to string constants, not to strings built at run time:

' VB.NET ...continuing previous example...
Dim s3 As String = "ABC"
s3 &= "DE"
' s1 and s3 contain the same value but point to a different string
Console.WriteLine(s1 = s3) ' => True
Console.WriteLine(s1 Is s3) ' => False

// C# ...continuing previous example...
string s3 = "ABC";
s3 += "DE";
// s1 and s3 contain the same value but point to a different string
Console.WriteLine(s1 == s3); // => True
Console.WriteLine(String.ReferenceEquals(s1, s3)); // => False
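As a side note, you can ask the runtime whether a given value already lives in the pool by means of the String.IsInterned method, which returns the pooled string or null. The sketch below is my own illustration and not part of the snippets above: the InternPoolCheck class and its method names are hypothetical, and the string is assembled from a char array so that it is certainly created at run time. It also assumes that the "ABCDE" literal ends up in the pool, which is the usual behavior for JIT-compiled code.

```csharp
using System;

public static class InternPoolCheck
{
    // Builds "ABCDE" from characters at run time, so the result is a brand new string object.
    public static string BuildAtRuntime()
    {
        return new string(new char[] { 'A', 'B', 'C', 'D', 'E' });
    }

    // Same value as the literal, but a different object in memory.
    public static bool RuntimeStringIsSeparateObject()
    {
        string literal = "ABCDE";
        string built = BuildAtRuntime();
        return literal == built && !Object.ReferenceEquals(literal, built);
    }

    // IsInterned looks the value up in the pool and returns the pooled string.
    public static bool LookupFindsTheLiteral()
    {
        string literal = "ABCDE";
        string pooled = String.IsInterned(BuildAtRuntime());
        return Object.ReferenceEquals(literal, pooled);
    }

    public static void Main()
    {
        Console.WriteLine(RuntimeStringIsSeparateObject());
        Console.WriteLine(LookupFindsTheLiteral());
    }
}
```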

Now, let's suppose you have a component in the data tier and this component contains the connection string for the database. This connection string is read from somewhere - typically the configuration file - when it's time to open the connection, therefore the compiler can't store the string in the intern pool. If this component is instantiated N times, there will be N copies of the same string in memory, which clearly is a waste if the string is long and N is high. There are two ways to avoid this waste, depending on how the connection string can vary.

If the connection string is guaranteed to be the same for all the instances, then you can store it in a static variable (a Shared variable in VB), so that the string is shared among all the instances of the component. This is the simplest case and I assume you know how to implement it, so let's move to the more interesting situation.
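For completeness, this is roughly what the static-field approach might look like in C#. Take it as a bare-bones sketch: SharedDataComponent, SharedFieldDemo, and the sample connection string are names and values I invented for the example, and no thread synchronization is attempted.

```csharp
using System;

// Hypothetical data component; the names are invented for this sketch.
public class SharedDataComponent
{
    // A static field belongs to the type, so all instances share one string reference.
    private static string s_ConnectionString;

    public static string ConnectionString
    {
        get { return s_ConnectionString; }
        set { s_ConnectionString = value; }
    }

    // Instance member that reads the shared value.
    public string GetConnectionString()
    {
        return s_ConnectionString;
    }
}

public static class SharedFieldDemo
{
    public static bool AllInstancesSeeSameString()
    {
        // Made-up value; a real component would read it from the configuration file.
        SharedDataComponent.ConnectionString = "Data Source=.;Initial Catalog=Demo";
        SharedDataComponent c1 = new SharedDataComponent();
        SharedDataComponent c2 = new SharedDataComponent();
        return Object.ReferenceEquals(c1.GetConnectionString(), c2.GetConnectionString());
    }

    public static void Main()
    {
        Console.WriteLine(AllInstancesSeeSameString());
    }
}
```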

If the connection string can vary - for example, if the data component can connect to two or more different databases or if the connection string can use different login information - you can't store it in a static field. In this case you can resort to a technique based on the String.Intern method. This method receives a string argument and searches for it in the intern pool: if the search is successful, the method returns a pointer to the existing string in the pool; if the search fails, the method inserts the string in the pool and returns a pointer to the element just added. Here's how you might implement the ConnectionString property in the hypothetical data component to better leverage the intern pool:

' VB.NET
Dim m_ConnectionString As String

Property ConnectionString() As String
   Get
      Return m_ConnectionString
   End Get
   Set(ByVal Value As String)
      m_ConnectionString = String.Intern(Value)
   End Set
End Property
 

// C#
private string m_ConnectionString;

public string ConnectionString
{
  get { return m_ConnectionString; }
  set { m_ConnectionString = String.Intern(value);}
}

The first time a given value is assigned to the ConnectionString property, the search in the pool fails, so the String.Intern method adds the string to the pool and returns a pointer to the new pool element. If the same connection string is later assigned to a different instance of the data component, the String.Intern method returns a pointer to the element already in the pool and doesn't create any duplicate. The total amount of memory that the application uses is reduced, and so is the number of garbage collections that occur during the application's lifetime.
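To convince yourself that two instances really end up sharing one copy, you might try something along these lines. This is just a sketch under a few assumptions: DataComponent mirrors the property shown earlier, while InternDemo, BuildAtRuntime, and the sample connection string are hypothetical names and values used only for the test. The strings are assembled with a StringBuilder to simulate values read at run time.

```csharp
using System;
using System.Text;

// Hypothetical data component with the interning property shown earlier.
public class DataComponent
{
    private string m_ConnectionString;

    public string ConnectionString
    {
        get { return m_ConnectionString; }
        set { m_ConnectionString = String.Intern(value); }
    }
}

public static class InternDemo
{
    // Simulates a value read at run time: each call creates a new string object.
    public static string BuildAtRuntime()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("Data Source=.;");
        sb.Append("Initial Catalog=Demo");
        return sb.ToString();
    }

    public static bool InstancesShareOneCopy()
    {
        string a = BuildAtRuntime();
        string b = BuildAtRuntime();
        DataComponent c1 = new DataComponent();
        DataComponent c2 = new DataComponent();
        c1.ConnectionString = a;
        c2.ConnectionString = b;
        // a and b are distinct objects, yet both properties return the same pooled string.
        return !Object.ReferenceEquals(a, b)
            && Object.ReferenceEquals(c1.ConnectionString, c2.ConnectionString);
    }

    public static void Main()
    {
        Console.WriteLine(InstancesShareOneCopy());
    }
}
```

Once both assignments have gone through String.Intern, the string referenced by b is no longer reachable from the components and can be collected.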

10/31/2005 6:14:23 PM (GMT Standard Time, UTC+00:00)
 