Saturday, February 03, 2007

String interning and unsafe code

I found this piece of code in a blog by Mark Michaelis (http://mark.michaelis.net/Blog/).

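string text;
text = "S5280ft";

unsafe
{
    // Pin the string in memory and overwrite its characters in place.
    fixed (char* pText = text)
    {
        pText[1] = 'm';
        pText[2] = 'i';
        pText[3] = 'l';
        pText[4] = 'e';
        pText[5] = ' ';
        pText[6] = ' ';
    }
}

// Assign the same literal again, then print it.
text = "S5280ft";
System.Console.WriteLine(text);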

The output of the code is “Smile” (followed by two spaces), despite the assignment statement just before the System.Console.WriteLine statement.

Here is the explanation I got after posting in different forums. The CLR keeps a table of all the string literals in the code. This technique, known as string interning, saves the CLR from allocating a separate memory location for each occurrence of the same literal.

The following is an extract from the MSDN site:

The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system.

For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable.
The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.
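
The behaviour described in the extract can be checked with a small sketch. The class name InternDemo and the variable names are made up for illustration, and the last check assumes the literal really is in the intern pool (the usual case for JIT-compiled code; the documentation notes that precompiled assemblies may not intern every string).

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        // Two variables assigned the same literal share one string object.
        string a = "S5280ft";
        string b = "S5280ft";
        Console.WriteLine(object.ReferenceEquals(a, b));

        // A string built at run time has the same value but is a separate object.
        string c = new string(new[] { 'S', '5', '2', '8', '0', 'f', 't' });
        Console.WriteLine(a == c);
        Console.WriteLine(object.ReferenceEquals(a, c));

        // String.Intern looks the value up in the intern pool and returns
        // the pooled reference (assumed here to be the literal's object).
        string d = string.Intern(c);
        Console.WriteLine(object.ReferenceEquals(a, d));
    }
}
```

The first comparison should print True and the third False, which is exactly the difference between a literal taken from the pool and a string created programmatically without calling Intern.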


Additional information about string interning can be found at this link (http://msdn2.microsoft.com/en-us/library/system.string.intern.aspx)


Now in our code, once the first assignment is done, the literal “S5280ft” is stored in the string intern table. Later on, the unsafe code changes the contents of that string in place (wait a minute... weren't strings said to be immutable?). So the second assignment statement does not create a new string. It simply picks up the reference to the original interned string again, which has by now been modified to “Smile”.
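
To make the sharing visible, here is a minimal sketch that uses two variables instead of reassigning one. The class name AliasDemo and the variable names are hypothetical, and the code has to be compiled with unsafe code enabled. Writing into a string through a pointer is outside what the runtime guarantees, so what it prints may differ between runtime versions; I have deliberately not listed an expected output.

```csharp
using System;

class AliasDemo
{
    // Requires unsafe code to be enabled at compile time (for example, csc /unsafe).
    static unsafe void Main()
    {
        string first = "S5280ft";
        string second = "S5280ft"; // same literal, so normally the same interned object

        // Overwrite one character through the first variable only.
        fixed (char* p = first)
        {
            p[1] = 'm';
        }

        // If both variables point at the one interned string, the change
        // made through 'first' is also visible through 'second'.
        Console.WriteLine(object.ReferenceEquals(first, second));
        Console.WriteLine(second);
    }
}
```

If the two references turn out to be equal, the second line shows the modified text even though nothing was ever written through the second variable.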

Now my question is: would the CLR ever realize that the string in the intern table has already been changed before the second assignment statement, and hence return a new reference?