Also available at my website http://tosh.me/ and on Twitter @toshafanasiev

Wednesday, 8 February 2012

C++/CLI literal keyword bug

In C# you have two similar but importantly different options for defining constants: const and static readonly, as shown:

// names.cs
// compile with csc /t:library names.cs
public static class Names {
  public const string First = "Tosh";
  public static readonly string Last  = "Afanasiev";
}

The compiler generates the following IL for this code:

.assembly names

// lots of guff omitted

.class public abstract auto ansi sealed beforefieldinit Names
       extends [mscorlib]System.Object
{
  .field public static literal string First = "Tosh"
  .field public static initonly string Last
  .method private hidebysig specialname rtspecialname static 
          void  .cctor() cil managed
  {
    // Code size       11 (0xb)
    .maxstack  8
    IL_0000:  ldstr      "Afanasiev"
    IL_0005:  stsfld     string Names::Last
    IL_000a:  ret
  } // end of method Names::.cctor

} // end of class Names

Both Names.First and Names.Last are constants in the sense that you can consume but not modify their values, but the way they are bound is the crucial difference. Notice how the First symbol is bound to its value in metadata - it is used as an alias for the literal (hence the literal flag) string 'Tosh' - while the Last symbol is declared but assigned to in the type initialiser (or static constructor, if you prefer), thereby making its value available only once the Names class has been loaded in the current app domain. Hence the value of First is knowable at compile time, while the value of Last can only be known at run time.

This difference is most clearly illustrated by examining the code that consumes these values; here are two C# programs and the IL generated for them:


// firstname.cs
// compile with csc /r:names.dll firstname.cs
using System;

static class prog {
  static void Main() {
    Console.WriteLine( "First name: {0}", Names.First );
  }
}


.assembly firstname

// guff

.class private abstract auto ansi sealed beforefieldinit prog
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main() cil managed
  {
    .entrypoint
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "First name: {0}"
    IL_0006:  ldstr      "Tosh"
    IL_000b:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                  object)
    IL_0010:  nop
    IL_0011:  ret
  } // end of method prog::Main

} // end of class prog


// lastname.cs
// compile with csc /r:names.dll lastname.cs
using System;

static class prog {
  static void Main() {
    Console.WriteLine( "Last name: {0}", Names.Last );
  }
}


.assembly extern names
{
  .ver 0:0:0:0
}
.assembly lastname

// guff

.class private abstract auto ansi sealed beforefieldinit prog
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main() cil managed
  {
    .entrypoint
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Last name: {0}"
    IL_0006:  ldsfld     string [names]Names::Last
    IL_000b:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                  object)
    IL_0010:  nop
    IL_0011:  ret
  } // end of method prog::Main

} // end of class prog

The difference that immediately jumps out is that the firstname program's code contains a copy of the literal value Names.First, as defined in the metadata of the names assembly, while lastname loads the static field Names.Last in order to evaluate the constant. A less obvious difference is that lastname declares a dependency on the names assembly while firstname does not.

So what are the implications of this? Firstly, since Names.Last is evaluated at runtime by loading another assembly and accessing a static field, the value can be changed after deployment without recompiling and redistributing the consuming code - great when you have numerous clients, possibly not your own, that need to be kept up to date. Assemblies consuming the Names.First value would be totally unaware of any change to this value unless they were themselves recompiled.
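
The versioning consequence can be modelled outside .NET. What follows is only a loose Python analogy, not a description of how the CLR works: compile_consumer, names_v1 and names_v2 are hypothetical names, a value copied when the consumer is built stands in for const/literal, and a dictionary lookup when it runs stands in for the static readonly field load.

```python
def compile_consumer(library):
    """Model of compiling a consumer against a library of constants."""
    # 'First' behaves like const: its value is copied into the consumer
    # at "compile time".
    first_literal = library["First"]

    def run(deployed_library):
        # 'Last' behaves like static readonly: it is read from whichever
        # version of the library is deployed when the consumer runs.
        return "%s %s" % (first_literal, deployed_library["Last"])

    return run


names_v1 = {"First": "Tosh", "Last": "Afanasiev"}
names_v2 = {"First": "Anton", "Last": "Smith"}  # library redeployed with new values

consumer = compile_consumer(names_v1)
print(consumer(names_v1))  # Tosh Afanasiev
print(consumer(names_v2))  # Tosh Smith
```

The second call shows the stale copy: the consumer picks up the new Last but keeps the First it was built with until it is itself rebuilt.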

The other side of this coin is that the consuming assembly has a runtime dependency on names.dll (firstname will run fine on its own while lastname fails miserably with names.dll out of reach) and, more subtly, the use of the constant Names.Last is limited to runtime contexts. This last point may not be a problem, but consider the case where attribute values are centralised; the following would result in a compile error:


[FicticiousAttribute(Names.Last)]
public class ...

While this would not:

[FicticiousAttribute(Names.First)]
public class ...

So, when choosing how to declare a shared constant you need to think about how shared and how constant it actually is.

Now for the bug.

The keywords in C++/CLI resemble their IL counterparts more closely than in C# - literal is used to denote compile time constants while static initonly is used for those dynamically evaluated runtime constants (obviously they couldn't use const).

Here's where I noticed a bug in the VS2005 and VS2008 C++/CLI compilers (not VS2010 though). If you define a literal (i.e. metadata-based, class-level, compile-time constant) value on an abstract sealed class (that's a static class in C#) you get a compile error:


// values.cpp
// compile with cl /clr values.cpp
using namespace System;

  public ref class Names abstract sealed {
  public:
    static initonly String^ First = "Tosh";
    literal String^ Last = "Afanasiev";
  };

int main() {
  Console::WriteLine( "Hi, my name is {0} {1}!", Names::First, Names::Last );

  return 0;
}

VS2005:

C:\code>cl /clr values.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 14.00.50727.762
for Microsoft (R) .NET Framework version 2.00.50727.5420
Copyright (C) Microsoft Corporation.  All rights reserved.

values.cpp
values.cpp(9) : error C4693: 'Names': a sealed abstract class cannot have any instance members 'Last'

VS2008:

C:\code>cl /clr values.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 15.00.30729.01
for Microsoft (R) .NET Framework version 2.00.50727.5420
Copyright (C) Microsoft Corporation.  All rights reserved.

values.cpp
values.cpp(9) : error C4693: 'Names': a sealed abstract class cannot have any instance members 'Last'

VS2010:

C:\code>cl /clr values.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01
for Microsoft (R) .NET Framework version 4.00.30319.1
Copyright (C) Microsoft Corporation.  All rights reserved.

values.cpp
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:values.exe
values.obj

C:\code>values
Hi, my name is Tosh Afanasiev!

The compiler complains of instance members on an abstract sealed class when it's not an instance member we're adding.

To get around this you have to suppress error 4693. I'd suggest making the scope of this as tight as possible and clearly documenting your reasons. Here's an example:


// values.cpp
// compile with cl /clr values.cpp
using namespace System;

  public ref class Names abstract sealed {
  public:
    static initonly String^ First = "Tosh";
// disabling error for VS2005 and VS2008 compilers
// search http://blog.tosh.me/ for 'literal keyword bug' for details
#pragma warning( push )
#pragma warning( disable: 4693 )
    literal String^ Last = "Afanasiev";
#pragma warning( pop )
  };

int main() {
  Console::WriteLine( "Hi, my name is {0} {1}!", Names::First, Names::Last );

  return 0;
}

There you go, happy constants all round.

Monday, 30 January 2012

Copy directory structure MSBuild

When using MSBuild you may find yourself wanting to copy some existing directory structure somewhere. If you have a directory structure, the output of a solution for example, like the following:

output_dir/
 - bin/
   - thing.dll
   - thing.pdb
 - lib/
   - thing.lib
 - inc/
   - thing.h

You'll notice that doing this:

<ItemGroup>
  <out_files Include='output_dir/**/*'/>
</ItemGroup>

<Target Name='copy_files'>
  <Copy SourceFiles='@(out_files)' DestinationFolder='deployment_dir'/>
</Target>

Will flatten your output file structure to this:

deployment_dir/
 - thing.dll
 - thing.pdb
 - thing.lib
 - thing.h

In order to preserve your file structure you could create separate specifications for the bin/, lib/ and inc/ directories (which would not only add a maintenance burden should these directories change, but also require you to type them out in the first place), or you could simply take advantage of the metadata that MSBuild attaches to each item it returns from a recursive directory search. The RecursiveDir metadata is attached to each file in the item list yielded by the search; it holds the part of the path matched by the recursive wildcard (**) for that file, so doing this:

<ItemGroup>
  <out_files Include='output_dir/**/*'/>
</ItemGroup>

<Target Name='copy_files'>
  <Copy SourceFiles='@(out_files)' DestinationFolder='deployment_dir/%(out_files.RecursiveDir)'/>
</Target>

Will give you the structure you were after:

deployment_dir/
 - bin/
   - thing.dll
   - thing.pdb
 - lib/
   - thing.lib
 - inc/
   - thing.h

You've got loads of control over what gets captured too. Say you want to exclude debug symbols; you'd do this:

<ItemGroup>
  <out_files Include='output_dir/**/*' Exclude='output_dir/**/*.pdb'/>
</ItemGroup>

<Target Name='copy_files'>
  <Copy SourceFiles='@(out_files)' DestinationFolder='deployment_dir/%(out_files.RecursiveDir)'/>
</Target>

To get this:

deployment_dir/
 - bin/
   - thing.dll
 - lib/
   - thing.lib
 - inc/
   - thing.h

Using these constructs allows you to manage builds of multiple solutions from a central MSBuild project without requiring that that project know all the solutions' intimate details - it gives you good separation of concerns and hence fewer headaches.
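
To make the role of RecursiveDir concrete, here is a rough Python sketch of the same copy logic. It is an analogy rather than what MSBuild does internally; copy_tree and its exclude parameter are hypothetical names, and the relative directory computed for each file plays the part of %(RecursiveDir).

```python
import fnmatch
import os
import shutil


def copy_tree(source_root, dest_root, exclude="*.pdb"):
    """Copy every file under source_root into dest_root, keeping sub-directories."""
    copied = []
    for current_dir, _subdirs, file_names in os.walk(source_root):
        # The path of current_dir relative to source_root is the part of the
        # path that '**' matched, i.e. the stand-in for %(RecursiveDir).
        recursive_dir = os.path.relpath(current_dir, source_root)
        for name in file_names:
            if fnmatch.fnmatch(name, exclude):
                continue  # comparable to Exclude='output_dir/**/*.pdb'
            target_dir = os.path.normpath(os.path.join(dest_root, recursive_dir))
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy2(os.path.join(current_dir, name), target_dir)
            copied.append(os.path.normpath(os.path.join(recursive_dir, name)))
    return sorted(copied)
```

Calling copy_tree('output_dir', 'deployment_dir') on the layout above should recreate bin/, lib/ and inc/ under deployment_dir without the .pdb file; dropping recursive_dir from target_dir gives the flattened result instead.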

Tuesday, 24 January 2012

Overloading new and delete in C++

While writing C++ you might want to be involved in the details of dynamically allocated memory yourself: to ensure that certain types of object sit together in memory; to minimise space wasted on overhead for arrays of small objects; to improve allocation speed (at the cost of memory held); or to satisfy a fixed memory requirement. Whatever the reason (and once the trade-offs are fully understood), the basic plumbing you would need looks something like this:
#include <cstddef>  // size_t
#include <iostream>

using std::cout;
using std::endl;

class Thing {
public:
  Thing() {
    cout << "new Thing: " << sizeof( Thing ) << " bytes" << endl;
  }
  ~Thing() {
    cout << "~Thing()" << endl;
  }
  static void* operator new( size_t size ) {
    cout << "allocating " << size << " bytes in operator new" << endl;

    return ::operator new( size );
  }
  static void operator delete( void* mem ) {
    cout << "operator delete" << endl;

    ::operator delete( mem );
  }
  static void* operator new[]( size_t size ) {
    cout << "allocating " << size << " bytes in operator new[]" << endl;

    return ::operator new[]( size );
  }
  static void operator delete[]( void* mem ) {
    cout << "operator delete[]" << endl;

    ::operator delete[]( mem );
  }
private:
  unsigned char value_;
  // not worrying about copy/move for this example
  Thing( const Thing& );
  Thing( Thing&& );
  void operator=( const Thing& );
  void operator=( Thing&& );
};

int main( int argc, char* argv[] ) {

  cout << "creating an object ..." << endl;
  auto* t = new Thing;

  delete t;

  const int SIZE = 5;
  cout << "creating a " << SIZE << " element array ..." << endl;
  auto* a = new Thing[SIZE];

  delete[] a;

  return 0;
}

In this example my overloaded operators defer to the default provided versions, so nothing really interesting is being done in terms of memory allocation, but it does serve as an illustration of the mechanism, as the output shows:
creating an object ...
allocating 1 bytes in operator new
new Thing: 1 bytes
~Thing()
operator delete
creating a 5 element array ...
allocating 9 bytes in operator new[]
new Thing: 1 bytes
new Thing: 1 bytes
new Thing: 1 bytes
new Thing: 1 bytes
new Thing: 1 bytes
~Thing()
~Thing()
~Thing()
~Thing()
~Thing()
operator delete[]
It also illustrates the importance of pairing up the scalar and vector forms of new and delete properly. The additional 4 bytes allocated for the array here are used by this compiler to record the number of elements, so that delete[] knows how many destructors to run. Using the wrong form of delete is undefined behaviour; in practice that bookkeeping is either ignored or erroneously read, depending on which way round you get it wrong.

Friday, 23 December 2011

Ruby List Comprehensions

I'm busy learning Ruby and I'm discovering that I really like it. I've used Python for some time now so naturally it's the differences between these two paradigmatically similar languages that I notice the most.

A feature of Python that I think is really cool is the list comprehension: a way of specifying a list literal using the definition of the list, rather than its enumeration, as the code below illustrates:

#!/usr/bin/python

# literal as an enumeration
a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

# literal as a comprehension
b = [ i for i in range( 10 ) ]

This is a really neat feature that can create a sequence from an arbitrary definition, such as the characters in a string, the odd numbers in a sequence etc. You get the idea.
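
The two cases just mentioned might look like this; text and numbers are placeholder inputs chosen for illustration.

```python
text = 'woodchuck'
numbers = range(10)

# the characters in a string
chars = [c for c in text]

# the odd numbers in a sequence
odds = [n for n in numbers if n % 2 == 1]

# a comprehension can also transform each element as it filters
odd_squares = [n * n for n in numbers if n % 2 == 1]

print(chars)        # ['w', 'o', 'o', 'd', 'c', 'h', 'u', 'c', 'k']
print(odds)         # [1, 3, 5, 7, 9]
print(odd_squares)  # [1, 9, 25, 49, 81]
```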

In Ruby, doing this ...

a = [ 1..100 ]

... doesn't give you an array of one hundred elements, but an array containing a single Range object of 1 to 100.

The fact that list comprehensions seem to be missing from Ruby was initially slightly saddening, until I discovered that Ruby's class definitions, like its strings, are mutable. This is what's really got me excited about this language - you want list comprehensions? Add them.

The duck typing offered by dynamic languages along with Ruby's class mutability makes this sort of thing really easy to do, as demonstrated below.

#!/usr/bin/ruby
#saved as 'list_comp.rb'

class Array
  def from_e!( enum )
    enum.each { |x| self << x }
    return self
  end
  def self.from_e( enum )
    return [].from_e! enum
  end
end

This is cool! It lets you create lists in a really intuitive way using definitions, not enumerations - and the best part is that I've added this to the actual, built-in Array class itself - that's the power of Ruby. I love it.

Here are some lists being created from scratch using the class method:

irb(main):001:0> require './list_comp.rb'
=> true
irb(main):002:0> Array.from_e 1..10
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
irb(main):003:0> Array.from_e 'How much wood would a woodchuck chuck?'.each_char
=> ["H", "o", "w", " ", "m", "u", "c", "h", " ", "w", "o", "o", "d", " ", "w", "
o", "u", "l", "d", " ", "a", " ", "w", "o", "o", "d", "c", "h", "u", "c", "k", "
 ", "c", "h", "u", "c", "k", "?"]
irb(main):004:0> Array.from_e 0.step 50, 5
=> [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
irb(main):005:0> Array.from_e 'one two 3 four 55'.scan /\d+/
=> ["3", "55"]

And here is the instance method in action, modifying an existing array:

irb(main):006:0> a=[0]
=> [0]
irb(main):007:0> a.from_e! 1.step 10,1
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Note the ! on the end of the instance version of from_e - I've noticed and really appreciate the convention of marking mutating (non-const, if you like) instance methods using the exclamation mark and I intend to stick to it. Also, I've tried to follow convention by calling the method from_e - to match the to_s, to_i, etc. pattern.

Now it could be that Ruby already supports this sort of thing and I just haven't found it yet but either way I'm really impressed - it seems you can make this language whatever you want it to be. I'm looking forward to learning more - this is only day 3!

Wednesday, 21 December 2011

OAuth Request Signing

I've started writing a little app that needs to talk to Twitter - to do this it needs to speak OAuth. It runs on Google App Engine and is written in Python and there are many good OAuth (and even dedicated Twitter) modules written in Python so I could have grabbed one of these and got on with it but that wouldn't have been much fun and wouldn't get me any closer to an understanding of OAuth - so I went ahead and implemented it myself.

Since at this stage I don't need my app to allow users to log in and interact as themselves (actually I'm actively avoiding this sort of requirement) I didn't need to worry about the 'dance' part of OAuth (the token request/authentication/exchange bit) - I only need the request signing part and the keys provided by my Twitter app's dashboard page.

I tweeted from an interactive Python session using my OAuth script for authentication, and have pulled down mentions etc. The implementation is deliberately unoptimised and fragmented - this lets it serve better as an executable reference to the OAuth standard (the various functions link to the specific sections of the standard that they implement) - and also it is good enough for what I want to do right now.

Here it is:

'''implementing OAuth ( request signing only ) as set out in http://tools.ietf.org/html/rfc5849

License: This code is free for use by any person, for any purpose but is provided as-is, with no guarantees - use at your own risk. One proviso: drop me a mail and let me know how it worked for you.'''

from urllib import quote
import urllib2
import time
import os
import base64
import hashlib
import hmac

def encode( value ):  
  '''#section-3.6'''
  return quote( value or '', '-._~' )

def enc_params( params ):
  '''#section-3.4.1.3.2'''
  ep = [ '%s=%s' % ( encode( k ), encode( v ) ) for k, v in params.items( ) ]
  return '&'.join( sorted( ep ) )

def b64e( value ):
  '''Base64, as referenced by #section-3.4.2 ( RFC 2045 section 6.8 )'''
  return base64.b64encode( value )

def make_base_string( method, url, params ):
  '''#section-3.4.1'''
  return '&'.join( [ encode( method.upper( ) ), encode( url ), encode( enc_params( params ) ) ] )

def get_timestamp( ):
  '''#section-3.3'''
  return str( int( time.time( ) ) )

def get_nonce( ):
  '''#section-3.3'''
  return b64e( os.urandom( 32 ) ).strip( '=' )

def hmac_sha1_sig( base_string, consumer_secret, token_secret='' ):
  '''#section-3.4.2'''
  key = '%s&%s' % ( encode( consumer_secret ), encode( token_secret ) )
  h=hmac.new( key, base_string, hashlib.sha1 )
  return b64e( h.digest( ) )

class OAuthClient( object ):
  def __init__( self, consumer_pair, token_pair ):
    '''takes (consumer_key, consumer_secret), (oauth_token, token_secret)'''
    self.consumer_key, self.consumer_secret = consumer_pair
    self.oauth_token, self.token_secret = token_pair

  def create_oauth_params( self, url, data=None ):
    method = 'POST' if data else 'GET'
    params = {
      'oauth_consumer_key'     : self.consumer_key
    , 'oauth_token'            : self.oauth_token
    , 'oauth_signature_method' : 'HMAC-SHA1'
    , 'oauth_timestamp'        : get_timestamp( )
    , 'oauth_nonce'            : get_nonce( )
    , 'oauth_version'          : '1.0'
    }

    if data: params.update( data )

    base_string = make_base_string( method, url, params )
    params[ 'oauth_signature' ] = hmac_sha1_sig( base_string, self.consumer_secret, self.token_secret )

    return params

  def create_oauth_header( self, url, data=None ):
    params = self.create_oauth_params( url, data )
    header = 'OAuth ' + ', '.join( [ '%s="%s"' % ( k, encode( v ) ) for k, v in params.items( ) ] )

    return header

  def open_url( self, url, data=None ):
    '''returns the resource represented by the url, or a tuple of error code and response text'''
    if data: post_data = enc_params( data )
    else: post_data = None
    r=urllib2.Request( url, post_data )
    r.headers[ 'Authorization' ] = self.create_oauth_header( url, data )

    try:
      return urllib2.urlopen( r ).read( )
    except urllib2.HTTPError, e:  # only HTTPError carries a status code and a response body
      return e.code, e.read( )

If you save this as something like oauth.py and insert your own consumer/oauth keys over the placeholders you can try it out at a prompt like so:

>>> CONSUMER_KEY = 'CONSUMER-KEY'
>>> CONSUMER_SECRET = 'CONSUMER-SECRET'
>>> OAUTH_TOKEN = 'OAUTH-TOKEN'
>>> TOKEN_SECRET = 'TOKEN-SECRET'
>>> import oauth
>>> c = oauth.OAuthClient( ( CONSUMER_KEY, CONSUMER_SECRET ), ( OAUTH_TOKEN, TOKEN_SECRET ) )
>>> mentions = c.open_url( 'http://api.twitter.com/1/statuses/mentions.json' )
>>> c.open_url( 'http://api.twitter.com/1/statuses/update.json', { 'status' : 'here is my status update, blah, blah. Read this blog: http://blog.tosh.me/.' } )
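
For readers on Python 3, the signing steps on their own (no HTTP request) can be sketched roughly as follows. This is a re-expression under assumptions, not the script above: the keys, nonce and timestamp are fixed placeholders so the output is repeatable, and the parameters are sorted as encoded (name, value) pairs, which is how section 3.4.1.3.2 of the RFC describes the ordering.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def encode(value):
    """Percent-encode per RFC 5849 section 3.6 (only A-Z a-z 0-9 - . _ ~ stay as-is)."""
    return quote(value or '', safe='-._~')


def make_base_string(method, url, params):
    """Build the signature base string per RFC 5849 section 3.4.1."""
    pairs = sorted((encode(k), encode(v)) for k, v in params.items())
    normalised = '&'.join('%s=%s' % pair for pair in pairs)
    return '&'.join([encode(method.upper()), encode(url), encode(normalised)])


def hmac_sha1_sig(base_string, consumer_secret, token_secret=''):
    """Sign the base string per RFC 5849 section 3.4.2."""
    key = '%s&%s' % (encode(consumer_secret), encode(token_secret))
    digest = hmac.new(key.encode('ascii'), base_string.encode('ascii'), hashlib.sha1).digest()
    return base64.b64encode(digest).decode('ascii')


# Placeholder values only; none of these are real credentials.
params = {
    'oauth_consumer_key': 'CONSUMER-KEY',
    'oauth_token': 'OAUTH-TOKEN',
    'oauth_signature_method': 'HMAC-SHA1',
    'oauth_timestamp': '1324425600',
    'oauth_nonce': 'fixed-nonce-for-illustration',
    'oauth_version': '1.0',
    'status': 'hello world',
}
base_string = make_base_string('POST', 'http://api.twitter.com/1/statuses/update.json', params)
signature = hmac_sha1_sig(base_string, 'CONSUMER-SECRET', 'TOKEN-SECRET')
print(base_string)
print(signature)
```

Note the double encoding of the status value: the space becomes %20 when the parameter is encoded and %2520 once the whole parameter string is encoded again as part of the base string.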


That's it, more to follow on the project it was written for.

Wednesday, 9 November 2011

Rolling back in Git

The organisation I work for uses TFS as its source control system but it's far from popular amongst the developers. I've noticed people going several weeks without committing changes to source control, using zip files for local change management and backup and, worse, going weeks at a time without even updating their source from the main repository.

The irony of this behaviour is that it is both motivated by and the root cause of horrific merge sessions tying up multiple developers for days at a time.

No doubt the problems we'd had to do with missing changes were down to something we were not doing right, but the fact remains that it was all too easy for us to make those mistakes.

The zip file approach doesn't appeal to me so I've recently been using Git as an intermediate source control system to give me lightweight branching and the ability to make very fine-grained commits without trampling all over my colleagues' work.

I'm really enjoying Git (though I do intend to try Mercurial for comparison, and because I like Python) and something I did today made me want to write about it.

I'd been making changes to a COM-heavy codebase to try to fix a bug that had been, well, bugging me for days and when I finally had the breakthrough it occurred to me that some of the things I had tried may not have contributed to the fix (I'm normally more scientific than this but that's COM for you - REGDB_E_CLASSNOTREG doesn't necessarily mean that the class is not registered).

Anyway, to cut what's becoming a long story short, I wanted to go back to the state of the code that "should have" worked, and selectively reapply the changes I'd made to ensure I was committing a minimal sufficient set back into TFS (to reduce merge headaches for my colleagues). What I was really impressed with was how easy and fast this sort of operation is with Git.

First, you rename the current branch (master, in my case) containing the whole set of changes, call it bug-fixed:

git branch -m bug-fixed

Next, you read the log to find the commit that corresponds to the point where you started making the changes (this is where you're grateful you make regular, fine-grained commits):

git log --pretty=oneline -4

Note: the oneline option makes reading the commit signatures easier and -n specifies the last n commits - I knew it was only three or four commits ago.

Having found the commit you want, you check it out using just enough of its signature to disambiguate:

git checkout 2adff2

And finally you create a new master branch, using the current state as a starting point:

git checkout -b master

This very short (and quickly executed) sequence sets your master branch back in time to the appropriate commit, preserving the later changes in a named branch - genius, do that in TFS! (There are probably people who can but I'm not one of them.)

Furthermore, selectively applying the changes from the newly renamed bug-fixed branch couldn't be easier:

git checkout BRANCH [FILES]

pulls the versions of all the files specified by the space-delimited list of files (you can also use wildcards like *.h) from that branch into your current branch. So to just bring over changes to some_class (declared in some_class.h, defined in some_class.cpp) from bug-fixed you'd do:

git checkout bug-fixed some_class.*

Just bear in mind that, as well as bringing them over, it also adds them to the index, so they won't show up in

git diff

Instead, you have to use

git diff --cached

I intend to give git-tfs a try at some point but I'd also like to investigate using hooks to manage interaction between a central Git repository and a TFS server.

Wednesday, 26 October 2011

Find UUIDs with Regex Visual Studio

Visual Studio supports regular expressions in its search tool, but probably not as you know them. Regular expressions are one of those things that seem to be done slightly differently everywhere, whether it's in the number of features supported or the syntax for using those features, but most flavours are similar enough that you can quickly work out how to get what you want. Not so for Visual Studio ( or not for me at least ) - see the documentation here: http://msdn.microsoft.com/en-us/library/2k3te2cs(v=vs.100).aspx.

This is not meant to be a full run through the syntax and features - the docs give you that - this is just meant to illustrate one of the key differences; how quantifiers are denoted ( and also as a reminder to me next time I'm shouting at Visual Studio for not finding the thing I know damn well is there ).

Normally to find all UUIDs I'd use a pattern like


[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}


but in Visual Studio quantifiers are expressed differently and you need something like


[0-9a-fA-F]^8-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^12


which is fine once you know about it.
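
For comparison, the conventional pattern can be exercised outside Visual Studio, and the exact-count quantifiers translated mechanically. This is only a sketch: to_vs_quantifiers is a hypothetical helper that handles {n} and nothing else in the Visual Studio syntax, and the sample string is made up.

```python
import re

# The conventional UUID pattern quoted above.
UUID_PATTERN = (
    r'[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-'
    r'[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
)


def to_vs_quantifiers(pattern):
    """Rewrite exact-count quantifiers {n} in the Visual Studio form ^n."""
    return re.sub(r'\{(\d+)\}', r'^\1', pattern)


sample = 'CLSID {6B29FC40-CA47-1067-B31D-00DD010662DA} registered'
print(re.findall(UUID_PATTERN, sample))
# ['6B29FC40-CA47-1067-B31D-00DD010662DA']
print(to_vs_quantifiers(UUID_PATTERN))
# [0-9a-fA-F]^8-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^12
```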

As well as using its own symbols for the non-greedy quantifiers ( @ for minimal zero-or-more, # for minimal one-or-more, alongside the usual * and + ) there doesn't seem to be support for an optional expression ( normally ? ) or range quantifiers ( {m,n} ).

Oh and don't forget :b to match whitespace!?!!!

Ho hum, there it is.