Also available at my website http://tosh.me/ and on Twitter @toshafanasiev

Friday, 23 December 2011

Ruby List Comprehensions

I'm busy learning Ruby and I'm discovering that I really like it. I've used Python for some time now so naturally it's the differences between these two paradigmatically similar languages that I notice the most.

A feature of Python that I think is really cool is the list comprehension: a way of specifying a list literal using the definition of the list, rather than its enumeration, as the code below illustrates:

#!/usr/bin/python

# literal as an enumeration
a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

# literal as a comprehension
b = [ i for i in range( 10 ) ]

This is a really neat feature that can create a sequence from an arbitrary definition, such as the characters in a string, the odd numbers in a sequence etc. You get the idea.

In Ruby, doing this ...

a = [ 1..100 ]

... doesn't give you an array of one hundred elements, but an array containing a single Range object of 1 to 100.

The fact that list comprehensions seem to be missing from Ruby was initially slightly saddening, until I discovered that Ruby's class definitions, like its strings, are mutable. This is what's really got me excited about this language - you want list comprehensions? Add them.

The duck typing offered by dynamic languages along with Ruby's class mutability makes this sort of thing really easy to do, as demonstrated below.

#!/usr/bin/ruby
#saved as 'list_comp.rb'

class Array
  def from_e!( enum )
    enum.each { |x| self << x }
    return self
  end
  def self.from_e( enum )
    return [].from_e! enum
  end
end

This is cool! It lets you create lists in a really intuitive way using definitions, not enumerations - and the best part is that I've added this to the actual, built-in Array class itself - that's the power of Ruby. I love it.

Here are some lists being created from scratch using the class method:

irb(main):001:0> require './list_comp.rb'
=> true
irb(main):002:0> Array.from_e 1..10
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
irb(main):003:0> Array.from_e 'How much wood would a woodchuck chuck?'.each_char
=> ["H", "o", "w", " ", "m", "u", "c", "h", " ", "w", "o", "o", "d", " ", "w", "
o", "u", "l", "d", " ", "a", " ", "w", "o", "o", "d", "c", "h", "u", "c", "k", "
 ", "c", "h", "u", "c", "k", "?"]
irb(main):004:0> Array.from_e 0.step 50, 5
=> [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
irb(main):005:0> Array.from_e 'one two 3 four 55'.scan /\d+/
=> ["3", "55"]

And here is the instance method in action, modifying an existing array:

irb(main):006:0> a=[0]
=> [0]
irb(main):007:0> a.from_e! 1.step 10,1
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Note the ! on the end of the instance version of from_e - I've noticed, and really appreciate, the convention of marking mutating (non-const, if you like) instance methods with an exclamation mark, and I intend to stick to it. Also, I've tried to follow convention by calling the method from_e, to match the to_s, to_i, etc. pattern.

Now it could be that Ruby already supports this sort of thing and I just haven't found it yet but either way I'm really impressed - it seems you can make this language whatever you want it to be. I'm looking forward to learning more - this is only day 3!
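As it happens, core Ruby does get you most of the way there already - Range#to_a expands a range, and Enumerable's map and select cover the transform and filter halves of a comprehension. A quick comparison sketch (not part of the from_e code above):

```ruby
# Comprehension-style construction with core Ruby
a = (1..10).to_a                   # expand a Range into an Array
b = (1..10).map { |i| i * i }      # transform each element
c = (1..10).select { |i| i.odd? }  # keep only matching elements

p a
p b
p c
```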

Wednesday, 21 December 2011

OAuth Request Signing

I've started writing a little app that needs to talk to Twitter - to do this it needs to speak OAuth. It runs on Google App Engine and is written in Python, and there are many good OAuth (and even dedicated Twitter) modules written in Python, so I could have grabbed one of these and got on with it - but that wouldn't have been much fun, and wouldn't have got me any closer to an understanding of OAuth. So I went ahead and implemented it myself.

Since at this stage I don't need my app to allow users to log in and interact as themselves (actually I'm actively avoiding this sort of requirement) I didn't need to worry about the 'dance' part of OAuth (the token request/authentication/exchange bit) - I only need the request signing part and the keys provided by my Twitter app's dashboard page.

I tweeted from an interactive Python session using my OAuth script for authentication, and have pulled down mentions etc. The implementation is deliberately unoptimised and fragmented - this lets it serve better as an executable reference to the OAuth standard (the various functions link to the specific sections of the standard that they implement) - and also it is good enough for what I want to do right now.
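Before the full listing, the heart of the scheme - build the signature base string, then sign it with HMAC-SHA1 - can be sketched standalone (Python 3 syntax here, with made-up keys and parameters):

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def encode(value):
    # Percent-encoding per RFC 5849 section 3.6
    return quote(value or '', safe='-._~')

def make_base_string(method, url, params):
    # Section 3.4.1: METHOD&encoded-url&encoded-sorted-params
    pairs = sorted('%s=%s' % (encode(k), encode(v)) for k, v in params.items())
    return '&'.join([encode(method.upper()), encode(url), encode('&'.join(pairs))])

def hmac_sha1_sig(base_string, consumer_secret, token_secret=''):
    # Section 3.4.2: the key is the two encoded secrets joined by '&'
    key = '%s&%s' % (encode(consumer_secret), encode(token_secret))
    h = hmac.new(key.encode(), base_string.encode(), hashlib.sha1)
    return base64.b64encode(h.digest()).decode()

# Hypothetical values, just to show the shapes involved
base = make_base_string('get', 'http://example.com/request',
                        {'oauth_nonce': 'abc123', 'oauth_timestamp': '1320000000'})
sig = hmac_sha1_sig(base, 'consumer-secret', 'token-secret')
print(base)
print(sig)
```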

Here it is

'''implementing OAuth ( request signing only ) as set out in http://tools.ietf.org/html/rfc5849

License: This code is free for use by any person, for any purpose but is provided as-is, with no guarantees - use at your own risk. One proviso: drop me a mail and let me know how it worked for you.'''

from urllib import quote
import urllib2
import time
import os
import base64
import hashlib
import hmac

def encode( value ):  
  '''#section-3.6'''
  return quote( value or '', '-._~' )

def enc_params( params ):
  '''#section-3.4.1.3.2'''
  ep = [ '%s=%s' % ( encode( k ), encode( v ) ) for k, v in params.items( ) ]
  return '&'.join( sorted( ep ) )

def b64e( value ):
  '''#section-6.8'''
  return base64.b64encode( value )

def make_base_string( method, url, params ):
  '''#section-3.4.1'''
  return '&'.join( [ encode( method.upper( ) ), encode( url ), encode( enc_params( params ) ) ] )

def get_timestamp( ):
  '''#section-3.3'''
  return str( int( time.time( ) ) )

def get_nonce( ):
  '''#section-3.3'''
  return b64e( os.urandom( 32 ) ).strip( '=' )

def hmac_sha1_sig( base_string, consumer_secret, token_secret='' ):
  '''#section-3.4.2'''
  key = '%s&%s' % ( encode( consumer_secret ), encode( token_secret ) )
  h=hmac.new( key, base_string, hashlib.sha1 )
  return b64e( h.digest( ) )

class OAuthClient( object ):
  def __init__( self, consumer_pair, token_pair ):
    '''takes (consumer_key, consumer_secret), (oauth_token, token_secret)'''
    self.consumer_key, self.consumer_secret = consumer_pair
    self.oauth_token, self.token_secret = token_pair

  def create_oauth_params( self, url, data=None ):
    method = 'POST' if data else 'GET'
    params = {
      'oauth_consumer_key'     : self.consumer_key
    , 'oauth_token'            : self.oauth_token
    , 'oauth_signature_method' : 'HMAC-SHA1'
    , 'oauth_timestamp'        : get_timestamp( )
    , 'oauth_nonce'            : get_nonce( )
    , 'oauth_version'          : '1.0'
    }

    if data: params.update( data )

    base_string = make_base_string( method, url, params )
    params[ 'oauth_signature' ] = hmac_sha1_sig( base_string, self.consumer_secret, self.token_secret )

    return params

  def create_oauth_header( self, url, data=None ):
    params = self.create_oauth_params( url, data )
    header = 'OAuth ' + ', '.join( [ '%s="%s"' % ( k, encode( v ) ) for k, v in params.items( ) ] )

    return header

  def open_url( self, url, data=None ):
    '''returns the resource represented by the url, or a tuple of error code and response text'''
    if data: post_data = enc_params( data )
    else: post_data = None
    r=urllib2.Request( url, post_data )
    r.headers[ 'Authorization' ] = self.create_oauth_header( url, data )

    try:
      return urllib2.urlopen( r ).read( )
    except urllib2.URLError, e:
      return e.code, e.read( )

If you save this as something like oauth.py and insert your own consumer/oauth keys over the placeholders you can try it out at a prompt like so

>>> CONSUMER_KEY = 'CONSUMER-KEY'
>>> CONSUMER_SECRET = 'CONSUMER-SECRET'
>>> OAUTH_TOKEN = 'OAUTH-TOKEN'
>>> TOKEN_SECRET = 'TOKEN-SECRET'
>>> import oauth
>>> c = oauth.OAuthClient( ( CONSUMER_KEY, CONSUMER_SECRET ), ( OAUTH_TOKEN, TOKEN_SECRET ) )
>>> mentions = c.open_url( 'http://api.twitter.com/1/statuses/mentions.json' )
>>> c.open_url( 'http://api.twitter.com/1/statuses/update.json', { 'status' : 'here is my status update, blah, blah. Read this blog: http://blog.tosh.me/.' } )


That's it, more to follow on the project it was written for.

Wednesday, 9 November 2011

Rolling back in Git

The organisation I work for uses TFS as its source control system but it's far from popular amongst the developers; I've noticed people going several weeks without committing changes to source control, using zip files for local change management and backup, and worse; going weeks at a time without even updating their source from the main repository.

The irony of this behaviour is that it is both motivated by and the root cause of horrific merge sessions tying up multiple developers for days at a time.

No doubt the problems we'd had with missing changes were down to something we were not doing right, but the fact remains that it was all too easy for us to make those mistakes.

The zip file approach doesn't appeal to me so I've recently been using Git as an intermediate source control system to give me lightweight branching and the ability to make very fine-grained commits without trampling all over my colleagues' work.

I'm really enjoying Git (though I do intend to try Mercurial for comparison, and because I like Python) and something I did today made me want to write about it.

I'd been making changes to a COM-heavy codebase to try to fix a bug that had been, well, bugging me for days and when I finally had the breakthrough it occurred to me that some of the things I had tried may not have contributed to the fix (I'm normally more scientific than this but that's COM for you - REGDB_E_CLASSNOTREG doesn't necessarily mean that the class is not registered).

Anyway, to cut what's becoming a long story short, I wanted to go back to the state of the code that "should have" worked, and selectively reapply the changes I'd made to ensure I was committing a minimal sufficient set back into TFS (to reduce merge headaches for my colleagues). What I was really impressed with was how easy and fast this sort of operation is with Git.

First, you rename the current branch (master, in my case) containing the whole set of changes, call it bug-fixed:

git branch -m bug-fixed

Next, you read the log to find the commit that corresponds to the point where you started making the changes (this is where you're grateful you make regular, fine-grained commits):

git log --pretty=oneline -4

Note: the oneline option makes reading the commit signatures easier and -n specifies the last n commits - I knew it was only three or four commits ago.
Having found the commit you want, you check it out using just enough of its signature to disambiguate:

git checkout 2adff2

And finally you create a new master branch, using the current state as a starting point:

git checkout -b master

This very short (and quickly executed) sequence sets your master branch back in time to the appropriate commit, preserving the later changes in a named branch - genius, do that in TFS! (There are probably people who can but I'm not one of them.)
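The whole sequence can be tried end-to-end in a throwaway repository - a sketch, where three dummy commits stand in for the real history and the commit reference replaces a hard-coded hash:

```shell
set -e
# Throwaway repo with three commits, standing in for the real history
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email you@example.com; git config user.name demo
for msg in base change-1 change-2; do echo "$msg" > file.txt; git add file.txt; git commit -qm "$msg"; done

git branch -m bug-fixed                    # 1. rename the current branch
git checkout -q "$(git rev-parse HEAD~2)"  # 2. check out the commit to rewind to
git checkout -qb master                    # 3. recreate master from that point

git log --oneline   # master now shows only the first commit
git branch          # bug-fixed still holds the later work
```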

Furthermore, selectively applying the changes from the newly renamed bug-fixed branch couldn't be easier:

git checkout BRANCH [FILES]

pulls the versions of the files specified by the space-delimited list (you can also use wildcards like *.h) into your current branch. So to bring over just the changes to some_class (declared in some_class.h, defined in some_class.cpp) from bug-fixed you'd do:

git checkout bug-fixed some_class.*

Just bear in mind that as well as bringing the files over, this also adds them to the index, so they won't show up in

git diff

Instead, you have to use

git diff --cached

I intend to give git-tfs a try at some point but I'd also like to investigate using hooks to manage interaction between a central Git repository and a TFS server.

Wednesday, 26 October 2011

Find UUIDs with Regex Visual Studio

Visual Studio supports regular expressions in its search tool, but probably not as you know them. Regular expressions are one of those things that seem to be done slightly differently everywhere, whether it's in the number of features supported or the syntax for using those features, but most flavours are similar enough that you can quickly work out how to get what you want. Not so for Visual Studio ( or not for me at least ) - see the documentation here: http://msdn.microsoft.com/en-us/library/2k3te2cs(v=vs.100).aspx.

This is not meant to be a full run through the syntax and features - the docs give you that - this is just meant to illustrate one of the key differences; how quantifiers are denoted ( and also as a reminder to me next time I'm shouting at Visual Studio for not finding the thing I know damn well is there ).

Normally to find all UUIDs I'd use a pattern like


[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}


but in Visual Studio quantifiers are expressed differently and you need something like


[0-9a-fA-F]^8-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^12


which is fine once you know about it.

As well as using different symbols for other quantifier constructs ( @ instead of *, # instead of + ) there doesn't seem to be support for an optional expression ( normally ? ) or range quantifiers ( {m,n} ).

Oh, and don't forget :b to match whitespace!

Ho hum, there it is.

Wednesday, 19 October 2011

UUID/GUID parsing

I recently needed to parse a string representation of a UUID to arrive at the binary representation. I opted for sscanf (actually sscanf_s, to placate the compiler) but found a detail of its format specifier that I thought was noteworthy.

The documentation defines a set of type specifiers (u for unsigned decimal, x for hex, etc.) and a set of modifiers (l for a long int, h for a short int, L for a long double) but no modifier for a single-byte target.

Here's the declaration of a UUID:

typedef struct _GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
} GUID, UUID;

I didn't want to simply pass the addresses of the elements of Data4 to sscanf, as the smallest target I could specify is a short (16 bit) int, and even if I were willing to rely on the fields being parsed in order (which they almost certainly are), this would still result in some memory trampling when the pointer to the last element of Data4 (8 bit) is written to as though it were a pointer to a short int (16 bit).

To get around this I defined a temporary array whose elements are wide enough for the default target width, passed the addresses of these elements, and then used a narrowing conversion to assign to the Data4 member. Note that since I specify 2 characters of hex for each of the Data4 values I can confidently assign via a narrowing conversion, as 0xff is the maximum possible value.

Here is the code (note that it requires the input to be of the form "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" (8X-4X-4X-4X-12X)):

#include <stdio.h>
#include <cassert>
#include <windows.h> // for GUID and BYTE

static const int BYTE_COUNT = 8;

GUID parse_guid( const char* const input ) {
 assert( input );
 GUID result;
 // for the %2x values, %8lx and %4hx can go straight in
 unsigned int bytes[ BYTE_COUNT ];
 int number_converted = sscanf_s(
    input
  , "%8lx-%4hx-%4hx-%2x%2x-%2x%2x%2x%2x%2x%2x"
  , &result.Data1
  , &result.Data2
  , &result.Data3
  , &bytes[ 0 ]
  , &bytes[ 1 ]
  , &bytes[ 2 ]
  , &bytes[ 3 ]
  , &bytes[ 4 ]
  , &bytes[ 5 ]
  , &bytes[ 6 ]
  , &bytes[ 7 ]
 );

 assert( number_converted == BYTE_COUNT + 3 );

 // copy over the %2x values discarding high bytes
 for ( int i = 0; i != BYTE_COUNT; ++i ) {
  result.Data4[ i ] = ( BYTE )bytes[ i ];
 }

 return result;
}

Tuesday, 13 September 2011

MSBuild and .sln files

Microsoft's command line build tool MSBuild can consume Visual Studio solution files (.sln) even though they're not MSBuild project files (in contrast .csproj and .vbproj files are).
It does this by first converting the solution file to a valid MSBuild project then executing this project. If you're curious about what this project looks like, or want to tweak it for use in your automated build then you can get MSBuild to leave it on disk for you after a build.
Simply set the MSBuildEmitSolution environment variable to 1 before building and you'll find the newly created project file next to the original solution.
For example:

set MSBuildEmitSolution=1
msbuild <solution-name>.sln

This will leave a file called <solution-name>.sln.proj in the same directory - this is in the standard MSBuild XML format.

Tuesday, 28 June 2011

Anatomy of a Password Strength Regex

So you want to enforce a certain level of complexity for your users' passwords? You could do this in code with lots of length and character presence tests chained together with logical ands or you could express the desired complexity with a single succinct, declarative regular expression.

Ok, let's say you want to ensure that passwords are between 8 and 20 characters long and contain a mix of lower case, upper case, numeric and non-alphanumeric characters. (Normally you'd not want to restrict the length of the password as longer == stronger, but it makes for a more comprehensive example.)

Starting with string length

.{8,20}

This matches a string of between 8 and 20 characters (any characters), but as it stands the maximum is not enforced: the pattern would happily match the first 20 characters of a longer string. For the maximum to work you need to specify that the whole string fits in this length, as follows

^.{8,20}$

Note: if there is no maximum then the following would work

^.{8,}$

The character requirements need care to prevent the expression from becoming unwieldy; they need to be reworded so as to be easily expressed in a regex. For instance 'the password must contain a digit' is expressed as 'either the first character is a digit, or some character after the first character is a digit' or closer to the regex still, 'there is a digit which is preceded by zero or more characters'. This might seem a little odd but the last version is an exact semantic match for a regex construct called a Positive Lookahead.

A Positive Lookahead is a Zero-Width assertion that qualifies the expression that it immediately follows, only allowing that expression to match if the expression contained in the assertion matches. The importance of its zero-width is that it qualifies the expression it follows without itself consuming any characters. The best way to see this is by example.

The expression

^(?=.*\d)

will match the start of the string, but only if at some point after (?=) the start of the string ^ there is a digit \d which is preceded by zero or more characters .*; what's important to note is that the match itself consumes the start of the string only, not any of the following characters, even though they are involved in the match. To extend this to match the digit-containing characters also, you'd simply add an expression for those characters after the lookahead-qualified string start pattern:

^(?=.*\d).*

This would match "gimme 5", but would not match "gimme five".
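The zero-width behaviour is easy to verify - Python's re shown here, though any lookahead-capable engine agrees:

```python
import re

# The lookahead consumes nothing: on its own the match is zero-width
m = re.match(r'^(?=.*\d)', 'gimme 5')
print(repr(m.group(0)))  # '' - the start of the string, nothing more

# Adding .* after the lookahead matches the digit-containing string itself
print(re.match(r'^(?=.*\d).*', 'gimme 5').group(0))
print(re.match(r'^(?=.*\d).*', 'gimme five'))  # no digit, so no match
```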

The lookahead expressions for character case ([a-z] for lower, [A-Z] for upper) and non-alphanumeric or 'special' characters (\W) follow a similar form, resulting in the following completed expression:

^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,20}$

Or, if no maximum is required:

^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,}$

Which may look daunting if encountered in a dark alley, but once the anatomy has been explored, the logic is clear.
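As a quick sanity check, here's the final pattern dropped into Python's re, with illustrative passwords of my choosing:

```python
import re

# 8-20 characters, with at least one digit, lower, upper and special character
pattern = re.compile(r'^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,20}$')

def strong(password):
    return pattern.match(password) is not None

print(strong('Tr0ub4dor&3'))  # all four classes, 11 characters
print(strong('troub4dor&3'))  # no upper case
print(strong('Tr0ub&3'))      # only 7 characters
```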

Monday, 20 June 2011

Rolling back to a previous version TFS

Reverting your codebase to a previous version isn't something that the Visual Studio TFS tools seem to support ( or at least, not that I've seen ).

To achieve this you can use the tf utility (http://msdn.microsoft.com/en-us/library/cc31bk2e(v=vs.80).aspx), as follows:

tf rollback /toversion:123123 ./*.cs /recursive

where rollback is the command, 123123 is the changeset to which you want to revert, ./*.cs is a file filter to control the kind of files that are reverted ( they call this an 'itemspec' in the documentation ) and as such can be set to anything - *.vb, * etc. The /recursive flag is pretty self-explanatory.

tf checkin /comment:"rolling back to version 123123" .

will then check your changes in ( although that's something you can do from Visual Studio ).

Thursday, 19 May 2011

Installing a trampoline

My wife and I wanted to get the kids a trampoline but didn't want some huge frame taking up the whole garden, also those net things that prevent them from flying off the edge are really ugly, so we copied our friends' idea of sinking the frame into the ground.
We were getting the garden landscaped and turfed so it was the perfect chance to dig a huge pit. Here's how it went:

First I put the frame together and marked the position of the legs on the ground.

Next I dug narrow trenches for the legs, testing the depth every now and then.

Then I got the frame into place and checked the level, adjusting as required.

When that was done I dropped the frame into the trenches (he makes it sound so easy) and dug out a bowl shape (actually it was a perfect parabola) to allow for bouncing child-feet, filling in the trenches with the earth.

The next day the guys came and turfed the lawn (marvelling at my engineering).

All that was left then was to lay down a weed resistant membrane and fit the springs and the, err, middle bit.

And this is the finished result.

And, of course, the best bit.

Thursday, 17 March 2011

Fix Broken Installer

Real quick one.

I'm developing a Windows Installer package using WiX and naturally I've made a boo-boo that rendered a product uninstallable - it just wouldn't budge from my installed programs list. Luckily I managed to remove all trace of this monstrosity using a tool called msizap, which is available from Microsoft as part of the Windows SDK.

I just issued the following command from the SDK prompt:


msizap t {78FD3CB6-B722-4124-A06B-A371D8938DE4}


Where the GUID is the product id of the MSI I created.

Phew.

XML Format Function for Vim in Python

Visual Studio has a nice feature that lets you format an XML (or other) document to make it more readable (Edit > Advanced > Format Document). I really like this feature and felt the need to add it (for XML at least) to my text editor of choice, Vim.
Vim lets you define your own functions as follows (note that user functions must start with a capital letter):

function! YourFunction()
" definition of function goes here
endfunction

Which is great, but it does involve learning Vim's scripting language - an activity which is not that high on my list right now. What I discovered is that the default build of Vim includes a Python interpreter which can be invoked on Vim's command line like this:

:py[thon] # python command goes here

With multi-line sessions being initiated with a termination sequence of your choosing:

:py[thon] << ENDPY

Where ENDPY is an example of a termination sequence - it's just a symbol that you enter to tell Vim that you're finished writing your Python script/command so that it can go ahead and execute it - you could use any symbol you like; many examples use EOF.
This multi-line bit is what you need to implement your Vim functions in Python - you create a Vim function wrapper around a block of Python; access to Vim's buffers etc. is provided by a built-in Python module called vim (see :help python for more info).
Continuing the dummy function definition above:

function! YourFunction()
python << ENDPY
import vim

# definition of function goes here (in Python!)

ENDPY
endfunction

You'd make this declaration in your .vimrc file; to invoke it you'd issue the following to Vim while in normal mode:

:call YourFunction()

That's a bit more typing than you'd expect in Vim; to cut this down you'd map this function call to a command as follows (note that, as for functions, user commands must start with a capital letter):

command! -nargs=0 Yfunc call YourFunction()

So now you need only type the following to call your function:

:Yfunc
This is all well and good, but very abstract; plus it doesn't show you the vim object in action. What prompted all of this in the first place was the need to format XML, so here is my solution, complete with command mapping:
function! PrettyXml()

python << ENDPY

import vim
from xml.dom import minidom

try:
  b = vim.current.buffer
  x = minidom.parseString( '\n'.join( b ) )
  b[:] = None
  for l in x.toprettyxml().split('\n'):
    b.append( str( l ) )
  del b[0] # remove empty first line
  
except Exception as ex:
  print ex

ENDPY

endfunction

command! -nargs=0 Pxml call PrettyXml()

This shows me getting access to the contents of the current buffer (the list-like object vim.current.buffer containing the lines of the buffer), using the Python Standard Library to achieve my goal, and finally updating the contents of the buffer.
This solution is a quick hack and as such has (at least) the following limitations:
  • It only deals with the whole buffer; selections are supported by the vim object, but I haven't made use of them
  • It only deals with whole XML documents; I haven't tried to support fragments
  • It is a bit picky about XML declarations - if you specify UTF-16 but encode with UTF-8 or ASCII, it will barf (just 'cos the Python minidom does) - arguably though, that's a feature.
This is not a very complicated example (and arguably quite a lazy solution) but what it does show is the potential - you can use the whole of the Python Standard Library to manipulate the contents of your buffer - there's nothing to stop you writing a function to tweet the current selection, for instance.
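The minidom round trip at the heart of PrettyXml works just the same outside Vim; here's the core as a standalone sketch (no vim module; the function name is mine):

```python
from xml.dom import minidom

def pretty_xml(text):
    # Parse, then re-serialise with indentation, dropping the blank
    # lines that toprettyxml inserts between elements
    doc = minidom.parseString(text)
    lines = [l for l in doc.toprettyxml(indent='  ').split('\n') if l.strip()]
    return '\n'.join(lines)

print(pretty_xml('<root><item>one</item><item>two</item></root>'))
```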

Wednesday, 9 March 2011

User Impersonation Class

I recently needed to impersonate a different user temporarily in an application. I've wrapped up the logic in the class below; I'll walk through the various features afterwards.

using System;
using System.Security.Principal;
using System.Runtime.InteropServices;
using System.Security.Permissions;
using System.ComponentModel;
using System.Security;

public sealed class UserImpersonator
  : IDisposable
{
  #region fields

  private readonly WindowsImpersonationContext _context;

  private const int LOGON32_LOGON_INTERACTIVE = 2;
  private const int LOGON32_PROVIDER_DEFAULT = 0;

  #endregion

  #region constructor

  [PermissionSet(SecurityAction.Demand, Name="FullTrust")]
  private UserImpersonator(string user, SecureString password, string domain = null)
  {
    if (String.IsNullOrEmpty(user))
      throw new ArgumentException("user cannot be null or empty");

    if (String.IsNullOrEmpty(domain))
      domain = Environment.MachineName;

    if (password == null)
      throw new ArgumentNullException("password");

    IntPtr token = IntPtr.Zero;
    bool loginResult;

    {
      IntPtr pwd = IntPtr.Zero;
      try
      {
        pwd = Marshal.SecureStringToGlobalAllocUnicode(password);

        loginResult = LogonUser(
          user,
          domain,
          pwd,
          LOGON32_LOGON_INTERACTIVE,
          LOGON32_PROVIDER_DEFAULT,
          ref token
        );
      }
      finally
      {
        Marshal.ZeroFreeGlobalAllocUnicode(pwd);
      }
    }

    if (!loginResult)
    {
      int err = Marshal.GetLastWin32Error();
      throw new Win32Exception(err);
    }

    try
    {
      var identity = new WindowsIdentity(token);
      _context = identity.Impersonate();
    }
    finally
    {
      CloseHandle(token);
    }
  }

  #endregion

  #region methods

  public void Dispose()
  {
    _context.Undo();
  }

  public void EndImpersonation()
  {
    Dispose();
  }

  public static UserImpersonator Impersonate(string user, SecureString password, string domain)
  {
    return new UserImpersonator(user, password, domain);
  }

  [DllImport("advapi32.dll", SetLastError=true, CharSet=CharSet.Unicode)]
  private static extern bool LogonUser(
    string user,
    string domain,
    IntPtr password,
    int logonType,
    int logonProvider,
    ref IntPtr token
  );

  [DllImport("kernel32.dll")]
  private static extern bool CloseHandle(IntPtr handle);

  #endregion
}

I have made the constructor private, forcing the use of the factory method Impersonate and hence making it explicit that code running immediately after this call runs in the context of the user being impersonated. I have also implemented IDisposable to allow calling code to employ the using construct, as follows:

string user, domain;
SecureString password;

// initialise credentials

using ( var u = UserImpersonator.Impersonate( user, password, domain ) )
{
  // run code as different user
}

// back to previous user

I have also defined a more semantic EndImpersonation method that simply calls Dispose.
The implementation is quite simple - the credentials passed in are used to obtain an authentication token from Windows (via the LogonUser function exported by advapi32.dll), which is then used to create a System.Security.Principal.WindowsIdentity object that provides the impersonation context. Impersonation continues until Undo is called on the WindowsImpersonationContext object returned by WindowsIdentity.Impersonate - this is done in the Dispose method, linking UserImpersonator disposal to impersonation lifetime. Note: there was no need to implement a finalizer here, as when the WindowsImpersonationContext is up for garbage collection its own finalizer will take care of business.
There are a couple of implementation details that are worth noting:
My declaration of LogonUser includes strings for the username and domain but a pointer (IntPtr) for the password. The reason for this is that I don't want the user's password sitting around in memory, unencrypted, with absolutely no control over when that memory is overwritten.
The SecureString class maintains an automatically encrypted buffer for storing string data, the contents of which can be decrypted and written to unmanaged memory using the Marshal class' SecureStringToGlobalAllocUnicode method for just long enough to make the logon call, before that memory is zeroed out and freed by Marshal.ZeroFreeGlobalAllocUnicode (in a finally clause, to make sure it happens). Notice that I specify CharSet.Unicode in my declaration, but Marshal does provide ANSI equivalents.
I use the SetLastError property of the DllImportAttribute to instruct the marshalling code to allow me to retrieve specific failure information using Marshal.GetLastWin32Error.
I use the PermissionSetAttribute to ensure that any code directly or indirectly calling the UserImpersonator constructor has full trust, as defined by the machine's Code Access Security Policy. The security implications of user impersonation make this a sensible safeguard.

Thursday, 3 March 2011

Coprime Test

Two integers are coprime if and only if their highest common factor (HCF) is 1.
To put this another way, for any two integers A and B there is at least one integer n such that

a·n = A and b·n = B
(where a and b are integers)

The highest such integer n for two integers is their HCF; for two coprime integers the highest (indeed only) value of n is 1.
So, a simple test for coprimeness is to see whether or not the HCF of the two candidate values is 1.
The Euclidean Algorithm is an efficient way to arrive at the HCF of two integers. It's based on the idea that the difference of two integers is divisible by their HCF. In other words:

A - B = a·n - b·n = (a - b)·n

Since a and b are integers, a - b is also an integer and hence A - B is divisible by n.
Repeatedly replacing the larger of A and B with the magnitude of their difference (we assume we're always dealing with positive integers) will eventually yield zero (when A = B) at which point the remaining non-zero number is the pair's HCF. In algebraic terms:

if A = B then a·n = b·n, so a = b

Which is the definition of the HCF.

This is a great way of finding the HCF but its efficiency can be improved: repeatedly subtracting B from A until B > A and then switching to subtracting A from B is equivalent to taking A mod B and then switching each time, and this second method is more efficient on a machine that supports modular division as a native operation (x86 assembly language's div instruction yields both quotient and remainder).
Putting this into code:

typedef unsigned int uint;

/* Euclidean algorithm: repeatedly replace the larger value with
   its remainder modulo the smaller, until one of them is zero. */
uint hcf( uint a, uint b )
{
  /* test both values separately: a * b could overflow
     (and wrap to zero) for large operands */
  while ( a && b )
  {
    if ( a > b )
      a %= b;
    else
      b %= a;
  }

  return ( a > b ) ? a : b;
}

bool coprime( uint a, uint b )
{
  return hcf( a, b ) == 1;
}


Or in Python:

def hcf( n1, n2 ):
  while n1*n2:
    if n1 > n2:
      n1 %= n2
    else:
      n2 %= n1
  return max( n1, n2 )

def coprime( n1, n2 ):
  return hcf( n1, n2 ) == 1


Determining whether two numbers are coprime is important in numerous applications, including cryptography.


Update
Here's a recursive one-liner in Python:

hcf = lambda n1, n2: n1 if n2 == 0 else hcf( n2, n1 % n2 )

This might look like it relies on n1 > n2, but since n1 mod n2 = n1 when n2 > n1, the function automatically transposes the parameters on the first recursive call if n1 < n2.
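
A quick check (my own example) confirms the transposition - calling with the arguments either way round gives the same answer:

```python
# the recursive one-liner from the post
hcf = lambda n1, n2: n1 if n2 == 0 else hcf(n2, n1 % n2)

# 8 % 12 == 8, so hcf(8, 12) becomes hcf(12, 8) on the first call
print(hcf(8, 12))   # → 4
print(hcf(12, 8))   # → 4
```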

Friday, 4 February 2011

Guarded Event Invocation in C#

When declaring events in C# code I've always used a guarded invocation pattern, as follows:


class SomeClass {

  // ....

  public event EventHandler TaDaa;

  private void FireTaDaa() {
    var t = this.TaDaa;
    if ( t != null ) {
      t.Invoke( this, EventArgs.Empty );
    }
  }

  public void SomeMethod() {

    // ....

    this.FireTaDaa();
  }
}
This is great - having zero subscribers won't cause a NullReferenceException, and caching the event backing field in a local variable gives cheap implicit thread safety: the null check and the invocation both act on the same snapshot, so another thread unsubscribing in between can't null the field out from under you.

Recently though a colleague showed me this pattern:


class SomeClass {

  // ....

  public event EventHandler TaDaa = delegate { };

  // no Fire... method

  public void SomeMethod() {
    
    // ....

    this.TaDaa.Invoke( this, EventArgs.Empty );

  }
}
This protects you in case there are no subscribers and is every bit as thread safe as the other, more hand-rolled and clunky method. Initialising the event backing field with the anonymous, empty delegate means the backing field can never be a null reference - no other code has direct access to the dummy subscriber, so nothing can remove it from the delegate's invocation list*.
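
The "never null" trick translates outside C# too. Here's a rough Python analogue (the Event class is my own illustration, not part of the post): the subscriber list starts with a no-op handler, so firing never needs a guard.

```python
class Event:
    """A tiny event whose subscriber list starts with a no-op handler,
    playing the role of C#'s `delegate { }` initialiser."""
    def __init__(self):
        # the dummy subscriber guarantees the list is never empty
        self._handlers = [lambda sender, args: None]

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, sender, args=None):
        # no "any subscribers?" check needed
        for handler in self._handlers:
            handler(sender, args)

ev = Event()
ev.fire(None)                 # safe with no real subscribers
calls = []
ev.subscribe(lambda sender, args: calls.append(args))
ev.fire(None, "ta-daa")
print(calls)                  # → ['ta-daa']
```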

The dummy subscriber does take the form of an extra object in memory at runtime for each event initialised this way, but the delegate is shared between instances of the class, so the cost is one extra object per event, per class, per AppDomain - negligible. It also saves your code the null check, and the invocation of the empty dummy subscriber is something the JIT compiler should be able to optimise out (though I haven't checked this).

All in all I like this, I'm going to adopt it.

*(provided the backing field is only modified using the Delegate.Combine and Delegate.Remove methods - but then the C# compiler enforces this, preventing you from directly accessing the backing field of an event declared as above)

Wednesday, 26 January 2011

Using vim as Visual Studio diff tool

I love vim (http://www.vim.org/). I use it for as much of my work as I can get away with, and I have recently started using it as my diff/merge tool for Visual Studio/TFS. So far the diffing has been great, but I have yet to use it to merge conflicted changes, so I can't vouch for that.

To set up vim as your diff tool in Visual Studio choose Tools > Options from the menu and select the Source Control > Visual Studio Team Foundation Server node in the tree on the left (these instructions are TFS specific), giving you this view:


Click the Configure User Tools button and then click Add, giving you this box:


Fill in the values as shown (obviously the Command path may be different on your machine), and hit OK. Use the OK buttons to close the remaining two dialogs.

The file extension determines which kinds of files are diffed this way; the command is the program used to do it.

The argument string is as follows:
  • -d puts vim into diff mode (kinda important)
  • -R puts vim into readonly mode (making comparisons safer)
  • %1 and %2 are the names of the files to be compared; the arrow to the right of
    this box gives a full list of the available substitutions

Additionally I have added this clause to my .vimrc file:
if &diff
  set columns=240
endif

My .vimrc file specifies 120 columns so doubling this for side-by-side diffing results in a consistent experience.

Setting up for merging is identical, except that you choose Merge from the operation dropdown and additionally specify %4 for the merged file name in the argument list (I put this last). Rather obviously, you do not specify -R for merge operations, as you need to edit the resulting file.