Also available at my website http://tosh.me/ and on Twitter @toshafanasiev

Wednesday, 26 October 2011

Find UUIDs with Regex in Visual Studio

Visual Studio supports regular expressions in its search tool, but probably not as you know them. Regular expressions are one of those things that seem to be done slightly differently everywhere, whether it's in the number of features supported or the syntax for using those features, but most flavours are similar enough that you can quickly work out how to get what you want. Not so for Visual Studio ( or not for me at least ) - see the documentation here: http://msdn.microsoft.com/en-us/library/2k3te2cs(v=vs.100).aspx.

This is not meant to be a full run through the syntax and features - the docs give you that - it is just meant to illustrate one of the key differences: how quantifiers are denoted ( and to serve as a reminder to me next time I'm shouting at Visual Studio for not finding the thing I know damn well is there ).

Normally to find all UUIDs I'd use a pattern like


[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}


but in Visual Studio quantifiers are expressed differently and you need something like


[0-9a-fA-F]^8-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^4-[0-9a-fA-F]^12


which is fine once you know about it.
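If you have a pile of standard patterns to convert, the {n} to ^n rewrite is mechanical enough to script. Here is a minimal sketch in C++ using std::regex; the helper name to_vs_quantifiers is made up for illustration, and it only handles the exact-count form ( it assumes the pattern contains no escaped or literal braces ).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite exact-count quantifiers {n} as Visual Studio's ^n.
// Only the {n} form is handled; anything else, such as {m,n}, is left untouched.
std::string to_vs_quantifiers(const std::string& pattern) {
  static const std::regex exact_count(R"(\{(\d+)\})");
  // $1 is the captured count, so "{8}" becomes "^8"
  return std::regex_replace(pattern, exact_count, "^$1");
}
```

Feeding it the first UUID pattern above should give back the second.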

Other quantifier constructs use different symbols too: the minimal ( non-greedy ) matches are written @ and #, where most flavours use *? and +?. There also doesn't seem to be support for an optional expression ( normally ? ) or range quantifiers ( {m,n} ).

Oh and don't forget :b to match whitespace!?!!!

Ho hum, there it is.

Wednesday, 19 October 2011

UUID/GUID parsing

I recently needed to parse a string representation of a UUID to arrive at the binary representation. I opted for sscanf (actually sscanf_s to placate the compiler) but found a detail of its format specifier that I thought was noteworthy.
The documentation defines a set of type specifiers (u for unsigned decimal, x for hex etc.) and a set of modifiers (l for a long int, h for a short int, L for a long double) but no modifier for a single byte width target.
Here's the declaration of a UUID:
typedef struct _GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
} GUID, UUID;
I didn't want to simply pass the addresses of the elements of Data4 to sscanf, as the smallest target I could specify is a short (16 bit) int. Even if I was willing to rely on the fields being parsed in order (which they almost certainly are), so that each oversized write is partly overwritten by the next one, this would still trample memory past the end of the array when the pointer to the last element of Data4 (8 bit) is written to as though it were a pointer to a short int (16 bit).
To get around this I defined a temporary array whose elements are wide enough for the default target width, passed the addresses of these elements and then used a narrowing conversion to assign to the Data4 member. Note that since I specify 2 characters of hex for each of the Data4 values I can confidently assign via a narrowing conversion as 0xff is the maximum possible value.
Here is the code (note that it expects the input to be of the form "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" (8X-4X-4X-4X-12X); the widths in the format string are maximums, so it does not reject fields that are too short):

#include <windows.h> // GUID, BYTE
#include <stdio.h>   // sscanf_s
#include <cassert>

static const int BYTE_COUNT = 8;

GUID parse_guid( const char* const input ) {
 assert( input );
 GUID result;
 // temporary storage for the %2x values; %8lx and %4hx can go straight into result
 unsigned int bytes[ BYTE_COUNT ];
 int number_converted = sscanf_s(
    input
  , "%8lx-%4hx-%4hx-%2x%2x-%2x%2x%2x%2x%2x%2x"
  , &result.Data1
  , &result.Data2
  , &result.Data3
  , &bytes[ 0 ]
  , &bytes[ 1 ]
  , &bytes[ 2 ]
  , &bytes[ 3 ]
  , &bytes[ 4 ]
  , &bytes[ 5 ]
  , &bytes[ 6 ]
  , &bytes[ 7 ]
 );

 // Data1, Data2 and Data3 plus the 8 bytes; this check vanishes if NDEBUG is defined
 assert( number_converted == BYTE_COUNT + 3 );

 // copy over the %2x values discarding high bytes
 for ( int i = 0; i != BYTE_COUNT; ++i ) {
  result.Data4[ i ] = ( BYTE )bytes[ i ];
 }

 return result;
}
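For comparison, C99 ( and C++11 ) added an hh length modifier that targets a single byte, so where the C runtime supports it the temporary array isn't needed. Not every runtime has supported hh, which is the situation described above. The following is a minimal sketch rather than a drop-in replacement: Guid is a hypothetical stand-in for the Windows GUID struct so that it builds anywhere, parse_guid_hh is a made-up name, and it uses the standard std::sscanf instead of sscanf_s.

```cpp
#include <cstdio>

// Hypothetical stand-in for the Windows GUID struct (same field layout as above).
struct Guid {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
};

// Parses "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" into out.
// Returns true only if all 11 fields were converted.
bool parse_guid_hh(const char* input, Guid& out) {
  if (!input) return false;
  // %2hhx writes exactly one unsigned char, so Data4 can be filled directly
  const int converted = std::sscanf(
      input,
      "%8lx-%4hx-%4hx-%2hhx%2hhx-%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx",
      &out.Data1, &out.Data2, &out.Data3,
      &out.Data4[0], &out.Data4[1], &out.Data4[2], &out.Data4[3],
      &out.Data4[4], &out.Data4[5], &out.Data4[6], &out.Data4[7]);
  return converted == 11;
}
```

Returning a bool instead of asserting means a malformed string is still reported in builds where assert is compiled out.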