Also available at my website http://tosh.me/ and on Twitter @toshafanasiev

Wednesday, 19 October 2011

UUID/GUID parsing

I recently needed to parse the string representation of a UUID into its binary form. I opted for sscanf (actually sscanf_s, to placate the compiler) and ran into a detail of its format specifiers that I thought was noteworthy.
The documentation defines a set of type specifiers (u for unsigned decimal, x for hex etc.) and a set of size modifiers (l for a long int, h for a short int, L for a long double), but no modifier for a single-byte target.
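For comparison, the C99 standard adds an hh length modifier that makes an integer conversion store through a pointer to char. The sketch below assumes a runtime whose sscanf implements it, which the documentation I was reading did not list; the function name parse_hex_byte is made up for illustration.

```cpp
#include <cstdio>

// Hypothetical helper: reads up to two hex digits from text into one byte.
// Assumes a C99-conformant sscanf that understands the hh length modifier.
bool parse_hex_byte( const char* text, unsigned char* out ) {
  // %2hhx: field width 2, hex, stored through an unsigned char*
  return std::sscanf( text, "%2hhx", out ) == 1;
}
```

Where hh is available, a helper like this writes exactly one byte, so no temporary storage is needed. Where it is not, the workaround described in the rest of this post still applies.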
Here's the declaration of a UUID:
typedef struct _GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
} GUID, UUID;
I didn't want to simply pass the addresses of the elements of Data4 to sscanf: the smallest target I could specify is a short (16-bit) int, so each conversion would write two bytes through a pointer to a one-byte element. Even if I were willing to rely on the fields being parsed in order (which they almost certainly are), so that each stray second byte is overwritten by the next conversion, the write for the last element of Data4 would still trample the memory just past the end of the array.
To get around this I defined a temporary array whose elements are wide enough for the default target width (unsigned int for %x), passed the addresses of these elements to sscanf and then used a narrowing conversion to assign each one to the Data4 member. Since I specify 2 characters of hex for each of the Data4 values, the maximum possible value is 0xff, so the narrowing conversion cannot lose anything.
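The claim that the narrowing conversion is safe can be checked exhaustively, since there are only 256 byte values. This is a minimal sketch rather than part of the parser; narrowing_is_lossless is a name I made up for it.

```cpp
#include <cstdio>

// Hypothetical check: every byte value, written as two hex digits and
// scanned back with %2x into an unsigned int, survives narrowing to
// unsigned char unchanged.
bool narrowing_is_lossless() {
  for ( unsigned int value = 0; value <= 0xff; ++value ) {
    char text[ 3 ];
    std::snprintf( text, sizeof text, "%02x", value );

    unsigned int wide = 0;
    if ( std::sscanf( text, "%2x", &wide ) != 1 ) return false;
    if ( wide > 0xff ) return false;  // would mean the cast drops set bits
    if ( static_cast< unsigned char >( wide ) != value ) return false;
  }
  return true;
}
```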
Here is the parsing code. Note that it requires the input to be of the form "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" (8X-4X-4X-4X-12X):

#include <stdio.h>
#include <cassert>
#include <windows.h> // GUID, BYTE

static const int BYTE_COUNT = 8;

GUID parse_guid( const char* const input ) {
 assert( input );
 GUID result;
 // temporary unsigned int targets for the %2x values;
 // %8lx and %4hx can go straight into the GUID fields
 unsigned int bytes[ BYTE_COUNT ];
 int number_converted = sscanf_s(
    input
  , "%8lx-%4hx-%4hx-%2x%2x-%2x%2x%2x%2x%2x%2x"
  , &result.Data1
  , &result.Data2
  , &result.Data3
  , &bytes[ 0 ]
  , &bytes[ 1 ]
  , &bytes[ 2 ]
  , &bytes[ 3 ]
  , &bytes[ 4 ]
  , &bytes[ 5 ]
  , &bytes[ 6 ]
  , &bytes[ 7 ]
 );

 assert( number_converted == BYTE_COUNT + 3 );

 // copy over the %2x values discarding high bytes
 for ( int i = 0; i != BYTE_COUNT; ++i ) {
  result.Data4[ i ] = ( BYTE )bytes[ i ];
 }

 return result;
}
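For anyone who wants to try this away from the Windows headers, here is a minimal, self-contained sketch of the same technique using standard sscanf. The names Guid, has_guid_shape and parse_guid_portable are all made up for illustration, and checking the input's shape up front and returning false (rather than asserting) is my assumption about what a caller might want, not something the code above does.

```cpp
#include <cctype>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in mirroring the GUID layout shown earlier.
struct Guid {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[ 8 ];
};

// True if text is exactly 36 characters: hex digits with hyphens at
// offsets 8, 13, 18 and 23 (8X-4X-4X-4X-12X).
bool has_guid_shape( const char* text ) {
  if ( !text || std::strlen( text ) != 36 ) return false;
  for ( int i = 0; i != 36; ++i ) {
    const bool hyphen_slot = ( i == 8 || i == 13 || i == 18 || i == 23 );
    if ( hyphen_slot ) {
      if ( text[ i ] != '-' ) return false;
    } else if ( !std::isxdigit( static_cast< unsigned char >( text[ i ] ) ) ) {
      return false;
    }
  }
  return true;
}

// Parses text into *out; returns false on malformed input.
bool parse_guid_portable( const char* text, Guid* out ) {
  if ( !out || !has_guid_shape( text ) ) return false;

  // unsigned int targets for the %2x conversions, narrowed afterwards
  unsigned int bytes[ 8 ];
  const int converted = std::sscanf(
      text, "%8lx-%4hx-%4hx-%2x%2x-%2x%2x%2x%2x%2x%2x",
      &out->Data1, &out->Data2, &out->Data3,
      &bytes[ 0 ], &bytes[ 1 ], &bytes[ 2 ], &bytes[ 3 ],
      &bytes[ 4 ], &bytes[ 5 ], &bytes[ 6 ], &bytes[ 7 ] );
  if ( converted != 11 ) return false;

  for ( int i = 0; i != 8; ++i ) {
    out->Data4[ i ] = static_cast< unsigned char >( bytes[ i ] );
  }
  return true;
}
```

Calling parse_guid_portable( "01234567-89ab-cdef-0123-456789abcdef", &g ) should leave g.Data1 holding 0x01234567 and g.Data4 holding the bytes 01 23 45 67 89 ab cd ef. The up-front shape check also rejects strings that are too short or have misplaced hyphens, which sscanf on its own would only report through its return value.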
