Also available at my website http://tosh.me/ and on Twitter @toshafanasiev

Monday, 22 November 2010

SQL DateTime Explored

Here is a quick rummage in the innards of Microsoft SQL Server's DateTime type - I have intentionally kept wordiness to a minimum - it's all there in the code.

print 'datetime is represented by eight bytes of data';
print '==============================================';

declare @now as datetime;
set @now = getdate();

declare @now_raw as binary( 8 );
set @now_raw = cast( @now as binary( 8 ) );

select
 @now [@now: getdate()]
, @now_raw [@now_raw: cast( @now as binary( 8 ) )];

/*****************************************************************************
*****************************************************************************/

print 'the base datetime is midnight ( as in beginning of ) 1st January 1900';
print '=====================================================================';

declare @base as datetime ;
set @base = cast( 0 as datetime );

select
 @base [@base: cast( 0 as datetime )]
, cast( @base as binary( 8 ) ) [cast( @base as binary( 8 ) )];

/*****************************************************************************
*****************************************************************************/

print 'these eight bytes represent two four byte integers';
print '==================================================';

declare @days_raw binary( 4 );
declare @ticks_raw binary( 4 );
/* note: substring( value, start, length ) is offset from 1 */
set @days_raw = substring( @now_raw, 1, 4 );
set @ticks_raw = substring( @now_raw, 5, 4 );

select
 @days_raw [@days_raw: substring( @now_raw, 1, 4 )]
, @ticks_raw [@ticks_raw: substring( @now_raw, 5, 4 )];

/*****************************************************************************
*****************************************************************************/

print 'the first represents the number of days since the base date';
print '===========================================================';

declare @days int;
set @days = cast( @days_raw as int );
select
 datediff( day, @base, @now ) [datediff( day, @base, @now )]
, @days [@days: cast( @days_raw as int )];

/*****************************************************************************
*****************************************************************************/

print 'if padded correctly it can just be cast to datetime';
print '===================================================';

select
 cast( @days_raw + 0x00000000 as datetime ) [cast( @days_raw + 0x00000000 as datetime )];

/*****************************************************************************
*****************************************************************************/

print 'the date value can be separated from the time without doing binary manipulation';
print 'note: 0 is implicitly converted to the base datetime';
print '===============================================================================';

declare @today datetime;
set @today = dateadd( day, datediff( day, 0, @now ), 0 );
select
 @today [@today: dateadd( day, datediff( day, 0, @now ), 0 )];

/*****************************************************************************
*****************************************************************************/

print 'the second is the number of ticks since midnight, a tick is 1/300 seconds long';
print '==============================================================================';

declare @ticks int;
set @ticks = cast( @ticks_raw as int );
select
 datediff( second, @today, @now ) [datediff( second, @today, @now )]
, @ticks [@ticks: cast( @ticks_raw as int )]
, @ticks / 300 [@ticks / 300]
, @ticks / 300.0 [@ticks / 300.0];

/*****************************************************************************
*****************************************************************************/

print 'a datetime can be cast to a float representing number of days since base';
print '========================================================================';

select
 cast( @now as float ) [cast( @now as float )];

/*****************************************************************************
*****************************************************************************/

print 'this can be checked by constructing a float from the days and ticks values';
print '==========================================================================';
/* note: float conversion prevents data loss during integer division */
select
 @days + ( @ticks / cast( ( 300 * 60 * 60 * 24 ) as float ) ) [@days + ( @ticks / cast( ( 300 * 60 * 60 * 24 ) as float ) )];

/*****************************************************************************
*****************************************************************************/

print 'which means that [ cast( cast( @date_time_value as float ) as datetime ) == @date_time_value ] should be invariant';
print '==================================================================================================================';

select
 @now [@now]
, cast( cast( @now as float ) as datetime ) [cast( cast( @now as float ) as datetime )]

print 'and hence that the mean of a set of datetime values can be calculated by casting to float';
print '=========================================================================================';

print 'create table #temp_date ( value datetime );'
print '';
create table #temp_date ( value datetime );

print 'insert into #temp_date ( value ) values ( getdate() );';
insert into #temp_date ( value ) values ( getdate() );
print 'insert into #temp_date ( value ) values ( ''1980-01-19'' );';
insert into #temp_date ( value ) values ( '1980-01-19' );
print 'insert into #temp_date ( value ) values ( ''2000-01-01'' );';
insert into #temp_date ( value ) values ( '2000-01-01' );

print '#temp_date:';
select t.value from #temp_date t;

select cast( avg( cast( t.value as float ) ) as datetime ) [cast( avg( cast( t.value as float ) ) as datetime )]
from #temp_date t;

drop table #temp_date;


Running this lot gave me the following:

datetime is represented by eight bytes of data
==============================================
@now: getdate()         @now_raw: cast( @now as binary( 8 ) )
----------------------- -------------------------------------
2010-11-22 13:51:35.577 0x00009E3600E46761

(1 row(s) affected)

the base datetime is midnight ( as in beginning of ) 1st January 1900
=====================================================================
@base: cast( 0 as datetime ) cast( @base as binary( 8 ) )
---------------------------- ----------------------------
1900-01-01 00:00:00.000      0x0000000000000000

(1 row(s) affected)

these eight bytes represent two four byte integers
==================================================
@days_raw: substring( @now_raw, 1, 4 ) @ticks_raw: substring( @now_raw, 5, 4 )
-------------------------------------- ---------------------------------------
0x00009E36                             0x00E46761

(1 row(s) affected)

the first represents the number of days since the base date
===========================================================
datediff( day, @base, @now ) @days: cast( @days_raw as int )
---------------------------- -------------------------------
40502                        40502

(1 row(s) affected)

if padded correctly it can just be cast to datetime
===================================================
cast( @days_raw + 0x00000000 as datetime )
------------------------------------------
2010-11-22 00:00:00.000

(1 row(s) affected)

the date value can be separated from the time without doing binary manipulation
note: 0 is implicitly converted to the base datetime
===============================================================================
@today: dateadd( day, datediff( day, 0, @now ), 0 )
---------------------------------------------------
2010-11-22 00:00:00.000

(1 row(s) affected)

the second is the number of ticks since midnight, a tick is 1/300 seconds long
==============================================================================
datediff( second, @today, @now ) @ticks: cast( @ticks_raw as int ) @ticks / 300 @ticks / 300.0
-------------------------------- --------------------------------- ------------ ---------------------------------------
49895                            14968673                          49895        49895.576666

(1 row(s) affected)

a datetime can be cast to a float representing number of days since base
========================================================================
cast( @now as float )
----------------------
40502.5774951003

(1 row(s) affected)

this can be checked by constructing a float from the days and ticks values
==========================================================================
@days + ( @ticks / cast( ( 300 * 60 * 60 * 24 ) as float ) )
------------------------------------------------------------
40502.5774951003

(1 row(s) affected)

which means that [ cast( cast( @date_time_value as float ) as datetime ) == @date_time_value ] should be invariant
==================================================================================================================
@now                    cast( cast( @now as float ) as datetime )
----------------------- -----------------------------------------
2010-11-22 13:51:35.577 2010-11-22 13:51:35.577

(1 row(s) affected)

and hence that the mean of a set of datetime values can be calculated by casting to float
=========================================================================================
create table #temp_date ( value datetime );
 
insert into #temp_date ( value ) values ( getdate() );

(1 row(s) affected)
insert into #temp_date ( value ) values ( '1980-01-19' );

(1 row(s) affected)
insert into #temp_date ( value ) values ( '2000-01-01' );

(1 row(s) affected)
#temp_date:
value
-----------------------
2010-11-22 13:51:35.577
1980-01-19 00:00:00.000
2000-01-01 00:00:00.000

(3 row(s) affected)

cast( avg( cast( t.value as float ) ) as datetime )
---------------------------------------------------
1996-12-24 04:37:11.857

(1 row(s) affected)




Note: rounding errors sometimes creep into the purportedly invariant condition stated above so watch for the stray 5 milliseconds or so!
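The same decoding can be sketched outside SQL Server. The Python 3 below is only an illustration, assuming the layout shown in the output above ( a big-endian signed day count followed by a tick count, 300 ticks to the second ); the function names are made up for this example and are not part of any SQL Server API.

```python
from datetime import datetime, timedelta

BASE = datetime(1900, 1, 1)            # what cast( 0 as datetime ) gives
TICKS_PER_SECOND = 300
TICKS_PER_DAY = TICKS_PER_SECOND * 60 * 60 * 24


def split_raw(raw):
    """Split the eight bytes into ( days, ticks )."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    days = int.from_bytes(raw[:4], "big", signed=True)
    ticks = int.from_bytes(raw[4:], "big", signed=False)
    return days, ticks


def decode(raw):
    """Rebuild a datetime from the raw bytes."""
    days, ticks = split_raw(raw)
    # integer arithmetic, truncated to whole microseconds
    micros = ticks * 1000000 // TICKS_PER_SECOND
    return BASE + timedelta(days=days, microseconds=micros)


def as_float_days(raw):
    """Days since base as a float, like cast( @now as float )."""
    days, ticks = split_raw(raw)
    return days + ticks / TICKS_PER_DAY


def mean_datetime(values):
    """Average datetimes by way of float days since base."""
    offsets = [(v - BASE) / timedelta(days=1) for v in values]
    return BASE + timedelta(days=sum(offsets) / len(offsets))


raw = bytes.fromhex("00009E3600E46761")   # @now_raw from the run above
print(split_raw(raw))                     # (40502, 14968673)
print(decode(raw))                        # 2010-11-22 13:51:35.576666
print(as_float_days(raw))
```

Python keeps microseconds where SQL Server rounds to the nearest tick, so the last digits can differ slightly from the SQL output.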

Wednesday, 3 November 2010

Controlling Text Input Form Submission

The content of this post is not really ground-breaking stuff but it's the kind of thing I can see myself referring back to and it may be of use to someone else.
I recently implemented an incremental search box on a web application and wanted to control what effect the <ENTER> key had on the form's behaviour. The application is built so that it will work if JavaScript is disabled - the user simply enters a search term and clicks search ( or presses <ENTER> ) to get a list of matches; the client side script supplements this by serving up results as the user types ( fairly standard behaviour ).
As well as handling individual keystrokes I wanted to launch the top search term on <ENTER>. In order to achieve this I had to disable the browser's default behaviour of submitting the form when <ENTER> is pressed in a single line text input.
Below is the neatest solution I could find that worked in all ( major ) browsers ( detail omitted ):
var searchBox
  = document.getElementById( 'searchBox' );

// ....

// use content of search box to refine results
searchBox.onkeyup = function( e ) {
  e = e || window.event; // ensure event obj
  
  if ( e.keyCode == 13/*RETURN*/ ) {
    // use topmost search result
  }
  else {
    // update results using searchBox.value
  }
}

// prevent form submission when <ENTER> is pressed
searchBox.onkeypress = function( e ) {
  e = e || window.event; // ensure event obj

  return e.keyCode != 13/*ENTER*/;
}

I found that the onkeyup event was the best place to use the input's value and the onkeypress event was the best place to control the form's behaviour - I tried other combinations as well as using the event's cancelBubble property and stopPropagation() method as described here http://www.quirksmode.org/js/introevents.html but in the end only the above actually did what I wanted.

Monday, 25 October 2010

Singleton Pattern in C#

The Singleton Pattern is a strategy that provides a single instance of a certain class for an entire application ( strictly speaking, in .NET it's per Application Domain ) and ensures that any reference to an instance of that class in that application is to that single instance. This approach can help maintain consistency in multithreaded scenarios and simplify resource sharing while requiring a minimum of not-very-object-oriented static-heavy code.
This is one of the Design Patterns attributed to the Gang of Four.
This pattern is typically implemented using lazy instantiation - only creating the instance the first time it is referenced - which can be achieved using double-checked locking to minimise the performance overhead of thread synchronisation, as follows:
sealed class SingletonOne {
  private SingletonOne() {
    // constructor logic
  }
  private static SingletonOne s_instance;
  private static object s_padlock = new Object();
  public static SingletonOne Instance {
    get {
      if ( s_instance == null ) {
        lock( s_padlock ) {
          if ( s_instance == null ) {
            s_instance = new SingletonOne(); 
          }
        }
      }
      return s_instance;
    }
  }
}

There is nothing actually wrong with this ( as opposed to the examples you see that implement no thread-synchronisation at all ) but it is more handmade and marginally less efficient than the following implementation which takes advantage of some built-in CLR features:
sealed class SingletonTwo {
  private SingletonTwo() {
    // constructor logic
  }
  public static readonly SingletonTwo Instance = new SingletonTwo();
}

What this second implementation gives you is a single instance that is thread-safely single ( ensured by the CLR's initialisation of static fields and enforcement of the readonly attribute ) but without the overhead of the null check or even the accessor call - this field access is about 20 times faster than the above property access ( see complete program at the end of the post ); as well as far less code to write, maintain, curse etc.
One thing to be aware of, however, is the timing of this new initialisation scheme. Disassembling the program with ILDasm.exe shows the class declaration as follows:
.class private auto ansi sealed beforefieldinit SingletonTwo
  extends [mscorlib]System.Object

The point of interest here being the beforefieldinit attribute - this tells the CLR to ensure that the field is initialised at some point before the field is accessed; in other words, it could occur quite a while before - at any point when the CLR deems appropriate based on factors such as CPU load ( there's a very good article on this here: http://csharpindepth.com/Articles/General/Beforefieldinit.aspx ).
If the timing of a singleton's initialisation code is important and it should run just before first access, not at some arbitrary point earlier on, this can be achieved by instructing the compiler to omit the beforefieldinit attribute, which in C# is as simple as defining a class constructor ( sometimes called a static constructor ). The following class definition shows this in action:
sealed class SingletonThree {
  private SingletonThree() {
    // constructor logic
  }
  public static readonly SingletonThree Instance;
  static SingletonThree() {
    Instance = new SingletonThree();
  }
}

This class definition disassembles to the following:
.class private auto ansi sealed SingletonThree
  extends [mscorlib]System.Object

( note the lack of beforefieldinit; note also that it's irrelevant what the class constructor actually does - its presence signals the compiler to omit that attribute )
So all in all: cleaner, quicker code and control over timing - what's not to like?
By the way, for my money the best reference on MSIL internals such as beforefieldinit is Inside Microsoft .NET IL Assembler by Serge Lidin, the guy behind IL itself.
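For readers more at home in Python, the double-checked locking idea behind SingletonOne can be sketched as follows. This is an illustration only, not a translation of the C# program: the class name LazySingleton is invented here, and Python has nothing corresponding to beforefieldinit, so only the lazy, lock-protected creation is shown.

```python
import threading


class LazySingleton:
    """Hypothetical example class; created on first use only."""

    _instance = None
    _padlock = threading.Lock()

    @classmethod
    def instance(cls):
        # first check without the lock: the common, cheap path
        if cls._instance is None:
            with cls._padlock:
                # second check: another thread may have won the race
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance


if __name__ == "__main__":
    seen = []
    threads = [
        threading.Thread(target=lambda: seen.append(LazySingleton.instance()))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(all(s is seen[0] for s in seen))
```

The nearest Python counterpart to the static readonly field is simply a module-level instance, which is created once when the module is first imported.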

Below is a complete program that includes the performance testing from which I arrived at the property access vs. field access figures:
using System;
using System.Diagnostics;

sealed class SingletonOne {
  private SingletonOne() {
    // constructor logic
  }
  private static SingletonOne s_instance;
  private static object s_padlock = new Object();
  public static SingletonOne Instance {
    get {
      if ( s_instance == null ) {
        lock( s_padlock ) {
          if ( s_instance == null ) {
            s_instance = new SingletonOne(); 
          }
        }
      }
      return s_instance;
    }
  }
}

sealed class SingletonTwo {
  private SingletonTwo() {
    // constructor logic
  }
  public static readonly SingletonTwo Instance = new SingletonTwo();
}

sealed class SingletonThree {
  private SingletonThree() {
    // constructor logic
  }
  public static readonly SingletonThree Instance;
  static SingletonThree() {
    Instance = new SingletonThree();
  }
}

class test {
  static void Main() {
    Console.WriteLine(
      "Testing SingletonOne: {0}", SingletonOne.Instance == SingletonOne.Instance
    );
    Console.WriteLine(
      "Testing SingletonTwo: {0}", SingletonTwo.Instance == SingletonTwo.Instance
    );
    Console.WriteLine(
      "Testing SingletonThree: {0}", SingletonThree.Instance == SingletonThree.Instance
    );

    // performance testing
    const int ITERATIONS = 50000000;

    // test property access
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Stopwatch timerOne = Stopwatch.StartNew();
    for ( int i=0; i<ITERATIONS; i++ ) {
      SingletonOne s = SingletonOne.Instance;
    }
    timerOne.Stop();

    // test field access
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Stopwatch timerTwo = Stopwatch.StartNew();
    for ( int i=0; i<ITERATIONS; i++ ) {
      SingletonTwo s = SingletonTwo.Instance;
    }
    timerTwo.Stop();

    Console.WriteLine(
      "Property access: {0} ms", timerOne.ElapsedMilliseconds / ( double ) ITERATIONS
    );
    Console.WriteLine(
      "Field access: {0} ms", timerTwo.ElapsedMilliseconds / ( double ) ITERATIONS
    );
  }
}

Tuesday, 12 October 2010

Overflow checking in VB.NET

I am currently porting a VB6 application to VB.NET and having run the project through the VS2008 conversion wizard, I am now trying to fix the as yet unknowable number of errors that it could not resolve. One of these relates to truncation of integers to yield hi- or lo- words.
The hi-word was ok - VB.NET gives you shift operators so the following is possible.

Dim i As Int32
'...
Dim s As Short
s = i >> 16

Ordinarily ( i.e. in C# ) the lo-word would not pose a problem either as an unchecked ( the default ) conversion from int to short would yield the correct result, simply discarding the upper two bytes

static short LoWord( int val ) {
  return ( short ) val;
}

This can be confirmed by disassembling the compiled IL code ( using ILDasm.exe )

.method private hidebysig static int16  LoWord(int32 val) cil managed
{
  // Code size       8 (0x8)
  .maxstack  1
  .locals init (int16 V_0)
  IL_0000:  nop
  IL_0001:  ldarg.0
  IL_0002:  conv.i2
  IL_0003:  stloc.0
  IL_0004:  br.s       IL_0006
  IL_0006:  ldloc.0
  IL_0007:  ret
} // end of method prog::LoWord

The conv.i2 instruction labelled IL_0002 does not perform overflow checking. However, a similarly written function in VB.NET does not behave the same way:

Function LoWordVb( ByVal i As Integer ) As Short
  Return CShort( i )
End Function

This disassembles to:

.method public static int16  LoWordVb(int32 i) cil managed
{
  // Code size       7 (0x7)
  .maxstack  1
  .locals init (int16 V_0)
  IL_0000:  ldarg.0
  IL_0001:  conv.ovf.i2
  IL_0002:  stloc.0
  IL_0003:  br.s       IL_0005
  IL_0005:  ldloc.0
  IL_0006:  ret
} // end of method prog::LoWordVb

Which uses the same overflow checking conv.ovf.i2 instruction as the following C# function:

  static short LoWordChecked( int val ) {
    checked {
      return ( short ) val;
    }
  }

Which disassembles to:

.method private hidebysig static int16  LoWordChecked(int32 val) cil managed
{
  // Code size       9 (0x9)
  .maxstack  1
  .locals init (int16 V_0)
  IL_0000:  nop
  IL_0001:  nop
  IL_0002:  ldarg.0
  IL_0003:  conv.ovf.i2
  IL_0004:  stloc.0
  IL_0005:  br.s       IL_0007
  IL_0007:  ldloc.0
  IL_0008:  ret
} // end of method prog::LoWordChecked

( There are a few details that you need to be aware of when using overflow checking in C# which I won't go into - there's a very comprehensive and well written post on the subject here: http://www.codeproject.com/KB/cs/overflow_checking.aspx )
So, while C# offers extremely granular control of overflow checking by means of the checked construct, VB.NET does not ( that I could find out about ), so short of controlling this behaviour globally for the entire application ( something which I am a little wary of doing ), there's no immediately obvious way to perform this apparently simple task.
My solution was to copy, memory wise, the lower bytes of the Int32 to the Int16; this may not be the most elegant or efficient solution, but it does work.
VB.NET does not offer any of the syntax-level pointer manipulation that C# does ( once you use the terrifyingly named 'unsafe' construct and compiler switch ), however the .NET BCL makes everything you need available via the System.Runtime.InteropServices namespace in Mscorlib.dll - once you get to know these tools you can use them from any CLR targeting language.
First you need an instance of the GCHandle structure for the managed object you are copying from ( in this case it will have to be a boxed copy of the Int32 ), you obtain this as follows:

Dim i As Int32
' ....
Dim h As GCHandle = GCHandle.Alloc( i, GCHandleType.Pinned )

And the GCHandle must be cleaned up responsibly to prevent resource leaks ( yes, in .NET ) - I suggest a try/finally block immediately after allocation that calls h.Free() in the finally clause - ensuring that it's freed. 
One thing that may occur to you is that passing a value type instance such as Int32 as a parameter of type Object will result in a box operation, generating a reference that is never explicitly held onto by the calling function and that this could cause problems - this is not the case as both the Pinned and Normal members of the GCHandleType enumeration result in the GCHandle instance preventing the reference passed in from being collected until the handle is itself freed. Using Pinned rather than Normal gives our code the opportunity to use the address of the instance without worrying about it being moved by the garbage collector.
The next step is using the Marshal class to copy from memory to a managed instance

Dim s As Short
Dim o As Object
o = Marshal.PtrToStructure( h.AddrOfPinnedObject(), GetType( Short ) )
' simple unbox
s = CShort( o )

That's it! These elements can be put together in a function as follows ( a complete program is available at the end of the post ):

Imports System.Runtime.InteropServices
'....
Function LoWordVbUnchecked( ByVal i As Integer ) As Short
  Dim s As Short
  Dim h As GCHandle = GCHandle.Alloc( i, GCHandleType.Pinned )
  Try
    Dim o As Object
    o = Marshal.PtrToStructure( h.AddrOfPinnedObject(), GetType( Short ) )
    s = CShort( o )
  Finally
    h.Free()
  End Try
  Return s
End Function

A lot of words, you might think, but I think that this example serves to cover many interesting aspects of developing code for the .NET CLR.
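The difference between the checked conversion and the memory copy can also be seen in a few lines of Python 3 using the standard struct module. This is a sketch for illustration only; the function names are mine, and little-endian byte order is assumed, which is what makes "the first two bytes" the lo-word.

```python
import struct


def lo_word_checked(value):
    """Like CShort( i ): refuses values that do not fit in 16 bits."""
    packed = struct.pack("<h", value)      # raises struct.error if out of range
    return struct.unpack("<h", packed)[0]


def lo_word_unchecked(value):
    """Copy the low two bytes of a 32 bit integer into a 16 bit one."""
    raw = struct.pack("<i", value)             # four bytes, little-endian
    return struct.unpack_from("<h", raw)[0]    # reinterpret the first two


def hi_word(value):
    """Arithmetic shift, like i >> 16."""
    return value >> 16


# &Haaaabbbb as a signed 32 bit integer
i = struct.unpack("<i", bytes.fromhex("bbbbaaaa"))[0]

print(i)                      # -1431651397
print(hi_word(i))             # -21846, i.e. 0xaaaa as a signed short
print(lo_word_unchecked(i))   # -17477, i.e. 0xbbbb as a signed short
try:
    lo_word_checked(i)
except struct.error as ex:
    print("told you:", ex)
```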

Complete program:

'author: Tosh Afanasiev ( http://tosh.me/ )

Imports System
Imports System.Runtime.InteropServices

Module prog

  Public Sub Main()
    Dim i As Int32 = &Haaaabbbb
    Dim s As Int16 = i >> 16
    Console.WriteLine( "i: {0:x}", i )
    Console.WriteLine( "hi: {0:x}", s )
    Console.WriteLine( "using LoWordVb causes an OverflowException for this value" )
    Try
      s = LoWordVb( i )
    Catch ex As OverflowException
      Console.WriteLine( "told you: {0}", ex )
    End Try
    Console.WriteLine( "LoWordVbUnchecked is ok though" )
    s = LoWordVbUnchecked( i )
    Console.WriteLine( "lo: {0:x}", s )
  End Sub

  Function LoWordVb( ByVal i As Integer ) As Short
    Return CShort( i )
  End Function

  Function LoWordVbUnchecked( ByVal i As Integer ) As Short
    Dim s As Short
    Dim h As GCHandle = GCHandle.Alloc( i, GCHandleType.Pinned )
    Try
      Dim o As Object
      o = Marshal.PtrToStructure( h.AddrOfPinnedObject(), GetType( Short ) )
      s = CShort( o )
    Finally
      h.Free()
    End Try
    Return s
  End Function

End Module

Friday, 10 September 2010

Python slice operator

I've no doubt that this is in the Python docs ( http://docs.python.org/ ) somewhere ...

Python's slice operator for sequence-like objects obeys the following invariant:


s[:n] + s[n:] == s


where s is a sequence and n is in the range [ 0, len(s) )
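A quick way to convince yourself is to check it over a handful of built-in sequences. The sketch below ( the helper name is made up ) also tries values of n outside the range given above, since slicing clamps out-of-range indices rather than raising:

```python
def slice_invariant_holds(s, n):
    """True when splitting s at n and rejoining gives s back."""
    return s[:n] + s[n:] == s


samples = ["", "hello", [1, 2, 3], (4, 5)]

for s in samples:
    # deliberately run past both ends of the sequence
    for n in range(-len(s) - 3, len(s) + 4):
        assert slice_invariant_holds(s, n), (s, n)

print("invariant held for every sample")
```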

Monday, 6 September 2010

Python floating point range function

I recently noticed that Python's built in range( [start,] stop [,step] ) function does not support floating point values for the optional step parameter ( not 2.6.2 at least ), so until this is fixed, or in cases where you're stuck with it, here's a workaround:


def rangef( min, max, step=1.0 ):
 '''rangef( min, max, step=1.0 )\nlike range() but supports floats'''
 sf = 1.0 / step # scalefactor
 for i in range( int( min * sf ), int( max * sf ) ):
  yield i / sf
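One thing to watch with the scaled version above is that min and max are truncated to whole multiples of step, so rangef( 0.25, 1, 0.5 ) starts at 0.0 rather than 0.25. An alternative, sketched here in Python 3 under the assumption that step is positive, is to work out how many values are needed and offset each one from the start value. The name frange is my own, and the usual floating point caveats still apply when step is not exactly representable.

```python
import math


def frange(start, stop, step=1.0):
    """Like range() but for floats; yields start, start + step, ... below stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    # number of steps that fit before reaching stop
    count = max(0, math.ceil((stop - start) / step))
    for i in range(count):
        # multiply rather than accumulate, to avoid compounding rounding error
        yield start + i * step


print(list(frange(0.25, 1.0, 0.25)))   # [0.25, 0.5, 0.75]
print(list(frange(0, 1, 0.5)))         # [0.0, 0.5]
```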

Tuesday, 4 May 2010

Amazon S3 url gotcha

I have recently been tasked with producing a 'one click' operation to provision a Windows server on EC2 to host an SQL Server database, a web application and an FTP server ( I won't go into the details of my solution as they warrant a post of their own ).

In order to get a production ready server online and secure from scratch I've had to
  • provision storage with S3
  • configure Elastic Block Storage ( EBS ) disks
  • upload and register a previously bundled Amazon Machine Image ( AMI )
  • configure Security Groups ( AWS firewalls )
  • manage key pairs
  • allocate Elastic IP addresses
all in all making it a really engaging introduction to the Amazon Web Services API.

One interesting featurette of the S3 service that particularly stood out was the impact of AWS Region on access to objects stored under S3.

Under the default US - Standard ( East Coast ) Region, objects can be accessed by one of two methods - the subdomain way and the path way; the urls http://<bucket-name>.s3.amazonaws.com/<key-name> ( subdomain ) and http://s3.amazonaws.com/<bucket-name>/<key-name> ( path ) are both valid.

In order to reduce latency you could choose to deploy in one of the other regions ( US - Northern California, EU - Ireland or the newly added Asia Pacific - Singapore ) as the geographical distribution of your user base dictates. If you do, note that the path form of S3 object urls is not valid ( at time of writing 2006-03-01 was the latest version, see http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html for details ) and the subdomain version must be used instead.
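To make the two forms concrete, here is a small Python 3 sketch that builds each url and rewrites a path-form url into its subdomain equivalent. The url shapes are taken from the description above as it stood at the time of writing; the helper names and the example bucket and key are hypothetical and are not part of either AWS sdk.

```python
from urllib.parse import quote, urlsplit

S3_HOST = "s3.amazonaws.com"


def subdomain_url(bucket, key):
    """http://<bucket-name>.s3.amazonaws.com/<key-name>"""
    return "http://{0}.{1}/{2}".format(bucket, S3_HOST, quote(key))


def path_url(bucket, key):
    """http://s3.amazonaws.com/<bucket-name>/<key-name>"""
    return "http://{0}/{1}/{2}".format(S3_HOST, bucket, quote(key))


def path_to_subdomain(url):
    """Rewrite a path-form url; anything else is returned untouched."""
    parts = urlsplit(url)
    if parts.netloc != S3_HOST:
        return url
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError("no bucket name in url")
    return "http://{0}.{1}/{2}".format(bucket, S3_HOST, key)


# made-up bucket and key, purely for illustration
print(path_url("my-bucket", "images/ami.manifest.xml"))
print(path_to_subdomain(path_url("my-bucket", "images/ami.manifest.xml")))
```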

This poses no problem so long as you have control over the source of urls or your code can be made to follow 301 redirects but a problem I ran into was that without modification neither the official Java sdk ( http://aws.amazon.com/sdkforjava/ ) nor .NET sdk ( http://aws.amazon.com/sdkfornet/ ) can register AMIs from bundled data stored in S3 buckets in any region other than US - Standard. This is due to the fact that 'http://s3.amazonaws.com/' is prepended to any string you give them that does not already start that way, and for all regions but US - Standard the path form that is insisted upon results in a 301.

As it happens the user base of the product I am working on is never expected to grow beyond quite a small number and so I can afford the slight hit in latency terms and simply deploy to the default region.

I've written this post in case anyone else encounters the same problem and can either save themselves some trawling or let me know a simple way around this for future reference.

Friday, 16 April 2010

Visual Basic ( I know, I know ) variable type shortcuts

Over the past few days I've had the undiluted pleasure of working on a legacy VB6 application that nobody else in the company will so much as read through.
During this process I have come across the cryptic hieroglyphs that are variable declaration type shortcuts. For my own reference and for anyone else who may be enduring a similar experience I am posting them here:
!   Single
#   Double
%   Integer
&   Long
$   String
@   Currency

Which lets you write Dim i% instead of Dim i As Integer. Lucky you.

Incidentally, date and time literals should be delimited by a pair of # characters.

That is all.

Thursday, 28 January 2010

Making instanceof work for you

When writing generic JavaScript code, you need to be able to handle a wide range of input values and often take different actions based on their type. To detect a value's type, the instanceof operator is often employed, but to varying degrees of success due to a subtlety in its behaviour.

The instanceof operator returns true if the operand to its left is an instance of the object type to its right, but the results can sometimes be surprising:

// "hi there" is a String, right?
( "hi there" instanceof String ) == false
( new String( "hi there" ) instanceof String ) == true
// sometimes?

The key here is the phrase instance of object type; literal values such as "hi there" and 3.141 are not objects, they are primitive values, or primitives for short, and as such are not instances of any object type, and so do not meet the criteria for instanceof .

The confusion arises from the fact that literals apparently support object instance methods:

// these both work
window.alert( 3.141 .toString() ); // note the gap
window.alert( "hi there".split( /\s+/ )[0] );

JavaScript, like the Java Runtime and the .NET CLR ( and many other VMs ), does something behind the scenes to make calls like the two above legal: it wraps a primitive value in a true Object and applies the instance method call to that object, a process known as boxing.

The fact that JavaScript boxes primitives for method calls but not for instanceof evaluation is a design decision, rather than an error; and armed with this knowledge, we can make instanceof work for us. Here is a little utility function I wrote ( I borrowed the name from a CLR IL opcode ) that I have found very useful:
function isinst( value, type ) {
  // box before testing
  return new Object( value ) instanceof type;
}

And now:

isinst( 7, Number ) == true
isinst( "hi there", String ) == true
isinst( new String( "hi there" ), String ) == true

Hooray!!

I'm not sure how correct this approach is but it works and it's easy to remember.

Monday, 25 January 2010

Removing Subversion bindings

There's no rocket science here, but if you find yourself de-svn-ifying directory trees on an even vaguely regular basis, you'll want some automated way of doing it.

If you use Tortoise SVN ( http://tortoisesvn.tigris.org/ ), and you don't mind copying the contents of the directory you are trying to unbind, a very simple solution is to right click in the directory and choose 'export' from the Tortoise menu ( or use svn export on the command line ) - this will export ( i.e. copy directory structure minus bindings ) the tree to the location you specify. It should be noted, however, that it exports the tree in the state in which it finds it, not a copy of the repository you checked out of - any changes you have made locally are exported ( though unversioned files are not ).

If you want to avoid the copy of an export, or otherwise want to remove the source control bindings in place, a script may be the answer.

I'm not much of a shell scripter and I'm a huge fan of Python ( http://python.org/ ) so I wrote a Python script for removing Subversion ( http://subversion.tigris.org/ ) bindings which I have found so useful that I'm sharing it.

Feedback is extremely welcome, but please use with caution - it does remove entire directories.


'''
this utility script was written by tosh afanasiev.
it comes with no warranty of any sort.

http://tosh.me/
'''
import os, shutil, stat

SVN_DIR = '.svn'

def remove_bindings( dirname, binding_dir=SVN_DIR ):
    '''
    deletes all svn binding directories in a directory tree.
    a different name can be specified for @binding_dir,
    with the result that 
    directories with that name will be removed.
    '''
    
    # walk the directory
    for root, dirs, files in os.walk( dirname ):

        # test for a binding directory
        if binding_dir in dirs:

            # if found, walk the binding directory
            path = os.path.join( root, binding_dir )
            for broot, bdirs, bfiles in os.walk( path ):
                for f in bfiles:
                    # ensure that all files are writeable
                    os.chmod(
                      os.path.join( broot, f )
                    , stat.S_IWRITE
                    )

            # and finally remove the binding directory
            shutil.rmtree( path )
            # stop os.walk from trying to descend into the removed directory
            dirs.remove( binding_dir )

def main():
    '''
    this function is called if you execute the
    script rather than import it
    '''
    dirname = raw_input( 'directory name:\n' )
    remove_bindings( dirname )

if __name__ == '__main__':
    main()

Thursday, 14 January 2010

Installing IronPython Studio

Right, this is my first blog post - I'm going to jump right in with a brief note on installing IronPython Studio.

I found an excellent resource: http://blog.benhall.me.uk/2008/03/ironpython-studio.html
( thanks Ben ) which really has to take credit for the content of this post - I just wanted my own record of it.

So, here are the steps:
  1. Use the link below to get the Visual Studio 2008 Shell ( isolated mode ) Redistributable
    ( http://www.microsoft.com/downloads/details.aspx?FamilyId=ACA38719-F449-4937-9BAC-45A9F8A73822&displaylang=en ) [tidy, no?]
  2. Run this to unpack the actual installer ( plus license, notes etc. )
  3. Run the extracted installer ( make sure you've closed any VS related apps )
  4. Now go to http://www.codeplex.com/IronPythonStudio to get the VS Shell add-in
  5. Extract and run the installer you found there
More info on extending Visual Studio can be found at http://msdn.com/vsx

That's it, I'd like to find a way of installing the version that integrates with an existing VS 2008 install - but that's for another day.