Also available at

Also available at my website http://tosh.me/ and on Twitter @toshafanasiev

Wednesday, 18 July 2012

An Ode to Python in C#

One of my favourite features of Python is the List Comprehension - a feature that allows a list (Python's fundamental sequential collection type) to be declared in a literal-like way using the definition of its contents, rather than their enumeration. What's the difference?

Well, in Python an enumerative list literal looks like this

>>> numbers = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
>>> numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

Each item in the list is included in the literal - they are enumerated. This is not a huge problem for 0 to 9 but will not scale particularly well, as you can imagine. The list comprehension allows you to declare a list literal using an expression that describes the contents of the list rather than, for want of a better word, listing them:

>>> numbers = [ i for i in range( 10 ) ]
>>> numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

Actually that's not quite right - it's not really the list comprehension that I'm out to emulate here - that's just one use for it, and the one people usually encounter first. My real goal is to reproduce the bit that makes the list comprehension work - the generator expression between the square brackets. Generators can be used outside the context of lists, for example

>>> sum( i for i in range( 10 ) )
45
>>>

Here I've passed a generator to a function whose required parameter is simply something that can be used in a for loop - that's a Python for loop, the C# equivalent would be a foreach loop so I guess the C# equivalent of a generator would be a construct that allows us to define arbitrary IEnumerable<T> instances in as literal-like a way as possible.

It's no coincidence that the IEnumerable<T> interface (in the System.Collections.Generic namespace) is the fundamental building block of LINQ - in the .NET world it represents the essence of a sequence of items. Having a terse, expressive, literal-like way of defining arbitrary IEnumerable<T> objects allows us to decouple control flow structures from the operations being controlled while also giving us the option to define operations inline, without the need for separate class definitions that could hurt readability.

If I just wanted to stick to trivial examples like the one above, all the tools I need are actually present in the Enumerable class (System.Linq namespace - the one that defines all those great IEnumerable<T> extension methods) - it has a Range function similar to Python's and of course the ToList() extension

using System;
using System.Linq;

// ...

var numbers = Enumerable.Range( 0, 10 ).ToList();

But what about a more interesting list of numbers? Python's range function allows you to additionally specify a step parameter (to use it, you also have to specify the start value)

>>> numbers = [ i for i in range( 0, 10, 2 ) ]
>>> numbers
[0, 2, 4, 6, 8]
>>>

I suppose we could achieve this effect in C# by scaling a step 1 range by projection

var numbers = Enumerable.Range( 0, 5 ).Select( i => i * 2 ).ToList();

But this is already less readable and has only pushed the problem back - this is all based on a fairly specialised sequence, and while that sequence can be projected to anything you like using the Select() extension, it does require you to know up front how long the sequence will be. And what about access to the previous value in the sequence?

What we need is to be able to take any for loop

for ( var i = 0; i < 10; ++i )
 Console.WriteLine( i );

and package it up and pass it around like any other data structure, having it play nicely with all the IEnumerable<T> based goodness in .NET - including deferred execution.

To do it we need something like this

public static class Generator
{
 public static IEnumerable<T> For<T>( Func<T> setup, Predicate<T> test, Func<T, T> adv )
 {
  for ( var i = setup(); test( i ); i = adv( i ) )
  yield return i;
 }

 public static IEnumerable<T> For<T>( T start, Predicate<T> test, Func<T, T> adv )
 {
  return For( () => start, test, adv );
 }
}
which can be used like so
var numbers = Generator.For( 0, i => i < 10, i => ++i );
or
var pow2 = Generator.For( 1, i => i < 1000, i => i * 2 );
or
var letters = Generator.For( 'a', c => c <= 'z', c => ++c );
var capitals = letters.Select( c => Char.ToUpper( c ) );
etc.

Note: I've declared the setup parameter as an operation in the actual implementation, for flexibility, while providing a value overload for convenience.

Just as you'd expect from LINQ-like expressions these generators are not evaluated until they are iterated over - deferred execution. This is quite cool - you want the alphabet in a list? On one terse line?

var alphabet = Generator.For( 'a', c => c <= 'z', c => ++c ).ToList();

As is often the case though, there's a gotcha. Let's take an aside to explore this. Many developers, particularly those who've not done much C++, will write a for loop like this

for ( int i = 0; i < 10; i++ )
 // ...

And while there's nothing wrong with the semantics of this, the performance of this loop can be less than optimal if the integer is replaced with an STL iterator for instances of which copying and destruction can be non-trivial.

// C++
std::vector<int> ints;

// populate the vector

// iterator is unnecessarily copied by i++
for ( auto i = ints.begin(), e = ints.end(); i != e; i++ )
 // ...

For this reason C++ programmers will usually write their loops like this

for ( int i = 0; i < 10; ++i )
 // ...

using the prefix increment operator even for integers, just out of habit.

With the Generator class though, the semantics become important. A for loop's header consists of three clauses representing three distinct operations - setup, test, advance. Not values, operations. The advance operation does not yield a value that replaces that of the control variable - the for loop is not smart enough for that - the advance operation is responsible for modifying the control variable directly. Hence in the case of a language level for loop (performance considerations and the STL aside) it does not matter whether you choose the pre- or postfix increment operator - the resulting value is discarded and the variable ends up in the correct state.

The Generator.For<T> method takes three functions as arguments to setup, test and advance the control variable respectively. For a value type like an int, each invocation of the advance operation results in a local copy of the control variable, the modification of which is not visible outside of the function. For this reason, it is the value yielded by the advance operation that is important - this is used to overwrite the control variable.

So these are ok

var letters = Generator.For( 'a', c => c <= 'z', c => ++c );
var numbers = Generator.For( 0, i => i < 1000, i => i + 5 );
But this
var endless = Generator.For( 0, i => i < 5, i => i++ ); // don't do this
will give you a sequence of zeros that never ends. Remember, the postfix increment operator yields the value of the operand before the increment.
i == i++ // think about it

Ok great, for loops are covered. But I think there's also value in supporting while and do-while.

For instance, say you want to accept a password from a user and store it in a SecureString object. Let's also say that you want to be able to test the code that does this without having to sit there hammering away at the keys. What you need is an API that accepts an arbitrary sequence of characters

public static class Extensions
{
  public static SecureString ToSecureString( this IEnumerable<char> characters )
  {
    return characters.Aggregate(
        new SecureString()
      , ( ss, c ) => { ss.AppendChar( c ); return ss; }
    );
  }
}

public class Safe
{
  public void Unlock( IEnumerable<char> passwordChars )
  {
    var pwd = passwordChars.ToSecureString();

    // use SecureString here ....
  }
}

and then some terse, readable way of driving it from, for instance, the console
public static class Generator
{

  // ... For<T> methods omitted

  public static IEnumerable<T> YieldWhile<T>( Func<T> source, Predicate<T> condition )
  {
    var value = source();

    while ( condition( value ) )
    {
      yield return value;

      value = source();
    }
  }
}

// ...

// ReadKey( true ) suppresses key output
var fromPrompt = Generator.YieldWhile(
    () => Console.ReadKey( true )
  , k => k.Key != ConsoleKey.Enter
  ).Select( k => k.KeyChar );


// production code

var safe = new Safe();
safe.Unlock( fromPrompt );


// test code

var safe = new Safe();
safe.Unlock( "1234" );

So YieldWhile<T> lets us enumerate a sequence of values while they continue to satisfy some condition. What about the reverse - enumerating a sequence of values only while some condition is satisfied?


public static class Generator
{
  // other methods omitted

  public static IEnumerable<T> WhileYield( Func<bool> condition, Func<T> source )
  {
    while( condition() )
      yield return source();
  }
}


which can be use like so

var input = Generator.WhileYield( () =>
    {
      Console.Write( "more values? (y/n): " );
      return Console.ReadLine() == "y";
    }
  , () => Console.ReadLine()
  ).ToList();


These simple functions allow you to define arbitrary and possibly large or unpredictable sequences of values in a terse, readable and flexible way. You can use the results with LINQ. You can have an enumerable data structure with a human being on the end of it supplying the values. You can even create a list of the first ten integers, just like a Python programmer :)

A complete listing of the Generator class follows:


public static class Generator
{
  public static IEnumerable<T> For<T>( Func<T> setup, Predicate<T> test, Func<T, T> adv )
  {
    for ( T i = setup(); test( i ); i = adv( i ) )
      yield return i;
  }

  public static IEnumerable<T> For<T>( T start, Predicate<T> test, Func<T, T> adv )
  {
    return For( () => start, test, adv);
  }

  public static IEnumerable<T> YieldWhile<T>( Func<T> source, Predicate<T> condition )
  {
    var value = source();

    while ( condition( value ) )
    {

      yield return value;
      value = source();
    }
  }

  public static IEnumerable<T> WhileYield<T>( Func<bool> condition, Func<T> source )
  {
    while ( condition() )
      yield return source();
  }
}