10 Commandments for CIOs in 2009


 
As we go into 2009 and start to wonder if there is any future for technology at all, it might not be a bad idea to first remember that technology is one of the greatest generators of productivity known to man.  In an era where leverage is limited, and you therefore can’t simply boost earnings by borrowing, increasing productivity starts to look more attractive.

In other words, in the coming years, sad to say, the COO is going to matter a lot more than the CFO.

So what is the role of the CIO in all this?  Well, the CIO needs to be helping the COO build a leaner, meaner company.  This presentation is the first in a series that maps out a way for CIOs to help their companies succeed in this post-financial-apocalypse world.


Consulting Firm Archetypes Continued: FEAR Consulting

Last weekend, while at the Twin Cities Code Camp, over a few drinks some fellow consultants and I were able to share some “war stories” from our consulting pasts.

Boy, and I thought I had it bad.  Some fellow Magenicons (my former employer) described to me some experiences they had at employers prior to Magenic that I can only describe as working for “FEAR Consulting”.  FEAR Consulting is the kind of place that would truly make Machiavelli proud.  At FEAR, you report time in six-minute increments.  Yes, six-minute increments.  In other words, day-to-day biology has a role in your time report (for the uninitiated, they probably have a line item for “bathroom” when you submit time).  Everyone hates being micro-managed, but at FEAR, that would be an improvement over the nano-management you are subject to there.

True story – at FEAR, when you go to a colleague to ask a question, you have to barter away some of your billable time to the consultant who answers it.  Which is a serious problem, because at FEAR, you do not get your full salary unless you bill 45 hours per week.  Otherwise, you have to make up the deficit out of the two weeks of PTO you are allocated for the whole year.  Which adds up fast.  And god forbid you go on the bench… muhaaaaa… no vacation for you!

Now, god forbid you want to participate in, say, a code camp or a user group.  You, proud developer, even land a speaking engagement on your own time on a Saturday (if you have not been called in to work).  FEAR will, literally, tell you that you can’t go, since you are an agent of the company, and they want to keep their “trade secrets” in house.

Yes, life sucks at FEAR.  Why does anyone stay there?

Well, FEAR understands that you can go a long way by making people exhausted.  They make you feel bad about yourself, so that you don’t ever go anywhere else.  In fact, at FEAR, they are masters of telling you that you are worthless, that you could never do any better.  They are the corporate version of the guy who controls you by killing your self-esteem, and who, while occasionally giving you a carrot, otherwise continually beats you down with a stick.  If you worked for them in 1999, you had no idea the market was good, because the message at FEAR is that you are always replaceable, and if you ever make a mistake, you are done, and nobody in their right mind will hire you.

I previously wrote about BOZO Consulting.  Someone working for FEAR would be infinitely better off getting a job at BOZO: while BOZO is merely indifferent towards your progress, FEAR actively hinders it so you never step out of line.  BOZO says don’t make mistakes or take risks, but might tolerate it if you do anyway.  FEAR will dock your pay if QA reports a bug.

There are companies that fit the archetype, but I would never name them, because they tend to have very active legal departments who love to intimidate by threatening to sue… something that, for a humble blogger at a good company, is not in my best interest.

I wish I could use this post to reach out to employees of FEAR and tell them there is a better way (and of course, in the process, recruit them), but unfortunately, at FEAR, not only can’t you read blogs at work, but you hate your job so much you probably spend your time as far from a computer as possible.


C# 3.0 is a Dynamic Language

There, I said it.

I hope, finally, we can start to put the old dynamic vs. static language schism to bed.  C# is now a dynamic language.  Rubyists, eat your heart out (and I mean that in good fun).

Let’s go down the line – things that make a language dynamic:

‘Eval’, or more generally, ability to construct a data structure and compile it all at runtime.

All this is possible within the Expression namespace.  While I can plug my own MetaLinq, with its ExpressionBuilder, as a way to build an expression more intuitively than by using the factory methods, either way, it is clearly possible to have "Eval" style functionality within C#.  In C#, as in Lisp, you can write programs that write programs.  And with 3.0, you don’t even have to deal with Reflection.Emit to do so.
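As a rough illustration of the "Eval" point, here is a minimal sketch that uses the standard factory methods in System.Linq.Expressions to assemble a small function from values known only at runtime and then compile it.  The class and method names (RuntimeEval, BuildAddThenMultiply) are made up for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class RuntimeEval
{
    // Builds the tree for x => (x + addend) * factor out of runtime data,
    // then compiles that tree into an ordinary delegate.
    public static Func<int, int> BuildAddThenMultiply(int addend, int factor)
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.Multiply(
            Expression.Add(x, Expression.Constant(addend)),
            Expression.Constant(factor));
        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }
}
```

Calling RuntimeEval.BuildAddThenMultiply(2, 3) hands back a delegate that can be invoked like any other function, with no Reflection.Emit in sight.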

Higher Order Functions

Lambda Expressions are part of C# 3.0 – part of what makes a where clause in LINQ able to work.  The syntax is actually pretty reasonable as well.
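To make that concrete, here is a small sketch showing both directions of "higher order": a method that returns a function, and a method that accepts one and passes it to LINQ’s Where.  The names HigherOrder, GreaterThan and Filter are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HigherOrder
{
    // Returns a function: the lambda captures the threshold it was built with.
    public static Func<int, bool> GreaterThan(int threshold)
    {
        return n => n > threshold;
    }

    // Accepts a function as an argument and hands it to LINQ's Where.
    public static List<int> Filter(IEnumerable<int> source, Func<int, bool> predicate)
    {
        return source.Where(predicate).ToList();
    }
}
```

For example, HigherOrder.Filter(numbers, HigherOrder.GreaterThan(5)) keeps only the values above 5.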

Implicit Typing

Gone is the requirement that you put a formal type on everything you use.  Now, it is a good idea that they keep that requirement for public references – things that will escape the comfortable confines of your assembly – but within that, being able to use the var keyword, especially in programs where the shape of the objects is expected to change a lot, is a good thing, contributing to the dynamicness of the language.
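A short sketch of what this looks like in practice, with invented names (ImplicitTyping, Describe).  Note that the compiler still infers a concrete static type for every var; nothing is resolved at runtime.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ImplicitTyping
{
    public static List<string> Describe(IEnumerable<string> names)
    {
        // The projected type is anonymous, so var is the only way to
        // declare a local that holds it.
        var shaped = names.Select(n => new { Name = n, Length = n.Length });

        var result = new List<string>();
        foreach (var item in shaped)
            result.Add(item.Name + ":" + item.Length);
        return result;
    }
}
```

If the anonymous type later gains or loses a property, only the code that touches that property needs to change; none of the declarations do.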

Continuations

Anyone who has implemented an enumerator in C# 2.x knows that we have had a limited form of continuations (via the yield return statement) for some time, though I do not see them used a lot in practice.  Don "COM Is Love" Box has a great post on this from 2005 talking about the concept.
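Here is a minimal sketch of the suspend-and-resume behavior that yield return provides.  The log parameter exists only so the interleaving between producer and consumer can be observed; the names Iterators and CountTo are invented for this example.

```csharp
using System.Collections.Generic;

public static class Iterators
{
    // Each yield return suspends the method; the next MoveNext call from the
    // consumer resumes it right after the yield, with locals intact.
    public static IEnumerable<int> CountTo(int max, List<string> log)
    {
        for (int i = 1; i <= max; i++)
        {
            log.Add("producing " + i);
            yield return i;
        }
        log.Add("done");
    }
}
```

Iterating over CountTo with a foreach that also writes to the same log shows the producer and consumer taking turns, rather than the producer running to completion first.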

Introspection

Leaving aside the reflection namespace, the fact that you can cast Lambdas to expression trees in C# 3.0 – the whole basis for what makes something like i4o possible – is demonstrating that introspection is a huge part of the new innovative stuff coming out in C#-land.
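As a small sketch of that kind of introspection: assigning a lambda to Expression&lt;Func&lt;T, bool&gt;&gt; rather than Func&lt;T, bool&gt; gives you a tree you can walk instead of a compiled delegate.  The helper below (Introspection.Describe is an invented name) only handles simple "member, operator, value" predicates.

```csharp
using System;
using System.Linq.Expressions;

public static class Introspection
{
    // Describes a simple "member <op> value" predicate by inspecting its tree.
    public static string Describe<T>(Expression<Func<T, bool>> predicate)
    {
        BinaryExpression body = predicate.Body as BinaryExpression;
        if (body == null) return "not a binary expression";

        MemberExpression left = body.Left as MemberExpression;
        string leftName = left != null ? left.Member.Name : body.Left.ToString();
        return leftName + " " + body.NodeType + " " + body.Right;
    }
}
```

Passing s => s.Length > 3 lets the method see that the left side is the Length member, the operator is GreaterThan, and the right side is the constant 3, all without executing the lambda.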

Now, it does not mean C# is dynamically typed – which is different than being a dynamic language.  There is a great whitepaper from Erik Meijer and Peter Drayton about the subtle differences there.  However, what I am saying is that the lines are almost certainly blurring when talking about the differences between static and dynamic languages.

Of course, if we really wanted dynamic typing, we could have forgotten all this semicolon nonsense and just switched to VB 5 :)


Announcing MetaLinq – Linq to Expressions

It is with great pleasure that I announce yet another flavor of LINQ – MetaLinq – the ability to query over and edit expression trees using LINQ.

Why MetaLinq?  Well, Oren Novotny, who is working on yet another project, SLINQ, aka Linq to Streams, and who also has contributed to i4o, was asking me if I knew a mechanism where one could walk an expression tree and replace certain nodes.  Being the sort of person who always likes a challenge, I decided to help, creating an enumerator extension method for Expressions that allows you to easily walk the expression tree, with the plan of editing the nodes in the tree as I go along (i.e. using a LINQ query over the resulting "walk the tree" enumeration).  Then it hit me.  Expression trees are immutable.  I found this out not by reading stuff online, but by hacking away and finding that all those darn properties of expression nodes are read only.

Damn.  How to get around this one?

Well, after reading Jomo Fisher’s blog, it became evident that if you want to get a variation of a tree, you have to copy part, or all, of the tree, and carefully replace the parts you want changed as you are doing so.  Thankfully, Jomo put a reasonable approach up on his blog, which uses a visitor pattern and does a selective replacement.  And his method works.
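For readers who have not seen the copy-and-replace style, here is a rough sketch of the general idea.  It is not Jomo’s code: it assumes the public System.Linq.Expressions.ExpressionVisitor base class that ships with later versions of .NET, and the ConstantReplacer name is made up.

```csharp
using System;
using System.Linq.Expressions;

// Swaps every int constant equal to oldValue for newValue.  The original tree
// is never modified; the visitor rebuilds only the nodes on the path to a change.
public class ConstantReplacer : ExpressionVisitor
{
    private readonly int _oldValue;
    private readonly int _newValue;

    public ConstantReplacer(int oldValue, int newValue)
    {
        _oldValue = oldValue;
        _newValue = newValue;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (node.Value is int && (int)node.Value == _oldValue)
            return Expression.Constant(_newValue);
        return base.VisitConstant(node);
    }

    public static Expression<Func<int, int>> Rewrite(
        Expression<Func<int, int>> source, int oldValue, int newValue)
    {
        return (Expression<Func<int, int>>)
            new ConstantReplacer(oldValue, newValue).Visit(source);
    }
}
```

Rewriting x => x + 1 with oldValue 1 and newValue 10 yields a new tree equivalent to x => x + 10, while the source expression stays exactly as it was.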

That said, I decided to take a different approach.  ExpressionBuilder, which is part of MetaLinq for the moment, is the result.  The ExpressionBuilder namespace allows you to create an Editable Shadow of an expression tree, modify it in place, and then by calling ToExpression on the shadow tree, generate a new, normal, immutable tree.  It has a class, EditableExpression, that has a factory method (CreateEditableExpression) that takes any expression, and returns an EditableExpression that mirrors the immutable Expression.

For example, you would do this to get an editable copy of a tree:

Expression immutable = someExpression; //you can’t change immutable directly

EditableExpression mutable = EditableExpression.CreateEditableExpression(immutable);

//...then do this to convert it back

Expression newCopy = mutable.ToExpression();

In other words, you can now edit expression trees.  ExpressionBuilder is to Expressions what StringBuilder is to Strings.

I will warn you that you can easily shoot yourself in the foot with this.  As of the current version, you can easily create a cyclic graph, which, of course, will create an infinite loop when you try to convert it back into an immutable expression.  While I will be adding code to check for cycles in the future, there is no getting around that having full edit capability on the expression tree can cause subtle bugs.

The project is hosted on codeplex, and as with i4o, it is open source, and other contributors are welcome to help add to it or fix bugs.  For more information, email me at aaron.c.erickson@gmail.com .


LINQ to Objects and Bi-Directional Binding

There is a lot of great technology coming from Microsoft this year – there is almost not enough time to take it all in.  That said, there are some areas where we can try to anticipate where some issues are going to occur, be they fast access to objects (i4o), or in this latest installment, enabling bi-directional binding to work correctly on results from LINQ to Objects operations.

You might ask, why is this an issue?  Well, as Rocky Lhotka has pointed out before, a LINQ to Objects query does not return a filtered view of the original collection, but a whole new object (something called a Sequence) that implements IEnumerable<T>, so you can iterate over it and, at least in a read-only sense, bind to it.  Now, presumably you could, in theory, add to or remove from the result (though I don’t know for sure – have not tried yet)… but the problem is that even if you could, you would be adding to or removing from a different collection, as well as losing anything that was implemented for you in the IEnumerable<T> you started with in the first place.  Which means that, if you are using CSLA.net, you not only will have weird things happen on add/remove, but you will also break features that framework provides, like N-level undo.
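A minimal sketch of the problem, using a plain List&lt;int&gt; rather than a CSLA collection (the DetachedResults name is invented).  Once a query result has been materialized so that it can be edited, it is a separate collection, and edits to it never reach the source.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DetachedResults
{
    // Removes an item from a materialized query result and reports the size
    // of the source list afterwards.
    public static int CountAfterRemovingFromResult(List<int> source)
    {
        List<int> evens = source.Where(n => n % 2 == 0).ToList();
        evens.RemoveAt(0);      // edits the copy only
        return source.Count;    // the source still holds every item
    }
}
```

A UI bound to the result would therefore show an item disappearing while the underlying collection, and anything else bound to it, stays the same.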

Needless to say, this is the kind of thing that might rain on the LINQ to Objects parade for certain kinds of use cases… if LINQ were not extensible, that is :).  Writers of frameworks – especially the kind of frameworks that have custom collections – will want to implement IQueryable<T> on their custom collections in order to allow bi-directional binding to LINQ-generated subsets.  IQueryable<T> allows you, in its CreateQuery<TElement> method, to specify exactly what comes back from the different LINQ methods (i.e. Where, Select, GroupBy, etc.).  Of particular interest for filtering is handling your Where call such that it returns an IQueryable<T> whose concrete implementation you can control, rather than using the default that LINQ gives you.

The technique I am using to do this relies on two classes.  The first class is a simple collection class that implements ICollection<T> and IQueryable<T>, which I am calling CollectionExtendingIQueryable<T>.  The second class, called ViewOnCollectionExtendingIQueryable<T>, derives from the first and is designed to be a read/write view of it.  This second class (the view) also implements IQueryable<T>.  The result of a LINQ query that performs an identity projection – that is, a projection of the whole objects of the enumerable we are enumerating – will now be typed to ViewOnCollectionExtendingIQueryable<T>, which has all the behavior the parent has, and the same data, but is assigned a different expression that IEnumerable<T>.GetEnumerator() will use when generating its result from the where clause.

More simply, we generate a read/write view collection that differs from the original collection only in its GetEnumerator implementation.  In most other respects (most importantly the underlying concrete collection) – it is the same object.  If you add to or remove from the filtered version, you add to or remove from the concrete collection, which potentially changes what other filtered views show as well.

Of course, there is a lot of other work you have to do to fully implement IQueryable<T>.  I have done some of it, such as making sure non-identity projections work like normal LINQ projections – but I am sure there is other work needed to do a full implementation.  That said, the important part of this is that it proves the concept that you can have LINQ to Objects and support filter style projections, if you are willing to dive into supporting IQueryable<T>.

Source code is below, with a demo console implementation:

FilteredCollection.cs:

using System;
using System.Linq;
using System.Collections.Generic;
using System.Linq.Expressions;

namespace TestExtendingIQueryable
{
    //NOTE: This is a proof of concept – not designed to be production code

    public class RandomThing
    {
        public int SomeVal;
        public RandomThing(int x) { SomeVal = x; }
    }

   
    public class CollectionExtendingIQueryable<T> : ICollection<T>, IQueryable<T>
    {
        public CollectionExtendingIQueryable()
        { _internalList = new List<T>(); _ex = System.Linq.Expressions.Expression.Constant(this); }
       
        protected Expression _ex;
        protected List<T> _internalList;

        internal List<T> UnderlyingList { get { return _internalList; } }

        public void RemoveBottomItem()
        {
            _internalList.RemoveAt(0);
        }

        public void Add(T item)
        {
            _internalList.Add(item);
        }

        public int FilteredCount
        {
            get
            {
                int cnt = 0;
                foreach (T item in this)
                    cnt++;
                return cnt;
            }
        }

        public int UnfilteredCount
        {
            get
            {
                int cnt = 0;
                foreach (T item in _internalList)
                    cnt++;
                return cnt;
            }
        }

        #region IQueryable<T> Members

        IQueryable<TElement> IQueryable<T>.CreateQuery<TElement>(Expression expression)
        {
           
            MethodCallExpression mex = expression as MethodCallExpression;
            switch(mex.Method.Name)
            {
                case "Where":
                    return (IQueryable<TElement>) new ViewOnCollectionExtendingIQueryable<T>(expression, this);
                case "Select":
                   
                    UnaryExpression selectHolder = mex.Arguments[1] as UnaryExpression;
                    LambdaExpression theSelect = selectHolder.Operand as LambdaExpression;
                   
                    Expression<Func<T, TElement>> selectorLambda
                        = Expression.Lambda<Func<T, TElement>>(theSelect.Body,theSelect.Parameters);
                    Func<T, TElement> selector = selectorLambda.Compile();
                    return this.Select<T, TElement>(selector).AsQueryable<TElement>();
                default:
                    return null;
            }
        }

        TResult IQueryable<T>.Execute<TResult>(Expression expression)
        {
            throw new Exception("The method or operation is not implemented.");
        }

        #endregion

        #region IEnumerable<T> Members

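        IEnumerator<T> IEnumerable<T>.GetEnumerator()
        {
            MethodCallExpression mex = _ex as MethodCallExpression;
            if (mex == null)
            {
                //no where clause has been applied yet (_ex is still the constant
                //set in the constructor), so return every item unfiltered
                foreach (T item in _internalList)
                    yield return item;
                yield break;
            }
            //the second argument to Where is the quoted filter lambda
            UnaryExpression whereHolder = mex.Arguments[1] as UnaryExpression;
            LambdaExpression theWhere = whereHolder.Operand as LambdaExpression;
            Expression<Func<T, bool>> theParmedWhere
                = Expression.Lambda<Func<T, bool>>(theWhere.Body, theWhere.Parameters);
            Func<T, bool> filter = theParmedWhere.Compile();
            //if we had indexes in this collection, they would be used here
            foreach (T item in _internalList)
                if (filter(item))
                    yield return item;
        }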

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            //delegate to the generic enumerator, which already applies the filter,
            //rather than filtering the same items a second time
            return ((IEnumerable<T>)this).GetEnumerator();
        }

        #endregion

        #region IQueryable Members

        IQueryable IQueryable.CreateQuery(Expression expression)
        {
            _ex = expression;
            return this;
        }

        Type IQueryable.ElementType
        {
            get { return typeof(T); }
        }

        object IQueryable.Execute(Expression expression)
        {
            throw new Exception("The method or operation is not implemented.");
        }

        Expression IQueryable.Expression
        {
            get { return _ex; }
        }

        #endregion

        #region ICollection<T> Members

        void ICollection<T>.Add(T item)
        {
            _internalList.Add(item);
        }

        void ICollection<T>.Clear()
        {
            _internalList.Clear();
        }

        bool ICollection<T>.Contains(T item)
        {
            return _internalList.Contains(item);
        }

        void ICollection<T>.CopyTo(T[] array, int arrayIndex)
        {
            _internalList.CopyTo(array,arrayIndex);
        }

        int ICollection<T>.Count
        {
            get { return _internalList.Count; }
        }

        bool ICollection<T>.IsReadOnly
        {
            get { return false; }
        }

        bool ICollection<T>.Remove(T item)
        {
            return(_internalList.Remove(item));
        }

        #endregion
    }

    public class ViewOnCollectionExtendingIQueryable<T> : CollectionExtendingIQueryable<T>, IQueryable<T>
    {
        protected Expression _specificEx;
       
        public ViewOnCollectionExtendingIQueryable(Expression ex, CollectionExtendingIQueryable<T> baseCollection)
        {
            _internalList = baseCollection.UnderlyingList;
            _specificEx = ex;
        }

        IEnumerator<T> IEnumerable<T>.GetEnumerator()
        {
            MethodCallExpression mex = _specificEx as MethodCallExpression;
            UnaryExpression whereHolder = mex.Arguments[1] as UnaryExpression;
            LambdaExpression theWhere = whereHolder.Operand as LambdaExpression;
            Expression<Func<T, bool>> theParmedWhere
                = Expression.Lambda<Func<T, bool>>(theWhere.Body, theWhere.Parameters);
            Func<T, bool> filter = theParmedWhere.Compile();
            //if we had indexes in this collection, they would be used here
            foreach (T item in _internalList)
                if (filter(item))
                    yield return item;
        }

        #region IQueryable<T> Members

        IQueryable<TElement> IQueryable<T>.CreateQuery<TElement>(Expression expression)
        {

            MethodCallExpression mex = expression as MethodCallExpression;
            switch (mex.Method.Name)
            {
                case "Where":
                    _specificEx = expression;
                    return (IQueryable<TElement>) this;
                case "Select":

                    UnaryExpression selectHolder = mex.Arguments[1] as UnaryExpression;
                    LambdaExpression theSelect = selectHolder.Operand as LambdaExpression;

                    Expression<Func<T, TElement>> selectorLambda
                        = Expression.Lambda<Func<T, TElement>>(theSelect.Body, theSelect.Parameters);
                    Func<T, TElement> selector = selectorLambda.Compile();
                    return this.Select<T, TElement>(selector).AsQueryable<TElement>();
                default:
                    return null;
            }
        }

        TResult IQueryable<T>.Execute<TResult>(Expression expression)
        {
            throw new Exception("The method or operation is not implemented.");
        }

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            throw new Exception("The method or operation is not implemented.");
        }

        #endregion

        #region IQueryable Members

        IQueryable IQueryable.CreateQuery(Expression expression)
        {
            throw new Exception("The method or operation is not implemented.");
        }

        Type IQueryable.ElementType
        {
            get { return typeof(T); }
        }

        object IQueryable.Execute(Expression expression)
        {
            throw new Exception("The method or operation is not implemented.");
        }

        Expression IQueryable.Expression
        {
            get { return _specificEx; }
        }

        #endregion
    }
}

Program.cs:

using System;
using System.Linq;
using System.Collections.Generic;

namespace TestExtendingIQueryable
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Demonstration of using LINQ to generate filtered results, using a");
            Console.WriteLine("collection class specifically designed to filter rather than project");
            Console.WriteLine("by default.");
            Console.WriteLine("");
            Console.WriteLine("We will generate a collection with 100 random numbers, between 1 and 300");
            Console.WriteLine("We will then generate two views on the numbers, one for those below 100,");
            Console.WriteLine("the other for those below 200.  Removing the bottom most item, which is");
            Console.WriteLine("the only item that won't be random (fixed at 42, to fit in both ranges),");
            Console.WriteLine("will affect the count of all three collections (original, filtered view,");
            Console.WriteLine("and different filtered view).");
            Console.WriteLine("");
            Console.WriteLine("Lastly, we will do a typical projection, which will demonstrate that");
            Console.WriteLine("the filtering logic gets out of the way when you select something that");
            Console.WriteLine("is not amenable to filtering.");
           
            CollectionExtendingIQueryable<RandomThing> random = new CollectionExtendingIQueryable<RandomThing>();
            Random rnd = new Random();
            random.Add(new RandomThing(42)); //first one has to be under 100 to run our removal tests correctly
            for (int i = 0; i < 99; i++)
                random.Add(new RandomThing(rnd.Next(300)));
                       
            var filteredResult = from r in random
                                where r.SomeVal < 100
                                select r;

            var differentFilteredResult = from r in random
                                          where r.SomeVal < 200
                                          select r;

            Console.WriteLine("Filtered results (random numbers under 100)");
            foreach (var x in filteredResult)
                Console.Write(x.SomeVal + ",");
            Console.WriteLine("");
            Console.WriteLine("-----------------------------------");
            Console.WriteLine("Filtered result Count = " + ((CollectionExtendingIQueryable<RandomThing>)filteredResult).FilteredCount);
            Console.WriteLine("Different filtered result Count = " + ((CollectionExtendingIQueryable<RandomThing>)differentFilteredResult).FilteredCount);
            Console.WriteLine("Now we are going to remove the bottom item from the first filtered result, which should reduce the count of all filtered results by one");
            Console.WriteLine("Press any key to continue…");

            Console.ReadKey();

            Console.WriteLine("Count of original collection = " + random.UnfilteredCount);
            ((CollectionExtendingIQueryable<RandomThing>)filteredResult).RemoveBottomItem();
            Console.WriteLine("Count of original collection after removal from filtered list = " + random.UnfilteredCount);
            Console.WriteLine("Count in the filtered list = " + ((CollectionExtendingIQueryable<RandomThing>)filteredResult).FilteredCount);
           
            Console.WriteLine("Count in the different filtered result = " + ((CollectionExtendingIQueryable<RandomThing>)differentFilteredResult).FilteredCount);
            Console.WriteLine("Press any key to test projection…");
            Console.ReadKey();

            var projectedResult = from r in random
                                  where r.SomeVal < 100
                                  select r.SomeVal;
            foreach (var x in projectedResult)
                Console.Write(x + ",");
            Console.WriteLine("");
            Console.WriteLine("---------------------");
            Console.WriteLine("Press any key to exit the demo…");
            Console.ReadKey();

        }
    }
}

 


Introducing i4o – indexes for objects.

Introducing i4o (indexes for objects) – the first easy-to-use mechanism in the .NET framework that allows you to declaratively create indexes on your objects – indexes that LINQ can use to make your LINQ query operations up to 1000 times faster than without indexing.  i4o makes the idea of "indexed LINQ" an easy-to-implement reality, not just theory.  I first wrote about Indexed LINQ back in January.  I have taken the idea much, much further since.

First of all, let’s go back to understanding what problem this particular technology solves.  Put simply, you need indexes for objects for the same reason you need indexes on databases – they make accessing any particular piece of data faster.  The idea has been around ever since someone got the bright idea to put an index in the back of a book – that is, sequential search is always slower than associative search.  In other words, if we know a little something about what we are looking for, we don’t have to look through the whole damn pile to find it.

Relational databases, since they tend to be stored on disk, have been using indexing for almost their entire history in order to have any sort of reasonable performance.  If you have ever done any kind of performance optimization on a program, one of the first things you always look at is whether the database it uses is properly indexed – especially if it has queries or stored procedures that take a long time to execute.  A well designed index added to a database can, and very frequently does, create an order of magnitude performance improvement.

Enter LINQ.  We now have a tool that allows us to do relational style queries over objects in memory.  And this is great – since we can express an idea in LINQ much more concisely than we could without LINQ (i.e. select * from collection where someproperty=somevalue is simpler than foreach something in somecollection, if something.someproperty=somevalue, yield return something).  Simply, LINQ allows us to use set syntax, right in the language.  This is a good thing – which should greatly reduce the amount of code we need to write for a lot of different kinds of problems.
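Side by side, the two styles from the parenthetical above look roughly like this.  Person, LastName and the QueryStyles helper are placeholder names for the sake of the sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string LastName;
    public Person(string lastName) { LastName = lastName; }
}

public static class QueryStyles
{
    // Imperative version: spell out the loop and the test by hand.
    public static IEnumerable<Person> FindWithLoop(IEnumerable<Person> people, string name)
    {
        foreach (Person p in people)
            if (p.LastName == name)
                yield return p;
    }

    // Set-style version: say what you want and let LINQ do the iteration.
    public static IEnumerable<Person> FindWithLinq(IEnumerable<Person> people, string name)
    {
        return from p in people
               where p.LastName == name
               select p;
    }
}
```

Both methods return the same items; with plain LINQ to Objects, both also scan the whole sequence, which is exactly where indexing comes in.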

The problem is that all this LINQ code we have now will increase the number of set operations programmers do.  Set operations – such as joins – are great, because they allow for more efficient expression.  Unfortunately, they will also create a huge performance nightmare if there is no indexing mechanism.  To get a vivid demonstration of this problem, and its solution, create two tables with 1M random records each, then try to do an inner join based on an ID or string field in both tables.  Then do the same thing with an index.

There are others at Microsoft who have started to think about this problem (http://blogs.msdn.com/kfarmer/archive/2006/03/15/552615.aspx, http://weblogs.asp.net/brianbec/archive/2006/03/15/440293.aspx).  I think those guys are moving in the right direction… but ultimately, people will need to be able to put indexes on fields for their objects in the same way that they do with a database – declare which field is going to be indexed, and have the where or join clauses automatically figure out how to use the index.  That is where i4o comes in.

After you reference i4o, there are only 2 things you need to do in order to be using indexed LINQ.  Step 1 – put the [Indexable()] attribute on each property (of a class) you want to index.  Step 2 – when you want to use a collection of that class, use IndexableCollection<T> where T is the class with Indexable() on one or more properties.  There is no step 3 :).  If you use LINQ on the indexable collection now, it will return results based on any present indexes, rather than iterating through the whole set, if your query is based on an equality test for one of the indexable fields.

So, how does it work?  The key to indexable collection is that it is aware of the Indexable attribute on the class it is a collection of.  In its constructor, it creates blank indexes for each property that has the indexable attribute.  In its add (and eventually its remove), it adds the item to each applicable index.  Normal indexing rules apply – just as in a database, the more heavily you index something, the more overhead you incur on the add, but the more time you save later on the search.

The real fun comes in when it is time to actually use the index.  To do that, we use extension methods over the Where and Join methods for IEnumerable.  By evaluating the expression tree sent to either method, we can determine if the left hand side of the delegate sent to us is an indexable property.  If so, we use the index rather than searching through the collection sequentially for the item.  We then evaluate the right side of the expression tree, get its hash code, and use that hash code in our index lookup to find the items that might match.  We then look at all the items where the hash code matches – and run the standard packaged operator (be it where or join) on the result.
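The mechanism described in the last two paragraphs can be sketched in miniature as follows.  This is not i4o’s source: every name here (SketchIndexableAttribute, SketchIndexedCollection, SketchCustomer, LastScanned) is a stand-in invented for illustration, the lookup is an instance method rather than an extension method, and it assumes the right side of the comparison does not refer to the lambda parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class SketchIndexableAttribute : Attribute { }

public class SketchCustomer
{
    [SketchIndexable] public string Name { get; set; }
    public int Age { get; set; }
}

public class SketchIndexedCollection<T>
{
    private readonly List<T> _items = new List<T>();
    // property name -> (hash code of the property value -> items with that hash)
    private readonly Dictionary<string, Dictionary<int, List<T>>> _indexes
        = new Dictionary<string, Dictionary<int, List<T>>>();

    // How many items the most recent Where call had to test.
    public int LastScanned { get; private set; }

    public SketchIndexedCollection()
    {
        // Create a blank index for each property carrying the attribute.
        foreach (PropertyInfo p in typeof(T).GetProperties())
            if (p.IsDefined(typeof(SketchIndexableAttribute), true))
                _indexes[p.Name] = new Dictionary<int, List<T>>();
    }

    public void Add(T item)
    {
        _items.Add(item);
        foreach (KeyValuePair<string, Dictionary<int, List<T>>> index in _indexes)
        {
            object value = typeof(T).GetProperty(index.Key).GetValue(item, null);
            int hash = value == null ? 0 : value.GetHashCode();
            List<T> bucket;
            if (!index.Value.TryGetValue(hash, out bucket))
                index.Value[hash] = bucket = new List<T>();
            bucket.Add(item);
        }
    }

    public List<T> Where(Expression<Func<T, bool>> predicate)
    {
        IEnumerable<T> candidates = _items;   // default: sequential scan
        BinaryExpression eq = predicate.Body as BinaryExpression;
        if (eq != null && eq.NodeType == ExpressionType.Equal)
        {
            MemberExpression left = eq.Left as MemberExpression;
            Dictionary<int, List<T>> index;
            if (left != null && left.Expression is ParameterExpression
                && _indexes.TryGetValue(left.Member.Name, out index))
            {
                // Evaluate the right side, hash it, and fetch the matching bucket.
                object value = Expression.Lambda(eq.Right).Compile().DynamicInvoke();
                int hash = value == null ? 0 : value.GetHashCode();
                List<T> bucket;
                candidates = index.TryGetValue(hash, out bucket) ? bucket : new List<T>();
            }
        }

        // Hash codes can collide, so the real predicate still runs on each candidate.
        Func<T, bool> test = predicate.Compile();
        List<T> result = new List<T>();
        int scanned = 0;
        foreach (T item in candidates)
        {
            scanned++;
            if (test(item)) result.Add(item);
        }
        LastScanned = scanned;
        return result;
    }
}
```

A query on the indexed Name property only tests the items in one hash bucket, while a query on the unindexed Age property falls back to testing every item, which is the trade-off the text describes.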

The first version of i4o is available now.  It supports the where and join clauses (group join is implemented, but not tested), and has a sample that comes along with the default installation that demonstrates the performance increase you get just from using it with a where clause (i.e. finding a common name through 1M records in under 10 ms with an IndexableCollection<T> versus more than 100 ms using just Collection<T>).  The where clause sample demonstrates the performance increase in a simple case – when you start doing joins with this (coming soon … scans N items where N is the size of the collection, join scans the product of the lengths in either collection) – the theoretical performance improvement goes up by yet another order of magnitude.

Here is a screenshot of the tests from the demo code in action:

If you have any questions, comments, or concerns … or would like to contribute to the project, please email me at aaron.c.erickson@gmail.com .



LINQ Presentation at CNUG

It’s official – come learn about LINQ at the Chicago .NET User Group.  I will be doing a presentation where we visually demonstrate how LINQ allows you to query not only databases, but any arbitrary object that supports IEnumerable.

Even more fun, I get to open for Microsoft’s General Manager of the .NET framework – who will be giving a “State of .NET” speech after my LINQ presentation.

The announcement is here.  Please RSVP, as it will likely be a highly subscribed event (i.e. for the “State of .NET”, certainly not me – lol).


Generic Collections, LINQ, and the Indices

Here is the situation.  You have a generic collection class that inherits from Collection<T>, or some other base that allows the class to be a generic collection.  Furthermore, you are querying that collection with LINQ, with the idea that you can run object-based queries quickly and efficiently.

Of course, we know that in order to do fast and efficient queries over very large sets of data, you typically need an index on the field you intend to put in the “where clause” of the query.  Without an index, such a lookup is an O(N) operation, whereas with a hash-based index it is close to an O(1), constant-time operation.  To put it another way, there are very good reasons why people put indexes on database tables, and those reasons do not change just because you are dealing with objects in memory.

So… the question is, how will LINQ deal with the need for indices?

To speculate about this, let’s first discuss how we would do this outside of the LINQ world.  Let’s say I have a class called Student, and a related collection class called Students that inherits from Collection<Student>.  Now, I want to be able to quickly locate a student by Social Security number.  To do something like that, and make it efficient, I would implement an internal index in the form of a private member variable called _studentIndex that is of type Dictionary<string, Student>.

To make the index populate seamlessly, I would override Add as follows:

public new void Add(Student student)
{
  // Hides Collection<Student>.Add so every added student is also indexed by SSN.
  base.Add(student);
  if (!_studentIndex.ContainsKey(student.SSN))
    _studentIndex.Add(student.SSN, student);
}

In the Item and Remove overrides, I would do a similar type of operation (let’s assume that SSN is unique for purposes here).

Then, to retrieve a Student by SSN quickly, it becomes fairly trivial:

public Student FindByStudent(string SSN)
{
  // One dictionary lookup instead of a scan; returns null when the SSN is unknown.
  Student student;
  return _studentIndex.TryGetValue(SSN, out student) ? student : null;
}
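For completeness, here is one way the whole class might look once the insert, replace, remove, and clear paths all keep the index in sync.  This is a sketch under assumptions, not a definitive implementation: it overrides the protected virtual hooks that Collection<T> provides (InsertItem, SetItem, RemoveItem, ClearItems) instead of hiding Add, it assumes SSN is unique, non-null, and never changes after a student is added, and the Name property is invented purely for the example.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Student
{
    public string SSN { get; set; }
    public string Name { get; set; }  // hypothetical extra field for the example
}

public class Students : Collection<Student>
{
    // SSN -> student; assumes SSN is unique and non-null.
    private readonly Dictionary<string, Student> _studentIndex =
        new Dictionary<string, Student>();

    // Called by Add and Insert.
    protected override void InsertItem(int index, Student student)
    {
        base.InsertItem(index, student);
        _studentIndex[student.SSN] = student;
    }

    // Called by the indexer setter: drop the old entry, index the new one.
    protected override void SetItem(int index, Student student)
    {
        _studentIndex.Remove(this[index].SSN);
        base.SetItem(index, student);
        _studentIndex[student.SSN] = student;
    }

    // Called by Remove and RemoveAt.
    protected override void RemoveItem(int index)
    {
        _studentIndex.Remove(this[index].SSN);
        base.RemoveItem(index);
    }

    // Called by Clear.
    protected override void ClearItems()
    {
        base.ClearItems();
        _studentIndex.Clear();
    }

    // One dictionary lookup; returns null when the SSN is unknown.
    public Student FindByStudent(string ssn)
    {
        Student student;
        return _studentIndex.TryGetValue(ssn, out student) ? student : null;
    }
}
```

Overriding the protected hooks means the index stays correct even when the collection is used through a Collection<Student> or IList<Student> reference, which a hidden Add would not guarantee.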

Now, fast forward to LINQ.  Say you do the equivalent operation with a query that has where Student.SSN == “000-11-2222”.  How will LINQ know to use an index on SSN?  While you can designate a primary key on the object, there is a good chance that SSN is not the primary key (especially if you are in a case, like most universities are, where usage of SSN as a primary key throughout the system is considered a very bad idea).  In order to do efficient queries, LINQ will need some mechanism to designate an index.
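To see why this matters, here is a small sketch that counts how many times the SSN property is read when a plain LINQ to Objects query runs over a list.  CountingStudent, SsnReads, and LinearScanDemo are hypothetical names made up for the demonstration, and the generated SSNs are dummy values.

```csharp
using System.Collections.Generic;
using System.Linq;

public class CountingStudent
{
    // Counts every read of any SSN getter (for demonstration only).
    public static int SsnReads;

    private readonly string _ssn;
    public CountingStudent(string ssn) { _ssn = ssn; }

    public string SSN
    {
        get { SsnReads++; return _ssn; }
    }
}

public static class LinearScanDemo
{
    // Builds 'count' students, runs a LINQ to Objects query for one SSN,
    // and returns how many SSN reads the query needed.
    public static int CountReadsForQuery(int count, string wantedSsn)
    {
        List<CountingStudent> students = Enumerable.Range(0, count)
            .Select(i => new CountingStudent("000-00-" + i.ToString("D4")))
            .ToList();

        CountingStudent.SsnReads = 0;
        List<CountingStudent> matches =
            (from student in students
             where student.SSN == wantedSsn
             select student).ToList();

        return CountingStudent.SsnReads;
    }
}
```

Calling LinearScanDemo.CountReadsForQuery(1000, "000-00-0042") reports one SSN read per element, because the standard where operator has no way to know an index exists and simply tests every item in turn.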

I would love for someone more experienced at LINQ than I am to tell me how indices are supposed to happen.  If they are not there, well, who knows, perhaps we have a nice idea for a future enhancement (i.e. perhaps there is a way to put an attribute on a property that would designate it as an index).

Update: A little further research – some people from Microsoft have already been playing with the idea:

http://weblogs.asp.net/brianbec/archive/2006/03/15/440293.aspx

http://blogs.msdn.com/kfarmer/archive/2006/03/15/552615.aspx

It looks like, technically, you can do it, but setting it up is far from trivial (i.e. nothing like SQL Server, where you simply create an index and the query knows to use it if available).


More on Technical Mortgages and Sexy User Interfaces

Imagine – you are the CIO involved in a merger.  Your company has applications, and the company you are merging with has a similar set of applications.  And the due diligence team from Big 4 Corp that the Big Investment Banking Sachs hired to decide which IT functions are going to survive is showing up in 3 weeks.  What is more important?

a.) The internals of the application are perfect, the UI is so-so (and who cares, right, the UI is just a css file), and the data model is something oriented towards your object model (i.e. something that db4o generated, or something that is oriented towards use with NHibernate).  Or better yet, you use some fancy OODBMS that, while nobody has really heard of it, is clearly technically superior (har, har, har).

b.) The internals of the application consist of typed data sets being managed by a static class with accessors (ugly, hackish, but serviceable) – but the UI is fabulously designed and intuitive… almost sexy.  And while the “procedures” in the one static class have bad structure, they do, I don’t know, some calculation or other really useful trick that is very meaningful from a business perspective.  The DB schema is easy to generate reports from (i.e. you don’t have to write something in SQL Reporting Services that links with objects and custom assemblies – not that doing something like that is hard, but for non-programmers it might be too much to ask).

I would guess “B”.  “A” might matter if we thought Big 4 IT due diligence teams were really good at telling modular programs from hackish ones.  But for the most part, and I may be going out on a limb here, the IT due diligence teams hired by investment banks that do M&A work tend to be people like the “contract IT auditor” for a Big 4 company.  To put it nicely, these are not the kind of people who would know a well designed object model from their elbow.  Often, the biggest hurdle will be passing some checklist, which will usually say “uses big database”, and in the best case, will say “no SQL injection vulnerabilities”.  Having worked for that kind of company in the past, I can say that detailed code review is not their strong suit.  And even if it were, the window of time allowed for due diligence in most big M&A transactions is way too short for that kind of review to take place.

What I am not doing is advocating the idea that quality does not matter.  It does.  A well designed set of application internals will allow you to make quick changes once you consummate the transaction.  But – if you have chosen well designed internals over polished UI and distinguishing features that set you apart, you lose, since you will be discarded before your “great modular opus“ gets to see the light of day.

So – to the thesis – if forced to choose between the sexy UI and the well designed internals (i.e. well designed, as in elegantly designed object model that, while taking longer to write, pays off in the long term maintenance) … choose the sexy UI.  By taking out the technical mortgage, you gain leverage needed to make sure you can outlast your competitor in the initial decision phase, after which, you will have more than enough time to pay off the mortgage afterward.


It’s all the Programmers’ Fault! A Post about Bad User Interfaces and How to Avoid Them.

Programmers to Blame for Hard To Use Software

In one of the least informed articles of our young year, the author of the linked article asserts that programmers are to blame for software that is “hard to use”.  It starts out finding a problem with the following:

“One of his peeves is when a text-editing program like Microsoft Word asks users if they want to save their work before they close their document.

That question makes little sense to computer novices accustomed to working with typewriters or pen and paper, he said. For them, a clearer question would be: "Throw away everything you’ve just done?"”

Let’s see.  First of all, how many computer novices are accustomed to working with typewriters these days, or for that matter, pen and paper, for composing documents?  When is the last time you even saw a typewriter?  And even if the message is bad, and should be replaced with “Throw away everything you’ve just done?”, I can assure you that the feature probably originated from a user who, when using MS Word 0.9 Beta, closed a Word document that had 20 pages of unsaved work, lost it, and in an irate voice, yelled “give me a @#$(@ prompt before I lose all my work, you ignorant programmer!”.

As an aside, even the suggested rephrasing of the message – “Throw away everything you’ve just done” – is wrong, because if you have been using more than one app, closing the app would NOT throw away everything you have just done – only that which you have done with that particular app.

The truth of the matter is this.  The user interface of most programs is designed by the project sponsor, or some other non-programmer, not by the person who actually programs it.  How many times have you, as a programmer, been given free rein to just throw a UI together, and not have it vetted by the program sponsor, the power user who will do UAT, or some person in the chain of command?  Maybe I’ve been living on planet X, but in every piece of software I have ever written, the UI is where the most user feedback got incorporated, given it is the most obvious visible manifestation of the software.  More often than not, too much dickering occurs over the UI from the chain of command, since many of the issues turn out to be preference or fashion issues, not true usability ones.

The problem is that most users are not experts in what makes a good UI.  There are ways to use fonts, colors, and graphics to increase program usability.  There are ways to manage how tabbing and shortcuts work so that heads-down data entry people can use your application easily.  Magenic, recognizing this, employs consultants who specialize in taking ugly sponsor-designed user interfaces and turning them into usable and intuitive ones.  If more companies followed the lead of the companies that “get it”, and spent the extra money to hire usability experts, perhaps we would not see the aforementioned article once every 6 months.  That money has more ROI than almost anything else you can do on the project, as it assures that users will find the application easy enough to use – ideally, so easy that no training is required.  Money spent on the UI expert is a one-time cost; money spent on trainers has to be re-spent every time you have new users on the application.

If you are having the above problem with users complaining about hard to use software… put your money where your mouth is, hire a UI expert, and let them do the UI design.  The UI expert can get at what is important to the user, and then translate that into an intuitive and eminently usable UI.  And stop blaming the programmers for stuff they didn’t do, and have no real control over.
