Generic Collections, LINQ, and the Indicies

Here is the situation.  You have a generic collection class that inherits from Collection, or some other base that allows the class to be a generic collection.  Furthermore, you are doing the equivalent of that in LINQ, with the idea that you intend to be able to quickly and efficiently do object based queries.

Of course, we know that in order to do fast and efficient queries over very large sets of data, you typically need to have an index on the field you intend to put in the “where clause” of the query.  Without an index, such an operation becomes a O(N) operation, whereas with an index, it is an O(C) operation.  To put it another way, there are very good reasons why people put indexes on database tables, and those reasons do not change just because you are dealing with objects in memory.

So… the question is, how will LINQ deal with the need for indicies?

To speculate about this, lets first discuss how we would do this outside of the LINQ world.  Lets say I have a class called Student, and a related collection class called Students that inherits from Collection.  Now, I want to be able to quickly locate a student by Social Security number.  To do something like that, and make it efficient, I would implement an internal list in the form of a private member variable called _studentIndex that is of type Dictionary.

To make the index populate seamlessly, I would override Add as follows:

new public void Add(Student student)



  if (!_studentIndex.ContainsKey(student.SSN))



In the Item and Remove overrides, I would do a similar type of operation (lets assume that SSN is unique for purposes here).

Then, to retrieve a Student by SSN quickly, it becomes fairly trivial:

public Student FindByStudent(string SSN)


  if (_studentIndex.ContainsKey(SSN))



    return null;


Now, fast forward to LINQ.  Say, you do the equivalent operation when you have where Student.SSN == “000-11-2222′.  How will LINQ know to put an index on SSN.  While you can designate a primary key on the object, there is a good chance that SSN might not be the primary key (i.e. especially if you are in a case, like most universities are, where usage of SSN as a primary key through the system is considered a very bad idea).  In order to do efficient queries LINQ will need some mechanism to designate an index.

I would love for someone more experienced at LINQ than I am to tell me how indicies are supposed to happen.  If they are not there, well, who knows, perhaps we have a nice idea for future enhancement (i.e. perhaps there is a way to put an attribute on a property that would designate it as an index).

Update: A little further research – some people from Microsoft have already been playing with the idea:

It looks like… technically, you can do it, but setting it up is far from trivial (i.e. like in SQL server, where you simply create an index and the query knows to use it if available).

Generic Collections, LINQ, and the Indicies

More on Technical Mortgages and Sexy User Interfaces

Imagine – you are the CIO involved in a merger.  Your company has applications, and the company you are merging with has a similar set of applications.  And the due diligence team from Big 4 Corp that the Big Investment Banking Sachs hired to decide which IT functions are going to survive is showing up in 3 weeks.  What is more important?

a.) The internals of the application is perfect, the UI is so-so (and who cares right, the UI is just a css file), and the data model is something oriented towards your object model (i.e. something that db4o generated, or something that is oriented towards use with NHibernate).  Or better yet, you use some fancy OODBMS that, while nobody has really heard of it, is clearly technically superior (har, har, har)

b.) The internals of the application consist of typed data sets being managed by a static class with accessors (ugly, hackish, but servicable) – but the UI is fabulously designed and intuitive… almost sexy.  And while the “procedures“ in the one static class have bad structure, they do, I don’t know, some calculation or other really useful trick that is very meaningful from a business perspective.  The DB schema is easy to generate reports from (i.e. you don’t have to write something in SQL reporting services that links with objects and custom assemblies – not that doing something like that is hard, but for non-programmers.. might be too much to ask).

I would guess “B“.  “A” might matter if we thought Big 4 IT due diligence teams were really good at knowing modular programs from hackish ones.  But for the most part, and I may be going out on a limb here, but the IT due diligence teams that investment banks who do M&A work hire tend to be people like the “contract IT auditor“ for a Big 4 company – being nice – not the kind of people that would know a well designed object model from their elbow.  Often – the biggest hurdle will be that you pass some checklist, which will usually say “uses big database“, and in the best case, will say “no SQL injection vulnerabilities“.  Having worked for that kind of company in the past, I can say that detailed code review is not their strong suit.  And even if it were, the window of time required for due diligence in most big M&A transactions is way too short for that kind of review to take place.

What I am not doing is advocating the idea that quality does not matter.  It does.  A well designed set of application internals will allow you to make quick changes once you consummate the transaction.  But – if you have chosen well designed internals over polished UI and distinguishing features that set you apart, you lose, since you will be discarded before your “great modular opus“ gets to see the light of day.

So – to the thesis – if forced to choose between the sexy UI and the well designed internals (i.e. well designed, as in elegantly designed object model that, while taking longer to write, pays off in the long term maintenance) … choose the sexy UI.  By taking out the technical mortgage, you gain leverage needed to make sure you can outlast your competitor in the initial decision phase, after which, you will have more than enough time to pay off the mortgage afterward.

More on Technical Mortgages and Sexy User Interfaces

It’s all the Programmers Fault! A Post about Bad User Interfaces and How to Avoid Them.

Programmers to Blame for Hard To Use Software

In one of the least informed articles of our young year, the author of the linked article asserts that programmers are to blame for software that is “hard to use”.  It starts out finding a problem with the following:

“One of his peeves is when a text-editing program like Microsoft Word asks users if they want to save their work before they close their document.

That question makes little sense to computer novices accustomed to working with typewriters or pen and paper, he said. For them, a clearer question would be: "Throw away everything you’ve just done?"“

Lets see.  First of all, how many computer novices are accustomed to working with typewriters these days, or for that matter, pen and paper, for composing documents?  When is the last time you have ever even seen a typewriter.  And even if the message is bad, and should be replaced with “Throw away everything you’ve just done?”, I can assure you that the feature probably originated from a user who, when using MS Word 0.9 Beta, closed a word document that had 20 pages of unsaved work, lost it, and in an irate voice, yelled “give me a @#$(@ prompt before I lose all my work, you ignorant programmer!”. 

As an aside, even the suggested rephrasing of the message – “Throw away everything you’ve just done” – is wrong, because if you have been using more than one app, closing the app would NOT throw away everything you have just done – only that which you have done with that particular app.

The truth of the matter is this.  Most user interface of most programs is designed by the project sponsor, or other non-programmer, not from the person who actually programs them.  How many times as a programmer been given free rein to just throw a UI together, and not have it vetted by the program sponsor, the power user who will do UAT, or some person in the chain of command.  Maybe I’ve been living on planet X, but in every piece of software I have ever written the UI is where the most user feedback got incorporated, given it is the most obvious visible manifestation of the software.  More often than not – too much dickering occurs over the UI from the chain of command, since many of the issues turn out to be preference or fashion issues, not true usability ones.

The problem – is that most users are not experts in what makes a good UI.  There are ways to use fonts, colors, and graphics, to increase program usability.  There are ways to manage how tabbing and shortcuts work to enable heads down data entry people to use your application easily.  Magenic, recognizing this, employs consultants who specialize in taking ugly sponsor designed user interfaces and turning them into usable and intuitive user interfaces.  If more companies followed the lead of the companies that “get it”, and spend the extra money to hire usability experts – perhaps we would not see the aforementioned article once every 6 months.  That money has more ROI than almost anything else you can do on the project – as it assures that users will find the application easy enough to use – ideally – so easy that no training is required.  Money spent on the UI expert is a one time cost, money spent on trainers has to be re-spent every time you have new users on the application. 

If you are having the above problem with users complaining about hard to use software… put your money where your mouth is, and hire a UI expert, and let them do the UI design.  The UI expert can get what is important to the user, but then translate that into a intuitive and eminently usable UI.  And stop blaming the programmers for stuff they didn’t do, nor have any real control over.

It’s all the Programmers Fault! A Post about Bad User Interfaces and How to Avoid Them.