Cycling in the forecast

As much as I miss cycling, and I really do miss cycling, I am touched by the number of my non-cycling friends who are asking whether I have started riding this year. I too have been thinking about this as I see other (braver) cyclists on the road. I need to give my bike a mechanical check, and finally learn how to clean/lube my chain, and then I, too, can think about riding.

The fact is, as some of my previous posts intimate, I am a cautious rider and so I have not started riding because of ice, sand/debris on roads, and winter potholes.

The ice mostly melted a week or two ago. The potholes are really a perennial hazard ... the only thing is to get familiar with the locations of the new crop. The temperatures and the (de-icer) sand and winter debris, which can be quite hazardous to cyclists, are the two final external impediments to my riding. The weather forecast is for more spring-like temperatures this week, and this morning I heard and then saw the first street-sweeper of spring. Not quite as inspiring for many as the first cuckoo in spring, but thrilling for me nonetheless.

Some of you know we have a friend in the ICU right now, and I have been preoccupied with his critical illness. There is little I can do for him right now, other than offer thoughts and prayers, and support his family where needed. However, I can look after myself and my family in fresh recognition of the frailty of our lives, and a good bike ride will help me, at least, burn off some worry and breathe in some optimism (it's not called inspiration for nothing).

The origin of Scruffies

The one thing I could not find when I previously wrote about scruffies and cleanies was any reference to these terms on the internet. Thanks to David Karger I now know this is because the original distinction was made between Scruffies and Neats. Wikipedia tells the story of these divergent schools of thought in the Artificial Intelligence research community. In fact many people note that the semantic web is "just the latest" approach to the AI problems that have stumped researchers for decades. I am sure Zak Kohane was well aware of all this when he presented, and of the rich irony of extending the term into the semantic web world.

The Wikipedia article notes some famed AI scruffies and neats (as they are known in that field). I recognise just a couple of the names, and note that Rodney Brooks at MIT is a scruffy... and his work has given America its most popular robot: the Roomba vacuum cleaner.

One point for the scruffies I would say!

The ultimate question revisited

This is another way of thinking about the world (apropos an earlier post featuring the cartoon below).

Proud to be a Scruffy

Zak Kohane was at C-SHALS last week. Zak is unbelievably impressive in the contributions he makes as a physician, researcher, teacher and leader in the healthcare community. As well as his MD, he also has a PhD in Computer Science from MIT. Zak has been a leader in the PHR world including PCHRI conferences I have attended the last couple of years.

Zak presented his own experiences using and promoting semantic web technologies in getting things done. He asserted that there are two kinds of engineers in the world: scruffies and cleanies.

I had previously heard that the two kinds of engineers are starters and finishers. Some start projects, do amazing imaginative work, and leave the project when it is 90% done, too bored to finish it off and make it usable by real humans. Finishers on the other hand know how to make sure the software is of high-quality, easy to integrate, easy to maintain and easy to use. Good software teams need the right balance of starters and finishers.

However, Zak's classification is different (starters and finishers can be either cleanies or scruffies). A cleanie is one who sees progress proceed in stepwise fashion where each step needs to be well thought through and in some way complete before proceeding to the next step. A cleanie sees that overall success in a field may take a while, but finds satisfaction in the beauty of completing each step even though that work is only a building block and not useful on its own. A scruffy sees the world as a random walk through the problems of the world, providing just enough of an answer to get some basic results and building on that. A scruffy knows the work is not complete, and knows s/he can return to it as needed. A scruffy knows some of the work will be useless because it is scruffy, but will be progressing fast enough that some number of failures are not too discouraging.

I used to inhabit a techie world of databases where, in the 1980s and 1990s, large corporations tried to combine all their data into a "data warehouse". This involved getting everyone to agree on how the data is used: do we think about customers at the level of the single company that buys a product or the various locations that use it; do we think of a product as a particular formulation or the various packages in which we sell it. These are important questions and big political fights would ensue. This was a project for cleanies ... those who understood that collating a unified view of the data in the enterprise was a necessary step to seeing how all the activities worked together and could be improved. By combining the sales, delivery and customer service records for a customer, the company could be much smarter about building a strong relationship... but it required the cleanies to be able to combine all that data first. They are still at it, and will never finish (by the time they do, someone has changed a key piece of data elsewhere which needs to be examined).

Meanwhile, the scruffies decided to get on with things and created "data marts" (notice the plural). These are subsets of the company data, designed to help out with a particular task, and so make it much easier to agree on data combinations. The scruffies got two or three departments in a room for a couple of hours, showed them the benefits of agreeing on how to combine these few items of data for this particular purpose, and got on with it. There are now multiple data marts in every company, many overlapping, and the cleanies scoff that the scruffies are making the problem worse ... but the cleanies are never done, and the scruffies have already produced a decade of good results.

This story is a narrative I could not have told before Zak presented his classification. As he spoke, my understanding of my time working in data management fell into focus. I realized that I am a "recovering cleanie" turned scruffy. My undergrad degree is in Mathematics, which is the ultimate cleanie project (especially for those in the pure math world - I suppose applied math is, by nature, scruffy). I have therefore always loved the idea behind cleanie projects, but have come to see over the years that they generally do not and cannot succeed. There are exceptions, I am sure, but I would like you to name one.

Venture capital is a scruffy industry. Startups are scruffy. They cannot afford to build a perfect marketing campaign before executing it and recognise they could never agree on that perfection. I have learned that those projects which appeal to my cleanie nature do not gain traction in the investor world or the customer world. People have objectives to meet to earn their annual bonus (or keep their job) and they don't have time to wait for the cleanie approach to work - so they choose the scruffy approach which mostly does the job.

Non-profits are scruffy as well. Non-profits can be nothing else ... they respond to failures in society, government, civilization. Advocacy groups might possibly be cleanie - they want to fix the underlying issues so that no-one is poor, sick, uneducated, subject to prejudice, harassment, abuses. Religious groups are cleanie - they have the ultimate long-term cleanie agenda to fix the world, one soul at a time. Most other successful non-profits, those providing services, educating, urging and organizing action are scruffy: they are working with too few resources, happiest when doing something, even if the methodology or the philosophy is incomplete or inconsistent. Most non-profits are startups and most stay small. Some grow big and grow to have cleanie ambitions and some succeed (perhaps more than in industry). The Bill and Melinda Gates Foundation is cleanie with some scruffy undertones - they want to find a vaccine for malaria, for AIDS, for TB (cleanie projects all) - but they also want to get a bed-net into every home at risk from malaria and to provide condoms to all who need them (scruffy approaches to existing problems). Even one of their goals "to develop health solutions that are effective, affordable, and practical" has a scruffy ring to it.

So here I am, a Venture Capitalist, proud to be a scruffy, in a scruffy industry, funding scruffy companies, producing scruffy products, that customers buy because stuff gets done. I am also a Venture Cyclist, a board member of two non-profits: JCDS, a school with cleanie ambitions (as private schools often must, to meet the high expectations of yuppie parents), and Hazon, a scruffy organization hoping to change the world one bike ride and one meal at a time.

There are certain projects that require a cleanie approach: making sure teachers in a school have no criminal record - this should be a cleanie project - but even this simple and obvious need is confounded in a world of scruffy databases and scruffy (or expedient) laws.

My experience is that scruffy trumps cleanie... and we should learn to live with it, and even love it.


I attended C-SHALS 2008 at the end of last week... C-SHALS: The conference on the semantic web in healthcare and life sciences. It was a great event, and covered a lot of ground.

The semantic web is a concept that encompasses the idea that computers should be able to read and have some "understanding" of the contents of web pages, so that software can do useful things with the information, rather than needing human interpretation.

By that definition, one might think that Google (and the other search engines) provide some functionality of the semantic web. After all, Google can find web sites based on some keyword input, and if you put the tilde sign (~) before a word in a Google query, it will return pages using all related words. For example, search for "venture capital ~healthcare" and you will get a wider set of results than a search for "venture capital healthcare". Google uses some codification of related words based on syntax (looking for plurals or different forms of the same word) and a good thesaurus to find related words. However, this first level of understanding what is on a web page does not rise to the level required by those active in promulgating the semantic web, most especially the W3C (World Wide Web Consortium at MIT).

The Semantic Web is about computer software being able to piece together meaning by combining data from different websites in useful ways. An example is providing data labelled in a way it can be combined with other data from elsewhere. This could be as simple as knowing that the "address" on one website is similar to the "location" on another website. By using labels from agreed vocabulary lists (often called taxonomies or ontologies) you can tell the computer exactly what your data is describing. For example you might have an agreed vocabulary related to government identification documents, and one term is "Passport ID", which is agreed to mean the unique ID number on a passport issued by a sovereign government recognised by the UN. Hence various human readable tables of passport numbers labelled "Passport num", "Passport No." etc could all have a notation which refers to the agreed vocabulary list and the particular term, and thus the data could be combined without worry that there is some mismatch.
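To make the passport example concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how any real semantic web toolkit works: the vocabulary URIs, the column labels for each site, and the sample rows are all made up for this post. Each site keeps its own human-readable labels, but annotates them with the agreed term, and that annotation is what lets the software combine the tables.

```python
# Hypothetical agreed vocabulary. These URIs are invented for illustration;
# a real vocabulary would be published by whoever maintains the term list.
PASSPORT_ID = "http://example.org/gov-id-vocab#PassportID"
FULL_NAME = "http://example.org/gov-id-vocab#FullName"

# Each (hypothetical) website keeps its own column labels, but annotates
# each label with the agreed vocabulary term it stands for.
SITE_A_LABELS = {"Passport num": PASSPORT_ID, "Name": FULL_NAME}
SITE_B_LABELS = {"Passport No.": PASSPORT_ID, "Holder": FULL_NAME}


def normalize(rows, labels):
    """Re-key each row from local column labels to agreed vocabulary terms."""
    return [
        {labels[col]: value for col, value in row.items() if col in labels}
        for row in rows
    ]


def combine(*tables):
    """Merge normalized rows that share the same Passport ID."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[PASSPORT_ID], {}).update(row)
    return merged


# Made-up sample data from the two sites.
site_a = [{"Passport num": "X1234567", "Name": "A. Example"}]
site_b = [
    {"Passport No.": "X1234567", "Holder": "A. Example"},
    {"Passport No.": "Y7654321", "Holder": "B. Sample"},
]

combined = combine(
    normalize(site_a, SITE_A_LABELS),
    normalize(site_b, SITE_B_LABELS),
)
```

Neither site had to rename its columns or agree on anything beyond the shared term, which is the point: the agreement lives in the vocabulary, not in the tables.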

C-SHALS looked at combining data from life science research and point-of-care settings, combining data from genetic studies across species, sharing models of knowledge from universities to government regulators, and more. We were reminded, more than once, that the semantic web is the latest attempt to solve the problem of making knowledge available to computer programs, a problem that used to be called Artificial Intelligence (AI), which is also a separate, if related, field. The business community never saw a great deal of use for AI, although venture capitalists lost a bunch of money proving that. Subsequently there was a wave of Knowledge Management (KM) work, some of which bore fruit, but this was about sharing human knowledge and in many ways the world wide web with all its public and private web sites has become the best success of the KM effort. Now, the KM folks are all working on the semantic web.

C-SHALS was such a success in large part because of the two co-chairs of the W3C Semantic Web Health Care and Life Sciences Interest Group and the work they do every day: Tonya Hongsermeier and Eric Neumann. Dr. Tonya Hongsermeier is head of knowledge management at Partners Healthcare, a Boston-based hospital and healthcare system. Her work is at the forefront of applying semantic web technologies to get the right knowledge to the nurse or physician at the right moment in patient care. This is really "mission critical", and Tonya's work is recognised globally as providing leadership in this arena. Eric Neumann is a neuro-biologist turned techie who lived through the Knowledge Management wars at a large pharma company and has combined all this experience into becoming the pre-eminent evangelist and expert for semantic web technologies in the life science arena (esp pharma and bio-tech companies). The fact that Tonya and Eric are both based in Boston helps make this area the epicenter for advancing this field, despite the lamentable fact that the only two reasonably successful commercial efforts represented at C-SHALS were from California.

One last comment, for now, must include David Karger's contribution to the field. David is a good friend, an Israeli folk-dance maven, and a professor at MIT. He also presented at C-SHALS, and his work on the Simile project was noted from many angles to be pivotal in showing that the benefits of semantic web concepts could be reaped early and often with simple, yet powerful tools. More on this in future posts. Meanwhile, practice saying Semantic Web a few times - it is easier than tongue-twisters using the phrase C-SHALS, and you will find yourself an expert in something important over the next few years.

The $1,000 Genome

As you know, Lifestyle 3.0 demands you get your own gene sequence.

Various universities are now planning to sequence the genes of at least 1,000 people (I don't think they are looking for volunteers). The coverage states this will cost $50 million - meaning $50,000 per person. Google is supporting a Harvard project which aims to sequence 100,000 people or even 1 million!

Conventional wisdom predicts a massive shift in the use of genetic sequencing in regular medical care when the price is at $1,000 per person. Given how I imagine pricing will come down an exponential curve (similar to Moore's Law for micro-processors), I expect that this will be the case in about 10 years.
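For anyone who wants to check the back-of-the-envelope arithmetic, here is a rough sketch in Python. The starting figure of $50,000 per person comes from the coverage quoted above. The halving periods are my assumption: I am borrowing the commonly quoted 18 to 24 months from Moore's Law, and nobody has promised that sequencing costs will follow it.

```python
import math


def years_to_target(start_cost, target_cost, halving_years):
    """Years for a cost to fall from start_cost to target_cost,
    assuming it halves once every halving_years (an assumption)."""
    halvings = math.log2(start_cost / target_cost)
    return halvings * halving_years


# $50 million for 1,000 people, as quoted in the coverage.
cost_per_person = 50_000_000 / 1_000
target = 1_000  # the $1,000 genome

# Assumed halving periods, borrowed from the usual Moore's Law figures.
for halving_years in (1.5, 2.0):
    years = years_to_target(cost_per_person, target, halving_years)
    print(f"Halving every {halving_years} years: about {years:.1f} years")
```

Going from $50,000 to $1,000 takes a little under six halvings, which works out to roughly 8.5 years at one halving every 18 months and roughly 11 years at one every two years. That is where my "about 10 years" comes from.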

I have been following the debate on quite what we will do with all this information, and have many theories, but it is no longer theoretical to expect it to happen.