Open Source and What I Learned In School

Wed 07 January 2009 By ismith

Here you can find a dependency graph for the major subsystem that I work with in my job. (In some browsers, like FF3, you will probably have to click this image to see it at full size.) What matters most is the number of items on this list that don't start with "nuxeo-": these are modules that we are consuming from the internet-ocean of open source projects. All of the code shown is java, but that's not all the code! Be sure to glance at some of the "version numbers" for some of the things that we use.

This is not everything I need to deal with for my part of the system, and there are substantially larger dependency graphs for the Nuxeo EP product and the Nuxeo RCP products (although these overlap somewhat with this image). For reference, the last time I checked, it took 976 jar files to build Nuxeo EP from source. Nuxeo's technical staff is quite small--perhaps a dozen folks.

This is the beginning of my argument that the java + open source + linux world is quite different than the one I grew up in. I was certainly aware that the projects I was doing in school were small--toys even. I was also aware that in the "big, bad, real world" there were these huge, often legacy, business systems that folks had to work on. It was certainly the case that I heard faculty members say, "Well, in the real world things will be more complex, but for the purposes of this course..."

I am coming to the conclusion that, now, the roles are almost reversed. If you are working on old, legacy systems you probably aren't working with open-source tech, and maybe not even with java (or its modern brethren). In my opinion, if you are in this state, you are hosed.
You are hosed because you can never catch up to what is going on in the open source world and, worse, you are forced to write code that you know already exists and that you could just take off the shelf. Put another way, your system(s) are probably smaller than what people are building in the "open" world.

I see two big differences in this reversal of roles: 1) speed of change and 2) knowledge dissemination. On 1), a big system based on open source actually changes much faster than any legacy system ever could, even with the same number of developers. This is simply because there is no central control and coordination. Each user (customer) is forced to pick, choose, and understand the changes in order to deal with them... Oh, what? There is a new rev of apache commons IO? Does it break apache commons logging if we switch to it? WTF?!?!? We custom patched commons IO last time?? Ugh. On 2), the knowledge and expertise regarding any WOS (wad of software, wad of stuff, wad of **), combination of WOSes, or subsystem is likely to be available to anyone, since it probably consists of a moderately out-of-date web page and the source code. In older legacy systems, the "architecture committee" (politburo) dealt with both of these issues; whether that was better or worse than the present situation is a matter of opinion.

I was told in undergraduate school that the "thinking skills" like "analysis", "algorithms" and "design" were the things that were important in my training for the real world of software, not really the details of any particular programming language (cobol! jpl!). Details were something you would learn along the way, as needed, since they were likely to be constantly changing anyway. The "thinking" was what you were going to get a fat paycheck for!

Strangely, none of these three things turned out to be strictly necessary in the world of open source I'm in now. (One could certainly argue they are still needed in other areas, but click on that diagram above again if you need reminding.) One could even argue that they are, to varying degrees, unnecessary now... although it is still true that software folks get paid to think.
I think there are three things (none of which were even discussed in 1988!) that I would advise students to be good at if they want to work on big open-source systems when they graduate:

1) Know which information streams to follow, and understand how to interpret different kinds of input about software (e.g. beta from google vs. beta from IBM vs. beta from some kid in Finland--freshmeat vs. ...). If you don't know what code is out there, you can't very well use it, and your organization can't profit from it. Related to this is the ability to do quick-n-dirty searches for things that might be out there, especially when it stands to reason that somebody "must have done this before."

2) Knowledge of enough different kinds of programming models and systems to know whether A and B can be combined without too much pain. Is one event driven and the other not? What about multi-threading? Does B use maven for dependency management and A use Ivy? Does system A have a bunch of baggage that makes it too heavy to drag along? (There are a host of related questions about how to combine two things to get a useful third thing, but I like this as a proxy.)

3) Ability to communicate with other people via the internet to quickly discover (dig up) critical bits of information that you need. This can mean writing a good test case, a maven pom with questions in the comments, or a good post in the right forum to get the attention of the lead developer.

If you look at this from 10K meters, it suggests to me that the java + open source + linux world actually favors people with stronger language skills: effective reading and writing are #1 and #3 on my list. #2 has, I would argue, no relation to what I learned in school. If you want to argue that it has antecedents in the computing world, the two most obvious might be the days of "hacking" PDP-11s by combining two rolls of paper tape, and the Smalltalk world of building on the built-in libraries. Neither of these, though, had the size or churn rate of the current crop of open source systems in java.

There are some assumptions built into the above, and I'll conclude with a treatment of two of them. First, I believe that the model of open source software is both valuable and a competitive advantage for an organization (e.g. Nuxeo, duh!). I believe that the huge dependency graph that I started this article with is actually better than the alternative of doing all the work yourself. I believe this on fundamentally economic grounds: there is no way that a smallish organization could do everything that is now necessary to build an enterprise software system without open source. Smallish organizations are typically more nimble, and this helps distribute capital more effectively when it comes to funding innovation.

The second assumption is that the cost structure of software (the business model, if you will) can make sense in a world where "everything is already out there." Above, I mentioned a couple of problems caused by the lack of central management of software, as well as three skills that I feel are needed to be effective in an open-source world. I believe that these can be, are being, and will be monetized (Nuxeo, duh!) effectively by software companies. Even though "the software is out there," the problems above are still significant and the skills are far from evenly distributed. Folks with the skills, and companies that can reduce--or even eliminate--the problems of dealing with the complexity of open source dependency graphs like the one shown above, will still be highly profitable.

The times, they are a changin'...

Category: Product & Development