Let me state for the record that I applaud the Open Source Movement (OSM). The existence of independently developed competing software solutions to business needs helps to motivate all the players, Microsoft included, to innovate continuously and to keep prices down. In turn, competing software brings its own innovations and product designs to the marketplace and offers alternatives that accommodate different work styles and preferences. The OSM has delivered some impressive products and potentially represents a real choice in the software industry.
Having said that, let me also state for the record that my experience in actually trying to use an Open Source product has been, if not an unmitigated disaster, then at least a severe and costly (in terms of my time) disappointment.
There are two categories of people who will try an open source product for the first time: those who are developers or programmers, and those who are not. They all share the same basic motivation, I think, which is to seek out a cost-effective alternative software solution to a business problem. In my case, the problem was to simplify the creation and maintenance of ETL routines for a data warehouse with multiple heterogeneous data sources. There are products designed for this purpose by Informatica, Oracle, Business Objects, and of course, Microsoft. You get Microsoft’s offering with the SQL Server database. The others charge an arm and a leg for their respective products. I wanted to look at a product that was not Microsoft’s and was less expensive than the others.
Enter Pentaho Data Integration (PDI). This package was an open source project called Kettle (Kettle Extract, Transform, Transport, and Load Environment) that is now packaged and marketed by Pentaho Open Source Business Intelligence. It is a neat-looking product, with a GUI called Spoon (the complete metaphor also includes a Pan and a Kitchen) that follows the same visual convention for modeling a transformation or a job as its competitors do. It works with or without a central repository. It converts a transformation model or a job to XML files that can be used by the Kettle suite or other software. The objects with which you create a model seem, at first glance, very intuitively designed.
Unfortunately, when intuition fails, the documentation (such as it is) is not much help. There is a user guide that incompletely describes each icon and menu option, but doesn’t provide any practical tutorials. There are some example transformation files and job files that you can examine to get a clearer picture of how some features work, but they still leave a lot of uncharted territory. This gap is supposed to be made up by the “online community” – forums in which you can post questions and get answers, often from the original designer of the program.
It seems clear from the postings on the PDI forum that most of the people using this software are programmers who have the time and the budget to do their own debugging, and to read from and post to blogs and bulletin boards. The rest of us don’t really have that luxury. For us, the alternative is to hire Pentaho for a consulting gig, and that’s where the money comes in.
Pentaho is following the same business model with Kettle that Red Hat has followed with Linux. They coordinate the efforts of independent programmers who volunteer to create enhancements to the Kettle software; assemble the results, and what documentation there is, into a free downloadable package; and offer consulting assistance and training to entities that want to implement the product in their environments. The gaps in information and the bugs that surface are all income opportunities.
Here’s my complaint: If you brand a product that is hard to use; includes limited, incomplete, or inaccurate documentation; and has basic features that don’t work, you undermine the credibility of the product, the brand, and the whole Open Source approach, and you limit its ability to achieve serious market penetration. If the plan is to bring in those consulting dollars to plug the gaps, it doesn’t seem like a very good plan since, if the product lacks or loses credibility out of the box, the consulting potential goes to zero!
A truly fundamental function of any software is to save the work you have done with it. But I couldn’t save my work using PDI version 2.5.2. In response to my post on the forum, the creator of Kettle informed me that “a workaround” had been included in version 3. It may have been just an unfortunate choice of words, but “workaround” implies to me that the problem is pretty serious and has not been fully resolved.
On a more complex level, consider the basic creation of statistics. Suppose a table has an attribute with a finite set of values v1, v2, v3 … vN. An ETL procedure has to denormalize these values: a target table has a set of columns colv1, colv2, colv3 … colvN, and for each occurrence of an attribute value in the original table with key KN, the count in the corresponding column of the target table is incremented by 1. In a procedure, this is handled with a CASE statement, such as:
“case source.attrcol when v1 then target.colv1 = target.colv1+1 when v2 then target.colv2 = target.colv2+1…when vN then target.colvN = target.colvN+1 end”
Granted, there is a lot going on here, but it is very simple to program in this way. PDI’s denormalizer tool seems to be able to put the value vN into target.colvN, but not to simply count the number of times vN occurred with a given key. There doesn’t seem to be any practical help for this in the documentation or the examples. The only option seems to be to pay Pentaho either for consulting or for a training class. That is not likely to happen on spec.
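To make the counting logic concrete, here is a minimal sketch of the same idea written as a single grouped query rather than a row-by-row procedure. It is only an illustration under my own assumptions, not a PDI feature or anything taken from its documentation: it uses Python's built-in sqlite3 module with an in-memory database, and the function name count_pivot and the key column k are hypothetical. The names source, attrcol, and colv1 … colvN follow the description above.

```python
import sqlite3


def count_pivot(rows, values):
    """Count, per key, how often each attribute value occurs.

    rows   -- iterable of (key, attribute_value) pairs (the source table)
    values -- list of the known attribute values v1 ... vN
    Returns one tuple per key: (key, count_of_v1, ..., count_of_vN).
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical source table: key column k, attribute column attrcol.
        conn.execute("CREATE TABLE source (k TEXT, attrcol TEXT)")
        conn.executemany("INSERT INTO source VALUES (?, ?)", rows)

        # Build one SUM(CASE ...) expression per attribute value.
        # Each expression adds 1 for every row whose attrcol matches.
        count_columns = ", ".join(
            f"SUM(CASE WHEN attrcol = ? THEN 1 ELSE 0 END) AS colv{i}"
            for i in range(1, len(values) + 1)
        )
        sql = f"SELECT k, {count_columns} FROM source GROUP BY k ORDER BY k"

        # The attribute values are bound as parameters, in column order.
        return conn.execute(sql, list(values)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("K1", "v1"), ("K1", "v1"), ("K1", "v2"), ("K2", "v3")]
    for row in count_pivot(sample, ["v1", "v2", "v3"]):
        print(row)
```

With the sample data, key K1 gets counts of 2, 1, and 0 and key K2 gets 0, 0, and 1. Grouping by key and summing a 1-or-0 CASE expression per value is the set-based equivalent of incrementing target.colvN each time vN turns up.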
If all OSM projects are marketed this way, I can’t see them achieving much in the way of market penetration. A few committed OSM geeks in big corporate IT departments with deep pockets will probably launch “pilot” implementations that get institutionalized without enjoying wide adoption. The SMB market, where cost-effective software alternatives are needed most, will remain largely undeveloped because small and medium-sized businesses can’t afford to invest their own resources in “not ready for primetime” software, and would consider spending on consulting a high-risk, low-return proposition. OSM vendors should reconsider at least this aspect of their business.