MarkLogic in the Cloud


Anyone who knows me or reads this blog regularly can tell you that I'm totally sold on cloud computing. Recently, I wrote a little about MarkLogic and blogged about IaaS, and I got to thinking: MarkLogic should build an infrastructure service that runs in Amazon's cloud, scales to Internet levels, and comes with a pay-per-use sales model.

MarkLogic's native XML database is proven, mature, and deployed at many companies with household names; however, it isn't cheap. It is competitively priced, but still prohibitively expensive for many companies. The product is licensed per CPU socket (IIRC), and a license for a single-socket machine costs something like $10,000. That's a lot of zeros for small and medium businesses, especially when better-known and trusted relational products like MySQL, PostgreSQL, and SQL Server are available at no cost or for pennies on the dollar compared to MarkLogic. To many purse-string holders, switching from a tried-and-true relational database to a little-known database product from a less familiar vendor sounds really risky. If MarkLogic offered its wares as a service that organizations could consume over the Internet on a pay-per-use basis, reluctant companies could use MarkLogic's product in a real program that goes to production. That would let naysayers see that this alternative has a lot of benefits and that the risks are not as great as they fear.

Who better to create this infrastructure service than the software's creator? MarkLogic has already used its database to build a highly scalable and highly available system called MarkMail, which currently indexes and searches over 7,000 news groups. For example, they know the caching and proxying problems caused by Squid, which they used in the early release of that news group indexing service. They solved those problems and made the system scale, and I'm sure they could do the same with a general-purpose data storage service.

If you survey the landscape of infrastructure services running in the cloud today, your choices are few: Amazon's SimpleDB, Microsoft's SQL Data Services (eventually), or Google's App Engine datastore. (There might be others, but I don't know of them.) These services all have different, proprietary interfaces. If MarkLogic were to create an alternative, its interface would be standard XQuery 1.0. I don't know about you, but I would find a standardized API more appealing from both a business and a development point of view. From the former perspective, it would assure me that I'm not writing code that locks me into one vendor's service; from the latter, it would let me use existing knowledge, libraries, and tools rather than forcing me to use whatever the vendor provides or to build my own.

Considering the recent partnerships that MarkLogic's competitors IBM and Oracle have made with Amazon, I wonder whether MarkLogic can afford not to offer, at the very least, a cloud-compatible license and a pricing model that allows for elastic scalability. While this would be a great first step, I'm convinced that a full-blown data storage service would be adopted by many companies trying to store large amounts of semi-structured content under Internet-sized demands. If MarkLogic does go after this market, running the service in Amazon's cloud rather than someone else's would benefit others already running in Amazon's data centers, because data transferred within Amazon's private network is free. If MarkLogic does not provide a data storage service that is powered by a native XML datastore, supports a standardized query interface, scales to Internet levels, and is highly available, redundant, and performant, then someone else should build one using an open source alternative. I would be very interested in such a service regardless of who provides it.