June 2008 Archives

Lately, I've been doing a lot of investigation of native XML databases in general and MarkLogic in particular. This analysis was sparked by something I read in a research paper published last year by Michael Stonebraker et al., wherein they state that:

...major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines....We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's needs.

I found this paper on the Web site of the Very Large Data Base Endowment, an organization promoting and exchanging scholarly work related to databases, while looking for information about designing large-scale, terabyte-size databases.

Because the problems that I'm typically confronted with involve XML and Web services, I wondered:

  • What specialized XML database engines are available?
  • Are these specialized databases able to outperform an RDBMS in my scenarios, as Stonebraker and his colleagues concluded?

I surfed around and learned that there are two types of XML databases: hybrid and native. The difference is that a hybrid XML database shreds the XML into a relational model for storage, while a native one stores the information in its native document form. The former is what Stonebraker is advising against. To further explain the difference, a native XML database often exhibits the following characteristics (as described on XMLmind's Web site):

  • Their basic unit of storage is the XML document, the structure of which is preserved according to a data model such as the XML Infoset or the XQuery/XPath Data Model;
  • They accept any well-formed XML document irrespective of shape (a property sometimes called Schema independence); and
  • They support an XML-aware query language, typically XQuery, XPath and/or XSLT.
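
To make the hybrid/native distinction a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the sample document, the `item` table, and the function names are all made up, and the standard library's `sqlite3` and `xml.etree.ElementTree` merely stand in for a real relational engine and a real XML-aware query language. None of the products I list work exactly this way.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
DOC = """<order id="42">
  <customer>Acme</customer>
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="1"/>
</order>"""


def shred(doc, conn):
    """Hybrid style: break the document apart into relational rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item (order_id TEXT, sku TEXT, qty INTEGER)"
    )
    root = ET.fromstring(doc)
    for item in root.findall("item"):
        conn.execute(
            "INSERT INTO item VALUES (?, ?, ?)",
            (root.get("id"), item.get("sku"), int(item.get("qty"))),
        )


def query_shredded(conn):
    """Ask the question in SQL against the shredded rows."""
    rows = conn.execute("SELECT sku FROM item WHERE qty = 2 ORDER BY sku")
    return [sku for (sku,) in rows]


def query_native(doc):
    """Native style: keep the document whole and query its tree directly.

    ElementTree only supports a small XPath subset, so this is a stand-in
    for the XQuery/XPath a native XML database would offer.
    """
    root = ET.fromstring(doc)
    return [item.get("sku") for item in root.findall("item[@qty='2']")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(DOC, conn)
    print(query_shredded(conn))
    print(query_native(DOC))
```

Both queries answer the same question (which SKUs were ordered with a quantity of 2), but the shredded version only knows about the columns I chose up front, while the native version still has the whole document, including the `customer` element I never mapped to a table.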

For more information about native XML databases, check out Ronald Bourret's article "Going Native: Making the Case for XML Databases" on xml.com.

With this understanding, I compiled the following list of specialized native XML database servers. I intend to investigate these different products as time permits. (This list doesn't include embeddable engines, only standalone servers.) I'm sure there are others. If you know of any, please let me know.

Product                       Vendor                  Comment
----------------------------  ----------------------  ----------------------------------------------
DB2                           IBM
Documentum XML Store          EMC                     Formerly X-Hive/DB by X-Hive
eXist                         N/A                     Open source
MarkLogic Server              MarkLogic
MonetDB/XQuery                MonetDB                 Open source
Progress Sonic XML Server     Progress Sonic
Oracle Database               Oracle
SQL Server                    Microsoft               Support in the 2005 edition is very poor IMO and should not be used carelessly
TEXTML Server                 IXIA Software
TigerLogic XDMS               TigerLogic              Company was recently renamed from RainingData
XMS                           Xpriori
XQuantum XML Database Server  Cognetic Systems, Inc.
XStreamDB                     Bluestream

Of these, I've delved deepest into MarkLogic so far. I've been in contact with these guys a number of times via phone and email, posted messages on their developer list, installed their community edition (which is limited to a measly 100 MB), read a bunch of their documentation, and watched some of their webcasts. We may also have one of their consultants onsite in the near term to explain how we can use their product. In the coming weeks, I'll blog in more detail about MarkLogic, so stay tuned.