Recently in Databases Category

EF, LINQ to SQL, and TVP

| | Comments (0) | TrackBacks (0) |

SQL Server 2008 has a great new feature called Table Value Parameters (TVP). I think there are a lot of applications for this new technology, but what about using it with some of Redmond's other new inventions, namely Entity Framework (EF) or LINQ to SQL? Specifically, can you drag and drop a stored procedure from the Server Explorer on to Entity Data Model Designer (in the case of EF) or the Object Relational Designer (in the case of LINQ to SQL)? The answer to this question is yes, but with a catch. Once you try to build the solution, you'll get one of the following compilation errors:

  • DBML1005: Mapping between DbType 'Structured' and Type 'System.Object' in Parameter 'source_key_list' of Function 'dbo.stp_GetCustomerSK' is not supported.
  • The function 'stp_GetCustomerSK' has a parameter 'source_key_list' at parameter index 0 that has a data type 'table type' which is not supported, the function was excluded.

The first error is what you'll get when you're trying to use TVPs with LINQ to SQL and the second what you'll get when using EF. These errors were reported from Visual Studio 2008 SP 1.

This state of affairs reinforces my opinion about these two ORM technologies. I think EF is an immature version one technology that should be avoided in most cases. (Try back with version two or three.) In the latter case, I think that it should only be used on products with a short life span considering that Microsoft is no longer investing any R & D dollars into it and will probably deprecate it in the next release of the .NET framework.

RDF Gateway and RESTful Web Services

| | Comments (5) | TrackBacks (0) |

I was looking into RDF Gateway the other week, and found it to be a very interesting product. It provides an RDF store which can be accessed using SPARQL or T-SQL and persists its data in a SQL Server database. It also allows you to expose your SPARQL queries by creating RDF Server Pages (RSPs). This technology is similar in nature to ASP or JSP in that it mixes markup with SPARQL quires and a JavaScript-like language called RDF Query Language (RQL) similarly to the way ASP embeds JScript/VBScript in markup and JSP intermixes it with Java. RSP documents are interpreted by RDF Gateway, which runs as a Windows NT service. These scripts are requested over HTTP, making them accessible to a wide array of applications and programming environments. After just a cursory investigation, it isn't hard to see how one could use this product to expose an RDF store as a collection of RESTful Web services. In case you're having trouble visualizing this, have a look at the following figure that shows the overall architecture of how this might be done:

In my view, each RESTful service would have one RSP script. The client would indicate which action or operation they wanted to invoke by setting a parameter on the query string. The RSP page would grab this argument, and invoke the corresponding function to fulfill that specific type of request. I realize that this design would make the service an REST-RPC hybrid (using Richardson and Ruby's term which they coined in their book RESTful Web services), but it would avoid the need to use something like Squid simply to do URL rewriting (since RDF Gateway doesn't have such abilities natively).

To show how this would work, I'll describe the idiom used by the different RSP pages above. Because each service would have one RSP page, it would perform the role of "dispatcher." It would grab the action off the query string, switch on it, and call the appropriate function. To demonstrate how this might look, consider the following snippet of RQL code:

<%
import "operations/a1.rql";
import "operations/a2.rql";
import "/common.rql";
%>
<Response>
<%
Response.contentType = "text/xml";

var action = Request["action"]; // Dispatcher

switch (action)
{
case "a1":
a1();
break;
case "a2":
a2();
break;
default:
_Error(); // Imported from common.rql
break;
}
%>
</Response>

This code fragment is the service's entry point, its "main" if you will. To keep the code organized, each of the service's operations is fulfilled by different functions which are factored out into separate RQL script and imported in the main RSP page. Each of the RSP pages for the different services would be almost identical. (I can imagine a little code generator to produce this code and a plug-in for Visual Studio or other editors to call it.)

When the action is fired, it would process the body of the HTTP request, use SPARQL to update or query the RDF store, and return the results in XML or JSON format. (The snippet above precludes the use of JSON, but that could be easily be changed and which format should be returned could be based on the value of a query string argument, e.g., json=1.)

Like the format of the results, the contents of the request would be XML or JSON. In the latter case, I think that RQL's eval statement could be used to turn it into an object that could be processed directly. In the case of XML, a little munging would have to be done, and, I'm not sure what support RDF Gateway has for this. Once converted to objects that could be used in the RQL script, they would be passed to SPARQL to perform the operation-specific processing as shown in the following snippet:

function a1()
{
    var input;    

// RQL doesn't have a contentType property on their
// Request object, so we have to see if the input
// is JSON by trying to eval it. I realize that
// this makes the system susceptible to script injection,
// but this is just a POC.
//
// The object returned from eval and MungeXml contains
// two properties, lbound (lower bound) and ubound (upper
// bound).

try {
input = eval(Request.binaryRead());
input.lbound; input.ubound;
}

catch(ex)
{
input = MungeXml();
}

// Use input parameters in a SPARQL expression to update
// or query the datastore.

var rs = (SELECT TOP 5 ?p ?s ?o USING mydata
WHERE {?p ?s ?o} and between(?o, #(input.lbound),
#(input.ubound)));

    Response.Write("<data>");    

    for (; rs.EOF; rs.MoveNext())
    {
        Response.Write("<datum><p>");
        Response.Write(rs["p"]);
        Response.Write("</p><s>");
        Response.Write(rs["s"]);
        Response.Write("</s><o>");
        Response.Write(rs["o"]);        
        Response.Write("</o></datum>");
    }    

    Response.Write("</data>");
}

As I mentioned at the beginning of this post, I've only had time to perform a short investigation into RDF Gateway. So, if there are glaring problems or emissions with the ideas presented here, please let know as a learn more about this exciting product.

If you would like to get started with RDF Gateway, surf over to Intellidimension's Web site and request a 60 day trial license by emailing sales@intellidimension.com.

Number of Files for tempdb and User Databases

| | Comments (0) | TrackBacks (0) |

Part of optimizing the IO subsystem of a machine running Microsoft SQL Server is deciding how many files a user database and tempdb should be broken up into. Denny Cherry provides some guidelines to determine this in an article that he published earlier this year. In it, he provides a couple of formulas that can be used to mathematically determine the appropriate number of files depending on the hardware being used. In the following discussion of these equations, let C, O, and P be defined as follows:

C = Number of CPUs
O = Number of cores per CPU
P = Physical files

According to Cherry, the number of files that should make up a user database can be computed using this formula:

P = C * O / 4

For example, if you have dual dual cores, you would have one physical file for the user databases:

P = C * O / 4
P = 2 * 2 / 4
P = 1

Or, with a dual quad core machine, the number of physical files is two as determined thus:

P = C * O / 4
P = 2 * 4 / 4
P = 2

To compute the number of files that makeup tempdb, use this formula:

P = C * O

For example, if you have dual dual cores, you would have 4 files for tempdb:

P = C * O
P = 2 * 2
P = 4

Or, if you have dual quads, the number of physical files is 8:

P = C * O
P = 2 * 4
P = 8

Keep in mind that I'm talking about files here not filegroups. Filegroups can purportedly be used to increase parallelism as noted on Microsoft's Web site; however, I would take that suggestion with a grain of salt. In his book, Inside SQL Server 2005: The Storage Engine Kalen Delaney has this to say about using file groups to increase parallelism:

Most SQL Server databases have a single data file in the default filegroup....As a user acquires greater database sophistication, she might want to use multiple devices to spread out the IO...The easiest way to do this is to create the database file on a RAID device. Still no need for filegroups...the user might decide that she really wants multiple files...In this case, she still does not need the filegroups — she can accomplish her goals by using create database with a list of files in separate drives.


More sophisticated database administrators might decide to have different tables assigned to different drives or to use the table and index partition feature of SQL Server 2005. Only then they will need to use filegroups...

There's usually no performance benefit in doing so.

To make this distinction clear, what I'm talking about here is a create statement such as this one:

CREATE DATABASE Archive
ON
PRIMARY
(NAME = Arch1,
FILENAME = 'W:\archdat1.mdf', -- Drive one: W
SIZE = 100MB,
MAXSIZE = 200,
FILEGROWTH = 20),
(NAME = Arch2,
FILENAME = 'X:\archdat2.ndf', -- Drive two: X
SIZE = 100MB,
MAXSIZE = 200,
FILEGROWTH = 20),
(NAME = Arch3,
FILENAME = 'Y:\archdat3.ndf', -- Drive three: Y
SIZE = 100MB,
MAXSIZE = 200,
FILEGROWTH = 20)
LOG ON
(NAME = Archlog1,
FILENAME = 'Z:\archlog1.ldf', -- Drive four: Z
SIZE = 100MB,
MAXSIZE = 200,
FILEGROWTH = 20),
(NAME = Archlog2,
FILENAME = 'Z:\archlog2.ldf', -- Drive four: Z
SIZE = 100MB,
MAXSIZE = 200,
FILEGROWTH = 20)

In doing this, you are instructing SQL Server to create different files to store your data. You are not telling it what objects to put in which files, however. This can only be done with filegorups. Rather, SQL Server decides which files to put which things in. If each of these files is on a different drive, or array of drives, you can achieve greater performance as Delaney points out.

Why Native XML Databases?

| | Comments (0) | TrackBacks (0) |

Lately, I've been doing a lot of investigation of native XML databases in general and MarkLogic in particular. The motivation for this analysis was sparked by something I read in a research paper that was published last year by Michael Stonebraker et al wherein they state that:

...major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines....We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's needs.

I found this paper on the Web site of the Very Large Data Base Endowment, an organization promoting and exchanging scholarly work related to databases, while looking for information about designing large-scale, terabyte-size databases.

Because the problems that I'm typically confronted with involve XML and Web services, I wondered:

  • What specialized XML database engines are available?
  • Are these specialized databases able to outperform RDBMS in my scenarios as Stonebraker and his colleagues concluded?

I surfed around a learned that there are two types of XML databases: Hybrid and native XML databases. The difference between these two is an XML database that uses a hybrid architecture shreds the XML into a relational model for storage while a native one stores the information in its native document form. The former is what Stonebraker is advising against. To further explain the difference, a native XML database often exhibits the following characterizes (as described on XMLmind's Web site):

  • Their basic unit of storage is the XML document, the structure of which is preserved according to a data model such as the XML Infoset or the XQuery/XPath Data Model;
  • They accept any well-formed XML document irrespective of shape (a property sometimes called Schema independence); and
  • They support an XML-aware query language, typically XQuery, XPath and/or XSLT.

For more information about native XML databases, check out Ronald Bourret's article Going Native: Making the Case for XML Databases on xml.com.

With this understanding, I compiled the following list of specialized native XML database servers. I intend to investigate these different products as time permits. (This list doesn't include embeddable engines, only standalone servers.) I'm sure there are others. If you know of any, please let me know.

Product

Vendor

Comment

DB2

IBM

Documentum XML Store

EMC

Formerly X-Hive/DB by X-Hive

eXist

N/A

Open source

MarkLogic Server

MarkLogic

MonetDB/XQuery

MonetDB

Open source

Progress Sonic XML Server

Progress Sonic

Oracle Database

Oracle

SQL Server

Microsoft

Support in the 2005 edition is very poor IMO and should not be used carelessly

TEXTML Server

IXIA Software

TigerLogic XDMS

TigerLogic

Company was recently renamed from RainingData

XMS

Xpriori

XQuantum XML Database Server

Cognetic Systems, Inc.

XStreamDB

Bluestream

Of these, I've delved into MarkLogic the deepest so far. I've been in contact with these guys a number of times via phone and email, I've posted messages on their developer list, installed their community edition (which is limited to a measly 100 MB), read a bunch of their documentation, and watched some of their Web casts. We may have one of their consultants onsite in the near term as well to explain how we can utilize their product. In the coming weeks, I'll blog about more details about MarkLogic, so stay tunes.

Webinar on MarkLogic Coming Up

| | Comments (1) | TrackBacks (0) |

MarkLogic will be conducting a Webinar this week on May 13th about using their native XML database to manage information in an increasingly markup-laden world. To attend the lecture, surf over to their Web site.

« DAM | Main Index | Archives | SOA »