October 2009 Archives

In a previous post, I talked about candidate architectures for federated identity systems. My current favorite includes a single IP-STS that issues identity-only claims that are globally unique across all security realms. Because most applications will require more than just identity claims to make an access control decision, the architecture includes an RP-STS per realm. These token services translate globally unique user attributes into ones that are specific to him/her in each of these fiefdoms. The additional claims allow the applications in these realms to determine if the subject should be allowed access to their resources. A diagram of this arrangement doesn't have many boxes or arrows (as you can see in the first figure); however, it gets more complicated when you think about what it would require to solve some fairly common use cases. For instance, how does a Web application in one realm call a Web service in another? This need may arise when one BU owns the Web front-end and another owns the back-end service. Each application in these two BUs (i.e., security realms) would trust different RP-STSs. So, how (on paper) can this use case be solved with the previously described candidate architecture?


In order for this to work, the Web front-end RP 1 needs to trust RP-STS 1; RP-STS 1 must trust the IP-STS. Also, the Web service RP 2 trusts RP-STS 2. The RP-STSs must trust each other. These relationships can be seen in the second figure.



That's a few more lines, and we still haven't shown the information flows ;-)


Given these trusts, the chain of events that results is as follows: When an unauthenticated user accesses RP 1, he is redirected to RP-STS 1 (via WS-Federation). From there, he is redirected again to the IP-STS (also on the front-channel). The user authenticates, and the IP-STS issues him a security token ST0 containing identity-only claims. It redirects him back to RP-STS 1. That STS accepts ST0 as proof of the caller's identity (because it trusts the IP-STS), and issues him a new token ST1 which contains application-specific claims that RP 1 will need to authorize his access. It is very important to note that at this point ST0 is gone. RP-STS 1 may have copied the identity claims into ST1, but for all practical purposes ST0 with its signature, claims, and whatnot is completely gone. ST1 is returned to the user, and relayed to RP 1.


Now, our Web front-end RP 1 wants to call the back-end Web service RP 2 to get some additional resource. To do so, RP 1 needs a token for the user that is specific to RP 2. This RP only trusts RP-STS 2, so RP 1 must request an ActAs token from RP-STS 2 (using WS-Trust). To get this, RP 1 must be able to authenticate (e.g., using an X.509 cert) and send it ST1, which it got from an issuer that RP-STS 2 trusts, namely, RP-STS 1. RP-STS 2 verifies the token and performs authorization checks to ensure that RP 1 and ST1 can be used to get an ActAs token for RP 2. If so, RP-STS 2 transforms ST1 into an ActAs token ST2 that contains application-specific claims for the user accessing RP 2 (via RP 1). ST2 is returned to RP 1, which includes it with the request sent to RP 2. RP 2 trusts the issuer, so it accepts the claims in it and uses them to determine if the user is allowed access.
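To make the hand-offs easier to follow, here is a toy model of the trust chain in plain JavaScript. It is only a sketch: plain objects stand in for signed tokens, RP 1's own authentication to RP-STS 2 is left out, and every name in it (makeSts, issue, the claim names and values) is hypothetical. It is not WS-Federation or WS-Trust code.

```javascript
// Toy model of the trust chain. Plain objects stand in for signed tokens,
// and all names (makeSts, issue, the claim names) are hypothetical.
function makeSts(name, trustedIssuers, transform) {
  return {
    name: name,
    // Issue a new token. An STS that lists trusted issuers only accepts
    // an incoming token that one of those issuers produced.
    issue: function (incoming, subjectClaims) {
      if (trustedIssuers.length > 0) {
        if (!incoming || trustedIssuers.indexOf(incoming.issuer) === -1) {
          throw new Error(name + " does not trust " +
            (incoming ? incoming.issuer : "an anonymous caller"));
        }
      }
      var source = incoming ? incoming.claims : subjectClaims;
      // The incoming token itself is discarded; only claims carry over.
      return { issuer: name, claims: transform(source) };
    }
  };
}

var ipSts = makeSts("IP-STS", [], function (c) {
  return { userId: c.userId };                       // identity-only claims
});
var rpSts1 = makeSts("RP-STS 1", ["IP-STS"], function (c) {
  return { userId: c.userId, frontEndRole: "customer" };
});
var rpSts2 = makeSts("RP-STS 2", ["RP-STS 1"], function (c) {
  return { userId: c.userId, backEndQuota: 10 };
});

var st0 = ipSts.issue(null, { userId: "alice" });    // user authenticates
var st1 = rpSts1.issue(st0);                         // token for RP 1
var st2 = rpSts2.issue(st1);                         // ActAs-style token for RP 2

console.log(st2.issuer, st2.claims);
```

Note that handing ST0 straight to RP-STS 2 fails in this model, because RP-STS 2 trusts only RP-STS 1. That mirrors the point above that ST0 plays no part once ST1 has been issued.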


This sequence of events is shown in the final figure:

I know what you're thinking: This is maddening. Anyone that does this is psycho. You're right that this is complicated stuff. (Anyone who tells you differently is trying to sell you something.) If you're dealing with financial data, healthcare records, government secrets, or other sensitive information, however, what alternatives do you have? Let me know if you can think of a better way.

Part of the MBA program that I'm enrolled in involves taking a number of economics, accounting, and finance classes. I'm starting to use the knowledge I've gained from them to purchase stocks. One thing about investing that I'm finding is the importance of having good information. I'm sure that the pros have tons of tricks, and maybe I'll learn some of them over the years. One that occurred to me already was to look for companies that are about to go public. One way to find this information is to watch for companies filing an S-1 form with the SEC. The SEC makes these listings available in Atom format, so it's pretty easy to stay on top of it. When you subscribe to a feed of S-1 filings, however, the results include amendments and updates to previously submitted S-1 filings. I wanted to remove these false positives. To do this, I used YQL and Yahoo! Pipes.

I've been using Yahoo! Pipes for a long time for trivial data munging; however, this time, I could not find an easy way to do what I needed using this service. I had heard of YQL, but never dug into it. Unable to use Yahoo! Pipes in and of itself, I thought I'd see if YQL could help before turning to Perl or Python. In doing so, I found that YQL is simple yet powerful. Here's what I had to do to turn the SEC's feed into the one that I really wanted.

YQL stands for Yahoo! Query Language, and, as its name states, it's all about querying data. Yahoo! gives you lots of data sets that you can query, but YQL also supports a way of querying your own data (or the SEC's as the case may be). These non-Yahoo! data sets are called Open Data Tables, and many people are extending YQL using them. An Open Data Table is just an XML document that contains a few elements, the most significant being the execute element. As others have described, this element contains some JavaScript, but not just any old JavaScript; the element contains ECMAScript for XML (E4X). E4X is JavaScript with the ability to embed XML literals directly in the code, and it includes syntactic sugar to make working with XML much easier (kinda like in VB.NET but with neutral sweeteners not LINQ). In YQL, E4X is the standard stuff with additional objects that Yahoo! has added. One of these, the y global object, includes helpers that allow you to easily call RESTful Web services with no effort whatsoever (relatively speaking). I used this to get the SEC's feed and begin munging it in my custom Open Data Table.

The actual Open Data Table XML document is pretty boring stuff except for the url element(s) and the E4X script in the execute element. The url element(s) contains the location(s) of the data you want to pull into your script; the execute element contains the code to process it. Here's the only bit of code I had to write to remove the amendments and alter the SEC's feed more to my liking:


        default xml namespace = "http://www.w3.org/2005/Atom";

        var xml = request.get().response; // Call the URL defined in the url element
        var entries = <entries/>;         // Container for the filtered entries

        y.log("Called SEC Web site and about to iterate over results.");

        for each (var entry in xml.entry)
        {
            // Include only S-1 filings
            if (entry.category.@term.toString() === "S-1")
            {
                y.log("Adding S-1 filing: " + entry.title);

                // Point at the plain-text version of the filing
                var link = entry.link.@href.toString().replace('-index.htm', '.txt');

                y.log("Link to filing's plain text version: " + link);

                // Strip everything from the first " (" onward in the title
                var newEntry = <entry>
                    <link rel="alternate" type="text/plain" href={link}/>
                    <title>{entry.title.toString().replace(/ \(.*/, "")}</title>
                </entry>;

                y.log("Adding entry to collection of filings");
                entries.* += newEntry;
            }
        }

        response.object = entries;


This trivial little snippet does what every program does: get some input (by fetching the feed from the stipulated URL), process it, and output the results. The syntax and objects provided are so high-level, though, that this cannot be much easier. The entire Open Data Table can be found here and you can see the result in the YQL Console. One really important thing about writing and debugging these scripts is that you should tack on "?debug=true" to the URL of the YQL Console. Without this, YQL will cache stuff, making development almost impossible.
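E4X is not available in current mainstream JavaScript engines, so if you want to reason about the filtering logic outside of YQL, a plain-JavaScript sketch over ordinary objects may help. This is an illustration only: the function name keepNewS1Filings is hypothetical, the sample entries are made up, and the "S-1/A" term used for the amended filing is an assumption about how the feed labels amendments.

```javascript
// Plain-JavaScript sketch of the same filter. The entry objects and their
// values are made up; only the field names mirror the Atom elements.
function keepNewS1Filings(entries) {
  return entries
    .filter(function (entry) {
      // Exact match, so any entry with a different term (for example an
      // amendment, assumed here to be "S-1/A") is dropped.
      return entry.category === "S-1";
    })
    .map(function (entry) {
      return {
        // Point at the plain-text version of the filing
        link: entry.link.replace("-index.htm", ".txt"),
        // Drop everything from the first " (" onward
        title: entry.title.replace(/ \(.*/, "")
      };
    });
}

var sample = [
  { category: "S-1",
    title: "S-1 - Example Corp (0000000000) (Filer)",
    link: "https://example.com/0001-index.htm" },
  { category: "S-1/A",
    title: "S-1/A - Other Corp (0000000001) (Filer)",
    link: "https://example.com/0002-index.htm" }
];

var filtered = keepNewS1Filings(sample);
console.log(filtered);
```

Only the first sample entry survives, with its link rewritten to end in ".txt" and its title trimmed to "S-1 - Example Corp".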

One really sucky part IMO about YQL is that it places your output in an "envelope" that can't be removed. What I mean by this is that the output of any YQL query is some XML surrounding whatever XML you generated from the script in the execute element. In my case, I started with Atom and wanted to end with Atom, so I could keep an eye out for new IPOs in my blog reader. Because of this limitation, I had to use Yahoo! Pipes in the end after all. The pipe is very simple; it contains a YQL module followed by a Sub-element module that picks out the entry element I created.

As you can see, YQL helps do some pretty cool things with relative ease. If you haven't checked it out, I would recommend that you do. Start by visiting the YQL developer site, and let me know if you have questions, thoughts, or other YQL experiences by leaving a comment or by contacting me. Lastly, if you want to subscribe to the feed of S-1 filings, you can find it here (no promises about uptime or availability).

  1. Run KeyTool IUI by running <root after unpacking>\ktl241sta\run_ktl.bat
  2. Create a new PKCS12 keystore
    1. Click View, Select task, Create, Keystore



    2. Select PKCS12 as the format
    3. Select a target
    4. Specify a password (optional)



    5. Click OK
  3. Import the key from the JKS keystore into the new one that you just created
    1. Click View, Select task, Import, Keystore's entry, Private key, From other keystore



    2. Select the JKS keystore
    3. Specify its password (optional). JKS keystores have a default password of "changeit" (from what I gather)
    4. Select the PKCS12 format for the target keystore
    5. Specify the target keystore file
    6. Enter a password for the target keystore (optional)
    7. Click OK



    8. Select the alias(es) of the key(s) you want to import into the empty PKCS12 keystore from the JKS keystore
    9. Enter the password for the key ("changeit" by default)



    10. Wave to your little Java friend up there in the corner
    11. Click OK
    12. Enter in the new private key's alias (to become part of the CN?)
    13. Enter and confirm a password for the key (optional)
    14. Wave again
    15. Click OK
  4. Import the key in the new PKCS12 keystore into the Windows Certificate Store
    1. Start the MMC and add the cert snap-in for the local computer account
    2. Expand the "Certificates (Local Computer)" node
    3. Right-click the Personal node, select All tasks, and choose Import
    4. Click Next (missing your little Java friend?)
    5. Click Browse
    6. Change the file filter to "Personal Information Exchange (*.pfx;*.p12)"
    7. Select the PKCS12 keystore you created in step 2 and click Open



    8. Click Next
    9. Enter the password for the keystore if you created one; otherwise leave it blank
    10. Mark the private key as exportable if you want to get it out of the Windows Cert Store at a later time
    11. Click Next (Java Dude, where are you?!)
    12. Use the default Cert store (Personal) and click Next
    13. Click Finish!


    Your cert will be in the Certificates subdirectory; its thumbprint should match the output of keytool:



    Thanks Duke!!!



Over the last year, I've been designing and implementing various candidate architectures for a federated identity management system. Quite honestly, when I started, I didn't know much about these types of systems. Not surprisingly, my thoughts have really evolved and changed as I've learned more about them. I think there are many other people who are just starting to think about federated identity as well, so I thought I would explain the progression of my thoughts in hopes that it will help shape yours more quickly.

Architecture 1 - A Single STS with Centrally Stored Claims

Initially, the candidate architecture that I came up with had a single STS. I imagined that this lone service would issue all types of claims for all of its RPs. This meant that the STS would need a copy of all claim types and values that the RPs would need to authorize access. As a result, this first architecture included a claims service. This service would provide a way to register an RP with the STS, configure which claims it required, and provide a way for RPs to publish their values into a claims store that the STS would then draw from when issuing security tokens.

As you can see in the first figure, this architecture would include an STS that would delegate authentication to an IdP. The STS would get some identity-related claims from the IdP, but it would also get application-specific claims from the claims service. This solution has obvious flaws such as massive storage requirements, synchronization issues, etc. I abandoned this idea for a more decentralized approach.

Architecture 2 - A Single STS that Queries RPs for Claims as Needed

Even with automated synchronization, the first architecture was fatally flawed. When the second beta of ADFS v. 2 (PKA Geneva Server) came out, it included a little domain-specific language (DSL) that allowed attributes to be retrieved from various attribute stores, including AD DS, LDAP directories, SQL Server databases, and more. These attribute stores could be configured on an RP-by-RP basis. With this capability, I built a system that fetched identity-related claims from a central directory and queried the RPs' data stores for additional application-specific values JIT. These were burnt into the security token that ADFS issued and sent back to the RPs. It ended up being some sort of weird circle as you can see in the second figure.

This design is a pretty clear progression from the first, but it still has some real problems. Chief among them is the odd back-channel thing. It just feels wrong to have the STS that is supposed to provide centralized identity management dipping into applications' databases for stuff that it's just going to turn around and give them right back. By having a centralized STS, identity information should not be as strewn around the enterprise as it often is today. This STS should be the single source for identities. With this candidate architecture, the authoritative sources of identity remain the various application databases.

Architecture 3 - A Single STS that Issues Identity Claims Only

The third design I came up with was one that still had just one STS. This STS was different from the others though. It was an Identity Provider STS (IP-STS). This distinction meant it would only issue identity-related claims - global properties that described a user regardless of the application being accessed. This architecture meant the weird back-channel thing was no longer needed, and that the IP-STS was the single authority for identities; however, with only identity-related claims, the RPs would often not have enough information to make an access control decision. Enterprise applications today need thousands of permission-granting attributes to authorize access. With just the globally unique attributes issued by the IP-STS - things such as user ID, first name, last name, address, etc. - many applications would not have enough to determine if the subject should be allowed access. For this reason, the RPs would need to get additional attributes out of their own databases using the user ID as a key. After doing so, they would have enough information to make their authorization decisions.

This architecture is fine, and will be used by many companies I imagine. There is one though that I think is better.

Architecture 4 - An IP-STS and an RP-STS

An RP-STS is a relying party STS or resource provider STS (depending on who you ask). It is a lightweight STS that is married to an RP. These two components often share a database, but they don't necessarily have to. The RP trusts the RP-STS and the RP-STS trusts the IP-STS. When an unauthenticated user accesses the RP, he is redirected to the RP-STS and again to the IP-STS. There the user logs in, gets an identity token, and sends it back to the RP-STS. This service uses the user ID claim in it to look up application-specific values that it knows the RP is going to need to authorize access. (It will almost certainly copy the identity-related claims into the new token as well.) It then returns this to the subject who sends it on to the RP. This expanded set of claims is all the RP needs. It doesn't have to query its database; it can authorize the user and return the resource if allowed.
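The lookup the RP-STS performs can be pictured with a small sketch in plain JavaScript. Everything here is an assumption made for illustration: the store, the function names issueRpToken and canApprove, and the claim names and values are all hypothetical, and plain objects stand in for real signed tokens.

```javascript
// Hypothetical per-application store, keyed by the user ID claim.
var appClaimsByUserId = new Map([
  ["u-1001", { role: "approver", spendingLimit: 5000 }]
]);

// Build the token the RP-STS returns: the identity claims are copied over
// and the application-specific claims found for that user are added.
function issueRpToken(identityToken, store) {
  var userId = identityToken.claims.userId;
  var appClaims = store.get(userId);
  if (appClaims === undefined) {
    throw new Error("No application claims provisioned for " + userId);
  }
  return {
    issuer: "RP-STS",
    claims: Object.assign({}, identityToken.claims, appClaims)
  };
}

// The RP decides from the token alone, without querying its database.
function canApprove(rpToken, amount) {
  return rpToken.claims.role === "approver" &&
         amount <= rpToken.claims.spendingLimit;
}

var identityToken = {
  issuer: "IP-STS",
  claims: { userId: "u-1001", firstName: "Pat" }
};
var rpToken = issueRpToken(identityToken, appClaimsByUserId);
console.log(canApprove(rpToken, 1200));
```

In Architecture 3 the equivalent of the store lookup would happen inside the RP on every request; here it happens once, in the RP-STS, and the result travels with the token.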

This candidate architecture is common in the literature on this topic and obvious to my heroes - Hervey Wilson, Michele Bustamante, Vittorio Bertocci, Dominick Baier - and other experts in this field; however, it took me a long time to see it :-( Now that I have, however, I've also seen that it introduces a number of issues related to trust relationships, user provisioning, key management, and more. I'll try to find some time to blog about those in the future; if you're wondering about them before I do, feel free to get in touch with me.

I presented to the Portland Web Innovators user group tonight. I talked about a Twitter bot that I created a while ago called @tweetybot. It is a normal Twitter user that, believe it or not, will make phone calls when it receives direct messages. Ultimately, what I did was trivial; the magic and heavy lifting is being done by Twilio, a Web service that hides all the complicated telecom stuff behind a simple RESTful API. You can read more about it on my previous blog post.

Here's the deck I presented just in case you missed the show:

I would like to especially thank Portland Web Innovators for giving me the opportunity to talk, Portland Incubator Experiment (PIE) for hosting the event, Twilio for providing a killer service and some really nice swag, and all those who attended.