Finding IPOs using YQL and Yahoo! Pipes


Part of the MBA program that I'm enrolled in involves taking a number of economics, accounting, and finance classes. I'm starting to use the knowledge I've gained from them to purchase stocks. One thing I'm finding about investing is how important it is to have good information. I'm sure that the pros have tons of tricks, and maybe I'll learn some of them over the years. One that occurred to me already was to look for companies that are about to go public. One way to find this information is to watch for companies filing an S-1 form with the SEC. The SEC makes these listings available in Atom format, so it's pretty easy to stay on top of them. When you subscribe to a feed of S-1 filings, however, the results include amendments and updates to previously submitted S-1 filings. I wanted to remove these false positives. To do this, I used YQL and Yahoo! Pipes.

I've been using Yahoo! Pipes for a long time for trivial data munging; however, this time, I could not find an easy way to do what I needed using this service. I had heard of YQL, but never dug into it. Unable to use Yahoo! Pipes in and of itself, I thought I'd see if YQL could help before turning to Perl or Python. In doing so, I found that YQL is simple yet powerful. Here's what I had to do to turn the SEC's feed into the one that I really wanted.

YQL stands for Yahoo! Query Language, and, as its name states, it's all about querying data. Yahoo! gives you lots of data sets that you can query, but YQL also supports a way of querying your own data (or the SEC's, as the case may be). These non-Yahoo! data sets are called Open Data Tables, and many people are extending YQL using them. An Open Data Table is just an XML document that contains a few elements, the most significant being the execute element. As others have described, this element contains some JavaScript, but not just any old JavaScript; the element contains ECMAScript for XML (E4X). E4X is JavaScript with the ability to embed XML literals directly in the code, and it includes syntactic sugar that makes working with XML much easier (kinda like in VB.NET but with neutral sweeteners not LINQ). In YQL, E4X is the standard stuff plus additional objects that Yahoo! has added. One of these, the y global object, includes helpers that let you call RESTful Web services with almost no effort (relatively speaking). I used this to get the SEC's feed and begin munging it in my custom Open Data Table.

The actual Open Data Table XML document is pretty boring stuff except for the url element(s) and the E4X script in the execute element. The url element(s) contains the location(s) of the data you want to pull into your script; the execute element contains the code to process it. Here's the only bit of code I had to write to remove the amendments and alter the SEC's feed more to my liking:

    <execute><![CDATA[
        // Make Atom the namespace for all XML lookups and literals below
        default xml namespace = "http://www.w3.org/2005/Atom";

        var xml = request.get().response; // Call the URL defined in the url element
        var entries = <entries/>;

        y.log("Called SEC Web site and about to iterate over results.");

        for each (var entry in xml.entry)
        {
            // Include only S-1 filings
            if (entry.category.@term.toString() === "S-1")
            {
                y.log("Adding S-1 filing: " + entry.title);

                // Point at the plain text version of the filing instead of its index page
                var link = entry.link.@href.toString().replace('-index.htm', '.txt');

                y.log("Link to filing's plain text version: " + link);

                // Strip everything from the first " (" onward in the title;
                // Atom's timestamp element is named "updated"
                var newEntry = <entry>
                    <link rel="alternate" type="text/plain" href={link}/>
                    <title>{entry.title.toString().replace(/ \(.*/, "")}</title>
                    <updated>{entry.updated.text()}</updated>
                </entry>;

                y.log("Adding entry to collection of filings");
                entries.* += newEntry;
            }
        }

        response.object = entries;
    ]]></execute>


This trivial little snippet does what every program does: get some input (by fetching the feed from the stipulated URL), process it, and output the results. The syntax and objects provided are so high-level, though, that this could not be much easier. The entire Open Data Table can be found here and you can see the result in the YQL Console. One really important thing about writing and debugging these scripts is that you should tack "?debug=true" onto the URL of the YQL Console. Without this, YQL will cache stuff, making development almost impossible.
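
If you want to check the filtering logic away from the YQL Console, the same three steps can be sketched in Python with only the standard library. This is a rough equivalent, not what runs inside YQL. The function name filter_s1, the sample feed, the example.com URLs, and the "S-1/A" term used for the amendment are all made up for illustration; a real SEC feed may differ in its details.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}

# Hypothetical sample data standing in for the SEC's feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>S-1 - Example Corp (0000000001) (Filer)</title>
    <link rel="alternate" type="text/html"
          href="http://example.com/filing/0001-index.htm"/>
    <category term="S-1"/>
    <updated>2010-01-01T00:00:00-05:00</updated>
  </entry>
  <entry>
    <title>S-1/A - Other Inc (0000000002) (Filer)</title>
    <link rel="alternate" type="text/html"
          href="http://example.com/filing/0002-index.htm"/>
    <category term="S-1/A"/>
    <updated>2010-01-02T00:00:00-05:00</updated>
  </entry>
</feed>"""


def filter_s1(feed_xml):
    """Return an <entries> element holding only entries whose term is exactly S-1."""
    feed = ET.fromstring(feed_xml)
    entries = ET.Element("{%s}entries" % ATOM)
    for entry in feed.findall("atom:entry", NS):
        category = entry.find("atom:category", NS)
        if category is None or category.get("term") != "S-1":
            continue  # amendments and other form types are dropped
        link = entry.find("atom:link", NS)
        href = link.get("href", "") if link is not None else ""
        title = entry.findtext("atom:title", default="", namespaces=NS)
        updated = entry.findtext("atom:updated", default="", namespaces=NS)

        new_entry = ET.SubElement(entries, "{%s}entry" % ATOM)
        # Point at the plain text filing rather than its index page.
        ET.SubElement(new_entry, "{%s}link" % ATOM, rel="alternate",
                      type="text/plain",
                      href=href.replace("-index.htm", ".txt", 1))
        # Drop everything from the first " (" onward in the title.
        ET.SubElement(new_entry, "{%s}title" % ATOM).text = re.sub(
            r" \(.*", "", title, count=1)
        ET.SubElement(new_entry, "{%s}updated" % ATOM).text = updated
    return entries


if __name__ == "__main__":
    ET.register_namespace("", ATOM)
    print(ET.tostring(filter_s1(SAMPLE_FEED), encoding="unicode"))
```

Comparing the category term for exact equality is what keeps the amendment out, just as the === test does in the E4X version.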

One really sucky part IMO about YQL is that it places your output in an "envelope" that can't be removed. What I mean by this is that the output of any YQL query is some XML surrounding whatever XML you generated from the script in the execute element. In my case, I started with Atom and wanted to end with Atom, so I could keep an eye out for new IPOs in my blog reader. Because of this limitation, I had to use Yahoo! Pipes in the end after all. The pipe is very simple; it contains a YQL module followed by a Sub-element module that picks out the entry element I created.
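
For anyone who would rather strip the envelope in code than in Pipes, here is a minimal Python sketch of the same idea as the Sub-element module: ignore the wrapper and keep only the Atom entry elements. The envelope shape shown (a query element containing a results element) is my assumption about what YQL returns, and the function name unwrap_entries and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM = "http://www.w3.org/2005/Atom"

# Assumed shape of a YQL response wrapping the output of the execute script.
SAMPLE_ENVELOPE = """<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
       yahoo:count="1">
  <results>
    <entries xmlns="http://www.w3.org/2005/Atom">
      <entry>
        <link rel="alternate" type="text/plain"
              href="http://example.com/filing/0001.txt"/>
        <title>S-1 - Example Corp</title>
        <updated>2010-01-01T00:00:00-05:00</updated>
      </entry>
    </entries>
  </results>
</query>"""


def unwrap_entries(envelope_xml: str) -> List[ET.Element]:
    """Return every Atom entry element found anywhere inside the envelope."""
    root = ET.fromstring(envelope_xml)
    return list(root.iter("{%s}entry" % ATOM))


if __name__ == "__main__":
    for entry in unwrap_entries(SAMPLE_ENVELOPE):
        print(entry.findtext("{%s}title" % ATOM))
```

Searching by the fully qualified Atom name means the code does not care how deeply the entries are nested, so it should keep working if the wrapper differs from what I assumed.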

As you can see, YQL helps do some pretty cool things with relative ease. If you haven't checked it out, I would recommend that you do. Start by visiting the YQL developer site, and let me know if you have questions, thoughts, or other YQL experiences by leaving a comment or by contacting me. Lastly, if you want to subscribe to the feed of S-1 filings, you can find it here (no promises about uptime or availability).