Creating a Twitter Bot that Literally Speaks

I have been really reluctant to start using Twitter because of what happened when I discovered IRC years ago: I wasted hours and hours chatting with strangers about absolutely nothing.  I was at a Web Innovators meeting the other day, and the speaker, Rick Turoczy, asked who in the room was not using Twitter yet. Of the fifty or so attendees, only one other soul besides me raised their hand.  So, despite the potential waste, I felt compelled to contribute some noise of my own.

A few weeks after signing up, I was starting to find it worthless.  But then an entrepreneur friend sent me a link to various Twitter tools, and she got me thinking about the potential utility of this social networking phenomenon.  Inspired by these tools and Kevin Kelly's talk about what is in store for the Web over the next 5,000 days, I thought I would create a Twitter robot that connected a few different Web services to do something interesting.  The result demonstrates the marvel of cloud and utility computing.

One system that I wanted to use was a new service called Twilio that allows you to automatically make phone calls using a text-to-speech interface.  This service runs in Amazon's cloud, and allows you to initiate phone calls using a RESTful API.  I decided to use this service from the bot as follows: Any time one of the bot's followers sends it a direct message with their phone number, it calls them.  It asks them to record a greeting, and then it updates its status with a URL to the recorded greeting.


Before I talk about all the technical details of how I implemented this, let me describe the big picture:

  • Followers of the bot send it a direct message of the form "callme NNN-NNN-NNNN", the N's being the Twitter user's American phone number. (I don't know if Twilio works with non-American numbers.)
  • Twitter sends an email to the address of the bot with some special headers and the direct message in the body of the mail.
  • Procmail is configured to pipe all emails with these Twitter-specific headers to a script.
  • This script initiates a call with Twilio.
  • Twilio makes a phone call and records the message that the recipient leaves.
  • Twilio invokes a callback script and provides a URL to the recording.
  • This callback handler updates the bot's status with this URL, so that followers can click on it and hear the greeting.
This sequence of actions is illustrated in the following diagram:


This figure isn't accurate in a few different respects, but it gives the general idea.

Technical Details

To begin with, I needed to create a new Twitter account for the bot. I chose @tweetybot.  (While creating this new account, I found that Twitter doesn't allow multiple accounts to reuse the same email.  You can work around this using sub-addressing.)  I configured the account such that an email would be sent to [email protected] every time someone sent a direct message.  On the email server, I added the following procmail recipe:

* X-Twitterrecipientname: tweetybot
* X-Twitteremailtype: direct_message
This results in an email being piped into the script with the direct message in the body.  What about the sub-address?  It isn't being used.  I included it, so that I could reuse my PSU email address. However, if I create another bot, I could update the recipe like this (IINM):

ARG = $1

* X-Twitterrecipientname: tweetybot
* X-Twitteremailtype: direct_message
* ARG ?? ^^tweetybot^^

* X-Twitterrecipientname: bot2
* X-Twitteremailtype: direct_message
* ARG ?? ^^bot2^^

By doing this, the X-Twitterrecipientname conditions could be omitted, or they could be left and the sub-addressing-related stuff (the ARG assignment and the ARG ?? conditions) could be removed.  Either way would work (I think).
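To make the filtering step concrete, here is a minimal Python sketch of roughly the same checks the recipe performs on the headers. It is not how procmail works internally (procmail matches regular expressions, whereas this compares values exactly), and the function name and the sample message are made up for illustration; only the two header names and their values come from the recipe above.

```python
from email import message_from_string


def is_direct_message_for(raw_email, bot_name):
    """Return True if the email looks like a Twitter direct-message
    notification addressed to bot_name."""
    msg = message_from_string(raw_email)
    # Header lookups in the email package are case-insensitive.
    recipient = msg.get("X-Twitterrecipientname", "")
    email_type = msg.get("X-Twitteremailtype", "")
    return recipient.strip() == bot_name and email_type.strip() == "direct_message"


# Hypothetical notification email; the address and phone number are placeholders.
sample = (
    "From: twitter@example.com\n"
    "X-Twitterrecipientname: tweetybot\n"
    "X-Twitteremailtype: direct_message\n"
    "\n"
    "callme 503-555-0100\n"
)

print(is_direct_message_for(sample, "tweetybot"))  # True
print(is_direct_message_for(sample, "bot2"))       # False
```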

Initiating the Call

Once the email is piped into the Python script, its contents are parsed for the command, callme, and the phone number to call.  If the command isn't found, the email is dropped on the floor.  (As I've written it, the bot only handles one command; however, typical bots support multiple commands, so parsing would have to be beefed up in most scenarios.)  If the command is found, a new call is initiated with Twilio using their REST API.  The twilorest library that's imported can be found on the Twilio Web site.  For posterity, you can download the complete script from my site.  The part that initiates the call is this:

# Parameters for the call-initiation request
d = {
    'Caller' : CALLER_ID,
    'Called' : phoneNumber,
    'Url' : '',
}
account.request('/%s/Accounts/%s/Calls' % (API_VERSION, ACCOUNT_SID), 'POST', d)

Note that no additional data can be provided when initiating calls.  If the service's interface allowed for this and returned it later when invoking the callback (a common idiom in asynchronous APIs), the user name of the follower who sent the direct message could be included in the eventual status update.
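The parsing step mentioned at the start of this section can be sketched as follows. This is not the code from my script, just a minimal illustration; the regular expression and the function name are assumptions, and the only thing taken from the description above is the "callme NNN-NNN-NNNN" message format.

```python
import re

# Assumed pattern: the command alone on a line, followed by a
# dash-separated ten-digit number.
CALLME_RE = re.compile(
    r"^\s*callme\s+(\d{3})-(\d{3})-(\d{4})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_callme(body):
    """Return the phone number as 'NNN-NNN-NNNN', or None if the
    callme command is not present in the message body."""
    match = CALLME_RE.search(body)
    if match is None:
        return None
    return "-".join(match.groups())


print(parse_callme("callme 503-555-0100"))  # 503-555-0100
print(parse_callme("hello there"))          # None
```

A bot with several commands would probably replace the single pattern with a table mapping command names to handlers, but for one command a regular expression is enough.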

The TwiML Document

As you can see from the snippet above, one of the arguments passed to Twilio is a URL (the Url parameter).  This refers to an XML document in a markup language called TwiML, which contains instructions directing Twilio to record the call that was initiated by the request above, using a Record element like this:

<Record action="" maxLength="55"/>
This element contains an action attribute which informs Twilio of the URL to send the notification to once the phone call has been made, recorded, and transcoded.

The need for this document and the way that the URL for it is provided when initiating a call makes for an awkward API (IMHO).  I say this because Twilio will immediately pull down the XML document after initiating the phone call.  Requiring this data when initiating the call would make interacting with the service less complex and more performant (by avoiding a round-trip).  Apparently, this extra request/response might not be necessary in future versions of the API.
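For illustration, a document like the one described in this section could be generated with a few lines of Python. This is a sketch rather than the file I actually serve: the callback URL is a placeholder, the helper name is made up, and it assumes the Record element sits inside TwiML's Response root element.

```python
import xml.etree.ElementTree as ET


def build_twiml(callback_url, max_length=55):
    """Return a TwiML document asking Twilio to record the call and
    notify callback_url when the recording is ready."""
    response = ET.Element("Response")
    ET.SubElement(
        response,
        "Record",
        {"action": callback_url, "maxLength": str(max_length)},
    )
    return ET.tostring(response, encoding="unicode")


# example.com stands in for wherever the callback handler is hosted.
print(build_twiml("http://example.com/playback.cgi"))
```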

The Callback Script

After Twilio does its work, it sends an HTTP POST request to the callback handler, playback.cgi, provided in the TwiML.  This message will contain a parameter called RecordingUrl, which points to the transcoded MP3 version of the phone call (read with param('RecordingUrl') below).  Given this, the handler then uses the Twitter JSON API to update the bot's status (the POST request below) to let the world know that someone recorded a greeting and where they can go to listen to it:

use CGI qw(param);
use HTTP::Request;
use LWP::UserAgent;
use URI::Escape;

my $ua = LWP::UserAgent->new();
my $recording_url = uri_escape(param('RecordingUrl'));
my $request = HTTP::Request->new("POST", "", undef, "status=Someone has recorded a greeting. Click here to play it back: $recording_url");

$request->authorization_basic('tweetybot', 'PASSWORD');
$ua->request($request);  # send the status update
One thing to note about this call and the one to Twilio is that the respective credentials are being sent in the clear!  Neither service from what I could find supports HTTPS or a more secure method of authentication.  This is really bad, and limits their applicability and usage.
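To see why this matters, recall that HTTP Basic authentication only Base64-encodes the user name and password; it doesn't encrypt them.  The following Python sketch (the helper name is made up, and the credentials are the same placeholders as in the snippet above) shows that anyone who can observe the request can recover them:

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("tweetybot", "PASSWORD")
print(header)

# Reversing the encoding requires no secret at all.
encoded = header.split(" ", 1)[1]
print(base64.b64decode(encoded).decode("utf-8"))  # tweetybot:PASSWORD
```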

Also note that the name of the follower who recorded the message can't be included in the status update, as mentioned above, because Twilio doesn't allow opaque data to be passed in and out with the current API.
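The callback handler itself is written in Perl, but its core logic is small enough to sketch in Python as well.  The version below only builds the status text from a form-encoded POST body and doesn't talk to Twitter; the function name and the sample body (including the Extra parameter) are invented for the example, and RecordingUrl is the only parameter name taken from the description above.

```python
from urllib.parse import parse_qs


def build_status(post_body):
    """Build the status text from a form-encoded callback body.
    Returns None if no RecordingUrl parameter is present."""
    params = parse_qs(post_body)
    urls = params.get("RecordingUrl")
    if not urls:
        return None
    return "Someone has recorded a greeting. Click here to play it back: " + urls[0]


# Hypothetical callback body; the recording URL is a placeholder.
body = "RecordingUrl=http%3A%2F%2Fexample.com%2Frecordings%2Fabc.mp3&Extra=1"
print(build_status(body))
```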


In just two hours with no prior understanding of Twitter's API or Twilio's, I was able to create a bot that uses these innovative Web services to respond to an IM, call an arbitrary phone number, record the user's message, transcode the resulting audio clip, and update the bot's status with a URL pointing to the follower's actual voice.  Isn't cloud computing incredible?!  Kevin Kelly was right: We have created just one machine, the Internet, and our phones, laptops, servers, and other devices are just ways to interface with it.  Considering that we can do this today, it's mind boggling to imagine what we'll be able to do in the next 5,000 days.  I can't wait!

(Note: You can get all of the scripts and artifacts from my stash and they are licensed under the GNU GPL v. 2.)