Sifter: an API for website testing and optimization

I’m happy to finally give Sifter a name and a home. It’s a multi-armed bandit (MAB) API for testing web pages, optimizing which version (“arm”) of the test gets displayed. These arms could be article titles, ad positions, logos, colors, etc. Here’s the gist of how a MAB test works:

  • A user visits a page that is under test.
  • The page makes a request to Sifter asking for an arm to display.
  • Sifter makes a calculated decision about which arm to display and returns the result to the page as a value between 0 and (# arms - 1).
  • The page renders the selected version (specific colors, logos, layouts, etc).
  • The user interacts with the page in one way or another. Maybe they sign up, maybe they buy something, maybe they don’t.
  • When the interaction is complete, the page sends the reward earned (if any) back to Sifter. This could be something binary like click/no click, or some other value such as the amount of money spent.
  • Sifter updates the bandit algorithm, influencing the arm that is selected the next time the page is rendered.
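
The loop above can be sketched in a few lines of Python. This is only an illustration, not Sifter’s actual code: it assumes an epsilon-greedy bandit (one common choice; nothing here says which algorithm Sifter runs), and the class and method names are made up for the example.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy bandit; not Sifter's actual implementation."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon          # probability of exploring a random arm
        self.counts = [0] * n_arms      # number of rewards recorded per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        """Return an arm index between 0 and (# arms - 1)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        # Exploit: highest mean reward so far (ties go to the lowest index).
        return max(range(len(self.values)), key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        """Fold one observed reward into the running mean for `arm`."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


bandit = EpsilonGreedy(n_arms=3, epsilon=0.1, seed=42)
arm = bandit.select_arm()   # the page asks which version to render
bandit.update(arm, 1.0)     # the user clicked, so a reward of 1 is reported
```

The reward can just as easily be a dollar amount instead of 1 or 0; the running mean handles both.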

There is an endless number of uses for something like this. Here are a few off the top of my head:

Run a news website?

  • Run a few different title options for an article on your front page.
  • Figure out how to get users to read more of a paginated article:
      • Test the number of words shown on each page (say 500 or 1000).
      • Each time the user clicks “next page”, update the default test result value to the user’s progress through the article. (See the /select_arm route in the docs for more on default values.)

Run a website with promoted content?

  • Allow advertisers to choose multiple promotions to run at the same time, and optimize which one gets shown based on clicks and/or user feedback.
  • Test the amount of labeling around the fact that this content is promoted vs organically created by users.

Want to test how different audience demographics respond?

  • Set up a standard A/B test.
  • Display the same content to every user.
  • Report back the results of the test where the “arm” represents a specific user demographic, rather than an alteration to the web page.
  • Your test results will show which demographic responds best to your content.

There are a lot of features that I think are pretty clever. You can set a default test result value and a TTL for the test, so that the default value gets reported when the test “expires,” and you can update this default value and TTL by sending a “heartbeat.” There’s also a lot on my To Do list, such as confidence intervals, multivariate testing, bucketing/aggregating test results for high-traffic websites, and more.
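
To make the default value / TTL / heartbeat idea concrete, here is a rough sketch of how one in-flight test could be tracked. Everything in it (the PendingTest class, its method names, the `now` argument used to keep the example deterministic) is hypothetical and only illustrates the behavior described above, not how Sifter stores anything.

```python
import time


class PendingTest:
    """Hypothetical record of one in-flight test; names are illustrative only."""

    def __init__(self, arm, default_reward, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self.arm = arm
        self.default_reward = default_reward   # reported if the test expires
        self.expires_at = now + ttl_seconds
        self.reported = None                   # explicit reward, once reported

    def heartbeat(self, default_reward, ttl_seconds, now=None):
        """Replace the default reward and push the expiry time back."""
        now = time.time() if now is None else now
        self.default_reward = default_reward
        self.expires_at = now + ttl_seconds

    def report(self, reward):
        """Record an explicit result from the page."""
        self.reported = reward

    def resolve(self, now=None):
        """Return the reward to record, or None while the test is still open."""
        now = time.time() if now is None else now
        if self.reported is not None:
            return self.reported
        if now >= self.expires_at:
            return self.default_reward
        return None


# A reader opens a paginated article; no progress yet, so the default is 0.
test = PendingTest(arm=1, default_reward=0.0, ttl_seconds=30, now=0)
# They click "next page" at t=20: bump the default and restart the TTL.
test.heartbeat(default_reward=0.5, ttl_seconds=30, now=20)
still_open = test.resolve(now=40)   # expiry moved to t=50, so still open
expired = test.resolve(now=60)      # past expiry, falls back to the default
```

The nice property is that the page never has to send a final “the user left” message: if it goes quiet, the last default it set is what gets counted.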

Right now the project is in a limited beta mode. I’m running it on a few pages that don’t get much traffic and I would love some help working out whatever bugs may come up. If you’re interested, please get in touch!

I was going to spend today working on a new project, but realized that writing the boilerplate register/log-in/log-out code yet again is way too repetitive. So I stripped some code out of my multi-armed bandit API and called it bootstrap-bootstrap. It’s super basic, built with mongo/tornado/bootstrap, easy to deploy to heroku, and now up on github.

An API for website testing/optimization

This is something that I’ve been working on very heavily for the last 6 weeks or so, and I’m excited to start showing it off. It’s currently being hosted at

In a normal A/B test on a website, you randomly present users with one version of a feature or another. This could be the placement of your logo, for example. You let the test run for some amount of time, then measure the relative performance of the two versions (“arms”) and go with the better one from that point on.

Multi-armed bandit tests are different. They are designed to learn which arm is preferred while the test is still running, and to move towards that version of the website as quickly as possible. There are many nuances to these types of tests, and I highly recommend John Myles White’s book on the subject (there’s also a video of him talking about it at tumblr recently).
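
A small simulation shows the difference in how traffic gets allocated. This is a sketch under made-up assumptions: the click rates are invented, the function names are hypothetical, and UCB1 is used only as one example of a bandit algorithm, not necessarily one the API provides.

```python
import math
import random


def run_ab(click_rates, visitors, rng):
    """Uniform random assignment: allocation ignores rewards entirely."""
    counts = [0] * len(click_rates)
    for _ in range(visitors):
        counts[rng.randrange(len(click_rates))] += 1
    return counts


def run_ucb1(click_rates, visitors, rng):
    """UCB1: play the arm with the highest mean reward plus exploration bonus."""
    n_arms = len(click_rates)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, visitors + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise its mean
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Simulated visitor: clicks with the arm's (made-up) click rate.
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


click_rates = [0.1, 0.5]   # invented click-through rates; arm 1 is better
visitors = 2000
ab_counts = run_ab(click_rates, visitors, random.Random(0))
ucb_counts = run_ucb1(click_rates, visitors, random.Random(0))
print("A/B split: ", ab_counts)
print("UCB1 split:", ucb_counts)
```

With rates this far apart, the bandit should send most visitors to arm 1 fairly quickly, while the A/B test keeps a roughly even split until someone ends it. Real click rates are usually much closer together, so the shift takes longer.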

The API that I built provides a couple of different Multi-Armed Bandit algorithms, which are exposed as simple routes to request an arm to play, and to update the algorithm with the test results. There’s also a little admin view to see how your test is performing:


I’m at the point now where I need some help testing. This thing isn’t ready to deploy on a production site, but if you have something to test out that doesn’t take a lot of traffic (and you don’t mind a little latency), it should be reliable. There’s also a little client library on github if you just want to fuss with it. Either email me at, or send me a message through tumblr to get an account.

I’m honored to be giving a keynote talk at the 2013 BigData TechCon conference in Boston (April 8-10). There are a lot of excellent speakers at the event that I’m looking forward to seeing, including Claudia Perlich (Media6Degrees), Jonathan Seidman (Cloudera, formerly Orbitz), and Oscar Boykin (twitter).

Registration is open here. Here’s a short abstract for my talk:

Social networks are able to collect large amounts of activity data from their user and customer base. As Big Data professionals, we conduct experiments on custom data sets to measure the effectiveness of our products or advertising methodologies. Since a social network is effectively useless without an active community, our companies owe it to their users to create new and better products based on this information. Learn how our data analysis and predictive analytics must take a different approach than Big Data in fields like finance, medicine, and defense.

I’m extra excited that the conference is in my home town. It’s been a while since I’ve spent a few days in Boston and I look forward to seeing all of my friends up there. I can already taste the Silhouette popcorn.

New year, new projects.

Happy new year! 2012 was a pretty crazy year for me. The biggest thing was probably moving into my great new apartment with Lana, but I also worked on a bunch of great projects, including:

I also made (and almost completed) a little app for displaying trending data. It’s probably not in good enough shape for someone to easily clone it and use it out of the gate, which is really why I’m making this post. I have a lot of ideas for 2013 that I think have great potential, and I don’t want them to get stuck in limbo like that last project (which I will definitely finish, I promise).

My plans for 2013 start with this blog. I just found a nice and simple bootstrap-based tumblr theme on github and modified it to my liking. It’s not perfect yet, but that’s part of the plan. I’m also building a website testing framework that will help me tweak the remaining details.

The framework is an API for A/B and various bandit algorithm tests, where a web app can request a version of the page to display, and then report the results of the test back, thus influencing the version shown to the next user. The idea for this was inspired by John Myles White’s excellent book on the subject (watch his talk about it here).

It’s been fun to have a big project to work on again. Let’s hope 2013 turns out as well as 2012.