So Moneyball didn’t win any Oscars last night.
It was definitely my favourite film of this Oscar season and the only one I’ve watched multiple times.
I remain fascinated by its take on irrationality in American business culture by way of building more efficient baseball teams.
I mentioned the film to friends, as one does when taken with a new cultural crush.
“Yeah,” Luke Galea told me. “My work is a lot like that.”
Curious about the relationship Galea sees between redefining baseball and building dating websites, I asked him to explain.
This post is about that conversation, and it’s going to be a lengthy one.
I’ve structured the post to provide six takeaways for anyone whose work involves website creation, regardless of whether you have access to the kind of technical and staffing resources that Galea’s employer, Avid Life Media, does.
If you’ve only got a minute, here are the key points:
- Data is worth the investment — if you’ve got buy-in from your leadership
- Develop goals you can test
- Don’t assume you know what your readers want
- Have the guts to follow your data
- Your data is only as good as your sample size
- Work within your limits
Before We Begin
I’m not the first to draw a connection between Moneyball and the tech industry. Geeks with far more impressive pedigrees than mine have been doing so since the movie came out last September.
Some of the other posts explore HR practices in software companies; others argue the prevalence of accessible tools means that the web development industry is in the midst of shifting to data-driven testing or A/B testing.
Galea and I touched on both of those ideas throughout our conversation.
Rather than retread that ground, however, this post will examine, in some depth, the relationship between the kind of logic Moneyball espouses and Avid’s experience building better dating websites.
1. Data is Worth the Investment — If You’ve Got Buy-In
Moneyball spends a lot of time framing Billy Beane, the general manager of the Oakland A’s, as the right leader to restructure the world of baseball, largely because he’s willing to examine the game’s irrationalities and exploit them.
This willingness partially stems from a disappointing professional ball career that fell far short of the promise baseball scouts saw in Beane as a high school athlete.
Beane’s willingness to invest in and leverage data about his ball club was unique in professional baseball in the early 2000s, though the Sabermetric method has since been more widely adopted largely as his peers sought to replicate his success.
Since many web-based organizations no longer have to rely on shipping a physical product (e.g., CDs) to deliver their services to their users, it’s possible to actively test a product on an ongoing basis.
Unsurprisingly, the testing mindset is becoming more standard in web development, particularly at organizations with a large consumer base (e.g., Amazon).
But what struck me as I spoke with Galea about Avid Life Media is how deeply this willingness to experiment permeates their corporate culture.
So we talked about it.
Elizabeth Monier-Williams (EMW): I admit to being fascinated by the monitors.
Luke Galea (LG): They do become a gathering place when we’re running a test. It’s exciting and a little gut-wrenching to watch people respond in real-time to our decisions.
EMW: Would it be accurate to view the monitors as physical evidence of your commitment to testing?
LG: There’s a well-known quote that trying to get a perfect product through testing alone is as futile as waiting for 1,000 monkeys banging on typewriters to produce Hamlet.
While a lot of our product vision is set by the management team, our goal is to create a culture of testing where no idea gets shot down no matter how outlandish it seems.
And no matter how good an idea is, nothing gets rolled out without testing.
Ideally, we want every idea to be testable.
If you want to successfully create a culture of testing, you need buy-in from the people creating the vision right through to everyone involved in the thought experiment.
2. Develop Goals You Can Test
For Beane, each season becomes an opportunity to test out his theories about how to build a better baseball team.
Beyond the baseball diamond, your goals might be converting web traffic into sales or donations depending on the business model in play.
During our discussion, Galea referenced two famous examples of iterative testing in web design:
- President Barack Obama’s 2007 fundraising campaign, where simple changes to images and button design on his website helped convert visitors into an additional $60 million in campaign donations.
- The evolution of Amazon.com’s Add to Shopping Cart button, which has helped to make them an industry leader in conversion rates.
While the Avid team does run tests to determine which buttons, colours, and other design features prompt the greatest user response, they also try to pose deeper questions about site experiences by addressing user behaviour.
Luke Galea: Ashley Madison launched over 10 years ago when email, not texting, was the go-to method for talking to someone online. Many email conventions still exist in our interface, such as requiring a subject line to send a message.
If we want to test whether people want a text-based interface versus the traditional email model, we can’t just suppress one option over another like you would for a button colour.
We have to change the code and assign our users to two separate groups, query our own data and then do the analysis.
Elizabeth Monier-Williams: What kind of outcome would you look for in those instances?
LG: Anything that increases the number of valuable users in our subscriber base — people who apply more often to meet up with other users, send and respond to messages, spend more time on the site and generally demonstrate deeper engagement.
Facebook operates the same way. Your experience with their site might be totally different from mine since their development team hives users off into different groups to test out new ideas.
Their timeline feature is a good example of a test that some users saw and others didn’t until it became a standard site feature.
When you’re designing a test, keep your ultimate goals foremost in your mind.
Where possible, try to design tests that speak more to user behaviour than surface-level design questions. The payoff will be greater in the long run.
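The group assignment Galea describes, splitting users into separate cohorts and keeping them there for the life of a test, can be sketched as a deterministic hash bucket. The function name, experiment name and variant labels below are my own illustration, not Avid’s actual code:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("email_ui", "text_ui")) -> str:
    """Deterministically assign a user to a test variant.

    Hashing the user id together with the experiment name keeps each
    user in the same group for the life of the test, and keeps
    assignments for different experiments independent of one another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user always lands in the same group for a given experiment.
assert assign_variant("user-42", "messaging-ui") == \
       assign_variant("user-42", "messaging-ui")
```

Because the assignment is a pure function of the user and experiment ids, no extra state needs to be stored, and the split across variants is close to even over a large user base.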
3. Don’t Assume You Know What Your Users Want
In Moneyball, the old-guard scouts’ criteria for selecting promising ball players are somewhat irrational, steeped in tradition and anecdotal experience.
When confronted by Beane and his data-crunching associates, their responses range from incredulity to indignation. After all, they know baseball.
If you run a dating website, many of the same irrational assumptions can impair your assessment of user behaviour.
After all, dating is a common life experience. When you’ve done something yourself, it’s hard not to make assumptions based on that experience — or your cultural biases — and view your perspective as the norm.
OkCupid’s OkTrends blog is a well-known example of a dating site mining this kind of data. As Wikipedia notes, the blog uses:
statistical observations from OkCupid user interactions to explore the data side of the online dating world.
The articles are fascinating, if racy in places.
Elizabeth Monier-Williams: I had a look at the OKCupid blog. Do you find similar patterns among your users?
Luke Galea: Yeah, there’s a lot of research about why people cheat and about dating websites in general. Our best users, from a photo perspective, are women in soccer uniforms photographed on a field. That example flies in the face of the rhetoric around glam or beauty headshots, but it makes sense.
Seeing someone in a sports uniform breaks the ice. It tells you something real about that person and her interests. That makes her more approachable.
EMW: Do photos play a big part in how people connect on your sites?
LG: Not really. Unlike most dating sites, we don’t require photos. We do make it easy for users to privately share photos with people they want to connect with.
Generally, we promote profiles with public photos to first-time users. We want them to have a good experience their first time out and if they’ve used other dating sites, they may expect to see photos.
Our long-term users tend to rely on content in the text-based profiles to find people who interest them.
EMW: That contradicts the general expectation I’ve seen in media coverage of Ashley Madison and the like — that they’re sites purely driven by sex.
LG: We try to avoid the trap of deciding for our users.
On Cougar Life, we had an algorithm that attempted to match potential partners. It was similar in concept to the code that sites like Goodreads use to suggest books you might like based on your preferences.
We knew the suggestions were good based on the data, but we found people didn’t want to be told who to consider, even if the software was good at predicting successful matches.
EMW: Have there been other times your gut read was completely wrong?
LG: When we moved to the Shush version the site uses now, it originally had the same neutral palette as the previous design. Then our creative director proposed trying it in crazy pink just to see what would happen.
We hated it, but we put it out there and tested the idea.
Our users loved it. We were all totally wrong.
Since then, we’ve tested regional variants to see if users would respond better to a locally-customized experience.
Our users hated it. The numbers and qualitative feedback demanded that we go back to the hot pink shush version.
EMW: Why do you think your users are resistant to change?
LG: Ten years is a long time on the web. We have over 10 million users now. Any major change we make, such as to the landing page, has to manage their existing expectations. The pink is now perceived as part of our brand.
Designs that might appeal to new users, and the comparatively small revenue they bring, generally don’t outweigh the long-term community’s preferences.
It might be impossible to completely eliminate bias from your perception of your user base, but testing will help you to stay honest with users and make decisions that respect their needs.
4. Have the Guts to Follow Your Data
The team is losing. The manager won’t use the players on the field as Beane intended. The fans are getting impatient.
The scouts he has already fired are doing interviews with anyone who’ll hold a microphone to their mouths, claiming he’s a fool with no business running a major-league team.
It’s a watershed moment where conceding defeat would be the easiest decision.
Elizabeth Monier-Williams: You mentioned when we first started talking about Moneyball and web development that the scene where Beane watches his club imploding is the one you relate to most.
Luke Galea: Yeah, if you make decisions based on data, the numbers often tell you things that feel counterintuitive or flat-out wrong. Picking an option, collapsing the test and sticking with the decision while site traffic plummets is really hard.
That period of failure is inevitable. If we run a test based on two or three weeks of data, you can almost guarantee the traffic on Days 1, 2 and 3 will be pretty horrific.
EMW: Can you give me an example?
LG: Pricing is a good one. Let’s say we run a test to adjust the cost of a credit package for Ashley Madison. [NOTE: All prices used in this example are fictitious.] If the price is $50 a month and we raise it to $70, our revenue is going to fall in the first few days. Your gut instinct might be to lower the price instead, to say $40 a month. Doing so would fix the short-term results, but might lead to a long-term revenue loss.
The people who use our service want discretion and security for obvious reasons. If our price is too low, they might assume the service we provide would be of comparatively poor quality.
Committing to a data-driven decision process requires considerable resolve — and resisting the temptation to tweak your results partway through is crucial to running an accurate test.
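One way to see why those first horrific days shouldn’t trigger a retreat is to check whether the early gap between variants is even statistically meaningful. A minimal sketch using a two-proportion z-test, with entirely hypothetical conversion counts:

```python
from math import erf, sqrt

def conversion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate genuinely
    different from A's, or could the gap be noise?  Returns (z, p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-tailed p-value from the normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Day 3 of a hypothetical price test: variant B looks 25% worse,
# but with only 400 users per arm the difference is not significant
# (p is roughly 0.21, well above the usual 0.05 threshold).
z, p = conversion_z_test(conv_a=40, n_a=400, conv_b=30, n_b=400)
```

With ten times the traffic and the same underlying rates, the identical gap would be highly significant, which is exactly why collapsing a test early on a scary-looking dip throws away the answer.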
5. Your Data’s Only as Good as Your Sample Size
Beane and his team were using over 90 years of full baseball seasons as the norm against which to test their data.
Statistical variations might be no big deal to a mathematician, but they can produce some wild results in the here and now — such as the 20-game winning streak the team later enjoyed.
In the case of the web, you might be lucky to have one year of good baseline data, let alone 10.
Elizabeth Monier-Williams: In terms of sample size, what do you need to run the kind of testing Avid conducts?
Luke Galea: You need to get decisive results in a period of time that still allows you to nimbly respond to market demand.
Generally, each possible model you’re testing needs 500 to 1,000 successes for the math to work properly. Big sites that get lots of traffic and that can afford a long testing window lend themselves to this kind of testing in a way that internal products (such as accounting software used by 200 people in a finance department) generally don’t.
The more drastic the change, however, the less data you need. For example, if you’re designing a button, it’s easier to test a red option against a green option than light purple against dark purple.
We also find that data gathered during some times of the year can be a total write-off. Anything we collect about our users’ behaviour over Thanksgiving or the Christmas holidays will probably not hold true in January.
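Galea’s two rules of thumb, hundreds of successes per variant and less data for more drastic changes, both fall out of a standard power calculation. A back-of-the-envelope sketch with hypothetical conversion rates (95% confidence, 80% power, normal approximation):

```python
from math import ceil

def sample_size_per_variant(p_base, p_target, z_alpha=1.96, z_power=0.84):
    """Rough number of users needed in EACH variant to detect a shift
    from p_base to p_target at 95% confidence and 80% power."""
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_power) ** 2 * variance
                / (p_target - p_base) ** 2)

# A drastic change (2% -> 4% conversion) needs only ~1,100 users per
# variant; a subtle one (2% -> 2.2%) needs roughly 80,000. This is
# Galea's red-vs-green versus light-purple-vs-dark-purple point.
big_change = sample_size_per_variant(0.02, 0.04)
subtle_change = sample_size_per_variant(0.02, 0.022)
```

The specific rates are made up, but the shape of the result is general: required sample size grows with the inverse square of the effect you’re trying to detect, which is why low-traffic sites can realistically test only bold changes.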
It’s also important to remember that what works in one market won’t necessarily translate to another.
Galea notes that when Avid brought Cougar Life into the United Kingdom, the cost of a credit package was simply the Canadian price converted into the more expensive British pounds.
Over time, Avid’s data indicated that this method didn’t correlate with UK users’ expectations for the buying power of their currency.
They expected to pay more for Cougar Life’s service and were mistrustful of the low fee.
The price was subsequently increased.
“What you have to remember,” Galea notes, “is that all sites are suboptimal. You constantly need to build and test your vision. If you start with something small and simple, it’s easiest to refine as you go.”
6. Work within Your Limits
The testing the Avid team engage in across their websites is hard to deploy on sites that lack the necessary traffic volume.
Smaller organizations may also lack the personnel to do customized analysis. By contrast, the Avid team includes product design, QA, development, creative design and analysis, along with an in-house mathematician.
For teams without those resources, off-the-shelf testing services can fill much of the gap. “Those tools are dead simple to implement,” Galea says. “You don’t even have to write code.”
The core exercise of testing is scalable whether you have 10 users or 10 million — all you’re doing is leveraging data to design a better experience for your users.
Getting into a testing mindset feels daunting if it’s not something you or your organization have done in the past.
My biggest takeaway from this conversation, however, is that testing is the way web development is moving.
My guess is that familiarity with testing conventions and practices, at least in theory, will be a necessary skill set for anyone involved in creating or running a web-driven user experience.
I’d like to thank Luke Galea for his generosity with his time and for providing the Avid Life Media screenshots used in this post.