Rethinking Recommendation Engines

Over two years ago, Netflix announced a recommendation engine contest: anyone who invents an algorithm that does 10% better than its current recommendation system will win $1 million. Many research teams raced to attack the problem, excited by the unprecedented amount of data available. Initially quite a lot of progress was made, but then progress slowly stalled, and teams are now stuck at around the 8.5% improvement mark.

In this post we argue that the improvement in recommendation engines is not an algorithmic problem, but rather a presentation issue. Respinning recommendations as filters and delivering them without setting high expectations is more likely to yield progress than crunching more data faster.

Building a recommendation engine is a complex endeavor, which we discussed here a year ago. But in addition to being a technical challenge, there are also fundamental psychological questions: do people want recommendations and if so, then when are they open to them? Perhaps an even bigger question is: what happens when the user receives one or more bad recommendations? How tolerant will they be?

Genetics of Recommendation Engines

All recommendation engines are trying to solve the following problem: given a set of ratings for a particular user, along with those of the whole user base, come up with new items that this user will like. There are many algorithms that can be applied to the problem, but all of them focus on three elements (personal, social and fundamental) or on a combination of them:

Personalized recommendation - recommend things based on the individual's past behavior
Social recommendation - recommend things based on the past behavior of similar users
Item recommendation - recommend things based on the item itself
A combination of the three approaches above

A social recommendation is also known as collaborative filtering - people who liked X also like Y. For example, people who liked Lord of the Rings are likely to enjoy Eragon and The Chronicles of Narnia. The problem with this approach is that people's tastes do not in reality fall into simple categories. If two people share the same taste in fantasy movies, it does not mean that they will also both like dramas or mysteries. A good way to think about this problem comes from genetics. Many times we meet people who have features that we recognize and have seen in others. For example, the eyes or the lips might look familiar, but it is a totally different person.
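The "people who liked X also like Y" idea can be sketched in a few lines. This is only a minimal illustration, assuming a made-up table of likes (the user names, the extra titles and the also_liked function are all hypothetical); real collaborative filtering systems work on far larger and noisier rating data.

```python
from collections import Counter

# Hypothetical toy data: user -> set of titles that user liked.
likes = {
    "ann": {"Lord of the Rings", "Eragon", "The Chronicles of Narnia"},
    "bob": {"Lord of the Rings", "Eragon"},
    "cat": {"Lord of the Rings", "The Chronicles of Narnia", "Gosford Park"},
    "dan": {"Gosford Park", "Chinatown"},
}

def also_liked(item, likes):
    """Count how often other titles were liked by users who liked `item`."""
    counts = Counter()
    for liked in likes.values():
        if item in liked:
            # Every other title this user liked gets one vote.
            counts.update(liked - {item})
    return counts.most_common()

print(also_liked("Lord of the Rings", likes))
```

Note that the drama liked by "cat" also picks up a vote, which is a small-scale version of the problem described above: shared taste in fantasy says little about taste in other genres.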

The other kind of recommendation is an item-based recommendation. The best example of this system is the Pandora music recommendation service. It works by rating each musical piece on more than 400 different characteristics - musical genes. It then automatically matches the pieces based on these characteristics. There are challenges with tuning the algorithm to work well, but it is also challenging to apply it to other verticals. For movies, for example, you'd need to rate each movie along many scales, starting with director, cast and plot, and then moving on to more obscure things like musical score, locations, light, camera work, etc. It certainly can be done, but it is complicated.
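Matching items by their characteristics can be illustrated with a similarity measure over trait vectors. The sketch below uses cosine similarity, which is one common choice and not necessarily what Pandora does; the track names, the three traits and their scores are invented for the example.

```python
import math

# Hypothetical hand-scored traits on a 0-1 scale, e.g. tempo, acoustic feel, vocals.
tracks = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.7],
    "song_c": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length trait vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(name, tracks):
    """Return the other track whose traits are closest to those of `name`."""
    target = tracks[name]
    others = [(other, cosine(target, vec))
              for other, vec in tracks.items() if other != name]
    return max(others, key=lambda pair: pair[1])[0]

print(most_similar("song_a", tracks))
```

The hard part is not the arithmetic but producing the trait scores in the first place, which is why the approach is difficult to carry over to movies.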

The Guy In The Garage

The complexity of the recommendation problem is due to its vast space of possibilities. Much like it's hard to figure out which exact gene is responsible for a particular human trait, it is hard to figure out which bits of the movie or music make us rate it as 5 stars. Reverse engineering human thinking is hard. Which is exactly why one of the contestants highlighted in the Wired article is relying on a very different trick to make his algorithm work.

Nicknamed Guy In The Garage, Gavin Potter from London is relying on human inertia. Apparently, the rating of the movie depends on the ratings of previous movies that we just saw. For example, if you watch three movies in a row and rate them with 4 stars, and then watch the next one which is slightly better, you will rate it 5. Conversely, if you rate three movies in a row with 1 star, then the same movie that you would otherwise rate as 5 would only get 4 stars from you.
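To make the inertia effect concrete, here is a toy model in which the rating a user gives is pulled part of the way toward the average of their most recent ratings. It is not Potter's actual algorithm, which is not described here; the anchored_rating function and the pull value of 0.25 are assumptions chosen so that the second example above works out.

```python
def anchored_rating(true_rating, recent_ratings, pull=0.25):
    """Toy model: the given rating drifts toward the mean of recent ratings.

    `pull` is a made-up parameter: 0 means no inertia, 1 means the user
    simply repeats their recent average.
    """
    if not recent_ratings:
        return true_rating
    recent_mean = sum(recent_ratings) / len(recent_ratings)
    adjusted = true_rating + pull * (recent_mean - true_rating)
    # Keep the result on the 1-5 star scale.
    return max(1, min(5, round(adjusted)))

# A movie that would otherwise earn 5 stars, rated right after three 1-star movies.
print(anchored_rating(5, [1, 1, 1]))
```

An engine that knows about this drift could, in principle, correct for it before comparing ratings across users.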

Just when you think that this cannot be true, you will discover that this algorithm now sits in 5th place and is still making progress, while other algorithms are spinning their wheels. Enhancing formulas with a bit of human psychology is a really good idea and this is where we turn next.

Replacing Recommendations with Filters

How many times has this happened to you: a friend recommended a movie or a restaurant, so you went there all excited - but ended up disappointed? A lot! It is obvious that hype sets the bar high, increasing the chances of a miss. In math speak, this kind of miss is known as a false positive. Consider now what would happen if, instead of recommending a movie, a friend tells you that you are not going to like a certain movie, so do not bother renting it.

What bad can come of that? Not much, because you are likely not going to watch it. But even if you do and you like it, you are not going to experience negative feelings. This example demonstrates the difference between our reaction to a false negative and a false positive. False positives upset us, but false negatives do not. The idea of respinning recommendations as filters is about leveraging this phenomenon.

When Netflix makes recommendations, it sets itself up for a sure failure. Sooner rather than later it is going to miss and recommend a movie that you are not going to like. What if, instead of doing that, it showed you new releases along with a button: filter the ones I am not going to like? The algorithm is the same, but the perception is different.
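The point that the algorithm stays the same while the presentation changes can be shown with one set of predicted ratings used two ways. This is only a sketch: the titles, the scores and the threshold of 2.0 are invented, and the predictions could come from any engine.

```python
# Hypothetical predicted ratings (1-5 stars) for a list of new releases.
predicted = {"Movie A": 4.6, "Movie B": 3.1, "Movie C": 1.4, "Movie D": 2.0}

def recommend(predicted, top_n=1):
    """Recommendation view: promote the highest predicted ratings."""
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

def filter_out(predicted, threshold=2.0):
    """Filter view: hide only titles predicted at or below the threshold."""
    return [title for title, score in predicted.items() if score > threshold]

print(recommend(predicted))   # one title the engine vouches for
print(filter_out(predicted))  # new releases minus the likely misses
```

In the first view a wrong prediction is a promise broken; in the second, the remaining list makes no promise at all, and a wrongly hidden title is one the user never sees.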

Filters in Real-Time Culture

This idea becomes increasingly important and powerful in the age of real-time news. We are increasingly oriented towards continuously filtering new information. We do this with our RSS readers every day. We think of the world in terms of streams of news, where things of the past are not relevant. We do not need recommendations, because we are already oversubscribed. We need noise filters: an algorithm that says 'hey, you are definitely not going to like that' and hides it.

If the machines can do the work of aggressively throwing information out for us, then we can deal with the rest on our own. Borrowing from the spam box in email, if all the tools around us had a button that said 'filter this for me', and maybe even had a mode where such a filter is on by default, we'd all get more things done.

Conclusion

Building a perfect recommendation engine is a very complex task. Regardless of the method, whether collaborative filtering or the inherent properties of things, recommendations are an unforgiving business, where false positives quickly turn users off. Perhaps applying psychology to the problem can make people appreciate what these complex algorithms are doing. If instead of recommending things, machines would filter out the things we definitely won't like, we might be more forgiving and understanding.

Now tell us please about your experiences with recommendation engines. Were there ones that worked really well? Would you be open to filtering instead of recommendation? Besides movies and news, where would you like to have these filters?

See also our follow-up post 10 Recommended Recommendation Engines.