Under the Hood: Building the App Center recommendation engine
 

As more apps on Facebook Platform have launched over the years, the types of apps available have become more diverse, making it crucial that people see the most relevant and highest-quality apps in channels like news feed and App Center.

 

While news feed has always functioned as a recommendation engine, the App Center is the latest way for people to discover apps, and it is increasingly becoming a prominent channel for developers to distribute their apps. On average, 220 million people visit the App Center each month, and those visitors are 40% more likely to return the next day.

 

We built the App Center to give the growing audience of app users a central place on Facebook to browse apps. However, given the multitude of apps that use Facebook, recommending the right apps to the right people is a tough challenge. We needed to build a system that could handle large-scale data and traffic, respond quickly, and incorporate user feedback in real time.

 

The goal is for curation of the App Center to be driven by quality and personalization rather than by editorial selection. Just as with news feed, personalization in App Center will improve over time as people and their friends engage with more apps.

 

Building a recommendation engine

To solve this problem efficiently, we built a recommendation engine directly into App Center, so that, just as with news feed, each person would have a personalized experience. The recommendation engine powers the App Center and helps it learn people’s preferences in order to serve them app recommendations that are timely, socially relevant, and unique to them. This allows a more diverse set of apps to become discoverable, particularly those in harder-to-find or up-and-coming categories.

 

The system follows an aggregator-leaf architecture, very similar to that of a search engine. Because we have a lot of data, it is necessary to partition the objects into multiple subsets (shards), with each leaf node responsible for only one subset. The aggregator acts as a central controller: it receives the recommendation request from the front-end web server and distributes it to the leaf nodes. Each leaf node then finds a set of best candidates among the objects stored on its local machine and returns them to the aggregator. The aggregator then performs a final merge and returns the best results to the client.
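The fan-out and merge steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the App Center implementation: the names `Leaf`, `Aggregator`, and `partition` are hypothetical, each app carries a single precomputed score in place of real per-user scoring, and shards are assigned with a CRC32 hash purely for the example.

```python
import heapq
import zlib
from typing import Dict, List, Tuple

Scored = Tuple[float, str]  # (score, app_id)


def partition(apps: Dict[str, float], num_shards: int) -> List[Dict[str, float]]:
    """Split apps into shards by a stable hash of the app id (an assumption)."""
    shards: List[Dict[str, float]] = [{} for _ in range(num_shards)]
    for app_id, score in apps.items():
        shards[zlib.crc32(app_id.encode()) % num_shards][app_id] = score
    return shards


class Leaf:
    """Holds one shard and returns its local top-k candidates."""

    def __init__(self, shard: Dict[str, float]) -> None:
        self.shard = shard

    def top_k(self, k: int) -> List[Scored]:
        return heapq.nlargest(k, ((s, a) for a, s in self.shard.items()))


class Aggregator:
    """Fans a request out to every leaf and merges the partial results."""

    def __init__(self, leaves: List[Leaf]) -> None:
        self.leaves = leaves

    def recommend(self, k: int) -> List[str]:
        partial: List[Scored] = []
        for leaf in self.leaves:  # a real system would issue these calls in parallel
            partial.extend(leaf.top_k(k))
        # Final merge: the global top-k is always contained in the union
        # of the per-leaf top-k lists.
        return [app for _, app in heapq.nlargest(k, partial)]


apps = {"chess": 0.9, "poker": 0.4, "maps": 0.7, "music": 0.8, "quiz": 0.1}
aggregator = Aggregator([Leaf(shard) for shard in partition(apps, 3)])
print(aggregator.recommend(2))  # ['chess', 'music']
```

Because every leaf returns its own best k items, the merged answer is the same no matter how the apps are spread across shards.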

 

After that, the front end collects user feedback, which is then integrated into the app recommendation engine. We scale this system in two ways. The first is to increase the number of shards so that we can handle more data. The second is to run multiple replicas so that we can handle more traffic. Using replicas also adds redundancy to the system, which allows us to tolerate the failure of some machines.
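To illustrate how replicas can both spread traffic and absorb machine failures, here is a minimal sketch. It assumes each replica is a callable that returns recommendations for a user and signals failure with `ConnectionError`; the `ReplicaSet` class and this failure convention are hypothetical and are not taken from the original system.

```python
import random
from typing import Callable, List, Sequence

Replica = Callable[[str], List[str]]


class ReplicaSet:
    """All replicas of one shard; any healthy replica can answer a query."""

    def __init__(self, replicas: Sequence[Replica]) -> None:
        self.replicas = list(replicas)

    def query(self, user_id: str, rng: random.Random) -> List[str]:
        # Visiting replicas in random order spreads the load, and a failed
        # machine is skipped instead of failing the whole request.
        for replica in rng.sample(self.replicas, len(self.replicas)):
            try:
                return replica(user_id)
            except ConnectionError:
                continue
        raise RuntimeError("all replicas of this shard are down")
```

Adding shards grows the amount of data the system can hold, while adding replicas per shard grows the number of requests it can answer; the two knobs are independent.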

 

Determining high quality 

Growth in the App Center is tied to quality, and we determine that quality based on user ratings and positive and negative user signals for an app over time.

 

To measure quality accurately, we developed a system that randomly surveys users, asking them to rate an app shortly after we detect that they have used it. Then, when we compute the average rating for an app, we include a confidence adjustment based on the number of ratings the app has received. (The translator of this post suggests a Wilson interval as one example of such an adjustment.)
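The post does not say which confidence adjustment is used. One common choice, swapped in here purely as an illustration, is a shrinkage (Bayesian) average that pulls the mean toward a prior until enough ratings arrive. The function name and the `prior_mean` and `prior_weight` defaults below are assumptions.

```python
from typing import Sequence


def adjusted_rating(
    ratings: Sequence[float],
    prior_mean: float = 3.0,     # assumed neutral rating on a 1-5 scale
    prior_weight: float = 10.0,  # assumed: the prior counts as 10 ratings
) -> float:
    """Average rating shrunk toward prior_mean when few ratings exist."""
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


print(adjusted_rating([5, 5]))      # two perfect ratings barely move the score
print(adjusted_rating([5] * 200))   # two hundred perfect ratings move it close to 5
```

With this kind of adjustment, an app with two five-star ratings ranks below an app with two hundred, even though their raw averages are identical.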

 

We found that the number of daily active users (that is, the average number of users who used the app in a day) was a good measure of how large an app is, whereas the number of monthly active users could be inflated by spikes of activity during the month. So we settled on a formula for app quality that is primarily a function of the app's average rating and its average daily active users.
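The difference between the two measures is easy to see with made-up numbers. The sketch below uses hypothetical user ids and helper names and compares an app with a steady audience against one that has a single one-day spike.

```python
from typing import List, Set


def average_dau(daily_users: List[Set[str]]) -> float:
    """Mean number of distinct users per day."""
    return sum(len(day) for day in daily_users) / len(daily_users)


def monthly_active(daily_users: List[Set[str]]) -> int:
    """Distinct users seen on any day of the period."""
    return len(set().union(*daily_users))


# Steady app: the same 100 users every day for 30 days.
steady = [{f"u{i}" for i in range(100)}] * 30
# Spiky app: 10 regulars for 29 days, plus 1000 one-time visitors on one day.
spiky = [{f"r{i}" for i in range(10)}] * 29 + [{f"v{i}" for i in range(1000)}]

print(monthly_active(steady), average_dau(steady))  # 100 100.0
print(monthly_active(spiky), average_dau(spiky))    # 1010 43.0
```

By monthly active users the spiky app looks ten times larger than the steady one; by average daily active users it is less than half the size.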

 

Algorithmic elements

From an algorithmic point of view, the App Center recommendation system has three major elements: candidate selection, scoring and ranking, and real-time updates.

 

The key to candidate selection is efficiency and high recall. We use several heuristics to choose promising candidates. The first is to select popular items based on a user’s demographic information. The second is to select social items, because we believe that people are generally interested in their friends’ activities. The third is to select items related to objects the user has liked or interacted with in the past.
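The three heuristics can be combined as a simple ordered union. The sketch below is an assumption-laden illustration: the lookup tables (`popular_by_segment`, `apps_used_by`, `related_apps`) and the segment label are hypothetical stand-ins for whatever indexes the real system keeps.

```python
from typing import Dict, Iterable, List, Set


def select_candidates(
    segment: str,
    friend_ids: Set[str],
    liked_objects: Set[str],
    popular_by_segment: Dict[str, List[str]],
    apps_used_by: Dict[str, Set[str]],
    related_apps: Dict[str, Set[str]],
) -> List[str]:
    """Union of the three candidate sources, de-duplicated, in first-seen order."""
    candidates: List[str] = []
    seen: Set[str] = set()

    def add(apps: Iterable[str]) -> None:
        for app in apps:
            if app not in seen:
                seen.add(app)
                candidates.append(app)

    add(popular_by_segment.get(segment, []))        # 1. popular in the user's demographic
    for friend in sorted(friend_ids):               # 2. apps the user's friends use
        add(sorted(apps_used_by.get(friend, ())))
    for obj in sorted(liked_objects):               # 3. apps related to liked objects
        add(sorted(related_apps.get(obj, ())))
    return candidates
```

The goal at this stage is recall rather than precision: the union is deliberately generous, and the scoring step that follows decides the final order.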

 

Once we obtain a set of candidates, we fetch their features from local storage and calculate ranking scores for them. A good scoring function should be able to capture high-order interactions among three types of features.

 

The first type is explicit features that we can obtain directly, such as demographic information about the user. The second type is dynamic features, such as the number of likes and impressions for an object. The third type, learned latent features, is more interesting. These features are learned from the user-object interaction history and can capture user preference and object flavor.
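The post does not give the scoring function, so the following is only a toy sketch of how the three feature types might feed a single score. It uses a plain logistic model with hypothetical weights; apart from the user-object dot product, it makes no attempt to model the higher-order interactions a production scorer would need.

```python
import math
from typing import Sequence, Tuple


def score(
    explicit_match: float,      # explicit feature, e.g. 1.0 if the app suits the user's demographic
    likes: int,                 # dynamic feature: like count
    impressions: int,           # dynamic feature: impression count
    user_vec: Sequence[float],  # learned latent user features
    app_vec: Sequence[float],   # learned latent app features
    weights: Tuple[float, float, float] = (1.0, 2.0, 1.0),  # hypothetical weights
) -> float:
    """Return a score in (0, 1) combining explicit, dynamic, and latent features."""
    like_rate = (likes + 1) / (impressions + 2)  # smoothed so new apps are not 0/0
    affinity = sum(u * a for u, a in zip(user_vec, app_vec))
    z = weights[0] * explicit_match + weights[1] * like_rate + weights[2] * affinity
    return 1.0 / (1.0 + math.exp(-z))
```

Candidates would then be ranked by sorting on this score in descending order.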

 

The underlying principle of learning latent features is low-rank matrix approximation (that is, matrix factorization). The basic problem is to find the values of the missing entries in the user-object response matrix. The idea is to approximate the response matrix by the product of two low-rank matrices, U and O. Each row of U is the latent representation of a user and captures that user's intrinsic taste. Each column of O is the latent representation of an object and reflects the flavor of that object. The dot product of these two vectors is the predicted response of the user to the object.
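A minimal way to learn such a factorization is stochastic gradient descent over the observed entries only. The sketch below is a textbook version in pure Python, not the algorithm Facebook developed; the function names, hyperparameters, and the use of integer ids are all assumptions. For convenience `O` is stored as one vector per object, which corresponds to one column of the matrix O in the text.

```python
import random
from typing import Dict, List, Sequence, Tuple


def predict(user_vec: Sequence[float], obj_vec: Sequence[float]) -> float:
    """Predicted response: dot product of the two latent vectors."""
    return sum(u * o for u, o in zip(user_vec, obj_vec))


def factorize(
    responses: Dict[Tuple[int, int], float],  # observed (user, object) -> response
    num_users: int,
    num_objects: int,
    rank: int = 2,
    steps: int = 5000,
    lr: float = 0.05,    # learning rate (assumed)
    reg: float = 0.01,   # L2 regularization strength (assumed)
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Learn latent vectors U and O such that U[u] . O[o] ~ responses[(u, o)]."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_users)]
    O = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_objects)]
    observed = list(responses.items())
    for _ in range(steps):
        (u, o), r = rng.choice(observed)   # one observed entry per step
        err = r - predict(U[u], O[o])
        for f in range(rank):
            uf, of = U[u][f], O[o][f]
            U[u][f] += lr * (err * of - reg * uf)
            O[o][f] += lr * (err * uf - reg * of)
    return U, O
```

After training, `predict(U[u], O[o])` gives an estimate for any user-object pair, including pairs that were never observed, which is exactly the missing-entry problem described above.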

 

Remember, we have more than 950 million users, and even more objects. Our matrix is huge, and the major challenge is how to learn the latent features efficiently. We developed algorithms to compute the latent traits from the huge amount of historical data and to update them in real time as new user feedback comes in.

 

This ability to do real-time updates as new objects and events come in is one of the most important features for recommending the best apps to people. When feedback comes in, we need to do several things. One is to update the index so that new objects are available for candidate selection. The new actions from each user are added to the index in real time so that friends’ activities are immediately available for recommendation. We also update the user history so that we can make recommendations based on the user's latest activities. The dynamic features are updated as well, so that the current counts for shares, likes, and impressions can be used accurately for scoring. Finally, the latent features are updated in real time, so that the system can learn user taste and object flavor from the latest activities.
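The bookkeeping listed above can be pictured as a single feedback handler. This is a simplified in-memory sketch with hypothetical names (`RecommenderState`, `on_feedback`) and an assumed numeric `response` signal; the latent update is one plain gradient step, standing in for whatever online algorithm the real system uses.

```python
from collections import defaultdict, deque
from typing import Set


class RecommenderState:
    """In-memory stand-in for the index, history, counters, and latent features."""

    def __init__(self, rank: int = 2, history_size: int = 50) -> None:
        self.rank = rank
        self.index: Set[str] = set()  # apps eligible for candidate selection
        # Recent apps per user; a friend's history doubles as "friend activity".
        self.history = defaultdict(lambda: deque(maxlen=history_size))
        self.counts = defaultdict(lambda: defaultdict(int))  # app -> action -> count
        self.user_vecs = defaultdict(lambda: [0.1] * rank)   # latent user features
        self.app_vecs = defaultdict(lambda: [0.1] * rank)    # latent app features

    def predict(self, user: str, app: str) -> float:
        return sum(u * a for u, a in zip(self.user_vecs[user], self.app_vecs[app]))

    def on_feedback(self, user: str, app: str, action: str,
                    response: float, lr: float = 0.05) -> None:
        self.index.add(app)              # new objects become selectable
        self.history[user].append(app)   # user history and friend activity
        self.counts[app][action] += 1    # dynamic features (likes, shares, ...)
        u, a = self.user_vecs[user], self.app_vecs[app]
        err = response - sum(x * y for x, y in zip(u, a))
        for f in range(self.rank):       # one online gradient step on latent features
            u[f], a[f] = u[f] + lr * err * a[f], a[f] + lr * err * u[f]
```

Each event touches only the entries for one user and one app, which is what makes it feasible to apply updates as feedback arrives instead of retraining in batch.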

 

The App Center has been available to people worldwide since August 1, 2012, and we will continue to make updates, such as the recently launched My Apps page, as we build a personalized App Center and app recommendation service for each person on Facebook.

 

Wei Xu, Xin Liu, TR Vishwanath, and the open graph engineering team all worked together to integrate the recommendation engine and App Center.

 
Original article: https://www.cnblogs.com/banli/p/3462368.html