Abstract
In the last couple of years, short videos have become the new darling of the digital mediascape. After the internet boom in India, new influencers are emerging daily. We all have our favorite creators and can spend hours watching their content. For a platform like ours, we needed a user-creator affinity recommendation model that recommends creator stories to users based on an affinity (likeability) factor, where a consumer’s (user’s) likeability for a creator is defined by interactions such as Follow, Profile Visit, Like, Comment, Share, etc.
Overview
Affinity means a natural liking for and understanding of someone or something. Affinity is a temporal factor that changes with time and interest niche. Our goal is to capture user-creator affinity strength, which also captures users’ interest niche i.e., what type of stories a consumer (user) prefers more.
Business Goals
- Improve stories recommendation algorithm such that the user’s session time increases
- Improve user niche discovery for content
- Improve visibility of long-tail creators and content discovery based on consumer’s likeability factor.
Expected Outcome
Recommend a list of story ids of creators for whom user-creator affinity is high.
NOTE: A creator is also a user on the platform. Hence, I will refer to users who watch a creator’s videos as consumers.
Interaction between Creator-Consumer on Roposo
Out of these different interactions between consumer and creator, we picked profile visit as the stronger signal for mapping similarity between creators.
Approach
High-Level Approach
We divided the problem into 2 parts:
- For each consumer (user), find the top K creators with a high True Affinity score using the Multi-Criteria Decision Making (MCDM) technique TOPSIS, where affinity is defined by like, follow, profile visit, comment, loop_count, perc_seen (percentage of video seen based on video duration), etc.
- After finding these True Top K creators based on MCDM, we take their embeddings (computed using the node2vec embedding methodology), find their nearest neighbors, and recommend similar creators.
In Summary: First, we find the true high-affinity creators for a consumer based on MCDM. Then we find creators similar to those high-affinity creators.
Implementation Details (TLDR) — Refer to the above Approach Figure with Steps
Step 1: Finding True Top Affinity creator for a Consumer (user) from interactions.
This is a multi-criteria decision-making (MCDM) or multi-criteria decision analysis (MCDA) problem, as we wanted to rank, for each consumer (user), all creators with whom the consumer interacted in the last 30 days.
We use the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), an MCDM algorithm, to rank creators in order of affinity. TOPSIS is based on the concept that the chosen alternative should have the shortest geometric distance from the ideal solution and the longest geometric distance from the worst solution.
Scikit Criteria: Link
One can check out my blogs to get a detailed understanding of MCDM: Ranking of entities with Multi-Criteria Decision Making Methods (MCDM) — Part One | Ranking and Selection of the best with Multi-Criteria Decision Making (MCDM) — Part Two
Sum up: After Step 1, for every consumer we have the set of creators the consumer interacted with in the last 30 days, ranked by affinity factors.
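The TOPSIS ranking in Step 1 can be sketched in plain NumPy. This is a minimal illustration and not the Scikit Criteria code used in production: the function name `topsis_rank`, the equal weights, the choice of four criteria, and the toy interaction counts are all assumptions made for the example.

```python
import numpy as np

def topsis_rank(matrix, weights, benefit):
    """Return TOPSIS closeness scores, one per row (higher = closer to the ideal)."""
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Vector-normalise each criterion column, then apply the criterion weights.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    V = (X / norms) * w
    benefit = np.asarray(benefit, dtype=bool)
    # Ideal point: best value per criterion; worst point: the opposite.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_ideal = np.linalg.norm(V - ideal, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    denom = d_ideal + d_worst
    # Closeness = distance to worst / (distance to ideal + distance to worst).
    return np.divide(d_worst, denom, out=np.zeros_like(d_worst), where=denom > 0)

# Hypothetical 30-day interactions of one consumer with three creators.
# Columns (assumed): likes, follow, profile visits, perc_seen.
creators = ["c1", "c2", "c3"]
interactions = [
    [12, 1, 5, 0.90],
    [3, 0, 1, 0.40],
    [8, 1, 2, 0.75],
]
scores = topsis_rank(interactions, weights=[1, 1, 1, 1], benefit=[True] * 4)
ranking = [creators[i] for i in np.argsort(-scores)]
```

Here `c1` is best on every criterion and `c2` is worst on every criterion, so the ranking comes out as `c1`, `c3`, `c2`. In practice the criterion weights would be tuned to reflect how strongly each interaction signals affinity.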
**Step 2 & Step 3: Creator Graph (Profile Visits to Embedding)**
We constructed a Creator-Creator graph based on the profile visits of a consumer. Connections were made between creators whose profile visits by a consumer co-occurred on a particular day.
The graph weights were defined by co-occurrence strength (number of times profile visits by a consumer co-occurred).
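The graph construction above can be sketched as follows. This is an illustrative version under stated assumptions: the event format `(consumer_id, day, creator_id)`, the function name `build_cooccurrence_edges`, and counting one co-occurrence per consumer per day are choices made for the example, not details confirmed by our pipeline.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_edges(visits):
    """Build weighted undirected edges from profile-visit events.

    visits: iterable of (consumer_id, day, creator_id) tuples (assumed format).
    Returns a Counter mapping (creator_a, creator_b) -> co-occurrence count.
    """
    # Collect the distinct creators each consumer visited on each day.
    by_consumer_day = {}
    for consumer, day, creator in visits:
        by_consumer_day.setdefault((consumer, day), set()).add(creator)

    # Every pair of creators visited by the same consumer on the same day
    # adds 1 to the weight of the edge between them.
    edges = Counter()
    for creators in by_consumer_day.values():
        for a, b in combinations(sorted(creators), 2):
            edges[(a, b)] += 1
    return edges

# Toy events: u1 visits A, B, C on day d1; u2 visits A, B on d1 and C on d2.
visits = [
    ("u1", "d1", "A"), ("u1", "d1", "B"), ("u1", "d1", "C"),
    ("u2", "d1", "A"), ("u2", "d1", "B"),
    ("u2", "d2", "C"),
]
edges = build_cooccurrence_edges(visits)
```

In this toy data the edge between `A` and `B` gets weight 2, while `A`-`C` and `B`-`C` each get weight 1. Sorting each pair keeps the graph undirected, since `(A, B)` and `(B, A)` map to the same key.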
We computed the creator embeddings with Node2vec+ (Paper Link), which trains a word2vec skip-gram model on random walks over the graph.
Node2vec Params: How to set p and q?
The top and bottom panels correspond to the node2vec embedding generated using q = 0.5 and q = 2. One can see that in the top panel, nodes that fall into the same local network neighborhood (i.e., homophily) are colored the same. On the other hand, in the bottom panel, structurally equivalent nodes are colored the same.
With p = 1 and q = 0.5, node2vec discovers clusters/communities of nodes (characters, in the node2vec paper's example network) that frequently interact with each other, since the edges between nodes are based on co-appearances.
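To make the role of p and q concrete, here is a small sketch of the biased second-order random walk from the original node2vec formulation. Note that this is classic node2vec, not Node2vec+ (which further adjusts the bias for edge weights), and that the adjacency-dict graph format and function names are assumptions for illustration.

```python
import random

def node2vec_step_weights(graph, prev, curr, p, q):
    """Unnormalised weights for the next hop of a walk that just went prev -> curr."""
    weights = {}
    for nxt, w in graph[curr].items():
        if nxt == prev:
            bias = 1.0 / p   # step back to the previous node
        elif nxt in graph[prev]:
            bias = 1.0       # stay within the neighbourhood of prev
        else:
            bias = 1.0 / q   # move outward, away from prev
        weights[nxt] = w * bias
    return weights

def node2vec_walk(graph, start, length, p, q, rng):
    """Sample one walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        if not graph[curr]:
            break
        if len(walk) == 1:
            weights = dict(graph[curr])  # first hop: plain edge weights
        else:
            weights = node2vec_step_weights(graph, walk[-2], curr, p, q)
        nodes = list(weights)
        walk.append(rng.choices(nodes, weights=[weights[n] for n in nodes])[0])
    return walk

# Toy undirected creator graph as an adjacency dict: node -> {neighbour: weight}.
graph = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"A": 1.0, "B": 1.0},
    "D": {"B": 1.0},
}
step = node2vec_step_weights(graph, prev="A", curr="B", p=1.0, q=0.5)
walk = node2vec_walk(graph, "A", length=5, p=1.0, q=0.5, rng=random.Random(0))
```

With p = 1 and q = 0.5, a walk that arrived at `B` from `A` weights the outward move to `D` at 2.0, against 1.0 for returning to `A` or moving to the shared neighbour `C`. Many such walks are then treated as sentences and passed to a skip-gram model to obtain the embeddings.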
Sum up: After Step 2 & Step 3, we have creator embeddings computed from the creator-creator graph built on co-occurrence of profile visits.
**Step 4: Recommending Top Creators**
Now, we have the true Consumer (User)-Creator Affinity ranking based on MCDM, and we have embeddings of all (active) creators on our platform.
We pick the top 5 True Affinity Creators ranked by the MCDM technique and retrieve their nearest neighbours in embedding space to get the top 100 high-affinity creators.
Why were the top 5 True Affinity Creators picked as query vectors? Why not pick only the top 1, or create a mean vector of the top 5 creators, and show creators similar to that query vector in embedding space?
The idea of picking the top 5 creators is inspired by the Pinterest research paper PinnerSage.
It is true that a user cannot be represented by a single “interest” embedding. In general, even for movies, everyone shows interest in multiple genres such as horror, action, sci-fi, comedy, etc. To capture this range of interests, we pick the top 5 creators from the ranked set.
For fast vector similarity search over the creator embeddings, we used ScaNN, an Approximate Nearest Neighbour (ANN) algorithm.
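The multi-query lookup can be sketched with an exact, brute-force cosine search in NumPy. This stands in for ScaNN purely for illustration; it is not the ScaNN API. The function name, the rule of scoring each candidate by its best similarity to any query creator, and the exclusion of the query creators from the results are assumptions, since the merging strategy is not spelled out above.

```python
import numpy as np

def recommend_creators(embeddings, creator_ids, query_ids, top_n):
    """Return up to top_n (creator_id, similarity) pairs nearest to any query creator."""
    E = np.asarray(embeddings, dtype=float)
    # L2-normalise rows so that a dot product equals cosine similarity.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    index = {c: i for i, c in enumerate(creator_ids)}
    q_idx = [index[c] for c in query_ids]

    sims = E[q_idx] @ E.T      # shape: (num_queries, num_creators)
    best = sims.max(axis=0)    # best similarity to any of the query creators
    best[q_idx] = -np.inf      # assumption: do not recommend the queries themselves
    order = np.argsort(-best)[:top_n]
    return [(creator_ids[i], float(best[i])) for i in order if np.isfinite(best[i])]

# Toy 2-D embeddings; real node2vec embeddings would have many more dimensions.
creator_ids = ["A", "B", "C", "D", "E"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.9], [-1.0, 0.0]]
recs = recommend_creators(embeddings, creator_ids, query_ids=["A", "C"], top_n=2)
```

With `A` and `C` as query creators, `B` (close to `A`) and `D` (close to `C`) are returned, which shows why several query vectors cover multiple interests better than a single mean vector that would sit between the two clusters. At platform scale, an ANN index replaces the exact matrix product.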
Approach Summary (Recap)
- We use MCDM to rank creators for each consumer using the interactive features (affinity-defining features). This ranked set of creators is the True Affinity Creators for a consumer (user).
- Now, we create a creator-creator graph based on profile visits co-occurrence in a session of a consumer.
- On this graph, we apply the random-walk algorithm Node2Vec+ with set breadth-first and depth-first search parameters (p and q). This gets us a vector representation for each creator.
- At last, we pick the top 5 creators from the True Affinity set (ranked set based on MCDM) for a consumer and use the creator embeddings to find the top 100 most similar creators from the entire creator set. For a fast vector similarity search, we use the ScaNN algorithm.
Stories Recommendation
Our expected outcome is a list of story ids. Hence, from the top 100 affinity creators for each consumer (from the above approach), we pick each creator’s latest unwatched story and add it to the consumer’s recommendation pool, with stories ranked by creator similarity score with respect to the user’s true creator affinity.
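The story selection step can be sketched as below. The data shapes are hypothetical: `ranked_creators` is assumed to be a list of `(creator_id, similarity)` pairs, `stories_by_creator` maps each creator to `(story_id, posted_at)` pairs, and `watched` is the set of story ids the consumer has already seen.

```python
def recommend_stories(ranked_creators, stories_by_creator, watched):
    """Pick each creator's latest unwatched story, ordered by creator similarity."""
    pool = []
    for creator, score in ranked_creators:
        unseen = [
            (posted_at, story_id)
            for story_id, posted_at in stories_by_creator.get(creator, [])
            if story_id not in watched
        ]
        if unseen:
            # max() over (posted_at, story_id) tuples picks the most recent story.
            _, latest = max(unseen)
            pool.append((latest, score))
    # Rank the pool by creator similarity, highest first (sort is stable).
    pool.sort(key=lambda item: -item[1])
    return [story_id for story_id, _ in pool]

# Toy inputs: similarity-ranked creators and their stories with posting times.
ranked_creators = [("B", 0.99), ("D", 0.97), ("E", 0.10)]
stories_by_creator = {
    "B": [("s1", 1), ("s2", 2)],
    "D": [("s3", 5)],
    "E": [("s4", 3), ("s5", 4)],
}
watched = {"s2", "s3"}
story_ids = recommend_stories(ranked_creators, stories_by_creator, watched)
```

Creator `D` contributes nothing because its only story was already watched, so the pool here is `s1` followed by `s5`.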
Conclusion
This approach of TOPSIS MCDM and Node2Vec+ not only ranks creators for a consumer but also helps us find similarities between creators of the same niche using a profile-visit co-occurrence graph.
Reference
- PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest
- Billion Scale Recommendation at Taobao
- DeepWalk: Online Learning of Social Representations
- Ranking of entities with Multi-Criteria Decision-Making Methods (MCDM) — Part One
- Ranking and Selection of the best with Multi-Criteria Decision Making (MCDM) — Part Two