YOU MUST EAT THIS GRAPH NEWS, GRAPH OMAKASE. Third week of April
Twitter Recommendation algorithm
[https://github.com/twitter/the-algorithm]
This issue features Twitter's graph algorithms. For a summary of the summary, I recommend this post:
[https://newsletter.theaiedge.io/p/how-twitter-and-tiktok-recommend]
Twitter has open-sourced its recommendation algorithm. After considering what this case implies, I came to two conclusions.
1. [ESG in SNS] Opening the algorithm is an authentic appeal that the platform provides valuable and diverse information to users, since an opaque algorithm may cause bias and distortion of information.
2. [Can graphs work in a product?] A reference case that lets you clearly answer customers who ask, "Can graph technology work in a real product?"
I think this posting will be especially helpful for those who are curious about how the recommendations they see on Twitter are produced, and for DBAs who are thinking about architectures for using graphs in the field.
GraphJet: Real-Time Content Recommendations at Twitter
[http://www.vldb.org/pvldb/vol9/p1281-sharma.pdf]
Summary
What comes to mind when you think of Twitter? There are many candidates, but for me it is definitely real-time information exchange. Since each tweet is limited to 280 characters, only the core information can be written, in a compact form that is more likely to be exposed to people. So how does Twitter handle this "real-time" stream well and turn it into recommendations? That is the problem the paper faces. Contrary to my expectation that the solution would be very complex and resource-hungry, Twitter solves it in the form of an API. Of course, I think this all-in-one API was only achievable because of the previous attempts (DB, HDFS, etc.).
I'm very interested in graph recommendations, so I focused on how the pipeline from storage engine to recommendation engine is organized. Briefly: 1. Model the data as a bipartite (user-tweet) graph and score each node with graph analytics (e.g., random walks). 2. Load the results onto the storage engine. 3. Thoroughly optimize the loaded information index by index (e.g., deleting edges after a certain period of time).
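As a toy illustration of step 1, here is a minimal sketch of scoring tweets for a user with random walks over a user-tweet bipartite graph. This is my own illustration, not GraphJet's actual code; the users, tweets, and parameters are all hypothetical:

```python
import random
from collections import Counter

# Toy user-tweet bipartite interaction graph (hypothetical data).
user_to_tweets = {
    "alice": ["t1", "t2"],
    "bob":   ["t2", "t3"],
    "carol": ["t3", "t4"],
}
# Build the reverse direction of the bipartite graph.
tweet_to_users = {}
for user, tweets in user_to_tweets.items():
    for t in tweets:
        tweet_to_users.setdefault(t, []).append(user)

def random_walk_scores(seed_user, num_walks=1000, walk_length=3, seed=42):
    """Score tweets by how often short user->tweet->user walks visit them."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        user = seed_user
        for _ in range(walk_length):
            tweet = rng.choice(user_to_tweets[user])   # hop user -> tweet
            visits[tweet] += 1
            user = rng.choice(tweet_to_users[tweet])   # hop tweet -> user
    total = sum(visits.values())
    return {t: count / total for t, count in visits.items()}

scores = random_walk_scores("alice")
# Tweets engaged by alice's neighborhood receive the highest scores.
```

Tweets the seed user already reaches in one hop dominate, but walks of length > 1 also surface tweets from neighbors of neighbors, which is the candidate-generation effect the paper relies on.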
Surprisingly, this architecture was conceived in 2016, almost seven years ago. Twitter still uses it today, though with many additions. What stood out to me is how well the paper captures the trial and error behind it: the many architectures that were attempted, and why each attempt was not adopted. (Haha.) I think it will be fun for those who are curious about the history of Twitter's algorithms.
Insight
The paper expresses well how Twitter approached fusing batch processing with real-time processing, so I recommend focusing on that part.
SimClusters: Community-based Representations for Heterogeneous Recommendations at Twitter
[https://github.com/twitter/the-algorithm/tree/main/src/scala/com/twitter/simclusters_v2]
Summary
Is a community detection algorithm useful for actual recommendations? This paper answers that question. Since the existing matrix factorization method has a computational optimization problem, the paper attempts to reduce the amount of computation by combining similarity search with community detection. As a result, speed improved 10 to 100 times and performance improved 3 to 4 times.
Recommendations are made on bipartite graphs, just like GraphJet; two bipartite graphs, (user-user) and (user-item), are utilized. Unlike before, the graph is designed as a directed graph, reflecting the Twitter characteristic that B does not automatically follow A back just because A followed B.
From the modeled graph, candidate groups to recommend to the user are extracted through three steps: 1. Similarity graph extraction, which you can simply view as a projection (1-mode) graph. 2. Community detection applied to the similarity graph. 3. The extracted community values used as user features.
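The three steps above can be sketched roughly as follows. This is my own toy code with made-up follow data; note that connected components stand in for the community detection step, whereas SimClusters itself uses a more sophisticated formulation:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical follow edges: consumer user -> producer accounts.
follows = {
    "u1": {"p1"},
    "u2": {"p1", "p2"},
    "u3": {"p3", "p4"},
    "u4": {"p4"},
}

# Step 1: project the bipartite follow graph onto producers (1-mode graph).
# Two producers are similar if they share at least one follower.
similarity = defaultdict(int)
for accounts in follows.values():
    for a, b in combinations(sorted(accounts), 2):
        similarity[(a, b)] += 1  # weight = number of shared followers

# Step 2: "community detection" on the similarity graph.
# (Stand-in: connected components via union-find; the real algorithm differs.)
parent = {}
def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x
def union(a, b):
    parent[find(a)] = find(b)
for a, b in similarity:
    union(a, b)

producers = {p for accs in follows.values() for p in accs}
community_of = {p: find(p) for p in producers}

# Step 3: the communities a user follows become that user's feature vector.
user_features = {u: sorted({community_of[p] for p in accs})
                 for u, accs in follows.items()}
```

Users who follow producers from the same community end up with the same feature, which is exactly what makes the downstream recommendation cheap to compute.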
In addition, hashtags and tweet content updated in real time are treated as items, and communities are derived from the user-item bipartite graph and applied to recommendations through a similar process. The difference from user-user is that the user-user relationship is assumed to be long-term, while the user-item relationship is assumed to be short-term, so an aggregate function with decay is additionally applied on the update cycle.
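The decay idea can be sketched with a simple exponential half-life. The engagement log and the half-life value are assumptions of mine for illustration, not values from the paper:

```python
import math

# Hypothetical engagement log for one user: (item, hours_ago) pairs.
engagements = [("news_tag", 1.0), ("news_tag", 30.0), ("sports_tag", 2.0)]

HALF_LIFE_HOURS = 24.0  # assumed: an engagement loses half its weight per day

def decayed_scores(events):
    """Aggregate engagements with exponential time decay, so recent
    short-term interests dominate the per-item scores."""
    scores = {}
    for item, hours_ago in events:
        weight = math.exp(-math.log(2) * hours_ago / HALF_LIFE_HOURS)
        scores[item] = scores.get(item, 0.0) + weight
    return scores

scores = decayed_scores(engagements)
# The day-old "news_tag" engagement contributes far less than the 1-hour one.
```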
Insight
Each part holds a gem of insight, so there are too many good ones to summarize hastily. Examples include how to select the number of communities, arguably the most important hyperparameter in a community detection algorithm, how to overcome the limits of matrix factorization, and various applications such as home-page exposure, detail pages, and topic ranking.
Unlike existing (black-box) recommendation systems, the recommendation here is carried out through community detection results: you can trace which communities a user belongs to and which recommendations were derived from them, so the results are interpretable. That made it a very interesting paper.
TwHIN: Dense knowledge graph embeddings for Users and Tweets
kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval
[https://arxiv.org/pdf/2205.06205.pdf]
Summary
Unlike the previous papers, this one has a relatively simple structure. The operating principle: extract similar users through approximate nearest neighbor (ANN) search, analyze which items those users interact with, and use that information as an additional source of candidates.
Through this process, the item distributions of these various users are mixed, which improves performance on both diversity and recall metrics.
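A rough sketch of that principle, with made-up embeddings and interaction counts. Brute-force cosine search stands in for a real ANN index, and `alpha` is an assumed smoothing weight between the user's own history and the neighbor mixture:

```python
import math

# Toy user embeddings and user->item interaction counts (hypothetical).
user_vecs = {
    "q":  [1.0, 0.0],
    "n1": [0.9, 0.1],
    "n2": [0.8, 0.3],
    "n3": [-1.0, 0.2],
}
user_items = {
    "q":  {"a": 3},
    "n1": {"a": 1, "b": 2},
    "n2": {"b": 1, "c": 2},
    "n3": {"d": 5},
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

def smoothed_item_mixture(query, k=2, alpha=0.5):
    """Mix the query user's item distribution with those of its k nearest
    neighbors (brute-force stand-in for an ANN index)."""
    neighbors = sorted(
        (u for u in user_vecs if u != query),
        key=lambda u: cosine(user_vecs[query], user_vecs[u]),
        reverse=True,
    )[:k]
    def normalize(counts):
        total = sum(counts.values())
        return {i: c / total for i, c in counts.items()}
    mixture = {i: alpha * p for i, p in normalize(user_items[query]).items()}
    for u in neighbors:
        for i, p in normalize(user_items[u]).items():
            mixture[i] = mixture.get(i, 0.0) + (1 - alpha) * p / k
    return mixture

mix = smoothed_item_mixture("q")
# Items "b" and "c" enter the candidate set only via the neighbor mixture,
# which is where the diversity gain comes from.
```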
Insight
If you keep emphasizing "similar, similar", you eventually run into the problem of diversity. In other words, users want recommendations for new items, but because recommendations are carried out based on the characteristics of similar users, only similar items get recommended. This can also be a big problem for the user experience.
Thinking about it again, it's very interesting that ANN improved this aspect, which is a problem characteristic of relation-based, graph-based recommendation. With this in mind, I suggest focusing on the smoothing method that played a big role in improving performance.
Coffee chat with Katana Graph's Hadi
Over the weekend I had an online coffee chat with Hadi, who works at Katana Graph, on the topic of how to deliver graph technology to customers. He works as a software engineer at Katana Graph, a platform that can be used all-in-one from data pipelines to graph analytics, graph embedding, and prediction, so his job character was very similar to mine. I think that's why we were able to talk without interruption for more than an hour.
Where we differed, we had slightly different opinions: algorithm first vs. problem definition (graph modeling) first, on-premise vs. cloud, and data ingestion is important vs. model customization is important. It was a very good time for talking through each other's views. If you are studying graphs or using them in the field, it is easy to feel buried in your own perspective, so if you have any need for a fresh one, please feel free to contact me! You are welcome :)
Thanks for reading my posting. If you have a question about this posting, or any question about graph technology, feel free to connect with me! https://www.linkedin.com/in/ii-tae-jeong/