YOU MUST EAT THIS GRAPH NEWS, GRAPH OMAKASE — October, Week 2
Preview
How Transitive Are Real-World Group Interactions? — Measurement and Reproduction
→ Transitivity is a classic graph metric — how is it measured on a hypergraph, and what role can it play in a generative model?
A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs
→ KG reasoning: shortest-path search is already expensive, so how do you run it on a knowledge graph? The paper tackles the scalability problem by adapting the A* algorithm to knowledge graphs.
Topological representation learning for e-commerce shopping behaviors
→ Can you segment customers by turning their shopping paths into a graph? Amazon takes this appealing but seemingly unrealistic idea and runs a genuinely interesting set of experiments. In particular, deriving topological similarity: how do you implement an encoder that compares similarity between paths?
How Transitive Are Real-World Group Interactions? — Measurement and Reproduction
[https://arxiv.org/pdf/2306.02358.pdf]
One of the purposes of a generative model is to produce data that resembles the real world, based on historical data. Today we'll talk about a different kind of generative model from the text generators we're familiar with, such as ChatGPT: a hypergraph generative model. If you're not familiar with the difference between a graph and a hypergraph, in a nutshell: a graph connects node *pairs* with edges, while a hypergraph connects node *groups* with hyperedges. Because it deals with 1:N 'groups' rather than 1:1 'pairs', a hypergraph is a useful data structure for managing and analyzing group interactions.
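The pair-vs-group distinction is easy to see in code. A minimal sketch (the names and data here are illustrative, not from the paper): a graph stores edges as node pairs, a hypergraph stores hyperedges as node groups of arbitrary size, and approximating one hyperedge with pairwise edges (its clique expansion) loses the group structure.

```python
# An ordinary graph stores edges as node pairs; a hypergraph stores
# hyperedges as node groups of arbitrary size.
graph_edges = [("alice", "bob"), ("bob", "carol")]   # 1:1 relationships
hyperedges = [{"alice", "bob", "carol"},             # 1:N group interactions
              {"bob", "dave"}]

def clique_expansion(hyperedge):
    """Approximate one k-node hyperedge with k*(k-1)/2 pairwise edges.

    The expansion forgets which pairs came from the same group interaction,
    which is exactly the information a hypergraph preserves.
    """
    nodes = sorted(hyperedge)
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]

print(clique_expansion({"alice", "bob", "carol"}))
# → [('alice', 'bob'), ('alice', 'carol'), ('bob', 'carol')]
```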
Before diving into hypergraphs, the paper discusses transitivity. Transitivity refers to the likelihood that two nodes that are not directly connected are nevertheless connected through an intermediate node. This is intuitive for graphs, but not for hypergraphs: when multiple groups overlap, there are many cases to consider for what counts as an indirect connection. The paper narrows these cases down to a few and uses them as a metric for measuring transitivity.
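For the ordinary-graph baseline that the paper generalizes, transitivity is the classic global clustering coefficient: the fraction of connected triples that close into triangles. A minimal sketch (toy adjacency data, not from the paper):

```python
from itertools import combinations

def transitivity(adj):
    """Global clustering coefficient.

    adj: dict mapping node -> set of neighbors (undirected graph).
    Counts, over all nodes v, how many neighbor pairs of v are themselves
    connected (closed triple / triangle) out of all neighbor pairs (triples).
    """
    triangles = 0
    triples = 0
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):  # connected triple centered at v
            triples += 1
            if w in adj[u]:                 # the triple closes into a triangle
                triangles += 1
    return triangles / triples if triples else 0.0

# A triangle a-b-c plus a dangling node d: 3 of the 5 triples are closed.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
print(transitivity(adj))  # → 0.6
```

The hypergraph version in the paper is more involved precisely because "neighbor pair of v" is no longer well defined when interactions are group-wise.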
The paper then interprets the measured results in its Observation section. The interpretations are insightful: the findings largely match intuition, but they had been hard to pin down with numbers until now. If you apply hypergraphs in research or in the field, I recommend reading this part. Even if you never use the generative model itself, it will help you a lot when you extract and interpret statistics from your own data.
In addition, two other reasons the proposed generation model can outperform other hypergraph generation models are community and hierarchy. Part of the fun of this paper is seeing how these are designed in, and how the authors approached and solved the computational blow-up issues that arose along the way.
A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs
[https://arxiv.org/pdf/2206.04798.pdf]
Knowledge Graph Reasoning (KGR) is one of several alternatives for mitigating large-language-model hallucinations. By exploring paths between the nodes of a knowledge graph, it extracts the paths that support an answer and presents them as evidence. Since a knowledge graph encodes facts and knowledge in graph form, the paths themselves are existing facts. In other words, KGR's biggest appeal is that it provides data-backed evidence for answers. However, there are problems.
The first is how to extract the paths relevant to an answer, and how to present them efficiently when many paths are extracted. The second is that even when the answer-related knowledge is in the data, intermediate relations or nodes lying between the two endpoint nodes of a path may not appear in the query, so they get omitted from the path presented as evidence. The algorithm is designed to solve both problems: 1. computing the importance of a path, and 2. finding additional paths to cover answer-relevant cases that have no intermediate stops.
The authors use the A* algorithm as a complement to the Bellman-Ford algorithm, the popular shortest-path (SP) workhorse. The important point is that SP is usually run on homogeneous graphs, whereas here it is applied to a knowledge graph: computing SP on a heterogeneous graph with typed relations makes the complexity and computation considerably larger. How to handle this efficiently is one of the interesting points of the paper.
The proposed algorithm uses a distance function and a heuristic function to find shortest paths. The distance function measures the cost accumulated from the start node to the current node, while the heuristic estimates how promising the current node is for reaching the goal. Exhaustively enumerating paths is intractable at knowledge-graph scale, so the key question is how to get a good answer within a budget — and that is where the heuristic function comes in.
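As a refresher on the classical algorithm the paper builds on (not the paper's neural version), here is a minimal A* over a toy triple store. The triples, unit edge costs, and heuristic are all made up for illustration:

```python
import heapq

# Toy knowledge graph stored as (head, relation, tail) triples.
triples = [
    ("paris", "capital_of", "france"),
    ("france", "member_of", "eu"),
    ("paris", "located_in", "europe"),
    ("europe", "contains", "eu_hq"),  # hypothetical entity for the example
]

def neighbors(node):
    return [t for h, r, t in triples if h == node]

def a_star(start, goal, heuristic):
    # frontier entries: (cost_so_far + heuristic, cost_so_far, node, path)
    frontier = [(heuristic(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in neighbors(node):  # unit edge cost for the sketch
            heapq.heappush(frontier,
                           (cost + 1 + heuristic(nxt), cost + 1, nxt, path + [nxt]))
    return None

# A zero heuristic reduces A* to Dijkstra; A*Net instead *learns* this
# function (the Neural Priority Function) to prune the frontier.
print(a_star("paris", "eu", lambda n: 0))  # → ['paris', 'france', 'eu']
```

The trade-off the paper navigates is visible even here: the heuristic decides which branches are expanded at all, so a learned priority function can skip most of a large heterogeneous graph.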
In the paper, this is called the Neural Priority Function. In a nutshell, it assigns a weight to each path and aggregates those values over the iterative traversal. There are also many instructive trial-and-error details, such as how to set up negative sampling when there is no ground truth, and it is very interesting to see them resolved. The paper is well written and easy to read for anyone who has studied SP algorithms and reasoning.
It is noteworthy that A*Net outperformed other knowledge graph embeddings and GNNs. Another way to read this result: topology still matters in reasoning.
Topological representation learning for e-commerce shopping behaviors
This is a great paper from Amazon on something I've been thinking about for a while: how to represent a user's shopping behavior as a graph and use it to make graph-based inferences about customer segments and what they'll buy at the end of a shopping trip. We've all seen graph embeddings before, but I was really impressed by the distributional approach here.
I remember working on a side project on user anomaly detection using similar graph modeling and subgraph embeddings (graph2vec). I analyzed the results by assuming that a large day-to-day difference in embeddings indicates new behavior, and there were some interesting findings. Whereas I was simply focusing on the "difference", I was impressed by the more granular treatment of distributional differences here.
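The day-over-day comparison I describe can be sketched as a cosine-distance check between daily embedding vectors. Everything below — the vectors and the threshold — is invented for illustration; real graph2vec outputs would be much higher-dimensional:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

# Hypothetical daily subgraph embeddings (e.g. graph2vec outputs) for one user.
yesterday = [0.9, 0.1, 0.0]
today_ok = [0.85, 0.15, 0.05]   # similar behavior
today_odd = [0.0, 0.2, 0.95]    # very different behavior

THRESHOLD = 0.5  # arbitrary cutoff for the sketch
for label, today in [("ok", today_ok), ("odd", today_odd)]:
    flagged = cosine_distance(yesterday, today) > THRESHOLD
    print(label, "anomalous" if flagged else "normal")
# → ok normal
# → odd anomalous
```

A single scalar distance is exactly the "simple difference" I mention; the paper's contribution is treating the behaviors as distributions rather than single points.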
The model reflects both topology semantics and topology similarity. To compare topology similarity, we need to compare differences between distributions; for this, the paper uses optimal transport theory and the Wasserstein distance. This is where it gets really interesting, because the two concepts combine in how the distributions are estimated, how they are managed to extract similarity, and how the optimization is carried out.
The ablation study also shows how the results change without each of the above factors. The main thing to note is the large drop when removing TSE (similarity) compared to removing KE (features). In other words, it's an interesting experiment that indirectly confirms that topology matters.
I read the paper in a single sitting because it covers an area I'm personally very interested in. It broadened my horizons: I had only been approaching topology similarity as path similarity, which was exactly what I was wrestling with when meeting and talking with clients. It also helps by 1. providing references I can cite as being used by other companies, and 2. offering a richer way to approach topology similarity than plain path similarity. If you're thinking about something similar to me (customer 360 + graph), I think this paper will help you a lot — recommended!
omakase chef linkedin: https://www.linkedin.com/in/ii-tae-jeong/