YOU MUST EAT THIS GRAPH NEWS, GRAPH OMAKASE. 4th Week of June

Jeong Yitae
6 min read · Jun 26, 2023


Structural Patterns and Generative Models of Real-world Hypergraphs

[https://arxiv.org/pdf/2006.07060.pdf]

Introduction

Hyper Preferential Attachment (HyperPA). To briefly explain preferential attachment: the theory is that nodes with high degree are more likely to receive connections when a new node appears. To compare it to a social network, users with many followers are exposed more frequently than users with relatively few followers, so they are more likely to gain new connections.

Existing generative models did not consider group interactions when setting up the null model, so this paper's motivation is: let's build a null-model generator that does take group interactions into account! The argument for that motivation is the aforementioned preferential-attachment theory.

But how does the generator take group interactions into account? Through multi-level decomposition: the hypergraph is broken down into fine-grained group interactions, and we look at how much information is lost along the way.

Preliminary

[Measures of information loss when transforming a hypergraph into a graph; a computation sketch follows the list]

  1. Giant Connected Component: In a network, the giant connected component refers to the largest connected subgraph that contains a significant portion of the nodes. It represents a cohesive and well-connected part of the network. The presence of a giant connected component indicates that a large portion of the nodes in the network are interconnected, fostering robustness and efficient information flow.
  2. Heavy-Tailed Degree Distribution: The degree distribution of a network describes the frequency of nodes with different numbers of connections (degrees). A heavy-tailed degree distribution means that there are a few nodes with an extremely high number of connections, while the majority of nodes have relatively few connections. This distribution is characterized by a long tail in the degree distribution plot.
  3. Small Effective Diameter: The effective diameter of a network measures the average shortest path length between pairs of nodes, considering only the paths that exist. A small effective diameter indicates that most nodes in the network are relatively close to each other, facilitating efficient communication and information diffusion. It implies that it takes only a few steps or hops to reach any other node in the network.
  4. High Clustering Coefficient: The clustering coefficient of a network quantifies the extent to which nodes in the network tend to form clusters or tightly interconnected groups. A high clustering coefficient suggests that nodes in the network have a tendency to form local neighborhoods or communities, where nodes are densely connected to their neighbors. It indicates the presence of cohesive substructures within the network and promotes resilience, information propagation, and the formation of specialized communities.
  5. Skewed Singular Values: In the context of network analysis, the singular values are derived from the adjacency matrix or the graph Laplacian matrix of the network. Skewed singular values suggest that the network has a heterogeneous structure, where some singular vectors (associated with large singular values) are more significant than others. This implies that certain patterns or structures within the network are dominant, while others are less prominent. Skewed singular values can provide insights into the connectivity patterns and the importance of different components or subgraphs within the network.
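To make the five measures concrete, here is a minimal sketch (my own illustration, not the paper's code) that computes them on the clique expansion of a toy hypergraph using networkx and numpy; the effective diameter is approximated by sampling shortest-path lengths and taking the 90th percentile.

```python
# A minimal sketch (my own illustration, not the paper's code) that computes
# the five measures on the clique expansion of a toy hypergraph.
import networkx as nx
import numpy as np

def clique_expansion(hyperedges):
    """Project a hypergraph to a graph: connect every pair of nodes
    that co-occur in at least one hyperedge."""
    G = nx.Graph()
    for e in hyperedges:
        nodes = list(e)
        G.add_nodes_from(nodes)
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                G.add_edge(nodes[i], nodes[j])
    return G

def structural_measures(G, num_pairs=1000, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Fraction of nodes inside the giant connected component.
    gcc = max(nx.connected_components(G), key=len)
    gcc_fraction = len(gcc) / G.number_of_nodes()
    # 2. Degree sequence; inspect its tail for heavy-tailedness.
    degrees = sorted((d for _, d in G.degree()), reverse=True)
    # 3. Effective diameter: 90th percentile of sampled shortest-path
    #    lengths inside the giant component.
    sub = G.subgraph(gcc)
    nodes = list(sub.nodes())
    lengths = []
    for _ in range(num_pairs):
        u, v = rng.choice(len(nodes), size=2, replace=False)
        lengths.append(nx.shortest_path_length(sub, nodes[u], nodes[v]))
    effective_diameter = np.percentile(lengths, 90)
    # 4. Average clustering coefficient.
    clustering = nx.average_clustering(G)
    # 5. Singular values of the adjacency matrix; plot them to see skewness.
    singular_values = np.linalg.svd(nx.to_numpy_array(G), compute_uv=False)
    return gcc_fraction, degrees, effective_diameter, clustering, singular_values

toy = [{0, 1, 2}, {1, 2, 3}, {3, 4}, {2, 4, 5}]
print(structural_measures(clique_expansion(toy)))
```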

Summary

Decomposition method

The decomposition technique is utilized for measuring information loss in the hypergraph. We separate the hyperedges into multiple levels (1-level, 2-level, 3-level, and 4-level) and see how much the measurements change at each level. In this case, the comparison target is the real-world dataset.

The real-world hypergraph, too, is decomposed in the same way. In the end, the higher the similarity to the values measured on the real-world data, the better the performance of the proposed idea.
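As a rough sketch of what an n-level decomposition can look like (this is my reading of the idea, not necessarily the authors' exact construction): every size-n subset of a hyperedge becomes a node, and two subset-nodes are linked when they appear in the same hyperedge. The 1-level case reduces to the usual clique expansion.

```python
# A sketch of n-level decomposition as I read it (an assumption, not the
# authors' exact construction): every size-n subset of a hyperedge becomes
# a node, and two subset-nodes are linked when they appear in the same
# hyperedge.
from itertools import combinations
import networkx as nx

def n_level_decomposition(hyperedges, n):
    G = nx.Graph()
    for e in hyperedges:
        subsets = [frozenset(c) for c in combinations(sorted(e), n)]
        G.add_nodes_from(subsets)
        for a, b in combinations(subsets, 2):
            G.add_edge(a, b)  # subsets co-occurring in one hyperedge
    return G

hyperedges = [{0, 1, 2, 3}, {2, 3, 4}]
for n in range(1, 5):
    G = n_level_decomposition(hyperedges, n)
    print(n, G.number_of_nodes(), G.number_of_edges())
```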

The idea of this paper is simple: when generating a new hyperedge, the model samples from the distribution of existing hyperedges to decide which existing groups the new hyperedge is built from. This allows generation to take the group preferential attachment mentioned earlier into account.
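Below is a minimal sketch (my simplification, not the authors' code) of that group preferential attachment: each arriving node forms a hyperedge with an existing group, sampled proportionally to how often that group has appeared before. The real HyperPA also adds degree to every subset of a new hyperedge; only whole groups are tracked here, for brevity.

```python
# A minimal sketch (my simplification, not the authors' code) of group
# preferential attachment: arriving nodes join groups sampled in proportion
# to the groups' past appearances.
import random
from collections import Counter

def hyper_pa_sketch(num_nodes, seed=0):
    random.seed(seed)
    seed_edge = frozenset({0, 1})            # seed hyperedge (assumption)
    hyperedges = [seed_edge]
    group_degree = Counter({seed_edge: 1})
    for v in range(2, num_nodes):
        groups, weights = zip(*group_degree.items())
        chosen = random.choices(groups, weights=weights, k=1)[0]
        new_edge = frozenset(chosen | {v})   # new node joins the sampled group
        hyperedges.append(new_edge)
        group_degree[new_edge] += 1
        group_degree[chosen] += 1
    return hyperedges

print(hyper_pa_sketch(10))
```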

Insight

Its quality is verified against naive PA and subset sampling as baselines. The performance of the models is graded through the total score in the last row of the experimental table. In the end, if you look at generation from a group-interaction perspective, you can see that it forms a network closer to the real world.

I'll leave you with a good site for getting a general idea of hypergraph learning. Thanks to its neat explanations and data, I also took the time to go through this site and re-establish my own concepts, which was very helpful.

[https://www.jiajunfan.com/hl/main.html]

Evolution of Real-world Hypergraphs: Patterns and Models without Oracles

[https://arxiv.org/pdf/2008.12729.pdf]

Introduction

Hypergraph Forest Fire (HyperFF). This paper is proposed to compensate for the strong dependence on external information, which is a disadvantage of HyperPA. In HyperPA, generating new nodes depended on specific externally supplied groups; in this paper, that dependence is removed by a self-contained burning-and-expanding process.

Keywords that differ from the previous paper (HyperPA) are: (1) dynamic patterns are additionally taken into account, and (2) the model is self-contained, removing the need for oracle functions.

Preliminary

[Goodness of fit: evaluating an empirical dataset in the network-science manner]

It is difficult to argue that an empirical dataset genuinely follows a target probability distribution, since other, not-yet-examined statistical models may describe the dataset with a better fit. Thus, it is sounder to rely on comparative tests which compare the goodness of fit of candidate distributions. The log likelihood-ratio test is especially useful to this end. Given a dataset $D$ with two candidate distributions $A$ and $B$, the test computes $\log(L_A(D) / L_B(D))$, where $L_A(D)$ stands for the likelihood of the dataset $D$ with respect to the candidate distribution $A$. Positive ratios imply that distribution $A$ is the better description of the dataset between the two, and negative ratios imply the opposite.
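Here is a minimal sketch of that test on a synthetic dataset (the data and the two candidate distributions are assumptions purely for illustration), using scipy to fit each candidate by maximum likelihood and then comparing log-likelihoods:

```python
# A minimal sketch of the log likelihood-ratio test on a synthetic dataset.
# Positive values favor candidate A; negative values favor candidate B.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
D = rng.exponential(scale=2.0, size=1000)   # hypothetical dataset

# Candidate A: exponential; candidate B: log-normal. Fit both by MLE.
params_a = stats.expon.fit(D, floc=0)
params_b = stats.lognorm.fit(D, floc=0)

log_l_a = np.sum(stats.expon.logpdf(D, *params_a))
log_l_b = np.sum(stats.lognorm.logpdf(D, *params_b))

ratio = log_l_a - log_l_b                   # log(L_A(D) / L_B(D))
print("log likelihood ratio:", ratio)
```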

Summary

As I explained earlier, this paper presents a more rigorous hypergraph generator model, including dynamic-pattern criteria that were not applied in the previous paper, HyperPA. The following elements have been added (a small measurement sketch follows the list).

(T1) Measure how the hyperedge overlap frequency decreases over time, that is, how much the intersections between hyperedges shrink.
(T2) Measure how much the number of hyperedges increases relative to the number of nodes over time; densification is an example of such an element.
(T3) Measure how much the distance between nodes decreases over time, for example, elements such as shrinking diameters.
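```python
# A minimal sketch (assuming hyperedges arrive as a time-ordered list) of
# tracking (T1) and (T2) over growing prefixes of the stream; (T3) can reuse
# the effective-diameter measurement sketched earlier on the clique expansion.
from itertools import combinations

def average_overlap(hyperedges):
    """(T1) Mean Jaccard overlap over all pairs of hyperedges."""
    pairs = list(combinations(hyperedges, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def edge_node_ratio(hyperedges):
    """(T2) Number of hyperedges relative to the number of distinct nodes."""
    nodes = set().union(*hyperedges)
    return len(hyperedges) / len(nodes)

stream = [{0, 1, 2}, {1, 2, 3}, {3, 4}, {2, 4, 5}, {5, 6}]
for t in range(1, len(stream) + 1):
    prefix = stream[:t]
    print(t, round(average_overlap(prefix), 3), round(edge_node_ratio(prefix), 3))
```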

Generator performance is then measured against both the structural patterns and the newly included dynamic-pattern criteria.

In the previous study (HyperPA), new hyperedges were generated by referring to the existing hyperedge distribution, so there was a limitation that the criteria were highly dependent on randomly selected hyperedges. To overcome that limitation, this paper proposes a method of expanding based on burning and measuring similarity from its results.
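To give a feel for what burning-and-expanding means, here is a loose sketch (inspired by the forest-fire idea, not the authors' HyperFF code): each new node picks a random ambassador, burns outward through shared hyperedges with probability p per neighbor, then forms a hyperedge with everything it burned.

```python
# A loose sketch of forest-fire-style burning (inspired by the idea, not the
# authors' HyperFF code).
import random

def forest_fire_hypergraph(num_nodes, p=0.3, seed=0):
    random.seed(seed)
    hyperedges = [{0, 1}]                     # seed hyperedge (assumption)
    for v in range(2, num_nodes):
        ambassador = random.randrange(v)
        burned, frontier = {ambassador}, [ambassador]
        while frontier:
            u = frontier.pop()
            # Neighbors of u: nodes sharing at least one hyperedge with u.
            neighbors = set().union(*(e for e in hyperedges if u in e)) - burned
            for w in neighbors:
                if random.random() < p:       # burn each neighbor w.p. p
                    burned.add(w)
                    frontier.append(w)
        hyperedges.append(burned | {v})       # new node joins the burned set
    return hyperedges

print(forest_fire_hypergraph(12))
```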

Insight

If you look at the experiments, there are a lot of fitted lines of the kind I mentioned in the Preliminary. They are often used to verify the quality of a generator model, because the fitted line makes it possible to indirectly check how valid the distribution is.

If you approach it in terms of network robustness, I think there are a lot of interesting insights to be found. For example, in the case of a communication network, it is important to identify and prevent problematic factors in advance, not to take action after a problem occurs.

There is a great deal of randomness about what will happen after a failure. Therefore, we adopt methods that reduce the uncertainty as much as possible. To reduce that uncertainty, we must first know which factors are most important in the network and how vulnerable they are.

You need simulation to manage that, and I think the idea of this paper is a method that can be used frequently in such simulation processes. The insight is all the more valuable because it approaches the problem from a group-interaction perspective, beyond pairwise graphs.

Graph Machine Learning Explainability with PyG

[https://medium.com/@pytorch_geometric/graph-machine-learning-explainability-with-pyg-ff13cffc23c2]

To keep up with both implementation and paper trends, I periodically take time to go through the official tutorials that PyG publishes on Medium, in addition to writing Omakase. Among them, the post I'm introducing this time is one that explains XAI particularly well.

This is a great post for those of you who have used GNNs and had an interpretability question: why did the model do well, and which node and edge features made it better?
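For reference, here is a minimal sketch of the Explainer API the post walks through (assuming PyG >= 2.2). The two-layer GCN and the random data are stand-ins purely for illustration; in practice you would pass a trained model.

```python
# A minimal sketch of PyG's Explainer API (assuming PyG >= 2.2); the GCN and
# the random data are illustrative stand-ins, not a trained model.
import torch
from torch_geometric.nn import GCNConv
from torch_geometric.explain import Explainer, GNNExplainer

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(16, 32)
        self.conv2 = GCNConv(32, 7)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

x = torch.randn(20, 16)                       # 20 nodes, 16 features each
edge_index = torch.randint(0, 20, (2, 60))    # 60 random edges

explainer = Explainer(
    model=GCN(),
    algorithm=GNNExplainer(epochs=100),
    explanation_type='model',
    node_mask_type='attributes',
    edge_mask_type='object',
    model_config=dict(mode='multiclass_classification',
                      task_level='node',
                      return_type='raw'),
)
explanation = explainer(x, edge_index, index=0)  # explain node 0's prediction
print(explanation.node_mask.shape, explanation.edge_mask.shape)
```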


Written by Jeong Yitae

LinkedIn: jeongyitae. I'm a graph and network data enthusiast, from hardware to software (applications).
