YOU MUST EAT THIS GRAPH NEWS, Graph Omakase / 4th Week of Dec

Jeong Yitae
7 min read · Dec 24, 2022


Today’s Network Knowledge
Connected components

Before we dive in, do you know the difference between a subgraph and a connected component?

(Quoted) "One standard way to categorize graphs is as connected or disconnected. A disconnected graph can be decomposed into a series of graphs that are not connected to each other. I would refer to one of those as a component. A subgraph, on the other hand, is a subset of vertices of the original graph along with a subset of edges." (reference)

Reachability is the key. A graph is called connected or disconnected depending on whether every node can reach every other node through the edges. A subgraph, in contrast, is a graph extracted according to a condition, such as whether it belongs to a particular subset of nodes, a particular subset of edges, and so on.

Simply put, you can think of it as "a graph form defined by whether its nodes are reachable from one another" versus "a graph form defined by whether its parts belong to a chosen subset."

Chef TMI) Connected components are further divided into weak and strong. The dividing criterion is directionality. In a directed graph, a component in which every node can reach every other node by following edge directions is a strongly connected component; if we ignore edge direction and only ask whether nodes are linked at all, it is a weakly connected component. A small sketch follows below.
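
To make the distinction concrete, here is a minimal sketch with NetworkX; the toy graphs are my own examples, not from any paper discussed below.

```python
# Minimal sketch of components vs. subgraphs with NetworkX.
import networkx as nx

# Undirected graph: components are maximal reachable groups of nodes.
g = nx.Graph([(1, 2), (2, 3), (4, 5)])
print(list(nx.connected_components(g)))            # [{1, 2, 3}, {4, 5}]

# Directed graph: direction decides strong vs. weak components.
dg = nx.DiGraph([(1, 2), (2, 1), (2, 3)])
print(list(nx.strongly_connected_components(dg)))  # {1, 2} is strong; 3 can't reach back
print(list(nx.weakly_connected_components(dg)))    # [{1, 2, 3}] once direction is ignored

# A subgraph, by contrast, is simply an extraction by condition:
sub = g.subgraph([1, 2, 4])  # induced on a chosen node subset
print(list(sub.edges()))     # [(1, 2)]
```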

TGL: A General Framework for Temporal GNN Training on Billion-Scale Graphs

[https://arxiv.org/pdf/2203.14883.pdf]

It’s a paper that people in the field will like. You probably have many questions about how to train and run inference on billion-scale graphs. This paper is a good reference for those guidelines.

The core of temporal graph learning is to store information that changes over time on the edges in timestamp format and to update it through message passing. It is also important to persist the newly updated information as each node's history (memory) through information exchange between nodes. For the scheduling problem this creates at scale, the paper proposes random chunk scheduling.

This paper squarely addresses many of the challenges faced in the field, including node memory, the attention aggregator, the temporal sampler, and multi-GPU (parallel) sampling. There are occasional sections approached in terms of data structures such as binary search and pointers, so if you want a detailed understanding, it is worth reviewing those data structures for a while.

It also uses the DGL framework, including the concept of a Message Flow Graph (MFG). I'm sure some of you are not used to it. Think of it simply as a representation of how sampled information flows into nodes during message passing; a small sketch follows below. (If you are interested in the details, I recommend visiting the DGL docs. Alternatively, if you need additional explanation, please email or message the chef on LinkedIn! Welcome 🙂)
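
For a feel of what an MFG looks like in practice, here is a minimal sketch of DGL-style neighbor sampling, assuming DGL ≥ 0.8; the random graph and node IDs are toy stand-ins, not anything from the paper.

```python
# Minimal sketch: DGL neighbor sampling produces one MFG ("block") per layer.
import torch
import dgl

g = dgl.rand_graph(1000, 5000)      # toy stand-in for a real billion-scale graph
train_nids = torch.arange(100)      # toy seed node IDs

# Sample a fixed fanout of neighbors per GNN layer.
sampler = dgl.dataloading.NeighborSampler([10, 5])
loader = dgl.dataloading.DataLoader(
    g, train_nids, sampler, batch_size=32, shuffle=True)

for input_nodes, output_nodes, blocks in loader:
    # `blocks` is a list of MFGs, one per layer: each block contains only the
    # edges needed to compute the seed nodes' representations at that layer.
    print([b.num_edges() for b in blocks])
    break
```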

Tutorial: Extending GNNs in Production: Challenges and Opportunities

[https://www.youtube.com/watch?v=HRC4hZKiUWU&t=1609s]

It was released at the LoG conference. The video walks through the limitations of using GNNs in production. I recommend watching it whenever you have time. It is full of quality content and tips.

There are also parts that use a graph database (GDB). If you have questions or requests for seminars such as hands-on tutorials, please leave a message. If there is enough demand, the chef will try to prepare an implementation and sharing session.

By the way, the talk starts by pointing out PyG's limitations. I think that's down to the framework landscape: DGL (AWS) versus PyG (Intel, NVIDIA, Stanford) versus TF-GNN & jraph (Google). I feel like I'm watching the Three Kingdoms. Each framework's philosophy and direction of development differ, so comparing and using them side by side is another pleasure of working with GNNs.

APAN: Asynchronous Propagation Attention Network for Real-time Temporal Graph Embedding

[https://arxiv.org/abs/2011.11545]

This is a paper I found while studying CS 329S, a lecture course related to MLOps. I was excited because I plan to deploy a GNN behind a web service as a side project next year, and I think it will be helpful for nearly half of the readers working in the field. In fact, high latency is often cited as the biggest problem for GNNs in production. The gossip ran long: it is a good paper from an industrial point of view, as it notes that users cannot tolerate the high latency of neighbor queries in a giant graph database, which makes deploying a synchronous CTDG model on an online payment platform impractical.

This paper focuses on how nodes are updated through message passing. When you first encounter temporal GNNs, the storage space for information on a node is usually compared to a mailbox. This is an idea that transforms the mechanics of that mailbox.

Specifically, in existing temporal embedding, each event triggers message querying, timestamp sorting, and read-out, and at that point the cost is significant. APAN solves this by fetching only the most recent neighbors' history messages, aggregating them with a mean when updating, and letting old messages decay (a FIFO queue concept), as sketched below.
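
Here is a rough sketch of the mailbox idea as I understand it, not APAN's actual implementation: fixed-size FIFO mailboxes per node, with mean aggregation at read-out. The capacity and helper names are mine.

```python
# Illustrative mailbox sketch: FIFO decay plus mean aggregation on update.
from collections import defaultdict, deque
import numpy as np

MAILBOX_SIZE = 10  # hypothetical capacity; old mail is evicted FIFO-style
mailbox = defaultdict(lambda: deque(maxlen=MAILBOX_SIZE))

def deliver(node_id, message):
    """Asynchronously push a message to a node's mailbox (cheap, no graph query)."""
    mailbox[node_id].append(np.asarray(message, dtype=np.float32))

def update_embedding(node_id):
    """Read out the mailbox and aggregate by mean - no neighbor query needed."""
    mail = mailbox[node_id]
    if not mail:
        return None
    return np.mean(np.stack(list(mail)), axis=0)

deliver(0, [1.0, 0.0])
deliver(0, [0.0, 1.0])
print(update_embedding(0))  # [0.5 0.5]
```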

I think it is the best mailbox blueprint for online deployment. The results and interpretation in the experimental section are very interesting. It is impressive that performance does not drop compared to other models designed to reflect all of the graph information, while the speed is also fast.

Residual Network and Embedding Usage: New Tricks of Node Classification with Graph Convolutional Networks

[https://arxiv.org/abs/2105.08330]

This paper is a collection of efficiency tricks for node classification. Not happy with the performance after implementing a baseline? That is when it is a good paper to refer to. In fact, when the chef isn't happy with performance, he usually starts with Correct & Smooth, and most of the time that alone was satisfying. 🙂

These are easy-to-understand methods, such as mini-batch training, normalization and dropout, adversarial training, and label propagation, with simple code implementations (a sketch of label propagation follows below). Another beauty of this paper is combining these methodologies, closely watching what happens, and comparing the experimental results with what you expected.
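
As an example of how simple these tricks are in code, here is a minimal sketch of label propagation; the symmetric normalization and the toy graph are my own choices, not the paper's exact setup.

```python
# Minimal label propagation: Y <- alpha * S @ Y + (1 - alpha) * Y0.
import numpy as np

def label_propagation(A, Y0, alpha=0.9, iters=50):
    """A: adjacency matrix; Y0: one-hot labels, all-zero rows for unlabeled nodes."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # symmetric normalization
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0         # propagate, keep seed labels
    return Y

# Path graph 0-1-2-3; nodes 0 and 3 labeled, nodes 1 and 2 unlabeled.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
Y0 = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], float)
print(label_propagation(A, Y0).argmax(axis=1))  # [0 0 1 1]: labels spread to neighbors
```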

Pay Attention to MLPs

[https://arxiv.org/abs/2105.08050]

The transformer architecture holds hegemony among deep learning models. It is not an exaggeration to say it is used across many industries.

The main thesis is the idea of replacing positional encoding, the transformer's core positional information, with the concept of a spatial gating unit. The paper contrasts it with self-attention, one of the transformer's key elements, along axes such as computation cost and projection, and works out the idea in an interesting way.

A spatial gating unit is a concept derived from the gated linear unit. Simply put, it is a gate that decides whether the information passing through it is useful. It uses tensor contraction (an inner-product operation along the spatial dimension) to reflect spatial information; see the sketch below.
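
Here is a minimal PyTorch sketch of a spatial gating unit as I read the paper; the channel split and near-identity initialization follow the paper's description, but the code itself is illustrative, not the authors' implementation.

```python
# Illustrative Spatial Gating Unit: gate one half of the channels with a
# linear projection taken ACROSS token positions (the spatial dimension).
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    def __init__(self, d_model, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_model // 2)
        # The spatial projection mixes information across positions,
        # standing in for self-attention / positional encoding.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        nn.init.zeros_(self.spatial_proj.weight)  # gate starts near identity
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x):              # x: (batch, seq_len, d_model)
        u, v = x.chunk(2, dim=-1)      # split channels: content and gate
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                   # elementwise gating

x = torch.randn(2, 16, 64)
print(SpatialGatingUnit(64, 16)(x).shape)  # torch.Size([2, 16, 32])
```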

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

[https://arxiv.org/pdf/2110.14446.pdf]

This paper borrows homophily, one of the main metrics used when analyzing network data in complex-systems research, and brings it to large-scale GNNs. Homophily, to put it simply, is "birds of a feather flock together." In network data, it is an indicator built on the assumption that similar people (nodes) are linked.

So you might wonder how this idea relates to embeddings. When embedding a graph, a node is usually inferred using information received from its connected nodes. During training, weights are learned that fit the true label of the node or link well. In this case, embeddings are often computed under the assumption that the more similar the information received from connected nodes, the better the true label is matched (homophily).

But what if, on the contrary, the connected nodes are not similar to the ego node? In other words, what about non-homophily? As you can all expect, you will see degraded performance. Performance becomes dependent on what a node's neighbors look like. If the final model fits well only in certain situations and not in others, it is ultimately a low-value model. (To check this in the experimental setup, the paper splits into transductive and inductive learning.)

To overcome this phenomenon, the paper proposes a model called LINKX. The model structure is very simple. The node features, the micro node-level information, go through a multi-layer perceptron; the graph topology, the macro structural information, is learned in the spirit of LINK, a logistic regression on the adjacency matrix. After that, the two results are combined and fed through another multi-layer perceptron.

Some of you may wonder why the process is split into two sub-networks. That is the heart of the paper. Simply put, although it is the same dataset, the values derived from the feature MLP and the adjacency-based regression carry different contexts. It is in a similar vein to the core of multimodal learning.

In summary, the key idea is to process node information and graph structure information independently, mitigating the problem that homophily-dependent models are heavily influenced by neighbor information. A rough sketch follows.
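
Here is a rough PyTorch paraphrase of that two-branch idea; the layer sizes, the concatenation-based fusion, and all names are my own simplifications, not the paper's exact LINKX.

```python
# Illustrative two-branch model: features and structure processed separately,
# then fused by a final MLP.
import torch
import torch.nn as nn

class SimpleLINKX(nn.Module):
    def __init__(self, num_nodes, feat_dim, hidden, num_classes):
        super().__init__()
        self.mlp_x = nn.Sequential(               # micro: node features only
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.w_a = nn.Linear(num_nodes, hidden)   # macro: a node's adjacency row
        self.mlp_f = nn.Sequential(               # fuse the two contexts
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes))

    def forward(self, x, adj_rows):
        h = torch.cat([self.mlp_x(x), self.w_a(adj_rows)], dim=-1)
        return self.mlp_f(h)

model = SimpleLINKX(num_nodes=100, feat_dim=16, hidden=32, num_classes=2)
x, adj = torch.randn(100, 16), torch.eye(100)    # toy inputs
print(model(x, adj).shape)                       # torch.Size([100, 2])
```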

For your information, to show that the proposed model is effective on non-homophilous data, we first need to determine whether a dataset is homophilous. In that process, instead of the existing measurement approach (a null model with random rewiring), the paper uses a modified measure, because the sensitivity of plain edge homophily to the number of classes and the size of each class limits its utility. The basic measure is sketched below.
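
For reference, here is a sketch of plain edge homophily, the fraction of edges joining same-label nodes; the paper's modified measure further adjusts for class counts and sizes, which is not shown here.

```python
# Plain edge homophily: what fraction of edges connect same-label endpoints?
import numpy as np

def edge_homophily(edges, labels):
    edges = np.asarray(edges)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return same.mean()

edges = [(0, 1), (1, 2), (2, 3)]
labels = np.array([0, 0, 1, 1])
print(edge_homophily(edges, labels))  # 0.666...: two of three edges match
```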

Link Prediction on Heterogeneous Graphs with PyG

[https://medium.com/@pytorch_geometric/link-prediction-on-heterogeneous-graphs-with-pyg-6d5c29677c70]

PyG is actively publishing tutorials and blog posts. Among them, I brought a Medium post whose code and explanations are the cleanest and most applicable for practitioners. I hope it helps you.

## Graph Omakase Business Trip Service Guide ##

I've shared information about hypergraphs over the past several weeks. "Will this actually be useful?" I'm sure some of you are wondering. If you're curious about where hypergraphs help with analysis and inference in the field,
or if you're using PyG for the first time and don't know how to write the code,

If you fall into either of the two groups above, please send a LinkedIn message or an email reply, and we will help you with prototyping tailored to your data 🙂 Since the nature of every dataset differs, the prototyping will be more detailed and higher quality if you send us your data definition form as well!

How was this week's Graph Omakase? Was it informative? I always endeavor to make this content helpful for your work, so stay tuned. Thank you :)


Written by Jeong Yitae

LinkedIn: jeongyitae. I'm a graph and network data enthusiast, from hardware to software (applications).
