YOU MUST EAT THIS GRAPH NEWS, GRAPH OMAKASE. July, Week 1

Jeong Yitae
5 min read · Jul 2, 2023


After attending an information science conference, I came away with a variety of insights, and the one that stood out most was "Efficient AI". Everyone, from academia to industry, was so enthusiastic about it that entire sessions were devoted to the topic in both the morning and the afternoon of the conference.

The main concern is how to train a model on a large amount of data. There are many criteria and indicators for what "training well" means, but the one that came to mind for me was "distributed training": splitting the data evenly and delivering it to the model efficiently.

One of the biggest problems that arises during that delivery is the bottleneck. Whenever a large amount of data is loaded into memory at once, out-of-memory (OOM) errors occur, and there are various attempts to work around them. Reducing the batch size is the basic approach I knew of, but literally splitting a single piece of data into many small pieces and letting the model learn from them reduces the continuity of the information and introduces significant bias, which can hurt generalization. And depending on the batch size there is a lot more to think about, because overfitting has to be prevented through the right combination of learning rate, optimizer, number of epochs, dropout rate, and so on.

So how do you build a training pipeline that is robust to batch size while still feeding in the right data and learning well? That worry leads straight to the next one: how do I actually solve this? In this week's Omakase, we look at a methodology you can apply easily, and where it comes from.

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks.

[https://arxiv.org/pdf/1905.07953.pdf]

The following are the three key elements of Cluster-GCN.

1. METIS, a graph partitioning algorithm.

In a GCN, message passing happens over the entire graph, so the larger the graph, the more the computation grows, and the computational cost quickly becomes significant. In other words, OOM is likely because the whole graph is loaded into memory at once. The paper addresses this by sampling subgraphs from the full graph, minimizing redundancy while keeping the useful information.

So in the end the key is "subgraph sampling" that extracts the useful information. The paper applies a graph partitioning algorithm called METIS to do this well. METIS arranges the partitions around edge connectivity, so that nodes that are densely connected, and therefore important to each other, end up co-located in the same partition. Each of these useful subgraphs is then placed in a batch for training.
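To make this concrete, here is a minimal sketch of Cluster-GCN-style mini-batching with PyTorch Geometric's ClusterData/ClusterLoader. The dataset (Cora) and the hyperparameters below are illustrative choices of mine, not taken from the paper or the Colab:

```python
# Minimal sketch of Cluster-GCN style mini-batching with PyTorch Geometric.
# Assumes PyG is installed with METIS support; Cora and the hyperparameters
# are illustrative choices, not from the original post.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import ClusterData, ClusterLoader
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

# 1) METIS partitions the full graph into subgraphs (clusters).
cluster_data = ClusterData(data, num_parts=32, recursive=False)
# 2) Each batch stitches a few clusters together and trains on that subgraph only.
loader = ClusterLoader(cluster_data, batch_size=4, shuffle=True)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(dataset.num_features, 64, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for batch in loader:  # each batch fits in memory, avoiding full-graph OOM
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out[batch.train_mask], batch.y[batch.train_mask])
    loss.backward()
    optimizer.step()
```

Only the sampled subgraph ever sits in memory for each step, which is exactly the point of partitioning first.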

2. Embedding utilization.

The paper uses a modified message-passing scheme. If efficient subgraphs have been extracted with METIS, their information now has to be learned "well". The reasoning behind this embedding utilization technique is as follows: we could not train on the full graph structure, so we sampled subgraphs. But if we apply those subgraphs with only a single 1-hop layer, we lose a lot of information compared to the conventional approach. The conclusion is therefore to stack more layers within the subgraph, so that information propagates out to nodes as distant as possible.

At this point, to deal with over-smoothing, the chronic problem of training deep layers, an identity-like term is added to the propagation matrix so that each node's own previous-layer representation is weighted more heavily relative to distant neighbors; the authors call this diagonal enhancement. This is what produces the enhanced message passing.
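To show what the diagonal enhancement does, here is a small dense-matrix sketch of one propagation step. It is my paraphrase of the paper's formulation, with lam as the enhancement coefficient; a real implementation would use sparse operations:

```python
# Sketch of Cluster-GCN's "diagonal enhancement" propagation, dense form for clarity.
# The normalization and the lam coefficient follow my reading of the paper;
# this is an illustration, not the reference implementation.
import torch

def diag_enhanced_layer(A, X, W, lam=1.0):
    """One step: X' = relu((A_norm + lam * diag(A_norm)) @ X @ W)."""
    n = A.size(0)
    I = torch.eye(n)
    A_tilde = A + I                                   # add self-loops
    D_inv = torch.diag(1.0 / A_tilde.sum(dim=1))      # (D + I)^{-1}
    A_norm = D_inv @ A_tilde                          # row-normalized adjacency
    enhanced = A_norm + lam * torch.diag(torch.diag(A_norm))  # boost self-information
    return torch.relu(enhanced @ X @ W)

# Toy usage: 4 nodes, 3 input features, 2 output features.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
X = torch.randn(4, 3)
W = torch.randn(3, 2)
print(diag_enhanced_layer(A, X, W).shape)  # torch.Size([4, 2])
```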

3. Partition evaluation with entropy.

So far, we’ve looked at how to extract subgraphs (1) efficiently and transfer that information to nodes (2) well. We need a standard to determine whether this series of processes is efficient. In this paper, the criteria were examined using entropy. As a control, we use random partitioning. Simply put, we compare entropy for each subgraph selected by subgraph vs. random derived through METIS. You can think of it like that.

The scaling tutorial is also available as a Colab notebook in PyG. Studying the theory is good, but the effect doubles if you also try it in practice. Let's try it!

https://colab.research.google.com/drive/1XAjcjRHrSR_ypCk_feIWFbcBKyT4Lirs?usp=sharing

Partial Parallelization of Graph Partitioning Algorithm METIS.

[http://courses.csail.mit.edu/6.895/fall03/projects/papers/kasheff.pdf]

Partitioning proceeds in three phases: coarsening, initial partitioning, and refinement. You can think of it as repeatedly folding the graph, merging vertices into groups, until it is small enough to partition directly, then refining the result back on the original graph. There are roughly three key points here. 1. An efficient representation of the graph structures that keep being transformed. 2. An appropriate threshold for the coarsening process, so that it stops once certain criteria are met. 3. A parallelization scheme that applies the previous two steps efficiently.
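As a rough illustration of the coarsening phase, here is a simplified sketch of one heavy-edge-matching pass. The graph representation and tie-breaking are my simplifications; real METIS, and its parallelization, are far more involved:

```python
# Sketch of one heavy-edge matching (coarsening) pass, the first METIS phase.
# Each unmatched vertex is matched with its heaviest-edge unmatched neighbor,
# and matched pairs collapse into a single coarse vertex. Simplified, sequential
# illustration only; the real algorithm is considerably more sophisticated.
def heavy_edge_matching(adj):
    """adj: {vertex: {neighbor: edge_weight}}. Returns {vertex: coarse_vertex_id}."""
    match = {}
    coarse_id = 0
    for v in adj:
        if v in match:
            continue
        # pick the heaviest-edge neighbor that is still unmatched
        candidates = [(w, u) for u, w in adj[v].items() if u not in match and u != v]
        if candidates:
            _, u = max(candidates)
            match[v] = match[u] = coarse_id
        else:
            match[v] = coarse_id          # vertex stays unmatched, maps to itself
        coarse_id += 1
    return match

# Toy graph: edge weights favour matching (a, b) and (c, d).
adj = {
    "a": {"b": 5, "c": 1},
    "b": {"a": 5, "d": 1},
    "c": {"a": 1, "d": 4},
    "d": {"b": 1, "c": 4},
}
print(heavy_edge_matching(adj))  # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```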

The basics you need for approaching graph partitioning, such as how to parallelize work over the adjacency matrix (adj), how to represent adj so that all of the structural information can be used, and how to solve data placement issues, are all covered in the paper.

Of course, some prior computer science knowledge (round-robin scheduling, locks, multiprocessors, and so on) is required for efficient parallelization. Still, I recommend not avoiding it: this is knowledge you will face sooner or later as you grow, so look up the concepts as you read and enjoy the paper alongside them. I recommend it because I think it will be very useful knowledge later on.

reference information.

cs240A graph partitioning.

[https://sites.cs.ucsb.edu/~gilbert/cs240a/slides/old/cs240a-partitioning.pdf]

For those of you who are hungry for a more mathematical, computer-engineering approach, I have prepared this as a reference. I am confident that just reading and studying these slides will lay the foundation for partitioning.

cs224w scalable.

[http://web.stanford.edu/class/cs224w/slides/16-scalable.pdf]

Scalability is an important enough challenge that cs224w devotes a section to it. If you are tired of looking at complicated things like the Cluster-GCN formulas, I recommend this section. It also walks through simplified GCN alongside the more advanced versions, so you can take away broader insights.

Knowledge graph (KG) related content. These days the LLM craze keeps heating up without cooling down. Being a curious person, I am interested in finding efficient use cases for LLMs, but I am also asking the opposite question: which problems are LLMs not suited to solve? And going further, can graphs solve those unsuitable parts? It turns out I was not the only one with that question.

The following two posts cover how KGs can contribute to LLMs, and an analytics story about KGs themselves. How can we use KGs now, in the era of LLMs? I recommend these posts to anyone who has been wondering about that.

Knowledge Graphs + Large Language Models = The ability for users to ask their own questions?

https://medium.com/@peter.lawrence_47665/knowledge-graphs-large-language-models-the-ability-for-users-to-ask-their-own-questions-e4afc348fa72

Graph Analytics using Virtuoso’s SPARQL-BI extensions to SPARQL

[https://medium.com/virtuoso-blog/graph-analytics-using-virtuosos-sparql-bi-extensions-to-sparql-5e75b4be32b3]


Written by Jeong Yitae

LinkedIn: jeongyitae. I'm a graph and network data enthusiast, from hardware to software (applications).
