GRAPH OMAKASE: GRAPH NEWS YOU MUST NOT MISS (MARCH, WEEK 5)

Jeong Yitae
7 min read · Apr 9, 2023

--

Preview

[GNN architecture principles] I bring a paper that, in the spirit of the Essence of Mathematics, approaches and solves key problems through formulas. We will see how a problem definition that had only been stated abstractly is turned into a formula, and walk through the process of solving it.

[A graph neural network pretrained on a large amount of molecular data would ideally contain only useful information, but it also absorbs noise, so selecting the right information matters. Let's talk about denoising and about using pretrained models for that information selection.]

[Have you ever wondered how network analysis could be applied to the stock and cryptocurrency markets? Let's talk about network analysis for portfolio optimization.]

Graph Neural Networks Inspired by Classical Iterative Algorithms

[https://arxiv.org/pdf/2103.06064.pdf]

preliminary

IRLS (Iteratively Reweighted Least Squares)

suggested resource: [Medium]

IRLS iteratively reweights a least-squares regression problem based on the current estimates of the model parameters. The weights are chosen such that points with large residuals (large differences between predicted and observed values) are down-weighted, while points with small residuals are up-weighted. The algorithm thus focuses on fitting the model to the most informative data points.
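The reweighting loop above can be sketched in a few lines. This is a minimal illustration, assuming `1/|r|` weights (which correspond to an L1-robust fit); the weighting scheme is one common choice, not the only one:

```python
import numpy as np

def irls(X, y, n_iter=50, delta=1e-6):
    """Robust regression via IRLS: points with large residuals are
    down-weighted on each iteration (here with 1/|r| weights)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # plain least-squares start
    for _ in range(n_iter):
        r = y - X @ beta                           # current residuals
        w = 1.0 / np.maximum(np.abs(r), delta)     # big residual -> small weight
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta

# fit y ~ a*x + b with one gross outlier; IRLS essentially ignores it
x = np.linspace(0, 1, 20)
X = np.stack([x, np.ones_like(x)], axis=1)
y = 2 * x + 1
y[5] += 50                                         # the outlier
a, b = irls(X, y)                                  # close to a=2, b=1
```

An ordinary least-squares fit would be dragged toward the outlier; the reweighting pulls the fit back to the 19 informative points.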

proximal gradient descent

Gradient descent is a basic optimization algorithm that does not incorporate any explicit regularization. Proximal gradient descent adds a proximal operator that handles a regularization term in the cost function, which can lead to improved generalization performance of the model.

summary

I think this paper will help readers with two concerns: 1. You want to improve a GNN architecture but are wondering where to start. 2. You wonder why attention weights computed from a simple inner product sometimes work and sometimes don't, and how to define and handle the cases where they don't.

The key idea of the paper is to apply the regularization term through IRLS and proximal gradient descent, which we covered in the preliminary above.

Oversmoothing, long-range dependencies, and spurious edges are common challenges for graph neural networks. What these three problems share is the edge: information transmitted between nodes is amplified or attenuated along edges. The paper approaches the root cause of this amplification and attenuation through maximum a posteriori (MAP) estimation, strengthening the regularization and thereby building a more reliable propagation function.
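To make the MAP-as-propagation view concrete, here is an illustrative sketch assuming an energy of the common form ||Y − F||² + λ·tr(YᵀLY) (the paper's exact objective, and the values of λ and the step size, are assumptions here); gradient descent on such an energy already behaves like regularized message passing:

```python
import numpy as np

def propagate_by_energy_descent(A, F, lam=1.0, alpha=0.1, n_iter=30):
    """Node embeddings Y from gradient descent on
    E(Y) = ||Y - F||^2 + lam * tr(Y^T L Y),
    where F are input features and L is the graph Laplacian.  Each
    gradient step mixes a node with its neighbours, i.e. acts like one
    round of (regularized) message passing."""
    L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian
    Y = F.copy()
    for _ in range(n_iter):
        grad = 2.0 * (Y - F) + 2.0 * lam * (L @ Y)
        Y = Y - alpha * grad
    return Y

# a 4-node path graph with opposite features at the two ends:
# smoothing pulls the features together but the fidelity term ||Y - F||^2
# keeps them from collapsing to a constant (i.e. from oversmoothing)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
F = np.array([[1.0], [1.0], [-1.0], [-1.0]])
Y = propagate_by_energy_descent(A, F)
```

The fixed point of this iteration is (I + λL)⁻¹F, so the trade-off between fidelity and smoothing is controlled explicitly by λ rather than implicitly by stacking layers.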

insight

GNN principles and problem definition
By expressing graph neural networks as formulas, you can understand their qualitative aspects quantitatively. With so many formulas the paper may be boring or hard to read, but as the old Korean saying gojingamrae ("sweet after bitter") goes, I think you will come away with a wider view after working through it once.

Graph attention

Until now, similarity has typically been measured through inner products between node representations or through cosine similarity. The derived values often differ because the two serve different purposes. Because of these differences, the way similarity is measured has so far been chosen heuristically each time, based on qualitative judgment. The authors address this heuristic nature of the similarity function through IRLS's bound technique. The penalty at the core of the bound technique sometimes applies well and sometimes does not; those cases are also presented in table form.

Edge uncertainty

Since the core idea is a regularization term, will it apply well to high-homophily graphs (where oversmoothing arises because similar nodes exchange information with each other), which are the ones that need this regularization? In other words, will the method also work for heterophilous, real-world graph datasets? That is the question, and the paper answers it with experiments and an interpretation of the results. Treating each node's characteristics as a label variable, the authors apply the proposed method to both low- and high-homophily datasets, using homophily as the indicator that measures this variable.
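The homophily indicator mentioned here is commonly computed as the edge homophily ratio, the fraction of edges whose two endpoints share a label; a minimal sketch:

```python
def edge_homophily(edges, labels):
    """Edge homophily ratio: fraction of edges whose endpoints share a
    label.  Values near 1 -> homophilous graph, near 0 -> heterophilous."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

# toy graph: 2 of 4 edges connect same-label nodes -> ratio 0.5
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (2, 3), (1, 2), (0, 3)]
h = edge_homophily(edges, labels)
```

Datasets like citation networks typically score high on this ratio, while many real-world graphs score low, which is exactly the regime the question above is probing.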

PRE-TRAINING VIA DENOISING FOR MOLECULAR PROPERTY PREDICTION

[https://arxiv.org/pdf/2206.00133.pdf]

preliminary

Force field

A force field is a mathematical model used to describe the interactions between atoms and molecules in a system. It specifies the energy associated with a particular configuration of the system, based on parameters such as atomic charges, bond lengths, and angles. A force field can be used to calculate the forces acting on each atom in the system, which in turn determine the motion of the atoms over time.

Boltzmann distribution

A Boltzmann distribution is a probability distribution that describes the probability of a particle occupying a particular energy state in a system at thermal equilibrium. It is characterized by a single parameter, the temperature of the system, and is given by the formula:

$$ P(E) = \frac{1}{Z} \exp\!\left(\frac{-E}{kT}\right) $$

where P(E) is the probability of occupying an energy state E, Z is the partition function, k is the Boltzmann constant, and T is the temperature of the system. The Boltzmann distribution describes the probability of a particle being in a particular energy state, rather than the probability of a random variable taking on a particular value.
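The formula above is easy to check numerically; a small sketch that normalizes exp(−E/kT) by the partition function Z:

```python
import math

def boltzmann_probs(energies, kT=1.0):
    """Boltzmann distribution over discrete energy states:
    P(E_i) = exp(-E_i / kT) / Z, with Z the partition function."""
    weights = [math.exp(-e / kT) for e in energies]
    Z = sum(weights)                  # partition function normalizes to 1
    return [w / Z for w in weights]

# lower-energy states are exponentially more likely; raising T flattens this
cold = boltzmann_probs([0.0, 1.0, 2.0], kT=0.5)
hot = boltzmann_probs([0.0, 1.0, 2.0], kT=10.0)
```

At low temperature the ground state dominates; at high temperature the three states become nearly equiprobable, which is the single-parameter behavior described above.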

summary

In this paper, the authors use a denoising-based pretraining technique to improve the prediction of molecular properties from 3D structures. They use a large dataset of 3D molecular structures at equilibrium to learn representations that are meaningful for downstream tasks; the denoising objective corresponds to learning the molecular force field directly from the equilibrium structures. They apply this pretraining to TorchMD-NET, a transformer-based architecture that maintains per-atom scalar and vector features, updated at each layer via a self-attention mechanism. For denoising, the authors apply gated equivariant blocks to the processed scalar and vector features.

insight

Noise handling: 3D coordinate information and the denoising-term logic

The reason I mentioned the Boltzmann distribution at the beginning is that the distribution can be handled through the energy-state term. The paper shows a strategy for how to denoise according to that distribution. In the molecular field in particular, values vary with the lengths and angles of the bonds produced by the force field, so the key is to capture all of these fluctuations. The core idea is to attach the related information along with the 3D coordinates (x, y, z), observe the fluctuations more precisely, and then denoise them.
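A minimal sketch of the coordinate-denoising objective as described: perturb equilibrium coordinates with Gaussian noise and regress the injected noise (the actual model is TorchMD-NET; here it is replaced by placeholder predictions, and the noise scale is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoising_pair(coords, sigma=0.1):
    """Build one pretraining example: perturb equilibrium 3D coordinates
    with Gaussian noise; the regression target is the noise itself.
    Predicting the noise is, up to scale, predicting the force that pushes
    the perturbed structure back toward equilibrium."""
    noise = sigma * rng.standard_normal(coords.shape)
    return coords + noise, noise          # (model input, regression target)

def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and injected noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

# toy "molecule": 5 atoms with (x, y, z) equilibrium coordinates
coords = rng.standard_normal((5, 3))
noisy, target = denoising_pair(coords)
loss_perfect = denoising_loss(target, target)            # ideal model: 0
loss_zero = denoising_loss(np.zeros_like(target), target)  # trivial model: > 0
```

No property labels are needed for this objective, which is what makes it usable on large unlabeled sets of equilibrium structures.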

Edge-Delta: handling the pretrained model

“Edge-Delta”, which initializes to zero the weights that multiply incoming vertex features in the edge update functions (and treats these weights as absent for the purpose of computing the initial weight variance).

To utilize an existing pretrained model, you have presumably been using fine-tuning, the method of unfreezing and training layer parameters. One would hope it works in every domain, but it is hard to expect good performance when applying it to molecular graph networks.

The reason is that excessive bias (over-smoothness) in the pretrained model can introduce bias into molecular discovery. Edge-Delta is applied to obtain the weight-variance result, from which the reweighted result follows.

A network-based strategy of price correlations for optimal cryptocurrency portfolios

[https://arxiv.org/pdf/2304.02362.pdf] (cryptocurrency market version)

Network analysis and asset allocation strategy (Korea Investment & Securities)

[https://www.smallake.kr/wp-content/uploads/2018/07/네트워크-분석과-자산배분전략한국투자증권.pdf] (stock market version)

preliminary

MST (Minimum Spanning Tree)

A Minimum Spanning Tree (MST) is a tree-like subgraph of a connected, undirected graph that connects all the vertices of the graph with the minimum possible total edge weight. In other words, it is a subset of the edges of the graph that forms a tree connecting all vertices while minimizing the total weight of the edges.
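One standard way to build an MST is Kruskal's algorithm with union-find (a minimal sketch; the paper does not prescribe a particular algorithm):

```python
def mst_kruskal(n, edges):
    """Kruskal's algorithm: sort edges by weight and keep each edge that
    does not close a cycle (tracked with union-find).  Returns the MST
    edges and their total weight."""
    parent = list(range(n))

    def find(x):                           # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0.0
    for w, u, v in sorted(edges):          # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                       # keep only cycle-free edges
            parent[ru] = rv
            tree.append((u, v, w))
            total += w
    return tree, total

# 4 nodes, 5 weighted edges; the MST keeps the 3 cheapest cycle-free edges
edges = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (4.0, 2, 3), (5.0, 1, 3)]
tree, total = mst_kruskal(4, edges)        # total weight 1 + 2 + 4 = 7
```

For n nodes the MST always has exactly n − 1 edges, which is why it acts as a sparse "backbone" of the full graph.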

Summary

I bring a topic that can be light yet heavy: money. The idea is to optimize a portfolio through correlations in the stock or cryptocurrency market. Let me explain the gist of the paper using the stock market as an example. There are many ways to value a company. Evaluating only corporate financial statements is not enough; company valuation must be judged comprehensively, in consideration of various factors such as politics, the environment, and internal risks.

There is a limit to measuring all the exogenous and endogenous variables that affect this value, so the key is to select only the influential variables and design the portfolio well. In this paper, the variables are taken to be the correlations of cryptocurrency prices in each country, and a network is designed from them. The key is the weight of the edges connecting the nodes, because the core of the MST is to keep only the important links in the network according to those weights.
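One common construction for such correlation networks maps correlation ρ to the distance d = sqrt(2(1 − ρ)) and extracts the MST of the resulting distance graph (whether the paper uses exactly this transform is an assumption here; it is a widely used choice in correlation-network studies):

```python
import numpy as np

def correlation_to_distance(R):
    """Map a correlation matrix R to distances d_ij = sqrt(2*(1 - r_ij)):
    perfectly correlated assets sit at distance 0, anti-correlated ones
    at distance 2, so the MST keeps the strongest co-movements."""
    return np.sqrt(np.maximum(2.0 * (1.0 - R), 0.0))

def mst_prim(D):
    """Prim's algorithm on a dense distance matrix; returns the MST edges."""
    n = D.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        _, i, j = min((D[i, j], i, j) for i in in_tree
                      for j in range(n) if j not in in_tree)
        edges.append((i, j))               # cheapest edge leaving the tree
        in_tree.add(j)
    return edges

# toy correlations: assets 0 and 1 move together, asset 2 is unrelated
R = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
D = correlation_to_distance(R)
tree = mst_prim(D)                         # the strong 0-1 link survives
```

Assets that end up adjacent in this backbone co-move strongly, so spreading weight across distant branches of the tree is one way to diversify the portfolio.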

Insight
How to approach real-world problems through the minimum spanning tree (MST).
So the point is the weight. Thought experiments on how to design the weights, and on whether they are reasonable, are an important area. We also face NP-hard computational problems, and to cope with them, modeling an appropriate network seems to matter in the end. I think this paper will be of great help to anyone who has wrestled with this method, because it describes how the authors approached the problems, the trial and error that occurred along the way, and how they derived the key results.

Written by Jeong Yitae

LinkedIn: jeongyitae. I'm a graph and network data enthusiast, from hardware to software (applications).
