YOU MUST EAT THIS GRAPH NEWS, GRAPH OMAKASE. 3rd week of May
Context-Aware Knowledge Graph Chatbot With GPT-4 and Neo4j
[https://medium.com/neo4j/context-aware-knowledge-graph-chatbot-with-gpt-4-and-neo4j-d3a99e8ae21e]
How can a knowledge graph improve ChatGPT's performance? This article tackles ChatGPT's chronic hallucination problem: it confidently wraps falsehoods up to look like the truth, and the question is how to rein that in.
The approach: use ChatGPT to generate a query against the knowledge graph loaded in Neo4j, then build the next prompt to ChatGPT from the results that query returns. Now that the prompt is grounded in the knowledge graph, you can get answers from ChatGPT that are much closer to the actual facts.
Since Neo4j is a native graph store rather than a triple store, it is not a perfect fit for every knowledge-graph use case, but its graph-database characteristics still make it more efficient than other databases for this kind of workload.
* As I heard in an interview on knowledge graph engineering, teams usually use context or Neptune when dealing with knowledge graphs. I'll share the details in the interview.
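The retrieval loop described above can be sketched in a few lines. Note that `generate_cypher`, `run_cypher`, and `llm_answer` are hypothetical stand-ins for a GPT-4 call, a Neo4j session, and a second GPT-4 call; the toy data and prompts are mine, not the article's.

```python
def generate_cypher(question: str) -> str:
    """Stand-in for asking the LLM to translate a question into Cypher.
    (Stubbed with a fixed template for demonstration.)"""
    return "MATCH (m:Movie {title: $title})<-[:ACTED_IN]-(a:Actor) RETURN a.name"

def run_cypher(query: str, params: dict) -> list[dict]:
    """Stand-in for neo4j session.run(); returns rows from a toy in-memory graph."""
    toy_graph = {"The Matrix": ["Keanu Reeves", "Carrie-Anne Moss"]}
    return [{"a.name": n} for n in toy_graph.get(params["title"], [])]

def llm_answer(question: str, facts: list[dict]) -> str:
    """Stand-in for the final LLM call: answer grounded only in retrieved facts."""
    names = ", ".join(row["a.name"] for row in facts)
    return f"According to the knowledge graph: {names}."

def kg_grounded_answer(question: str, title: str) -> str:
    query = generate_cypher(question)             # 1. LLM writes the graph query
    facts = run_cypher(query, {"title": title})   # 2. Neo4j returns ground-truth rows
    return llm_answer(question, facts)            # 3. LLM answers using those facts

print(kg_grounded_answer("Who acted in The Matrix?", "The Matrix"))
```

The design point is simply that the LLM never answers from its own memory at step 3; it paraphrases rows the database actually returned, which is what suppresses the hallucinations.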
LEARNING ON LARGE-SCALE TEXT-ATTRIBUTED GRAPHS VIA VARIATIONAL INFERENCE
[https://arxiv.org/pdf/2210.14709.pdf]
Introduction
motivation
Combining the advantages of an LLM (local patterns of text) and a GNN (global, structural patterns over text) causes a scalability problem, so the authors ask how to solve it, and their answer is the EM (expectation-maximization) algorithm. That is how the idea came about.
idea
The idea is to alternate: at each E-step and M-step, one of the LLM and GNN is held fixed while the other's parameters are optimized.
Preliminaries
EM algorithm high-level explanation
- Initialization: Start by initializing the model’s parameters, either randomly or based on some prior knowledge.
- Expectation (E-step): Given the current parameter estimates, calculate the expected value of the missing or latent variables. In this step, you estimate the values of the hidden variables based on the observed data and the current model parameters.
- Maximization (M-step): Using the expected values obtained in the E-step, update the model parameters to maximize the likelihood of the observed data. This step involves finding the parameters that maximize the likelihood or log-likelihood function given the observed data and the expected values of the hidden variables.
- Repeat E-step and M-step: Iterate between the E-step and M-step until convergence. In each iteration, the E-step estimates the expected values of the hidden variables, and the M-step updates the model parameters based on these values. This process continues until the parameters converge to a stable solution.
The EM algorithm is based on the idea of maximizing the expectation of the complete data log-likelihood function, where the complete data includes both observed and latent variables. However, the presence of the latent variables makes the direct maximization of the likelihood function intractable. Therefore, the EM algorithm iteratively estimates the values of the latent variables and updates the model parameters until convergence.
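The E/M loop above can be made concrete with the classic toy case, a two-component 1-D Gaussian mixture with fixed unit variance. This example is mine, not from the paper; it only illustrates the alternation between estimating responsibilities and re-maximizing parameters.

```python
import numpy as np

# Toy EM: fit a two-component 1-D Gaussian mixture (unit variance)
# by alternating the E-step and M-step described above.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])   # initialization: rough guesses for the two means
pi = np.array([0.5, 0.5])    # mixing weights

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update parameters to maximize the expected log-likelihood
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    pi = resp.mean(axis=0)

print(np.sort(mu))  # recovers means near -2 and 3
```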
Summary
Procedure
To give a brief overview of the two steps in this paper: in the E-step, the LM module takes the text attributes as input and learns to predict each next token given the preceding tokens. The output of the E-step is the trained LM parameters. In the M-step, before running the GNN module, the parameters derived from the E-step are supplied as node attributes, and the GNN is trained on them. The output of the M-step is trained model parameters that reflect the graph structure.
In the end, the key is optimizing the model parameters that the E- and M-steps produce. The criterion for this optimization is a lower bound, derived via pseudo-likelihood techniques.
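A minimal numeric sketch of that alternation, with heavy caveats: the "LM" below is stood in by a linear model on per-node text features, and the "GNN" by one round of mean neighbor aggregation plus a linear model. Shapes, soft targets, and learning rates are all illustrative assumptions of mine, not the paper's objective; the point is only the structure of fixing one model while updating the other.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                       # per-node "text" features
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1],
              [1, 0, 0, 1], [0, 1, 1, 0]], float)
A /= A.sum(1, keepdims=True)                      # row-normalized adjacency
y = np.array([0.0, 0.0, 1.0, 1.0])                # node labels

w_lm = np.zeros(8)                                # "LM" parameters
w_gnn = np.zeros(8)                               # "GNN" parameters

for step in range(100):
    # E-step: update the "LM" with the "GNN" fixed, mixing true labels with
    # the GNN's current predictions as soft targets
    target = 0.5 * y + 0.5 * (A @ X @ w_gnn)
    w_lm -= 0.05 * X.T @ (X @ w_lm - target) / 4
    # M-step: update the "GNN" with the "LM" fixed; the LM's outputs play
    # the role of node attributes aggregated over the graph structure
    H = A @ X                                     # aggregate neighbor features
    target = 0.5 * y + 0.5 * (X @ w_lm)
    w_gnn -= 0.05 * H.T @ (H @ w_gnn - target) / 4
```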
Experiment
The method performs well in both transductive and inductive experiments. Since the experiments mainly aim to check out-of-distribution (OOD) behavior, the good results suggest that each model's parameters acted like white noise during training: it is not that large models and data are universally sufficient, but that the methodology of extracting appropriate parameters from one model and feeding them to the other has been validated.
It is also striking that performance varies depending on whether structural information is reflected. Given that the fluctuations range from as little as 10% to as much as 50%, you can infer that proximity information (correlations) in the embedding space acts as an important factor. I think this experiment shows that the GNN, which some had written off as a mere sub-architecture, is genuinely effective.
Insight
What interested me most was how a discrete variable, the text feature, is turned into GNN input, and the angle of approaching joint learning from the EM-algorithm side. Unlike the existing paradigms of joint model training with a joint loss, applying the EM algorithm here felt very distinctive.
NeuKron: Constant-Size Lossy Compression of Sparse Reorderable Matrices and Tensors
[https://arxiv.org/pdf/2302.04570.pdf]
Introduction
motivation
To the best of our knowledge, existing lossy-compression methods for sparse matrices create outputs whose sizes are at least linear in the numbers of rows and columns of the input matrix. For example, given an 𝑁-by-𝑀 matrix A and a positive integer 𝐾, truncated singular value decomposition (T-SVD) outputs two matrices of which the numbers of entries are 𝑂(𝐾𝑁) and 𝑂(𝐾𝑀).
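The linear storage cost quoted above is easy to verify directly with NumPy; the rank-K matrix here is synthetic and the sizes chosen arbitrarily for illustration.

```python
import numpy as np

# Rank-K truncated SVD of an N-by-M matrix stores K*N + K*M (+K) numbers,
# i.e. linear in N and M -- the baseline NeuKron sets out to beat.
N, M, K = 100, 80, 5
rng = np.random.default_rng(0)
A = rng.normal(size=(N, K)) @ rng.normal(size=(K, M))   # exactly rank K

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :K], s[:K], Vt[:K]                # truncate to rank K

A_hat = U_k @ np.diag(s_k) @ Vt_k
print(U_k.size + Vt_k.size)   # K*N + K*M = 900 stored entries
print(np.allclose(A, A_hat))  # exact here, since A has rank K
```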
idea
Our key idea is to order rows and columns to facilitate our model to learn and exploit meaningful patterns in the input matrix for compression.
Preliminaries
The Kronecker product of two graphs takes pairs of nodes (one from each graph) as its nodes, and connects (u1, v1) to (u2, v2) exactly when u1 and u2 are adjacent in the first graph and v1 and v2 are adjacent in the second. On adjacency matrices this is the matrix Kronecker product, denoted by the ⊗ symbol.
Summary
The proposed algorithm proceeds as follows.
1. If the reorderable matrix A is sparse, rearrange its rows and columns to create patterns that can be exploited for better compression.
** A cut-off threshold is mainly used to determine whether the matrix is dense or sparse.
2. Encode the position (i,j) of each entry a_ij of A into a sequence and feed it to the LSTM.
3. Using supervised learning, train the LSTM to predict the value of each entry a_ij from its position (i,j) in A.
4. Once trained, the model compresses new matrices by encoding their entries into sequences and feeding them to the LSTM for prediction.
5. Effective compression is achieved by combining the outputs of the LSTM to approximate the original matrix, with entries recoverable in logarithmic time.
Simply put, we positionally encode the sparse matrix and use an LSTM (i.e., ML) to reconstruct a new sparse matrix from those positional encodings. You are probably most curious about why an LSTM suddenly appears.
Because the LSTM is trained to predict values from positions in the original sparse matrix, it can capture patterns and correlations that can be exploited for efficient compression of the new sparse matrix.
As a result, the compressed representation (the LSTM and its outputs) can be stored in a constant amount of space, smaller than the size of the original sparse matrix.
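To make step 2 concrete, here is one natural way to turn a position (i, j) into a sequence an LSTM could consume: interleave the bits of i and j so that each sequence element picks one of four quadrants at each scale, Kronecker-style. This particular encoding is my illustration, not necessarily the paper's exact scheme.

```python
def position_to_sequence(i: int, j: int, depth: int) -> list[int]:
    """Encode position (i, j) in a 2^depth x 2^depth matrix as `depth`
    quadrant digits, coarsest first (digit = 2*row_bit + col_bit in 0..3)."""
    seq = []
    for level in range(depth - 1, -1, -1):
        row_bit = (i >> level) & 1
        col_bit = (j >> level) & 1
        seq.append(2 * row_bit + col_bit)
    return seq

# Position (2, 3) in a 4x4 matrix: i = 0b10, j = 0b11 -> quadrant digits [3, 1]
print(position_to_sequence(2, 3, 2))
```

An LSTM would read such a digit sequence left to right, refining which sub-block of the matrix it is in at each step, and output a prediction for a_ij at the end; the sequence length is log of the matrix side, which is where the logarithmic-time access comes from.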
Insight
Matrix compression techniques are worth studying. Since the paper is structured around describing what existing methods do well and poorly before presenting its idea, it is a good way to study the strengths and weaknesses of the representative matrix compression techniques. Recommended for anyone interested in data engineering.
Google “We Have No Moat, And Neither Does OpenAI”
[https://www.semianalysis.com/p/google-we-have-no-moat-and-neither]
LLMs from big tech companies such as Google Bard and Microsoft-backed ChatGPT all have fake moats. Opening in that strong tone, the memo goes on to argue for the importance of the open-source model. A lot happened in March and April.
It also touches on topics such as LLM parameter leaks and the ChatGPT-versus-Bard battle, and ultimately surveys how to conduct large-model training efficiently, which is the important context right now. I think it is a good read for anyone working in AI and data, whether in industry or academia.