From RAG to GraphRAG: What is GraphRAG and why use it?

Jeong Yitae
13 min read · Mar 12, 2024


Before discussing RAG and GraphRAG,

  • The era of ChatGPT has arrived. It is an era so shaped by large language models that it could be called a new industrial revolution. Nowadays even my mother uses ChatGPT for her questions, showing how its usage spans generations and keeps expanding.
  • The reason for this broad adoption likely lies in its ability to accurately fetch and convey the information users seek. In an age overwhelmed by information, it selectively surfaces the ‘necessary’ information.
  • Despite the significant progress made to date, there have been numerous challenges. For instance, one such challenge is the ‘hallucination’ phenomenon, where inaccurate information is provided. This issue stems from various causes, with a primary one being the misinterpretation of user intent, leading to irrelevant information being fetched.
  • The solution is straightforward: accurately understand the user’s intent and deliver ‘relevant’ information.
  • Efforts to improve this involve various approaches, mainly categorized into four types:
  1. Building large language models from scratch, which allows for clear data context from the outset but comes with high construction costs.
  2. Adopting ‘well-trained’ large language models and further training them on a specific domain, which is cost-effective and relatively accurate but makes it challenging to balance the model’s general context with the domain-specific context.
  3. Using large language models as is, but adding additional context to user queries, which is cost-effective but risks subjectivity and potential bias in the context provided.
  4. Keeping the large language model as is while providing extra context on ‘relevant information’ during the response process, which allows for up-to-date, cost-effective responses but involves complexity in identifying and integrating relevant documents.
  • Additionally, these methods can be compared across five aspects: cost, accuracy, domain-specific terminology, up-to-date responses, and transparency/interpretability.
  • For a detailed comparison, refer to https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/.
  • This post discusses the methodologies that have been tried to address the hallucination phenomenon in large language models. Specifically, it examines Retrieval Augmented Generation (RAG), which fetches ‘relevant’ information and supplies it as context, and then explores RAG’s limitations and GraphRAG as a means to overcome them.

Brief Introduction to RAG

  • What is RAG (Retrieval Augmented Generation)? As mentioned, it’s a technology that interprets user queries ‘well’, fetches ‘relevant’ information, processes it into context, and then incorporates this useful information into responses.
  • As referenced in the cited site, RAG is characterized by its cost-effectiveness, relative accuracy, adequacy in providing domain-specific contextualization, ability to reflect the latest information, and transparency and interpretability in tracing the source documents of the information, making it a predominantly chosen approach.
  • Figure 1. RAG Operation Process (source: https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)
  • The key lies in ‘properly’ interpreting queries, fetching relevant information, and processing it into context.
  • As seen in Figure 1, the process from user query → response generation via a pre-trained large language model (LLM) → delivery of the response to the user, now includes an additional step where a Retrieval Model fetches relevant information for the query. This added Retrieval Model is where the aforementioned three elements take place.
  • To perform these three tasks effectively, the process is divided and implemented/improved in four stages: 1. Pre-Retrieval 2. Chunking 3. Retrieval 4. Post-Retrieval.
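To make these four stages concrete, here is a minimal, self-contained Python sketch. The word-overlap scoring is a toy stand-in for real embedding models, and the function names and sample corpus are illustrative assumptions, not part of any specific framework.

```python
# A minimal sketch of the four-stage retrieval pipeline described above.
# Toy implementations stand in for real embedding models and LLM calls.
from typing import List

DOCS = [
    "Green tea is rich in antioxidants such as catechins.",
    "Antioxidants can reduce the risk of certain chronic diseases.",
    "Green tea consumption may improve brain function.",
]

def pre_retrieval(docs: List[str]) -> List[str]:
    # Stage 1: decide granularity and clean the corpus (here: strip whitespace).
    return [d.strip() for d in docs]

def chunk(docs: List[str], max_tokens: int = 32) -> List[str]:
    # Stage 2: split documents so each chunk fits the model's context budget.
    chunks = []
    for d in docs:
        words = d.split()
        for i in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[i:i + max_tokens]))
    return chunks

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Stage 3: rank chunks by naive word overlap with the query (a stand-in
    # for embedding similarity) and keep the top-k.
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:k]

def post_retrieval(query: str, hits: List[str]) -> str:
    # Stage 4: condense the hits into a context string for the prompt.
    return "\n".join(f"- {h}" for h in hits)

query = "health benefits of green tea"
context = post_retrieval(query, retrieve(query, chunk(pre_retrieval(DOCS))))
print(context)
```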

Pre-Retrieval

  • Data granularity is the level of detail of the units the RAG model searches over to enhance the generation process; it is decided and prepared in advance, before the retrieval step.
  • Combining the strength of large pre-trained language models with a retrieval component, the RAG model generates responses by searching a text segment (e.g., sentence, paragraph, or document) database for relevant information.
  • Data granularity can range from sentence-level (e.g., individual facts, sentences, or short paragraphs) to paragraph-level (e.g., entire documents or articles).
  • The choice of data granularity affects the model’s performance and its ability to generate accurate and contextually relevant text.
  • Fine-grained data can provide more specific and detailed information for the generation task, while coarse-grained data can provide broader context or general knowledge.
  • Choosing the right data granularity to optimize the effectiveness of the RAG model is crucial. It involves balancing the need to provide detailed and relevant information against the risk of overloading the model with too much data or too general data that becomes unhelpful.
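As a tiny illustration of this trade-off, the snippet below indexes the same (made-up) text at sentence level and at paragraph level; in a real system each unit would become one searchable entry.

```python
# Fine-grained (sentence) vs. coarse-grained (paragraph) indexing units.
import re

text = (
    "Green tea is rich in antioxidants. It may improve brain function.\n\n"
    "Black tea is more oxidized than green tea. It contains more caffeine."
)

sentence_units = re.split(r"(?<=[.!?])\s+", text.replace("\n\n", " "))
paragraph_units = text.split("\n\n")

print(len(sentence_units), "fine-grained units:", sentence_units)
print(len(paragraph_units), "coarse-grained units:", paragraph_units)
```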

Chunking

  • Chunking is the process of shaping source data into a form that can be embedded and fed into a large language model. Since the number of tokens that can be input into a large language model is limited, it’s important to segment the information properly.
  • For example, in a conversation between two people, the ideal situation is one where speaking time is evenly distributed.
  • If one person speaks for 59 minutes of an hour and the other for 1 minute, the conversation is dominated by one side ‘inputting’ information; it resembles an infusion rather than an exchange.
  • Conversely, if each person speaks for 30 minutes, information is exchanged evenly and the conversation is efficient.
  • In other words, to provide ‘good’ information to a large language model, it’s crucial to give ‘appropriate’ context. Given the limited length (tokens), it’s important to preserve the organic relationship between contexts within the given context limit. Therefore, in processing relevant data, the issue of ‘data length limit’ arises.
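A minimal chunking sketch follows, using overlapping windows so neighbouring chunks partly share context. Word count stands in for a real tokenizer here; the parameters are illustrative.

```python
# Token-budget chunking with overlap, so the "organic relationship"
# between neighbouring chunks is partly preserved.
from typing import List

def chunk_text(text: str, max_tokens: int = 50, overlap: int = 10) -> List[str]:
    words = text.split()
    step = max_tokens - overlap  # advance less than a full window each time
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), step)]

long_doc = " ".join(f"word{i}" for i in range(120))
for c in chunk_text(long_doc):
    print(len(c.split()), "tokens:", c[:40], "...")
```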

Retrieval

  • This stage involves searching a document or text segment database to find content related to the user’s query. It includes understanding the intent and context of the query and selecting the most relevant documents or texts from the database based on this understanding.
  • For instance, when processing a query about “the health benefits of green tea,” the model finds documents mentioning the health benefits of green tea and selects them based on similarity metrics.
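Below is a hedged sketch of this similarity-based selection using TF-IDF vectors and cosine similarity from scikit-learn; production systems would more typically use dense embeddings, but the ranking logic is the same.

```python
# Rank documents by cosine similarity to the query and keep the top-2.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Green tea is rich in antioxidants and may lower disease risk.",
    "Coffee contains caffeine, a stimulant.",
    "Regular green tea drinking has been linked to improved brain function.",
]

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform(["health benefits of green tea"])

scores = cosine_similarity(query_vec, doc_vecs)[0]
top = scores.argsort()[::-1][:2]  # indices of the top-2 documents
for i in top:
    print(f"{scores[i]:.2f}  {docs[i]}")
```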

Post-Retrieval

  • This stage processes the retrieved information to effectively integrate it into the generation process. It may include summarizing the searched text, selecting the most relevant facts, and refining the information to better match the user’s query.
  • For example, after analyzing documents on the health benefits of green tea, it may summarize key points like “Green tea is rich in antioxidants, which can reduce the risk of certain chronic diseases and improve brain function,” to generate a comprehensive and informative response to the user’s query.
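As a minimal stand-in for that summarization step, the sketch below does extractive refinement: it keeps only the retrieved sentences with the highest lexical overlap with the query. A real system would likely use an LLM summarizer instead.

```python
# Keep the `keep` retrieved sentences most lexically similar to the query.
def refine(query: str, retrieved: list[str], keep: int = 2) -> str:
    q = set(query.lower().split())
    ranked = sorted(retrieved,
                    key=lambda s: len(q & set(s.lower().split())),
                    reverse=True)
    return " ".join(ranked[:keep])

hits = [
    "Green tea is rich in antioxidants.",
    "Tea plants are grown in many countries.",
    "Antioxidants in green tea may improve brain function.",
]
print(refine("health benefits of green tea", hits))
```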

RAG’s Own Limitations

  • RAG is efficient compared to other methods in terms of cost, up-to-date information, and domain specificity, but it also has inherent limitations. The following illustration depicts these limitations within the RAG process; based on it, we will examine a few representative ones.
  1. Missing Content: The first limitation is failing to index documents related to the user’s query, thus not being able to use them to provide context. Despite diligently preprocessing and properly storing data in the database, not being able to utilize it is a significant shortfall.
  2. Missed the Top Ranked Documents: The second issue arises when documents relevant to the user’s query exist in the database but rank too low to make it into the retrieved results, leading to answers that don’t satisfy the user’s expectations. This primarily stems from the subjective choice of the “number of documents” to retrieve, highlighting a major limitation. It’s therefore necessary to run experiments to set this k hyperparameter properly (see the sketch after this list).
  3. Not in Context — Consolidation Strategy Limitations: Documents containing the answer are retrieved from the database but fail to be included in the context for generating an answer. This happens when numerous documents are returned, and a consolidation process is required to select the most relevant information.
  4. Not Extracted: The fourth is a fundamental limitation of LLMs (Large Language Models), which tend to retrieve ‘approximate’ rather than ‘exact’ values. Thus, obtaining ‘approximate’ or ‘similar’ values can lead to irrelevant information, causing a significant impact due to minor noise in future responses.
  5. Wrong Format: The fifth issue appears closely related to instruction tuning, a method of enhancing zero-shot performance through fine-tuning the LLM with an Instruction dataset. It occurs when additional instructions are incorrectly formatted in the prompt, leading to misunderstanding or misinterpretation by the LLM, resulting in erroneous answers.
  6. Incorrect Specificity: The sixth issue involves either insufficiently using the user query information or excessively using it, leading to problems during the consideration of the query’s importance. This is likely to occur when there’s an inappropriate combination of input and retrieval output.
  7. Incomplete: The seventh limitation is when, despite the ability to use the context in generating answers, missing information leads to incomplete responses to the user’s query.
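Regarding limitation 2, here is a hedged sketch of the kind of sweep that helps choose k: measure how often a known-relevant document appears in the top-k results on a small hand-labelled set. The data below is hypothetical.

```python
# recall@k over a toy labelled set: ranked_ids[i] is the retriever's ranking
# for query i, gold_ids[i] the id of the document known to answer it.
def recall_at_k(ranked_ids: list[list[int]], gold_ids: list[int], k: int) -> float:
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_ids, gold_ids))
    return hits / len(gold_ids)

ranked_ids = [[3, 1, 0], [0, 2, 1], [2, 0, 3]]
gold_ids = [1, 2, 2]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(ranked_ids, gold_ids, k):.2f}")
```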

In summary, the main causes of these limitations are 1. Indexing — retrieving documents relevant to the user’s query, 2. Properly providing information before generating an answer, and 3. The suitable combination of input and pre/post-retrieval processes. These three factors highlight what’s crucial in RAG and pose the question of how these issues can be improved.

When using GraphRAG

  • GraphRAG can address some of the limitations of RAG mentioned above from the perspectives of Pre-Retrieval, Post-Retrieval, and Prompt Compression, drawing on the Retrieval and Reasoning contexts of the Knowledge Graph.
  • Graph Retrieval focuses on enhancing context by fetching relevant information, while Graph Reasoning applies to how information, such as chunking and context inputs, is traversed and searched within RAG.
  • Pre-Retrieval can leverage knowledge graph indexing to fetch related documents. By semantically indexing documents based on nodes and edges within the knowledge graph, it directly retrieves semantically related documents.
  • The process involves considering whether to fetch nodes or subgraphs. Extracting nodes involves comparing the user query with chunked nodes to find the most similar ones and using their connected paths as query syntax.
  • However, this approach requires specifying how many nodes within a path to fetch and depends heavily on the information extraction model used for creating the knowledge graph, highlighting the importance of the model’s performance.
  • Additionally, Variable Length Edges (VLE) may be used to fetch related information, which requires database optimization for efficient retrieval (see the sketch after this list). Discussions on database design and optimization between database administrators and machine learning engineers are crucial for performance.
  • Subgraphs involve fetching ego-graphs connected to relevant nodes, potentially embedding multiple related ego-graphs to compare the overall context with the user’s query.
  • This method requires various graph embedding experiments due to performance differences based on the embedding technique used.
  • Post-Retrieval involves a re-ranking process that harmoniously uses values from both RAG and GraphRAG. By leveraging semantic search values from GraphRAG alongside RAG’s similarity search values, it generates context. GraphRAG’s values allow for verifying the semantic basis of the retrieval, enhancing the accuracy of the fetched information.
  • Using a single database for both vectorDB and GraphDB allows for semantic (GraphRAG) and vector (RAG) indexing within the same database, facilitating verification of retrieval accuracy and enabling improvements for inaccuracies.
  • Prompt Compression benefits from graph information during prompt engineering, such as deciding which chunk information to inject into prompts.
  • Graphs enable the return of only relevant information post-retrieval, based on the relationship between the query context and the documents. This allows for tracing the source of irrelevant information for improvements.
  • For instance, if an inappropriate response is generated, graph queries can be used to trace back to the problematic part for immediate correction.
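To ground the node-expansion and VLE ideas above, here is a hedged sketch using the Neo4j Python driver. The (:Chunk {id, text}) schema and the query are illustrative assumptions, not a fixed GraphRAG schema; the seed node id would come from an embedding similarity search over chunk nodes.

```python
# Variable-length expansion from a seed chunk node, up to 2 hops out.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

CYPHER = """
MATCH (seed:Chunk {id: $seed_id})-[*1..2]-(nbr:Chunk)
RETURN DISTINCT nbr.text AS text
LIMIT $limit
"""

def expand_from_seed(seed_id: str, limit: int = 10) -> list[str]:
    # Pull the texts of chunks structurally close to the seed chunk.
    with driver.session() as session:
        result = session.run(CYPHER, seed_id=seed_id, limit=limit)
        return [record["text"] for record in result]

# "chunk-42" is a hypothetical id chosen by vector similarity to the query.
print(expand_from_seed("chunk-42"))
```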

Overall, GraphRAG provides a comprehensive approach to addressing RAG’s limitations by integrating knowledge graph techniques for better information retrieval, reasoning, and context generation, thereby enhancing the accuracy and relevance of the responses generated.

GraphRAG architecture

  • There are four modules for executing GraphRAG: Query Rewriting, Augment, and, within Retrieval, Semantic Search and Similarity Search.

Query Rewriting

  • Rewriting the user’s query is implemented in this process. When a user submits a query to the engine, we can add additional, useful context to the query’s prompt format. In this step, the query is restated to clarify the user’s intention.
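A minimal sketch of this step follows: an instruction template asks an LLM to restate the query with clarified intent. The `call_llm` function is a hypothetical stand-in for any chat-completion API.

```python
# Query rewriting via an instruction prompt; `call_llm` is a placeholder.
REWRITE_PROMPT = """Rewrite the user's question so that it is specific and
self-contained, preserving the original intent.

User question: {query}
Rewritten question:"""

def rewrite_query(query: str, call_llm) -> str:
    return call_llm(REWRITE_PROMPT.format(query=query)).strip()

# Demo with a dummy LLM that returns a canned rewrite:
print(rewrite_query("benefits?",
                    lambda p: "What are the health benefits of green tea?"))
```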

Pre-Retrieval & Post-Retrieval

  • This phase involves contemplating what information to retrieve and how to process that information once retrieved. During the Pre-Retrieval phase, the focus is primarily on decisions related to setting the chunking size, how to index, ensuring data is well-cleaned, and detecting and removing any irrelevant data if present.
  • In the Post-Retrieval phase, the challenge is to harmonize the data effectively. This stage mainly involves two processes: Re-ranking and Prompt Compression. In Prompt Compression, the query result, specifically the Graph Path, is utilized as part of the Context + Prompt for answer generation, incorporating it as a prompt element. Re-ranking combines the results of Graph Embedding with LLM (Large Language Model) Embedding to improve the diversity and accuracy of the ranking (a small sketch follows this section).
  • This approach is strategic in enhancing the performance and relevance of the generated answers, ensuring that the process not only fetches pertinent information but also integrates it efficiently to produce coherent and contextually accurate responses.
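Here is a hedged sketch of that re-ranking idea: blend the similarity computed from LLM (text) embeddings with the similarity computed from graph embeddings. The embeddings below are random stand-ins, and the blend weight alpha is a hyperparameter to tune.

```python
# Blend text-embedding and graph-embedding similarity into one ranking score.
import numpy as np

rng = np.random.default_rng(0)
n_chunks = 5
text_emb = rng.normal(size=(n_chunks, 8))   # stand-in for LLM embeddings
graph_emb = rng.normal(size=(n_chunks, 8))  # stand-in for graph embeddings
q_text = rng.normal(size=8)
q_graph = rng.normal(size=8)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

alpha = 0.6  # weight on the text signal vs. the graph signal
scores = [alpha * cos(text_emb[i], q_text)
          + (1 - alpha) * cos(graph_emb[i], q_graph)
          for i in range(n_chunks)]
order = np.argsort(scores)[::-1]
print("re-ranked chunk order:", order)
```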

Get factors ready for GraphRAG

  • To effectively store, manage, and retrieve graph-shaped data, software that reflects the unique characteristics of the data is necessary. Just as an RDBMS (Relational Database Management System) manages table-form data efficiently, a GDBMS (Graph Database Management System) exists to handle graph-shaped data adeptly. Especially in the context of Knowledge Graph Reasoning, if the database is not optimized for graph structures, the cost of traversing relationships through JOIN operations increases significantly, potentially leading to bottlenecks.
  • Hence, GDBMS is essential in GraphRAG for its efficiency in managing all these aspects. For retrieving graphs, a model that generates graph queries is required. Although it might be clear which data is related, automating the process of fetching associated data from specific data points is crucial. This necessitates a natural language processing model dedicated to generating graph queries.
  • Unfortunately, there’s a lack of datasets for graph query generation, highlighting the urgent need for data acquisition. Neo4j has taken a step forward by launching a data crowdsourcing initiative, which can be explored further through the provided link for those interested in contributing or learning more.
  • Regarding the extraction of information to create graph forms, an information extraction model is necessary to infer the relationships between well-chunked documents.
  • Two main approaches can be considered: using NLP’s Named Entity Recognition (NER) or employing a Foundation model from Knowledge Graph. Each approach has its distinct differences.
  • NLP focuses on semantics from a textual perspective, heavily relying on a predefined dependency among words, whereas Knowledge Graphs, formed from a knowledge base through Foundation models, focus on nodes and can regulate the amount of information transmitted between edges.
  • For embedding graph data, a model is used to add context to the Reranker, contributing a holistic perspective through Graph Embedding that diverges from the conventional sequence perspective of LLMs (Large Language Models). This imparts structural characteristics: the sequence perspective captures relationships over time, while the graph perspective ensures all chunks (nodes) are represented evenly, filling in information that might otherwise be missed.
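As a small illustration of giving chunks this "graph perspective", the sketch below computes a simple spectral embedding of a chunk graph with networkx and numpy. Real systems would more likely use node2vec or a GNN, but the idea, each chunk node receiving a vector that reflects its structural position, is the same; the toy graph is made up.

```python
# Spectral embedding of a toy chunk graph: each node gets a 2-D vector.
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_edges_from([("c0", "c1"), ("c1", "c2"), ("c2", "c3"),
                  ("c3", "c0"), ("c1", "c3")])

L = nx.normalized_laplacian_matrix(G).toarray()
eigvals, eigvecs = np.linalg.eigh(L)
node_vectors = eigvecs[:, 1:3]  # skip the trivial eigenvector; keep 2 dims

for node, vec in zip(G.nodes, node_vectors):
    print(node, np.round(vec, 3))
```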

GraphRAG limitations

  • GraphRAG, like RAG, has clear limitations, which include how to form graphs, generate queries for querying these graphs, and ultimately decide how much information to retrieve based on these queries. The main challenges are ‘query generation’, ‘reasoning boundary’, and ‘information extraction’. Particularly, the ‘reasoning boundary’ poses a significant limitation as optimizing the amount of related information can lead to overload during information retrieval, negatively impacting the core aspect of GraphRAG, which is answer generation.

Applying GraphRAG

  • GraphRAG can use graph embeddings produced by a GNN (graph neural network) to enrich the text embeddings used when inferring a response to the user’s query. This method, known as soft-prompting, is a type of prompt engineering. Prompt engineering divides into Hard and Soft categories. Hard prompting provides prompts explicitly, requiring manual addition of context to user queries; its downside is the subjective nature of prompt creation, although it’s straightforward to implement.
  • On the contrary, Soft involves implicitly providing prompts, where additional embedding information is added to the model’s existing text embeddings to derive similar inference results. This method ensures objectivity by using ‘learned’ context embeddings and can optimize weight values. However, it requires direct model design and implementation, making it more complex.
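A minimal sketch of the soft-prompting mechanics follows: a graph-derived embedding is projected into the model's embedding space and prepended to the token embeddings as a "virtual token". The shapes and the projection layer are illustrative assumptions.

```python
# Prepend a projected graph embedding to token embeddings (soft prompt).
import torch
import torch.nn as nn

d_model, graph_dim, seq_len = 64, 16, 10

project = nn.Linear(graph_dim, d_model)  # maps graph space -> model space

graph_embedding = torch.randn(1, graph_dim)          # e.g. from a GNN
token_embeddings = torch.randn(1, seq_len, d_model)  # from the LLM's embedder

soft_prompt = project(graph_embedding).unsqueeze(1)  # (1, 1, d_model)
augmented = torch.cat([soft_prompt, token_embeddings], dim=1)
print(augmented.shape)  # torch.Size([1, 11, 64]) -- one extra "virtual token"
```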

When to Use GraphRAG

  • GraphRAG is not a cure-all. It’s not advisable to use advanced techniques like GraphRAG without a clear need, especially if traditional RAG works well. The introduction of GraphRAG should be justified with factual evidence, especially when there’s a mismatch between the information retrieved during the retrieval stage and the user’s query intent. This is akin to the fundamental limitations of vector search, where information is retrieved based on ‘approximate’ rather than ‘exact’ values, leading to potential inaccuracies.
  • When efforts like introducing BM25 for exact search in a hybrid search approach, improving the ranking process, or fine-tuning for embedding quality do not significantly enhance RAG performance, it might be worth considering GraphRAG.
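For reference, here is a hedged sketch of that hybrid-search baseline worth trying before reaching for GraphRAG, using the rank_bm25 library for exact-match scoring. TF-IDF stands in for dense embeddings, and the blend weight is a tunable assumption.

```python
# Blend BM25 (exact match) scores with vector similarity scores.
from rank_bm25 import BM25Okapi
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Green tea is rich in antioxidants.",
    "Coffee contains caffeine.",
    "Green tea may improve brain function.",
]
query = "green tea benefits"

bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_scores = bm25.get_scores(query.lower().split())

vec = TfidfVectorizer().fit(docs + [query])
dense = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]

alpha = 0.5  # blend weight, to be tuned per corpus
hybrid = [alpha * b + (1 - alpha) * d for b, d in zip(bm25_scores, dense)]
print(sorted(range(len(docs)), key=lambda i: -hybrid[i]))
```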

Conclusion

  • This post covered everything from RAG to GraphRAG, focusing on methods like fine-tuning, building from scratch, prompt engineering, and RAG to improve response quality. While RAG is acclaimed for efficiently fetching related documents for answering queries at relatively lower costs, it faces several limitations in the retrieval process. Advanced RAG, or GraphRAG, emerges as a solution to overcome these limitations by leveraging ‘semantic’ reasoning and retrieval. Key considerations for effectively utilizing GraphRAG include information extraction techniques to infer and generate connections between chunked data, knowledge indexing for storage and retrieval, and models for generating graph queries, such as the Cypher Generation Model. With new technologies emerging daily, this post aims to serve as a resource on GraphRAG, helping you become more familiar with this advanced approach. Thank you for reading through this extensive discussion.


Jeong Yitae

LinkedIn: jeongiitae / I’m a graph and network data enthusiast. I always consider how graph data can be useful in the real world.