How do we measure whether GraphRAG will help the RAG pipeline? From token limits to direct evaluation.
From Local to Global: A Graph RAG Approach to Query-Focused Summarization.
https://arxiv.org/pdf/2404.16130
I have brought a paper that discusses Microsoft’s ambitious open-source GraphRAG. I believe it is worth pondering this paper with three key points in mind:
- The problem definition of existing RAG, and the logic behind choosing GraphRAG as the solution. This is a common concern for anyone weighing whether GraphRAG is worth adopting, so it helps to read the paper while asking which shortcomings of RAG led to graph-based retrieval.
- The GraphRAG workflow in this paper. This paper implements GraphRAG differently from previous workflows, and it is worth considering how it differs.
- Evaluation methods. Research on RAG evaluation is active in the industry, so it is interesting to see how GraphRAG is evaluated in this still-young area.
If you’re familiar with the GraphRAG paper, you might have been surprised by Figure 1. The term “Community Detection” probably caught your eye, right? For someone like me, who was immersed in network science papers before being swept into studying RAG by the rise of Large Language Models (LLMs), this term was a delightful surprise: it signals an approach different from the conventional Named Entity Recognition that has been central to previous GraphRAG applications.
Let’s briefly discuss Retrieval Augmented Generation (RAG). RAG rose to prominence because LLM responses hinge on the prompt: introducing useful retrieved information into the prompt measurably improves generation, and retrieval is notably cheaper than fine-tuning.
However, this introduces a challenge. The retrieved information, while helpful, can be too extensive, causing LLMs to lose track mid-context or to pull in inaccurate passages, which directly hurts performance.
We have observed through various studies and industry presentations that even when an LLM can handle a long context (a generous token limit), it performs best when that context is effectively synthesized before generation (in the spirit of Chain of Thought…).
This paper approaches the contexts (the source documents) from a graph perspective, applying community detection algorithms so that communities (groups of densely interconnected nodes) can be used to extract context at both macro and micro levels before presenting it to the LLM.
Consider a scenario where a macro-level question is posed: “What is the main theme of this dataset?” In response, an LLM would analyze the entire dataset to extract the ‘main theme.’ However, if the dataset is vast, the issue of ‘losing context,’ as mentioned earlier, arises.
To address this, the paper first divides the source documents into chunks according to a chunk-size hyperparameter. It then extracts entities within each chunk, capturing three key elements: name, type, and description. Defining entity types makes it possible to retrieve elements tailored to specific categories; the paper calls this extraction domain-tailored, since the prompts vary with the input domain. Relationships between the extracted entities become the graph’s edges, and a secondary pass extracts ‘claims’: covariate information about the entities, including subject, object, type, description, source text span, and start/end dates.
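To make these shapes concrete, here is a minimal Python sketch of the chunking step and the element records described above. The dataclass fields mirror the paper’s name/type/description and claim attributes, while the function names and the chunk/overlap defaults are my own illustrative choices:

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """One entity instance extracted from a single chunk."""
    name: str          # e.g. "NeoChem Corp" (hypothetical example)
    type: str          # category such as "ORGANIZATION" or "PERSON"
    description: str   # LLM-written description of the entity in this chunk

@dataclass
class Claim:
    """A claim (covariate) attached to extracted entities."""
    subject: str
    object: str
    type: str
    description: str
    source_span: tuple[int, int]   # character offsets into the source chunk
    start_date: str | None = None
    end_date: str | None = None

def chunk_text(text: str, chunk_size: int = 600, overlap: int = 100) -> list[str]:
    """Split a source document into overlapping chunks. chunk_size is the
    hyperparameter the paper tunes: smaller chunks recall more entity
    references, at the cost of more LLM extraction calls."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```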
Remember the descriptions extracted earlier? They lead naturally to abstractive summarization: the LLM condenses all the descriptions gathered for an element into a single summary, initially completing the context for each document and chunk. Relying solely on per-chunk descriptions is problematic, though, so the paper further matches instances of the same element across chunks, using these semantic connections to enrich the information.
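A rough sketch of that merge step, reusing the Entity record from the sketch above (the grouping key and the prompt wording are hypothetical):

```python
from collections import defaultdict

def group_descriptions(entities: list[Entity]) -> dict[str, list[str]]:
    """Collect every per-chunk description written for the same entity,
    so one abstractive summary can be produced per graph node."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for e in entities:
        grouped[f"{e.name}|{e.type}"].append(e.description)
    return dict(grouped)

# Each group is then handed to the LLM with a prompt along the lines of
# "Summarize these descriptions of {name} into one coherent paragraph",
# producing the node's final description. Exact name matching is a
# simplification: the paper notes that name variants which slip through
# tend to be reconciled later, once communities group them together.
```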
The derived nodes and edges are then processed with the Leiden algorithm to identify communities, forming a hierarchical community structure that partitions the nodes in a mutually exclusive, collectively exhaustive way. Even nodes that are not directly connected get grouped into communities through this global community detection, surfacing indirect associations among similar nodes.
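Here is what that step can look like with the python-igraph and leidenalg packages. The toy graph and the size threshold are stand-ins, and Microsoft’s open-source implementation uses graspologic’s hierarchical_leiden rather than this manual recursion:

```python
import igraph as ig
import leidenalg

# Toy graph standing in for the entity graph; in GraphRAG, nodes are
# entities and edge weights count how often two entities co-occur.
g = ig.Graph.Famous("Zachary")

# One flat Leiden partition (maximizes modularity).
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
print(partition.membership)  # community id per node
print(partition.modularity)  # how well-separated the communities are

# Re-running Leiden inside each community's subgraph yields the nested
# hierarchy: the top-level partition gives the root communities, and
# their internal splits give the sub-communities.
for i, members in enumerate(partition):
    sub = g.subgraph(members)
    if sub.vcount() > 10:  # arbitrary size threshold for this sketch
        sub_partition = leidenalg.find_partition(
            sub, leidenalg.ModularityVertexPartition
        )
        print(f"root community {i} -> {len(sub_partition)} sub-communities")
```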
Community detection yields two kinds of communities: root communities and sub-communities. Each level of the hierarchy reflects the modularity of the partition, i.e. how cleanly the communities are separated, with root communities at the top and sub-communities nested beneath them. This hierarchy helps determine which information should be injected into the LLM context. For instance, if a user query is roughly 80% global and 20% local, responses should start from the root level.
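The paper itself evaluates fixed community levels rather than routing per query, but as an illustration of the idea, here is a hypothetical helper that picks a starting level from an estimate of how global a query is (the names and the routing rule are my own):

```python
def select_summaries(summaries_by_level: dict[int, list[str]],
                     global_ratio: float) -> list[str]:
    """Pick which level's community summaries to feed the LLM.
    Level 0 = root communities; deeper levels = finer sub-communities.
    global_ratio estimates how 'global' the query is (1.0 = fully global)."""
    max_level = max(summaries_by_level)
    # A fully global query stays at the root; a fully local one goes deepest.
    level = round((1.0 - global_ratio) * max_level)
    return summaries_by_level[level]

# e.g. with levels 0..3, a query judged 80% global / 20% local:
# select_summaries(summaries, global_ratio=0.8) -> level 1 (near the root)
```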
To evaluate the usefulness of these ideas, the paper sets up six conditions and compares their outputs: four hierarchical community levels (root, high, intermediate, and low; C0 through C3 in the paper), a condition that applies the same map-reduce summarization directly to the source-text chunks (TS), and a conventional semantic-search RAG baseline (SS).
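Laid out as a simple config (the C0–C3/TS/SS identifiers follow the paper; the dict itself is just one illustrative way to drive an experiment loop):

```python
# The six conditions compared in the paper.
CONDITIONS = {
    "C0": {"method": "graphrag", "community_level": "root"},
    "C1": {"method": "graphrag", "community_level": "high"},
    "C2": {"method": "graphrag", "community_level": "intermediate"},
    "C3": {"method": "graphrag", "community_level": "low"},
    "TS": {"method": "map_reduce_summarization_over_source_texts"},
    "SS": {"method": "naive_rag_semantic_search"},
}
```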
To compare the conditions, the paper uses four metrics judged head-to-head by an LLM: Comprehensiveness, Diversity, Empowerment, and Directness. The interpretation of the experiments is intriguing, analyzing the results from the perspectives of the global approach, naive RAG, community summaries, and source texts; above all, it compares how the four metrics trade off against the token cost each condition incurs. The findings answer many of the curious questions around GraphRAG, so I highly recommend taking a closer look.
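A minimal sketch of that pairwise LLM-as-judge comparison, assuming a generic ask_llm callable (the prompt wording here is mine, not the paper’s):

```python
METRICS = ["comprehensiveness", "diversity", "empowerment", "directness"]

JUDGE_PROMPT = """You are comparing two answers to the same question.
Question: {question}
Answer 1: {answer_1}
Answer 2: {answer_2}
Which answer is better in terms of {metric}? Reply "1", "2", or "tie"."""

def judge(question: str, a1: str, a2: str, ask_llm) -> dict[str, str]:
    """Return the winner per metric for one pair of conditions; the paper
    repeats each comparison several times and averages the verdicts to
    reduce variance in the judge's output."""
    return {
        m: ask_llm(JUDGE_PROMPT.format(
            question=question, answer_1=a1, answer_2=a2, metric=m))
        for m in METRICS
    }
```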
Recommended Resources:
- Amazon Bedrock and GraphRAG: This resource explains in depth how the document-to-chunk step mentioned in the paper is implemented. (It is so well explained that I find myself revisiting it regularly.)
- Multi-hop reasoning evaluation paper: This paper evaluates GraphRAG using multi-hop reasoning, providing a different perspective. Understanding the differences gives a sense of how GraphRAG is progressing.