LIGHTRAG: SIMPLE AND FAST RETRIEVAL-AUGMENTED GENERATION

Jeong Yitae
14 min read · Jan 19, 2025


Paper: https://arxiv.org/pdf/2410.05779

GitHub: https://github.com/HKUDS/LightRAG

Introduction

Hello, I’m Yi-Tae Jeong. Recently, there has been a significant surge of interest in LightRAG, leading me to ponder the reasons behind its rising popularity. From my observations, it seems that many professionals have encountered a similar journey in adopting retrieval-augmented generation (RAG) methodologies:

  1. Starting with Vector-based RAG: The initial step often involves utilizing vector-based RAG approaches.
  2. Addressing Hallucination: As hallucination issues occasionally arise, the need for reasoning becomes evident, often leading to an exploration of advanced technologies like knowledge graphs.
  3. Integrating GraphRAG: Efforts to combine knowledge graphs with vector RAG systems frequently point to Neo4j’s GraphRAG.
  4. Challenges in Ontology Extraction: Users then realize the complexity of extracting ontologies and constructing knowledge bases.
  5. Exploring Microsoft GraphRAG: To simplify the process, Microsoft GraphRAG’s end-to-end capabilities for entity extraction and knowledge base construction come into play.
  6. LightRAG Discovery: Concerns over high latency and LLM cost eventually lead to LightRAG as a promising alternative.

The underlying motivations for this transition are twofold: resolving hallucination issues, and optimizing ontology extraction while keeping costs down. Today, let's focus on the latter and look at how to implement GraphRAG effectively from a cost-efficiency perspective.

Core Challenges Highlighted by LightRAG

LightRAG identifies key limitations in existing RAG systems:

  1. Flat Data Representation: Heavy reliance on flat data structures hinders the ability to capture relationships between entities.
  2. Limited Contextual Awareness: Conventional RAG systems lack the ability to preserve and utilize dynamic relationships between entities for contextual understanding.

To address these, LightRAG sets ambitious goals:

  1. Dependency Awareness: Identify and utilize interdependencies between entities within documents.
  2. Retrieval Efficiency: Enhance response times through efficient retrieval mechanisms.
  3. Low-Cost Adaptability: Reduce costs associated with updating knowledge bases for new data.

Architecture

If you look at Figure 1 of the paper, you can see that LightRAG proceeds through the following steps:

  1. Graph-based text indexing
  2. Using the index graph for retrieval
  3. Dual-level retrieval paradigm

If we ask what is different from the GraphRAG we already know, the answer is the dual-level retrieval paradigm. Before getting into the specifics: the dual-level retrieval paradigm applies two levels of abstraction to entities, one for macro-level answers and one for micro-level answers.

Does this sound familiar? Yes, Microsoft GraphRAG does something similar with community detection and traversal of the community hierarchy. LightRAG, however, does not perform community detection.

So the interesting part is the comparison between LightRAG's dual-level retrieval paradigm and Microsoft's community detection, and how LightRAG improves cost-effectiveness and accuracy. There are plenty of other interesting details as well, which I will explain along the way.

Given the above architecture, these are the three things LightRAG claims to do well. I have quoted them from the paper and added my own questions.

LightRAG employs efficient dual-level retrieval strategies: low-level retrieval, which focuses on precise information about specific entities and their relationships, and high-level retrieval, which encompasses broader topics and themes.

You can use low-level retrieval to focus on precise information about specific entities and their relationships, and high-level retrieval to cover broader topics and themes.

-> How are keywords categorized into low-level and high-level, and how is each level used in retrieval?

LightRAG effectively accommodates a diverse range of queries, ensuring that users receive relevant and comprehensive responses tailored to their specific needs. Additionally, by integrating graph structures with vector representations, our framework facilitates efficient retrieval of related entities and relations while enhancing the comprehensiveness of results through relevant structural information from the constructed knowledge graph.

-> 1. How does LightRAG tailor its operation to the user's query? 2. How are the vector context and the knowledge-graph context aligned?

By eliminating the need to rebuild the entire index, LightRAG reduces computational costs and accelerates adaptation, while its incremental update algorithm ensures timely integration of new data, maintaining effectiveness in dynamic environments.

-> How is the cost of rebuilding the index reduced?

This is an overview of the architecture and the benefits of using it: 1. answer user queries at an abstract level, 2. integrate efficiently with existing vector RAG systems, and 3. reduce the cost of RAG indexing. Next, let's see what algorithms were developed to accomplish this.

Graph-based text indexing

The first step is designing a graph from the text data. The text is transformed into a graph through three simple functions: 1. Recognize (extract entities and edges), 2. Profile (generate profiles for entities and edges), and 3. Deduplicate (remove duplicates). The profiling step is what makes this distinctive: entities and edges are stored in a key-value store.

The same entity can occur in different documents, and its relationships may differ per document, so all of them are stored in the key-value store. The key point is that when an entity is retrieved, the relevant relationship is picked out from among the various stored relationships and used as the KG context.
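
To make the three functions a bit more concrete, here is a minimal sketch of that indexing flow under my own assumptions. The function names and the plain-dict key-value store are illustrative stand-ins, not LightRAG's actual API.

# Minimal sketch of the three indexing functions described above.
# All names here are illustrative; the real implementation lives in the
# LightRAG repository and uses its own storage classes.
from collections import defaultdict

def index_chunks(chunks, llm_extract):
    """chunks: list of (chunk_id, text) pairs.
    llm_extract: a callable that asks the LLM for the entities and edges of one
    chunk (the Recognize step), returning two lists of dicts."""
    entity_kv = defaultdict(list)   # key: entity name      -> list of profiles
    edge_kv = defaultdict(list)     # key: (source, target) -> list of profiles

    for chunk_id, text in chunks:
        entities, edges = llm_extract(text)                 # 1. Recognize
        for ent in entities:                                # 2. Profile
            entity_kv[ent["name"]].append(
                {"description": ent["description"], "source_id": chunk_id}
            )
        for rel in edges:
            edge_kv[(rel["src"], rel["tgt"])].append(
                {"description": rel["description"],
                 "keywords": rel["keywords"],
                 "source_id": chunk_id}
            )

    # 3. Deduplicate: the same entity or edge seen in different chunks collapses
    # into one key, while its per-chunk descriptions stay available side by side.
    return dict(entity_kv), dict(edge_kv)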

In addition, when extracting entities you also extract high-level keywords. The prompt explicitly instructs the LLM to capture keywords that reflect a concept, theme, or topic of the document and to store them separately. The excerpt below is worth a look.

As just mentioned, the prompt directly asks the model to capture the overarching ideas present in the document, and in the example a text is given together with predefined few-shot high_level_keywords.

I personally think this is the most important part, because the result depends on how much domain expertise and how many high-quality, domain-appropriate examples you can give an LLM that does not otherwise know how to extract high_level_keywords. One person's high_level_keyword may not be another's, so the priority is to provide prompts and examples that strike a good balance between the general and the specific.

# lightrag/prompt.py/entity_extraction

Identify high-level key words that summarize the main concepts, themes, or topics of the entire text. These should capture the overarching ideas present in the document.

Format the content-level key words as ("content_keywords"{tuple_delimiter}<high_level_keywords>)

Text:
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.

Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. “If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us.”

The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.

It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths

("content_keywords"{tuple_delimiter}"power dynamics, ideological conflict, discovery, rebellion"){completion_delimiter}

Indexing

Notice the properties of the node in the figure above. The original chunk id is used to reference where the entity came from, and the description is a property that reflects the LLM's own second thought about the entity and the relationship.

If you look at the prompts, you will see "Comprehensive description" and "explanation as to why you think the source entity and the target entity are related to each other". This is similar to CoT in that the LLM is asked to self-reflect, and the results are used to distinguish entities coming from different data and to infer their context.

Put simply, documents in similar domains will contain the same entities, but those entities will most likely be used in different contexts, so all of them are stored and managed, their metadata is extracted, and it is used at generation time.

# lightrag/prompt.py/entity_extraction

- entity_description: Comprehensive description of the entity's attributes and activities
- Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)

- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_keywords: one or more high-level key words that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details
Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_keywords>{tuple_delimiter}<relationship_strength>)
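
To illustrate what a merged record might look like after indexing, here is a hypothetical node and edge. The field names follow the prompt excerpt above; the <SEP> separator for multiple chunk ids is my assumption, so treat the exact schema as something to verify in the repository.

# Hypothetical merged records, for illustration only (not the exact schema).
# An entity seen in two chunks keeps both chunk ids and both LLM-written
# descriptions, so retrieval can later tell the two contexts apart.
node = {
    "entity_name": "TAYLOR",
    "entity_type": "person",
    "description": "Authoritarian figure who comes to show reluctant respect "
                   "for the device.<SEP>Leader pushing for control and order.",
    "source_id": "chunk-0001<SEP>chunk-0007",  # which chunks the entity came from
}

edge = {
    "src_id": "TAYLOR",
    "tgt_id": "JORDAN",
    "description": "Their wordless clash of wills softens into an uneasy truce "
                   "over the device.",
    "keywords": "power dynamics, reconciliation",  # relationship_keywords
    "weight": 7.0,                                 # relationship_strength
    "source_id": "chunk-0001",
}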

Dual-level retrieval

Taking the core of the prompt: high-level and low-level keywords are extracted from the user's query. The abstraction level is controlled by the words "overarching" and "specific", and a few-shot example is shown to reinforce this. Looking at those abstraction-level words and the examples, the LLM splits the user's query into macro and micro perspectives.

# lightrag/prompt.py/keywords_extraction

You are a helpful assistant tasked with identifying both high-level and low-level keywords in the user's query.

---Goal---

Given the query, list both high-level and low-level keywords. High-level keywords focus on overarching concepts or themes, while low-level keywords focus on specific entities, details, or concrete terms.

---Instructions---

- Output the keywords in JSON format.
- The JSON should have two keys:
- "high_level_keywords" for overarching concepts or themes.
- "low_level_keywords" for specific entities or details.

# lightrag/prompt.py/keywords_extraction_examples
Output:
{
"high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
"low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}

In rag_response, i.e. at generation time, the keywords stored during indexing are finally retrieved from the JSON format and put to use. The user's query is split into high-level and low-level keywords, these are matched against the high-level and low-level keywords already stored in the knowledge base, and the answer reflects the corresponding abstraction level.

# lightrag / prompt.py / keywords_extraction (full template)
You are a helpful assistant tasked with identifying both high-level and low-level keywords in the user's query.

---Goal---

Given the query, list both high-level and low-level keywords. High-level keywords focus on overarching concepts or themes, while low-level keywords focus on specific entities, details, or concrete terms.

---Instructions---

- Output the keywords in JSON format.
- The JSON should have two keys:
- "high_level_keywords" for overarching concepts or themes.
- "low_level_keywords" for specific entities or details.

######################
-Examples-
######################
{examples}

#############################
-Real Data-
######################
Query: {query}
######################
The `Output` should be human text, not unicode characters. Keep the same language as `Query`.
Output:
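
As I understand it, the two keyword lists then drive two different lookups: low-level keywords are matched against the entity side of the index (the micro view), and high-level keywords against the relationship side (the macro view). Below is a minimal sketch of that dispatch; entities_vdb and relationships_vdb are hypothetical stand-ins with a .query(text, top_k) method, in the spirit of the chunks_vdb call shown later.

# Sketch of the dual-level dispatch, assuming two vector indexes were built at
# indexing time: one over entity profiles, one over relationship profiles.
async def dual_level_retrieve(keywords, entities_vdb, relationships_vdb, top_k=20):
    low = ", ".join(keywords.get("low_level_keywords", []))
    high = ", ".join(keywords.get("high_level_keywords", []))

    # Low-level: precise entities and their immediate relations (micro view).
    local_hits = await entities_vdb.query(low, top_k=top_k) if low else []

    # High-level: broader themes, matched against relationship descriptions
    # and their relationship_keywords (macro view).
    global_hits = await relationships_vdb.query(high, top_k=top_k) if high else []

    return local_hits, global_hits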

Integrating Graph and Vectors for Efficient Retrieval

Once multiple contexts have been extracted, it is time to synthesize them. First, the vector knowledge base is queried with a top-k capped at 10 (min(10, top_k) in the code below), and the prompt asks the model to list up to 5 relevant references drawn from the vector and KG contexts and present them to the user. This effectively plays the role of the reranking step we often add ourselves.

Another part worth looking at is "When handling information with timestamps". It describes how to derive meaningful context from a constantly updated knowledge base.

The key takeaway is that the most recent information is not always the best: weigh the content, the relationship, and the timestamp together, and for time-specific queries prioritize the temporal information in the content. I take this to fall under the rapid knowledge updates and efficient retrieval mentioned at the beginning.

# lightrag / operator.py (excerpt)

async def get_vector_context():
    # Reuse vector search logic from naive_query
    try:
        # Reduce top_k for vector search in hybrid mode since we have
        # structured information from KG
        mix_topk = min(10, query_param.top_k)
        results = await chunks_vdb.query(query, top_k=mix_topk)
        if not results:
            return None
        # ... (rest of the function omitted in this excerpt)
    except Exception:
        # error handling omitted here; see the repository for the full function
        return None


# lightrag / prompt.py / mix_rag_response
---Role---

You are a professional assistant responsible for answering questions based on knowledge graph and textual information. Please respond in the same language as the user's question.

---Goal---

Generate a concise response that summarizes relevant points from the provided information. If you don't know the answer, just say so. Do not make anything up or include information where the supporting evidence is not provided.

When handling information with timestamps:
1. Each piece of information (both relationships and content) has a "created_at" timestamp indicating when we acquired this knowledge
2. When encountering conflicting information, consider both the content/relationship and the timestamp
3. Don't automatically prefer the most recent information - use judgment based on the context
4. For time-specific queries, prioritize temporal information in the content before considering creation timestamps

---Data Sources---

1. Knowledge Graph Data:
{kg_context}

2. Vector Data:
{vector_context}

---Response Requirements---

- Target format and length: {response_type}
- Use markdown formatting with appropriate section headings
- Aim to keep content around 3 paragraphs for conciseness
- Each paragraph should be under a relevant section heading
- Each section should focus on one main point or aspect of the answer
- Use clear and descriptive section titles that reflect the content
- List up to 5 most important reference sources at the end under "References", clearly indicating whether each source is from Knowledge Graph (KG) or Vector Data (VD)
Format: [KG/VD] Source content

Add sections and commentary to the response as appropriate for the length and format. If the provided information is insufficient to answer the question, clearly state that you don't know or cannot provide an answer in the same language as the user's question.

Since the prompt alone might be hard to digest, Figure 3 from the paper helps: extract the high-level and low-level keywords, then place entities and relationships (from the KG) and source chunks (from the vector store) into the retrieval context accordingly.

Evaluation & metrics

- (RQ1): How does LightRAG compare to existing RAG baseline methods in terms of generation performance?

- (RQ2): How do dual-level retrieval and graph-based indexing enhance the generation quality of LightRAG?

- (RQ3): What specific advantages does LightRAG demonstrate through case examples in various scenarios?

- (RQ4): What are the costs associated with LightRAG, as well as its adaptability to data changes?

The architecture and its detailed modules have been described above together with their prompts. Experiments are needed to show that this architecture is actually efficient, and the paper sets up the four research questions listed above.

RQ1 and RQ2 examine whether LightRAG beats traditional (vector) RAG and how much the dual-level perspective contributes to the improvement, while RQ3 and RQ4 compare GraphRAG and LightRAG on performance and cost-effectiveness.

In this post we will focus on RQ3 and RQ4, since the GraphRAG vs. LightRAG comparison is the important one. The metrics used for RQ3 and RQ4 are listed below; they should be familiar to many of you, as they were also used in Microsoft GraphRAG's experiments against vector RAG.

i) Comprehensiveness: How thoroughly does the answer address all aspects and details of the question?
ii) Diversity: How varied and rich is the answer in offering different perspectives and insights related to the question?
iii) Empowerment: How effectively does the answer enable the reader to understand the topic and make informed judgments?
iv) Overall: This dimension assesses the cumulative performance across the three preceding criteria to identify the best overall answer.
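
These four criteria are scored head-to-head with an LLM acting as the judge, as in the GraphRAG evaluation. Here is a rough sketch of such a pairwise judging call; the prompt wording and the llm helper are placeholders, not the exact setup from the paper.

# Rough sketch of a pairwise LLM-as-judge comparison over the four criteria.
# `llm(prompt)` is a placeholder for any chat-completion call that returns text.
CRITERIA = ["Comprehensiveness", "Diversity", "Empowerment", "Overall"]

def judge(question, answer_a, answer_b, llm):
    prompt = (
        "You are comparing two answers to the same question.\n"
        f"Question: {question}\n\n"
        f"Answer A:\n{answer_a}\n\n"
        f"Answer B:\n{answer_b}\n\n"
        f"For each criterion ({', '.join(CRITERIA)}), state which answer wins "
        "and briefly explain why. Reply in JSON, e.g. "
        '{"Comprehensiveness": {"winner": "A", "reason": "..."}}'
    )
    return llm(prompt)  # parse the JSON downstream and aggregate win rates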

RQ3. What specific advantages does LightRAG demonstrate through case examples in various scenarios?

These are LLM-as-judge results. In each of the three categories, LightRAG was judged superior, and looking at key phrases in the reviews, factors such as "MAPK …", "… which not only covers a broad array of metrics but also includes nuanced explanations of how some …", and "… empowers the reader more effectively by detailing …" are at play.

This success is attributed to LightRAG's hierarchical retrieval paradigm, which combines in-depth exploration of related entities through low-level retrieval (enhancing empowerment) with broader exploration via high-level retrieval (improving answer diversity).

Commenting on these results, I read them as indirect evidence that the hierarchy LightRAG builds with prompts and the KV store works better than the hierarchy built by community-based traversal. But is the cost also better? That is what RQ4 addresses next.

RQ4. MODEL COST AND ADAPTABILITY ANALYSIS

GraphRAG

  • Generates 1,399 communities, of which 610 level-2 communities are used for information retrieval.
  • Each community report averages 1,000 tokens, totaling 610,000 tokens.
  • Individual communities must be explored sequentially, requiring hundreds of API calls and resulting in high retrieval overhead.
  • Adding new data requires a complete reorganization of the existing community structure, costing approximately 1,399 × 2 × 5,000 tokens, which is cost-inefficient.

LightRAG

  • Uses fewer than 100 tokens for keyword generation and retrieval, performed in a single API call.
  • The retrieval mechanism integrates the graph structure with vector representations, eliminating the need for bulk reprocessing of the data.
  • New data can be merged directly into the existing graph, dramatically reducing token usage and overhead.
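
A quick back-of-the-envelope version of the numbers above (the token figures come from the paper's RQ4 discussion; the arithmetic itself is only illustrative):

# Back-of-the-envelope token accounting for the RQ4 comparison above.

# GraphRAG: retrieval reads 610 level-2 community reports of ~1,000 tokens each.
graphrag_retrieval_tokens = 610 * 1_000           # ~610,000 tokens

# GraphRAG: incorporating new data regenerates all 1,399 community reports,
# roughly 2 * 5,000 tokens (prompt + output) per community.
graphrag_update_tokens = 1_399 * 2 * 5_000        # ~13,990,000 tokens

# LightRAG: keyword generation plus retrieval stays under ~100 tokens,
# runs in a single API call, and new data merges into the existing graph.
lightrag_retrieval_tokens = 100

print(f"GraphRAG retrieval : ~{graphrag_retrieval_tokens:,} tokens")
print(f"GraphRAG re-index  : ~{graphrag_update_tokens:,} tokens")
print(f"LightRAG retrieval : ~{lightrag_retrieval_tokens:,} tokens")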

Here are the key takeaways:

  • Efficiency comparison: LightRAG performs search operations with significantly fewer tokens and API calls than GraphRAG, making it more cost and time efficient.
  • Dynamic data management: LightRAG maintains existing graphs and easily incorporates new data, while GraphRAG requires all communities to be reconfigured, which is inefficient.
  • Lower costs: LightRAG has much lower overhead than GraphRAG during search and data updates.

That's all for today's LightRAG overview, though I got a little carried away with the details. The important point here was storing entities and edges in key-value form during profiling and retrieving them efficiently by key when combining the various entities.

From the dual-level perspective, keyword prompts are used to split queries into high-level and low-level. The response prompt then includes both kg_context and vector_context, taking the best of both worlds and making the RAG system more complete. This differs from the existing GraphRAG paradigm.

In particular, if we recall the difference between LightRAG and the widely known Microsoft GraphRAG: instead of community-based traversal, LightRAG combines the high-level keyword and description perspectives to answer macroscopic questions at a high level of abstraction.

The entity extraction prompt still follows the same continue-extraction approach, but the key point is that the cost of answering high-level questions drops, so you get most of the benefits of GraphRAG relatively quickly and cheaply.

When I read GraphRAG papers, I increasingly find myself asking, "How did they design the prompts to extract the ontology?" I feel it is important to extract and combine the various metadata that arise when integrating data into entities, and the GraphRAG trend seems to be heading in that direction.

I think this is an area with no single right answer. Domain knowledge is still important, and how you embed it into the prompts will be one of the keys to GraphRAG performance.

[Contact Info]

Gmail: jeongiitae6@gmail.com

Linkedin : https://linkedin.com/in/yitaejeong
