GraphRAG interview — How we build a good-quality knowledge graph, simply, w/Alonso

4 min read · Jan 13, 2025

As technology advances, large language models (LLMs) and information retrieval/management methods evolve at breakneck speed. Knowledge graphs and vector databases are attracting a lot of attention, and increasingly sophisticated QA systems are emerging through Retrieval-Augmented Generation (RAG). Today we spoke with Alonso, who operates these kinds of technology stacks firsthand. He shared insights from his challenges and experiences.

His recent presentation video: “Enhancing RAG-based apps by constructing and leveraging knowledge graphs with open-weights LLMs”, https://www.youtube.com/watch?v=sjprfQw5TJw

I recommend watching the video above; it covers GraphRAG end to end, from RAG basics to knowledge graph construction.

Q1. Is a Graph Database absolutely necessary?

Alonso
“For the scalability and maintenance of a knowledge base, choosing the right database is essential. Whether you opt for vector databases, graph databases, or both, it’s crucial to understand their respective strengths and limitations. When dealing with large-scale knowledge management, it’s important to select a DB that best serves your needs in terms of expandability and relationship handling.”

Q2. Any strategies for building a customized evaluation framework for the enterprise?

Alonso
“We don’t use a ‘closed’ model; instead, we use an ‘open’ model such as Llama 70B with quantization techniques to make it more lightweight. We leverage that open model as a kind of ‘judgment engine.’ Prompt engineering is particularly important, as it shapes the model’s output according to the project’s goals. In an enterprise setting, the key is to securely and efficiently link and evaluate in-house data. That means creating a custom evaluation framework that can assess various scenarios in detail.”
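The "judgment engine" idea above can be sketched as an LLM-as-a-judge evaluation loop. This is a minimal illustration, assuming the quantized open-weights model (e.g. a local Llama 70B) is exposed behind a simple text-in/text-out `judge` callable; the prompt wording, scoring scale, and all names are my assumptions, not Alonso's actual framework.

```python
# LLM-as-a-judge sketch: score QA cases with an injected judge callable.
# The callable would wrap a local open-weights model in practice.
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are a strict evaluator. Score the answer from 1 (wrong)
to 5 (fully correct and grounded), given the question and retrieved context.
Question: {question}
Context: {context}
Answer: {answer}
Reply with a single digit."""

@dataclass
class EvalCase:
    question: str
    context: str
    answer: str

def evaluate(cases: list[EvalCase], judge: Callable[[str], str]) -> float:
    """Average judge score over a set of QA cases."""
    scores = []
    for case in cases:
        prompt = JUDGE_PROMPT.format(
            question=case.question, context=case.context, answer=case.answer
        )
        reply = judge(prompt).strip()
        scores.append(int(reply[0]))  # expect a single digit back
    return sum(scores) / len(scores)

# Stub judge for demonstration; a real run would call the local model.
demo = evaluate(
    [EvalCase("What is 2+2?", "Basic arithmetic.", "4")],
    judge=lambda prompt: "5",
)
```

Injecting the judge as a callable keeps the harness testable offline and lets the underlying model be swapped without touching the evaluation logic.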

Q3. How are you designing and operating your Graph Agent?

Alonso
“I believe a single type of database is insufficient, so we use multiple solutions — Lance, Qdrant, Milvus, Kuzu, Neo4j — to handle both vector and graph databases. Initially, we found LangChain convenient for building a basic RAG setup. However, as the project grew, we needed to manage debugging, metadata, and custom functions more directly, which led us to build everything from scratch.
It seems common to use tools like LangChain for quick proofs of concept in the early RAG stage, then switch to a from-scratch implementation for production because of flexibility and optimization needs.”
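To make the "from scratch" point concrete, here is a deliberately tiny dense-retriever core of the kind such a rewrite might start from. It is a sketch under my own assumptions (the embedding function is injected by the caller; class and method names are illustrative), not code from Alonso's project; its appeal over a framework is exactly that every line is debuggable.

```python
# Minimal "from scratch" dense retriever: cosine ranking over an
# in-memory store, with the embedding model supplied by the caller.
import math
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.docs: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, self.embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored texts most similar to the query."""
        qv = self.embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

In production the store would be backed by one of the databases named above (Lance, Qdrant, Milvus), but the control flow — embed, score, rank — stays this simple.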

Q4. What are the advantages of a Graph Retriever compared to Dense or Sparse retrievers?

Alonso
“I see two major strengths of GraphRAG:

  1. In terms of RAG configuration: Multi-hop reasoning,
  2. As a database: JOIN functionality.

There’s another noteworthy aspect: counting. For example, when the question is ‘how many documents have ~~?,’ vector-based approaches often yield approximate results, whereas graph-based approaches leverage relationship-based exact counting. You might achieve similar results with SQL, but it often involves repeated JOINs and recursive queries. Graph databases specialize in handling these kinds of relationship-based arithmetic operations, which becomes especially helpful for multi-hop queries.”
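The counting and multi-hop points can be illustrated with a toy in-memory property graph. In a real deployment these would be graph queries (e.g. Cypher's `MATCH ... RETURN count(...)` against Neo4j or Kuzu); the structure below is only a minimal stand-in to show why relationship counts are exact and multi-hop traversal is a natural operation.

```python
# Toy property graph: exact relationship counting and two-hop traversal.
from collections import defaultdict

class TinyGraph:
    def __init__(self):
        self.out = defaultdict(list)  # node -> [(relation, target)]

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        self.out[src].append((rel, dst))

    def count(self, src: str, rel: str) -> int:
        """Exact count of `rel` edges from a node — no approximation."""
        return sum(1 for r, _ in self.out[src] if r == rel)

    def two_hop(self, src: str, rel1: str, rel2: str) -> list[str]:
        """Multi-hop: follow rel1, then rel2, collecting endpoints."""
        return [
            end
            for r1, mid in self.out[src] if r1 == rel1
            for r2, end in self.out[mid] if r2 == rel2
        ]
```

A question like "how many documents did this author write?" becomes a direct edge count rather than an approximate nearest-neighbour lookup, and "which papers are cited by this author's documents?" is a two-hop walk rather than a recursive SQL JOIN.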

Q5. How important is ontology, and how do you tackle ontology extraction?

Alonso
“Defining an ontology and extracting it automatically is crucial but not easy. We are still testing a variety of tools, but often end up doing something close to a ‘from scratch’ approach. While frameworks like LangChain are well-developed, they can feel too heavy for highly customized enterprise projects where you need precise tuning.”
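A "from scratch" extraction pass of the kind described often reduces to prompting an LLM for triples and parsing the result. The sketch below assumes that shape; the prompt wording, the `subject | predicate | object` line format, and the `llm` callable are all illustrative assumptions, not Alonso's actual pipeline.

```python
# LLM-driven triple extraction: prompt, then parse defensively,
# since model output is never guaranteed to be well-formed.
from typing import Callable

EXTRACT_PROMPT = """Extract knowledge triples from the text below.
Output one triple per line as: subject | predicate | object
Text: {text}"""

def extract_triples(
    text: str, llm: Callable[[str], str]
) -> list[tuple[str, str, str]]:
    raw = llm(EXTRACT_PROMPT.format(text=text))
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # drop malformed lines
            triples.append(tuple(parts))
    return triples
```

The defensive parsing is where the "precise tuning" mentioned above lives: a heavyweight framework hides this loop, while owning it makes malformed outputs easy to log and iterate on.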

Q6. How do you approach entity resolution and linking strategies?

Alonso
“These are quite challenging areas, especially as the scale grows. The more precise you want your mapping to be, the more resources you need. In the end, it can be hard to measure how much all that extra work truly improves your search or QA performance. Balancing accuracy with resource allocation is crucial for an overall solution.”
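The accuracy-versus-resources trade-off shows up directly in a minimal entity-resolution pass: block candidate mentions with a cheap key, then fuzzy-match only within each block. The blocking key and threshold below are illustrative assumptions; tightening them is exactly the "more precision, more resources" dial described above.

```python
# Blocking + fuzzy matching for entity resolution, stdlib only.
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(name: str) -> str:
    """Deliberately cheap blocking key: first letter, lowercased."""
    return name.lower().strip()[:1]

def resolve(mentions: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each mention to a canonical form within its block."""
    blocks = defaultdict(list)
    for m in mentions:
        blocks[block_key(m)].append(m)
    canonical: dict[str, str] = {}
    for group in blocks.values():
        reps: list[str] = []  # canonical representatives seen so far
        for m in group:
            match = next(
                (r for r in reps
                 if SequenceMatcher(None, m.lower(), r.lower()).ratio() >= threshold),
                None,
            )
            canonical[m] = match or m
            if match is None:
                reps.append(m)
    return canonical
```

Blocking keeps the comparison count roughly linear instead of quadratic, which is the usual first lever when the mention set grows.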

Q7. How do you handle prompt engineering?

Alonso
“We often use the ell library for a simpler, lighter approach to prompt engineering, whereas DSPy feels heavier to us. The goal is to build an environment where custom prompts can be created and refined on the fly, integrating smoothly with various models.”
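The lightweight style described — prompts as small, swappable pieces of code — can be sketched without any library at all. This is a neutral, stdlib-only illustration of the idea; the decorator, registry, and prompt names are my assumptions and deliberately do not mirror ell's actual API.

```python
# Prompts as plain functions in a version registry, so variants
# can be created, compared, and swapped on the fly.
from typing import Callable

PROMPTS: dict[str, Callable] = {}

def prompt(name: str):
    """Register a prompt-building function under a version name."""
    def deco(fn):
        PROMPTS[name] = fn
        return fn
    return deco

@prompt("summarize/v1")
def summarize_v1(text: str) -> str:
    return f"Summarize in one sentence:\n{text}"

@prompt("summarize/v2")
def summarize_v2(text: str) -> str:
    return f"You are a careful editor. Summarize faithfully:\n{text}"

def render(name: str, **kwargs) -> str:
    return PROMPTS[name](**kwargs)
```

Because each prompt is just a named function, A/B-testing two phrasings is a one-line change at the call site.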

Q9. Any tips on designing a domain/lexical (document) graph structure?

Alonso
“We typically follow this two-step process:

  1. Use an LLM to propose an initial guide or outline for the structure.
  2. Collaborate with domain experts to verify accuracy and bring in external references or documents to finalize the knowledge graph.

Referring to documentation like OpenAI’s Structured Outputs can help streamline this process.”
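Step 1 of the process above pairs naturally with structured outputs: ask the model for a schema proposal constrained to a fixed JSON shape, then hand the result to domain experts for step 2. The JSON shape, dataclass, and `llm` callable below are illustrative assumptions sketching that flow, not a specific API.

```python
# Structured schema proposal: constrain the LLM to a JSON shape,
# then parse it into a typed object for expert review.
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class GraphSchema:
    node_types: list[str]
    edge_types: list[tuple[str, str, str]]  # (source_type, relation, target_type)

def propose_schema(domain_text: str, llm: Callable[[str], str]) -> GraphSchema:
    prompt = (
        "Propose a knowledge-graph schema for the text below. Return JSON with "
        '"node_types": [str] and "edge_types": [[source, relation, target]].\n'
        + domain_text
    )
    data = json.loads(llm(prompt))
    return GraphSchema(
        node_types=list(data["node_types"]),
        edge_types=[tuple(e) for e in data["edge_types"]],
    )
```

With an API that enforces the schema server-side (as OpenAI's Structured Outputs does), the `json.loads` step cannot fail on shape, which makes the expert-review loop in step 2 much cleaner.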

Q10. Is graph embedding really necessary?

Alonso
“It’s not an absolute must in every context, but it’s quite convenient for text attribute searches. When you want to combine structural and lexical features in a single search flow, graph embeddings can effectively unify both relationship-based and context-based information. They become particularly powerful for searching or inferring relationships and semantics simultaneously.”
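One simple way to "unify relationship-based and context-based information" in a single search flow is late fusion: blend a graph-proximity score with a text-similarity score per candidate. The blend weight and toy scores below are illustrative assumptions, not a prescription.

```python
# Late fusion of structural (graph) and lexical (text) similarity scores.
def fused_score(struct_sim: float, text_sim: float, alpha: float = 0.5) -> float:
    """Convex blend: alpha weights structure, (1 - alpha) weights text."""
    return alpha * struct_sim + (1 - alpha) * text_sim

def rank(candidates: dict[str, tuple[float, float]], alpha: float = 0.5) -> list[str]:
    """candidates: node -> (structural_sim, textual_sim); best blend first."""
    return sorted(
        candidates,
        key=lambda n: fused_score(*candidates[n], alpha=alpha),
        reverse=True,
    )
```

Graph embeddings go further by learning a single vector that already mixes both signals, but this explicit blend is a useful baseline for deciding whether they are worth the extra pipeline.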

Conclusion

Our conversation with Alonso sheds light on various strategies for integrating knowledge graphs with large language models:

  • DB Selection and RAG Setup: While frameworks like LangChain are useful for initial proof-of-concept, many teams migrate to a from-scratch implementation in production for better customization and debugging capabilities.
  • Graph Advantages: Multi-hop reasoning, JOIN operations, and the ability to handle counting or other complex queries.
  • Open Models and Prompt Engineering: Leveraging open models such as Llama 70B (with quantization) alongside a custom evaluation framework for enterprise needs.
  • Challenges in Ontology and Entity Resolution: Striking the right balance between automation tools and manual tweaking to achieve high accuracy without excessive resource expenditure.

Many organizations and research teams are actively using RAG, graph databases, and LLMs to develop advanced search systems and intelligent agents. Alonso’s experience and insights offer valuable lessons drawn from real-world practice.

If you’d like to discuss graph techniques, feel free to message me; I’m always open to a conversation.

Linkedin — https://www.linkedin.com/in/yitaejeong/

Written by Jeong Yitae

Linkedin: jeongyitae. I’m a graph and network data enthusiast, from hardware to software (applications).
