Chetan_Tiwary_
Community Manager

RAG: Stop Hallucinating, Start Retrieving!

The very design of LLMs leads to a degree of unpredictability in their output. Furthermore, the knowledge within an LLM is limited by its static training data: there is a cut-off date for the information it possesses, so it may produce outdated or inaccurate answers.

This is HALLUCINATION!

Common issues encountered with LLMs include:

* Providing incorrect information when they lack a direct answer.
* Delivering outdated or overly general responses when a user requires something specific and current.
* Generating content based on unreliable or non-authoritative sources.
* Producing inaccurate replies stemming from confused terminology, where different training materials might use the same terms for distinct concepts.

 

Hence RAG (Retrieval-Augmented Generation).

RAG is an AI/ML architecture that enhances a model's output by leveraging authoritative, external data to boost its accuracy, relevance, and overall usefulness.

It has two key components, 1. Retriever and 2. Generator, and the overall process can be divided into 4 main steps:

+--------------------+          +--------------------+
| Company Docs:      |          | Public Knowledge   |
| internal PDFs,     |          | (e.g., Wikipedia)  |
| APIs, wikis, etc.  |          +---------+----------+
+---------+----------+                    |
          |                               |
          v                               v
[ INGESTION ]    --> Embed into a vector DB (Pinecone, FAISS, etc.).
          |
          v
[ RETRIEVAL ]    --> Search for relevant chunks using the user query.
          |
          v
[ AUGMENTATION ] --> Combine the query + retrieved chunks into a rich prompt.
          |
          v
[ GENERATION ]   --> The LLM generates smart, grounded, fact-friendly output.

 

In short, this is a simple RAG pipeline:

[Image: simple RAG pipeline diagram]
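To make the four steps concrete before we break them down, here is a minimal, self-contained toy sketch in plain Python. A bag-of-words counter stands in for a real embedding model, and the final prompt is printed instead of being sent to an LLM (both are simplifications, not how you would do it in production):

# Toy RAG pipeline: ingestion -> retrieval -> augmentation -> generation.
import math
from collections import Counter

docs = [
    "To create a network bond on RHEL, use nmcli con add type bond.",
    "Restart a container with podman restart <container-id>.",
    "RAG combines retrieval with generation to ground LLM answers.",
]

def embed(text):
    # Toy "vector": word counts instead of a learned embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = [(doc, embed(doc)) for doc in docs]          # 1. ingestion / indexing

query = "how do I restart a container?"
q_vec = embed(query)
best_doc, _ = max(index, key=lambda pair: cosine(q_vec, pair[1]))   # 2. retrieval (top-1)

prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {query}"   # 3. augmentation
print(prompt)                                         # 4. generation: send this prompt to an LLM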

 

 

1. Indexing Phase:

Document ingestion: Load PDFs, docs, logs.

Chunking: Divide content into paragraphs or token windows.

Embedding: Convert text to vectors using models like Sentence-BERT or OpenAI embeddings.

Store in a vector DB: Tools like Pinecone, Weaviate, Milvus, etc. enable fast similarity searches.
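Putting the indexing phase together, here is one possible sketch using the sentence-transformers and faiss-cpu packages (pip install sentence-transformers faiss-cpu); the source file name is hypothetical:

# Indexing-phase sketch: ingest -> chunk -> embed -> store.
import faiss
from sentence_transformers import SentenceTransformer

# Ingest + chunk: here each paragraph of a hypothetical handbook.txt is one chunk.
raw_text = open("handbook.txt").read()
chunks = [p.strip() for p in raw_text.split("\n\n") if p.strip()]

# Embed: convert each chunk to a dense vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(chunks, normalize_embeddings=True)

# Store: inner product over normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "chunks.faiss")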


2. Inference Phase:

Embed the query using the same embedding model.

Retrieve the top-k chunks via ANN or cosine-similarity search.

Compose the prompt: inject the relevant chunks along with the original question.

Generate the answer with the LLM.

Post-process: filter content, rerank, and format answers with citations.
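Continuing the sketch above with the same model and FAISS index, the inference phase might look like this (call_llm is a placeholder for whatever LLM client you use):

# Inference-phase sketch: embed query -> retrieve -> augment -> generate.
query = "How do I restart a container?"
q_vec = model.encode([query], normalize_embeddings=True)

# Retrieve the top-k most similar chunks.
k = 3
scores, ids = index.search(q_vec, k)
retrieved = [chunks[i] for i in ids[0]]

# Compose the prompt: inject the retrieved chunks plus the original question.
context = "\n---\n".join(retrieved)
prompt = (
    "Answer the question using ONLY the context below, and cite the chunk you used.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)

answer = call_llm(prompt)   # placeholder: swap in your LLM client of choice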

 

Some of the use cases could be:

1. Chatbots trained on a company's internal docs, SOPs, reports, etc.

2. Technical assistant bots: feed in historical SNOW or JIRA data to automate ticket acknowledgement or RCA analysis. It could even assign tickets to a suitable agent based on availability, expertise, and performance rating.

3. Technical chatbots that query historical RHEL or OpenShift KBs to answer basic, general troubleshooting questions such as how to create a network bond or how to restart a container.

4. Legal policy or compliance chatbots, for complex material like GDPR, copyright, or patent law.

RAG is useful because it directs the LLM to retrieve specific, real-time information from your chosen source (or sources) of truth. RAG can save money by providing a custom experience without the expense of model training and fine-tuning. It can also save resources by sending only the most relevant information (rather than lengthy documents) when querying an LLM.

You thought this was all? Last but not least, RAG has been updated and improved:

Reranking: This involves refining the relevance of retrieved information by re-scoring the top data chunks before they are sent to the Large Language Model (LLM). See the sketch after this list.
Dynamic RAG (DRAGIN): This pattern allows the system to fetch more context during the generation process itself, which is particularly useful for tasks involving multiple turns or steps.
Secure RAG: This focuses on protecting sensitive data by encrypting vectors and managing access controls to the information.
GraphRAG: This approach integrates knowledge graphs with text chunks to provide a richer, more contextual understanding of the information being used.
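As a taste of the first item, here is one way reranking could look with a cross-encoder from sentence-transformers; the model name is one public example of a reranker, and the candidate chunks are made up for illustration:

# Reranking sketch: re-score retriever output before it reaches the LLM.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I restart a container?"
candidates = [
    "Restart a container with podman restart <container-id>.",
    "To create a network bond, use nmcli.",
    "systemd units can restart containers automatically on failure.",
]

# Score each (query, chunk) pair jointly, which is more accurate
# than the retriever's independent vector similarity.
scores = reranker.predict([(query, c) for c in candidates])

# Keep only the highest-scoring chunks for the final prompt.
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
top_chunks = reranked[:2]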

WARNING - Proceed at your own risk!!!

https://developers.redhat.com/articles/2024/12/04/level-your-generative-ai-llms-and-rag#how_to_creat... 

https://developers.redhat.com/articles/2024/11/20/rhel-ai-action-streamline-ai-workflows-rag-lab-and... 

 
