Sunday, 29 March 2026

Building Enterprise RAG: .NET, Semantic Kernel, and Weaviate


As LLMs evolve toward GPT-5.4, the challenge for .NET developers isn't just "connecting to an AI"—it's building a reliable, secure, and scalable Retrieval-Augmented Generation (RAG) pipeline.

In this post, I’ll walk through a demo implementation using Semantic Kernel to orchestrate the flow, Weaviate as the vector memory, and a critical layer of Data Sanitization.

The Architecture

A production-ready RAG application consists of three main stages:

  1. Sanitization: Cleaning raw data to remove noise and protect PII.

  2. Ingestion: Embedding the clean data and storing it in Weaviate.

  3. Orchestration: Using Semantic Kernel to retrieve context and generate answers via GPT-5.4.


1. The Gateway: Data Sanitization

Before data ever touches a vector database, it must be "sanitized." This prevents "Garbage In, Garbage Out" and ensures compliance.

Why Sanitize?

  • Lower Costs: Removing HTML/boilerplate reduces token usage.

  • Better Accuracy: Cleaner text leads to higher-quality embeddings.

  • Security: Redacting PII ensures sensitive data isn't leaked to the LLM.

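C#
using System.Text.RegularExpressions;

// Simple sanitization utility: strips HTML tags and normalizes whitespace.
// Note: this step alone does not redact PII; that needs its own pass.
public string Sanitize(string rawText) {
    // [^>]* also matches tags that span several lines, which "." does not
    var clean = Regex.Replace(rawText, "<[^>]*>", " "); // Replace HTML tags with a space so adjacent words don't fuse
    clean = Regex.Replace(clean, @"\s+", " "); // Normalize whitespace
    return clean.Trim();
}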
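
The utility above handles HTML and whitespace, while the PII point from the list is a separate step. Here is a minimal sketch of that step, written in Python like the other examples on this page. The two regular expressions are illustrative assumptions and cover only simple email and phone formats; real PII detection needs much broader coverage.

```python
import re

# Hypothetical patterns for illustration only; they catch simple cases, not all PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def sanitize(raw_text: str) -> str:
    """Strip HTML tags, redact emails and phone-like numbers, normalize whitespace."""
    clean = re.sub(r"<[^>]*>", " ", raw_text)   # replace tags with a space
    clean = EMAIL_RE.sub("[EMAIL]", clean)      # redact email addresses
    clean = PHONE_RE.sub("[PHONE]", clean)      # redact phone-like digit runs
    clean = re.sub(r"\s+", " ", clean)          # collapse whitespace
    return clean.strip()

print(sanitize("<p>Contact  jane.doe@example.com or +1 (555) 010-9999</p>"))
# Contact [EMAIL] or [PHONE]
```

Redacting before embedding means the placeholders, not the raw values, end up in the vector store and in any prompt built from it.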

2. The Memory: Weaviate + Semantic Kernel

Weaviate provides a scalable vector store that integrates with .NET via the Semantic Kernel connectors. Registering it with AddWeaviateVectorStore makes semantic search over the ingested documents available to the kernel; actual query latency at large scale depends on index configuration and hardware.

C#
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-5.4", apiKey)
    .AddWeaviateVectorStore(endpoint, apiKey)
    .Build();

3. The RAG Flow in Action

The magic happens when Semantic Kernel acts as the "brain," retrieving only the most relevant snippets from Weaviate to ground the GPT-5.4 response in your private data.
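
To make the retrieve-then-ground step concrete, here is a small Python sketch. It is not the Semantic Kernel or Weaviate API: `retrieve` uses plain word overlap as a stand-in for vector search, and the chunk texts, the prompt wording, and the function names are all assumptions made up for illustration.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question (stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )

chunks = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Return requests must include the original order number.",
]
question = "How long do refunds take after a return request?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

Only the two highest-scoring chunks reach the prompt, so the unrelated holiday sentence never gets a chance to distract the model.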

Key Benefits of this Stack:

  • Type Safety: Leverage C#’s strong typing for AI plugins.

  • Dependency Injection: Seamlessly integrate AI services into your existing ASP.NET Core apps.

  • Performance: Weaviate’s efficiency paired with the native speed of .NET 8/9.


Source Code: https://github.com/LeelaPrasadG/RAG_Weaviate_SemanticKernel_CSharp

Sunday, 8 March 2026

A Practical Guide to Faithfulness, Relevancy, Retrieval Quality, and Robustness


Retrieval-Augmented Generation (RAG) systems are only as strong as their ability to retrieve the right context and generate answers grounded in that context. While building a RAG pipeline is one challenge, evaluating whether it is actually performing well is another.

This is where RAGAS (Retrieval Augmented Generation Assessment) becomes extremely useful.

RAGAS provides a set of evaluation metrics that help measure both the retriever quality and the generator quality in a RAG pipeline. Instead of relying on vague judgment, these metrics give you a structured way to identify exactly where your system is failing and what needs improvement.

In this article, we will walk through the key RAGAS metrics, what each one measures, how to interpret them, and how to debug low scores.


Why RAG Evaluation Matters

In a RAG system, a poor answer may happen because:

  • the retriever fetched irrelevant chunks,

  • the retriever missed critical information,

  • the LLM hallucinated,

  • the answer did not directly address the question,

  • or the model got distracted by noisy context.

Without proper metrics, it is difficult to know which part of the pipeline is causing the issue.

RAGAS helps break this problem into measurable components.


The 6 Core RAGAS Metrics

1. Faithfulness

Faithfulness checks whether the generated answer stays grounded in the retrieved context.
Its primary purpose is to detect hallucinations — situations where the LLM introduces facts that are not supported by the source material.

Simple way to think about it

Imagine a journalist writing an article based only on interview notes. If the journalist adds details that were never mentioned in the notes, those details are unfaithful to the source. The same logic applies here.

What it measures

  • Whether answer claims are supported by retrieved context

  • Whether the model is inventing unsupported details

Ideal score

1.0

Formula

Supported claims / Total claims
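
As a quick worked example of the ratio, here is a Python sketch. It assumes the per-claim verdicts already exist (in RAGAS an LLM judge produces them); the verdict list below is made up.

```python
def faithfulness(claim_supported: list[bool]) -> float:
    """Fraction of answer claims that are supported by the retrieved context."""
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)

# Hypothetical verdicts: 3 of the 4 claims in the answer are supported
verdicts = [True, True, True, False]
score = faithfulness(verdicts)
print(score)  # 0.75
```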

If the score is low

A low faithfulness score usually means the model is hallucinating.

What to improve

  • Strengthen the prompt to enforce context grounding

  • Reduce temperature

  • Use a stronger LLM

  • Limit generation freedom when strict factuality is required


2. Answer Relevancy

Answer Relevancy measures whether the generated answer actually addresses the user’s question.

An answer may be factually correct and still be irrelevant.

Example

If someone asks, “What is the capital of France?” and the answer is “The Eiffel Tower is beautiful,” the statement may be true, but it does not answer the question.

How RAGAS approaches this

RAGAS uses a smart reverse-engineering method:

  1. It generates hypothetical questions from the answer

  2. It compares those generated questions with the original question using embeddings

  3. It computes similarity to determine how relevant the answer is
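
The three steps can be sketched in Python as below. This is a simplified illustration, not the RAGAS implementation: the 2-dimensional vectors are toy stand-ins for real embeddings, and the step that generates questions from the answer is assumed to have already run.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def answer_relevancy(question_vec: list[float],
                     generated_vecs: list[list[float]]) -> float:
    """Mean similarity between the original question and the generated questions."""
    sims = [cosine(question_vec, v) for v in generated_vecs]
    return sum(sims) / len(sims)

# Toy embeddings: one generated question matches the original, one is unrelated
original = [1.0, 0.0]
generated = [[1.0, 0.0], [0.0, 1.0]]
score = answer_relevancy(original, generated)
print(score)  # 0.5
```

If the answer were fully on-topic, every question generated from it would land close to the original and the mean would approach 1.0.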

What it measures

  • Whether the answer is on-topic

  • Whether the answer addresses the intent of the question

Ideal score

1.0

If the score is low

The answer is likely drifting away from the actual question.

What to improve

  • Review your prompt template

  • Ensure question intent is clearly preserved

  • Validate whether the system properly handles different query types such as what, why, when, and who


3. Context Precision

Context Precision measures whether the most relevant retrieved chunks are ranked higher in the retrieval results.

This is not just about retrieving relevant content. It is about retrieving it in the right order.

Simple way to think about it

Imagine a librarian gives you five books to answer a question. Context Precision asks whether the most useful book was placed at the top of the stack.

What it measures

  • Ranking quality of retrieved chunks

  • Whether top-ranked chunks are the most useful

How it works

  • Each retrieved chunk is checked for relevance

  • Precision is calculated at different ranks

  • More relevant chunks near the top produce a higher score
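
Here is a Python sketch of that rank-weighted calculation: precision is taken at each rank that holds a relevant chunk, then averaged. The relevance labels are hand-written assumptions (RAGAS derives them with an LLM judge), so treat this as an illustration of the idea rather than the library's exact code.

```python
def context_precision(relevance: list[int]) -> float:
    """Average of precision@k over the ranks k that hold a relevant chunk.

    relevance[i] is 1 if the chunk at rank i+1 is relevant, else 0.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at this rank
    return total / hits if hits else 0.0

# Same two relevant chunks, ranked first vs. ranked last
print(context_precision([1, 1, 0, 0]))  # 1.0
print(context_precision([0, 0, 1, 1]))  # (1/3 + 2/4) / 2, about 0.417
```

Both result lists contain the same chunks; only the order differs, which is exactly what this metric is meant to expose.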

Ideal score

1.0

If the score is low

It usually means irrelevant chunks are being ranked too high.

What to improve

  • Improve your embedding model

  • Add a re-ranker

  • Tune similarity thresholds

  • Improve chunk quality and chunk boundaries


4. Context Recall

Context Recall measures whether the retriever fetched all the necessary information required to answer the question.

This metric focuses on retrieval completeness.

Simple way to think about it

Suppose you are studying from a textbook for an exam. Context Recall asks whether you covered all the chapters needed to answer the exam questions, or whether you missed some important sections.

What it measures

  • Whether all required supporting information was retrieved

  • Whether the retriever missed important facts

How it works

  1. Break the reference answer into claims

  2. Check whether each claim is supported by the retrieved context

  3. Compute: claims found / total claims
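
A minimal Python sketch of these steps follows. The `keyword_supports` judge is a deliberately naive assumption (every word of the claim must appear in a chunk); in RAGAS an LLM makes this call. The claims and chunks are made-up examples.

```python
from typing import Callable

def keyword_supports(context: str, claim: str) -> bool:
    """Naive stand-in for an LLM judge: every claim word appears in the chunk."""
    ctx_words = set(context.lower().split())
    return all(word in ctx_words for word in claim.lower().split())

def context_recall(reference_claims: list[str],
                   contexts: list[str],
                   supports: Callable[[str, str], bool] = keyword_supports) -> float:
    """Fraction of reference claims supported by at least one retrieved chunk."""
    found = sum(
        1 for claim in reference_claims
        if any(supports(ctx, claim) for ctx in contexts)
    )
    return found / len(reference_claims)

contexts = ["the tower is in paris", "construction ran from 1887 to 1889"]
claims = ["tower in paris", "1887 to 1889", "named after gustave eiffel"]
score = context_recall(claims, contexts)
print(round(score, 3))  # 0.667: the third claim has no supporting chunk
```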

Ideal score

1.0

If the score is low

The retriever is likely missing important information.

What to improve

  • Increase top-k retrieval

  • Improve chunking strategy

  • Use better embeddings

  • Expand retrieval scope

  • Consider hybrid retrieval if semantic search alone is insufficient


5. Entity Recall

Entity Recall checks whether the retrieved context contains the important entities mentioned in the reference answer.

These entities may include:

  • people

  • organizations

  • places

  • dates

  • products

  • events

Simple way to think about it

If the correct answer mentions Einstein, 1905, and Princeton, did your retrieved documents include those entities?
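
In code, this comes down to a set intersection. The Python sketch below assumes the entities have already been extracted (RAGAS uses an LLM for that step); the retrieved set is a made-up example in which Princeton is missing.

```python
def entity_recall(reference_entities: set[str], context_entities: set[str]) -> float:
    """Share of reference-answer entities that also appear in the retrieved context."""
    if not reference_entities:
        raise ValueError("reference must contain at least one entity")
    return len(reference_entities & context_entities) / len(reference_entities)

reference = {"Einstein", "1905", "Princeton"}
retrieved = {"Einstein", "1905", "Bern", "Zurich"}   # hypothetical extraction result
score = entity_recall(reference, retrieved)
print(round(score, 3))  # 0.667: Princeton was not retrieved
```

Extra entities in the context (Bern, Zurich) do not lower the score; only missing reference entities do.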

What it measures

  • Whether key named entities were successfully retrieved

  • Whether important contextual anchors are present in the retrieval output

Ideal score

1.0

If the score is low

Important entities are not being surfaced by retrieval.

What to improve

  • Use entity-aware chunking

  • Add keyword-based retrieval alongside semantic retrieval

  • Improve metadata filtering

  • Increase retrieval coverage for entity-heavy questions


6. Noise Sensitivity

Noise Sensitivity measures how much irrelevant information in the retrieved context causes the model to make mistakes.

This metric reflects the robustness of the overall RAG system.

Simple way to think about it

Imagine taking an open-book exam, but someone mixed random Wikipedia pages into your notes. Noise Sensitivity measures how often those irrelevant pages confuse you into writing wrong answers.

Important note

Unlike the other metrics, lower is better here.

  • 0.0 = excellent robustness

  • 1.0 = poor robustness

What it measures

  • Whether irrelevant chunks confuse the model

  • Whether noisy context causes wrong claims in the answer
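
One simplified way to express this as a number is sketched below in Python. It counts the answer claims that are both wrong and traceable to an irrelevant chunk, divided by all claims. This is an assumption-laden simplification: the per-claim labels are hand-written here, and the exact definition and judging prompts in RAGAS may differ.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    correct: bool           # agrees with the reference answer
    from_irrelevant: bool   # traceable to an irrelevant retrieved chunk

def noise_sensitivity(claims: list[Claim]) -> float:
    """Fraction of answer claims that are wrong because of irrelevant context."""
    if not claims:
        raise ValueError("at least one claim is required")
    bad = sum(1 for c in claims if not c.correct and c.from_irrelevant)
    return bad / len(claims)

# Hypothetical labels for a four-claim answer
claims = [
    Claim("The tower is in Paris", correct=True, from_irrelevant=False),
    Claim("It is made of wrought iron", correct=True, from_irrelevant=False),
    Claim("It was finished in 1889", correct=True, from_irrelevant=False),
    Claim("It houses a fashion museum", correct=False, from_irrelevant=True),
]
score = noise_sensitivity(claims)
print(score)  # 0.25
```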

Ideal score

0.0

If the score is high

The model is overly influenced by irrelevant context.

What to improve

  • Filter noisy chunks before generation

  • Use re-ranking

  • Improve prompt robustness

  • Reduce overly broad retrieval

  • Strengthen chunk quality controls


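Quick Reference Table

Metric | Evaluates | What it checks | Ideal score
Faithfulness | Generator | Answer claims are supported by the retrieved context | 1.0
Answer Relevancy | Generator | Answer addresses the question that was asked | 1.0
Context Precision | Retriever | Relevant chunks are ranked near the top | 1.0
Context Recall | Retriever | All information needed for the reference answer was retrieved | 1.0
Entity Recall | Retriever | Key entities from the reference answer appear in the context | 1.0
Noise Sensitivity | Full system | Irrelevant context causes wrong claims in the answer | 0.0 (lower is better)
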

How to Interpret These Metrics Together

One of the biggest mistakes in RAG evaluation is looking at only one metric.

A system may have:

  • high context recall but low faithfulness,

  • high answer relevancy but poor context precision,

  • or strong retrieval but high noise sensitivity.

Each metric tells only part of the story.

A practical grouping

Generator-focused metrics

  • Faithfulness

  • Answer Relevancy

Retriever-focused metrics

  • Context Precision

  • Context Recall

  • Entity Recall

System robustness metric

  • Noise Sensitivity

This grouping helps you quickly identify whether the issue is in:

  • generation,

  • retrieval,

  • or end-to-end robustness.
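
The grouping can be turned into a small triage helper. The Python sketch below is one possible way to do it; the metric key names and the 0.7 threshold are assumptions chosen for illustration, not values from RAGAS.

```python
# Metric-to-stage grouping from the lists above; key names are assumed.
GROUPS = {
    "generation": ["faithfulness", "answer_relevancy"],
    "retrieval": ["context_precision", "context_recall", "entity_recall"],
    "robustness": ["noise_sensitivity"],
}
LOWER_IS_BETTER = {"noise_sensitivity"}

def diagnose(scores: dict[str, float], threshold: float = 0.7) -> dict[str, list[str]]:
    """Return the failing metrics per pipeline stage.

    A metric fails when it is below `threshold`, or, for lower-is-better
    metrics, above `1 - threshold`. The threshold is a hypothetical default.
    """
    problems: dict[str, list[str]] = {}
    for stage, metrics in GROUPS.items():
        failing = []
        for name in metrics:
            if name not in scores:
                continue
            value = scores[name]
            bad = value > 1 - threshold if name in LOWER_IS_BETTER else value < threshold
            if bad:
                failing.append(name)
        if failing:
            problems[stage] = failing
    return problems

scores = {
    "faithfulness": 0.55, "answer_relevancy": 0.9,
    "context_precision": 0.8, "context_recall": 0.85, "entity_recall": 0.75,
    "noise_sensitivity": 0.1,
}
report = diagnose(scores)
print(report)  # {'generation': ['faithfulness']}
```

Here retrieval and robustness pass, so the place to start is the prompt and generation settings rather than the retriever.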

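Debugging Guide: What to Do When Scores Are Low

Symptom | Likely cause | First things to try
Low Faithfulness | The model is hallucinating | Enforce context grounding in the prompt; reduce temperature; use a stronger LLM
Low Answer Relevancy | The answer drifts from the question | Review the prompt template; preserve question intent
Low Context Precision | Irrelevant chunks are ranked too high | Add a re-ranker; improve the embedding model; tune similarity thresholds
Low Context Recall | The retriever misses important information | Increase top-k; improve chunking; consider hybrid retrieval
Low Entity Recall | Important entities are not surfaced | Use entity-aware chunking; add keyword-based retrieval; improve metadata filtering
High Noise Sensitivity | The model is overly influenced by irrelevant context | Filter noisy chunks; use re-ranking; reduce overly broad retrieval
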


Example: Evaluating a Simple RAG Sample

Let us take a simple example.

Question

What is the Eiffel Tower and where is it located?

Generated Response

“The Eiffel Tower is a famous iron lattice tower located in Paris, France. It was built in 1889.”

Reference Answer

“The Eiffel Tower is a wrought-iron lattice tower in Paris, France. It was constructed from 1887 to 1889.”

Retrieved Contexts

  1. The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.

  2. The tower was constructed from 1887 to 1889 as the centerpiece of the 1889 World's Fair.

  3. The Eiffel Tower is named after Gustave Eiffel, whose company designed and built the tower.

  4. Paris is known for its café culture and fashion industry.

In this example:

  • Faithfulness should be reasonably high because the answer is supported by context

  • Answer Relevancy should be high because the response addresses the question directly

  • Context Precision depends on whether the most relevant chunks are ranked at the top

  • Context Recall checks whether all facts needed for the reference are present

  • Entity Recall checks whether key entities such as Eiffel Tower and Paris are present

  • Noise Sensitivity measures whether the unrelated Paris sentence causes confusion
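
For reference, here is how this sample could be laid out as a record in Python, followed by a naive entity check against the retrieved contexts. The field names follow what I understand RAGAS's single-turn sample to use, so treat them as an assumption; the entity list is hand-picked, and plain substring matching stands in for real entity extraction.

```python
sample = {
    "user_input": "What is the Eiffel Tower and where is it located?",
    "response": (
        "The Eiffel Tower is a famous iron lattice tower located in Paris, France. "
        "It was built in 1889."
    ),
    "reference": (
        "The Eiffel Tower is a wrought-iron lattice tower in Paris, France. "
        "It was constructed from 1887 to 1889."
    ),
    "retrieved_contexts": [
        "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
        "The tower was constructed from 1887 to 1889 as the centerpiece of the 1889 World's Fair.",
        "The Eiffel Tower is named after Gustave Eiffel, whose company designed and built the tower.",
        "Paris is known for its café culture and fashion industry.",
    ],
}

# Hand-picked entities from the reference answer (an assumption, not extracted by a model)
reference_entities = ["Eiffel Tower", "Paris", "France", "1887", "1889"]

context_text = " ".join(sample["retrieved_contexts"])
found = [entity for entity in reference_entities if entity in context_text]
entity_recall_score = len(found) / len(reference_entities)
print(found, entity_recall_score)
```

All five entities appear in the first two chunks, so this naive check gives an entity recall of 1.0 for the sample.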

The sample works well for demonstration because it mixes useful context (the first three chunks) with one noisy chunk (the fourth), so every metric has something to measure.


Key Insights

1. Faithfulness and Answer Relevancy evaluate the Generator

These tell you whether the LLM is producing grounded and question-focused answers.

2. Context Precision, Context Recall, and Entity Recall evaluate the Retriever

These help assess retrieval quality, completeness, and entity coverage.

3. Noise Sensitivity evaluates robustness

This tells you how well the full system handles irrelevant context.

4. Intermediate evaluation is critical

Looking only at final answers is not enough. You need to inspect retrieval quality and grounding behavior separately.

5. The best evaluation comes from combining metrics

No single metric fully explains RAG quality. The real value comes from using these signals together.


Final Thoughts:

If you are building enterprise-grade RAG systems, these metrics can help you move from guesswork to measurable quality improvement. They not only show whether a RAG pipeline is working, but also pinpoint whether the problem lies in retrieval, ranking, generation, or robustness.

That makes RAGAS not just an evaluation tool, but a debugging framework for building better GenAI applications.


Code implementing all of these metrics: https://github.com/LeelaPrasadG/rag_langchain/blob/main/12_RAGAS_Metrics_Deep_Dive.ipynb


Reference: https://medium.com/@danushidk507/evaluation-with-ragas-873a574b86a9

Python Concepts

 

Synchronous and Asynchronous:


Core Differences
  • Synchronous (Sync): Tasks are executed sequentially, one after another. If a task involves waiting (like a network request), the entire program "blocks" and stays idle until that task finishes.
  • Asynchronous (Async): Tasks can run concurrently. When an async task hits a waiting period, it "yields" control back to the system, allowing other tasks to run in the meantime.

Feature | Synchronous | Asynchronous
Execution | Sequential (one at a time) | Concurrent (overlapping)
Blocking | Blocking architecture | Non-blocking architecture
Complexity | Simple to write and debug | More complex; requires async/await
Best For | CPU-bound tasks (heavy math) | I/O-bound tasks (web requests)
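
A short asyncio sketch shows the difference in practice. `asyncio.sleep` stands in for a network request, and the task names and the 0.2 second delay are made up for the example.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    """Pretend network call: awaiting the sleep yields control to other tasks."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> list[str]:
    # gather() starts all three coroutines and waits for them together
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.2),
        fetch("c", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Run one after another, the three waits would take about 0.6 seconds; run concurrently, the total stays close to the longest single wait, about 0.2 seconds.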

Sunday, 1 March 2026

Qdrant and Weaviate Vector Stores

 

Qdrant:

Qdrant is a vector database that excels at:
- ✅ Local development (no Docker required)
- ✅ Fast similarity search
- ✅ Flexible storage options (in-memory or persistent)
- ✅ Rich metadata filtering

There are three ways to set up a Qdrant store:

1. In-Memory - Fast but temporary (lost when script ends)
2. Persistent - Saved to disk (survives restarts)
3. from_documents - Easiest method (recommended)

Basic Similarity Search:
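print("BASIC SIMILARITY SEARCH")

# qdrant_store_memory is assumed to be the in-memory store (approach 1), created beforehand
# Search for documents similar to this query
# k=2 means return the top 2 most similar documents
results = qdrant_store_memory.similarity_search(
    "Tell me about RAG",
    k=2
)

# Each result is a LangChain Document; show its text and metadata
for doc in results:
    print(doc.page_content, doc.metadata)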

Filter on Metadata:
from qdrant_client.models import Filter, FieldCondition, MatchValue
print("SEARCH WITH METADATA FILTER")
print("-" * 80)
#metadata={"topic": ["rag", "llms", "agents"]}
# Create a filter to only search documents with topic='rag'
# Note: We use 'metadata.topic' because metadata is nested
qdrant_filter = Filter(
    must=[
        FieldCondition(
            key="metadata.topic",
            match=MatchValue(value="rag")
        )
    ]
)

# Same search, but only among filtered documents
results_filtered = qdrant_store_memory.similarity_search(
    "Tell me about RAG",
    k=2,
    filter=qdrant_filter
)

Multiple filter Conditions:
multi_filter = Filter(
    must=[
        FieldCondition(key="metadata.topic", match=MatchValue(value="rag")),
        FieldCondition(key="metadata.difficulty", match=MatchValue(value="intermediate"))
    ]
)
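
Every condition listed under `must` has to match, so the filter above behaves like a logical AND applied before ranking. The plain-Python sketch below imitates that behaviour on three toy documents. It does not use Qdrant at all; the vectors, the documents, and the `search` helper are assumptions made up to illustrate the semantics.

```python
import math

# Toy documents with 2-dimensional vectors and flat metadata (all made up)
docs = [
    {"text": "RAG basics", "vector": [0.9, 0.1],
     "metadata": {"topic": "rag", "difficulty": "beginner"}},
    {"text": "RAG evaluation", "vector": [0.8, 0.3],
     "metadata": {"topic": "rag", "difficulty": "intermediate"}},
    {"text": "Agent loops", "vector": [0.7, 0.2],
     "metadata": {"topic": "agents", "difficulty": "intermediate"}},
]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec: list[float], must: dict[str, str], k: int = 2) -> list[str]:
    """Keep documents matching ALL `must` conditions, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in must.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

hits = search([1.0, 0.0], must={"topic": "rag", "difficulty": "intermediate"})
print(hits)  # ['RAG evaluation']
```

With only the topic condition, both RAG documents would be returned; adding the difficulty condition narrows the candidates to one.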

Create a Qdrant store directly from documents:
from langchain_qdrant import QdrantVectorStore

# Create the Qdrant store directly from documents.
# This is the easiest way: embedding, collection creation, and upload happen in one call.
qdrant_store_easy = QdrantVectorStore.from_documents(
    documents=sample_docs,            # Your documents
    embedding=embeddings,             # Embedding function
    path="./qdrant_easy",             # Local on-disk persistence (optional)
    collection_name="rag_collection"  # Collection name
)

Weaviate:
Weaviate runs as a Docker container, so start the container before connecting.

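import weaviate
from langchain_ollama import OllamaEmbeddings  # assumed import path; depends on your LangChain install

# Connect to the local Weaviate instance (default HTTP and gRPC ports)
weaviate_client = weaviate.connect_to_local(
    host="localhost",
    port=8080,
    grpc_port=50051
)
embeddings = OllamaEmbeddings(model="nomic-embed-text")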

print("✓ Ollama embeddings initialized")
print("  Model: nomic-embed-text")
print("  Dimension: 768")



Sample Code is under https://github.com/LeelaPrasadG/rag_langchain/blob/main/Vector_Stores_Tutorial.ipynb

Building a ReAct Agent with LangGraph & LangSmith

In this post, I walk through building a ReAct (Reasoning + Acting) agent using LangGraph and Groq's openai/gpt-oss-120b model, where the...