Getting Started

Introduction

What is Agentic RAG?

Agentic RAG is a method that combines the strengths of retrieval-augmented generation (RAG) with autonomous agents.

With vanilla RAG, Vectara receives a user query, retrieves the most relevant facts from your data, and uses an LLM to generate the most accurate response based on those facts.

Agentic RAG uses an LLM to “manage” the process of answering the user query via reasoning, planning, and a provided set of “tools”. Because an LLM-powered “manager” agent is in charge, it can analyze the user query and call the right tools, in the right order, to produce a comprehensive response, even for a complex query.

For example:

  • The agent can rephrase the user query.

  • The agent can break the query down into multiple simpler sub-queries, call the RAG query tool for each sub-query, and then combine the responses into a comprehensive answer (see the sketch after this list).

  • The agent can identify filtering criteria in the user query and use them to filter the results from the RAG query tool.
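Conceptually, the agent loop looks something like the sketch below. This is an illustration only, not vectara-agentic’s actual internals; all of the helper functions are hypothetical stand-ins:

def plan_sub_queries(query: str) -> list[str]:
    # Stand-in for the agent LLM's planning step, which breaks a complex
    # query into simpler sub-queries.
    return [f"{query} (sub-question 1)", f"{query} (sub-question 2)"]

def rag_query(sub_query: str) -> str:
    # Stand-in for a call to the Vectara RAG query tool.
    return f"answer to: {sub_query}"

def synthesize(query: str, partial_answers: list[str]) -> str:
    # Stand-in for the agent LLM combining partial answers into one response.
    return " ".join(partial_answers)

def answer(user_query: str) -> str:
    sub_queries = plan_sub_queries(user_query)        # reason and plan
    partials = [rag_query(q) for q in sub_queries]    # one tool call per sub-query
    return synthesize(user_query, partials)           # combine into a final response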

The main tool used in vectara-agentic is the RAG query tool, which queries a Vectara corpus and returns the most relevant response. By using a RAG-based agent, you mitigate some of the issues with pure LLMs, particularly hallucinations and lack of explainability.

Additional tools give your application superpowers to retrieve up-to-date information, access enterprise-specific data via APIs, issue SQL queries against a database, or even perform actions such as creating a calendar event or sending an email.
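Any Python function can serve as such a tool. Here is a minimal sketch: the create_calendar_event function is a hypothetical stand-in for your own action, wrapped with vectara-agentic’s ToolsFactory so the agent can call it:

from vectara_agentic.tools import ToolsFactory

def create_calendar_event(title: str, date: str, invitee: str) -> str:
    """Create a calendar event and return a confirmation message."""
    # Hypothetical stand-in: a real tool would call your calendar API here.
    return f"Created event '{title}' on {date} with {invitee}."

# Wrap the function as a tool the agent can use alongside its RAG tools.
calendar_tool = ToolsFactory().create_tool(create_calendar_event)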

Unfamiliar with RAG? Check out this page to learn more!

RAG vs Agentic RAG

Let’s demonstrate this via a simple example.

Imagine that you have ingested into Vectara all your Google Drive files, JIRA tickets, and product documentation. You build an Agentic RAG application using these tools:

  1. A JIRA RAG query tool

  2. A Google Drive RAG query tool

  3. A product docs RAG query tool

  4. A tool that can issue SQL queries against an internal database containing customer support data

Consider the query: “What is the top issue reported by customers in the last 3 months? Who is working to solve it?”

A standard RAG pipeline would try to match this entire query against the most relevant facts in your data and generate a response. It may fail to recognize that the query is really two separate questions and, given that complexity, may fail to produce a good response.

An Agentic RAG pipeline would instead first use its SQL tool to identify the top issue reported by customers in the last 3 months, and then use the JIRA tool to find out who is working to solve it.
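Here is a sketch of how such an agent might be assembled. The credentials, corpus IDs, and the query_support_db helper are all hypothetical placeholders; create_rag_tool and Agent are covered in detail in the Basic Example below:

from pydantic import BaseModel, Field
from vectara_agentic.agent import Agent
from vectara_agentic.tools import ToolsFactory, VectaraToolFactory

api_key = "<your-vectara-api-key>"          # placeholder
customer_id = "<your-vectara-customer-id>"  # placeholder

class QueryArgs(BaseModel):
    query: str = Field(..., description="The user query.")

def make_rag_tool(corpus_id: str, name: str, description: str):
    # Each RAG tool is bound to one Vectara corpus (corpus IDs are placeholders).
    factory = VectaraToolFactory(
        vectara_api_key=api_key,
        vectara_customer_id=customer_id,
        vectara_corpus_id=corpus_id,
    )
    return factory.create_rag_tool(
        tool_name=name,
        tool_description=description,
        tool_args_schema=QueryArgs,
    )

jira_tool = make_rag_tool("<jira-corpus-id>", "ask_jira", "Answers questions about JIRA tickets.")
drive_tool = make_rag_tool("<drive-corpus-id>", "ask_drive", "Answers questions about Google Drive files.")
docs_tool = make_rag_tool("<docs-corpus-id>", "ask_docs", "Answers questions about product documentation.")

def query_support_db(sql_query: str) -> str:
    """Run a read-only SQL query against the customer support database."""
    # Hypothetical stand-in: connect to your database and return results here.
    return "query results"

agent = Agent(
    tools=[jira_tool, drive_tool, docs_tool, ToolsFactory().create_tool(query_support_db)],
    topic="customer support",
)
agent.chat(
    "What is the top issue reported by customers in the last 3 months? "
    "Who is working to solve it?"
)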

What is vectara-agentic?

Vectara-agentic is a Python package that allows you to build Agentic RAG applications quickly with Vectara. It is based on LlamaIndex and provides a simple API to define tools, including a quick way to generate Vectara RAG tools.

It also includes some pre-built tools that you can use out of the box for various topics, such as legal or finance, and provides access to a wide range of LLMs through integrations with OpenAI, Anthropic, Together.AI, Cohere, GROQ, and Fireworks AI.
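For example, assuming your installed version exposes the domain tool helpers on ToolsFactory (helper names may vary across package versions):

from vectara_agentic.tools import ToolsFactory

tools_factory = ToolsFactory()
# Combine pre-built tool collections for the agent; the legal_tools() and
# financial_tools() helpers are assumed from the domain groupings above.
tools = (
    tools_factory.standard_tools()
    + tools_factory.legal_tools()
    + tools_factory.financial_tools()
)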

Agent Architecture

[Figure: vectara-agentic agent architecture]

Vectara-agentic follows a typical agentic RAG architecture. It consists of the following components:

  • A central LLM, or agent (using the ReAct, OpenAI, or LLMCompiler agent type), that manages the process of answering the user query.

  • One or more RAG tools for making queries to corpora in Vectara.

  • A set of additional tools that the agent can use to retrieve information, process data, or perform actions.

The agent is responsible for reasoning, planning, and executing the process of answering the user query using the available tools.

Basic Example

The most basic application you can build with vectara-agentic is an agent with a single RAG tool that pulls information from a Vectara corpus. This is very similar to specifying the parameters for a Vectara query, but you can also add special instructions for how you want your agent to behave. These instructions, together with the description of your RAG tool, are passed to the underlying LLM.

Let’s see how this is implemented in code with our standard initialization:

from vectara_agentic.agent import Agent
from vectara_agentic.tools import VectaraToolFactory
from pydantic import Field, BaseModel

import os
from dotenv import load_dotenv

load_dotenv(override=True)

# Read the Vectara credentials from the environment.
api_key = os.environ['VECTARA_API_KEY']
customer_id = os.environ['VECTARA_CUSTOMER_ID']
corpus_id = os.environ['VECTARA_CORPUS_ID']

# The tool factory is bound to a single Vectara corpus.
vec_factory = VectaraToolFactory(
    vectara_api_key=api_key,
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id
)

# The arguments schema tells the agent's LLM what inputs the tool expects.
class QueryPetPolicyArgs(BaseModel):
    query: str = Field(..., description="The user query.")

query_tool = vec_factory.create_rag_tool(
    tool_name="ask_pet_policy",
    tool_description="Responds to questions about Vectara's pet policy.",
    tool_args_schema=QueryPetPolicyArgs,
    summary_num_results=10,   # number of results used to generate the summary
    n_sentences_before=3,     # sentences of context before each matching snippet
    n_sentences_after=3,      # sentences of context after each matching snippet
    mmr_diversity_bias=0.1,   # MMR reranking: 0 = pure relevance, 1 = max diversity
    include_citations=False   # whether to include citation markers in the response
)

agent = Agent(
    tools=[query_tool],
    topic="Vectara Pet Policy"
)

agent.chat("What is Vectara's pet policy?")

When we run this code, we get the following response:

Vectara’s pet policy does not allow common household pets like cats and dogs on their campuses. Instead, they welcome a select group of exotic creatures that reflect their innovative spirit and core values. Additionally, birds are not only permitted but encouraged in their workspace as part of their unique approach.

In the above code, we defined a single RAG tool and then created an AI assistant (the Agent object) with this tool. This is how you would typically instantiate your Agent when defining more than one tool, but since a simple assistant with just one RAG tool is quite common, we provide a single function, Agent.from_corpus, that does all of this at once.

Here’s how you can create a simple assistant that uses a single RAG tool for asking questions about Medicare:

agent = Agent.from_corpus(
  vectara_customer_id=customer_id,
  vectara_corpus_id=corpus_id,
  vectara_api_key=api_key,
  data_description="medical plan benefits and pricing",
  assistant_specialty="Medicare",
  tool_name="ask_medicare",
)
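Once created, the agent is used exactly as before; for example (the question here is just an illustration):

agent.chat("What does Medicare Part B cover?")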

Try it Yourself

To run this code yourself, add the following environment variables to your console or a .env file:

Vectara Corpus:

VECTARA_CUSTOMER_ID: The customer ID for your Vectara account. If you don’t have an account, simply create one to get started.

VECTARA_CORPUS_ID: The corpus ID of the corpus that contains the Vectara pet policy. You can download the Pet Policy PDF file and add it to a new or existing Vectara corpus.

VECTARA_API_KEY: An API key that can perform queries on this corpus.

Agent type, LLMs and model names:

VECTARA_AGENTIC_AGENT_TYPE: Agent type, either OPENAI (default), REACT, or LLMCOMPILER (make sure you have an OpenAI API key if you use the OpenAI agent).

VECTARA_AGENTIC_MAIN_LLM_PROVIDER: The LLM used for the agent, either OPENAI (default), ANTHROPIC, TOGETHER, COHERE, GROQ, or FIREWORKS. Note that to use the OPENAI agent type, you must use OPENAI as the main LLM provider.

VECTARA_AGENTIC_TOOL_LLM_PROVIDER: The LLM used for the agent tools, either OPENAI (default), ANTHROPIC, TOGETHER, COHERE, GROQ, or FIREWORKS.

OPENAI_API_KEY, ANTHROPIC_API_KEY, TOGETHER_API_KEY, GROQ_API_KEY, COHERE_API_KEY, or FIREWORKS_API_KEY: Your API key for the agent or tool LLM, if you choose to use these services.

With any LLM provider, you can also specify which model to use via these environment variables:

VECTARA_AGENTIC_MAIN_MODEL_NAME: specifies the model name for the main LLM provider.

VECTARA_AGENTIC_TOOL_MODEL_NAME: specifies the model name for the tool LLM provider.

Defaults:

  1. For OPENAI, the default is gpt-4o-2024-08-06.

  2. For ANTHROPIC, the default is claude-3-5-sonnet-20240620.

  3. For TOGETHER, the default is Meta-Llama-3.1-70B-Instruct-Turbo.

  4. For COHERE, the default is command-r-plus.

  5. For GROQ, the default is llama-3.1-70b-versatile.

  6. For FIREWORKS, the default is firefunction-v2.
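Putting it all together, a minimal .env file might look like the following (all values are placeholders; this example assumes a ReAct agent with Anthropic models):

VECTARA_CUSTOMER_ID=<your-customer-id>
VECTARA_CORPUS_ID=<your-corpus-id>
VECTARA_API_KEY=<your-query-api-key>

VECTARA_AGENTIC_AGENT_TYPE=REACT
VECTARA_AGENTIC_MAIN_LLM_PROVIDER=ANTHROPIC
VECTARA_AGENTIC_TOOL_LLM_PROVIDER=ANTHROPIC
VECTARA_AGENTIC_MAIN_MODEL_NAME=claude-3-5-sonnet-20240620
VECTARA_AGENTIC_TOOL_MODEL_NAME=claude-3-5-sonnet-20240620
ANTHROPIC_API_KEY=<your-anthropic-api-key>

Recall from above that the OPENAI agent type requires OPENAI as the main LLM provider, which is why this example pairs the REACT agent type with ANTHROPIC.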