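Introduction to LlamaIndex

LlamaIndex is a framework (in other words, an SDK) for building "context-augmented" large language model (LLM) applications. Context augmentation covers any use of an LLM on top of private or domain-specific data. For example:

Question-Answering Chatbots (that is, RAG)
Document Understanding and Extraction
Autonomous Agents that can perform research and take actions

LlamaIndex is available in Python and TypeScript; the Python documentation is the more complete of the two.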

Python documentation:
https://docs.llamaindex.ai/en/stable/
Python API reference:
https://docs.llamaindex.ai/en/stable/api_reference/
TypeScript documentation:
https://ts.llamaindex.ai/
TypeScript API reference:
https://ts.llamaindex.ai/api/

LlamaIndex is an open-source framework. GitHub link:
https://github.com/run-llama

pip install llama-index

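Loading data

Loading local data

SimpleDirectoryReader is a simple loader for local files. It walks the given directory and loads each file's text content automatically, choosing a reader based on the file extension.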
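
The idea behind a directory reader of this kind can be sketched with the standard library alone. This is not LlamaIndex's implementation: `ToyDocument`, `LOADERS`, and `load_directory` are hypothetical names made up for illustration, and only plain-text extensions are handled.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


# Map each supported extension to the function that extracts its text.
LOADERS = {".txt": read_text, ".md": read_text}


def load_directory(directory: str) -> list:
    """Walk the directory and load every file whose extension has a loader."""
    docs = []
    for path in sorted(Path(directory).rglob("*")):
        loader = LOADERS.get(path.suffix.lower())
        if path.is_file() and loader is not None:
            docs.append(ToyDocument(loader(path), {"file_name": path.name}))
    return docs


# Usage: a throwaway directory with one supported and one unsupported file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("hello", encoding="utf-8")
    Path(tmp, "image.png").write_bytes(b"\x89PNG")
    loaded = load_directory(tmp)

print([(d.text, d.metadata) for d in loaded])
```

Files with an unknown extension are simply skipped here; a real reader would typically fall back to a default loader or let the caller register more loaders.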

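The default PDFReader does not give very good results, so we can swap in a different file loader:

pip install pymupdf

Other PDF loaders include SmartPDFLoader and LlamaParse. Both offer richer parsing, including chapter and paragraph structure. They are not 100% accurate, though: text is occasionally lost or misplaced, so test and evaluate them carefully against your own requirements.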

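Data Connectors

For image, video, and audio files, text is not extracted automatically by default. Extracting it requires the corresponding reader.

Data Connectors handle a wider range of data types and read them in as Documents (text + metadata).

For example, to load Feishu documents: pip install llama-index-readers-feishu-docs
Built-in file loaders
Data loaders that connect to third-party services
More loaders can be found on LlamaHub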

Text splitting and parsing (Chunking)

In LlamaIndex, a Node is defined as a "chunk" of text.

Splitting text with TextSplitters

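from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import TokenTextSplitter

# Assumption: the documents to split live under ./data
documents = SimpleDirectoryReader("./data").load_data()

node_parser = TokenTextSplitter(
    chunk_size=100,   # maximum length of each chunk, in tokens
    chunk_overlap=50  # number of tokens shared by adjacent chunks
)

nodes = node_parser.get_nodes_from_documents(
    documents, show_progress=False
)

# The show_json helper used originally is not defined in this section,
# so print the text of the first node instead
print(nodes[0].get_content())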
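
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration under assumptions, not the TokenTextSplitter implementation: `split_with_overlap` is a hypothetical helper, and the tokens are plain list items rather than the output of a real tokenizer.

```python
def split_with_overlap(tokens: list, chunk_size: int, chunk_overlap: int) -> list:
    """Cut tokens into windows of chunk_size that share chunk_overlap items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Usage: ten toy tokens, windows of four, two tokens shared between neighbours.
toy_tokens = list(range(10))
toy_chunks = split_with_overlap(toy_tokens, chunk_size=4, chunk_overlap=2)
print(toy_chunks)
```

A larger overlap keeps more context across chunk boundaries, at the cost of producing more chunks to embed and store.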

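LlamaIndex provides a rich set of TextSplitters, for example:

SentenceSplitter: splits text into chunks of a given length while trying not to break sentences at chunk boundaries;
CodeSplitter: splits code based on its AST (abstract syntax tree, as used by compilers), keeping functional code units intact;
SemanticSplitterNodeParser: splits text into segments based on semantic relatedness.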
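
The sentence-boundary idea can be sketched as greedy packing of whole sentences into chunks. This is a simplified illustration, not SentenceSplitter's algorithm: `split_sentences` and `pack_sentences` are hypothetical helpers, sentences are detected with a naive regular expression, and length is measured in whitespace-separated words rather than tokens.

```python
import re


def split_sentences(text: str) -> list:
    """Split on '.', '!' and '?', keeping the punctuation with its sentence."""
    pieces = re.findall(r"[^.!?]+[.!?]*", text)
    return [p.strip() for p in pieces if p.strip()]


def pack_sentences(sentences: list, max_words: int) -> list:
    """Greedily group whole sentences so each chunk stays within max_words."""
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk rather than cutting a sentence in half.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sentence_chunks = pack_sentences(
    split_sentences("One two. Three four five. Six."), max_words=4
)
print(sentence_chunks)
```

A single sentence longer than max_words still becomes its own oversized chunk in this sketch; a production splitter needs a fallback for that case.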

Parsing structured documents with NodeParsers

Other NodeParsers include HTMLNodeParser, JSONNodeParser, and so on.

Indexing and Retrieval

Basic concept: in the context of retrieval, an "index" usually means a specific data structure designed to make retrieval fast.

Traditional indexes and vector indexes
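
As one example of a traditional index, the sketch below builds an inverted index that maps each word to the documents containing it, so a keyword lookup does not have to scan every document. The function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map every lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index: dict, query: str) -> set:
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


sample_docs = {
    1: "llama models are open",
    2: "open source vector index",
    3: "llama index",
}
inverted = build_inverted_index(sample_docs)
print(lookup(inverted, "llama index"))
```

A vector index answers a different question: instead of exact word matches, it finds the items whose embedding vectors are closest to the query vector.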

Vector retrieval

SimpleVectorStore builds a Vector Store directly in memory and indexes it
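
What an in-memory vector store does can be sketched in a few lines: keep (text, vector) pairs in a list and rank them by cosine similarity against the query vector. This is a toy under stated assumptions, not SimpleVectorStore itself: `ToyVectorStore` is a hypothetical class, and `toy_embed` uses word counts as a stand-in for a real embedding model.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Assumption: word counts stand in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: a list of (text, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, text: str) -> None:
        self._items.append((text, toy_embed(text)))

    def query(self, text: str, top_k: int = 2) -> list:
        """Return the top_k (score, text) pairs, best match first."""
        q = toy_embed(text)
        scored = [(cosine(q, vec), stored) for stored, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
for sample in [
    "llama 2 has 7b 13b and 70b parameters",
    "chroma is a vector database",
    "the weather is nice today",
]:
    store.add(sample)

results = store.query("how many parameters does llama 2 have", top_k=1)
print(results)
```

This brute-force scan compares the query with every stored vector, which is fine for small collections; dedicated vector databases add approximate nearest-neighbour indexes to scale further.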

The default Embedding model in LlamaIndex is OpenAIEmbedding(model="text-embedding-ada-002").

Using a custom Vector Store, with Chroma as the example:

pip install llama-index-vector-stores-chroma

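Ingestion Pipeline: a custom data processing flow

LlamaIndex uses Transformations to define a multi-step processing pipeline for data (Documents).

A notable feature of this Pipeline is that each step can be cached: if a step's input and processing method are unchanged, a repeated call fetches the result from the cache instead of running the step again. This saves time and, when the step involves LLM calls, tokens as well.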
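
The per-step caching described above can be sketched as follows. This is a conceptual illustration, not how IngestionPipeline is implemented: `CachedPipeline` is a hypothetical class, and the step name is used as a simple proxy for "the processing method has not changed".

```python
import hashlib


class CachedPipeline:
    """Hypothetical pipeline that caches each step by (step name, input)."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, function) pairs
        self.cache = {}
        self.calls = 0      # number of real (non-cached) step executions

    def _key(self, name, data) -> str:
        # Hash the step name together with a repr of its input.
        return hashlib.sha256(f"{name}:{data!r}".encode("utf-8")).hexdigest()

    def run(self, data):
        for name, func in self.steps:
            key = self._key(name, data)
            if key not in self.cache:
                self.cache[key] = func(data)
                self.calls += 1
            data = self.cache[key]  # cached or freshly computed result
        return data


pipeline = CachedPipeline([("strip", str.strip), ("split", str.split)])
first = pipeline.run("  a b c  ")
calls_after_first = pipeline.calls
second = pipeline.run("  a b c  ")  # both steps are served from the cache
print(first, calls_after_first, pipeline.calls)
```

Because the key covers only the step name and its input, changing a step's behaviour without renaming it would return stale results in this sketch; a real cache also has to account for the step's configuration.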

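In addition, the IngestionPipeline cache can be stored remotely, for example in Redis or MongoDB; see the official documentation: Remote Cache Management.

IngestionPipeline also supports asynchronous and parallel execution; see the official documentation: Async Support, Parallel Processing.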

Post-retrieval processing

LlamaIndex's Node Postprocessors provide a set of modules for processing nodes after retrieval.
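
Two common kinds of post-processing, filtering out low-scoring results and reordering the rest, can be sketched like this. The functions and sample data are hypothetical, and keyword overlap is only a toy stand-in for a real reranking model.

```python
def similarity_cutoff(nodes: list, cutoff: float) -> list:
    """Drop retrieved nodes whose similarity score is below the cutoff."""
    return [n for n in nodes if n["score"] >= cutoff]


def rerank_by_keyword_overlap(nodes: list, query: str) -> list:
    """Reorder nodes by how many query words they contain (toy reranker)."""
    query_words = set(query.lower().split())

    def overlap(node: dict) -> int:
        return len(query_words & set(node["text"].lower().split()))

    return sorted(nodes, key=overlap, reverse=True)


# Made-up retrieval results: text plus a similarity score.
retrieved = [
    {"text": "chroma stores vectors", "score": 0.81},
    {"text": "llama 2 parameters range from 7b to 70b", "score": 0.78},
    {"text": "unrelated note", "score": 0.30},
]

kept = similarity_cutoff(retrieved, cutoff=0.5)
final = rerank_by_keyword_overlap(kept, "llama 2 parameters")
print([n["text"] for n in final])
```

Chaining the steps in this order means the more expensive reranking only runs on nodes that survived the cheap filter.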

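For more rerankers and other post-processing methods, see the official documentation: Node Postprocessor Modules

Generating responses (QA & Chat)

Single-turn question answering (Query Engine)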

qa_engine = index.as_query_engine()
response = qa_engine.query("How many parameters does Llama 2 have?")
print(response)

Streaming output

qa_engine = index.as_query_engine(streaming=True)
response = qa_engine.query("How many parameters does Llama 2 have?")
response.print_response_stream()

Multi-turn conversation (Chat Engine)

chat_engine = index.as_chat_engine()
response = chat_engine.chat("How many parameters does Llama 2 have?")
print(response)

response = chat_engine.chat("How many at most?")
print(response)

Streaming output

chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("How many parameters does Llama 2 have?")
for token in streaming_response.response_gen:
    print(token, end="")

Low-level interfaces: Prompt, LLM, and Embedding

Prompt templates

PromptTemplate defines a prompt template

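from llama_index.core import PromptTemplate

prompt = PromptTemplate("Write a joke about {topic}")

# format() fills in the placeholder and returns the final prompt string
print(prompt.format(topic="Xiao Ming"))

ChatPromptTemplate defines a template made up of multiple chat messages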

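from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core import ChatPromptTemplate

# One system message and one user message, each containing {placeholders}
chat_text_qa_msgs = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content="Your name is {name}. You must answer questions based on the context the user provides.",
    ),
    ChatMessage(
        role=MessageRole.USER,
        content=(
            "Context:\n"
            "{context}\n\n"
            "Question: {question}"
        ),
    ),
]
text_qa_template = ChatPromptTemplate(chat_text_qa_msgs)

# format() fills in the placeholders and renders the messages as one string
print(
    text_qa_template.format(
        name="Guagua",
        context="This is a test",
        question="What is this?"
    )
)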

Language models

from llama_index.llms.openai import OpenAI

llm = OpenAI(temperature=0, model="gpt-4o")

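Setting the language model used globally

from llama_index.core import Settings
Settings.llm = OpenAI(temperature=0, model="gpt-4o")

Besides OpenAI, LlamaIndex integrates many other LLMs, covering both cloud service APIs and locally deployed APIs; see the official documentation: Available LLM integrations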

Embedding models

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# Global setting
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", dimensions=512)

LlamaIndex likewise integrates a variety of Embedding models, including cloud service APIs and open-source models (HuggingFace); see the official documentation.

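Building a fairly full-featured RAG system on LlamaIndex

More LlamaIndex features

Agent development framework:
https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/
RAG evaluation:
https://docs.llamaindex.ai/en/stable/module_guides/evaluating/
Observability:
https://docs.llamaindex.ai/en/stable/module_guides/observability/

These topics require a fair amount of background knowledge, so they are not covered in this lesson; later lessons will explain them one by one in detail.

In addition, LlamaIndex has collected many advanced techniques (Advanced Topics) that address the detailed problems encountered in production-grade RAG systems. They are valuable references for real-world work and well worth reading for those who are able to.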
