What did I want to do when I wrote this?
A while ago, I tried RAG Part 1 from the LangChain tutorials.
Trying the LangChain tutorial RAG Part 1 (without LangSmith or LangGraph) - CLOVER🍀
Trying the LangChain tutorial RAG Part 1 (without LangSmith, with LangGraph) - CLOVER🍀
This time, I'd like to try RAG Part 2.
Build a Retrieval Augmented Generation (RAG) App: Part 2 | 🦜️🔗 LangChain
The RAG Part 2 tutorial
Let's take a look at what the LangChain RAG Part 2 tutorial covers.
Build a Retrieval Augmented Generation (RAG) App: Part 2 | 🦜️🔗 LangChain
It seems the focus is on conversation-style interactions and multi-step retrieval processes.
Part 1 introduces RAG and walks through a minimal implementation.
Part 2 (this guide) extends the implementation to accommodate conversation-style interactions and multi-step retrieval processes.
The concepts covered are chat history, tool calling, and agents.
Chat history is, as the name suggests, a record of the conversation between the user and the chat model.
Chat history is a record of the conversation between the user and the chat model.
Chat history is a sequence of messages, each associated with a specific role such as "user", "assistant", "system", or "tool".
The chat history is sequence of messages, each of which is associated with a specific role, such as "user", "assistant", "system", or "tool".
Chat history / Conversation patterns
A conversation typically starts with a system message, followed by a user message, and ends with an assistant message containing the model's answer.
The assistant either answers the user directly or, if tools are configured, requests that a tool be called to perform a specific task.
Calling a tool is the concept known as tool calling.
Back to chat history.
Because chat models have a limit on input size, the chat history needs to be managed so that it doesn't exceed the context window, trimming it as necessary.
Since chat models have a maximum limit on input size, it's important to manage chat history and trim it as needed to avoid exceeding the context window.
Chat history / Managing chat history
At this point, it becomes important to maintain a correct conversation structure.
- The conversation follows one of these structures:
  - The first message is either a "user" or a "system" message, followed by a "user" message and then an "assistant" message
  - The last message is either a "user" message or a "tool" message containing the result of a tool call
- When tool calling is used, a "tool" message only follows the "assistant" message that requested the tool invocation
Message trimming, by the way, is covered here:
How to trim messages | 🦜️🔗 LangChain
Let's look at the next concept, tool calling.
Tool calling is a mechanism that lets a model interact with systems other than humans, such as databases and APIs, by using tools.
Many AI applications interact directly with humans. In these cases, it is appropriate for models to respond in natural language. But what about cases where we want a model to also interact directly with systems, such as databases or an API? These systems often have a particular input schema; for example, APIs frequently have a required payload structure. This need motivates the concept of tool calling. You can use tool calling to request model responses that match a particular schema.
A tool is a function with the @tool decorator applied; by binding it to a model, the model becomes able to call it.
Whether the model actually calls a tool is something the model itself decides.
The last concept is the agent.
An agent is a system that uses an LLM as a reasoning engine to decide which actions to take, and then executes those actions.
LangChain has its own agent mechanism, but the documentation recommends implementing agents with LangGraph's features. In fact, the tutorial also uses LangGraph's agent functionality.
That's enough reading the documentation; let's actually work through it. This time I'll use Ollama as the chat model and Qdrant as the vector store.
LangSmith is not used.
Environment
Here is today's environment.
$ python3 --version
Python 3.12.3

$ uv --version
uv 0.6.6
Ollama.
$ bin/ollama serve

$ bin/ollama --version
ollama version is 0.6.1
Qdrant is assumed to be running at 172.17.0.2.
$ ./qdrant --version
qdrant 1.13.4
Preparation
First, create a project.
$ uv init --vcs none langchain-tutorial-rag-part2
$ cd langchain-tutorial-rag-part2
$ rm main.py
Install the required dependencies.
$ uv add langchain-community langchain-ollama langchain-qdrant beautifulsoup4 langgraph
I'll also add mypy and Ruff.
$ uv add --dev mypy ruff
The list of dependencies installed this time.
$ uv pip list
Package                  Version
------------------------ ---------
aiohappyeyeballs         2.6.1
aiohttp                  3.11.13
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
attrs                    25.3.0
beautifulsoup4           4.13.3
certifi                  2025.1.31
charset-normalizer       3.4.1
dataclasses-json         0.6.7
frozenlist               1.5.0
greenlet                 3.1.1
grpcio                   1.71.0
grpcio-tools             1.71.0
h11                      0.14.0
h2                       4.2.0
hpack                    4.1.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
hyperframe               6.1.0
idna                     3.10
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.20
langchain-community      0.3.19
langchain-core           0.3.45
langchain-ollama         0.2.3
langchain-qdrant         0.2.0
langchain-text-splitters 0.3.6
langgraph                0.3.11
langgraph-checkpoint     2.0.20
langgraph-prebuilt       0.1.3
langgraph-sdk            0.1.57
langsmith                0.3.15
marshmallow              3.26.1
msgpack                  1.1.0
multidict                6.1.0
mypy                     1.15.0
mypy-extensions          1.0.0
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.15
packaging                24.2
portalocker              2.10.1
propcache                0.3.0
protobuf                 5.29.3
pydantic                 2.10.6
pydantic-core            2.27.2
pydantic-settings        2.8.1
python-dotenv            1.0.1
pyyaml                   6.0.2
qdrant-client            1.13.3
requests                 2.32.3
requests-toolbelt        1.0.0
ruff                     0.11.0
setuptools               76.0.0
sniffio                  1.3.1
soupsieve                2.6
sqlalchemy               2.0.39
tenacity                 9.0.0
typing-extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.3.0
yarl                     1.18.3
zstandard                0.23.0
pyproject.toml
[project]
name = "langchain-tutorial-rag-part2"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "beautifulsoup4>=4.13.3",
    "langchain-community>=0.3.19",
    "langchain-ollama>=0.2.3",
    "langchain-qdrant>=0.2.0",
    "langgraph>=0.3.11",
]

[dependency-groups]
dev = [
    "mypy>=1.15.0",
    "ruff>=0.11.0",
]

[tool.mypy]
strict = true
disallow_any_unimported = true
#disallow_any_expr = true
disallow_any_explicit = true
warn_unreachable = true
pretty = true
Trying RAG Part 2 from the LangChain tutorials
Now, let's try RAG Part 2 from the LangChain tutorial.
Build a Retrieval Augmented Generation (RAG) App: Part 2 | 🦜️🔗 LangChain
Loading documents into the vector store
The first step is to load documents into the vector store, and this part is the same as in Part 1.
Build a Retrieval Augmented Generation (RAG) App: Part 2 / Chains
So the external document used is also the same blog entry as in Part 1.
LLM Powered Autonomous Agents | Lil'Log
Here is what I created.
hello_load_documents.py
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

_ = vector_store.add_documents(all_splits)
Load the documents into Qdrant.
$ uv run hello_load_documents.py
Using tools
Next, let's use tools. That is, tool calling.
Here is the source code I created.
hello_rag.py
from typing import Tuple

from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.tools import tool
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

llm = init_chat_model(
    "llama3.2:3b",
    model_provider="ollama",
    temperature=0,
    base_url="http://localhost:11434",
)

@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

print(graph.get_graph().draw_mermaid())

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
The tool part is here, defined with the @tool decorator. The tool itself is a search against the vector store.
@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
Then we create each step,
def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])
and assemble the graph.
tools = ToolNode([retrieve])

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

print(graph.get_graph().draw_mermaid())

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
The graph expressed here looks like this.
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    query_or_respond(query_or_respond)
    tools(tools)
    generate(generate)
    __end__([<p>__end__</p>]):::last
    __start__ --> query_or_respond;
    generate --> __end__;
    tools --> generate;
    query_or_respond -.-> __end__;
    query_or_respond -.-> tools;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
The full execution result.
$ uv run hello_rag.py
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    query_or_respond(query_or_respond)
    tools(tools)
    generate(generate)
    __end__([<p>__end__</p>]):::last
    __start__ --> query_or_respond;
    generate --> __end__;
    tools --> generate;
    query_or_respond -.-> __end__;
    query_or_respond -.-> tools;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (ee3bf69d-97e2-4207-b798-2327c866be1d)
 Call ID: ee3bf69d-97e2-4207-b798-2327c866be1d
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

Task decomposition is the process of breaking down complex tasks into smaller, more manageable subgoals or steps. This can be done using various methods, including Large Language Models (LLMs) with simple prompting, task-specific instructions, or human inputs. The goal is to create a structured approach to solving problems and exploring solution spaces.
You can see that execution proceeds in the order: "user" message, "assistant" message, tool execution, "assistant" message.
The chat model creates a message for the tool,
================================== Ai Message ==================================
Tool Calls:
  retrieve (ee3bf69d-97e2-4207-b798-2327c866be1d)
 Call ID: ee3bf69d-97e2-4207-b798-2327c866be1d
  Args:
    query: Task Decomposition
and the tool runs the search.
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
Remembering context
Next, let's modify the source code a little so that it remembers context.
The earlier source code, changed like this.
hello_rag.py
from typing import Tuple

from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

llm = init_chat_model(
    "llama3.2:3b",
    model_provider="ollama",
    temperature=0,
    base_url="http://localhost:11434",
)

@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "abc123"}}

print(graph.get_graph().draw_mermaid())

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

print()
print(
    "-------------------------------------------------------------------------------------------------------------------------------------------------"
)
print()

input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()
What changed is the part that uses MemorySaver to remember the message history,
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
and that the conversation now happens twice.
input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

print()
print(
    "-------------------------------------------------------------------------------------------------------------------------------------------------"
)
print()

input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()
Execution results.
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    query_or_respond(query_or_respond)
    tools(tools)
    generate(generate)
    __end__([<p>__end__</p>]):::last
    __start__ --> query_or_respond;
    generate --> __end__;
    tools --> generate;
    query_or_respond -.-> __end__;
    query_or_respond -.-> tools;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (a81f2325-1c22-4760-9433-6e302b387c8f)
 Call ID: a81f2325-1c22-4760-9433-6e302b387c8f
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

Task decomposition is the process of breaking down complex tasks into smaller, more manageable subgoals or steps. This can be done using various methods, including Large Language Models (LLMs) with simple prompting, task-specific instructions, or human inputs. The goal is to create a structured approach to solving problems and exploring solution spaces.

-------------------------------------------------------------------------------------------------------------------------------------------------

================================ Human Message =================================

Can you look up some common ways of doing it?
================================== Ai Message ==================================
Tool Calls:
  retrieve (84f3b894-d4e6-41d1-a532-84c80bd84b27)
 Call ID: 84f3b894-d4e6-41d1-a532-84c80bd84b27
  Args:
    query: common methods for task decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

According to the retrieved context, there are three common ways to do task decomposition:

1. Using LLM with simple prompting, such as "Steps for XYZ.\n1.", or "What are the subgoals for achieving XYZ?".
2. Using task-specific instructions, e.g. "Write a story outline." for writing a novel.
3. With human inputs.
The key point is this part.
================================== Ai Message ==================================
Tool Calls:
  retrieve (84f3b894-d4e6-41d1-a532-84c80bd84b27)
 Call ID: 84f3b894-d4e6-41d1-a532-84c80bd84b27
  Args:
    query: common methods for task decomposition
The first question was "What is Task Decomposition?".
================================ Human Message =================================

What is Task Decomposition?
The next question was "Can you look up some common ways of doing it?".
================================ Human Message =================================

Can you look up some common ways of doing it?
Here, what the chat model asks the tool for in the second question is "common methods for task decomposition", which means it has carried over the context of the first question.
================================== Ai Message ==================================
Tool Calls:
  retrieve (84f3b894-d4e6-41d1-a532-84c80bd84b27)
 Call ID: 84f3b894-d4e6-41d1-a532-84c80bd84b27
  Args:
    query: common methods for task decomposition
This is the effect of LangGraph's checkpointing.
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
Agent
Finally, the agent.
Build a Retrieval Augmented Generation (RAG) App: Part 2 / Agents
The page describing agents is the one above, but in practice LangGraph's functionality is used.
Here is the source code I created.
hello_rag_react_agent.py
from typing import Tuple

from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import MessagesState
from langgraph.prebuilt import create_react_agent, ToolNode
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

llm = init_chat_model(
    "llama3.2:3b",
    model_provider="ollama",
    temperature=0,
    base_url="http://localhost:11434",
)

@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])

memory = MemorySaver()
agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

print(agent_executor.get_graph().draw_mermaid())

config = {"configurable": {"thread_id": "abc123"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()
The part that changed is this section.
memory = MemorySaver()

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

print(agent_executor.get_graph().draw_mermaid())

config = {"configurable": {"thread_id": "abc123"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()
The explicit graph construction is gone, and the Mermaid diagram is now much simpler as well.
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
	__start__([<p>__start__</p>]):::first
	agent(agent)
	tools(tools)
	__end__([<p>__end__</p>]):::last
	__start__ --> agent;
	tools --> agent;
	agent -.-> tools;
	agent -.-> __end__;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc
We provide the tool, but from there it is apparently up to the agent to decide whether to use it.
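The loop that create_react_agent runs can be sketched in plain Python. This is only a conceptual sketch: stub_model, stub_retrieve, and react_loop are hypothetical stand-ins, not LangGraph APIs.

```python
# Conceptual sketch of the ReAct-style loop behind create_react_agent.
# stub_model and stub_retrieve are hypothetical stand-ins, not real APIs.

def stub_model(messages: list[dict]) -> dict:
    # Pretend model: request the retrieve tool once, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}],
        }
    return {"role": "assistant", "content": "Answer based on retrieved context.", "tool_calls": []}


def stub_retrieve(query: str) -> str:
    return f"(retrieved context about: {query})"


def react_loop(question: str) -> list[dict]:
    messages = [{"role": "user", "content": question}]
    while True:
        response = stub_model(messages)          # agent node
        messages.append(response)
        if not response["tool_calls"]:
            return messages                      # agent -.-> __end__
        for call in response["tool_calls"]:      # agent -.-> tools
            result = stub_retrieve(call["args"]["query"])
            messages.append({"role": "tool", "content": result})
        # tools --> agent: loop back with the tool results appended


trace = react_loop("What is the standard method for Task Decomposition?")
print([m["role"] for m in trace])  # ['user', 'assistant', 'tool', 'assistant']
```

The conditional edges in the Mermaid graph correspond to the if/else here: on each pass the model either requests a tool call or returns a final answer.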
Run it.
$ uv run hello_rag_react_agent.py
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
	__start__([<p>__start__</p>]):::first
	agent(agent)
	tools(tools)
	__end__([<p>__end__</p>]):::last
	__start__ --> agent;
	tools --> agent;
	agent -.-> tools;
	agent -.-> __end__;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc

================================ Human Message =================================

What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
================================== Ai Message ==================================
Tool Calls:
  retrieve (31a30fca-7d65-43c6-a85c-2163f635d0c6)
 Call ID: 31a30fca-7d65-43c6-a85c-2163f635d0c6
  Args:
    query: standard method for Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

The standard method for Task Decomposition is not a single, universally accepted approach, but rather a set of techniques that can be combined and adapted to suit specific use cases.

One widely used method is the "CoT" (Cognitive Tree) framework, which involves decomposing complex tasks into smaller subgoals and representing them as a tree structure. This framework has been extended by various researchers to improve its performance and flexibility.

Another approach is to use task-specific instructions or prompts to guide the decomposition process. For example, using prompting techniques like "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?" can help generate a clear and structured decomposition plan.

Additionally, some researchers have explored the use of long-term planning and task decomposition in LLMs, which can be challenging due to limitations in context length and representation power. However, techniques like self-reflection and learning from past mistakes can help improve the robustness of these systems.

Common extensions of the CoT framework include:

* Tree of Thoughts (Yao et al. 2023), which explores multiple reasoning possibilities at each step
* Finite context length limitations, which require mechanisms to work within restricted communication bandwidth
* Challenges in long-term planning and task decomposition, which can be addressed through techniques like self-reflection and learning from past mistakes.

These extensions highlight the ongoing research and development in the field of task decomposition for LLMs.
After receiving the question, the agent performs a single retrieval and answers based on that information.
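Because agent_executor is checkpointed with MemorySaver and invoked with a thread_id, a follow-up question on the same thread continues the earlier conversation. The per-thread bookkeeping can be sketched with a plain dict (InMemoryHistory is a hypothetical illustration; the real MemorySaver stores full graph state, not just messages):

```python
# Sketch of per-thread chat history, the idea behind checkpointing
# with a thread_id. InMemoryHistory is a hypothetical illustration,
# not the MemorySaver API.
from collections import defaultdict


class InMemoryHistory:
    def __init__(self) -> None:
        # one message list per thread_id
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads[thread_id].append({"role": role, "content": content})

    def messages(self, thread_id: str) -> list[dict]:
        # Each thread sees only its own conversation
        return list(self._threads[thread_id])


history = InMemoryHistory()
history.append("abc123", "user", "What is the standard method for Task Decomposition?")
history.append("abc123", "assistant", "Decompose a task into smaller steps.")
history.append("def456", "user", "An unrelated question")

print(len(history.messages("abc123")))  # 2
print(len(history.messages("def456")))  # 1
```

Passing a different thread_id in the config therefore starts a fresh conversation with no shared history.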
That concludes the LangChain RAG Part 2 tutorial.
Wrapping up
I tried RAG Part 2 from the LangChain tutorials.
The number of components in play has grown quite a bit. Even while using them, it's easy to just skim over the details...
At this point, rather than copying the tutorial, I'd like to try it with a different theme. Let's think about that next.