What did I want to do when I wrote this?
A while ago, I tried RAG Part 1 from the LangChain tutorials.
Trying the LangChain tutorial RAG Part 1 (without LangSmith or LangGraph) - CLOVER🍀
Trying the LangChain tutorial RAG Part 1 (without LangSmith, with LangGraph) - CLOVER🍀
This time, I'd like to try RAG Part 2.
Build a Retrieval Augmented Generation (RAG) App: Part 2 | 🦜️🔗 LangChain
The RAG Part 2 tutorial
Let's take a look at what the LangChain RAG Part 2 tutorial covers.
Build a Retrieval Augmented Generation (RAG) App: Part 2 | 🦜️🔗 LangChain
It seems the focus is on conversation-style interactions and multi-step retrieval processes.
Part 1 introduces RAG and walks through a minimal implementation.
Part 2 (this guide) extends the implementation to accommodate conversation-style interactions and multi-step retrieval processes.
The concepts covered are chat history, tool calling, and agents.
Chat history is, as the name suggests, a record of the conversation between the user and the chat model.
Chat history is a record of the conversation between the user and the chat model.
Chat history is a sequence of messages, each associated with a specific role such as "user", "assistant", "system", or "tool".
The chat history is sequence of messages, each of which is associated with a specific role, such as "user", "assistant", "system", or "tool".
Chat history / Conversation patterns
A conversation typically starts with a system message, followed by a user message, and ends with an assistant message containing the model's answer.
The assistant either answers the user directly or, if tools are configured, requests that a tool be called to perform a specific task.
Calling a tool is the concept known as tool calling.
Back to chat history.
Because chat models have a limit on input size, the chat history needs to be managed so that it doesn't exceed the context window, trimming it as necessary.
Since chat models have a maximum limit on input size, it's important to manage chat history and trim it as needed to avoid exceeding the context window.
Chat history / Managing chat history
At this point, it becomes important to maintain a correct conversation structure.
- The conversation follows one of these structures:
  - The first message is either a "user" or a "system" message, followed by a "user" message and then an "assistant" message
  - The last message is either a "user" message or a "tool" message containing the result of a tool call
- When tool calling is used, a "tool" message only follows the "assistant" message that requested the tool invocation
Message trimming, by the way, is covered here:
How to trim messages | 🦜️🔗 LangChain
Let's look at the next concept, tool calling.
Tool calling is a mechanism that lets a model interact with systems other than humans, such as databases and APIs, by using tools.
Many AI applications interact directly with humans. In these cases, it is appropriate for models to respond in natural language. But what about cases where we want a model to also interact directly with systems, such as databases or an API? These systems often have a particular input schema; for example, APIs frequently have a required payload structure. This need motivates the concept of tool calling. You can use tool calling to request model responses that match a particular schema.
A tool is a function with the @tool decorator applied; by binding it to a model, the model becomes able to call it.
Whether the model actually calls a tool is something the model itself decides.
The last concept is the agent.
An agent is a system that uses an LLM as a reasoning engine to decide which actions to take, and then executes those actions.
LangChain has its own agent mechanism, but the documentation recommends implementing agents with LangGraph's features. In fact, the tutorial also uses LangGraph's agent functionality.
That's enough reading the documentation; let's actually work through it. This time I'll use Ollama as the chat model and Qdrant as the vector store.
LangSmith is not used.
Environment
Here is today's environment.
$ python3 --version
Python 3.12.3

$ uv --version
uv 0.6.6
Ollama.
$ bin/ollama serve

$ bin/ollama --version
ollama version is 0.6.1
Qdrant is assumed to be running at 172.17.0.2.
$ ./qdrant --version
qdrant 1.13.4
Preparation
First, create a project.
$ uv init --vcs none langchain-tutorial-rag-part2
$ cd langchain-tutorial-rag-part2
$ rm main.py
Install the required dependencies.
$ uv add langchain-community langchain-ollama langchain-qdrant beautifulsoup4 langgraph
I'll also add mypy and Ruff.
$ uv add --dev mypy ruff
The list of dependencies installed this time.
$ uv pip list
Package                  Version
------------------------ ---------
aiohappyeyeballs         2.6.1
aiohttp                  3.11.13
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
attrs                    25.3.0
beautifulsoup4           4.13.3
certifi                  2025.1.31
charset-normalizer       3.4.1
dataclasses-json         0.6.7
frozenlist               1.5.0
greenlet                 3.1.1
grpcio                   1.71.0
grpcio-tools             1.71.0
h11                      0.14.0
h2                       4.2.0
hpack                    4.1.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
hyperframe               6.1.0
idna                     3.10
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.20
langchain-community      0.3.19
langchain-core           0.3.45
langchain-ollama         0.2.3
langchain-qdrant         0.2.0
langchain-text-splitters 0.3.6
langgraph                0.3.11
langgraph-checkpoint     2.0.20
langgraph-prebuilt       0.1.3
langgraph-sdk            0.1.57
langsmith                0.3.15
marshmallow              3.26.1
msgpack                  1.1.0
multidict                6.1.0
mypy                     1.15.0
mypy-extensions          1.0.0
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.15
packaging                24.2
portalocker              2.10.1
propcache                0.3.0
protobuf                 5.29.3
pydantic                 2.10.6
pydantic-core            2.27.2
pydantic-settings        2.8.1
python-dotenv            1.0.1
pyyaml                   6.0.2
qdrant-client            1.13.3
requests                 2.32.3
requests-toolbelt        1.0.0
ruff                     0.11.0
setuptools               76.0.0
sniffio                  1.3.1
soupsieve                2.6
sqlalchemy               2.0.39
tenacity                 9.0.0
typing-extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.3.0
yarl                     1.18.3
zstandard                0.23.0
pyproject.toml
[project]
name = "langchain-tutorial-rag-part2"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "beautifulsoup4>=4.13.3",
    "langchain-community>=0.3.19",
    "langchain-ollama>=0.2.3",
    "langchain-qdrant>=0.2.0",
    "langgraph>=0.3.11",
]

[dependency-groups]
dev = [
    "mypy>=1.15.0",
    "ruff>=0.11.0",
]

[tool.mypy]
strict = true
disallow_any_unimported = true
#disallow_any_expr = true
disallow_any_explicit = true
warn_unreachable = true
pretty = true
Trying RAG Part 2 from the LangChain tutorials
Now, let's try RAG Part 2 from the LangChain tutorial.
Build a Retrieval Augmented Generation (RAG) App: Part 2 | 🦜️🔗 LangChain
Loading documents into the vector store
The first step is to load documents into the vector store, and this part is the same as in Part 1.
Build a Retrieval Augmented Generation (RAG) App: Part 2 / Chains
So the external document used is also the same blog entry as in Part 1.
LLM Powered Autonomous Agents | Lil'Log
Here is what I created.
hello_load_documents.py
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

_ = vector_store.add_documents(all_splits)
Load the documents into Qdrant.
$ uv run hello_load_documents.py
Using tools
Next, let's use tools. That is, tool calling.
Here is the source code I created.
hello_rag.py
from typing import Tuple

from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.tools import tool
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

llm = init_chat_model(
    "llama3.2:3b",
    model_provider="ollama",
    temperature=0,
    base_url="http://localhost:11434",
)

@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

print(graph.get_graph().draw_mermaid())

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
The tool part is here, defined with the @tool decorator. The tool itself is a search against the vector store.
@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
Then we create each step,
def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])
and assemble the graph.
tools = ToolNode([retrieve])

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

print(graph.get_graph().draw_mermaid())

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
The graph expressed here looks like this.
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    query_or_respond(query_or_respond)
    tools(tools)
    generate(generate)
    __end__([<p>__end__</p>]):::last
    __start__ --> query_or_respond;
    generate --> __end__;
    tools --> generate;
    query_or_respond -.-> __end__;
    query_or_respond -.-> tools;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
The full execution result.
$ uv run hello_rag.py
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    query_or_respond(query_or_respond)
    tools(tools)
    generate(generate)
    __end__([<p>__end__</p>]):::last
    __start__ --> query_or_respond;
    generate --> __end__;
    tools --> generate;
    query_or_respond -.-> __end__;
    query_or_respond -.-> tools;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (ee3bf69d-97e2-4207-b798-2327c866be1d)
 Call ID: ee3bf69d-97e2-4207-b798-2327c866be1d
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

Task decomposition is the process of breaking down complex tasks into smaller, more manageable subgoals or steps. This can be done using various methods, including Large Language Models (LLMs) with simple prompting, task-specific instructions, or human inputs. The goal is to create a structured approach to solving problems and exploring solution spaces.
You can see that execution proceeds in the order: "user" message, "assistant" message, tool execution, "assistant" message.
The chat model creates a message for the tool,
================================== Ai Message ==================================
Tool Calls:
  retrieve (ee3bf69d-97e2-4207-b798-2327c866be1d)
 Call ID: ee3bf69d-97e2-4207-b798-2327c866be1d
  Args:
    query: Task Decomposition
and the tool runs the search.
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
Remembering context
Next, let's modify the source code a little so that it remembers context.
The earlier source code, changed like this.
hello_rag.py
from typing import Tuple

from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

llm = init_chat_model(
    "llama3.2:3b",
    model_provider="ollama",
    temperature=0,
    base_url="http://localhost:11434",
)

@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "abc123"}}

print(graph.get_graph().draw_mermaid())

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

print()
print(
    "-------------------------------------------------------------------------------------------------------------------------------------------------"
)
print()

input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()
What changed is the part that uses MemorySaver to remember the message history,
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
and that the conversation now happens twice.
input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

print()
print(
    "-------------------------------------------------------------------------------------------------------------------------------------------------"
)
print()

input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()
Execution results.
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    query_or_respond(query_or_respond)
    tools(tools)
    generate(generate)
    __end__([<p>__end__</p>]):::last
    __start__ --> query_or_respond;
    generate --> __end__;
    tools --> generate;
    query_or_respond -.-> __end__;
    query_or_respond -.-> tools;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (a81f2325-1c22-4760-9433-6e302b387c8f)
 Call ID: a81f2325-1c22-4760-9433-6e302b387c8f
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

Task decomposition is the process of breaking down complex tasks into smaller, more manageable subgoals or steps. This can be done using various methods, including Large Language Models (LLMs) with simple prompting, task-specific instructions, or human inputs. The goal is to create a structured approach to solving problems and exploring solution spaces.

-------------------------------------------------------------------------------------------------------------------------------------------------

================================ Human Message =================================

Can you look up some common ways of doing it?
================================== Ai Message ==================================
Tool Calls:
  retrieve (84f3b894-d4e6-41d1-a532-84c80bd84b27)
 Call ID: 84f3b894-d4e6-41d1-a532-84c80bd84b27
  Args:
    query: common methods for task decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

According to the retrieved context, there are three common ways to do task decomposition:

1. Using LLM with simple prompting, such as "Steps for XYZ.\n1.", or "What are the subgoals for achieving XYZ?".
2. Using task-specific instructions, e.g. "Write a story outline." for writing a novel.
3. With human inputs.
The key point is this part.
================================== Ai Message ==================================
Tool Calls:
  retrieve (84f3b894-d4e6-41d1-a532-84c80bd84b27)
 Call ID: 84f3b894-d4e6-41d1-a532-84c80bd84b27
  Args:
    query: common methods for task decomposition
The first question was "What is Task Decomposition?".
================================ Human Message =================================

What is Task Decomposition?
The next question was "Can you look up some common ways of doing it?".
================================ Human Message =================================

Can you look up some common ways of doing it?
Here, what the chat model asks the tool for in the second question is "common methods for task decomposition", which means it has carried over the context of the first question.
================================== Ai Message ==================================
Tool Calls:
  retrieve (84f3b894-d4e6-41d1-a532-84c80bd84b27)
 Call ID: 84f3b894-d4e6-41d1-a532-84c80bd84b27
  Args:
    query: common methods for task decomposition
This is the effect of LangGraph's checkpointing.
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
Agent
Finally, the agent.
Build a Retrieval Augmented Generation (RAG) App: Part 2 / Agents
The page describing agents is the one above, but in practice LangGraph's functionality is used.
Here is the source code I created.
hello_rag_react_agent.py
from typing import Tuple

from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import MessagesState
from langgraph.prebuilt import create_react_agent, ToolNode
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

llm = init_chat_model(
    "llama3.2:3b",
    model_provider="ollama",
    temperature=0,
    base_url="http://localhost:11434",
)

@tool(response_format="content_and_artifact")
def retrieve(query: str) -> Tuple[str, list[Document]]:
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

def query_or_respond(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def generate(state: MessagesState) -> dict[str, list[BaseMessage]]:
    """Generate answer."""
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = llm.invoke(prompt)
    return {"messages": [response]}

tools = ToolNode([retrieve])

memory = MemorySaver()
agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

print(agent_executor.get_graph().draw_mermaid())

config = {"configurable": {"thread_id": "abc123"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()
The part that changed is this section.
memory = MemorySaver()

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

print(agent_executor.get_graph().draw_mermaid())

config = {"configurable": {"thread_id": "abc123"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()
The explicit graph construction is gone, and the Mermaid diagram is now much simpler as well.
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
	__start__([<p>__start__</p>]):::first
	agent(agent)
	tools(tools)
	__end__([<p>__end__</p>]):::last
	__start__ --> agent;
	tools --> agent;
	agent -.-> tools;
	agent -.-> __end__;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc
We provide the tool, but from there it is apparently up to the agent to decide whether to use it.
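The loop that create_react_agent runs can be sketched in plain Python. This is only a conceptual sketch: stub_model, stub_retrieve, and react_loop are hypothetical stand-ins, not LangGraph APIs.

```python
# Conceptual sketch of the ReAct-style loop behind create_react_agent.
# stub_model and stub_retrieve are hypothetical stand-ins, not real APIs.

def stub_model(messages: list[dict]) -> dict:
    # Pretend model: request the retrieve tool once, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}],
        }
    return {"role": "assistant", "content": "Answer based on retrieved context.", "tool_calls": []}


def stub_retrieve(query: str) -> str:
    return f"(retrieved context about: {query})"


def react_loop(question: str) -> list[dict]:
    messages = [{"role": "user", "content": question}]
    while True:
        response = stub_model(messages)          # agent node
        messages.append(response)
        if not response["tool_calls"]:
            return messages                      # agent -.-> __end__
        for call in response["tool_calls"]:      # agent -.-> tools
            result = stub_retrieve(call["args"]["query"])
            messages.append({"role": "tool", "content": result})
        # tools --> agent: loop back with the tool results appended


trace = react_loop("What is the standard method for Task Decomposition?")
print([m["role"] for m in trace])  # ['user', 'assistant', 'tool', 'assistant']
```

The conditional edges in the Mermaid graph correspond to the if/else here: on each pass the model either requests a tool call or returns a final answer.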
Run it.
$ uv run hello_rag_react_agent.py
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
	__start__([<p>__start__</p>]):::first
	agent(agent)
	tools(tools)
	__end__([<p>__end__</p>]):::last
	__start__ --> agent;
	tools --> agent;
	agent -.-> tools;
	agent -.-> __end__;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc

================================ Human Message =================================

What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
================================== Ai Message ==================================
Tool Calls:
  retrieve (31a30fca-7d65-43c6-a85c-2163f635d0c6)
 Call ID: 31a30fca-7d65-43c6-a85c-2163f635d0c6
  Args:
    query: standard method for Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '3c35387c-04eb-46cc-9abf-41481700294a', '_collection_name': 'tutorial_collection'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'be0a6a60-e261-4740-ae51-8d10be88136c', '_collection_name': 'tutorial_collection'}
Content: Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
================================== Ai Message ==================================

The standard method for Task Decomposition is not a single, universally accepted approach, but rather a set of techniques that can be combined and adapted to suit specific use cases.

One widely used method is the "CoT" (Cognitive Tree) framework, which involves decomposing complex tasks into smaller subgoals and representing them as a tree structure. This framework has been extended by various researchers to improve its performance and flexibility.

Another approach is to use task-specific instructions or prompts to guide the decomposition process. For example, using prompting techniques like "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?" can help generate a clear and structured decomposition plan.

Additionally, some researchers have explored the use of long-term planning and task decomposition in LLMs, which can be challenging due to limitations in context length and representation power. However, techniques like self-reflection and learning from past mistakes can help improve the robustness of these systems.

Common extensions of the CoT framework include:

* Tree of Thoughts (Yao et al. 2023), which explores multiple reasoning possibilities at each step
* Finite context length limitations, which require mechanisms to work within restricted communication bandwidth
* Challenges in long-term planning and task decomposition, which can be addressed through techniques like self-reflection and learning from past mistakes.

These extensions highlight the ongoing research and development in the field of task decomposition for LLMs.
After receiving the question, the agent performs a single retrieval and answers based on that information.
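Because agent_executor is checkpointed with MemorySaver and invoked with a thread_id, a follow-up question on the same thread continues the earlier conversation. The per-thread bookkeeping can be sketched with a plain dict (InMemoryHistory is a hypothetical illustration; the real MemorySaver stores full graph state, not just messages):

```python
# Sketch of per-thread chat history, the idea behind checkpointing
# with a thread_id. InMemoryHistory is a hypothetical illustration,
# not the MemorySaver API.
from collections import defaultdict


class InMemoryHistory:
    def __init__(self) -> None:
        # one message list per thread_id
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads[thread_id].append({"role": role, "content": content})

    def messages(self, thread_id: str) -> list[dict]:
        # Each thread sees only its own conversation
        return list(self._threads[thread_id])


history = InMemoryHistory()
history.append("abc123", "user", "What is the standard method for Task Decomposition?")
history.append("abc123", "assistant", "Decompose a task into smaller steps.")
history.append("def456", "user", "An unrelated question")

print(len(history.messages("abc123")))  # 2
print(len(history.messages("def456")))  # 1
```

Passing a different thread_id in the config therefore starts a fresh conversation with no shared history.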
That concludes the LangChain RAG Part 2 tutorial.
Wrapping up
I tried RAG Part 2 from the LangChain tutorials.
The number of components in play has grown quite a bit. Even while using them, it's easy to just skim over the details...
At this point, rather than copying the tutorial, I'd like to try it with a different theme. Let's think about that next.