What is this post about?
Previously, I got started with LangChain.
Getting started with LangChain (trying the tutorial's chat models and prompt templates) - CLOVER🍀
This time I'll continue with the tutorials and try out semantic search.
Build a semantic search engine | 🦜️🔗 LangChain
The semantic search tutorial
Let's see what LangChain's semantic search tutorial covers.
Build a semantic search engine | 🦜️🔗 LangChain
It seems to cover the following five concepts (only three are listed explicitly, but Text splitters and Retrievers appear as you read on).
- Document loaders | 🦜️🔗 LangChain
- Text splitters | 🦜️🔗 LangChain
- Embedding models | 🦜️🔗 LangChain
- Vector stores | 🦜️🔗 LangChain
- Retrievers | 🦜️🔗 LangChain
LangChain provides abstractions for these concepts and integrates them into LLM workflows.
These concepts matter for applications that need to fetch additional data for the model to reason over.
RAG is one example.
Retrieval augmented generation (RAG) | 🦜️🔗 LangChain
I'll set RAG aside this time and go through the concepts one by one.
Document loaders are designed to load document objects, and can load documents from a variety of data sources such as Slack, Notion, and Google Drive.
Document loaders | 🦜️🔗 LangChain
The available integrations are listed here.
Document loaders | 🦜️🔗 LangChain
A loaded document is abstracted as the Document type, which has three attributes.
- page_content … a string holding the content
- metadata … a dictionary of arbitrary metadata
- id … (optional) a string identifier for the document
Document — 🦜🔗 LangChain documentation
The tutorial loads its data from a PDF.
The how-to guides for Document loaders are listed here.
How-to guides / Components / Document loaders
Text splitters split large texts into chunks that are easier to handle.
Text splitters | 🦜️🔗 LangChain
The reasons for splitting appear to be the following.
- Handling non-uniform document lengths
  - Real-world document collections often contain documents of varying lengths; splitting allows all documents to be processed consistently
- Overcoming model limitations
  - Many embedding models and language models have a maximum input size; splitting the text makes it possible to process documents that exceed those limits
- Improving representation quality
  - For long documents, embeddings and other representations try to capture too much information, which can degrade their quality
  - Splitting lets each section be represented in a more focused and accurate way
- Improving retrieval precision
  - In information retrieval systems, splitting improves the granularity of search results, so a query can be matched more precisely to the relevant section of a document
- Optimizing computational resources
  - Smaller text chunks are more memory-efficient and allow processing tasks to be parallelized
There are four approaches to text splitting.
- Length-based
  - Text splitters / Approaches / Length-based
  - There is a token-based approach, which is convenient for language models, and a character-based approach, which simply splits by character count
- Text-structure based
  - Text splitters / Approaches / Text-structured based
  - Text is made up of hierarchical units such as paragraphs, sentences, and words, so this structure is used to split while keeping the natural flow of the language and the semantic coherence of each piece
- Document-structure based
  - Text splitters / Approaches / Document-structured based
  - Some document formats, such as HTML, Markdown, and JSON, have an inherent structure that tends to group semantically related text together, so splitting along that structure can be useful
- Semantic meaning based
The how-to guides for Text splitters are listed here.
How-to guides / Components / Text splitters
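To make the length-based approach a little more concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation; `split_by_length` and its return shape are names made up for this illustration. It cuts the text into fixed-size character chunks and lets neighbouring chunks overlap.

```python
def split_by_length(
    text: str, chunk_size: int, chunk_overlap: int
) -> list[tuple[int, str]]:
    """Split text into fixed-size character chunks that overlap.

    Returns (start_index, chunk) pairs. Hypothetical helper for
    illustration only, not part of LangChain.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_by_length(text, chunk_size=10, chunk_overlap=3)
for start, chunk in chunks:
    print(start, chunk)
```

With a chunk size of 10 and an overlap of 3, each chunk repeats the last three characters of the previous one, which is the same idea as the overlap setting used later in this post.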
Embedding models are an abstraction over embeddings, that is, over mapping text into a vector space.
Embedding models | 🦜️🔗 LangChain
Embedding models provide two methods.
- embed_documents … embeds multiple documents
- embed_query … embeds a single query
This distinction matters because some models use different embedding strategies for documents (what is searched) and for queries (the input used to search).
The similarity between embeddings is measured with one of the following three distance functions (similarity metrics).
- Cosine similarity
- Euclidean distance
- Dot product
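The three metrics above can be sketched with the standard library alone. This is only an illustration of the formulas, with function names chosen for this example; real vector stores compute these internally.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]  # orthogonal to a
c = [2.0, 0.0]  # same direction as a, twice the length

print(cosine_similarity(a, c), euclidean_distance(a, c), dot_product(a, c))
print(cosine_similarity(a, b), euclidean_distance(a, b), dot_product(a, b))
```

Cosine similarity only looks at direction, so `a` and `c` count as identical, while Euclidean distance and the dot product also react to the difference in length.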
The available integrations are listed here.
Embedding models | 🦜️🔗 LangChain
The how-to guides for Embedding models are listed here.
How-to guides / Components / Embedding models
Vector stores are an abstraction over data stores that can index and retrieve information based on text embeddings (vector representations).
The available integrations are listed here.
Vector stores mainly use the following methods.
- add_documents … adds a list of texts to the vector database
- delete … deletes a list of documents from the vector database
- similarity_search … searches for documents similar to a given query
The "semantic search" the tutorial talks about refers to this search for similar documents.
Most Vector stores in LangChain need an Embedding model at initialization.
After initialization you work with the three methods above; some stores also support filtering on the metadata attached to documents.
The how-to guides for Vector stores are listed here.
How-to guides / Components / Vector stores
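To show how the three methods fit together, here is a toy in-memory store. Everything in it is an assumption made for illustration: `Doc`, `ToyVectorStore`, and the letter-counting `toy_embed` are hypothetical stand-ins, and only the method names are borrowed from the list above. A real store would use an Embedding model and a proper index.

```python
import math
import uuid
from dataclasses import dataclass, field


@dataclass
class Doc:
    page_content: str
    metadata: dict[str, str] = field(default_factory=dict)


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: counts of the letters a-z (26 dimensions)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self._items: dict[str, tuple[list[float], Doc]] = {}

    def add_documents(self, docs: list[Doc]) -> list[str]:
        # Documents are embedded at the moment they are stored.
        ids = []
        for doc in docs:
            doc_id = str(uuid.uuid4())
            self._items[doc_id] = (toy_embed(doc.page_content), doc)
            ids.append(doc_id)
        return ids

    def delete(self, ids: list[str]) -> None:
        for doc_id in ids:
            self._items.pop(doc_id, None)

    def similarity_search(self, query: str, k: int = 4) -> list[Doc]:
        # The query is embedded too, then compared with every stored vector.
        query_vector = toy_embed(query)
        ranked = sorted(
            self._items.values(),
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [doc for _, doc in ranked[:k]]


store = ToyVectorStore()
ids = store.add_documents([Doc("aaa bbb"), Doc("xyz xyz"), Doc("aab")])
top = store.similarity_search("aaab", k=1)
print(top[0].page_content)
```

The point of the sketch is the flow: embed on insert, embed the query on search, rank by similarity.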
This belongs more to Retrievers, but some data stores also support hybrid search, which combines keyword search with semantic search.
Last are Retrievers. Retrievers are an interface for interacting with various types of retrieval systems.
The available integrations are listed here.
When you call a Retriever with a query, it returns a list of documents with the following attributes.
- page_content … the content of the document (a string)
- metadata … arbitrary metadata associated with the document
The how-to guides for Retrievers are listed here.
How-to guides / Components / Retrievers
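A Retriever does not have to be backed by a vector store: anything that takes a query and returns documents fits the idea. The sketch below is a hypothetical keyword-overlap retriever written in plain Python. `Doc` and `KeywordRetriever` are made up for this example; only the method names `invoke` and `batch` mirror the ones found on LangChain's Runnable interface.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    page_content: str
    metadata: dict[str, str] = field(default_factory=dict)


class KeywordRetriever:
    """Hypothetical retriever: ranks documents by shared lowercase words."""

    def __init__(self, docs: list[Doc], k: int = 1) -> None:
        self._docs = docs
        self._k = k

    def invoke(self, query: str) -> list[Doc]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.page_content.lower().split())), doc)
            for doc in self._docs
        ]
        # Keep only documents that share at least one word with the query.
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[: self._k]]

    def batch(self, queries: list[str]) -> list[list[Doc]]:
        # One result list per query, in the same order as the queries.
        return [self.invoke(query) for query in queries]


docs = [
    Doc(
        "Dogs are great companions, known for their loyalty and friendliness.",
        {"source": "mammal-pets-doc"},
    ),
    Doc(
        "Cats are independent pets that often enjoy their own space.",
        {"source": "mammal-pets-doc"},
    ),
]
retriever = KeywordRetriever(docs, k=1)
results = retriever.batch(["loyalty of dogs", "independent cats"])
print(results[0][0].page_content)
print(results[1][0].page_content)
```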
This time, based on the tutorial, I'd like to try it with Ollama as the Embedding model and Qdrant as the Vector store.
Environment
Here is the environment used this time.
$ python3 --version
Python 3.12.3

$ uv --version
uv 0.6.2
Ollama.
$ bin/ollama serve

$ bin/ollama --version
ollama version is 0.5.11
Qdrant is assumed to be running at 172.17.0.2.
$ ./qdrant --version
qdrant 1.13.4
Preparation
First, create the project.
$ uv init --vcs none langchain-tutorial-semantic-search
$ cd langchain-tutorial-semantic-search
$ rm main.py
Install the dependencies needed this time.
$ uv add langchain-community langchain-ollama langchain-qdrant pypdf
I'll add mypy and Ruff as well.
$ uv add --dev mypy ruff
The list of installed dependencies.
$ uv pip list
Package                  Version
------------------------ ---------
aiohappyeyeballs         2.4.6
aiohttp                  3.11.12
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
attrs                    25.1.0
certifi                  2025.1.31
charset-normalizer       3.4.1
dataclasses-json         0.6.7
frozenlist               1.5.0
greenlet                 3.1.1
grpcio                   1.70.0
grpcio-tools             1.70.0
h11                      0.14.0
h2                       4.2.0
hpack                    4.1.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
hyperframe               6.1.0
idna                     3.10
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.19
langchain-community      0.3.18
langchain-core           0.3.37
langchain-ollama         0.2.3
langchain-qdrant         0.2.0
langchain-text-splitters 0.3.6
langsmith                0.3.10
marshmallow              3.26.1
multidict                6.1.0
mypy                     1.15.0
mypy-extensions          1.0.0
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.15
packaging                24.2
portalocker              2.10.1
propcache                0.3.0
protobuf                 5.29.3
pydantic                 2.10.6
pydantic-core            2.27.2
pydantic-settings        2.8.0
pypdf                    5.3.0
python-dotenv            1.0.1
pyyaml                   6.0.2
qdrant-client            1.13.2
requests                 2.32.3
requests-toolbelt        1.0.0
ruff                     0.9.7
setuptools               75.8.0
sniffio                  1.3.1
sqlalchemy               2.0.38
tenacity                 9.0.0
typing-extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.3.0
yarl                     1.18.3
zstandard                0.23.0
pyproject.toml
[project]
name = "langchain-tutorial-semantic-search"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "langchain-community>=0.3.18",
    "langchain-ollama>=0.2.3",
    "langchain-qdrant>=0.2.0",
    "pypdf>=5.3.0",
]

[dependency-groups]
dev = [
    "mypy>=1.15.0",
    "ruff>=0.9.7",
]

[tool.mypy]
strict = true
disallow_any_unimported = true
#disallow_any_expr = true
disallow_any_explicit = true
warn_unreachable = true
pretty = true
Trying LangChain's semantic search tutorial
Now let's work through this tutorial.
Build a semantic search engine | 🦜️🔗 LangChain
Going by how the content is divided, I'll proceed in three parts.
Saving documents to the vector database
First, I'll go as far as saving documents to the vector database.
In the tutorial, that corresponds to these three sections.
- Build a semantic search engine / Documents and Document Loaders
- Build a semantic search engine / Embeddings
- Build a semantic search engine / Vector stores
For Vector stores, I won't go as far as searching in this step.
Here is the source code I wrote.
hello_load_documents.py
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

# Sample Document objects (not used below)
documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
]

# Load the PDF
file_path = "example_data/nke-10k-2023.pdf"
loader = PyPDFLoader(file_path)
docs = loader.load()

print(f"loaded document count = {len(docs)}")
print()
print(f"{docs[0].page_content[:200]}\n")
print()
print(docs[0].metadata)
print()

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

print(f"all splits count = {len(all_splits)}")

# Embed the text
embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

vector_1 = embeddings.embed_query(all_splits[0].page_content)
vector_2 = embeddings.embed_query(all_splits[1].page_content)

assert len(vector_1) == len(vector_2)
print(f"Generated vectors of length {len(vector_1)}")
print(vector_1[:10])

# Recreate the Qdrant collection and store the chunks
client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

ids = vector_store.add_documents(all_splits)
I'll explain it piece by piece below.
Run it like this.
$ uv run hello_load_documents.py
A sample of Document. The data defined here isn't used afterwards; it's only there to show what the type looks like.
documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
]
This is the code that loads the documents actually used this time.
file_path = "example_data/nke-10k-2023.pdf"
loader = PyPDFLoader(file_path)
docs = loader.load()
The input is a PDF file, and the loader used is PyPDFLoader.
How to load PDFs | 🦜️🔗 LangChain
example_data/nke-10k-2023.pdf refers to this PDF file.
Download it so it can be read as a local file.
$ mkdir example_data
$ curl -L https://raw.githubusercontent.com/langchain-ai/langchain/refs/tags/langchain-core%3D%3D0.3.37/docs/docs/example_data/nke-10k-2023.pdf -o example_data/nke-10k-2023.pdf
Print the contents of the loaded documents.
print(f"loaded document count = {len(docs)}")
print()
print(f"{docs[0].page_content[:200]}\n")
print()
print(docs[0].metadata)
print()
This prints the number of loaded documents, the first 200 characters of the first document, and the first document's metadata.
The result looks like this.
loaded document count = 107

Table of Contents
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 10-K
(Mark One)
☑ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934
F


{'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 0, 'page_label': '1'}
Next is splitting the text.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

print(f"all splits count = {len(all_splits)}")
Here the documents are split into chunks of 1000 characters, with a 200-character overlap between chunks. Giving adjacent chunks an overlapping range reduces the chance that a sentence in a chunk gets separated from important context.
RecursiveCharacterTextSplitter splits recursively until each chunk is an appropriate size, using common separators such as newlines.
How to recursively split text by characters | 🦜️🔗 LangChain
add_start_index=True stores, in each chunk's metadata under start_index, the character position at which the chunk starts within the original document.
This time the text was split into 516 chunks.
all splits count = 516
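The idea of recursive splitting plus a start index can be sketched in plain Python. This is a simplified illustration, not what RecursiveCharacterTextSplitter actually does internally: `recursive_split` and `with_start_index` are hypothetical helpers, the separator order is an assumption, and overlap handling is left out.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split text so that every chunk has at most chunk_size characters.

    Tries the coarsest separator first and falls back to finer ones only
    for pieces that are still too long. Simplified: no overlap.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut by length.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


def with_start_index(text: str, chunks: list[str]) -> list[tuple[int, str]]:
    """Pair each chunk with the character offset where it starts in text."""
    result: list[tuple[int, str]] = []
    cursor = 0
    for chunk in chunks:
        start = text.find(chunk, cursor)
        result.append((start, chunk))
        cursor = start + len(chunk)
    return result


text = "Nike sells shoes.\n\nIt was incorporated in 1967 in Oregon.\n\nEnd."
chunks = recursive_split(text, chunk_size=20)
indexed = with_start_index(text, chunks)
for start, chunk in indexed:
    print(start, repr(chunk))
```

The first and last paragraphs fit in one chunk each, while the middle paragraph is too long and gets split again at spaces.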
Embedding the text.
embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

vector_1 = embeddings.embed_query(all_splits[0].page_content)
vector_2 = embeddings.embed_query(all_splits[1].page_content)

assert len(vector_1) == len(vector_2)
print(f"Generated vectors of length {len(vector_1)}")
print(vector_1[:10])
This time Ollama is used for the text embeddings, with the all-minilm:l6-v2 model.
embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)
OllamaEmbeddings | 🦜️🔗 LangChain
Here, as a sample, the first two chunks are vectorized and the number of dimensions is checked. After that, the first 10 elements of the first vector are printed.
vector_1 = embeddings.embed_query(all_splits[0].page_content)
vector_2 = embeddings.embed_query(all_splits[1].page_content)

assert len(vector_1) == len(vector_2)
print(f"Generated vectors of length {len(vector_1)}")
print(vector_1[:10])
Here is the result. The vectors have 384 dimensions.
Generated vectors of length 384
[-0.024527563, -0.118282035, 0.004233229, 0.018769965, 0.0025654335, 0.09103639, 0.035418395, 0.012415745, -0.0065588024, -0.033638902]
Finally, the data is saved into Qdrant, with the text embedded along the way.
client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

ids = vector_store.add_documents(all_splits)
This part operates the Qdrant client directly to create a Qdrant collection. The vector size is 384, matching the embedding model, and the distance metric is cosine similarity.
client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
Then a Vector store backed by Qdrant is created by specifying the Qdrant client, the collection name, and the Ollama-based Embedding model.
vector_store = QdrantVectorStore(
client=client, collection_name="tutorial_collection", embedding=embeddings
)
ids = vector_store.add_documents(all_splits)
The last line saves the documents.
The text is embedded at the same moment the documents are saved.
That makes this the slowest part of the script.
ids = vector_store.add_documents(all_splits)
I've made Qdrant's Web UI reachable at http://[host where Qdrant is running]:6333/dashboard, so let's check the result there.
Looks good.
Searching
Next, let's search the vector database.
This corresponds to the following part of the tutorial.
Build a semantic search engine / Vector stores / Usage
Here is the source code I wrote.
hello_query.py
import sys

from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")
vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

# The query comes from the first command-line argument
query = sys.argv[1]

print(f"query = {query}")
print()

results = vector_store.similarity_search(query)

print(f"result count = {len(results)}")
print(f"first document = {results[0]}")
Up to the point where the Vector store is created, the cast is the same as when loading the documents.
The query is taken as a command-line argument.
query = sys.argv[1]
The search is done with similarity_search. The query is vectorized at this point as well.
results = vector_store.similarity_search(query)
This time it prints the number of hits and the first document.
The result.
$ uv run hello_query.py 'How many distribution centers does Nike have in the US?' query = How many distribution centers does Nike have in the US? result count = 4 first document = page_content='direct to consumer operations sell products through the following number of retail stores in the United States: U.S. RETAIL STORES NUMBER NIKE Brand factory stores 213 NIKE Brand in-line stores (including employee-only stores) 74 Converse stores (including factory stores) 82 TOTAL 369 In the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information. 2023 FORM 10-K 2' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 4, 'page_label': '5', 'start_index': 3125, '_id': 'b88bb8f3-10c1-4147-9047-4ecdfd335912', '_collection_name': 'tutorial_collection'} $ uv run hello_query.py 'When was Nike incorporated?' query = When was Nike incorporated? result count = 4 first document = page_content='Table of Contents PART I ITEM 1. BUSINESS GENERAL NIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our," "NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise. Our principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is the largest seller of athletic footwear and apparel in the world. 
We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores and sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 3, 'page_label': '4', 'start_index': 0, '_id': 'b19e33b5-8e05-4282-9063-bc308dd64e0d', '_collection_name': 'tutorial_collection'}
The results match the tutorial.
To run it asynchronously, it looks like you prefix the method name with a, as in asimilarity_search.
And to get the scores, it seems you use the similarity_search_with_score method.
Using a Retriever
Last, let's use a Retriever. This is just a quick try.
Build a semantic search engine / Retrievers
In this case the Retriever is obtained from the Vector store, in two ways: using @chain, and calling the as_retriever method on the Vector store.
Here is the source code.
hello_retriever.py
from langchain_core.documents import Document
from langchain_core.runnables import chain
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")
vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)


# Way 1: wrap similarity_search in a Runnable with @chain
@chain
def retriever(query: str) -> list[Document]:
    return vector_store.similarity_search(query, k=1)


results = retriever.batch(
    [
        "How many distribution centers does Nike have in the US?",
        "When was Nike incorporated?",
    ],
)

print(results[0][0])
print()
print(results[1][0])
print()

# Way 2: get a Retriever from the Vector store with as_retriever
r = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)

results = r.batch(
    [
        "How many distribution centers does Nike have in the US?",
        "When was Nike incorporated?",
    ],
)

print(results[0][0])
print()
print(results[1][0])
print()
The queries that were passed as command-line arguments earlier are specified directly this time.
Here is the result.
$ uv run hello_retriever.py /path/to/langchain-tutorial-semantic-search/.venv/lib/python3.12/site-packages/langchain/__init__.py:30: UserWarning: Importing debug from langchain root module is no longer supported. Please use langchain.globals.set_debug() / langchain.globals.get_debug() instead. warnings.warn( page_content='direct to consumer operations sell products through the following number of retail stores in the United States: U.S. RETAIL STORES NUMBER NIKE Brand factory stores 213 NIKE Brand in-line stores (including employee-only stores) 74 Converse stores (including factory stores) 82 TOTAL 369 In the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information. 2023 FORM 10-K 2' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 4, 'page_label': '5', 'start_index': 3125, '_id': 'b88bb8f3-10c1-4147-9047-4ecdfd335912', '_collection_name': 'tutorial_collection'} page_content='Table of Contents PART I ITEM 1. BUSINESS GENERAL NIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our," "NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise. Our principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is the largest seller of athletic footwear and apparel in the world. 
We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores and sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 3, 'page_label': '4', 'start_index': 0, '_id': 'b19e33b5-8e05-4282-9063-bc308dd64e0d', '_collection_name': 'tutorial_collection'} page_content='direct to consumer operations sell products through the following number of retail stores in the United States: U.S. RETAIL STORES NUMBER NIKE Brand factory stores 213 NIKE Brand in-line stores (including employee-only stores) 74 Converse stores (including factory stores) 82 TOTAL 369 In the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information. 2023 FORM 10-K 2' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 4, 'page_label': '5', 'start_index': 3125, '_id': 'b88bb8f3-10c1-4147-9047-4ecdfd335912', '_collection_name': 'tutorial_collection'} page_content='Table of Contents PART I ITEM 1. 
BUSINESS GENERAL NIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our," "NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise. Our principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is the largest seller of athletic footwear and apparel in the world. We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores and sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': 'example_data/nke-10k-2023.pdf', 'total_pages': 107, 'page': 3, 'page_label': '4', 'start_index': 0, '_id': 'b19e33b5-8e05-4282-9063-bc308dd64e0d', '_collection_name': 'tutorial_collection'}
I'll leave it at that for this time.
Closing
I tried out the semantic search tutorial from LangChain's tutorials.
Quite a few of the basic building blocks have now shown up.
Maybe I'll try RAG next.