CLOVER🍀

That was when it all began.

Trying "Basic RAG" from the Qdrant Examples

What is this post about?

I've been trying out Qdrant's tutorials so far; this time I'd like to take a look at the Examples.

Examples - Qdrant

That said, the only Example I'll look at is "Basic RAG". I'm also thinking of wrapping up my focused coverage of Qdrant here.

What this Example aims to do

This Example shows how to put together a RAG setup using Qdrant, FastEmbed, and OpenAI.

Incidentally, when you click through from the entry listed on the "Examples" page to the actual page, the title changes quite a bit. Is that just how it is…?

On the "Examples" page it's listed as "Basic RAG", but the actual page has this title.

Retrieval Augmented Generation (RAG) with OpenAI and Qdrant

Now, although this is a page about RAG, it contains no explanation of RAG itself.

Apparently, you're supposed to read this page instead.

Patterns for Building LLM-based Systems & Products / Retrieval-Augmented Generation: To add knowledge

RAG is short for "Retrieval-Augmented Generation": it is introduced as a way to improve a model's output by retrieving data from outside the model via search and adding it to the model's input.

In short, the rough idea is that by searching for relevant information and adding it to the input before handing it to the LLM, you can get better answers.
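That rough flow can be sketched in a few lines of Python. Note that `retrieve()` and `build_prompt()` here are stand-ins of my own (using naive word overlap instead of real vector search), not part of any library:

```python
# Minimal RAG-style flow with stand-in functions (not a real library API).
# Real systems score documents by vector similarity; here we just count
# shared words so the sketch stays self-contained.

def retrieve(question, knowledge_base, limit=3):
    question_words = set(question.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:limit]

def build_prompt(question, context_docs):
    context = "\n".join(context_docs)
    return f"Answer the question using the context.\n\nQuestion: {question}\n\nContext:\n{context}"

knowledge_base = [
    "Qdrant is a vector database and vector similarity search engine.",
    "FastAPI is a web framework for building APIs with Python.",
    "cron is a job scheduler on Unix-like operating systems.",
]

question = "Which vector database should I use for similarity search?"
docs = retrieve(question, knowledge_base, limit=2)
prompt = build_prompt(question, docs)
print(prompt)  # this prompt would then be sent to the LLM
```

The actual Example below does exactly this, except the retrieval step is a real vector search against Qdrant and the generation step is a chat completion call.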

The benefits of RAG are as follows.

  • Reduces hallucinations by using context retrieved from outside the model
  • Costs less than fine-tuning the LLM (keeping the search index up to date is cheaper)
    • Makes it easier to access the latest data
  • Updating or removing biased or toxic documents is also easier (than, say, fine-tuning the LLM)

There's even a paper.

[2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

All of the data used seems to come from within the Example page itself, so I'll simply follow along.

Also, I won't use OpenAI; I'll substitute llama-cpp-python instead.

Environment

Here's the environment for this post. Qdrant is assumed to be running at 172.17.0.2.

$ ./qdrant --version
qdrant 1.9.0

I'm using version 0.1.25 of the Qdrant Web UI.

And the Python environment that uses the Qdrant client.

$ python3 --version
Python 3.10.12


$ pip3 --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

llama-cpp-python + Llama 3

First, start up the server that will stand in for OpenAI. This time I'll use llama-cpp-python.

Install it first.

$ pip3 install llama-cpp-python[server]

The versions.

$ pip3 list
Package           Version
----------------- -------
annotated-types   0.6.0
anyio             4.3.0
click             8.1.7
diskcache         5.6.3
exceptiongroup    1.2.1
fastapi           0.110.2
h11               0.14.0
idna              3.7
Jinja2            3.1.3
llama_cpp_python  0.2.65
MarkupSafe        2.1.5
numpy             1.26.4
pip               22.0.2
pydantic          2.7.1
pydantic_core     2.18.2
pydantic-settings 2.2.1
python-dotenv     1.0.1
PyYAML            6.0.1
setuptools        59.6.0
sniffio           1.3.1
sse-starlette     2.1.0
starlette         0.37.2
starlette-context 0.3.6
typing_extensions 4.11.0
uvicorn           0.29.0

For the model, I'll use Llama 3.

QuantFactory/Meta-Llama-3-8B-Instruct-GGUF · Hugging Face

Download the quantized model.

$ curl -L https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true -o Meta-Llama-3-8B-Instruct.Q4_K_M.gguf

Start it up.

$ python3 -m llama_cpp.server --model Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --chat_format llama-3

The OpenAI-substitute server is now ready.

Trying "Basic RAG" from the Qdrant Examples

Now, let's try "Basic RAG" from the Qdrant Examples.

Retrieval Augmented Generation (RAG) with OpenAI and Qdrant

First, install the libraries.

$ pip3 install qdrant-client fastembed openai

The installed libraries and their versions.

$ pip3 list
Package            Version
------------------ --------
annotated-types    0.6.0
anyio              4.3.0
certifi            2024.2.2
charset-normalizer 3.3.2
coloredlogs        15.0.1
distro             1.9.0
exceptiongroup     1.2.1
fastembed          0.2.6
filelock           3.13.4
flatbuffers        24.3.25
fsspec             2024.3.1
grpcio             1.62.2
grpcio-tools       1.62.2
h11                0.14.0
h2                 4.1.0
hpack              4.0.0
httpcore           1.0.5
httpx              0.27.0
huggingface-hub    0.20.3
humanfriendly      10.0
hyperframe         6.0.1
idna               3.7
loguru             0.7.2
mpmath             1.3.0
numpy              1.26.4
onnx               1.16.0
onnxruntime        1.17.3
openai             1.23.6
packaging          24.0
pip                22.0.2
portalocker        2.8.2
protobuf           4.25.3
pydantic           2.7.1
pydantic_core      2.18.2
PyYAML             6.0.1
qdrant-client      1.9.0
requests           2.31.0
setuptools         59.6.0
sniffio            1.3.1
sympy              1.12
tokenizers         0.15.2
tqdm               4.66.2
typing_extensions  4.11.0
urllib3            2.2.1

FastEmbed is a library for generating text embeddings. When used together with Qdrant, it generates the embeddings automatically as part of Qdrant operations.
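Under the hood, "similar text" means vectors that are close under a metric such as cosine similarity. As a rough illustration (my own sketch, not FastEmbed's actual code), cosine similarity itself is just:

```python
import math

# Cosine similarity: dot product of the two vectors divided by the product
# of their lengths. 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]    # same direction as v1
v3 = [3.0, -1.0, 0.0]   # some other direction

print(cosine_similarity(v1, v2))  # 1.0 (up to floating-point error)
print(cosine_similarity(v1, v3))  # much smaller
```

In practice the embedding vectors have hundreds of dimensions and Qdrant performs this kind of comparison (with index structures) on the server side; the scores in the query results later in this post appear to be similarities of this kind.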

Following the documentation, here's the program I wrote.

rag.py

from openai import OpenAI
from qdrant_client import QdrantClient

qclient = QdrantClient("http://172.17.0.2:6333", prefer_grpc=True)

qclient.delete_collection(collection_name="knowledge-base")
print(f"get_collection = {qclient.get_collections()}")

qclient.add(
    collection_name="knowledge-base",
    documents=[
        "Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!",
        "Docker helps developers build, share, and run applications anywhere — without tedious environment configuration or management.",
        "PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.",
        "MySQL is an open-source relational database management system (RDBMS). A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database.",
        "NGINX is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.",
        "FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.",
        "SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similar, semantic search, or paraphrase mining.",
        "The cron command-line utility is a job scheduler on Unix-like operating systems. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed times, dates, or intervals.",
    ]
)

print(f"get_collection = {qclient.get_collections()}")

openai_client = OpenAI(base_url = "http://localhost:8000/v1", api_key = "dummy-api-key")

prompt = """
What tools should I need to use to build a web service using vector embeddings for search?
"""

completion = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": prompt},
    ]
)

print()

print(f"message = {completion.choices[0].message.content}")

query_results = qclient.query(
    collection_name="knowledge-base",
    query_text=prompt,
    limit=3,
)

print()

print(f"query results = {query_results}")

context = "\n".join(r.document for r in query_results)

print()

print(f"context = {context}")

metaprompt = f"""
You are a software architect.
Answer the following question using the provided context.
If you can't find the answer, do not pretend you know it, but answer "I don't know".

Question: {prompt.strip()}

Context:
{context.strip()}

Answer:
"""

print()

print(f"meta prompt = {metaprompt}")

completion = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": metaprompt},
    ]
)

print()

print(f"message = {completion.choices[0].message.content}")

Run it.

$ python3 rag.py

I'll explain the source code in order, showing the results of running it as we go.

First, create a client to operate Qdrant.

qclient = QdrantClient("http://172.17.0.2:6333", prefer_grpc=True)

qclient.delete_collection(collection_name="knowledge-base")
print(f"get_collection = {qclient.get_collections()}")

The collection is named "knowledge-base", and it is deleted at the start.

So this print shows that there are no collections.

get_collection = collections=[]

Register the data. The collection is created automatically, and the text embeddings are generated automatically as well.

qclient.add(
    collection_name="knowledge-base",
    documents=[
        "Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!",
        "Docker helps developers build, share, and run applications anywhere — without tedious environment configuration or management.",
        "PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.",
        "MySQL is an open-source relational database management system (RDBMS). A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database.",
        "NGINX is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.",
        "FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.",
        "SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similar, semantic search, or paraphrase mining.",
        "The cron command-line utility is a job scheduler on Unix-like operating systems. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed times, dates, or intervals.",
    ]
)

print(f"get_collection = {qclient.get_collections()}")

The collection now shows up in the result of QdrantClient#get_collections.

get_collection = collections=[CollectionDescription(name='knowledge-base')]

Configure the OpenAI library to use the server started with llama-cpp-python.

openai_client = OpenAI(base_url = "http://localhost:8000/v1", api_key = "dummy-api-key")

Let's ask: "What tools should I need to use to build a web service using vector embeddings for search?"

prompt = """
What tools should I need to use to build a web service using vector embeddings for search?
"""

completion = openai_client.chat.completions.create(
    model="gpt-3.5-turbo", 
    messages=[
        {"role": "user", "content": prompt},
    ]
)
 
print()
 
print(f"message = {completion.choices[0].message.content}")

It comes back with a lot of information: programming languages, libraries and frameworks for vector embeddings, search engines, and so on.

message = To build a web service that uses vector embeddings for search, you'll likely need the following tools:

**Programming languages and frameworks:**

1. **Python**: A popular choice for natural language processing (NLP) tasks, including building vector embedding-based search services.
2. **Java**: Alternatively, you can use Java as your programming language of choice, especially if you're already familiar with it or prefer its ecosystem.

**Vector embedding libraries and frameworks:**

1. **Gensim**: A Python library for topic modeling and document similarity analysis using word embeddings (e.g., Word2Vec, Doc2Vec).
2. **TensorFlow** or **PyTorch**: Deep learning frameworks that can be used to train your own custom vector embedding models.
3. **Hugging Face Transformers**: A library providing pre-trained language models and a simple interface for building text-to-vector embeddings.

**Search and indexing tools:**

1. **Elasticsearch**: A popular search engine that supports vector-based queries and indexing.
2. **Apache Solr**: Another powerful search engine that can be used for vector-based search and indexing.
3. **Inverted indexes**: You can also build your own inverted indexes using libraries like Python's `numpy` or `scipy`.

**Other dependencies:**

1. **Numpy**: A library for efficient numerical computation, essential for many NLP tasks.
2. **Pandas**: A library for data manipulation and analysis, useful for preprocessing and handling large datasets.
3. **Scikit-learn**: A machine learning library that provides algorithms for classification, regression, clustering, and more.

**Optional tools:**

1. **Distributed computing frameworks**: If you plan to scale your service to handle large volumes of data or high query loads, consider using distributed computing frameworks like Apache Spark, Hadoop, or Dask.
2. **Cloud services**: Consider deploying your service on cloud platforms like AWS, Google Cloud, or Microsoft Azure, which provide scalable infrastructure and managed services for search and indexing.

**Development environments:**

1. **Jupyter Notebook**: A web-based interactive environment for exploring data, prototyping, and developing your vector embedding-based search service.
2. **Integrated Development Environments (IDEs)**: Choose an IDE like PyCharm, Visual Studio Code, or IntelliJ IDEA to write, debug, and optimize your code.

Keep in mind that the specific tools you choose will depend on your project requirements, performance needs, and personal preferences.

The point of this exercise seems to be to narrow that down to a more focused answer.

Here, let's run the same question as a search against Qdrant.

query_results = qclient.query(
    collection_name="knowledge-base",
    query_text=prompt,
    limit=3,
)

print()

print(f"query results = {query_results}")

As a result, we got information about Qdrant, FastAPI, and PyTorch. Incidentally, the original document gets SentenceTransformers back instead of PyTorch…

query results = [QueryResponse(id='55bc59a8-5ab2-47b8-bb9a-449f84373e33', embedding=None, sparse_embedding=None, metadata={'document': 'Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!'}, document='Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!', score=0.8290700316429138), QueryResponse(id='d5554610-fd63-4304-a705-378701215c11', embedding=None, sparse_embedding=None, metadata={'document': 'FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.'}, document='FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.', score=0.8190128803253174), QueryResponse(id='385c9690-821d-46a5-8ffb-54e0286098aa', embedding=None, sparse_embedding=None, metadata={'document': 'PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.'}, document='PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.', score=0.8056522607803345)]

Join these into a context.

context = "\n".join(r.document for r in query_results)

print()

print(f"context = {context}")

The result.

context = Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!
FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.

Combine this context with the earlier prompt.

metaprompt = f"""
You are a software architect. 
Answer the following question using the provided context. 
If you can't find the answer, do not pretend you know it, but answer "I don't know".

Question: {prompt.strip()}

Context: 
{context.strip()}

Answer:
"""

print()

print(f"meta prompt = {metaprompt}")

It becomes a prompt like this.

meta prompt =
You are a software architect.
Answer the following question using the provided context.
If you can't find the answer, do not pretend you know it, but answer "I don't know".

Question: What tools should I need to use to build a web service using vector embeddings for search?

Context:
Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!
FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.

Answer:

In addition to the earlier question, it tells the model that it is "a software architect" and adds the search results from Qdrant.

Now let's ask using this prompt.

completion = openai_client.chat.completions.create(
    model="gpt-3.5-turbo", 
    messages=[
        {"role": "user", "content": metaprompt},
    ]
)
 
print()
 
print(f"message = {completion.choices[0].message.content}")

The result.

message = Based on the context provided, I would recommend using the following tools to build a web service using vector embeddings for search:

1. Qdrant: As mentioned in the context, Qdrant is a vector database & vector similarity search engine that can be used as an API service.
2. FastAPI: With its high-performance capabilities and ease of use, I recommend using FastAPI as the web framework to build the API.
3. PyTorch: Since you want to use vector embeddings for search, PyTorch can be used for training neural network encoders that generate these embeddings.

Additionally, you may also need:

* A Python IDE or code editor (e.g., PyCharm, VSCode) to write and debug your code.
* A library like scikit-learn or TensorFlow for preprocessing and processing the data.
* A database to store the vector embeddings. Qdrant provides its own database, but you can also use other databases like MySQL or MongoDB.

Please note that this is just a suggested approach and may vary depending on the specific requirements of your project.

The answer is much more direct now.

Beyond that, there's just a little extra information.

Writing and running this myself gave me a feel for what the basic shape of RAG looks like.

Incidentally, it takes nearly 10 minutes for this program to finish in my environment…

Wrapping up

I tried "Basic RAG" from the Qdrant Examples.

I had a vague idea of what RAG was, but this was my first time actually running it myself, so it was a good opportunity.
Doing it more properly would probably involve things like LangChain, but starting from the basics seemed right.

With this, I'm also wrapping up my walkthrough of Qdrant's tutorials and Examples. It's been a good learning experience.