Trying the Qdrant tutorial "Create a Neural Search Service with Fastembed"

What is this post for?

The other day, I tried "Create a Simple Neural Search Service" from the Qdrant tutorials.

Trying the Qdrant tutorial "Create a Simple Neural Search Service" - CLOVER🍀

This time, I'd like to try "Create a Neural Search Service with Fastembed".

Neural Search with Fastembed - Qdrant

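The task itself is the same, neural search over startup companies; what differs between the two is how the text is embedded.

The earlier tutorial embedded the text with Sentence Transformers,

Neural Search Service - Qdrant

while this one uses a library called FastEmbed.

Neural Search with Fastembed - Qdrant

So this post leans more toward FastEmbed.

At first I thought the Sentence Transformers version alone would be enough, but looking at the content that comes later in the documentation, I decided it was worth trying FastEmbed once.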

FastEmbed

The FastEmbed website is here.

FastEmbed

The GitHub repository is here.

GitHub - qdrant/fastembed: Fast, Accurate, Lightweight Python library to make State of the Art Embedding

FastEmbed is a lightweight Python library for generating embeddings.

FastEmbed is a lightweight, fast, Python library built for embedding generation.

On GitHub, it belongs to the Qdrant organization.

By default, it supports embeddings in which the input text is given a "query" or "passage" prefix.

The default embedding supports "query" and "passage" prefixes for the input text.
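To make the prefix idea concrete, here is a minimal sketch of what a "query" or "passage" prefix amounts to on the text side. The `add_prefix` helper and the exact `"query: "` / `"passage: "` strings are my assumptions for illustration, following the convention commonly used by prefix-aware embedding models; this is not FastEmbed's own code.

```python
def add_prefix(text: str, kind: str) -> str:
    """Prepend a role prefix to the text before embedding (hypothetical helper)."""
    if kind not in ("query", "passage"):
        raise ValueError(f"unknown kind: {kind}")
    # Assumed format: "<kind>: <text>"
    return f"{kind}: {text}"


# A search string is marked as a query, a stored text as a passage.
print(add_prefix("programmer", "query"))
print(add_prefix("Find thousands of developers", "passage"))
```

The point is only that queries and stored documents are marked differently before being embedded, so the model can treat the two roles asymmetrically.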

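The supported models are listed here.

Supported Models - FastEmbed

Familiar models such as all-MiniLM-L6-v2 and multilingual-e5-large are on the list.

The default model is apparently something called FlagEmbedding, which is said to sit near the top of the MTEB leaderboard...

The default model is Flag Embedding, which is top of the MTEB leaderboard.

At a glance, though, I can't find it there by that name...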

MTEB Leaderboard - a Hugging Face Space by mteb

The FlagEmbedding GitHub repository is here.

GitHub - FlagOpen/FlagEmbedding: Dense Retrieval and Retrieval-augmented LLMs

FlagEmbedding itself appears to cover several models.

FlagEmbedding / Model List

Looking through the list, the models named BAAI/bge-* seem to be the ones FlagEmbedding provides.

Looking at MTEB again with that in mind, bge-large-en-v1.5, bge-base-en-v1.5 and so on turn up.

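FlagEmbedding itself seems to have multilingual models, but the FlagEmbedding (BAAI/bge-*) models that FastEmbed can currently handle appear to be limited to English and Chinese.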

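Back to FastEmbed.

FastEmbed can apparently be used on its own or built into Qdrant Client.

The tutorial covered here is the pattern that uses it built into Qdrant Client.

Environment

Here is the environment for this post. Qdrant is assumed to be running at 172.17.0.2.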

$ ./qdrant --version
qdrant 1.7.4

The Qdrant Web UI is version 0.1.21.

The Python environment that runs Qdrant Client:

$ python3 --version
Python 3.10.12


$ pip3 --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

Trying the Qdrant tutorial "Create a Neural Search Service with Fastembed"

Now let's work through the Qdrant tutorial "Create a Neural Search Service with Fastembed", following the documentation.

Neural Search with Fastembed - Qdrant

As before, the goal is to vectorize the startup data (description) and build a REST API that runs neural search against Qdrant.

The startup data comes from here.

Startups List

First, download the dataset.

$ curl -LO https://storage.googleapis.com/generall-shared-data/startups_demo.json

The contents are JSON Lines, like this.

$ head -n 5 startups_demo.json
{"name":"SaferCodes","images":"https:\/\/safer.codes\/img\/brand\/logo-icon.png","alt":"SaferCodes Logo QR codes generator system forms for COVID-19","description":"QR codes systems for COVID-19.\nSimple tools for bars, restaurants, offices, and other small proximity businesses.","link":"https:\/\/safer.codes","city":"Chicago"}
{"name":"Human Practice","images":"https:\/\/d1qb2nb5cznatu.cloudfront.net\/startups\/i\/373036-94d1e190f12f2c919c3566ecaecbda68-thumb_jpg.jpg?buster=1396498835","alt":"Human Practice -  health care information technology","description":"Point-of-care word of mouth\nPreferral is a mobile platform that channels physicians\u2019 interest in networking with their peers to build referrals within a hospital system.\nHospitals are in a race to employ physicians, even though they lose billions each year ($40B in 2014) on employment. Why ...","link":"http:\/\/humanpractice.com","city":"Chicago"}
{"name":"StyleSeek","images":"https:\/\/d1qb2nb5cznatu.cloudfront.net\/startups\/i\/3747-bb0338d641617b54f5234a1d3bfc6fd0-thumb_jpg.jpg?buster=1329158692","alt":"StyleSeek -  e-commerce fashion mass customization online shopping","description":"Personalized e-commerce for lifestyle products\nStyleSeek is a personalized e-commerce site for lifestyle products.\nIt works across the style spectrum by enabling users (both men and women) to create and refine their unique StyleDNA.\nStyleSeek also promotes new products via its email newsletter, 100% personalized ...","link":"http:\/\/styleseek.com","city":"Chicago"}
{"name":"Scout","images":"https:\/\/d1qb2nb5cznatu.cloudfront.net\/startups\/i\/190790-dbe27fe8cda0614d644431f853b64e8f-thumb_jpg.jpg?buster=1389652078","alt":"Scout -  security consumer electronics internet of things","description":"Hassle-free Home Security\nScout is a self-installed, wireless home security system. We've created a more open, affordable and modern system than what is available on the market today. With month-to-month contracts and portable devices, Scout is a renter-friendly solution for the other ...","link":"http:\/\/www.scoutalarm.com","city":"Chicago"}
{"name":"Invitation codes","images":"https:\/\/invitation.codes\/img\/inv-brand-fb3.png","alt":"Invitation App - Share referral codes community ","description":"The referral community\nInvitation App is a social network where people post their referral codes and collect rewards on autopilot.","link":"https:\/\/invitation.codes","city":"Chicago"}

The data size:

$ ll -h startups_demo.json
-rw-rw-r-- 1 xxxxx xxxxx 22M  2月 11 21:13 startups_demo.json


$ wc -l startups_demo.json
40473 startups_demo.json

Install the required libraries.

$ pip3 install qdrant-client[fastembed] fastapi uvicorn[standard]

Installing qdrant-client[fastembed] gives you Qdrant Client with FastEmbed integrated.

The installed libraries:

$ pip3 list
Package            Version
------------------ --------
annotated-types    0.6.0
anyio              4.2.0
certifi            2024.2.2
charset-normalizer 3.3.2
click              8.1.7
coloredlogs        15.0.1
exceptiongroup     1.2.0
fastapi            0.109.2
fastembed          0.1.1
flatbuffers        23.5.26
grpcio             1.60.1
grpcio-tools       1.60.1
h11                0.14.0
h2                 4.1.0
hpack              4.0.0
httpcore           1.0.2
httptools          0.6.1
httpx              0.26.0
humanfriendly      10.0
hyperframe         6.0.1
idna               3.6
mpmath             1.3.0
numpy              1.26.4
onnx               1.15.0
onnxruntime        1.17.0
packaging          23.2
pip                22.0.2
portalocker        2.8.2
protobuf           4.25.2
pydantic           2.6.1
pydantic_core      2.16.2
python-dotenv      1.0.1
PyYAML             6.0.1
qdrant-client      1.7.3
requests           2.31.0
setuptools         59.6.0
sniffio            1.3.0
starlette          0.36.3
sympy              1.12
tokenizers         0.13.3
tqdm               4.66.2
typing_extensions  4.9.0
urllib3            2.2.0
uvicorn            0.27.1
uvloop             0.19.0
watchfiles         0.21.0
websockets         12.0

As for the size of the libraries, it is far smaller (a little over 300MB) than when I installed Sentence Transformers (just under 5GB).

Now let's write the source code.

The first program uploads the data to Qdrant.

upload_data.py

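import json
from qdrant_client import QdrantClient

# The guard matters: with parallel=0, worker processes try to run this script again.
if __name__ == "__main__":
    qdrant_client = QdrantClient("http://172.17.0.2:6333")

    # Model that FastEmbed uses for embedding (downloaded automatically at runtime)
    qdrant_client.set_model("sentence-transformers/all-MiniLM-L6-v2")

    # Print the vector size and distance metric that come with the model
    print(f"vector params = {qdrant_client.get_fastembed_vector_params()}")

    # Recreate the collection with the model's vector settings
    qdrant_client.recreate_collection(
        collection_name="startups",
        vectors_config=qdrant_client.get_fastembed_vector_params()
    )

    metadata = []
    documents = []

    # One JSON object per line: description is the document, the rest is metadata
    with open("./startups_demo.json") as fd:
        for line in fd:
            data = json.loads(line)
            documents.append(data.pop("description"))
            metadata.append(data)

    # Vectorizes the documents and uploads them; parallel=0 uses all CPUs
    qdrant_client.add(
        collection_name="startups",
        documents=documents,
        metadata=metadata,
        parallel=0
    )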

Create a QdrantClient instance and set the model that FastEmbed will use.

    qdrant_client = QdrantClient("http://172.17.0.2:6333")

    qdrant_client.set_model("sentence-transformers/all-MiniLM-L6-v2")

FastEmbed downloads the model automatically at runtime.

Next, create the collection.

    print(f"vector params = {qdrant_client.get_fastembed_vector_params()}")

    qdrant_client.recreate_collection(
        collection_name="startups",
        vectors_config=qdrant_client.get_fastembed_vector_params()
    )

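When FastEmbed is used, the vector dimension and distance metric apparently come from the values defined for the model.

I didn't know what those values were, so I print the return value of QdrantClient#get_fastembed_vector_params.

Then load the data and upload it to Qdrant.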

    metadata = []
    documents = []

    with open("./startups_demo.json") as fd:
        for line in fd:
            data = json.loads(line)
            documents.append(data.pop("description"))
            metadata.append(data)

    qdrant_client.add(
        collection_name="startups",
        documents=documents,
        metadata=metadata,
        parallel=0
    )

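The data is split into documents and metadata: the document is only the description, and the metadata is everything other than the description.
Because the description is popped, it does not end up in the metadata.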
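The split can be tried without Qdrant at all. The following is a small sketch that applies the same `json.loads` and `pop` steps to two in-memory lines; the sample lines are trimmed from the dataset (fields and text shortened by me), and `split_records` is a name I made up for illustration.

```python
import json

# Two trimmed sample lines in the same shape as startups_demo.json (shortened).
sample_lines = [
    '{"name": "SaferCodes", "description": "QR codes systems for COVID-19.", "city": "Chicago"}',
    '{"name": "Scout", "description": "Hassle-free Home Security", "city": "Chicago"}',
]


def split_records(lines):
    """Split JSON Lines records into description texts and the remaining fields."""
    documents, metadata = [], []
    for line in lines:
        data = json.loads(line)
        # pop() returns the description and removes it from the dict,
        # so what is left in data is the metadata.
        documents.append(data.pop("description"))
        metadata.append(data)
    return documents, metadata


documents, metadata = split_records(sample_lines)
print(documents)
print(metadata)
```

The two lists stay aligned by index, which is what lets each document be paired with its metadata later.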

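The absence of explicit text vectorization, and the fact that payload is not used, are the differences from this tutorial.

Neural Search Service - Qdrant

After that, everything is passed to QdrantClient#add. The text is apparently vectorized here as well. Specifying parallel=0 at this point makes the processing use all CPUs.

However, each worker process tries to execute this script, so writing it the ordinary way fails. In particular, it trips over the collection being recreated concurrently.

So the following guard keeps that under control.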

if __name__ == "__main__":

Now, run it.

$ python3 upload_data.py

The vector settings were printed as follows. The dimension is 384, and the distance metric is set to cosine similarity.

vector params = {'fast-all-minilm-l6-v2': VectorParams(size=384, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None)}
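For reference, cosine similarity itself is simple to write down. Below is a plain-Python sketch of the metric on tiny hand-made vectors. It only illustrates the formula; it is not how Qdrant computes it internally, and the function name is my own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction, orthogonal, and opposite direction.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

Only the direction of the vectors matters, not their length, which is why the first pair scores the same as two identical vectors would.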

The data upload finished after about 15 minutes...

The model that FastEmbed downloads is apparently placed under the current directory by default. A directory named local_cache had been created.

$ tree local_cache
local_cache
└── fast-all-MiniLM-L6-v2
    ├── config.json
    ├── model.onnx
    ├── ort_config.json
    ├── special_tokens_map.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    └── vocab.txt

1 directory, 7 files

That aside, it's hard to tell how the upload actually turned out.

In the Web UI, you can confirm that the data is there.

Let's check with the Qdrant API as well.

$ curl -s 172.17.0.2:6333/collections/startups/points/00000a0b-10f2-453d-b474-8c65ea5ddcb1 | jq
{
  "result": {
    "id": "00000a0b-10f2-453d-b474-8c65ea5ddcb1",
    "payload": {
      "alt": "Startr -  mobile startups entrepreneur match making",
      "city": "Montréal",
      "document": "Tinder for Startups\nTodays self-starters rely on ineffective means of networking. Tools like LinkedIn express a vague outline of a candidates professional life without explicitly expressing their business goals. Craigslist posts are often lost in the clutter and remain totally unverified. ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/680923-c5cd996f535b7108bb24ef919ae8ea67-thumb_jpg.jpg?buster=1430665662",
      "link": "http://www.startr.ca",
      "name": "Startr"
    },
    "vector": {
      "fast-all-minilm-l6-v2": [
        -0.04055357,
        -0.030779947,
        0.039366815,
        -0.03326919,
        -0.017016592,
        (omitted)

        0.0004849722,
        0.0031453324,
        0.0038477478,
        -0.018718224,
        0.014635129
      ]
    }
  },
  "status": "ok",
  "time": 9.215e-05
}

Looks fine.
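Judging only from the response above, the stored payload looks like the metadata with the original text added under a `document` key. Here is a sketch of that shape; `build_payload` is a hypothetical helper of mine, not qdrant-client's implementation, and the sample values are shortened from the response.

```python
def build_payload(document: str, metadata: dict) -> dict:
    """Combine the text and its metadata into one payload-like dict (assumed shape)."""
    # The text goes under "document"; the metadata fields sit next to it.
    return {"document": document, **metadata}


payload = build_payload(
    "Tinder for Startups",
    {"name": "Startr", "city": "Montréal"},
)
print(sorted(payload))
```

Seen this way, the description is not lost by the pop during upload; it comes back as `document` alongside the other fields.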

Finally, build the search service.

The part that performs the search:

neural_searcher.py

from qdrant_client import QdrantClient

class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.qdrant_client = QdrantClient("http://172.17.0.2:6333")
        self.qdrant_client.set_model("sentence-transformers/all-MiniLM-L6-v2")

    def search(self, text: str):
        search_result = self.qdrant_client.query(
            collection_name=self.collection_name,
            query_text=text,
            query_filter=None,
            limit=5
        )

        metadata = [hit.metadata for hit in search_result]

        return metadata

At a glance this looks the same as the Sentence Transformers version, but it differs in that the model is set on the Qdrant Client,

        self.qdrant_client = QdrantClient("http://172.17.0.2:6333")
        self.qdrant_client.set_model("sentence-transformers/all-MiniLM-L6-v2")

that QdrantClient#query is used instead of QdrantClient#search, and that the text is not explicitly vectorized at search time.

    def search(self, text: str):
        search_result = self.qdrant_client.query(
            collection_name=self.collection_name,
            query_text=text,
            query_filter=None,
            limit=5
        )

Also, the results are taken from metadata. Once again, not payload.

        metadata = [hit.metadata for hit in search_result]

I suppose these are the differences from this tutorial.

Neural Search Service - Qdrant

After that, create the REST API with FastAPI.

service.py

from fastapi import FastAPI
from neural_searcher import NeuralSearcher

app = FastAPI()

searcher = NeuralSearcher(collection_name="startups")

@app.get("/api/search")
def search_startup(q: str):
    return {"result": searcher.search(text=q)}

Start it with Uvicorn.

$ uvicorn service:app

Let's check.

Search for "programmer".

$ curl -s localhost:8000/api/search?q=programmer | jq
{
  "result": [
    {
      "alt": "Helloco.de -  education startups programming",
      "city": "New York",
      "document": "The home for code\nTo become one of the most reliable and know sources of information of things related to programming. Users will be able to learn about newly released program languages, compare them to old languages, and also access historical information about each language. By ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/455527-849ee13b975a927178215c6949c943d2-thumb_jpg.jpg?buster=1407394758",
      "link": "http://Helloco.de",
      "name": "Helloco.de"
    },
    {
      "alt": "Worketer -  general public worldwide",
      "city": "Paris",
      "document": "Find thousands of developers",
      "images": "https://angel.co/images/shared/nopic_startup.png",
      "link": "http://worketer.com",
      "name": "Worketer"
    },
    {
      "alt": "HackerEarth -  marketplaces recruiting college recruiting programming",
      "city": "Bangalore",
      "document": "Helping companies find the smartest programmers\nHackerEarth is building an engaged community of developers with a deep skill graph of each developer on the network. Developers can come and participate in online programming challenges (https://www.hackerearth.com/challenges) and Hackathons and solve problems ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/134498-a29b2f7d1ddd293823db1253417a42ce-thumb_jpg.jpg?buster=1354195180",
      "link": "https://www.hackerearth.com",
      "name": "HackerEarth"
    },
    {
      "alt": "Headhuntable -  social recruiting recruiting web development",
      "city": "New York",
      "document": "We find developers that are head-huntable.\nHeadhuntable lets developers showcase their work and receive recommendations from peers and colleagues so that prospective employers can see that they are \"headhuntable\". Developers can also take challenges (either created by another developer or a company) to ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/88269-6ce028f0a4255c72c3161ca04aa722f8-thumb_jpg.jpg?buster=1336272604",
      "link": "http://www.headhuntable.com/tour",
      "name": "Headhuntable"
    },
    {
      "alt": "Hiresync -  SaaS recruiting",
      "city": "Toronto",
      "document": "The best way to interview developers.",
      "images": "https://angel.co/images/shared/nopic_startup.png",
      "link": "http://www.hiresync.io",
      "name": "Hiresync"
    }
  ]
}

The results are different again from the other tutorial.
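The curl calls in this post use single-word queries, so the raw URL works as is. For a query containing spaces or non-ASCII characters, the `q` parameter needs URL-encoding. A minimal sketch with the standard library follows; `build_search_url` is a name I made up, and the base URL assumes Uvicorn's default port as used above.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str) -> str:
    """Build the /api/search URL with the query string properly encoded."""
    return f"{base_url}/api/search?{urlencode({'q': query})}"


print(build_search_url("http://localhost:8000", "programmer"))
print(build_search_url("http://localhost:8000", "social network"))
```

The resulting string can be passed to curl in quotes, or to any HTTP client.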

Search for "sns".

$ curl -s localhost:8000/api/search?q=sns | jq
{
  "result": [
    {
      "alt": "SNMsystems -  it and cybersecurity",
      "city": "Washington DC",
      "document": "SNMsystems is a small IT company. Hope to get Fed IT contracts",
      "images": "https://angel.co/images/shared/nopic_startup.png",
      "link": "http://www.snmsystemsllc.com",
      "name": "SNMsystems"
    },
    {
      "alt": "Altr think -  digital media music photography sns",
      "city": "Tokyo",
      "document": "With Contents Without Any Words\n\"&\" is SNS App without any words.\nWe are forcusing on Nonverbal communication via Pic and Sound on MAP.\nIn \"&\", you can connect your content (Pic or Sound) to stranger's one directly by only 3 taps.\nAnd you can get the whole view of these connections on MAP anytime, ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/208649-087b94437db023d390c9ca8d5e7a840c-thumb_jpg.jpg?buster=1368621734",
      "link": "http://www.altrthink.com",
      "name": "Altr think"
    },
    {
      "alt": "myAround -  mobile",
      "city": "Los Angeles",
      "document": "LBS+SNS\n(1) PEOPLE:\nWe believe 90% people want to meet, chat, and know new friends around.\n(2) CIRCLES\nWe believe 90% people like to connect/join a circle activity around in a real life.\n(3) HOUSING:\nWe believe 90% people search housing information around.\n(4) JOBS:\nWe ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/117931-ac28ce1092a977c9396225086a9ae0dc-thumb_jpg.jpg?buster=1346119218",
      "link": "http://www.myaround.com",
      "name": "myAround"
    },
    {
      "alt": "Meetbank Inc. -  mobile private social networking",
      "city": "Tokyo",
      "document": "SNS remember people you meet\nhttp://meetbank.net\n\"Meetbank\" is a social network to remember people you meet in life.\nYou will meet thousands of people in your life.\nDo you remember all people you met?\n-Childhood friends who made secret base together.\n-When a child, friends who went to ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/452492-d8403cc5967799b03697a59e627029aa-thumb_jpg.jpg?buster=1407145982",
      "link": "http://meetbank.net",
      "name": "Meetbank Inc."
    },
    {
      "alt": "The Exchange S&P -  mobile business development personal branding user experience design",
      "city": "San Francisco",
      "document": "The NASA of bespoke internet solutions\nThe Exchange S&P (Otherwise known as the special projects group [and applied technologies], a subdivision of the San Francisco-based technology group \"The Exchange Visionary Laboratories\".\nWe partner with diverse businesses that share a desire to win. Together, ...",
      "images": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/468493-d507c40db1dacd2c72c27ae04aef8626-thumb_jpg.jpg?buster=1408602315",
      "link": "http://www.theexchangesp.com",
      "name": "The Exchange S&P"
    }
  ]
}

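These results came out different as well.

Come to think of it, a demo application has been published. I wonder whether it was built with Sentence Transformers or with FastEmbed?

Semantic Search Demo - Qdrant

It's linked from both tutorial pages, so I can't tell which one it was built with, lol.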