What did I want to do with this entry?
A while back, I tried llama-cpp-python as an OpenAI API-compatible server that can run locally.
llama-cpp-pythonで、OpenAI API互換のサーバーを試す - CLOVER🍀
I learned that there is another tool that can do the same sort of thing, called LocalAI, so I figured I would give it a try as well.
LocalAI
The LocalAI website is here.
LocalAI :: LocalAI documentation
The GitHub repository is here.
As stated right at the top, it is built as an alternative to OpenAI that you can run locally, providing a REST API compatible with the OpenAI API.
LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require GPU. It is maintained by mudler.
LocalAI :: LocalAI documentation
The features appear to be as follows.
- A drop-in replacement for the OpenAI API that runs locally
- No GPU required
- If a GPU is available, GPU acceleration can be used
- Models are loaded on first use and then kept in memory for faster inference
- No shell-out (spawning subprocesses); bindings are used to speed up inference and improve performance
Its capabilities include the following.
- Text generation with GPTs
- Text to audio
- Audio to text (audio transcription)
- Image generation with Stable Diffusion
- OpenAI functions
- Embeddings generation for vector databases
- Constrained grammars
- Downloading models directly from Hugging Face
- Vision API
As for how it works, there is an explanation here. It boils down to this:
- LocalAI itself is a wrapper implemented in Go that lets OpenAI SDKs (clients) talk to it
- It integrates with various backends over gRPC
In other words, LocalAI is implemented as a layer that behaves like the OpenAI API in front of already-implemented backends and models.
The available backends and models are listed in a table here.
Model compatibility :: LocalAI documentation
llama.cpp is among them.
🦙 llama.cpp :: LocalAI documentation
So I suspect llama.cpp is the default choice in practice. Accordingly, the hardware requirements are described by pointing to llama.cpp.
Depending on the model you are attempting to run might need more RAM or CPU resources. Check out also here for gguf based backends. rwkv is less expensive on resources.
Model Compatibility / Hardware requirements
llama.cpp / Usage / Memory/Disk Requirements
Looking at this, it feels close to llama-cpp-python; the differences would be that backends and models other than llama.cpp can be used, and that it aims to be an OpenAI API replacement from the start (for llama-cpp-python, that did not seem to be the main goal).
There also seem to be quite a few examples.
LocalAI/examples at v2.3.1 · mudler/LocalAI · GitHub
To check the version of each backend, it seems you can look at the Makefile.
https://github.com/mudler/LocalAI/blob/v2.3.1/Makefile#L6-L37
This time, just as when I wrote this entry,
llama-cpp-pythonで、OpenAI API互換のサーバーを試す - CLOVER🍀
I will try calling this API.
OpenAI / API reference / ENDPOINTS / Chat / Create chat completion
Environment
Today's environment is Ubuntu Linux 22.04 LTS.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy


$ uname -srvmpio
Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Installing LocalAI
How to install LocalAI is described here.
Getting started :: LocalAI documentation
It covers using a container image, downloading a binary, and building from source. This time, I will download the binary.
There are three binaries: avx, avx2, and avx512. I will use avx2.
$ grep -E avx /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
(rest omitted)
Download LocalAI.
$ curl -LO https://github.com/mudler/LocalAI/releases/download/v2.3.1/local-ai-avx2-Linux-x86_64
It is a fairly large file...
$ ll -h local-ai-avx2-Linux-x86_64
-rw-rw-r-- 1 xxxxx xxxxx 317M  1月  1 18:14 local-ai-avx2-Linux-x86_64
Grant execute permission.
$ chmod a+x local-ai-avx2-Linux-x86_64
First, check the version.
$ ./local-ai-avx2-Linux-x86_64 --version
LocalAI version v2.3.1 (a95bb0521d3f3183c9bba468c1417f4d000bdfb3)
And the help.
$ ./local-ai-avx2-Linux-x86_64 --help
NAME:
   LocalAI - OpenAI compatible API for running LLaMA/GPT models locally on CPU with consumer grade hardware.

USAGE:
   local-ai [options]

VERSION:
   v2.3.1 (a95bb0521d3f3183c9bba468c1417f4d000bdfb3)

DESCRIPTION:
   LocalAI is a drop-in replacement OpenAI API which runs inference locally.

   Some of the models compatible are:
   - Vicuna
   - Koala
   - GPT4ALL
   - GPT4ALL-J
   - Cerebras
   - Alpaca
   - StableLM (ggml quantized)

   For a list of compatible model, check out: https://localai.io/model-compatibility/index.html

COMMANDS:
   models      List or install models
   tts         Convert text to speech
   transcript  Convert audio to text
   help, h     Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --f16                            (default: false) [$F16]
   --autoload-galleries             (default: false) [$AUTOLOAD_GALLERIES]
   --debug                          (default: false) [$DEBUG]
   --single-active-backend          Allow only one backend to be running. (default: false) [$SINGLE_ACTIVE_BACKEND]
   --parallel-requests              Enable backends to handle multiple requests in parallel. This is for backends that supports multiple requests in parallel, like llama.cpp or vllm (default: false) [$PARALLEL_REQUESTS]
   --cors                           (default: false) [$CORS]
   --cors-allow-origins value       [$CORS_ALLOW_ORIGINS]
   --threads value                  Number of threads used for parallel computation. Usage of the number of physical cores in the system is suggested. (default: 4) [$THREADS]
   --models-path value              Path containing models used for inferencing (default: "/home/kazuhira/study/llm/clover/localai/models") [$MODELS_PATH]
   --galleries value                JSON list of galleries [$GALLERIES]
   --preload-models value           A List of models to apply in JSON at start [$PRELOAD_MODELS]
   --preload-models-config value    A List of models to apply at startup. Path to a YAML config file [$PRELOAD_MODELS_CONFIG]
   --config-file value              Config file [$CONFIG_FILE]
   --address value                  Bind address for the API server. (default: ":8080") [$ADDRESS]
   --image-path value               Image directory (default: "/tmp/generated/images") [$IMAGE_PATH]
   --audio-path value               audio directory (default: "/tmp/generated/audio") [$AUDIO_PATH]
   --backend-assets-path value      Path used to extract libraries that are required by some of the backends in runtime. (default: "/tmp/localai/backend_data") [$BACKEND_ASSETS_PATH]
   --external-grpc-backends value [ --external-grpc-backends value ]  A list of external grpc backends [$EXTERNAL_GRPC_BACKENDS]
   --context-size value             Default context size of the model (default: 512) [$CONTEXT_SIZE]
   --upload-limit value             Default upload-limit. MB (default: 15) [$UPLOAD_LIMIT]
   --api-keys value [ --api-keys value ]  List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys. [$API_KEY]
   --enable-watchdog-idle           Enable watchdog for stopping idle backends. This will stop the backends if are in idle state for too long. (default: false) [$WATCHDOG_IDLE]
   --enable-watchdog-busy           Enable watchdog for stopping busy backends that exceed a defined threshold. (default: false) [$WATCHDOG_BUSY]
   --watchdog-busy-timeout value    Watchdog timeout. This will restart the backend if it crashes. (default: "5m") [$WATCHDOG_BUSY_TIMEOUT]
   --watchdog-idle-timeout value    Watchdog idle timeout. This will restart the backend if it crashes. (default: "15m") [$WATCHDOG_IDLE_TIMEOUT]
   --preload-backend-only           If set, the api is NOT launched, and only the preloaded models / backends are started. This is intended for multi-node setups. (default: false) [$PRELOAD_BACKEND_ONLY]
   --help, -h                       show help
   --version, -v                    print the version

COPYRIGHT:
   Ettore Di Giacinto
Generating text with a model
Now, let's have LocalAI generate text using a model.
For the model, I will use llama-2-7b-chat.Q4_K_M.gguf.
TheBloke/Llama-2-7B-Chat-GGUF · Hugging Face
It is a model of about 4 GB.
Looking at Getting Started, models seem to be specified mostly in the form models/[model name].
Getting started :: LocalAI documentation
Create the models directory.
$ mkdir models
Download the model under the name llama-2-7b-chat-gguf.
$ curl -L https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf -o models/llama-2-7b-chat-gguf
This is the result.
$ tree models -h
[4.0K]  models
└── [3.8G]  llama-2-7b-chat-gguf

0 directories, 1 file
Start LocalAI.
$ ./local-ai-avx2-Linux-x86_64 --models-path models --context-size 700 --threads 4
The meaning of the options is described here. Roughly:
- --models-path … path to the directory containing the models used for inferencing (default: ./models)
- --context-size … default context size of the model (default: 512)
- --threads … number of threads used for parallel computation (default: 4)
Getting Started / CLI parameters
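Incidentally, the help output above shows that every option also has an environment-variable equivalent ([$MODELS_PATH], [$CONTEXT_SIZE], [$THREADS] and so on), so the same launch could presumably be written like this (a sketch, not something I verified here):

$ MODELS_PATH=models CONTEXT_SIZE=700 THREADS=4 ./local-ai-avx2-Linux-x86_64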
The startup log:
7:05PM DBG no galleries to load
7:05PM INF Starting LocalAI using 4 threads, with models path: models
7:05PM INF LocalAI version: v2.3.1 (a95bb0521d3f3183c9bba468c1417f4d000bdfb3)
7:05PM INF Preloading models from models

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.50.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 73  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ............. 22558 │
 └───────────────────────────────────────────────────┘
The model appears to be preloaded from the models directory.
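Before generating text, a quick sanity check: since LocalAI exposes the OpenAI REST surface, the loaded models should be listable via the standard models endpoint (a sketch; I did not capture the output here):

$ curl -s localhost:8080/v1/models | jq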
Now, let's generate text by specifying llama-2-7b-chat-gguf as the model.
$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8080/v1/chat/completions -d \
  '{"model": "llama-2-7b-chat-gguf", "messages": [{"role": "user", "content": "Could you introduce yourself?"}]}' | jq
It took a little over a minute for the result to come back.
{ "created": 1704103556, "object": "chat.completion", "id": "89ed376e-0d0f-41cb-a711-c007c880fc3d", "model": "llama-2-7b-chat-gguf", "choices": [ { "index": 0, "finish_reason": "stop", "message": { "role": "assistant", "content": "\n\nI'm a 32-year-old woman from the United States. I'm a writer and editor, and I've been working in the industry for about 10 years now. I've written for a variety of publications, including newspapers, magazines, and online sites. I'm also a mom to two young children, and I enjoy spending time with them and watching them grow. In my free time, I like to read, watch movies, and go for walks. I'm excited to be here and to share my thoughts and experiences with you." } } ], "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 } } real 1m4.061s user 0m0.031s sys 0m0.011s
I am using the same model as when I tried llama-cpp-python, and once again the self-introduction claims to be a 32-year-old woman living in the United States.
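By the way, if you only want the generated text, you can filter the response shown above with jq:

$ curl -s -XPOST -H 'Content-Type: application/json' localhost:8080/v1/chat/completions -d \
  '{"model": "llama-2-7b-chat-gguf", "messages": [{"role": "user", "content": "Could you introduce yourself?"}]}' \
  | jq -r '.choices[0].message.content'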
The usage values being all zero bothers me a little, though...
"usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }
At that moment, the LocalAI side printed this log.
7:06PM INF Loading model 'llama-2-7b-chat-gguf' with backend llama-cpp
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38605: connect: connection refused"
Incidentally, if you specify a completely unrelated model name,
$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8080/v1/chat/completions -d \
  '{"model": "hoge", "messages": [{"role": "user", "content": "What is your name?"}]}' | jq
you get an error saying that no backend could provide the model.
{ "error": { "code": 500, "message": "could not load model - all backends returned error: 18 errors occurred:\n\t* could not load model: rpc error: code = Canceled desc = \n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Canceled desc = \n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* could not load model: rpc error: code = Unknown desc = stat models/hoge: no such file or directory\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/piper. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\n", "type": "" } } real 0m30.788s user 0m0.035s sys 0m0.004s
The LocalAI log at that time:
7:13PM INF Loading model 'hoge' with backend llama-cpp
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46755: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend llama-ggml
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32975: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend llama
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35765: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend gpt4all
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33913: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend gptneox
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37145: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend bert-embeddings
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32845: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend falcon-ggml
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33267: connect: connection refused"
7:13PM INF Loading model 'hoge' with backend gptj
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38685: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend gpt2
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37601: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend dolly
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41031: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend mpt
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37935: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend replit
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42047: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend starcoder
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38465: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend rwkv
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40645: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend whisper
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35603: connect: connection refused"
7:14PM INF Loading model 'hoge' with backend stablediffusion
7:14PM INF Loading model 'hoge' with backend tinydream
7:14PM INF Loading model 'hoge' with backend piper
So it appears to probe the available backends one after another for the specified model.
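One more note before moving on: since the endpoint follows the OpenAI Chat Completions specification, streaming should in principle be available simply by adding "stream": true to the request body (a sketch; I have not verified the streamed output here):

$ curl -s -N -XPOST -H 'Content-Type: application/json' localhost:8080/v1/chat/completions -d \
  '{"model": "llama-2-7b-chat-gguf", "messages": [{"role": "user", "content": "Could you introduce yourself?"}], "stream": true}'

(curl's -N disables output buffering so the chunks appear as they arrive.)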
Specifying models in a configuration file
Finally, let's configure LocalAI with a configuration file.
I will follow these examples.
llama.cpp / YAML configuration
Advanced / Advanced configuration with YAML files
It is a little confusing, but the way you write the configuration seems to differ depending on whether you create a configuration file named [model name].yaml or pass a configuration file with --config-file.
As for the model, I recreated the models directory and this time placed the model under its original name.
$ mkdir models
$ curl -L https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf -o models/llama-2-7b-chat.Q4_K_M.gguf
First, let's name the configuration file after the model. The model name will be gpt-3.5-turbo. This file has to be placed inside the models directory.
models/gpt-3.5-turbo.yaml
name: gpt-3.5-turbo
backend: llama
context_size: 700
parameters:
  model: llama-2-7b-chat.Q4_K_M.gguf
backend specifies the backend to use, and parameters / model specifies the corresponding model file.
The models directory now looks like this.
$ tree models -h
[4.0K]  models
├── [ 102]  gpt-3.5-turbo.yaml
└── [3.8G]  llama-2-7b-chat.Q4_K_M.gguf

0 directories, 2 files
Start it up.
$ ./local-ai-avx2-Linux-x86_64 --models-path models --threads 4
Check that it works by specifying gpt-3.5-turbo as the model.
$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8080/v1/chat/completions -d \
  '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Could you introduce yourself?"}]}' | jq
On the LocalAI side, the model specified in the configuration file is recognized.
8:28PM INF Loading model 'llama-2-7b-chat.Q4_K_M.gguf' with backend llama
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35403: connect: connection refused"
The result came back.
{ "created": 1704108534, "object": "chat.completion", "id": "ccd1fc52-f9ae-4c70-9242-bc0486972b40", "model": "gpt-3.5-turbo", "choices": [ { "index": 0, "finish_reason": "stop", "message": { "role": "assistant", "content": "\n\nI'm a 32-year-old woman from the United States. I'm a writer and editor, and I've been working in the industry for about 10 years now. I've written for a variety of publications, including newspapers, magazines, and online sites. I'm also a mom to two young children, and I enjoy spending time with them and watching them grow. In my free time, I like to read, watch movies, and go for walks. I'm excited to be here and to share my thoughts and experiences with you." } } ], "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 } } real 1m17.497s user 0m0.044s sys 0m0.000s
If you specify a LocalAI configuration file with the --config-file option instead, it looks like this.
localai-config.yaml
- name: gpt-3.5-turbo
  backend: llama
  context_size: 700
  parameters:
    model: llama-2-7b-chat.Q4_K_M.gguf
It becomes an array with one element per model. I did not notice that files passed via --config-file use this format until I saw this page, and I got stuck on YAML parse errors...
Advanced / Advanced configuration with YAML files
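Also, since the --config-file format is an array, it should be possible to define multiple models (or multiple aliases backed by the same GGUF file) in a single file. A sketch, assuming the same model file as above:

- name: gpt-3.5-turbo
  backend: llama
  context_size: 700
  parameters:
    model: llama-2-7b-chat.Q4_K_M.gguf
- name: llama-2-7b-chat
  backend: llama
  context_size: 700
  parameters:
    model: llama-2-7b-chat.Q4_K_M.gguf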
In this case, the models directory only needs to contain the model file.
$ tree models -h
[4.0K]  models
└── [3.8G]  llama-2-7b-chat.Q4_K_M.gguf

0 directories, 1 file
Start it up.
$ ./local-ai-avx2-Linux-x86_64 --config-file localai-config.yaml --models-path models --threads 4
The result of the check is the same as with the per-model configuration file, so I will omit it.
That is about it for this time.
In closing
I tried LocalAI, an OpenAI API-compatible server that runs locally.
If all you want is to run llama.cpp as an OpenAI API-compatible server, my impression is that llama-cpp-python is easier to handle as a standalone tool.
In the end, I do not think what you can do differs much from llama.cpp itself. On the other hand, considering the option of switching to other backends and the surrounding knowledge to be gained along the way, I felt LocalAI is worth keeping in mind as well.
I think I will choose between this and llama-cpp-python on a case-by-case basis.