Trying code generation with Replit Code V1.5 3B on llama-cpp-python

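What is this post about?

In a previous entry, I tried generating source code with Salesforce's CodeGen model.

At that time I tried to use the original Replit Code V1.5 3B, the model featured in the llama-cpp-python documentation, and ran out of memory.
I suspected that a GGUF-converted model would work when run with llama-cpp-python (llama.cpp), so I gave it a try.

The short answer: it worked.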

Replit Code V1.5 3B

The model featured in the code generation section of the llama-cpp-python documentation is Replit Code V1.5 3B.

OpenAI Compatible Server / Guides / Code Completion

This is the page. To be precise, what it uses is Replit Code V1.5 3B converted to the GGUF format.

abetlen/replit-code-v1_5-3b-GGUF · Hugging Face

So what is this model? It appears to be an LLM for code generation that Replit has released under the Apache License 2.0.

Code-generating large language model "Replit Code V1.5 3B" released | CodeZine

Here is Replit's own blog post.

Replit — Replit’s new AI Model now available on Hugging Face

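The model is said to be trained on 1 trillion tokens of code, drawn from permissively licensed code in the Stack dataset
and from publicly available developer-oriented content on StackExchange.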

Extensive Permissively Licensed Training Data: Trained on 1 trillion tokens of code from permissively licensed code from the Stack dataset and publicly available dev-oriented content from StackExchange.

Usage with Hugging Face Transformers is described there as well.

Here is the model published on the Hugging Face Hub.

replit/replit-code-v1_5-3b · Hugging Face

It is said to support the following 30 languages.

Java
JavaScript
C
PHP
Python
C++
C#
TypeScript
Go
CSS
HTML
Rust
Ruby
Swift
Scala
Shell
Lua
Perl
Haskell
JSX
Julia
Common Lisp
OCaml
Solidity
Scheme
R
Zig
SQL
Racket
D

This time, I'll run the GGUF-format Replit Code V1.5 3B model with llama-cpp-python.

abetlen/replit-code-v1_5-3b-GGUF · Hugging Face

Environment

Here is the environment used this time.

$ python3 --version
Python 3.10.12


$ pip3 --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

The llama-cpp-python version.

$ pip3 freeze | grep llama_cpp_python
llama_cpp_python==0.2.28

Generating code with the GGUF-format Replit Code V1.5 3B

First, download the GGUF-format Replit Code V1.5 3B model. I went with Q4_0, one of the lighter variants.

$ curl -LO https://huggingface.co/abetlen/replit-code-v1_5-3b-GGUF/resolve/main/replit-code-v1_5-3b.Q4_0.gguf
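To get a feel for why the Q4_0 file is lighter, here is a rough back-of-the-envelope sketch. It assumes about 3.3 billion parameters (an assumption taken from the "3B" class of the model, not measured here) and uses the Q4_0 block layout of 32 four-bit weights plus one 16-bit scale. The function name `estimate_gib` is made up for this illustration, and a real GGUF file will differ somewhat because of metadata and tensors stored in other types.

```python
def estimate_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB for a given bit width."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: roughly 3.3 billion parameters.
N_PARAMS = 3.3e9

# Q4_0 stores blocks of 32 weights at 4 bits each plus one 16-bit scale.
Q4_0_BITS = (32 * 4 + 16) / 32

size_q4_0 = estimate_gib(N_PARAMS, Q4_0_BITS)
size_16bit = estimate_gib(N_PARAMS, 16)

print(f"Q4_0   : about {size_q4_0:.1f} GiB")
print(f"16-bit : about {size_16bit:.1f} GiB")
```

This is only an estimate of the weights themselves; the memory needed at runtime also grows with the context size.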

Start llama-cpp-python with the downloaded model.

$ python3 -m llama_cpp.server --model replit-code-v1_5-3b.Q4_0.gguf --n_ctx 16192

--n_ctx is the option that sets the context size. The default is 2048; I raised it following the documentation.

Let's check that it works.

$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8000/v1/engines/copilot-codex/completions -d '{"prompt": "// App.java \n public class App {\n    // print Hello World\n    public static void main(String[] args) {"}' | jq
{
  "id": "cmpl-47c4ed04-f81b-4921-8283-c3f60a6d663f",
  "object": "text_completion",
  "created": 1705235381,
  "model": "replit-code-v1_5-3b.Q4_0.gguf",
  "choices": [
    {
      "text": "\n        System.out.println(\"Hello, world\");\n\n        }\n}",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 15,
    "total_tokens": 41
  }
}

real    0m1.949s
user    0m0.052s
sys     0m0.014s

It worked. The result looks plausible.
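The fields of the response above can be picked apart with the standard library alone. The following is a minimal sketch that uses an abridged copy of that response as sample data; `summarize_completion` is a name invented for this illustration.

```python
import json

# Abridged copy of the response shown above, used as sample data.
SAMPLE_RESPONSE = r'''
{
  "object": "text_completion",
  "model": "replit-code-v1_5-3b.Q4_0.gguf",
  "choices": [
    {
      "text": "\n        System.out.println(\"Hello, world\");\n\n        }\n}",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 26, "completion_tokens": 15, "total_tokens": 41}
}
'''


def summarize_completion(raw: str) -> dict:
    """Pull out the generated text, the stop reason and the token counts."""
    body = json.loads(raw)
    choice = body["choices"][0]
    return {
        "text": choice["text"],
        "finish_reason": choice["finish_reason"],
        "completion_tokens": body["usage"]["completion_tokens"],
        "total_tokens": body["usage"]["total_tokens"],
    }


summary = summarize_completion(SAMPLE_RESPONSE)
print(summary["text"])
print(summary["finish_reason"], summary["total_tokens"])
```

In the OpenAI-style completion format, a finish_reason of "stop" means the model ended on its own, while "length" would mean the output was cut off by max_tokens.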

Where is the endpoint?

With llama-cpp-python, you can view the OpenAPI definition by opening http://[host running llama-cpp-python]:8000/docs.

The /v1/engines/copilot-codex/completions endpoint used to be listed there, but it no longer is.

Checking the source code, it is still there.

@router.post(
    "/v1/completions", summary="Completion", dependencies=[Depends(authenticate)]
)
@router.post(
    "/v1/engines/copilot-codex/completions",
    include_in_schema=False,
    dependencies=[Depends(authenticate)],
)
async def create_completion(

https://github.com/abetlen/llama-cpp-python/blob/v0.2.28/llama_cpp/server/app.py#L199-L207

That said, it is treated as an alias of the Completion API.
It just seems to be hidden from the OpenAPI definition now (include_in_schema=False).

This is the change that did it.

https://github.com/abetlen/llama-cpp-python/commit/1a7bf2037bba35b5b15340088694aa897d83fe36
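Whether a route is visible in the OpenAPI definition can also be checked from code. The sketch below works on an already-loaded OpenAPI document; `SAMPLE_SCHEMA` is a hand-written stand-in, not the real output of the server, and the helper names are invented here. Fetching the real document from http://localhost:8000/openapi.json assumes FastAPI's default schema URL is in effect, which I have not confirmed for this server.

```python
def documented_paths(openapi: dict) -> list[str]:
    """Return the sorted list of paths that appear in an OpenAPI document."""
    return sorted(openapi.get("paths", {}))


def is_documented(openapi: dict, path: str) -> bool:
    """True when the given path is listed in the OpenAPI document."""
    return path in openapi.get("paths", {})


# Hypothetical stand-in for the server's schema (illustration only).
SAMPLE_SCHEMA = {
    "openapi": "3.1.0",
    "paths": {
        "/v1/completions": {"post": {}},
        "/v1/models": {"get": {}},
    },
}

print(documented_paths(SAMPLE_SCHEMA))
print(is_documented(SAMPLE_SCHEMA, "/v1/engines/copilot-codex/completions"))

# Assumed way to load the real schema from a running server:
#   from urllib import request
#   import json
#   with request.urlopen("http://localhost:8000/openapi.json") as r:
#       schema = json.load(r)
```

A route registered with include_in_schema=False keeps working but would not show up in such a listing.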

You could go through the Completion API instead, but it is already treated as legacy in OpenAI's API,
so I'm a little reluctant to access it with the OpenAI Python API library.

Legacy / Create completion
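Since both routes are attached to the same create_completion handler in the source shown earlier, the same request body should be usable with either path. Here is a small sketch that only builds the two requests without sending them; `build_completion_request` and `BASE_URL` are names made up for this illustration.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # host and port used in this post


def build_completion_request(path: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request for the given path."""
    data = json.dumps(payload).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {"prompt": "// hello.ts\nconsole", "max_tokens": 64}

legacy = build_completion_request("/v1/completions", payload)
alias = build_completion_request("/v1/engines/copilot-codex/completions", payload)

print(legacy.get_method(), legacy.full_url)
print(alias.get_method(), alias.full_url)

# To actually send one of them against a running server:
#   with request.urlopen(legacy) as r:
#       print(r.read().decode("utf-8"))
```

Keeping the path as a parameter makes it easy to switch between the two without touching the rest of the client.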

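Accessing the code generation endpoint from Python code

Working with curl alone gets a bit painful, so let's write a simple program to access it.

I'll use only the Python standard library.

urllib.request --- Extensible library for opening URLs - Python 3.10.13 documentation

json --- JSON encoder and decoder - Python 3.10.13 documentation

The program I wrote.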

completion.py

from urllib import request
import json

def code_completion(data: dict):
    url = "http://localhost:8000/v1/engines/copilot-codex/completions"
    data_as_binary = json.dumps(data).encode("utf-8")

    req = request.Request(url, data=data_as_binary)
    req.add_header("Content-Type", "application/json")

    with request.urlopen(req) as r:
        response = json.loads(r.read().decode("utf-8"))

        print("-------------------------------------------------------")
        print("input:")
        print(data["prompt"])
        print()
        print("generated with:")
        print(data["prompt"] + response["choices"][0]["text"])
        print()

dataList = [
    {
        "prompt": """// App.java
public class App {
    // main method
    // Outputs "Hello World" and terminates the program.""",
        "max_tokens": 64
    },
    {
        "prompt": """// hello.ts
// Outputs "Hello World" and terminates the program.
console""",
        "max_tokens": 64
    }
]

for data in dataList:
    code_completion(data)

It calls the code completion endpoint and prints the result joined to the input.

def code_completion(data: dict):
    url = "http://localhost:8000/v1/engines/copilot-codex/completions"
    data_as_binary = json.dumps(data).encode("utf-8")

    req = request.Request(url, data=data_as_binary)
    req.add_header("Content-Type", "application/json")

    with request.urlopen(req) as r:
        response = json.loads(r.read().decode("utf-8"))

        print("-------------------------------------------------------")
        print("input:")
        print(data["prompt"])
        print()
        print("generated with:")
        print(data["prompt"] + response["choices"][0]["text"])
        print()

I send two requests, asking for code in Java and in TypeScript.

dataList = [
    {
        "prompt": """// App.java
public class App {
    // main method
    // Outputs "Hello World" and terminates the program.""",
        "max_tokens": 64
    },
    {
        "prompt": """// hello.ts
// Outputs "Hello World" and terminates the program.
console""",
        "max_tokens": 64
    }
]

for data in dataList:
    code_completion(data)

Run it.

$ python3 completion.py
-------------------------------------------------------
input:
// App.java
public class App {
    // main method
    // Outputs "Hello World" and terminates the program.

generated with:
// App.java
public class App {
    // main method
    // Outputs "Hello World" and terminates the program.
    public static void main(String[] args) throws Exception{
        System.out.println("Hello, world.");
   }  // end of test()
  }//endclass Test

-------------------------------------------------------
input:
// hello.ts
// Outputs "Hello World" and terminates the program.
console

generated with:
// hello.ts
// Outputs "Hello World" and terminates the program.
console.log('Hello world'); // Print to stdout or stderr based on process exit code!
process.exit(0);             /* 0 = success, non-zero otherwise (i.e., 1) */

Something plausible came out. Looks good.

The result changes on every run.
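One knob that usually affects this run-to-run variation is the sampling temperature. The sketch below only prepares a request body with a temperature field added. temperature is a parameter of the OpenAI Completions API; that this server applies it in the same way is an assumption I have not verified in this post, and `with_sampling` is a name invented for the illustration.

```python
def with_sampling(payload: dict, temperature: float = 0.2) -> dict:
    """Return a copy of the request body with a sampling temperature set."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    tuned = dict(payload)  # leave the caller's dict untouched
    tuned["temperature"] = temperature
    return tuned


base = {"prompt": "// hello.ts\nconsole", "max_tokens": 64}
tuned = with_sampling(base, temperature=0.0)

print(base)
print(tuned)
```

Lower temperatures generally make sampling less random, so the output should vary less between runs, though this alone does not guarantee identical results.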

This time I set max_tokens to about this value; raising it made the output drift even more.

        "max_tokens": 64

One example.

-------------------------------------------------------
input:
// App.java
public class App {
    // main method
    // Outputs "Hello World" and terminates the program.

generated with:
// App.java
public class App {
    // main method
    // Outputs "Hello World" and terminates the program.
    public static void main(String args[]){
        System.out.println("Hello, world");
    }  // end of Main







    /*

    // Printer Class to handle all printers
    class printer{

        private int max_pages;
            {
                this.max_pages = 20000 ;
            };

        public void setMaxPages(int n){
        if (n>0)
            System.out.println("The maximum number of pages for a single copy is: " + n);
         else
                throw new IllegalArgumentException ("Number must be positive");
        }


        // Printer Class Constructor method to create an object with default values

        public printer(){
            this(1,20000,"Pink");
        };

        private String type;
    {
    ;

          System.out.println("You have created a " +this.type+" print.");





    } // end of Constructor


    void setType (String s){
               if (!s = null)
                   this.type= new String(new char[max_pages]);
                else throw IllegalArgumentException ("Incorrect format");
             };




        public int getMaxPages() { return max_pages;

        } // end

-------------------------------------------------------

input:
// hello.ts
        // print Hello World

generated with:
// hello.ts
        // print Hello World
function main() {
  console.log('Hello, world!');
}


// export to make it available in other programs/files that import this file by its path relative or absolute from where you are running the program (either directly for local testing) and then use a require statement of `path` module which is part of node core package.  See https://nodejs.org/api/modules_modules
module.exports = {
  main, // make this available to other programs as an object with property "main" by exporting it in the scope (the parent) file where we are using require() statement from its `path` module!   When I tried below syntax on import like:  const helloWorld2=require('./hello'); , then for some reason, VS Code gave me errors but when running directly through node command line and with or without the export in it worked fine.
};

For now, I got it running, so let's call that good enough.
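In the longer output above, the useful part ends before a long run of blank lines and the unrelated code follows after it. As a crude client-side workaround, the generated text could be cut at the first such run. This is only a heuristic sketch, not something llama-cpp-python provides; `trim_runaway` and `max_blank_lines` are hypothetical names.

```python
def trim_runaway(text: str, max_blank_lines: int = 2) -> str:
    """Cut the text at the first run of more than max_blank_lines blank lines."""
    kept = []
    blank_run = 0
    for line in text.splitlines():
        if line.strip() == "":
            blank_run += 1
            if blank_run > max_blank_lines:
                break  # too many blank lines in a row: stop here
        else:
            blank_run = 0
        kept.append(line)
    # Drop the blank lines left at the end of the kept part.
    while kept and kept[-1].strip() == "":
        kept.pop()
    return "\n".join(kept)


# Shortened version of the Java output shown above.
sample = (
    "    public static void main(String args[]){\n"
    '        System.out.println("Hello, world");\n'
    "    }  // end of Main\n"
    "\n\n\n\n"
    "    /*\n"
    "    // Printer Class to handle all printers\n"
)

trimmed = trim_runaway(sample)
print(trimmed)
```

The trimmed result still lacks the closing brace of the class, so this only removes the drift; it does not make the code complete.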

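Wrapping up

I tried code generation with Replit Code V1.5 3B on llama-cpp-python.

When I tried to run the plain Replit Code V1.5 3B with Transformers it did not work at all, but the model I picked this time was
one of the lighter variants, so I managed to get it running.

It made me think that this is one more angle from which these tools can be put to use.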
