寫寫東西: 在 arch linux 以 vLLM 使用輕量型 OCR 模型 GLM-OCR

看到報導說GLM-OCR 0.7B參數模型效果不錯，

https://github.com/zai-org/GLM-OCR

有個書本幾頁照片想要轉成文字，就可以用這個離線辨識圖片中的文字。

原本丟給 ChatGPT，但是每個對話第一個還算正確，第二個開始就ChatGPT自己解釋成和照片文字不同的內容。

我分2個python環境，

一個是執行 vLLM （步驟1），一個是執行 example （步驟8）

1. 先建立執行 vllm環境

mkdir vllm
cd vllm/
uv venv --python 3.12 --seed

2. 套用python環境變數

source .venv/bin/activate

3. 安裝 vllm for cu130 (cuda 1.3x)

uv pip install -U vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly/cu130

4. 安裝 transformers

uv pip install git+https://github.com/huggingface/transformers.git

5. 啟動 glm-ocr vllm server

vllm serve zai-org/GLM-OCR --allowed-local-media-path / --port 8080 --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}' --served-model-name glm-ocr

6. GPU memory 設定調整

6.1 預設預先佔用過多

如果出現以下錯誤，表示 GPU memory 設定保留 90%（0.9) 比目前可用還多，無法分配，加上 --gpu-memory-utilization 0.XX 減少分配量

ValueError: Free memory on device cuda:0 (6.4/7.66 GiB) on startup is less than desired GPU memory utilization (0.9, 6.89 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.

vllm serve zai-org/GLM-OCR --allowed-local-media-path / --port 8080 --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}' --served-model-name glm-ocr --gpu-memory-utilization 0.70

6.2 預設 model seq len 過大

出現以下錯誤表示預設 model seq len 大於能用的 KV cache memory，可以先嘗試以 --max_model_len XXXX 改小 model seq len

ValueError: To serve at least one request with the models's max seq len (131072), (8.5 GiB KV cache is needed, which is larger than the available KV cache memory (1.8 GiB). Based on the available memory, the estimated maximum model length is 27792. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing

vllm serve zai-org/GLM-OCR --allowed-local-media-path / --port 8080 --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}' --served-model-name glm-ocr --gpu-memory-utilization 0.70 --max_model_len 4096

6.3 剩餘 GPU memory 不足

出現以下錯誤表示 GPU memory 的 --gpu-memory-utilization 設定太高，導致其他部份程式無法執行，再調低 --gpu-memory-utilization

torch.OutOfMemoryError: CUDA out of memory.

6.4 max model len 設定不足

出現以下錯誤表示 --max_model_len 設定不夠放 input token 和 output token，增加 --max_model_len 到足夠放的數字

Received bad status code: 400, response: {"error":{"message":"You passed 96 input characters and requested 4096 output tokens. However, the model's context length is only 4096 tokens, resulting in a maximum input length of 0 tokens (at most 0 characters). Please reduce the length of the input prompt. (parameter=input_text, value=96)","type":"BadRequestError","param":"input_text","code":400}}

7. 取得 GLM-OCR SDK

git clone git@github.com:zai-org/GLM-OCR.git

8. 建立python 環境

cd GLM-OCR/
uv venv --python 3.14

9. 套用 python 環境變數

source .venv/bin/activate

10. python 套件

glm-ocr sdk 好像有提供安裝方式，我另外先請 OpenAI codex 依據 examples/example.py 內容產生條列 python package 的 requirements.txt。以下是

10.1 requirements.txt 內容

pillow>=9.0.0
numpy>=1.24.0
requests>=2.28.0
pydantic>=2.7.0
wordfreq>=3.0.0
PyYAML>=6.0.0
portalocker>=2.8.2
python-dotenv>=0.21.0
# Layout detection
torch>=2.0.0
torchvision>=0.15.0
transformers>=5.1.0
sentencepiece>=0.1.99
accelerate>=0.20.0
opencv-python>=4.8.0
# PDF support
pypdfium2>=5.3.0
# Flask server
flask>=2.3.0

10.2 安裝 python 套件

uv pip install -r requirements.txt

11. 修改成使用local server

原本 examples/example.py 是使用開發公司Z.ai提供的API，改成使用前面 vLLM在 port 8080 提供的 API，需插入 import os，並且在 main() 一開使設定相關執行環境變數，將原本是連到GLM-OCR API server，改成使用本機的 port 8080

from __future__ import annotations

import os
import shutil

......

def main() -> int:
    # Use local self-hosted OCR API.
    os.environ["GLMOCR_MODE"] = "selfhosted"
    os.environ["GLMOCR_OCR_API_HOST"] = "127.0.0.1"
    os.environ["GLMOCR_OCR_API_PORT"] = "8080"

    here = Path(__file__).resolve().parent

8. 將圖片放到 examples/source 資料夾中

9. 執行 example

因為 examples/example.py 中引用的 glmocr package 在專案木的的 glmocr，所以執行時要加上環境變數，讓python 在這個專案目錄找 glmocr

PYTHONPATH=. python3 examples/example.py

10. 結果

產生的資料會在 examples/result，分別有JSON 格式的 .json 和 markdown 的 .md。另外還有標示辨識區域的圖片在layout_vis。

在手機拍攝的書本印刷字，發現2次前後行對調，還有訓練字集可能因為簡體字比較多，當字體比較模糊時，會辨識成簡體字。

寫寫東西

2026年2月22日星期日

在 arch linux 以 vLLM 使用輕量型 OCR 模型 GLM-OCR

1. 先建立執行 vllm環境

2. 套用python環境變數

3. 安裝 vllm for cu130 (cuda 1.3x)

4. 安裝 transformers

5. 啟動 glm-ocr vllm server

6. GPU memory 設定調整

6.1 預設預先佔用過多

6.2 預設 model seq len 過大

6.3 剩餘 GPU memory 不足

6.4 max model len 設定不足

7. 取得 GLM-OCR SDK

8. 建立python 環境

9. 套用 python 環境變數

10. python 套件

10.1 requirements.txt 內容

10.2 安裝 python 套件

11. 修改成使用local server

8. 將圖片放到 examples/source 資料夾中

9. 執行 example

10. 結果

沒有留言:

張貼留言

2026年2月22日 星期日

在 arch linux 以 vLLM 使用輕量型 OCR 模型 GLM-OCR

1. 先建立執行 vllm環境

2. 套用python環境變數

3. 安裝 vllm for cu130 (cuda 1.3x)

4. 安裝 transformers

5. 啟動 glm-ocr vllm server

6. GPU memory 設定調整

6.1 預設預先佔用過多

6.2 預設 model seq len 過大

6.3 剩餘 GPU memory 不足

6.4 max model len 設定不足

7. 取得 GLM-OCR SDK

8. 建立python 環境

9. 套用 python 環境變數

10. python 套件

10.1 requirements.txt 內容

10.2 安裝 python 套件

11. 修改成使用local server

8. 將圖片放到 examples/source 資料夾中

9. 執行 example

10. 結果

沒有留言:

張貼留言

2026年2月22日星期日