Leon's Blog

Sharing some interesting tech

Building vllm-ascend from source

Environment setup

Single-card environment setup

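  1. Check the image environment

    Use the collect_env.py script that ships with the vllm-ascend project to inspect the environment. My current environment is:

    Ascend 910B NPU
    openEuler (bundled with the container)
    Python 3.11 (bundled with the container)
    CANN 8.5.1 (bundled with the container)
    torch 2.9.0 (bundled with the container)
    torch-npu 2.9.0rc1 (bundled with the container; it must later be reinstalled as torch-npu 2.9.0 to fix compatibility problems)
    Triton-Ascend 3.2.0 (must be installed manually)
    vLLM-Ascend `0.1.dev1+gce9effc33.d20260507` (latest GitHub main branch)
    vLLM `0.19.2rc1.dev17+gd886c26d4` (the version compatible with the latest vllm-ascend main branch)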
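  2. Set up the virtual environment

    • Create and activate the virtual environment

      python3 -m venv --system-site-packages .venv   # --system-site-packages inherits the system environment
      source .venv/bin/activate                      # activate it for the current shell

      A short aside on --system-site-packages:

      • If you install a dependency that already exists in the system environment, pip skips it by default.
      • In that case, force the installation with: pip install --ignore-installed <package>==<version>
      • :warning: Because the virtual environment's path takes precedence, the newly installed package shadows the system package.
      • :warning: This skips dependency checks, so it carries some risk.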
    • Install the base toolchain

      dnf install -y git gcc gcc-c++ cmake numactl-devel wget curl jq ninja-build python3-pip

      python -m pip install \
        cmake \
        ninja \
        packaging \
        "setuptools>=77,<81" \
        setuptools-scm \
        pybind11

      pip install torchvision==0.24.0 torchaudio==2.9.0 # see: http://pytorch.org/get-started/previous-versions/

      # For torch-npu dev version or x86 machine
      pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/"
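  3. Install vllm from source

    • Clone the repository and check out the matching commit

      git clone --recursive git@github.com:vllm-project/vllm.git
      cd vllm   # the checkout must run inside the cloned repository
      git checkout d886c26d4d4fef7d079696beb4ece1cfb4b008a8

      For the exact vllm version to use, refer to the vllm-ascend Dockerfile.

    • Install from source

      The key point is to always pass --no-deps --no-build-isolation, which prevents vllm from downloading and installing its default torch (version 2.10.0, which has compatibility problems).

      VLLM_TARGET_DEVICE=empty \
      python -m pip install -v --no-deps --no-build-isolation -e .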
  4. Install vllm-ascend from source

    # Run this from inside the vllm-ascend directory
    python -m pip install -v --no-deps --no-build-isolation -e .
  5. Resolve version compatibility problems

    • torch-npu must be version 2.9.0

      python -m pip install --no-cache-dir --force-reinstall --no-deps \
        "https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitdc51c2d-cp311-cp311-manylinux_2_28_x86_64.whl"
    • triton-ascend must be version 3.2.0

      python -m pip install --no-cache-dir --force-reinstall --ignore-installed --no-deps \
        -i https://pypi.org/simple \
        "triton-ascend==3.2.0"

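The two pins from step 5 can also be checked from Python. The following is a minimal sketch and not part of vllm-ascend: `matches_pin` and `check_pins` are hypothetical helper names, and the rule that a `.post` build still counts as the pinned release while an `rc` build does not is my reading of the steps above, not an official policy.

```python
import re
from importlib import metadata

# Pins taken from step 5 of this post.
PINS = {"torch-npu": "2.9.0", "triton-ascend": "3.2.0"}

_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(.*)$")
_PRERELEASE_RE = re.compile(r"[._-]?(?:a|b|rc)\d+", re.IGNORECASE)


def matches_pin(installed: str, pin: str) -> bool:
    """Return True if `installed` is a final or post build of release `pin`."""
    public = installed.split("+", 1)[0]  # drop the local part, e.g. "+gitdc51c2d"
    m = _VERSION_RE.match(public)
    if m is None:
        return False
    release, suffix = m.groups()
    if release != pin:
        return False
    # Reject pre-releases such as "rc1"; accept "" and ".post1".
    return _PRERELEASE_RE.match(suffix) is None


def check_pins(pins=PINS, lookup=metadata.version):
    """Map each distribution name to (installed version or None, ok flag)."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, matches_pin(installed, pin))
    return report
```

Calling `check_pins()` inside the virtual environment returns one entry per pinned package; an `ok` flag of `False` for torch-npu would point at the 2.9.0rc1 build that ships with the container.
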
Final test

Test the build with Qwen3-0.6B using the following script.

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(model="Qwen/Qwen3-0.6B")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Output like the following indicates that the initial build succeeded.

[Image: image-20260507195130000]

CI test

First, install the following additional Python packages:

pip install pytest
pip install pandas
pip install pytest-mock

Then run:

cd /vllm-workspace/vllm-ascend/
# Run all single-card tests
pytest -sv tests/ut

# Run single test
pytest -sv tests/ut/test_ascend_config.py

However, the current branch does not support Mooncake, so use:

pytest -sv tests/ut   --ignore=tests/ut/kv_connector/test_mooncake_connector.py   --ignore=tests/ut/kv_connector/test_mooncake_layerwise_connector.py
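If the list of unsupported tests grows, the long command line gets hard to maintain by hand. The sketch below assembles the same arguments from a list; `build_pytest_args` and `UNSUPPORTED_TESTS` are hypothetical names for illustration, and only the standard pytest `--ignore` option is used.

```python
import shlex

# Test files skipped in this post because the branch lacks Mooncake support.
UNSUPPORTED_TESTS = [
    "tests/ut/kv_connector/test_mooncake_connector.py",
    "tests/ut/kv_connector/test_mooncake_layerwise_connector.py",
]


def build_pytest_args(target="tests/ut", ignored=UNSUPPORTED_TESTS):
    """Return the pytest argv that runs `target` while skipping `ignored`."""
    args = ["pytest", "-sv", target]
    args += [f"--ignore={path}" for path in ignored]
    return args


# Print the command so it can be pasted into a shell.
print(shlex.join(build_pytest_args()))
```

The returned list can also be passed straight to `subprocess.run` from the vllm-ascend checkout instead of being printed.
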

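Problems

Image-related

Python version problem

Check Python:

which python
python -V
python -m pip -V

Python is expected to come from:

/usr/local/python3.11.14/bin/python

The Ascend container ships two Python environments, so you need to export the right one yourself (typically by putting its bin directory first on PATH).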
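The same check can be done from inside Python, which is handy once a virtual environment is involved. This is a minimal sketch: `describe_interpreter` is a hypothetical helper, and it assumes that the interpreter at the path above reports `/usr/local/python3.11.14` as its base prefix, which is typical for a prefix install but not verified here.

```python
import sys

# Installation prefix expected in this post (derived from the path above).
EXPECTED_PREFIX = "/usr/local/python3.11.14"


def describe_interpreter(prefix=None, base_prefix=None, executable=None):
    """Summarise where the running Python comes from."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    executable = sys.executable if executable is None else executable
    return {
        "executable": executable,
        # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_venv": prefix != base_prefix,
        "base_matches_expected": base_prefix.rstrip("/") == EXPECTED_PREFIX,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If `base_matches_expected` is `False`, the shell is most likely picking up the container's other Python.
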

openEuler-related

dnf install -y <package-name>
yum install -y <package-name>

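vLLM-related

Missing modules in vLLM or vLLM-ascend

This means the versions are not aligned. Align them using the version matrix at https://docs.vllm.ai/projects/vllm-ascend-cn/zh-cn/latest/community/versioning_policy.html. If you are on the latest GitHub main branch, align vllm and the other dependencies with the versions given in the Dockerfile.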
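One way to read the pinned version out of a Dockerfile is to collect its `ARG NAME=value` defaults. The sketch below does only that and rests on assumptions: the sample text and the `ARG` names `VLLM_REPO` and `VLLM_TAG` are illustrative placeholders, so check the actual vllm-ascend Dockerfile for the names it uses. `parse_dockerfile_args` is a hypothetical helper.

```python
import re

_ARG_RE = re.compile(r"^\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_dockerfile_args(text):
    """Collect `ARG NAME=value` defaults from Dockerfile text."""
    args = {}
    for line in text.splitlines():
        m = _ARG_RE.match(line)
        if m:
            name, value = m.groups()
            # Strip surrounding whitespace and optional quotes.
            args[name] = value.strip().strip('"').strip("'")
    return args


# Hypothetical excerpt; the real file may use different names or layout.
SAMPLE = """\
FROM some-base-image
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_TAG=d886c26d4d4fef7d079696beb4ece1cfb4b008a8
RUN echo building
"""

print(parse_dockerfile_args(SAMPLE)["VLLM_TAG"])
```

The value found this way is what would be passed to `git checkout` in the vllm clone.
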

Missing torchair API bug

python - <<'PY'
import importlib.metadata as md

# Print the installed version of each relevant distribution.
for p in ["torch", "torch-npu", "torchvision", "torchaudio", "vllm", "vllm-ascend"]:
    try:
        print(f"{p:12s} => {md.version(p)}")
    except Exception as e:
        print(f"{p:12s} => NOT FOUND: {e}")

print("\nChecking torchair API...")
try:
    import torch_npu
    from torch_npu.dynamo import torchair
    print("torch_npu:", torch_npu.__file__)
    print("torchair:", torchair)
    print("torchair file:", getattr(torchair, "__file__", None))
    print("has register_replacement:", hasattr(torchair, "register_replacement"))
    print("related symbols:", [x for x in dir(torchair) if "register" in x or "replace" in x])
except Exception as e:
    print("FAILED:", repr(e))
PY

Running the script above prints `has register_replacement: False`. The cause is that the PyTorch image provided by the compute marketplace (算力广场) uses torch-npu 2.9.0rc1 rather than the officially required torch-npu 2.9.0. Reinstalling torch-npu 2.9.0 fixes it.

Missing Triton

(.venv) [root@179581a576fc vllm-ascend-dly]# python check.py 
INFO 05-07 10:36:10 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 10:36:10 [__init__.py:46] - ascend -> vllm_ascend:register
INFO 05-07 10:36:10 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 10:36:10 [__init__.py:239] Platform plugin ascend is activated
INFO 05-07 10:36:14 [importing.py:44] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
INFO 05-07 10:36:14 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.

AttributeError: '_OpNamespace' 'vllm' object has no attribute 'qkv_rmsnorm_rope'

Related issue: https://github.com/vllm-project/vllm-ascend/issues/6737
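When this failure is buried in a long log, a small scanner can flag the two symptoms shown above. This is only a sketch: `diagnose` is a hypothetical helper, the message fragments are copied from the log and traceback in this post, and other vLLM versions may word them differently.

```python
# Message fragments copied from the log and traceback above.
TRITON_DISABLED = "Disabling Triton to prevent runtime errors"
MISSING_OP = "object has no attribute 'qkv_rmsnorm_rope'"


def diagnose(log_text):
    """Return a list of hints for the symptoms found in `log_text`."""
    hints = []
    if TRITON_DISABLED in log_text:
        hints.append("triton: no active driver found, Triton was disabled")
    if MISSING_OP in log_text:
        hints.append("op: qkv_rmsnorm_rope is missing from the vllm op namespace")
    return hints
```

Seeing both hints together matches the situation described here, where the fix is to reinstall triton-ascend 3.2.0.
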

Reinstall triton-ascend:

python -m pip install --no-cache-dir --force-reinstall --ignore-installed --no-deps \
  -i https://pypi.org/simple \
  "triton-ascend==3.2.0"

Run the following script to verify; it is enough that qkv_rmsnorm_rope exists.

python - <<'PY'
import torch
import importlib.util

print("torch:", torch.__version__)
try:
    import torch_npu
    print("torch_npu:", torch_npu.__version__)
except Exception as e:
    print("torch_npu import failed:", repr(e))

try:
    import vllm
    print("vllm:", vllm.__version__, vllm.__file__)
except Exception as e:
    print("vllm import failed:", repr(e))

try:
    import vllm_ascend
    print("vllm_ascend:", getattr(vllm_ascend, "__version__", "unknown"), vllm_ascend.__file__)
except Exception as e:
    print("vllm_ascend import failed:", repr(e))

print("triton spec:", importlib.util.find_spec("triton"))
print("triton_ascend spec:", importlib.util.find_spec("triton_ascend"))

# Check whether each custom op is present in the torch.ops.vllm namespace.
for name in [
    "qkv_rmsnorm_rope",
    "triton_split_qkv_rmsnorm_rope",
    "triton_split_qkv_rmsnorm_mrope",
    "fused_qk_norm_rope",
]:
    try:
        op = getattr(torch.ops.vllm, name)
        print("[OK] torch.ops.vllm.%s exists: %s" % (name, op))
    except Exception as e:
        print("[MISS] torch.ops.vllm.%s: %s" % (name, repr(e)))
PY