Prerequisites: Docker, docker-compose, the NPU firmware and driver, MindIE, and the Ascend image are already installed correctly.

Step 1: Download the model

Download the model from ModelScope. After the download it is stored under /root/.cache/modelscope/hub/models/Qwen.
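The download and the resulting cache location can be sketched as follows. This assumes the `modelscope` command-line tool is installed and, judging from the example path used later in this guide, that each `.` in the model ID is stored as `___` on disk. The function names and the model ID `Qwen/Qwen2.5-14B-Instruct` are illustrative, not taken from ModelScope itself.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of ModelScope.

# Print the directory where this guide expects a downloaded model to land.
# Assumption: dots in the model ID are stored as "___" on disk
# (as in .../Qwen/Qwen2___5-14B-Instruct).
modelscope_cache_dir() {
  model_id=$1
  printf '/root/.cache/modelscope/hub/models/%s\n' \
    "$(printf '%s' "$model_id" | sed 's/\./___/g')"
}

# Download a model with the ModelScope CLI (needs network access).
download_model() {
  modelscope download --model "$1"
}

# Example (not executed here):
#   download_model Qwen/Qwen2.5-14B-Instruct
#   ls "$(modelscope_cache_dir Qwen/Qwen2.5-14B-Instruct)"
```

If your ModelScope version uses a different cache layout, check the path printed by the download command rather than relying on this mapping.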

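Set the permissions on the model files and edit config.json ({{modelpath}} stands for the model directory). Note that a recursive 640 also clears the execute (search) bit on directories, so only root can still traverse them; if the service runs as a non-root user, 750 on the directories avoids that.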

chmod -R 640 {{modelpath}} 

vim config.json
In the model's config.json, change "torch_dtype": "bfloat16" to "torch_dtype": "float16".
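The same edit can be made without opening an editor. The sketch below is one possible way to do it with sed; `set_float16` is a hypothetical helper name, and it assumes the key appears in the usual `"torch_dtype": "bfloat16"` form.

```shell
#!/bin/sh
# Hypothetical helper: switch torch_dtype from bfloat16 to float16
# in the given config.json.
set_float16() {
  cfg=$1
  # Write to a temporary file first so this works with both GNU and BSD sed.
  sed 's/"torch_dtype"[[:space:]]*:[[:space:]]*"bfloat16"/"torch_dtype": "float16"/' \
    "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
  # mv replaces the file, so reapply the permissions used in this guide.
  chmod 640 "$cfg"
}

# Example (the path is a placeholder):
#   set_float16 /path/to/model/config.json
```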

Step 2: Deploy MindIE

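Load the prepared MindIE image into the local Docker image store. This guide assumes it is already loaded; you can check with docker images.

Copy the image's ID into the script below.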
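Looking up the ID can also be scripted. The helper below is a small sketch: `image_id_for` is a hypothetical name, and "mindie" is only an assumed fragment of the image name, so adjust it to whatever docker images shows on your host.

```shell
#!/bin/sh
# Hypothetical helper: read "ID REPOSITORY:TAG" lines on stdin and print the
# ID of the first image whose name contains the given substring.
image_id_for() {
  awk -v needle="$1" 'index($2, needle) { print $1; exit }'
}

# Example (requires Docker; "mindie" is an assumed part of the image name):
#   docker images --format '{{.ID}} {{.Repository}}:{{.Tag}}' | image_id_for mindie
```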

Create a script that starts the container, for example mindie.sh:

#!/bin/bash
# Adjust these three values to your environment.
container_name="Qwen2.5-70B"
image_name="d5a029763969"   # image ID taken from `docker images`
model_path="/data"          # host directory with the model weights, mounted at the same path

# The command below is one long line joined by trailing backslashes;
# a blank line in the middle would end it early and break the device list.
docker run -it --ipc=host --name="$container_name" --shm-size=500G --net=host --privileged=true \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --device=/dev/davinci4 \
  --device=/dev/davinci5 \
  --device=/dev/davinci6 \
  --device=/dev/davinci7 \
  --device=/dev/davinci_manager \
  --device=/dev/devmm_svm \
  --device=/dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /usr/local/sbin/:/usr/local/sbin/ \
  -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \
  -v /var/log/npu/slog/:/var/log/npu/slog \
  -v /var/log/npu/profiling/:/var/log/npu/profiling \
  -v /var/log/npu/dump/:/var/log/npu/dump \
  -v /var/log/npu/:/usr/slog \
  -v /etc/timezone:/etc/timezone:ro \
  -v "$model_path":"$model_path" \
  "$image_name" /bin/bash

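Adjust container_name, image_name, and model_path as needed. The --device=/dev/davinciN arguments control which NPUs are exposed to the container.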
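When only some of the cards should be passed through, the --device list can be generated instead of edited by hand. This is a minimal sketch; `device_args` is a hypothetical helper that relies only on the /dev/davinciN naming used in the script.

```shell
#!/bin/sh
# Hypothetical helper: print one --device flag per NPU index given as argument.
device_args() {
  args=""
  for id in "$@"; do
    args="$args --device=/dev/davinci$id"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${args# }"
}

# Example (not executed here): expose only cards 0 and 1.
# The substitution is left unquoted on purpose so each flag becomes its own word.
#   docker run -it $(device_args 0 1) --device=/dev/davinci_manager ... "$image_name" /bin/bash
```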

After saving, make the script executable:

chmod +x mindie.sh

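Run the script to start the container and enter it. If the container does not start automatically, start it with the docker command.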

./mindie.sh

If you are not dropped into the container automatically, use:

docker exec -it <container name or ID> /bin/bash

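Running the script may fail with an error if the file was saved with Windows line endings. In that case, open the file in VS Code, switch the line-ending setting in the lower-right corner from CRLF to LF, and save.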
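On a host without VS Code, the same conversion can be done from the shell. The sketch below strips carriage returns with tr; `to_lf` is a hypothetical helper name. A typical symptom of CRLF endings is an error such as "bad interpreter" mentioning `^M`.

```shell
#!/bin/sh
# Hypothetical helper: convert a file from CRLF to LF line endings in place.
to_lf() {
  file=$1
  # Write the cleaned content back with cat so the original file mode
  # (for example the executable bit) is preserved.
  tr -d '\r' < "$file" > "$file.tmp" && cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Example:
#   to_lf mindie.sh
```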

Step 3: Configure the model inside the container

Change to the service directory:

cd /usr/local/Ascend/mindie/latest/mindie-service

Edit the configuration file:

vi conf/config.json

For easy copying, here is the model path (skip this if you do not need it):

/root/.cache/modelscope/hub/models/Qwen/Qwen2___5-14B-Instruct

After saving, set the permissions on config.json:

chmod 640 conf/config.json

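To make repeated restarts convenient, the start and log commands come first; the full configuration follows further down.

Start MindIE in the background with nohup: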
nohup ./bin/mindieservice_daemon &

Follow the startup log:
tail -f nohup.out
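Instead of watching the log by hand, a script can poll nohup.out until a success marker appears. This is a minimal sketch: `wait_for_log` is a hypothetical helper, and the marker string "Daemon start success!" is an assumption about what the daemon prints, so check your own log output and substitute accordingly.

```shell
#!/bin/sh
# Hypothetical helper: wait until a fixed string shows up in a log file.
# Usage: wait_for_log LOG_FILE MARKER [TIMEOUT_SECONDS]
# Returns 0 once the marker is found, 1 if the timeout expires first.
wait_for_log() {
  log_file=$1
  marker=$2
  timeout=${3:-300}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -F treats the marker as a plain string, not a regular expression.
    if [ -f "$log_file" ] && grep -qF -- "$marker" "$log_file"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Example (the marker text is an assumption):
#   nohup ./bin/mindieservice_daemon &
#   wait_for_log nohup.out "Daemon start success!" 600 || echo "startup timed out" >&2
```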

If something goes wrong and a restart is needed, kill the process and start it again:

pkill -f mindieservice_daemon

The configuration file:

{
  "Version" : "1.0.0",
  "LogConfig" :
  {
    "logLevel" : "Info",
    "logFileSize" : 20,
    "logFileNum" : 20,
    "logPath" : "logs/mindie-server.log"
  },

  "ServerConfig" :
  {
    "ipAddress" : "0.0.0.0",
    "managementIpAddress" : "127.0.0.2",
    "port" : 1025,
    "managementPort" : 1026,
    "metricsPort" : 1027,
    "allowAllZeroIpListening" : false,
    "maxLinkNum" : 1000,
    "httpsEnabled" : false,
    "fullTextEnabled" : false,
    "tlsCaPath" : "security/ca/",
    "tlsCaFile" : ["ca.pem"],
    "tlsCert" : "security/certs/server.pem",
    "tlsPk" : "security/keys/server.key.pem",
    "tlsPkPwd" : "security/pass/key_pwd.txt",
    "tlsCrlPath" : "security/certs/",
    "tlsCrlFiles" : ["server_crl.pem"],
    "managementTlsCaFile" : ["management_ca.pem"],
    "managementTlsCert" : "security/certs/management/server.pem",
    "managementTlsPk" : "security/keys/management/server.key.pem",
    "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
    "managementTlsCrlPath" : "security/management/certs/",
    "managementTlsCrlFiles" : ["server_crl.pem"],
    "kmcKsfMaster" : "tools/pmt/master/ksfa",
    "kmcKsfStandby" : "tools/pmt/standby/ksfb",
    "inferMode" : "standard",
    "interCommTLSEnabled" : true,
    "interCommPort" : 1121,
    "interCommTlsCaPath" : "security/grpc/ca/",
    "interCommTlsCaFiles" : ["ca.pem"],
    "interCommTlsCert" : "security/grpc/certs/server.pem",
    "interCommPk" : "security/grpc/keys/server.key.pem",
    "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
    "interCommTlsCrlPath" : "security/grpc/certs/",
    "interCommTlsCrlFiles" : ["server_crl.pem"],
    "openAiSupport" : "vllm"
  },

  "BackendConfig" : {
    "backendName" : "mindieservice_llm_engine",
    "modelInstanceNumber" : 1,
    "npuDeviceIds" : [[0,1]],
    "tokenizerProcessNumber" : 8,
    "multiNodesInferEnabled" : false,
    "multiNodesInferPort" : 1120,
    "interNodeTLSEnabled" : true,
    "interNodeTlsCaPath" : "security/grpc/ca/",
    "interNodeTlsCaFiles" : ["ca.pem"],
    "interNodeTlsCert" : "security/grpc/certs/server.pem",
    "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
    "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
    "interNodeTlsCrlPath" : "security/grpc/certs/",
    "interNodeTlsCrlFiles" : ["server_crl.pem"],
    "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
    "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
    "ModelDeployConfig" :
    {
      "maxSeqLen" : 16384,
      "maxInputTokenLen" : 8192,
      "truncation" : false,
      "ModelConfig" : [
        {
          "modelInstanceType" : "Standard",
          "modelName" : "DeepSeek-R1-Distill-Qwen-7B",
          "modelWeightPath" : "/data/DeepSeek-R1-Distill-Qwen-7B",
          "worldSize" : 2,
          "cpuMemSize" : 5,
          "npuMemSize" : -1,
          "backendType" : "atb",
          "trustRemoteCode" : false
        }
      ]
    },

    "ScheduleConfig" :
    {
      "templateType" : "Standard",
      "templateName" : "Standard_LLM",
      "cacheBlockSize" : 128,

      "maxPrefillBatchSize" : 50,
      "maxPrefillTokens" : 8192,
      "prefillTimeMsPerReq" : 150,
      "prefillPolicyType" : 0,

      "decodeTimeMsPerReq" : 50,
      "decodePolicyType" : 0,

      "maxBatchSize" : 200,
      "maxIterTimes" : 8192,
      "maxPreemptCount" : 0,
      "supportSelectBatch" : false,
      "maxQueueDelayMicroseconds" : 5000
    }
  }
}
 

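Some values in this file must be changed for your deployment, and a few more when you deploy several models (the original post marks them in red and purple respectively).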

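The following values are examples only; adjust them to your environment.
"ipAddress" : ""   the service (business) IP address. The sample sets it to 0.0.0.0 while allowAllZeroIpListening is false; that combination is likely to be rejected at startup, so either use a concrete IP or enable allowAllZeroIpListening.
"httpsEnabled" : false   disables HTTPS communication
"npuDeviceIds" : [[0,1]]   which NPU cards to use (here cards 0 and 1)
"modelName" : "DeepSeek-R1-Distill-Qwen-7B"   model name
"modelWeightPath" : "/data/DeepSeek-R1-Distill-Qwen-7B"   path to the model weights. Because the startup script mounts the model directory at the same path as on the host, the host path can be used here, for example /root/.cache/modelscope/hub/models/Qwen/Qwen2___5-14B-Instruct, provided that directory is covered by model_path in mindie.sh.
"worldSize" : 2   number of cards used; must match the number of IDs in npuDeviceIds above
"maxSeqLen"   maximum sequence length
"maxInputTokenLen"   maximum number of input tokens
"maxIterTimes"   maximum number of output tokens

Recommended relationships between these values:
maxSeqLen = maxInputTokenLen + maxIterTimes (maximum sequence length)
maxPrefillTokens = maxInputTokenLen (maximum number of tokens in prefill)

"supportSelectBatch" : true   recommended (the sample configuration above sets it to false)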

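The relationships between these values are easy to get wrong when changing one of them, so a quick arithmetic check can help. The sketch below only encodes the rules stated in this guide; `check_lengths` and `check_world_size` are hypothetical helper names, and the values are passed in by hand rather than parsed from config.json.

```shell
#!/bin/sh
# Hypothetical helpers that check the rules given in this guide.

# Usage: check_lengths MAX_SEQ_LEN MAX_INPUT_TOKEN_LEN MAX_ITER_TIMES MAX_PREFILL_TOKENS
check_lengths() {
  max_seq=$1
  max_input=$2
  max_iter=$3
  max_prefill=$4
  [ "$max_seq" -eq $((max_input + max_iter)) ] || {
    echo "maxSeqLen should equal maxInputTokenLen + maxIterTimes" >&2
    return 1
  }
  [ "$max_prefill" -eq "$max_input" ] || {
    echo "maxPrefillTokens should equal maxInputTokenLen" >&2
    return 1
  }
}

# Usage: check_world_size WORLD_SIZE DEVICE_ID...
check_world_size() {
  world_size=$1
  shift
  # After the shift, $# is the number of device IDs.
  [ "$world_size" -eq "$#" ] || {
    echo "worldSize should equal the number of IDs in npuDeviceIds" >&2
    return 1
  }
}

# Example with the values from the sample configuration:
#   check_lengths 16384 8192 8192 8192 && check_world_size 2 0 1 && echo "config looks consistent"
```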