Local LLM with Nvidia P40 Tesla
This post describes my quest to enable an Nvidia Tesla P40 as the GPU accelerator for local LLMs.
How long it took, how much it cost, what it is capable of, and my overall satisfaction are all included.
TL;DR: The Nvidia Tesla P40 is a surprisingly capable GPU for 7B-parameter local LLMs. After ~$430 and a few weeks of troubleshooting (power, cooling, drivers), it now runs Ollama + Open-WebUI smoothly under Proxmox. Great value if you like tinkering.
How it began…
One day I was browsing a local auction site and saw a GPU for sale: an Nvidia Tesla P40. That sent me down a little rabbit hole of researching it and its capabilities.
I acquired it for about $140.
To be honest, I didn't plan on adding it to my homelab… but having it means I can play around with local LLMs and perhaps run a private ChatGPT of my own.
That seemed exciting.
Let's go.
Enter, Another PC
After digging through my spare components I found the following items:
- Intel Core i7 10700
- Crucial Pro DDR4 RAM 64GB Kit (2x32GB) 3200MHz
- Crucial P310 SSD 1TB M.2 2280 PCIe Gen4 NVMe
- Crucial MX500 500GB
- And this GPU…
Okay, okay… so I barely had half of the necessary parts to build a PC.
At that point my homelab was “Mini” focused. I had three mini PCs and a NAS; it was pretty compact.
Understandably, this had to change now, because this GPU is massive compared to a mini PC.
Before continuing, it is worth noting that this was my first ever PC build.
The Case
The decision was to find an SFF case, and the choice fell on the
Cooler Master Q300L
It's just a cute case, not too large, and has nice airflow potential. To be honest, I am very happy with this decision.
It was about $50.
I was looking for a white one but wasn't able to find it. It had to be black.
The Motherboard
The choice I made was dreadful, but it was my choice regardless.
I took the cheapest second-hand one.
Asus H510M-R
What I realized it was missing:
- only 2 RAM slots
- no NVMe slots
- only one fan header besides the CPU fan header
- no RGB support
- no USB-C support
It was about $40.
The Power Supply
For the PSU I wanted at least a Gold-rated unit with 700 W+.
I got a Cooler Master V750 750W 80+ GOLD for about $73, second-hand.
This was a good choice.
The Fans
I bought a pair of Cooler Master MasterFan MF120 Halo White Edition fans.
Now there are two reasons why these weren't my favorite:
- They were white. Initially, the plan was an all-white PC, but since the white case wasn't available, white fans in a black case didn't fit at all.
- My motherboard does not support RGB, so the RGB cables from these fans would just dangle.
But if they're going to cool a GPU that has no fan of its own, why would I object? It was $25 for a pair.
There was a little issue: my mobo, unfortunately, has only 2 fan headers (CHA and CPU), so I had to purchase an Arctic Case Fan Hub to support multiple fans.
Another $10 spent.
The Assembly
I got all the parts for the PC and it was ready to assemble. Very exciting!
I even invited a buddy to chill and assemble the PC with me. PC assembly chilling is underrated!
The assembly went smoothly and it was super fun. We got the PC up and running quite quickly.
Last but not least was the GPU!
Once we plugged in the power connector from the PSU and started the PC, it would shut off almost immediately with a loud click.
We did that a couple of times and afterwards declared the GPU dead.
In hindsight, the Gold-rated PSU reacted well and prevented the PC from turning on.
But this was too easy a point to give up at! I decided to research a bit more and got hold of this article, where I saw mentions of a certain adapter. I looked a bit closer and, after some more research, realized I needed a Dual PCIe 2x8P PCI-E 8-Pin Female to ATX CPU 8-Pin Male adapter. Here's a link to the one I ordered. It cost $8.
The Dual PCIe 2x8P PCI-E 8Pin Female to ATX CPU 8 P-in Male adapter connector
Okok, so this was quite interesting. New finding alert!
It turns out that PSUs have both PCIe and ATX CPU power connectors, and the Nvidia Tesla P40 uses an 8-pin EPS power connector, which mates with the ATX CPU plug rather than the PCIe one.
Most PSUs have only one 8-pin ATX CPU (usually 2x4-pin) power connector, reserved for, you guessed it, the CPU.
Essentially, what this adapter does is combine two PCIe 8-pin connectors into one 8-pin ATX CPU plug, which fits the Tesla P40's 8-pin EPS connector quite smoothly.
[!IMPORTANT] Power Connector Trap: The P40 uses an EPS 8-pin connector, not PCIe. You must use a “Dual PCIe 2×8-pin to ATX 8-pin CPU” adapter or it won’t power on.
Back to the Assembly
It took a month for the connector to arrive. As soon as I got my hands on it, I tried it and the PC turned on with the GPU installed!
Amazing!
LXC passthrough madness
Admittedly, this is where I likely lost a couple of days of my life.
I installed Proxmox directly on bare metal so I could virtualize for different use cases, since I now had a beefy PC RAM-wise plus some experience with it. I spun up an LXC and installed Docker for Open WebUI and Ollama.
It's tricky to hit the right Tesla P40 driver version from the get-go, and it is not as streamlined as on Windows. Mind you, I had been using Windows my whole life until recently. I had to do a lot of digging. Thankfully, I went through it so hopefully others don't have to.
Why it’s trickier than it looks
- CUDA needs **/dev/nvidia-uvm**; nvidia-smi does not. Missing that one device file produces the infamous cuInit -> 999.
- LXC has its own security layers (cgroup, seccomp, AppArmor). Miss one rule and CUDA falls apart.
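Both failure modes are easy to verify up front once the host driver from Step 1 is in place. Here is a minimal host-side probe, assuming python3 is installed on the Proxmox host; it's the same cuInit check that Step 5 runs later inside Docker:
# nvidia and nvidia_uvm should both appear
lsmod | grep nvidia
# All three device nodes must exist before CUDA can work
ls -l /dev/nvidiactl /dev/nvidia0 /dev/nvidia-uvm
# 0 means CUDA initialized fine; 999 usually means /dev/nvidia-uvm is missing
python3 -c "import ctypes; print('cuInit ->', ctypes.CDLL('libcuda.so.1').cuInit(0))"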
Step 1 – Set up the driver on the host (Proxmox)
# Add NVIDIA repo for Debian 12
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)   # debian12
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/nvidia-keyring.gpg  | gpg --dearmor -o /usr/share/keyrings/nvidia-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nvidia-archive-keyring.gpg]   https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /"   >/etc/apt/sources.list.d/nvidia-cuda.list
apt-get update
# Install the 560 series driver and kernel modules
apt-get -y install nvidia-driver-560
Make sure the modules and device nodes appear every boot:
# 1. Load modules automatically
echo -e "nvidia\nnvidia_modeset\nnvidia_uvm" >/etc/modules-load.d/nvidia.conf
# 2. Create /dev nodes when nvidia_uvm comes up
cat >/etc/udev/rules.d/90-nvidia-uvm.rules <<'EOF'
KERNEL=="nvidia-uvm", RUN+="/usr/bin/nvidia-modprobe -u -c=0"
EOF
udevadm control --reload
# 3. Bring them up right now (so we don't have to reboot)
modprobe nvidia && modprobe nvidia_uvm
nvidia-modprobe -u -c=0
Quick check:
ls -l /dev/nvidia-uvm*
You should see two character devices (nvidia-uvm and nvidia-uvm-tools; major 511 on my build).
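One caveat: the major numbers are not guaranteed to match across driver builds, and the cgroup rules in Step 2 need them in decimal. A quick way to read them off the host before hardcoding anything (195/241/511 are simply what my build uses):
# ls prints "major, minor" in place of the size for character devices;
# those decimals feed the lxc.cgroup2.devices.allow lines in the next step
ls -l /dev/nvidiactl /dev/nvidia0 /dev/nvidia-uvm /dev/nvidia-uvm-tools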
Step 2 – Create the privileged LXC
- Debian 12 template
- Privileged (unprivileged: 0)
- Whatever CPU/RAM/disk you like
GPU bits in _/etc/pve/lxc/801.conf_
# Bind the GPU nodes
lxc.mount.entry: /dev/nvidiactl        dev/nvidiactl        none bind,optional,create=file
lxc.mount.entry: /dev/nvidia0          dev/nvidia0          none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm       dev/nvidia-uvm       none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
# Allow their majors
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 241:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
# Relax LXC security (simplest)
lxc.apparmor.profile: unconfined
lxc.seccomp.profile:  unconfined
Restart the container:
pct restart 801
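Before installing anything inside it, a quick sanity check that the bind mounts actually came through; a sketch, with 801 being the container ID from the config above:
# The four NVIDIA device nodes should be visible inside the container,
# with the same major numbers as on the host
pct exec 801 -- sh -c 'ls -l /dev/nvidia*'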
Step 3 – Inside the container: userspace driver, Docker & NVIDIA toolkit
apt-get update
VER=560.35.05-1   # same build as the host
# Userspace driver libs and tools
apt-get install -y --no-install-recommends   nvidia-alternative=$VER nvidia-driver-bin=$VER   libcuda1=$VER libnvidia-ml1=$VER nvidia-smi=$VER
# Docker Engine + compose plugin
apt-get install -y ca-certificates curl gnupg
install -d -m 0755 /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo  "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg]  https://download.docker.com/linux/debian bookworm stable"  >/etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# NVIDIA Container Runtime
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey  | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/amd64  | sed 's|^deb |deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] |'  >/etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit
Tell Docker to use it:
/etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": ["--security-opt=seccomp=unconfined"]
    }
  }
}
systemctl restart docker
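A quick smoke test that Docker really picked up the NVIDIA runtime before composing anything; the CUDA image tag is just the one reused from Step 5:
# Should print the same table as nvidia-smi on the host; if it errors out,
# recheck /etc/docker/daemon.json and the toolkit install above
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi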
Step 4 – Compose Ollama + Open‑WebUI
/opt/open-webui/docker-compose.yml
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    runtime: nvidia
    gpus: all
    privileged: true                # IPC_LOCK + unlimited memlock
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
      USE_CUDA: "1"
      TORCH_DEVICE: cuda
      OLLAMA_DEBUG: "1"
    volumes:
      - ollama-data:/root/.ollama
    restart: unless-stopped
  webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on: [ollama]
    ports:
      - "3000:8080"
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
    volumes:
      - webui-data:/app/backend/data
    restart: unless-stopped
volumes:
  ollama-data:
  webui-data:
docker compose pull
docker compose up -d
The Ollama log should now announce the GPU.
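To see it earn its keep, pull a small model and watch the VRAM fill up. A sketch with example model tags (any quantized 7B-class model sits comfortably in the P40's 24 GB):
# Run a 7B-class model inside the ollama container (the model tag is just an example)
docker exec -it ollama ollama run llama3.1:8b "Say hello from the Tesla P40"
# In another shell: memory.used should jump by a few GB while it answers
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv
# The Ollama log should mention CUDA and the Tesla P40
docker logs ollama 2>&1 | grep -iE 'cuda|tesla|gpu'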
Step 5 – Sanity‑check the GPU
docker run --rm --gpus all \
  --cap-add IPC_LOCK --ulimit memlock=-1 \
  --security-opt seccomp=unconfined \
  nvidia/cuda:12.4.1-base-ubuntu22.04 \
  bash -c "apt-get update && apt-get install -y python3-minimal && \
           python3 - <<'PY'
import ctypes
print('cuInit ->', ctypes.CDLL('libcuda.so.1').cuInit(0))
PY"
You want cuInit -> 0.
Step 6 – Keep an eye on temps
watch -n1 nvidia-smi
# If it sits above 80 °C under load:
nvidia-smi -pl 150
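That -pl cap resets on reboot. If 150 W turns out to be the sweet spot, a tiny oneshot unit can reapply it at boot; a minimal sketch (the unit name is mine, and 150 W sits well under the P40's stock 250 W limit):
/etc/systemd/system/nvidia-powerlimit.service
[Unit]
Description=Cap Tesla P40 power draw at boot
After=multi-user.target
[Service]
Type=oneshot
# Persistence mode keeps the driver initialized, then apply the cap
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/usr/bin/nvidia-smi -pl 150
[Install]
WantedBy=multi-user.target
Enable it with systemctl daemon-reload && systemctl enable --now nvidia-powerlimit.service.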
The Almost Final Setup
After these steps it was finally working. Almost. The next problem was temperature spikes, which were a totally new challenge to handle.
The Temperatures
With the two Cooler Master fans I had, I was easily hitting 80+°C after two prompts, and it was coming down very slowly. I repositioned them many times, making sure to have at least one intake and one exhaust (the necessity of which was a new thing I learned), but the temperatures stayed consistently above 80 °C.
Challenge to keep those temps down accepted.
The Noctua NF-F12 Industrial PPC-3000 PWM
I figured a faster, stronger fan would be a good solution here.
Thus, I managed to pick up three Noctua NF-F12 Industrial fans for $45.
Automatic NVIDIA‑Driven Case‑Fan Control on Linux
Below is a fun script I made that automatically raises the case-fan speed based on GPU temperature.
1 Install tiny requirements
apt update
apt install -y lm-sensors fancontrol nano
sensors-detect         # press <Enter> at every prompt
reboot                 # log back in as root
2 Identify the CHA header (if unsure)
for ch in /sys/class/hwmon/hwmon2/pwm[0-9] ; do   # pwm1, pwm2, … (skips the *_enable files)
  echo "Testing $ch"
  echo 1   > "${ch}_enable"    # manual mode
  echo 255 > "$ch" ; sleep 2   # full blast
  echo  80 > "$ch" ; sleep 1   # slow
done
Take note of the pwmX file that spins all case fans.
Assumed below: **/sys/class/hwmon/hwmon2/pwm1**.
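The hwmon index can shuffle between kernels and reboots, so it's worth confirming which node is the motherboard's fan controller before hardcoding hwmon2. A quick sketch:
# Each hwmon node exposes a chip name; the board's Super-I/O fan controller
# is typically something like nct67xx or it87, not nvidia/coretemp/nvme
for h in /sys/class/hwmon/hwmon*; do
  printf '%s -> %s\n' "$h" "$(cat "$h"/name)"
done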
3 Create the control script
nano /usr/local/sbin/gpu_casefan.sh
Paste:
#!/usr/bin/env bash
set -euo pipefail
FAN=/sys/class/hwmon/hwmon2/pwm1      # change if different
echo 1 > "${FAN}_enable"              # force manual mode
BASE=60        # PWM at ≤35 °C  (idle)
MAX=255        # PWM at ≥70 °C  (full)
LO=35          # ramp start (°C)
HI=70          # ramp end   (°C)
INTERVAL=3     # seconds between polls
while true; do
  T=$(/usr/bin/nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits)
  if   (( T <= LO )); then PWM=$BASE
  elif (( T >= HI )); then PWM=$MAX
  else
    PWM=$(( BASE + (MAX-BASE)*(T-LO)/(HI-LO) ))
  fi
  echo "$PWM" > "$FAN"
  sleep "$INTERVAL"
done
Save (Ctrl + O, Enter) → exit (Ctrl + X):
chmod +x /usr/local/sbin/gpu_casefan.sh
4 Add a systemd unit
nano /etc/systemd/system/gpu-casefan.service
Paste:
[Unit]
Description=Case‑fan hub follows NVIDIA GPU temperature
After=multi-user.target
[Service]
Type=simple
ExecStart=/usr/local/sbin/gpu_casefan.sh
Restart=always
RestartSec=2
[Install]
WantedBy=multi-user.target
Save & exit.
5 Enable and start the service
systemctl daemon-reload
systemctl enable --now gpu-casefan.service
systemctl status gpu-casefan.service   # should be *active (running)*
6 Live monitoring
watch -n1 'printf "GPU °C: "; /usr/bin/nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits; printf " | PWM: "; cat /sys/class/hwmon/hwmon2/pwm1'
Launch a GPU load (stress-ng --gpu 1, Blender, a game) and watch PWM ramp from 60 → 255 as GPU heat rises.
7 Tweaks
| What | Line to change | Example |
|---|---|---|
| Quieter idle | BASE= | BASE=45 |
| More aggressive | LO= or MAX= | LO=30, MAX=255 |
| Pause service | systemctl stop gpu-casefan.service | — |
| Return to BIOS curve | echo 5 > /sys/class/hwmon/hwmon2/pwm1_enable | — |
That’s all—no compilers, no sudo, ten minutes to GPU‑aware case‑fans!
Bonus: tedious script for monitoring
#!/usr/bin/env bash
# gpu-mon.sh – aligned GPU dashboard for a passive-cooled Tesla
set -euo pipefail
# ── column headers & visible widths ────────────────────────────────────────
COLS=(TIME TEMP  PWR  UTIL FAN PWM MEM_USED/TOT PSTATE GPU_NAME           DRIVER)
WIDTH=(  8    6    7    5   6  11     14         6     20               11)
repeat() { printf '%*s' "$1" '' | tr ' ' '-'; }
# ── query nvidia-smi once for core metrics ─────────────────────────────────
FIELDS=temperature.gpu,power.draw,utilization.gpu,memory.used,memory.total,\
pstate,name,driver_version
row=$(nvidia-smi --query-gpu="$FIELDS" --format=csv,noheader,nounits | head -n1)
IFS=',' read -r TEMP POWER UTIL MEMUSED MEMTOT PSTATE GPU_NAME DRIVER <<<"$row" || true
for v in TEMP POWER UTIL MEMUSED MEMTOT PSTATE DRIVER; do
  declare "$v=${!v//[[:space:]]/}"
done
GPU_NAME=$(sed 's/^ *//;s/ *$//' <<<"$GPU_NAME")
# ── fan hub RPM & PWM (%) ---------------------------------------------------
FAN='--'; for f in /sys/class/hwmon/hwmon*/fan*_input; do
  [[ -r $f ]] || continue; rpm=$(<"$f")
  [[ $rpm =~ ^[0-9]+$ && $rpm -gt 0 ]] && { FAN=$rpm; break; }
done
PWM='--'; for p in /sys/class/hwmon/hwmon*/pwm1; do
  [[ -r $p ]] || continue; raw=$(<"$p")
  [[ $raw =~ ^[0-9]+$ ]] || continue; PWM="$raw($((raw*100/255))%)"; break
done
# ── assemble raw cell values ----------------------------------------------
VALS=(
  "$(date +%H:%M:%S)"
  "${TEMP:-?}°C"
  "${POWER:-?}W"
  "${UTIL:-?}%"
  "$FAN"
  "$PWM"
  "${MEMUSED:-?}/${MEMTOT:-?}"
  "$PSTATE"
  "$GPU_NAME"
  "$DRIVER"
)
# ── colour for TEMP (4 levels) ---------------------------------------------
c0='' c1=''
if [[ -t 1 && -z ${NO_COLOR-} ]]; then
  bold=$(tput bold); reset=$(tput sgr0)
  case ${TEMP:-0} in
    ''|[0-5][0-9])         c0=$(tput setaf 2) ;;                # green
    [6][0-9]|7[0-4])        c0=$(tput setaf 3) ;;                # yellow
    7[5-9]|8[0-4])          c0="${bold}$(tput setaf 3)" ;;       # orange-ish
    *)                      c0=$(tput setaf 1) ;;                # red
  esac
  c1="$reset"
fi
# ── draw header & borders ---------------------------------------------------
border='+' header='|'
for i in "${!COLS[@]}"; do
  border+="$(repeat $((WIDTH[i]+2)))+"
  printf -v cell "%-${WIDTH[i]}s" "${COLS[i]}"
  header+=" ${cell} |"
done
printf '%s\n%s\n%s\n' "$border" "$header" "$border"
# ── data row ---------------------------------------------------------------
printf '|'
for i in "${!VALS[@]}"; do
  [[ $i -eq 1 && -n $c0 ]] \
    && printf ' %b%-*s%b |' "$c0" "${WIDTH[i]}" "${VALS[i]}" "$c1" \
    || printf ' %-*s |'      "${WIDTH[i]}" "${VALS[i]}"
done
printf '\n%s\n' "$border"
chmod +x /your/data/gpu-mon/gpu-mon.sh
watch -n1 -c /your/data/gpu-mon/gpu-mon.sh
Buuuut, Temperatures are High Again
These fans were not enough. The temperature took a bit longer to spike, but it still kept reaching 80+°C and was quite slow to come down.
Essentially, the entire setup was not usable.
The Attachable Custom 3D-Printed Fan
The next solution I tried was an attachable 3D-printed case with a fan that sits on top of the GPU. The argument is that a single, strong enough fan blowing directly into the GPU brings the temperature down much more easily.
It's not really just an argument; based on online reviews, it is the most battle-tested solution.
First I ordered an AVC BAZA laptop fan (link), but its connector was not compatible with the Arctic Fan Hub, which would have required ordering yet another adapter or rewiring. I gave up on that one. $18 went to dust.
After some more browsing on AliExpress I found the entire 3D-printed case plus a fan with the appropriate connector - link. This was about $30.
And this worked!
[!IMPORTANT] The combination of the attachable 3D-printed fan and the temperature-control script kept the temperature to a maximum of 60 °C!
The Closer
There are three things that I got from this project.
- Less money.
- A lot of knowledge.
- An AI workstation at home.
Less Money
Yes, it is a good idea to review how much money was spent. Remember, people: tracking expenses is an important part of personal finance.
- Nvidia P40 Tesla GPU - $140
- Cooler Master Q300L Case - $50
- Asus H510M-R Motherboard - $40
- Cooler Master V750 750W 80+ GOLD PSU - $73
- Cooler Master MasterFan MF120 Halo White Edition Fans x2 - $25
- Dual PCIe 2x8P PCI-E 8-Pin Female to ATX CPU 8-Pin Male Adapter - $8
- Noctua NF-F12 Industrial PPC-3000 PWM x3 - $45
- AVC BAZA Notebook Fan - $18
- Attachable Custom 3D-Printed Fan - $30
In total I've spent $429. Wow.
A lot of Knowledge
Those $429 are totally worth it for the knowledge I gained. To name a few:
- PC Assembly
- Power supply pin purposes
- Fan Intake / Exhaust positioning and purpose
- Proxmox passthrough
- Nvidia Drivers on Linux
- Fan speed management on Linux
- GPU Maintenance
- Hosting of Local LLM
AI Workstation at Home
To be honest, this GPU will not give you the performance of ChatGPT. It is simply too old a GPU.
However, it provides great performance for 7B models and decent performance for 14B models.
There are some light use cases I use it for nowadays, but it is pretty limited.
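For reference, the models I keep loaded look roughly like this (the tags are just examples; quantized 7B and 14B models are the range this card handles well):
# Pull a couple of models into the ollama container and list what's cached
docker exec -it ollama ollama pull mistral:7b
docker exec -it ollama ollama pull qwen2.5:14b
docker exec -it ollama ollama list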
In the end, the P40 didn’t just give me a GPU — it gave me a crash course in hardware, Linux, and patience.
It’s not a plug-and-play experience, but if you like challenges, the Tesla P40 is a surprisingly solid entry into the world of local AI computing.