Audience: New users and system administrators seeking a high-level overview of NeuralDrive.

Introduction

NeuralDrive is a specialized Linux distribution designed to turn any x86_64 computer into a high-performance, headless Large Language Model (LLM) inference server. By booting directly from a LiveUSB or LiveCD, you can deploy a complete AI stack—including GPU drivers, runtimes, and web interfaces—without modifying your existing operating system or performing complex manual installations.

How to Use This Documentation

This guide is structured to take you from initial hardware selection to advanced model management. Throughout the manual, you will encounter audience badges that indicate the technical depth of specific sections:

  • [User]: General concepts and web interface usage.
  • [Admin]: Network configuration, security settings, and hardware management.
  • [Developer]: API integration and custom image building.

If you are ready to begin, proceed directly to the Quick Start guide.

Version Note

This documentation covers NeuralDrive based on Debian 12 (Bookworm). It includes support for the latest stable releases of Ollama, Open WebUI, and major GPU compute stacks including NVIDIA CUDA 12.x and AMD ROCm 6.x.

Audience: Users and decision-makers looking for a conceptual overview of the NeuralDrive platform.

What is NeuralDrive

Running large language models locally often requires significant Linux expertise, including managing complex GPU driver installations, orchestrating runtimes like Ollama or llama.cpp, and configuring network security. NeuralDrive removes these barriers by providing a pre-configured, bootable environment that transforms any compatible hardware into a dedicated LLM server.

Overview

NeuralDrive is a Debian 12-based LiveCD/LiveUSB distribution that boots entirely into RAM. It provides a headless inference environment that is ready to use in less than two minutes. By automating hardware detection and driver loading, it ensures that your NVIDIA, AMD, or Intel GPU is immediately available for model acceleration.

Key Features

  • Rapid Deployment: Boot-to-inference in under two minutes.
  • Hardware Autoprobe: Automatic detection and configuration for NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) hardware.
  • Standardized API: Fully OpenAI-compatible API available at port 8443, allowing immediate connection with popular coding agents and tools.
  • Web-Based Management: Access a feature-rich dashboard via Open WebUI for model downloading and interactive chatting.
  • Local Management: A Python-based Textual TUI is available on the local console for system status and network configuration.
  • Persistence Support: Configurations and downloaded models survive reboots when using a USB drive with a designated "persistence" partition.
  • Customizable: Includes a toolkit for users to build their own customized system images.

Architecture

NeuralDrive uses a layered approach to ensure stability and performance across different hardware configurations.

+-------------------------------------------------------+
|                    User Interfaces                    |
|   (Open WebUI Dashboard :443 / TUI Console / API)     |
+-------------------------------------------------------+
|                    Security Layer                     |
|    (Caddy Reverse Proxy / nftables / Bearer Auth)     |
+-------------------------------------------------------+
|                    Runtime Stack                      |
|        (Ollama / llama.cpp for advanced users)        |
+-------------------------------------------------------+
|                    GPU Compute Layer                  |
|        (NVIDIA CUDA / AMD ROCm / Intel oneAPI)        |
+-------------------------------------------------------+
|                   Operating System                    |
|          (Debian 12 / SquashFS / OverlayFS)           |
+-------------------------------------------------------+
|                    Boot Media                         |
|             (LiveUSB / LiveCD / ISO)                  |
+-------------------------------------------------------+

Design Goals

The development of NeuralDrive is guided by specific performance and usability targets.

Priority | Goal | Measure
-------- | ---- | -------
P0 | Boot-to-inference <2 min | First API response within 120 s of power-on
P0 | GPU auto-detection | NVIDIA, AMD, and Intel GPUs work without manual driver installs
P0 | OpenAI-compatible API | Popular coding agents connect out of the box
P1 | Multiple concurrent models | Load and unload different models without system restarts
P1 | USB persistence | Downloaded models and system configs survive reboots
P2 | Web dashboard | Full remote management via browser
P3 | Custom image toolkit | Provide tools for users to build and sign their own images

Use Cases

NeuralDrive is designed for environments where privacy, simplicity, and performance are paramount.

  • Home Labs: Run private LLMs on consumer hardware without cluttering your primary OS.
  • Developer Workstations: Quickly spin up an inference server to test local AI-integrated applications.
  • Small Offices: Provide a shared, local AI resource for a small team over a local network.
  • Air-Gapped Environments: Deploy AI capabilities to systems with restricted or no internet access by pre-loading models onto the USB media.

What NeuralDrive is Not

To maintain its focus as a specialized inference tool, NeuralDrive excludes several common features found in other platforms:

  • Not a Cloud Service: NeuralDrive is local software that runs on your hardware; no data is sent to external servers unless you explicitly configure it.
  • Not Docker-Based: The entire stack runs directly on the OS for maximum performance and reduced overhead.
  • Not a Desktop OS: NeuralDrive is a headless server distribution. While it provides a web interface, it does not include a traditional desktop environment like GNOME or KDE.

Audience: Users planning hardware deployments for NeuralDrive inference servers.

Hardware Requirements

NeuralDrive is designed to run on a wide range of x86_64 hardware, but performance varies significantly depending on your CPU, system RAM, and GPU.

Minimum Requirements

The following specifications are suitable for running smaller models (up to 7B parameters) with Q4 quantization.

Component | Minimum Specification
--------- | ---------------------
CPU | x86_64 with AVX2 support
System RAM | 8 GB
GPU | Optional; 6 GB VRAM recommended for acceleration
Storage | 16 GB USB 3.0 flash drive

For high-performance inference using larger models (13B to 70B parameters) or concurrent model loading, the following hardware is recommended.

Component | Recommended Specification
--------- | -------------------------
CPU | x86_64 with AVX-512 support
System RAM | 32 GB – 64 GB
GPU | 24 GB+ VRAM (NVIDIA RTX 3090/4090 or AMD RX 7900 XTX)
Storage | 128 GB+ USB 3.0 or external SSD for model persistence

GPU Compatibility Matrix

NeuralDrive automatically detects and configures drivers for major GPU vendors.

Vendor | Generation | Example Hardware | Driver Type | Compute Stack | Status
------ | ---------- | ---------------- | ----------- | ------------- | ------
NVIDIA | Ada Lovelace | RTX 4090 | Proprietary 535+ | CUDA 12.x | Supported
NVIDIA | Ampere | RTX 3060 | Proprietary 535+ | CUDA 12.x | Supported
NVIDIA | Pascal | GTX 1080 | Proprietary 535+ | CUDA 12.x | Supported
AMD | RDNA 3 | RX 7900 XTX | amdgpu + ROCm | ROCm 6.x | Supported
AMD | RDNA 2 | RX 6800 XT | amdgpu + ROCm | ROCm 6.x | Supported
Intel | Arc | A770 | compute-runtime | oneAPI | Experimental
None | CPU | Any x86_64 | N/A | AVX2/AVX-512 | Supported

Model Size Cheat Sheet

Use this guide to determine if your hardware can support specific model sizes.

  • 3B Models: 8 GB System RAM.
  • 8B Models: 16 GB System RAM, 8 GB VRAM.
  • 70B Models: 64 GB System RAM, 24 GB+ VRAM.
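The cheat sheet above follows a simple rule of thumb: a quantized model needs roughly parameters × bits-per-weight ÷ 8 bytes of memory, plus overhead for the KV cache and runtime. A hedged sketch (the estimate_vram_gb helper is illustrative, not a NeuralDrive command; the 20% overhead factor is an assumption):

```shell
# Rough memory estimate for a quantized model:
#   bytes ≈ parameters * bits_per_weight / 8, plus ~20% runtime overhead.
estimate_vram_gb() {
  # $1 = parameter count in billions, $2 = bits per weight (e.g. 4 for Q4)
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}
estimate_vram_gb 8 4    # 8B model at Q4: ~4.8 GB
estimate_vram_gb 70 4   # 70B model at Q4: ~42.0 GB
```

The results line up with the cheat sheet: an 8B Q4 model fits in 8 GB of VRAM, while a 70B model needs a 24 GB+ GPU with the remainder spilling to system RAM.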

Important Hardware Notes

UEFI and Secure Boot

NeuralDrive's hybrid ISO supports both legacy BIOS and UEFI boot modes. However, the proprietary NVIDIA drivers may require you to disable Secure Boot or enroll a Machine Owner Key (MOK) during the first boot. If the GPU is not detected on an NVIDIA system, verify your Secure Boot status in the BIOS/UEFI settings.
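From a running Linux system you can check the boot mode before troubleshooting further. This is a hedged sketch (the boot_mode helper is illustrative, not shipped with NeuralDrive); the /sys/firmware/efi directory only exists when the kernel was booted via UEFI:

```shell
# Report whether the current system was booted via UEFI or legacy BIOS.
boot_mode() {
  # $1 is an optional filesystem root so the check can be pointed anywhere;
  # with no argument it inspects the running system.
  local root=${1:-}
  if [ -d "$root/sys/firmware/efi" ]; then
    echo "UEFI"
  else
    echo "BIOS"
  fi
}
boot_mode
```

On Debian-based systems, `mokutil --sb-state` (from the shim tooling) additionally reports whether Secure Boot is enabled.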

USB Media Selection

For the best experience, use a USB 3.0 or faster flash drive. If you plan to maintain a large library of high-parameter models, booting from an external SATA or NVMe SSD via a USB enclosure is strongly recommended for faster load times.

Audience: Users ready to install and deploy NeuralDrive for the first time.

Quick Start Guide

This guide describes how to flash NeuralDrive to a USB drive and start your first inference server.

Prerequisites

  • NeuralDrive ISO file.
  • A USB flash drive (16 GB or larger).
  • A target computer with x86_64 architecture and compatible GPU.

Step 1: Flash the USB Drive

The method depends on which operating system you are using to write the USB drive.

Linux — Use the automated flash script for the simplest experience:

sudo ./scripts/neuraldrive-flash.sh neuraldrive.iso /dev/sdX

This writes the image and creates the persistence partition in one step.

macOS — Use dd with macOS device paths:

diskutil list                                          # find your USB (e.g., /dev/disk4)
diskutil unmountDisk /dev/diskN
sudo dd if=neuraldrive.iso of=/dev/rdiskN bs=4m status=progress
diskutil eject /dev/diskN

Windows — Use Rufus to write the ISO to your USB drive.

Any platform — Balena Etcher provides a graphical interface that works on Linux, macOS, and Windows.

[!NOTE] On macOS and Windows, the persistence partition cannot be created during flashing. NeuralDrive will detect this on first boot and offer to set it up automatically. For full details on each method and persistence setup, see Writing the USB Drive.

Step 2: Boot from USB

  1. Insert the USB drive into the target machine.
  2. Power on the machine and access the BIOS/UEFI boot menu (typically by pressing F12, F11, or ESC).
  3. Select the NeuralDrive USB device and press Enter.

Step 3: Complete First-Boot Wizard

NeuralDrive will automatically launch a six-step wizard to configure your server. Key steps include:

  1. Set the administrative password.
  2. Generate your API key.
  3. Configure Wi-Fi or Ethernet settings.
  4. Set up the local storage and persistence partition.

Step 4: Record the IP Address

Once the wizard is complete, the local console (TUI) will display the system's IP address and mDNS hostname (default: neuraldrive.local). Note this address for remote access.

Step 5: Access the Web Dashboard

  1. Open a web browser on a different computer on the same network.
  2. Navigate to https://<IP-ADDRESS>/ (or https://neuraldrive.local/).
  3. You will receive a self-signed certificate warning; accept it to proceed.

Step 6: Log In

Log in using the administrative credentials you created during the first-boot wizard.

Step 7: Pull a Model

  1. Navigate to the model management section of the dashboard.
  2. Enter the name of a model (e.g., llama3.1) and click Pull.
  3. Once the download is complete, you can begin chatting or using the API.

[!TIP] Connecting a coding agent? See the Connecting Coding Agents guide for API details.

[!NOTE] Booting from CD? If you are using read-only media, see CD Mode vs USB Mode for details on RAM-only operation.
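Once a model is pulled, the OpenAI-compatible API can be exercised directly. The sketch below is hedged: the /v1/chat/completions path follows the OpenAI API convention, port 8443 is the API port stated in this manual's feature list, and the model name and environment variable are examples. The -k flag accepts the self-signed certificate.

```shell
# Build an OpenAI-style chat completion request body and sanity-check it.
BODY='{"model": "llama3.1", "messages": [{"role": "user", "content": "Hello!"}]}'
printf '%s' "$BODY" | python3 -m json.tool >/dev/null && echo "request body is valid JSON"

# With a running NeuralDrive server, send it like this:
#   curl -k https://neuraldrive.local:8443/v1/chat/completions \
#     -H "Authorization: Bearer $NEURALDRIVE_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$BODY"
```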

This chapter is for everyone.

Writing the USB Drive

NeuralDrive is distributed as a hybrid ISO image. This format allows the same file to be written to USB drives or burned to physical optical media. Because NeuralDrive runs as a live system, your choice of writing method determines whether your settings and models persist across reboots.

WARNING: Writing the NeuralDrive ISO to a USB drive or disk destroys all existing data on that device. Ensure you have backed up any important files before proceeding.

The most reliable way to create a NeuralDrive USB is using the provided neuraldrive-flash.sh script. This script automates the dd write process and the creation of the required persistence partition in a single step.

  1. Locate the script in the scripts/ directory of the NeuralDrive repository.
  2. Identify your USB device path (e.g., /dev/sdX or /dev/nvmeXn1).
  3. Run the script with root privileges:
sudo ./scripts/neuraldrive-flash.sh neuraldrive.iso /dev/sdX

The script performs the following actions:

  • Writes the ISO image to the device using dd (bs=4M, conv=fsync).
  • Automatically executes prepare-usb.sh to configure the persistence layer.
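If you want to confirm that the write succeeded, a read-back checksum comparison works on any Linux system. This is a hedged sketch (verify_flash is not part of NeuralDrive): it hashes the ISO, reads back exactly that many bytes from the device, and compares.

```shell
# Verify a flashed device by comparing SHA-256 hashes of the ISO and the
# ISO-sized prefix of the device (the device is usually larger than the image).
verify_flash() {
  local iso=$1 dev=$2
  local bytes hash_iso hash_dev
  bytes=$(stat -c %s "$iso")
  hash_iso=$(sha256sum "$iso" | awk '{print $1}')
  hash_dev=$(dd if="$dev" bs=1M count="$bytes" iflag=count_bytes status=none \
             | sha256sum | awk '{print $1}')
  if [ "$hash_iso" = "$hash_dev" ]; then
    echo "OK: image verified"
  else
    echo "MISMATCH: re-flash the drive"
  fi
}
# Usage: verify_flash neuraldrive.iso /dev/sdX
```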

Manual Writing Options

If you cannot use the automated script, choose one of the following manual methods based on your operating system.

Linux: Manual dd

  1. Identify your USB device:

    lsblk
    

    Look for your USB drive by size. It will appear as /dev/sdX or /dev/nvmeXn1. Do not use a partition path like /dev/sdb1 — use the whole-disk device.

  2. Unmount any mounted partitions on the device:

    sudo umount /dev/sdX*
    
  3. Write the ISO to the USB device:

    sudo dd if=neuraldrive.iso of=/dev/sdX bs=4M conv=fsync status=progress
    
  4. Initialize the persistence partition:

    sudo /usr/lib/neuraldrive/prepare-usb.sh /dev/sdX
    

    The prepare-usb.sh script creates an ext4 partition labeled "persistence" and writes the necessary persistence.conf file to enable union mounts.
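In the standard Debian live-boot format, a full-filesystem overlay is declared with a single line in persistence.conf (this is the documented live-boot syntax; the file prepare-usb.sh writes may add options):

```
/ union
```

On boot, live-boot finds the partition labeled "persistence", reads this file, and overlays the partition on the root filesystem via OverlayFS.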

macOS: dd with diskutil

macOS uses different device paths and a slightly different dd syntax.

  1. Identify your USB device:

    diskutil list
    

    Look for your USB drive by size. It will appear as /dev/diskN (e.g., /dev/disk4). Do not use a partition path like /dev/disk4s1 — use the whole-disk device.

  2. Unmount the USB drive (this does not eject it):

    diskutil unmountDisk /dev/diskN
    
  3. Write the ISO using the raw device (rdiskN) for significantly faster writes:

    sudo dd if=neuraldrive.iso of=/dev/rdiskN bs=4m status=progress
    

    Note: macOS dd uses lowercase 4m (not 4M), and conv=fsync is not supported. The raw device path /dev/rdiskN bypasses the buffer cache and is roughly 10x faster than /dev/diskN.

  4. Eject the drive:

    diskutil eject /dev/diskN
    
  5. Persistence partition: The prepare-usb.sh script requires Linux tools (sfdisk, mkfs.ext4) and cannot run directly on macOS. To set up persistence, choose one of:

    • Boot NeuralDrive first: Boot the USB on the target machine. On first boot, the system will detect the missing persistence partition and offer to create it.
    • Use the Docker builder: Pass the USB device into the builder container and run the script there.
    • Use any Linux machine: Mount the USB on a Linux system and run sudo /usr/lib/neuraldrive/prepare-usb.sh /dev/sdX.

Windows: Rufus

Rufus is a free, open-source tool for writing ISO images on Windows.

  1. Download and run Rufus.
  2. Under Device, select your USB drive.
  3. Under Boot selection, click SELECT and choose the neuraldrive.iso file.
  4. Set Partition scheme to GPT and Target system to UEFI.
  5. Click START and wait for the write to complete.

Note: Rufus may offer to write in "ISO Image mode" or "DD Image mode." Either mode works. If you encounter boot issues, try DD Image mode.

Persistence partition: Rufus does not create the NeuralDrive persistence partition. After flashing, set up persistence using one of the methods described in the macOS section above (boot the target machine, use a Linux system, or use the Docker builder).

Cross-Platform GUI: Balena Etcher

Balena Etcher works on Linux, macOS, and Windows.

  1. Download and install Balena Etcher.
  2. Click Flash from file and select the neuraldrive.iso file.
  3. Click Select target and choose your USB drive.
  4. Click Flash and wait for the write and verification to complete.

Important: Balena Etcher does not create the persistence partition. Follow the persistence setup instructions for your platform described above.

Ventoy

NeuralDrive is compatible with Ventoy. Simply copy the neuraldrive.iso file to your Ventoy-enabled USB drive. Note that persistence setup via Ventoy may require additional manual configuration not covered by the standard prepare-usb.sh script.

Partition Layout

After a successful flash and persistence setup, your USB drive will have the following partition structure:

Partition | Label | Type | Filesystem | Size | Purpose
--------- | ----- | ---- | ---------- | ---- | -------
1 | EFI | EFI System | FAT32 | 512 MiB | UEFI boot files
2 | NBOOT | Linux | ext2 | 1 GiB | GRUB and kernel images
3 | NSYSTEM | Linux | SquashFS | ~8 GiB | Read-only root filesystem
4 | persistence | Linux | ext4 | Remaining | Persistent storage for models and settings

Verification

To verify your USB drive is ready:

  1. Ensure the drive is recognized by your system.
  2. Check that the "persistence" partition is present and labeled correctly.
  3. Confirm that the drive appears as a bootable entry in your system's UEFI boot menu.

Note on CD/DVD Burning

NeuralDrive can be burned to physical media using any standard ISO burning tool (such as Brasero on Linux or Disk Utility on macOS). Note that physical discs are read-only; persistence features will not be available, and all data will be stored in RAM.

For instructions on what to do after your USB is ready, see First Boot.

This chapter is for everyone.

First Boot

Setting up NeuralDrive for the first time requires a local keyboard and monitor. Once the initial configuration is complete, you can manage the system entirely over your network.

Booting NeuralDrive

Insert your NeuralDrive USB and power on your hardware. Access your system's boot menu (usually via F12, F11, or Esc) and select the USB drive. You will see the GRUB boot menu with several options:

  • NeuralDrive (Normal): The standard boot mode. Enables data persistence and hardware acceleration via nvidia-drm.modeset=1.
  • NeuralDrive (Safe Mode): Use this if you encounter display or boot errors. This mode disables modesetting (nomodeset), advanced interrupt controllers (noapic), and persistence.
  • NeuralDrive (CD Mode - RAM Only): Loads the entire system into memory (toram). Useful for testing or when using physical optical media where persistence is not possible.
  • NeuralDrive (Debug): Provides detailed boot information by setting systemd.log_level=debug. Use this when troubleshooting startup failures.

Boot Sequence Overview

NeuralDrive follows a structured startup process:

  1. GRUB: Loads the initial bootloader and kernel parameters.
  2. live-boot: Initializes the Debian live environment and mounts the persistent partition.
  3. systemd: Starts core system services.
  4. GPU Detect: Automatically identifies NVIDIA, AMD, or Intel hardware and loads appropriate drivers.
  5. Services: Launches the internal API, system monitor, and web interface.
  6. TUI: Displays the final status screen.

Once the boot process is complete, the console displays a ready message with your system's IP address:

NeuralDrive is ready! Dashboard: https://192.168.x.x/

First-Boot Wizard

If the system has not been initialized, a Text User Interface (TUI) wizard will start automatically. The wizard runs as part of the TUI application, checking for a sentinel file on startup. You must complete these six steps to prepare your server:

  1. Welcome: Introductory screen with hardware summary and system health check.
  2. Storage/Persistence: Detects your USB boot device and creates an ext4 persistence partition on unused space. This step also creates required directories under /var/lib/neuraldrive/ (ollama, models, config, webui, logs).
  3. Security: Sets the administrator password and configures system credentials.
  4. Network: Configure your network connection, including Wi-Fi (if applicable) and IP assignment (DHCP or static).
  5. Models: Select initial LLM models to download based on your hardware capabilities.
  6. Done: Final completion summary and display of system credentials.

Re-running the Wizard

If you need to reset your configuration, run neuraldrive-tui --wizard from the console. This command removes the sentinel file and forces the wizard to run again on the next TUI launch.

Write Down Your Credentials

At the end of the wizard, your final credentials and the dashboard URL will be displayed. Record these immediately, as they are required for logging into the web dashboard.

System Initialization Files

NeuralDrive uses a sentinel file to track its state:

  • /etc/neuraldrive/first-boot-complete: Confirms the user setup wizard has been finished.

Once this file is present, the system will boot directly to the ready state.
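The sentinel check can be reproduced from any shell. This is an illustrative sketch (the wizard_pending helper is hypothetical; the path is the one NeuralDrive uses):

```shell
# Report setup state based on the first-boot sentinel file.
wizard_pending() {
  # $1 optionally overrides the sentinel path, for testing.
  local sentinel=${1:-/etc/neuraldrive/first-boot-complete}
  if [ -f "$sentinel" ]; then
    echo "setup complete"
  else
    echo "wizard will run on next TUI launch"
  fi
}
wizard_pending
```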

Next step: Web Dashboard

This chapter is for everyone.

Web Dashboard

NeuralDrive's web-based interface consists of two applications: the main Chat Dashboard and the administrative System Panel. Both are accessible through your local network.

Accessing the Dashboard

Once your server is booted and configured, navigate to the following URL in your web browser: https://<SERVER_IP>/ or https://neuraldrive.local/

NeuralDrive uses a secure Caddy web server to proxy all traffic over port 443. Log in using the administrator credentials created during the First Boot wizard.

Chat Dashboard (Open WebUI)

The primary interface for NeuralDrive is Open WebUI. It provides a robust environment for managing and interacting with large language models. Key features include:

  • Model Management: Download, update, and switch between various models.
  • RAG Integration: Upload documents to use as context for your conversations.
  • Multi-user Support: Admins can create and manage additional user accounts. Registration is disabled by default for security (ENABLE_SIGNUP=false).
  • Custom Branding: The interface is pre-configured with NeuralDrive branding and dark mode.

System Panel

For hardware monitoring and service management, NeuralDrive includes a custom System Panel. This FastAPI application runs separately from the chat interface.

  • Main System Panel: https://<SERVER_IP>:8443/system/
  • GPU Monitoring: https://<SERVER_IP>:8443/monitor/

The System Panel provides real-time data on your hardware performance, thermal status, and running services. This separation ensures that even if a heavy inference task impacts the Chat Dashboard, you can still monitor your hardware health.

Two-Application Architecture

NeuralDrive uses a dual-app architecture to balance user interaction and system reliability:

  1. Open WebUI: Dedicated to chat, models, and user management.
  2. System Panel: Dedicated to low-level hardware monitoring and service health.

This design allows for maximum uptime and precise control over your local inference environment.

Next step: Chat Interface

This chapter is for everyone.

Chat Interface

The chat interface is your primary tool for interacting with the AI models running on your NeuralDrive server. It is built to support fluid, real-time conversations.

Starting a New Conversation

To begin a new chat session, click the "New Chat" button located in the sidebar. This opens a fresh workspace where you can enter prompts and receive responses.

Selecting a Model

NeuralDrive allows you to choose which model handles your query.

  1. Locate the model dropdown menu at the top of the chat interface.
  2. Select your desired model from the list of installed options.
  3. You can switch models mid-conversation without needing to reload the page or start over.

Streaming Responses

NeuralDrive features real-time token streaming. As the AI generates a response, the text will appear on your screen immediately. This provides a dynamic experience and allows you to begin reading the output before the complete response is generated.

Conversation History

Your chat history is automatically saved to your NeuralDrive's persistent storage.

  • Storage Location: All session data is stored in /var/lib/neuraldrive/webui/.
  • Retrieval: You can access previous conversations from the sidebar at any time.
  • Concurrent Chats: NeuralDrive supports multiple simultaneous chat sessions, allowing you to manage different tasks or projects independently.

The interface is optimized for high performance, ensuring that even with multiple active sessions, your interaction remains responsive.
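Because all session data lives under /var/lib/neuraldrive/webui/, chat history can be backed up with standard tools. A hedged sketch (backup_webui is illustrative, not a NeuralDrive command; run it as root on the live system):

```shell
# Archive the web UI data directory from persistent storage.
backup_webui() {
  # $1 = data root (default /var/lib/neuraldrive), $2 = destination directory.
  local src=${1:-/var/lib/neuraldrive} dest=${2:-.}
  tar -czf "$dest/webui-backup-$(date +%F).tar.gz" -C "$src" webui
}
# Usage on the live system:
#   backup_webui /var/lib/neuraldrive /media/backup
```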

For more information on the broader system management, return to the Web Dashboard chapter.

This chapter is for everyone.

Model Management via Web

The Open WebUI provides a rich, graphical interface for managing your LLMs. This is the recommended method for most users who are connected to the NeuralDrive network.

Accessing the Models Page

Once logged into the Open WebUI, you can find the model management tools in the settings or administration area, typically under a Models tab.

Viewing Downloaded Models

The Models page displays a list of all models currently stored on your NeuralDrive. For each model, you can see:

  • Name and Tag: (e.g., llama3.1:8b)
  • Size: The disk space occupied by the model.
  • Quantization: The precision level of the weights.
  • Last Used: When the model was last loaded for a conversation.

Pulling New Models

To download a new model from the official registry:

  1. Locate the input field titled "Pull a model from Ollama.com".
  2. Enter the model string (e.g., mistral:7b).
  3. Click the download/pull button.
  4. A progress bar will appear. You can navigate away from the page, and the download will continue in the background.

Deleting Models

If you need to free up storage space:

  1. Find the model you wish to remove in the list.
  2. Click the trash can or delete icon associated with that model.
  3. Confirm the deletion when prompted.

Note that deleting a model from the web UI is permanent and removes the files from the /var/lib/neuraldrive/models/ directory.

Model Details and Customization

Clicking on an individual model in the list shows more detailed metadata, including the Modelfile used to create it. Advanced users can use this interface to create custom Modelfiles: versions of base models with a specific system prompt or parameters (such as temperature and top-k) pre-configured.
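As an illustration, a minimal Modelfile that bakes in a low temperature and a custom system prompt uses standard Ollama Modelfile syntax (the base model name and values here are examples):

```
FROM llama3.1:8b
SYSTEM "You are a concise technical assistant."
PARAMETER temperature 0.2
PARAMETER top_k 40
```

Saving this creates a derived model that behaves like the base model with these settings applied by default.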

This chapter assumes familiarity with RAG concepts.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a technique that allows an LLM to access and reference specific information from your own documents during a conversation. This effectively gives the model "long-term memory" and access to data it wasn't originally trained on.

How RAG Works in NeuralDrive

When you upload a document, NeuralDrive processes it through several steps:

  1. Parsing: The text is extracted from the file (PDF, TXT, DOCX, etc.).
  2. Chunking: The text is broken down into smaller, manageable pieces.
  3. Embedding: Each chunk is converted into a numerical vector that represents its semantic meaning.
  4. Storage: These vectors are stored in a local vector database on your persistent partition.

When you ask a question in a RAG-enabled chat, the system searches the vector database for chunks that are mathematically similar to your query and provides them to the LLM as context.
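The chunking step (step 2) can be illustrated with a toy sketch. Real RAG pipelines split on token or sentence boundaries with configurable overlap; this hedged example (chunk_text is not a NeuralDrive tool) only shows the fixed-size-with-overlap idea:

```shell
# Split text into fixed-size chunks that overlap, so context spanning a
# chunk boundary is not lost. Uses bash substring expansion.
chunk_text() {
  # $1 = chunk size, $2 = overlap, $3 = input text
  local size=$1 overlap=$2 text=$3
  local step=$((size - overlap)) i=0 len=${#text}
  while [ "$i" -lt "$len" ]; do
    printf '%s\n' "${text:i:size}"
    i=$((i + step))
  done
}
chunk_text 10 3 "retrieval-augmented generation"
```

Each printed chunk would then be embedded and stored as one vector.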

Using RAG in Conversations

To use your documents in a chat:

  1. Upload Documents: Use the Documents page in the Open WebUI to upload your files.
  2. Select for Chat: In the chat interface, you can select specific documents or entire collections to be used as context for your current session.
  3. Querying: Simply type your question. The model will analyze the provided document context and generate a response based on those facts.

Limitations on LiveUSB

While RAG is a powerful feature, there are important considerations when running from a LiveUSB:

  • Processing Power: Embedding large documents is a CPU- and GPU-intensive task. Processing a PDF that runs to hundreds of pages may take several minutes.
  • Persistence: Ensure your documents are stored on the persistence partition if you want them to remain available after a reboot.
  • Storage Space: Vector databases can grow significantly in size. Monitor your Storage Management closely if you plan to index large libraries of documents.

This chapter is for everyone.

Terminal Interface (TUI)

The NeuralDrive Terminal User Interface (TUI) provides a keyboard-driven dashboard for system monitoring and management. Built using Python 3.11 and the Textual framework, it offers a responsive and high-performance way to interact with your system directly from the console.

When to Use the TUI

While the web interface is excellent for daily chat and document management, the TUI is better suited for:

  • Initial Configuration: Checking network status and IP addresses.
  • System Monitoring: Real-time tracking of CPU, GPU, and VRAM usage.
  • Troubleshooting: Reviewing system logs and restarting services if they become unresponsive.
  • Offline Use: Managing models and chatting without needing a second device to access the web UI.

Accessing the TUI

By default, the TUI auto-launches on tty1 for the neuraldrive-admin user. If you are at the physical console of the NeuralDrive machine, it should be the first thing you see after boot.

To run it manually from any shell session, use the command:

neuraldrive-tui

This launcher script (installed at /usr/local/bin/neuraldrive-tui) activates the Python virtual environment and runs the application from /usr/lib/neuraldrive/tui/main.py.

Main Dashboard Layout

The main dashboard provides a high-level overview of system health and active models.

┌──────────────── NeuralDrive v1.0.0 ───────────────── 10:45:22 ─┐
│ Host: neuraldrive.local    │ Uptime: 2h 15m │ IP: 192.168.1.50 │
├────────────────────────────────────────────────────────────────┤
│ GPU: NVIDIA RTX 4090  │ VRAM: 12.4/24.0 GB │ Temp: 65°C │ 85%  │
│ CPU: 12%              │ RAM: 18.2/64.0 GB  │ Disk: 45.2 GB     │
├────────────────────────────────────────────────────────────────┤
│ LOADED MODELS                                                  │
│ ● llama3.1:8b        [GPU] 4.7 GB                              │
│ ● codestral:latest   [GPU] 8.2 GB                              │
│ ○ phi3:mini           ---  (not loaded)                        │
├────────────────────────────────────────────────────────────────┤
│ F1 Dashboard  F2 Models  F3 Services  F4 Logs  F5 Chat  Q Quit │
└────────────────────────────────────────────────────────────────┘

Navigation is performed using function keys:

  • F1: Dashboard overview.
  • F2: Models screen for managing downloads and loading state.
  • F3: Services screen for restarting or stopping system components.
  • F4: Logs screen for real-time system and service logs.
  • F5: Chat screen for a lightweight, terminal-based LLM chat.
  • Q: Quit the TUI and return to the shell.

Within each screen, the following navigation model is used:

  • Tab / Shift+Tab: Cycle focus between different screen zones.
  • Arrow Keys: Navigate within a focused zone (e.g., scrolling a list).
  • Enter: Activate the currently focused element or button.

Troubleshooting and Debugging

If the TUI encounters a critical error, it will write a crash dump to /var/lib/neuraldrive/logs/tui-crash-*.log.

Screenshots captured within the TUI are saved to /var/lib/neuraldrive/screenshots/.

Re-running the First-Boot Wizard

If you need to force the first-boot wizard to run again, launch the TUI with the --wizard flag:

neuraldrive-tui --wizard

This removes the sentinel file and initiates the guided setup process.

Resilience

The TUI is designed to be resilient. If the underlying Ollama service is unavailable, an "Ollama Offline" badge will appear on the dashboard, and certain model management features will be disabled until the service is restored via the Services screen.
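The "Ollama Offline" badge implies a liveness probe along these lines. This is a hedged sketch (check_ollama is illustrative; Ollama's documented default API listens on localhost port 11434 and answers plain HTTP requests):

```shell
# Probe the local Ollama API and report its state.
check_ollama() {
  # $1 optionally overrides the probe URL, for testing.
  local url=${1:-http://127.0.0.1:11434/}
  if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
    echo "online"
  else
    echo "offline"
  fi
}
check_ollama
```

When the probe fails, the TUI disables model management actions until the service is restarted from the Services screen.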

This chapter is for everyone.

TUI Dashboard

The Dashboard is the central monitoring hub of NeuralDrive. It is designed to provide immediate visual feedback on the health and workload of your system.

Layout and Sections

The dashboard is divided into three functional areas:

1. Header and System Info

Located at the very top, this section displays the version of NeuralDrive, the current hostname, system uptime, and the primary IP address. A live system clock is displayed in the upper-right corner, showing the exact time of the last data refresh.

2. Hardware Resource Monitor

This section provides real-time metrics for your hardware.

  • GPU: Displays the detected GPU model name, total VRAM capacity, driver version, and CUDA compute capability.

  • VRAM: Shows the current VRAM usage (e.g., 12.4/24.0 GB) and a percentage bar.

  • Temp: Current GPU temperature in Celsius.

  • CPU: Real-time CPU utilization percentage.

  • RAM: System memory usage (e.g., 18.2/64.0 GB).

  • Disk: Total disk space used on the persistence partition (e.g., 45.2 GB).

  • Refresh Rate: Hardware metrics refresh every 2 seconds. You can press R at any time to trigger a manual refresh of all dashboard data.

3. Loaded Models List

This list displays the models currently residing in memory and ready for immediate inference.

  • Status Indicator: A solid circle (●) indicates the model is currently loaded in memory. An open circle (○) indicates the model is cached on disk but not currently loaded.

  • Backend: Shows if the model is running on the [GPU] or [CPU].

  • VRAM Footprint: The amount of memory the model is currently occupying.

  • Refresh Rate: The model list metrics refresh every 10 seconds.

Interaction

The Dashboard is accessed via F1 from any other screen. While it is primarily for monitoring, you can transition to other management screens using the function keys (F2-F5) shown at the bottom. Use the R key to manually refresh the displayed information.

This chapter is for everyone.

Model Management via TUI

The Model Management screen allows you to download, load, unload, and delete LLMs directly from the terminal.

Access

Press F2 from any screen to access Model Management.

Layout

The screen is organized into three zones that you can navigate between using Tab or Shift+Tab:

  1. Installed Models list (top zone): A scrollable list of models currently on your system.
  2. Browse Catalog button (middle zone): Opens a popup to browse the Ollama library.
  3. Pull by name (bottom zone): A text input field and a Pull button for direct model downloads.

Installed Models List

Each model in the list displays its details in a columnar format. A legend header with | separators appears above the list: Model name | Params | Quant | Disk | VRAM | Status

  • Model name: The name of the model (e.g., llama3:8b).
  • Params: Parameter count of the model.
  • Quant: Quantization level.
  • Disk: Space occupied on disk.
  • VRAM: Measured or cached VRAM usage (e.g., "6.2 GB" or "~6.2 GB").
  • Status: Current state of the model ("loaded (GPU)", "loaded (CPU)", or "ready").
Navigation

  • Tab / Shift+Tab: Cycle focus between the three zones (models → browse → pull-input → pull-btn).
  • Up / Down arrows: Navigate through the installed model list. The view scrolls automatically to follow your focus.
  • Left / Right arrows: Navigate between the action buttons (Load/Unload/Delete) for the currently selected model. The cursor automatically skips disabled buttons.
  • Enter: Activate the focused button or zone.
  • PageUp / PageDown: Fast scroll through the model list.

Model Actions

Each model has specific action buttons:

  • Load: Loads the model into VRAM for inference. The status will show "Loading..." while in progress. Loaded models use a keep_alive: -1 setting for infinite retention.
  • Unload: Removes the model from VRAM. The system polls the engine until the unload is confirmed.
  • Delete: Permanently removes the model from disk. A confirmation prompt will appear before deletion.

Downloading Models

Browse Catalog

Selecting the Browse Catalog button opens a scrollable popup listing popular models from the Ollama library. Select a model from the list and confirm to start the download.

Pull by Name

To download a specific model, type its name (e.g., llama3:8b) into the text input field in the bottom zone and press Enter or click the Pull button.

A progress bar will show the download status. You can press Escape or the Cancel button to abort an active download.

VRAM Management

VRAM usage values are measured during operation or retrieved from a cache stored in /var/lib/neuraldrive/config/.

Multiple models can be installed and loaded simultaneously. The underlying engine manages VRAM using an LRU (Least Recently Used) eviction policy when OLLAMA_MAX_LOADED_MODELS is set to 0 (auto).
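To see what the engine itself reports as loaded, you can query Ollama's native /api/ps endpoint directly. The sketch below is illustrative: the endpoint and its name/size_vram fields come from the public Ollama API, but the formatting helper and the localhost default are assumptions.

```python
import json
import urllib.request


def format_gb(size_bytes: int) -> str:
    """Render a byte count as decimal GB with one decimal, as the TUI does."""
    return f"{size_bytes / 1e9:.1f} GB"


def list_loaded_models(host: str = "http://localhost:11434"):
    """Return (name, VRAM usage) pairs for models currently in memory."""
    with urllib.request.urlopen(f"{host}/api/ps") as resp:
        data = json.load(resp)
    return [
        (m["name"], format_gb(m.get("size_vram", 0)))
        for m in data.get("models", [])
    ]


if __name__ == "__main__":
    for name, vram in list_loaded_models():
        print(f"{name}: {vram}")
```

Run this on the NeuralDrive console (or over SSH) to cross-check the VRAM figures shown in the TUI.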

This chapter is for everyone.

Service Control

The Services screen provides a centralized interface for managing the background system processes that power NeuralDrive.

Access

Press F3 from any screen to access Service Control.

Layout

The screen displays a scrollable list of services. Each service is represented by a ServiceItem widget showing the service name, its current status, and a set of action buttons.

Services Managed

The TUI allows you to manage critical NeuralDrive components, including:

  • neuraldrive-ollama: The core model execution engine.
  • neuraldrive-webui: The browser-based user interface.
  • Any other configured system services specific to the NeuralDrive distribution.
Navigation

  • Up / Down arrows: Navigate between the different services in the list.
  • Left / Right arrows: Navigate between the action buttons (Start/Stop/Restart) for the currently selected service. The focus will automatically skip buttons that are disabled based on the service's current state.
  • Enter: Activate the focused action button.

Action Buttons

Each service has three colored action buttons that enable or disable dynamically:

  • Start (green): Starts a service that is currently stopped or inactive.
  • Stop (red): Gracefully shuts down a running service.
  • Restart (amber): Stops and immediately restarts a running service. This is often the quickest way to resolve minor connectivity or performance issues.

Auto-Refresh and Monitoring

The status of all services is automatically polled every 5 seconds to ensure the interface reflects the actual state of the system.

If a service like neuraldrive-ollama shows a failed or inactive status, use the action buttons to restore it. Continuous monitoring and manual control through this screen help maintain the overall health of your NeuralDrive instance.

This chapter is for everyone.

System Logs

The System Logs screen provides a real-time view of internal events and service outputs. This is an essential tool for diagnosing issues with model loading, network connectivity, or user authentication. Access this screen by pressing F4 from any screen.

Logs Screen Interface

┌──────────────── System Logs ─────────────────────────────────┐
│ Service: [All Services] │ Level: [Info+] │ Search: [        ]│
├──────────────────────────────────────────────────────────────┤
│ 10:15:30 [OLLAMA] Loaded llama3.1:8b successfully            │
│ 10:15:35 [WEBUI] Admin user logged in from 192.168.1.5       │
│ 10:16:12 [OLLAMA] Error: Connection closed by remote peer    │
├──────────────────────────────────────────────────────────────┤
│ [S]elect Service  [F]ilter  [C]lear  [B]ack                 │
└──────────────────────────────────────────────────────────────┘

Filtering and Navigation

The logs can be voluminous, so the TUI provides several ways to narrow down the information:

  • S (Select Service): Cycles through specific services (e.g., OLLAMA, WEBUI, SYSTEM) or displays all services combined.
  • F (Filter): Adjusts the minimum severity level of logs displayed (e.g., Info, Warning, Error).
  • Search: Allows you to type a keyword to highlight or filter lines containing that string.
  • C (Clear): Clears the current view of logs. Note that this does not delete the logs from the disk, only from the current TUI session.

Understanding Log Entries

Each log entry includes a timestamp, the source service name in brackets, and the message content.

  • [OLLAMA]: Records model pull progress, loading/unloading events, and inference errors.
  • [WEBUI]: Tracks user logins, API requests, and document processing events.
  • [SYSTEM]: General OS-level events, including disk space warnings and network changes.
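The entry format above is regular enough to parse programmatically, which is handy when collecting log lines for a bug report. A minimal sketch, with the exact layout assumed from the example screen:

```python
import re
from typing import Optional

# Assumed layout: "HH:MM:SS [SERVICE] message", as shown in the logs screen.
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[([A-Z]+)\] (.*)$")


def parse_log_line(line: str) -> Optional[dict]:
    """Split a TUI log line into timestamp, source service, and message."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    ts, service, message = m.groups()
    return {"time": ts, "service": service, "message": message}
```

For example, parsing the error line from the screen above yields service "OLLAMA" and the message text you would paste into a support request.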

If you encounter an error you don't understand, the specific message provided in the logs is the most useful piece of information to include in a support request or bug report.

Press B or Back to return to the main dashboard.

This chapter is for everyone.

Local Terminal Chat

For quick testing and offline interaction, NeuralDrive includes a lightweight, terminal-based chat interface. This allows you to communicate with your local models without needing a web browser or a network connection.

Launching the Chat

Access the local chat by pressing F5 from any screen.

Using the Chat Interface

  1. Model Selection: A model selector dropdown at the top of the screen lets you choose which installed model to chat with. The selected model persists even when switching away and returning to the chat screen.
  2. Messaging: Type your message into the input field at the bottom of the screen. Press Enter to send.
  3. Streaming Responses: The model's response will stream directly into the terminal window in real-time.
  4. Keyboard Shortcuts:
    • F1-F4: Switch to another TUI screen (Dashboard, Models, Services, or Logs).
    • Ctrl+C: Interrupt the current response generation.

Features and Limitations

The TUI chat is designed for simplicity and speed. You must have at least one model downloaded and loaded to use the chat interface.

  • Fast & Lightweight: Minimal resource overhead compared to the full web UI.
  • Persistent Context: The chat maintains a basic conversation history within the current session, allowing for follow-up questions.
  • Streaming: Responses appear as they are generated, providing immediate feedback.

Limitations compared to the Web UI:

  • No Multimedia: Does not support images, file uploads, or complex markdown rendering.
  • Single Session: Conversation history is not saved across TUI restarts.
  • No RAG: The local chat cannot access your uploaded documents; for Retrieval-Augmented Generation, use the Web Interface or refer to the RAG chapter.

This chapter is for everyone.

Understanding Models

Large Language Models (LLMs) are the core engines that power NeuralDrive. These models are probabilistic systems trained on vast datasets to predict and generate human-like text, code, and reasoning. NeuralDrive uses Ollama to manage these models, providing a streamlined experience for running them locally without external dependencies.

Model Formats and GGUF

NeuralDrive exclusively uses the GGUF (GPT-Generated Unified Format) format for model storage and execution. GGUF is designed for efficient loading and performance on both CPUs and GPUs. It packs the model weights, configuration, and vocabulary into a single file, making it highly portable.

Parameter Counts

The "size" of a model is often described by its parameter count, typically denoted in billions (B). Parameters are the internal variables the model learned during training.

  • 3B (Small): Extremely fast and lightweight. Ideal for basic text processing, classification, or running on hardware with limited VRAM (6GB or less). Examples: phi3:mini, qwen2.5:3b.
  • 8B (Medium): The current sweet spot for local deployment. Offers a strong balance of reasoning capability and speed. Fits comfortably on 8GB-12GB VRAM cards. Examples: llama3.1:8b.
  • 13B (Large): Provides deeper reasoning and better instruction following. Requires 12GB+ VRAM for optimal performance.
  • 70B (Very Large): Top-tier performance comparable to many commercial cloud models. Requires significant hardware resources (24GB+ VRAM for quantized versions). Examples: llama3.1:70b.

Quantization

Quantization is a compression technique that reduces the precision of model weights (e.g., from 16-bit floats to 4-bit integers). This significantly lowers the VRAM and storage requirements with minimal impact on output quality.

NeuralDrive supports several quantization levels:

  • Q4_K_M: The standard recommendation. It offers the smallest file size and fastest inference of the three, with only a modest loss in output quality.
  • Q5_K_M: A balanced option for users who want slightly higher quality than Q4 without the massive overhead of Q8.
  • Q8_0: The highest quality available, preserving almost all original model precision. It requires much more VRAM and storage.

Naming Convention

Models in NeuralDrive follow a specific naming string: <name>:<tag>.

  • Name: The base model family (e.g., llama3.1, codestral).
  • Tag: Specifies the version or size (e.g., 8b, latest, mini).

When you pull a model like llama3.1:8b, NeuralDrive downloads the specific version optimized for local execution.

Size vs Resources

A model's resource consumption is determined by its parameter count and quantization level. A larger parameter count requires more VRAM to "fit" the model for processing, while higher-precision quantization levels (e.g., Q8_0 versus Q4_K_M) increase the memory footprint of a model with the same parameter count.
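As a back-of-the-envelope check, you can estimate the weight footprint as parameter count times bits per weight, plus an allowance for the KV cache and runtime buffers. The overhead figure below is a rough assumption, and models that exceed available VRAM may be partially offloaded to system RAM:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights plus a fixed allowance for the
    KV cache and runtime buffers (the overhead figure is a guess)."""
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)


# An 8B model at ~4.5 bits/weight (roughly Q4_K_M) needs about 6 GB:
print(estimate_vram_gb(8, 4.5))
```

This is only a sanity check; actual usage depends on context length and the engine's allocation strategy.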

Before downloading, check Model Recommendations to ensure your hardware can support your desired model.

This chapter is for everyone.

Downloading Models

NeuralDrive provides four primary methods for downloading and managing LLMs. Regardless of the method used, all models are stored in a central location on the persistent partition of your drive.

1. Web Interface (Open WebUI)

The most user-friendly method is through the Open WebUI.

  • Navigate to the Models page in the web interface.
  • Enter the name of the model you wish to pull (e.g., llama3.1:8b).
  • Click the download icon.
  • You can monitor the download progress directly in the web UI.

2. Terminal User Interface (TUI)

The TUI provides a fast, keyboard-driven way to manage models without opening a browser.

  • From any screen, press F2 to enter the Models screen.
  • Tab to the Pull by name input in the bottom zone.
  • Enter the model string and press Enter.
  • The TUI displays a progress bar and allows you to cancel if needed.

3. Command Line Interface (CLI)

For power users and automated scripts, you can use the Ollama CLI directly from any terminal session.

ollama pull <model_name>

Example:

ollama pull qwen2.5:3b

4. API Request

NeuralDrive's underlying Ollama service exposes an API that can be used to programmatically trigger downloads.

curl -X POST http://localhost:11434/api/pull -d '{"name": "phi3:mini"}'

Storage Location and Progress

All model data is stored in the following path on the persistence partition: /var/lib/neuraldrive/models/

Specifically:

  • Blobs: Actual model weights are stored in /var/lib/neuraldrive/models/blobs/.
  • Manifests: Metadata about the models is stored in /var/lib/neuraldrive/models/manifests/.

When a download is in progress, NeuralDrive tracks the state and prevents simultaneous downloads of the same model. If a download is interrupted, it can typically be resumed by initiating the pull command again. Ensure you have sufficient disk space before starting a large download by checking the Storage Management guide.
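If you script downloads, a quick free-space check against the persistence partition avoids starting a pull that cannot complete. A minimal sketch; the reserve figure is an arbitrary safety margin, not a NeuralDrive setting:

```python
import shutil

MODELS_PATH = "/var/lib/neuraldrive/models"


def has_room_for(model_size_gb: float, path: str = MODELS_PATH,
                 reserve_gb: float = 5.0) -> bool:
    """True if the partition holding `path` can absorb the download while
    keeping a safety reserve free for logs and document storage."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb - model_size_gb >= reserve_gb
```

Call this with the approximate download size (shown in the Ollama library) before issuing the pull.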

This chapter is for everyone.

Model Recommendations

Selecting the right model depends on your hardware capabilities, specifically your Video RAM (VRAM) and CPU performance. Running a model that exceeds your hardware limits will result in extremely slow response times or failure to load.

VRAM Requirements

The most critical factor for performance is the amount of VRAM available on your GPU. The following table provides recommendations for models based on standard VRAM tiers.

  VRAM     Recommended Models
  6 GB     qwen2.5:3b, phi3:mini
  8 GB     llama3.1:8b
  12 GB    codestral:latest
  24 GB+   llama3.1:70b (Q4_K_M)

CPU-Only Execution

If your system lacks a compatible GPU, NeuralDrive can run models on the CPU. While this is significantly slower, it is still functional for many tasks.

  • Minimum: AVX2 support is required.
  • Preferred: AVX-512 support provides a noticeable speed boost for CPU inference.
  • Recommendation: Stick to smaller models (3B or 8B) for a better experience when running on CPU only.

Concurrent Models

NeuralDrive allows multiple models to be loaded into memory simultaneously, provided there is enough VRAM. This is managed by Ollama using several environment variables:

  • OLLAMA_MAX_LOADED_MODELS: Defines the maximum number of models kept in memory. The default is 0 (auto), which allows Ollama to manage loading based on available VRAM.
  • OLLAMA_NUM_PARALLEL: Determines how many concurrent requests can be handled.
  • OLLAMA_KEEP_ALIVE: Sets how long a model stays in memory after the last request before being evicted.

NeuralDrive uses a Least Recently Used (LRU) eviction policy. If you attempt to load a new model and VRAM is full, Ollama handles eviction automatically to make room for the new request.
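The eviction behaviour can be pictured with a small toy model: each load either fits within the VRAM budget or forces the least recently used entries out first. This is purely illustrative — the real policy is implemented inside Ollama:

```python
from collections import OrderedDict


class VramLru:
    """Toy LRU model: evict the least recently used model when a new load
    would exceed the VRAM budget. Sizes are illustrative GB figures."""

    def __init__(self, budget_gb: float):
        self.budget_gb = budget_gb
        self.loaded: "OrderedDict[str, float]" = OrderedDict()

    def touch(self, name: str) -> None:
        """Mark a model as most recently used (e.g., after a request)."""
        if name in self.loaded:
            self.loaded.move_to_end(name)

    def load(self, name: str, size_gb: float) -> list:
        """Load a model, returning the names evicted to make room."""
        evicted = []
        while self.loaded and sum(self.loaded.values()) + size_gb > self.budget_gb:
            victim, _ = self.loaded.popitem(last=False)  # least recently used
            evicted.append(victim)
        self.loaded[name] = size_gb
        return evicted
```

With a 24 GB budget, loading a third model after touching the first evicts the untouched second model rather than the one used most recently.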

Model Catalog

For a curated list of models tested and recommended for NeuralDrive, you can inspect the system's model catalog located at: /etc/neuraldrive/neuraldrive-models.yaml

This file contains recommendations optimized for the NeuralDrive environment. You can also view these recommendations in the Open WebUI Models page or the TUI Models screen.

This chapter is for everyone.

Storage Management

Because LLMs can be several gigabytes in size, managing disk space is essential for a stable NeuralDrive experience. All models are stored on the persistent partition of your drive.

Checking Available Space

You can monitor your disk usage through three primary interfaces:

  1. TUI Dashboard: The main screen shows current disk usage as a percentage and in GB.
  2. Web System Panel: The Open WebUI administration area provides a graphical view of storage consumption.
  3. Command Line: You can run the following command from any terminal to see exactly how much space remains on the persistence partition:
    df -h /var/lib/neuraldrive
    

Storage Thresholds

NeuralDrive monitors storage levels and will alert you when space is running low.

  • 80% Usage (Warning): A warning badge appears in the TUI and web interface. You should consider deleting unused models.
  • 90% Usage (Critical): System performance may degrade. Urgent action is required to free space.
  • 95% Usage (Blocked): Downloads are automatically blocked to prevent the system from becoming unresponsive.
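The thresholds above map naturally onto a small helper for anyone monitoring NeuralDrive from their own scripts (the tier names here are descriptive, not an official API):

```python
def storage_state(used_percent: float) -> str:
    """Map persistence-partition usage to the alert tiers described above."""
    if used_percent >= 95:
        return "blocked"    # new downloads refused
    if used_percent >= 90:
        return "critical"   # urgent action required
    if used_percent >= 80:
        return "warning"    # consider deleting unused models
    return "ok"
```

The percentage itself can be read from the TUI dashboard or computed from `df` output.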

Deleting Models

If you need to free up space, you can delete models that are no longer in use.

  • TUI: Press F2 to open the Models screen, select a model, and activate its Delete button. A confirmation prompt appears before deletion.
  • Web UI: Go to the Models page and use the delete icon next to the model name.
  • CLI: Run ollama rm <model_name>.

Deleting a model removes both its weights (blobs) and metadata (manifests) from /var/lib/neuraldrive/models/.

Hardware Recommendations for Storage

For the best experience, we recommend using a USB drive with at least 128GB of capacity. This allows you to store multiple large models (like llama3.1:8b and codestral:latest) while still having room for logs and document storage.

If you find that 128GB is insufficient, you can reinstall NeuralDrive on a larger drive or a high-speed external SSD for even more storage capacity.

This chapter assumes Linux system administration experience.

Pre-loading Models

For large-scale deployments or specialized air-gapped environments, you may want to distribute NeuralDrive images that already contain specific models. This avoids the need for users to download gigabytes of data on first boot.

Why Pre-load?

  • Zero-Setup Experience: Users can start chatting immediately without internet access.
  • Consistent Environment: Ensures every instance of your custom NeuralDrive image has the exact same model versions.
  • Reduced Bandwidth: Saves significant time and network resources during mass deployments.

Build Configuration

Pre-loading is managed through the neuraldrive-build.yaml configuration file used during the image creation process. You can specify a list of models to be included in the models.preload section.

models:
  preload:
    - llama3.1:8b
    - phi3:mini
    - qwen2.5:3b

Two-Phase Build Approach

The current build system uses a two-phase approach to ensure models are correctly staged and compressed within the squashfs filesystem of the LiveUSB.

  1. Phase 1: Environment Staging. The build system sets up the base operating system and installs the Ollama service.
  2. Phase 2: Model Injection. In this phase, the build system pulls the requested models from the official Ollama registry into a temporary staging directory.

Currently, this injection is a manual staging step during the development of custom images. You must ensure that the staging environment has internet access to perform the initial ollama pull commands before the final image is wrapped.

Model Persistence

It is important to note that models pre-loaded into the base image are stored in the read-only section of the drive. While they are available for use immediately, any new models downloaded by the user after booting will be stored in the /var/lib/neuraldrive/models/ directory on the persistence partition.

For more information on customizing your NeuralDrive build, refer to the Custom Images documentation.

This chapter assumes familiarity with REST APIs.

API Overview

NeuralDrive exposes a comprehensive set of APIs to facilitate integration with external tools, coding agents, and custom scripts. By default, all API traffic is served over TLS on a unified public port.

Base URLs and Ports

The primary entry point for all external communication is port 8443. This port is managed by a Caddy reverse proxy that handles TLS termination and directs traffic to the appropriate internal services.

  API Type           Path Prefix  Internal Port  Description
  OpenAI Compatible  /v1/         11434          Compatible with standard OpenAI SDKs and integrations.
  Ollama Native      /api/        11434          Direct access to native Ollama features and model management.
  System Management  /system/     3001           NeuralDrive-specific administrative and management operations.

The base URL for most integrations is: https://neuraldrive.local:8443

Authentication

NeuralDrive uses Bearer token authentication for all API requests. You must include your API key in the Authorization header.

Header Format: Authorization: Bearer <API_KEY>

Key Format: API keys follow the pattern nd-xxxxxxxxxxxxxxxx. You can find your key in /etc/neuraldrive/api.key or rotate it using the System Management API. Refer to the API Keys chapter for details on management and rotation.
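A cheap client-side sanity check can catch a mis-pasted key before any request is sent. The sketch below assumes only the nd- prefix and an alphanumeric secret of at least 16 characters; the exact secret length is not guaranteed here:

```python
import re

# Assumed shape: "nd-" prefix followed by an alphanumeric secret of 16+
# characters. Adjust if your deployment documents a stricter format.
API_KEY_RE = re.compile(r"^nd-[A-Za-z0-9]{16,}$")


def looks_like_api_key(key: str) -> bool:
    """Cheap sanity check for an nd- API key before sending a request."""
    return bool(API_KEY_RE.match(key))
```

This only validates the shape; the server remains the authority on whether a key is actually valid.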

TLS and Security

All external connections require TLS. NeuralDrive generates a self-signed certificate on first boot with Subject Alternative Names (SAN) for neuraldrive.local, the local hostname, and the detected IP address.

To establish a secure connection, clients should trust the NeuralDrive Certificate Authority (CA). You can download the CA certificate via SCP from /etc/neuraldrive/tls/neuraldrive-ca.crt or through the /system/ca-cert endpoint. Detailed installation steps are available in the TLS Trust chapter.

Rate Limits and Timeouts

To ensure system stability, NeuralDrive enforces the following limits:

  • Rate Limit: 100 requests per minute per source IP address.
  • Generation Timeout: 600 seconds (10 minutes) for inference tasks.
  • Management Timeout: 30 seconds for administrative operations via the System API.
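Clients that issue many requests should pace themselves rather than rely on the server rejecting them. A simple sliding-window pacer that stays under the 100-requests-per-minute limit might look like this (a client-side sketch, not part of NeuralDrive):

```python
import time
from typing import List, Optional


class RequestPacer:
    """Sliding-window pacer: before each request, compute how long to sleep
    so that no more than `limit` requests fall inside any `window_s` span."""

    def __init__(self, limit: int = 100, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.sent: List[float] = []

    def wait_turn(self, now: Optional[float] = None) -> float:
        """Record one request; return the delay the caller should sleep first."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        self.sent = [t for t in self.sent if now - t < self.window_s]
        delay = 0.0
        if len(self.sent) >= self.limit:
            # Wait until the oldest in-window request expires.
            delay = self.window_s - (now - self.sent[0])
        self.sent.append(now + delay)
        return delay
```

Call `time.sleep(pacer.wait_turn())` before each request to smooth out bursts.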

Internal Access

While port 8443 is the recommended public interface, services are also available on internal ports for local debugging or specialized networking configurations:

  • Ollama: 11434
  • WebUI: 3000
  • System API: 3001

Note that these internal ports typically do not have the same TLS or authentication protections as the public 8443 interface.

This chapter assumes familiarity with REST APIs.

Coding Agents

NeuralDrive is designed to provide high-performance local inference for coding agents and IDE integrations. By following a standard OpenAI-compatible pattern, you can connect your favorite development tools to NeuralDrive with minimal configuration.

Generic Configuration Pattern

Most coding agents use a common set of parameters for OpenAI-compatible providers:

  • Provider: OpenAI Compatible (or Custom)
  • Base URL: https://neuraldrive.local:8443/v1
  • API Key: nd-xxxxxxxxxxxxxxxxxxxx
  • Model Name: e.g., llama3.1:8b, codestral, or starcoder2

Important Note on TLS Trust

Because NeuralDrive uses a self-signed certificate, most agents will fail to connect unless you explicitly trust the NeuralDrive CA. Before configuring your agent, ensure you have followed the steps in the TLS Trust chapter to install the certificate on your local machine.

Cursor

To use NeuralDrive with Cursor:

  1. Open Cursor Settings > Models.
  2. In the OpenAI API section, toggle the switch to Enabled.
  3. Click Override OpenAI Base URL and enter: https://neuraldrive.local:8443/v1
  4. Enter your NeuralDrive API key in the API Key field.
  5. Under Models, add your desired model names (e.g., llama3.1:8b).
  6. Ensure other OpenAI models are disabled if you wish to force local inference.

Continue

For the Continue VS Code or JetBrains extension, edit your config.json:

{
  "models": [
    {
      "title": "NeuralDrive",
      "provider": "openai",
      "baseUrl": "https://neuraldrive.local:8443/v1",
      "apiKey": "nd-xxxxxxxxxxxxxxxxxxxx",
      "model": "llama3.1:8b"
    }
  ]
}

If you experience TLS verification errors, you may need to set the NODE_EXTRA_CA_CERTS environment variable to point to your neuraldrive-ca.crt file before launching your IDE.

Aider

Aider supports OpenAI-compatible endpoints through environment variables. Use the following command to start Aider with NeuralDrive:

export OPENAI_API_BASE=https://neuraldrive.local:8443/v1
export OPENAI_API_KEY=nd-xxxxxxxxxxxxxxxxxxxx
export REQUESTS_CA_BUNDLE=/path/to/neuraldrive-ca.crt

aider --model openai/llama3.1:8b

By setting REQUESTS_CA_BUNDLE, you ensure that Aider's underlying Python libraries trust the self-signed certificate.

Open Interpreter

Open Interpreter can be configured to use NeuralDrive by specifying the base URL and model:

export SSL_CERT_FILE=/path/to/neuraldrive-ca.crt

interpreter --model openai/llama3.1:8b \
            --api_base https://neuraldrive.local:8443/v1 \
            --api_key nd-xxxxxxxxxxxxxxxxxxxx

Using --api_base directs traffic to NeuralDrive, while SSL_CERT_FILE handles the TLS verification requirements for the Python environment.

This chapter assumes familiarity with Python.

Python SDK

NeuralDrive provides a seamless integration path for Python developers by maintaining compatibility with the official OpenAI Python library. This allows you to use familiar patterns while running inference entirely on local hardware.

Installation

To get started, install the openai and httpx libraries:

pip install openai httpx

Initializing the Client

Since NeuralDrive uses a self-signed certificate, you must configure the OpenAI client to trust the NeuralDrive CA. The most reliable way is to use an httpx.Client with the verify parameter set to the path of your neuraldrive-ca.crt file.

from openai import OpenAI
import httpx

# Path to the CA certificate downloaded from NeuralDrive
CA_CERT_PATH = "/path/to/neuraldrive-ca.crt"

client = OpenAI(
    base_url="https://neuraldrive.local:8443/v1",
    api_key="nd-xxxxxxxxxxxxxxxxxxxx",
    http_client=httpx.Client(verify=CA_CERT_PATH)
)

Chat Completions

NeuralDrive supports both streaming and non-streaming chat completions.

Streaming Example

Streaming provides real-time feedback as the model generates text, which is ideal for interactive applications.

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Non-Streaming Example

For automated scripts where the full output is needed at once:

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Write a Python function to sort a list."}],
    stream=False
)

print(response.choices[0].message.content)

Embeddings

You can generate text embeddings for RAG (Retrieval-Augmented Generation) applications using compatible models.

response = client.embeddings.create(
    model="mxbai-embed-large",
    input="NeuralDrive provides high-performance local AI."
)

embedding = response.data[0].embedding
print(f"Generated embedding with {len(embedding)} dimensions.")

Cert Trust Options

If you prefer not to specify the CA path in every script, you have three primary alternatives:

  1. Environment Variables: Set REQUESTS_CA_BUNDLE or SSL_CERT_FILE in your shell environment.
  2. System-wide Install: Add the CA certificate to your operating system's trusted store.
  3. Disable Verification (Testing Only): Set verify=False in the httpx.Client. This is insecure and not recommended for production.

Error Handling

Implement basic error handling to manage timeouts or connection issues:

import openai

try:
    response = client.chat.completions.create(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": "Hi!"}]
    )
except openai.APIConnectionError as e:
    print(f"Could not connect to NeuralDrive: {e}")
except openai.AuthenticationError as e:
    print(f"Invalid API key: {e}")
except openai.APITimeoutError as e:
    print(f"Request timed out (NeuralDrive limit: 600s): {e}")

This chapter assumes basic command-line familiarity.

cURL Examples

Using curl is the quickest way to verify your connection to the NeuralDrive API or perform one-off administrative tasks. All examples assume you have downloaded the NeuralDrive CA certificate (neuraldrive-ca.crt) to your current directory.

Trusting the Certificate

NeuralDrive uses a self-signed certificate. For security, we recommend using the --cacert flag to point to the CA certificate.

  • Recommended (Secure): curl --cacert neuraldrive-ca.crt ...
  • Alternative (Insecure): curl -k ... (Only use for quick health checks)

OpenAI-Compatible API

These endpoints follow the standard OpenAI request and response formats.

Chat Completion

curl --cacert neuraldrive-ca.crt \
  -X POST https://neuraldrive.local:8443/v1/chat/completions \
  -H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

List Available Models

curl --cacert neuraldrive-ca.crt \
  -H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
  https://neuraldrive.local:8443/v1/models

Ollama Native API

Directly interact with the underlying Ollama service for tasks like pulling models.

Pull a Model

curl --cacert neuraldrive-ca.crt \
  -X POST https://neuraldrive.local:8443/api/pull \
  -H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
  -d '{"name": "mistral"}'

Get Model Details

curl --cacert neuraldrive-ca.crt \
  -X POST https://neuraldrive.local:8443/api/show \
  -H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
  -d '{"name": "llama3.1:8b"}'

Health and Management

Health Check

The health endpoint is public and does not require an API key; -k is used here only to skip certificate verification for a quick check.

curl -k https://neuraldrive.local:8443/health

System CA Certificate

Download the CA certificate directly from the System API if you do not have SSH access. This endpoint is public and does not require authentication. Use -k to skip certificate verification — you cannot verify a certificate you have not yet downloaded.

curl -k https://neuraldrive.local:8443/system/ca-cert -o neuraldrive-ca.crt

Once you have the certificate, verify it by checking its fingerprint against the value displayed in the TUI or on the console during boot.
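To compare fingerprints, compute the SHA-256 digest of the downloaded certificate and check it against the value shown on the server console. The colon-separated rendering below matches what most TLS tools print; that the TUI displays SHA-256 specifically is an assumption:

```python
import hashlib


def sha256_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, rendered in the
    colon-separated uppercase form most TLS tools display."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

If the file is PEM-encoded (as neuraldrive-ca.crt likely is), convert it first with `ssl.PEM_cert_to_DER_cert(open("neuraldrive-ca.crt").read())` before fingerprinting.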

This chapter is for developers and administrators.

TLS Trust

NeuralDrive uses TLS to secure all communications between your client and the server. Because NeuralDrive is a local-first system, it employs a self-signed certificate rather than one from a public Certificate Authority (CA). To establish a secure connection, you must instruct your tools and operating system to trust the NeuralDrive CA.

Why Self-Signed?

Standard CAs like Let's Encrypt require a public domain name and a publicly reachable server to verify ownership. NeuralDrive is designed to operate on local networks, often without a public DNS record. A self-signed CA allows NeuralDrive to generate valid certificates for neuraldrive.local, local IP addresses, and custom hostnames without external dependencies.

Downloading the CA Certificate

You can retrieve the CA certificate (neuraldrive-ca.crt) using one of three methods:

  1. SCP (Recommended): Use secure copy to pull the file directly from the NeuralDrive server:
     scp <username>@neuraldrive.local:/etc/neuraldrive/tls/neuraldrive-ca.crt ./
  2. System API: Download the certificate via the management endpoint. This endpoint is public and does not require authentication:
     curl -k https://neuraldrive.local:8443/system/ca-cert -o neuraldrive-ca.crt
  3. Local Filesystem: If you have direct terminal access to the NeuralDrive machine, the file is located at /etc/neuraldrive/tls/neuraldrive-ca.crt.

Installing the Certificate

Operating System Level

Installing the CA at the OS level allows browsers and many native applications to trust NeuralDrive automatically.

  • macOS:
    1. Open Keychain Access.
    2. Drag neuraldrive-ca.crt into the System keychain.
    3. Double-click the certificate, expand Trust, and set When using this certificate to Always Trust.
  • Linux (Ubuntu/Debian):
    1. sudo cp neuraldrive-ca.crt /usr/local/share/ca-certificates/neuraldrive.crt
    2. sudo update-ca-certificates
  • Windows:
    1. Double-click neuraldrive-ca.crt.
    2. Click Install Certificate...
    3. Select Local Machine and click Next.
    4. Select Place all certificates in the following store and browse for Trusted Root Certification Authorities.

Tool-Specific Configuration

Many development environments maintain their own certificate stores or require explicit paths.

  • Python (requests/httpx/OpenAI SDK): Set one of the following environment variables:
    export REQUESTS_CA_BUNDLE=/path/to/neuraldrive-ca.crt
    export SSL_CERT_FILE=/path/to/neuraldrive-ca.crt
  • Node.js: Set the environment variable:
    export NODE_EXTRA_CA_CERTS=/path/to/neuraldrive-ca.crt
  • cURL: Use the --cacert flag:
    curl --cacert neuraldrive-ca.crt ...
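
In Python, the same variables can also be set from inside the process before a client is created. This is a sketch with a placeholder path; whether a given client version honours these variables can vary, so passing the path explicitly (e.g., verify=... in requests) is the most reliable option:

```python
# Sketch: pointing Python HTTP clients at the NeuralDrive CA from inside
# the process. The certificate path below is a placeholder.
import os

CA_BUNDLE = "/path/to/neuraldrive-ca.crt"  # placeholder path

# requests reads REQUESTS_CA_BUNDLE; httpx (used by the OpenAI SDK)
# reads SSL_CERT_FILE. Setting both covers either client.
os.environ["REQUESTS_CA_BUNDLE"] = CA_BUNDLE
os.environ["SSL_CERT_FILE"] = CA_BUNDLE

print(os.environ["SSL_CERT_FILE"])
```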

Certificate Management

Regeneration

If your server's IP address changes or the certificate expires, you can force a regeneration by deleting the existing server certificates and restarting the certificate services:

sudo rm /etc/neuraldrive/tls/server.crt /etc/neuraldrive/tls/server.key
sudo systemctl restart neuraldrive-certs neuraldrive-caddy

This will generate a new server certificate signed by the existing CA.

Custom Certificates

If you prefer to use your own certificate (e.g., from a corporate CA), replace /etc/neuraldrive/tls/server.crt and /etc/neuraldrive/tls/server.key with your own files and restart the neuraldrive-caddy service. Disable the neuraldrive-certs service so that it does not overwrite your custom files on reboot.

This chapter is for everyone.

API Keys

NeuralDrive uses a token-based authentication system to secure access to its APIs. Every request made to the public interface (port 8443) must include a valid API key.

Key Format

API keys follow a standard prefix-based format: nd-xxxxxxxxxxxxxxxxxxxx

The nd- prefix ensures that keys are easily identifiable in configuration files and logs.

Finding Your Key

There are several ways to retrieve your current API key:

  • First-Boot: On your first login to the NeuralDrive console, the initial API key is displayed in the welcome banner.
  • NeuralDrive TUI: Launch the Terminal User Interface by running neuraldrive-tui on the server. Navigate to the Security or API section to view the active key.
  • System Files: If you have terminal access, the key is stored in plain text at /etc/neuraldrive/api.key.

Rotating API Keys

For security reasons, we recommend rotating your API key periodically or immediately if you suspect it has been compromised.

Using the TUI

  1. Open neuraldrive-tui.
  2. Select Security > Rotate API Key.
  3. Confirm the action. The TUI will generate a new key, update the local configuration files, reload the Caddy service, and display the new key.

Using the System API

You can also rotate the key programmatically via the System Management API.

curl --cacert neuraldrive-ca.crt \
  -X POST https://neuraldrive.local:8443/system/api-keys/rotate \
  -H "Authorization: Bearer <CURRENT_API_KEY>"

The response will contain the newly generated key. Note that the old key becomes invalid immediately after this call.

Pre-setting Keys in Custom Images

When building custom NeuralDrive images using neuraldrive-build.yaml, you can define a static API key in the configuration:

security:
  api_key: "nd-mycustomapikey12345"

This allows for pre-configured deployments where the API key is known before the first boot. If this field is omitted, NeuralDrive will generate a random key during the initialization process.

This chapter assumes familiarity with REST APIs.

Ollama Native API

While the OpenAI-compatible API is recommended for most integrations, NeuralDrive also exposes the Ollama Native API for tasks that require finer control over model management and specialized inference parameters.

Why Use the Native API?

The Native API is necessary for operations not covered by the OpenAI specification, such as:

  • Downloading (pulling) new models from the library.
  • Getting granular progress updates during model downloads.
  • Accessing detailed model metadata (modelfile, license, parameters).
  • Performing raw text generation without chat-specific formatting.

Base URL and Auth

The Native API is available at the /api/ path on port 8443. Like all other external APIs, it requires a Bearer token in the Authorization header.

Base URL: https://neuraldrive.local:8443/api

Core Endpoints

Endpoint    Method   Description
/generate   POST     Raw text completion.
/chat       POST     Structured chat completion.
/tags       GET      List all models currently available on the server.
/pull       POST     Download a model from the Ollama library.
/show       POST     View details, parameters, and the Modelfile for a specific model.
/delete     DELETE   Remove a model from local storage.

Remote Ollama CLI

You can use the standard ollama command-line tool to interact with your NeuralDrive instance remotely. This allows you to run models on the server using your local terminal.

To point your local CLI to NeuralDrive, set the OLLAMA_HOST environment variable:

export OLLAMA_HOST=https://neuraldrive.local:8443
ollama run llama3.1:8b

Note that the ollama CLI does not natively support Bearer token authentication in all versions. For secure remote CLI usage, we recommend using curl or a custom wrapper that includes the Authorization: Bearer <API_KEY> header.
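
As a sketch of such a wrapper, the request can be assembled with Python's standard library; the base URL, API key, and model name are placeholders:

```python
# Sketch of a wrapper that adds the Authorization header the ollama CLI
# may not send itself. Base URL, API key, and model are placeholders.
import json
import urllib.request

BASE_URL = "https://neuraldrive.local:8443/api"
API_KEY = "nd-xxxxxxxxxxxxxxxxxxxx"

def native_request(path: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request against the Native API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage (requires the NeuralDrive CA in the trust store, see "TLS Trust"):
#   with urllib.request.urlopen(native_request("/generate", {...})) as resp:
#       for line in resp:
#           print(line.decode().rstrip())
req = native_request("/generate", {"model": "mistral", "prompt": "Hello"})
print(req.full_url)
```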

Pulling a Model via API

When pulling a model, NeuralDrive returns a stream of JSON objects indicating the progress:

curl --cacert neuraldrive-ca.crt \
  -X POST https://neuraldrive.local:8443/api/pull \
  -H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
  -d '{"name": "mistral"}'

The response will look similar to this:

{"status":"pulling manifest"}
{"status":"pulling layer","digest":"sha256:e8a35d5...","total":5120000000,"completed":1024000}
...
{"status":"success"}

This chapter is for system administrators.

Network Configuration

NeuralDrive uses NetworkManager for managing both wired and wireless interfaces. By default, the system attempts to obtain an IP address via DHCP.

Static IP Configuration

If a static IP address is required, it can be configured using the TUI (Terminal User Interface) or via the command line with nmcli.

To configure a static IP using nmcli, identify the connection name:

nmcli connection show

Apply the static configuration:

sudo nmcli connection modify "Wired connection 1" \
  ipv4.addresses 192.168.1.100/24 \
  ipv4.gateway 192.168.1.1 \
  ipv4.dns "1.1.1.1,8.8.8.8" \
  ipv4.method manual
sudo nmcli connection up "Wired connection 1"

Hostname Configuration

The default hostname is neuraldrive. To change the hostname, use hostnamectl or the TUI:

sudo hostnamectl set-hostname my-neural-node

mDNS and Avahi

NeuralDrive automatically advertises its presence on the local network using mDNS (Multicast DNS) via Avahi. By default, the system is reachable at neuraldrive.local.

Avahi service files are located at:

  • /etc/avahi/services/neuraldrive-web.service (port 443)
  • /etc/avahi/services/neuraldrive-api.service (port 8443)

The system uses systemd-resolved with mDNS support enabled to handle local name resolution.

Warning: mDNS does not work on all networks. Corporate environments, certain routers, and VPNs often block multicast traffic. If neuraldrive.local is unreachable, check the IP address on the physical console.

Console IP Display

At boot, the neuraldrive-show-ip.service runs to detect the active IP address and display it directly on the console. This ensures that the node can be located even if mDNS fails or DHCP assigns an unexpected address.

See Also

  • First Boot Setup
  • Network Troubleshooting

This chapter is for system administrators.

Security

The security design of NeuralDrive is built on principles of defense in depth and minimal attack surface. Each component is isolated and operates with only the necessary privileges.

Security Design Principles

NeuralDrive implements several key security architectural features:

  • Defense in Depth: Multiple layers of security control from the kernel to the application.
  • Minimal Attack Surface: Only essential services are exposed.
  • Read-Only Root: The core system is a SquashFS image.
  • Default-Deny Networking: All incoming traffic is blocked except for required services.
  • No Default Passwords: Credentials are unique to each installation or rely on key-based authentication.

Default Security Posture

By default, the system-level firewall blocks all incoming traffic except for:

  • HTTPS (port 443)
  • System API (port 8443)
  • mDNS (port 5353)

SSH is disabled by default and requires manual activation.

Service Isolation

Every major service in NeuralDrive runs as a dedicated, low-privilege user account. Systemd hardening is applied to each unit to restrict access to the rest of the system.

Service               User                  UID
neuraldrive-ollama    neuraldrive-ollama    901
neuraldrive-webui     neuraldrive-webui     902
neuraldrive-caddy     neuraldrive-caddy     903
neuraldrive-monitor   neuraldrive-monitor   904
neuraldrive-api       neuraldrive-api       905

The systemd units for these services employ several hardening flags:

  • ProtectSystem=full: Makes /usr, /boot, and /etc read-only for the service.
  • NoNewPrivileges=true: Prevents the service from gaining more privileges via setuid.
  • PrivateTmp=true: Gives the service its own /tmp directory.

Data Protection and Permissions

NeuralDrive enforces a strict file permissions model to protect sensitive data:

  • Models: Owned by neuraldrive-ollama:neuraldrive-ollama with 640 permissions.
  • Configurations: Owned by root:neuraldrive-admin with 640 permissions.
  • Credentials: Owned by root:root with 600 permissions.

Audit Logging

System audit logs are maintained for monitoring security events. Logs are stored in JSON format for easy programmatic analysis.

Log file: /var/log/neuraldrive/audit.log

Log rotation is handled daily with a maximum of 7 rotations and a 100MB limit per file.
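
Because the log is JSON-lines, it can be filtered programmatically. The field names in this sketch (event, result) are illustrative assumptions, not the documented schema; inspect a real log line first.

```python
# Sketch: filtering a JSON-lines audit log. The field names ("event",
# "result") are assumed for illustration; check the real schema first.
import json

def matching_events(lines, **criteria):
    """Yield parsed events whose fields match all given criteria."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if all(event.get(k) == v for k, v in criteria.items()):
            yield event

sample = [
    '{"event": "login", "result": "failure"}',
    '{"event": "key_rotate", "result": "success"}',
]
print(list(matching_events(sample, result="failure")))
```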

See Also

  • Firewall
  • TLS Certificates
  • SSH Access
  • LUKS Encryption

This chapter is for system administrators.

Firewall

NeuralDrive uses nftables as the primary firewall. The configuration is designed to block all unsolicited incoming traffic while allowing essential system services.

Default Policy

The firewall is configured with a strict default-deny policy for incoming and forwarded traffic.

  • Input Chain: policy drop
  • Forward Chain: policy drop
  • Output Chain: policy accept

Allowed Traffic

The default ruleset permits the following incoming connections:

  • Established and Related: Existing outbound sessions.
  • Loopback: Traffic within the system.
  • ICMP: Rate-limited echo requests (5 per second).
  • Web Services: TCP ports 443 (HTTPS) and 8443 (System API).
  • SSH: TCP port 22, rate-limited to 3 new connections per minute with a burst of 5.
  • mDNS: UDP port 5353, rate-limited to 10 per second for local service discovery.

Configuration Files

The primary firewall rules are defined in /etc/neuraldrive/nftables.conf.

#!/usr/sbin/nft -f
flush ruleset
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iifname "lo" accept
        ip protocol icmp icmp type echo-request limit rate 5/second accept
        ip6 nexthdr icmpv6 icmpv6 type echo-request limit rate 5/second accept
        tcp dport { 443, 8443 } accept
        tcp dport 22 ct state new limit rate 3/minute burst 5 packets accept
        udp dport 5353 limit rate 10/second accept
    }
    chain forward { type filter hook forward priority 0; policy drop; }
    chain output { type filter hook output priority 0; policy accept; }
}

Administrators can add custom rules by creating /etc/neuraldrive/firewall-custom.conf. If this file exists, it is included at the end of the ruleset.
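
As a hypothetical illustration (the port and service are invented, not part of the default configuration), a firewall-custom.conf that admits one extra TCP service could look like this; nft merges a re-declared table and chain into the existing ruleset, so the rule is appended to the input chain:

```
# /etc/neuraldrive/firewall-custom.conf (hypothetical example)
# Allow an additional internal service, e.g. a metrics exporter on TCP 9100.
table inet filter {
    chain input {
        tcp dport 9100 accept
    }
}
```

After editing, reload the ruleset with sudo systemctl restart nftables.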

Managing the Firewall

To view the active ruleset:

sudo nft list ruleset

To reload the configuration after making changes:

sudo systemctl restart nftables

See Also

  • Security
  • SSH Access

This chapter is for system administrators.

TLS Certificates

NeuralDrive uses TLS to encrypt all communication between clients and the node. Certificates are automatically managed to ensure secure defaults without requiring manual intervention.

Automatic Generation

The neuraldrive-certs.service is responsible for certificate management. This is a oneshot service that runs at first boot and before the Caddy web server starts.

The certificate generation is idempotent. The service uses an ExecCondition to check for existing certificates and only generates new ones if they are missing.

Certificate Details

The generated certificates are self-signed RSA 4096-bit with SHA-256 signatures. They have a 365-day validity period and include the following Subject Alternative Names (SANs):

  • DNS:neuraldrive.local
  • DNS:neuraldrive
  • IP:<detected-IP>

Certificate Files

Certificates and keys are stored in /etc/neuraldrive/tls/.

File                 Permissions   Description
server.crt           644           The server certificate.
server.key           600           The private key for the server certificate.
neuraldrive-ca.crt   644           The root CA certificate used to sign the server certificate.

Client Trust

To avoid browser warnings and ensure secure programmatic access, the CA certificate can be downloaded and installed in the client's trust store.

The CA certificate is available at:

  • https://<IP>:8443/system/ca-cert (no authentication required)
  • /etc/neuraldrive/tls/neuraldrive-ca.crt (on the filesystem)

Manual Management

Regenerating Certificates

If certificates need to be regenerated (e.g., due to a hostname change), delete the existing files and restart the certificate service:

sudo rm /etc/neuraldrive/tls/server.*
sudo systemctl restart neuraldrive-certs neuraldrive-caddy

Using Custom Certificates

To use certificates issued by a third-party CA:

  1. Replace server.crt and server.key in /etc/neuraldrive/tls/ with your custom files.
  2. Ensure the file names match and permissions are correctly set (644 for the certificate, 600 for the key).
  3. Restart Caddy: sudo systemctl restart neuraldrive-caddy.

See Also

  • TLS Certificate Trust
  • Security

This chapter is for system administrators.

SSH Access

SSH access is disabled by default in NeuralDrive. This minimizes the initial attack surface and ensures that users must explicitly opt-in to remote command-line management.

Enabling SSH

SSH can be enabled through three primary methods:

  • TUI (Terminal User Interface): Use the Services or Security menu to toggle SSH.
  • Boot Parameter: Add neuraldrive.ssh=1 to the kernel command line from the boot menu. (The active parameters can be inspected at runtime in the read-only file /proc/cmdline.)
  • System API: Send a POST request to /system/ssh/enable using an authenticated API client.

SSH Configuration and Security

The system uses a hardened SSH configuration located at /etc/ssh/sshd_config.d/neuraldrive.conf.

  • Authentication: Key-only authentication is enforced. Password authentication is disabled (PasswordAuthentication no).
  • User Restrictions: Only the neuraldrive-admin user is permitted to log in (AllowUsers neuraldrive-admin).
  • Root Login: Root login is strictly prohibited (PermitRootLogin no).
  • Session Settings: MaxAuthTries 3, ClientAliveInterval 300, and ClientAliveCountMax 2 are set to ensure session integrity and prevent brute-force attempts.

Adding SSH Keys

Public SSH keys should be placed in the authorized keys file:

/etc/neuraldrive/ssh/authorized_keys

Alternatively, keys can be injected into this location during the image building process.

Brute-Force Protection

In addition to the rate-limiting provided by the firewall, fail2ban monitors SSH login attempts. The configuration is located at /etc/fail2ban/jail.d/neuraldrive.conf.

  • Max Retries: 5 failed attempts.
  • Ban Time: 600 seconds.
  • Find Time: 600 seconds.

Firewall Rate-Limiting

The system firewall (nftables) further protects the SSH port by rate-limiting new connections to 3 per minute with a burst allowance of 5 packets.

See Also

  • Security — NeuralDrive security architecture and hardening overview.
  • Firewall — nftables configuration and port management.

This chapter is for all users and system administrators.

GPU Monitoring

NeuralDrive provides multiple ways to monitor GPU health, utilization, and resource consumption in real-time.

TUI Dashboard

The Terminal User Interface (TUI) provides a high-level overview of GPU status, including VRAM usage, temperature, and utilization percentages. This dashboard is accessible directly from the physical console or via SSH.

Web System Panel

A more detailed GPU monitoring interface is available through the web System Panel.

  • URL: https://<IP>:8443/monitor/
  • Dashboard: "GPU Hot"

This interface provides a real-time dashboard powered by an internal service running on port 1312, which is securely proxied by Caddy.

Command-Line Tools

For advanced diagnostics, standard vendor-specific command-line tools are available:

  • NVIDIA: nvidia-smi
  • AMD: rocm-smi
  • General Hardware Check: lspci | grep -i vga

Monitoring Service

The neuraldrive-gpu-monitor.service is responsible for collecting and serving GPU telemetry. This service runs as the neuraldrive-monitor user and exposes data on internal port 1312.

System API

Programmatic access to GPU telemetry is available via the System Management API.

  • Endpoint: GET /system/gpu
  • Response Data:
    • Vendor and device names.
    • VRAM total and currently used.
    • Current temperature.

See Also

  • Terminal Interface
  • System Management API

This chapter is for system administrators.

Service Management

NeuralDrive is composed of several specialized services that handle different aspects of the system, from GPU detection to web application hosting.

NeuralDrive Service List

The following services are managed by systemd:

  • neuraldrive-setup: Handles initial system setup and first-boot configurations.
  • neuraldrive-gpu-detect: Automatically detects available GPUs and prepares the system before Ollama starts.
  • neuraldrive-certs: Manages the generation and renewal of TLS certificates.
  • neuraldrive-zram: Configures zram-based swap space for efficient memory management.
  • neuraldrive-show-ip: Displays the active IP address on the physical console at boot.
  • neuraldrive-ollama: The core LLM inference server (internal port 11434).
  • neuraldrive-webui: The Open WebUI dashboard for user interaction (internal port 3000).
  • neuraldrive-caddy: The reverse proxy that exposes services via ports 443 and 8443.
  • neuraldrive-gpu-monitor: Collects and serves GPU telemetry data (internal port 1312).
  • neuraldrive-system-api: Provides programmatic access to system management (internal port 3001).
  • neuraldrive-storage-monitor: Monitors disk usage and provides alerts based on defined thresholds.

Monitoring and Control

Service status can be checked using several interfaces:

  • TUI Services Screen: Real-time status and control.
  • Web System Panel: Visual status overview.
  • Command Line: Standard systemctl commands.

# Check status
systemctl status neuraldrive-ollama

# Restart a service
sudo systemctl restart neuraldrive-ollama

# View logs
journalctl -u neuraldrive-ollama -f

Service Dependencies

NeuralDrive services are designed with a specific boot order to ensure reliability:

  • gpu-detect must complete before ollama starts.
  • ollama must be active before webui begins operation.
  • certs must successfully generate certificates before caddy can bind to its ports.

Storage Monitoring

The neuraldrive-storage-monitor service provides automated alerts when disk usage exceeds specific thresholds:

  • 80%: Warning threshold.
  • 90%: Critical threshold.
  • 95%: Immediate action required.
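
The threshold logic above can be reproduced client-side, for example when scripting your own checks. A minimal sketch using Python's shutil.disk_usage (the threshold values come from the list above; the function name is illustrative):

```python
# Sketch: the alert thresholds applied by the storage monitor,
# reproduced with the standard library.
import shutil

def usage_level(percent_used: float) -> str:
    """Map a disk-usage percentage onto the documented alert thresholds."""
    if percent_used >= 95:
        return "immediate action required"
    if percent_used >= 90:
        return "critical"
    if percent_used >= 80:
        return "warning"
    return "ok"

total, used, free = shutil.disk_usage("/")
print(usage_level(100 * used / total))
```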

See Also

  • Service Reference
  • TUI Services

This chapter is for all users.

Updating NeuralDrive

NeuralDrive uses a LiveUSB deployment model. This means that system updates are typically applied by re-flashing the USB drive with a newer ISO image.

Update Model

Unlike traditional operating systems that use package managers for updates, NeuralDrive is designed as an immutable system. This ensures that the core operating environment is always in a known, stable state.

Upgrade Procedure

To upgrade to a new version:

  1. Back up data: Backup models, configurations, and WebUI data from the persistence partition.
  2. Re-flash: Use an ISO writer to flash the new version onto the USB drive.
  3. Initialize persistence: Re-create the persistence partition on first boot.
  4. Restore data: Copy the backed-up data back to the appropriate locations (optional).

Warning: Re-flashing the USB drive destroys the persistence partition and all data stored on it. Always ensure a complete backup of critical data before proceeding with an upgrade.

Backup Procedure

It is recommended to copy the following directories to an external drive or network location before re-flashing:

  • /var/lib/neuraldrive/: Contains downloaded models and Open WebUI user data.
  • /etc/neuraldrive/: Contains system configurations, TLS certificates, and the API key.

Version Checking

To verify the current version of the system:

  • Command line: cat /etc/neuraldrive/version (e.g., dev-snapshot).
  • System API: Send a GET request to /system/status and check the version field.

Future Plans

A specialized neuraldrive-upgrade tool is planned for future releases. This tool will automate the process of downloading and applying updates directly to the persistence partition without requiring a full re-flash.

See Also

  • Writing the USB Drive
  • Storage Management

This chapter is for system administrators and developers.

System Management API

The NeuralDrive System Management API provides programmatic access to system operations, monitoring, and configuration.

Access and Endpoints

The API is accessible over HTTPS:

  • URL: https://<IP>:8443/system/

Internally, the neuraldrive-system-api service runs on port 3001 and is proxied by Caddy to port 8443.

Authentication

All requests to the System Management API require a Bearer token for authentication. This is the same token used for the inference API and can be found in /etc/neuraldrive/api.key.

Common Endpoints

Method   Path                               Purpose
GET      /system/status                     CPU, RAM, disk, uptime, and system version.
GET      /system/services                   List all neuraldrive-* services and their status.
POST     /system/services/{name}/restart    Restart a specific service.
POST     /system/services/{name}/{action}   Start or stop a specific service.
GET      /system/logs                       Service log tailing (query params: service, lines).
GET      /system/storage                    Detailed disk usage (models and persistence).
GET      /system/network                    Interfaces, hostname, and mDNS status.
POST     /system/network/hostname           Set a new system hostname.
GET      /system/gpu                        GPU vendor, device names, VRAM usage, and temperature.
POST     /system/ssh/{action}               Enable or disable SSH access.
GET      /system/security                   Firewall, TLS, and SSH status.
POST     /system/api-keys/rotate            Rotate the system API key.
GET      /system/ca-cert                    Download the root CA certificate (no authentication required).

Implementation Details

The API is implemented using FastAPI and is served by Uvicorn. The application environment is located at /usr/lib/neuraldrive/api/.

For security reasons, only services prefixed with neuraldrive- can be managed through the API.
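
As an example, a request against the log-tailing endpoint can be built with Python's standard library; the API key is a placeholder, and the NeuralDrive CA must be trusted as described in TLS Trust:

```python
# Sketch: building a GET /system/logs request with the documented
# query parameters. The API key value is a placeholder.
import urllib.parse
import urllib.request

BASE = "https://neuraldrive.local:8443"
KEY = "nd-xxxxxxxxxxxxxxxxxxxx"

def logs_request(service: str, lines: int = 50) -> urllib.request.Request:
    """Build an authenticated request for the last `lines` log lines."""
    query = urllib.parse.urlencode({"service": service, "lines": lines})
    return urllib.request.Request(
        "{}/system/logs?{}".format(BASE, query),
        headers={"Authorization": "Bearer " + KEY},
    )

# Usage: urllib.request.urlopen(logs_request("neuraldrive-ollama"))
req = logs_request("neuraldrive-ollama")
print(req.full_url)
```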

Audience: Admin

Building Custom Images

The neuraldrive-builder toolkit provides the necessary infrastructure to generate tailored NeuralDrive ISO images. By modifying the build configuration, you can pre-load specific LLM models, inject custom branding, or bake in specific GPU drivers and system packages.

Prerequisites

Building a custom image requires a Linux environment with sufficient disk space (at least 50GB recommended) and a fast internet connection for downloading packages and models.

  • Docker (Recommended): The easiest way to build is using the provided Docker environment.
  • Debian 12 (Native): If building natively, you must use Debian 12 (Bookworm) with the live-build package installed.

Build Process

Follow these steps to generate a custom NeuralDrive ISO.

1. Prepare the Environment

Clone the NeuralDrive repository and navigate to the builder directory.

git clone https://github.com/Rightbracket/NeuralDrive.git
cd NeuralDrive/builder

2. Configure the Build

The build is controlled by neuraldrive-build.yaml. Create your configuration file from the provided example.

cp neuraldrive-build.yaml.example neuraldrive-build.yaml

Edit neuraldrive-build.yaml to suit your requirements. Key sections include:

  • system: Define the kernel version, locale, and additional Debian packages.
  • gpu: Enable or disable support for NVIDIA, AMD, and Intel GPUs.
  • models: Specify which models to pre-load.
  • webui: Customize the management interface branding.
  • output: Set the filename and compression level.

3. Pre-load Models

Note: Model pre-loading currently requires a manual staging step.

To include models in your image, you must first stage them. The scripts/download-models.sh script uses a temporary Ollama instance to pull the models listed in your configuration into the ./model-staging/ directory.

./scripts/download-models.sh

These models are later packaged into the final image.

4. Execute the Build

You can run the build either natively or via Docker.

Using Docker (Recommended):

docker compose run builder

The Docker environment uses a debian:bookworm base and runs in privileged mode to allow live-build to mount filesystems.

Using Native Build:

sudo ./build.sh

The build.sh script performs the following actions:

  1. Validates the configuration using scripts/validate-config.sh.
  2. Prepares the live-build environment.
  3. Applies branding via scripts/apply-branding.sh.
  4. Runs the lb build process.
  5. Post-processes the output via scripts/post-build.sh (e.g., running isohybrid).

5. Retrieve the ISO

Once the build completes, the resulting ISO file will be located in the output/ directory.

Configuration Example

neuraldrive:
  version: "1.0"
  name: "My Custom NeuralDrive"
  hostname: "my-llm-server"
  system:
    kernel: "default"            # "default" (6.1) or "backport" (6.12+)
    locale: "en_US.UTF-8"
    timezone: "UTC"
    extra_packages: ["htop", "vim"]
  gpu:
    nvidia: true
    amd: false
    intel: false
  models:
    preload: ["llama3.1:8b", "codestral:latest"]
    catalog: "default"           # default/minimal/none
  network:
    ssh_enabled: false
    default_ip: "dhcp"
  security:
    encrypt_persistent: false
    api_key: ""                  # Empty = auto-generate at first boot
  webui:
    enabled: true
    admin_email: "admin@example.com"
    branding:
      title: "Custom NeuralDrive"
      logo: "assets/custom-logo.png"
  output:
    format: "iso-hybrid"
    filename: "neuraldrive-custom.iso"
    compression: "xz"

Build Estimates

Build times vary significantly based on your configuration:

  • Minimal Image: 30-45 minutes.
  • Full GPU Stack (NVIDIA + AMD): 60-70 minutes.
  • Heavy Model Pre-loading: 90+ minutes (largely dependent on download speeds).

Output Formats

NeuralDrive supports two primary output formats:

  • iso-hybrid (Default): A bootable image compatible with both USB flash drives and optical media (CD/DVD).
  • raw-disk: A standard disk image for virtual machines or direct disk writing.

For detailed information on every configuration key, see the Build Configuration Reference. To further customize the system during the build process, refer to Custom Hooks & Overlays.

Audience: Admin

Build Configuration Reference

This reference provides a complete annotation of all keys available in the neuraldrive-build.yaml file. This file controls the generation of the NeuralDrive ISO image and defines the default system state.

Specification

neuraldrive

  • version: (String) The internal version of the NeuralDrive build. Default: "1.0"
  • name: (String) The human-readable name of the image. This appears in the boot menu and the WebUI title. Default: "NeuralDrive"
  • hostname: (String) The default system hostname. Default: "neuraldrive"

system

  • kernel: (Enum) Specifies the Linux kernel version.
    • default: Standard Debian 12 kernel (6.1 LTS).
    • backport: Newer kernel from Debian backports (6.12+), recommended for recent hardware.
  • locale: (String) System locale. Example: "en_US.UTF-8"
  • timezone: (String) System timezone. Example: "UTC" or "America/New_York"
  • extra_packages: (List) Additional Debian packages to install. Example: ["htop", "vim", "tmux"]

gpu

  • nvidia: (Boolean) Enable NVIDIA driver and toolkit installation. Default: true
  • amd: (Boolean) Enable ROCm drivers and AMD GPU support. Default: false
  • intel: (Boolean) Enable Intel GPU and XPU support. Default: false

models

  • preload: (List) A list of models to pre-load during the build process. Example: ["llama3.1:8b", "codestral:latest"]
  • catalog: (Enum) The set of models to include in the default available list.
    • default: Includes standard common models.
    • minimal: Includes only the absolute essentials for basic testing.
    • none: No models pre-populated.

network

  • ssh_enabled: (Boolean) Enable SSH server by default. Default: false
  • default_ip: (String) Default IP configuration. Use "dhcp" or a static IP in CIDR format. Default: "dhcp"

security

  • encrypt_persistent: (Boolean) Enable LUKS2 encryption for the persistence partition by default. Default: false
  • api_key: (String) Default API key for external access. If empty, a key will be auto-generated at first boot. Default: ""

webui

  • enabled: (Boolean) Enable the NeuralDrive management WebUI. Default: true
  • admin_email: (String) The default administrative email for the WebUI. Default: "admin@example.com"
  • branding.title: (String) The title displayed in the WebUI. Default: "NeuralDrive"
  • branding.logo: (Path) Path to the custom logo file relative to the builder root. Default: "assets/logo.png"

output

  • format: (Enum) The final output image format.
    • iso-hybrid: Bootable on both USB and CD.
    • raw-disk: Raw disk image.
  • filename: (String) The name of the resulting image file. Default: "neuraldrive.iso"
  • compression: (Enum) Compression algorithm for the ISO image.
    • xz: Higher compression, slower build time.
    • gzip: Faster build time, larger image size.

Configuration Examples

NVIDIA-Only Minimal Image

This configuration creates a lightweight image focused on NVIDIA GPUs with no extra overhead.

neuraldrive:
  name: "NVIDIA Minimal NeuralDrive"
gpu:
  nvidia: true
  amd: false
  intel: false
models:
  catalog: "minimal"
  preload: []

Full Image with Pre-loaded Models

A comprehensive image containing all GPU drivers and several models ready for immediate use.

neuraldrive:
  name: "Full NeuralDrive with Models"
gpu:
  nvidia: true
  amd: true
  intel: true
models:
  catalog: "default"
  preload: ["llama3.1:8b", "codestral:latest", "mistral:7b"]
output:
  compression: "xz"

Custom Branded Image

Designed for deployment with specific hostname and branding requirements.

neuraldrive:
  name: "Company NeuralDrive"
  hostname: "company-llm-server"
webui:
  branding:
    title: "Company AI Lab"
    logo: "assets/company-logo.png"
network:
  ssh_enabled: true

For more details on the build process, see Building Custom Images.

Audience: Admin (advanced)

Custom Hooks & Overlays

The NeuralDrive build system uses the underlying Debian live-build framework to allow for deep customization. You can inject scripts, configuration files, and third-party packages into the final image without modifying the core build scripts.

Hook System

Hooks are executable scripts that run within the Debian environment (chroot) during the image creation process. They allow you to perform complex setup tasks like installing specific Python packages, configuring system services, or modifying system files.

Implementation

Hooks must be placed in the hooks/chroot/ directory of the builder. During the build, they are executed in alphabetical order. Using a numeric prefix (e.g., 01-, 02-) ensures a deterministic execution sequence.

Example Hook Script

To install custom Python packages via pip:

#!/bin/bash
# File: hooks/chroot/05-custom-pip-packages.hook.chroot
set -e  # abort the build early if any step fails

echo "Installing custom pip packages..."
pip install --no-cache-dir langchain-community chromadb

Overlay System

The overlay system provides a way to add or overwrite files in the final root filesystem. The contents of the overlay/ directory are mirrored directly onto the target system's root.

Implementation

If you want to add a file to /etc/neuraldrive/custom.conf, you should place it at overlay/etc/neuraldrive/custom.conf in the builder directory.
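
For instance, staging such a file from the builder root can be done like this (the config contents here are purely illustrative; only the directory layout matters):

```shell
# Stage a file so it lands at /etc/neuraldrive/custom.conf on the target.
# The overlay/ tree mirrors the target root filesystem.
mkdir -p overlay/etc/neuraldrive
printf 'log_level = "debug"\n' > overlay/etc/neuraldrive/custom.conf
ls overlay/etc/neuraldrive/custom.conf
```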

Common Uses

  • Config Files: Provide default configurations for services like SSH, Nginx, or Ollama.
  • Scripts: Inject administrative scripts into /usr/local/bin/.
  • Static Assets: Replace or add branding assets like splash screens or icons.

Package Injection

You can automatically install third-party Debian (.deb) packages by placing them in the packages/ directory.

Implementation

Any .deb file found in the packages/ folder will be included in the local repository created during the build and installed by apt alongside the standard system packages. This is particularly useful for proprietary drivers or custom-built software not available in the Debian repositories.

Execution Order

During the lb build process, these customizations are applied as follows:

  1. Overlay Injection: Files from overlay/ are copied into the chroot environment.
  2. Package Installation: Standard packages and injected .deb files are installed via apt.
  3. Hooks: Scripts in hooks/chroot/ are executed in alphabetical order.

Warning: Hooks run with root privileges inside the chroot. Ensure your scripts are idempotent and handle errors gracefully, as any hook that exits non-zero will abort the entire image build.

For an overview of the full build process, refer to Building Custom Images.

Audience: Everyone

CD Mode vs USB Mode

NeuralDrive's hybrid ISO image can be used in two distinct ways: as a live environment from a USB drive or as a read-only system from a CD/DVD. The choice of medium significantly impacts the features and persistence of the system.

USB Mode

When NeuralDrive is flashed to a USB drive, it creates a persistence partition that allows for the full suite of features. This is the intended deployment method for most users.

  • Persistence: Models, system configurations, and user data are saved to the USB drive and survive reboots.
  • Model Management: Downloading and updating models via the WebUI or command line is fully supported.
  • Performance: USB 3.0+ provides acceptable I/O speeds for system operations.

CD Mode

If the ISO is burned to a CD or DVD, or booted in an environment where the boot medium is read-only, NeuralDrive operates in a restricted live mode.

  • Read-Only System: All system changes are made to a tmpfs (RAM-based filesystem) and are lost upon reboot.
  • Disabled Downloads: Since there is no persistent storage on the disc, model downloads are disabled. Users will see a warning: "Downloads disabled — connect external storage."
  • Stateless Security: Every boot starts with a clean, known state, making it ideal for high-security environments where no data should remain on the hardware.

Booting to RAM

For CD mode, a specialized "Copy to RAM" (toram) boot option is available in the boot menu. This loads the entire image into system memory, which has several benefits:

  • Speed: Operating purely from RAM is faster than reading from an optical drive.
  • Drive Availability: Once loaded, the CD can be ejected, freeing the drive for other uses.
  • Memory Requirement: Note that sufficient RAM (typically 16GB+) is needed to hold both the system and the models.
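
At the GRUB prompt, this option corresponds to appending the toram parameter to the kernel command line. A sketch of such an entry (paths are illustrative; the actual menu entry shipped with NeuralDrive may differ):

```
linux  /live/vmlinuz boot=live components toram quiet
initrd /live/initrd.img
```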

Comparison and Use Cases

Feature          | USB Mode                   | CD Mode
Persistence      | Full                       | None (lost on reboot)
Model Downloads  | Enabled                    | Disabled (unless external storage added)
Encryption       | Supported                  | Not applicable
Typical Use Case | Permanent local LLM server | Evaluation, high-security workstations

When to Use Each

  • USB: Best for regular use, developing with LLMs locally, or setting up a dedicated server for a small team.
  • CD: Best for testing hardware compatibility, air-gapped security needs, or environments where the operating system must never be modified.

Tip: If using CD mode but still needing persistent model storage, you can connect a separate USB or SATA drive. NeuralDrive will automatically detect and mount it. For more, see External Storage.

For the initial setup process on either medium, refer to First Boot Setup.

Audience: Admin

External Storage

NeuralDrive is designed to work with external storage devices for model persistence and data management. This is particularly useful for systems running from a read-only medium (CD) or for users who maintain a large library of models that exceed the size of a standard USB drive.

Auto-Mounting

The system includes pre-configured udev rules that automatically detect and mount external storage devices (USB, SATA, etc.) as they are connected.

  • Mount Point: /mnt/external/<LABEL>
  • Directory Name: The drive's volume label is used as the directory name. If no label exists, the device identifier (e.g., sdb1) is used.
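
The naming rule above can be illustrated as follows (the actual udev rule is internal to the image; this sketch only mirrors its described behavior, with example label and device values):

```shell
# Use the volume label when present, otherwise fall back to the device name.
label="MyModels"   # e.g. from: lsblk -no LABEL /dev/sdb1
dev="sdb1"
mountpoint="/mnt/external/${label:-$dev}"
echo "$mountpoint"   # /mnt/external/MyModels
```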

Configuring External Model Storage

By default, models are stored in /var/lib/neuraldrive/models/. You can redirect this to an external drive using one of two methods.

Method 1: Bind Mount

Edit /etc/fstab to bind-mount your external storage directory over the default models directory. This is the most reliable method for ensuring persistence.

/mnt/external/MyModels/ollama /var/lib/neuraldrive/models/ none bind 0 0

Method 2: Symlinking

Alternatively, you can create a symbolic link from the default location to the external drive.

# Stop the Ollama service
sudo systemctl stop ollama

# Move existing models to the external drive
sudo mv /var/lib/neuraldrive/models/* /mnt/external/MyModels/

# Create the symlink
sudo ln -s /mnt/external/MyModels/ /var/lib/neuraldrive/models

# Restart the service
sudo systemctl start ollama
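
Before restarting the service, it can help to rehearse and verify the link layout. A minimal sketch using throwaway temporary paths, so it runs without root (substitute the real paths shown above on the appliance):

```shell
# Create a stand-in external directory and link to it, then verify.
mkdir -p /tmp/nd-demo/external/MyModels
ln -sfn /tmp/nd-demo/external/MyModels /tmp/nd-demo/models
readlink /tmp/nd-demo/models
```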

Use Cases

  1. CD Mode with External Storage: When running NeuralDrive from a CD, you can still have a persistent model library by connecting an external USB drive.
  2. Shared Model Library: Multiple NeuralDrive instances can share a single large external drive containing a comprehensive library of LLMs.
  3. Supplementing USB Storage: If your primary USB drive runs low on space, you can seamlessly add a second drive to expand your available model storage.

For more information on model storage and management, see Storage Management and CD Mode vs USB Mode.

Audience: Admin

LUKS Encryption

NeuralDrive supports LUKS2 encryption for its persistence partition. This ensures that models, system configurations, and credentials stored on the USB drive are protected from unauthorized access if the physical medium is lost or stolen.

What is Protected?

When encryption is enabled, it applies specifically to the persistence partition (typically labeled persistence). This partition stores:

  • Models: All LLMs downloaded via Ollama.
  • Configs: Network settings, API keys, and system customizations.
  • WebUI Data: User accounts and administrative settings.

Enabling Encryption

There are two primary ways to enable LUKS encryption.

1. First-Boot Wizard

During the initial setup process, the first-boot wizard will present an option to "Enable Persistence Encryption." Selecting this option will prompt you to enter a passphrase that will be required to unlock the partition on every subsequent boot.

2. Build Configuration

For automated deployments or pre-configured images, you can enable encryption in the neuraldrive-build.yaml file:

security:
  encrypt_persistent: true

Boot Experience

When encryption is enabled, the system will pause during the boot sequence to prompt for the decryption passphrase. This occurs before any NeuralDrive services (including the WebUI) are started.

Warning: If you lose your passphrase, the data on the persistence partition is unrecoverable. There is no password reset or recovery mechanism for LUKS-encrypted partitions.

Performance Considerations

Using LUKS2 encryption introduces a minor overhead for disk I/O operations. However, because LLM inference is primarily bound by GPU or CPU performance and VRAM/RAM bandwidth, the impact on overall model performance is negligible for most users.

Important Note: Destructive Operation

Enabling encryption for the first time on a drive that already contains data is a destructive operation. It will reformat the persistence partition. Ensure you back up any critical data before enabling this feature on an existing installation.

For more information on the first-time setup process, see First Boot Setup. To explore other security features, refer to Security.

Audience: Admin / Developers

Performance Tuning

NeuralDrive is pre-optimized for a wide range of hardware, but fine-tuning specific configurations can significantly improve inference speed, concurrent user handling, and memory efficiency.

Ollama Configuration

The primary backend service, Ollama, is controlled via /etc/neuraldrive/ollama.conf. Modifying these parameters allows you to tailor the system's behavior to your specific hardware and workload.

OLLAMA_HOST=127.0.0.1:11434
OLLAMA_MODELS=/var/lib/neuraldrive/models/
OLLAMA_KEEP_ALIVE=5m
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NUM_PARALLEL=1

Key Parameters

  • OLLAMA_NUM_PARALLEL: (Integer) The number of concurrent requests the server will handle. Increase this for multi-user environments, though this will increase VRAM usage.
  • OLLAMA_KEEP_ALIVE: (Duration) How long a model remains loaded in memory after the last request. Setting this to a higher value (e.g., 30m) avoids the latency of reloading models.
  • OLLAMA_MAX_LOADED_MODELS: (Integer) The maximum number of models to keep in VRAM simultaneously. The default is 0 (auto), which allows Ollama to manage concurrent loading based on available VRAM. When memory is full, least recently used (LRU) models are evicted automatically. Manual overrides can be set in /etc/neuraldrive/ollama.conf.
  • OLLAMA_NUM_THREADS: (Integer) Specifies the number of CPU threads to use for inference. By default, this auto-detects based on your hardware.
  • OLLAMA_FLASH_ATTENTION: (Boolean) Enabling Flash Attention can significantly improve speed on supported GPUs (e.g., NVIDIA Ampere and newer).
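
As an example, a multi-user tuning of /etc/neuraldrive/ollama.conf might raise parallelism and keep-alive like this (values are illustrative; size OLLAMA_NUM_PARALLEL against your available VRAM):

```
OLLAMA_HOST=127.0.0.1:11434
OLLAMA_MODELS=/var/lib/neuraldrive/models/
OLLAMA_KEEP_ALIVE=30m
OLLAMA_NUM_PARALLEL=4
OLLAMA_FLASH_ATTENTION=1
```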

Memory Management

VRAM and RAM Spilling

Ollama uses memory mapping (mmap) by default. This allows the system to load models larger than the available VRAM by spilling some layers into system RAM. While this enables the execution of larger models, it will result in slower inference speeds for the layers processed by the CPU.

zRAM Swap

NeuralDrive includes the neuraldrive-zram.service, which creates a compressed swap device in RAM. This is particularly beneficial when running on systems where the combined requirement of VRAM and RAM is very tight, as it provides a faster alternative to traditional disk-based swap.

Storage and I/O

To minimize I/O wait times and reduce wear on USB flash media, NeuralDrive employs several filesystem optimizations:

  • noatime: The system is mounted with the noatime option to prevent unnecessary write operations when files are accessed.
  • commit=60: Data is committed to disk every 60 seconds (instead of the default 5), reducing the frequency of physical write cycles.
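
As a sketch, the corresponding fstab mount options look like the following (the device label and mount point are illustrative; the shipped fstab may differ):

```
LABEL=persistence  /mnt/persistence  ext4  noatime,commit=60  0  2
```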

Context Window Management

The size of the context window directly impacts memory consumption. A larger context window allows the model to "remember" more of the conversation but requires significantly more VRAM. Adjust the context size within your application or model configuration to balance memory usage and conversational depth.
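
With Ollama, the context window can be overridden per request via the num_ctx option. A sketch (the model name is illustrative; the payload is printed here so the send command can be run separately against a live server):

```shell
# Build a generate request with a reduced 2048-token context window.
payload='{"model":"llama3.1:8b","prompt":"Hello","options":{"num_ctx":2048}}'
echo "$payload"
# To send: curl http://127.0.0.1:11434/api/generate -d "$payload"
```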

Multi-GPU Optimization

If multiple compatible GPUs are present, Ollama will automatically detect them and distribute model layers across all available devices. This is an effective way to run very large models that would otherwise exceed the VRAM of a single card.

For further details on configuration, see Configuration Files Reference. To choose the right models for your hardware, see Model Recommendations.

Audience: Developers (advanced)

llama.cpp Server

While Ollama is the default model server for NeuralDrive, the system also includes the llama-server binary for users who require lower-level control or specialized inference features.

When to Use llama.cpp

The llama-server is ideal for developers who need more granular configuration than the Ollama API provides.

Feature        | Ollama                   | llama.cpp
Ease of Use    | High (managed downloads) | Medium (manual model placement)
Model Format   | Managed blobs            | Direct GGUF loading
Control        | Standardized API         | Fine-grained sampling & batching
Resource Usage | Integrated management    | Lower overhead per instance

Enabling the Server

The llama-server is included in the base image but is not enabled by default. You can start it manually or configure a custom systemd service.

# Example command to start the llama.cpp server
llama-server --model /path/to/my-model.gguf --port 8080 --host 0.0.0.0
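
If you want the server supervised by systemd, a minimal unit sketch could look like the following (the unit name, model path, and flags are assumptions, not a shipped service; note that Ollama's blob store does not contain raw .gguf files):

```
# /etc/systemd/system/llama-server.service
[Unit]
Description=llama.cpp inference server
After=network-online.target

[Service]
ExecStart=/usr/bin/llama-server --model /path/to/my-model.gguf --port 8080 --host 0.0.0.0
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now llama-server after placing a GGUF model at the path given in ExecStart.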

Configuration Options

The server supports a wide variety of command-line flags to tune performance:

  • --ctx-size: (Integer) Define the maximum context window size.
  • --n-gpu-layers: (Integer) Specifically state how many layers to offload to the GPU.
  • --threads: (Integer) The number of CPU threads to utilize.
  • --batch-size: (Integer) Set the batch size for prompt processing.

Important Note: Model Formats

It's important to understand the difference in model storage between the two servers. Ollama stores models in a proprietary blob format within /var/lib/neuraldrive/models/. These blobs are not directly compatible with the llama-server. To use a model with llama.cpp, you must provide a raw .gguf file.

For more information on model formats, see Understanding LLM Models. To further optimize your inference setup, refer to Performance Tuning.

Audience: Everyone

Common Issues

This guide provides solutions for frequent technical challenges encountered while using NeuralDrive. If your issue is not listed here, consult the specialized troubleshooting pages for GPU, Boot, Network, or Models.

Dashboard and Access

I can't reach the web dashboard

If the web interface does not load at https://neuraldrive.local, verify the following:

  1. Check the Local IP: The NeuralDrive console (TUI) displays the current IP address. Attempt to connect directly via https://<IP_ADDRESS>.
  2. Verify Port 443: Ensure your client machine can reach the NeuralDrive device on port 443. Some firewalls or router settings may block this traffic.
  3. Network Connection: Confirm the NeuralDrive device has an active Ethernet or Wi-Fi connection.
  4. mDNS Resolution: The .local hostname requires mDNS (Avahi/Bonjour) support on the client. If your client is on a corporate network, mDNS may be filtered.

The API returns 401 Unauthorized

All API requests must include a valid Bearer token.

  1. Key Format: Verify your API key follows the nd-xxxx format.
  2. Header Syntax: Ensure the header is sent correctly:
    Authorization: Bearer nd-xxxx
    
  3. Key Location: The system API key is stored in /etc/neuraldrive/api.key and can be viewed or rotated via the TUI.
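
Composing the header from the key file can be sketched as follows (a dummy key and temporary path are used here so the example is self-contained; on the appliance, read /etc/neuraldrive/api.key instead):

```shell
# Build an Authorization header from a key stored on disk.
printf 'nd-demo1234' > /tmp/api.key        # stand-in for /etc/neuraldrive/api.key
API_KEY="$(cat /tmp/api.key)"
AUTH_HEADER="Authorization: Bearer ${API_KEY}"
echo "$AUTH_HEADER"
# Use with: curl -k -H "$AUTH_HEADER" https://<IP>:8443/api/tags
```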

System Persistence

My models disappeared after reboot

NeuralDrive is a live system. Data only survives reboots if a persistence partition is active.

  1. USB Mode Requirement: Persistence is only available when running from a USB drive with a labeled persistence partition. It does not function in "Live CD" (ISO only) mode.
  2. Verify Mount: Run the following command to check if the persistence layer is active:
    mount | grep persistence
    
  3. Partition Health: If the partition is present but not mounting, check the filesystem integrity using fsck.

Resource Management

The system is running out of memory

Large language models require significant RAM or VRAM. If the system becomes unresponsive or returns memory errors:

  1. Downsize the Model: Use a smaller model (e.g., 7B instead of 70B).
  2. Increase Quantization: Use a more compressed version of the model (e.g., Q4_K_M instead of Q8_0 or FP16).
  3. Check Swap: NeuralDrive uses zram for compressed swap. Verify it is active with zramctl.

Model download is extremely slow or fails

  1. Storage Space: Model downloads will abort if the disk is full. Check availability with df -h /var/lib/neuraldrive.
  2. Network Stability: Ensure the device has a stable connection to the internet. Downloads are resumed automatically if interrupted, but high latency can cause timeouts.
  3. Proxy Settings: If you are behind a corporate proxy, ensure the environment variables are correctly set in /etc/neuraldrive/ollama.conf.

Service Status

The TUI shows 'Ollama Offline'

If the inference engine is not responding:

  1. Check Service Status:
    systemctl status neuraldrive-ollama
    
  2. Restart the Service:
    systemctl restart neuraldrive-ollama
    
  3. Check Logs: If the service fails to start, inspect the logs for GPU driver or initialization errors:
    journalctl -u neuraldrive-ollama -e
    

Diagnostics

NeuralDrive-Check

NeuralDrive includes a dedicated diagnostic tool for rapid health assessment. Run this from the console or via SSH:

/usr/bin/neuraldrive-check

This tool verifies:

  • GPU driver initialization and VRAM availability.
  • Persistence partition mount status.
  • Core service health (Ollama, WebUI, Caddy).
  • Network connectivity and API key validity.

Tip: Always run neuraldrive-check before seeking manual support, as it catches the majority of common configuration errors.

Audience: Everyone

GPU Troubleshooting

NeuralDrive is optimized for hardware-accelerated inference. If the system fails to detect or utilize your GPU, it will fall back to CPU inference, which is significantly slower.

Detection Issues

No GPU detected

If NeuralDrive does not recognize your hardware, verify the following system configurations:

  1. Hardware Compatibility: Ensure your GPU is listed in the Hardware Compatibility Matrix.
  2. BIOS Settings:
    • IOMMU/VT-d: Ensure these are enabled for proper PCI communication.
    • Above 4G Decoding: Must be enabled for modern GPUs (RTX 30-series and newer).
    • Resizable BAR: Recommended for improved performance, though not required for detection.
  3. Secure Boot: NVIDIA drivers require MOK (Machine Owner Key) enrollment to function with Secure Boot. If you cannot enroll the key, disable Secure Boot in the BIOS.
  4. Mixed Vendors: Mixed-vendor configurations (e.g., one NVIDIA and one AMD card) are not supported. The first vendor detected by the boot sequence will be initialized.

Nouveau Conflict

NeuralDrive automatically blacklists the open-source nouveau driver to prevent conflicts with the proprietary NVIDIA stack. To verify:

lsmod | grep nouveau

If the command returns any output, the blacklist failed. Check /etc/modprobe.d/neuraldrive-blacklist.conf.
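
For reference, a typical modprobe blacklist file looks like the following (the exact contents of the shipped file may differ):

```
# /etc/modprobe.d/neuraldrive-blacklist.conf
blacklist nouveau
options nouveau modeset=0
```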

Driver and Hardware Support

missing nvidia-uvm module

If Ollama reports CPU-only inference despite having an NVIDIA GPU, the nvidia-uvm (Unified Virtual Memory) kernel module may not be loaded. This module is essential for CUDA memory allocation.

  1. Verify Module: Check if the module is loaded:
    lsmod | grep nvidia_uvm
    
  2. Naming Convention: On NeuralDrive (based on Debian), the module is named nvidia-current-uvm.
  3. Manual Load: If missing, load it manually:
    sudo modprobe nvidia-current-uvm && nvidia-modprobe -u
    
  4. Automatic Loading: NeuralDrive should load this automatically at boot via /etc/modules-load.d/nvidia-uvm.conf. If it fails, check the journalctl -u neuraldrive-ollama logs for ExecStartPre failures.

Without this module, /dev/nvidia-uvm device nodes will be missing, causing CUDA calls to fail silently and Ollama to fall back to CPU.

cgroup v2 / DeviceAllow blocking

If the GPU is detected by the system but Ollama still falls back to CPU inference, systemd DeviceAllow rules might be blocking access.

  1. cgroup v2 Behavior: On systems using cgroup v2, DeviceAllow uses eBPF device filters. These filters can block CUDA access even when explicit allow rules for /dev/nvidia* and /dev/dri/* are present.
  2. NeuralDrive Default: The default NeuralDrive Ollama service has all DeviceAllow lines removed to prevent this.
  3. Custom Units: If you have modified the service unit and re-added DeviceAllow rules, remove them and ensure PrivateDevices=no is set to restore GPU access.

Diagnostic Tools

NeuralDrive provides several utilities to inspect GPU state:

  • NVIDIA: Run nvidia-smi to view VRAM usage, temperature, and driver version.
  • AMD: Run rocm-smi to inspect ROCm status and device health.
  • System Config: The file /run/neuraldrive/gpu.conf is generated at boot by neuraldrive-gpu-detect.service. It contains the detected vendor:
    cat /run/neuraldrive/gpu.conf
    
  • PCI Enumeration: Use lspci | grep -i vga to see if the kernel sees the hardware at the bus level.
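
Picking the right SMI tool from the detected vendor can be scripted. In this sketch, the GPU_VENDOR key and a temporary file stand in for the real /run/neuraldrive/gpu.conf, whose exact format is not documented here:

```shell
# Simulate the vendor file, then dispatch to the matching diagnostic tool.
conf=/tmp/gpu.conf                      # real file: /run/neuraldrive/gpu.conf
printf 'GPU_VENDOR=nvidia\n' > "$conf"

vendor="$(sed -n 's/^GPU_VENDOR=//p' "$conf")"
case "$vendor" in
  nvidia) tool="nvidia-smi" ;;
  amd)    tool="rocm-smi" ;;
  *)      tool="lspci" ;;
esac
echo "$tool"
```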

Recovery and Safe Mode

Safe Mode Boot

If a GPU driver causes a system hang or kernel panic during boot, use the Safe Mode option in the GRUB menu.

  • Effect: Skips all GPU detection and driver loading.
  • Result: The system will boot with generic VGA drivers and use CPU-only inference.
  • Usage: Ideal for troubleshooting BIOS settings or extracting logs when the GPU is failing.

Warning: Running in Safe Mode will result in extremely high CPU usage and latency during model inference.

Mixed Vendor Support

Current NeuralDrive releases support only single-vendor GPU configurations. If you have multiple GPUs, they must all be from the same manufacturer (e.g., all NVIDIA or all AMD). Ollama will automatically distribute model layers across all compatible GPUs of the same vendor to maximize VRAM utilization.

Note: For detailed kernel parameters related to GPU management, see Boot Parameters.

Audience: Everyone

Boot Failures

This section addresses issues that prevent NeuralDrive from reaching the console or dashboard.

BIOS and UEFI Compatibility

NeuralDrive is distributed as an isohybrid image, which supports both legacy BIOS and modern UEFI boot modes.

  1. Preferred Mode: UEFI is highly recommended for compatibility with modern GPU drivers and Secure Boot.
  2. Boot Order: Ensure the USB flash drive is set as the primary boot device in your system firmware.
  3. USB Port: If the drive is not detected, try a different USB port (e.g., USB 2.0 instead of 3.x).

GRUB Menu Options

When the system starts, the GRUB menu provides three primary boot entries:

  • Normal: Standard boot with full GPU detection and driver initialization.
  • Safe Mode: Skips GPU driver loading and PCI enumeration. Use this if the system hangs during boot.
  • Debug Mode: Appends debug and verbose to the kernel command line, providing detailed systemd output for troubleshooting.

Common Boot Errors

Black Screen

If the system hangs with a black screen or blinking cursor shortly after selecting a boot entry:

  1. Driver Issue: This is typically caused by a GPU driver conflict. Reboot and select Safe Mode.
  2. Kernel Modesetting: If Safe Mode fails, try editing the boot parameter (press 'e' in GRUB) and add nomodeset.

"No bootable device"

If the hardware fails to recognize the USB drive entirely:

  1. Flash Verification: The image may have been written incorrectly. Re-flash the USB drive and enable "Verify write" in your flashing utility.
  2. GPT vs MBR: Ensure your BIOS is set to match the partition style of the flash drive.

Kernel Panic

If the boot process halts with a "Kernel Panic" or "VFS: Unable to mount root fs":

  1. Corrupt Image: This usually indicates a bad write to the USB drive. Re-flash using a high-quality USB 3.x drive.
  2. Memory Corruption: In rare cases, this may indicate faulty RAM. Run a MemTest86+ cycle to verify hardware health.

Note: For a complete list of supported kernel arguments, see Boot Parameters. For instructions on creating the bootable media, see Writing the USB Drive.

Audience: Everyone

Network & mDNS Troubleshooting

This guide addresses connectivity issues between your client machine and the NeuralDrive appliance.

mDNS and Hostname Resolution

neuraldrive.local doesn't resolve

The neuraldrive.local address uses Multicast DNS (mDNS) for discovery.

  1. Client Software: Ensure your client has mDNS support. Windows (via Bonjour), macOS (native), and Linux (Avahi) are supported.
  2. Network Hardware: Some routers or managed switches block multicast traffic (UDP 5353).
  3. Corporate Networks: mDNS is often disabled or filtered on enterprise-grade networks.
  4. Workaround: Use the direct IP address shown on the NeuralDrive console (TUI).

TLS and Certificates

HTTPS certificate warning

NeuralDrive uses self-signed certificates to encrypt traffic between clients and the appliance. Browsers will display a "Your connection is not private" warning.

  1. Expected Behavior: This warning is expected when using the default self-signed CA.
  2. CA Installation: To resolve this, download the CA certificate and add it to your browser or OS trust store.
    • Path: /etc/neuraldrive/tls/neuraldrive-ca.crt
    • URL: https://<IP>:8443/system/ca-cert
  3. Custom Certificates: You can replace the default certificates in /etc/neuraldrive/tls/ with your own.

Wi-Fi Configuration

  1. Supported hardware: Most Intel and Realtek Wi-Fi chipsets are supported via NetworkManager.
  2. TUI Configuration: Use the "Network" menu in the TUI to scan and connect to access points.
  3. Command Line: Advanced users can use nmcli dev wifi connect <SSID> password <PASSWORD> for manual association.

Firewall and Ports

Can't connect from another machine

  1. Port Access: Ensure the following ports are open on the host firewall (nftables):
    • 443: Web UI Dashboard.
    • 8443: API Gateway and System Panel.
  2. Ping Test: Verify basic ICMP connectivity with ping <IP_ADDRESS>. If pings are successful but port 443 fails, the web server (Caddy) may be offline.

Note: For more information on configuring networking and firewall rules, see Network Configuration and TLS Certificate Trust.

Audience: Everyone

Model Loading Issues

This guide covers issues related to downloading, loading, and running Large Language Models (LLMs).

Download Failures

"Downloads disabled"

If the system prevents downloading new models, it has likely detected it is running in "Live CD" (read-only) mode.

  1. USB Mode Requirement: You must run the system from a USB drive with a valid persistence partition to save downloaded models.
  2. External Storage: If persistence is not available, you can mount an external disk to /var/lib/neuraldrive/models/ to store downloads.

Slow Download Speed

Model downloads depend on your internet connection and the responsiveness of the source registry.

  1. Storage Check: Downloads will abort if the target storage is full. Check free space with df -h /var/lib/neuraldrive/models/.
  2. Resuming: If a download is interrupted, NeuralDrive will automatically attempt to resume from the last successful byte.

Execution Issues

"Model failed to load"

This error occurs when the model weights cannot be loaded into available memory.

  1. Insufficient VRAM: The model is too large for your GPU. Try a smaller model or a version with higher quantization (e.g., Q4_K_M).
  2. RAM Fallback: If VRAM is exhausted, Ollama may attempt to load portions of the model into system RAM. This process is very slow and can lead to a load timeout.

Slow Inference

If the model is generating text very slowly (less than 1 token per second):

  1. CPU Fallback: The system is likely running on the CPU. Check if your GPU was detected:
    cat /run/neuraldrive/gpu.conf
    

Concurrent Model Loading

Multiple models can be loaded simultaneously if your hardware supports it.

  1. Automatic Management: NeuralDrive uses OLLAMA_MAX_LOADED_MODELS=0 (auto) by default. Ollama manages concurrent loading based on your available VRAM.
  2. Model Eviction: When VRAM is exhausted, least-recently-used models are automatically evicted from memory to make room for new ones.
  3. Manual Control: You can manually load or unload models via the TUI Models screen (accessible with the F2 key).

Management and Corruption

"Model not found"

  1. Exact Naming: Models must be called by their full tag (e.g., llama3:8b).
  2. Verify Inventory: Use the TUI or the following API command to list all locally available models:
    curl -H "Authorization: Bearer nd-xxxx" https://<IP>:8443/api/tags
    

Model Corruption

If a model loads but produces garbled output or crashes the service:

  1. Delete and Re-pull: Remove the corrupted model and download it again via the TUI or API:
    curl -H "Authorization: Bearer nd-xxxx" \
      -X DELETE https://<IP>:8443/api/delete -d '{"name":"model_name"}'
    

Note: For specific model recommendations based on your hardware, see Model Recommendations. For GPU-specific issues, consult GPU Problems.

Audience: Everyone

Frequently Asked Questions

This guide provides answers to common questions about NeuralDrive's capabilities, architecture, and administration.

Installation and Compatibility

Can I install NeuralDrive to a hard drive?

NeuralDrive is a live system designed to run from removable media. It is not intended for standard disk installation. Persistence of configuration and models is achieved through a dedicated partition on the USB drive. This ensures the host system remains untouched and the appliance remains portable.

Can I run NeuralDrive in a Virtual Machine (VM)?

Yes. You can run NeuralDrive in a VM using the ISO image. For optimal performance, you must use GPU passthrough to give the VM direct access to the host hardware. If GPU passthrough is not available, NeuralDrive will fall back to CPU-only mode, which is significantly slower but functional.

Can I use NeuralDrive without a GPU?

Yes. If no compatible GPU is detected, NeuralDrive will automatically fall back to CPU inference. While functional, the performance will be substantially lower than GPU-accelerated modes. This is ideal for lightweight testing or running small models on high-performance CPUs.

Usage and Administration

How do I add more users?

NeuralDrive uses Open WebUI for its primary dashboard. You can manage users through the Admin Panel at https://<IP_ADDRESS>/admin. Note that user registration is disabled by default to maintain local security; the admin must manually create or approve new user accounts.

Can I use NeuralDrive offline?

NeuralDrive is fully offline-capable. Once models are downloaded, no internet connection is required for inference, API access, or dashboard usage. For entirely air-gapped operations, you can pre-load models onto the persistence partition of your USB drive before moving to the target environment.

How is this different from running Ollama directly?

NeuralDrive is a turnkey appliance that eliminates the complexity of system setup. It includes:

  • An optimized operating system with a minimal attack surface.
  • Automated GPU driver detection and configuration.
  • A built-in web dashboard, API gateway, and system monitoring.
  • Self-signed TLS encryption for all traffic.
  • Pre-configured firewall and security hardening.
  • Portability on a single USB drive.

Can I use my own TLS certificate?

Yes. You can replace the default self-signed certificates in /etc/neuraldrive/tls/ with your own PEM-formatted certificates. After replacing the files, restart the Caddy service to apply the changes:

systemctl restart neuraldrive-caddy
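As a sketch, replacing the certificate pair might look like the following. The source filenames (my-server.crt, my-server.key) are placeholders; the destination paths and ownership come from the Configuration Files reference:

```shell
# Install your own PEM certificate and key over the defaults.
# Ownership and modes match the system's file inventory:
# certificate world-readable, private key restricted to 600.
install -o root -g neuraldrive-caddy -m 644 my-server.crt /etc/neuraldrive/tls/server.crt
install -o root -g neuraldrive-caddy -m 600 my-server.key /etc/neuraldrive/tls/server.key

# Restart the reverse proxy to pick up the new certificate.
systemctl restart neuraldrive-caddy
```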

Is there any telemetry or data collection?

No. NeuralDrive is designed for privacy and local-first operations. No data, usage metrics, or telemetry are ever sent to external servers.

Hardware and Performance

What models can I run on my hardware?

Model performance depends entirely on your available VRAM and RAM. See Model Recommendations for a list of tested models and their hardware requirements.

Can I use multiple GPUs?

Yes. Ollama automatically detects and utilizes all available GPUs from the same vendor. It will distribute model layers across GPUs to maximize VRAM utilization, enabling the execution of models that are too large for a single card.

Note: For more information on hardware compatibility, see the Hardware Matrix. For advanced system settings, see Boot Parameters.

Audience: Everyone

Hardware Compatibility Matrix

This document provides a comprehensive list of supported GPU hardware and their corresponding compute stacks within the NeuralDrive environment.

Supported Graphics Hardware

| Vendor   | Family        | Example Cards              | Compute Stack | Driver | Status           |
|----------|---------------|----------------------------|---------------|--------|------------------|
| NVIDIA   | Turing        | RTX 2060-2080, T4          | CUDA 12.x     | 535+   | Supported        |
| NVIDIA   | Ampere        | RTX 3060-3090, A100, A2000 | CUDA 12.x     | 535+   | Supported        |
| NVIDIA   | Ada Lovelace  | RTX 4060-4090, L40         | CUDA 12.x     | 535+   | Supported        |
| NVIDIA   | Hopper        | H100                       | CUDA 12.x     | 535+   | Supported        |
| AMD      | RDNA 3        | RX 7600-7900 XTX           | ROCm 6.x      | amdgpu | Supported        |
| AMD      | CDNA 2/3      | MI250, MI300               | ROCm 6.x      | amdgpu | Supported        |
| Intel    | Arc Alchemist | A770, A750                 | oneAPI/SYCL   | i915   | Experimental     |
| CPU-only | Any x86_64    | Any                        | N/A           | N/A    | Supported (slow) |

VRAM Recommendations

The table below outlines the minimum VRAM requirements for common model sizes at various quantization levels.

| Model Size | Quantization (Q4_K_M) | Quantization (Q8_0) | Full Weights (FP16) |
|------------|-----------------------|---------------------|---------------------|
| 7B - 8B    | 6 GB                  | 10 GB               | 16 GB               |
| 13B - 14B  | 10 GB                 | 16 GB               | 32 GB               |
| 30B - 34B  | 24 GB                 | 40 GB               | 64 GB               |
| 70B        | 48 GB                 | 80 GB               | 140 GB              |

Note: These values are estimates and do not include the memory required for context window overhead (KV cache). High context lengths will increase VRAM consumption.

Important Hardware Considerations

  • Secure Boot: Because NeuralDrive builds NVIDIA drivers via DKMS, the resulting kernel modules require MOK (Machine Owner Key) enrollment, or Secure Boot must be disabled.
  • Bus Interface: PCIe 4.0 or 5.0 is recommended to minimize latency during model loading and context ingestion.
  • Power Supply: Ensure your power supply (PSU) is rated for the peak power consumption of your GPU(s) during inference.
  • Cooling: GPUs can generate significant heat during long-running inference tasks. Ensure your system has adequate thermal management.

Note: For detailed hardware setup instructions, see Hardware Requirements. If your GPU is not detected, consult the GPU Troubleshooting guide.

Audience: Admin

Configuration Files

This document provides a complete reference for all critical configuration and state files within the NeuralDrive appliance.

Master File Inventory

| File                                          | Purpose                          | Format     | Owner                     |
|-----------------------------------------------|----------------------------------|------------|---------------------------|
| /etc/neuraldrive/ollama.conf                  | Ollama baked-in defaults         | KEY=VALUE  | root:neuraldrive-admin    |
| /var/lib/neuraldrive/config/ollama.conf       | Persistent Ollama overrides      | KEY=VALUE  | root:neuraldrive-admin    |
| /etc/neuraldrive/config.yaml                  | TUI overlay fallback config      | YAML       | root:neuraldrive-admin    |
| /var/lib/neuraldrive/config/config.yaml       | Persistent TUI configuration     | YAML       | root:neuraldrive-admin    |
| /var/lib/neuraldrive/config/api.key           | Persistent API key               | plaintext  | root:root (600)           |
| /etc/neuraldrive/api.key                      | System API key (synced)          | plaintext  | root:root (600)           |
| /var/lib/neuraldrive/config/credentials.conf  | Persistent credentials           | KEY=VALUE  | root:root (600)           |
| /etc/neuraldrive/webui.env                    | Open WebUI configuration         | KEY=VALUE  | root:neuraldrive-admin    |
| /etc/neuraldrive/caddy.env                    | Caddy API key environment        | KEY=VALUE  | root:neuraldrive-admin    |
| /etc/neuraldrive/api.env                      | System API environment           | KEY=VALUE  | root:neuraldrive-admin    |
| /etc/neuraldrive/Caddyfile                    | Caddy reverse proxy configuration| Caddyfile  | root:neuraldrive-caddy    |
| /etc/neuraldrive/nftables.conf                | Global firewall rules            | nftables   | root:root                 |
| /etc/neuraldrive/neuraldrive-models.yaml      | Model catalog definitions        | YAML       | root:neuraldrive-admin    |
| /etc/neuraldrive/version                      | Build version string             | plaintext  | root:root                 |
| /etc/neuraldrive/tls/server.crt               | System TLS certificate           | PEM        | root:neuraldrive-caddy    |
| /etc/neuraldrive/tls/server.key               | System TLS private key           | PEM        | root:neuraldrive-caddy (600) |
| /etc/neuraldrive/tls/neuraldrive-ca.crt       | Root CA for clients              | PEM        | root:root (644)           |
| /etc/neuraldrive/firewall-custom.conf         | User-defined firewall rules      | nftables   | root:root                 |
| /run/neuraldrive/gpu.conf                     | GPU detection results (at boot)  | KEY=VALUE  | root:root (runtime)       |

Key Configuration Reference

ollama.conf

Defines the behavior of the underlying LLM inference engine. The Ollama service uses two configuration sources:

  1. /etc/neuraldrive/ollama.conf — baked-in system defaults.
  2. /var/lib/neuraldrive/config/ollama.conf — persistent user overrides.

Values in the persistent file override the system defaults.

OLLAMA_HOST=127.0.0.1:11434
OLLAMA_MODELS=/var/lib/neuraldrive/models/
OLLAMA_KEEP_ALIVE=5m
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NUM_PARALLEL=1

OLLAMA_MAX_LOADED_MODELS: Set to 0 for "auto" mode. Ollama automatically manages how many models stay loaded based on available VRAM, using Least Recently Used (LRU) eviction when memory is required for a new request.
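As a sketch, setting a persistent override might look like this (it assumes a writable persistence partition; the file path comes from the Master File Inventory, and the 30m value is just an example):

```shell
# Raise the idle keep-alive from the 5m default via the persistent override file;
# this value takes precedence over /etc/neuraldrive/ollama.conf after a restart.
echo 'OLLAMA_KEEP_ALIVE=30m' >> /var/lib/neuraldrive/config/ollama.conf

# Restart the inference engine to apply the change.
systemctl restart neuraldrive-ollama
```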

webui.env

Configures the Open WebUI chat interface and authentication.

OLLAMA_BASE_URL=http://localhost:11434
DATA_DIR=/var/lib/neuraldrive/webui
ENABLE_SIGNUP=false
DEFAULT_USER_ROLE=user
WEBUI_AUTH=true
WEBUI_NAME=NeuralDrive
ENABLE_EASTER_EGGS=false

api.key

This file contains the master authentication token (nd-xxxx) used to secure both the inference API and the system management API. It is generated during first-boot initialization and should be treated as a secret.

Caddyfile

NeuralDrive's reverse proxy configuration. It manages TLS termination and routing to internal services.

Note: For instructions on using your own TLS certificates, see TLS Certificates. To modify boot-time behavior, consult Boot Parameters.

Audience: Admin

Boot Parameters

NeuralDrive supports several kernel command-line parameters to modify system behavior during the boot process.

Supported Parameters

| Parameter           | Default | Description                                                              |
|---------------------|---------|--------------------------------------------------------------------------|
| neuraldrive.ssh=1   | unset   | Enables the SSH server on boot. Required for remote management.          |
| neuraldrive.safe=1  | unset   | Safe Mode: skips GPU detection and driver loading. Forces CPU inference. |
| neuraldrive.debug=1 | unset   | Enables verbose logging during the entire boot sequence.                 |
| neuraldrive.ip=<IP> | DHCP    | Sets a static IP address for the primary network interface.              |
| persistence         | set     | Enables the persistence partition if detected on the USB media.          |
| toram               | unset   | Loads the entire system image into system RAM. Required for CD mode.     |
| nomodeset           | unset   | Disables Kernel Mode Setting (KMS), providing a generic VGA console.     |

How to Modify Parameters

One-time Modification

To temporarily add a parameter during boot:

  1. Reboot the system and wait for the GRUB menu to appear.
  2. Press e to edit the current boot entry.
  3. Locate the line starting with linux.
  4. Add your parameter(s) to the end of that line.
  5. Press Ctrl+X or F10 to boot with the modified parameters.
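For example, the edited linux line might end up looking like this (the kernel image path and the boot=live option are illustrative Debian live conventions; your entry may differ):

```
linux /live/vmlinuz boot=live toram neuraldrive.ssh=1 neuraldrive.debug=1
```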

Permanent Modification

To permanently change boot parameters in a custom NeuralDrive image, you must modify the GRUB configuration template before generating the final ISO:

  1. Edit /boot/grub/grub.cfg in your source directory.
  2. Update the linux lines with your desired defaults.

Warning: Modifying the persistence parameter may result in data loss if not configured correctly for your target media.

Note: For issues related to system startup, consult the Boot Failures troubleshooting guide.

Audience: Admin

Service Reference

This document provides a detailed overview of the systemd services that power the NeuralDrive appliance.

Master Service Inventory

| Service                     | Type         | User                | Port      | Dependencies             | Description                                                          |
|-----------------------------|--------------|---------------------|-----------|--------------------------|----------------------------------------------------------------------|
| neuraldrive-setup           | oneshot      | root                | -         | network.target           | First-boot initialization and state generation.                      |
| neuraldrive-gpu-detect      | oneshot      | root                | -         | -                        | GPU auto-detection and driver selection via PCI enumeration.         |
| neuraldrive-certs           | oneshot      | root                | -         | network-online, local-fs | Generates self-signed TLS certificates (skips if they already exist).|
| neuraldrive-zram            | oneshot      | root                | -         | local-fs                 | Sets up compressed RAM-based swap space.                             |
| neuraldrive-show-ip         | oneshot      | root                | -         | network-online           | Displays the current IP address on the physical console.             |
| neuraldrive-ollama          | long-running | neuraldrive-ollama  | 11434     | gpu-detect               | The GPU-accelerated LLM inference and model management engine.       |
| neuraldrive-webui           | long-running | neuraldrive-webui   | 3000      | ollama                   | The Open WebUI dashboard and chat interface.                         |
| neuraldrive-caddy           | long-running | neuraldrive-caddy   | 443, 8443 | certs                    | The TLS reverse proxy and API gateway.                               |
| neuraldrive-gpu-monitor     | long-running | neuraldrive-monitor | 1312      | gpu-detect               | Monitors GPU temperature, VRAM usage, and health.                    |
| neuraldrive-system-api      | long-running | neuraldrive-api     | 3001      | network                  | The backend service for the System Management API.                   |
| neuraldrive-storage-monitor | long-running | root                | -         | local-fs                 | Monitors available storage space and persistence health.             |

Systemd Hardening Summary

All NeuralDrive services are configured with systemd-native security hardening to minimize the system attack surface:

  • PrivateDevices: Most services are denied access to /dev/ nodes. The Ollama service specifically uses PrivateDevices=no to allow access to GPU device nodes required for hardware acceleration.
  • DeviceAllow: This directive was removed from the Ollama service because cgroup v2 eBPF filters can block CUDA access even when devices are explicitly allowed.
  • ProtectSystem: The root filesystem is mounted read-only for service processes.
  • ProtectHome: Services have no access to the /home/ directory.
  • NoNewPrivileges: Prevents processes from gaining elevated permissions via setuid or setgid.
  • RestrictAddressFamilies: Limits network communication to only necessary protocols (e.g., AF_INET, AF_INET6, AF_UNIX).

Tip: You can monitor the status and logs of any service using the systemctl status <service> and journalctl -u <service> commands. For a list of common service-related issues, see the Common Issues troubleshooting guide.
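For instance, to inspect the inference engine (any service name from the inventory above can be substituted):

```shell
# Check whether the Ollama service is active and healthy.
systemctl status neuraldrive-ollama

# Show its 50 most recent journal entries.
journalctl -u neuraldrive-ollama -n 50 --no-pager
```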

Audience: Developers

API Endpoint Reference

NeuralDrive provides two primary interfaces for model inference: an OpenAI-compatible API for standard tool integration and the native Ollama API for low-level control.

Authentication

All API requests must include the nd-xxxx API key in the Authorization header:

Authorization: Bearer nd-xxxx

OpenAI-Compatible API

Base URL: https://<IP_ADDRESS>:8443/v1/

| Method | Path                 | Description                                     |
|--------|----------------------|-------------------------------------------------|
| POST   | /v1/chat/completions | Chat completions (supports streaming).          |
| POST   | /v1/completions      | Text completions for non-chat models.           |
| GET    | /v1/models           | Lists all available local models.               |
| POST   | /v1/embeddings       | Generates vector embeddings for a given input.  |

Chat Completion Example

curl https://neuraldrive.local:8443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer nd-xxxx" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {"role": "user", "content": "How do I secure an API?"}
    ]
  }'

Native Ollama API

Base URL: https://<IP_ADDRESS>:8443/api/

| Method | Path          | Description                              |
|--------|---------------|------------------------------------------|
| POST   | /api/generate | Low-level text generation.               |
| POST   | /api/chat     | Native chat completion format.           |
| GET    | /api/tags     | List locally installed model tags.       |
| POST   | /api/pull     | Download a new model from the registry.  |
| POST   | /api/show     | Retrieve detailed model metadata.        |
| DELETE | /api/delete   | Remove a local model.                    |
| POST   | /api/copy     | Create a copy or alias of a model.       |

Native Chat Example

curl https://neuraldrive.local:8443/api/chat \
  -H "Authorization: Bearer nd-xxxx" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {"role": "user", "content": "Explain quantization."}
    ],
    "stream": false
  }'
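Pulling a model through the gateway follows the same pattern (the model tag here is just an example; pulls require internet access on the appliance):

```shell
curl https://neuraldrive.local:8443/api/pull \
  -H "Authorization: Bearer nd-xxxx" \
  -d '{"name": "llama3:8b"}'
```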

Note: For information on how to manage the NeuralDrive system itself (logs, services, networking), see the System Management API reference.

Audience: Admin / Developers

System Management API Reference

NeuralDrive provides a dedicated management API for monitoring health, controlling services, and configuring the underlying system.

Authentication

All management requests must include the system API key (found in /etc/neuraldrive/api.key) in the Authorization header:

Authorization: Bearer nd-xxxx

Base URL: https://<IP_ADDRESS>:8443/system/

Health and Status

GET /system/status

Returns high-level system metrics and version information.

Response Schema:

{
  "hostname": "string",
  "cpu_percent": 12.5,
  "memory": {
    "total_gb": 32,
    "used_percent": 45.1
  },
  "disk": {
    "models": {
      "total": "512G",
      "used": "120G",
      "free": "392G"
    },
    "total": "1T"
  },
  "uptime_seconds": 86400,
  "version": "v1.2.0"
}
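As a sketch of consuming this endpoint from a script, the following extracts one field using plain shell tools. The sample payload mirrors the schema above; in practice you would substitute the output of a live call such as curl -sk -H "Authorization: Bearer nd-xxxx" https://neuraldrive.local:8443/system/status:

```shell
# Sample response matching the /system/status schema (stand-in for a live call).
response='{"hostname":"neuraldrive","cpu_percent":12.5,"uptime_seconds":86400,"version":"v1.2.0"}'

# Extract uptime_seconds without external JSON tooling.
uptime=$(printf '%s' "$response" | sed -n 's/.*"uptime_seconds":\([0-9]*\).*/\1/p')
echo "uptime: ${uptime}s"
```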

GET /system/gpu

Reports detected GPU hardware and real-time utilization.

Response Schema:

{
  "vendor": "NVIDIA",
  "devices": [
    {
      "name": "RTX 4090",
      "vram_total_mb": 24576,
      "vram_used_mb": 4096,
      "temp_c": 55
    }
  ]
}

Service Management

GET /system/services

Lists all core services and their current runtime status.

POST /system/services/{name}/

Controls a specific system service.

  • Actions: start, stop, restart
  • Allowed Services: neuraldrive-ollama, neuraldrive-webui, neuraldrive-caddy, neuraldrive-gpu-monitor, neuraldrive-system-api, neuraldrive-certs.

GET /system/logs

Retrieves recent journal logs for a specific service.

  • Query Parameters: service=ollama, lines=50
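For example, fetching the last 50 Ollama log lines through the gateway (the hostname and key are placeholders; -k skips verification of the self-signed certificate):

```shell
curl -sk -H "Authorization: Bearer nd-xxxx" \
  "https://neuraldrive.local:8443/system/logs?service=ollama&lines=50"
```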

System Configuration

GET /system/network

Returns current network configuration including interface IPs, hostname, and mDNS status.

POST /system/network/hostname

Updates the system hostname.

  • Query Parameters: hostname=new-name

POST /system/api-keys/rotate

Generates a new master API key and invalidates the previous one.

  • Warning: This will immediately break existing client integrations until they are updated with the new key.

GET /system/ca-cert

Downloads the root CA certificate used for TLS signing.

  • Note: This endpoint does not require authentication.
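Because this endpoint is unauthenticated, a client can bootstrap trust in two steps: fetch the CA once over an initially unverified connection, then verify all later requests against it. A sketch:

```shell
# First (and only) unverified fetch: download the root CA.
curl -sk -o neuraldrive-ca.crt https://neuraldrive.local:8443/system/ca-cert

# All subsequent calls can verify TLS against the downloaded CA.
curl --cacert neuraldrive-ca.crt \
  -H "Authorization: Bearer nd-xxxx" \
  https://neuraldrive.local:8443/system/status
```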

Note: For information on LLM inference, see the API Endpoint Reference. For details on the system architecture, see Service Reference.

Audience: Admin

Port Reference

This document outlines the network ports utilized by NeuralDrive for internal communication and external access.

Port Assignment Matrix

| Port  | Protocol | Service                 | Exposure      | Description                                                              |
|-------|----------|-------------------------|---------------|--------------------------------------------------------------------------|
| 443   | TCP      | neuraldrive-caddy       | External      | Primary Web UI dashboard access (HTTPS).                                 |
| 8443  | TCP      | neuraldrive-caddy       | External      | API gateway and System Management Panel (HTTPS).                         |
| 5353  | UDP      | avahi-daemon            | External      | mDNS discovery for neuraldrive.local resolution.                         |
| 22    | TCP      | sshd                    | External      | Optional SSH access (rate-limited, must be enabled via boot parameter).  |
| 11434 | TCP      | neuraldrive-ollama      | Internal only | The underlying Ollama inference API.                                     |
| 3000  | TCP      | neuraldrive-webui       | Internal only | The Open WebUI dashboard backend.                                        |
| 3001  | TCP      | neuraldrive-system-api  | Internal only | The System Management API backend.                                       |
| 1312  | TCP      | neuraldrive-gpu-monitor | Internal only | The GPU health and telemetry monitor.                                    |

Exposure Definitions

  • External: These ports are open on the system firewall (nftables) and are accessible from other machines on the local network.
  • Internal only: These ports are bound exclusively to the localhost (127.0.0.1) interface. They are not reachable from the network. Caddy acts as a secure reverse proxy to these services, providing TLS termination and authentication.

Warning: Never modify the internal port bindings, as they are hard-coded into the NeuralDrive security model.

Note: For more information on configuring the firewall, see Firewall Configuration. For details on the API endpoints exposed via these ports, see API Endpoint Reference.

Audience: Everyone

Glossary

This alphabetical list defines technical terms and concepts utilized throughout the NeuralDrive documentation.

  • API Key: A unique authentication token (nd-xxxx) used to secure access to the inference and system management APIs.
  • Avahi: A system that facilitates service discovery on a local network via mDNS. It allows the neuraldrive.local hostname to resolve without a central DNS server.
  • Caddy: A high-performance, memory-safe web server that serves as NeuralDrive's reverse proxy, managing TLS encryption and request routing.
  • CUDA: NVIDIA's parallel computing platform and programming model that enables hardware acceleration on NVIDIA GPUs.
  • GGUF: The primary file format used by NeuralDrive for storing and distributing quantized LLM weights. It is optimized for fast loading and efficient memory usage.
  • Inference: The process of using a trained machine learning model to generate an output (e.g., text, images, or embeddings) based on input data.
  • Live System: An operating system designed to boot and run entirely from removable media (like a USB drive) without requiring installation to a permanent hard disk.
  • LUKS: Linux Unified Key Setup. The standard for Linux disk encryption, used by NeuralDrive to secure data on the persistence partition.
  • mDNS: Multicast DNS. A protocol that resolves hostnames in small networks that do not have a dedicated local DNS server.
  • Ollama: The underlying inference engine in NeuralDrive that manages downloading, loading, and serving large language models.
  • Open WebUI: A feature-rich, self-hosted web interface that provides a user-friendly chat environment for interacting with local LLMs.
  • Overlayfs: A union filesystem that allows NeuralDrive to layer a writable storage area (the persistence partition) over a read-only base system.
  • Persistence: A dedicated writable partition on the NeuralDrive USB media that stores downloaded models, user accounts, and system configuration between reboots.
  • Quantization: The process of reducing the precision of a model's weights (e.g., from 16-bit to 4-bit) to reduce its memory footprint and increase inference speed.
  • RAG: Retrieval-Augmented Generation. A technique that combines LLM generation with external data retrieval to improve the accuracy and relevance of responses.
  • ROCm: AMD's open-source software stack for GPU computing, enabling hardware acceleration on compatible AMD graphics cards.
  • SquashFS: A highly compressed, read-only filesystem used for the base NeuralDrive operating system image.
  • TUI: Terminal User Interface. The text-based management console that appears on the physical NeuralDrive device for initial setup and monitoring.
  • VRAM: Video RAM. The high-speed memory dedicated to the GPU, which determines the maximum size of the model that can be hardware-accelerated.
  • zram: A kernel feature that creates a compressed swap area in system RAM, increasing effective memory capacity for memory-intensive LLM tasks.