Audience: New users and system administrators seeking a high-level overview of NeuralDrive.
Introduction
NeuralDrive is a specialized Linux distribution designed to turn any x86_64 computer into a high-performance, headless Large Language Model (LLM) inference server. By booting directly from a LiveUSB or LiveCD, you can deploy a complete AI stack—including GPU drivers, runtimes, and web interfaces—without modifying your existing operating system or performing complex manual installations.
How to Use This Documentation
This guide is structured to take you from initial hardware selection to advanced model management. Throughout the manual, you will encounter audience badges that indicate the technical depth of specific sections:
- [User]: General concepts and web interface usage.
- [Admin]: Network configuration, security settings, and hardware management.
- [Developer]: API integration and custom image building.
If you are ready to begin, proceed directly to the Quick Start guide.
Version Note
This documentation covers NeuralDrive based on Debian 12 (Bookworm). It includes support for the latest stable releases of Ollama, Open WebUI, and major GPU compute stacks including NVIDIA CUDA 12.x and AMD ROCm 6.x.
Audience: Users and decision-makers looking for a conceptual overview of the NeuralDrive platform.
What is NeuralDrive?
Running large language models locally often requires significant Linux expertise, including managing complex GPU driver installations, orchestrating runtimes like Ollama or llama.cpp, and configuring network security. NeuralDrive removes these barriers by providing a pre-configured, bootable environment that transforms any compatible hardware into a dedicated LLM server.
Overview
NeuralDrive is a Debian 12-based LiveCD/LiveUSB distribution that boots entirely into RAM. It provides a headless inference environment that is ready to use in less than two minutes. By automating hardware detection and driver loading, it ensures that your NVIDIA, AMD, or Intel GPU is immediately available for model acceleration.
Key Features
- Rapid Deployment: Boot-to-inference in under two minutes.
- Hardware Autoprobe: Automatic detection and configuration for NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) hardware.
- Standardized API: Fully OpenAI-compatible API available at port 8443, allowing immediate connection with popular coding agents and tools.
- Web-Based Management: Access a feature-rich dashboard via Open WebUI for model downloading and interactive chatting.
- Local Management: A Python-based Textual TUI is available on the local console for system status and network configuration.
- Persistence Support: Configurations and downloaded models survive reboots when using a USB drive with a designated "persistence" partition.
- Customizable: Includes a toolkit for users to build their own customized system images.
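Because the API is OpenAI-compatible, any standard OpenAI client can talk to a NeuralDrive server. The sketch below builds such a request with Python's standard library only; the hostname, API key, and model tag are illustrative placeholders, and the `/v1/chat/completions` path is the standard OpenAI route (consult the Connecting Coding Agents guide for the exact base URL on your build).

```python
import json
import urllib.request

# Placeholder values: substitute your server's address, API key,
# and an installed model tag.
BASE_URL = "https://neuraldrive.local:8443"  # API port documented above
API_KEY = "nd-example-key"
MODEL = "llama3.1"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style chat completion request with Bearer auth."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Say hello in one word.")
# Sending is commented out so the sketch runs without a live server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any tool that accepts a custom OpenAI base URL and API key can be pointed at the same endpoint.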
Architecture
NeuralDrive uses a layered approach to ensure stability and performance across different hardware configurations.
+-------------------------------------------------------+
|                    User Interfaces                    |
|    (Open WebUI Dashboard :443 / TUI Console / API)    |
+-------------------------------------------------------+
|                    Security Layer                     |
|    (Caddy Reverse Proxy / nftables / Bearer Auth)     |
+-------------------------------------------------------+
|                     Runtime Stack                     |
|         (Ollama / llama.cpp for advanced use)         |
+-------------------------------------------------------+
|                   GPU Compute Layer                   |
|        (NVIDIA CUDA / AMD ROCm / Intel oneAPI)        |
+-------------------------------------------------------+
|                   Operating System                    |
|          (Debian 12 / SquashFS / OverlayFS)           |
+-------------------------------------------------------+
|                      Boot Media                       |
|               (LiveUSB / LiveCD / ISO)                |
+-------------------------------------------------------+
Design Goals
The development of NeuralDrive is guided by specific performance and usability targets.
| Priority | Goal | Measure |
|---|---|---|
| P0 | Boot-to-inference <2min | First API response within 120s of power-on |
| P0 | GPU auto-detection | NVIDIA, AMD, and Intel GPUs work without manual driver installs |
| P0 | OpenAI-compatible API | Popular coding agents connect out of the box |
| P1 | Multiple concurrent models | Load and unload different models without system restarts |
| P1 | USB persistence | Downloaded models and system configs survive reboots |
| P2 | Web dashboard | Full remote management via browser |
| P3 | Custom image toolkit | Provide tools for users to build and sign their own images |
Use Cases
NeuralDrive is designed for environments where privacy, simplicity, and performance are paramount.
- Home Labs: Run private LLMs on consumer hardware without cluttering your primary OS.
- Developer Workstations: Quickly spin up an inference server to test local AI-integrated applications.
- Small Offices: Provide a shared, local AI resource for a small team over a local network.
- Air-Gapped Environments: Deploy AI capabilities to systems with restricted or no internet access by pre-loading models onto the USB media.
What NeuralDrive is Not
To maintain its focus as a specialized inference tool, NeuralDrive excludes several common features found in other platforms:
- Not a Cloud Service: NeuralDrive is local software that runs on your hardware; no data is sent to external servers unless you explicitly configure it.
- Not Docker-Based: The entire stack runs directly on the OS for maximum performance and reduced overhead.
- Not a Desktop OS: NeuralDrive is a headless server distribution. While it provides a web interface, it does not include a traditional desktop environment like GNOME or KDE.
Audience: Users planning hardware deployments for NeuralDrive inference servers.
Hardware Requirements
NeuralDrive is designed to run on a wide range of x86_64 hardware, but performance varies significantly depending on your CPU, system RAM, and GPU.
Minimum Requirements
The following specifications are suitable for running smaller models (up to 7B parameters) with Q4 quantization.
| Component | Minimum Specification |
|---|---|
| CPU | x86_64 with AVX2 support |
| System RAM | 8 GB |
| GPU | Optional; 6 GB VRAM recommended for acceleration |
| Storage | 16 GB USB 3.0 flash drive |
Recommended Specifications
For high-performance inference using larger models (13B to 70B parameters) or concurrent model loading, the following hardware is recommended.
| Component | Recommended Specification |
|---|---|
| CPU | x86_64 with AVX-512 support |
| System RAM | 32 GB – 64 GB |
| GPU | 24 GB+ VRAM (NVIDIA RTX 3090/4090 or AMD RX 7900 XTX) |
| Storage | 128 GB+ USB 3.0 or external SSD for model persistence |
GPU Compatibility Matrix
NeuralDrive automatically detects and configures drivers for major GPU vendors.
| Vendor | Generation | Example Hardware | Driver Type | Compute Stack | Status |
|---|---|---|---|---|---|
| NVIDIA | Ada Lovelace | RTX 4090 | Proprietary 535+ | CUDA 12.x | Supported |
| NVIDIA | Ampere | RTX 3060 | Proprietary 535+ | CUDA 12.x | Supported |
| NVIDIA | Pascal | GTX 1080 | Proprietary 535+ | CUDA 12.x | Supported |
| AMD | RDNA 3 | RX 7900 XTX | amdgpu + ROCm | ROCm 6.x | Supported |
| AMD | RDNA 2 | RX 6800 XT | amdgpu + ROCm | ROCm 6.x | Supported |
| Intel | Arc | A770 | compute-runtime | oneAPI | Experimental |
| None | CPU | Any x86_64 | N/A | AVX2/AVX-512 | Supported |
Model Size Cheat Sheet
Use this guide to determine if your hardware can support specific model sizes.
- 3B Models: 8 GB System RAM.
- 8B Models: 16 GB System RAM, 8 GB VRAM.
- 70B Models: 64 GB System RAM, 24 GB+ VRAM.
Important Hardware Notes
UEFI and Secure Boot
NeuralDrive's hybrid ISO supports both legacy BIOS and UEFI boot modes. However, the proprietary NVIDIA drivers may require you to disable Secure Boot or enroll a Machine Owner Key (MOK) during the first boot. If the GPU is not detected on an NVIDIA system, verify your Secure Boot status in the BIOS/UEFI settings.
USB Media Selection
For the best experience, use a USB 3.0 or faster flash drive. If you plan to maintain a large library of high-parameter models, booting from an external SATA or NVMe SSD via a USB enclosure is strongly recommended for faster load times.
Audience: Users ready to install and deploy NeuralDrive for the first time.
Quick Start Guide
This guide describes how to flash NeuralDrive to a USB drive and start your first inference server.
Prerequisites
- NeuralDrive ISO file.
- A USB flash drive (16 GB or larger).
- A target computer with x86_64 architecture and compatible GPU.
Step 1: Flash the USB Drive
The method depends on which operating system you are using to write the USB drive.
Linux — Use the automated flash script for the simplest experience:
sudo ./scripts/neuraldrive-flash.sh neuraldrive.iso /dev/sdX
This writes the image and creates the persistence partition in one step.
macOS — Use dd with macOS device paths:
diskutil list # find your USB (e.g., /dev/disk4)
diskutil unmountDisk /dev/diskN
sudo dd if=neuraldrive.iso of=/dev/rdiskN bs=4m status=progress
diskutil eject /dev/diskN
Windows — Use Rufus or Balena Etcher to write the ISO to your USB drive.
Any platform — Balena Etcher provides a graphical interface that works on Linux, macOS, and Windows.
[!NOTE] On macOS and Windows, the persistence partition cannot be created during flashing. NeuralDrive will detect this on first boot and offer to set it up automatically. For full details on each method and persistence setup, see Writing the USB Drive.
Step 2: Boot from USB
- Insert the USB drive into the target machine.
- Power on the machine and access the BIOS/UEFI boot menu (typically by pressing F12, F11, or ESC).
- Select the NeuralDrive USB device and press Enter.
Step 3: Complete First-Boot Wizard
NeuralDrive will automatically launch a six-step wizard to configure your server. Among other things, you will:
- Set the administrative password.
- Generate your API key.
- Configure Wi-Fi or Ethernet settings.
- Set up the local storage and persistence partition.
Step 4: Record the IP Address
Once the wizard is complete, the local console (TUI) will display the system's IP address and mDNS hostname (default: neuraldrive.local). Note this address for remote access.
Step 5: Access the Web Dashboard
- Open a web browser on a different computer on the same network.
- Navigate to `https://<IP-ADDRESS>/` (or `https://neuraldrive.local/`).
- You will receive a self-signed certificate warning; accept it to proceed.
Step 6: Log In
Log in using the administrative credentials you created during the first-boot wizard.
Step 7: Pull a Model
- Navigate to the model management section of the dashboard.
- Enter the name of a model (e.g., `llama3.1`) and click Pull.
- Once the download is complete, you can begin chatting or using the API.
[!TIP] Connecting a coding agent? See the Connecting Coding Agents guide for API details.
[!NOTE] Booting from CD? If you are using read-only media, see CD Mode vs USB Mode for details on RAM-only operation.
This chapter is for everyone.
USB Writing
NeuralDrive is distributed as a hybrid ISO image. This format allows the same file to be written to USB drives or burned to physical optical media. Because NeuralDrive runs as a live system, your choice of writing method determines whether your settings and models persist across reboots.
WARNING: Writing the NeuralDrive ISO to a USB drive or disk destroys all existing data on that device. Ensure you have backed up any important files before proceeding.
Recommended Method: Automated Flash Script
The most reliable way to create a NeuralDrive USB is using the provided neuraldrive-flash.sh script. This script automates the dd write process and the creation of the required persistence partition in a single step.
- Locate the script in the `scripts/` directory of the NeuralDrive repository.
- Identify your USB device path (e.g., `/dev/sdX` or `/dev/nvmeXn1`).
- Run the script with root privileges:
sudo ./scripts/neuraldrive-flash.sh neuraldrive.iso /dev/sdX
The script performs the following actions:
- Writes the ISO image to the device using `dd` (`bs=4M`, `conv=fsync`).
- Automatically executes `prepare-usb.sh` to configure the persistence layer.
Manual Writing Options
If you cannot use the automated script, choose one of the following manual methods based on your operating system.
Linux: Manual dd
- Identify your USB device:
lsblk
Look for your USB drive by size. It will appear as `/dev/sdX` or `/dev/nvmeXn1`. Do not use a partition path like `/dev/sdb1`; use the whole-disk device.
- Unmount any mounted partitions on the device:
sudo umount /dev/sdX*
- Write the ISO to the USB device:
sudo dd if=neuraldrive.iso of=/dev/sdX bs=4M conv=fsync status=progress
- Initialize the persistence partition:
sudo /usr/lib/neuraldrive/prepare-usb.sh /dev/sdX
The `prepare-usb.sh` script creates an ext4 partition labeled "persistence" and writes the necessary `persistence.conf` file to enable union mounts.
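For context, the union mount comes from Debian's live-boot mechanism: at boot it looks for a partition labeled "persistence" and reads a `persistence.conf` file from its root to decide what to overlay. A minimal configuration that persists the entire filesystem as a writable overlay is a single line:

```
/ union
```

This matches the behavior described above; you only need to create the file by hand if you are preparing the partition without the script.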
macOS: dd with diskutil
macOS uses different device paths and a slightly different dd syntax.
- Identify your USB device:
diskutil list
Look for your USB drive by size. It will appear as `/dev/diskN` (e.g., `/dev/disk4`). Do not use a partition path like `/dev/disk4s1`; use the whole-disk device.
- Unmount the USB drive (this does not eject it):
diskutil unmountDisk /dev/diskN
- Write the ISO using the raw device (`rdiskN`) for significantly faster writes:
sudo dd if=neuraldrive.iso of=/dev/rdiskN bs=4m status=progress
Note: macOS `dd` uses lowercase `4m` (not `4M`), and `conv=fsync` is not supported. The raw device path `/dev/rdiskN` bypasses the buffer cache and is roughly 10x faster than `/dev/diskN`.
- Eject the drive:
diskutil eject /dev/diskN
- Persistence partition: The `prepare-usb.sh` script requires Linux tools (`sfdisk`, `mkfs.ext4`) and cannot run directly on macOS. To set up persistence, choose one of:
  - Boot NeuralDrive first: Boot the USB on the target machine. On first boot, the system will detect the missing persistence partition and offer to create it.
  - Use the Docker builder: Pass the USB device into the builder container and run the script there.
  - Use any Linux machine: Mount the USB on a Linux system and run `sudo /usr/lib/neuraldrive/prepare-usb.sh /dev/sdX`.
Windows: Rufus
Rufus is a free, open-source tool for writing ISO images on Windows.
- Download and run Rufus.
- Under Device, select your USB drive.
- Under Boot selection, click SELECT and choose the `neuraldrive.iso` file.
- Set Partition scheme to GPT and Target system to UEFI.
- Click START and wait for the write to complete.
Note: Rufus may offer to write in "ISO Image mode" or "DD Image mode." Either mode works. If you encounter boot issues, try DD Image mode.
- Persistence partition: Rufus does not create the NeuralDrive persistence partition. After flashing, set up persistence using one of the methods described in the macOS section above (boot the target machine, use a Linux system, or use the Docker builder).
Cross-Platform GUI: Balena Etcher
Balena Etcher works on Linux, macOS, and Windows.
- Download and install Balena Etcher.
- Click Flash from file and select the `neuraldrive.iso` file.
- Click Select target and choose your USB drive.
- Click Flash and wait for the write and verification to complete.
Important: Balena Etcher does not create the persistence partition. Follow the persistence setup instructions for your platform described above.
Ventoy
NeuralDrive is compatible with Ventoy. Simply copy the neuraldrive.iso file to your Ventoy-enabled USB drive. Note that persistence setup via Ventoy may require additional manual configuration not covered by the standard prepare-usb.sh script.
Partition Layout
After a successful flash and persistence setup, your USB drive will have the following partition structure:
| Partition | Label | Type | Filesystem | Size | Purpose |
|---|---|---|---|---|---|
| 1 | EFI | EFI System | FAT32 | 512 MiB | UEFI boot files |
| 2 | NBOOT | Linux | ext2 | 1 GiB | GRUB and kernel images |
| 3 | NSYSTEM | Linux | SquashFS | ~8 GiB | Read-only root filesystem |
| 4 | persistence | Linux | ext4 | Remaining | Persistent storage for models and settings |
Verification
To verify your USB drive is ready:
- Ensure the drive is recognized by your system.
- Check that the "persistence" partition is present and labeled correctly.
- Confirm that the drive is bootable in your system's UEFI settings.
Note on CD/DVD Burning
NeuralDrive can be burned to physical media using any standard ISO burning tool (such as Brasero on Linux or Disk Utility on macOS). Note that physical discs are read-only; persistence features will not be available, and all data will be stored in RAM.
For instructions on what to do after your USB is ready, see First Boot.
This chapter is for everyone.
First Boot
Setting up NeuralDrive for the first time requires a local keyboard and monitor. Once the initial configuration is complete, you can manage the system entirely over your network.
Booting NeuralDrive
Insert your NeuralDrive USB and power on your hardware. Access your system's boot menu (usually via F12, F11, or Esc) and select the USB drive. You will see the GRUB boot menu with several options:
- NeuralDrive (Normal): The standard boot mode. Enables data persistence and hardware acceleration via `nvidia-drm.modeset=1`.
- NeuralDrive (Safe Mode): Use this if you encounter display or boot errors. This mode disables modesetting (`nomodeset`), advanced interrupt controllers (`noapic`), and persistence.
- NeuralDrive (CD Mode - RAM Only): Loads the entire system into memory (`toram`). Useful for testing or when using physical optical media where persistence is not possible.
- NeuralDrive (Debug): Provides detailed boot information by setting `systemd.log_level=debug`. Use this when troubleshooting startup failures.
Boot Sequence Overview
NeuralDrive follows a structured startup process:
- GRUB: Loads the initial bootloader and kernel parameters.
- live-boot: Initializes the Debian live environment and mounts the persistent partition.
- systemd: Starts core system services.
- GPU Detect: Automatically identifies NVIDIA, AMD, or Intel hardware and loads appropriate drivers.
- Services: Launches the internal API, system monitor, and web interface.
- TUI: Displays the final status screen.
Once the boot process is complete, the console will display your system's IP address:
NeuralDrive is ready! Dashboard: https://192.168.x.x/
First-Boot Wizard
If the system has not been initialized, a Text User Interface (TUI) wizard will start automatically. The wizard runs as part of the TUI application, checking for a sentinel file on startup. You must complete these six steps to prepare your server:
- Welcome: Introductory screen with hardware summary and system health check.
- Storage/Persistence: Detects your USB boot device and creates an ext4 persistence partition on unused space. This step also creates required directories under `/var/lib/neuraldrive/` (ollama, models, config, webui, logs).
- Security: Sets the administrator password and configures system credentials.
- Network: Configure your network connection, including Wi-Fi (if applicable) and IP assignment (DHCP or static).
- Models: Select initial LLM models to download based on your hardware capabilities.
- Done: Final completion summary and display of system credentials.
Re-running the Wizard
If you need to reset your configuration, run `neuraldrive-tui --wizard` from the console. This command removes the sentinel file and forces the wizard to run again on the next TUI launch.
Write Down Your Credentials
At the end of the wizard, your final credentials and the dashboard URL will be displayed. Record these immediately, as they are required for logging into the web dashboard.
System Initialization Files
NeuralDrive uses a sentinel file to track its state:
- `/etc/neuraldrive/first-boot-complete`: Confirms the user setup wizard has been finished.
Once this file is present, the system will boot directly to the ready state.
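The boot-time decision amounts to a simple existence check. The sketch below illustrates the idea; the real logic lives inside the TUI application, and only the sentinel path comes from the documentation above.

```python
from pathlib import Path

# Sentinel path documented above; on a non-NeuralDrive machine it will not exist.
SENTINEL = Path("/etc/neuraldrive/first-boot-complete")

def boot_state(sentinel: Path = SENTINEL) -> str:
    """Return 'ready' if first-boot setup has completed, else 'run-wizard'."""
    return "ready" if sentinel.exists() else "run-wizard"
```

Deleting the sentinel (as `neuraldrive-tui --wizard` does) flips the state back to the wizard on the next launch.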
Next step: Web Dashboard
This chapter is for everyone.
Web Dashboard
NeuralDrive's web-based interface consists of two applications: the main Chat Dashboard and the administrative System Panel. Both are accessible through your local network.
Accessing the Dashboard
Once your server is booted and configured, navigate to the following URL in your web browser:
https://<SERVER_IP>/ or https://neuraldrive.local/
NeuralDrive uses a secure Caddy web server to proxy all traffic over port 443. Log in using the administrator credentials created during the First Boot wizard.
Chat Dashboard (Open WebUI)
The primary interface for NeuralDrive is Open WebUI. It provides a robust environment for managing and interacting with large language models. Key features include:
- Model Management: Download, update, and switch between various models.
- RAG Integration: Upload documents to use as context for your conversations.
- Multi-user Support: Admins can create and manage additional user accounts. Registration is disabled by default for security (`ENABLE_SIGNUP=false`).
- Custom Branding: The interface is pre-configured with NeuralDrive branding and dark mode.
System Panel
For hardware monitoring and service management, NeuralDrive includes a custom System Panel. This FastAPI application runs separately from the chat interface.
- Main System Panel: `https://<SERVER_IP>:8443/system/`
- GPU Monitoring: `https://<SERVER_IP>:8443/monitor/`
The System Panel provides real-time data on your hardware performance, thermal status, and running services. This separation ensures that even if a heavy inference task impacts the Chat Dashboard, you can still monitor your hardware health.
Two-Application Architecture
NeuralDrive uses a dual-app architecture to balance user interaction and system reliability:
- Open WebUI: Dedicated to chat, models, and user management.
- System Panel: Dedicated to low-level hardware monitoring and service health.
This design allows for maximum uptime and precise control over your local inference environment.
Next step: Chat Interface
This chapter is for everyone.
Chat Interface
The chat interface is your primary tool for interacting with the AI models running on your NeuralDrive server. It is built to support fluid, real-time conversations.
Starting a New Conversation
To begin a new chat session, click the "New Chat" button located in the sidebar. This opens a fresh workspace where you can enter prompts and receive responses.
Selecting a Model
NeuralDrive allows you to choose which model handles your query.
- Locate the model dropdown menu at the top of the chat interface.
- Select your desired model from the list of installed options.
- You can switch models mid-conversation without needing to reload the page or start over.
Streaming Responses
NeuralDrive features real-time token streaming. As the AI generates a response, the text will appear on your screen immediately. This provides a dynamic experience and allows you to begin reading the output before the complete response is generated.
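Under the hood, OpenAI-style streaming delivers the reply as server-sent events: one small JSON delta per `data:` line, terminated by `data: [DONE]`. The parser sketch below shows how a client assembles these into the final text; the sample lines are fabricated for illustration.

```python
import json

def assemble_stream(lines):
    """Concatenate content deltas from OpenAI-style SSE 'data:' lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore comments and blank keep-alive lines
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break  # end-of-stream marker
        delta = json.loads(body)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Fabricated sample chunks, shaped like real streaming responses:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print(assemble_stream(sample))  # Hello!
```

The web interface performs the same assembly incrementally, which is why text appears token by token.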
Conversation History
Your chat history is automatically saved to your NeuralDrive's persistent storage.
- Storage Location: All session data is stored in `/var/lib/neuraldrive/webui/`.
- Retrieval: You can access previous conversations from the sidebar at any time.
- Concurrent Chats: NeuralDrive supports multiple simultaneous chat sessions, allowing you to manage different tasks or projects independently.
The interface is optimized for high performance, ensuring that even with multiple active sessions, your interaction remains responsive.
For more information on the broader system management, return to the Web Dashboard chapter.
This chapter is for everyone.
Model Management via Web
The Open WebUI provides a rich, graphical interface for managing your LLMs. This is the recommended method for most users who are connected to the NeuralDrive network.
Accessing the Models Page
Once logged into the Open WebUI, you can find the model management tools in the settings or administration area, typically under a Models tab.
Viewing Downloaded Models
The Models page displays a list of all models currently stored on your NeuralDrive. For each model, you can see:
- Name and Tag: The model identifier (e.g., `llama3.1:8b`).
- Size: The disk space occupied by the model.
- Quantization: The precision level of the weights.
- Last Used: When the model was last loaded for a conversation.
Pulling New Models
To download a new model from the official registry:
- Locate the input field titled "Pull a model from Ollama.com".
- Enter the model string (e.g., `mistral:7b`).
- Click the download/pull button.
- A progress bar will appear. You can navigate away from the page, and the download will continue in the background.
Deleting Models
If you need to free up storage space:
- Find the model you wish to remove in the list.
- Click the trash can or delete icon associated with that model.
- Confirm the deletion when prompted.
Note that deleting a model from the web UI is permanent and removes the files from the `/var/lib/neuraldrive/models/` directory.
Model Details and Customization
Clicking on an individual model in the list allows you to view more detailed metadata, including the Modelfile used to create it. Advanced users can use this interface to create custom Modelfiles: derived versions of base models with specific system prompts or parameters (like temperature and top-k) pre-configured.
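For example, an Ollama Modelfile that derives a customized variant from a base model follows this shape (the base tag and parameter values here are illustrative):

```
FROM llama3.1:8b
PARAMETER temperature 0.3
PARAMETER top_k 40
SYSTEM You are a concise technical assistant. Answer briefly and cite sources when possible.
```

Such a file can be registered with `ollama create my-assistant -f Modelfile`, after which the variant appears in the model list like any other model.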
This chapter assumes familiarity with RAG concepts.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique that allows an LLM to access and reference specific information from your own documents during a conversation. This effectively gives the model "long-term memory" and access to data it wasn't originally trained on.
How RAG Works in NeuralDrive
When you upload a document, NeuralDrive processes it through several steps:
- Parsing: The text is extracted from the file (PDF, TXT, DOCX, etc.).
- Chunking: The text is broken down into smaller, manageable pieces.
- Embedding: Each chunk is converted into a numerical vector that represents its semantic meaning.
- Storage: These vectors are stored in a local vector database on your persistent partition.
When you ask a question in a RAG-enabled chat, the system searches the vector database for chunks that are mathematically similar to your query and provides them to the LLM as context.
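The retrieval step described above can be sketched with toy vectors. Real systems use learned embedding models with hundreds of dimensions, but the nearest-neighbor math is the same cosine similarity shown here; the chunks and vectors below are fabricated for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings": (vector, source chunk).
chunks = [
    ((1.0, 0.1, 0.0), "Invoices are due within 30 days."),
    ((0.0, 0.9, 0.2), "The server room is on floor 3."),
    ((0.1, 0.2, 1.0), "VPN access requires a hardware token."),
]

def retrieve(query_vec, k=1):
    """Return the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve((0.0, 1.0, 0.1)))  # the floor-3 fact ranks closest
```

The top-ranked chunks are what get pasted into the prompt as context for the LLM.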
Using RAG in Conversations
To use your documents in a chat:
- Upload Documents: Use the Documents page in the Open WebUI to upload your files.
- Select for Chat: In the chat interface, you can select specific documents or entire collections to be used as context for your current session.
- Querying: Simply type your question. The model will analyze the provided document context and generate a response based on those facts.
Limitations on LiveUSB
While RAG is a powerful feature, there are important considerations when running from a LiveUSB:
- Processing Power: Embedding large documents is a CPU- and GPU-intensive task. Processing a PDF that runs to hundreds of pages may take several minutes.
- Persistence: Ensure your documents are stored on the persistence partition if you want them to remain available after a reboot.
- Storage Space: Vector databases can grow significantly in size. Monitor your Storage Management closely if you plan to index large libraries of documents.
This chapter is for everyone.
Terminal Interface (TUI)
The NeuralDrive Terminal User Interface (TUI) provides a keyboard-driven dashboard for system monitoring and management. Built using Python 3.11 and the Textual framework, it offers a responsive and high-performance way to interact with your system directly from the console.
When to Use the TUI
While the web interface is excellent for daily chat and document management, the TUI is better suited for:
- Initial Configuration: Checking network status and IP addresses.
- System Monitoring: Real-time tracking of CPU, GPU, and VRAM usage.
- Troubleshooting: Reviewing system logs and restarting services if they become unresponsive.
- Offline Use: Managing models and chatting without needing a second device to access the web UI.
Accessing the TUI
By default, the TUI auto-launches on tty1 for the neuraldrive-admin user. If you are at the physical console of the NeuralDrive machine, it should be the first thing you see after boot.
To run it manually from any shell session, use the command:
neuraldrive-tui
This launcher script (installed at `/usr/local/bin/neuraldrive-tui`) activates the Python virtual environment and runs the application from `/usr/lib/neuraldrive/tui/main.py`.
Main Dashboard Layout
The main dashboard provides a high-level overview of system health and active models.
┌──────────────── NeuralDrive v1.0.0 ───────────────── 10:45:22 ─┐
│ Host: neuraldrive.local │ Uptime: 2h 15m │ IP: 192.168.1.50 │
├────────────────────────────────────────────────────────────────┤
│ GPU: NVIDIA RTX 4090 │ VRAM: 12.4/24.0 GB │ Temp: 65°C │ 85% │
│ CPU: 12% │ RAM: 18.2/64.0 GB │ Disk: 45.2 GB │
├────────────────────────────────────────────────────────────────┤
│ LOADED MODELS │
│ ● llama3.1:8b [GPU] 4.7 GB │
│ ● codestral:latest [GPU] 8.2 GB │
│ ○ phi3:mini --- (not loaded) │
├────────────────────────────────────────────────────────────────┤
│ F1 Dashboard F2 Models F3 Services F4 Logs F5 Chat Q Quit │
└────────────────────────────────────────────────────────────────┘
Navigation Keybindings
Navigation is performed using function keys:
- F1: Dashboard overview.
- F2: Models screen for managing downloads and loading state.
- F3: Services screen for restarting or stopping system components.
- F4: Logs screen for real-time system and service logs.
- F5: Chat screen for a lightweight, terminal-based LLM chat.
- Q: Quit the TUI and return to the shell.
Within each screen, the following navigation model is used:
- Tab / Shift+Tab: Cycle focus between different screen zones.
- Arrow Keys: Navigate within a focused zone (e.g., scrolling a list).
- Enter: Activate the currently focused element or button.
Troubleshooting and Debugging
If the TUI encounters a critical error, it will write a crash dump to `/var/lib/neuraldrive/logs/tui-crash-*.log`.
Screenshots captured within the TUI are saved to `/var/lib/neuraldrive/screenshots/`.
Re-running the First-Boot Wizard
If you need to force the first-boot wizard to run again, launch the TUI with the --wizard flag:
neuraldrive-tui --wizard
This removes the sentinel file and initiates the guided setup process.
Resilience
The TUI is designed to be resilient. If the underlying Ollama service is unavailable, an "Ollama Offline" badge will appear on the dashboard, and certain model management features will be disabled until the service is restored via the Services screen.
This chapter is for everyone.
TUI Dashboard
The Dashboard is the central monitoring hub of NeuralDrive. It is designed to provide immediate visual feedback on the health and workload of your system.
Layout and Sections
The dashboard is divided into three functional areas:
1. Header and System Info
Located at the very top, this section displays the version of NeuralDrive, the current hostname, system uptime, and the primary IP address. A live system clock is displayed in the upper-right corner, showing the exact time of the last data refresh.
2. Hardware Resource Monitor
This section provides real-time metrics for your hardware.
- GPU: Displays the detected GPU model name, total VRAM capacity, driver version, and CUDA compute capability.
- VRAM: Shows the current VRAM usage (e.g., 12.4/24.0 GB) and a percentage bar.
- Temp: Current GPU temperature in Celsius.
- CPU: Real-time CPU utilization percentage.
- RAM: System memory usage (e.g., 18.2/64.0 GB).
- Disk: Total disk space used on the persistence partition (e.g., 45.2 GB).
- Refresh Rate: Hardware metrics refresh every 2 seconds. You can press R at any time to trigger a manual refresh of all dashboard data.
3. Loaded Models List
This list displays the models currently residing in memory and ready for immediate inference.
- Status Indicator: A solid circle (●) indicates the model is currently loaded in memory. An open circle (○) indicates the model is cached on disk but not currently loaded.
- Backend: Shows if the model is running on the [GPU] or [CPU].
- VRAM Footprint: The amount of memory the model is currently occupying.
- Refresh Rate: The model list metrics refresh every 10 seconds.
Interaction
The Dashboard is accessed via F1 from any other screen. While it is primarily for monitoring, you can transition to other management screens using the function keys (F2-F5) shown at the bottom. Use the R key to manually refresh the displayed information.
This chapter is for everyone.
Model Management via TUI
The Model Management screen allows you to download, load, unload, and delete LLMs directly from the terminal.
Access
Press F2 from any screen to access Model Management.
Layout
The screen is organized into three zones that you can navigate between using Tab or Shift+Tab:
- Installed Models list (top zone): A scrollable list of models currently on your system.
- Browse Catalog button (middle zone): Opens a popup to browse the Ollama library.
- Pull by name (bottom zone): A text input field and a Pull button for direct model downloads.
Installed Models List
Each model in the list displays its details in a columnar format. A legend header appears above the list:
Model name | Params | Quant | Disk | VRAM | Status
- Model name: The name of the model (e.g., llama3:8b).
- Params: Parameter count of the model.
- Quant: Quantization level.
- Disk: Space occupied on disk.
- VRAM: Measured or cached VRAM usage (e.g., "6.2 GB" or "~6.2 GB").
- Status: Current state of the model ("loaded (GPU)", "loaded (CPU)", or "ready").
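To make the legend concrete, the sketch below assembles a row in the legend's column order and classifies the VRAM field (an illustration only — the function names are assumptions, and the "~" convention is taken from the VRAM column above):

```python
def format_model_row(name, params, quant, disk, vram, status):
    """Join model details in the legend's column order."""
    return " | ".join([name, params, quant, disk, vram, status])

def vram_kind(vram_field):
    """A "~" prefix marks a cached estimate; a bare value is a live measurement."""
    return "cached" if vram_field.startswith("~") else "measured"

print(format_model_row("llama3:8b", "8B", "Q4_K_M", "4.7 GB", "~6.2 GB", "ready"))
print(vram_kind("~6.2 GB"))  # cached
```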
Navigation
- Tab / Shift+Tab: Cycle focus through the zones and their controls in order (models → browse → pull input → pull button).
- Up / Down arrows: Navigate through the installed model list. The view scrolls automatically to follow your focus.
- Left / Right arrows: Navigate between the action buttons (Load/Unload/Delete) for the currently selected model. The cursor automatically skips disabled buttons.
- Enter: Activate the focused button or zone.
- PageUp / PageDown: Fast scroll through the model list.
Model Actions
Each model has specific action buttons:
- Load: Loads the model into VRAM for inference. The status will show "Loading..." while in progress. Loaded models use a keep_alive: -1 setting for infinite retention.
- Unload: Removes the model from VRAM. The system polls the engine until the unload is confirmed.
- Delete: Permanently removes the model from disk. A confirmation prompt will appear before deletion.
Downloading Models
Browse Catalog
Selecting the Browse Catalog button opens a scrollable popup listing popular models from the Ollama library. Select a model from the list and confirm to start the download.
Pull by Name
To download a specific model, type its name (e.g., llama3:8b) into the text input field in the bottom zone and press Enter or click the Pull button.
A progress bar will show the download status. You can press Escape or the Cancel button to abort an active download.
VRAM Management
VRAM usage values are measured during operation or retrieved from a cache stored in /var/lib/neuraldrive/config/.
Multiple models can be installed and loaded simultaneously. The underlying engine manages VRAM using an LRU (Least Recently Used) eviction policy when the OLLAMA_MAX_LOADED_MODELS setting is set to auto.
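Conceptually, LRU eviction under a fixed VRAM budget works like the sketch below. This is an illustration of the policy, not Ollama's actual code; the model sizes and budget are made-up numbers:

```python
from collections import OrderedDict

def load_model(loaded: OrderedDict, name: str, size_gb: float, budget_gb: float):
    """Load a model, evicting least-recently-used models until the new one fits."""
    if name in loaded:
        loaded.move_to_end(name)  # already resident: mark as most recently used
        return []
    evicted = []
    while loaded and sum(loaded.values()) + size_gb > budget_gb:
        old, _ = loaded.popitem(last=False)  # evict the least recently used entry
        evicted.append(old)
    loaded[name] = size_gb
    return evicted

vram = OrderedDict()
load_model(vram, "llama3.1:8b", 6.2, budget_gb=12.0)
load_model(vram, "phi3:mini", 3.1, budget_gb=12.0)
print(load_model(vram, "codestral", 9.0, budget_gb=12.0))  # evicts the older models
```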
This chapter is for everyone.
Service Control
The Services screen provides a centralized interface for managing the background system processes that power NeuralDrive.
Access
Press F3 from any screen to access Service Control.
Layout
The screen displays a scrollable list of services. Each service is represented by a ServiceItem widget showing the service name, its current status, and a set of action buttons.
Services Managed
The TUI allows you to manage critical NeuralDrive components, including:
- neuraldrive-ollama: The core model execution engine.
- neuraldrive-webui: The browser-based user interface.
- Any other configured system services specific to the NeuralDrive distribution.
Navigation
- Up / Down arrows: Navigate between the different services in the list.
- Left / Right arrows: Navigate between the action buttons (Start/Stop/Restart) for the currently selected service. The focus will automatically skip buttons that are disabled based on the service's current state.
- Enter: Activate the focused action button.
Action Buttons
Each service has three colored action buttons that enable or disable dynamically:
- Start (green): Starts a service that is currently stopped or inactive.
- Stop (red): Gracefully shuts down a running service.
- Restart (amber): Stops and immediately restarts a running service. This is often the quickest way to resolve minor connectivity or performance issues.
Auto-Refresh and Monitoring
The status of all services is automatically polled every 5 seconds to ensure the interface reflects the actual state of the system.
If a service like neuraldrive-ollama shows a failed or inactive status, use the action buttons to restore it. Continuous monitoring and manual control through this screen help maintain the overall health of your NeuralDrive instance.
This chapter is for everyone.
System Logs
The System Logs screen provides a real-time view of internal events and service outputs. This is an essential tool for diagnosing issues with model loading, network connectivity, or user authentication. Access this screen by pressing L from the main dashboard.
Logs Screen Interface
┌──────────────── System Logs ─────────────────────────────────┐
│ Service: [All Services] │ Level: [Info+] │ Search: [ ]│
├──────────────────────────────────────────────────────────────┤
│ 10:15:30 [OLLAMA] Loaded llama3.1:8b successfully │
│ 10:15:35 [WEBUI] Admin user logged in from 192.168.1.5 │
│ 10:16:12 [OLLAMA] Error: Connection closed by remote peer │
├──────────────────────────────────────────────────────────────┤
│ [S]elect Service [F]ilter [C]lear [B]ack │
└──────────────────────────────────────────────────────────────┘
Filtering and Navigation
The logs can be voluminous, so the TUI provides several ways to narrow down the information:
- S (Select Service): Cycles through specific services (e.g., OLLAMA, WEBUI, SYSTEM) or displays all services combined.
- F (Filter): Adjusts the minimum severity level of logs displayed (e.g., Info, Warning, Error).
- Search: Allows you to type a keyword to highlight or filter lines containing that string.
- C (Clear): Clears the current view of logs. Note that this does not delete the logs from the disk, only from the current TUI session.
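The filtering behavior above can be approximated in a few lines of Python. This sketch assumes the `timestamp [SERVICE] message` layout shown in the screen mock-up and infers severity from the message text, which is a simplification of what the TUI actually does:

```python
import re

LEVELS = {"Info": 1, "Warning": 2, "Error": 3}
LINE_RE = re.compile(r"^(?P<time>\S+) \[(?P<svc>\w+)\] (?P<msg>.*)$")

def filter_logs(lines, min_level="Info", service=None, keyword=None):
    """Keep log lines at or above min_level, optionally matching a service and keyword."""
    out = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        if service and m["svc"] != service:
            continue
        # Simplification: a message starting with "Error" is Error, otherwise Info.
        level = "Error" if m["msg"].startswith("Error") else "Info"
        if LEVELS[level] < LEVELS[min_level]:
            continue
        if keyword and keyword.lower() not in line.lower():
            continue
        out.append(line)
    return out
```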
Understanding Log Entries
Each log entry includes a timestamp, the source service name in brackets, and the message content.
- [OLLAMA]: Records model pull progress, loading/unloading events, and inference errors.
- [WEBUI]: Tracks user logins, API requests, and document processing events.
- [SYSTEM]: General OS-level events, including disk space warnings and network changes.
If you encounter an error you don't understand, the specific message provided in the logs is the most useful piece of information to include in a support request or bug report.
Press B or Back to return to the main dashboard.
This chapter is for everyone.
Local Terminal Chat
For quick testing and offline interaction, NeuralDrive includes a lightweight, terminal-based chat interface. This allows you to communicate with your local models without needing a web browser or a network connection.
Launching the Chat
Access the local chat by pressing F5 from any screen.
Using the Chat Interface
- Model Selection: A model selector dropdown at the top of the screen lets you choose which installed model to chat with. The selected model persists even when switching away and returning to the chat screen.
- Messaging: Type your message into the input field at the bottom of the screen. Press Enter to send.
- Streaming Responses: The model's response will stream directly into the terminal window in real-time.
- Keyboard Shortcuts:
- F1-F4: Switch to another TUI screen (Dashboard, Models, Services, or Logs).
- Ctrl+C: Interrupt the current response generation.
Features and Limitations
The TUI chat is designed for simplicity and speed. You must have at least one model downloaded and loaded to use the chat interface.
- Fast & Lightweight: Minimal resource overhead compared to the full web UI.
- Persistent Context: The chat maintains a basic conversation history within the current session, allowing for follow-up questions.
- Streaming: Responses appear as they are generated, providing immediate feedback.
Limitations compared to the Web UI:
- No Multimedia: Does not support images, file uploads, or complex markdown rendering.
- Single Session: Conversation history is not saved across TUI restarts.
- No RAG: The local chat cannot access your uploaded documents; for Retrieval-Augmented Generation, use the Web Interface or refer to the RAG chapter.
This chapter is for everyone.
Understanding Models
Large Language Models (LLMs) are the core engines that power NeuralDrive. These models are probabilistic systems trained on vast datasets to predict and generate human-like text, code, and reasoning. NeuralDrive uses Ollama to manage these models, providing a streamlined experience for running them locally without external dependencies.
Model Formats and GGUF
NeuralDrive exclusively uses the GGUF (GPT-Generated Unified Format) format for model storage and execution. GGUF is designed for efficient loading and performance on both CPUs and GPUs. It packs the model weights, configuration, and vocabulary into a single file, making it highly portable.
Parameter Counts
The "size" of a model is often described by its parameter count, typically denoted in billions (B). Parameters are the internal variables the model learned during training.
- 3B (Small): Extremely fast and lightweight. Ideal for basic text processing, classification, or running on hardware with limited VRAM (6GB or less). Examples: phi3:mini, qwen2.5:3b.
- 8B (Medium): The current sweet spot for local deployment. Offers a strong balance of reasoning capability and speed. Fits comfortably on 8GB-12GB VRAM cards. Example: llama3.1:8b.
- 13B (Large): Provides deeper reasoning and better instruction following. Requires 12GB+ VRAM for optimal performance.
- 70B (Very Large): Top-tier performance comparable to many commercial cloud models. Requires significant hardware resources (24GB+ VRAM for quantized versions). Example: llama3.1:70b.
Quantization
Quantization is a compression technique that reduces the precision of model weights (e.g., from 16-bit floats to 4-bit integers). This significantly lowers the VRAM and storage requirements with minimal impact on output quality.
NeuralDrive supports several quantization levels:
- Q4_K_M: The standard recommendation. It offers the smallest file size and fastest inference while maintaining high accuracy.
- Q5_K_M: A balanced option for users who want slightly higher quality than Q4 without the massive overhead of Q8.
- Q8_0: The highest quality available, preserving almost all original model precision. It requires much more VRAM and storage.
Naming Convention
Models in NeuralDrive follow a specific naming scheme: <name>:<tag>.
- Name: The base model family (e.g., llama3.1, codestral).
- Tag: Specifies the version or size (e.g., 8b, latest, mini).
When you pull a model like llama3.1:8b, NeuralDrive downloads the specific version optimized for local execution.
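Splitting such a reference into its parts is straightforward; the sketch below assumes Ollama's convention that an omitted tag defaults to latest:

```python
def parse_model_ref(ref: str):
    """Split a model reference into (name, tag); the tag defaults to "latest"."""
    name, sep, tag = ref.partition(":")
    return name, (tag if sep else "latest")

print(parse_model_ref("llama3.1:8b"))  # ('llama3.1', '8b')
print(parse_model_ref("codestral"))    # ('codestral', 'latest')
```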
Size vs Resources
A model's resource consumption is determined by its parameter count and quantization level. A larger parameter count requires more VRAM to "fit" the model for processing, while a higher-precision quantization (e.g., Q8_0 instead of Q4_K_M) increases the memory footprint of a model with the same parameter count.
Before downloading, check Model Recommendations to ensure your hardware can support your desired model.
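A common back-of-envelope estimate multiplies the parameter count by the quantized bit width and adds some overhead for the KV cache and runtime buffers. The bits-per-weight figures and the 20% overhead below are rough assumptions; real usage varies with context length and backend:

```python
# Approximate effective bits per weight for common quantizations (assumed values).
BITS_PER_WEIGHT = {"Q4_K_M": 4.5, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}

def approx_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at the quantized bit width, plus ~20% for
    KV cache and runtime buffers. A back-of-envelope figure only."""
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return round(weight_gb * overhead, 1)

print(approx_vram_gb(8, "Q4_K_M"))  # roughly 5-6 GB for an 8B Q4 model
```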
This chapter is for everyone.
Downloading Models
NeuralDrive provides four primary methods for downloading and managing LLMs. Regardless of the method used, all models are stored in a central location on the persistent partition of your drive.
1. Web Interface (Open WebUI)
The most user-friendly method is through the Open WebUI.
- Navigate to the Models page in the web interface.
- Enter the name of the model you wish to pull (e.g., llama3.1:8b).
- Click the download icon.
- You can monitor the download progress directly in the web UI.
2. Terminal User Interface (TUI)
The TUI provides a fast, keyboard-driven way to manage models without opening a browser.
- From the main dashboard, press M to enter the Models screen.
- Press P to initiate a Pull Model command.
- Enter the model string and press Enter.
- The TUI displays a progress bar and allows you to cancel if needed.
3. Command Line Interface (CLI)
For power users and automated scripts, you can use the Ollama CLI directly from any terminal session.
ollama pull <model_name>
Example:
ollama pull qwen2.5:3b
4. API Request
NeuralDrive's underlying Ollama service exposes an API that can be used to programmatically trigger downloads.
curl -X POST http://localhost:11434/api/pull -d '{"name": "phi3:mini"}'
Storage Location and Progress
All model data is stored in the following path on the persistence partition:
/var/lib/neuraldrive/models/
Specifically:
- Blobs: Actual model weights are stored in /var/lib/neuraldrive/models/blobs/.
- Manifests: Metadata about the models is stored in /var/lib/neuraldrive/models/manifests/.
When a download is in progress, NeuralDrive tracks the state and prevents simultaneous downloads of the same model. If a download is interrupted, it can typically be resumed by initiating the pull command again. Ensure you have sufficient disk space before starting a large download by checking the Storage Management guide.
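The native pull endpoint streams newline-delimited JSON progress events. The helper below is a sketch that assumes the common Ollama event fields (status, total, completed) and turns one event into a readable percentage:

```python
import json

def pull_progress(ndjson_line: str):
    """Extract (status, percent_complete) from one streamed /api/pull event.
    Percent is None when the event carries no byte counts."""
    event = json.loads(ndjson_line)
    total, done = event.get("total"), event.get("completed")
    pct = round(100 * done / total, 1) if total and done is not None else None
    return event.get("status"), pct

print(pull_progress('{"status": "pulling manifest"}'))
print(pull_progress('{"status": "downloading", "total": 4000, "completed": 1000}'))
```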
This chapter is for everyone.
Model Recommendations
Selecting the right model depends on your hardware capabilities, specifically your Video RAM (VRAM) and CPU performance. Running a model that exceeds your hardware limits will result in extremely slow response times or failure to load.
VRAM Requirements
The most critical factor for performance is the amount of VRAM available on your GPU. The following table provides recommendations for models based on standard VRAM tiers.
| VRAM | Recommended Models |
|---|---|
| 6 GB | qwen2.5:3b, phi3:mini |
| 8 GB | llama3.1:8b |
| 12 GB | codestral:latest |
| 24 GB+ | llama3.1:70b (Q4_K_M) |
CPU-Only Execution
If your system lacks a compatible GPU, NeuralDrive can run models on the CPU. While this is significantly slower, it is still functional for many tasks.
- Minimum: AVX2 support is required.
- Preferred: AVX-512 support provides a noticeable speed boost for CPU inference.
- Recommendation: Stick to smaller models (3B or 8B) for a better experience when running on CPU only.
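On Linux you can check for these instruction sets in /proc/cpuinfo. The helper below simply scans the flags lines; it is a convenience sketch, not a NeuralDrive tool:

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags

def inference_tier(flags: set) -> str:
    """Map detected flags onto the minimum/preferred tiers described above."""
    if "avx512f" in flags:
        return "preferred (AVX-512)"
    if "avx2" in flags:
        return "minimum (AVX2)"
    return "unsupported"

# On a live system: inference_tier(cpu_features(open("/proc/cpuinfo").read()))
```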
Concurrent Models
NeuralDrive allows multiple models to be loaded into memory simultaneously, provided there is enough VRAM. This is managed by Ollama using several environment variables:
- OLLAMA_MAX_LOADED_MODELS: Defines the maximum number of models kept in memory. The default is 0 (auto), which allows Ollama to manage loading based on available VRAM.
- OLLAMA_NUM_PARALLEL: Determines how many concurrent requests can be handled.
- OLLAMA_KEEP_ALIVE: Sets how long a model stays in memory after the last request before being evicted.
NeuralDrive uses a Least Recently Used (LRU) eviction policy. If you attempt to load a new model and VRAM is full, Ollama handles eviction automatically to make room for the new request.
Model Catalog
For a curated list of models tested and recommended for NeuralDrive, you can inspect the system's model catalog located at:
/etc/neuraldrive/neuraldrive-models.yaml
This file contains recommendations optimized for the NeuralDrive environment. You can also view these recommendations in the Open WebUI Models page or the TUI Models screen.
This chapter is for everyone.
Storage Management
Because LLMs can be several gigabytes in size, managing disk space is essential for a stable NeuralDrive experience. All models are stored on the persistent partition of your drive.
Checking Available Space
You can monitor your disk usage through three primary interfaces:
- TUI Dashboard: The main screen shows current disk usage as a percentage and in GB.
- Web System Panel: The Open WebUI administration area provides a graphical view of storage consumption.
- Command Line: You can run the following command from any terminal to see exactly how much space remains on the persistence partition:
df -h /var/lib/neuraldrive
Storage Thresholds
NeuralDrive monitors storage levels and will alert you when space is running low.
- 80% Usage (Warning): A warning badge appears in the TUI and web interface. You should consider deleting unused models.
- 90% Usage (Critical): System performance may degrade. Urgent action is required to free space.
- 95% Usage (Blocked): Downloads are automatically blocked to prevent the system from becoming unresponsive.
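The threshold logic above can be reproduced client-side with the standard library. This is a sketch; NeuralDrive's own monitor may compute usage differently:

```python
import shutil

# Thresholds from the documentation: 80% warning, 90% critical, 95% blocked.
THRESHOLDS = [(95, "blocked"), (90, "critical"), (80, "warning")]

def storage_status(path):
    """Classify disk usage of a partition against NeuralDrive's alert thresholds."""
    usage = shutil.disk_usage(path)
    pct = 100 * usage.used / usage.total
    for limit, state in THRESHOLDS:
        if pct >= limit:
            return round(pct, 1), state
    return round(pct, 1), "ok"

# On a NeuralDrive box you would point this at the persistence partition:
# storage_status("/var/lib/neuraldrive")
print(storage_status("/"))
```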
Deleting Models
If you need to free up space, you can delete models that are no longer in use.
- TUI: Navigate to the Models screen (M), highlight a model, and press D to delete it.
- Web UI: Go to the Models page and use the delete icon next to the model name.
- CLI: Run ollama rm <model_name>.
Deleting a model removes both its weights (blobs) and metadata (manifests) from /var/lib/neuraldrive/models/.
Hardware Recommendations for Storage
For the best experience, we recommend using a USB drive with at least 128GB of capacity. This allows you to store multiple large models (like llama3.1:8b and codestral:latest) while still having room for logs and document storage.
If you find that 128GB is insufficient, you can reinstall NeuralDrive on a larger drive or a high-speed external SSD for even more storage capacity.
This chapter assumes Linux system administration experience.
Pre-loading Models
For large-scale deployments or specialized air-gapped environments, you may want to distribute NeuralDrive images that already contain specific models. This avoids the need for users to download gigabytes of data on first boot.
Why Pre-load?
- Zero-Setup Experience: Users can start chatting immediately without internet access.
- Consistent Environment: Ensures every instance of your custom NeuralDrive image has the exact same model versions.
- Reduced Bandwidth: Saves significant time and network resources during mass deployments.
Build Configuration
Pre-loading is managed through the neuraldrive-build.yaml configuration file used during the image creation process. You can specify a list of models to be included in the models.preload section.
models:
preload:
- llama3.1:8b
- phi3:mini
- qwen2.5:3b
Two-Phase Build Approach
The current build system uses a two-phase approach to ensure models are correctly staged and compressed within the squashfs filesystem of the LiveUSB.
- Phase 1: Environment Staging. The build system sets up the base operating system and installs the Ollama service.
- Phase 2: Model Injection. In this phase, the build system pulls the requested models from the official Ollama registry into a temporary staging directory.
Currently, this injection is a manual staging step during the development of custom images. You must ensure that the staging environment has internet access to perform the initial ollama pull commands before the final image is wrapped.
Model Persistence
It is important to note that models pre-loaded into the base image are stored in the read-only section of the drive. While they are available for use immediately, any new models downloaded by the user after booting will be stored in the /var/lib/neuraldrive/models/ directory on the persistence partition.
For more information on customizing your NeuralDrive build, refer to the Custom Images documentation.
This chapter assumes familiarity with REST APIs.
API Overview
NeuralDrive exposes a comprehensive set of APIs to facilitate integration with external tools, coding agents, and custom scripts. By default, all API traffic is served over TLS on a unified public port.
Base URLs and Ports
The primary entry point for all external communication is port 8443. This port is managed by a Caddy reverse proxy that handles TLS termination and directs traffic to the appropriate internal services.
| API Type | Path Prefix | Internal Port | Description |
|---|---|---|---|
| OpenAI Compatible | /v1/ | 11434 | Compatible with standard OpenAI SDKs and integrations. |
| Ollama Native | /api/ | 11434 | Direct access to native Ollama features and model management. |
| System Management | /system/ | 3001 | NeuralDrive-specific administrative and management operations. |
The base URL for most integrations is:
https://neuraldrive.local:8443
Authentication
NeuralDrive uses Bearer token authentication for all API requests. You must include your API key in the Authorization header.
Header Format:
Authorization: Bearer <API_KEY>
Key Format:
API keys follow the pattern nd-xxxxxxxxxxxxxxxx. You can find your key in /etc/neuraldrive/api.key or rotate it using the System Management API. Refer to the API Keys chapter for details on management and rotation.
TLS and Security
All external connections require TLS. NeuralDrive generates a self-signed certificate on first boot with Subject Alternative Names (SAN) for neuraldrive.local, the local hostname, and the detected IP address.
To establish a secure connection, clients should trust the NeuralDrive Certificate Authority (CA). You can download the CA certificate via SCP from /etc/neuraldrive/tls/neuraldrive-ca.crt or through the /system/ca-cert endpoint. Detailed installation steps are available in the TLS Trust chapter.
Rate Limits and Timeouts
To ensure system stability, NeuralDrive enforces the following limits:
- Rate Limit: 100 requests per minute per source IP address.
- Generation Timeout: 600 seconds (10 minutes) for inference tasks.
- Management Timeout: 30 seconds for administrative operations via the System API.
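If a client might exceed the 100 requests/minute limit, a simple client-side pacer avoids rejected calls. The rolling-window limiter below is illustrative only and is not part of NeuralDrive:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most max_calls per rolling window (seconds), sleeping when full."""
    def __init__(self, max_calls=100, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls, self.window = max_calls, window
        self.clock, self.sleep = clock, sleep
        self.calls = deque()  # timestamps of recent calls

    def _prune(self, now):
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def wait(self):
        """Block (sleep) until one more call is allowed, then record it."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) >= self.max_calls:
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self._prune(now)
        self.calls.append(now)
```

Call limiter.wait() immediately before each API request; injecting clock and sleep keeps the class testable.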
Internal Access
While port 8443 is the recommended public interface, services are also available on internal ports for local debugging or specialized networking configurations:
- Ollama: 11434
- WebUI: 3000
- System API: 3001
Note that these internal ports typically do not have the same TLS or authentication protections as the public 8443 interface.
This chapter assumes familiarity with REST APIs.
Coding Agents
NeuralDrive is designed to provide high-performance local inference for coding agents and IDE integrations. By following a standard OpenAI-compatible pattern, you can connect your favorite development tools to NeuralDrive with minimal configuration.
Generic Configuration Pattern
Most coding agents use a common set of parameters for OpenAI-compatible providers:
- Provider: OpenAI Compatible (or Custom)
- Base URL:
https://neuraldrive.local:8443/v1 - API Key:
nd-xxxxxxxxxxxxxxxxxxxx - Model Name: e.g.,
llama3.1:8b,codestral, orstarcoder2
Important Note on TLS Trust
Because NeuralDrive uses a self-signed certificate, most agents will fail to connect unless you explicitly trust the NeuralDrive CA. Before configuring your agent, ensure you have followed the steps in the TLS Trust chapter to install the certificate on your local machine.
Cursor
To use NeuralDrive with Cursor:
- Open Cursor Settings > Models.
- In the OpenAI API section, toggle the switch to Enabled.
- Click Override OpenAI Base URL and enter: https://neuraldrive.local:8443/v1
- Enter your NeuralDrive API key in the API Key field.
- Under Models, add your desired model names (e.g., llama3.1:8b).
- Ensure other OpenAI models are disabled if you wish to force local inference.
Continue
For the Continue VS Code or JetBrains extension, edit your config.json:
{
"models": [
{
"title": "NeuralDrive",
"provider": "openai",
"baseUrl": "https://neuraldrive.local:8443/v1",
"apiKey": "nd-xxxxxxxxxxxxxxxxxxxx",
"model": "llama3.1:8b"
}
]
}
If you experience TLS verification errors, you may need to set the NODE_EXTRA_CA_CERTS environment variable to point to your neuraldrive-ca.crt file before launching your IDE.
Aider
Aider supports OpenAI-compatible endpoints through environment variables. Use the following command to start Aider with NeuralDrive:
export OPENAI_API_BASE=https://neuraldrive.local:8443/v1
export OPENAI_API_KEY=nd-xxxxxxxxxxxxxxxxxxxx
export REQUESTS_CA_BUNDLE=/path/to/neuraldrive-ca.crt
aider --model openai/llama3.1:8b
By setting REQUESTS_CA_BUNDLE, you ensure that Aider's underlying Python libraries trust the self-signed certificate.
Open Interpreter
Open Interpreter can be configured to use NeuralDrive by specifying the base URL and model:
export SSL_CERT_FILE=/path/to/neuraldrive-ca.crt
interpreter --model openai/llama3.1:8b \
--api_base https://neuraldrive.local:8443/v1 \
--api_key nd-xxxxxxxxxxxxxxxxxxxx
Using --api_base directs traffic to NeuralDrive, while SSL_CERT_FILE handles the TLS verification requirements for the Python environment.
This chapter assumes familiarity with Python.
Python SDK
NeuralDrive provides a seamless integration path for Python developers by maintaining compatibility with the official OpenAI Python library. This allows you to use familiar patterns while running inference entirely on local hardware.
Installation
To get started, install the openai and httpx libraries:
pip install openai httpx
Initializing the Client
Since NeuralDrive uses a self-signed certificate, you must configure the OpenAI client to trust the NeuralDrive CA. The most reliable way is to use an httpx.Client with the verify parameter set to the path of your neuraldrive-ca.crt file.
from openai import OpenAI
import httpx
# Path to the CA certificate downloaded from NeuralDrive
CA_CERT_PATH = "/path/to/neuraldrive-ca.crt"
client = OpenAI(
base_url="https://neuraldrive.local:8443/v1",
api_key="nd-xxxxxxxxxxxxxxxxxxxx",
http_client=httpx.Client(verify=CA_CERT_PATH)
)
Chat Completions
NeuralDrive supports both streaming and non-streaming chat completions.
Streaming Example
Streaming provides real-time feedback as the model generates text, which is ideal for interactive applications.
response = client.chat.completions.create(
model="llama3.1:8b",
messages=[{"role": "user", "content": "Explain quantum entanglement."}],
stream=True
)
for chunk in response:
content = chunk.choices[0].delta.content
if content:
print(content, end="", flush=True)
Non-Streaming Example
For automated scripts where the full output is needed at once:
response = client.chat.completions.create(
model="llama3.1:8b",
messages=[{"role": "user", "content": "Write a Python function to sort a list."}],
stream=False
)
print(response.choices[0].message.content)
Embeddings
You can generate text embeddings for RAG (Retrieval-Augmented Generation) applications using compatible models.
response = client.embeddings.create(
model="mxbai-embed-large",
input="NeuralDrive provides high-performance local AI."
)
embedding = response.data[0].embedding
print(f"Generated embedding with {len(embedding)} dimensions.")
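Embeddings are typically compared with cosine similarity. The pure-Python helper below works on any two vectors, such as a pair returned by client.embeddings.create (the example vectors are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# e.g., compare two embeddings returned by the embeddings endpoint
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```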
Cert Trust Options
If you prefer not to specify the CA path in every script, you have three primary alternatives:
- Environment Variables: Set REQUESTS_CA_BUNDLE or SSL_CERT_FILE in your shell environment.
- System-wide Install: Add the CA certificate to your operating system's trusted store.
- Disable Verification (Testing Only): Set verify=False in the httpx.Client. This is insecure and not recommended for production.
Error Handling
Implement basic error handling to manage timeouts or connection issues:
import openai
try:
response = client.chat.completions.create(
model="llama3.1:8b",
messages=[{"role": "user", "content": "Hi!"}]
)
except openai.APIConnectionError as e:
print(f"Could not connect to NeuralDrive: {e}")
except openai.AuthenticationError as e:
print(f"Invalid API key: {e}")
except openai.APITimeoutError as e:
print(f"Request timed out (NeuralDrive limit: 600s): {e}")
This chapter assumes basic command-line familiarity.
cURL Examples
Using curl is the quickest way to verify your connection to the NeuralDrive API or perform one-off administrative tasks. All examples assume you have downloaded the NeuralDrive CA certificate (neuraldrive-ca.crt) to your current directory.
Trusting the Certificate
NeuralDrive uses a self-signed certificate. For security, we recommend using the --cacert flag to point to the CA certificate.
- Recommended (Secure): curl --cacert neuraldrive-ca.crt ...
- Alternative (Insecure): curl -k ... (Only use for quick health checks)
OpenAI-Compatible API
These endpoints follow the standard OpenAI request and response formats.
Chat Completion
curl --cacert neuraldrive-ca.crt \
-X POST https://neuraldrive.local:8443/v1/chat/completions \
-H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
List Available Models
curl --cacert neuraldrive-ca.crt \
-H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
https://neuraldrive.local:8443/v1/models
Ollama Native API
Directly interact with the underlying Ollama service for tasks like pulling models.
Pull a Model
curl --cacert neuraldrive-ca.crt \
-X POST https://neuraldrive.local:8443/api/pull \
-H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
-d '{"name": "mistral"}'
Get Model Details
curl --cacert neuraldrive-ca.crt \
-X POST https://neuraldrive.local:8443/api/show \
-H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
-d '{"name": "llama3.1:8b"}'
Health and Management
Health Check
The health endpoint is public and does not require an API key; -k is used here only to skip certificate verification for brevity.
curl -k https://neuraldrive.local:8443/health
System CA Certificate
Download the CA certificate directly from the System API if you do not have SSH access. This endpoint is public and does not require authentication. Use -k to skip certificate verification — you cannot verify a certificate you have not yet downloaded.
curl -k https://neuraldrive.local:8443/system/ca-cert -o neuraldrive-ca.crt
Once you have the certificate, verify it by checking its fingerprint against the value displayed in the TUI or on the console during boot.
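A SHA-256 fingerprint of the certificate can be computed locally and compared against whichever digest the console displays. The helper below hashes the DER bytes inside the PEM file, the same input most tools fingerprint:

```python
import base64
import hashlib

def cert_fingerprint(pem_text: str) -> str:
    """SHA-256 fingerprint of the DER bytes inside a PEM certificate,
    formatted as colon-separated hex pairs."""
    body = "".join(
        line for line in pem_text.splitlines()
        if line and not line.startswith("-----")  # skip BEGIN/END armor lines
    )
    der = base64.b64decode(body)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Usage: print(cert_fingerprint(open("neuraldrive-ca.crt").read()))
```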
This chapter is for developers and administrators.
TLS Trust
NeuralDrive uses TLS to secure all communications between your client and the server. Because NeuralDrive is a local-first system, it employs a self-signed certificate rather than one from a public Certificate Authority (CA). To establish a secure connection, you must instruct your tools and operating system to trust the NeuralDrive CA.
Why Self-Signed?
Standard CAs like Let's Encrypt require a public domain name and a publicly reachable server to verify ownership. NeuralDrive is designed to operate on local networks, often without a public DNS record. A self-signed CA allows NeuralDrive to generate valid certificates for neuraldrive.local, local IP addresses, and custom hostnames without external dependencies.
Downloading the CA Certificate
You can retrieve the CA certificate (neuraldrive-ca.crt) using one of three methods:
- SCP (Recommended): Use secure copy to pull the file directly from the NeuralDrive server.
  scp <username>@neuraldrive.local:/etc/neuraldrive/tls/neuraldrive-ca.crt ./
- System API: Download the certificate via the management endpoint. This endpoint is public and does not require authentication.
  curl -k https://neuraldrive.local:8443/system/ca-cert -o neuraldrive-ca.crt
- Local Filesystem: If you have direct terminal access to the NeuralDrive machine, the file is located at /etc/neuraldrive/tls/neuraldrive-ca.crt.
Installing the Certificate
Operating System Level
Installing the CA at the OS level allows browsers and many native applications to trust NeuralDrive automatically.
- macOS:
- Open Keychain Access.
- Drag
neuraldrive-ca.crtinto the System keychain. - Double-click the certificate, expand Trust, and set When using this certificate to Always Trust.
- Linux (Ubuntu/Debian):
sudo cp neuraldrive-ca.crt /usr/local/share/ca-certificates/neuraldrive.crtsudo update-ca-certificates
- Windows:
  - Double-click neuraldrive-ca.crt.
  - Click Install Certificate...
  - Select Local Machine and click Next.
  - Select Place all certificates in the following store and browse for Trusted Root Certification Authorities.
  - Click Next, then Finish.
Tool-Specific Configuration
Many development environments maintain their own certificate stores or require explicit paths.
- Python (requests/httpx/OpenAI SDK): Set one of the following environment variables:
  export REQUESTS_CA_BUNDLE=/path/to/neuraldrive-ca.crt
  or
  export SSL_CERT_FILE=/path/to/neuraldrive-ca.crt
- Node.js: Set the environment variable:
  export NODE_EXTRA_CA_CERTS=/path/to/neuraldrive-ca.crt
- cURL: Use the --cacert flag:
  curl --cacert neuraldrive-ca.crt ...
Certificate Management
Regeneration
If your server's IP address changes or the certificate expires, you can force a regeneration by deleting the existing server certificates and restarting the certificate services:
sudo rm /etc/neuraldrive/tls/server.crt /etc/neuraldrive/tls/server.key
sudo systemctl restart neuraldrive-certs neuraldrive-caddy
This will generate a new server certificate signed by the existing CA.
Custom Certificates
If you prefer to use your own certificate (e.g., from a corporate CA), replace /etc/neuraldrive/tls/server.crt and /etc/neuraldrive/tls/server.key with your own files and restart the neuraldrive-caddy service. Note that the neuraldrive-certs service should be disabled to prevent it from overwriting your custom files on reboot.
This chapter is for everyone.
API Keys
NeuralDrive uses a token-based authentication system to secure access to its APIs. Every request made to the public interface (port 8443) must include a valid API key.
Key Format
API keys follow a standard prefix-based format:
nd-xxxxxxxxxxxxxxxxxxxx
The nd- prefix ensures that keys are easily identifiable in configuration files and logs.
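For illustration only (NeuralDrive's actual key generator is not documented here), a key of this shape can be produced from random bytes:

```shell
# Hypothetical sketch: generate a 20-character random hex suffix and prepend
# the nd- prefix, matching the nd-xxxxxxxxxxxxxxxxxxxx format shown above.
key="nd-$(openssl rand -hex 10)"
echo "$key"
```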
Finding Your Key
There are several ways to retrieve your current API key:
- First Boot: On your first login to the NeuralDrive console, the initial API key is displayed in the welcome banner.
- NeuralDrive TUI: Launch the Text User Interface by running neuraldrive-tui on the server. Navigate to the Security or API section to view the active key.
- System Files: If you have terminal access, the key is stored in plain text at /etc/neuraldrive/api.key.
Rotating API Keys
For security reasons, we recommend rotating your API key periodically or immediately if you suspect it has been compromised.
Using the TUI
- Open neuraldrive-tui.
- Select Security > Rotate API Key.
- Confirm the action. The TUI will generate a new key, update the local configuration files, reload the Caddy service, and display the new key.
Using the System API
You can also rotate the key programmatically via the System Management API.
curl --cacert neuraldrive-ca.crt \
-X POST https://neuraldrive.local:8443/system/api-keys/rotate \
-H "Authorization: Bearer <CURRENT_API_KEY>"
The response will contain the newly generated key. Note that the old key becomes invalid immediately after this call.
Pre-setting Keys in Custom Images
When building custom NeuralDrive images using neuraldrive-build.yaml, you can define a static API key in the configuration:
security:
api_key: "nd-mycustomapikey12345"
This allows for pre-configured deployments where the API key is known before the first boot. If this field is omitted, NeuralDrive will generate a random key during the initialization process.
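A quick sanity check in your build pipeline can catch keys that do not follow the expected prefix. This is a hypothetical check, not part of the builder:

```shell
# Hypothetical pre-flight check for a key destined for neuraldrive-build.yaml.
key="nd-mycustomapikey12345"
case "$key" in
  nd-?*) echo "key format OK" ;;
  *)     echo "error: key must start with nd-" >&2; exit 1 ;;
esac
```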
This chapter assumes familiarity with REST APIs.
Ollama Native API
While the OpenAI-compatible API is recommended for most integrations, NeuralDrive also exposes the Ollama Native API for tasks that require finer control over model management and specialized inference parameters.
Why Use the Native API?
The Native API is necessary for operations not covered by the OpenAI specification, such as:
- Downloading (pulling) new models from the library.
- Getting granular progress updates during model downloads.
- Accessing detailed model metadata (modelfile, license, parameters).
- Performing raw text generation without chat-specific formatting.
Base URL and Auth
The Native API is available at the /api/ path on port 8443. Like all other external APIs, it requires a Bearer token in the Authorization header.
Base URL: https://neuraldrive.local:8443/api
Core Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /generate | POST | Raw text completion. |
| /chat | POST | Structured chat completion. |
| /tags | GET | List all models currently available on the server. |
| /pull | POST | Download a model from the Ollama library. |
| /show | POST | View details, parameters, and the Modelfile for a specific model. |
| /delete | DELETE | Remove a model from local storage. |
Remote Ollama CLI
You can use the standard ollama command-line tool to interact with your NeuralDrive instance remotely. This allows you to run models on the server using your local terminal.
To point your local CLI to NeuralDrive, set the OLLAMA_HOST environment variable:
export OLLAMA_HOST=https://neuraldrive.local:8443
ollama run llama3.1:8b
Note that the ollama CLI does not natively support Bearer token authentication in all versions. For secure remote CLI usage, we recommend using curl or a custom wrapper that includes the Authorization: Bearer <API_KEY> header.
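One way to work around this is a small curl wrapper that always sends the header. This is a sketch only: the function name and the NEURALDRIVE_API_KEY variable are illustrative placeholders, not part of NeuralDrive.

```shell
# Sketch: wrap curl so every Native API call carries the Bearer token.
# ndrive_api and NEURALDRIVE_API_KEY are illustrative placeholders.
ndrive_api() {
  curl --cacert neuraldrive-ca.crt \
       -H "Authorization: Bearer ${NEURALDRIVE_API_KEY}" \
       "https://neuraldrive.local:8443/api$1"
}

# Example: list installed models
# ndrive_api /tags
```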
Pulling a Model via API
When pulling a model, NeuralDrive returns a stream of JSON objects indicating the progress:
curl --cacert neuraldrive-ca.crt \
-X POST https://neuraldrive.local:8443/api/pull \
-H "Authorization: Bearer nd-xxxxxxxxxxxxxxxxxxxx" \
-d '{"name": "mistral"}'
The response will look similar to this:
{"status":"pulling manifest"}
{"status":"pulling layer","digest":"sha256:e8a35d5...","total":5120000000,"completed":1024000}
...
{"status":"success"}
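Each progress object carries completed and total byte counts, so a client can derive a percentage. For the sample layer line shown here:

```shell
# completed / total from the sample progress line, as a percentage
awk 'BEGIN { printf "%.2f%%\n", 1024000 / 5120000000 * 100 }'
# → 0.02%
```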
This chapter is for system administrators.
Network Configuration
NeuralDrive uses NetworkManager for managing both wired and wireless interfaces. By default, the system attempts to obtain an IP address via DHCP.
Static IP Configuration
If a static IP address is required, it can be configured using the TUI (Terminal User Interface) or via the command line with nmcli.
To configure a static IP using nmcli, identify the connection name:
nmcli connection show
Apply the static configuration:
sudo nmcli connection modify "Wired connection 1" \
ipv4.addresses 192.168.1.100/24 \
ipv4.gateway 192.168.1.1 \
ipv4.dns "1.1.1.1,8.8.8.8" \
ipv4.method manual
sudo nmcli connection up "Wired connection 1"
Hostname Configuration
The default hostname is neuraldrive. To change the hostname, use hostnamectl or the TUI:
sudo hostnamectl set-hostname my-neural-node
mDNS and Avahi
NeuralDrive automatically advertises its presence on the local network using mDNS (Multicast DNS) via Avahi. By default, the system is reachable at neuraldrive.local.
Avahi service files are located at:
- /etc/avahi/services/neuraldrive-web.service (port 443)
- /etc/avahi/services/neuraldrive-api.service (port 8443)
The system uses systemd-resolved with mDNS support enabled to handle local name resolution.
Warning: mDNS does not work on all networks. Corporate environments, certain routers, and VPNs often block multicast traffic. If neuraldrive.local is unreachable, check the IP address on the physical console.
Console IP Display
At boot, the neuraldrive-show-ip.service runs to detect the active IP address and display it directly on the console. This ensures that the node can be located even if mDNS fails or DHCP assigns an unexpected address.
See Also
- First Boot Setup
- Network Troubleshooting
This chapter is for system administrators.
Security
The security design of NeuralDrive is built on principles of defense in depth and minimal attack surface. Each component is isolated and operates with only the necessary privileges.
Security Design Principles
NeuralDrive implements several key security architectural features:
- Defense in Depth: Multiple layers of security control from the kernel to the application.
- Minimal Attack Surface: Only essential services are exposed.
- Read-Only Root: The core system is a SquashFS image.
- Default-Deny Networking: All incoming traffic is blocked except for required services.
- No Default Passwords: Credentials are unique to each installation or rely on key-based authentication.
Default Security Posture
By default, the system-level firewall blocks all incoming traffic except for:
- HTTPS (port 443)
- System API (port 8443)
- mDNS (port 5353)
SSH is disabled by default and requires manual activation.
Service Isolation
Every major service in NeuralDrive runs as a dedicated, low-privilege user account. Systemd hardening is applied to each unit to restrict access to the rest of the system.
| Service | User | UID |
|---|---|---|
| neuraldrive-ollama | neuraldrive-ollama | 901 |
| neuraldrive-webui | neuraldrive-webui | 902 |
| neuraldrive-caddy | neuraldrive-caddy | 903 |
| neuraldrive-monitor | neuraldrive-monitor | 904 |
| neuraldrive-api | neuraldrive-api | 905 |
The systemd units for these services employ several hardening flags:
- ProtectSystem=full: Makes /usr, /boot, and /etc read-only for the service.
- NoNewPrivileges=true: Prevents the service from gaining more privileges via setuid.
- PrivateTmp=true: Gives the service its own /tmp directory.
Data Protection and Permissions
NeuralDrive enforces a strict file permissions model to protect sensitive data:
- Models: Owned by neuraldrive-ollama:neuraldrive-ollama with 640 permissions.
- Configurations: Owned by root:neuraldrive-admin with 640 permissions.
- Credentials: Owned by root:root with 600 permissions.
Audit Logging
System audit logs are maintained for monitoring security events. Logs are stored in JSON format for easy programmatic analysis.
Log file: /var/log/neuraldrive/audit.log
Log rotation is handled daily with a maximum of 7 rotations and a 100MB limit per file.
See Also
- Firewall
- TLS Certificates
- SSH Access
- LUKS Encryption
This chapter is for system administrators.
Firewall
NeuralDrive uses nftables as the primary firewall. The configuration is designed to block all unsolicited incoming traffic while allowing essential system services.
Default Policy
The firewall is configured with a strict default-deny policy for incoming and forwarded traffic.
- Input Chain: policy drop
- Forward Chain: policy drop
- Output Chain: policy accept
Allowed Traffic
The default ruleset permits the following incoming connections:
- Established and Related: Existing outbound sessions.
- Loopback: Traffic within the system.
- ICMP: Rate-limited echo requests (5 per second).
- Web Services: TCP ports 443 (HTTPS) and 8443 (System API).
- SSH: TCP port 22, rate-limited to 3 new connections per minute with a burst of 5.
- mDNS: UDP port 5353, rate-limited to 10 per second for local service discovery.
Configuration Files
The primary firewall rules are defined in /etc/neuraldrive/nftables.conf.
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
iifname "lo" accept
ip protocol icmp icmp type echo-request limit rate 5/second accept
ip6 nexthdr icmpv6 icmpv6 type echo-request limit rate 5/second accept
tcp dport { 443, 8443 } accept
tcp dport 22 ct state new limit rate 3/minute burst 5 packets accept
udp dport 5353 limit rate 10/second accept
}
chain forward { type filter hook forward priority 0; policy drop; }
chain output { type filter hook output priority 0; policy accept; }
}
Administrators can add custom rules by creating /etc/neuraldrive/firewall-custom.conf. If this file exists, it is included at the end of the ruleset.
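For example, a hypothetical /etc/neuraldrive/firewall-custom.conf that admits Prometheus scrapes from the local subnet could append a rule to the existing input chain. The port and subnet below are illustrative, not defaults:

```
# /etc/neuraldrive/firewall-custom.conf — illustrative example
# Allow TCP 9090 from the local subnet in the existing input chain.
add rule inet filter input ip saddr 192.168.1.0/24 tcp dport 9090 accept
```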
Managing the Firewall
To view the active ruleset:
sudo nft list ruleset
To reload the configuration after making changes:
sudo systemctl restart nftables
This chapter is for system administrators.
TLS Certificates
NeuralDrive uses TLS to encrypt all communication between clients and the node. Certificates are automatically managed to ensure secure defaults without requiring manual intervention.
Automatic Generation
The neuraldrive-certs.service is responsible for certificate management. This is a oneshot service that runs at first boot and before the Caddy web server starts.
The certificate generation is idempotent. The service uses an ExecCondition to check for existing certificates and only generates new ones if they are missing.
Certificate Details
The generated certificates are self-signed RSA 4096-bit with SHA-256 signatures. They have a 365-day validity period and include the following Subject Alternative Names (SANs):
- DNS:neuraldrive.local
- DNS:neuraldrive
- IP:<detected-IP>
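You can confirm the SANs present in a generated certificate with openssl (1.1.1 or newer):

```shell
# Show the Subject Alternative Name extension of the server certificate.
openssl x509 -in /etc/neuraldrive/tls/server.crt -noout -ext subjectAltName
```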
Certificate Files
Certificates and keys are stored in /etc/neuraldrive/tls/.
| File | Permissions | Description |
|---|---|---|
| server.crt | 644 | The server certificate. |
| server.key | 600 | The private key for the server certificate. |
| neuraldrive-ca.crt | 644 | The root CA certificate used to sign the server certificate. |
Client Trust
To avoid browser warnings and ensure secure programmatic access, the CA certificate can be downloaded and installed in the client's trust store.
The CA certificate is available at:
- https://<IP>:8443/system/ca-cert (no authentication required)
- /etc/neuraldrive/tls/neuraldrive-ca.crt (on the filesystem)
Manual Management
Regenerating Certificates
If certificates need to be regenerated (e.g., due to a hostname change), delete the existing files and restart the certificate service:
sudo rm /etc/neuraldrive/tls/server.*
sudo systemctl restart neuraldrive-certs neuraldrive-caddy
Using Custom Certificates
To use certificates issued by a third-party CA:
- Replace server.crt and server.key in /etc/neuraldrive/tls/ with your custom files.
- Ensure the file names match and permissions are correctly set (644 for the certificate, 600 for the key).
- Restart Caddy: sudo systemctl restart neuraldrive-caddy.
See Also
- TLS Certificate Trust
- Security
This chapter is for system administrators.
SSH Access
SSH access is disabled by default in NeuralDrive. This minimizes the initial attack surface and ensures that users must explicitly opt-in to remote command-line management.
Enabling SSH
SSH can be enabled through three primary methods:
- TUI (Terminal User Interface): Use the Services or Security menu to toggle SSH.
- Boot Parameter: Add neuraldrive.ssh=1 to the kernel command line at the boot menu (the active parameters are visible afterward in /proc/cmdline).
- System API: Send a POST request to /system/ssh/enable using an authenticated API client.
SSH Configuration and Security
The system uses a hardened SSH configuration located at /etc/ssh/sshd_config.d/neuraldrive.conf.
- Authentication: Key-only authentication is enforced. Password authentication is disabled (PasswordAuthentication no).
- User Restrictions: Only the neuraldrive-admin user is permitted to log in (AllowUsers neuraldrive-admin).
- Root Login: Root login is strictly prohibited (PermitRootLogin no).
- Session Settings: MaxAuthTries 3, ClientAliveInterval 300, and ClientAliveCountMax 2 are set to ensure session integrity and prevent brute-force attempts.
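Assembled from the settings listed here, the drop-in would look roughly like this. This is a reconstruction for reference; the shipped file may differ:

```
# /etc/ssh/sshd_config.d/neuraldrive.conf (reconstruction)
PasswordAuthentication no
AllowUsers neuraldrive-admin
PermitRootLogin no
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
```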
Adding SSH Keys
Public SSH keys should be placed in the authorized keys file:
/etc/neuraldrive/ssh/authorized_keys
Alternatively, keys can be injected into this location during the image building process.
Brute-Force Protection
In addition to the rate-limiting provided by the firewall, fail2ban monitors SSH login attempts. The configuration is located at /etc/fail2ban/jail.d/neuraldrive.conf.
- Max Retries: 5 failed attempts.
- Ban Time: 600 seconds.
- Find Time: 600 seconds.
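These values correspond to a jail definition along the following lines. This is a sketch; the shipped /etc/fail2ban/jail.d/neuraldrive.conf may differ in detail:

```
# Sketch of the SSH jail with the parameters listed above
[sshd]
enabled  = true
maxretry = 5
bantime  = 600
findtime = 600
```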
Firewall Rate-Limiting
The system firewall (nftables) further protects the SSH port by rate-limiting new connections to 3 per minute with a burst allowance of 5 packets.
See Also
- Security — NeuralDrive security architecture and hardening overview.
- Firewall — nftables configuration and port management.
This chapter is for all users and system administrators.
GPU Monitoring
NeuralDrive provides multiple ways to monitor GPU health, utilization, and resource consumption in real-time.
TUI Dashboard
The Terminal User Interface (TUI) provides a high-level overview of GPU status, including VRAM usage, temperature, and utilization percentages. This dashboard is accessible directly from the physical console or via SSH.
Web System Panel
A more detailed GPU monitoring interface is available through the web System Panel.
- URL: https://<IP>:8443/monitor/
- Dashboard: "GPU Hot"
This interface provides a real-time dashboard powered by an internal service running on port 1312, which is securely proxied by Caddy.
Command-Line Tools
For advanced diagnostics, standard vendor-specific command-line tools are available:
- NVIDIA: nvidia-smi
- AMD: rocm-smi
- General Hardware Check: lspci | grep -i vga
Monitoring Service
The neuraldrive-gpu-monitor.service is responsible for collecting and serving GPU telemetry. This service runs as the neuraldrive-monitor user and exposes data on internal port 1312.
System API
Programmatic access to GPU telemetry is available via the System Management API.
- Endpoint: GET /system/gpu
- Response Data:
- Vendor and device names.
- VRAM total and currently used.
- Current temperature.
See Also
- Terminal Interface
- System Management API
This chapter is for system administrators.
Service Management
NeuralDrive is composed of several specialized services that handle different aspects of the system, from GPU detection to web application hosting.
NeuralDrive Service List
The following services are managed by systemd:
- neuraldrive-setup: Handles initial system setup and first-boot configurations.
- neuraldrive-gpu-detect: Automatically detects available GPUs and prepares the system before Ollama starts.
- neuraldrive-certs: Manages the generation and renewal of TLS certificates.
- neuraldrive-zram: Configures zram-based swap space for efficient memory management.
- neuraldrive-show-ip: Displays the active IP address on the physical console at boot.
- neuraldrive-ollama: The core LLM inference server (internal port 11434).
- neuraldrive-webui: The Open WebUI dashboard for user interaction (internal port 3000).
- neuraldrive-caddy: The reverse proxy that exposes services via ports 443 and 8443.
- neuraldrive-gpu-monitor: Collects and serves GPU telemetry data (internal port 1312).
- neuraldrive-system-api: Provides programmatic access to system management (internal port 3001).
- neuraldrive-storage-monitor: Monitors disk usage and provides alerts based on defined thresholds.
Monitoring and Control
Service status can be checked using several interfaces:
- TUI Services Screen: Real-time status and control.
- Web System Panel: Visual status overview.
- Command Line: Standard systemctl commands.
# Check status
systemctl status neuraldrive-ollama
# Restart a service
sudo systemctl restart neuraldrive-ollama
# View logs
journalctl -u neuraldrive-ollama -f
Service Dependencies
NeuralDrive services are designed with a specific boot order to ensure reliability:
- gpu-detect must complete before ollama starts.
- ollama must be active before webui begins operation.
- certs must successfully generate certificates before caddy can bind to its ports.
Storage Monitoring
The neuraldrive-storage-monitor service provides automated alerts when disk usage exceeds specific thresholds:
- 80%: Warning threshold.
- 90%: Critical threshold.
- 95%: Immediate action required.
See Also
- Service Reference
- TUI Services
This chapter is for all users.
Updating NeuralDrive
NeuralDrive uses a LiveUSB deployment model. This means that system updates are typically applied by re-flashing the USB drive with a newer ISO image.
Update Model
Unlike traditional operating systems that use package managers for updates, NeuralDrive is designed as an immutable system. This ensures that the core operating environment is always in a known, stable state.
Upgrade Procedure
To upgrade to a new version:
- Back up data: Backup models, configurations, and WebUI data from the persistence partition.
- Re-flash: Use an ISO writer to flash the new version onto the USB drive.
- Initialize persistence: Re-create the persistence partition on first boot.
- Restore data: Copy the backed-up data back to the appropriate locations (optional).
Warning: Re-flashing the USB drive destroys the persistence partition and all data stored on it. Always ensure a complete backup of critical data before proceeding with an upgrade.
Backup Procedure
It is recommended to copy the following directories to an external drive or network location before re-flashing:
- /var/lib/neuraldrive/: Contains downloaded models and Open WebUI user data.
- /etc/neuraldrive/: Contains system configurations, TLS certificates, and the API key.
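A minimal backup sketch, assuming an external drive mounted at /mnt/backup (the mount point is an assumption, not a NeuralDrive default):

```shell
# Copy both persistence directories, preserving ownership and permissions.
# /mnt/backup is an assumed mount point for the external drive.
sudo cp -a /var/lib/neuraldrive /mnt/backup/
sudo cp -a /etc/neuraldrive /mnt/backup/
```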
Version Checking
To verify the current version of the system:
- Command line: cat /etc/neuraldrive/version (e.g., dev-snapshot).
- System API: Send a GET request to /system/status and check the version field.
Future Plans
A specialized neuraldrive-upgrade tool is planned for future releases. This tool will automate the process of downloading and applying updates directly to the persistence partition without requiring a full re-flash.
See Also
- Writing the USB Drive
- Storage Management
This chapter is for system administrators and developers.
System Management API
The NeuralDrive System Management API provides programmatic access to system operations, monitoring, and configuration.
Access and Endpoints
The API is accessible over HTTPS:
- URL: https://<IP>:8443/system/
Internally, the neuraldrive-system-api service runs on port 3001 and is proxied by Caddy to port 8443.
Authentication
All requests to the System Management API require a Bearer token for authentication. This is the same token used for the inference API and can be found in /etc/neuraldrive/api.key.
Common Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /system/status | CPU, RAM, disk, uptime, and system version. |
| GET | /system/services | List all neuraldrive-* services and their status. |
| POST | /system/services/{name}/restart | Restart a specific service. |
| POST | /system/services/{name}/{action} | Start or stop a specific service. |
| GET | /system/logs | Service log tailing (query params: service, lines). |
| GET | /system/storage | Detailed disk usage (models and persistence). |
| GET | /system/network | Interfaces, hostname, and mDNS status. |
| POST | /system/network/hostname | Set a new system hostname. |
| GET | /system/gpu | GPU vendor, device names, VRAM usage, and temperature. |
| POST | /system/ssh/{action} | Enable or disable SSH access. |
| GET | /system/security | Firewall, TLS, and SSH status. |
| POST | /system/api-keys/rotate | Rotate the system API key. |
| GET | /system/ca-cert | Download the root CA certificate (no authentication required). |
Implementation Details
The API is implemented using FastAPI and is served by Uvicorn. The application environment is located at /usr/lib/neuraldrive/api/.
For security reasons, only services prefixed with neuraldrive- can be managed through the API.
See Also
- System Management API Reference — complete endpoint listing with request and response schemas.
- API Overview — architecture and authentication for the inference API.
- API Key Management — key rotation, storage, and best practices.
Audience: Admin
Building Custom Images
The neuraldrive-builder toolkit provides the necessary infrastructure to generate tailored NeuralDrive ISO images. By modifying the build configuration, you can pre-load specific LLM models, inject custom branding, or bake in specific GPU drivers and system packages.
Prerequisites
Building a custom image requires a Linux environment with sufficient disk space (at least 50GB recommended) and a fast internet connection for downloading packages and models.
- Docker (Recommended): The easiest way to build is using the provided Docker environment.
- Debian 12 (Native): If building natively, you must use Debian 12 (Bookworm) with the live-build package installed.
Build Process
Follow these steps to generate a custom NeuralDrive ISO.
1. Prepare the Environment
Clone the NeuralDrive repository and navigate to the builder directory.
git clone https://github.com/Rightbracket/NeuralDrive.git
cd NeuralDrive/builder
2. Configure the Build
The build is controlled by neuraldrive-build.yaml. Create your configuration file from the provided example.
cp neuraldrive-build.yaml.example neuraldrive-build.yaml
Edit neuraldrive-build.yaml to suit your requirements. Key sections include:
- system: Define the kernel version, locale, and additional Debian packages.
- gpu: Enable or disable support for NVIDIA, AMD, and Intel GPUs.
- models: Specify which models to pre-load.
- webui: Customize the management interface branding.
- output: Set the filename and compression level.
3. Pre-load Models
Note: Model pre-loading currently requires a manual staging step.
To include models in your image, you must first stage them. The scripts/download-models.sh script uses a temporary Ollama instance to pull the models listed in your configuration into the ./model-staging/ directory.
./scripts/download-models.sh
These models are later packaged into the final image.
4. Execute the Build
You can run the build either natively or via Docker.
Using Docker (Recommended):
docker compose run builder
The Docker environment uses a debian:bookworm base and runs in privileged mode to allow live-build to mount filesystems.
Using Native Build:
sudo ./build.sh
The build.sh script performs the following actions:
- Validates the configuration using scripts/validate-config.sh.
- Prepares the live-build environment.
- Applies branding via scripts/apply-branding.sh.
- Runs the lb build process.
- Post-processes the output via scripts/post-build.sh (e.g., running isohybrid).
5. Retrieve the ISO
Once the build completes, the resulting ISO file will be located in the output/ directory.
Configuration Example
neuraldrive:
version: "1.0"
name: "My Custom NeuralDrive"
hostname: "my-llm-server"
system:
kernel: "default" # "default" (6.1) or "backport" (6.12+)
locale: "en_US.UTF-8"
timezone: "UTC"
extra_packages: ["htop", "vim"]
gpu:
nvidia: true
amd: false
intel: false
models:
preload: ["llama3.1:8b", "codestral:latest"]
catalog: "default" # default/minimal/none
network:
ssh_enabled: false
default_ip: "dhcp"
security:
encrypt_persistent: false
api_key: "" # Empty = auto-generate at first boot
webui:
enabled: true
admin_email: "admin@example.com"
branding:
title: "Custom NeuralDrive"
logo: "assets/custom-logo.png"
output:
format: "iso-hybrid"
filename: "neuraldrive-custom.iso"
compression: "xz"
Build Estimates
Build times vary significantly based on your configuration:
- Minimal Image: 30-45 minutes.
- Full GPU Stack (NVIDIA + AMD): 60-70 minutes.
- Heavy Model Pre-loading: 90+ minutes (largely dependent on download speeds).
Output Formats
NeuralDrive supports two primary output formats:
- iso-hybrid (Default): A bootable image compatible with both USB flash drives and optical media (CD/DVD).
- raw-disk: A standard disk image for virtual machines or direct disk writing.
For detailed information on every configuration key, see the Build Configuration Reference. To further customize the system during the build process, refer to Custom Hooks & Overlays.
Audience: Admin
Build Configuration Reference
This reference provides a complete annotation of all keys available in the neuraldrive-build.yaml file. This file controls the generation of the NeuralDrive ISO image and defines the default system state.
Specification
neuraldrive
- version: (String) The internal version of the NeuralDrive build. Default: "1.0"
- name: (String) The human-readable name of the image. This appears in the boot menu and the WebUI title. Default: "NeuralDrive"
- hostname: (String) The default system hostname. Default: "neuraldrive"
system
- kernel: (Enum) Specifies the Linux kernel version.
  - default: Standard Debian 12 kernel (6.1 LTS).
  - backport: Newer kernel from Debian backports (6.12+), recommended for recent hardware.
- locale: (String) System locale. Example: "en_US.UTF-8"
- timezone: (String) System timezone. Example: "UTC" or "America/New_York"
- extra_packages: (List) Additional Debian packages to install. Example: ["htop", "vim", "tmux"]
gpu
- nvidia: (Boolean) Enable NVIDIA driver and toolkit installation. Default: true
- amd: (Boolean) Enable ROCm drivers and AMD GPU support. Default: false
- intel: (Boolean) Enable Intel GPU and XPU support. Default: false
models
- preload: (List) A list of models to pre-load during the build process. Example: ["llama3.1:8b", "codestral:latest"]
- catalog: (Enum) The set of models to include in the default available list.
  - default: Includes standard common models.
  - minimal: Includes only the absolute essentials for basic testing.
  - none: No models pre-populated.
network
- ssh_enabled: (Boolean) Enable SSH server by default. Default: false
- default_ip: (String) Default IP configuration. Use "dhcp" or a static IP in CIDR format. Default: "dhcp"
security
- encrypt_persistent: (Boolean) Enable LUKS2 encryption for the persistence partition by default. Default: false
- api_key: (String) Default API key for external access. If empty, a key will be auto-generated at first boot. Default: ""
webui
- enabled: (Boolean) Enable the NeuralDrive management WebUI. Default: true
- admin_email: (String) The default administrative email for the WebUI. Default: "admin@example.com"
- branding.title: (String) The title displayed in the WebUI. Default: "NeuralDrive"
- branding.logo: (Path) Path to the custom logo file relative to the builder root. Default: "assets/logo.png"
output
- format: (Enum) The final output image format.
  - iso-hybrid: Bootable on both USB and CD.
  - raw-disk: Raw disk image.
- filename: (String) The name of the resulting image file. Default: "neuraldrive.iso"
- compression: (Enum) Compression algorithm for the ISO image.
  - xz: Higher compression, slower build time.
  - gzip: Faster build time, larger image size.
Configuration Examples
NVIDIA-Only Minimal Image
This configuration creates a lightweight image focused on NVIDIA GPUs with no extra overhead.
neuraldrive:
name: "NVIDIA Minimal NeuralDrive"
gpu:
nvidia: true
amd: false
intel: false
models:
catalog: "minimal"
preload: []
Full Image with Pre-loaded Models
A comprehensive image containing all GPU drivers and several models ready for immediate use.
neuraldrive:
name: "Full NeuralDrive with Models"
gpu:
nvidia: true
amd: true
intel: true
models:
catalog: "default"
preload: ["llama3.1:8b", "codestral:latest", "mistral:7b"]
output:
compression: "xz"
Custom Branded Image
Designed for deployment with specific hostname and branding requirements.
neuraldrive:
name: "Company NeuralDrive"
hostname: "company-llm-server"
webui:
branding:
title: "Company AI Lab"
logo: "assets/company-logo.png"
network:
ssh_enabled: true
For more details on the build process, see Building Custom Images.
Audience: Admin (advanced)
Custom Hooks & Overlays
The NeuralDrive build system uses the underlying Debian live-build framework to allow for deep customization. You can inject scripts, configuration files, and third-party packages into the final image without modifying the core build scripts.
Hook System
Hooks are executable scripts that run within the Debian environment (chroot) during the image creation process. They allow you to perform complex setup tasks like installing specific Python packages, configuring system services, or modifying system files.
Implementation
Hooks must be placed in the hooks/chroot/ directory of the builder. During the build, they are executed in alphabetical order. Using a numeric prefix (e.g., 01-, 02-) ensures a deterministic execution sequence.
Example Hook Script
To install custom Python packages via pip:
#!/bin/bash
# File: hooks/chroot/05-custom-pip-packages.hook.chroot
echo "Installing custom pip packages..."
pip install --no-cache-dir langchain-community chromadb
Overlay System
The overlay system provides a way to add or overwrite files in the final root filesystem. The contents of the overlay/ directory are mirrored directly onto the target system's root.
Implementation
If you want to add a file to /etc/neuraldrive/custom.conf, you should place it at overlay/etc/neuraldrive/custom.conf in the builder directory.
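The mapping described here can be staged from the builder directory like this (the file content is placeholder text):

```shell
# Stage a file that will land at /etc/neuraldrive/custom.conf on the target.
# The key/value pair is placeholder content for illustration.
mkdir -p overlay/etc/neuraldrive
printf 'example_key = example_value\n' > overlay/etc/neuraldrive/custom.conf
```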
Common Uses
- Config Files: Provide default configurations for services like SSH, Nginx, or Ollama.
- Scripts: Inject administrative scripts into /usr/local/bin/.
- Static Assets: Replace or add branding assets like splash screens or icons.
Package Injection
You can automatically install third-party Debian (.deb) packages by placing them in the packages/ directory.
Implementation
Any .deb file found in the packages/ folder will be included in the local repository created during the build and installed by apt alongside the standard system packages. This is particularly useful for proprietary drivers or custom-built software not available in the Debian repositories.
Execution Order
During the lb build process, these customizations are applied as follows:
- Overlay Injection: Files from overlay/ are copied into the chroot environment.
- Package Installation: Standard packages and injected .deb files are installed via apt.
- Hooks: Scripts in hooks/chroot/ are executed in alphabetical order.
Warning: Hooks run with root privileges inside the chroot. Ensure your scripts are idempotent and do not fail, as a hook failure will cause the entire image build to fail.
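The idempotency requirement is usually met with a guard-before-write pattern: check whether a change is already present before applying it. A minimal sketch (the target file and the TUNED line are illustrative; a real hook would edit files inside the chroot):

```shell
#!/bin/bash
# Sketch of an idempotent hook: running it twice produces the same result.
set -euo pipefail
CONF=./custom.conf   # illustrative; a real hook targets a chroot path
touch "$CONF"

append_once() {
  # Append the line only if it is not already present verbatim
  grep -qxF "$1" "$CONF" || echo "$1" >> "$CONF"
}

append_once "TUNED=1"
append_once "TUNED=1"   # a second invocation changes nothing
```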
For an overview of the full build process, refer to Building Custom Images.
Audience: Everyone
CD Mode vs USB Mode
NeuralDrive's hybrid ISO image can be used in two distinct ways: as a live environment from a USB drive or as a read-only system from a CD/DVD. The choice of medium significantly impacts the features and persistence of the system.
USB Mode (Recommended)
When NeuralDrive is flashed to a USB drive, it creates a persistence partition that allows for the full suite of features. This is the intended deployment method for most users.
- Persistence: Models, system configurations, and user data are saved to the USB drive and survive reboots.
- Model Management: Downloading and updating models via the WebUI or command line is fully supported.
- Performance: USB 3.0+ provides acceptable I/O speeds for system operations.
CD Mode
If the ISO is burned to a CD or DVD, or booted in an environment where the boot medium is read-only, NeuralDrive operates in a restricted live mode.
- Read-Only System: All system changes are made to a tmpfs (RAM-based filesystem) and are lost upon reboot.
- Disabled Downloads: Since there is no persistent storage on the disc, model downloads are disabled. Users will see a warning: "Downloads disabled — connect external storage."
- Stateless Security: Every boot starts with a clean, known state, making it ideal for high-security environments where no data should remain on the hardware.
Booting to RAM
For CD mode, a specialized "Copy to RAM" (toram) boot option is available in the boot menu. This loads the entire image into system memory, which has several benefits:
- Speed: Operating purely from RAM is faster than reading from an optical drive.
- Drive Availability: Once loaded, the CD can be ejected, freeing the drive for other uses.
- Memory Requirement: Loading the image into memory requires sufficient RAM (typically 16 GB+) to hold both the system and the models.
Comparison and Use Cases
| Feature | USB Mode | CD Mode |
|---|---|---|
| Persistence | Full | None (lost on reboot) |
| Model Downloads | Enabled | Disabled (unless external storage added) |
| Encryption | Supported | Not applicable |
| Typical Use Case | Permanent local LLM server | Evaluation, high-security workstations |
When to Use Each
- USB: Best for regular use, developing with LLMs locally, or setting up a dedicated server for a small team.
- CD: Best for testing hardware compatibility, air-gapped security needs, or environments where the operating system must never be modified.
Tip: If using CD mode but still needing persistent model storage, you can connect a separate USB or SATA drive. NeuralDrive will automatically detect and mount it. For more, see External Storage.
For the initial setup process on either medium, refer to First Boot Setup.
Audience: Admin
External Storage
NeuralDrive is designed to work with external storage devices for model persistence and data management. This is particularly useful for systems running from a read-only medium (CD) or for users who maintain a large library of models that exceed the size of a standard USB drive.
Auto-Mounting
The system includes pre-configured udev rules that automatically detect and mount external storage devices (USB, SATA, etc.) as they are connected.
- Mount Point: /mnt/external/<LABEL>
- Directory Name: The drive's volume label is used as the directory name. If no label exists, the device identifier (e.g., sdb1) is used.
Configuring External Model Storage
By default, models are stored in /var/lib/neuraldrive/models/. You can redirect this to an external drive using one of two methods.
Method 1: Bind Mount (Recommended)
Edit /etc/fstab to mount your external storage directory directly over the default models directory. This is the most reliable method for ensuring persistence.
/mnt/external/MyModels/ollama /var/lib/neuraldrive/models/ none bind 0 0
Method 2: Symlinking
Alternatively, you can create a symbolic link from the default location to the external drive.
# Stop the Ollama service
sudo systemctl stop ollama
# Move existing models to the external drive
sudo mv /var/lib/neuraldrive/models/* /mnt/external/MyModels/
# Create the symlink
sudo ln -s /mnt/external/MyModels/ /var/lib/neuraldrive/models
# Restart the service
sudo systemctl start ollama
Use Cases
- CD Mode with External Storage: When running NeuralDrive from a CD, you can still have a persistent model library by connecting an external USB drive.
- Shared Model Library: Multiple NeuralDrive instances can share a single large external drive containing a comprehensive library of LLMs.
- Supplementing USB Storage: If your primary USB drive runs low on space, you can seamlessly add a second drive to expand your available model storage.
For more information on model storage and management, see Storage Management and CD Mode vs USB Mode.
Audience: Admin
LUKS Encryption
NeuralDrive supports LUKS2 encryption for its persistence partition. This ensures that models, system configurations, and credentials stored on the USB drive are protected from unauthorized access if the physical medium is lost or stolen.
What is Protected?
When encryption is enabled, it applies specifically to the persistence partition (typically labeled persistence). This partition stores:
- Models: All LLMs downloaded via Ollama.
- Configs: Network settings, API keys, and system customizations.
- WebUI Data: User accounts and administrative settings.
Enabling Encryption
There are two primary ways to enable LUKS encryption.
1. First-Boot Wizard
During the initial setup process, the first-boot wizard will present an option to "Enable Persistence Encryption." Selecting this option will prompt you to enter a passphrase that will be required to unlock the partition on every subsequent boot.
2. Build Configuration
For automated deployments or pre-configured images, you can enable encryption in the neuraldrive-build.yaml file:
security:
  encrypt_persistent: true
Boot Experience
When encryption is enabled, the system will pause during the boot sequence to prompt for the decryption passphrase. This occurs before any NeuralDrive services (including the WebUI) are started.
Warning: If you lose your passphrase, the data on the persistence partition is unrecoverable. There is no password reset or recovery mechanism for LUKS-encrypted partitions.
Performance Considerations
Using LUKS2 encryption introduces a minor overhead for disk I/O operations. However, because LLM inference is primarily bound by GPU or CPU performance and VRAM/RAM bandwidth, the impact on overall model performance is negligible for most users.
Important Note: Destructive Operation
Enabling encryption for the first time on a drive that already contains data is a destructive operation. It will reformat the persistence partition. Ensure you back up any critical data before enabling this feature on an existing installation.
For more information on the first-time setup process, see First Boot Setup. To explore other security features, refer to Security.
Audience: Admin / Developers
Performance Tuning
NeuralDrive is pre-optimized for a wide range of hardware, but fine-tuning specific configurations can significantly improve inference speed, concurrent user handling, and memory efficiency.
Ollama Configuration
The primary backend service, Ollama, is controlled via /etc/neuraldrive/ollama.conf. Modifying these parameters allows you to tailor the system's behavior to your specific hardware and workload.
OLLAMA_HOST=127.0.0.1:11434
OLLAMA_MODELS=/var/lib/neuraldrive/models/
OLLAMA_KEEP_ALIVE=5m
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NUM_PARALLEL=1
Key Parameters
- OLLAMA_NUM_PARALLEL: (Integer) The number of concurrent requests the server will handle. Increase this for multi-user environments, though this will increase VRAM usage.
- OLLAMA_KEEP_ALIVE: (Duration) How long a model remains loaded in memory after the last request. Setting this to a higher value (e.g., 30m) avoids the latency of reloading models.
- OLLAMA_MAX_LOADED_MODELS: (Integer) The maximum number of models to keep in VRAM simultaneously. The default is 0 (auto), which allows Ollama to manage concurrent loading based on available VRAM. When memory is full, Least Recently Used (LRU) models are evicted automatically. Manual overrides can be set in /var/lib/neuraldrive/config/ollama.conf.
- OLLAMA_NUM_THREADS: (Integer) Specifies the number of CPU threads to use for inference. By default, this auto-detects based on your hardware.
- OLLAMA_FLASH_ATTENTION: (Boolean) Enabling Flash Attention can significantly improve speed on supported GPUs (e.g., NVIDIA Ampere and newer).
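Putting these together, a configuration tuned for a small multi-user team might look like the following. The values are illustrative starting points to adjust against your available VRAM, not recommended defaults:

```
# /etc/neuraldrive/ollama.conf — illustrative multi-user tuning
OLLAMA_HOST=127.0.0.1:11434
OLLAMA_MODELS=/var/lib/neuraldrive/models/
OLLAMA_KEEP_ALIVE=30m
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NUM_PARALLEL=4
OLLAMA_FLASH_ATTENTION=1
```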
Memory Management
VRAM and RAM Spilling
Ollama uses memory mapping (mmap) by default. This allows the system to load models larger than the available VRAM by spilling some layers into system RAM. While this enables the execution of larger models, it will result in slower inference speeds for the layers processed by the CPU.
zRAM Swap
NeuralDrive includes the neuraldrive-zram.service, which creates a compressed swap device in RAM. This is particularly beneficial when running on systems where the combined requirement of VRAM and RAM is very tight, as it provides a faster alternative to traditional disk-based swap.
Storage and I/O
To minimize I/O wait times and reduce wear on USB flash media, NeuralDrive employs several filesystem optimizations:
- noatime: The system is mounted with the noatime option to prevent unnecessary write operations when files are accessed.
- commit=60: Data is committed to disk every 60 seconds (instead of the default 5), reducing the frequency of physical write cycles.
Context Window Management
The size of the context window directly impacts memory consumption. A larger context window allows the model to "remember" more of the conversation but requires significantly more VRAM. Adjust the context size within your application or model configuration to balance memory usage and conversational depth.
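With the Ollama-compatible API, the context size can be lowered per request via the num_ctx option. A hedged sketch (the model name, endpoint, and API key are illustrative placeholders):

```shell
# Build a request that caps the context window at 2048 tokens
cat > request.json <<'EOF'
{"model": "llama3.1:8b", "prompt": "Hello", "options": {"num_ctx": 2048}}
EOF
# Send it (commented out here; substitute your appliance's IP and key):
# curl -sk -H "Authorization: Bearer nd-xxxx" -d @request.json https://<IP>:8443/api/generate
```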
Multi-GPU Optimization
If multiple compatible GPUs are present, Ollama will automatically detect them and distribute model layers across all available devices. This is an effective way to run very large models that would otherwise exceed the VRAM of a single card.
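If you instead want to pin inference to a subset of your NVIDIA GPUs, the standard CUDA environment variable can be added to the Ollama configuration. This is a generic CUDA mechanism rather than a NeuralDrive-specific setting:

```
# Restrict inference to the first two NVIDIA GPUs
# (device indices as reported by nvidia-smi)
CUDA_VISIBLE_DEVICES=0,1
```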
For further details on configuration, see Configuration Files Reference. To choose the right models for your hardware, see Model Recommendations.
Audience: Developers (advanced)
llama.cpp Server
While Ollama is the default model server for NeuralDrive, the system also includes the llama-server binary for users who require lower-level control or specialized inference features.
When to Use llama.cpp
The llama-server is ideal for developers who need more granular configuration than the Ollama API provides.
| Feature | Ollama | llama.cpp |
|---|---|---|
| Ease of Use | High (Managed downloads) | Medium (Manual model placement) |
| Model Format | Managed Blobs | Direct GGUF loading |
| Control | Standardized API | Fine-grained sampling & batching |
| Resource Usage | Integrated Management | Lower overhead per instance |
Enabling the Server
The llama-server is included in the base image but is not enabled by default. You can start it manually or configure a custom systemd service.
# Example command to start the llama.cpp server
llama-server --model /path/to/my-model.gguf --port 8080 --host 0.0.0.0
Configuration Options
The server supports a wide variety of command-line flags to tune performance:
- --ctx-size: (Integer) Define the maximum context window size.
- --n-gpu-layers: (Integer) Specify how many layers to offload to the GPU.
- --threads: (Integer) The number of CPU threads to utilize.
- --batch-size: (Integer) Set the batch size for prompt processing.
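To keep the server running across reboots, the command can be wrapped in a custom systemd unit. The sketch below is an assumption to adapt: the unit name, binary path, model path, and flag values are all placeholders, and the unit is not shipped with NeuralDrive:

```
# /etc/systemd/system/llama-server.service (illustrative)
[Unit]
Description=llama.cpp inference server
After=network-online.target

[Service]
ExecStart=/usr/bin/llama-server --model /path/to/my-model.gguf \
  --port 8080 --host 0.0.0.0 --ctx-size 4096 --n-gpu-layers 99
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After placing the file, enable it with systemctl daemon-reload followed by systemctl enable --now llama-server.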
Important Note: Model Formats
It's important to understand the difference in model storage between the two servers. Ollama stores models in a proprietary blob format within /var/lib/neuraldrive/models/. These blobs are not directly compatible with the llama-server. To use a model with llama.cpp, you must provide a raw .gguf file.
For more information on model formats, see Understanding LLM Models. To further optimize your inference setup, refer to Performance Tuning.
Audience: Everyone
Common Issues
This guide provides solutions for frequent technical challenges encountered while using NeuralDrive. If your issue is not listed here, consult the specialized troubleshooting pages for GPU, Boot, Network, or Models.
Dashboard and Access
I can't reach the web dashboard
If the web interface does not load at https://neuraldrive.local, verify the following:
- Check the Local IP: The NeuralDrive console (TUI) displays the current IP address. Attempt to connect directly via https://<IP_ADDRESS>.
- Verify Port 443: Ensure your client machine can reach the NeuralDrive device on port 443. Some firewalls or router settings may block this traffic.
- Network Connection: Confirm the NeuralDrive device has an active Ethernet or Wi-Fi connection.
- mDNS Resolution: The .local hostname requires mDNS (Avahi/Bonjour) support on the client. If your client is on a corporate network, mDNS may be filtered.
The API returns 401 Unauthorized
All API requests must include a valid Bearer token.
- Key Format: Verify your API key follows the nd-xxxx format.
- Header Syntax: Ensure the header is sent correctly: Authorization: Bearer nd-xxxx
- Key Location: The system API key is stored in /etc/neuraldrive/api.key and can be viewed or rotated via the TUI.
System Persistence
My models disappeared after reboot
NeuralDrive is a live system. Data only survives reboots if a persistence partition is active.
- USB Mode Requirement: Persistence is only available when running from a USB drive with a labeled persistence partition. It does not function in "Live CD" (ISO only) mode.
- Verify Mount: Run the following command to check if the persistence layer is active: mount | grep persistence
- Partition Health: If the partition is present but not mounting, check the filesystem integrity using fsck.
Resource Management
The system is running out of memory
Large language models require significant RAM or VRAM. If the system becomes unresponsive or returns memory errors:
- Downsize the Model: Use a smaller model (e.g., 7B instead of 70B).
- Increase Quantization: Use a more compressed version of the model (e.g., Q4_K_M instead of Q8_0 or FP16).
- Check Swap: NeuralDrive uses zram for compressed swap. Verify it is active with zramctl.
Model download is extremely slow or fails
- Storage Space: Model downloads will abort if the disk is full. Check availability with df -h /var/lib/neuraldrive.
- Network Stability: Ensure the device has a stable connection to the internet. Downloads are resumed automatically if interrupted, but high latency can cause timeouts.
- Proxy Settings: If you are behind a corporate proxy, ensure the environment variables are correctly set in /etc/neuraldrive/ollama.conf.
Service Status
The TUI shows 'Ollama Offline'
If the inference engine is not responding:
- Check Service Status: systemctl status neuraldrive-ollama
- Restart the Service: systemctl restart neuraldrive-ollama
- Check Logs: If the service fails to start, inspect the logs for GPU driver or initialization errors: journalctl -u neuraldrive-ollama -e
Diagnostics
NeuralDrive-Check
NeuralDrive includes a dedicated diagnostic tool for rapid health assessment. Run this from the console or via SSH:
/usr/bin/neuraldrive-check
This tool verifies:
- GPU driver initialization and VRAM availability.
- Persistence partition mount status.
- Core service health (Ollama, WebUI, Caddy).
- Network connectivity and API key validity.
Tip: Always run neuraldrive-check before seeking manual support, as it identifies 90% of common configuration errors.
Audience: Everyone
GPU Troubleshooting
NeuralDrive is optimized for hardware-accelerated inference. If the system fails to detect or utilize your GPU, it will fall back to CPU inference, which is significantly slower.
Detection Issues
No GPU detected
If NeuralDrive does not recognize your hardware, verify the following system configurations:
- Hardware Compatibility: Ensure your GPU is listed in the Hardware Compatibility Matrix.
- BIOS Settings:
- IOMMU/VT-d: Ensure these are enabled for proper PCI communication.
- Above 4G Decoding: Must be enabled for modern GPUs (RTX 30-series and newer).
- Resizable BAR: Recommended for improved performance, though not required for detection.
- Secure Boot: NVIDIA drivers require MOK (Machine Owner Key) enrollment to function with Secure Boot. If you cannot enroll the key, disable Secure Boot in the BIOS.
- Mixed Vendors: Mixed-vendor configurations (e.g., one NVIDIA and one AMD card) are not supported. The first vendor detected by the boot sequence will be initialized.
Nouveau Conflict
NeuralDrive automatically blacklists the open-source nouveau driver to prevent conflicts with the proprietary NVIDIA stack. To verify:
lsmod | grep nouveau
If the command returns any output, the blacklist failed. Check /etc/modprobe.d/neuraldrive-blacklist.conf.
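For reference, a nouveau blacklist file conventionally contains directives like the following. This is the standard modprobe blacklist pattern, not necessarily the exact contents of the shipped file:

```
# /etc/modprobe.d/neuraldrive-blacklist.conf
blacklist nouveau
options nouveau modeset=0
```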
Driver and Hardware Support
missing nvidia-uvm module
If Ollama reports CPU-only inference despite having an NVIDIA GPU, the nvidia-uvm (Unified Virtual Memory) kernel module may not be loaded. This module is essential for CUDA memory allocation.
- Verify Module: Check if the module is loaded: lsmod | grep nvidia_uvm
- Naming Convention: On NeuralDrive (based on Debian), the module is named nvidia-current-uvm.
- Manual Load: If missing, load it manually: sudo modprobe nvidia-current-uvm && nvidia-modprobe -u
- Automatic Loading: NeuralDrive should load this automatically at boot via /etc/modules-load.d/nvidia-uvm.conf. If it fails, check the journalctl -u neuraldrive-ollama logs for ExecStartPre failures.
Without this module, /dev/nvidia-uvm device nodes will be missing, causing CUDA calls to fail silently and Ollama to fall back to CPU.
cgroup v2 / DeviceAllow blocking
If the GPU is detected by the system but Ollama still falls back to CPU inference, systemd DeviceAllow rules might be blocking access.
- cgroup v2 Behavior: On systems using cgroup v2, DeviceAllow uses eBPF device filters. These filters can block CUDA access even when explicit allow rules for /dev/nvidia* and /dev/dri/* are present.
- NeuralDrive Default: The default NeuralDrive Ollama service has all DeviceAllow lines removed to prevent this.
- Custom Units: If you have modified the service unit and re-added DeviceAllow rules, remove them and ensure PrivateDevices=no is set to restore GPU access.
Diagnostic Tools
NeuralDrive provides several utilities to inspect GPU state:
- NVIDIA: Run nvidia-smi to view VRAM usage, temperature, and driver version.
- AMD: Run rocm-smi to inspect ROCm status and device health.
- System Config: The file /run/neuraldrive/gpu.conf is generated at boot by neuraldrive-gpu-detect.service. It contains the detected vendor: cat /run/neuraldrive/gpu.conf
- PCI Enumeration: Use lspci | grep -i vga to see if the kernel sees the hardware at the bus level.
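A diagnostic script can branch on the detected vendor. The sketch below assumes gpu.conf is a shell-sourceable KEY=VALUE file exposing a GPU_VENDOR variable; verify the actual format on your system before relying on it:

```shell
# Pick the right vendor tool based on the boot-time detection result
GPU_VENDOR=unknown
[ -r /run/neuraldrive/gpu.conf ] && . /run/neuraldrive/gpu.conf || true
case "$GPU_VENDOR" in
  nvidia) nvidia-smi ;;
  amd)    rocm-smi ;;
  *)      echo "No supported GPU detected; CPU-only inference" ;;
esac
```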
Recovery and Safe Mode
Safe Mode Boot
If a GPU driver causes a system hang or kernel panic during boot, use the Safe Mode option in the GRUB menu.
- Effect: Skips all GPU detection and driver loading.
- Result: The system will boot with generic VGA drivers and use CPU-only inference.
- Usage: Ideal for troubleshooting BIOS settings or extracting logs when the GPU is failing.
Warning: Running in Safe Mode will result in extremely high CPU usage and latency during model inference.
Mixed Vendor Support
Current NeuralDrive releases only support single-vendor GPU configurations. If you have multiple GPUs, they must all be from the same manufacturer (e.g., all NVIDIA or all AMD). Ollama will automatically distribute model layers across all compatible GPUs of the same vendor to maximize VRAM utilization.
Note: For detailed kernel parameters related to GPU management, see Boot Parameters.
Audience: Everyone
Boot Failures
This section addresses issues that prevent NeuralDrive from reaching the console or dashboard.
BIOS and UEFI Compatibility
NeuralDrive is distributed as an isohybrid image, which supports both legacy BIOS and modern UEFI boot modes.
- Preferred Mode: UEFI is highly recommended for compatibility with modern GPU drivers and Secure Boot.
- Boot Order: Ensure the USB flash drive is set as the primary boot device in your system firmware.
- USB Port: If the drive is not detected, try a different USB port (e.g., USB 2.0 instead of 3.x).
GRUB Menu Options
When the system starts, the GRUB menu provides three primary boot entries:
- Normal: Standard boot with full GPU detection and driver initialization.
- Safe Mode: Skips GPU driver loading and PCI enumeration. Use this if the system hangs during boot.
- Debug Mode: Appends debug and verbose to the kernel command line, providing detailed systemd output for troubleshooting.
Common Boot Errors
Black Screen
If the system hangs with a black screen or blinking cursor shortly after selecting a boot entry:
- Driver Issue: This is typically caused by a GPU driver conflict. Reboot and select Safe Mode.
- Kernel Modesetting: If Safe Mode fails, try editing the boot parameters (press 'e' in GRUB) and add nomodeset.
"No bootable device"
If the hardware fails to recognize the USB drive entirely:
- Flash Verification: The image may have been written incorrectly. Re-flash the USB drive and enable "Verify write" in your flashing utility.
- GPT vs MBR: Ensure your BIOS is set to match the partition style of the flash drive.
Kernel Panic
If the boot process halts with a "Kernel Panic" or "VFS: Unable to mount root fs":
- Corrupt Image: This usually indicates a bad write to the USB drive. Re-flash using a high-quality USB 3.x drive.
- Memory Corruption: In rare cases, this may indicate faulty RAM. Run a MemTest86+ cycle to verify hardware health.
Note: For a complete list of supported kernel arguments, see Boot Parameters. For instructions on creating the bootable media, see Writing the USB Drive.
Audience: Everyone
Network & mDNS Troubleshooting
This guide addresses connectivity issues between your client machine and the NeuralDrive appliance.
mDNS and Hostname Resolution
neuraldrive.local doesn't resolve
The neuraldrive.local address uses Multicast DNS (mDNS) for discovery.
- Client Software: Ensure your client has mDNS support. Windows (via Bonjour), macOS (native), and Linux (Avahi) are supported.
- Network Hardware: Some routers or managed switches block multicast traffic (UDP 5353).
- Corporate Networks: mDNS is often disabled or filtered on enterprise-grade networks.
- Workaround: Use the direct IP address shown on the NeuralDrive console (TUI).
TLS and Certificates
HTTPS certificate warning
NeuralDrive uses self-signed certificates for end-to-end encryption. Browsers will display a "Your connection is not private" warning.
- Expected Behavior: This warning is expected when using the default self-signed CA.
- CA Installation: To resolve this, download the CA certificate and add it to your browser or OS trust store.
  - Path: /etc/neuraldrive/tls/neuraldrive-ca.crt
  - URL: https://<IP>:8443/system/ca-cert
- Custom Certificates: You can replace the default certificates in /etc/neuraldrive/tls/ with your own.
Wi-Fi Configuration
- Supported hardware: Most Intel and Realtek Wi-Fi chipsets are supported via NetworkManager.
- TUI Configuration: Use the "Network" menu in the TUI to scan and connect to access points.
- Command Line: Advanced users can use nmcli dev wifi connect <SSID> password <PASSWORD> for manual association.
Firewall and Ports
Can't connect from another machine
- Port Access: Ensure the following ports are open on the host firewall (nftables):
- 443: Web UI Dashboard.
- 8443: API Gateway and System Panel.
- Ping Test: Verify basic ICMP connectivity with ping <IP_ADDRESS>. If pings are successful but port 443 fails, the web server (Caddy) may be offline.
Note: For more information on configuring networking and firewall rules, see Network Configuration and TLS Certificate Trust.
Audience: Everyone
Model Loading Issues
This guide covers issues related to downloading, loading, and running Large Language Models (LLMs).
Download Failures
"Downloads disabled"
If the system prevents downloading new models, it has likely detected it is running in "Live CD" (read-only) mode.
- USB Mode Requirement: You must run the system from a USB drive with a valid persistence partition to save downloaded models.
- External Storage: If persistence is not available, you can mount an external disk to /var/lib/neuraldrive/models/ to store downloads.
Slow Download Speed
Model downloads depend on your internet connection and the responsiveness of the source registry.
- Storage Check: Downloads will abort if the target storage is full. Check free space with df -h /var/lib/neuraldrive/models/.
- Resuming: If a download is interrupted, NeuralDrive will automatically attempt to resume from the last successful byte.
Execution Issues
"Model failed to load"
This error occurs when the model weights cannot be loaded into available memory.
- Insufficient VRAM: The model is too large for your GPU. Try a smaller model or a version with higher quantization (e.g., Q4_K_M).
- RAM Fallback: If VRAM is exhausted, Ollama may attempt to load portions of the model into system RAM. This process is very slow and can lead to a load timeout.
Slow Inference
If the model is generating text very slowly (less than 1 token per second):
- CPU Fallback: The system is likely running on the CPU. Check if your GPU was detected:
cat /run/neuraldrive/gpu.conf
Concurrent Model Loading
Multiple models can be loaded simultaneously if your hardware supports it.
- Automatic Management: NeuralDrive uses OLLAMA_MAX_LOADED_MODELS=0 (auto) by default. Ollama manages concurrent loading based on your available VRAM.
- Model Eviction: When VRAM is exhausted, least-recently-used models are automatically evicted from memory to make room for new ones.
- Manual Control: You can manually load or unload models via the TUI Models screen (accessible with the F2 key).
Management and Corruption
"Model not found"
- Exact Naming: Models must be called by their full tag (e.g., llama3:8b).
- Verify Inventory: Use the TUI or the following API command to list all locally available models: curl -H "Authorization: Bearer nd-xxxx" https://<IP>:8443/api/tags
Model Corruption
If a model loads but produces garbled output or crashes the service:
- Delete and Re-pull: Remove the corrupted model and download it again via the TUI or API:
  curl -H "Authorization: Bearer nd-xxxx" -X DELETE https://<IP>:8443/api/delete -d '{"name":"model_name"}'
Note: For specific model recommendations based on your hardware, see Model Recommendations. For GPU-specific issues, consult GPU Problems.
Audience: Everyone
Frequently Asked Questions
This guide provides answers to common questions about NeuralDrive's capabilities, architecture, and administration.
Installation and Compatibility
Can I install NeuralDrive to a hard drive?
NeuralDrive is a live system designed to run from removable media. It is not intended for standard disk installation. Persistence of configuration and models is achieved through a dedicated partition on the USB drive. This ensures the host system remains untouched and the appliance remains portable.
Can I run NeuralDrive in a Virtual Machine (VM)?
Yes. You can run NeuralDrive in a VM using the ISO image. For optimal performance, you must use GPU passthrough to give the VM direct access to the host hardware. If GPU passthrough is not available, NeuralDrive will fall back to CPU-only mode, which is significantly slower but functional.
Can I use NeuralDrive without a GPU?
Yes. If no compatible GPU is detected, NeuralDrive will automatically fall back to CPU inference. While functional, the performance will be substantially lower than GPU-accelerated modes. This is ideal for lightweight testing or running small models on high-performance CPUs.
Usage and Administration
How do I add more users?
NeuralDrive uses Open WebUI for its primary dashboard. You can manage users through the Admin Panel at https://<IP_ADDRESS>/admin. Note that user registration is disabled by default to maintain local security; the admin must manually create or approve new user accounts.
Can I use NeuralDrive offline?
NeuralDrive is fully offline-capable. Once models are downloaded, no internet connection is required for inference, API access, or dashboard usage. For entirely air-gapped operations, you can pre-load models onto the persistence partition of your USB drive before moving to the target environment.
How is this different from running Ollama directly?
NeuralDrive is a turnkey appliance that eliminates the complexity of system setup. It includes:
- An optimized operating system with a minimal attack surface.
- Automated GPU driver detection and configuration.
- A built-in web dashboard, API gateway, and system monitoring.
- Self-signed TLS encryption for all traffic.
- Pre-configured firewall and security hardening.
- Portability on a single USB drive.
Can I use my own TLS certificate?
Yes. You can replace the default self-signed certificates in /etc/neuraldrive/tls/ with your own PEM-formatted certificates. After replacing the files, restart the Caddy service to apply the changes:
systemctl restart neuraldrive-caddy
Is there any telemetry or data collection?
No. NeuralDrive is designed for privacy and local-first operations. No data, usage metrics, or telemetry are ever sent to external servers.
Hardware and Performance
What models are recommended?
Model performance depends entirely on your available VRAM and RAM. See Model Recommendations for a list of tested models and their hardware requirements.
Can I use multiple GPUs?
Yes. Ollama automatically detects and utilizes all available GPUs from the same vendor. It will distribute model layers across GPUs to maximize VRAM utilization, enabling the execution of models that are too large for a single card.
Note: For more information on hardware compatibility, see the Hardware Matrix. For advanced system settings, see Boot Parameters.
Audience: Everyone
Hardware Compatibility Matrix
This document provides a comprehensive list of supported GPU hardware and their corresponding compute stacks within the NeuralDrive environment.
Supported Graphics Hardware
| Vendor | Family | Example Cards | Compute Stack | Driver | Status |
|---|---|---|---|---|---|
| NVIDIA | Turing | RTX 2060-2080, T4 | CUDA 12.x | 535+ | Supported |
| NVIDIA | Ampere | RTX 3060-3090, A100, A2000 | CUDA 12.x | 535+ | Supported |
| NVIDIA | Ada Lovelace | RTX 4060-4090, L40 | CUDA 12.x | 535+ | Supported |
| NVIDIA | Hopper | H100 | CUDA 12.x | 535+ | Supported |
| AMD | RDNA 3 | RX 7600-7900 XTX | ROCm 6.x | amdgpu | Supported |
| AMD | CDNA 2/3 | MI250, MI300 | ROCm 6.x | amdgpu | Supported |
| Intel | Arc Alchemist | A770, A750 | oneAPI/SYCL | i915 | Experimental |
| CPU-only | Any x86_64 | Any | N/A | N/A | Supported (slow) |
VRAM Recommendations
The table below outlines the minimum VRAM requirements for common model sizes at various quantization levels.
| Model Size | Quantization (Q4_K_M) | Quantization (Q8_0) | Full Weights (FP16) |
|---|---|---|---|
| 7B - 8B | 6 GB | 10 GB | 16 GB |
| 13B - 14B | 10 GB | 16 GB | 32 GB |
| 30B - 34B | 24 GB | 40 GB | 64 GB |
| 70B | 48 GB | 80 GB | 140 GB |
Note: These values are estimates and do not include the memory required for context window overhead (KV cache). High context lengths will increase VRAM consumption.
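As a rule of thumb behind these figures, the weight footprint is roughly parameters × bits-per-weight ÷ 8, plus headroom for the KV cache and runtime buffers. A back-of-the-envelope check (the 20% headroom factor is an illustrative assumption, not a NeuralDrive constant):

```shell
# Rough VRAM estimate for an 8B model at ~4.5 bits/weight (Q4_K_M-class),
# with an assumed 20% headroom for KV cache and buffers.
vram_gb=$(awk -v params_b=8 -v bits=4.5 'BEGIN { printf "%.1f", params_b * bits / 8 * 1.2 }')
echo "estimated VRAM: ${vram_gb} GB"
```

This lands close to the 6 GB figure quoted for 7B–8B models at Q4_K_M in the table above.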
Important Hardware Considerations
- Secure Boot: NeuralDrive installs NVIDIA drivers via DKMS, so the resulting kernel modules are unsigned. Either enroll a MOK (Machine Owner Key) or disable Secure Boot so the modules can load.
- Bus Interface: PCIe 4.0 or 5.0 is recommended to minimize latency during model loading and context ingestion.
- Power Supply: Ensure your power supply (PSU) is rated for the peak power consumption of your GPU(s) during inference.
- Cooling: GPUs can generate significant heat during long-running inference tasks. Ensure your system has adequate thermal management.
Note: For detailed hardware setup instructions, see Hardware Requirements. If your GPU is not detected, consult the GPU Troubleshooting guide.
Audience: Admin
Configuration Files
This document provides a complete reference for all critical configuration and state files within the NeuralDrive appliance.
Master File Inventory
| File | Purpose | Format | Owner |
|---|---|---|---|
| `/etc/neuraldrive/ollama.conf` | Ollama baked-in defaults | KEY=VALUE | root:neuraldrive-admin |
| `/var/lib/neuraldrive/config/ollama.conf` | Persistent Ollama overrides | KEY=VALUE | root:neuraldrive-admin |
| `/etc/neuraldrive/config.yaml` | TUI overlay fallback config | YAML | root:neuraldrive-admin |
| `/var/lib/neuraldrive/config/config.yaml` | Persistent TUI configuration | YAML | root:neuraldrive-admin |
| `/var/lib/neuraldrive/config/api.key` | Persistent API key | plaintext | root:root (600) |
| `/etc/neuraldrive/api.key` | System API key (synced) | plaintext | root:root (600) |
| `/var/lib/neuraldrive/config/credentials.conf` | Persistent credentials | KEY=VALUE | root:root (600) |
| `/etc/neuraldrive/webui.env` | Open WebUI configuration | KEY=VALUE | root:neuraldrive-admin |
| `/etc/neuraldrive/caddy.env` | Caddy API key environment | KEY=VALUE | root:neuraldrive-admin |
| `/etc/neuraldrive/api.env` | System API environment | KEY=VALUE | root:neuraldrive-admin |
| `/etc/neuraldrive/Caddyfile` | Caddy reverse proxy configuration | Caddyfile | root:neuraldrive-caddy |
| `/etc/neuraldrive/nftables.conf` | Global firewall rules | nftables | root:root |
| `/etc/neuraldrive/neuraldrive-models.yaml` | Model catalog definitions | YAML | root:neuraldrive-admin |
| `/etc/neuraldrive/version` | Build version string | plaintext | root:root |
| `/etc/neuraldrive/tls/server.crt` | System TLS certificate | PEM | root:neuraldrive-caddy |
| `/etc/neuraldrive/tls/server.key` | System TLS private key | PEM | root:neuraldrive-caddy (600) |
| `/etc/neuraldrive/tls/neuraldrive-ca.crt` | Root CA for clients | PEM | root:root (644) |
| `/etc/neuraldrive/firewall-custom.conf` | User-defined firewall rules | nftables | root:root |
| `/run/neuraldrive/gpu.conf` | GPU detection results (at boot) | KEY=VALUE | root:root (runtime) |
Key Configuration Reference
ollama.conf
Defines the behavior of the underlying LLM inference engine. The Ollama service uses two configuration sources:
- `/etc/neuraldrive/ollama.conf` — baked-in system defaults.
- `/var/lib/neuraldrive/config/ollama.conf` — persistent user overrides.

Values in the persistent file override the system defaults.
```
OLLAMA_HOST=127.0.0.1:11434
OLLAMA_MODELS=/var/lib/neuraldrive/models/
OLLAMA_KEEP_ALIVE=5m
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NUM_PARALLEL=1
```
OLLAMA_MAX_LOADED_MODELS: Set to 0 for "auto" mode. Ollama automatically manages how many models stay loaded based on available VRAM, using Least Recently Used (LRU) eviction when memory is required for a new request.
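The "persistent file wins" semantics can be reproduced with standard tools. A minimal sketch, assuming simple `KEY=VALUE` files with no quoting (the helper function and temp files are illustrative, not part of NeuralDrive):

```shell
# Merge KEY=VALUE files; entries in later files override earlier ones.
merge_conf() {
  cat "$@" | awk -F= '!/^#/ && NF >= 2 { kv[$1] = $0 } END { for (k in kv) print kv[k] }'
}

# Demonstration with temp files standing in for the baked-in and persistent configs.
printf 'OLLAMA_KEEP_ALIVE=5m\nOLLAMA_NUM_PARALLEL=1\n' > /tmp/nd-defaults.conf
printf 'OLLAMA_KEEP_ALIVE=30m\n' > /tmp/nd-overrides.conf
merge_conf /tmp/nd-defaults.conf /tmp/nd-overrides.conf | sort
```

The override file only needs to contain the keys you want to change; everything else falls through to the baked-in defaults.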
webui.env
Configures the Open WebUI chat interface and authentication.
```
OLLAMA_BASE_URL=http://localhost:11434
DATA_DIR=/var/lib/neuraldrive/webui
ENABLE_SIGNUP=false
DEFAULT_USER_ROLE=user
WEBUI_AUTH=true
WEBUI_NAME=NeuralDrive
ENABLE_EASTER_EGGS=false
```
api.key
This file contains the master authentication token (nd-xxxx) used to secure both the inference API and the system management API. It is generated during the first-boot initialization and should be treated with high sensitivity.
Caddyfile
NeuralDrive's reverse proxy configuration. It manages TLS termination and routing to internal services.
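A hypothetical sketch of what such a configuration looks like — the directives are standard Caddy, but the shipped `/etc/neuraldrive/Caddyfile` will differ in detail:

```
# Illustrative only; the actual shipped Caddyfile differs.
:8443 {
    tls /etc/neuraldrive/tls/server.crt /etc/neuraldrive/tls/server.key
    reverse_proxy /v1/*     127.0.0.1:11434
    reverse_proxy /api/*    127.0.0.1:11434
    reverse_proxy /system/* 127.0.0.1:3001
}
```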
Note: For instructions on using your own TLS certificates, see TLS Certificates. To modify boot-time behavior, consult Boot Parameters.
Audience: Admin
Boot Parameters
NeuralDrive supports several kernel command-line parameters to modify system behavior during the boot process.
Supported Parameters
| Parameter | Default | Description |
|---|---|---|
| `neuraldrive.ssh=1` | unset | Enables the SSH server at boot. Required for remote management. |
| `neuraldrive.safe=1` | unset | Safe Mode: skips GPU detection and driver loading. Forces CPU inference. |
| `neuraldrive.debug=1` | unset | Enables verbose logging during the entire boot sequence. |
| `neuraldrive.ip=<IP>` | DHCP | Sets a static IP address for the primary network interface. |
| `persistence` | set | Enables the persistence partition if detected on the USB media. |
| `toram` | unset | Loads the entire system image into RAM. Required for CD mode. |
| `nomodeset` | unset | Disables Kernel Mode Setting (KMS), providing a generic VGA console. |
How to Modify Parameters
One-time Modification
To temporarily add a parameter during boot:
- Reboot the system and wait for the GRUB menu to appear.
- Press `e` to edit the current boot entry.
- Locate the line starting with `linux`.
- Add your parameter(s) to the end of that line.
- Press `Ctrl+X` or `F10` to boot with the modified parameters.
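After editing, the `linux` line might look like this (the kernel path, existing options, and IP address are illustrative — keep whatever your GRUB entry already contains and only append your parameters):

```
linux /live/vmlinuz boot=live components toram neuraldrive.ssh=1 neuraldrive.ip=192.168.1.50
```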
Permanent Modification
To permanently change boot parameters in a custom NeuralDrive image, you must modify the GRUB configuration template before generating the final ISO:
- Edit `/boot/grub/grub.cfg` in your source directory.
- Update the `linux` lines with your desired defaults.

Warning: Modifying the `persistence` parameter may result in data loss if not configured correctly for your target media.
Note: For issues related to system startup, consult the Boot Failures troubleshooting guide.
Audience: Admin
Service Reference
This document provides a detailed overview of the systemd services that power the NeuralDrive appliance.
Master Service Inventory
| Service | Type | User | Port | Dependencies | Description |
|---|---|---|---|---|---|
| `neuraldrive-setup` | oneshot | root | — | network.target | First-boot initialization and state generation. |
| `neuraldrive-gpu-detect` | oneshot | root | — | — | GPU auto-detection and driver selection via PCI enumeration. |
| `neuraldrive-certs` | oneshot | root | — | network-online, local-fs | Generates self-signed TLS certificates (skips if they already exist). |
| `neuraldrive-zram` | oneshot | root | — | local-fs | Sets up compressed RAM-based swap space. |
| `neuraldrive-show-ip` | oneshot | root | — | network-online | Displays the current IP address on the physical console. |
| `neuraldrive-ollama` | long-running | neuraldrive-ollama | 11434 | gpu-detect | The GPU-accelerated LLM inference and model management engine. |
| `neuraldrive-webui` | long-running | neuraldrive-webui | 3000 | ollama | The Open WebUI dashboard and chat interface. |
| `neuraldrive-caddy` | long-running | neuraldrive-caddy | 443, 8443 | certs | The TLS reverse proxy and API gateway. |
| `neuraldrive-gpu-monitor` | long-running | neuraldrive-monitor | 1312 | gpu-detect | Monitors GPU temperature, VRAM usage, and health. |
| `neuraldrive-system-api` | long-running | neuraldrive-api | 3001 | network | The backend service for the System Management API. |
| `neuraldrive-storage-monitor` | long-running | root | — | local-fs | Monitors available storage space and persistence health. |
Systemd Hardening Summary
All NeuralDrive services are configured with systemd-native security hardening to minimize the system attack surface:
- PrivateDevices: Most services are denied access to `/dev/` nodes. The Ollama service specifically uses `PrivateDevices=no` to allow access to the GPU device nodes required for hardware acceleration.
- DeviceAllow: This directive was removed from the Ollama service because cgroup v2 eBPF filters can block CUDA access even when devices are explicitly allowed.
- ProtectSystem: The root filesystem is mounted read-only for service processes.
- ProtectHome: Services have no access to the `/home/` directory.
- NoNewPrivileges: Prevents processes from gaining elevated permissions via `setuid` or `setgid`.
- RestrictAddressFamilies: Limits network communication to only the necessary protocols (e.g., `AF_INET`, `AF_INET6`, `AF_UNIX`).
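For reference, a drop-in unit expressing this style of hardening might look as follows. The directive names are standard systemd; the exact values NeuralDrive ships may differ:

```
# Illustrative drop-in, e.g. /etc/systemd/system/neuraldrive-ollama.service.d/hardening.conf
[Service]
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=no          # Ollama needs the GPU device nodes under /dev
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
```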
Tip: You can monitor the status and logs of any service using the `systemctl status <service>` and `journalctl -u <service>` commands. For a list of common service-related issues, see the Common Issues troubleshooting guide.
Audience: Developers
API Endpoint Reference
NeuralDrive provides two primary interfaces for model inference: an OpenAI-compatible API for standard tool integration and the native Ollama API for low-level control.
Authentication
All API requests must include the API key (`nd-xxxx`) in the `Authorization` header:

```
Authorization: Bearer nd-xxxx
```
OpenAI-Compatible API
Base URL: https://<IP_ADDRESS>:8443/v1/
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completions (supports streaming). |
| POST | `/v1/completions` | Text completions for non-chat models. |
| GET | `/v1/models` | Lists all available local models. |
| POST | `/v1/embeddings` | Generates vector embeddings for a given input. |
Chat Completion Example
```shell
curl https://neuraldrive.local:8443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer nd-xxxx" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {"role": "user", "content": "How do I secure an API?"}
    ]
  }'
```
Native Ollama API
Base URL: https://<IP_ADDRESS>:8443/api/
| Method | Path | Description |
|---|---|---|
| POST | `/api/generate` | Low-level text generation. |
| POST | `/api/chat` | Native chat completion format. |
| GET | `/api/tags` | List locally installed model tags. |
| POST | `/api/pull` | Download a new model from the registry. |
| POST | `/api/show` | Retrieve detailed model metadata. |
| DELETE | `/api/delete` | Remove a local model. |
| POST | `/api/copy` | Create a copy or alias of a model. |
Native Chat Example
```shell
curl https://neuraldrive.local:8443/api/chat \
  -H "Authorization: Bearer nd-xxxx" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {"role": "user", "content": "Explain quantization."}
    ],
    "stream": false
  }'
```
Note: For information on how to manage the NeuralDrive system itself (logs, services, networking), see the System Management API reference.
Audience: Admin / Developers
System Management API Reference
NeuralDrive provides a dedicated management API for monitoring health, controlling services, and configuring the underlying system.
Authentication
All management requests must include the system API key (found in `/etc/neuraldrive/api.key`) in the `Authorization` header:

```
Authorization: Bearer nd-xxxx
```
Base URL: https://<IP_ADDRESS>:8443/system/
Health and Status
GET /system/status
Returns high-level system metrics and version information.
Response Schema:
```json
{
  "hostname": "string",
  "cpu_percent": 12.5,
  "memory": {
    "total_gb": 32,
    "used_percent": 45.1
  },
  "disk": {
    "models": {
      "total": "512G",
      "used": "120G",
      "free": "392G"
    },
    "total": "1T"
  },
  "uptime_seconds": 86400,
  "version": "v1.2.0"
}
```
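A monitoring script can pull individual fields out of this response. A minimal sketch using `python3` for JSON parsing — the payload below is the sample schema embedded inline for illustration; on a live system you would fetch it with `curl -sk -H "Authorization: Bearer nd-xxxx" https://<IP>:8443/system/status`:

```shell
# Parse a field from a /system/status response (sample payload embedded for illustration).
status='{"hostname":"nd","cpu_percent":12.5,"memory":{"total_gb":32,"used_percent":45.1},"uptime_seconds":86400,"version":"v1.2.0"}'
mem_used=$(printf '%s' "$status" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["memory"]["used_percent"])')
echo "memory used: ${mem_used}%"
```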
GET /system/gpu
Reports detected GPU hardware and real-time utilization.
Response Schema:
```json
{
  "vendor": "NVIDIA",
  "devices": [
    {
      "name": "RTX 4090",
      "vram_total_mb": 24576,
      "vram_used_mb": 4096,
      "temp_c": 55
    }
  ]
}
```
Service Management
GET /system/services
Lists all core services and their current runtime status.
POST /system/services/{name}/
Controls a specific system service.
- Actions: `start`, `stop`, `restart`
- Allowed services: `neuraldrive-ollama`, `neuraldrive-webui`, `neuraldrive-caddy`, `neuraldrive-gpu-monitor`, `neuraldrive-system-api`, `neuraldrive-certs`
GET /system/logs
Retrieves recent journal logs for a specific service.
- Query parameters: `service=ollama`, `lines=50`
System Configuration
GET /system/network
Returns current network configuration including interface IPs, hostname, and mDNS status.
POST /system/network/hostname
Updates the system hostname.
- Query parameters: `hostname=new-name`
POST /system/api-keys/rotate
Generates a new master API key and invalidates the previous one.
- Warning: This will immediately break existing client integrations until they are updated with the new key.
GET /system/ca-cert
Downloads the root CA certificate used for TLS signing.
- Note: This endpoint does not require authentication.
Note: For information on LLM inference, see the API Endpoint Reference. For details on the system architecture, see Service Reference.
Audience: Admin
Port Reference
This document outlines the network ports utilized by NeuralDrive for internal communication and external access.
Port Assignment Matrix
| Port | Protocol | Service | Exposure | Description |
|---|---|---|---|---|
| 443 | TCP | neuraldrive-caddy | External | Primary Web UI dashboard access (HTTPS). |
| 8443 | TCP | neuraldrive-caddy | External | API gateway and System Management Panel (HTTPS). |
| 5353 | UDP | avahi-daemon | External | mDNS discovery for neuraldrive.local resolution. |
| 22 | TCP | sshd | External | Optional SSH access (rate-limited, must be enabled via boot parameter). |
| 11434 | TCP | neuraldrive-ollama | Internal only | The underlying Ollama inference API. |
| 3000 | TCP | neuraldrive-webui | Internal only | The Open WebUI dashboard backend. |
| 3001 | TCP | neuraldrive-system-api | Internal only | The System Management API backend. |
| 1312 | TCP | neuraldrive-gpu-monitor | Internal only | The GPU health and telemetry monitor. |
Exposure Definitions
- External: These ports are open on the system firewall (`nftables`) and are accessible from other machines on the local network.
- Internal only: These ports are bound exclusively to the localhost (`127.0.0.1`) interface. They are not reachable from the network. Caddy acts as a secure reverse proxy to these services, providing TLS termination and authentication.
Warning: Never modify the internal port bindings, as they are hard-coded into the NeuralDrive security model.
Note: For more information on configuring the firewall, see Firewall Configuration. For details on the API endpoints exposed via these ports, see API Endpoint Reference.
Audience: Everyone
Glossary
This alphabetical list defines technical terms and concepts utilized throughout the NeuralDrive documentation.
- API Key: A unique authentication token (`nd-xxxx`) used to secure access to the inference and system management APIs.
- Avahi: A system that facilitates service discovery on a local network via mDNS. It allows the `neuraldrive.local` hostname to resolve without a central DNS server.
- Caddy: A high-performance, memory-safe web server that serves as NeuralDrive's reverse proxy, managing TLS encryption and request routing.
- CUDA: NVIDIA's parallel computing platform and programming model that enables hardware acceleration on NVIDIA GPUs.
- GGUF: The primary file format used by NeuralDrive for storing and distributing quantized LLM weights. It is optimized for fast loading and efficient memory usage.
- Inference: The process of using a trained machine learning model to generate an output (e.g., text, images, or embeddings) based on input data.
- Live System: An operating system designed to boot and run entirely from removable media (like a USB drive) without requiring installation to a permanent hard disk.
- LUKS: Linux Unified Key Setup. The standard for Linux disk encryption, used by NeuralDrive to secure data on the persistence partition.
- mDNS: Multicast DNS. A protocol that resolves hostnames in small networks that do not have a dedicated local DNS server.
- Ollama: The underlying inference engine in NeuralDrive that manages downloading, loading, and serving large language models.
- Open WebUI: A feature-rich, self-hosted web interface that provides a user-friendly chat environment for interacting with local LLMs.
- Overlayfs: A union filesystem that allows NeuralDrive to layer a writable storage area (the persistence partition) over a read-only base system.
- Persistence: A dedicated writable partition on the NeuralDrive USB media that stores downloaded models, user accounts, and system configuration between reboots.
- Quantization: The process of reducing the precision of a model's weights (e.g., from 16-bit to 4-bit) to reduce its memory footprint and increase inference speed.
- RAG: Retrieval-Augmented Generation. A technique that combines LLM generation with external data retrieval to improve the accuracy and relevance of responses.
- ROCm: AMD's open-source software stack for GPU computing, enabling hardware acceleration on compatible AMD graphics cards.
- SquashFS: A highly compressed, read-only filesystem used for the base NeuralDrive operating system image.
- TUI: Terminal User Interface. The text-based management console that appears on the physical NeuralDrive device for initial setup and monitoring.
- VRAM: Video RAM. The high-speed memory dedicated to the GPU, which determines the maximum size of the model that can be hardware-accelerated.
- zram: A kernel feature that creates a compressed swap area in system RAM, increasing effective memory capacity for memory-intensive LLM tasks.