Audience: Everyone

Glossary

This alphabetical list defines technical terms and concepts utilized throughout the NeuralDrive documentation.

  • API Key: A unique authentication token (nd-xxxx) used to secure access to the inference and system management APIs.
  • Avahi: A system that facilitates service discovery on a local network via mDNS. It allows the neuraldrive.local hostname to resolve without a central DNS server.
  • Caddy: A high-performance, memory-safe web server that serves as NeuralDrive's reverse proxy, managing TLS encryption and request routing.
  • CUDA: NVIDIA's parallel computing platform and programming model that enables hardware acceleration on NVIDIA GPUs.
  • GGUF: The primary file format used by NeuralDrive for storing and distributing quantized LLM weights. It is optimized for fast loading and efficient memory usage.
  • Inference: The process of using a trained machine learning model to generate an output (e.g., text, images, or embeddings) based on input data.
  • Live System: An operating system designed to boot and run entirely from removable media (like a USB drive) without requiring installation to a permanent hard disk.
  • LUKS: Linux Unified Key Setup. The standard for Linux disk encryption, used by NeuralDrive to secure data on the persistence partition.
  • mDNS: Multicast DNS. A protocol that resolves hostnames in small networks that do not have a dedicated local DNS server.
  • Ollama: The underlying inference engine in NeuralDrive that manages downloading, loading, and serving large language models.
  • Open WebUI: A feature-rich, self-hosted web interface that provides a user-friendly chat environment for interacting with local LLMs.
  • Overlayfs: A union filesystem that allows NeuralDrive to layer a writable storage area (the persistence partition) over a read-only base system.
  • Persistence: A dedicated writable partition on the NeuralDrive USB media that stores downloaded models, user accounts, and system configuration between reboots.
  • Quantization: The process of reducing the precision of a model's weights (e.g., from 16-bit to 4-bit) to reduce its memory footprint and increase inference speed.
  • RAG: Retrieval-Augmented Generation. A technique that combines LLM generation with external data retrieval to improve the accuracy and relevance of responses.
  • ROCm: AMD's open-source software stack for GPU computing, enabling hardware acceleration on compatible AMD graphics cards.
  • SquashFS: A highly compressed, read-only filesystem used for the base NeuralDrive operating system image.
  • TUI: Terminal User Interface. The text-based management console that appears on the physical NeuralDrive device for initial setup and monitoring.
  • VRAM: Video RAM. The high-speed memory dedicated to the GPU, which determines the maximum size of the model that can be hardware-accelerated.
  • zram: A kernel feature that creates a compressed swap area in system RAM, increasing effective memory capacity for memory-intensive LLM tasks.