Infrastructure
Hardware, databases, portability — the physical and logical foundation of your SovereignNode.
A single server. Local GPU. No cloud dependency. The SovereignNode is the heart of every AIMOS installation — a physical or virtual server that hosts all components.
Everything runs on-premise: LLM inference, databases, agent processes, and communication channels. Not a byte leaves your network — unless you explicitly configure it to (e.g., Telegram messages).
| | Starter | Business | Professional | Enterprise |
|---|---|---|---|---|
| **Hardware** | | | | |
| GPU | RTX 4060 Ti 16 GB | RTX 3090 / 5090 24–32 GB | 2× RTX 3090 NVLink 48 GB | A100 / H100 80+ GB |
| AI Model | 14B (Q4) | 27B (Q4) | 70B (Q4) | 70B (Q4) + 9B Draft |
| Speculative Decoding | — | Optional on 5090: +4B Draft | +4B Draft, ~17K context | +9B Draft, ~75K context |
| Speed | ~30 Tok/s | ~35 Tok/s (5090 + Spec: ~90 Tok/s) | ~20 Tok/s (+Spec: ~50 Tok/s) | ~40 Tok/s (+Spec: ~100 Tok/s) |
| AI Agents | 2–4 | 5–10 (5090 + Spec: 10–20) | 5–10 | 15–30 |
| Technology | TurboQuant | TurboQuant + SGLang | TurboQuant + NVLink + Spec. Decoding | TurboQuant + SGLang + Spec. Decoding |
| Hardware approx. | from 1,200 EUR (GPU ~400 EUR) | from 2,000 EUR (3090: ~700 EUR / 5090: ~3,500 EUR) | from 2,500 EUR (2× 3090 + NVLink) | on request (A100: from ~3,500 EUR used) |
| **Task Suitability** | | | | |
| ERP Queries | | | | |
| Data Extraction | | | | |
| Appointment Management | | | | |
| Internal Support | | | | |
| Document Search | | | | |
| Customer Contact | | | | |
| Technical Consulting | | | | |
| Multilingual | | | | |
| Compliance | | | | |
Based on IFEval, MT-Bench, BFCL, and Qwen/Llama benchmarks (2024). Ubuntu 24.04/26.04 LTS with 16+ CPU cores recommended.
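A quick sanity check on the GPU memory figures in the table: Q4 quantization stores roughly 0.5 bytes per parameter, so the weight footprint can be estimated in a few lines. This is a rule-of-thumb sketch, not a measurement — the real footprint also includes the KV cache and runtime overhead.

```python
# Rough Q4 weight footprint: ~0.5 bytes per parameter.
# KV cache and runtime overhead come on top of this.
def q4_weight_gb(params_billion: float) -> float:
    return params_billion * 1e9 * 0.5 / 1e9  # GB of weights

for model_b in (14, 27, 70):
    print(f"{model_b}B Q4 ≈ {q4_weight_gb(model_b):.0f} GB weights")
# A 70B Q4 model needs ≈35 GB for weights alone, which is why it
# lands on the 48 GB (2× RTX 3090 NVLink) tier with headroom for context.
```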
Architecture Overview
Dual-DB
AIMOS uses two database systems with clearly separated responsibilities:
- **Relay database** — central message relay between the Shared Listener, Orchestrator, and agents. Stores incoming messages, audit logs, PII Vault mappings, and session data; multi-process capable through connection pooling.
- **Agent memory (SQLite)** — each agent has its own SQLite database with semantic, episodic, and procedural memory. Hybrid search via FTS5 + vector embeddings; portable by simply copying the file.
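The keyword half of that hybrid search can be sketched with SQLite's FTS5 extension. The table and column names below (`memories`, `content`) are illustrative, not the actual AIMOS schema, and the sketch assumes an FTS5-enabled SQLite build:

```python
import sqlite3

# Full-text index over an agent's memory entries (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
conn.executemany(
    "INSERT INTO memories (content) VALUES (?)",
    [("customer asked about invoice 4711",),
     ("scheduled maintenance window on Friday",),
     ("invoice dispute resolved via email",)],
)

# BM25-ranked keyword match; a real hybrid search would merge this
# ranking with vector-embedding similarity scores.
rows = conn.execute(
    "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank",
    ("invoice",),
).fetchall()
for (text,) in rows:
    print(text)
```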
Interoperability
AIMOS agents are portable, compatible, and interoperable through open standards.
The Open Agent Package format enables the complete export of an agent including memory, skills, and configuration as a portable archive.
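Since the agent's state lives in plain files, the export can be sketched as packing a directory into a compressed archive. The directory layout below (`memory.sqlite`, `skills/`, `agent.json`) and the `.oap.tar.gz` suffix are assumptions for illustration, not the Open Agent Package spec:

```python
import json
import tarfile
import tempfile
from pathlib import Path

def export_agent(agent_dir: Path, archive_path: Path) -> None:
    # Pack the whole agent directory (memory, skills, config) as one archive.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(agent_dir, arcname=agent_dir.name)

with tempfile.TemporaryDirectory() as tmp:
    agent = Path(tmp) / "support-agent"
    (agent / "skills").mkdir(parents=True)
    (agent / "memory.sqlite").write_bytes(b"")  # stand-in for the SQLite file
    (agent / "agent.json").write_text(json.dumps({"name": "support-agent"}))

    out = Path(tmp) / "support-agent.oap.tar.gz"
    export_agent(agent, out)
    with tarfile.open(out) as tar:
        names = tar.getnames()
print(names)
```

Importing on another node is the reverse: unpack the archive and point a fresh agent process at it — the SQLite memory file needs no migration.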
The Model Context Protocol enables external LLMs (Claude, GPT, etc.) to access AIMOS skills as an optional additional interface — not the primary communication path.
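MCP is built on JSON-RPC 2.0, so an external LLM invoking an AIMOS skill boils down to a structured request like the one below. The skill name `document_search` and its arguments are invented for illustration:

```python
import json

# JSON-RPC 2.0 request an MCP client could send to call a skill
# exposed as an MCP tool (tool name and arguments are hypothetical).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "document_search",
        "arguments": {"query": "maintenance contract 2024"},
    },
}
payload = json.dumps(request)
```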
Each agent publishes an Agent Card (JSON-LD) according to the Google A2A specification. External systems can query capabilities, input formats, and trust level.
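An Agent Card along those lines could look as follows. The field names are assumptions derived from the description above (capabilities, input formats, trust level), not the exact A2A schema:

```python
import json

# Illustrative Agent Card; field names are assumptions, not the A2A spec.
agent_card = {
    "@context": "https://schema.org",
    "name": "support-agent",
    "capabilities": ["document_search", "appointment_management"],
    "inputFormats": ["text/plain", "audio/ogg"],
    "trustLevel": "internal",
}
card_json = json.dumps(agent_card, indent=2)
```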
Technical Highlights
No text hacks or regex parsing — AIMOS uses the native tool-calling API of the LLM. The agent controls systems directly, instead of just describing actions.
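One common shape for such a native tool definition is the JSON-schema "function" format used by several LLM APIs; the calendar tool below is invented for illustration. The point is that the model returns structured, validated arguments rather than free text that would need regex parsing:

```python
import json

# Tool definition handed to the LLM's tool-calling API
# (hypothetical appointment tool, OpenAI-style schema).
tool = {
    "type": "function",
    "function": {
        "name": "create_appointment",
        "description": "Create a calendar appointment for the current user.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "format": "date-time"},
            },
            "required": ["title", "start"],
        },
    },
}

# Instead of prose, the model emits machine-readable arguments, e.g.:
call_args = json.loads('{"title": "Kickoff", "start": "2025-01-10T09:00:00"}')
```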
Speech recognition (Whisper STT) and speech synthesis (Piper TTS) in all languages — agents understand voice messages and respond in the user's native language.
Every LLM call is captured: input/output tokens, latency, context utilization. Full cost transparency per agent, per conversation, per month.
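Per-call capture can be sketched as a small ledger; the field names and figures below are assumptions for illustration, not AIMOS internals:

```python
from dataclasses import dataclass, field

@dataclass
class LLMCall:
    agent: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

@dataclass
class UsageLedger:
    calls: list = field(default_factory=list)

    def record(self, call: LLMCall) -> None:
        self.calls.append(call)

    def tokens_for(self, agent: str) -> int:
        # Aggregate input + output tokens per agent for cost reporting.
        return sum(c.input_tokens + c.output_tokens
                   for c in self.calls if c.agent == agent)

ledger = UsageLedger()
ledger.record(LLMCall("support-agent", 1200, 300, 850.0))
ledger.record(LLMCall("support-agent", 900, 250, 640.0))
ledger.record(LLMCall("sales-agent", 400, 120, 410.0))
total = ledger.tokens_for("support-agent")  # 2650 tokens
```

The same records can be rolled up per conversation or per month by adding the corresponding keys to each call.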
Every agent knows who it is talking to on which channel. Telegram, email, and internal messages are cleanly separated — no mix-up between conversation partners.
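That separation amounts to keying sessions by channel *and* conversation partner, so the same person on Telegram and on email gets two distinct contexts. The key shape is an assumption for illustration:

```python
# Channel-scoped session store: (channel, peer_id) -> session state.
sessions: dict = {}

def session_for(channel: str, peer_id: str) -> dict:
    return sessions.setdefault((channel, peer_id), {"history": []})

a = session_for("telegram", "alice")
b = session_for("email", "alice@example.com")
assert a is not b  # same human, two channels, no mix-up
```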