Model Gallery

11 models from 1 repository

voxtral-mini-4b-realtime
Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.

Repository: localai · License: apache-2.0
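Models in this gallery are typically served behind LocalAI's OpenAI-compatible HTTP API. As a hedged sketch, a transcription request for this model could be assembled as below; the host, port, and file name are assumptions, while the endpoint path and form fields follow the OpenAI audio-transcription API that LocalAI mirrors. The request is only described here, not sent:

```python
def build_transcription_request(model: str, audio_path: str,
                                base_url: str = "http://localhost:8080") -> dict:
    """Describe a POST to the /v1/audio/transcriptions endpoint.

    Returns the URL and multipart form fields; actually sending it
    (e.g. requests.post(url, data=fields, files=...)) is left out.
    """
    return {
        "url": f"{base_url}/v1/audio/transcriptions",
        "fields": {"model": model, "file": audio_path},
    }

req = build_transcription_request("voxtral-mini-4b-realtime", "meeting.wav")
print(req["url"])  # http://localhost:8080/v1/audio/transcriptions
```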

streaming-zipformer-en-sherpa
Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai · License: apache-2.0

neutts-air
NeuTTS Air is the world's first super-realistic, on-device TTS speech language model with instant voice cloning. Built on a 0.5B LLM backbone, it brings natural-sounding speech, real-time performance, and speaker cloning to local devices.

Repository: localai · License: apache-2.0
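When served through LocalAI, a TTS model like this is reachable via the OpenAI-style `/v1/audio/speech` endpoint. A minimal sketch of the JSON body such a request would carry; the field names follow the OpenAI speech API, and the `voice` value is a hypothetical placeholder (voice-cloning configuration is model-specific):

```python
import json

payload = {
    "model": "neutts-air",
    "input": "Hello from a locally running TTS model.",
    # "voice" selects the reference speaker; this value is illustrative.
    "voice": "default",
}
body = json.dumps(payload).encode("utf-8")  # bytes for the HTTP POST body
```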

qwen3-omni-30b-a3b-instruct
Qwen3-Omni is the natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. This GGUF build runs on llama.cpp with the bundled mmproj for multimodal inputs.

Repository: localai · License: apache-2.0
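Multimodal input to an OpenAI-compatible server (such as LocalAI running this GGUF with its mmproj) is passed as structured message content. A sketch of a chat-completions payload mixing text and an inline image; the base64 bytes are a stand-in, and exact media handling depends on the server:

```python
import base64

# Stand-in for real image bytes read from disk.
image_b64 = base64.b64encode(b"\x89PNG...").decode("ascii")

payload = {
    "model": "qwen3-omni-30b-a3b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
}
```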

rfdetr-base
RF-DETR is a real-time, transformer-based object detection model architecture developed by Roboflow and released under the Apache 2.0 license. RF-DETR is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark, alongside competitive performance at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures model domain adaptability to real-world problems. RF-DETR is the fastest and most accurate model for its size when compared to current real-time object detection models. RF-DETR is small enough to run on the edge using Inference, making it an ideal model for deployments that need both strong accuracy and real-time performance.

Repository: localai · License: apache-2.0

llama-3.2-chibi-3b
Small parameter LLMs are ideal for navigating the complexities of the Japanese language, which involves multiple character systems like kanji, hiragana, and katakana, along with subtle social cues. Despite their smaller size, these models are capable of delivering highly accurate and context-aware results, making them perfect for use in environments where resources are constrained. Whether deployed on mobile devices with limited processing power or in edge computing scenarios where fast, real-time responses are needed, these models strike the perfect balance between performance and efficiency, without sacrificing quality or speed.

Repository: localai · License: llama3.2

kubeguru-llama3.2-3b-v0.1
Kubeguru: your Kubernetes & Linux expert AI. Ask anything about Kubernetes, Linux, or containers, and get expert answers in real time! Kubeguru is a specialized Large Language Model (LLM) developed and released by the Open Source team at Spectro Cloud. Whether you're managing cloud-native applications, deploying edge workloads, or troubleshooting containerized services, Kubeguru provides precise, actionable insights at every step.

Repository: localai · License: llama3.2
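A domain-specialized model like this is queried through the standard chat-completions shape. A minimal sketch of the request body, assuming LocalAI's OpenAI-compatible `/v1/chat/completions` endpoint; the system prompt and temperature are illustrative choices, not part of the model card:

```python
payload = {
    "model": "kubeguru-llama3.2-3b-v0.1",
    "messages": [
        {"role": "system",
         "content": "You are a Kubernetes and Linux expert."},
        {"role": "user",
         "content": "Why is my pod stuck in CrashLoopBackOff?"},
    ],
    # Low temperature keeps troubleshooting answers more deterministic.
    "temperature": 0.2,
}
```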

thedrummer_rivermind-12b-v1
Introducing Rivermind™, the next-generation AI that’s redefining human-machine interaction—powered by Amazon Web Services (AWS) for seamless cloud integration and NVIDIA’s latest AI processors for lightning-fast responses. But wait, there’s more! Rivermind doesn’t just process data—it feels your emotions (thanks to Google’s TensorFlow for deep emotional analysis). Whether you're brainstorming ideas or just need someone to vent to, Rivermind adapts in real-time, all while keeping your data secure with McAfee’s enterprise-grade encryption. And hey, why not grab a refreshing Coca-Cola Zero Sugar while you interact? The crisp, bold taste pairs perfectly with Rivermind’s witty banter—because even AI deserves the best (and so do you). Upgrade your thinking today with Rivermind™—the AI that thinks like you, but better, brought to you by the brands you trust. 🚀✨

Repository: localai · License: apache-2.0

flux.2-klein-4b
The FLUX.2 [klein] model family comprises our fastest image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13 GB of VRAM. FLUX.2 [klein] 4B is a 4 billion parameter rectified flow transformer that generates images from text descriptions and supports multi-reference editing.

Repository: localai · License: apache-2.0
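Image models in the gallery are driven through the OpenAI-style `/v1/images/generations` endpoint. A hedged sketch of a generation request body for this model; the prompt, size, and count are illustrative, and supported sizes vary by backend:

```python
payload = {
    "model": "flux.2-klein-4b",
    "prompt": "A lighthouse at dusk, volumetric light, photorealistic",
    "size": "1024x1024",  # standard OpenAI images field
    "n": 1,               # number of images to generate
}
```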

flux.2-klein-9b
The FLUX.2 [klein] model family comprises our fastest image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13 GB of VRAM. FLUX.2 [klein] 9B is a 9 billion parameter rectified flow transformer that generates images from text descriptions and supports multi-reference editing.

Repository: localai · License: apache-2.0

financial-gpt-oss-20b-q8-i1
### Financial GPT-OSS 20B (Base Model)

**Model Type:** Causal Language Model (Fine-tuned for Financial Analysis)
**Architecture:** Mixture of Experts (MoE) – 20B parameters, 32 experts (4 active per token)
**Base Model:** `unsloth/gpt-oss-20b-unsloth-bnb-4bit`
**Fine-tuned With:** LoRA (Low-Rank Adaptation) on financial conversation data
**Training Data:** 22,250 financial dialogue pairs covering stocks (AAPL, NVDA, TSLA, etc.), technical analysis, risk assessment, and trading signals
**Context Length:** 131,072 tokens
**Quantization:** Q8_0 GGUF (for efficient inference)
**License:** Apache 2.0

**Key Features:**
- Specialized in financial market analysis: technical indicators (RSI, MACD), risk assessments, trading signals, and price forecasts
- Handles complex financial queries with structured, actionable insights
- Designed for real-time use with low-latency inference (GGUF format)
- Supports S&P 500 stocks and major asset classes across tech, healthcare, energy, and finance sectors

**Use Case:** Ideal for traders, analysts, and developers building financial AI tools. Use with caution—**not financial advice**.

**Citation:**
```bibtex
@misc{financial-gpt-oss-20b-q8,
  title={Financial GPT-OSS 20B Q8: Fine-tuned Financial Analysis Model},
  author={beenyb},
  year={2025},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/beenyb/financial-gpt-oss-20b-q8}
}
```

Repository: localai · License: apache-2.0