
AI Infrastructure Shifts: OpenRouter Fusion, NVIDIA Speech, and Ollama MLX

June 04, 2026


OpenRouter rolled out infrastructure and security updates for May: Model Fusion (parallel multi-model routing with response synthesis), Workspace Guardrails (layered spend limits, zero data retention, and 30+ OWASP prompt injection patterns), and Pareto Code Router (quality-bar-based cost optimization). It also launched Private Models for enterprise fine-tunes and expanded its speech APIs with Whisper, GPT-4o Mini Transcribe, and Voxtral.
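To make the two routing ideas concrete, here is a minimal Python sketch of the general patterns, not OpenRouter's implementation or API. It assumes "quality-bar-based cost optimization" means picking the cheapest model whose quality score clears a threshold, and that "fusion" means sending one prompt to several models in parallel and merging the answers. `ModelOption`, `cheapest_meeting_bar`, `fuse`, and all scores and prices are hypothetical names and values invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ModelOption:
    # All three fields are hypothetical; real routers use their own metadata.
    name: str              # model identifier
    quality: float         # benchmark-style score in [0, 1]
    cost_per_mtok: float   # price per million tokens


def cheapest_meeting_bar(
    options: List[ModelOption], quality_bar: float
) -> Optional[ModelOption]:
    """Return the cheapest option whose quality meets the bar, or None."""
    eligible = [o for o in options if o.quality >= quality_bar]
    return min(eligible, key=lambda o: o.cost_per_mtok, default=None)


def fuse(
    prompt: str,
    models: Dict[str, Callable[[str], str]],
    synthesize: Callable[[Dict[str, str]], str],
) -> str:
    """Send one prompt to every model in parallel, then merge the answers."""
    # max(1, ...) avoids a ValueError when the models dict is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        answers = {name: fut.result() for name, fut in futures.items()}
    # In a real system the synthesis step would likely be another model call;
    # here it is whatever callable the caller supplies.
    return synthesize(answers)
```

In this sketch the model callables are plain functions, so they can be swapped for real API clients without changing the routing logic.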

NVIDIA released Nemotron 3.5 ASR, a 600M-parameter streaming multilingual speech-to-text model supporting 40 locales from a single checkpoint. It features native punctuation/capitalization and a Cache-Aware FastConformer architecture for low-latency inference. Weights are open on Hugging Face, with full fine-tuning support for domain-specific accents or vocabulary.

ServiceNow released EVA-Bench Data 2.0, an open-source evaluation framework for voice agents. It expands to three enterprise domains (Airline CSM, ITSM, Healthcare HRSD) with 213 scenarios and 121 tools. The dataset includes structured user goals, initial database states, and ground truth outcomes to test agent reliability, authentication flows, and adversarial resilience.
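As a rough illustration of how a scenario built from a user goal, an initial database state, and a ground truth outcome can be scored, the Python sketch below models one record and checks the agent's final state against the expected one. The field names, the `Scenario` class, and `outcome_matches` are assumptions made for this example; they are not the actual EVA-Bench schema or tooling.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Scenario:
    # Illustrative field names only, not the dataset's real schema.
    user_goal: str
    initial_db: Dict[str, Any]
    expected_db: Dict[str, Any]


def outcome_matches(scenario: Scenario, final_db: Dict[str, Any]) -> bool:
    """True when the agent left the database in the ground-truth state."""
    return final_db == scenario.expected_db


# Hypothetical airline-style scenario: the user wants to change a seat.
demo = Scenario(
    user_goal="Move my seat from 12A to 14C",
    initial_db={"booking-1": {"seat": "12A"}},
    expected_db={"booking-1": {"seat": "14C"}},
)
```

Comparing full end states rather than transcripts is one simple way to test reliability: an agent that sounds correct but never writes the change would fail this check.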

Ollama 0.19 introduces MLX acceleration on Apple Silicon, using unified memory to speed up both prefill and decode. It adds support for NVIDIA’s NVFP4 quantization and improves caching for agentic workflows. Stanford’s OpenJarvis framework is also now available for Ollama, enabling local-first personal AI agents with built-in browser, code, and research presets.

OpenAI expanded access to GPT-Rosalind (life sciences research) to eligible global organizations and added new Life Sciences Research and NGS Analysis plugins for Codex. Microsoft announced MAI-Thinking-1 (1T parameters, 35B active) and MAI-Code-1-Flash (137B parameters, 5B active), noting that both were trained on proprietary web crawls rather than on data distilled from third-party models.

Sources

OpenRouter: https://openrouter.ai/announcements/may-release-spotlight
OpenRouter: https://openrouter.ai/announcements/guardrails
OpenAI: https://openai.com/index/introducing-new-capabilities-to-gpt-rosalind
Hugging Face: https://huggingface.co/blog/nvidia/fine-tuning-nemotron-35-asr
Hugging Face: https://huggingface.co/blog/ServiceNow-AI/eva-bench-data
Ollama: https://ollama.com/blog/openjarvis
Ollama: https://ollama.com/blog/mlx
Simon Willison: https://simonwillison.net/2026/Jun/2/microsofts-new-models/#atom-everything

This post was generated with the assistance of AI and reviewed through automated processes. AI can make mistakes. Readers should consult the original sources linked for complete context and verification.
