OpenRouter Pareto Routing, Ollama Vulkan, and MicroPython WASM

June 06, 2026
2 min read

OpenRouter shipped several enterprise-grade controls and new routing capabilities aimed at reducing latency and cost for production agents. The Pareto Code Router lets developers set a min_coding_score; the router then sends each request to the cheapest model whose coding score meets that threshold, so coding agents stop overpaying for good-enough outputs. Model Fusion is now available as an API plugin and server tool: it routes a request to multiple models in parallel and synthesizes their responses into one answer for higher reliability. For security, Workspace Guardrails enforce spend limits, provider allowlists, and zero-data-retention policies centrally, per API key, without code changes. Private Models (Enterprise) can now route to custom endpoints under the same guardrails and billing as other models.
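
The threshold rule behind the Pareto Code Router can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the catalog, scores, prices, and the pick_model helper are hypothetical and are not OpenRouter's API or scoring data, and the optional provider filter only mimics what a provider allowlist might do. Among models at or above min_coding_score, it returns the cheapest one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    coding_score: float    # hypothetical quality score, higher is better
    price_per_mtok: float  # hypothetical blended USD price per million tokens


def pick_model(
    models: Iterable[Model],
    min_coding_score: float,
    allowed_providers: Optional[set] = None,
) -> Optional[Model]:
    """Return the cheapest model whose coding score meets the threshold.

    Models from providers outside `allowed_providers` (if given) are skipped,
    mimicking a provider allowlist. Returns None if nothing qualifies.
    """
    candidates = [
        m for m in models
        if m.coding_score >= min_coding_score
        and (allowed_providers is None or m.provider in allowed_providers)
    ]
    if not candidates:
        return None
    # Cheapest first; equal prices are broken in favor of the higher score.
    return min(candidates, key=lambda m: (m.price_per_mtok, -m.coding_score))


# Hypothetical catalog for illustration only.
CATALOG = [
    Model("model-a", "provider-x", 0.92, 15.0),
    Model("model-b", "provider-y", 0.85, 3.0),
    Model("model-c", "provider-z", 0.78, 0.5),
]

print(pick_model(CATALOG, 0.80).name)  # model-b
print(pick_model(CATALOG, 0.90).name)  # model-a
print(pick_model(CATALOG, 0.80, allowed_providers={"provider-x"}).name)  # model-a
```

Breaking price ties toward the higher score keeps the selected model on the cost/quality Pareto frontier: no other qualifying model is both cheaper and at least as good.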

NVIDIA released Nemotron 3.5 Content Safety, a 4B-parameter model that unifies multimodal and multilingual safety checks. It enforces custom policies written as natural-language specifications and can optionally emit reasoning traces (THINK mode) so that its verdicts can be audited. Built on Gemma 3, it stays fast enough for real-time guardrails, covers 12 languages, and generalizes zero-shot to roughly 140 others. The release includes the training dataset, and the model can be deployed locally with vLLM or SGLang or served through NVIDIA NIM for optimized inference.

Ollama 0.30 broadens hardware support and improves performance. Vulkan is now enabled by default, extending GPU acceleration to AMD and Intel devices, and NVIDIA hardware sees up to 20% higher throughput. The update also improves GGUF compatibility, so models from Hugging Face can be run directly by pointing a Modelfile FROM instruction at them. The new Nemotron 3 Ultra (55B active parameters) is available on Ollama's cloud, optimized for long-context agentic workflows with a 1M-token context window and NVFP4 quantization.
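
A Modelfile for this workflow needs little more than a FROM line naming the model source. The Python helper below renders one; render_modelfile is a hypothetical helper, the file and model names are placeholders, and it assumes, as the post states, that FROM can point at a Hugging Face GGUF reference as well as at a local .gguf file. PARAMETER and SYSTEM are standard Modelfile instructions.

```python
from typing import Optional


def render_modelfile(source: str, num_ctx: Optional[int] = None,
                     system: Optional[str] = None) -> str:
    """Build Modelfile text whose FROM line points at `source`.

    `source` may be a local GGUF path or (per the post) a Hugging Face
    reference; both are placeholders in this sketch.
    """
    lines = [f"FROM {source}"]
    if num_ctx is not None:
        # Context window size used when the model is loaded.
        lines.append(f"PARAMETER num_ctx {num_ctx}")
    if system is not None:
        # Triple quotes allow a multi-line system prompt.
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


text = render_modelfile("./example-model.gguf", num_ctx=8192)
print(text)
# Save the output as "Modelfile", then build and run it with the Ollama CLI:
#   ollama create example-model -f Modelfile
#   ollama run example-model
```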

Simon Willison released micropython-wasm, an alpha library for running Python code in a sandboxed WebAssembly environment using wasmtime. It targets safe plugin execution in Python applications such as Datasette. The library supports persistent interpreter state, memory limits, CPU limits via wasmtime fuel, and controlled exposure of host functions. Developers can execute untrusted code within strict CPU and memory bounds and without direct network or filesystem access, an alternative to traditional Python sandboxing approaches for high-stakes plugin architectures.
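
Wasmtime's fuel mechanism bounds CPU rather than memory: the engine deducts fuel as guest code executes and traps once it runs out, while memory is capped separately through store limits. The toy interpreter below illustrates the fuel idea in plain Python. It is not micropython-wasm's API or wasmtime's implementation (wasmtime meters compiled WebAssembly, not a Python loop), and its three-instruction set is hypothetical.

```python
class OutOfFuel(Exception):
    """Raised when a program exhausts its instruction budget."""


def run(program, fuel):
    """Execute a toy stack program, spending one unit of fuel per instruction.

    Instructions: ("push", n), ("add",), ("jmp", target).
    Returns (stack, remaining_fuel) or raises OutOfFuel.
    """
    stack, pc = [], 0
    while pc < len(program):
        if fuel <= 0:
            raise OutOfFuel(f"fuel exhausted at pc={pc}")
        fuel -= 1
        op = program[pc]
        if op[0] == "push":
            stack.append(op[1])
            pc += 1
        elif op[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        else:
            raise ValueError(f"unknown op {op[0]!r}")
    return stack, fuel


ok = [("push", 2), ("push", 3), ("add",)]
print(run(ok, fuel=10))  # ([5], 7)

loop = [("jmp", 0)]  # untrusted code that never terminates on its own
try:
    run(loop, fuel=1_000)
except OutOfFuel as exc:
    print("halted:", exc)
```

The host never has to trust the guest to yield: an infinite loop simply drains its budget and is stopped, which is the property a plugin host needs from CPU limits.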


Sources


This post was generated with the assistance of AI and reviewed through automated processes. AI can make mistakes. Readers should consult the original sources linked for complete context and verification.