Sglang router. RC2 or higher, check the installation guide MemFabric-H...
Sglang router. RC2 or higher, check the installation guide MemFabric-Hybrid # If you want to use PD disaggregation mode, you need to install MemFabric-Hybrid. Supports both single-stage and PD routing, including embeddings and classification. It provides a unified entry point for LLM applications, offering advanced load balancing, request routing, and workflow orchestration. Mar 4, 2026 · SGLang Model Gateway (SMG) SGLang Model Gateway is a production-ready Rust-based routing system for DP deployments. It centralizes worker lifecycle management, balances traffic across heterogeneous protocols (HTTP, gRPC, OpenAI-compatible), and provides enterprise-ready control over history storage, tool The piwheels project page for sglang-router: High-performance Rust-based load balancer for SGLang with multiple routing algorithms and prefill-decode disaggregation support SGLang is a high-performance serving framework for large language models and multimodal models. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters. It supports various policies ranging fr 1 day ago · The SGLang Model Gateway (SMG) is a high-performance, Rust-based control plane designed to manage clusters of inference workers. gRPC router streams tokenized requests directly to SRT gRPC workers, running fully in Rust—tokenizer, reasoning parser, and tool parser all reside in-process. 1 day ago · Expert Parallelism DP, DPA and SGLang DP Router LoRA Serving PD Disaggregation EPD Disaggregation Pipeline Parallelism for Long Context Hierarchical KV Caching (HiCache) Query VLM with Offline Engine DP for Multi-Modal Encoder in SGLang Cuda Graph for Multi-Modal Encoder in SGLang Piecewise CUDA Graph SGLang Model Gateway Deterministic 1 day ago · The SGLang Router (sgl-router) serves as a high-performance entry point for distributing requests across multiple SGLang Runtime (SRT) instances. Jan 15, 2026 · High-performance Rust-based load balancer for SGLang with multiple routing algorithms and prefill-decode disaggregation support. It centralizes worker lifecycle management, balances traffic across heterogeneous protocols (HTTP, gRPC, OpenAI-compatible), and provides enterprise-ready control over history storage, tool SGLang is a high-performance serving framework for large language models and multimodal models. MemFabric-Hybrid is a drop-in replacement of Mooncake Transfer Engine that enables KV cache transfer on Ascend NPU clusters. 2 days ago · CANN # Prior to start work with SGLang on Ascend you need to install CANN Toolkit, Kernels operator package and NNAL version 8. 1 day ago · The SGLang Router (also known as the SGLang Model Gateway) is a high-performance, Rust-based orchestration layer designed for large-scale LLM deployments. High-performance model routing control and data plane for large-scale LLM deployments. 3. 1 day ago · The SGLang Model Gateway (sgl-model-gateway) is a high-performance, Rust-based inference gateway designed to sit in front of one or more SGLang (or other compatible) inference servers. . Jan 15, 2026 · High-performance Rust-based load balancer for SGLang with multiple routing algorithms and prefill-decode disaggregation support. SGLang gRPC router and pipeline that stream tokenized requests through SRT gRPC workers with fully Rust tokenizer, reasoning parser, and tool parser implementations for maximal OpenAI API performance, supporting both single-stage and PD serving topologies. 1 day ago · The SGLang Model Gateway (SMG) employs a sophisticated routing subsystem designed to maximize cache hits while maintaining balanced load across backend workers. It provides centralized request routing, health monitoring, and worker lifecycle management, supporting both regular inference and Prefill-Decode (PD) disaggregated execution modes. It supports multiple industry-standard protocols, including OpenAI-compatible HTTP and gRPC, while providing sophisticated tool call parsing to handle structured outputs and function calling. 3 days ago · This guide explains the difference between Data Parallelism (DP) and Data Parallelism Attention (DPA), how to enable each mode correctly, and how to use the SGLang Model Gateway (SMG) for production-grade DP deployments. SGLang Router SGLang router is a standalone module implemented in Rust to achieve data parallelism across SGLang instances. uefg i5wa x15 slnv ps3o 2uxt x7k a9kd xkug yib jzh 0p9n 3alk lqq bu6 uvpg qglb m4x mqwi 0hr 94l1 hp3 uiu zx1a pzwg 00w7 sbz pav zwnf n8vs