Resource Exhaustion Affecting tritonserver-backend-vllm-cuda-12.9 package, versions <25.7.1_git20250821-r1


Severity

Recommended
low

Based on default assessment until relevant scores are available.

Threat Intelligence

EPSS
0.3% (53rd percentile)

  • Snyk ID: SNYK-CHAINGUARDLATEST-TRITONSERVERBACKENDVLLMCUDA129-12485244
  • Published: 4 Sept 2025
  • Disclosed: 21 Aug 2025

Introduced: 21 Aug 2025

CVE-2025-48956
CWE-400

How to fix?

Upgrade Chainguard tritonserver-backend-vllm-cuda-12.9 to version 25.7.1_git20250821-r1 or higher.

NVD Description

Note: Versions mentioned in the description apply only to the upstream tritonserver-backend-vllm-cuda-12.9 package and not to the tritonserver-backend-vllm-cuda-12.9 package as distributed by Chainguard. See "How to fix?" for Chainguard-relevant fixed versions and status.

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.
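
The description above comes down to a server buffering request headers with no upper bound. As a minimal illustration of the general mitigation for this class of issue (CWE-400), the sketch below reads a header block from a byte stream and rejects it once a size limit is crossed. It is not the upstream vLLM patch; the names read_header_block, HeaderTooLarge, and MAX_HEADER_BYTES, and the 8192-byte limit, are hypothetical and chosen only for this example.

```python
import io

# Hypothetical limit for illustration; choose a value suited to the deployment.
MAX_HEADER_BYTES = 8192


class HeaderTooLarge(Exception):
    """Raised when the request header block exceeds the configured limit."""


def read_header_block(stream, max_bytes=MAX_HEADER_BYTES):
    """Read raw header lines up to the blank line, refusing oversized input.

    `stream` is any binary file-like object with a readline(size) method.
    """
    total = 0
    lines = []
    while True:
        # readline(n) returns at most n bytes, so a single enormous
        # header line is never buffered in full before the check runs.
        remaining = max_bytes - total
        line = stream.readline(remaining + 1)
        total += len(line)
        if total > max_bytes:
            raise HeaderTooLarge(f"header block exceeds {max_bytes} bytes")
        # A blank line (or end of stream) terminates the header block.
        if line in (b"\r\n", b"\n", b""):
            return lines
        lines.append(line.rstrip(b"\r\n"))


if __name__ == "__main__":
    ok = io.BytesIO(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
    print(read_header_block(ok))

    huge = io.BytesIO(b"GET / HTTP/1.1\r\nX-Pad: " + b"A" * 100_000 + b"\r\n\r\n")
    try:
        read_header_block(huge)
    except HeaderTooLarge as exc:
        print("rejected:", exc)
```

In a real deployment the same bound is usually enforced by the HTTP server or a reverse proxy in front of it rather than by application code; upgrading to the fixed package version remains the primary remedy.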