vllm@0.5.0.post1 vulnerabilities

A high-throughput and memory-efficient inference and serving engine for LLMs

Direct Vulnerabilities

Known vulnerabilities in the vllm package. This does not include vulnerabilities belonging to this package’s dependencies.

Vulnerabilities and vulnerable version ranges:

  • High severity
Arbitrary Code Injection

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Arbitrary Code Injection via the config class named Nemotron_Nano_VL_Config. An attacker can execute arbitrary code on the host system by publishing a repository with a crafted configuration that references a malicious backend through the auto_map field, which is then fetched and executed even if remote code execution is explicitly disabled.
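
The attack vector is the Hugging Face auto_map field in config.json, which points the loader at Python code to import. As an illustration only (not vLLM's actual fix), a loader can refuse to follow auto_map when remote code execution is disabled; the helper below is a hypothetical sketch.

```python
import json

def load_config_without_remote_code(config_path: str, trust_remote_code: bool) -> dict:
    """Hypothetical guard: refuse configs that request dynamic code loading
    via auto_map unless remote code execution was explicitly enabled."""
    with open(config_path) as f:
        config = json.load(f)
    if not trust_remote_code and "auto_map" in config:
        # auto_map entries such as {"AutoModel": "some-repo--modeling_x.XModel"}
        # instruct the loader to import Python code from a (possibly remote) repo.
        raise ValueError(
            "config.json declares auto_map but trust_remote_code is disabled; "
            "refusing to import model code from the repository"
        )
    return config
```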

How to fix Arbitrary Code Injection?

Upgrade vllm to version 0.11.1 or higher.

Vulnerable versions: [,0.11.1)

  • Medium severity
Server-side Request Forgery (SSRF)

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Server-side Request Forgery (SSRF) via the load_from_url and load_from_url_async methods of the MediaConnector class, which fetch and process media from user-supplied URLs without sufficient restrictions on target hosts. An attacker can coerce the vLLM server into making arbitrary requests to internal network resources.

Note:

This vulnerability is particularly critical in containerized environments like llm-d, where a compromised vLLM pod could be used to scan the internal network, interact with other pods, and potentially cause denial of service or access sensitive data.

Workaround

To address this vulnerability, it is essential to restrict the URLs that the MediaConnector can access. The principle of least privilege should be applied.

It is recommended to implement a configurable allowlist or denylist for domains and IP addresses.

  • Allowlist: The most secure approach is to allow connections only to a predefined list of trusted domains. This could be configured via a command-line argument, such as --allowed-media-domains. By default, this list could be empty, forcing administrators to explicitly enable external media fetching.

  • Denylist: Alternatively, a denylist could block access to private IP address ranges (127.0.0.1, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and other sensitive domains.

A check should be added at the beginning of the load_from_url methods to validate the parsed hostname against this list before any connection is made.
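
A minimal sketch of such a check using only the Python standard library is shown below; the allowlist wiring is hypothetical (mirroring the --allowed-media-domains idea above), and a production check would also need to handle redirects and DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def validate_media_url(url: str, allowed_domains: set[str]) -> None:
    """Hypothetical guard to run at the top of load_from_url / load_from_url_async."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("media URL has no hostname")
    # Allowlist: only explicitly trusted domains may be fetched at all.
    if host not in allowed_domains:
        raise ValueError(f"host {host!r} is not in the media allowlist")
    # Defence in depth: even allowed names must not resolve to internal ranges
    # such as 127.0.0.1, 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
    for info in socket.getaddrinfo(host, None):
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise ValueError(f"{host} resolves to internal address {addr}")
```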

How to fix Server-side Request Forgery (SSRF)?

Upgrade vllm to version 0.11.0 or higher.

Vulnerable versions: [0.5.0,0.11.0)

  • High severity
Allocation of Resources Without Limits or Throttling

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Allocation of Resources Without Limits or Throttling through the chat_template and chat_template_kwargs parameters. An attacker can cause excessive CPU and memory consumption by submitting a crafted Jinja template via these parameters, leading to service unavailability.
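
One deployment-side mitigation, sketched below under the assumption that per-request templates are not needed, is to reject these fields at the API boundary so a crafted Jinja template never reaches the renderer; the function name is illustrative, not vLLM's actual fix.

```python
from fastapi import HTTPException

FORBIDDEN_TEMPLATE_FIELDS = ("chat_template", "chat_template_kwargs")

def reject_user_supplied_templates(request_body: dict) -> None:
    """Hypothetical guard: refuse requests that try to override the chat template."""
    for field in FORBIDDEN_TEMPLATE_FIELDS:
        if request_body.get(field) is not None:
            raise HTTPException(
                status_code=400,
                detail=f"{field} may not be supplied by clients on this server",
            )
```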

How to fix Allocation of Resources Without Limits or Throttling?

Upgrade vllm to version 0.11.0 or higher.

Vulnerable versions: [,0.11.0)

  • High severity
Covert Timing Channel

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Covert Timing Channel via the api_server component. An attacker can gain unauthorized access by exploiting differences in response times during API key validation.
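
The standard remedy for this class of leak is a constant-time comparison; a minimal sketch with the Python standard library (the surrounding middleware wiring is omitted):

```python
import secrets

def api_key_is_valid(provided: str, expected: str) -> bool:
    # secrets.compare_digest runs in time independent of where the strings
    # differ, so response latency no longer reveals how much of the key matched.
    return secrets.compare_digest(provided.encode(), expected.encode())
```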

How to fix Covert Timing Channel?

Upgrade vllm to version 0.11.0 or higher.

Vulnerable versions: [,0.11.0)

  • High severity
Allocation of Resources Without Limits or Throttling

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Allocation of Resources Without Limits or Throttling in the handling of HTTP GET requests with extremely large headers. An attacker can exhaust server memory and crash the server or render it unresponsive by sending such a request. The attack does not require authentication, making it exploitable by any remote user.

How to fix Allocation of Resources Without Limits or Throttling?

Upgrade vllm to version 0.10.1.1 or higher.

Vulnerable versions: [,0.10.1.1)

  • Medium severity
Timing Attack

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Timing Attack due to the PageAttention mechanism. In environments that share the same backend, an attacker can infer details about the data being processed by analyzing variations in the Time to First Token (TTFT).

How to fix Timing Attack?

Upgrade vllm to version 0.9.0 or higher.

Vulnerable versions: [,0.9.0)

  • High severity
Deserialization of Untrusted Data

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Deserialization of Untrusted Data in the MessageQueue.dequeue() API function. An attacker can execute arbitrary code by sending a malicious payload to the message queue.
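
The risk is inherent to pickle rather than specific to vLLM: any pickled object can name a callable to run when it is loaded. A benign demonstration of why untrusted queue payloads must never be unpickled:

```python
import pickle
import subprocess

class Payload:
    # pickle calls __reduce__ to learn how to rebuild the object and will call
    # whatever function it returns - here a harmless echo, but an attacker can
    # substitute any command.
    def __reduce__(self):
        return (subprocess.run, (["echo", "code executed during unpickling"],))

data = pickle.dumps(Payload())
pickle.loads(data)  # runs the echo command as a side effect of deserialization
```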

How to fix Deserialization of Untrusted Data?

There is no fixed version for vllm.

Vulnerable versions: [0,) (all versions)

  • Critical severity
Deserialization of Untrusted Data

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Deserialization of Untrusted Data in the _make_handler_coro() function used by the async RPC server for communication between nodes, which uses pickle for deserialization. An attacker can execute arbitrary code by sending serialized data from a malicious RPC client.
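
Where pickle cannot be replaced with a data-only format such as JSON or msgpack, the restricted-unpickler pattern from the Python documentation at least limits which globals a remote peer can force the server to import; a sketch:

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"dict", "list", "tuple", "set", "frozenset",
                 "str", "bytes", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Permit only harmless builtins; anything else (os.system,
        # subprocess.run, ...) raises instead of being imported.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```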

How to fix Deserialization of Untrusted Data?

Upgrade vllm to version 0.6.2 or higher.

Vulnerable versions: [,0.6.2)

  • High severity
Allocation of Resources Without Limits or Throttling

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Allocation of Resources Without Limits or Throttling in the outlines_logits_processors.py module, which uses a local cache that is unbounded by default. An attacker can fill all available storage on the target system by sending a stream of decoding requests with different schemas, each of which adds a new entry to the outlines cache indefinitely.
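
The general remedy is to bound the cache and evict old entries instead of letting it grow without limit; a minimal in-memory sketch (the real outlines cache is disk-backed, and the function is a stand-in for the expensive schema-compilation step):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # bounded: old schemas are evicted rather than accumulating forever
def compile_schema(schema_json: str) -> str:
    # Stand-in for compiling a guided-decoding schema; keyed on the schema text
    # so repeated schemas are cache hits while novel ones displace old entries.
    return schema_json.strip()
```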

How to fix Allocation of Resources Without Limits or Throttling?

Upgrade vllm to version 0.8.0 or higher.

Vulnerable versions: [,0.8.0)

  • Low severity
Use of Weak Hash

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Use of Weak Hash due to the use of a predictable constant value in the Python 3.12 built-in hash function. An attacker can interfere with subsequent responses and cause unintended behavior by exploiting predictable hash collisions to populate the cache with prompts known to collide with another prompt in use.
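
The underlying fix for this class of issue is to key the cache on a collision-resistant content hash rather than Python's built-in hash(); the sketch below is illustrative and not vLLM's exact scheme.

```python
import hashlib

def prefix_block_hash(parent_hash: bytes, block_token_ids: tuple[int, ...]) -> bytes:
    # Chaining SHA-256 over the token blocks means an attacker cannot
    # precompute prompts that collide with another user's cached prefix.
    payload = parent_hash + ",".join(map(str, block_token_ids)).encode()
    return hashlib.sha256(payload).digest()
```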

How to fix Use of Weak Hash?

Upgrade vllm to version 0.7.2 or higher.

Vulnerable versions: [,0.7.2)

  • High severity
Deserialization of Untrusted Data

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Deserialization of Untrusted Data via the hf_model_weights_iterator process due to the usage of the torch.load function with the weights_only parameter set to False, which is considered insecure. An attacker can execute arbitrary code during the unpickling process by supplying malicious pickle data.
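
On PyTorch versions that support it, the hardened call is a one-line change; a sketch (weights_only=True restricts unpickling to tensors and plain containers, so a crafted checkpoint cannot run code at load time):

```python
import torch

def load_checkpoint(path: str):
    # Refuses arbitrary pickled objects in the checkpoint file.
    return torch.load(path, map_location="cpu", weights_only=True)
```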

How to fix Deserialization of Untrusted Data?

Upgrade vllm to version 0.7.0 or higher.

Vulnerable versions: [,0.7.0)

  • Medium severity
Uncontrolled Resource Consumption ('Resource Exhaustion')

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Uncontrolled Resource Consumption ('Resource Exhaustion') via the best_of parameter. An attacker can cause the system to become unresponsive and prevent legitimate users from accessing the service by consuming excessive system resources.
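
Pending an upgrade, the parameter can also be capped at the API boundary; a hedged sketch in which the limit and field names are illustrative:

```python
from fastapi import HTTPException

MAX_BEST_OF = 8  # illustrative server-side ceiling

def validate_best_of(request_body: dict) -> None:
    best_of = request_body.get("best_of")
    if best_of is not None and best_of > MAX_BEST_OF:
        raise HTTPException(
            status_code=400,
            detail=f"best_of={best_of} exceeds the server limit of {MAX_BEST_OF}",
        )
```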

How to fix Uncontrolled Resource Consumption ('Resource Exhaustion')?

Upgrade vllm to version 0.6.3 or higher.

Vulnerable versions: [,0.6.3)

  • High severity
Improper Validation of Syntactic Correctness of Input

vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

Affected versions of this package are vulnerable to Improper Validation of Syntactic Correctness of Input in the process_model_inputs() and process_model_inputs_async() functions, accessible through the completions API. An attacker can crash the server by sending a request with an empty prompt when the server is running a model that does not prepend any data (such as gpt2). The crash only occurs if the processed prompt passed to these functions is still empty.
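
A guard of this shape ahead of input processing avoids the crash; a minimal sketch (tokenization details are simplified):

```python
from fastapi import HTTPException

def ensure_nonempty_prompt(prompt_token_ids: list[int]) -> None:
    # Models such as gpt2 prepend no BOS or template tokens, so an empty
    # request prompt stays empty after processing and must be rejected early.
    if not prompt_token_ids:
        raise HTTPException(status_code=400, detail="prompt must not be empty")
```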

How to fix Improper Validation of Syntactic Correctness of Input?

Upgrade vllm to version 0.5.5 or higher.

Vulnerable versions: [,0.5.5)