vllm@0.22.1

A high-throughput and memory-efficient inference and serving engine for LLMs

  • latest version

    0.23.0

  • first published

    3 years ago

  • latest version published

    9 days ago

  • licenses detected

  • Direct Vulnerabilities

    Known vulnerabilities in the vllm package. This does not include vulnerabilities belonging to this package’s dependencies.

    Vulnerability    Vulnerable Version
    • M
    Interpretation Conflict

    vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.

    Affected versions of this package are vulnerable to Interpretation Conflict in the image processing pipeline. An attacker can cause the model to interpret images differently from human expectations by supplying images with manipulated EXIF orientation or PNG tRNS transparency, potentially leading to misclassification or unintended model behavior.

    How to fix Interpretation Conflict?

    A fix was pushed into the master branch but not yet published.

    [0.11.0,)
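One way to reduce exposure to the tRNS half of this issue is to inspect uploaded PNGs before they reach the model. The following is a minimal sketch, not vLLM's actual fix: it walks the PNG chunk list and reports whether a tRNS chunk is present, so a caller can reject or flatten such images. The function names `png_chunk_types`, `has_trns`, and `make_png` are hypothetical, and EXIF orientation is not handled here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[str]:
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, ctype = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(ctype.decode("ascii", errors="replace"))
        offset += 8 + length + 4  # header + payload + CRC
        if ctype == b"IEND":
            break
    return types


def has_trns(data: bytes) -> bool:
    """True if the PNG carries a tRNS (transparency) chunk."""
    return "tRNS" in png_chunk_types(data)


def _chunk(ctype: bytes, payload: bytes) -> bytes:
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def make_png(with_trns: bool) -> bytes:
    """Build a 1x1 palette PNG for demonstration purposes only."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 3, 0, 0, 0)  # 8-bit, color type 3
    parts = [PNG_SIGNATURE, _chunk(b"IHDR", ihdr), _chunk(b"PLTE", b"\xff\x00\x00")]
    if with_trns:
        parts.append(_chunk(b"tRNS", b"\x00"))  # palette entry 0 fully transparent
    parts += [_chunk(b"IDAT", zlib.compress(b"\x00\x00")), _chunk(b"IEND", b"")]
    return b"".join(parts)
```

A caller could run `has_trns(upload_bytes)` on each PNG upload and decide on a policy (reject, or composite onto a fixed background) before inference.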
    • H
    Improper Validation of Specified Type of Input

    Affected versions of this package are vulnerable to Improper Validation of Specified Type of Input due to improper validation of the temperature parameter while sampling. An attacker can cause the inference worker to crash or exhibit undefined behavior by supplying non-finite float values such as NaN or Infinity, which bypass validation and propagate to GPU kernels.

    How to fix Improper Validation of Specified Type of Input?

    A fix was pushed into the master branch but not yet published.

    [0,)
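Until a fixed release is available, a deployment could validate the temperature value at its own API boundary. The sketch below assumes a hypothetical helper `validate_temperature` and an illustrative upper bound of 2.0; neither comes from vLLM. Note that a plain range check is not enough in Python, because every comparison with NaN evaluates to false, so an explicit `math.isfinite` test is needed.

```python
import math


def validate_temperature(value, max_value: float = 2.0) -> float:
    """Return the temperature as a float, or raise if it is unusable.

    max_value is an illustrative bound, not a limit defined by vLLM.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("temperature must be a number")
    try:
        value = float(value)
    except OverflowError:
        raise ValueError("temperature is too large") from None
    # NaN and +/-Infinity pass naive range checks, so test for them explicitly.
    if not math.isfinite(value):
        raise ValueError("temperature must be finite")
    if value < 0.0 or value > max_value:
        raise ValueError(f"temperature must be in [0, {max_value}]")
    return value
```

Calling this before a request is handed to the engine keeps non-finite values away from the sampling code.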
    • H
    Improper Handling of Highly Compressed Data (Data Amplification)

    Affected versions of this package are vulnerable to Improper Handling of Highly Compressed Data (Data Amplification) through the audio.py file. An attacker can cause excessive memory consumption by uploading a specially crafted compressed audio file that decompresses to a very large size, leading to resource exhaustion and potential service disruption.

    How to fix Improper Handling of Highly Compressed Data (Data Amplification)?

    A fix was pushed into the master branch but not yet published.

    [0,)
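The general defence against data amplification is to cap the decoded size rather than trust the compressed size. The sketch below illustrates the idea with zlib as a stand-in codec; it does not reproduce vLLM's audio decoding path, and `decompress_bounded` is a hypothetical helper name.

```python
import zlib


def decompress_bounded(blob: bytes, max_bytes: int) -> bytes:
    """Decompress a zlib stream, refusing to produce more than max_bytes."""
    decompressor = zlib.decompressobj()
    # Ask for one byte more than the limit so an oversized stream is detectable
    # without ever materialising the full output.
    out = decompressor.decompress(blob, max_bytes + 1)
    if len(out) > max_bytes:
        raise ValueError("decompressed size exceeds limit")
    if not decompressor.eof:
        raise ValueError("truncated or incomplete stream")
    return out
```

For a real audio format, the same pattern applies: read the declared duration or frame count first, or decode in chunks and stop once a budget is exceeded.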
    • M
    Insertion of Sensitive Information into Log File

    Affected versions of this package are vulnerable to Insertion of Sensitive Information into Log File in the error handling process for certain API and WebSocket routes, where unsanitized exception messages containing sensitive memory addresses are returned in response bodies. An attacker can obtain internal memory address information by submitting malformed image data or triggering exceptions that cause object representations to be included in error messages.

    Note: This issue remains due to an incomplete fix for CVE-2026-22778.

    How to fix Insertion of Sensitive Information into Log File?

    A fix was pushed into the master branch but not yet published.

    [0,)
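As a stopgap, a proxy or middleware in front of the server could scrub address-like tokens from error text before it is returned to clients. This is a minimal sketch under that assumption; `sanitize_error` is a hypothetical helper, and the regular expression simply targets the `0x...` form that Python's default object representation uses.

```python
import re

# Matches hexadecimal addresses such as the one in "<Foo object at 0x7f3a2c1b4d60>".
_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def sanitize_error(exc: BaseException) -> str:
    """Return the exception message with memory addresses redacted."""
    return _ADDRESS.sub("0x<redacted>", str(exc))
```

A stricter design returns only a generic message to the client and keeps the full exception text in server-side logs.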
    • L
    Incorrect Conversion between Numeric Types

    Affected versions of this package are vulnerable to Incorrect Conversion between Numeric Types in the ggml_dequantize, ggml_mul_mat_vec_a8, ggml_mul_mat_a8, and ggml_moe_a8 functions when tensor dimensions are truncated due to an integer overflow. An attacker can access residual GPU memory contents from previous inference requests by supplying a specially crafted model file with tensor dimensions whose product exceeds the maximum value of a 32-bit integer.

    Note: This is only exploitable if the deployment is multi-tenant and loads attacker-controlled GGUF model files.

    How to fix Incorrect Conversion between Numeric Types?

    A fix was pushed into the master branch but not yet published.

    [0.5.5,)
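The arithmetic behind this class of bug can be shown in a few lines. The sketch below is illustrative only and is not the vLLM kernel code: `to_int32` mimics the wrap-around of a narrowing cast to a signed 32-bit integer, and `checked_element_count` is a hypothetical pre-load check that rejects tensor shapes whose element count would not fit.

```python
import math

INT32_MAX = 2**31 - 1


def to_int32(n: int) -> int:
    """Wrap n to signed 32-bit two's complement, as a narrowing cast would."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


def checked_element_count(shape) -> int:
    """Return the element count of a tensor shape, or raise if it overflows int32."""
    if any(dim < 0 for dim in shape):
        raise ValueError("negative dimension")
    count = math.prod(shape)
    if count > INT32_MAX:
        raise OverflowError(f"element count {count} exceeds 32-bit range")
    return count
```

For example, a shape of (65536, 65536) has 2**32 elements, which wraps to 0 under `to_int32`, while (50000, 50000) wraps to a negative value; either result would mislead code that sizes buffers or loops from the truncated count.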
    • M
    Improper Resource Shutdown or Release


    Affected versions of this package are vulnerable to Improper Resource Shutdown or Release via the OpenAI-compatible serving path. A remote attacker can make the service unavailable by sending specially crafted requests.

    How to fix Improper Resource Shutdown or Release?

    There is no fixed version for vllm.

    [0,)
    • M
    Use of Uninitialized Resource

    Affected versions of this package are vulnerable to Use of Uninitialized Resource via the has_mamba_layers function in the KV Block Handler. An attacker can exploit this to leak data between sessions.

    How to fix Use of Uninitialized Resource?

    A fix was pushed into the master branch but not yet published.

    [0,)
    • M
    Server-side Request Forgery (SSRF)

    Affected versions of this package are vulnerable to Server-side Request Forgery (SSRF) via the download_bytes_from_url function. An attacker can cause the server to make arbitrary HTTP or HTTPS requests to internal or external resources by supplying a crafted file_url value in batch input JSON, potentially accessing sensitive internal services or causing denial of service.

    How to fix Server-side Request Forgery (SSRF)?

    A fix was pushed into the master branch but not yet published.

    [0.16.0,)
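A common mitigation for SSRF is to check every user-supplied URL against an allowlist before fetching it. The following is a sketch under stated assumptions, not the upstream patch: `is_url_allowed` and the allowlist contents are hypothetical, raw IP literals are rejected outright, and DNS rebinding is not addressed.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url: str, allowed_hosts: frozenset) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased, with userinfo and port stripped
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a DNS name and consult the allowlist.
        return host in allowed_hosts
    # Raw IP literals (loopback, link-local, private, or public) are refused.
    return False
```

A batch runner could call this on each file_url value and skip or fail entries that do not pass, for example with `allowed_hosts = frozenset({"files.example.com"})`.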
    • H
    Deserialization of Untrusted Data

    Affected versions of this package are vulnerable to Deserialization of Untrusted Data via the SUB ZeroMQ socket, where deserialization is performed using the unsafe pickle library. An attacker on the same cluster can execute arbitrary code on the remote machine by sending maliciously crafted serialized payloads.

    Note: The V0 engine is off by default since v0.8.0, and the V1 engine is not affected. Due to the V0 engine's deprecated status and the invasive nature of a fix, the developers recommend ensuring a secure network environment if the V0 engine with multi-host tensor parallelism is still in use.

    How to fix Deserialization of Untrusted Data?

    There is no fixed version for vllm.

    [0.5.2,)
    • H
    Deserialization of Untrusted Data

    Affected versions of this package are vulnerable to Deserialization of Untrusted Data in the MessageQueue.dequeue() API function. An attacker can execute arbitrary code by sending a malicious payload to the message queue.

    How to fix Deserialization of Untrusted Data?

    There is no fixed version for vllm.

    [0,)
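Both deserialization entries above come down to calling pickle on data an attacker can influence. The sketch below shows the restricted-unpickler pattern from the Python standard library documentation, which refuses every global lookup so that only plain built-in containers and scalars can be loaded. It is not a patch for vLLM; `NoGlobalsUnpickler` and `safe_loads` are hypothetical names, and the Python documentation notes that restricting globals does not make pickle safe against all abuse (for example, resource exhaustion).

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or function."""

    def find_class(self, module, name):
        # Every callable or class reference in a pickle goes through here,
        # so raising unconditionally blocks __reduce__-based code execution.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Load a pickle that may only contain built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

Where the message format is under your control, a data-only format such as JSON avoids the problem entirely; otherwise the network-isolation advice in the entries above remains the primary mitigation.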