Upgrade vllm to version 0.20.0 or higher.
vllm is a high-throughput and memory-efficient inference and serving engine for LLMs.
Affected versions of this package are vulnerable to Incorrect Type Conversion or Cast via the extract_hidden_states speculative decoding method. An attacker can crash the server and disrupt service availability by submitting a request that includes penalty parameters such as repetition_penalty, frequency_penalty, or presence_penalty.
Note: This is only exploitable if the speculative decoding method is set to extract_hidden_states.
This vulnerability can be mitigated by avoiding the use of extract_hidden_states as the speculative decoding method, or by filtering penalty parameters out of incoming requests at an API gateway.
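The gateway-side mitigation can be sketched as a small request filter that drops the penalty parameters before the request is forwarded to the vllm server. This is a minimal illustration, not a vllm API: the function name and the assumption that the request body is a JSON object (Python dict) are hypothetical.

```python
# Sketch of the gateway-side mitigation: strip penalty parameters
# from an incoming request body before forwarding it to the vllm
# server. Names here are illustrative, not part of vllm.

# Parameters named in the advisory as triggering the crash.
PENALTY_PARAMS = {"repetition_penalty", "frequency_penalty", "presence_penalty"}

def strip_penalty_params(request_body: dict) -> dict:
    """Return a copy of the request body with penalty parameters removed."""
    return {k: v for k, v in request_body.items() if k not in PENALTY_PARAMS}
```

A gateway would apply this to each parsed JSON body; the original request object is left unmodified, so logging of the raw request is unaffected.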