Expected Behavior Violation in llama-index | CVE-2025-6211

How to fix?

Upgrade llama-index to version 0.12.41 or higher.
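
Before upgrading, it can help to confirm which version is installed. The following is a minimal sketch, assuming the package is installed under the distribution name `llama-index` and uses plain numeric release numbers; `parse_release`, `is_patched`, and `installed_llama_index_version` are hypothetical helpers written for illustration, not part of llama-index.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 12, 41)  # first fixed release, per this advisory


def parse_release(version_string: str) -> tuple:
    """Turn '0.12.41' into (0, 12, 41); stop at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_patched(version_string: str) -> bool:
    """Return True when the version is at or above the fixed release."""
    return parse_release(version_string) >= FIXED_VERSION


def installed_llama_index_version():
    """Return the installed llama-index version string, or None if absent."""
    try:
        return version("llama-index")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_llama_index_version()
    if installed is None:
        print("llama-index is not installed")
    elif is_patched(installed):
        print(f"llama-index {installed} is at or above 0.12.41")
    else:
        print(f"llama-index {installed} is affected; upgrade to 0.12.41 or higher")
```

The parser is deliberately conservative: a pre-release such as `0.12.41rc1` is truncated to `(0, 12)` and therefore reported as affected.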

Threat Intelligence

Exploit Maturity: Proof of Concept
EPSS: 0.05% (14th percentile)

Snyk ID: SNYK-PYTHON-LLAMAINDEX-10691925
Published: 11 Jul 2025
Disclosed: 10 Jul 2025
Credit: Sirius

Introduced: 10 Jul 2025

CVE-2025-6211, CWE-440

Overview

llama-index is an interface between LLMs and your data.

Affected versions of this package are vulnerable to Expected Behavior Violation via the DocugamiReader class. As the PoC shows, each chunk ID is an MD5 hash of the chunk text alone, so structurally distinct chunks that contain identical text receive the same ID and overwrite one another. An attacker can exploit these collisions to cause loss of important document content, disrupt parent-child chunk hierarchies, and produce inaccurate AI outputs.
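
To gauge whether a parsed document would trigger this behavior, one can group chunks by the hash of their text and flag any hash that maps to more than one XPath. The sketch below assumes chunks are available as (text, xpath) pairs and that IDs are MD5 digests of the text, as in the PoC; `find_text_collisions` is a hypothetical helper, not part of llama-index.

```python
from collections import defaultdict
from hashlib import md5


def find_text_collisions(chunks):
    """Group (text, xpath) pairs by the MD5 of the text alone.

    Returns {digest: sorted xpaths} for every digest that is shared by
    more than one distinct xpath.
    """
    xpaths_by_hash = defaultdict(set)
    for text, xpath in chunks:
        xpaths_by_hash[md5(text.encode()).hexdigest()].add(xpath)
    return {
        digest: sorted(xpaths)
        for digest, xpaths in xpaths_by_hash.items()
        if len(xpaths) > 1
    }


# Sample data mirroring the chunks used in the PoC.
chunks = [
    ("This agreement may be terminated at any time.", "/Document/Section/Clause"),
    ("This agreement may be terminated at any time.", "/Document/Appendix/Note"),
    ("Parent clause.", "/Document/Section/Parent"),
]

collisions = find_text_collisions(chunks)
for digest, xpaths in collisions.items():
    print(digest, xpaths)
```

Any digest reported here marks a set of chunks of which only one would survive in a store keyed by that digest.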

PoC

from hashlib import md5
from dataclasses import dataclass
from typing import Optional, Dict, List

@dataclass
class FakeDocument:
    text: str
    metadata: Dict[str, str]
    excluded_llm_metadata_keys: Optional[List[str]] = None

XPATH_KEY = "xpath"
ID_KEY = "id"
DOCUMENT_NAME_KEY = "name"
STRUCTURE_KEY = "structure"
TAG_KEY = "tag"

@dataclass
class FakeChunk:
    text: str
    xpath: str
    structure: str
    tag: str
    parent: Optional['FakeChunk'] = None

document_name = "test_doc"
additional_doc_metadata = None
max_text_length = 4096
include_project_metadata_in_doc_metadata = True
parent_id_key = "doc_id"

def _build_framework_chunk(dg_chunk: FakeChunk) -> FakeDocument:
    # Vulnerable step: the ID is an MD5 of the text alone, so xpath and structure are ignored.
    _hashed_id = md5(dg_chunk.text.encode()).hexdigest()
    metadata = {
        XPATH_KEY: dg_chunk.xpath,
        ID_KEY: _hashed_id,
        DOCUMENT_NAME_KEY: document_name,
        STRUCTURE_KEY: dg_chunk.structure,
        TAG_KEY: dg_chunk.tag,
    }

    if additional_doc_metadata and include_project_metadata_in_doc_metadata:
        metadata.update(additional_doc_metadata)

    return FakeDocument(
        text=dg_chunk.text[:max_text_length],
        metadata=metadata,
        excluded_llm_metadata_keys=[XPATH_KEY, ID_KEY, STRUCTURE_KEY],
    )

parent_chunk = FakeChunk(
    text="Parent clause.",
    xpath="/Document/Section/Parent",
    structure="section",
    tag="section"
)

dg_chunks = [
    FakeChunk(
        text="This agreement may be terminated at any time.",
        xpath="/Document/Section/Clause",
        structure="clause",
        tag="p",
        parent=parent_chunk,
    ),
    FakeChunk(
        text="This agreement may be terminated at any time.",
        xpath="/Document/Appendix/Note",
        structure="note",
        tag="p",
        parent=parent_chunk,
    ),
]

framework_chunks: Dict[str, FakeDocument] = {}
for dg_chunk in dg_chunks:
    framework_chunk = _build_framework_chunk(dg_chunk)
    chunk_id = framework_chunk.metadata.get(ID_KEY)
    if chunk_id:
        # Chunks with identical text share an ID, so a later chunk overwrites an earlier one.
        framework_chunks[chunk_id] = framework_chunk
        if dg_chunk.parent:
            framework_parent_chunk = _build_framework_chunk(dg_chunk.parent)
            parent_id = framework_parent_chunk.metadata.get(ID_KEY)
            if parent_id and framework_parent_chunk.text:
                framework_chunk.metadata[parent_id_key] = parent_id
                framework_chunks[parent_id] = framework_parent_chunk

# pandas is used only to tabulate the chunks that survive.
import pandas as pd

df = pd.DataFrame([
    {
        "id": doc.metadata[ID_KEY],
        "text": doc.text,
        "xpath": doc.metadata[XPATH_KEY],
        "structure": doc.metadata[STRUCTURE_KEY],
        "tag": doc.metadata[TAG_KEY],
        "is_parent": doc.text == "Parent clause.",
    }
    for doc in framework_chunks.values()
])

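# Only two rows are printed (the Appendix note and the parent): the Section clause was overwritten.
print(df.to_string(index=False))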
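
One way to avoid the overwrite shown in the PoC is to derive the ID from the chunk's position as well as its text. The following is a minimal sketch of that idea, not the fix shipped in 0.12.41: `structural_chunk_id` is a hypothetical function, and both the choice of SHA-256 and the use of the XPath as the structural component are assumptions made for illustration.

```python
from hashlib import sha256


def structural_chunk_id(text: str, xpath: str) -> str:
    """Hypothetical ID scheme: hash the xpath together with the text."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    return sha256(f"{xpath}\x00{text}".encode()).hexdigest()


text = "This agreement may be terminated at any time."
clause_id = structural_chunk_id(text, "/Document/Section/Clause")
note_id = structural_chunk_id(text, "/Document/Appendix/Note")

# Both chunks now keep their own entry in a store keyed by ID.
store = {clause_id: "clause", note_id: "note"}
print(len(store))
```

The ID remains deterministic for a given text and XPath, so re-ingesting the same document yields the same keys.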

CVSS Base Scores

Version 4.0

Attack Vector (AV): Network
Attack Complexity (AC): Low
Attack Requirements (AT): None
Privileges Required (PR): None
User Interaction (UI): None

Confidentiality (VC): None
Integrity (VI): Low
Availability (VA): Low

Confidentiality (SC): None
Integrity (SI): None
Availability (SA): None
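
The base metrics listed above can be written as a CVSS 4.0 vector string. The sketch below assembles one from the values in this advisory; the vector is derived here from those values rather than quoted from the page, and `build_vector` is a hypothetical helper that covers only the metric values that appear above.

```python
# Base metrics as listed in this advisory, in CVSS 4.0 vector order.
BASE_METRICS = [
    ("AV", "Network"),
    ("AC", "Low"),
    ("AT", "None"),
    ("PR", "None"),
    ("UI", "None"),
    ("VC", "None"),
    ("VI", "Low"),
    ("VA", "Low"),
    ("SC", "None"),
    ("SI", "None"),
    ("SA", "None"),
]

# Abbreviations for the metric values used in this advisory only.
ABBREVIATIONS = {"Network": "N", "Low": "L", "None": "N"}


def build_vector(metrics) -> str:
    """Join metric/value pairs into a CVSS 4.0 vector string."""
    parts = [f"{name}:{ABBREVIATIONS[value]}" for name, value in metrics]
    return "CVSS:4.0/" + "/".join(parts)


vector = build_vector(BASE_METRICS)
print(vector)
```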

Expected Behavior Violation affecting the llama-index package, versions [,0.12.41) (all versions before 0.12.41)