Loop with Unreachable Exit Condition ('Infinite Loop') Affecting kernel-rt-debug-kvm package, versions *


Severity

Recommended: 0.0, medium (scale: 0 to 10)

Based on Red Hat Enterprise Linux security rating.

Threat Intelligence

EPSS
0.03% (8th percentile)

  • Snyk ID: SNYK-RHEL7-KERNELRTDEBUGKVM-8641731
  • Published: 16 Jan 2025
  • Disclosed: 15 Jan 2025

Introduced: 15 Jan 2025

CVE-2024-57884
CWE-835

How to fix?

There is no fixed version for RHEL:7 kernel-rt-debug-kvm.

NVD Description

Note: Versions mentioned in the description apply only to the upstream kernel-rt-debug-kvm package and not the kernel-rt-debug-kvm package as distributed by RHEL. See How to fix? for RHEL:7 relevant fixed versions and status.

In the Linux kernel, the following vulnerability has been resolved:

mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()

The task sometimes continues looping in throttle_direct_reclaim() because allow_direct_reclaim(pgdat) keeps returning false.

    #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
    #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
    #2 [ffff80002cb6f990] schedule at ffff800008abc50c
    #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
    #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
    #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
    #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
    #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
    #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
    #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4

At this point, the pgdat contains the following two zones:

    NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
      SIZE: 20480  MIN/LOW/HIGH: 11/28/45
      VM_STAT:
            NR_FREE_PAGES: 359
    NR_ZONE_INACTIVE_ANON: 18813
      NR_ZONE_ACTIVE_ANON: 0
    NR_ZONE_INACTIVE_FILE: 50
      NR_ZONE_ACTIVE_FILE: 0
      NR_ZONE_UNEVICTABLE: 0
    NR_ZONE_WRITE_PENDING: 0
                 NR_MLOCK: 0
                NR_BOUNCE: 0
               NR_ZSPAGES: 0
        NR_FREE_CMA_PAGES: 0

    NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
      SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
      VM_STAT:
            NR_FREE_PAGES: 146
    NR_ZONE_INACTIVE_ANON: 94668
      NR_ZONE_ACTIVE_ANON: 3
    NR_ZONE_INACTIVE_FILE: 735
      NR_ZONE_ACTIVE_FILE: 78
      NR_ZONE_UNEVICTABLE: 0
    NR_ZONE_WRITE_PENDING: 0
                 NR_MLOCK: 0
                NR_BOUNCE: 0
               NR_ZSPAGES: 0
        NR_FREE_CMA_PAGES: 0

In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of inactive/active file-backed pages calculated in zone_reclaimable_pages() based on the result of zone_page_state_snapshot() is zero.

Additionally, since this system lacks swap, the calculation of inactive/active anonymous pages is skipped.

    crash> p nr_swap_pages
    nr_swap_pages = $1937 = {
      counter = 0
    }

As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having free pages significantly exceeding the high watermark.
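The accounting described above can be sketched in Python. This is a simplified model, not kernel code: the function name `reclaimable_pages_prepatch` and its parameters are hypothetical, the file-backed sum of 0 is the snapshot value the description reports, and the anonymous page count is taken from the ZONE_DMA32 dump. The real kernel check for whether anonymous pages are reclaimable may consider more than swap availability.

```python
def reclaimable_pages_prepatch(file_pages, anon_pages, nr_swap_pages):
    """Simplified model (assumption) of pre-patch zone_reclaimable_pages()."""
    nr = file_pages
    # Anonymous pages only count when there is swap space to move them to.
    if nr_swap_pages > 0:
        nr += anon_pages
    return nr


# ZONE_DMA32 as described: file-backed snapshot sum 0, 18813 anon pages, no swap.
dma32_no_swap = reclaimable_pages_prepatch(
    file_pages=0, anon_pages=18813, nr_swap_pages=0
)

# Hypothetical contrast: the same zone on a system that does have swap.
dma32_with_swap = reclaimable_pages_prepatch(
    file_pages=0, anon_pages=18813, nr_swap_pages=1
)

print(dma32_no_swap, dma32_with_swap)
```

In this model the zone reports no reclaimable pages on the swapless system, regardless of how many free pages it holds, which is why it is skipped.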

The problem is that pgdat->kswapd_failures has not been incremented.

    crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
    $1935 = 0x0

This is because the node is deemed balanced. The node balancing logic in balance_pgdat() evaluates all zones collectively. If one or more zones (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the entire node is deemed balanced. This causes balance_pgdat() to exit early before incrementing kswapd_failures, as it considers the overall memory state acceptable, even though some zones (like ZONE_NORMAL) remain under significant pressure.
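To make the "any zone balances the node" behaviour concrete, here is a minimal Python sketch using the free-page counts and high watermarks from the two zone dumps. The helper names `zone_meets_high_watermark` and `node_balanced` are hypothetical, and the comparison is deliberately simplified: the real kernel watermark check also takes the allocation order and lowmem reserves into account.

```python
def zone_meets_high_watermark(zone):
    """Simplified check (assumption): free pages above the high watermark."""
    return zone["free"] > zone["high"]


def node_balanced(zones):
    # The node counts as balanced as soon as any one zone passes the check.
    return any(zone_meets_high_watermark(z) for z in zones)


# Values from the ZONE_DMA32 and ZONE_NORMAL dumps in the description.
zones = [
    {"name": "DMA32", "free": 359, "high": 45},
    {"name": "Normal", "free": 146, "high": 264},
]

balanced = node_balanced(zones)
pressured = [z["name"] for z in zones if not zone_meets_high_watermark(z)]

print(balanced, pressured)
```

Under these assumptions the node is reported as balanced purely because of ZONE_DMA32, while ZONE_NORMAL stays below its high watermark.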

The patch ensures that zone_reclaimable_pages() includes free pages (NR_FREE_PAGES) in its calculation when no other reclaimable pages are available (e.g., file-backed or anonymous pages). This change prevents zones like ZONE_DMA32, which have sufficient free pages, from being mistakenly deemed unreclaimable. By doing so, the patch ensures proper node balancing, avoids masking pressure on other zones like ZONE_NORMAL, and prevents infinite loops in throttle_direct_reclaim() caused by allow_direct_reclaim(pgdat) repeatedly returning false.
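The effect of the fallback can be illustrated with a small Python model of the two functions. Several things here are assumptions rather than facts from the description: the "free pages must exceed half of the summed min watermarks" test reflects a reading of the upstream allow_direct_reclaim() and omits its kswapd_failures shortcut, the dictionary layout is hypothetical, and the ZONE_NORMAL free-page count of 20 is an invented value chosen to represent a zone under heavy pressure. The ZONE_DMA32 values and the watermarks come from the dumps.

```python
def zone_reclaimable_pages(zone, nr_swap_pages, patched):
    """Simplified model (assumption) of zone_reclaimable_pages()."""
    nr = zone["file"]
    if nr_swap_pages > 0:
        nr += zone["anon"]
    # Patched behaviour per the description: fall back to free pages
    # when no file-backed or anonymous pages are reclaimable.
    if patched and nr == 0:
        nr = zone["free"]
    return nr


def allow_direct_reclaim(zones, nr_swap_pages, patched):
    """Simplified model (assumption) of allow_direct_reclaim()."""
    reserve = 0
    free = 0
    for zone in zones:
        if zone_reclaimable_pages(zone, nr_swap_pages, patched) == 0:
            continue  # zone treated as unreclaimable and skipped
        reserve += zone["min"]
        free += zone["free"]
    if reserve == 0:
        return True  # no reserves to protect, so do not throttle
    return free > reserve // 2


zones = [
    # ZONE_DMA32: file-backed snapshot sum reported as 0, plenty of free pages.
    {"name": "DMA32", "min": 11, "free": 359, "file": 0, "anon": 18813},
    # ZONE_NORMAL: free=20 is a HYPOTHETICAL value modelling heavy pressure.
    {"name": "Normal", "min": 68, "free": 20, "file": 813, "anon": 94671},
]

before = allow_direct_reclaim(zones, nr_swap_pages=0, patched=False)
after = allow_direct_reclaim(zones, nr_swap_pages=0, patched=True)

print(before, after)
```

Without the fallback, only ZONE_NORMAL contributes and the model keeps refusing direct reclaim; with it, the free pages in ZONE_DMA32 are counted and the throttled task would be allowed to proceed.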

The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused by a node being incorrectly deemed balanced despite pressure in certain zones, such as ZONE_NORMAL. This issue arises from zone_reclaimable_pages ---truncated---

CVSS Base Scores

version 3.1