Source: Ubuntu.com (ubuntucve), ID UB:CVE-2024-40918
Published: Jul 12, 2024, 12:00 a.m.

CVE-2024-40918

AI Score: 7 (Confidence: High)

In the Linux kernel, the following vulnerability has been resolved:

parisc: Try to fix random segmentation faults in package builds

PA-RISC systems with PA8800 and PA8900 processors have had problems
with random segmentation faults for many years. Systems with earlier
processors are much more stable.

Systems with PA8800 and PA8900 processors have a large L2 cache which
needs per page flushing for decent performance when a large range is
flushed. The combined cache in these systems is also more sensitive to
non-equivalent aliases than the caches in earlier systems.

The majority of random segmentation faults that I have looked at
appear to be memory corruption in memory allocated using mmap and
malloc.

My first attempt at fixing the random faults didn’t work. On
reviewing the cache code, I realized that there were two issues
which the existing code didn’t handle correctly. Both relate
to cache move-in. Another issue is that the present bit in PTEs
is racy.

  1. PA-RISC caches have a mind of their own and they can speculatively
    load data and instructions for a page as long as there is an entry in
    the TLB for the page which allows move-in. TLBs are local to each
    CPU. Thus, the TLB entry for a page must be purged before flushing
    the page. This is particularly important on SMP systems.

    In some of the flush routines, the flush routine would be called
    and then the TLB entry would be purged. This was because the flush
    routine needed the TLB entry to do the flush.
  2. My initial approach to fixing the random faults was to use
    flush_cache_page_if_present for all flush operations.
    This actually made things worse and led to a couple of hardware
    lockups. It finally dawned on me that some lines weren’t being
    flushed because the PTE check code was racy. This resulted in
    random inequivalent mappings to physical pages.

    The __flush_cache_page tmpalias flush sets up its own TLB entry
    and it doesn’t need the existing TLB entry. As long as we can find
    the PTE pointer for the vm page, we can get the pfn and physical
    address of the page. We can also purge the TLB entry for the page
    before doing the flush. Further, __flush_cache_page uses a special
    TLB entry that inhibits cache move-in.

    When switching page mappings, we need to ensure that lines are
    removed from the cache. It is not sufficient to just flush the
    lines to memory as they may come back.

    This made it clear that we needed to implement all the required
    flush operations using tmpalias routines. This includes flushes
    for user and kernel pages.

    After modifying the code to use tmpalias flushes, it became clear
    that the random segmentation faults were not fully resolved. The
    frequency of faults was worse on systems with a 64 MB L2 (PA8900)
    and systems with more CPUs (rp4440).

    The warning that I added to flush_cache_page_if_present to detect
    pages that couldn’t be flushed triggered frequently on some systems.

    Helge and I looked at the pages that couldn’t be flushed and found
    that the PTE was either cleared or for a swap page. Ignoring pages
    that were swapped out seemed okay but pages with cleared PTEs seemed
    problematic.

    I looked at routines related to pte_clear and noticed ptep_clear_flush.
    The default implementation just flushes the TLB entry. However, it was
    obvious that on parisc we need to flush the cache page as well. If
    we don’t flush the cache page, stale lines will be left in the cache
    and cause random corruption. Once a PTE is cleared, there is no way
    to find the physical address associated with the PTE and flush the
    associated page at a later time.

    I implemented an updated change with a parisc specific version of
    ptep_clear_flush. It fixed the random data corruption on Helge’s rp4440
    and rp3440, as well as on my c8000.

    At this point, I realized that I could restore the code where we only
    flush in flush_cache_page_if_present if the page has been accessed.
    However, for this, we also need to flush the cache when the accessed
    bit is cleared in
    —truncated—
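
Two of the rules described above can be illustrated with a small userspace toy model: purge the TLB entry before flushing the page, and flush the cache page at the moment the PTE is cleared, because the physical address cannot be recovered afterwards. This is only a sketch under simplifying assumptions, not the kernel patch. Every name in it (struct toy_mmu and the toy_* functions) is hypothetical. It tracks one PTE, one TLB entry and one "lines resident" flag per page, and it models speculative move-in as a worst-case refill that happens between the two steps of a flush routine.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 4 /* virtual pages and physical frames in the toy model */

/* Toy model only: none of these names exist in the kernel. */
struct toy_mmu {
    bool pte_present[TOY_NPAGES]; /* PTE for virtual page v is valid */
    unsigned pte_pfn[TOY_NPAGES]; /* frame recorded in the PTE */
    bool tlb_valid[TOY_NPAGES];   /* TLB entry that permits move-in */
    unsigned tlb_pfn[TOY_NPAGES]; /* frame recorded in the TLB entry */
    bool cached[TOY_NPAGES];      /* indexed by frame: lines are resident */
};

/* Install a mapping v -> pfn in both the PTE and the TLB. */
static void toy_map(struct toy_mmu *m, unsigned v, unsigned pfn)
{
    assert(v < TOY_NPAGES && pfn < TOY_NPAGES);
    m->pte_present[v] = true;
    m->pte_pfn[v] = pfn;
    m->tlb_valid[v] = true;
    m->tlb_pfn[v] = pfn;
}

/* Speculative move-in: possible whenever a TLB entry exists. */
static void toy_speculate(struct toy_mmu *m, unsigned v)
{
    if (m->tlb_valid[v])
        m->cached[m->tlb_pfn[v]] = true;
}

/* Old order: flush first, purge afterwards. A move-in can slip in between. */
static void toy_flush_then_purge(struct toy_mmu *m, unsigned v)
{
    m->cached[m->pte_pfn[v]] = false; /* flush */
    toy_speculate(m, v);              /* worst case: lines are refilled */
    m->tlb_valid[v] = false;          /* purge, too late */
}

/* New order: purge first, so nothing can refill the cache before the flush. */
static void toy_purge_then_flush(struct toy_mmu *m, unsigned v)
{
    m->tlb_valid[v] = false;          /* purge */
    toy_speculate(m, v);              /* no TLB entry, so no move-in */
    m->cached[m->pte_pfn[v]] = false; /* flush */
}

/* A later flush attempt works only while the PTE still names a frame. */
static bool toy_flush_if_present(struct toy_mmu *m, unsigned v)
{
    if (!m->pte_present[v])
        return false;                 /* frame unknown, nothing flushed */
    toy_purge_then_flush(m, v);
    return true;
}

/* Generic-style clear: drops the PTE and TLB entry, leaves the cache alone. */
static void toy_clear_tlb_only(struct toy_mmu *m, unsigned v)
{
    m->pte_present[v] = false;
    m->tlb_valid[v] = false;
}

/* Clear that also flushes, while the frame number is still known. */
static void toy_clear_and_flush(struct toy_mmu *m, unsigned v)
{
    unsigned pfn = m->pte_pfn[v];     /* read the frame before clearing */
    m->pte_present[v] = false;
    m->tlb_valid[v] = false;
    m->cached[pfn] = false;
}
```

In this model, toy_flush_then_purge leaves the frame's lines resident, while toy_purge_then_flush does not. Likewise, after toy_clear_tlb_only a later toy_flush_if_present finds no PTE and cannot flush, whereas toy_clear_and_flush leaves nothing stale. Real hardware is far less deterministic. The model only makes the ordering argument concrete.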
