Hardware-based Always-On Heap Memory Safety
- Hide Paper Summary
Link: https://ieeexplore.ieee.org/document/9251969
Year: MICRO 2020
Keyword: AOS; Allocator; malloc; Memory Safety
Highlight:
-
Memory bounds metadata can be embedded in the higher unused bits of a pointer. The metadata can then be used as an index into a table that stores the bounds information.
-
Memory accesses can be validated in the runtime by a special hardware component. This process can be overlapped with regular execution to avoid executing excessive instructions on the CPU.
Comments:
-
Are entries removed from the HBT? It seems that the HBT can only grow during execution, but never shrink. The reason is that bounds information must remain in the HBT even after the pointers are freed to detect use-after-free. I understand that malloc() tends to reuse pointers and give priority to recently freed pointers when giving out allocations for better locality, so this design choice may not be a major issue. However, it is also possible that certain pointers that have been allocated will never be reused, e.g., if the malloc library returns the page containing the pointer back to the OS. In this scenario, is there a way to delete entries that are known to be not part of the heap?
-
The size of HBT grows with the number of blocks within the heap, and so does the overhead of validating pointers since search operations on the HBT is not free. It would be nice to have a sensitivity analysis of the effect of the heap size w.r.t. the number of pointers. Also, I am also eager to see the result on HBT size during the course of execution.
This paper proposes Always-On Memory Safety (AOS), a software-hardware mechanism for ensuring memory access integrity by embedding bound information in unused bits of virtual address pointers. The design is motivated by the existing memory safety mechanism in ARM ISA and further extends the ISA to support integrity check of heap memory accesses. Compared with prior works, AOS minimizes the overhead of memory safety checking by performing them on hardware and by using relatively simpler addressing schemes for memory bound metadata.
The paper is motivated by the increasingly obvious trend that heap attacks have become the mainstream of memory safety attack due to the effectiveness of protection methods against stack-based attacks. In particular, the paper named three heap-based attacks that are rapidly gaining popularity, i.e., heap corruption, out-of-bounds read, and use-after-free. The paper also noted that prior hardware proposals for solving the problem are unsatisfactory for five reasons. First, many prior works require extending registers that hold pointer values with extra metadata bits, hence forming what is called “fat pointers”. The proposed design will, however, incur radical changes to the core pipeline and increase power consumption of the processor. Secondly, most prior works also require an explicit bound checks instruction before every memory operation, which unfortunately causes an non-negligible increase in the number of instruction executed and brings large instruction overhead even if the checks are themselves very fast. Similarly, when performing pointer arithmetics and assignments, these proposals also need explicit instructions to propagate the per-pointer metadata, which further exacerbate the design’s runtime overhead. The fourth reason is high memory overhead, which is especially true if the design allocates shadow memory for every possible memory location in the address space. Lastly, prior works often introduce complicated schemes to address runtime metadata that is essential for performing memory checks. The addressing scheme can potentially introduce a considerable number of extra operations which may slow down execution.
The design of AOS overcomes the above five challenges by embedding the metadata in higher unused bits of pointers and then directly use the bits as an index into a set-associative lookup table in the main memory. AOS is based on an existing feature provided by ARM ISA, namely, the Pointer Authentication (PA) instructions. In the current PA design, a pointer value is embedded with a Pointer Authentication Code (PAC) computed from the pointer value itself plus a context value (which is usually the stack pointer). The PAC, once computed, is stored in higher bits of the pointer, which is then validated by recomputing the PAC using the pointer value plus the context value at dereference time. If the two values match, validation succeeds, and the memory operation is performed normally. Otherwise, validation fails, and the processor raises an exception to notify the OS that an illegal memory access has been detected.
The design of AOS follows a similar approach which we describe as follows. First, in AOS, the unused higher bits of a pointer value is also used to store a PAC computed from the pointer value and a context value. The computed PAC is then treated as an index into an in-memory table, the Hashed Bounds Table (HBT), which stores the bounds of heap allocated blocks. The content of the HBT entry that corresponds to a newly allocated pointer is initialized by software using AOS instructions after malloc() returns. The same entry is also cleared when the pointer is freed by library function free(). At compilation time, the compiler detects invocations to malloc() and free() and inserts AOS instructions around these two operations to update the HBT. During the runtime, the hardware validates the heap memory access by first retrieving bounds metadata from the HBT using the PAC embedded into the pointer as an index and then comparing the requested address of the access against the memory bounds.
We next describe the implementation level details of each AOS component. The most important structure of AOS is the HBT, which is essentially a set-associative lookup table that stores bounds information of allocated blocks. The HBT can be updated using instructions bndstr and bndclr for setting and clearing the bounds information, respectively. Bounds information of a block is stored as a 8-byte entry containing the lower bits of the base virtual address (bit 4 to 32) and the 32-bit size. Note that only storing the lower bits of the base instead of the full address may result in false positives when validating a memory access due to aliasing on the higher bits not stored. The paper suggested, however, that this case is extremely unlikely because addresses that alias with each other must be 4GB away, making them hardly useful for conducting attacks. The HBT consists of a power-of-two number of sets, and the number of sets can be expanded dynamically in the runtime. When an emory access is to be validated, the HBT is addressed using the PAC of the pointer as the set index, and then the hardware walks every entry within the set to search for a matching base address (by comparing the pointer of the access and the partial base address stored in the entry). If a match is found and the access lies within the bound, then the access is successfully validated. Otherwise, the hardware raises a fault to the processor as in the current design. To make it easier for hardware to walk the table and to preserve locality, entries of the same set are stored in consecutive cache blocks. In addition, when a new bound entry is to be inserted into the HBT, but no empty slot can be found, the hardware also raises a fault to the OS kernel, such that the kernel can allocate a large HBT (by doubling the number of ways) and copying existing entries of the table to the new table. The paper suggests that the resizing process can be performed in parallel with regular execution by using a pointer that indicates the process of the copy (similar to the resizing process of a Cuckoo page table). When a pointer is freed, the entry that corresponds to the pointer is cleared using bndclr such that the base address remains unchanged, but the size field is set to zero. The entry can detect use-after-free attack if the same pointer is dereference, since the size field is zero, meaning that any access using the pointer will incur a protection fault to the OS.
The second critical component of AOS is the hardware component that performs memory access validation. When a pointer is returned from the malloc() function, the compiler will issue a pacma/pacmb (which are used for data and instruction, respectively) instruction, which, when executed, computes a PAC value and embed the value on higher bits of the virtual pointer. The PAC value is computed based on the pointer value itself, the allocation size, and a context value. The HBT is also updated using the PAC value and the bndstr instruction with the base address and the allocation size. Then, during regular execution, when a pointer whose higher bits contain a PAC value is used to access memory, the memory instruction is inserted into a dedicated hardware queue, called the Memory Check Queue (MCQ), whose responsibility is to store the information that are essential to validate the access. When a memory access is in the MCQ, a hardware state machine, called the Memory Check Unit (MCU), will then issue memory operations to the cache hierarchy to retrieve bounds information from the HBT. After bounds information is available, the MCU performs the check by comparing the access offset against the size of the allocation stored in the HBT. A fault is raised to the kernel if the validation fails. Memory operations are prevented from being retired in the ROB before their validations complete. The MCQ is also designed to obey the memory consistency order as well as to perform data forwarding if necessary.
In order to reduce the number of memory accesses for retrieving the bounds information, the paper also proposes adding a special cache that holds the most recently used bounds information. The cache is called bounds way buffer (BWB), and it maps the 32-bit partial address (i.e., the one stored in HBT entries) to the last accessed way number of the tag. Note that BWB lookups do not always compare the full 32 bits of the requested address, since the pointer to be validated may point to the middle of an allocated block. In this case, the lowest few bits of the pointer should be masked off to avoid spurious misses in the BWB. To serve this purpose, in the AOS design, the higher bits of a pointer also encodes the size class of the allocation, called the Address Hashing Code (AHC). A total of three size classes are supported, namely, 1–64 bytes, 65–256 bytes, and greater than 256 bytes, and hence two bits are sufficient for encoding size class information. When performing an BWB lookup, the hardware will mask off the lower bits based on the AHC bits, before the lookup is performed.