Talos Vulnerability Report


llama.cpp GGUF library header.n_tensors heap-based buffer overflow vulnerability

February 26, 2024
CVE Number



A heap-based buffer overflow vulnerability exists in the GGUF library header.n_tensors functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.


The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

llama.cpp Commit 18c2e17


llama.cpp - https://github.com/ggerganov/llama.cpp


8.8 - CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H


CWE-190 - Integer Overflow or Wraparound


llama.cpp is a C/C++ implementation for running LLaMA-family models. The project relies on the ggml library, a tensor library that provides the core machine-learning functionality.

The llama.cpp project, among many others, relies on the GGUF file format, a popular format for storing LLM model representations. In this library, the function that parses a .gguf file is gguf_init_from_file:

struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params) {
    FILE * file = fopen(fname, "rb");
    if (!file) {
        return NULL;
    }

    struct gguf_context * ctx = GGML_ALIGNED_MALLOC(sizeof(struct gguf_context));

    // read the header
    {
        ctx->kv    = NULL;
        ctx->infos = NULL;
        ctx->data  = NULL;

        ok = ok && gguf_fread_el(file, &ctx->header.version,   sizeof(ctx->header.version),   &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_tensors, sizeof(ctx->header.n_tensors), &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_kv,      sizeof(ctx->header.n_kv),      &offset);

        [...]
    }

    [...]

    // read the tensor infos
    {
[1]     ctx->infos = malloc(ctx->header.n_tensors * sizeof(struct gguf_tensor_info));

        for (uint64_t i = 0; i < ctx->header.n_tensors; ++i) {
[2]         struct gguf_tensor_info * info = &ctx->infos[i];

            for (int j = 0; j < GGML_MAX_DIMS; ++j) {
                info->ne[j] = 1;
            }

[3]         ok = ok && gguf_fread_str(file, &info->name,                          &offset);
            ok = ok && gguf_fread_el (file, &info->n_dims, sizeof(info->n_dims),  &offset);
            for (uint32_t j = 0; j < info->n_dims; ++j) {
                ok = ok && gguf_fread_el(file, &info->ne[j], sizeof(info->ne[j]), &offset);
            }
            ok = ok && gguf_fread_el (file, &info->type,   sizeof(info->type),    &offset);
            ok = ok && gguf_fread_el (file, &info->offset, sizeof(info->offset),  &offset);

            [...]
        }
    }

    [...]
}

We will focus on the tensor-parsing code. At [1], ctx->header.n_tensors, a value parsed directly from the provided file, is multiplied by sizeof(struct gguf_tensor_info) (88 bytes) to size the allocation for the tensor-info array. The loop then reads, for each gguf_tensor_info element, its name, the number of entries in the info->ne array, and other information from the file. Because ctx->header.n_tensors is an arbitrary uint64_t, the multiplication at [1] can overflow, resulting in an allocation that holds fewer elements than required. At [2], the i-th element of the undersized array is obtained, and at [3] it is written: gguf_fread_str stores a pointer to the parsed string into info->name, past the end of the allocation, leading to a heap-based buffer overflow.

Crash Information

==3991806==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60700000d990 at pc 0x55c78bdf3a8c bp 0x7ffca35f3020 sp 0x7ffca35f3018
WRITE of size 8 at 0x60700000d990 thread T0
    #0 0x55c78bdf3a8b in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18827
    #1 0x55c78be927e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #2 0x55c78be3e592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #3 0x55c78be5d355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #4 0x55c78bf7c1b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #5 0x55c78bd288b1 in main examples/main/main.cpp:187
    #6 0x7f3d78c0fd09 in __libc_start_main ../csu/libc-start.c:308
    #7 0x55c78bd22f49 in _start (/home/vagrant/llama.cpp/main+0x41f49)

Address 0x60700000d990 is a wild pointer.
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/vagrant/llama.cpp/ggml.c:18827 in gguf_init_from_file
Shadow bytes around the buggy address:
  0x0c0e7fff9ae0: 00 00 00 00 00 00 00 fa fa fa fa fa 00 00 00 00
  0x0c0e7fff9af0: 00 00 00 00 00 fa fa fa fa fa 00 00 00 00 00 00
  0x0c0e7fff9b00: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0e7fff9b10: 00 fa fa fa fa fa 00 00 00 00 00 00 00 00 03 fa
  0x0c0e7fff9b20: fa fa fa fa 00 00 00 00 00 00 00 00 00 fa fa fa
=>0x0c0e7fff9b30: fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0e7fff9b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0e7fff9b50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0e7fff9b60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0e7fff9b70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0e7fff9b80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc

Databricks independently reported this vulnerability concurrently with our discovery.

We did not receive a response from the vendor; however, we have confirmed that this vulnerability has been fixed.


2024-01-29 - Initial Vendor Contact
2024-01-29 - Vendor Patch Release
2024-01-30 - Vendor Disclosure
2024-02-26 - Public Release


Discovered by Francesco Benvenuto of Cisco Talos.