Talos Vulnerability Report

TALOS-2021-1343

Microsoft Azure Sphere Security Monitor SMSyscallStageBaseManifests offset calculation out-of-bounds read vulnerability

November 9, 2021
CVE Number

CVE-2021-41376

Summary

An out-of-bounds read vulnerability exists in the Security Monitor SMSyscallStageBaseManifests offset calculation of Microsoft Azure Sphere 21.01. A specially crafted manifest could lead to information disclosure. An attacker can use syscalls to trigger this vulnerability.

Tested Versions

Microsoft Azure Sphere 21.01

Product URLs

https://azure.microsoft.com/en-us/services/azure-sphere/

CVSSv3 Score

2.3 - CVSS:3.0/AV:L/AC:L/PR:H/UI:N/S:U/C:L/I:N/A:N

CWE

CWE-119 - Improper Restriction of Operations within the Bounds of a Memory Buffer

Details

Microsoft’s Azure Sphere is a platform for the development of internet-of-things applications. It features a custom SoC that consists of a set of cores that run both high-level and real-time applications, enforces security and manages encryption (among other functions). The high-level applications execute on a custom Linux-based OS, with several modifications to make it smaller and more secure, specifically for IoT applications.

Processes with AZURE_SPHERE_CAP_* are allowed to interact with Pluton and Security Monitor, but only via the syscalls that they are allowed to access. For instance, when a user holds the AZURE_SPHERE_CAP_UPDATE_IMAGE capability, they are allowed to use the following Secmon syscalls:

static azure_sphere_sm_syscall_permission_t azure_sphere_sm_syscall_required_capabilities[] = {
    {.number = SMSyscallInvalidateImage, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallOpenImageForStaging, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallWriteBlockToStageImage, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallCommitImageStaging, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallAbortImageStaging, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallInstallStagedImages, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetComponentCount, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetComponentSummary, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallStageComponentManifests, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetCountOfMissingImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetMissingImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallStageBaseManifests, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetCountOfMissingBaseImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetMissingBaseImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetSoftwareRollbackInfo, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    // ...
};

This advisory deals with the SMSyscallStageBaseManifests syscall, but it’s important to stress that an attacker would already need to have elevated privileges or to have gained AZURE_SPHERE_CAP_UPDATE_IMAGE.

To start, let us examine the parameters that SMSyscallStageBaseManifests requires:

struct azure_sphere_syscall syscall = {};
syscall.number = SMSyscallStageBaseManifests;
syscall.flags = 0x454;
syscall.args[0] = offset;
syscall.args[1] = manifest_buffer;
syscall.args[2] = manifest_length;

The manifest_buffer must point to valid userspace memory, while manifest_length and offset are plain integers without any specific restriction. When the kernel passes this call to Secmon, the only restriction that applies is that the buffer (manifest_length) must be smaller than 0x1060 bytes, but that is a generic aspect of Security Monitor syscalls.

Let’s see how the SMSyscallStageBaseManifests function is implemented in Secmon:

uint32_t SMSyscallStageBaseManifests(int offset, char *buffer, uint manifest_len) {
    int some_global;
    uint32_t ret;

    some_global = get_global_image_struct();
    ret = FUN_803e2020(*(undefined4 **)(some_global + 0x14), (int *)(buffer + offset), // [0]
                     manifest_len);
    return ret;
}

This simply calls FUN_803e2020, passing a global structure, a pointer to an offset inside the input buffer (which should point to the manifest), and the length of the manifest.
Note, however, that the offset is added to the buffer pointer without any check [0]. Normally, Secmon’s syscall handler makes sure that a length argument following a pointer is smaller than or equal to the size of the buffer that pointer refers to. In this case, however, since the offset parameter comes before the buffer pointer, no such check is performed beforehand. This means that any value can be passed as offset, and therefore an arbitrary pointer can be passed to FUN_803e2020 as its second parameter. The last parameter, manifest_len, is restricted to about 0x1060 bytes because of Secmon’s syscall size limits.
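To make the missing check concrete, the sketch below contrasts the unchecked pointer arithmetic at [0] with the kind of range check that the generic handler applies to length-after-pointer arguments. It models addresses as 32-bit integers; the function names and the buffer_size parameter are hypothetical and this is not Secmon code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of [0]: the signed offset is added to the buffer
 * address with no check, so the result can wrap anywhere in the 32-bit
 * address space. */
static uint32_t unchecked_manifest_ptr(uint32_t buffer, int32_t offset)
{
    return buffer + (uint32_t)offset;  /* unsigned wraparound, no bounds */
}

/* Hypothetical fix: accept the offset only if
 * [buffer + offset, buffer + offset + manifest_len) stays inside the
 * input buffer that the syscall handler copied. */
static bool manifest_range_ok(uint32_t buffer_size, int32_t offset,
                              uint32_t manifest_len)
{
    if (offset < 0 || (uint32_t)offset > buffer_size)
        return false;
    return manifest_len <= buffer_size - (uint32_t)offset;
}
```

With the unchecked form, a negative offset such as -0x10000 simply moves the manifest pointer 64 KiB below the input buffer; the checked form rejects it.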

Since the buffer position can be controlled arbitrarily, Secmon can be made to read out of bounds. This issue might be exploited to provide an information leak, using the function’s outputs as an oracle (return value, execution time, image entries added to the global structure).

To give an example of a possible exploitation path, let’s continue with FUN_803e2020. It passes the manifest buffer to FUN_803e1d34, which parses and validates the manifest. If this succeeds, FUN_803e2020 fills a global structure that keeps track of which images are due to be flashed via other syscalls: for each entry in the manifest, a 0x24-byte object is allocated, containing the component ID, image ID, and image type, copied directly from the manifest buffer. An attacker who manages to add arbitrary data to these objects can later retrieve it via the syscall SMSyscallGetMissingBaseImagesToDownload, which lists all component IDs of the entries correctly staged by the base manifest.

However, the manifest needs to pass some sanity checks in order to be parsed. Let’s look into FUN_803e1d34, the function that parses and validates the manifest:

int FUN_803e1d34(int *manifest_buf,uint manifest_len,undefined *param_3,char *param_4,
                 manifest_entry_list *mentry,undefined4 someglobal) {
    // ... multiple manifests logic ... // [1]
    manifest_counter = 0;
    while( true ) {
        if (manifest_count == manifest_counter) { // [2]
            return 0;
        }
        if (manifest_len2 < 4) {
            return DAT_803e1ff8;
        }
        if (*(short *)manifest_buf == 0) {
            return DAT_803e1ff8;
        }
        manifest_parse_header(&manifest,(ushort *)manifest_buf); // [3]
        if ((char)manifest.ok == '\0') {
            return DAT_803e1ff8;
        }
        if (manifest_len2 < manifest.header_size) {
            return DAT_803e1ff8;
        }
        if (manifest.entry_size < 0x28) {
            return DAT_803e1ff8;
        }
        num_objs = (uint)*(ushort *)((int)manifest_buf + 2);
        if ((manifest_len2 - manifest.header_size) / manifest.entry_size < num_objs) {
            return DAT_803e1ff8;
        }
        uVar5 = num_objs * manifest.entry_size + manifest.header_size;
        if (uVar5 == 0) break;
        if (manifest_len2 < uVar5) {
            return DAT_803e1ff8;
        }
        uVar7 = 0x803e1e85;
        manifest_parse_header(&manifest,(ushort *)manifest_buf); // [4]
        if ((char)manifest.ok == '\0') {
            panic(DAT_803e200c,DAT_803e2008,0);
        }
        ...

At the beginning [1] there is logic that deals with multiple manifests. Suffice it to say that it sets the manifest_count and manifest_counter variables accordingly; let’s assume that we’re dealing with only one manifest.
At [2] a series of sanity checks begins, making sure that the manifest fields are correct. In summary, the checks are the following (len is manifest_len2, nobj is num_objs):

- len >= 4
- len >= headersize
- entrysize >= 0x28 (40)
- nobj <= (len - headersize) / entrysize
- nobj * entrysize + headersize != 0
- nobj * entrysize + headersize <= len
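The checks above can be collapsed into a single predicate. The following is a sketch that mirrors the order of the decompiled code between [2] and [4]; the function and parameter names are ours, and we assume all arithmetic is unsigned 32-bit, as in the decompilation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the checks in FUN_803e1d34 that run before the
 * second manifest_parse_header() call at [4].
 * len = manifest_len2, hdr = header_size, esz = entry_size,
 * nobj = num_objs, ok = result of the first parse at [3]. */
static bool parent_checks_pass(uint32_t len, uint16_t version, bool ok,
                               uint32_t hdr, uint32_t esz, uint32_t nobj)
{
    if (len < 4)                  return false;
    if (version == 0)             return false;  /* first u16 must be non-zero */
    if (!ok)                      return false;
    if (len < hdr)                return false;
    if (esz < 0x28)               return false;
    if ((len - hdr) / esz < nobj) return false;
    uint32_t total = nobj * esz + hdr;           /* may wrap, as in the original */
    if (total == 0)               return false;  /* break, then error return */
    if (len < total)              return false;
    return true;
}
```

A well-formed single-entry version 3 manifest (0x10-byte header, one 0x4c-byte entry, len 0x5c) passes, while the same buffer claiming two entries does not.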

Note that even if an attacker controls the manifest pointer, pointing it at random memory will most likely make these checks fail. There is, however, a way to bypass them.
Note the calls to manifest_parse_header at [3] and at [4]. This function checks the header of the manifest (its first bytes). It is not clear to us why it is called twice. All the checks are performed before the second call to manifest_parse_header; after this call, the only check performed is that the ok field is not 0, otherwise the device panics and reboots. To understand how this field is set, let’s have a look at manifest_parse_header:

manifest * manifest_parse_header(manifest *dst,ushort *manifest_hdr) {
    uint uVar1;
    ushort version;
    uint uVar2;
    uint *puVar3;
    ushort entry_size;
    
    version = *manifest_hdr;
    if (version == 0) {
        dst->header_size = 0;
        dst->entry_size = 0;
    }
    else {
        puVar3 = (uint *)PTR_DWORD_803d6c60;
        if (version == 2) {
LAB_803d6c20:
            uVar1 = puVar3[1];
            uVar2 = puVar3[2];
            dst->header_size = *puVar3;
            dst->entry_size = uVar1;
            dst->ok = uVar2;
            return dst;
        }
        if (version == 3) {                                                   // [5]
            dst->header_size = (uint)manifest_hdr[2];
            dst->entry_size = (uint)manifest_hdr[3];
            version = (ushort)(*(int *)(manifest_hdr + 2) == DAT_803d6c64);   // [6] DAT_803d6c64 points to 10004c00
        }
        else {
            puVar3 = (uint *)PTR_DWORD_803d6c5c;
            if (version == 1) goto LAB_803d6c20;
            version = manifest_hdr[2];                                        // [7]
            entry_size = manifest_hdr[3];
            dst->header_size = (uint)version;
            dst->entry_size = (uint)entry_size;
            if (version < 0x10) {                                             // [8]
                version = 0;
            }
            else {
                version = (ushort)(0x4b < entry_size);                        // [9]
            }
        }
    }
    *(char *)&dst->ok = (char)version;
    return dst;
}

There are multiple allowed versions of a manifest header. The one we’ve seen used in the “base_manifest” binaries is version 3 [5].
The header of a base manifest is very simple:

uint16 version     // manifest version
uint16 num_entries // number of entries in the base manifest
uint16 header_size // size of header
uint16 entry_size  // size of entries
uint64 date        // date, ignore this
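The same layout can be written as a C struct. This is a sketch based only on the field list above; the struct and helper names are ours. It also shows that the header as listed is 16 bytes long, consistent with the header_size of 0x10 that version 3 requires.

```c
#include <stdint.h>

/* Base manifest header, fields as listed above (little-endian on disk). */
struct base_manifest_header {
    uint16_t version;      /* manifest version */
    uint16_t num_entries;  /* number of entries in the base manifest */
    uint16_t header_size;  /* size of header */
    uint16_t entry_size;   /* size of entries */
    uint64_t date;         /* date, ignored */
};

_Static_assert(sizeof(struct base_manifest_header) == 0x10,
               "base manifest header is 16 bytes");

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the header from raw bytes independently of host endianness. */
static struct base_manifest_header decode_header(const uint8_t raw[16])
{
    struct base_manifest_header h;
    h.version     = le16(raw + 0);
    h.num_entries = le16(raw + 2);
    h.header_size = le16(raw + 4);
    h.entry_size  = le16(raw + 6);
    uint64_t d = 0;
    for (int i = 7; i >= 0; i--)
        d = (d << 8) | raw[8 + i];   /* little-endian 64-bit date */
    h.date = d;
    return h;
}
```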

Depending on the version, different constraints are applied to header_size and entry_size, and there are no checks on num_entries. Version 0 is invalid; versions 1 and 2 have hardcoded values for header_size and entry_size. Version 3 reads the sizes from the manifest header, but then checks that their bytes match “10004c00” [6], that is, header_size must be 0x10 and entry_size must be 0x4c, otherwise ok is set to 0.
An additional path [7] allows arbitrarily large header_size and entry_size values: it’s enough to use any version larger than 3. At [7] we can see that the only constraints on this path are header_size >= 0x10 [8] and entry_size > 0x4b [9]. If we respect these constraints, ok is set to 1:

- ver > 3
- headersize >= 0x10
- entrysize > 0x4b
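The rules above for the ok field can be sketched as follows. This follows the decompiled manifest_parse_header; the function name is ours, and because the hardcoded size tables for versions 1 and 2 are not shown, those versions are reported as “not modelled” rather than guessed.

```c
#include <stdint.h>

/* Returns 1 if manifest_parse_header() would set ok to a non-zero value,
 * 0 if it would set ok to 0, and -1 for versions 1 and 2, whose values come
 * from tables (PTR_DWORD_803d6c5c / PTR_DWORD_803d6c60) not shown here. */
static int header_ok(const uint8_t hdr[8])
{
    uint16_t version     = (uint16_t)(hdr[0] | (hdr[1] << 8));
    uint16_t header_size = (uint16_t)(hdr[4] | (hdr[5] << 8));
    uint16_t entry_size  = (uint16_t)(hdr[6] | (hdr[7] << 8));

    if (version == 0)
        return 0;                  /* version 0 is invalid */
    if (version == 1 || version == 2)
        return -1;                 /* hardcoded sizes, not modelled */
    if (version == 3) {
        /* [6]: the 32-bit little-endian word at byte offset 4 must be the
         * byte sequence 10 00 4c 00, i.e. header_size 0x10, entry_size 0x4c. */
        uint32_t word = (uint32_t)header_size | ((uint32_t)entry_size << 16);
        return word == 0x004c0010u;
    }
    /* [7]-[9]: any version above 3 only needs minimum sizes. */
    return header_size >= 0x10 && entry_size > 0x4b;
}
```

For instance, header_size=0x500 with entry_size=0x50 is rejected under version 3 but accepted under version 4.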

For example, consider a manifest with a version larger than 3 (e.g. version=4), header_size=0x500, entry_size=0x50 and num_obj=1. This passes the checks in manifest_parse_header, but not those in the parent function: it is already rejected by the manifest_len2 < manifest.header_size check.
However, since manifest_parse_header is called twice, the manifest can contain a valid header initially and have its contents changed during the syscall execution, right before the second call to manifest_parse_header. This would allow arbitrarily large header_size and entry_size values to be used.
Normally, when a Secmon syscall is executed from the normal world (Linux kernel), this is done via the smc instruction. This is a synchronous call into Secmon’s SMC handler, so from the Linux side it isn’t possible to race a Secmon syscall. However, Azure Sphere has two M4 cores, which run their code in parallel with Secmon syscalls.
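The double fetch can be illustrated with a deterministic simulation. In this sketch, the callback stands in for the real-time core rewriting the shared DMA memory between [3] and [4]; on hardware this is a race, not a callback. All names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the M4 core writing to the shared header concurrently. */
typedef void (*concurrent_writer_fn)(volatile uint16_t *hdr);

struct fetch_result {
    uint16_t checked_header_size;  /* value seen by the sanity checks */
    uint16_t used_header_size;     /* value seen by the second parse */
};

/* hdr points at the manifest header viewed as u16 words; hdr[2] is
 * header_size. The first read feeds the checks, the second is trusted. */
static struct fetch_result double_fetch(volatile uint16_t *hdr,
                                        concurrent_writer_fn writer)
{
    struct fetch_result r;
    r.checked_header_size = hdr[2];  /* first fetch, as at [3] */
    if (writer)
        writer(hdr);                 /* window between [3] and [4] */
    r.used_header_size = hdr[2];     /* second fetch, as at [4] */
    return r;
}

/* Hypothetical real-time app behaviour: swap in a large header_size. */
static void rt_core_writer(volatile uint16_t *hdr)
{
    hdr[2] = 0x0500;
}
```

Because the sanity checks only ever see the first value, nothing after [4] notices that header_size has grown.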
In summary, to change the manifest header in the middle of the two manifest_parse_header calls, it’s enough to:

  1. find the manifest input buffer, as stored by Secmon’s syscall handler:
    use SMSyscallStageBaseManifests with incrementing (or decrementing) offsets until the device crashes.
    The input buffer is stored after Secmon’s .text, and we know that the first unallocated memory bounds are below 0x80000000 and above 0x80400000. When the device crashes, we can calculate exactly where the manifest is stored.
  2. stage the manifest in DMA memory:
    allocate a DMA buffer, and use an offset such that the manifest lands in the allocated DMA buffer (the offset is calculated from the address found in the step above).
  3. load a real-time app and share the DMA buffer that contains the manifest with the Linux side.
  4. exploit the TOCTTOU: simultaneously execute a SMSyscallStageBaseManifests call from the Linux side, and instruct the real-time app to continuously write header_size and/or entry_size in the manifest header (switching between valid and invalid values).
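The arithmetic behind step 2 amounts to a wrapping subtraction. The sketch below assumes both addresses are already known (the Secmon input buffer from step 1, the DMA buffer from the allocation); the example addresses in the test are hypothetical, not values observed on a device.

```c
#include <stdint.h>

/* Offset that makes buffer + offset (computed modulo 2^32, as at [0])
 * land on the DMA buffer. Reinterpreted as the syscall's int argument,
 * a target below the input buffer yields a negative offset. */
static uint32_t offset_for_target(uint32_t secmon_input_buffer,
                                  uint32_t dma_buffer)
{
    return dma_buffer - secmon_input_buffer;  /* wraps like the original */
}
```

For a hypothetical input buffer at 0x80412000 and DMA buffer at 0x80010000, the offset is 0xffbfe000 (i.e. -0x402000 as a signed int).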

This allows execution to continue in FUN_803e1d34, passing all the sanity checks against the header and landing at [10].

        ...
        uVar7 = 0x803e1e85;
        manifest_parse_header(&manifest,(ushort *)manifest_buf); // [4]
        if ((char)manifest.ok == '\0') {
            panic(DAT_803e200c,DAT_803e2008,0);
        }
        // [10]
        ...

To prove this theory, we implemented the steps described above, however using an invalid header_size (smaller than 0x10), and we managed to read the following in telemetry:

00000000  00 00 2f 00 00 00 00 00  00 00 00 00 ff ff ff ff  |../.............|
00000010  ff ff ff ff 40 01 08 00  00 00 23 00 01 00 00 00  |....@.....#.....|
00000020  1c 74 5a 71 dc 72 0c 57  e3 00 00 00 00 3e 18 eb  |.tZq.r.W.....>..|
00000030  0e 54 74 30 43 bc 20 90  83 cd de a4 b4 c0 03 01  |.Tt0C. .........|
00000040  00 00 00                                          |...             |

This indicates a Secmon panic, and the error code 0xdc715a74:0xe3570c72 indicates that the panic happened at 0x803e1e90, that is the panic call after [4]. This means we originally had a valid header_size that was later successfully changed into an invalid one via the real-time app.
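The two halves of the error code can be located in the dump above as little-endian 32-bit words at offsets 0x21 and 0x25. The sketch below shows this; the offsets are read off this particular dump rather than taken from a documented telemetry format, and the mapping from the error code to 0x803e1e90 is not reproduced here.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First 0x30 bytes of the panic telemetry above, copied verbatim. */
static const uint8_t telemetry_panic[0x30] = {
    0x00, 0x00, 0x2f, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0x40, 0x01, 0x08, 0x00,
    0x00, 0x00, 0x23, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x1c, 0x74, 0x5a, 0x71, 0xdc, 0x72, 0x0c, 0x57,
    0xe3, 0x00, 0x00, 0x00, 0x00, 0x3e, 0x18, 0xeb,
};
```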

        ...
        uVar7 = 0x803e1e85;
        manifest_parse_header(&manifest,(ushort *)manifest_buf); // [4]
        if ((char)manifest.ok == '\0') {
            panic(DAT_803e200c,DAT_803e2008,0);
        }
        // [10]
        manifest._0_8_ = VectorShiftRight(CONCAT44(uVar7,uVar7),0x20);
        manifest.ok = (uint)PTR_FUN_803d37f0+1_803e2010;
        CreateImageMetadataParser(&aStack144,&manifest,manifest_len2); // [11]
        local_fc = 0;
        uVar5 = FindMetadataSection(&aStack144,0x4449,&meta_sect,0x24,&local_fc); // [12]
        if (((uVar5 & 0xfffff) == 0) && ((*param_4 == '\0' || (meta_sect.img_type != 0x19)))) { // [13]
            dummy = 0;
            if (meta_sect.img_type - 0x17 < 2) { // [14]
                local_f8 = 0;
                local_f0 = 0;
                local_e8 = 0;
                local_e0 = 0;
                manifest._0_8_ = 0;
                manifest.ok = 0;
                VectorShiftRight(CONCAT44(&local_e8,&local_e8),0x30);
                ptr_entry = validate_image(someglobal,&manifest,2,0,0,&dummy); // [15]
                if (ptr_entry == 0) {
                    print(PTR_s__Invalid_Manifest_Signature_803e201c);
                    return DAT_803e1ff8;
                }
            }
            ptr_entry = manifest.header_size + (int)manifest_buf;
            ptr_end = (manifest.entry_size & 0xffff) * num_objs + ptr_entry;
            while (ptr_end != ptr_entry) { // [16]
                // ... fill linked list
            }
        }
        manifest_counter = manifest_counter + 1;
        manifest_buf = (int *)((int)manifest_buf + manifest_len2);
        if ((int)manifest_counter < (int)manifest_count) {
            manifest_len2 = (uint)*(ushort *)((int)local_d8 + manifest_counter * 2 + 0xe);
        }
    }
    return DAT_803e1ff8;
}

Once at [10], CreateImageMetadataParser [11] is used to initialize parsing of the image metadata: this is the usual metadata found in Azure Sphere images, beginning with the “4X4M” magic. At [12] the section 0x4449 is searched for: this is the “Identity” section, where an image stores its image type, component ID and image ID. At [13] there is a check against the image type: if the section lookup fails, or if the type is 0x19 and *param_4 is non-zero, the whole manifest is skipped. Only if the type is smaller than 0x19 [14] is the manifest image verified by Pluton at [15]. The logic then continues at [16], filling a linked list with the component IDs of the images found in the manifest, which is passed to the parent function as the result of the parsing.
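The branching at [13] and [14] can be summarised as a three-way decision. In this sketch, section_found stands for (uVar5 & 0xfffff) == 0 and skip_0x19 for *param_4 != 0, whose meaning is not known; following the text, img_type is treated as a signed value so that img_type - 0x17 < 2 means “smaller than 0x19”. The names are ours.

```c
#include <stdbool.h>
#include <stdint.h>

enum manifest_action {
    MANIFEST_SKIP,               /* condition at [13] not met */
    MANIFEST_VERIFY_THEN_STAGE,  /* validate_image() at [15], then [16] */
    MANIFEST_STAGE_UNVERIFIED    /* straight to [16], no signature check */
};

static enum manifest_action classify(bool section_found, bool skip_0x19,
                                     int32_t img_type)
{
    if (!section_found || (skip_0x19 && img_type == 0x19))
        return MANIFEST_SKIP;
    if (img_type - 0x17 < 2)     /* decompiled test at [14] */
        return MANIFEST_VERIFY_THEN_STAGE;
    return MANIFEST_STAGE_UNVERIFIED;
}
```

Under this reading, any type above 0x19 reaches [16] without the Pluton check, which is the property relied upon below.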

As demonstrated in TALOS-2021-1342, it is possible to bypass the signature check at [15] by supplying a base manifest with an image type larger than 0x19. This is needed to continue the exploitation along this path: otherwise Pluton checks the manifest signature and fails with the following error, because it performs some checks against the huge manifest offset that we supplied to stage the manifest in DMA memory:

[PLUTON] !! ERROR: Image out of range for memory type: 2 (o: 0xffcb0000, l: 0x164)
[PLUTON] !! ERROR: ValidateImage failed!

After the signature check is bypassed, parsing continues at [16], staging whatever is found between ptr_entry and ptr_end, whose values are partially attacker-controlled via header_size and entry_size (the control is partial because the values are 16 bits wide; however, a large num_obj makes num_objs * entry_size rather large).
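The loop bounds before [16] reduce to a small formula. This sketch computes ptr_end as an offset from manifest_buf using the two 16-bit header fields and the 16-bit entry count; the function name is ours. In the decompiled code num_objs is read once, before the checks at [2], so the example keeps it within what those checks allow and grows only entry_size.

```c
#include <stdint.h>

/* End of the region walked at [16], relative to manifest_buf:
 * (entry_size & 0xffff) * num_objs + header_size, in 32-bit arithmetic. */
static uint32_t end_offset(uint16_t header_size, uint16_t entry_size,
                           uint16_t num_objs)
{
    return (uint32_t)entry_size * num_objs + header_size;
}
```

A legitimate single-entry manifest ends at 0x5c, whereas swapping entry_size to 0xffff after the checks, with 0x68 entries, moves the end to 0x67ffa8 bytes past manifest_buf.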

We tested the TOCTTOU by changing entry_size to a large value while also using the signature check bypass, and read the following telemetry:

00000000  00 00 69 00 00 00 00 00  00 00 00 00 ff ff ff ff  |..i.............|
00000010  ff ff ff ff a3 03 01 00  00 00 5c dc 21 3e 80 d8  |............!>..|
00000020  2c 40 80 01 f3 01 00 20  e4 21 3e 80 00 00 00 0c  |,@..... .!>.....|
00000030  88 fc 32 1f f3 1b 47 a4  b5 91 58 5b 66 b3 7e 3e  |..2...G...X[f.~>|
00000040  18 eb 0e 54 74 30 43 bc  20 90 83 c5 32 3d 80 00  |...Tt0C. ...2=..|
00000050  00 00 00 80 0e 3f 80 c8  0e 3f 80 e0 0e 3f 80 00  |.....?...?...?..|
00000060  0c 3f 80 68 00 00 00 3e  18 eb 0e 54 74 30 43 bc  |.?.h...>...Tt0C.|
00000070  20 90 83 cd de a4 b4 c0  03 01 00 00 00           | ............   |

This indicates a Secmon crash at address 0x803e21dc (unhandled crash), which is located in the parent function, meaning that we successfully returned from the manifest parsing and crashed the parent function while reading one of the attacker-controlled pointers.
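The crash address appears in this second dump as a little-endian word at offset 0x1b. As with the previous dump, the offset is read off the bytes shown here and is not taken from a documented telemetry format.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First 0x20 bytes of the crash telemetry above, copied verbatim. */
static const uint8_t telemetry_crash[0x20] = {
    0x00, 0x00, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xa3, 0x03, 0x01, 0x00,
    0x00, 0x00, 0x5c, 0xdc, 0x21, 0x3e, 0x80, 0xd8,
};
```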

When this function returns, the global structure will contain pointers to attacker-controlled memory. As already mentioned, the parent function FUN_803e2020 then fills the global structure with one 0x24-byte object per manifest entry, containing the component ID, image ID, image type and partition type copied directly from the manifest buffer, and any data an attacker manages to place in these objects can later be retrieved via SMSyscallGetMissingBaseImagesToDownload.

It’s important to note that, due to time constraints, we haven’t implemented the full attack described herein. After FUN_803e1d34 returns, there are further checks to bypass in FUN_803e2020, so this exploitation path will likely have some constraints. For example, we noticed that via the base manifest it’s only possible to stage entries with image type 1, 2 or 4. So the leaked data might be limited to entries that have a 1, 2 or 4 (uint16) at a specific offset (while the leaked component ID itself can contain any value, since it has no constraints).
Given the complexity of the code, there might be multiple exploitation avenues, so it is likely that, given enough time, an attacker could successfully exploit this issue to perform an information leak. It might even be possible to turn this issue into arbitrary code execution, but given how the data is handled in the functions used by this syscall, we think it’s unlikely.

Timeline

2021-07-19 - Vendor Disclosure
2021-11-09 - Vendor Patch
2021-11-09 - Public Release

Credit

Discovered by Claudio Bozzato and Lilith >_> of Cisco Talos.