Talos Vulnerability Report

TALOS-2021-1341

Microsoft Azure Sphere Security Monitor SMSyscallCommitImageStaging stage-without-manifest denial of service vulnerability

November 9, 2021

Summary

A denial of service vulnerability exists in the Security Monitor SMSyscallCommitImageStaging stage-without-manifest functionality of Microsoft Azure Sphere 21.01. A specially crafted image package can lead to boot looping, requiring manual recovery. An attacker can flash a malicious image to trigger this vulnerability.
This vulnerability is theoretical and was discovered in development mode but has not been confirmed in pre-production or production environments by either Talos or Microsoft.

Tested Versions

Microsoft Azure Sphere 21.01

Product URLs

https://azure.microsoft.com/en-us/services/azure-sphere/

CVSSv3 Score

6.0 - CVSS:3.0/AV:L/AC:L/PR:H/UI:N/S:C/C:N/I:N/A:H

CWE

CWE-306 - Missing Authentication for Critical Function

Details

Important note:
We are releasing this advisory to highlight this potential issue; however, Microsoft has not been able to reproduce it in Azure Sphere’s pre-production or production environments, and Talos is unable to test outside of the development environment, given the restrictions that the device places on us.
For context, applications for Azure Sphere are supposed to be developed in “development mode”: a mode that allows flashing applications via USB and makes the device behave slightly differently in some circumstances (e.g. it allows the use of ptrace for debugging). When an application is ready to be deployed to production, the device is moved to the “DeviceComplete” state; from then on it can fetch applications (and updates) only via the cloud, and USB communications are disabled altogether. A device in this production mode consistently keeps both its applications and its Azure Sphere firmware up to date. To test a security issue in this state, we would need a working exploitation chain against the latest version that gives us (in this case) kernel privileges or the AZURE_SPHERE_CAP_UPDATE_IMAGE capability. As of July 2021 the current version is 21.06, and we are only able to test this issue against version 21.01. For this reason we can claim that this issue impacts the device while in development mode, but we can’t make any claim about its impact in pre-production or production mode.
As such, we need to make clear that this vulnerability is theoretical and has not been confirmed in pre-production or production environments by either Talos or Microsoft.

Microsoft’s Azure Sphere is a platform for the development of internet-of-things applications. It features a custom SoC that consists of a set of cores that run both high-level and real-time applications, enforces security and manages encryption (among other functions). The high-level applications execute on a custom Linux-based OS, with several modifications to make it smaller and more secure, specifically for IoT applications.

Processes with AZURE_SPHERE_CAP_* are allowed to interact with Pluton and Security Monitor, but only via the syscalls that they are allowed to access. For instance, when a user holds the AZURE_SPHERE_CAP_UPDATE_IMAGE capability, they are allowed to use the following Secmon syscalls:

static azure_sphere_sm_syscall_permission_t azure_sphere_sm_syscall_required_capabilities[] = {
    {.number = SMSyscallInvalidateImage, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallOpenImageForStaging, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallWriteBlockToStageImage, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallCommitImageStaging, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallAbortImageStaging, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallInstallStagedImages, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetComponentCount, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetComponentSummary, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallStageComponentManifests, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetCountOfMissingImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetMissingImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallStageBaseManifests, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetCountOfMissingBaseImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetMissingBaseImagesToDownload, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    {.number = SMSyscallGetSoftwareRollbackInfo, .caps = AZURE_SPHERE_CAP_UPDATE_IMAGE, .linux_caps = 0},
    // ... (remaining entries not shown)
};

This advisory deals with a subset of the above syscalls, but it’s important to stress that an attacker would already need to have elevated privileges or to have gained the AZURE_SPHERE_CAP_UPDATE_IMAGE capability.

To stage applications that are not firmware-related, a sequence of SMSyscallOpenImageForStaging, SMSyscallWriteBlockToStageImage, SMSyscallCommitImageStaging and SMSyscallInstallStagedImages calls is normally enough.
When installing firmware binaries, however, we are required to first stage a “base manifest” or a “component manifest”.

Analyzing a legitimate base manifest, we can see that it stages entries for the trusted keystore and the update cert store:

00000000  03 00 02 00 10 00 4c 00  c1 2f 59 5f 00 00 00 00  |......L../Y_....|
00000010  03 45 85 92 a4 e1 5a 42  b9 a8 1f 99 0b 6f 03 bc  |.E....ZB.....o..|  // Trusted Keystore
00000020  4e 99 be 11 a9 30 c2 48  ba 89 88 b0 d2 98 7d 70  |N....0.H......}p|
00000030  13 00 01 00 48 09 00 00  48 09 00 00 00 00 00 00  |....H...H.......|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 76 18 92 92  |............v...|  // Update cert store
00000060  4c 9e e1 47 85 23 ff df  e0 e5 1c 41 cc 55 34 12  |L..G.#.....A.U4.|
00000070  2b fa 5d 4e 95 fe b7 fd  a7 ef 6e 2d 16 00 01 00  |+.]N......n-....|
00000080  00 60 00 00 00 60 00 00  00 00 00 00 00 00 00 00  |.`...`..........|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  34 58 34 4d 03 00 00 00  |........4X4M....|
...

Analyzing a legitimate component manifest (“recovery.imagemanifest”), we can see that it lists all the other firmware-related images:

00000000  03 00 10 00 10 00 4c 00  17 3d 97 5f 00 00 00 00  |......L..=._....|
00000010  18 d3 fd a7 7f 1c 0f 4a  81 a9 fa 9e 3f ea 55 73  |.......J....?.Us| // Pluton
00000020  d0 62 bf 16 7e f4 e6 11  83 9c 00 15 5d 9f 1e 00  |.b..~.......]...|
00000030  02 00 01 00 94 6a 00 00  94 6a 00 00 01 00 00 00  |.....j...j......|
00000040  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 5c d2 27 35  |............\.'5| // Security Monitor
00000060  bb 35 93 47 a6 0b 6a 00  dd 43 56 0f 0c 88 fc 32  |.5.G..j..CV....2|
00000070  1f f3 1b 47 a4 b5 91 58  5b 66 b3 7e 04 00 01 00  |...G...X[f.~....|
00000080  ac 94 01 00 ac 94 01 00  01 00 00 00 01 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  fb 6d fc ee 83 33 67 4d  |.........m...3gM| // Device Capability
000000b0  9a 51 41 9e 37 bb c7 ff  d8 b5 28 41 c2 ab 55 41  |.QA.7.....(A..UA|
000000c0  9a 33 8a 1f 31 ed 67 ec  0d 00 01 00 88 01 00 00  |.3..1.g.........|
...

Given the existence and contents of these manifests, we believe the intention is to prevent the installation of any firmware image unless the corresponding (Microsoft-signed) manifest has been staged beforehand.

For example, when trying to install “rng-tools” from version 20.01, we are stopped at the SMSyscallCommitImageStaging step, which returns 0x700025.

Moreover, we found that it is not possible to downgrade the device’s whole firmware, even by staging an older recovery manifest: we are only allowed to re-stage the recovery manifest for the currently running version and to re-flash the same version of the running firmware.

We found, however, that it is possible to install the “Trusted Keystore” image without any manifest and without any version restriction. Installing a different version makes Pluton use different keys to check the firmware images. While running version 21.01, we noticed that if we flash a one-year-old version of the “Trusted Keystore” (we tested version 20.01), Pluton is unable to verify the firmware images after reboot, which bricks the device and requires manual recovery (denial of service).
In this case, the device will boot-loop and output the following messages via UART:

[1BL] !! ERROR: Could not load key error 0x13
[1BL] !! ERROR: cap image signature validation failed
!!! PANIC: C: 1818399058 L: 3301812344 A: 0

Timeline

2021-07-19 - Vendor Disclosure
2021-11-09 - Public Release

Credit

Discovered by Claudio Bozzato and Lilith >_> of Cisco Talos.
