Talos Vulnerability Report

TALOS-2016-0036

Matroska libebml EbmlUnicodeString Heap Information Leak

January 28, 2016

Report ID

CVE-2016-1514

Description

A specially crafted Unicode string can cause an off-by-few read on the heap in the Unicode string parsing code in libebml. This issue can potentially be used for information leaks.

Tested Versions

libebml master branch

Product URLs

http://matroska.org

Details

An off-by-few heap read occurs when parsing Unicode strings in EbmlUnicodeString.cpp:UTFstring::UpdateFromUTF8. The string is parsed in a for loop, but when the lead byte indicates a four-byte character, no check is made that the trailing bytes accessed fall inside the allocated buffer. The vulnerable code is shown below:

```
  for (j=0, i=0; i<UTF8string.length(); j++) {
    uint8 lead = static_cast<uint8>(UTF8string[i]);
    if (lead < 0x80) {
      _Data[j] = lead;
      i++;
    } else if ((lead >> 5) == 0x6) {
      _Data[j] = ((lead & 0x1F) << 6) + (UTF8string[i+1] & 0x3F);
      i += 2;
    } else if ((lead >> 4) == 0xe) {
      _Data[j] = ((lead & 0x0F) << 12) + ((UTF8string[i+1] & 0x3F) << 6) + (UTF8string[i+2] & 0x3F);
      i += 3;
    } else if ((lead >> 3) == 0x1e) {
      printf("i is now %d and the highest accessed byte is  %d\n", i, i+3);
      _Data[j] = ((lead & 0x07) << 18) + ((UTF8string[i+1] & 0x3F) << 12) + ((UTF8string[i+2] & 0x3F) << 6) + (UTF8string[i+3] & 0x3F);
      i += 4;
    } else
      // Invalid char?
      break;
  }
```
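The index arithmetic in the loop above can be checked in isolation. The following is a minimal sketch, not libebml code: SequenceLength and BytesPastEnd are hypothetical helpers introduced here for illustration. They mirror the lead-byte tests of the loop and report how many indices beyond length() an unchecked decoder would touch if the final byte of the input were treated as a lead byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libebml): number of bytes a UTF-8
// sequence occupies according to its lead byte, using the same bit tests
// as the loop above. Returns 0 for a byte that is not a valid lead.
std::size_t SequenceLength(unsigned char lead) {
  if (lead < 0x80) return 1;
  if ((lead >> 5) == 0x6) return 2;
  if ((lead >> 4) == 0xe) return 3;
  if ((lead >> 3) == 0x1e) return 4;
  return 0;
}

// Hypothetical helper: how many indices beyond s.length() an unchecked
// decoder would access if it started decoding at the last byte of s.
std::size_t BytesPastEnd(const std::string& s) {
  if (s.empty()) return 0;
  const std::size_t i = s.length() - 1;
  const std::size_t n = SequenceLength(static_cast<unsigned char>(s[i]));
  if (n == 0) return 0;
  const std::size_t end = i + n;  // one past the highest index accessed
  return end > s.length() ? end - s.length() : 0;
}
```

For an input whose last byte is 0xf2, the sequence length is 4 and only one of those bytes lies inside the string, so the helper reports 3.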

If the last byte of the string being parsed satisfies the else if ((lead >> 3) == 0x1e) condition, for example 0xf2, the loop reads 3 bytes past the end of the buffer, causing an out-of-bounds read on the heap.
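One way to avoid the over-read is to confirm that the whole sequence lies inside the buffer before any trailing byte is touched. The sketch below illustrates that idea under stated assumptions: DecodeUtf8Checked is a hypothetical stand-alone function, not the libebml API and not necessarily how the issue was addressed upstream. Like the original loop, it does not validate that trailing bytes are well-formed continuation bytes; it only adds the missing length check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-alone decoder (not the libebml API): converts UTF-8
// to code points and stops at the first invalid lead byte or truncated
// sequence instead of indexing past the end of the input.
std::vector<char32_t> DecodeUtf8Checked(const std::string& in) {
  std::vector<char32_t> out;
  const std::size_t len = in.length();
  std::size_t i = 0;
  while (i < len) {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    std::size_t need = 0;  // total bytes in this sequence
    char32_t cp = 0;       // payload bits taken from the lead byte
    if (lead < 0x80)              { need = 1; cp = lead; }
    else if ((lead >> 5) == 0x6)  { need = 2; cp = lead & 0x1F; }
    else if ((lead >> 4) == 0xe)  { need = 3; cp = lead & 0x0F; }
    else if ((lead >> 3) == 0x1e) { need = 4; cp = lead & 0x07; }
    else break;  // invalid lead byte

    // The check missing from the vulnerable loop: all `need` bytes must
    // be inside the buffer before any trailing byte is read.
    if (need > len - i) break;

    // Fold in 6 payload bits from each trailing byte.
    for (std::size_t k = 1; k < need; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += need;
  }
  return out;
}
```

With this check in place, an input such as "A" followed by a lone 0xf2 byte decodes to the single code point for "A" and then stops, rather than reading three bytes beyond the string.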

Credit

Discovered by Richard Johnson and Aleksandar Nikolic of Cisco Talos.