Talos Vulnerability Report

TALOS-2019-0792

Antenna House Rainbow PDF Office server document converter TxMasterStyleAtom parsing code execution vulnerability

May 14, 2019
CVE Number

CVE-2019-5030

Summary

A buffer overflow vulnerability exists in the PowerPoint document conversion function of Rainbow PDF Office Server Document Converter V7.0 Pro MR1 (7,0,2019,0220). While parsing a document text info container, the TxMasterStyleAtom::parse function incorrectly checks the bounds corresponding to the number of style levels, which allows a vtable pointer to be overwritten and leads to code execution.

Tested Versions

Antenna House Rainbow PDF Office Server Document Converter v7.0 Pro MR1 for Linux64 (7,0,2019,0220)

Product URLs

https://www.rainbowpdf.com/trial-server-solutions/

CVSSv3 Score

8.8 - CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-122: Heap-based Buffer Overflow

Details

Rainbow PDF is a software solution, developed by Antenna House, that converts Microsoft Office 97-2016 documents into PDF.

Office document structures are often complex and impose strict constraints. Failing to enforce these constraints may lead to various side effects while parsing.

Microsoft's [MS-PPT] PowerPoint Binary File Format documentation explains the structures needed to understand the issue described in this advisory. In particular, the formats of the DocumentTextInfoContainer, RecordHeader and TextMasterStyleAtom structures are described below. The RecordHeader is a generic structure present at the beginning of every container and atom record. A container is a record that defines the structure and hierarchy of atom records and other container records. An atom record contains presentation data. By analogy with a file system, atom records are similar to files that contain data, and container records are similar to directories that provide structure and hierarchy for atom records. The DocumentTextInfoContainer record specifies the default text styles for the document, and the TextMasterStyleAtom specifies the character-level and paragraph-level formatting of a main master slide.

The RecordHeader is a fixed-length, eight-byte structure described as follows:

+ recVer (4 bits): An unsigned integer that specifies the version of the record data that follows the record header. A value of 0xF specifies that the record is a container record.
+ recInstance (12bits): An unsigned integer that specifies the record instance data. Interpretation of the value is dependent on the particular record type.
+ recType (2 bytes): A `RecordType` enumeration that specifies the type of the record data that follows the record header.
+ recLen (4 bytes): An unsigned integer that specifies the length, in bytes, of the record data that follows the record header.
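As a minimal sketch of the layout above, a RecordHeader can be decoded from a raw buffer as follows. It assumes little-endian data with recVer in the low 4 bits and recInstance in the high 12 bits of the first 16-bit word, which is consistent with the `recInstance >> 4` in the decompiled parser below. The names `RecordHeader`, `parse_record_header` and `is_container` are hypothetical and are not taken from Rainbow PDF.

```c
#include <stddef.h>
#include <stdint.h>

#define RECORD_HEADER_SIZE 8

/* Hypothetical decoded form of the 8-byte MS-PPT RecordHeader. */
typedef struct {
    uint8_t  recVer;      /* low 4 bits of the first little-endian word  */
    uint16_t recInstance; /* high 12 bits of the first little-endian word */
    uint16_t recType;
    uint32_t recLen;      /* length of the record data after the header */
} RecordHeader;

/* Decode a RecordHeader from `buf` (assumed little-endian).
   Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int parse_record_header(const uint8_t *buf, size_t len, RecordHeader *rh)
{
    if (len < RECORD_HEADER_SIZE)
        return -1;
    uint16_t ver_inst = (uint16_t)(buf[0] | (buf[1] << 8));
    rh->recVer      = (uint8_t)(ver_inst & 0x000F);
    rh->recInstance = (uint16_t)(ver_inst >> 4);
    rh->recType     = (uint16_t)(buf[2] | (buf[3] << 8));
    rh->recLen      = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                      ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    return 0;
}

/* A recVer of 0xF marks a container record. */
static int is_container(const RecordHeader *rh)
{
    return rh->recVer == 0xF;
}
```

For example, the bytes `10 00 A3 0F 10 00 00 00` decode to recVer 0, recInstance 1, recType 0xFA3 (RT_TextMasterStyleAtom) and recLen 0x10.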

The DocumentTextInfoContainer is described as:

+ rh (8 bytes): A `RecordHeader` structure with rh.recType set to the value RT_Environment (0x3F2).
+ [...] several optional variable-length atoms
+ textSIDefaultsAtom (variable): A `TextSIExceptionAtom` record.
+ textMasterStyleAtom (variable): A `TextMasterStyleAtom` record.
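Because a container body is just a sequence of child records, each prefixed by its own RecordHeader, a reader can locate the textMasterStyleAtom inside a DocumentTextInfoContainer by walking the children and skipping recLen bytes at a time. The sketch below assumes `body` points just past the container's own 8-byte header; `find_child` and its helpers are hypothetical names, and the recLen bound check illustrates the kind of constraint a robust parser should enforce.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_TEXT_MASTER_STYLE 0x0FA3  /* RT_TextMasterStyleAtom */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the child records of a container body and return the offset of the
   first child whose recType equals `wanted`. Returns -1 if no such child
   exists or if a child's recLen runs past the end of the body. */
static long find_child(const uint8_t *body, size_t len, uint16_t wanted)
{
    size_t off = 0;
    while (len - off >= 8) {            /* room for another RecordHeader */
        uint16_t type = rd16(body + off + 2);
        uint32_t rlen = rd32(body + off + 4);
        if (rlen > len - off - 8)
            return -1;                  /* truncated record or bogus recLen */
        if (type == wanted)
            return (long)off;
        off += 8 + (size_t)rlen;        /* skip header and record data */
    }
    return -1;
}
```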

The TextMasterStyleAtom is described as follows:

+ rh (8 bytes): A `RecordHeader` structure where rh.recType MUST be RT_TextMasterStyleAtom (0xFA3) and rh.recInstance specifies the type of text to which the formatting applies.
+ cLevels (2 bytes): An unsigned integer that specifies the number of style levels. It MUST be less than or equal to 0x0005.
+ LstLvlx (variable): Up to five optional TextMasterStyleLevel structures (x = 1 to 5) that specify the master formatting for text. The number of structures present is determined by cLevels.
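A reader that follows the specification would reject a TextMasterStyleAtom whose cLevels exceeds 0x0005 before touching any per-level storage. This hedged sketch reads rh.recInstance and cLevels from a raw record (header followed by record data) and enforces that rule; `read_text_master_style` and `MAX_STYLE_LEVELS` are hypothetical names, and the TextMasterStyleLevel structures themselves are not parsed.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_TEXT_MASTER_STYLE_ATOM 0x0FA3
#define MAX_STYLE_LEVELS          5   /* cLevels MUST be <= 0x0005 */

/* Extract the text type (rh.recInstance) and cLevels from a TextMasterStyleAtom.
   Returns 0 on success, -1 if the record violates the constraints above. */
static int read_text_master_style(const uint8_t *rec, size_t len,
                                  uint16_t *text_type, uint16_t *c_levels)
{
    if (len < 10)                       /* 8-byte header + 2-byte cLevels */
        return -1;
    uint16_t ver_inst = (uint16_t)(rec[0] | (rec[1] << 8));
    uint16_t rec_type = (uint16_t)(rec[2] | (rec[3] << 8));
    uint32_t rec_len  = (uint32_t)rec[4] | ((uint32_t)rec[5] << 8) |
                        ((uint32_t)rec[6] << 16) | ((uint32_t)rec[7] << 24);
    if (rec_type != RT_TEXT_MASTER_STYLE_ATOM || rec_len < 2 || rec_len > len - 8)
        return -1;
    uint16_t levels = (uint16_t)(rec[8] | (rec[9] << 8));
    if (levels > MAX_STYLE_LEVELS)      /* the bound the advisory says is never checked */
        return -1;
    *text_type = (uint16_t)(ver_inst >> 4);
    *c_levels  = levels;
    return 0;
}
```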

The specification states that cLevels MUST be less than or equal to 0x0005. This is important because the vulnerability depends on the value of this field. The function DfvPptReaderNS::TxMasterStyleAtom::parse is called to parse the character-level and paragraph-level formatting of the main master slide of a Microsoft Office PowerPoint document:

bool __fastcall DfvPptReaderNS::TxMasterStyleAtom::parse(DfvPptReaderNS::TxMasterStyleAtom *this, DfvCommon::MSORecParseContext *context)
{
  DfvPptReaderNS::TxMasterStyleAtom *TxMasterStyleAtomTable;
  unsigned __int16 recInstance;
  unsigned __int16 TextTypeEnum;
  unsigned __int16 current_index;
  unsigned __int16 index;
  bool status;
  bool data_read;
  DfvPptReaderNS::PFStyle *PFStyleObject;
  unsigned __int16 cLevels;
  int offset;

  TxMasterStyleAtomTable = this;
  recInstance = this->recVer_recInstance;
  offset = 0;
  TextTypeEnum = recInstance >> 4;                  // type of text the formatting applies to
  if ( !DfvCommon::MSORecParseContext::readRecordData(context, this->recLen)
       || !DfvCommon::MSORecParseContext::getWord(context, &cLevels, offset) )      // [1] cLevels read from the file
    goto error_TxMasterStyleAtom;

  offset += 2;
  if ( cLevels )                                    // [2] only checked for non-zero
  {
    index = 0;
    status = true;
    data_read = 1;
    while ( 1 )                                     // [3]
    {
      if ( TextTypeEnum <= 8u )
      {
        if ( TextTypeEnum <= 4 )
        {
          current_index = index;
          if ( index )                              // [7]
          {
            // copy the styles of entry index - 1 into entry index
            DfvPptReaderNS::PFStyle::operator=( TxMasterStyleAtomTable + 96 * index + 0x18, TxMasterStyleAtomTable + 96 * index - 0x48);   // [8]
            DfvPptReaderNS::CFStyle::operator=( TxMasterStyleAtomTable + 32 * index + 0x1F8, TxMasterStyleAtomTable + 32 * index + 0x1D8);  // [9]
          }
          goto read_data;                           // [10]
        }
        if ( TextTypeEnum >= 5 )
        {
          if ( (unsigned int)DfvCommon::MSORecParseContext::getWord(context, &current_index, offset) )
          {
            offset += 2;
read_data:
            // parse the PFStyle and CFStyle entries for the current level
            if ( !data_read
              || (PFStyleObject = TxMasterStyleAtomTable + 96 * index + 24,
                  *((_WORD *)PFStyleObject + 4) = index,
                  DfvPptReaderNS::PFStyle::parse(PFStyleObject, context, &offset))          // [5]
              && DfvPptReaderNS::CFStyle::parse(TxMasterStyleAtomTable + 32 * index + 0x1F8, context, &offset) )
            {
              goto next_entry;                      // [6]
            }
          }
          goto reset_read_data;
        }
      }
reset_read_data:
      data_read = 0;
next_entry:
      if ( ++index >= cLevels )                     // [4] the only bound on index
      {
        if ( data_read )
          return 1LL;
error_TxMasterStyleAtom:
        icu_52::UnicodeString::UnicodeString((icu_52::UnicodeString *)&current_index, 1, L"TxMasterStyleAtom", 17);
        DfvPptReaderNS::PPTError::throwError((DfvPptReaderNS::PPTError *)0xD883, (unsigned __int64)&current_index, v6);
      }
    }
  }
  [...]
}

At [1], DfvPptReaderNS::TxMasterStyleAtom::parse calls DfvCommon::MSORecParseContext::getWord to read cLevels from the file; getWord returns a 16-bit word from the record buffer previously read by readRecordData. The algorithm is straightforward: it checks that cLevels is non-zero at [2], then enters an infinite loop at [3]. The bounds check at [4] compares the incremented index to cLevels and leaves the loop, with either success or failure, once index is greater than or equal to cLevels. Notice that cLevels is never compared against 0x0005, as required by Microsoft's documentation.
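The loop for text types 0 to 4 can be modeled with fixed five-entry tables to show where the missing check belongs. In this sketch, all types and names are hypothetical stand-ins rather than the vendor's classes: each level first inherits the previous level's styles, as at [8] and [9], and is then "parsed", which here only stores the level number at byte offset 8, mirroring the `*((_WORD *)PFStyleObject + 4) = index` store. The only addition to the decompiled logic is the up-front rejection of cLevels greater than 5; without it, the bound at [4] alone lets index run past the last slot.

```c
#include <stdint.h>

#define MAX_STYLE_LEVELS 5

/* Hypothetical stand-ins for the 96-byte PFStyle and 32-byte CFStyle entries. */
typedef struct { uint8_t raw[96]; } PFStyleModel;
typedef struct { uint8_t raw[32]; } CFStyleModel;

typedef struct {
    PFStyleModel pf[MAX_STYLE_LEVELS];
    CFStyleModel cf[MAX_STYLE_LEVELS];
} StyleTablesModel;

/* Returns the number of levels filled, or -1 if cLevels is out of range. */
static int fill_levels(StyleTablesModel *t, uint16_t cLevels)
{
    if (cLevels > MAX_STYLE_LEVELS)       /* the check absent from the parser */
        return -1;
    for (uint16_t index = 0; index < cLevels; index++) {
        if (index) {                      /* models [7], [8], [9] */
            t->pf[index] = t->pf[index - 1];
            t->cf[index] = t->cf[index - 1];
        }
        /* Stand-in for PFStyle::parse / CFStyle::parse at [5]. */
        t->pf[index].raw[8] = (uint8_t)index;
        t->cf[index].raw[0] = (uint8_t)index;
    }
    return cLevels;
}
```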

Inside the loop, at [5], two main parsing functions, DfvPptReaderNS::PFStyle::parse and DfvPptReaderNS::CFStyle::parse, read binary data and fill in the corresponding style objects. The interesting point is the presence of the constant multipliers 96*index and 32*index, respectively, which typically indicate indexed tables of fixed-size entries. Once the data is read, the check at [6] branches directly to [4]. After index is incremented at [4], the data previously read at [5] is copied at [8] and [9] into the next element of each table. Execution then branches directly at [10] to the parsing code, and the cycle repeats until index reaches cLevels at [4].
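The constants can be checked with a little arithmetic. Assuming the decompiled expressions are byte offsets from `this` (the source operands at [8] and [9], `96 * index - 0x48` and `32 * index + 0x1D8`, are exactly the offsets of entry index - 1, which supports that reading), the PFStyle table starting at 0x18 holds exactly five 96-byte entries before the CFStyle table begins at 0x1F8. A sixth PFStyle entry (index 5) would therefore start on top of CFStyle entry 0, and a sixth CFStyle entry would start at 0x298, past both tables. The macro and function names below are hypothetical.

```c
#include <stdint.h>

/* Offsets read off the decompiled listing, assumed to be byte offsets from `this`. */
#define PF_BASE   0x18u   /* PFStyle entry i at 96 * i + 0x18  */
#define PF_STRIDE 96u
#define CF_BASE   0x1F8u  /* CFStyle entry i at 32 * i + 0x1F8 */
#define CF_STRIDE 32u
#define CAPACITY  5u      /* cLevels MUST be <= 5 */

static uint32_t pf_offset(uint32_t index) { return PF_BASE + PF_STRIDE * index; }
static uint32_t cf_offset(uint32_t index) { return CF_BASE + CF_STRIDE * index; }

/* One past the last byte occupied by the five well-formed entries of each table. */
static uint32_t pf_table_end(void) { return pf_offset(CAPACITY); }
static uint32_t cf_table_end(void) { return cf_offset(CAPACITY); }
```

With these definitions, `pf_offset(4)` is 0x198, `pf_table_end()` equals `CF_BASE` (0x1F8), and `cf_table_end()` is 0x298.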

DfvPptReaderNS::PPTDocument *__fastcall DfvPptReaderNS::PPTDocument::PPTDocument(DfvPptReaderNS::PPTDocument *this)
{
  [...]
  *(_QWORD *)v6 = &`vtable for'DfvPptReaderNS::TxMasterStyleAtom + 2;
  *(_WORD *)&v25[v7 - 16] = 0;
  *(_WORD *)&v25[v7 - 8] = 0;
  *(_WORD *)&v25[v7 - 2] = 0;
  *(_WORD *)&v25[v7 - 6] = 8226;
  *(_WORD *)&v25[v7 - 4] = 0;
  *(_WORD *)&v25[v7 + 4] = 0;
  *(_WORD *)&v25[v7 + 6] = 100;
  *(_WORD *)&v25[v7 + 12] = 0;
  *(_QWORD *)&v25[v7 - 24] = vtable_PFStyle;        // [11]
  *(_WORD *)&v25[v7 + 8] = 0;
  *(_WORD *)&v25[v7 + 10] = 0;
  *(_WORD *)&v25[v7 + 14] = 0;
  *(_WORD *)&v25[v7 + 16] = 576;
  *(_DWORD *)&v25[v7 - 12] = 0;
  *(_DWORD *)&v25[v7] = 0x1000000;
  *((_QWORD *)v6 + 9) = vtable_TabStops;
  v8 = v6 - v5;
  *((_QWORD *)v6 + 10) = 0LL;
  *(_WORD *)&v5[v8 + 104] = 0;
  *(_WORD *)&v5[v8 + 106] = 0;
  *(_QWORD *)&v5[v8 + 88] = 0LL;
  *(_QWORD *)&v5[v8 + 96] = 0LL;
  *(_WORD *)&v25[v7 + 64] = 0;
  *(_WORD *)&v25[v7 + 66] = 7;
  *(_WORD *)&v25[v7 + 68] = 0;
  v9 = v21 - v5;
  *(_QWORD *)&v1[v9 - 120] = vtable_PFStyle;        // [12]
  *(_WORD *)&v1[v9 - 112] = 0;
  *(_WORD *)&v1[v9 - 104] = 0;
  *(_WORD *)&v1[v9 - 102] = 8226;
  *(_WORD *)&v1[v9 - 100] = 0;
  *(_WORD *)&v1[v9 - 98] = 0;
  *(_WORD *)&v1[v9 - 92] = 0;
  *(_WORD *)&v1[v9 - 90] = 100;
  *(_WORD *)&v1[v9 - 88] = 0;
  *(_WORD *)&v1[v9 - 86] = 0;
  *(_WORD *)&v1[v9 - 84] = 0;
  *(_WORD *)&v1[v9 - 82] = 0;
  *(_WORD *)&v1[v9 - 80] = 576;
  *(_DWORD *)&v1[v9 - 108] = 0;
  *(_DWORD *)&v1[v9 - 96] = 0x1000000;
  *((_QWORD *)v6 + 21) = vtable_TabStops;
  *((_QWORD *)v6 + 22) = 0LL;
  *(_WORD *)&v5[v8 + 200] = 0;
  [...]
}

The overflow is caused by the missing check against 0x0005 at [4], but to understand its impact we need to look at the constructor DfvPptReaderNS::PPTDocument::PPTDocument. Without describing the whole function, we can see at [11] and [12] the construction of objects related to PFStyle, including their vtable pointers (vtable_PFStyle). This constructor prepares all objects for the complete PowerPoint document and reserves fixed space for these objects and their vtable pointers. Because the style tables are sized for at most five levels, a cLevels value greater than 0x0005 makes the parser write past them and overwrite vtable pointers of adjacent objects, which an attacker can use to arbitrarily alter the execution flow of the program and thus execute arbitrary code.
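Why a corrupted vtable pointer translates into control of execution can be illustrated with a small C model of C++ virtual dispatch: every virtual call loads the target address from the table that the object's first pointer refers to, so whoever controls that pointer controls the call. This sketch only simulates the overwrite with a plain assignment; `StyleObj`, `StyleVtable` and the functions are hypothetical and do not reflect the real object layout in Rainbow PDF.

```c
/* Model of an object whose first member is a vtable pointer, as set up for
   PFStyle objects at [11] and [12]. */
typedef struct StyleObj StyleObj;

typedef struct {
    const char *(*describe)(const StyleObj *self);
} StyleVtable;

struct StyleObj {
    const StyleVtable *vptr;   /* the value an out-of-bounds write would clobber */
    int level;
};

static const char *pf_describe(const StyleObj *self) { (void)self; return "PFStyle"; }
static const char *attacker_describe(const StyleObj *self) { (void)self; return "attacker"; }

static const StyleVtable pf_vtable       = { pf_describe };
static const StyleVtable attacker_vtable = { attacker_describe };

/* A "virtual call": dispatch through whatever vptr currently holds. */
static const char *call_describe(const StyleObj *obj)
{
    return obj->vptr->describe(obj);
}
```

After `o.vptr = &attacker_vtable;`, the same call site `call_describe(&o)` runs `attacker_describe` instead of `pf_describe`; in the real bug the replacement pointer comes from attacker-controlled file data.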

Timeline

2019-03-20 - Vendor Disclosure
2019-05-14 - Vendor Patched
2019-05-14 - Public Release

Credit

Discovered by Emmanuel Tacheau of Cisco Talos.