Talos Vulnerability Report

TALOS-2022-1664

Open Babel MOL2 format attribute and value out-of-bounds write vulnerability

July 21, 2023

CVE Number

CVE-2022-43607

SUMMARY

An out-of-bounds write vulnerability exists in the MOL2 format attribute and value functionality of Open Babel 3.1.1 and master commit 530dbfa3. A specially crafted malformed file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

Open Babel 3.1.1
Open Babel master commit 530dbfa3

PRODUCT URLS

Open Babel - https://openbabel.org/

CVSSv3 SCORE

8.1 - CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-119 - Improper Restriction of Operations within the Bounds of a Memory Buffer

DETAILS

Open Babel is a popular library for converting chemical file formats, currently supporting about 130 different file formats. It implements bindings for several programming languages. Because of the nature of the library, and because many online chemical format converters and molecule viewers may use Open Babel in their backend for parsing and conversion, we consider this software potentially accessible over the network.

Open Babel ships a simple converter application called obabel that can be used to trigger the issue described in this advisory. obabel supports -i and -o parameters, which select the input and output formats to perform the conversion. obabel supports multiple input and output files (as does the Open Babel library itself): this technically allows multiple vulnerabilities to trigger in sequence, which in turn could make some vulnerabilities easier to exploit. In this advisory, however, we focus on only one input file and a corresponding output file.

When a single input file and output file are supplied, obabel.cpp records the input and output formats (if supplied), and calls OBConversion::FullConvert in obconversion.cpp. Inside this function, there’s a call to OpenAndSetFormat, which uses FormatFromExt to derive the input format from the filename extension if no -i parameter was supplied. Similarly, OpenInAndOutFiles can be used to derive both input and output formats from the filename extensions when none are supplied.

Depending on how the obabel application is invoked, different code paths may be taken. Eventually, however, the pInFormat and pOutFormat objects (of base class OBFormat) are allocated; these are instances of the classes that implement the selected input and output formats.

The code then proceeds with a call to OBConversion::Convert, which eventually leads to calling pInFormat->ReadMolecule and pOutFormat->WriteMolecule.

In this advisory, we describe an issue in the mol2 file format (formats/mol2format.cpp) when parsing an input file via ReadMolecule.

    bool MOL2Format::ReadMolecule(OBBase* pOb, OBConversion* pConv)
    {

      ...
[1]   char buffer[BUFF_SIZE];
      char *comment = nullptr;
      string str,str1;
      vector<string> vstr;
      int len;

      ...

      for (;;)
        {
[2]       if (!ifs.getline(buffer,BUFF_SIZE))
            return(false);
[3]       if (pConv->IsOption("c", OBConversion::INOPTIONS) != nullptr && EQn(buffer, "###########", 10))
            {
[4]           char attr[32], val[32];
[5]           sscanf(buffer, "########## %[^:]:%s", attr, val);
              OBPairData *dd = new OBPairData;
              dd->SetAttribute(attr);
              dd->SetValue(val);
              dd->SetOrigin(fileformatInput);
              mol.SetData(dd);
            }
          if (EQn(buffer,"@<TRIPOS>MOLECULE",17))
            break;
        }

The function defines several variables. We’re especially interested in buffer at [1], a char array of size 32768 that holds each line read from the input file. At [2] a line is read, and at [3] its first 10 characters are compared against a run of “#” characters. To enter the if at [3], the “c” input option must also be set (-ac on the command line).
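The prefix test at [3] can be illustrated in isolation. The following is a minimal sketch, assuming EQn behaves like a strncmp equality test over the first n characters; eqn and is_comment_line are hypothetical names used only for illustration and are not part of Open Babel. The “c” option check is left out.

```cpp
#include <cassert>
#include <cstring>

// Assumed equivalent of the EQn macro: true when the first n characters
// of a and b are equal.
inline bool eqn(const char* a, const char* b, std::size_t n)
{
  assert(a != nullptr && b != nullptr);
  return std::strncmp(a, b, n) == 0;
}

// Hypothetical helper mirroring the comparison at [3]: only the first
// 10 characters of the line are checked, so anything may follow them.
inline bool is_comment_line(const char* line)
{
  return eqn(line, "###########", 10);
}
```

Under this assumption, the check places no limit on what follows the ten “#” characters, so the rest of the line reaches sscanf unchanged.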

At [4], two 32-byte char arrays, attr and val, are defined. At [5], sscanf parses the line using the “%[^:]” and “%s” conversion specifiers. Neither specifier carries a maximum field width, so each can store far more than 32 bytes (up to the length of the line held in buffer) into its destination. This allows an out-of-bounds write on the stack past the end of either attr or val. Depending on how the code is compiled and the stack is laid out, this issue can lead to arbitrary code execution.
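One common way to constrain such conversions is to give each specifier a maximum field width. The sketch below shows this under stated assumptions: parse_comment_line is a hypothetical helper, not an Open Babel function and not the vendor's fix. The width of 31 leaves room for the terminating NUL in a 32-byte array.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of Open Babel): parses a
// "########## attr:value" line using bounded conversions.
// Returns the number of fields sscanf assigned (0, 1, or 2).
inline int parse_comment_line(const char* line,
                              char (&attr)[32], char (&val)[32])
{
  assert(line != nullptr);
  attr[0] = '\0';
  val[0] = '\0';
  // The field width does not count the terminating NUL, so 31 is the
  // largest width that fits a 32-byte destination.
  return std::sscanf(line, "########## %31[^:]:%31s", attr, val);
}
```

With bounded widths, an overlong value is truncated to 31 characters, and an overlong attribute causes the match on “:” to fail, so only one field is assigned. A caller would likely want to treat any return value other than 2 as a malformed line rather than store a partial pair.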

Crash Information

$ ./bin/obabel -i mol2 scanf.mol2 -o sdf -ac
=================================================================
==1191000==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xfffdb130 at pc 0xf7a09be5 bp 0xfffda618 sp 0xfffda1f0
WRITE of size 30001 at 0xfffdb130 thread T0
    #0 0xf7a09be4 in scanf_common ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:342
    #1 0xf7a0a6f9 in __interceptor___isoc99_vsscanf ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1531
    #2 0xf7a0a77c in __interceptor___isoc99_sscanf ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1554
    #3 0xf4315454 in OpenBabel::MOL2Format::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*) ./src/formats/mol2format.cpp:186
    #4 0xf751a915 in OpenBabel::OBMoleculeFormat::ReadChemObjectImpl(OpenBabel::OBConversion*, OpenBabel::OBFormat*) ./src/obmolecformat.cpp:102
    #5 0xf63c358c in OpenBabel::OBMoleculeFormat::ReadChemObject(OpenBabel::OBConversion*) ./include/openbabel/obmolecformat.h:116
    #6 0xf72a204e in OpenBabel::OBConversion::Convert() ./src/obconversion.cpp:545
    #7 0xf72c717a in OpenBabel::OBConversion::Convert(std::istream*, std::ostream*) ./src/obconversion.cpp:481
    #8 0xf72cf4f3 in OpenBabel::OBConversion::FullConvert(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ./src/obconversion.cpp:1514
    #9 0x565594ea in main ./tools/obabel.cpp:370
    #10 0xf77923b4 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0xf779247e in __libc_start_main_impl ../csu/libc-start.c:389
    #12 0x5655c356 in _start (./bin/obabel+0x7356)

Address 0xfffdb130 is located in stack of thread T0 at offset 2448 in frame
    #0 0xf43148ef in OpenBabel::MOL2Format::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*) ./src/formats/mol2format.cpp:156

  This frame has 91 object(s):
    [32, 33) '<unknown>'
    [48, 49) '<unknown>'
    [64, 65) '<unknown>'
    [80, 81) '<unknown>'
    [96, 97) '<unknown>'
    [112, 113) '<unknown>'
    [128, 129) '<unknown>'
    [144, 145) '<unknown>'
    [160, 161) '<unknown>'
    [176, 177) '<unknown>'
    [192, 193) '<unknown>'
    [208, 209) '<unknown>'
    [224, 225) '<unknown>'
    [240, 241) '<unknown>'
    [256, 260) 'natoms' (line 199)
    [272, 276) 'nbonds' (line 199)
    [288, 292) 'resnum' (line 269)
    [304, 308) 'ri' (line 385)
    [320, 324) 'aid' (line 410)
    [336, 340) 'num' (line 410)
    [352, 356) 'charge' (line 420)
    [368, 372) 'start' (line 438)
    [384, 388) 'end' (line 438)
    [400, 404) '<unknown>'
    [416, 420) '<unknown>'
    [432, 436) '<unknown>'
    [448, 452) '<unknown>'
    [464, 468) '<unknown>'
    [480, 484) '<unknown>'
    [496, 500) '__dnew'
    [512, 516) '__guard'
    [528, 532) '__guard'
    [544, 548) '<unknown>'
    [560, 564) '<unknown>'
    [576, 580) '<unknown>'
    [592, 596) '<unknown>'
    [608, 612) '<unknown>'
    [624, 628) '__guard'
    [640, 644) '__dnew'
    [656, 660) '__guard'
    [672, 676) '__dnew'
    [688, 692) '__guard'
    [704, 708) '__dnew'
    [720, 724) '__guard'
    [736, 740) '<unknown>'
    [752, 756) '<unknown>'
    [768, 772) '<unknown>'
    [784, 788) '<unknown>'
    [800, 804) '<unknown>'
    [816, 820) '__guard'
    [832, 836) 'd' (line 155)
    [848, 852) 'd' (line 155)
    [864, 868) 'd' (line 155)
    [880, 888) 'x' (line 267)
    [912, 920) 'y' (line 267)
    [944, 952) 'z' (line 267)
    [976, 984) 'pcharge' (line 267)
    [1008, 1020) 'vstr' (line 171)
    [1040, 1052) 'atom' (line 474)
    [1072, 1084) 'bit' (line 481)
    [1104, 1116) 'atom' (line 499)
    [1136, 1148) 'bitA' (line 503)
    [1168, 1180) 'bitB' (line 512)
    [1200, 1212) 'bond' (line 533)
    [1232, 1244) 'matom' (line 564)
    [1264, 1288) 'str' (line 170)
    [1328, 1352) 'str1' (line 170)
    [1392, 1416) '<unknown>'
    [1456, 1480) '<unknown>'
    [1520, 1544) 'v' (line 265)
    [1584, 1608) '<unknown>'
    [1648, 1672) '<unknown>'
    [1712, 1736) '<unknown>'
    [1776, 1800) '<unknown>'
    [1840, 1864) '<unknown>'
    [1904, 1928) '<unknown>'
    [1968, 1992) 'nextrti' (line 404)
    [2032, 2056) '<unknown>'
    [2096, 2120) '<unknown>'
    [2160, 2184) 'title' (line 543)
    [2224, 2248) '<unknown>'
    [2288, 2312) '<unknown>'
    [2352, 2384) 'attr' (line 185)
    [2416, 2448) 'val' (line 185)
    [2480, 2588) 'atom' (line 266) <== Memory access at offset 2448 partially underflows this variable
    [2624, 2832) 'errorMsg' (line 343) <== Memory access at offset 2448 partially underflows this variable
    [2896, 3104) 'errorMsg' (line 541) <== Memory access at offset 2448 partially underflows this variable
    [3168, 35936) 'buffer' (line 168) <== Memory access at offset 2448 partially underflows this variable
    [36192, 68960) 'temp_type' (line 268)
    [69216, 101984) 'resname' (line 268)
    [102240, 135008) 'atmid' (line 268)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:342 in scanf_common
Shadow bytes around the buggy address:
  0x3fffb5d0: f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
  0x3fffb5e0: f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
  0x3fffb5f0: f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
  0x3fffb600: f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
  0x3fffb610: f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 f2 f2
=>0x3fffb620: f2 f2 00 00 00 00[f2]f2 f2 f2 00 00 00 00 00 00
  0x3fffb630: 00 00 00 00 00 00 00 04 f2 f2 f2 f2 00 00 00 00
  0x3fffb640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x3fffb650: 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 00 00
  0x3fffb660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x3fffb670: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

VENDOR RESPONSE

Since the maintainer of this software did not release a patch during the 90 day window specified in our policy, we have now decided to release the information regarding this vulnerability, to make users of the software aware of this problem. See Cisco’s Coordinated Vulnerability Disclosure Policy for more information: https://tools.cisco.com/security/center/resources/vendor_vulnerability_policy.html

TIMELINE

2022-12-20 - Initial Vendor Contact
2023-01-12 - Vendor Disclosure
2023-07-21 - Public Release

Credit

Discovered by Claudio Bozzato of Cisco Talos.