Talos Vulnerability Report

TALOS-2022-1665

Open Babel ORCA format nAtoms out-of-bounds write vulnerabilities

July 21, 2023

CVE Number

CVE-2022-46289, CVE-2022-46290

SUMMARY

Multiple out-of-bounds write vulnerabilities exist in the ORCA format nAtoms functionality of Open Babel 3.1.1 and master commit 530dbfa3. A specially-crafted malformed file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

Open Babel 3.1.1
Open Babel master commit 530dbfa3

PRODUCT URLS

Open Babel - https://openbabel.org/

CVSSv3 SCORE

9.8 - CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-122 - Heap-based Buffer Overflow

DETAILS

Open Babel is a popular library for converting chemical file formats, currently supporting about 130 different file formats. It provides bindings for several programming languages. Because of the nature of the library, and because many online chemical format converters and molecule viewers might use Open Babel in their backend for parsing and conversion, we consider this software to be potentially reachable over the network.

Open Babel ships a simple converter application called obabel that can be used to trigger the issue described in this advisory. obabel supports -i and -o parameters, which select the input and output formats to perform the conversion. obabel supports multiple input and output files (as does the Open Babel library itself): this technically allows multiple vulnerabilities to trigger in sequence, which in turn could make some vulnerabilities easier to exploit. In this advisory, however, we focus on only one input file and a corresponding output file.

When a single input file and output file are supplied, obabel.cpp records the input and output formats (if supplied), and calls OBConversion::FullConvert in obconversion.cpp. Inside this function, there’s a call to OpenAndSetFormat, which uses FormatFromExt to derive the input format from the filename extension if no -i parameter was supplied. Similarly, OpenInAndOutFiles can be used to derive both input and output formats from the filename extensions when none are supplied.

Depending on how the obabel application is invoked, different code paths may be taken. Eventually, the pInFormat and pOutFormat objects (of base class OBFormat) are allocated; these are instances of the classes that implement the selected input and output formats.

The code then proceeds with a call to OBConversion::Convert, which eventually leads to calling pInFormat->ReadMolecule and pOutFormat->WriteMolecule.

In this advisory, we describe two issues in the orca file format (formats/orcaformat.cpp) when parsing an input file via ReadMolecule.

    bool OrcaOutputFormat::ReadMolecule(OBBase* pOb, OBConversion* pConv)
    {
      ...
      double* confCoords;
      ...
[1]   char buffer[BUFF_SIZE];
      ...
      int nAtoms = 0;

      vector<string> vs;

      mol.BeginModify();
[2]   while (ifs.getline(buffer,BUFF_SIZE)) {
          ...

The function defines several variables. We’re especially interested in buffer at [1], which is used to read lines from the input file. At [2], lines are read one at a time in a while loop that consumes the whole file.

          if (checkKeywords.find("Geometry Optimization Run") != notFound) {
[3]           geoOptRun = true;
              while (ifs.getline(buffer,BUFF_SIZE)) {
                  string checkNAtoms(buffer);

[4]               if (checkNAtoms.find("Number of atoms") != notFound) {
                      tokenize(vs,buffer);
[5]                   nAtoms = atoi((char*)vs[4].c_str());
                      break;
                  }
              }
          } // if "geometry optimization run"

When a line contains “Geometry Optimization Run”, geoOptRun is set to true [3], and the code then looks for a line containing “Number of atoms” [4]. When found, tokenize splits the line on whitespace and stores the tokens in vs. nAtoms is then set to the integer value of the token at index 4 [5]. Note that nAtoms is of type int and is fully controlled by the input file.
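The parsing step above can be modeled in isolation. The following is a minimal sketch and not Open Babel's code: splitWhitespace is a hypothetical stand-in for the library's tokenize, and parseAtomCount is an illustrative name. It shows that whatever integer appears as the fifth token of the line becomes nAtoms, with no range check applied.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Open Babel's tokenize(): split on whitespace.
std::vector<std::string> splitWhitespace(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

// Mirrors the logic at [5]: the token at index 4 is converted with atoi()
// and used as-is. The size check below exists only to keep this sketch safe;
// the excerpt above indexes vs[4] without one.
int parseAtomCount(const std::string& line) {
    std::vector<std::string> vs = splitWhitespace(line);
    if (vs.size() < 5) return 0;
    return std::atoi(vs[4].c_str());
}
```

For a line such as "Number of atoms .... 1431655766", the tokens are "Number", "of", "atoms", "....", and "1431655766", so the value at index 4 is taken directly from the file.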

From this point two exploitation avenues are possible. We will describe them separately.

CVE-2022-46289 - nAtoms wrap-around

[6]       if (checkKeywords.find("CARTESIAN COORDINATES (ANGSTROEM)") != notFound) {
              //        if(strstr(buffer,"CARTESIAN COORDINATES (ANGSTROEM)") != NULL) {
              if (unitCell) break; // dont't overwrite unit cell coordinate informations
              if (mol.NumAtoms() == 0) {
                  newMol = true;
              }
              if (geoOptRun) {
[7]               confCoords = new double[nAtoms*3];
              }

If the next line contains “CARTESIAN COORDINATES (ANGSTROEM)”, we enter the branch at [6], and the code allocates confCoords with nAtoms * 3 elements [7]. Under the hood, new double[] calls operator new[] with the element count multiplied by the size of the type, in this case double. Since double has a size of 8 bytes, the requested buffer is nAtoms * 3 * 8 bytes. operator new[] takes a size_t, so at first this might not look like a wrap-around issue on 64-bit platforms. However, the multiplication nAtoms * 3 is performed as int, so it can wrap around within 32 bits on both 32-bit and 64-bit systems where sizeof(int) is 4 (or within 16 bits where it is 2). Keep in mind that the size of int is compiler-dependent, so this issue might not be exploitable in some circumstances.

For example, assuming a 4-byte int (GCC on both 32-bit and 64-bit targets), if nAtoms is 1431655766, then nAtoms * 3 wraps to 2, and 2 * 8 results in only 16 bytes being allocated for confCoords.
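The arithmetic in this example can be checked with a short sketch. The function names are illustrative, and unsigned arithmetic is used on purpose: signed overflow is undefined behavior in C++, so the code below models the two's-complement wrap described above instead of overflowing an int. An 8-byte double is assumed.

```cpp
#include <cstdint>

// Models nAtoms * 3 evaluated in 32 bits. The product is computed in 64 bits
// and then truncated, which is the same as reducing it modulo 2^32.
std::uint32_t wrappedElementCount(std::uint32_t nAtoms) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(nAtoms) * 3u);
}

// Bytes requested for that many doubles, assuming sizeof(double) == 8.
std::uint64_t allocationBytes(std::uint32_t nAtoms) {
    return static_cast<std::uint64_t>(wrappedElementCount(nAtoms)) * 8u;
}
```

Since 1431655766 * 3 equals 2^32 + 2, wrappedElementCount(1431655766) yields 2 and allocationBytes(1431655766) yields 16.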

              ifs.getline(buffer,BUFF_SIZE);  // ---- ----- ----
              ifs.getline(buffer,BUFF_SIZE);
              tokenize(vs,buffer);
              int i=0;
[8]           while (vs.size() == 4) {

                  x = atof((char*)vs[1].c_str());
                  y = atof((char*)vs[2].c_str());
                  z = atof((char*)vs[3].c_str());

                  if (newMol){
                      atom = mol.NewAtom();
                      atom->SetAtomicNum(OBElements::GetAtomicNum(vs[0].c_str()));                //set atomic number
                      atom->SetVector(x,y,z); //set atom coordinates
                  }
                  if (geoOptRun){
[9]                   confCoords[i*3] = x;
                      confCoords[i*3+1] = y;
                      confCoords[i*3+2] = z;
                      i++;
                  } else {
                      atom->SetVector(x,y,z);
                  }

[10]              if (!ifs.getline(buffer,BUFF_SIZE))
                      break;
                  tokenize(vs,buffer);
              }

Inside the loop at [8], confCoords is written to at [9] with the controlled values x, y and z. The loop keeps running for as long as each line read at [10] has exactly 4 tokens [8]. Because the confCoords buffer is smaller than expected, the assignments at [9] eventually write out-of-bounds as more 4-token lines are read from the input. An attacker can exploit this by supplying an nAtoms value equal to or greater than 1431655766. This leads to an out-of-bounds write on the heap, which in turn can lead to arbitrary code execution.
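One possible way to reject such values before allocating is sketched below. This is not the vendor's fix (no patch is referenced in this advisory); safeCoordCount is a hypothetical helper, and returning 0 to signal rejection is a choice made only for this example.

```cpp
#include <cstddef>
#include <limits>

// Returns nAtoms * 3 as a size_t, or 0 if nAtoms is non-positive or the
// product would not fit in an int. A return value of 0 means "reject the
// input"; a valid count is always at least 3.
std::size_t safeCoordCount(int nAtoms) {
    if (nAtoms <= 0) return 0;
    if (nAtoms > std::numeric_limits<int>::max() / 3) return 0;
    return static_cast<std::size_t>(nAtoms) * 3u;
}
```

With a 4-byte int, the largest accepted value is 715827882, so the value 1431655766 from the example above is rejected. A check of this kind only addresses the wrap-around and does not by itself bound the write loop.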

CVE-2022-46290 - nAtoms unrestricted loop

Going back to the check at [6]:

[6]       if (checkKeywords.find("CARTESIAN COORDINATES (ANGSTROEM)") != notFound) {
              //        if(strstr(buffer,"CARTESIAN COORDINATES (ANGSTROEM)") != NULL) {
              if (unitCell) break; // dont't overwrite unit cell coordinate informations
              if (mol.NumAtoms() == 0) {
                  newMol = true;
              }
              if (geoOptRun) {
[7]               confCoords = new double[nAtoms*3];
              }

If the next line contains “CARTESIAN COORDINATES (ANGSTROEM)” we enter [6], and the code allocates confCoords with a size of nAtoms * 3. This means nAtoms arbitrarily controls the size of the confCoords buffer.

              ifs.getline(buffer,BUFF_SIZE);  // ---- ----- ----
              ifs.getline(buffer,BUFF_SIZE);
              tokenize(vs,buffer);
              int i=0;
[8]           while (vs.size() == 4) {

                  x = atof((char*)vs[1].c_str());
                  y = atof((char*)vs[2].c_str());
                  z = atof((char*)vs[3].c_str());

                  if (newMol){
                      atom = mol.NewAtom();
                      atom->SetAtomicNum(OBElements::GetAtomicNum(vs[0].c_str()));                //set atomic number
                      atom->SetVector(x,y,z); //set atom coordinates
                  }
                  if (geoOptRun){
[9]                   confCoords[i*3] = x;
                      confCoords[i*3+1] = y;
                      confCoords[i*3+2] = z;
[10]                  i++;
                  } else {
                      atom->SetVector(x,y,z);
                  }

[11]              if (!ifs.getline(buffer,BUFF_SIZE))
                      break;
                  tokenize(vs,buffer);
              }

Inside the loop at [8], confCoords is written to at [9] with the controlled values x, y and z. The loop keeps running for as long as each line read at [11] has exactly 4 tokens [8], because there’s no check comparing i and nAtoms at [10]. This allows the assignments at [9] to land outside of the confCoords array, as i++ is always executed and loop termination depends solely on the line contents.

An attacker can exploit this issue by supplying an nAtoms value that is smaller than the actual number of atom coordinate lines. This leads to an out-of-bounds write on the heap, which in turn can lead to arbitrary code execution.
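The mismatch between the allocation size and the number of loop iterations can be modeled without touching memory. The sketch below is illustrative only; outOfBoundsWrites is a hypothetical helper that counts how many of the writes at [9] would fall outside an array of nAtoms * 3 doubles for a given number of 4-token coordinate lines.

```cpp
#include <cstddef>

// Each coordinate line produces three writes, at indices i*3, i*3+1 and
// i*3+2, and i is never compared against nAtoms. This function counts the
// writes whose index is at or beyond the array capacity.
std::size_t outOfBoundsWrites(std::size_t nAtoms, std::size_t coordLines) {
    const std::size_t capacity = nAtoms * 3;
    std::size_t oob = 0;
    for (std::size_t i = 0; i < coordLines; ++i) {
        for (std::size_t k = 0; k < 3; ++k) {
            if (i * 3 + k >= capacity) ++oob;
        }
    }
    return oob;
}
```

With nAtoms set to 1 and two coordinate lines, all three writes of the second line land past the end of the array. Adding a condition such as i < nAtoms to the loop would be one way to keep the index in range, although that is a suggestion and not a fix confirmed by the vendor.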

Crash Information

$ ./bin/obabel -i orca nAtoms.1.loop.orca -o sdf
==1190299==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf500d948 at pc 0xf4d7c9ff bp 0xffff2658 sp 0xffff264c
WRITE of size 8 at 0xf500d948 thread T0
    #0 0xf4d7c9fe in OpenBabel::OrcaOutputFormat::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*) ./src/formats/orcaformat.cpp:215
    #1 0xf751a915 in OpenBabel::OBMoleculeFormat::ReadChemObjectImpl(OpenBabel::OBConversion*, OpenBabel::OBFormat*) ./src/obmolecformat.cpp:102
    #2 0xf63c358c in OpenBabel::OBMoleculeFormat::ReadChemObject(OpenBabel::OBConversion*) ./include/openbabel/obmolecformat.h:116
    #3 0xf72a204e in OpenBabel::OBConversion::Convert() ./src/obconversion.cpp:545
    #4 0xf72c717a in OpenBabel::OBConversion::Convert(std::istream*, std::ostream*) ./src/obconversion.cpp:481
    #5 0xf72cf4f3 in OpenBabel::OBConversion::FullConvert(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ./src/obconversion.cpp:1514
    #6 0x565594ea in main ./tools/obabel.cpp:370
    #7 0xf77923b4 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0xf779247e in __libc_start_main_impl ../csu/libc-start.c:389
    #9 0x5655c356 in _start (./bin/obabel+0x7356)

0xf500d948 is located 0 bytes to the right of 24-byte region [0xf500d930,0xf500d948)
allocated by thread T0 here:
    #0 0xf7a58bb3 in operator new[](unsigned int) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:98
    #1 0xf4d6b208 in OpenBabel::OrcaOutputFormat::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*) ./src/formats/orcaformat.cpp:197
    #2 0xf751a915 in OpenBabel::OBMoleculeFormat::ReadChemObjectImpl(OpenBabel::OBConversion*, OpenBabel::OBFormat*) ./src/obmolecformat.cpp:102
    #3 0xf63c358c in OpenBabel::OBMoleculeFormat::ReadChemObject(OpenBabel::OBConversion*) ./include/openbabel/obmolecformat.h:116
    #4 0xf72a204e in OpenBabel::OBConversion::Convert() ./src/obconversion.cpp:545
    #5 0xf72c717a in OpenBabel::OBConversion::Convert(std::istream*, std::ostream*) ./src/obconversion.cpp:481
    #6 0xf72cf4f3 in OpenBabel::OBConversion::FullConvert(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ./src/obconversion.cpp:1514
    #7 0x565594ea in main ./tools/obabel.cpp:370
    #8 0xf77923b4 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: heap-buffer-overflow ./src/formats/orcaformat.cpp:215 in OpenBabel::OrcaOutputFormat::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*)
Shadow bytes around the buggy address:
  0x3ea01ad0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3ea01ae0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3ea01af0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3ea01b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3ea01b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x3ea01b20: fa fa fa fa fa fa 00 00 00[fa]fa fa fd fd fd fa
  0x3ea01b30: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x3ea01b40: fd fd fa fa 00 00 00 07 fa fa 00 00 00 fa fa fa
  0x3ea01b50: 00 00 00 03 fa fa 00 00 00 fa fa fa fd fd fd fa
  0x3ea01b60: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x3ea01b70: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
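
The addresses in the report above can be related back to the loop variables with a little arithmetic. The sketch below only copies the three addresses from the AddressSanitizer output; the function names are illustrative, and an 8-byte double is assumed.

```cpp
#include <cstdint>

// Values copied from the AddressSanitizer report above.
const std::uint32_t kRegionStart = 0xf500d930u;
const std::uint32_t kRegionEnd   = 0xf500d948u;  // one past the last valid byte
const std::uint32_t kFaultAddr   = 0xf500d948u;

// Size of the heap allocation in bytes.
std::uint32_t regionBytes() { return kRegionEnd - kRegionStart; }

// Index into confCoords of the faulting write, assuming 8-byte doubles.
std::uint32_t faultingIndex() { return (kFaultAddr - kRegionStart) / 8u; }

// Loop counter i for that write, since each index is i*3 + {0, 1, 2}.
std::uint32_t faultingIteration() { return faultingIndex() / 3u; }
```

The region is 24 bytes, which holds three doubles and is consistent with an nAtoms value of 1. The faulting write is at index 3, that is, the x coordinate stored when i is 1 on the second coordinate line.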

VENDOR RESPONSE

Since the maintainer of this software did not release a patch during the 90 day window specified in our policy, we have now decided to release the information regarding this vulnerability, to make users of the software aware of this problem. See Cisco’s Coordinated Vulnerability Disclosure Policy for more information: https://tools.cisco.com/security/center/resources/vendor_vulnerability_policy.html

TIMELINE

2022-12-20 - Initial Vendor Contact
2023-01-12 - Vendor Disclosure
2023-07-21 - Public Release

Credit

Discovered by Claudio Bozzato of Cisco Talos.