Talos Vulnerability Report

TALOS-2022-1668

Open Babel GRO format res uninitialized pointer dereference vulnerability

July 21, 2023

CVE Number

CVE-2022-42885

SUMMARY

A use of uninitialized pointer vulnerability exists in the GRO format res functionality of Open Babel 3.1.1 and master commit 530dbfa3. A specially crafted malformed file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

Open Babel 3.1.1
Open Babel master commit 530dbfa3

PRODUCT URLS

Open Babel - https://openbabel.org/

CVSSv3 SCORE

9.8 - CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-824 - Access of Uninitialized Pointer

DETAILS

Open Babel is a popular library for converting chemical file formats, currently supporting about 130 different file formats. It implements bindings for several programming languages. Because of the nature of the library, and since there are many online chemical format converters and molecule viewers which might be using Open Babel in their backend for parsing and conversion, we consider this software as potentially accessible via network.

Open Babel ships a simple converter application called obabel that can be used to trigger the issue described in this advisory. obabel supports -i and -o parameters, which select the input and output formats to perform the conversion. obabel supports multiple input and output files (as does the Open Babel library itself): this technically allows multiple vulnerabilities to trigger in sequence, which in turn could make some vulnerabilities easier to exploit. In this advisory, however, we focus on only one input file and a corresponding output file.

When a single input file and output file are supplied, obabel.cpp records the input and output formats (if supplied) and calls OBConversion::FullConvert in obconversion.cpp. Inside this function, there’s a call to OpenAndSetFormat, which uses FormatFromExt to derive the input format from the filename extension if no -i parameter was supplied. Similarly, OpenInAndOutFiles can be used to derive both input and output formats from the filename extensions when none are supplied.
Depending on how the obabel application is invoked, different code paths may be taken. Eventually, the pInFormat and pOutFormat objects (of base class OBFormat) are allocated; these are instances of the classes that implement the selected input and output formats.

The code then proceeds with a call to OBConversion::Convert, which eventually leads to calling pInFormat->ReadMolecule and pOutFormat->WriteMolecule.

In this advisory, we describe an issue in the GRO file format (formats/groformat.cpp) when parsing an input file via ReadMolecule.

    bool GROFormat::ReadMolecule(OBBase* pOb, OBConversion* pConv)
    {
      ...
      char buffer[BUFF_SIZE];
      ...
      int natoms = 0;
      string title = "";
[2]   long int resid = 0; // 5
      string resname = ""; //5
      string atomtype = ""; //5
      //long int atomid = 0; //5
      double x = 0.0; // 8.3
      double y = 0.0; // 8.3
      double z = 0.0; // 8.3
      double vx = 0.0; // 8.4
      double vy = 0.0; // 8.4
      double vz = 0.0; // 8.4
      string tempstr = "";
[3]   long int residx = 0;
      OBAtom* atom;
[1]   OBResidue* res;
      OBVectorData* velocity;

At [1] a res pointer is declared but not initialized. resid [2] and residx [3] are both set to 0.

      ...
      if (!ifs.getline(buffer, BUFF_SIZE)) {
        errorMsg << "Problems reading a GRO file: "
                 << "Cannot read the first line!";
        obErrorLog.ThrowError(__FUNCTION__, errorMsg.str(), obWarning);
        return false;
      }

      // Get the title
[4]   title.assign(buffer);
      if (title.size() < 1) {
        title = pConv->GetTitle();
        pmol->SetTitle(title);
      } else {
        pmol->SetTitle(title);
      }
      ...

A line is read as title [4].

      ...
      if (!ifs.getline(buffer, BUFF_SIZE)) {
        errorMsg << "Problems reading a GRO file: "
                 << "Cannot read the second line!";
        obErrorLog.ThrowError(__FUNCTION__, errorMsg.str(), obWarning);
        return false;
      }

      // Get the number of atoms
[5]   stringstream(buffer) >> natoms;
      if (natoms < 1) {
        errorMsg << "Problems reading a GRO file: "
                 << "There are no atoms in the file or the second line is"
                 << " incorrectly written.";
        obErrorLog.ThrowError(__FUNCTION__, errorMsg.str(), obWarning);
        return false;
      }
      pmol->ReserveAtoms(natoms);
      ...

Another line is read and parsed as an integer into natoms [5]. The value must be at least 1; otherwise the function reports an error and returns.

      ...
      // Read all atom records
[6]   for (int i=1; i<=natoms; i++) {
        if (!ifs.getline(buffer,BUFF_SIZE)) {
          errorMsg << "Problems reading a GRO file: "
                   << "Could not read line #" << i+2 << ", file error." << endl
                   << " According to the second line, there should be " << natoms
                   << " atoms, and therefore " << natoms+3 << " lines in the file.";
          obErrorLog.ThrowError(__FUNCTION__, errorMsg.str(), obWarning);
          return false;
        }

        line = buffer;

        // Get atom
        atom  = pmol->NewAtom();

        tempstr.assign(line,0,5);
[7]     stringstream(tempstr) >> resid;

        resname.assign(line,5,5);
        Trim(resname);

        atomtype.assign(line,10,5);
        Trim(atomtype);

        // Not used, OB assigns its own indizes
        //tempstr.assign(line,15,5);
        //stringstream(tempstr) >> atomid;

        tempstr.assign(line,20,8);
        stringstream(tempstr) >> x;

        tempstr.assign(line,28,8);
        stringstream(tempstr) >> y;

        tempstr.assign(line,36,8);
        stringstream(tempstr) >> z;

        if (line.size() > 44) {
          tempstr.assign(line,44,8);
          stringstream(tempstr) >> vx;

          tempstr.assign(line,52,8);
          stringstream(tempstr) >> vy;

          tempstr.assign(line,60,8);
          stringstream(tempstr) >> vz;

          velocity = new OBVectorData();
          velocity->SetData(vx, vy, vz);
          velocity->SetAttribute("Velocity");
          velocity->SetOrigin(fileformatInput);
          atom->SetData(velocity);
        }
        ...

For each of the natoms atom records [6], a set of fields is read using fixed-position substrings.
The field we’re interested in is resid, which is parsed from the first five characters of the record at [7] and can therefore be set arbitrarily by the input file.
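The fixed-column extraction at [7] can be sketched in isolation as follows. This is a minimal illustration, not Open Babel code: parse_resid is a hypothetical helper name, and it only mirrors the assign-then-extract pattern shown in the listing above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Open Babel): reads the residue number
// from the first five columns of a GRO atom record, mirroring step [7].
long parse_resid(const std::string& line) {
    long resid = 0;                     // same initial value as in ReadMolecule [2]
    std::string field;
    field.assign(line, 0, 5);           // copies at most five characters
    assert(field.size() <= 5);
    std::stringstream(field) >> resid;  // a blank or non-numeric field yields 0
    return resid;
}
```

Under these assumptions, a record whose first five columns are blank, non-numeric, or literally 0 all produce a resid of 0, so the value is entirely determined by the file contents.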

        ...
        // OB translates this and, e.g., OW1 turns into O3
        // Type conversion should be done explicitly if that needs to be
        // controlled.
        atom->SetType(atomtype);

        // Set coordinates of the atom, multiply by 10 to convert from nm to
        // angstrom
        atom->SetVector(x*10, y*10, z*10);

[8]     if (resid == residx) {
          // Add atom to an existing residue
[9]       res->AddAtom(atom);
        } else {
          // Create new residue and use that
[10]      res = pmol->NewResidue();
          res->SetName(resname);
          res->SetNum(resid);
          res->AddAtom(atom);
          residx = resid;
        }

        // Atom type has to be set in residues as AtomID
        res->SetAtomID(atom, atomtype);

At [8], if resid is equal to residx, then AddAtom is called on res [9].
Because residx starts at 0 [3] and resid is fully controlled by the input file [7], a first atom record with a resid of 0 reaches the line at [9] while res is still uninitialized [1] (initialization would normally happen at [10]).
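One possible hardening of the branch at [8] is sketched below. It is not an upstream fix (no vendor patch is referenced in this advisory), and Atom, Residue, Molecule and assign_residues are hypothetical stand-ins for the Open Babel types and loop, kept minimal so the example compiles on its own. The idea is to initialize the residue pointer to nullptr and to create a residue whenever none exists yet, regardless of the residue number.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for OBAtom, OBResidue and OBMol; illustration only.
struct Atom { std::string type; };

struct Residue {
    long num = 0;
    std::vector<Atom*> atoms;
    void AddAtom(Atom* a) { if (a != nullptr) atoms.push_back(a); }
};

struct Molecule {
    std::vector<std::unique_ptr<Residue>> residues;
    Residue* NewResidue() {
        residues.push_back(std::unique_ptr<Residue>(new Residue()));
        return residues.back().get();
    }
};

// Groups atoms into residues by residue number. Unlike the vulnerable loop,
// res starts as nullptr and a residue is created whenever none exists yet,
// so a first record with resid == 0 never reaches AddAtom through an
// uninitialized pointer.
void assign_residues(Molecule& mol, std::vector<Atom>& atoms,
                     const std::vector<long>& resids) {
    assert(atoms.size() == resids.size());
    Residue* res = nullptr;  // the original declares the pointer uninitialized [1]
    long residx = 0;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        if (res == nullptr || resids[i] != residx) {
            res = mol.NewResidue();
            res->num = resids[i];
            residx = resids[i];
        }
        res->AddAtom(&atoms[i]);
    }
}
```

With this structure the first atom always creates a residue, and later atoms join it only while their residue number stays the same. Whether residue number 0 should be accepted at all is a separate policy decision that this sketch does not address.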

Inside AddAtom, there’s a series of calls that depend on the uninitialized pointer res:

    void OBResidue::AddAtom(OBAtom *atom)
    {
      if (atom != nullptr)
        {
          atom->SetResidue(this);

          _atoms.push_back(atom);
          _atomid.push_back("");
          _hetatm.push_back(false);
          _sernum.push_back(0);
        }
    }

Eventually SetAtomID is also called, which writes to _atomid, a member accessed through res:

    void OBResidue::SetAtomID(OBAtom *atom, const string &id)
    {
      for ( unsigned int i = 0 ; i < _atoms.size() ; ++i )
        if (_atoms[i] == atom)
          _atomid[i] = id;
    }

The res pointer is stored on the stack, and, depending on how the code is compiled and the stack is laid out, the pointer could be under full control of an attacker, who could in turn use this issue to execute arbitrary code.

Crash Information

$ ./bin/obabel -i gro res.uninit.gro -o sdf
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1297086==ERROR: AddressSanitizer: SEGV on unknown address 0x00000078 (pc 0xf744a5ad bp 0xffff1e18 sp 0xffff1cf0 T0)
==1297086==The signal is caused by a READ memory access.
==1297086==Hint: address points to the zero page.
    #0 0xf744a5ad in std::vector<OpenBabel::OBAtom*, std::allocator<OpenBabel::OBAtom*> >::push_back(OpenBabel::OBAtom* const&) /usr/include/c++/12/bits/stl_vector.h:1278
    #1 0xf744a5ad in OpenBabel::OBResidue::AddAtom(OpenBabel::OBAtom*) ./src/residue.cpp:891
    #2 0xf559fcc9 in OpenBabel::GROFormat::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*) ./src/formats/groformat.cpp:278
    #3 0xf751a915 in OpenBabel::OBMoleculeFormat::ReadChemObjectImpl(OpenBabel::OBConversion*, OpenBabel::OBFormat*) ./src/obmolecformat.cpp:102
    #4 0xf63c358c in OpenBabel::OBMoleculeFormat::ReadChemObject(OpenBabel::OBConversion*) ./include/openbabel/obmolecformat.h:116
    #5 0xf72a204e in OpenBabel::OBConversion::Convert() ./src/obconversion.cpp:545
    #6 0xf72c717a in OpenBabel::OBConversion::Convert(std::istream*, std::ostream*) ./src/obconversion.cpp:481
    #7 0xf72cf4f3 in OpenBabel::OBConversion::FullConvert(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ./src/obconversion.cpp:1514
    #8 0x565594ea in main ./tools/obabel.cpp:370
    #9 0xf77923b4 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #10 0xf779247e in __libc_start_main_impl ../csu/libc-start.c:389
    #11 0x5655c356 in _start (./bin/obabel+0x7356)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /usr/include/c++/12/bits/stl_vector.h:1278 in std::vector<OpenBabel::OBAtom*, std::allocator<OpenBabel::OBAtom*> >::push_back(OpenBabel::OBAtom* const&)
==1297086==ABORTING

VENDOR RESPONSE

Since the maintainer of this software did not release a patch during the 90 day window specified in our policy, we have now decided to release the information regarding this vulnerability, to make users of the software aware of this problem. See Cisco’s Coordinated Vulnerability Disclosure Policy for more information: https://tools.cisco.com/security/center/resources/vendor_vulnerability_policy.html

TIMELINE

2022-12-20 - Initial Vendor Contact
2023-01-12 - Vendor Disclosure
2023-07-21 - Public Release

Credit

Discovered by Claudio Bozzato of Cisco Talos.