Talos Vulnerability Report

TALOS-2016-0210

Iceni Argus PDF Uninitialized WordStyle Color Length Code Execution Vulnerability

February 27, 2017

CVE Number

CVE-2016-8385

Summary

An exploitable uninitialized variable vulnerability, which leads to a stack-based buffer overflow, exists in Iceni Argus. When the tool attempts to convert a malformed PDF to XML, a stack variable is left uninitialized and is later used to fetch a length for a copy operation. In most cases this allows an attacker to write outside the bounds of a stack buffer that is used to hold colors. This can lead to code execution in the context of the account running the tool.

Tested Versions

Iceni Argus Version 6.6.04 (Sep 7 2012) NK

Product URLs

http://www.iceni.com/legacy.htm

CVSSv3 Score

8.8 - CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

Details

The vulnerability is an uninitialized variable in Iceni Argus that leads to a stack-based buffer overflow. The tool is used primarily by MarkLogic to convert PDF files to (X)HTML. When aggregating styles for the different words defined within a page’s content, the converter first tries to fetch a style that has already been created by calling getStyleColor.

8160c1e:	8b 81 a8 a7 14 00    	mov    0x14a7a8(%ecx),%eax
8160c24:	89 14 24             	mov    %edx,(%esp)  ; colorObject
8160c27:	25 00 00 20 00       	and    $0x200000,%eax
8160c2c:	89 44 24 08          	mov    %eax,0x8(%esp)   ; flags
8160c30:	8b 85 dc fc ff ff    	mov    -0x324(%ebp),%eax
8160c36:	89 44 24 04          	mov    %eax,0x4(%esp)       ; textObject
8160c3a:	e8 21 f4 ff ff       	call   8160060 <getStyleColor>

Inside the getStyleColor function, the application calls ipTextGetColour to get the object representing the actual color, and then passes the result to the ipColorToDevice function. ipTextGetColour simply seeks into an object and returns a pointer to one of its properties. Due to malformed colors defined by the ‘rg’ opcode, the data behind this returned pointer is left zeroed and contains no color information.

8160077:	8b 45 0c             	mov    0xc(%ebp),%eax   ; textObject
816007a:	89 04 24             	mov    %eax,(%esp)
816007d:	e8 7e f9 f9 ff       	call   80ffa00 <ipTextGetColour>
...
8160082:	c7 44 24 0c 00 00 00 	movl   $0x0,0xc(%esp)   ; result
8160089:	00
816008a:	89 44 24 08          	mov    %eax,0x8(%esp)   ; source
816008e:	8d 85 74 ff ff ff    	lea    -0x8c(%ebp),%eax ; XXX: buffer to initialize
8160094:	89 85 68 ff ff ff    	mov    %eax,-0x98(%ebp)
816009a:	89 44 24 04          	mov    %eax,0x4(%esp)   ; destination
816009e:	8b 45 08             	mov    0x8(%ebp),%eax   ; color object
81600a1:	8b 40 04             	mov    0x4(%eax),%eax
81600a4:	89 04 24             	mov    %eax,(%esp)
81600a7:	e8 e4 b0 0e 00       	call   824b190 <ipColorToDevice>    ; XXX: fails to initialize pointer to argument

Inside the function ipColorToDevice, the tool reads the first byte of the source buffer, which is 00. At address 824b1c2, subtracting 1 from this byte causes an integer underflow, which means the branch at 824b1c7 is taken. The function is supposed to copy data from its 3rd argument into the buffer pointed to by its 2nd argument, but because of the integer underflow it returns without writing any data to that buffer.

824b1ba:	8b 4d 10             	mov    0x10(%ebp),%ecx  ; source
824b1bd:	31 f6                	xor    %esi,%esi
824b1bf:	0f b6 11             	movzbl (%ecx),%edx      ; XXX: read null byte
824b1c2:	8d 42 ff             	lea    -0x1(%edx),%eax  ; XXX: subtract 1
824b1c5:	3c 0c                	cmp    $0xc,%al
824b1c7:	77 49                	ja     824b212 <ipColorToDevice+0x82>   ; XXX: branch taken
...
824b212:	89 f0                	mov    %esi,%eax
824b214:	8b 5d f4             	mov    -0xc(%ebp),%ebx
824b217:	8b 75 f8             	mov    -0x8(%ebp),%esi
824b21a:	8b 7d fc             	mov    -0x4(%ebp),%edi
824b21d:	89 ec                	mov    %ebp,%esp
824b21f:	5d                   	pop    %ebp
824b220:	c3                   	ret
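
The byte-wide check at 824b1c2 through 824b1c7 can be modelled as follows. This is a minimal sketch, not the vendor's code: the function name type_check_rejects is hypothetical, and it mirrors only the three instructions shown above (lea, cmp on %al, ja).

```python
def type_check_rejects(first_byte):
    """Model `lea -0x1(%edx),%eax; cmp $0xc,%al; ja`.

    Returns True when the branch to the early return at 824b212 is taken.
    """
    al = (first_byte - 1) & 0xFF   # only the low byte (%al) takes part in the compare
    return al > 0x0C               # `ja` is an unsigned "above" comparison


# A zeroed source byte wraps around to 0xff, so the early return is taken.
print(type_check_rejects(0x00))

# Only first bytes in the range 1..13 get past this check.
accepted = [b for b in range(256) if not type_check_rejects(b)]
print(accepted)
```

Under this model a first byte of 0 is rejected in the same way as any value above 13, which is why the destination buffer is never written for the malformed color.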

After returning to getStyleColor with the buffer left uninitialized, the application reads a byte from the buffer and uses it as the terminating bound of the loop that follows.

81600b4:	8b 4d 90             	mov    -0x70(%ebp),%ecx     ; pointer to sentinel value
81600b7:	0f b6 95 75 ff ff ff 	movzbl -0x8b(%ebp),%edx     ; XXX: grab uninitialized byte from dest buffer
81600be:	85 c9                	test   %ecx,%ecx
81600c0:	88 95 67 ff ff ff    	mov    %dl,-0x99(%ebp)  ; XXX: write it to lvar for use in loop
81600c6:	0f 84 61 02 00 00    	je     816032d <getStyleColor+0x2cd>

This loop copies the color data from the source to the destination, using the variable at -0x99 as the sentinel for the loop. The target buffer appears to hold a maximum of 4 DWORDs, and it takes around 10 DWORDs to reach the saved stack frame pointer. If the uninitialized value on the stack is larger than 4, the stack buffer is overflowed.

81600eb:	0f b6 b5 67 ff ff ff 	movzbl -0x99(%ebp),%esi     ; XXX: read sentinel value
...
81600f4:	8b bd 68 ff ff ff    	mov    -0x98(%ebp),%edi     ; source
81600fa:	31 d2                	xor    %edx,%edx
81600fc:	8d 4d d8             	lea    -0x28(%ebp),%ecx     ; XXX: destination buffer
81600ff:	90                   	nop
8160100:	8b 44 97 04          	mov    0x4(%edi,%edx,4),%eax    ; read dword
8160104:	89 04 91             	mov    %eax,(%ecx,%edx,4)   ; XXX: write dword
8160107:	83 c2 01             	add    $0x1,%edx
816010a:	39 f2                	cmp    %esi,%edx
816010c:	7c f2                	jl     8160100 <getStyleColor+0xa0>
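
The effect of an oversized sentinel on this loop can be estimated with a small model. This is a sketch under stated assumptions: the function name copy_colors is hypothetical, the 4-DWORD capacity is the report's estimate, and the saved %ebp and return address are assumed to sit at 0(%ebp) and 4(%ebp) as in a conventional 32-bit x86 frame. The instructions elided before the loop are not modelled.

```python
DEST_OFFSET = -0x28    # destination buffer: lea -0x28(%ebp),%ecx
BUFFER_DWORDS = 4      # apparent capacity of the color buffer (report's estimate)


def copy_colors(sentinel):
    """Return the %ebp-relative offset of every DWORD the loop writes.

    For a sentinel of at least 1 this matches the `cmp %esi,%edx; jl` loop.
    """
    return [DEST_OFFSET + 4 * edx for edx in range(sentinel)]


written = copy_colors(0xFE)
overflow = [off for off in written if off >= DEST_OFFSET + 4 * BUFFER_DWORDS]

print(len(written), len(overflow))   # DWORDs written, DWORDs past the buffer
print(0 in written, 4 in written)    # saved %ebp slot and return address slot reached?
```

With a sentinel of 0xfe, the model writes 254 DWORDs, 250 of them past the end of the buffer, and both the saved frame pointer and the return address slots fall inside the written range.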

The stack frame for getStyleColor places the stack pointer at -0xac relative to the frame pointer, and the sentinel is at offset -0x99 relative to the frame pointer. The difference between these is 0x13 bytes, which means that in order to control the sentinel, something controlled must be written to 0x10(%ebp) or 0xc(%esp) in the caller. It turns out that there are only two places in the collectWordStyles function that write to the relevant byte of this DWORD. At address 0x8161013, the pointer to MarkupCmp (0x815fea0) is written to 0xc(%esp). This sets the sentinel value for the loop to 0xfe, which is larger than 4. For the sentinel to take this specific value, the loop in collectWordStyles must be hit at least once.

8161007:	8d 8d 64 ff ff ff    	lea    -0x9c(%ebp),%ecx
816100d:	8d 83 a0 8e 20 ff    	lea    -0xdf7160(%ebx),%eax
8161013:	89 44 24 0c          	mov    %eax,0xc(%esp)   ; XXX: write %eax to 0xc(%esp)
8161017:	c7 44 24 08 04 00 00 	movl   $0x4,0x8(%esp)
...
8161033:	7e 42                	jle    8161077 <collectWordStyles+0x517>
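
The arithmetic behind the 0xfe sentinel can be checked with a few lines of Python. This is only a sketch: it assumes that %ebx at 816100d holds the same base value (0x08f57000) that appears in the crash registers in the next section, which the listing itself does not confirm. It also does not establish which byte of the stale DWORD lands at -0x99(%ebp); it only shows that 0xfe is one of the bytes stored.

```python
import struct

EBX_BASE = 0x08F57000   # assumption: %ebx value taken from the crash register dump

# lea -0xdf7160(%ebx),%eax
pointer = (EBX_BASE - 0xDF7160) & 0xFFFFFFFF

# mov %eax,0xc(%esp) stores the pointer in little-endian byte order.
stored = struct.pack("<I", pointer)

print(hex(pointer))       # address the report identifies as MarkupCmp
print(stored.hex())       # bytes as they sit on the stack, lowest address first
print(hex(0xAC - 0x99))   # distance from the stack pointer to the sentinel
```

Under that assumption the computed pointer is 0x815fea0, and the second byte of its stored form is 0xfe, the sentinel value given in the report.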

Crash Information

$ gdb --quiet --args /opt/MarkLogic/Converters/cvtpdf/convert ~/config


Reading symbols from /opt/MarkLogic/Converters/cvtpdf/convert...done.


(gdb) r


Starting program: /opt/MarkLogic/Converters/cvtpdf/convert /home/user/config/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loading configuration...
Parsing macros...
Macro synth-bookmarks='true'
Macro image-output='true'
Macro text-output='true'
Macro zones='false'
Macro ignore-text='true'
Macro remove-overprint='false'
Macro illustrations='true'
Macro line-breaks='true'
Macro image-quality='75'
Macro page-start=''
Macro page-end=''
Macro document-start=''
Macro document-end=''
features='11140221'
Processing...
Analysing '/home/user/poc.pdf'
Pages 1 to 1
Processing page 1

Catchpoint 4 (signal SIGSEGV), 0x098eef90 in ?? ()


(gdb) h


-=[registers]=-
[eax: 0x00000000] [ebx: 0x08f57000] [ecx: 0x098eef90] [edx: 0x000000fe]
[esi: 0xf7f16420] [edi: 0x00000002] [esp: 0xfffbf160] [ebp: 0x00000003]
[eflags: NZ SF OF NC ND IF]

-=[stack]=-
fffbf160 | f7f16420 00000034 fffbf148 08f57000 |  d..4...H....p..
fffbf170 | 09976610 00000002 fffbf138 080627c0 | .f......8....'..
fffbf180 | 00000000 00000032 098eef90 08f57000 | ....2........p..
fffbf190 | 098eef90 00000002 fffbf158 0806afdd | ........X.......

-=[disassembly]=-
=> 0x98eef90:   add    %al,(%eax)
   0x98eef92:   add    %al,(%eax)
   0x98eef94:   jo     0x98eef96
   0x98eef96:   data16
   0x98eef97:   add    %ch,(%edx)
   0x98eef99:   add    %al,0x69(%eax,%eax,1)

Credit

Discovered by Marcin Noga of Cisco Talos and a Talos team member.

Timeline

2016-10-10 - Vendor Disclosure
2017-02-27 - Public Release