Talos Vulnerability Report


Iceni Argus PDF Inflate+LZW Decompression Heap-Based Buffer Overflow Vulnerability

February 27, 2017

Report ID



An exploitable heap-based buffer overflow exists in Iceni Argus. When it attempts to convert a malformed PDF containing an object encoded with multiple encoding types, the last of which is LZW, an overflow may occur because the LZW decoder performs no bounds checking. This can lead to code execution in the context of the account of the user running it.

Tested Versions

Iceni Argus Version 6.6.04 (Sep 7 2012) NK

Product URLs


CVSSv3 Score

8.8 - CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H


This is a heap-based buffer overflow in Iceni Argus, a tool used primarily by MarkLogic Server to convert PDF files to (X)HTML. When decoding a PDF object that is encoded with more than one encoding type, one of which is LZW, the tool calls the ipLZWFeedCreate function to initialize the LZW decoder. The decoder object is created by first allocating a 0x545c-byte buffer, which sets aside 0x1000 bytes near its end (at offset 0x4454) for decoded output. A pointer to this space is then written into the object at offset 0x5454 by the instruction at address 0x80cb2b8.

 80cb1d7:	c7 04 24 5c 54 00 00 	movl   $0x545c,(%esp)   ; size
 80cb1de:	8d 83 a4 5b 49 ff    	lea    -0xb6a45c(%ebx),%eax
 80cb1e4:	89 44 24 04          	mov    %eax,0x4(%esp)   ; name
 80cb1e8:	e8 83 63 0d 00       	call   81a1570 <icnMalloc>
 80cb1ed:	89 c6                	mov    %eax,%esi
 80cb2ad:	8d 86 54 44 00 00    	lea    0x4454(%esi),%eax    ; 0x1000 bytes
 80cb2b3:	ba 01 00 00 00       	mov    $0x1,%edx
 80cb2b8:	89 86 54 54 00 00    	mov    %eax,0x5454(%esi)    ; XXX: pointer
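
The offsets in the listing above can be cross-checked with a small layout model. This is only a sketch: the structure and field names (LZWFeed, head, decode_buf, decode_ptr, tail) are hypothetical, the pointer is modelled as a 32-bit integer because the target is a 32-bit binary, and everything before offset 0x4454 is treated as opaque.

```python
import ctypes


class LZWFeed(ctypes.Structure):
    """Hypothetical model of the 0x545c-byte object from ipLZWFeedCreate."""

    _fields_ = [
        ("head", ctypes.c_uint8 * 0x4454),        # state not covered by this listing
        ("decode_buf", ctypes.c_uint8 * 0x1000),  # lea 0x4454(%esi),%eax
        ("decode_ptr", ctypes.c_uint32),          # mov %eax,0x5454(%esi)
        ("tail", ctypes.c_uint8 * 4),             # remaining bytes up to 0x545c
    ]


def layout():
    """Return the total size and the two offsets of interest."""
    return {
        "size": ctypes.sizeof(LZWFeed),
        "decode_buf": LZWFeed.decode_buf.offset,
        "decode_ptr": LZWFeed.decode_ptr.offset,
    }


if __name__ == "__main__":
    for name, value in layout().items():
        print(f"{name}: {value:#x}")
```

Under these assumptions the 0x1000-byte buffer ends exactly where the stored pointer begins, so the first byte written past the buffer lands on the pointer field itself.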

Within the same function, ipLZWFeedCreate, the constructor initializes the array that holds the decoding table (code dictionary), which starts at offset 0x23c of the object. Each 4-byte entry is initialized with the value 0x101 (end-of-data) along with its index. The loop continues until the index reaches 0x100, after which the 16-bit field at offset 0x4240 is set to 0x102.

 80cb303:	66 c7 86 3c 02 00 00 	movw   $0x101,0x23c(%esi)       ; end-of-data constant
 80cb30a:	01 01
 80cb30c:	8d 74 26 00          	lea    0x0(%esi,%eiz,1),%esi
 80cb310:	66 89 50 02          	mov    %dx,0x2(%eax)
 80cb314:	83 c2 01             	add    $0x1,%edx
 80cb317:	66 c7 00 01 01       	movw   $0x101,(%eax)
 80cb31c:	83 c0 04             	add    $0x4,%eax
 80cb31f:	81 fa 00 01 00 00    	cmp    $0x100,%edx              ; loop 256 times
 80cb325:	75 e9                	jne    80cb310 <ipLZWFeedCreate+0x150>
 80cb327:	66 c7 86 40 42 00 00 	movw   $0x102,0x4240(%esi)
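
A minimal sketch of the resulting table state, assuming each entry is a pair of 16-bit fields as the listing suggests. The names (link, index, EOD_MARK, FIRST_FREE) are hypothetical, the index field of entry 0 is assumed to be 0 because the listing only shows its first field being written, and reading the field at 0x4240 as the next free code is an interpretation: in the PDF LZWDecode filter, codes 256 and 257 are reserved for clear-table and end-of-data, which makes 258 (0x102) the first assignable code.

```python
EOD_MARK = 0x101    # value stored in the first 16-bit field of each entry
FIRST_FREE = 0x102  # value written to offset 0x4240 after the loop


def init_table():
    """Return (table, next_code); each entry is a [link, index] pair."""
    table = [[EOD_MARK, index] for index in range(0x100)]
    return table, FIRST_FREE


if __name__ == "__main__":
    table, next_code = init_table()
    print(len(table), hex(next_code), table[0x41])
```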

Inside the following loop in loadLZWBuffer, the application walks the code dictionary, reading each entry's index and value and writing the value into the 0x1000-byte buffer. The loop terminates only when the current index points to an end-of-data entry. If no end-of-data entry is reached and the loop iterates more than 0x1000 times, the write pointer runs past the end of the buffer, because the loop has no boundary check to terminate it.

 80cb587:	8d 96 54 44 00 00    	lea    0x4454(%esi),%edx    ; beginning of 0x1000 byte buffer
 80cb58d:	89 55 f0             	mov    %edx,-0x10(%ebp)     ; write pointer
 80cb706:	0f b7 55 da          	movzwl -0x26(%ebp),%edx     ; starting index
 80cb70a:	66 39 96 40 42 00 00 	cmp    %dx,0x4240(%esi)
 80cb711:	0f 8e be 00 00 00    	jle    80cb7d5 <loadLZWBuffer+0x3f5>
 80cb717:	89 d0                	mov    %edx,%eax
 80cb719:	0f bf d0             	movswl %ax,%edx
 80cb728:	8b 4d f0             	mov    -0x10(%ebp),%ecx     ; write pointer
 80cb72b:	0f b7 84 96 3e 02 00 	movzwl 0x23e(%esi,%edx,4),%eax
 80cb732:	00
 80cb733:	88 01                	mov    %al,(%ecx)           ; XXX: crash
 80cb735:	0f bf 94 96 3c 02 00 	movswl 0x23c(%esi,%edx,4),%edx
 80cb73c:	00
 80cb73d:	83 c1 01             	add    $0x1,%ecx
 80cb740:	89 4d f0             	mov    %ecx,-0x10(%ebp)
 80cb743:	66 81 bc 96 3c 02 00 	cmpw   $0x101,0x23c(%esi,%edx,4)    ; check if index points at end-of-data
 80cb74a:	00 01 01
 80cb74d:	75 d9                	jne    80cb728 <loadLZWBuffer+0x348>
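
The loop above can be modelled as follows. This is a sketch under stated assumptions, not the vendor's code: the table is a Python list of [link, value] pairs, the function names are hypothetical, and the self-referencing chain used in the usage note is constructed by hand, since this report does not show how the malformed PDF brings the table into a state where the end-of-data check never succeeds. walk_checked shows the kind of boundary check whose absence is described above.

```python
EOD_MARK = 0x101   # sentinel compared against by the cmpw at 0x80cb743
BUF_SIZE = 0x1000  # bytes reserved at offset 0x4454 of the decoder object


def make_table():
    """Literal entries plus two placeholder slots for codes 0x100 and 0x101."""
    return [[EOD_MARK, i] for i in range(0x100)] + [[EOD_MARK, 0], [EOD_MARK, 0]]


def walk_unchecked(table, code, stop_after=4 * BUF_SIZE):
    """Count the bytes the loop at 0x80cb728 would write for `code`.

    stop_after exists only so that this demonstration terminates; the
    disassembled loop has no such limit.
    """
    written = 0
    while written < stop_after:
        link, _value = table[code]
        written += 1                    # mov %al,(%ecx) ; add $0x1,%ecx
        code = link                     # movswl 0x23c(%esi,%edx,4),%edx
        if table[code][0] == EOD_MARK:  # cmpw $0x101,0x23c(%esi,%edx,4)
            break
    return written


def walk_checked(table, code):
    """Same walk, but refuse to produce more than BUF_SIZE bytes."""
    out = bytearray()
    while True:
        if len(out) >= BUF_SIZE:
            raise ValueError("code chain does not fit in the decode buffer")
        link, value = table[code]
        out.append(value & 0xFF)
        code = link
        if table[code][0] == EOD_MARK:
            return bytes(out)
```

For example, after appending a well-formed entry [0x41, 0x42] as code 0x102 and two entries [0x104, 0x43] and [0x103, 0x44] that link to each other as codes 0x103 and 0x104, walk_unchecked(table, 0x102) stops after one byte, whereas walk_unchecked(table, 0x103) only stops at the artificial stop_after limit and walk_checked(table, 0x103) raises ValueError once 0x1000 bytes have been produced.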

Crash Information

$ gdb --quiet --args /opt/MarkLogic/converters/cvtpdf/convert ~/config/

Reading symbols from /opt/MarkLogic/Converters/cvtpdf/convert...done.

(gdb) r

Starting program: /opt/MarkLogic/Converters/cvtpdf/convert /home/user/config/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loading configuration...
Parsing macros...
Macro synth-bookmarks='true'
Macro image-output='true'
Macro text-output='true'
Macro zones='false'
Macro ignore-text='true'
Macro remove-overprint='false'
Macro illustrations='true'
Macro line-breaks='true'
Macro image-quality='75'
Macro page-start=''
Macro page-end=''
Macro document-start=''
Macro document-end=''
Analysing '/home/user/poc.pdf'
Pages 1 to 1
Processing page 1

Catchpoint 4 (signal SIGSEGV), 0x080cb733 in loadLZWBuffer ()

(gdb) bt 5

#0  0x080cb733 in loadLZWBuffer ()
#1  0x080cb9a9 in ipLZWFeedRead ()
#2  0x08084cd7 in ipDataFeedRead ()
#3  0x08257174 in loadFlateBuffers ()
#4  0x0825751b in ipFlateFeedRead ()
(More stack frames follow...)

(gdb) h

[eax: 0x0000004d] [ebx: 0x08f57000] [ecx: 0x09abd000] [edx: 0x00000117]
[esi: 0x09a794d0] [edi: 0x00000009] [esp: 0xfffbf8e0] [ebp: 0xfffbf918]
[eflags: NZ SF OF NC ND NI]

fffbf8e0 | 09a5e7c8 f7fd8000 09a7970c 09a7d719 | ................
fffbf8f0 | 0117b2f8 09a7d924 09a7d919 00000009 | ....$...........
fffbf900 | 098eb38c 09a7d84d 09abd000 08f57000 | ....M........p..
fffbf910 | 00000000 09a794d0 fffbf948 080cb9a9 | ........H.......

=> 0x80cb733 <loadLZWBuffer+851>:       mov    %al,(%ecx)
   0x80cb735 <loadLZWBuffer+853>:       movswl 0x23c(%esi,%edx,4),%edx
   0x80cb73d <loadLZWBuffer+861>:       add    $0x1,%ecx
   0x80cb740 <loadLZWBuffer+864>:       mov    %ecx,-0x10(%ebp)
   0x80cb743 <loadLZWBuffer+867>:       cmpw   $0x101,0x23c(%esi,%edx,4)
   0x80cb74d <loadLZWBuffer+877>:       jne    0x80cb728 <loadLZWBuffer+840>
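
The register dump can be related back to the buffer layout with a little arithmetic. The sketch below uses only values from the dump above (esi as the decoder object, ecx as the write pointer) and the 0x4454 offset from the disassembly; the function and constant names are hypothetical.

```python
ESI = 0x09A794D0     # decoder object at the time of the crash
ECX = 0x09ABD000     # write pointer at the faulting instruction
BUF_OFFSET = 0x4454  # start of the decode buffer inside the object
BUF_SIZE = 0x1000    # size of the decode buffer


def crash_distances(esi=ESI, ecx=ECX):
    """Return (buffer start, bytes written so far, bytes past the buffer end)."""
    buf_start = esi + BUF_OFFSET
    written = ecx - buf_start
    return buf_start, written, written - BUF_SIZE


if __name__ == "__main__":
    start, written, past_end = crash_distances()
    print(f"buffer start : {start:#010x}")
    print(f"bytes written: {written:#x}")
    print(f"past the end : {past_end:#x}")
```

The computed buffer start, 0x09a7d924, also appears in the stack dump at 0xfffbf8f4, and the saved write pointer at -0x10(%ebp), that is 0xfffbf908, holds the same 0x09abd000 as ecx. By this calculation the loop had already written 0x3f6dc bytes, 0x3e6dc of them beyond the 0x1000-byte buffer, before faulting. The faulting address is page-aligned, which is consistent with the write running off the end of mapped memory, although the dump alone does not confirm that.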


Discovered by Marcin Noga of Cisco Talos.


2016-10-10 - Vendor Disclosure
2017-02-27 - Public Release