May 9, 2019 | 6 minute read
64-bit shellcode execution via Excel 4.0 macros is a new iteration on an old technique. This article is a technical dissection of the technique and the ways it runs in a 64-bit environment.
While this type of attack is neither new nor as common as malicious VBA code, it is still an effective and intriguing attack.
Excel 4.0 macros (XLM), the older, awkward sibling of VBA, have been the focus of a couple of interesting offensive techniques. Since Stan Hegt and Pieter Ceelen of Outflank first played with the feature, we have abused it for a funny little lateral movement technique, and they have gone on to do some impressive work weaponizing it as a shellcode runner.
We have previously abused the feature as a Device Guard bypass, and most recently, Stan Hegt has combined the shellcode and lateral movement approaches to enable raw shellcode execution on a remote Excel instance via DCOM. Up until now, some of this was restricted to 32-bit versions of Excel. This was due to a couple of limitations of the Excel 4.0 macro system.
This type of attack is not as common as malicious VBA code. However, it is effective for two reasons: it can be difficult to analyze and many antivirus solutions struggle to detect it. Further, it is intriguing because even though the Excel 4.0 macros are fairly old, they are still supported in the most recent versions of Microsoft Office.
In this research, we outline how to enable the execution of 64-bit shellcode via Excel 4.0 macros. This document also explains the limitations prohibiting us from simply borrowing the 32-bit shellcode execution technique without changing it.
This is particularly interesting, as extending the technique to 64-bit Office shows that the impact of this method can be broader than previously thought. The recent decision by Microsoft to make 64-bit Office the default installed version will also significantly increase its prevalence in the future.
Understanding Excel 4.0 macros is rooted in understanding the CALL and REGISTER functions, which allow for the execution of exported functions in arbitrary DLLs. This research focuses on the CALL function, as it is used for the proof of concept.
To use the CALL function, we invoke the ExecuteExcel4Macro method.
$excel.ExecuteExcel4Macro('CALL("advpack", "LaunchINFSectionA", "JJJFJ", 0, 0, "c:\\temp\\test.inf, DefaultInstall_SingleUser, 1",0)')
Using the CALL function via the ExecuteExcel4Macro method.
CALL receives three mandatory arguments. The first argument is the name of the library from which to import our function, the second is the name of the function itself, and the third is a string representation of the imported function signature. The only thing Excel knows about the imported function is the address it receives via GetProcAddress. This means the macro has to describe the arguments and the return type of the import.
The first character in the string argument "JJJFJ" from ExecuteExcel4Macro represents the return value, and the rest of the characters represent arguments. Each letter corresponds to a data type that Excel 4.0 macros are able to handle. "J", for example, denotes a signed 4-byte integer, while "F" is a reference to a null-terminated string.
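To make the layout of the type string concrete, here is a minimal Python sketch that splits it into a return type and argument types. The helper name `describe_signature` and the lookup table are hypothetical, and the table only contains the two letters described above; any other letter is reported as not covered rather than guessed.

```python
# Only "J" and "F" are described in the text; other letters stay unresolved.
KNOWN_TYPES = {
    "J": "signed 4-byte integer",
    "F": "reference to a null-terminated string",
}


def describe_signature(type_text):
    """Split a CALL type string into its return type and argument types."""
    if not type_text:
        raise ValueError("type string must contain at least a return type")
    names = [KNOWN_TYPES.get(ch, "not covered here (" + ch + ")") for ch in type_text]
    # The first letter is the return type, every later letter is an argument.
    return {"returns": names[0], "arguments": names[1:]}


signature = describe_signature("JJJFJ")
print("returns:", signature["returns"])
for position, name in enumerate(signature["arguments"], start=1):
    print("argument", position, "->", name)
```

For "JJJFJ" this yields one integer return value followed by four arguments, the third of which is the string reference, matching the four values passed after the type string in the CALL example.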
In the 32-bit version, supported data types can easily substitute for unsupported ones of the same size. Pointers, for example, can be treated as the 4-byte "J" type. This allows us to use the following functions to run our shellcode, as all arguments are 4 bytes or shorter.
Functions with 4-byte or shorter arguments:
This macro functionality exists in 64-bit Excel, but if you try to implement a shellcode runner using the same approach, you will quickly encounter a problem. The pointer size for a 64-bit application is, unsurprisingly, 64 bits. The available data types remain the same, which means there is no native 8-byte integer type. Using one of the floating-point types does not help either: those values are passed in the XMM registers, while the function expects its arguments in rcx, rdx, r8 and r9 (and then on the stack), according to the x64 calling convention.
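The register mismatch can be illustrated with a small Python model of the Microsoft x64 calling convention, where the slot is chosen by argument position and the register file by argument kind. This is a simplified sketch: the function name `argument_locations` and the "int"/"float" labels are hypothetical, and details such as shadow space and variadic functions are ignored.

```python
INTEGER_REGISTERS = ["rcx", "rdx", "r8", "r9"]
FLOAT_REGISTERS = ["xmm0", "xmm1", "xmm2", "xmm3"]


def argument_locations(kinds):
    """Map each argument kind ("int" or "float") to where it is passed.

    The slot is picked by position: the first four arguments use a
    register, and anything after that goes on the stack.
    """
    locations = []
    for index, kind in enumerate(kinds):
        if kind not in ("int", "float"):
            raise ValueError("unknown kind: " + kind)
        if index >= 4:
            locations.append("stack")
        elif kind == "float":
            locations.append(FLOAT_REGISTERS[index])
        else:
            locations.append(INTEGER_REGISTERS[index])
    return locations


# A pointer passed as an integer lands where the callee looks for it.
print(argument_locations(["int", "int", "int"]))
# The same pointer passed through a floating-point type ends up in xmm0,
# while the callee still reads its first argument from rcx.
print(argument_locations(["float", "int", "int"]))
```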
However, the string data types, which are passed by reference, still seem to work. The macro system knows how to handle at least some 8-byte pointers. That doesn't directly help, as we can't precisely supply and receive 8-byte values.
This problem disappears when our pointers are less than 0x00000001'00000000, as they will be representable using only 4 bytes. This is true at least for the first four arguments of the function, which are passed through registers, not the stack.
When entering the register, these arguments will be zero-extended, and 0x50000000 will simply become 0x00000000'50000000. The higher bits will be discarded when used as a 32-bit value.
Because of this, we can use the lpAddress parameter of VirtualAlloc to specify that our memory must be allocated at a specific address in the 0x00000000-0xFFFFFFFF range, which we can supply via our available data-types. For the sake of the proof of concept, we chose 0x50000000 (1342177280) as our candidate address and attempted to run VirtualAlloc via 64-bit Excel.
Running VirtualAlloc via 64-bit Excel.
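The arithmetic behind the address choice can be checked with a few lines of Python. This is only a sketch of the reasoning: the helper names `fits_in_32_bits` and `zero_extend_32_to_64` are hypothetical, and the code models the values involved rather than calling any Windows API.

```python
FOUR_GIB = 0x1_0000_0000  # first address that needs more than 32 bits


def fits_in_32_bits(address):
    """True when the address is below 0x00000001'00000000."""
    return 0 <= address < FOUR_GIB


def zero_extend_32_to_64(value):
    """Model a 32-bit value being placed in a 64-bit register."""
    return value & 0xFFFFFFFF  # the upper 32 bits are all zero


candidate = 0x50000000
print(candidate)                                        # 1342177280
print(fits_in_32_bits(candidate))                       # True
print(format(zero_extend_32_to_64(candidate), "016x"))  # 0000000050000000
# The candidate is also below 2**31, so it stays positive when
# expressed as a signed 4-byte integer such as the "J" type.
print(candidate < 2**31)                                # True
```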
Fortunately, this succeeded, as the return value (a pointer to our newly allocated buffer) is the same as what we requested in the lpAddress parameter. Great news!
If the requested memory isn't free, VirtualAlloc may be unable to allocate at our specific address. This can happen because of ASLR and other factors. If so, we simply try another address representable in 32 bits.
Calling WriteProcessMemory using the same methodology as above immediately crashes the process. The stack gets corrupted and we receive an access violation when the function tries to use one of the stack-based parameters.
The 64-bit version of the function that arranges the parameters for the CALL import doesn’t handle 64-bit values effectively. In fact, when using stack-based parameters, it messes with the stack of the next CALL function. We circumvent this by switching out WriteProcessMemory for memset, which uses only three arguments supplied through registers and ignores our stack corruption. A call to CreateThread will start running our shellcode.
The call to CreateThread to begin execution of the shellcode.
A concern with the current remote shellcode injection technique is performance. Writing a payload to a remote machine byte by byte is a rather slow process, and may take a bit of time considering the overhead of the DCOM protocol.
A possible way to write multiple bytes at a time is to use the Kernel32!RtlCopyMemory function, which is basically a wrapper for memcpy and has only three parameters (remember, WriteProcessMemory crashes 64-bit Excel).
Calling RtlCopyMemory several times with a string representing the bytes we want to write as the *Source parameter allows us to write 10 bytes at a time.
Writing 10 bytes at a time via RtlCopyMemory.
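The performance gain comes purely from the number of calls that have to cross DCOM. The Python sketch below shows the bookkeeping under stated assumptions: `split_into_chunks` is a hypothetical helper, and the 25-byte buffer is a harmless placeholder rather than a real payload.

```python
def split_into_chunks(data, chunk_size):
    """Return (offset, chunk) pairs that cover data in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


# Placeholder bytes standing in for whatever buffer is being copied.
placeholder = bytes(range(25))
chunks = split_into_chunks(placeholder, 10)

for offset, chunk in chunks:
    # Each pair corresponds to one copy call: destination = base + offset.
    print(offset, len(chunk), chunk.hex())

print("calls at 1 byte each:", len(placeholder))
print("calls at 10 bytes each:", len(chunks))
```

For a 25-byte buffer this means 3 calls instead of 25, with the last chunk carrying the remaining 5 bytes.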
This lets us write shellcode into the target buffer faster. However, this technique again crashes the process before the payload runs, because the stack parameters for CreateThread get corrupted (specifically lpThreadId).
This leads us to believe the memset approach is simply a lucky accident that left the stack parameters intact for CreateThread to work properly.
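There are two possible ways to solve this problem:
1. Reverse engineer the CALL functionality and its argument handling to understand, and avoid, the stack corruption.
2. Avoid stack-based parameters altogether by using only functions that take up to four arguments.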
In this research, we chose to pursue the second option. With the first option, the CALL functionality and argument handling seemed rather difficult to reverse engineer and there was no immediate promise of an elegant solution.
After weighing a couple of different execution primitives and failing to find a simple exported function that creates a new thread with no more than four parameters, we decided to use the APC mechanism to manipulate the execution of the process.
Queuing an APC (Asynchronous Procedure Call) to a thread makes the thread execute caller-provided code in its own context as soon as it enters an alertable state. QueueUserAPC, the function used for this purpose, needs only three arguments and thus will not look for parameters on the stack. We use this function to queue an APC containing the address of our shellcode to the current thread, which is the thread handling our CALL macro.
Using QueueUserAPC to give context to the thread as it enters an alertable state.
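The pattern across the functions discussed so far can be summarized by counting how many parameters spill past the four register slots. The Python sketch below uses parameter counts taken from the documented prototypes of each function; the table and the helper `stack_parameter_count` are illustrative, not part of the original proof of concept.

```python
REGISTER_SLOTS = 4  # rcx, rdx, r8, r9 in the x64 calling convention

# Parameter counts from the documented function prototypes.
PARAMETER_COUNTS = {
    "VirtualAlloc": 4,
    "WriteProcessMemory": 5,
    "memset": 3,
    "RtlCopyMemory": 3,
    "CreateThread": 6,
    "QueueUserAPC": 3,
}


def stack_parameter_count(name):
    """Number of parameters the function reads from the stack."""
    return max(0, PARAMETER_COUNTS[name] - REGISTER_SLOTS)


for name in PARAMETER_COUNTS:
    spilled = stack_parameter_count(name)
    verdict = "registers only" if spilled == 0 else str(spilled) + " on the stack"
    print(name, "->", verdict)
```

Only WriteProcessMemory and CreateThread read parameters from the stack, and the last of CreateThread's two stack parameters is lpThreadId, the one observed to be corrupted.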
A quick sanity check shows that within a single instance of Excel, the thread responsible for handling our macros is the same thread each time.
A check to confirm that the thread that handles the macro is the same each time.
We can use a function like NtTestAlert to flush and execute the current thread's APC queue and target the correct thread to execute our shellcode.
Using NtTestAlert to target the correct thread and execute our shellcode.
Both the original 32-bit technique and this 64-bit technique are based on DCOM lateral movement. To learn how to deal with this and similar techniques, read our previous article describing other DCOM-based attacks. DCOM access to applications such as Excel should be prohibited by policy and strictly whitelisted as needed, since denying DCOM access to these objects (via dcomcnfg, for example) will most likely not result in unintended consequences.
We find this attack interesting because it presents a new approach to using malicious macros outside of VBA code in a 64-bit environment. Through this attack, we enable the attacker to execute 64-bit shellcode via Excel 4.0 macros. Attackers can use this to gain access to the machine, exfiltrate data, perform lateral movement, and more.
As mentioned previously, while this type of attack is not as common as malicious VBA code, it is effective for two reasons: it can be difficult to analyze and many antivirus solutions struggle to detect it, and it is intriguing because even though the Excel 4.0 macros are fairly old, they are still supported in the most recent versions of Microsoft Office.
Philip Tsukerman is a researcher at Cybereason Innovation Labs.