Release v0.8 - Warfare Tactics

Brute Ratel v0.8.0 (Warfare Tactics) is now available for download and provides a major update to in-memory and network evasion features. This release brings a plethora of new capabilities which provide a gateway for in-memory evasion features like self-debugging, unhooking syscalls and hooking your own payload for monitoring via Process Instrumentation. I have listed the technical details of the release below; a detailed list of the features and bug fixes can be found in the release notes.

Feature Additions

Full Transition to Syscalls

The main feature of this release is the full transition of the Badger NTAPIs to syscalls. In the previous release (0.7), several API calls were replaced with NTAPI calls, but that was not nearly enough. Several EDRs like SentinelOne or Cylance are known to hook the NTAPIs by adding jump instructions on them (before the ‘move rcx to r10’ of the NTAPI, or sometimes even overwriting the syscalls) to detect them, and some even implement Process Instrumentation Callbacks. My inspiration initially came from this old blog of 2013 and a few other blogs from https[:]//unknowncheats[.]me. However, the techniques discussed there were limited to avoiding process instrumentation or adding your own hooks in userland. One of the other videos, from 2015 by Alex Ionescu, provided a much clearer idea of how hooking in userland works and why it is important to have it. In order to understand how this works, we will have to go back a few years to where it all started.

In the earlier days (Windows Vista much?), before Microsoft introduced PatchGuard, alternatively known as KPP (Kernel Patch Protection), several vendors built filter drivers which patched the kernel in memory and hooked the WinAPIs via callbacks. Some of the vendors started misusing this capability and their drivers ended up becoming rootkits. A very famous example of this is the Sony BMG copy protection rootkit scandal from 2005. Considering these events, Microsoft decided to implement PatchGuard, which would put an end to all of this while at the same time not placing too many limitations on legitimate vendors who built their own drivers. PatchGuard implemented the core features mentioned below, among several others:

  • Detection and blocking of kernel patching
  • Detection and blocking of modifications to the Interrupt Descriptor Table
  • Detection and blocking of modifications to the Global Descriptor Table
  • Detection and blocking of hooks in the SSDT (System Service Descriptor Table)
  • Protection of ntoskrnl, the Hardware Abstraction Layer and more.

All of this meant that a driver could no longer hook everything in kernel space due to the complications created by PatchGuard. Some of the known callbacks which are still accessible and used by several vendors are the ones mentioned below:

CmRegisterCallbackEx                    IoRegisterFsRegistrationChangeEx          IoSetCompletionRoutineEx
FsRtlRegisterFileSystemFilterCallbacks  IoRegisterFsRegistrationChangeMountAware  IoWMISetNotificationCallback
IoRegisterBootDriverCallback            IoRegisterPlugPlayNotification            KeInitializeApc
KeInitializeDpc                         PsCreateSystemThread                      PsSetLoadImageNotifyRoutine
KeRegisterNmiCallback                   PsSetCreateProcessNotifyRoutineEx         SeRegisterLogonSessionTerminatedRoutine
ObRegisterCallbacks                     PsSetCreateThreadNotifyRoutine            TmEnableCallbacks

NOTE: The above callbacks were identified after debugging several EDR drivers. I haven’t seen a single EDR implementing all of the above callbacks.

The most popular callback among EDR vendors is ObRegisterCallbacks. However, there are no callbacks which implement direct hooking of virtual memory operations. To detect something like NtReadVirtualMemory from kernel mode, a driver would have to modify the call stack of this NTAPI, but KPP (PatchGuard) would prevent that from happening, eventually leading to the infamous BSOD. This means the only way to implement a successful hook is in userland with a Windows service, which is what the EDRs are currently doing. There are two types of hooks which an EDR would implement here. The first and foremost is a trampoline hook via a DLL loaded into every process (almost every process) on the host, where the EDR simply places a jump instruction at the start of the syscall stub. This would look something like this:

Original Stub:

NtAllocateVirtualMemory:
    mov r10, rcx
    mov rax, [0xSyscallByte]
    syscall
    ret

Hooked Syscall Stub:

NtAllocateVirtualMemory:
    jmp 0xEDRDllTrampolineAddress
    ; ... some garbage code due to the fact of overwriting the syscall stub

This means calling the NTAPI lands directly on the jump instruction, which routes your code to the EDR’s DLL loaded in your process. Once your code has landed in the EDR’s DLL, the DLL checks the call stack to validate whether it’s a direct NTAPI call, or runs some YARA rules against the memory space depending on which NTAPI you are calling (QAPC/NtCreateThreadEx), and then exits the process after triggering an alert. If the EDR identifies that the call is legitimate and comes from a legitimate place, it proceeds to execute the original syscall stub stored in the RX region of the DLL’s .text section (it’s interesting what you can do by just knowing this… more on this later).

However, these hooks are not really hard to circumvent. One can perform a manual syscall by storing the syscall stub in their own process and invoking it, which bypasses all of the above. But there is another limitation with using syscalls: syscall numbers change from version to version of Windows and it would be a pain to keep track of all of them. An operator would have to know all the different versions of the syscalls and store all of them inside the .text section of their payload. The payload would then have to pick which syscall stub to execute after enumerating the Windows version using RtlGetVersion. I was reversing a ransomware sample lately where, just by hooking and modifying the output of RtlGetVersion, I was able to crash the payload, since it ended up executing an invalid syscall because the version returned by my hook was different from what the ransomware was expecting. This goes to show that hardcoded syscall stubs may or may not work depending on the scenario. During an Op, we cannot rely on something that might crash our Badger. We need to be one hundred percent sure that we are going to execute a valid syscall, something that cannot be hooked as easily as above.

Another technique to bypass the EDR trampoline hook is to read ntdll.dll directly from disk and repatch the copy loaded in memory with the original syscall stubs. However, this means using the NtCreateFile API call to get a HANDLE to ntdll.dll on disk, and what happens when NtCreateFile itself is hooked? An EDR can simply check the properties of the file being read from disk and trigger an alert if that file is ntdll.dll, because which normal software in its right mind would perform file operations on ntdll.dll, right? Even if you somehow find a way to circumvent this, there is another problem we eventually have to face, which is the second technique of hooking in userland.

This is where Process Instrumentation comes into the picture. Userland hooks were not necessarily meant for ELAM drivers to begin with. They were more often used by large gaming companies like Ubisoft, Sony and Capcom to introduce anti-cheats and detection of hacks in online games. This technique, which only a handful of EDRs have started implementing in userland, was discussed in detail by Alex Ionescu in 2015. To achieve this, you can use the undocumented NTAPI NtSetInformationProcess to either add your own hook or unhook an existing process instrumentation. This can be done using the code below.

#include "etwti-hook.h"
#include <stdio.h>
#include <windows.h>
#include <winternl.h>
#include <intrin.h>

extern void hookedCallback();

#define ProcessInstrumentationCallback 0x28
typedef struct _PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION
{
    ULONG Version;
    ULONG Reserved;
    PVOID Callback;
} PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION, *PPROCESS_INSTRUMENTATION_CALLBACK_INFORMATION;

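// Invoked by the assembly routine hookedCallback each time the callback fires.
// ReturnAddress: where the syscall returns to. retSyscallPtr: the value read from [r10-0x10].
VOID HuntSyscall(ULONG_PTR ReturnAddress, ULONG_PTR retSyscallPtr) {
    PVOID ImageBase = ((EtwPPEB)(((_EtwPTEB)(NtCurrentTeb()->ProcessEnvironmentBlock))))->ImageBaseAddress;
    PIMAGE_NT_HEADERS NtHeaders = RtlImageNtHeader(ImageBase);
    // Only report syscalls that return into our own image instead of ntdll.dll.
    if (ReturnAddress >= (ULONG_PTR)ImageBase && ReturnAddress < (ULONG_PTR)ImageBase + NtHeaders->OptionalHeader.SizeOfImage) {
        // ULONG_PTR is 64 bits wide on x64, so print the address with %llX;
        // the syscall number sits in the low 32 bits of the value that was read.
        printf("[+] Syscall detected:  Return address: 0x%llX  Syscall value: 0x%X\n", (unsigned long long)ReturnAddress, (unsigned int)retSyscallPtr);
    }
}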

BOOL EtwTiMod() {
    PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION InstrumentationCallbackInfo;
    InstrumentationCallbackInfo.Version  = 0;
    InstrumentationCallbackInfo.Reserved = 0;
    InstrumentationCallbackInfo.Callback = hookedCallback;
    NTSTATUS Status = NtSetInformationProcess((HANDLE) -1, ProcessInstrumentationCallback, &InstrumentationCallbackInfo, sizeof(InstrumentationCallbackInfo));
    if (NT_SUCCESS(Status)) {
        printf("Callback added\n");
        return TRUE;
    }
    printf("Failed : %lx\n", Status);
    return FALSE;
}

int main() {
    EtwTiMod();
    OBJECT_ATTRIBUTES objAttr;
    InitializeObjectAttributes(&objAttr, NULL, 0, NULL, NULL);
    CLIENT_ID cID;
    cID.UniqueProcess = (HANDLE) ULongToHandle(2084);
    cID.UniqueThread = 0;
    HANDLE hProcess = INVALID_HANDLE_VALUE;
    NtOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &objAttr, &cID);
    return 0;
}

The full code can be found in my git repository. The above code implements a hook (hookedCallback) which intercepts all syscalls performed in the target process, irrespective of whether they were made by invoking a manual syscall, a direct/indirect syscall or by calling the NTAPI.

section .text

extern HuntSyscall
global hookedCallback

hookedCallback:
    push rcx
    push rdx
    mov rdx, [r10-0x10]
    call HuntSyscall
    pop rdx
    pop rcx
    ret

So, even if you call the syscall from your own process instead of NTAPI, the callback will be performed to the hookedCallback function. Whenever this callback is performed, the return address of the syscall is saved into the r10 register. The below screenshot shows that the r10 register is at address 00007FFCFE8BD364, and our syscall value is at offset -0x10 from this address which is 00007FFCFE8BD354 (not to be confused with 00007FFCFE8BD353 which is the mov instruction).
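The offset arithmetic above can be sanity-checked with a short Python sketch. It assumes the stub layout commonly seen on recent 64-bit Windows 10 builds; the instruction list and byte lengths are my assumption for illustration and are not taken from the screenshot. Only the r10 value comes from the text above.

```python
# Assumed layout of an x64 ntdll syscall stub on recent Windows 10 builds.
# Each entry is (instruction, length in bytes). Illustrative only.
STUB_LAYOUT = [
    ("mov r10, rcx", 3),
    ("mov eax, <syscall number>", 5),   # 1 opcode byte + 4-byte immediate
    ("test byte ptr [SharedUserData+0x308], 1", 8),
    ("jne <int 2e path>", 2),
    ("syscall", 2),
    ("ret", 1),
]


def instruction_offsets(layout):
    """Return a dict mapping each instruction to its offset from the stub start."""
    offsets, cursor = {}, 0
    for name, length in layout:
        offsets[name] = cursor
        cursor += length
    return offsets


offsets = instruction_offsets(STUB_LAYOUT)

# r10 holds the address of the `ret` that follows `syscall` (value from the text).
r10 = 0x00007FFCFE8BD364
stub_start = r10 - offsets["ret"]
mov_eax = stub_start + offsets["mov eax, <syscall number>"]
immediate = mov_eax + 1                 # skip the single opcode byte

print("mov instruction at:", hex(mov_eax))
print("syscall number at: ", hex(immediate))
print("distance from r10: ", hex(r10 - immediate))
```

Under that assumed layout the immediate lands 0x10 bytes before the return address, which is why the assembly reads from [r10-0x10]. On a build with a different stub layout the distance would differ, so the constant should be treated as build-specific.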

By running the above code, we can conclude that we have successfully implemented syscall hooks, which now show that NtOpenProcess was called from the process. You can also see the return address, which is part of the .text section of my etwti-hook.exe instead of a valid return address inside ntdll.dll. By performing a call trace like this, it becomes easy for an EDR to hook syscalls in user mode and identify that the syscall is originating from a suspicious region.

So in short, we are extracting the syscall number here. In the case of an EDR, the EDR’s DLL will have this callback configured, which intercepts all your syscalls and then performs actions accordingly. So now we are left with two different types of hooks which the EDR performs, the primary one being the trampoline hook and the second one being Process Instrumentation. Now let’s try to bypass these one by one.

Disabling Process Instrumentation Callback Monitoring

Since we are in full control of our own process, we should be able to configure the callback for our own PI monitoring. This means that by setting InstrumentationCallbackInfo.Callback to NULL we should be able to disable the PI monitoring, and by setting it to our own function we should be able to add our own hook to the process. You can configure a hook as shown in the code above. So our primary problem is solved here. Now you can start performing manual syscalls which were embedded in your own process. However, we are still left with the problem of dynamically finding the syscall instead of using a hardcoded stub.

Trampoline Hook Evasion

Once we have disabled the PI callback, our next task is to find the valid syscall first. This means two things:

  1. Find whether our NTAPI in ntdll.dll is hooked
  2. Find the legitimate syscall by walking the EAT of ntdll.dll

Now the second part is the easy one. You can start parsing the image headers and the Export Address Table to reach the actual NTAPI call ordinal and then read the syscall values of your function. This is similar to what I wrote in the PIC blog for the 0.2 release of BRc4, where I walked through the EAT to find the function pointers of the WINAPI calls instead of using GetProcAddress, the source code of which can be found here. But to reach this part, we first have to make sure we are not reading the jump instructions. We can do this by parsing the EAT of the existing ntdll.dll loaded into our process and comparing the values at the exported function pointer with the values 'mov r10, rcx' (opcode: \x49\x89\xCA), which is always the first instruction of any syscall stub. If you find an instruction which starts with the opcode 0xE9, it means there’s a jump instruction in place which will land you in the address space of the EDR’s DLL.

Now let’s assume that we have identified a jump instruction in place. This mostly means the next set of instructions after the jump are corrupted due to the trampoline hook being written. But here is where the fun actually begins. If there is no syscall at the actual NTAPI function pointer because it was overwritten by the hook, then there needs to be a syscall stub somewhere in the memory of our process in an RX region, which gets executed when a legitimate application makes the API call, right? And guess what’s the best place to store the original syscall stub? The RX region of the DLL! (remember the interesting part earlier? ;). We can perform this part of the process by simply configuring a VEH (Vectored Exception Handler) with the EFlags configured to STATUS_SINGLE_STEP exception. Once this is configured we can just read the page section of the DLL where the jump instruction is being routed and find where the legitimate syscall stub is being stored. I will leave this part to the readers who want to implement it, but for BRc4 users, this comes as an internal feature of the Badger.

And just to note, all of the Badger features which use any type of memory allocation or execution, like object files (coffexec), Badger service, executable, DLL and all the reflective DLL modules (except mimikatz), now use dynamically generated syscalls and PI patching. Badger is also configured by default to disable ‘Process Instrumentation Callback Monitoring’ whether it’s being injected into a new process or an existing process. This also introduces a few more process injection and memory allocation techniques into the Badger. One more thing worth noting is that these syscalls are obfuscated and dynamic in nature. This means there cannot be any type of static detection towards them, and the Badger changes the instruction set randomly with some math every time they get executed.

NOTE: Remember that some of the API calls which are monitored with callbacks in the kernel mode registered via ObRegisterCallbacks can still be detected and there is nothing one can do about it. Only detections in the userland and the PI Callbacks can be evaded with the above technique.

Advanced Malleable Profiles

Until the previous release, Brute Ratel did not have any option to change the POST request of the Badger. Even though the whole POST body sent by the Badger is custom encrypted, there was still the possibility of a network indicator through which it could have been detected. With the 0.8 release, this has changed. An operator can configure what the POST request of the Badger looks like during the creation of the listener, in the payload profile or even from the command line interface.

To add a malleable profile from the GUI, create a new Listener or a Payload Profile and select the ‘+ Malleable Post data’ option.

You can prepend and append any strings to control how your data looks on the network. You can embed the Badger’s actual response in JSON, XML or anything else you wish.

Once you have added it, the listener provides a quick preview of how your POST data will look on the network.
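The prepend/append idea can be illustrated with a small Python round trip. This is only a sketch of the concept and not how Brute Ratel implements it: the profile strings, the function names wrap and unwrap, and the use of base64 to make the opaque bytes printable are all assumptions made for the example.

```python
import base64
import json

# Hypothetical profile: strings placed before and after the data.
PREPEND = '{"status": "ok", "data": "'
APPEND = '"}'


def wrap(body: bytes, prepend: str = PREPEND, append: str = APPEND) -> bytes:
    """Place the opaque body between the profile's prepend and append strings."""
    encoded = base64.b64encode(body).decode("ascii")   # assumption: base64 text form
    return (prepend + encoded + append).encode("utf-8")


def unwrap(wire: bytes, prepend: str = PREPEND, append: str = APPEND) -> bytes:
    """Strip the profile strings again and return the original body."""
    text = wire.decode("utf-8")
    if (len(text) < len(prepend) + len(append)
            or not text.startswith(prepend)
            or not text.endswith(append)):
        raise ValueError("request does not match the profile")
    inner = text[len(prepend):len(text) - len(append)]
    return base64.b64decode(inner)


opaque = b"\x01\x02already-encrypted-bytes"   # stand-in for the encrypted body
wire = wrap(opaque)
print(wire.decode("utf-8"))

# With this particular profile the wrapped request is valid JSON.
assert json.loads(wire)["data"] == base64.b64encode(opaque).decode("ascii")
assert unwrap(wire) == opaque
```

Because both sides share the same prepend and append strings, the receiving end can recover the body by stripping them, and a request that does not match the profile is rejected.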

The malleable profiles can be added, removed and changed on the fly for an existing or a new Badger directly from the GUI or the command-line. Below is an example of two listeners running on two different ports, one with the json malleable profile and the other with xml.

Once the profiles are added to the server, you can change the profile of the Badger on the fly by right clicking the Badger and selecting ‘Switch Profile’ Option, or you can also load a json file as shown in the screen shot above directly from disk. This will NOT spawn a new Badger on the host, but will show up under a new Badger ID on the server since the C2 authentication also needs to be changed as per the profile.

Timeloop

One of the features that I needed heavily during my operations was an option to run a Badger command a given number of times on the host. For example, let’s say you compromised a jump server and gained high integrity privileges, but there is no user logged in on the host. Now you might want to run shadowcloak to dump the credentials, but it’s useless unless a user logs in and caches their password. In this case you can run the timeloop command. The timeloop command accepts three or more arguments. The first argument is the number of times you want to run the command, the second argument is the interval at which you want to run it, and the third argument is the actual command to run, which can have its own set of arguments. For example, the below command executes the C-sharp code InternalMonologue within the Badger’s process six times, once every 10 seconds.

The timeloop command can also be run during high sleep intervals because it doesn’t need to connect to the server to run the command. For example, you can assign a timeloop command to the Badger at night and let it check in. Once the Badger checks in, you can let it sleep for the next 8-9 hours. While the Badger is sleeping, it will run the timeloop command as per the interval and counter provided and cache the output in memory without connecting to the server, all while sleeping in an encrypted Read-Write region in memory. Once it checks in after the 8 hour interval, the whole output is returned to the server. The timeloop command supports almost all commands of Brute Ratel.
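The behavior described above, running a command a fixed number of times and holding the output until the next check-in, can be sketched in a few lines of Python. This is not the Badger’s implementation: the class name TimeLoop, its parameters, the injectable sleep function and the choice not to wait after the final run are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable, List


class TimeLoop:
    """Hypothetical illustration: run `command` `count` times, `interval` seconds apart."""

    def __init__(self, count: int, interval: float,
                 command: Callable[[], str],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if count < 1 or interval < 0:
            raise ValueError("count must be >= 1 and interval must be >= 0")
        self.count = count
        self.interval = interval
        self.command = command
        self.sleep = sleep
        self.cache: List[str] = []   # output held locally until the next check-in

    def run(self) -> None:
        for iteration in range(self.count):
            self.cache.append(self.command())
            if iteration < self.count - 1:   # no wait after the final run (a design choice)
                self.sleep(self.interval)

    def drain(self) -> List[str]:
        """Return everything cached so far and empty the cache (the check-in)."""
        output, self.cache = self.cache, []
        return output


# Demo: six runs, 10 seconds apart, using a stand-in command and a recording
# sleep function so the example finishes immediately.
waits: List[float] = []
loop = TimeLoop(count=6, interval=10, command=lambda: "sample output", sleep=waits.append)
loop.run()
results = loop.drain()
print(len(results), "outputs cached;", len(waits), "waits recorded")
```

Passing the sleep function in as a parameter keeps the loop testable without real delays; with the default time.sleep the same call would take roughly 50 seconds.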

Webhook

Webhook in Brute Ratel is a method of altering the behavior of a Brute Ratel listener with custom callbacks. These callbacks may be maintained, modified, and managed by operators of the BRc4. The BRc4 listeners support webhooks for all types of Badger comms. This can be enabled by right clicking a listener and selecting the ‘Webhook’ option.

Upon enabling a webhook, you will be asked to enter the scheme (https/http), host and port that the Badger output should be sent to. You can also specify whether the callback should notify the operator only of initial Badger connections or also send the full command output of the Badgers.

The below figure shows what the callbacks for the initial connection and the command output look like after the webhook is enabled. In the below figure, I enabled the webhook for localhost on port 10443, so the listener forwards the initial connection information for all Badgers and their command output to that webhook server. One thing worth noting is that the command output is base64 encoded inside JSON.
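On the receiving side, a webhook server only needs to accept the POST, parse the JSON and base64-decode the command output. The sketch below uses Python’s standard library and plain HTTP. The JSON field names ("badger_id" and "output") are hypothetical, since the exact schema is not described here, so adjust them to whatever the listener actually sends.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_event(raw: bytes) -> dict:
    """Parse one webhook body and decode the base64 command output, if present."""
    event = json.loads(raw.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    if "output" in event:   # hypothetical field name
        decoded = base64.b64decode(event["output"])
        event["output"] = decoded.decode("utf-8", errors="replace")
    return event


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = decode_event(self.rfile.read(length))
        except (ValueError, TypeError):
            # Covers malformed JSON, bad base64 and unexpected field types.
            self.send_response(400)
            self.end_headers()
            return
        print(event)        # replace with a chat notification, logging, etc.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 10443) -> None:
    """Listen on localhost; 10443 matches the port used in the example above."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()


# Quick local check of the decoding step with a made-up event.
sample = json.dumps({
    "badger_id": "b-1",
    "output": base64.b64encode(b"whoami\n").decode("ascii"),
}).encode("utf-8")
print(decode_event(sample))
```

Calling serve() starts the receiver. For an https webhook the server socket would additionally need to be wrapped with TLS, which is left out to keep the sketch short.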

Enhancements

There aren’t a lot of enhancements in this release except the ones mentioned below:

  • The socksbridge now supports adding headers for domain fronting and redirectors.
  • The parameters supplied to payloads such as exe, service, dll or shellcode, either for initial access or during remote injections, are now RC4 encrypted.
  • All Sleep functions are now replaced with syscalls to avoid detections during sleep.

Several other minor tweaks and changes were made to the server and the Badger which can be found in the release notes mentioned above.