[QuickNote] Technical Analysis of recent Pikabot Core Module

1. Overview

m4n0w4r

In early February 2023, cybersecurity experts on Twitter issued a warning about a new malware variant/family being distributed by the #TA577 botnet (associated with the same group from #Qakbot). This malware shares similarities with the Qakbot Trojan, including distribution methods, campaigns, and behaviors. It was quickly nicknamed Pikabot.

Pikabot consists of two components: a loader/injector and a core module. The loader/injector decrypts and injects the core module. The core module then performs the malicious behaviors: gathering information about the victim machine, connecting to the command and control (C2) server to receive and execute arbitrary commands, and downloading and injecting other malware.

Pikabot is continuously upgraded, employing various anti-analysis techniques and different obfuscation methods to make it difficult for analysts to understand its behavior. In the next section of this article, I will focus on analyzing the Pikabot core module, including:

  • How Pikabot obfuscates and decrypts strings.
  • How Pikabot retrieves API addresses.
  • How Pikabot slows down the analysis process.
  • How Pikabot generates victim uuid.
  • How Pikabot collects information from the victim’s machine.
  • How Pikabot decrypts C2 addresses.
  • How Pikabot utilizes Syscall.

Sample hash: ce742b7cc94a5c668116d343b6a9677523dc13b358294bba3cd248fba8b880da

2. Decrypt string

In some older versions, Pikabot uses an XOR loop to decode encrypted strings stored on the stack:

In recent versions of Pikabot, the process of decrypting strings has become more sophisticated.

  • RC4 is used to decrypt the encrypted data stored on the stack. Each encrypted blob has its own RC4 key.
  • The RC4-decrypted string is converted to a valid Base64 string (by replacing the character ‘_’ with ‘=’) and then Base64-decoded.
  • Finally, AES-CBC will be used to decrypt the decoded data to return the original string.

AES Key and AES IV used in this sample are also decrypted using RC4:

  • Decrypted AES Key: “dVOEz=/e/Xf=0WMiz6uR9cZKe+tyb+VJhSu+tfi0HzT2COoz25r4+8osEx4”
  • Decrypted AES IV: “nsdA1ANUAH+K1XhVjnsg92tGMNQG=fsgrqJQ8AtZIacqaYg”

However, Pikabot only uses 32 bytes from the decrypted AES Key and 16 bytes from the decrypted AES IV. Therefore, the final AES Key and IV used for string decryption are:

  • AES Key: “dVOEz=/e/Xf=0WMiz6uR9cZKe+tyb+VJ”
  • AES IV: “nsdA1ANUAH+K1XhV”
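
To make the first stages of this pipeline concrete, here is a minimal Python sketch of the RC4 step, the ‘_’ to ‘=’ replacement, the Base64 decode, and the key/IV truncation. The function names are my own and do not come from the sample. The AES-CBC step is passed in as a callable (aes_cbc_decrypt) because the Python standard library has no AES implementation; in practice you would plug in a library of your choice. Padding handling after AES is not covered here.

```python
import base64
from typing import Callable, Tuple


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4 (KSA + PRGA). Encryption and decryption are identical."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for b in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def trim_aes_material(key_str: str, iv_str: str) -> Tuple[bytes, bytes]:
    """Keep only the first 32 key bytes and the first 16 IV bytes."""
    return key_str.encode("latin-1")[:32], iv_str.encode("latin-1")[:16]


def decrypt_string(enc: bytes, rc4_key: bytes,
                   aes_cbc_decrypt: Callable[[bytes], bytes]) -> bytes:
    """RC4 -> replace '_' with '=' -> Base64 decode -> AES-CBC (caller-supplied)."""
    b64_text = rc4(rc4_key, enc).replace(b"_", b"=")
    return aes_cbc_decrypt(base64.b64decode(b64_text))
```

With a real AES implementation, aes_cbc_decrypt would be a small wrapper that decrypts in CBC mode using the trimmed key and IV shown above.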

The entire process was simulated using CyberChef as follows:

Here is the CyberChef recipe:
https://gchq.github.io/CyberChef/#recipe=RC4(%7B'option':'Latin1','string':'currentContextId'%7D,'Hex','Latin1')Find_/_Replace(%7B'option':'Simple%20string','string':'_'%7D,'%3D',true,false,true,false)From_Base64('A-Za-z0-9%2B/%3D',true,false)To_Hex('Space',0)AES_Decrypt(%7B'option':'Latin1','string':'dVOEz%3D/e/Xf%3D0WMiz6uR9cZKe%2Btyb%2BVJ'%7D,%7B'option':'Latin1','string':'nsdA1ANUAH%2BK1XhV'%7D,'CBC','Hex','Raw',%7B'option':'Hex','string':''%7D,%7B'option':'Hex','string':''%7D)&input=NjAgOEUgRkUgMUIgQTYgNTkgRUUgNUEgIDA4IDkzIDc2IEY0IEEyIDVEIDFDIDI2IDc1IEQyIDMwIEFBIEM2IDM4IDdEIEEz

3. Retrieve API address

To get the address of API functions, Pikabot does the following:

  • It gets the base address of the corresponding Dll based on the decrypted input string.
  • It decrypts the API function name, then uses GetProcAddress to obtain the real address of the API.

The function pkb_load_dll_based_on_input_str (0x41E657) has the following code graph:

In this function, Pikabot first finds the addresses of the GetProcAddress and LoadLibraryA functions using pre-calculated hash values. It then decrypts the relevant strings and compares them to the string passed to the function. If the strings match, Pikabot decrypts the name of the corresponding DLL and loads it using LoadLibraryA.

The pseudo-code for calculating the hash of API functions is as follows:

Based on the pseudo-code above, we can rewrite it in Python and perform a brute-force to find the API function name corresponding to the pre-calculated hash values:
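
As a rough illustration of the brute-force idea, the sketch below builds a hash-to-name lookup table from a list of candidate export names and resolves target hashes against it. Note that placeholder_hash is a stand-in I made up so the example runs; it is NOT Pikabot's hash algorithm, and you would replace it with a port of the pseudo-code above. The seed parameter reflects the fact that the same routine is later called with a non-zero initial value.

```python
from typing import Callable, Dict, Iterable, Optional


def placeholder_hash(name: str, seed: int = 0) -> int:
    """Stand-in 32-bit hash (multiply by 33, add byte). NOT Pikabot's algorithm."""
    h = seed & 0xFFFFFFFF
    for ch in name.encode("ascii"):
        h = (h * 33 + ch) & 0xFFFFFFFF
    return h


def build_lookup(names: Iterable[str],
                 hash_fn: Callable[[str], int]) -> Dict[int, str]:
    """Hash every candidate export name once and index the names by hash."""
    return {hash_fn(n): n for n in names}


def resolve(targets: Iterable[int], names: Iterable[str],
            hash_fn: Callable[[str], int]) -> Dict[int, Optional[str]]:
    """Map each pre-calculated hash to a name, or None if nothing matches."""
    table = build_lookup(names, hash_fn)
    return {t: table.get(t) for t in targets}
```

In practice the candidate list would be the export names of the DLLs of interest (for example kernel32.dll), and the targets would be the constants found in the sample.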

With the API function addresses obtained above, Pikabot will load the corresponding DLL:

Here is the list of DLLs that Pikabot will load during execution:

The function pkb_get_api_addr_by_name_using_GetProcAddress (0x41E636) will decrypt the API function name and call GetProcAddress to retrieve the function address:

4. Slowing down the analysis process

In order to slow down the code analysis, Pikabot inserts a large number of meaningless junk functions into the execution flow. These functions typically do nothing. This can make it much more time-consuming for analysts to understand the code and identify its malicious behavior.

5. System language check

Pikabot checks the system language code of the victim’s machine before executing its main task by using API function GetUserDefaultLangID. In the previous version, if the result returned a region code for a country such as Russia or Ukraine, the malware would immediately exit without any further activity.
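
To illustrate what such a check looks like, here is a small sketch of the older behavior. The value returned by GetUserDefaultLangID is a LANGID whose low 10 bits hold the primary language ID. The blocked set below (Russian 0x19, Ukrainian 0x22) only mirrors the two examples mentioned above; the exact list used by older Pikabot versions is an assumption on my part, and the function names are mine.

```python
LANG_RUSSIAN = 0x19    # primary language ID for Russian
LANG_UKRAINIAN = 0x22  # primary language ID for Ukrainian


def primary_lang_id(langid: int) -> int:
    """Equivalent of the PRIMARYLANGID macro: keep the low 10 bits."""
    return langid & 0x3FF


def should_exit(langid: int,
                blocked=(LANG_RUSSIAN, LANG_UKRAINIAN)) -> bool:
    """Return True when the primary language is in the (assumed) blocked set."""
    return primary_lang_id(langid) in blocked
```

For example, 0x0419 (ru-RU) and 0x0422 (uk-UA) would trigger the exit path, while 0x0409 (en-US) would not.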

However, in the version I am analyzing, Pikabot simply checks the return value: if it is different from 0x1, the function pkb_check_default_lang (0x0042F7A0) returns 0x0:

6. Create Mutex

When the function pkb_check_default_lang (0x42F7A0) returns 0x0, Pikabot continues executing. To prevent reinfection of the victim’s machine, the sample I am analyzing creates a mutex with a hardcoded name (shown after decryption): “{F0B9756B-5D50-4696-A969-4C9AF7B69188}”.

7. Create victim uuid

After creating the mutex as described above, Pikabot creates the victim uuid using the function pkb_collect_victim_info_n_gen_victim_uuid (0x42E233). The code graph for this function is as follows:

The string is generated based on the information collected from the victim machine, including:

  • Volume serial number, by using the API function GetVolumeInformationW. This is the serial number assigned to a volume when it is formatted.
  • Computer name, by using the API function GetComputerNameW. This is the name of the computer that the malware is running on.
  • User name, by using the API function GetUserNameW. This is the name of the user who is currently logged on to the computer.
  • OS product type, by using the API function GetProductInfo.

The information collected above is formatted as “<computer_name>\<user_name>|<os_type>”. This string is then hashed using the algorithm mentioned in 3. Retrieve API address, with the hash value initialized to VolumeSerialNumber.

The hash value calculated for the collected information, along with the VolumeSerialNumber, is then further processed by the function pkb_calc_hash_2 (0x42E123) below:

Finally, use the API function wsprintfW to format the uuid string in the format %07lX%09lX%lu:
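
The two string formats involved can be sketched in Python as follows. This only shows the formatting; the hash functions themselves are not reproduced here. Which three values Pikabot passes to wsprintfW, and in which order, is not covered above, so the parameters a, b and c are placeholders, as are the function names.

```python
def build_identity_string(computer_name: str, user_name: str, os_type: str) -> str:
    """Build the "<computer_name>\\<user_name>|<os_type>" string that gets hashed."""
    return f"{computer_name}\\{user_name}|{os_type}"


def format_uuid(a: int, b: int, c: int) -> str:
    """Python equivalent of wsprintfW(L"%07lX%09lX%lu", a, b, c) for 32-bit values.

    %07lX -> uppercase hex, zero-padded to at least 7 digits
    %09lX -> uppercase hex, zero-padded to at least 9 digits
    %lu   -> unsigned decimal
    """
    return "%07X%09X%u" % (a & 0xFFFFFFFF, b & 0xFFFFFFFF, c & 0xFFFFFFFF)
```

Because a 32-bit value can be up to 8 hex digits long, the first field is at least 7 characters wide but may be 8, so the resulting uuid does not have a strictly fixed length.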

8. Collecting victim machine information

Before connecting to the C2 server, Pikabot will collect some information about the victim machine. The function pkb_collect_victim_system_info (0x410E37) performs the following collection tasks:

  • Retrieves the PEB and gathers operating system information (OSMajorVersion, OSMinorVersion, OSBuildNumber), then determines whether it is running on a 64-bit operating system through the API function IsWow64Process.
  • Collects the operating system type by using GetProductInfo.
  • Gathers the computer name and username by calling GetComputerNameW and GetUserNameW.
  • Collects CPU information by employing cpuid with the initial value of EAX = 0x80000000.
  • Obtains information about display devices on the machine through the API EnumDisplayDevicesW.
  • Retrieves the RAM capacity of the victim’s machine using GlobalMemoryStatusEx.
  • Gets the system uptime by using the API function GetTickCount.
  • Checks whether its process is running with admin privileges through GetCurrentProcess, OpenProcessToken, and GetTokenInformation.
  • Retrieves information about screen resolution using GetDesktopWindow and GetWindowRect.
  • Collects the domain name using the API GetComputerNameExW with NameType set to ComputerNameDnsDomain.
  • Gathers DomainControllerName and DomainControllerAddress using DsGetDcNameW. If no information is available, Pikabot sets the value to “unknown”.

Next, Pikabot decrypts the strings holding its version and stream. In my sample these are “1.1.17-ghost” and “GG13TH@T@f0adda360d2b4ccda11468e026526576”, respectively. The victim information collected above is then assembled into a JSON string with the following format:

{
"Xtt2VRnA": "%s",
"qleNiC": "%s",
"LPLLXuTl2": " Win %d.%d %d ",
"0RbIhQuDq": %s,
"6bw35n": "%s",
"FQkA0G": "%s",
"bFFqxURzx": "%s",
"a0xIcXZI": %d,
"LkLMKwP1": "%s",
"R8N3ujt": %d,
"2sIw0rUG": "%s",
"UTrXReY": "%s",
"YoViBQC": "%s",
"QeMM8": "%s",
"VLsFyV4d": "%s",
"EcZbr": %d,
"XKb5WP": %d
}

All information after being formatted into a JSON string will be encrypted. The encryption process is as follows:

  • Call the function pkb_gen_random_chars (0x41BC4A) to generate the session key material: aes_key (32 bytes) and aes_iv (16 bytes).
  • Call the function pkb_gen_random_chars (0x41BC4A) again to generate 3 random characters that serve as a marker. I will refer to it as marker.
  • Call the function pkb_aes_crypt_data (0x40A97A) to encrypt the JSON string with the generated aes_key and aes_iv.
  • Call the function pkb_base64_encode (0x0040B4DD) to encode the encrypted data above.
  • Store everything in the following format: <marker (rand_3_chars)><aes_key (first 16 bytes)><aes_iv><encoded data><aes_key (last 16 bytes)>.
  • Finally, use a loop to iterate through the entire buffer to replace the character ‘=’ with ‘_’.
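
The layout of the final buffer can be sketched in Python like this. The function name pack_blob is mine, and the function takes data that has already been AES-encrypted, since the Python standard library has no AES implementation. It only demonstrates how the marker, the split key, the IV and the Base64 data are laid out.

```python
import base64


def pack_blob(marker: str, aes_key: str, aes_iv: str, ciphertext: bytes) -> str:
    """Lay out <marker><key[:16]><iv><base64(ciphertext)><key[16:]>, then '=' -> '_'."""
    if len(marker) != 3 or len(aes_key) != 32 or len(aes_iv) != 16:
        raise ValueError("expected a 3-char marker, 32-char key and 16-char IV")
    encoded = base64.b64encode(ciphertext).decode("ascii")
    blob = marker + aes_key[:16] + aes_iv + encoded + aes_key[16:]
    # The replacement runs over the whole buffer, not only the Base64 part.
    return blob.replace("=", "_")
```

Splitting the key around the payload means the full key is only available to a parser that knows the fixed offsets at both ends of the buffer.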

Here is the code flow:

9. Information gathering with other commands

In addition to the information collected as mentioned above, Pikabot also executes the following commands to gather additional information from the victim’s machine:

  • netstat.exe -aon
  • ipconfig.exe /all
  • whoami.exe /all

The results of these commands are also encrypted and stored in the same way as above. However, in the sample that I am analyzing, this feature is configured as DISABLED.

10. Collect running processes

Pikabot calls the function pkb_enum_n_collect_all_running_processes (0x415BAF) to gather information about running processes on the victim’s machine by using the API functions CreateToolhelp32Snapshot, Process32FirstW, and Process32NextW. The code graph of this function is as follows:

The information collected will be compiled in the following format:

Then, the information will also be encrypted and encoded in the same way as described above:

11. Decrypt C2 configuration

The C2 addresses (IP and port) will be decrypted by Pikabot during execution. First, Pikabot performs the decryption of C2 encrypted data using RC4, with the decryption key in this sample being “threadId”:

Here is the result with CyberChef:

Then, Pikabot decrypts the character “&” and uses it as a delimiter to split the decrypted string above into Base64 substrings:

Result of the above process when debugged with x32dbg:

Next, Pikabot calls the function pkb_decrypt_data (0x41D07B) to decrypt the C2 address. The code graph of this function is as follows:

The entire decrypting process is as follows:

  • Allocate buffers to store the AES key and iv.
  • Convert the string to the valid Base64 string by replacing the character _ with =.
  • Discard the first 3 characters of the string, then take the next 16 characters (bytes) and store them in the buffer as the first part of the AES key.
  • Take the next 16 characters (bytes) and store them in the buffer to use as the AES iv.
  • Take the last 16 characters (bytes) as the second part of the AES key and combine it with the first part to create the complete AES key.
  • The remaining middle part of the string is the Base64 data to be decoded.
  • Perform Base64 decode.
  • Use AES-CBC with AES key and iv above to decrypt the final C2 data.
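
The parsing steps above can be sketched in Python as follows. The helper names are my own. The sketch stops before the AES-CBC step, because the Python standard library has no AES implementation; it returns the key, IV and ciphertext ready to be handed to an AES library of your choice.

```python
import base64
from typing import List, Tuple


def split_c2_entries(decrypted: str, delimiter: str = "&") -> List[str]:
    """Split the RC4-decrypted configuration string into per-C2 entries."""
    return [part for part in decrypted.split(delimiter) if part]


def unpack_blob(blob: str) -> Tuple[bytes, bytes, bytes]:
    """Return (aes_key, aes_iv, ciphertext) from one encoded entry.

    Layout: 3-char marker | key[:16] | iv (16) | Base64 data | key[16:]
    """
    s = blob.replace("_", "=")       # restore valid Base64 padding
    key = s[3:19] + s[-16:]          # first and last 16 characters form the key
    iv = s[19:35]
    ciphertext = base64.b64decode(s[35:-16])
    return key.encode("latin-1"), iv.encode("latin-1"), ciphertext
```

Each entry returned by split_c2_entries would be passed to unpack_blob, and the resulting ciphertext decrypted with AES-CBC using the recovered key and IV.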

Pseudocode of the entire process is as follows:

Using CyberChef, we get the following results:

We can write a Python script to decrypt all the C2 addresses that Pikabot will use:

12. Pikabot uses Syscall

During the analysis, we will encounter the following functions:

The above function will perform the following tasks:

Iterate over the modules listed in the PEB and check whether the loaded DLL is ntdll.dll.

If yes, proceed to find API functions starting with “Zw” exported by ntdll.dll.

The found functions will be hashed, and the result will be stored in the format: <calced_hash><api_func_RVA>

The calculated table will be then sorted by Function RVA in ascending order:

Finally, compare the pre-calculated hash value against the calculated hash values in the table. If they are equal, return the function ID. This ID value is stored in the EAX register:
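
The steps above can be sketched in Python as follows. The sketch assumes that the function ID is the position of the entry in the RVA-sorted table, which is how this well-known technique usually derives syscall numbers (the Zw stubs in ntdll.dll are typically laid out in syscall-number order). That detail is my reading rather than something stated above. The hash here is a stand-in, NOT Pikabot's algorithm, and the export list is supplied by the caller instead of being read from the PEB.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def placeholder_hash(name: str) -> int:
    """Stand-in 32-bit hash so the example runs. NOT Pikabot's algorithm."""
    h = 0
    for ch in name.encode("ascii"):
        h = (h * 33 + ch) & 0xFFFFFFFF
    return h


def build_syscall_table(exports: Iterable[Tuple[str, int]],
                        hash_fn: Callable[[str], int]) -> List[Tuple[int, int]]:
    """Keep Zw* exports as (hash, rva) pairs, sorted by RVA in ascending order."""
    table = [(hash_fn(name), rva) for name, rva in exports
             if name.startswith("Zw")]
    table.sort(key=lambda entry: entry[1])
    return table


def syscall_id(table: List[Tuple[int, int]], target_hash: int) -> Optional[int]:
    """Return the index of the matching entry (assumed to be the syscall ID)."""
    for index, (h, _rva) in enumerate(table):
        if h == target_hash:
            return index
    return None
```

Resolving the IDs this way lets the malware issue syscalls directly without reading the stub bytes, so user-mode hooks placed on the ntdll.dll functions do not affect the lookup.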

Based on the hash algorithm, we can find out the API functions that Pikabot will use as follows:

End.
