Non-Standard Version Information Resource
If you have not read the analysis of Packer pkr_ce1a, please subscribe and check that out. The following research update is based on that work.
Note: based on help from @washi_dev this post has been updated. Parsers are correct in not returning data from non-standard structures.
Summary
The packer pkr_ce1a, on visual inspection in a hex editor, has a version information resource. The VS_VERSIONINFO structure in this resource, however, is missing from the output of Exiftool and many other online and offline malware analysis tools, with one exception: pefile. The reason is that parsers are correctly checking a struct member, szKey, for the string StringFileInfo. The parsing that pefile performs instead checks whether szKey starts with that string and, if so, includes the structure in its output. For malware analysis purposes, I propose parsing the StringFileInfo as well as the VarFileInfo structures regardless of the contents of szKey. When szKey is changed, how it is changed should be noted. Finally, any data collected in these ways should be tagged as non-standard.
Looking for the Problem
Visually inspecting the RT_VERSION resource does not reveal any clear indications of data corruption. However, the output from Exiftool is missing most of the fields that can be seen in the structure.
To explore exactly why this happens, the first step is to determine whether the resource is located where it is supposed to be according to the data in the PE file’s headers. The first value to check is the VirtualAddress field in the Resource entry in the IMAGE_DATA_DIRECTORY.
This address is correct, so the next address to check is in the resource directory entry itself. The DataRVA field is the relative virtual address (RVA) of the RT_VERSION resource in the resource directory.
Since this value is an RVA, one more value is needed to convert the address to an offset in the file. The offset can then be examined to see if the version information structure is in the expected location. The names of the sections in this particular sample are standard, so the resource section is named .rsrc. However, some adversaries change the section names to non-standard values. Therefore, the best way to find the correct section is to pick the one whose virtual address matches the one found in the IMAGE_DATA_DIRECTORY earlier. The field needed from this section header is the pointer to the raw data.
The offset can then be calculated using the following equation.
offset = RVA - VA + Ptr2Raw
A quick Python snippet can calculate the offset easily using the pefile library.
import pathlib
import subprocess
import tempfile

import pefile

target = pathlib.Path('sample.exe')
pe = pefile.PE(str(target))

# Find the RVA of the resource directory in the data directory.
for entry in pe.OPTIONAL_HEADER.DATA_DIRECTORY:
    if entry.name == 'IMAGE_DIRECTORY_ENTRY_RESOURCE':
        res_va = entry.VirtualAddress

# Pick the section by virtual address rather than by name.
for section in pe.sections:
    if section.VirtualAddress == res_va:
        sec_va = section.VirtualAddress
        sec_ptrd = section.PointerToRawData

# RT_VERSION has resource type ID 16; take its first name/language leaf.
for entry in pe.DIRECTORY_ENTRY_RESOURCE.entries:
    if entry.id == 16:
        leaf = entry.directory.entries[0].directory.entries[0].data.struct
        res_rva = leaf.OffsetToData
        res_size = leaf.Size

offset = res_rva - sec_va + sec_ptrd

with open(target, 'rb') as fh:
    fh.seek(offset)
    data = fh.read(res_size)

# Dump the raw resource bytes with hexdump for inspection.
with tempfile.NamedTemporaryFile() as tf:
    pathlib.Path(tf.name).write_bytes(data)
    process = subprocess.run(['hexdump', '-C', tf.name], capture_output=True)
    print(process.stdout.decode())
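The snippet above matches the section by an exact virtual address, which works when the resource directory starts at the beginning of its section. As a minimal sketch of a more general conversion, the function below picks the section whose address range contains the RVA and then applies the same equation. The Section tuple, the function name, and the example values are hypothetical and are not taken from the sample.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    """Hypothetical stand-in for the three section header fields used here."""
    virtual_address: int
    virtual_size: int
    pointer_to_raw_data: int


def rva_to_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Return the file offset for an RVA, or None if no section contains it."""
    for sec in sections:
        if sec.virtual_address <= rva < sec.virtual_address + sec.virtual_size:
            # offset = RVA - VA + Ptr2Raw
            return rva - sec.virtual_address + sec.pointer_to_raw_data
    return None


# Made-up section table for illustration only.
sections = [
    Section(0x1000, 0x5000, 0x400),   # a code section
    Section(0x6000, 0x2000, 0x5400),  # a resource section
]
example_offset = rva_to_offset(0x6060, sections)
print(hex(example_offset))
```

With these made-up values the RVA 0x6060 falls in the second section, so the offset is 0x6060 - 0x6000 + 0x5400.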
The version information is located correctly, so this is not the problem. The next place to investigate is in the structure itself. Starting with the Microsoft documentation for the VS_VERSIONINFO structure, there are no problems to be found.
typedef struct {
    WORD             wLength;
    WORD             wValueLength;
    WORD             wType;
    WCHAR            szKey;
    WORD             Padding1;
    VS_FIXEDFILEINFO Value;
    WORD             Padding2;
    WORD             Children;
} VS_VERSIONINFO;
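To make the layout concrete, here is a minimal sketch that reads the three WORD fields and the szKey string from the start of a version information node using only the standard library. The names NodeHeader and parse_node_header are hypothetical, and the sketch assumes that alignment padding is measured from the start of the buffer passed in.

```python
import struct
from typing import NamedTuple


class NodeHeader(NamedTuple):
    length: int        # wLength: size of the whole node in bytes
    value_length: int  # wValueLength
    node_type: int     # wType: 1 for text data, 0 for binary data
    key: str           # szKey, decoded without its terminator
    value_offset: int  # where the value starts, after alignment padding


def parse_node_header(data: bytes, offset: int = 0) -> NodeHeader:
    """Parse wLength, wValueLength, wType and szKey at the given offset."""
    length, value_length, node_type = struct.unpack_from('<HHH', data, offset)
    key_start = offset + 6
    # szKey is a null-terminated UTF-16LE string; scan two bytes at a time.
    key_end = key_start
    while key_end + 1 < len(data) and data[key_end:key_end + 2] != b'\x00\x00':
        key_end += 2
    key = data[key_start:key_end].decode('utf-16-le', errors='replace')
    # Skip the terminator, then round up to a 32-bit boundary (Padding1).
    value_offset = (key_end + 2 + 3) & ~3
    return NodeHeader(length, value_length, node_type, key, value_offset)


# Synthetic root node header for illustration, not bytes from the sample.
blob = (
    struct.pack('<HHH', 92, 52, 0)
    + 'VS_VERSION_INFO'.encode('utf-16-le')
    + b'\x00\x00'   # string terminator
    + b'\x00\x00'   # Padding1
    + bytes(52)     # placeholder for VS_FIXEDFILEINFO
)
header = parse_node_header(blob)
print(header.key, header.value_offset)
```

The same header shape is shared by the child structures, so a function like this could also be pointed at a child node to read its szKey without deciding in advance what the key must contain.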
The next level deeper in the version information, however, is the StringFileInfo structure. One member of this structure, szKey, is of type WCHAR and is expected to contain the Unicode string StringFileInfo. According to the documentation, there can be zero or one StringFileInfo structures and zero or one VarFileInfo structures. Using pefile, the contents of the szKey member in the two structs are found to be non-standard.
An algorithm for listing the version info in a Windows PE file using the pefile library can be found in a GitHub gist published by @spookyahell.
'''Licensed under the MIT License :)'''
import pefile
import pprint

pe = pefile.PE('example.exe')
string_version_info = {}
for fileinfo in pe.FileInfo[0]:
    if fileinfo.Key.decode() == 'StringFileInfo':
        for st in fileinfo.StringTable:
            for entry in st.entries.items():
                string_version_info[entry[0].decode()] = entry[1].decode()
pprint.pprint(string_version_info)
The contents of the szKey member are being used to determine whether the struct contains version information data, rather than the struct’s type or identity regardless of the contents of that particular member. Because pefile checks using the startswith method, it still accepts a key that has additional characters appended. This does not catch every type of change to szKey, however, only ones where additional characters have been added to the end of the string.
string_version_info = {}
for fileinfo in pe.FileInfo[0]:
    if fileinfo.name == 'StringFileInfo':
        try:
            for st in fileinfo.StringTable:
                for entry in st.entries.items():
                    string_version_info[entry[0].decode()] = entry[1].decode()
        except AttributeError:
            pass
The exception handling seen in this snippet is due to a problem with the second StringFileInfo structure named SomeInfo. Thanks to @washi_dev for highlighting exactly why the first struct is parsed and the second is not: the struct’s name is set based on the szKey starting with the string StringFileInfo or VarFileInfo in the first place.
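One way to note how a key deviates, in the spirit of the proposal in the summary, is a small classifier that separates an exact match, a match with characters appended, and anything else. This is only a sketch: the function name and the category labels are hypothetical and do not come from pefile or any other parser.

```python
from typing import Optional, Tuple

EXPECTED_KEYS = ('StringFileInfo', 'VarFileInfo')


def classify_key(key: str) -> Tuple[str, Optional[str]]:
    """Return a (category, matched expected key) pair for a decoded szKey."""
    for expected in EXPECTED_KEYS:
        if key == expected:
            return ('standard', expected)
        if key.startswith(expected):
            # Extra characters follow the expected string.
            return ('suffix-added', expected)
    # No expected string at the start of the key.
    return ('unrecognized', None)


for candidate in ('StringFileInfo', 'StringFileInfoX', 'SomeInfo'):
    print(candidate, classify_key(candidate))
```

Storing the category next to any extracted strings would let the data be tagged as non-standard without discarding it.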
Recommendations
If this data is missing from the parsed version information resource in the sample file examined above (its hash is listed at the end of this post), check how the parser is working. By collecting data from the non-standard structures as well as identifying how they are non-standard, the resulting dataset can be improved. To detect some files that are non-standard, the following YARA rule can be used. This rule can be improved to reduce type II errors (false negatives), but during testing, no type I errors (false positives) were identified.
import "pe"

rule NonStandard_StringFileInfo_Key
{
    meta:
        author = "Malwarology LLC"
        date = "2022-11-20"
        description = "Detects non-standard szKey members of StringFileInfo structure PE version info resources."
        reference = "https://malwarology.substack.com/"
        documentation = "https://learn.microsoft.com/en-us/windows/win32/menurc/stringfileinfo"
        sharing = "TLP:CLEAR"
        exemplar = "fc04e80d343f5929aea4aac77fb12485c7b07b3a3d2fc383d68912c9ad0666da"
    strings:
        $a = "StringFileInfo" private wide
    condition:
        for any resource in pe.resources : (
            resource.type == 16 and
            uint8(resource.offset + 126) != 0x0 and
            for 1 i in (1..#a) : (
                @a[i] == resource.offset + 98
            )
        )
}
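The offsets 98 and 126 in the rule can be reproduced from the structure layout. The sketch below is my reading of the rule rather than the author's derivation, and it assumes that the root key is VS_VERSION_INFO, that a full 52-byte VS_FIXEDFILEINFO is present, and that StringFileInfo is the first child. The helper names are hypothetical.

```python
def align4(n: int) -> int:
    """Round n up to the next 32-bit boundary."""
    return (n + 3) & ~3


HEADER_SIZE = 6        # wLength + wValueLength + wType
ROOT_KEY = 'VS_VERSION_INFO'
CHILD_KEY = 'StringFileInfo'
FIXED_INFO_SIZE = 52   # VS_FIXEDFILEINFO is 13 DWORDs

# Root header, then the UTF-16 key with terminator, then Padding1.
value_start = align4(HEADER_SIZE + 2 * (len(ROOT_KEY) + 1))
# The fixed info block, then Padding2, ends where the first child begins.
child_start = align4(value_start + FIXED_INFO_SIZE)
# The child's szKey follows its own three WORD fields.
child_key_start = child_start + HEADER_SIZE
# A standard key would have its terminator at this offset.
child_key_end = child_key_start + 2 * len(CHILD_KEY)


def key_has_appended_chars(resource: bytes) -> bool:
    """Mirror the rule: expected key at 98 and a non-zero byte at 126."""
    expected = CHILD_KEY.encode('utf-16-le')
    return (
        resource[child_key_start:child_key_end] == expected
        and len(resource) > child_key_end
        and resource[child_key_end] != 0
    )


print(child_key_start, child_key_end)
```

Under these assumptions a key that is changed in any way other than appending characters would not sit at offset 98 as the expected string, which is consistent with the rule detecting only some non-standard files.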
fc04e80d343f5929aea4aac77fb12485c7b07b3a3d2fc383d68912c9ad0666da