Each code screenshot has a link in the caption taking you to the relevant lines in the source code on GitHub.
Summary
The packer identified in a previous post hides version information metadata from standard parsers by changing the expected string in the szKey member of the StringFileInfo and VarFileInfo structures. Because these structures then fall outside standard parsing, the information in them cannot easily be used in malware analysis. This post presents a new Python library, versioninfo, which parses these structures less strictly. It takes the contents of an RT_VERSION resource as input and emits either a Python dictionary or JSON. The raw bytes of certain fields are available directly in the Python dictionary; to preserve the fidelity of those same fields, they are Base64 encoded in the JSON output.
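The Base64 step can be illustrated with a minimal sketch. The function name to_json and the example dictionary are hypothetical and are not taken from the versioninfo library; the sketch only shows one way raw bytes can survive a round trip through JSON.

```python
import base64
import json


def to_json(parsed: dict) -> str:
    """Serialize a parsed dictionary, Base64-encoding any raw bytes values."""

    def encode_bytes(obj):
        # json.dumps calls this hook only for objects it cannot serialize itself.
        if isinstance(obj, (bytes, bytearray)):
            return base64.b64encode(obj).decode("ascii")
        raise TypeError(f"Cannot serialize {type(obj).__name__}")

    return json.dumps(parsed, default=encode_bytes)


# Hypothetical field name; the real library's output layout may differ.
example = {"szKey_raw": b"S\x00t\x00r\x00"}
print(to_json(example))
```

Decoding the Base64 string from the JSON output gives back the exact bytes that were in the resource, which matters when the adversary has put non-text content in a text field.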
A Single Core
A careful reading of the documentation for each of the structures contained in this resource shows that almost all of them share a single header pattern. The header is made up of four members.
Structure length: wLength
Value length: wValueLength
Data type: wType
Text string: szKey
This pattern is repeated in each of the different container structures.
VS_VERSIONINFO
StringFileInfo
StringTable
String
VarFileInfo
Var
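As a rough illustration of the shared header, the sketch below reads the three WORD members and the null-terminated UTF-16LE szKey with the standard struct module. The function name parse_header, the returned dictionary keys, and the assumption that data starts at the beginning of the resource are mine, not the library's.

```python
import struct


def parse_header(data: bytes, offset: int = 0) -> dict:
    """Read wLength, wValueLength, wType and szKey from a container header."""
    w_length, w_value_length, w_type = struct.unpack_from("<HHH", data, offset)
    key_start = offset + 6
    pos = key_start
    # Scan two bytes at a time for the UTF-16 null terminator.
    while pos + 2 <= len(data) and data[pos:pos + 2] != b"\x00\x00":
        pos += 2
    key_raw = data[key_start:pos]
    # Whatever follows szKey is padded to a 32-bit boundary. This assumes
    # offsets are measured from the start of the resource (index 0 of data).
    next_offset = (pos + 2 + 3) & ~3
    return {
        "wLength": w_length,
        "wValueLength": w_value_length,
        "wType": w_type,
        "szKey": key_raw.decode("utf-16-le", errors="replace"),
        "szKey_raw": key_raw,  # kept so nothing is lost if decoding is lossy
        "next_offset": next_offset,
    }
```

Because every container starts the same way, one function like this can be applied at each level of the tree, with only the handling of what follows the header changing.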
In addition to these containers, there are three types of value structures. These are found in the Value member of some of the container structures.
With these requirements in mind, I developed a single core parser that handles any of the container headers.
For malware analysis, I don’t want to drop anything. Therefore, the parser accepts an expected parameter and checks whether the szKey matches it, but a failed check does not stop parsing. Instead, the result of the check is included in the output. A failed check indicates that the adversary has modified the content to keep the metadata from being collected. This is Obfuscated Files or Information in both the Malware Behavior Catalog (MBC) and the ATT&CK framework.
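The non-fatal check might look something like the following sketch. The function name check_key and the output keys szKey_expected and szKey_matches are hypothetical; only the idea of an expected parameter comes from the description above.

```python
from typing import Optional


def check_key(actual: str, expected: Optional[str] = None) -> dict:
    """Record whether szKey matches the expected string without raising."""
    result = {"szKey": actual}
    if expected is not None:
        # The comparison result is reported, never used to abort parsing.
        result["szKey_expected"] = expected
        result["szKey_matches"] = actual == expected
    return result


# A tampered key is still returned in full, just flagged as a mismatch.
print(check_key("StringFileInf0", expected="StringFileInfo"))
```

Reporting the mismatch instead of raising an exception keeps the modified key available as a potential indicator.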
Observations
Determining Structure Type
The top-level container structure, VS_VERSIONINFO, has two possible types of children: StringFileInfo and VarFileInfo. The documentation states that there can be zero or one of each.
This presents a problem for identifying a structure without relying on the content of szKey, which the adversary can modify. The way to differentiate these structures is to examine the first child of the structure in question. The children of a StringFileInfo structure are StringTable structures. A StringTable structure does not have a Value member, and its wValueLength is always zero. The children of a VarFileInfo structure, on the other hand, are all Var structures. The Var structure has a Value member, which is an array of one or more DWORD entries. Therefore, its wValueLength member is always greater than or equal to four. Checking the wValueLength member of the immediate child of the structure in question is a simple diagnostic that yields the structure’s type.
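The diagnostic can be sketched in a few lines. The function name classify_container and the "unknown" fallback are assumptions for illustration; the caller is assumed to have already located the offset of the first child.

```python
import struct


def classify_container(data: bytes, first_child_offset: int) -> str:
    """Guess a container's type from its first child's wValueLength."""
    # Every child starts with wLength, wValueLength, wType (three WORDs).
    _, child_value_length, _ = struct.unpack_from("<HHH", data, first_child_offset)
    if child_value_length == 0:
        # StringTable children carry no Value member.
        return "StringFileInfo"
    if child_value_length >= 4:
        # Var children hold at least one DWORD.
        return "VarFileInfo"
    # Lengths of 1 to 3 fit neither documented case.
    return "unknown"
```

The szKey of the container is never consulted, so renaming it does not affect the result.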
A Square Peg In a Round Hole
The szKey member of the StringTable structure is different from all the others. It is a null-terminated WCHAR string, but the content is not just a text string. It is actually an eight-digit hexadecimal string containing two WORD values. The most significant WORD is a language identifier, and the least significant is a code page. On top of the documented formatting, it is also big endian.
Because of this oddity, the core parser has an extra step where additional parsing occurs. The results are included in the JSON output. I have assumed that this member is standard, but the content of these values may be stomped.
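A minimal sketch of that extra step follows. The function name parse_string_table_key and the output keys are hypothetical; the sketch leaves both values as None when the key is not eight hexadecimal digits, which covers the stomped case.

```python
import string


def parse_string_table_key(key: str) -> dict:
    """Split a StringTable szKey into a language identifier and a code page."""
    result = {"szKey": key, "language_id": None, "code_page": None}
    if len(key) == 8 and all(c in string.hexdigits for c in key):
        # The digits read most significant first: language, then code page.
        result["language_id"] = int(key[:4], 16)
        result["code_page"] = int(key[4:], 16)
    return result


# "040904B0" is the common US English (0x0409) / Unicode (0x04B0) pairing.
print(parse_string_table_key("040904B0"))
```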
Future Improvements
One area for improvement is the parsing of the VS_FIXEDFILEINFO structure. The member dwFileFlags potentially contains a number of interesting flags, such as VS_FF_DEBUG and VS_FF_PATCHED, among others. The former indicates that the PE was compiled with debugging enabled or contains debugging information. The latter indicates that the PE has been modified from the original file shipped with the same version number. Obviously, when dealing with malware, these may have no relationship to the actual content of the PE file. However, the flags could be set in an identifying or unique combination. Similarly, the members dwFileOS, dwFileType, and dwFileSubtype each have a rich set of potentially interesting values. A more fine-grained parsing of these fields is an area in which to improve this library.