BinLex: a Genetic Binary Trait Lexer Library and Utility
The purpose of BinLex is to extract basic blocks and functions as traits from binaries.
Most projects attempting this use Python to generate traits, which is slow. When working with a large number of malware binaries, a faster compiled language such as C++ is a much better fit.
    sudo apt install -y git libcapstone-dev cmake make
    git clone https://github.com/c3rb3ru5d3d53c/binlex.git
    cd binlex/
    mkdir -p build/
    cd build/ && cmake -S ../ -B . && make -j 4
    sudo make install
    cd ../
    binlex -m elf:x86 -i tests/elf/elf.x86
- ZIP files in the tests/ directory can be extracted using the password
    binlex v1.0.0 - A Binary Genetic Traits Lexer
      -i  --input         input file or directory   (required)
      -m  --mode          set mode                  (required)
      -lm --list-modes    list modes
      -h  --help          display help
      -t  --threads       threads
      -o  --output        output file or directory  (optional)
      -v  --version       display version
    Author: @c3rb3ru5d3d53c
Currently Supported Modes
raw formats can be used on shellcode
General Usage Information
Binlex is designed to do one thing and one thing only: extract genetic traits from executable code in files. This means it is up to you, the researcher or data scientist, to determine which traits are good and which are bad. To accomplish this, you need to use your own fitness function. I encourage you to read about genetic programming to gain a better understanding of this in practice. Perhaps watching this introductory video will help your understanding.
Again, it's up to you to implement your own algorithms for detection based on the genetic traits you extract.
Traits will contain binary code represented in hexadecimal form and will use ?? as wildcards for memory operands or other operands subject to change.
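To make the wildcard notation concrete, here is a minimal Python sketch of how a trait could be tested against raw bytes. The helper name `trait_matches` is hypothetical and not part of binlex; the sketch assumes a trait is a string of space-separated hex bytes in which `??` stands for any single byte.

```python
def trait_matches(trait: str, data: bytes) -> bool:
    """Return True if the wildcarded trait occurs anywhere in data.

    Hypothetical helper, not part of binlex. Assumes space-separated
    hex bytes where "??" matches any single byte.
    """
    # None marks a wildcard byte; everything else is a concrete byte value.
    pattern = [None if tok == "??" else int(tok, 16) for tok in trait.split()]
    n = len(pattern)
    if n == 0:
        return False
    # Slide a window of the trait's length across the data.
    for start in range(len(data) - n + 1):
        window = data[start:start + n]
        if all(p is None or p == b for p, b in zip(pattern, window)):
            return True
    return False


trait = "12 34 56 ?? ?? 11 12 13"
print(trait_matches(trait, bytes.fromhex("9012345600aa11121390")))  # True
print(trait_matches(trait, bytes.fromhex("12345600aa111214")))      # False
```

The linear scan is for illustration only; on a large corpus you would likely want an indexed or compiled matcher instead.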
Trait files will contain a list of traits ordered by size and use the SHA-256 hash of the sample as the file name.
    # Example Trait File
    12 34 56 ?? ?? 11 12 13
    14 15 16 17 18 ?? ?? 21 22 23
    # ... More traits to follow
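A trait file in this shape can be loaded with a few lines of Python. The sketch below assumes one trait per line and treats lines starting with `#` as comments; both are assumptions about the layout rather than a documented format, and the names `parse_traits` and `trait_size` are hypothetical.

```python
from typing import List


def parse_traits(text: str) -> List[str]:
    """Return the traits in a trait file, skipping comments and blank lines.

    Assumes one trait per line and '#' comment lines (an assumption about
    the layout, not a documented binlex format).
    """
    traits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Normalize case and spacing so identical traits compare equal.
        traits.append(" ".join(line.lower().split()))
    return traits


def trait_size(trait: str) -> int:
    """Number of bytes in a trait, counting wildcards as one byte each."""
    return len(trait.split())


example = """\
# Example Trait File
12 34 56 ?? ?? 11 12 13
14 15 16 17 18 ?? ?? 21 22 23
# ... More traits to follow
"""

traits = parse_traits(example)
print([trait_size(t) for t in traits])  # [8, 10]
```

Normalizing each trait to a canonical string makes it straightforward to store traits in a set for the comparisons described later.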
- Don't mix packed and unpacked malware, or you will taint your dataset (this happens in academic work all the time)
- Verify the samples you are collecting into a group using skilled analysts
- These traits are best used with a hybrid approach (supervised)
Example Fitness Model
Traits are first compared within their common malware family; any traits not common to all samples are discarded.
Next, the remaining traits are compared to traits from a goodware set; any traits that match the goodware set are discarded.
To further differentiate the traits from other malware families, the remaining population is compared to traits from other malware families; any that match are discarded.
The remaining population of traits is then unique to the malware family tested, appearing in neither legitimate binaries nor other malware families.
This fitness model allows for accurate classification of the tested malware family.
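The steps of this example model map directly onto set operations. The following Python sketch is one possible reading of it, not binlex functionality: the function name `family_signature` is hypothetical, each sample is assumed to be represented as a set of trait strings, and traits are compared by exact string equality.

```python
from typing import Iterable, Set


def family_signature(
    family_samples: Iterable[Set[str]],
    goodware_traits: Set[str],
    other_family_traits: Set[str],
) -> Set[str]:
    """Filter traits as in the example fitness model (hypothetical helper)."""
    samples = [set(s) for s in family_samples]
    if not samples:
        return set()
    # Step 1: keep only traits common to every sample in the family.
    population = set.intersection(*samples)
    # Step 2: discard traits that also appear in the goodware set.
    population -= goodware_traits
    # Step 3: discard traits that also appear in other malware families.
    population -= other_family_traits
    return population


family = [
    {"aa bb ??", "cc dd", "55 66", "ee ff"},
    {"aa bb ??", "cc dd", "55 66"},
    {"aa bb ??", "cc dd", "55 66", "11 22"},
]
goodware = {"cc dd"}
other_families = {"55 66"}

print(family_signature(family, goodware, other_families))  # {'aa bb ??'}
```

Exact matching is the strictest possible comparison; whether to relax it, for example by requiring a trait in most samples instead of all, is a choice left to your own fitness function.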