x86-64 Assembler based on Zydis

Zasm: x86-64 Assembler based on Zydis

Why?

Some of my projects used Zydis and AsmJit together: instructions were first decoded with Zydis and then put into AsmJit's Builder, so that the instructions and branches could be processed and analysed before the modified code was re-encoded and relocated. This approach has a couple of downsides, which are explained further down. Zydis recently introduced a way to encode instructions using the same structures and data it already has, which led to Zasm. This library aims to be a higher-level assembler/decoder that can be used for various things, such as the use case described above.

A major difference between Zasm and AsmJit is the focus on accurate instruction data such as operand access, hidden register use and correct CPU flags, all of which can be either missing or wrong in AsmJit, with some exceptions of course. AsmJit aims to be a friendly way to generate code on the fly for, say, scripting or high-performance computing. Zasm is not trying to replace AsmJit in any way; it has different goals.

The second reason for Zasm is that the Zydis encoder is extremely low level, which means you don't get things like labels. Zasm provides a high-level class for assembling instructions and provides labels like an ordinary assembler would.

Design

Zasm is composed of three components: Program, Decoder and Assembler. While Zasm uses Zydis as its framework, many of its structures do not match those of Zydis. This is primarily because Zasm stores instructions in nodes, and storing the raw ZydisDecodedInstruction would be extremely heavy on memory, so in some cases I chose to store only what I consider relevant. Storing 10'000'000 instructions currently uses about 4 GiB of memory. The instruction stores just about enough information for most analysis to work out of the box.

Program is the container that holds all the data and also serves as a doubly linked list. Instructions, labels and data are stored as nodes, which lets the user remove, insert and re-order them quite easily. This container is also responsible for generating the final output by serializing each node in one or more passes. It never takes more than three passes, and in most cases just two. The pass count is driven by how labels are used, and this is something that can be improved in the future.
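
To illustrate why label use drives the number of passes, here is a minimal, self-contained sketch of multi-pass layout over a linked list of nodes. It is not Zasm's implementation: the node types (Instr, Label, Jump), the Layout struct and the layoutNodes function are hypothetical names made up for this example, and the only encoding rule assumed is that a jump takes 2 bytes with an 8-bit displacement and 5 bytes with a 32-bit one.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <variant>
#include <vector>

// Hypothetical node kinds for illustration; these are not zasm's types.
struct Instr { std::int64_t size; };   // instruction with a fixed encoded size
struct Label { std::size_t id; };      // zero-size position marker
struct Jump  { std::size_t target; };  // 2 bytes (rel8) or 5 bytes (rel32)

using Node = std::variant<Instr, Label, Jump>;

struct Layout
{
    std::vector<std::int64_t> labelOffset; // final offset of each label
    std::int64_t codeSize;                 // total size in bytes
    int passes;                            // layout passes that were needed
};

Layout layoutNodes(const std::list<Node>& nodes, std::size_t labelCount)
{
    // One flag per Jump, in list order. Every jump starts in its short form.
    std::vector<bool> isLong;
    for (const Node& n : nodes)
        if (std::holds_alternative<Jump>(n))
            isLong.push_back(false);

    Layout out{ std::vector<std::int64_t>(labelCount, 0), 0, 0 };
    bool changed = true;
    while (changed)
    {
        changed = false;
        ++out.passes;

        // Walk the list once, assigning offsets with the current jump sizes.
        std::vector<std::int64_t> jumpEnd; // offset just past each jump
        std::int64_t offset = 0;
        std::size_t j = 0;
        for (const Node& n : nodes)
        {
            if (const auto* instr = std::get_if<Instr>(&n))
                offset += instr->size;
            else if (const auto* label = std::get_if<Label>(&n))
                out.labelOffset[label->id] = offset;
            else
            {
                offset += isLong[j++] ? 5 : 2;
                jumpEnd.push_back(offset);
            }
        }
        out.codeSize = offset;

        // Grow any short jump whose displacement does not fit in 8 bits.
        j = 0;
        for (const Node& n : nodes)
        {
            const auto* jump = std::get_if<Jump>(&n);
            if (jump == nullptr)
                continue;
            const std::int64_t rel = out.labelOffset[jump->target] - jumpEnd[j];
            if (!isLong[j] && (rel < -128 || rel > 127))
            {
                isLong[j] = true;
                changed = true;
            }
            ++j;
        }
    }
    return out;
}
```

With a list holding a jump, a 200-byte instruction, the jump's target label and one more byte, the first pass finds the short jump out of range and a second pass settles the layout. When every jump already fits, a single pass is enough. Jumps only ever grow in this sketch, so the loop always terminates.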

Decoder is a wrapper class for the Zydis decoder. It decodes instructions from data.

Assembler is a wrapper class with specialized overloads for all supported instructions; each call adds new nodes to the Program it is attached to. It also allows instructions from the Decoder to be added directly.

Examples

Generate a basic function for x64.

using namespace zasm;
using namespace zasm::operands;

Program program(ZydisMachineMode::ZYDIS_MACHINE_MODE_LONG_64);
Assembler assembler(program);

assembler.mov(rax, Imm(0xF00B444));
assembler.ret();

// Encodes all the nodes.
program.serialize(0x00400000);

const auto codeDump = getHexDump(program.getCode(), program.getCodeSize());
std::cout << codeDump << "\n";
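
The example calls getHexDump, which is not defined anywhere in the text. A minimal sketch of such a helper could look like the following. The signature is an assumption: it takes a pointer to bytes and a length, which is presumably what program.getCode() and program.getCodeSize() provide, and it returns the bytes as space-separated uppercase hex.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper; the name comes from the example, the body is a guess.
std::string getHexDump(const std::uint8_t* buf, std::size_t len)
{
    static const char digits[] = "0123456789ABCDEF";

    std::string out;
    out.reserve(len * 3);
    for (std::size_t i = 0; i < len; ++i)
    {
        // Separate bytes with a single space, without a trailing one.
        if (i != 0)
            out.push_back(' ');
        out.push_back(digits[buf[i] >> 4]);   // high nibble
        out.push_back(digits[buf[i] & 0x0F]); // low nibble
    }
    return out;
}
```

If getCode() returns a different pointer type in practice, a cast or a small overload would be needed.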

Feed instructions from the decoder and re-encode them.

using namespace zasm;
using namespace zasm::operands;

const uint64_t baseAddr = 0x00007FF6BC738ED4;
const std::array<uint8_t, 24> code = {
    0x40, 0x53,             // push rbx
    0x45, 0x8B, 0x18,       // mov r11d, dword ptr ds:[r8]
    0x48, 0x8B, 0xDA,       // mov rbx, rdx
    0x41, 0x83, 0xE3, 0xF8, // and r11d, 0xFFFFFFF8
    0x4C, 0x8B, 0xC9,       // mov r9, rcx
    0x41, 0xF6, 0x00, 0x04, // test byte ptr ds:[r8], 0x4
    0x4C, 0x8B, 0xD1,       // mov r10, rcx
    0x74, 0x13,             // je 0x00007FF6BC738EFF
};

Program program(ZydisMachineMode::ZYDIS_MACHINE_MODE_LONG_64);
Assembler assembler(program);

Decoder decoder(program.getMode());

size_t bytesDecoded = 0;

while (bytesDecoded < code.size())
{
    const auto curAddress = baseAddr + bytesDecoded;

    auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);
    if (!decoderRes)
    {
        std::cout << "Failed to decode at " << std::hex << curAddress << ", " << decoderRes.error() << "\n";
        return;
    }

    const auto& instr = decoderRes.value();
    assembler.fromInstruction(instr);

    bytesDecoded += instr.getLength();
}

program.serialize(baseAddr);

const auto codeDump = getHexDump(program.getCode(), program.getCodeSize());
std::cout << codeDump << "\n";
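
The comment on the final `je` in the byte array gives an absolute target, while the bytes themselves (0x74, 0x13) only hold a relative displacement. The arithmetic behind that comment can be sketched as follows. The function name branchTarget is hypothetical and not part of Zasm or Zydis; the only fact relied on is that an x86 relative branch is measured from the address of the next instruction.

```cpp
#include <cstdint>

// Absolute target of a relative branch: the displacement is counted from the
// end of the branch instruction (its address plus its encoded length).
constexpr std::uint64_t branchTarget(std::uint64_t instrAddr, std::uint8_t instrLen, std::int64_t rel)
{
    // Unsigned arithmetic wraps, so negative displacements work as expected.
    return instrAddr + instrLen + static_cast<std::uint64_t>(rel);
}
```

In the example, the `je` sits 22 bytes after baseAddr, is 2 bytes long and has a displacement of 0x13, so branchTarget(baseAddr + 22, 2, 0x13) gives 0x00007FF6BC738EFF, matching the comment.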

Issues
  • Performance problem.

    I'm currently writing a program that will re-encode nearly every instruction of a big program (a game), so there will be millions of calls to the re-encode callback.

    The problem is that if I declare the Program and Assembler as local variables, there will be millions of object/memory allocations and deallocations, which will have a big impact on speed.

    But if I declare them as global variables, I don't see any method for clearing them, so the generated code will accumulate on every call.

    Is there any solution to this?

    opened by Inori 10
  • A few touches to make zasm useable for something

    Let's say I want to use this to write a JIT. In that case I need to be able to pass in the addresses of library routines and to get addresses out of generated routines.

    I.e., I need to be able to take the address of a label after serialization, and I need to be able to SET a constant address for some labels before serialization.

    I don't see a way to do these things. There are certainly no examples that do them.

    I made a fork to do this myself.

    opened by differentprogramming 9
  • Calculating offset of memory operand type relative to instruction

    image

    In the above example I need to know the offset of the uint (highlighted in grey) for memory operands, for the purpose of fixing up RVAs in a mutation engine.

    Could you suggest a way to either extract the size of the mnemonic so I can work it out from that or, even better, get the offset of the operand's bytes relative to the bytes of the full instruction?

    Thanks :)

    opened by CallumCVM 2
  • Calculating addresses of call, jmp etc.

    With Zydis, you can call ZydisCalcAbsoluteAddressEx, but there is no way to retrieve the relevant data to perform this from the Decoder.

    Can you suggest any way to do this please?

    opened by CallumCVM 2
  • Add a Gitter chat badge to README.md

    ZehMatt/zasm now has a Chat Room on Gitter

    @ZehMatt has just created a chat room. You can visit it here: https://gitter.im/zydis-zasm/zasm.

    This pull-request adds this badge to your README.md:

    Gitter

    If my aim is a little off, please let me know.

    Happy chatting.

    opened by gitter-badger 0
  • Improve encoding performance

    From

    BM_Assembler_EmitSingle_0_Operands      0.530 us        0.544 us      1120000 Instructions=1.83795M/s
    BM_Assembler_EmitSingle_1_Operands      0.602 us        0.558 us      1120000 Instructions=1.792M/s
    BM_Assembler_EmitSingle_2_Operands      0.687 us        0.670 us      1120000 Instructions=1.49333M/s
    BM_Assembler_EmitSingle_3_Operands      0.827 us        0.837 us       896000 Instructions=1.19467M/s
    BM_Assembler_EmitAll                     8145 us         8160 us           90 Instructions=1.23791M/s
    BM_Serialization/4096                    1.85 ms         2.01 ms          264 BytesEncoded=13.4963M/s Instructions=2.03547M/s
    BM_Serialization/8192                    4.25 ms         4.19 ms          179 BytesEncoded=13.864M/s Instructions=1.95516M/s
    BM_Serialization/16384                   8.59 ms         8.96 ms           75 BytesEncoded=12.9678M/s Instructions=1.82891M/s
    BM_Serialization/32768                   17.2 ms         18.4 ms           45 BytesEncoded=12.5989M/s Instructions=1.7806M/s
    BM_Serialization/65536                   34.5 ms         34.2 ms           21 BytesEncoded=13.6657M/s Instructions=1.91479M/s
    BM_Serialization/131072                  70.0 ms         67.7 ms            9 BytesEncoded=13.8664M/s Instructions=1.93583M/s
    BM_Serialization/262144                   139 ms          138 ms            5 BytesEncoded=13.6542M/s Instructions=1.9065M/s
    BM_Serialization/524288                   280 ms          281 ms            2 BytesEncoded=13.3502M/s Instructions=1.86414M/s
    BM_Serialization/1048576                  567 ms          578 ms            1 BytesEncoded=12.9892M/s Instructions=1.81375M/s
    BM_Serialization/2097152                 1127 ms         1125 ms            1 BytesEncoded=13.35M/s Instructions=1.86414M/s
    

    to

    BM_Assembler_EmitSingle_0_Operands      0.512 us        0.502 us      1120000 Instructions=1.99111M/s
    BM_Assembler_EmitSingle_1_Operands      0.594 us        0.645 us       896000 Instructions=1.54984M/s
    BM_Assembler_EmitSingle_2_Operands      0.695 us        0.698 us       896000 Instructions=1.4336M/s
    BM_Assembler_EmitSingle_3_Operands      0.838 us        0.753 us       746667 Instructions=1.32741M/s
    BM_Assembler_EmitAll                     8295 us         8507 us           90 Instructions=1.18738M/s
    BM_Serialization/4096                    1.25 ms         1.23 ms          747 BytesEncoded=22.0069M/s Instructions=3.31901M/s
    BM_Serialization/8192                    2.72 ms         2.70 ms          249 BytesEncoded=21.5281M/s Instructions=3.03599M/s
    BM_Serialization/16384                   5.53 ms         5.47 ms          100 BytesEncoded=21.2425M/s Instructions=2.99593M/s
    BM_Serialization/32768                   11.1 ms         11.7 ms           56 BytesEncoded=19.785M/s Instructions=2.7962M/s
    BM_Serialization/65536                   22.6 ms         25.6 ms           25 BytesEncoded=18.2527M/s Instructions=2.5575M/s
    BM_Serialization/131072                  45.6 ms         45.8 ms           15 BytesEncoded=20.4844M/s Instructions=2.85975M/s
    BM_Serialization/262144                  91.9 ms         93.8 ms            7 BytesEncoded=20.0261M/s Instructions=2.7962M/s
    BM_Serialization/524288                   183 ms          184 ms            4 BytesEncoded=20.4514M/s Instructions=2.8557M/s
    BM_Serialization/1048576                  366 ms          367 ms            2 BytesEncoded=20.4511M/s Instructions=2.8557M/s
    BM_Serialization/2097152                  736 ms          734 ms            1 BytesEncoded=20.4511M/s Instructions=2.8557M/s
    
    opened by ZehMatt 0
Owner
ζeh Matt
Software Engineer/Bug destroyer