SleighCraft is a decoder based on ghidra's decompiler implementation.

Overview

SleighCraft

SleighCraft is part of the BinCraft project.

SleighCraft is a decoder (or, linear disassembler) based on ghidra's decompiler implementation. SleighCraft can be used from Rust, Python, or Node.js, with both high-level and low-level APIs.

In short, SleighCraft works much like capstone, but it also provides an IR and supports more architectures.

Features:

  • Rust based API and Python scripting API.
  • Decoding with IR as the semantic meaning.
  • Archs: 110 architectures.

✔️ : provided

❌ : not provided

🚧 : under construction

🤔 : not sure, maybe not

Comparison with capstone:

Feature              SleighCraft  Capstone Engine
disassemble          ✔️           ✔️
IR                   ✔️           ❌
C API                🚧           ✔️
custom architecture  ✔️           ❌

Architecture comparison with capstone (according to the capstone arch list):

Arch Names       SleighCraft  Capstone Engine
6502             ✔️           🤔
6805             ✔️           🤔
8051             ✔️           🤔
8048             ✔️           🤔
8085             ✔️           🤔
68000            ✔️           🤔
aarch64(armv8)   ✔️           ✔️
arm              ✔️           ✔️
cp1600           ✔️           🤔
cr16             ✔️           🤔
avr8             ✔️           🤔
dalvik           ✔️           🤔
jvm              ✔️           🤔
mips             ✔️           ✔️
powerpc          ✔️           ✔️
sparc            ✔️           ✔️
tricore          ✔️           🤔
riscv            ✔️           🤔
z80              ✔️           🤔
System Z         ❌           ✔️
xCore            ❌           ✔️

How to install

Rust

Add the dependency to your Cargo.toml:

[dependencies]
sleighcraft = { git = "https://github.com/StarCrossPortal/sleighcraft" }

The repo is a bit too large to publish on crates.io (because of the predefined sla files), but this saves you the complexity of compiling the sleigh files yourself.

Python:

# quick install it with pip
$ pip3 install bincraft

# or download a prebuilt wheel, then choose the one matching your architecture
$ pip3 install bincraft-0.1.0-cp39-cp39-Arch.whl

# or build manually; this requires the Rust compiler (preferably installed
# with rustup) and maturin
$ pip3 install maturin
$ maturin build
$ pip3 install bincraft-0.1.0-cp39-cp39-Arch.whl

NodeJs:

# quick install with npm
$ npm i bincraft

# or build manually; this requires the Rust compiler (preferably installed
# with rustup), Node.js, and neon
$ npm install -g neon-cli
$ neon build

How to Use

See the documentation on docs.rs for how to use the Rust binding.

Python binding:

from bincraft import Sleigh

code = [0x90, 0x31, 0x32] # code to disassemble

# init the sleigh engine Sleigh(arch, code)
sleigh = Sleigh("x86", code)

# now we are prepared to disassemble!
# disasm(start_addr)
for asm in sleigh.disasm(0):
    addr = asm.addr()
    mnem = asm.mnemonic()
    body = asm.body()

    # quite like capstone, right?
    print(f'Addr: {addr}\t  mnemonic: {mnem}\t body: {body}')

    # but! we also have the IR!
    pcodes = asm.pcodes()
    for pcode in pcodes:
        opcode = pcode.opcode()
        vars = pcode.vars()
        print(f'opcode: {opcode}\t vars: {vars}\t')
    print()

Nodejs binding:

const Sleigh = require('bincraft');
//or const Sleigh = require('.');

// init the sleigh engine Sleigh(arch, code) like python
const sleigh = new Sleigh("x86",[0x90,90]);

// disasm(start_addr) 
// - start: Default is 0
const asms = sleigh.disasm();

asms.forEach(asm => {
    let addr = asm.addr();
    let mnemonic = asm.mnemonic();
    let body = asm.body();
    // dump instruction
    console.log(`addr: ${addr}\t mnemonic: ${mnemonic}\t body: ${body}`);
    
    // And we have IR!
    let pcodes = asm.pcodes();
    pcodes.forEach(pcode => {
        const opcode = pcode.opcode();
        const vars = pcode.vars();
        
        console.log(`opcode: ${opcode}\t vars: ${vars}`);
    });
});

Rust (kinda low level):

// Overall procedure:
// 1. Get the spec; this tells the engine how to decode the architecture.
// 2. Get a loader; this feeds the input bytes to the engine.
//    A predefined loader is provided: `PlainLoadImage`, which decodes
//    from a single buffer.
// 3. Set the AssemblyEmit and PcodeEmit instances; these two traits
//    define the callbacks invoked at decode time.
// 4. Run the decode.
use sleighcraft::*;

fn main() {
    let mut sleigh_builder = SleighBuilder::default();
    let spec = arch("x86").unwrap();
    let buf = [0x90, 0x32, 0x31];
    let mut loader = PlainLoadImage::from_buf(&buf, 0);
    sleigh_builder.loader(&mut loader);
    sleigh_builder.spec(spec);
    let mut asm_emit = CollectingAssemblyEmit::default();
    let mut pcode_emit = CollectingPcodeEmit::default();
    sleigh_builder.asm_emit(&mut asm_emit);
    sleigh_builder.pcode_emit(&mut pcode_emit);
    let mut sleigh = sleigh_builder.try_build().unwrap();

    // Decode starting at address 0; results land in the two collectors.
    sleigh.decode(0).unwrap();

    println!("{:?}", asm_emit.asms);
    println!("{:?}", pcode_emit.pcode_asms);
}

More detailed documentation of the Rust API is still under development.

About Us

This is a project started by StarCrossTech PortalLab.

Any contribution through pull request is welcome. ✌️

Issues
  • Rust decompiler improvement

    Ghidra currently has a hard time decompiling Rust programs.

    Fixes:

    • [x] Display the proper string representation when strings are concatenated into one (as happens in Rust). This is already resolved in this PR.
    • [x] Wrong stack analysis. Resolved by this PR.
    • [x] Wrong parameter analysis. Resolved by this PR.
    difficulty: hard 
    opened by Escapingbug 3
  • Pcode patching

    This is required for more flexible IR arrangements.

    The background is that, currently, the only way to modify the semantics of a program is through instruction patching. However, instruction patching has some drawbacks:

    1. instruction patching cannot insert new instructions
    2. instruction patching cannot replace an instruction with a longer one while keeping the next instruction untouched
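
The second drawback can be illustrated at the byte level. The following is a minimal sketch, not Ghidra code: `patch_in_place` is a hypothetical helper, and the buffer holds the x86 bytes for `nop; nop; ret`. A same-size patch succeeds, while a longer encoding is rejected because it would overwrite the instruction that follows.

```python
# Hypothetical buffer: x86 bytes for `nop; nop; ret`.
code = bytearray([0x90, 0x90, 0xC3])


def patch_in_place(buf, offset, old_len, new_bytes):
    """Overwrite one instruction; refuse if the new encoding does not fit."""
    if len(new_bytes) > old_len:
        raise ValueError(
            f"new instruction needs {len(new_bytes)} bytes, "
            f"only {old_len} available at offset {offset}"
        )
    # Pad with NOP (0x90) so the following instruction keeps its address.
    padded = bytes(new_bytes) + b"\x90" * (old_len - len(new_bytes))
    buf[offset:offset + old_len] = padded
    return buf


# Same-size patch works: replace the first nop with `ret` (0xC3).
patch_in_place(code, 0, 1, [0xC3])

# A 2-byte `xor eax, eax` (31 C0) cannot replace a 1-byte nop
# without clobbering the instruction that follows.
try:
    patch_in_place(code, 1, 1, [0x31, 0xC0])
except ValueError as err:
    print(err)
```

Patching at the pcode level avoids this size constraint, since the patched semantics no longer have to fit into the original byte range.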

    These drawbacks prevent stronger analyses, such as deobfuscating control flow flattening.

    Obfuscations like control flow flattening rearrange the basic blocks. Because of the drawbacks above, such blocks cannot be rearranged back in Ghidra (or IDA), at least not easily.

    The solution to this problem is to allow pcode patching. That is, we let the user display the raw pcode and patch it.

    What we need:

    • [ ] an action that pops up only when the user clicks on raw pcode (this is possible by checking which "row" of the instruction the user clicked on)
    • [ ] parsing the user-supplied pcode, i.e. the reverse of what the PcodeFormatter does
    • [ ] record the pcode
    • [ ] use the recorded pcode and bypass the decompiler calling sleigh engine

    The last two items are needed because pcode is not stored in the database; it is lifted by the sleigh engine each time, as mentioned in this issue.

    So we may need a way to bypass the translation: remember the most recently lifted pcode and use it for the pcode patching feature. Note that not every function needs its pcode stored, only the patched ones. Otherwise the database could explode in disk space.
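
The idea of storing only patched pcode and lifting everything else on demand can be sketched as a thin wrapper around the lifter. This is an illustration only: `PatchedLifter`, `fake_lift`, and the string-based pcode representation are all hypothetical and do not correspond to any Ghidra or sleighcraft API.

```python
from typing import Callable, Dict, List

# Hypothetical representation: one string per p-code op.
Pcode = List[str]


class PatchedLifter:
    """Serve user-patched p-code when present, otherwise lift as usual."""

    def __init__(self, lift: Callable[[int], Pcode]) -> None:
        self._lift = lift
        self._patches: Dict[int, Pcode] = {}

    def patch(self, addr: int, pcode: Pcode) -> None:
        # Only patched addresses are stored, keeping the store small.
        self._patches[addr] = list(pcode)

    def pcode_at(self, addr: int) -> Pcode:
        if addr in self._patches:
            return self._patches[addr]
        return self._lift(addr)


def fake_lift(addr: int) -> Pcode:
    # Stand-in for the real sleigh engine.
    return [f"lifted@{addr:#x}"]


lifter = PatchedLifter(fake_lift)
lifter.patch(0x1000, ["patched-op-1", "patched-op-2"])
print(lifter.pcode_at(0x1000))  # served from the patch store
print(lifter.pcode_at(0x1004))  # lifted on demand
```

Because unpatched addresses never enter the store, its size grows with the number of patches rather than with the size of the program.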

    difficulty: hard 
    opened by Escapingbug 2
  • x86-64 throws BadDataError on basic disassembly

    While attempting to use sleighcraft for x86-64 disassembly I've run into a problem with BadDataErrors being thrown on valid code.

    Code to reproduce when using the python package:

    from bincraft import Sleigh
    # Opcodes for xor rax, rax in x86-64
    # Also for dec eax; xor eax, eax in x86
    code = [72, 49, 192]
    
    # x86 test case
    sleigh = Sleigh("x86", code)
    print(sleigh.disasm(0)) # works
    
    sleigh = Sleigh("x86-64", code)
    print(sleigh.disasm(0)) # fails with OSError: cpp exception: BadDataError
    

    I'm unsure as to what could be causing this issue, or if I missed a configuration step somewhere in the process. Any feedback is appreciated.
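
When comparing such a reproduction against other disassemblers, it helps to view the decimal byte list as hex. A small sketch using only the Python standard library (the helper name `to_hex` is an assumption for illustration):

```python
def to_hex(code):
    """Render a list of byte values as space-separated hex."""
    return bytes(code).hex(" ")


# The bytes from the report above.
print(to_hex([72, 49, 192]))  # 48 31 c0
```

The leading 0x48 is what distinguishes the two modes here: in 64-bit mode it is the REX.W prefix, while in 32-bit mode it decodes as `dec eax`, which matches the two readings given in the code comments.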

    opened by Jumboperson 1
  • UI color configuration refactor

    Currently, UI colors are scattered across the code that describes the UI components. Many components use hard-coded colors that ignore the overall look and feel and cannot be configured. For example, searching for "Color.BLUE" turns up a large number of such hard-coded colors.

    One possible solution is to refactor every place that needs a color to use a separate config file (xml or json, whatever) that describes those colors.

    As a side note on how this can be implemented: a ghidra instance always needs an ApplicationLayout to start.

    For example, the GhidraLauncher uses the GhidraApplicationLayout class.

    The application layout stores much of the ghidra-wide directory structure.

    So, when instantiating the GhidraApplicationLayout, we can use the directory info to find the color configuration file residing in the ghidra installation directory.

    Then, we can instantiate a singleton called something like ColorConfig and parse the file (xml or json?) to get a configuration. To ease the choice of color, we can just name the colors like "primary", "secondary", "foreground_default".

    Whenever a class wants a color, it should query the singleton for the color with a particular name. This way, if the Java Swing LaF is switched (as with a dark theme), we only need to provide a new color configuration so that the colors switch accordingly.
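
The named-color singleton could look roughly like the sketch below. It is written in Python purely for illustration (the real implementation would be Java inside Ghidra), it assumes JSON as the file format, and the color values and the `load`/`get`/`color` method names are made up.

```python
import json


class ColorConfig:
    """Process-wide registry of named colors (illustrative sketch)."""

    _instance = None

    def __init__(self, colors):
        self._colors = dict(colors)

    @classmethod
    def load(cls, text):
        # Parse a JSON object mapping color names to hex strings.
        cls._instance = cls(json.loads(text))
        return cls._instance

    @classmethod
    def get(cls):
        if cls._instance is None:
            raise RuntimeError("ColorConfig.load() has not been called")
        return cls._instance

    def color(self, name, default="#000000"):
        return self._colors.get(name, default)


# Hypothetical configurations for a light and a dark theme.
LIGHT = '{"primary": "#0000ff", "foreground_default": "#000000"}'
DARK = '{"primary": "#82aaff", "foreground_default": "#e0e0e0"}'

ColorConfig.load(LIGHT)
print(ColorConfig.get().color("primary"))  # #0000ff

# Switching theme only swaps the configuration; callers are unchanged.
ColorConfig.load(DARK)
print(ColorConfig.get().color("primary"))  # #82aaff
```

Components ask for a role such as "primary" instead of a concrete color, so a theme switch needs no changes in the component code.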

    difficulty: easy 
    opened by Escapingbug 1
  • Equate Symbol Storage

    When you set a new equate on a number that appears in the Listing area but is not identified as a variable in the decompile area, and then try to rename a variable in the decompile area, you get an error message.

    I have found the reason: ghidra stores all equate symbols in the symbol table and creates hash storage for them in the LocalSymbolMap, so when you rename a variable, ghidra searches the LocalSymbolMap and the operation fails.

    difficulty: medium 
    opened by shizhongpwn 1
  • IDA-like default variable names in decompiler

    Ghidra's current default variable names are verbose, especially those that contain a stringified address. Most of the time, those addresses are not useful. We should strip them out to provide cleaner decompiler output.

    Example (ghidra): [image]

    Same function in IDA: [image]

    Ignore the array analysis and the alloca thing. v2 is definitely visually better than things like uStack77832. Nobody cares about the 77832 part.

    This functionality could also be offered upstream. In our case, we could enable it by default, while the official upstream would have it disabled.

    My previous commit illustrates how this can be done properly. It is not a complete commit, though, as it does not cover all of the variable name generation algorithm.

    One can implement this by following my commit and completing the whole variable name generation. A better, simpler variable name generation algorithm is also welcome.
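
As a rough illustration of the naming scheme (not the decompiler's actual algorithm), default names can be mapped to short sequential ones in order of first appearance. The helper `shorten_default_names` and the second input name below are hypothetical.

```python
def shorten_default_names(names, prefix="v"):
    """Map verbose default names to v1, v2, ... in first-seen order."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"{prefix}{len(mapping) + 1}"
    return mapping


# `uStack77832` comes from the example above; `uStack77840` is made up.
print(shorten_default_names(["uStack77832", "uStack77840", "uStack77832"]))
```

A real implementation would have to do this inside the decompiler's name generation so that the short names stay stable across re-decompilation.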

    difficulty: medium 
    opened by Escapingbug 1
  • Tool requirement: binary generation from API

    The Sleigh engine is the core of the ghidra decompiler. It takes a binary stream, disassembles it into instructions, and lifts them into IR.

    However, it is restricted to binary streams and cannot handle text streams. Sometimes we are given a text stream and know the underlying semantics of each text instruction. In such situations, the sleigh engine is hard to use.

    A possible solution is to write a tool (possibly in Python?) that generates a binary from the text instructions, together with a sleigh specification that translates that binary back to the text format.

    This allows the sleigh engine to be bypassed while ghidra does the rest of the job as usual.

    What we need:

    • [ ] API design
    • [ ] instruction choice algorithm (choose the binary format of each instruction when instructions are fed into the API)
    • [ ] sleigh generation algorithm
    • [ ] complete tool
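
A toy version of the instruction choice step might assign each distinct text instruction a fixed-width code and keep the table for decoding. This sketch makes several assumptions: a 2-byte little-endian encoding, hypothetical function names, and made-up instruction text. It does not generate a sleigh specification, which is the harder part of the task.

```python
import struct


def build_encoding(instructions):
    """Assign each distinct text instruction a numeric code."""
    table = {}
    for text in instructions:
        table.setdefault(text, len(table))
    return table


def encode(instructions, table):
    """Emit each instruction as a 2-byte little-endian code."""
    return b"".join(struct.pack("<H", table[text]) for text in instructions)


def decode(blob, table):
    """Translate the binary back into the original text instructions."""
    reverse = {code: text for text, code in table.items()}
    return [reverse[code] for (code,) in struct.iter_unpack("<H", blob)]


# Made-up text instructions for some custom machine.
program = ["push r0", "add r0, r1", "push r0"]
table = build_encoding(program)
blob = encode(program, table)
print(blob.hex(" "))
print(decode(blob, table))
```

In the proposed tool, the role of `decode` would be played by a generated sleigh specification, so that ghidra itself can map the binary back to the text instructions.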
    difficulty: hard 
    opened by Escapingbug 1
  • More fluent convert to char/hex/dec display experience

    Currently the convert functionality (provided by EquatePlugin) behaves strangely: after a convert, the decompiler sometimes follows the conversion and sometimes does not. No explicit message is shown to the user, and the expected behavior is not guaranteed to succeed each time.

    The reason is that the current decompiler only supports showing constants in five possible formats:

    • decimal
    • oct
    • hex
    • char
    • binary

    This can be seen in the decompiler source file database.hh: [image]

    Formats such as floating point are not yet supported (or string? not sure).

    There are two possible solutions to this problem:

    1. show an error message when the decompiler cannot follow the conversion
    2. fix the decompiler by adding the missing parts
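
The first solution can be sketched as a formatter that handles the five supported formats and raises an explicit error for anything else. This is a Python illustration with a hypothetical function name; the output uses Python's prefix conventions, not the decompiler's.

```python
def format_constant(value, fmt):
    """Render an integer constant in one of the five supported formats."""
    if fmt == "decimal":
        return str(value)
    if fmt == "oct":
        return f"0o{value:o}"
    if fmt == "hex":
        return f"0x{value:x}"
    if fmt == "char":
        return repr(chr(value))
    if fmt == "binary":
        return f"0b{value:b}"
    # Anything else is reported instead of being silently ignored.
    raise ValueError(f"unsupported display format: {fmt}")


for fmt in ("decimal", "oct", "hex", "char", "binary"):
    print(fmt, format_constant(65, fmt))
```

With an explicit error, the UI can tell the user that a requested format (for example floating point) is not followed by the decompiler, rather than failing silently.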

    One more task is to add the convert action to the decompiler itself, but we have decided not to include it in the 0.1 release.

    difficulty: medium 
    opened by Escapingbug 1
  • UI modernize: dark theme

    Tracking issue for the complete UI modernization.

    Tasks:

    • [x] introduce a dark theme (with FlatLaf)
    • [x] fix foreground coloring: Ghidra uses hard-coded colors all across the project. After introducing the dark theme, some of the text can be hard to read because of the default color. (The basic level is already done, but the remaining work relies on #17, so we consider this done for now.)
    • [ ] flatten other elements
    • [ ] better icons

    Current showcase: [image] [image]

    difficulty: easy 
    opened by Escapingbug 1
  • Add incremental sla compilation

    After this PR, sla compilation should be incremental, i.e., if you haven't changed the sleigh source code, the sla file is generated only once instead of each time you compile.
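
One common way to decide whether a rebuild is needed is to compare modification times of source and output. The sketch below shows that check in Python; it is not the PR's actual build logic, and the file names are placeholders. A real check would also need to account for any files the sleigh source includes.

```python
import os
import tempfile


def needs_rebuild(source, output):
    """Rebuild if the output is missing or older than the source."""
    if not os.path.exists(output):
        return True
    return os.path.getmtime(source) > os.path.getmtime(output)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.slaspec")  # placeholder name
    out = os.path.join(tmp, "example.sla")      # placeholder name
    with open(src, "w") as f:
        f.write("# sleigh source placeholder\n")

    first = needs_rebuild(src, out)  # no output yet

    with open(out, "w") as f:
        f.write("compiled placeholder\n")
    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    second = needs_rebuild(src, out)  # output newer than source

    os.utime(src, (3000, 3000))
    third = needs_rebuild(src, out)  # source changed afterwards

print(first, second, third)  # True False True
```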

    opened by Escapingbug 0
  • build time compile sla

    After this PR, we should be able to allow uploading to crates.io.

    The issue currently blocking the upload is code size: with the sla files included, the code can be quite large. But the sleigh source files (slaspec) are not that large, so we can require users of our crate to compile the slaspec to sla before use. Hopefully, this PR does the trick.

    opened by Escapingbug 0
  • Rewrite decode_with for rust api

    Delete the original decode_with function and expose more low-level Rust APIs.

    • [x] fixed cxx exception content display
    • [x] fixed pcode instructions display error
    • [x] disassembly and pcode instruction api exposed
    • [x] rewrite decode_with for rust api

    #21

    opened by ioo0s 0
  • v0.2.0 api adjustment

    I have to admit, the current API, especially the Rust API, is a mess.

    Hopefully this PR makes our Rust API cleaner. Note that the Python and Nodejs bindings have not been changed yet, so this is still a work in progress.

    Current todo:

    • [x] better Rust API
    • [ ] doc the Rust API
    • [ ] fix Python binding leveraging the new Rust API
    • [ ] fix Nodejs binding
    opened by Escapingbug 0
  • X86-64 architecture program decompile error

    When decoding code = [15, 31, 128, 0, 0, 0, 0] in sleighcraft with MODE64 set, we get a BadDataError.

    Crash demo:

    let mut sleigh_builder = SleighBuilder::default();
    let spec = arch("x86-64").unwrap();
    let buf = [15, 31, 128, 0, 0, 0, 0];
    let mut loader = PlainLoadImage::from_buf(&buf, 0);
    sleigh_builder.loader(&mut loader);
    sleigh_builder.spec(spec);
    sleigh_builder.mode(MODE64);
    let mut asm_emit = CollectingAssemblyEmit::default();
    let mut pcode_emit = CollectingPcodeEmit::default();
    sleigh_builder.asm_emit(&mut asm_emit);
    sleigh_builder.pcode_emit(&mut pcode_emit);
    let mut sleigh = sleigh_builder.try_build().unwrap();

    sleigh.decode(0).unwrap();

    println!("{:?}", asm_emit.asms);
    println!("{:?}", pcode_emit.pcode_asms);

    Capstone, however, decodes the same bytes normally: [image]

    opened by ioo0s 0
  • Better Rust API?

    As noted in the README, the Rust API is kind of low level. Users need to construct internal structures like CollectingAssemblyEmit and call internal methods sleigh.decode(0).unwrap() (what does this do?) to get the results. I guess the developer may expect Rust users to be skilled enough, so they can even develop fancier features based on those low level APIs? How about also providing a higher level one, like shown in the Python/Nodejs bindings?

    By the way, the implementation code contains many wrappers of XXXEmit, such as AssemblyEmit, RustAssemblyEmit, CollectingAssemblyEmit, and the Pcode-series emitters. They are basically doing similar things. I guess the authors may want to provide a callback mechanism and also a default callback that collects the emitted code into a vector. But I think it is kind of over-designed. Maybe a cleaner way is simply returning an iterator, so users can iterate through the generated code and collect them in whatever way they want.
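
The iterator-style interface suggested above can be sketched with a generator. The example is in Python rather than Rust and uses a toy decoder; `decode_iter` and `toy_decode_one` are hypothetical and unrelated to the actual sleighcraft API.

```python
def decode_iter(code, decode_one):
    """Lazily yield (offset, text) pairs instead of pushing to callbacks."""
    offset = 0
    while offset < len(code):
        length, text = decode_one(code, offset)
        yield offset, text
        offset += length


def toy_decode_one(code, offset):
    # Toy rule for illustration: 0x90 is a 1-byte NOP,
    # anything else is treated as a 2-byte instruction.
    if code[offset] == 0x90:
        return 1, "NOP"
    return 2, f"OP_{code[offset]:02x}"


# The caller decides how to consume or collect the results.
for offset, text in decode_iter(bytes([0x90, 0x31, 0xC0]), toy_decode_one):
    print(offset, text)
```

With this shape, collecting into a vector becomes a one-liner on the caller's side, and the separate collecting emitter types are no longer needed for the common case.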

    opened by liangjs 0
Owner
PortalLab
StarCross Technology PortaLab 星阑科技PortalLab实验室 (Previous Ret2Lab)