C++ Python bytecode disassembler and decompiler

Overview

Decompyle++

A Python Byte-code Disassembler/Decompiler

Decompyle++ aims to translate compiled Python byte-code back into valid and human-readable Python source code. While other projects have achieved this with varied success, Decompyle++ is unique in that it seeks to support byte-code from any version of Python.

Decompyle++ includes both a byte-code disassembler (pycdas) and a decompiler (pycdc).

As the name implies, Decompyle++ is written in C++. If you wish to contribute, please fork us on GitHub at https://github.com/zrax/pycdc

Building Decompyle++

  • Generate a project or makefile with CMake (see CMake's documentation for details)

    • The following options can be passed to CMake to control debug features:

      Option                      Description
      -DCMAKE_BUILD_TYPE=Debug    Produce debugging symbols
      -DENABLE_BLOCK_DEBUG=ON     Enable block debugging output
      -DENABLE_STACK_DEBUG=ON     Enable stack debugging output
  • Build the generated project or makefile

    • For projects (e.g. MSVC), open the generated project file and build it
    • For makefiles, just run make
    • To run tests (on *nix or MSYS), run make check

Usage

To run pycdas, the PYC Disassembler:

    ./pycdas [PATH TO PYC FILE]

The byte-code disassembly is printed to stdout.

To run pycdc, the PYC Decompiler:

    ./pycdc [PATH TO PYC FILE]

The decompiled Python source is printed to stdout. Any errors are printed to stderr.

Authors, Licence, Credits

Decompyle++ is the work of Michael Hansen and Darryl Pogue.

Additional contributions from:

  • charlietang98
  • Kunal Parmar
  • Olivier Iffrig
  • Zlodiy

It is released under the terms of the GNU General Public License, version 3; see the LICENSE file for details.

Issues
  • Unsupported opcode: LOAD_METHOD on Python 3.7 files

    Hello. I am using pycdc, cloned from the master branch and compiled from scratch today. When I try to decompile Python 3.7 files, pycdc fails to decompile them fully.

    Unsupported opcode: LOAD_METHOD
    *********************** (here a part of file, that pycdc is able to decompile)
    # WARNING: Decompyle incomplete
    

    The file that I am trying to decompile: d.zip

    Can you take a look at this file, please?

    opened by KhArtNJava 11
  • Output File

    I use this tool to decompile .pyc files and want to write the output to a file. Currently I use freopen like this: freopen("output.py", "w", stdout);

    My problem is that every decompiled file ends up named output.py. I want the output for test.pyc to be named test.py, and the output for test-2.pyc to be named test-2.py.

    Can you help me?

    opened by MaisKolben 10
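
The naming rule asked for in the issue above (test.pyc becomes test.py, test-2.pyc becomes test-2.py) can be sketched as a small helper. This is only an illustration: `output_name_for` is a hypothetical function, not part of pycdc, and it assumes the input path is a plain string such as `argv[1]`.

```cpp
#include <string>

// Hypothetical helper (not part of pycdc): derive the output file name
// from the input .pyc path, e.g. "test.pyc" -> "test.py".
std::string output_name_for(const std::string& pyc_path)
{
    const std::string suffix = ".pyc";
    const bool ends_with_pyc =
        pyc_path.size() > suffix.size() &&
        pyc_path.compare(pyc_path.size() - suffix.size(),
                         suffix.size(), suffix) == 0;

    if (ends_with_pyc) {
        // Strip ".pyc" and append ".py".
        return pyc_path.substr(0, pyc_path.size() - suffix.size()) + ".py";
    }
    // Any other name: keep it whole and append ".py".
    return pyc_path + ".py";
}
```

With such a helper, the call quoted in the issue could become `freopen(output_name_for(argv[1]).c_str(), "w", stdout);`. Since pycdc prints the decompiled source to stdout, redirecting in the shell (`./pycdc test.pyc > test.py`) gives the same result without changing the code.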
  • "Bad MAGIC!"

    After compiling the complete package (using Microsoft Visual Studio 2012), whenever I try to decompile or disassemble any Python .pyc file (for example, any file located in the "tests" directory), I get this result:

    Bad MAGIC! Could not load file test_with.pyc

    Is there any specific reason why it doesn't work?

    thanks

    opened by deathnoise 5
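
As background for the report above: in commonly encountered CPython versions, a .pyc file starts with a 4-byte magic value, made of a 2-byte little-endian version number followed by the bytes "\r\n". The sketch below checks that layout on a raw header. The function names are hypothetical and this is not pycdc's loader. One thing worth ruling out on Windows is reading the file in text mode, which rewrites "\r\n" and would corrupt this header; whether that is what happened in this report is not known.

```cpp
#include <cstdint>

// Hypothetical helpers (not pycdc's actual loader code).

// True when bytes 2 and 3 of the header are "\r\n", as expected for a .pyc.
bool has_pyc_magic_terminator(const unsigned char header[4])
{
    return header[2] == '\r' && header[3] == '\n';
}

// The version-specific number stored little-endian in bytes 0 and 1.
std::uint16_t pyc_magic_number(const unsigned char header[4])
{
    return static_cast<std::uint16_t>(header[0] | (header[1] << 8));
}
```

For example, the header bytes 03 F3 0D 0A (Python 2.7) pass the terminator check and yield the number 62211.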
  • Python 3.4 Decompile Failure.

    I'm finding that compiled source with defined functions crashes the decompiler for Python 3.4.x.

    Example Code:

    def func_name():
        print ("yelloo")
        return False

    print ('Hello, World!')
    func_name()

    Note that the decompiler won't even work unless you implement mancoast's fix for the Magic Number here: https://github.com/zrax/pycdc/issues/49

    This line in ASTree.cpp fails: if (strcmp(code_src->name()->value(), "") == 0) { because code_src->name() is empty. In fact, the whole structure is filled with nulls. Something isn't parsing right.

    opened by volfin 4
  • Potential Security Vulnerabilities while parsing PyC files

    We would like to report multiple pyc files that can be used to trigger numerous types of crashes. Some of them are exploitable (buffer overflows, heap overflows); some are a bit more difficult to exploit (null references).

    What would be the best way to pass these to you (if it's just putting them here, let me know) without revealing them to the public, as they may be used maliciously?

    You can contact me via github or at ssd[]beyondsecurity.com

    opened by nrathaus 3
  • Handling nan and inf values

    a = 1e300 * 1e300 * 0
    b = -1e300 * 1e300 * 0
    c = 1e300 * 1e300
    d = -1e300 * 1e300
    

    gives:

    a = nan
    b = nan
    c = inf
    d = -inf
    

    but Python does not accept nan or inf in source code as a way to write these values.

    uncompyle6 uses float('nan'), etc., although there are numerous other ways to handle this.

    opened by rocky 3
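
The float('nan') approach mentioned in the issue above can be sketched as follows. `py_float_literal` is a hypothetical helper, not pycdc's actual output code, and the "%.17g" formatting for ordinary values is an assumption chosen only so the text evaluates back to the same double.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of pycdc): render a double as Python
// source text that evaluates back to the same kind of value.
std::string py_float_literal(double value)
{
    if (std::isnan(value))
        return "float('nan')";
    if (std::isinf(value))
        return value < 0 ? "-float('inf')" : "float('inf')";

    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.17g", value);
    std::string text(buf);
    // "%.17g" prints 2.0 as "2"; append ".0" so Python still reads a float.
    if (text.find_first_of(".e") == std::string::npos)
        text += ".0";
    return text;
}
```

With this, the four assignments in the report would come out as `a = float('nan')`, `b = float('nan')`, `c = float('inf')` and `d = -float('inf')`, all of which Python accepts.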
  • Null pointer dereference and segfault in PycRef

    Compiled with -fsanitize=address,undefined via AFL on Debian 8 x64.

    ./pycdc test000.pyc

    CreateObject: Got unsupported type 0x7F
    # Source Generated with Decompyle++
    # File: test000.pyc (Python 1.5)
    
    ASAN:SIGSEGV
    =================================================================
    ==28496==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000028 (pc 0x000000418b3c sp 0x7ffd0ae66ba0 bp 0x7ffd0ae6a560 T0)
        #0 0x418b3b in PycRef /root/pycdc/pyc_object.h:15
        #1 0x418b3b in PycCode::code() const /root/pycdc/pyc_code.h:41
        #2 0x418b3b in BuildFromCode(PycRef<PycCode>, PycModule*) /root/pycdc/ASTree.cpp:27
        #3 0x547606 in decompyle(PycRef<PycCode>, PycModule*) /root/pycdc/ASTree.cpp:2804
        #4 0x409f96 in main /root/pycdc/pycdc.cpp:28
        #5 0x7f6c04fb7b44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
        #6 0x40d0e9 (/root/pycdc/build/pycdc+0x40d0e9)
    

    With some test cases UBSan will say /root/pycdc/pyc_object.h:15:54: runtime error: member access within null pointer of type 'const struct PycRef'. I've encountered other issues as well like out of bounds reads, but upon minimizing those testcases they all seem rooted in this particular bug.

    opened by geeknik 3
  • pycdc crashes, any tutorial to trace where is the problem

    When I run pycdc on some pyc files, it crashes. Is there any guide to finding what causes the problem, so that we can fix it? I know we can debug the C++ source, but that is difficult for a C newbie. If there were some way to aid debugging, it would be good for this project and its users. Thanks

    opened by retsyo 3
  • one bug : can not do some pyc

    When using pycdc, there is an error in:

    PycData::get16()
    {
        /* Ensure endianness */
        int result = getByte() & 0xFF;
        result |= (getByte() & 0xFF) << 8;
        return (result | -(result & 0x8000));
    }

    opened by daishaofei 3
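
The report above does not say what the error is. To make clear what the quoted function computes, here is a standalone sketch of the same technique: assemble a 16-bit value from two bytes in little-endian order, then sign-extend it. `read_le16_signed` is a hypothetical name, and it reads from a byte buffer rather than from pycdc's `getByte()`.

```cpp
// Hypothetical standalone equivalent of the quoted get16() logic:
// read two bytes as a little-endian, signed 16-bit integer.
int read_le16_signed(const unsigned char bytes[2])
{
    // Low byte first, then high byte shifted into bits 8..15.
    int value = bytes[0] | (bytes[1] << 8);
    // If bit 15 is set the value is negative in two's complement,
    // so subtract 2^16 to sign-extend it into a full int.
    if (value & 0x8000)
        value -= 0x10000;
    return value;
}
```

For instance, the bytes 34 12 decode to 0x1234, while FF FF decode to -1.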
  • lambda wrong output

    it outputs this:

        def onPressKeyDict[app.DIK_1]():
            return self._GameWindow__PressNumKey(1)
    

    but should be:

    onPressKeyDict[app.DIK_1] = lambda : self._GameWindow__PressNumKey(1)

    opened by AnniAmok 3
  • [Errno 2] No such file or directory

    Hi, in the 'pycdc-master' folder, I ran this command: python pycdc C:\Users\Rollo\Desktop\decompile\Test.pyc

    But this error keeps appearing: [Errno 2] No such file or directory

    Can you help me?

    opened by HiMark146 2
  • Segmentation fault on lambda decoding

    I'm trying to use pycdc on a file which apparently contains a lambda (I'm not sure what it is doing). While decoding it, pycdc outputs Url = (lambda and crashes.
    The crash point is within ASTree.cpp:

                fputs("(lambda ", pyc_output);
                PycRef<ASTNode> code = node.cast<ASTFunction>()->code();
                PycRef<PycCode> code_src = code.cast<ASTObject>()->object().cast<PycCode>();
    

    The problem is the cast to ASTObject in code.cast<ASTObject>(). The code variable contains an ASTComprehension, and the dynamic_cast creates a PycRef<ASTObject*> with a null reference. That is understandable, since the inheritance is ASTComprehension -> ASTNode. Where can I learn how to read the disassembler output well enough to understand how to modify the decompiler? I could not find a detailed description of the opcodes and their arguments. More importantly, I could not fully understand the hierarchical output of the disassembler, especially the init section of each level.

    From my current understanding of the disassembler output, here is the part that pycdc could not understand:

                            416     LOAD_CONST              18: <CODE> <listcomp>
                            418     LOAD_CONST              19: 'main.<locals>.download.<locals>.<listcomp>'
                            420     MAKE_FUNCTION           0
                            422     LOAD_GLOBAL             26: re
                            424     LOAD_METHOD             27: finditer
                            426     LOAD_CONST              20: '^([^#].*ts(?:$|\\?\\S+$))'
                            428     LOAD_FAST               3: m3u8
                            430     LOAD_ATTR               28: m3u8Data
                            432     LOAD_GLOBAL             26: re
                            434     LOAD_ATTR               29: M
                            436     CALL_METHOD             3
                            438     GET_ITER
                            440     CALL_FUNCTION           1
                            442     STORE_FAST              12: fragmentsUrl
                            444     LOAD_FAST               12: fragmentsUrl
                            446     BUILD_LIST              0
                            448     COMPARE_OP              2 (==)
                            450     POP_JUMP_IF_FALSE       462
                            454     LOAD_GLOBAL             30: error
                            456     LOAD_CONST              21: 'Fragments is empty. Kindly report this bug'
                            458     CALL_FUNCTION           1
                            460     POP_TOP
                            462     LOAD_GLOBAL             31: Ripper
                            464     LOAD_DEREF              2: arg
                            466     LOAD_DEREF              0: outputFolderName
                            468     LOAD_FAST               6: outputFileName
                            470     LOAD_FAST               7: subtitlesPath
                            472     LOAD_FAST               12: fragmentsUrl
                            474     LOAD_FAST               3: m3u8
    

    PS: I'm sorry, I could not upload the *.pyc files to public hosting and paste a link here.

    opened by tarhan 1
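
The failure mode described in the issue above, a dynamic_cast to a sibling type yielding null and then being dereferenced, can be shown in isolation. The classes below are hypothetical stand-ins, not pycdc's ASTNode, ASTObject or ASTComprehension, and plain pointers are used instead of PycRef. The sketch only shows the general guard: test the cast result before using it.

```cpp
#include <string>

// Hypothetical stand-ins for a small node hierarchy (not pycdc's classes).
struct Node { virtual ~Node() = default; };
struct ObjectNode : Node { std::string name = "code object"; };
struct ComprehensionNode : Node {};

// Return the object's name, or a placeholder when the node is not an
// ObjectNode. dynamic_cast yields nullptr for a sibling type, so the
// result must be checked before it is dereferenced.
std::string describe_code(const Node* node)
{
    const ObjectNode* obj = dynamic_cast<const ObjectNode*>(node);
    if (obj == nullptr)
        return "<unsupported node>";
    return obj->name;
}
```

Passing a `ComprehensionNode` here returns the placeholder instead of crashing. A real fix in a decompiler would also need to decide what to emit for that node type, which this sketch does not attempt.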
  • Disassembler/Decompiler

    Why am I able to use pycdas with my file, but when pycdc is used I get the following error?

     # Source Generated with Decompyle++
    # File: siemensExtractor.pyc (Python 3.9)
    
    Unsupported opcode: <255>
    from os.path import exists
    from datetime import datetime
    import json
    import os
    import pandas as pd
    import requests
    import io
    # WARNING: Decompyle incomplete
    
    opened by Intellbg 1
  • Add debug output of asm instructions in pycdc

    I found it useful to see assembly and block scopes next to each other. The PR changes some disassembly functions to output strings instead of directly writing to the stdout.

    Output (set ENABLE_ASM_DEBUG in cmake to see it)

    3      SET_LINENO              0               1         (0)
    6      LOAD_CONST              0: '\ntest_lo...1         (0)
    9      STORE_NAME              0: __doc__      1         (0)
    12     SET_LINENO              9               1         (0)
    15     SET_LINENO              14              1         (0)
    18     SETUP_LOOP              54 (to 72)      1         (0)
    21     LOAD_NAME               1: args         1            while (72)
    22     GET_ITER                                1            while (72)
    25     SET_LINENO              14              1            while (72)
    28     FOR_ITER                43 (to 71)      1            while (72)
    31     STORE_NAME              2: term         1            for (72)
    34     SET_LINENO              15              1            for (72)
    37     SETUP_EXCEPT            18 (to 55)      1            for (72)
    40     SET_LINENO              16              2                    try (55)
    41     PRINT_NEWLINE                           2                    try (55)
    44     SET_LINENO              17              2                    try (55)
    47     CONTINUE_LOOP           22              2                    try (55)
    50     SET_LINENO              18              2                    try (55)
    51     PRINT_NEWLINE                           2                    try (55)
    52     POP_BLOCK                               2                    try (55)
    55     JUMP_FORWARD            13 (to 68)      1                CONTAINER (0)
    58     SET_LINENO              19              2                    except (68)
    59     POP_TOP                                 2                    except (68)
    60     POP_TOP                                 2                    except (68)
    61     POP_TOP                                 2                    except (68)
    64     SET_LINENO              20              2                    except (68)
    67     JUMP_FORWARD            1 (to 68)       2                    except (68)
    68     END_FINALLY                             2                    except (68)
    71     JUMP_ABSOLUTE           22              1            for (72)
    72     POP_BLOCK                               1            for (72)
    75     LOAD_CONST              1: None         2            else (72)
    76     RETURN_VALUE                            1         (0)
    
    opened by ahaensler 0
  • 3.8+ fstring "=" format specifier not detected

    Python 3.8 adds "=" to its format specifiers. See https://docs.python.org/3/whatsnew/3.8.html#f-strings-support-for-self-documenting-expressions-and-debugging

    Here is an example program that can be used for testing:

    # Tests new "debug" format new in 3.8.
    # RUNNABLE!
    
    """This program is self-checking!"""
    f'{f"{3.1415=:.1f}":*^20}' == '*****3.1415=3.1*****'
    
    # This SEGV's in pycdc
    y = 2
    def f(x, width):
        return f'x={x*y:{width}}'
    
    assert f('foo', 10) ==  'x=foofoo    '
    
    x = 'bar'
    assert f(10, 10) == 'x=        20'
    
    
    opened by rocky 0
Owner
Michael Hansen
Compile and execute C "scripts" in one go!

c "There isn't much that's special about C. That's one of the reasons why it's fast." I love C for its raw speed (although it does have its drawbacks)

Ryan Jacobs 2k Jul 29, 2022
distributed builds for C, C++ and Objective C

distcc -- a free distributed C/C++ compiler system by Martin Pool Current Documents: https://distcc.github.io/ Formerly http://distcc.org/ "pump" func

distcc 1.7k Aug 4, 2022
Roaring bitmaps in C (and C++)

CRoaring Portable Roaring bitmaps in C (and C++) with full support for your favorite compiler (GNU GCC, LLVM's clang, Visual Studio). Included in the

Roaring bitmaps: A better compressed bitset 1k Aug 5, 2022
New generation entropy codecs : Finite State Entropy and Huff0

New Generation Entropy coders This library proposes two high speed entropy coders : Huff0, a Huffman codec designed for modern CPU, featuring OoO (Out

Yann Collet 1.1k Jul 24, 2022
Compression abstraction library and utilities

Squash - Compression Abstraction Library

null 364 Jul 31, 2022
Multi-format archive and compression library

Welcome to libarchive! The libarchive project develops a portable, efficient C library that can read and write streaming archives in a variety of form

null 1.8k Jul 27, 2022
Easing the task of comparing code generated by cc65, vbcc, and 6502-gcc

6502 C compilers benchmark Easing the way to compare code generated by cc65, 6502-gcc, vbcc, and KickC. This repository contains scripts to: Compile t

Sylvain Gadrat 16 Dec 15, 2021
Secure ECC-based DID intersection in Go, Java and C.

SecureUnionID Secure ECC-based DID intersection. ABSTRACT This project is used to protect device ID using Elliptic Curve Cryptography algorithm. The d

Volcengine 19 Jul 22, 2022
nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines.

nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines. Tiny? The only types are: int (32-bit signed integer) char (8-

Ajay Tatachar 16 Feb 13, 2022
Smaller C is a simple and small single-pass C compiler

Smaller C is a simple and small single-pass C compiler, currently supporting most of the C language common between C89/ANSI C and C99 (minus some C89 and plus some C99 features).

Alexey Frunze 1.1k Aug 9, 2022
Microvm is a virtual machine and compiler

The aim of this project is to create a stack based language and virtual machine for microcontrollers. A mix of approaches is used. Separate memory is used for program and variable space (Harvard architecture). An interpreter, virtual machine and compiler are available. A demonstration of the interpreter in action is presented below.

null 11 Jun 21, 2022
Pre-configured LLVM and ANTLR4 for C++

LLVM + ANTLR4 Starter Project Starter project for ANTLR4 and LLVM C++ project. Prerequisite LLVM 12 Java (for ANTLR4) git Install prerequisite librari

Nathanael Demacon 11 Jul 10, 2022
Aheui JIT compiler for PC and web

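Aheuijit Overview Aheuijit is a JIT (Just in Time) compiler for the Aheui language. Apart from an assembler and utility libraries, it depends on no external libraries at all and implements the JIT from scratch. Supported environments 64-bit Windows, Mac, Linux (x86 architecture) WebAssem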

Sunho Kim 27 Jan 2, 2022
Interpreter and miner for the LODA language written in C++

LODA Interpreter and Miner (C++) LODA is an assembly language, a computational model and a tool for mining integer sequences. You can use it to search

LODA Language 12 Jul 8, 2022
is a c++20 compile and runtime Struct Reflections header only library.

is a c++20 compile and runtime Struct Reflections header only library. It allows you to iterate over aggregate type's member variables.

RedSkittleFox 4 Apr 18, 2022
🌳 A compressed rank/select dictionary exploiting approximate linearity and repetitiveness.

The block-ε tree is a compressed rank/select dictionary that achieves new space-time trade-offs by exploiting the approximate linearity and the repeti

Giorgio Vinciguerra 10 Jun 5, 2022
A LLVM and Clang compiler toolchain built for kernel development

Cosmic-Clang Toolchain This is a LLVM and Clang compiler toolchain built for kernel development. Builds are always made from the latest LLVM sources r

Ǥђ๏ຮ₮⌁Ⲙครtє࿐ 0 Apr 12, 2022
Python Inference Script is a Python package that enables developers to author machine learning workflows in Python and deploy without Python.

Python Inference Script(PyIS) Python Inference Script is a Python package that enables developers to author machine learning workflows in Python and d

Microsoft 10 Feb 23, 2022
SleighCraft is a decoder based on ghidra's decompiler implementation.

SleighCraft is a decoder (or, linear disassembler) based on ghidra's decompiler implementation. Sleighcraft can be used in Rust or Python, with both high-level and low-level API.

PortalLab 231 Jul 10, 2022
IDA StrikeOut: A Hex-Rays decompiler plugin to patch the Ctree

StrikeOut is a plugin for the Hex-Rays Decompiler. It allows you to delete (hide) statements from the AST, thus simplifying the pseudocode output. This is useful when you are dealing with lots of junk code or code that doesn't necessarily increase your understanding of the pseudocode.

Elias Bachaalany 79 Jun 20, 2022
Disassembler for compiled Lua scripts

Luad English | Русский Luad - Disassembler for compiled Lua scripts. At the moment the program is in development (v0.12-pre-alpha). Supported compiler

Vitaliy Vorobets 9 Jun 15, 2022
Capstone disassembly/disassembler framework

Capstone Engine Capstone is a disassembly framework with the target of becoming the ultimate disasm engine for binary analysis and reversing in the se

Capstone Engine 2 Nov 8, 2021
Capstone disassembly/disassembler framework: Core + bindings.

Capstone disassembly/disassembler framework: Core (Arm, Arm64, BPF, EVM, M68K, M680X, MOS65xx, Mips, PPC, RISCV, Sparc, SystemZ, TMS320C64x, Web Assembly, X86, X86_64, XCore) + bindings.

Capstone Engine 5.8k Aug 5, 2022
Capstone disassembly/disassembler framework

Capstone Engine Capstone is a disassembly framework with the target of becoming the ultimate disasm engine for binary analysis and reversing in the se

Capstone Engine 37 Mar 30, 2022
Simple Virtual Machine with its own Bytecode and Assembly language.

BM Simple Virtual Machine with its own Bytecode and Assembly language. Build We are using nobuild build system which requires a bootstrapping step wit

Tsoding 77 Aug 3, 2022
eBPF bytecode assembler and compiler

An eBPF bytecode assembler and compiler that * Assembles the bytecode to object code. * Compiles the bytecode to C macro preprocessors. Symbolic

Emil Masoumi 6 Jan 23, 2022
A modern dynamically typed programming language that gets compiled to bytecode and is run in a virtual machine called SVM (Strawbry Virtual Machine).

Strawbry A bytecode programming language. Here is what I want Strawbry to look like: var a = 1 var b = 2 var c = a + b print(c) func sqrt(x) { re

PlebusSupremus1234 6 Jan 5, 2022
Kuroko - A bytecode-compiled scripting language

Kuroko - A bytecode-compiled scripting language Kuroko is a bytecode-compiled, dynamic, interpreted programming language with familiar Python-like syn

Kuroko 281 Aug 5, 2022
use ptrace hook Hotspot JavaVM, instrument java bytecode

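taycan modifies the Java layer (JVM) from the native layer. Using the JVMTI and JNI APIs, it can modify any Java class and execute arbitrary code, enabling hooking, memory-shell injection, reflection, and similar functions. Supported environments: Linux kernel version > 3.2, GLIBC > 2.15, OpenJDK/OracleJDK 1.8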

null 26 Jul 12, 2022