A Compiler Writing Journey

Overview

In this GitHub repository, I'm documenting my journey to write a self-compiling compiler for a subset of the C language. I'm also writing out the details so that, if you want to follow along, there is an explanation of what I did and why, with some references back to the theory of compilers.

But not too much theory; I want this to be a practical journey.

Here are the steps I've taken so far:

There isn't a schedule or timeline for the future parts, so just keep checking back here to see if I've written any more.

Copyrights

I have borrowed some of the code, and lots of ideas, from the SubC compiler written by Nils M Holm. His code is in the public domain. I think that my code is different enough that I can apply a different license to it.

Unless otherwise noted,

  • all source code and scripts are (c) Warren Toomey under the GPL3 license.
  • all non-source code documents (e.g. English documents, image files) are (c) Warren Toomey under the Creative Commons BY-NC-SA 4.0 license.

Issues
  • why not generate intermediate code?

    This project is excellent! I have learned a lot from it. But I am curious: why doesn't your project generate intermediate code? Doesn't that make it look a little "incomplete"?

    opened by dslu7733 3
  • Is there a reason that a lot of things are statically allocated rather than dynamically allocated?

    For example, the global symbol table has a maximum number of entries. I've been implementing everything dynamically and reallocating as necessary, but I am worried that I have missed some important reason for statically allocating memory.

    opened by Jachdich 2
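
For comparison with the fixed-size table, here is a minimal sketch of the dynamic approach the question describes. The names `struct sym`, `struct symtab` and `symtab_add` are hypothetical and are not taken from the project; the real symbol entries carry more fields.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical symbol entry; a real symbol table entry has more fields.
struct sym {
  char *name;
};

// A growable table: `len` entries in use, room for `cap`.
struct symtab {
  struct sym *items;
  int len;
  int cap;
};

// Append a copy of `name`, doubling the storage when it is full.
// Returns the new slot number, or -1 if memory runs out.
int symtab_add(struct symtab *t, const char *name) {
  if (t->len == t->cap) {
    int newcap = t->cap ? t->cap * 2 : 8;
    struct sym *p = realloc(t->items, newcap * sizeof *p);
    if (p == NULL)
      return -1;
    t->items = p;
    t->cap = newcap;
  }
  char *copy = malloc(strlen(name) + 1);
  if (copy == NULL)
    return -1;
  strcpy(copy, name);
  t->items[t->len].name = copy;
  return t->len++;
}
```

One thing to watch with this design: `realloc` may move the array, so slot numbers stay valid across growth but pointers into the table do not. A fixed-size array avoids that concern at the cost of a hard limit.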
  • Can I repost the project to my blog?

    Hello, I am a developer from China. I am following your project to study, and I like it very much. May I translate your project and repost it on my blog? Looking forward to your reply. Thanks!

    opened by Shaw9379 2
  • unsupported symbol modifier in branch relocation: “call printf@PLT”

    Firstly, thank you for sharing this great compiler writing journey. I'm rather interested!

    I'm in chapter 4 (generating assembly code), and trying to follow the code.

    Here's the output assembly code (saved in the file "out.s"):

    	.text
    LC0:
    	.string	"%d\n"
    _printint:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	subq	$16, %rsp
    	movl	%edi, -4(%rbp)
    	movl	-4(%rbp), %eax
    	movl	%eax, %esi
    	leaq	LC0(%rip), %rdi
    	movl	$0, %eax
    	call	printf@PLT
    	nop
    	leave
    	ret
    
    	.globl	_main
    _main:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	movq	$2, %r8
    	movq	$3, %r9
    	movq	$5, %r10
    	imulq	%r9, %r10
    	addq	%r8, %r10
    	movq	$8, %r8
    	movq	$3, %r9
    	movq	%r8,%rax
    	cqo
    	idivq	%r9
    	movq	%rax,%r8
    	subq	%r8, %r10
    	movq	%r10, %rdi
    	call	_printint
    	movl	$0, %eax
    	popq	%rbp
    	ret
    

    When I assemble it with

    cc -o out out.s

    it complains:

    out.s:13:2: error: unsupported symbol modifier in branch relocation
     call printf@PLT
     ^
    

    What causes this error, and how can I fix it?

    Thanks in advance!

    PS:

    1. I am using macOS Catalina 10.15.2
    2. cc version is
    Apple clang version 11.0.0 (clang-1100.0.33.17)
    Target: x86_64-apple-darwin19.2.0
    Thread model: posix
    InstalledDir: /Library/Developer/CommandLineTools/usr/bin
    
    3. The source code was compiled with cc -o comp1 -g cg.c expr.c gen.c main.c scan.c tree.c
    opened by MichaelScofield 2
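
The `@PLT` suffix is an ELF convention, and the macOS assembler targets Mach-O, where C symbols carry a leading underscore instead (as `_main` and `_printint` already do in the listing above). A minimal sketch of choosing the spelling per target follows; `printf_call_line` and its `macho` flag are hypothetical names and not part of the project's code generator.

```c
// Return the assembly line that calls printf, in the spelling the target
// assembler accepts. `macho` is a hypothetical flag: nonzero for macOS
// (Mach-O object files), zero for ELF targets such as Linux.
const char *printf_call_line(int macho) {
  if (macho)
    return "\tcall\t_printf\n";     // Mach-O: leading underscore, no @PLT
  return "\tcall\tprintf@PLT\n";    // ELF: call through the PLT
}
```

A code generator could write the returned string to its output file in place of a hard-coded `call printf@PLT` line.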
  • Compilation error: data.h

    This is probably a problem with my system. Starting from 05_Statements I'm getting the error:

    data.h:10:9: error: unknown type name ‘FILE’
       10 | extern_ FILE *Infile;                   // Input and output files
          |         ^~~~
    data.h:11:9: error: unknown type name ‘FILE’
       11 | extern_ FILE *Outfile;
          |         ^~~~
    data.h:13:19: error: ‘TEXTLEN’ undeclared here (not in a function)
       13 | extern_ char Text[TEXTLEN + 1];         // Last identifier scanned
          |                   ^~~~~~~
    data.h:14:30: error: ‘NSYMBOLS’ undeclared here (not in a function)
       14 | extern_ struct symtable Gsym[NSYMBOLS]; // Global symbol table
    

    Can anyone point me in the right direction? Thank you.

    opened by prince-ao 1
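
These messages are what a C compiler reports when the declarations in data.h are seen before `<stdio.h>` (which provides `FILE`) and before the definitions of `TEXTLEN`, `NSYMBOLS` and `struct symtable`. One likely cause is that data.h is being compiled on its own or included ahead of its prerequisites. The sketch below shows the ordering that makes the same declarations compile; the constant values and the one-field struct are assumptions for illustration only.

```c
#include <stdio.h>               // provides FILE; must precede the declarations

// Assumed values for illustration; the project defines its own elsewhere.
#define TEXTLEN  512
#define NSYMBOLS 1024

// Stand-in for the project's symbol table entry.
struct symtable {
  char *name;
};

// With the prerequisites above in place, the data.h declarations compile.
FILE *Infile;                    // Input and output files
FILE *Outfile;
char Text[TEXTLEN + 1];          // Last identifier scanned
struct symtable Gsym[NSYMBOLS];  // Global symbol table
```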
  • Part 08 IF statements

    In the file cg.c, at line 176, you define the following array of inverted jump instructions:

    // List of inverted jump instructions,
    // in AST order: A_EQ, A_NE, A_LT, A_GT, A_LE, A_GE
    static char *invcmplist[] = { "jne", "je", "jge", "jle", "jg", "jl" };
    

    I think you have swapped the jump instructions for greater/less than and greater/less than or equal.

    In your code, the inverse of A_LT is defined as "jge" and that of A_LE as "jg". Likewise, the inverses of A_GT and A_GE are defined as "jle" and "jl" respectively.

    Various documentation online defines JL as "Jump short if less" and JLE as "Jump short if less or equal". The same applies to the greater-than jumps, JG and JGE.

    This means that, for A_LT (less than), you defined the inverse as the jump if less than or equal, and for A_LE (less than or equal), as the jump if less than. As mentioned above, the same error is made for the greater-than jumps.

    The array should then be corrected as follows:

    // List of inverted jump instructions,
    // in AST order: A_EQ, A_NE, A_LT, A_GT, A_LE, A_GE
    static char *invcmplist[] = { "jne", "je", "jg", "jl", "jge", "jle" };
    

    I've looked through all the parts of the tutorial and the error is still present in the last part (62 Cleanup).

    opened by gioele97 1
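
One way to check either table is to test it mechanically. The sketch below assumes that "inverted" means the jump taken when the comparison is false, so that each entry should be the logical negation of its comparison. Conditions stand in for jump mnemonics here (`LT` for `jl`, `GE` for `jge`, and so on); the names `holds` and `is_negation_table` are hypothetical.

```c
// Comparison kinds in the order used by the table: EQ, NE, LT, GT, LE, GE.
enum { EQ, NE, LT, GT, LE, GE };

// Evaluate comparison `op` on a and b.
int holds(int op, int a, int b) {
  switch (op) {
    case EQ: return a == b;
    case NE: return a != b;
    case LT: return a <  b;
    case GT: return a >  b;
    case LE: return a <= b;
    default: return a >= b;   // GE
  }
}

// Return 1 if table[op] is the logical negation of op for every op,
// checked over a small range of operand pairs; otherwise return 0.
int is_negation_table(const int table[6]) {
  for (int op = EQ; op <= GE; op++)
    for (int a = -2; a <= 2; a++)
      for (int b = -2; b <= 2; b++)
        if (holds(table[op], a, b) == holds(op, a, b))
          return 0;
  return 1;
}
```

Under that assumption, the table `{ NE, EQ, GE, LE, GT, LT }` (jne, je, jge, jle, jg, jl) passes the check, while `{ NE, EQ, GT, LT, GE, LE }` (jne, je, jg, jl, jge, jle) fails it for equal operands, where neither `<` nor `>` holds.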
  • docs: fix simple typo, identifer -> identifier

    There is a small typo in 06_Variables/misc.c, 07_Comparisons/misc.c, 08_If_Statements/misc.c, 09_While_Loops/misc.c, 10_For_Loops/misc.c, 11_Functions_pt1/misc.c, 12_Types_pt1/misc.c, 13_Functions_pt2/misc.c, 14_ARM_Platform/misc.c, 15_Pointers_pt1/misc.c, 16_Global_Vars/Readme.md, 16_Global_Vars/misc.c, 17_Scaling_Offsets/misc.c, 18_Lvalues_Revisited/misc.c, 19_Arrays_pt1/misc.c, 20_Char_Str_Literals/misc.c, 21_More_Operators/misc.c, 23_Local_Variables/misc.c, 24_Function_Params/misc.c, 25_Function_Arguments/misc.c, 26_Prototypes/misc.c, 27_Testing_Errors/misc.c, 28_Runtime_Flags/misc.c, 29_Refactoring/misc.c, 30_Design_Composites/Readme.md, 30_Design_Composites/misc.c, 31_Struct_Declarations/misc.c, 32_Struct_Access_pt1/Readme.md, 32_Struct_Access_pt1/expr.c, 32_Struct_Access_pt1/misc.c, 33_Unions/expr.c, 33_Unions/misc.c, 34_Enums_and_Typedefs/expr.c, 34_Enums_and_Typedefs/misc.c, 35_Preprocessor/expr.c, 35_Preprocessor/misc.c, 36_Break_Continue/expr.c, 36_Break_Continue/misc.c, 37_Switch/expr.c, 37_Switch/misc.c, 38_Dangling_Else/expr.c, 38_Dangling_Else/misc.c, 39_Var_Initialisation_pt1/expr.c, 39_Var_Initialisation_pt1/misc.c, 40_Var_Initialisation_pt2/expr.c, 40_Var_Initialisation_pt2/misc.c, 41_Local_Var_Init/expr.c, 41_Local_Var_Init/misc.c, 42_Casting/expr.c, 42_Casting/misc.c, 43_More_Operators/expr.c, 43_More_Operators/misc.c, 44_Fold_Optimisation/expr.c, 44_Fold_Optimisation/misc.c, 45_Globals_Again/expr.c, 45_Globals_Again/misc.c, 46_Void_Functions/expr.c, 46_Void_Functions/misc.c, 47_Sizeof/expr.c, 47_Sizeof/misc.c, 48_Static/expr.c, 48_Static/misc.c, 49_Ternary/expr.c, 49_Ternary/misc.c, 50_Mop_up_pt1/expr.c, 50_Mop_up_pt1/misc.c, 51_Arrays_pt2/expr.c, 51_Arrays_pt2/misc.c, 52_Pointers_pt2/Readme.md, 52_Pointers_pt2/misc.c, 53_Mop_up_pt2/misc.c, 54_Reg_Spills/misc.c, 56_Local_Arrays/misc.c, 57_Mop_up_pt3/misc.c, 58_Ptr_Increments/misc.c, 59_WDIW_pt1/misc.c, 60_TripleTest/misc.c, 62_Cleanup/misc.c.

    Should read identifier rather than identifer.

    opened by timgates42 1
  • Part 7: Comparison Operators

    Do the comparison operators have higher precedence than multiply and divide?

    Does the expression '10 * 3 > 2 * 5' mean '10 * 1 * 5 = 50'?

    I don't agree with that. The C operator precedence table is listed in descending order of precedence, which means the comparison operators have lower precedence than multiplication and division.

    opened by alex-xia-xia 1
  • confusion on part8 in README.md

      if (condition is true) 
        perform this first block of code
      else
        perform this other block of code
    

    In the README, you describe it as follows:

           perform the opposite comparison
           jump to L1 if true
           perform the first block of code
           jump to L2
    L1:
           perform the other block of code
    L2:
    

    But "jump to L1 if true" seems wrong. Do you mean "jump to L1 if false"?

    opened by dslu7733 1
  • Part 7: register's b-suffix not mentioned in doc

    The variable breglist is declared here but never introduced; it just shows up out of nowhere in the doc.

    https://github.com/DoctorWkt/acwj/blob/b11b269c71727297b367203eaed164364291db16/07_Comparisons/cg.c#L13

    IMO it would be better to mention the b-suffix somewhere before breglist appears in the journey.

    opened by Psycho7 1
  • Issue with negative integer scanning

    Hi again 😺

    You're probably aware of this already, but the scanner is too eager to scan the '-' character as part of an integer instead of as an operator. For example:

    #include <stdio.h>
    
    int main() {
      printf("%d\n",1-1); // can't parse this, as scanner scans this as 2 consecutive T_INTLITs
      return (0);
    }
    

    Thanks,

    opened by luke-gru 1
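
One common way around this is for the scanner never to attach '-' to a literal, leaving negative numbers to the parser as a unary minus applied to a positive literal. The following is a minimal sketch of that idea, not the project's scanner: `T_EOF`, `struct token` and `scan` are illustrative names, and only digits, '-' and whitespace are handled.

```c
#include <ctype.h>

enum { T_EOF, T_MINUS, T_INTLIT };

struct token {
  int kind;
  int value;    // only meaningful for T_INTLIT
};

// Scan one token from *src and advance the pointer past it.
// '-' is always its own token; digits alone form a literal.
// Any other character is reported as T_EOF without advancing.
struct token scan(const char **src) {
  struct token t = { T_EOF, 0 };
  const char *p = *src;

  while (isspace((unsigned char)*p))
    p++;

  if (*p == '-') {
    t.kind = T_MINUS;
    p++;
  } else if (isdigit((unsigned char)*p)) {
    t.kind = T_INTLIT;
    while (isdigit((unsigned char)*p))
      t.value = t.value * 10 + (*p++ - '0');
  }
  *src = p;
  return t;
}
```

With this scanner, the text `1-1` yields three tokens (literal, minus, literal) rather than two consecutive literals.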
  • 19_Arrays_pt1,cgarm is not correct

    After arrays were introduced, the code generator for ARM does not seem to have been changed, and make armtest fails on input20 with a segmentation fault. I guess the function cgglobsym in cg_arm.c needs to change.

    opened by drowning-in-codes 0
  • does not generate correct code for Windows

    I tried to port the compiler to Windows, but there are some errors:

    The generated out.s does not assemble correctly on Windows.

    GAS (the AT&T-syntax assembler) does not behave identically on Windows; its pseudo-ops differ slightly. So I wrote a cg.c for NASM.

    It gets past the compilation stage, but it still hits a runtime error. This seems to be caused by something called shadow storage on Windows:

    The x64 Application Binary Interface (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers. https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170

    The x64 calling convention on Windows is very different (and hard for me to understand). Do you have any ideas for fixing this?

    opened by szdytom 0
  • Failure to error on redefinition of variable as function

    int foo;
    void foo();
    

    On GCC, this gives:

    test.c:2:6: error: ‘foo’ redeclared as different kind of symbol
        2 | void foo() {}
          |      ^~~
    test.c:1:5: note: previous declaration of ‘foo’ with type ‘int’
        1 | int foo;
          |     ^~~
    

    cwj compiles the file without an error when it probably should error (it already errors if, say, the second declaration is char foo;, so it seems easy to do this with a function redefinition too)

    opened by GabrielRavier 0
  • Compiler hangs forever when compiling simple structure

    Trying to compile this code:

    struct node
    {
            int count;
            struct node *left;
    } words[14];
    

    Makes cwj (at least the one from chapter 62) hang forever. This seems to be caused by cgglobsym having an inner loop accidentally use the same i for looping, overwriting the value of the one intended for the outer loop (and since the loop bound for the inner loop is lower than the outer loop's bound, i will never reach the outer loop's bound).

    opened by GabrielRavier 0
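
The shared-counter pattern described above can be illustrated in isolation. This is a generic sketch and not the code of cgglobsym; `count_steps` is a hypothetical function that gives each loop its own counter.

```c
// Count how many times the inner body runs for `outer` elements that
// each need `inner` initialisation steps. Each loop has its own counter.
int count_steps(int outer, int inner) {
  int steps = 0;
  for (int i = 0; i < outer; i++) {
    // Reusing `i` for this loop would leave it equal to `inner` after
    // every pass; whenever inner + 1 < outer, the outer loop would then
    // never reach its bound.
    for (int j = 0; j < inner; j++)
      steps++;
  }
  return steps;
}
```

For an array of 14 elements with 2 steps each, `count_steps(14, 2)` terminates with 28 steps, whereas the shared-counter version would loop forever.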
  • Part 07 precedence of comparisons

    I believe the precedence values of the arithmetic operators must be greater than those of the comparison operators (in OpPrec[]).

    For example, if the input is

    print 1 + 2 < 4
    

    then the output assembly is

    	.text
    .LC0:
    	.string	"%d\n"
    printint:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	subq	$16, %rsp
    	movl	%edi, -4(%rbp)
    	movl	-4(%rbp), %eax
    	movl	%eax, %esi
    	leaq	.LC0(%rip), %rdi
    	movl	$0, %eax
    	call	printf@PLT
    	nop
    	leave
    	ret
    
    	.globl	main
    	.type	main, @function
    main:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	movq	$1, %r8
    	movq	$2, %r9
    	movq	$4, %r10
    	cmpq	%r10, %r9
    	setl	%r10b
    	andq	$255,%r10
    	addq	%r8, %r10
    	movq	%r10, %rdi
    	call	printint
    	movl $0, %eax
    	popq %rbp
    	ret
    

    As you can see, it calculates 1 + (2 < 4), not (1 + 2) < 4.

    opened by minoring 1
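
The effect of the precedence values can be seen in a small precedence-climbing evaluator. This is a sketch under stated assumptions, not the project's parser: `prec`, `binexpr` and `eval` are illustrative names, the precedence numbers are chosen only so that comparisons bind loosest, and the input is limited to non-negative integers, spaces and the six operators shown. Division by zero is not guarded.

```c
#include <ctype.h>

static const char *p;     // cursor into the expression being parsed

// Precedence of a binary operator; higher binds tighter. 0 = not an operator.
static int prec(char op) {
  switch (op) {
    case '<': case '>': return 1;   // comparisons bind loosest
    case '+': case '-': return 2;
    case '*': case '/': return 3;
    default:            return 0;
  }
}

static void skip(void) {
  while (*p == ' ')
    p++;
}

// Parse a non-negative integer literal.
static int primary(void) {
  int v = 0;
  skip();
  while (isdigit((unsigned char)*p))
    v = v * 10 + (*p++ - '0');
  return v;
}

// Parse and evaluate operators whose precedence is above `minprec`.
static int binexpr(int minprec) {
  int left = primary();
  skip();
  while (prec(*p) > minprec) {
    char op = *p++;
    int right = binexpr(prec(op));   // tighter operators group first
    switch (op) {
      case '+': left = left + right; break;
      case '-': left = left - right; break;
      case '*': left = left * right; break;
      case '/': left = left / right; break;
      case '<': left = left < right; break;
      case '>': left = left > right; break;
    }
    skip();
  }
  return left;
}

// Evaluate a whole expression string.
int eval(const char *src) {
  p = src;
  return binexpr(0);
}
```

With these values, `eval("1 + 2 < 4")` groups as (1 + 2) < 4 and `eval("10 * 3 > 2 * 5")` as (10 * 3) > (2 * 5). Giving `<` a higher number than `+` in `prec` would reproduce the 1 + (2 < 4) grouping reported above.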