A Compiler Writing Journey

Overview

In this GitHub repository, I'm documenting my journey to write a self-compiling compiler for a subset of the C language. I'm also writing out the details so that, if you want to follow along, there will be an explanation of what I did and why, with some references back to the theory of compilers.

But not too much theory: I want this to be a practical journey.

Here are the steps I've taken so far:

There isn't a schedule or timeline for the future parts, so just keep checking back here to see if I've written any more.

Copyrights

I have borrowed some of the code, and lots of ideas, from the SubC compiler written by Nils M Holm. His code is in the public domain. I think that my code is different enough that I can apply a different license to it.

Unless otherwise noted,

  • all source code and scripts are (c) Warren Toomey under the GPL3 license.
  • all non-source code documents (e.g. English documents, image files) are (c) Warren Toomey under the Creative Commons BY-NC-SA 4.0 license.
Comments
  • Why not generate intermediate code?

    This project is excellent! I have learned a lot from it. But I am curious: why doesn't your project generate intermediate code? Doesn't that make it look a little "incomplete"?

    opened by dslu7733 3
  • Is there a reason that a lot of things are statically allocated rather than dynamically allocated?

    For example, the global symbol table has a maximum number of entries. I've been implementing everything dynamically and reallocating as necessary, but I am worried that I have missed some important reason for allocating memory statically.

    opened by Jachdich 2
  • Can I repost the project to my blog?

    Hello, I am a developer from China. I am currently following your project to study, and I like it very much. May I translate your project and repost it on my blog? Looking forward to your reply. Thanks!

    opened by Shaw9379 2
  • unsupported symbol modifier in branch relocation: “call printf@PLT”

    Firstly, thank you for sharing this great compiler writing journey. I'm very interested!

    I'm in chapter 4 (generating assembly code), and trying to follow the code.

    Here's the output assembly code (saved in the file "out.s"):

    	.text
    LC0:
    	.string	"%d\n"
    _printint:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	subq	$16, %rsp
    	movl	%edi, -4(%rbp)
    	movl	-4(%rbp), %eax
    	movl	%eax, %esi
    	leaq	LC0(%rip), %rdi
    	movl	$0, %eax
    	call	printf@PLT
    	nop
    	leave
    	ret
    
    	.globl	_main
    _main:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	movq	$2, %r8
    	movq	$3, %r9
    	movq	$5, %r10
    	imulq	%r9, %r10
    	addq	%r8, %r10
    	movq	$8, %r8
    	movq	$3, %r9
    	movq	%r8,%rax
    	cqo
    	idivq	%r9
    	movq	%rax,%r8
    	subq	%r8, %r10
    	movq	%r10, %rdi
    	call	_printint
    	movl	$0, %eax
    	popq	%rbp
    	ret
    

    When running it with

    cc -o out out.s

    it complained with:

    out.s:13:2: error: unsupported symbol modifier in branch relocation
     call printf@PLT
     ^
    

    Why does this error happen, and how can I fix it?

    Thanks in advance!

    PS:

    1. I am using macOS Catalina 10.15.2
    2. cc version is
    Apple clang version 11.0.0 (clang-1100.0.33.17)
    Target: x86_64-apple-darwin19.2.0
    Thread model: posix
    InstalledDir: /Library/Developer/CommandLineTools/usr/bin
    
    3. The source code was compiled with cc -o comp1 -g cg.c expr.c gen.c main.c scan.c tree.c
    opened by MichaelScofield 2
  • Compilation error: data.h

    This is probably a problem with my system. Starting from 05_Statements I'm getting the error:

    data.h:10:9: error: unknown type name ‘FILE’
       10 | extern_ FILE *Infile;                   // Input and output files
          |         ^~~~
    data.h:11:9: error: unknown type name ‘FILE’
       11 | extern_ FILE *Outfile;
          |         ^~~~
    data.h:13:19: error: ‘TEXTLEN’ undeclared here (not in a function)
       13 | extern_ char Text[TEXTLEN + 1];         // Last identifier scanned
          |                   ^~~~~~~
    data.h:14:30: error: ‘NSYMBOLS’ undeclared here (not in a function)
       14 | extern_ struct symtable Gsym[NSYMBOLS]; // Global symbol table
    

    Can anyone point me in the right direction? Thank you.

    opened by prince-ao 1
  • Part 08 IF statements

    In the file cg.c, at line 176, you define the following array for the inverted jump instructions:

    // List of inverted jump instructions,
    // in AST order: A_EQ, A_NE, A_LT, A_GT, A_LE, A_GE
    static char *invcmplist[] = { "jne", "je", "jge", "jle", "jg", "jl" };
    

    I think you inverted the jump instructions for greater/less than and greater/less than or equal.

    In your code, A_LT defines its inverse as "jge" and A_LE as "jg"; likewise, A_GT and A_GE are defined as "jle" and "jl" respectively.

    Looking at various documentation online, you can find that JL is defined as "Jump short if Less" and JLE as "Jump short if Less or Equal". The same applies to the greater-than and greater-or-equal jumps, JG and JGE.

    This means that for A_LT (less than) you defined the inverse as the jump if less than or equal, and for A_LE (less than or equal) the jump if less than. As mentioned above, the same error is made for the greater-than and greater-or-equal jumps.

    The array should then be corrected as follows:

    // List of inverted jump instructions,
    // in AST order: A_EQ, A_NE, A_LT, A_GT, A_LE, A_GE
    static char *invcmplist[] = { "jne", "je", "jg", "jl", "jge", "jle" };
    

    I've looked through all the parts of the tutorial and the error is still present in the last part (62 Cleanup).

    opened by gioele97 1
  • docs: fix simple typo, identifer -> identifier

    There is a small typo in 06_Variables/misc.c, 07_Comparisons/misc.c, 08_If_Statements/misc.c, 09_While_Loops/misc.c, 10_For_Loops/misc.c, 11_Functions_pt1/misc.c, 12_Types_pt1/misc.c, 13_Functions_pt2/misc.c, 14_ARM_Platform/misc.c, 15_Pointers_pt1/misc.c, 16_Global_Vars/Readme.md, 16_Global_Vars/misc.c, 17_Scaling_Offsets/misc.c, 18_Lvalues_Revisited/misc.c, 19_Arrays_pt1/misc.c, 20_Char_Str_Literals/misc.c, 21_More_Operators/misc.c, 23_Local_Variables/misc.c, 24_Function_Params/misc.c, 25_Function_Arguments/misc.c, 26_Prototypes/misc.c, 27_Testing_Errors/misc.c, 28_Runtime_Flags/misc.c, 29_Refactoring/misc.c, 30_Design_Composites/Readme.md, 30_Design_Composites/misc.c, 31_Struct_Declarations/misc.c, 32_Struct_Access_pt1/Readme.md, 32_Struct_Access_pt1/expr.c, 32_Struct_Access_pt1/misc.c, 33_Unions/expr.c, 33_Unions/misc.c, 34_Enums_and_Typedefs/expr.c, 34_Enums_and_Typedefs/misc.c, 35_Preprocessor/expr.c, 35_Preprocessor/misc.c, 36_Break_Continue/expr.c, 36_Break_Continue/misc.c, 37_Switch/expr.c, 37_Switch/misc.c, 38_Dangling_Else/expr.c, 38_Dangling_Else/misc.c, 39_Var_Initialisation_pt1/expr.c, 39_Var_Initialisation_pt1/misc.c, 40_Var_Initialisation_pt2/expr.c, 40_Var_Initialisation_pt2/misc.c, 41_Local_Var_Init/expr.c, 41_Local_Var_Init/misc.c, 42_Casting/expr.c, 42_Casting/misc.c, 43_More_Operators/expr.c, 43_More_Operators/misc.c, 44_Fold_Optimisation/expr.c, 44_Fold_Optimisation/misc.c, 45_Globals_Again/expr.c, 45_Globals_Again/misc.c, 46_Void_Functions/expr.c, 46_Void_Functions/misc.c, 47_Sizeof/expr.c, 47_Sizeof/misc.c, 48_Static/expr.c, 48_Static/misc.c, 49_Ternary/expr.c, 49_Ternary/misc.c, 50_Mop_up_pt1/expr.c, 50_Mop_up_pt1/misc.c, 51_Arrays_pt2/expr.c, 51_Arrays_pt2/misc.c, 52_Pointers_pt2/Readme.md, 52_Pointers_pt2/misc.c, 53_Mop_up_pt2/misc.c, 54_Reg_Spills/misc.c, 56_Local_Arrays/misc.c, 57_Mop_up_pt3/misc.c, 58_Ptr_Increments/misc.c, 59_WDIW_pt1/misc.c, 60_TripleTest/misc.c, 62_Cleanup/misc.c.

    Should read identifier rather than identifer.

    opened by timgates42 1
  • Part 7: Comparison Operators

    Do the comparison operators have higher precedence than multiply and divide?

    Does the expression '10 * 3 > 2 * 5' mean '10 * 1 * 5 = 50'???

    I don't agree with that. The C operator precedence table is listed in descending order of precedence, which means the comparison operators have lower precedence than multiplication and division.

    [Image: precedence]

    opened by alex-xia-xia 1
  • confusion on part8 in README.md

      if (condition is true) 
        perform this first block of code
      else
        perform this other block of code
    

    In the README, you write it as follows:

           perform the opposite comparison
           jump to L1 if true
           perform the first block of code
           jump to L2
    L1:
           perform the other block of code
    L2:
    

    But something seems wrong with "jump to L1 if true". Do you mean "jump to L1 if false"?

    opened by dslu7733 1
  • Part 7: register's b-suffix not mentioned in doc

    The variable breglist is declared here but not mentioned, and just shows up from nowhere in the doc.

    https://github.com/DoctorWkt/acwj/blob/b11b269c71727297b367203eaed164364291db16/07_Comparisons/cg.c#L13

    IMO it would be better to mention the b-suffix somewhere before breglist appears in the journey.

    opened by Psycho7 1
  • Issue with negative integer scanning

    Hi again 😺

    You're probably aware of this already, but the scanner is too eager to scan the '-' character as part of an integer instead of as an operator. For example:

    #include <stdio.h>
    
    int main() {
      printf("%d\n",1-1); // can't parse this, as scanner scans this as 2 consecutive T_INTLITs
      return (0);
    }
    

    Thanks,

    opened by luke-gru 1
  • 26 Function prototypes

    I may be missing something here, but I think there's a typo in the code to check function parameters vs a prototype...

    [Image: Screen Shot 2022-12-27 at 11 04 20 AM]

    It looks as though the type of each argument is matched against the type of the function (Symtable[id].type), not the type of the parameter in the prototype (which would be indexed by param_id).

    opened by ThrudTheBarbarian 0
  • Fix modify_type

    In modify_type(), we check the types of both operands when the op is A_LOGOR or A_LOGAND, but rtype was misspelled as ltype when checking the right-hand type. I also fixed "Now now" to "Not now" in the README file. Fixes #53.

    opened by MikasaAkerman 0
  • Misspellings in Part 53

    1. if (!inttype(ltype) && !ptrtype(rtype)) should be if (!inttype(rtype) && !ptrtype(rtype)) in modify_type().
    2. Now now should be Not now in README file.
    opened by MikasaAkerman 0
  • Part 48 Static

    There is no need to call freestaticsyms() in the do_compile() function, because clear_symtable() has already been called before we parse the source file, and clear_symtable() clears all symbols in the Globhead chain.

    opened by MikasaAkerman 0
  • 19_Arrays_pt1: cg_arm is not correct

    After arrays were introduced, the code generator for ARM does not seem to have been changed, and running make armtest fails on input20 with a segmentation fault. I guess the function cgglobsym in cg_arm.c needs to change.

    opened by drowning-in-codes 0