JuCC - Jadavpur University Compiler Compiler

Overview

This is the official Jadavpur University Compiler Compiler repository.

Key Features

  • Supports a subset of the C language for now.
  • Custom grammar files to easily extend the language.
  • LL(1) parsing with panic mode error recovery.
  • Generates .json parse tree outputs for easy visualization with Treant.js.
  • 100% Open Source (Apache-2.0 License)
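As a rough illustration of the parse-tree output mentioned in the feature list, the sketch below converts a small tree into the nested `nodeStructure` shape that Treant.js consumes (`text.name` plus `children`). The `Node` class and `to_treant` helper are hypothetical names chosen for this example, and the exact JSON that JuCC writes may differ.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical parse-tree node: a grammar symbol plus its children."""
    symbol: str
    children: List["Node"] = field(default_factory=list)


def to_treant(node: Node) -> dict:
    """Convert a node into the nested dict shape used by Treant.js nodeStructure."""
    out = {"text": {"name": node.symbol}}
    if node.children:
        out["children"] = [to_treant(child) for child in node.children]
    return out


# Tiny hand-built tree for the declaration `int x ;` (illustrative only).
tree = Node("<declaration>", [
    Node("<type_specifier>", [Node("int")]),
    Node("<init_declarator_list>", [Node("identifier")]),
    Node(";"),
])

# A full Treant.js config also needs a `chart` entry naming the container element.
treant_config = {"nodeStructure": to_treant(tree)}
print(json.dumps(treant_config, indent=2))
```

Leaves omit the `children` key entirely, which keeps the emitted JSON compact.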

Quickstart

The JuCC project is built and tested on Ubuntu 20.04.

$ git clone https://github.com/TheSYNcoder/JuCC
$ cd JuCC
$ sudo ./script/installation/packages.sh
$ cd server
$ npm i
$ cd ..
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
$ ninja jucc
$ ./bin/jucc -g  -f  -o 

To run the provided unit tests:

$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
$ ninja
$ ./bin/jucc_test

To run the benchmarks (note: -DCMAKE_BUILD_TYPE=Release is required):

$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
$ ninja
$ ./bin/jucc_benchmark

Before pushing or opening a pull request (all tests must pass; this is compulsory):

$ ninja
$ ninja check-format
$ ninja check-clang-tidy
$ ninja check-lint
$ ninja test

To add a new unit test, make a folder with the same relative path as in the src folder, and define your test. Please refer to docs for more details about writing tests using the googletest framework.

Additional Notes:

  • If you know what you're doing, install the prerequisite packages from ./script/installation/packages.sh manually.

For Developers

Please see the docs.

Contributing

Contributions from everyone are welcome!

Comments
  • Feature : Parsing Implementation.

    Parsing and Parsing table

    This PR implements the parsing process and the parsing table.

    Remaining tasks

    • [x] Parsing table implementation
    • [x] Parsing
    • [x] Integrate it from the main function with all other modules.
    • [x] Write more tests.

    Further work

    • A module to parse an entire input file.
    • A web app (distant future).
    enhancement 
    opened by TheSYNcoder 7
  • Feature :  symbol table Implementation

    Heading

    Symbol Table Implementation

    Description

    This PR is for the implementation of the symbol table.

    Remaining Tasks

    • [x] Implement all methods
    • [x] Fix clang errors
    • [x] Fix memory leaks
    • [x] Integrate with lexer.
    enhancement 
    opened by TheSYNcoder 7
  • Fixed: Lexer and added more tests

    Fixed: Some issues with Lexer

    Improved lexer.

    Description

    Improved the lexer, added namespace jucc::lexer, and added more tests. Improved codecov coverage for lexer.cpp towards 100%. Updated the grammar parser.

    Remaining tasks

    • [ ] Review Required
    bug 
    opened by noob77777 6
  • Fix: grammar.g : constrain main function naming

    Updating grammar and lexer

    Description

    Adds improvements and fixes to the lexer and the parsing grammar. Resolves #13.

    Remaining tasks

    • [x] Fix grammar
    • [x] Update lexer
    enhancement 
    opened by noob77777 4
  • Fixed: grammar.g is now ll(1) after left recursion removal and left f…

    Grammar is now LL(1)

    Description

    grammar.g is now an LL(1) context-free grammar after left-recursion removal and left factoring.

    Changes

    1. Removed left recursion for <init_declarator_list> and <block_item_list>
    2. Removed <cast_expression>
    3. Removed support for assignment operator:
    x = 5; // not supported
    int x = 5; // supported. grammar limited to declarations
    
    4. Support dropped for the following:
    // this is not supported.
    if ( <assignment_expression> )
        <statement>
    else
        <statement>
    
    // this is supported
    if ( <assignment_expression> ) {
        <statements>
    } else {
        <statements>
    }
    

    resolves #27
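The left-recursion removal described above can be sketched as follows. This is a minimal illustration of the textbook rewrite for immediate left recursion (A : A α | β becomes A : β A' and A' : α A' | EPSILON), not the project's own implementation; the function name and the `'>` naming of the new non-terminal are assumptions made for this example.

```python
from typing import Dict, List

EPSILON = "EPSILON"  # same epsilon marker the grammar file uses


def remove_immediate_left_recursion(
    head: str, bodies: List[List[str]]
) -> Dict[str, List[List[str]]]:
    """Rewrite the rules of `head` so that no body starts with `head` itself."""
    # Bodies of the form `head alpha`: keep only alpha.
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # All remaining bodies (the "beta" alternatives).
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: bodies}

    tail = head[:-1] + "'>"  # hypothetical name, e.g. <x> becomes <x'>

    def with_tail(body: List[str]) -> List[str]:
        # An EPSILON alternative collapses to just the tail non-terminal.
        return ([] if body == [EPSILON] else body) + [tail]

    return {
        head: [with_tail(body) for body in others],
        tail: [alpha + [tail] for alpha in recursive] + [[EPSILON]],
    }


rewritten = remove_immediate_left_recursion(
    "<init_declarator_list>",
    [
        ["<init_declarator>"],
        ["<init_declarator_list>", ",", "<init_declarator>"],
    ],
)
for lhs, alternatives in rewritten.items():
    for body in alternatives:
        print(lhs, ":", " ".join(body))
```

Indirect left recursion is not handled here; it would need the non-terminals to be ordered and substituted first.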

    bugfix 
    opened by noob77777 3
  • FIX THE PARSER

    BUG REPORT: FIX THE PARSER

    OS: Ubuntu (LTS) 20.04 or macOS 10.14+ (please specify version). Compiler: GCC 7.0+ or Clang 8.0+. CMake Profile: all

    Steps to Reproduce

    inputs

    grammar.g

    ## This is the grammar file for JuCC
    ## Edit this file to make changes to the parsing grammar
    ## Epsilon is represented by special string EPSILON
    
    ## Terminals
    %terminals
    else float if int void
    ( ) { } * + - / % ,
    << >> < > <= >= = == != ;
    identifier integer_constant float_constant
    main cin cout
    %
    
    ## Non Terminals
    %non_terminals
    <primary_expression> <constant> <unary_operator> <unary_expression>
    <type_specifier> <multiplicative_expression> <additive_expression>
    <shift_expression> <relational_expression> <equality_expression>
    <assignment_expression> <expression>
    <declaration> <init_declarator_list> <init_declarator>
    <initializer> <declarator> <direct_declarator>
    <statement> <compound_statement> <block_item_list> <block_item>
    <expression_statement> <selection_statement> <program>
    %
    
    ## Start Symbol
    %start
    <program>
    %
    
    ## Grammar for the language
    %rules
    ## Expressions
    <primary_expression> : identifier
    <primary_expression> : <constant>
    <primary_expression> : ( <expression> )
    <constant> : integer_constant
    <constant> : float_constant
    <unary_operator> : +
    <unary_operator> : -
    <unary_expression> : <primary_expression>
    <unary_expression> : <unary_operator> <primary_expression>
    <multiplicative_expression> : <unary_expression>
    <multiplicative_expression> : <multiplicative_expression> * <unary_expression>
    <multiplicative_expression> : <multiplicative_expression> / <unary_expression>
    <multiplicative_expression> : <multiplicative_expression> % <unary_expression>
    <additive_expression> : <multiplicative_expression>
    <additive_expression> : <additive_expression> + <multiplicative_expression>
    <additive_expression> : <additive_expression> - <multiplicative_expression>
    <shift_expression> : <additive_expression>
    <shift_expression> : cin >> <additive_expression>
    <shift_expression> : cout << <additive_expression>
    <shift_expression> : <shift_expression> << <additive_expression>
    <shift_expression> : <shift_expression> >> <additive_expression>
    <relational_expression> : <shift_expression>
    <relational_expression> : <relational_expression> < <shift_expression>
    <relational_expression> : <relational_expression> > <shift_expression>
    <relational_expression> : <relational_expression> <= <shift_expression>
    <relational_expression> : <relational_expression> >= <shift_expression>
    <equality_expression> : <relational_expression>
    <equality_expression> : <equality_expression> == <relational_expression>
    <equality_expression> : <equality_expression> != <relational_expression>
    <assignment_expression> : <equality_expression>
    <assignment_expression> : <assignment_expression> = <equality_expression>
    <expression> : <assignment_expression>
    
    ## Declarations
    <declaration> : <type_specifier> <init_declarator_list> ;
    <init_declarator_list> : <init_declarator>
    <init_declarator_list> : <init_declarator_list> , <init_declarator>
    <init_declarator_list> : EPSILON
    <init_declarator> : <declarator>
    <init_declarator> : <declarator> = <initializer>
    <type_specifier> : void
    <type_specifier> : int
    <type_specifier> : float
    <declarator> : <direct_declarator>
    <direct_declarator> : identifier
    <direct_declarator> : ( <declarator> )
    <initializer> : <assignment_expression>
    
    ## Statements
    <statement> : <compound_statement>
    <statement> : <expression_statement>
    <statement> : <selection_statement>
    <compound_statement> : { <block_item_list> }
    <block_item_list> : <block_item>
    <block_item_list> : <block_item> <block_item_list> 
    <block_item_list> : EPSILON
    <block_item> : <declaration>
    <block_item> : <statement>
    <expression_statement> : <expression> ;
    <expression_statement> : ;
    <selection_statement> : if ( <expression> ) <compound_statement>
    <selection_statement> : if ( <expression> ) <compound_statement> else <compound_statement>
    
    ## Main
    <program> : <type_specifier> main ( ) <compound_statement>
    %
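For readers unfamiliar with the layout of the grammar file above, here is a minimal sketch of how such a file could be read. It assumes only what the listing shows: `##` starts a comment, a `%name` line opens a section, a lone `%` closes it, and each rule is `<head> : symbols`. The `parse_grammar` function is a hypothetical helper, not JuCC's actual grammar parser.

```python
from typing import Dict, List


def parse_grammar(text: str) -> Dict[str, object]:
    """Split a grammar file into terminals, non-terminals, start symbol and rules."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("##"):
            continue                      # blank line or comment
        if line == "%":
            current = None                # a lone % closes the open section
        elif current is None and line.startswith("%"):
            current = line[1:]            # e.g. "%rules" opens section "rules"
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    rules: Dict[str, List[List[str]]] = {}
    for line in sections.get("rules", []):
        head, body = line.split(" : ", 1)
        rules.setdefault(head.strip(), []).append(body.split())

    return {
        "terminals": " ".join(sections.get("terminals", [])).split(),
        "non_terminals": " ".join(sections.get("non_terminals", [])).split(),
        "start": (sections.get("start") or [""])[0],
        "rules": rules,
    }


SAMPLE = """
## A cut-down grammar in the same layout
%terminals
int main ( ) { } % ;
%

%non_terminals
<program> <type_specifier>
%

%start
<program>
%

%rules
<type_specifier> : int
<program> : <type_specifier> main ( ) { }
%
"""

grammar = parse_grammar(SAMPLE)
print(grammar["start"], grammar["rules"])
```

Section markers are only recognized outside an open section, so a terminal line that happens to begin with `%` is still read as data.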
    

    intest1.cc

    int main() {
    	int x = 1 + 2 + 3 + 4;
    	if 1;
    }
    
    

    intest2.cc

    int
    
    

    Expected output:

    Some error 💢

    Current Output:

    Segmentation fault (core dumped)
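To make the expected behavior concrete, the sketch below shows a table-driven LL(1) loop with panic-mode recovery that reports errors instead of crashing when the input ends early (as in intest2.cc). It uses a hand-written table for a toy grammar and hypothetical names (`TABLE`, `SYNC`, `parse`); it is an illustration under those assumptions, not JuCC's parser.

```python
from typing import Dict, List, Set, Tuple

END = "$"  # end-of-input marker
NONTERMINALS: Set[str] = {"<program>", "<type_specifier>"}

# Toy LL(1) table (an assumption for this sketch):
#   <program>        : <type_specifier> main ( ) { }
#   <type_specifier> : int | float | void
PROGRAM_BODY = ["<type_specifier>", "main", "(", ")", "{", "}"]
TABLE: Dict[Tuple[str, str], List[str]] = {}
for kw in ("int", "float", "void"):
    TABLE[("<program>", kw)] = PROGRAM_BODY
    TABLE[("<type_specifier>", kw)] = [kw]

# Synchronizing tokens per non-terminal; END is always included so the
# parser never tries to skip past the end of the input.
SYNC: Dict[str, Set[str]] = {
    "<program>": {END},
    "<type_specifier>": {"main", END},
}


def parse(tokens: List[str]) -> List[str]:
    """Return the list of error messages; an empty list means success."""
    errors: List[str] = []
    stack = [END, "<program>"]
    tokens = tokens + [END]
    pos = 0
    while stack:
        top, look = stack[-1], tokens[pos]
        if top == look:                       # terminal (or end marker) matches
            stack.pop()
            if top != END:
                pos += 1
        elif top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is not None:
                stack.pop()
                stack.extend(reversed(body))  # leftmost symbol ends up on top
            elif look in SYNC[top]:
                errors.append(f"missing {top} before '{look}'")
                stack.pop()                   # give up on this non-terminal
            else:
                errors.append(f"unexpected '{look}' while parsing {top}")
                pos += 1                      # panic mode: skip the token
        else:
            errors.append(f"expected '{top}' but found '{look}'")
            stack.pop()                       # act as if the terminal were present
    return errors


print(parse(["int", "main", "(", ")", "{", "}"]))
print(parse(["int"]))
```

The key guard is that every lookahead read uses the appended end marker, so a truncated input produces messages rather than an out-of-bounds access.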
    
    bug needs-immediate-attention 
    opened by noob77777 3
  • Improvement in grammar.

    Summary

    The current grammar in src/grammar/grammar.g has no cout or cin terminals.

    Solution

    The grammar should support cout and cin and include their respective rules.


    bug 
    opened by TheSYNcoder 3
  • Fix: grammar and lexer

    Bug Report

    Make grammar and lexer consistent with problem statement.

    Summary

    Some features are still missing from the grammar and lexer on the main branch.

    • Data types: integer (int), floating point (float), and void.
    • Declaration statements: identifiers are declared in declaration statements as basic data types and may also be assigned constant values (integer or floating point).
    • Condition constructs: if, else, and nested statements are supported. There may be an if statement without an else statement.
    • Assignments to variables are performed using the input/output constructs: cin >> x reads into variable x; cout << x writes variable x to output.
    • Only the arithmetic operators {+, -, *, %} and the assignment operator '=' are supported.
    • The relational operators used in the if statement are < (less than), > (greater than), == (equal), and != (not equal).
    • The only function is main(); there is no other function. The main() function takes no arguments and contains no return statements.

    Expected Behavior

    A rule for the following operator should be present in the grammar:

    • %

    Support for additional tokens:

    • >=
    • <=
    • !=
    • +
    • -
    • *
    • /
    • %

    Solution

    Update grammar.g file and lexer.

    bug 
    opened by noob77777 3
  • Left recursion, Left Factoring, Trie, Unit Tests

    Left Recursion

    Description

    • [x] Left Recursion Done
    • [ ] Indirect left recursion (if required, after going through the grammar, a decision needs to be made).
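Since the title also lists left factoring, here is a minimal sketch of a single left-factoring pass. It groups alternatives by their first symbol and pulls out the longest common prefix of each group; this is a simpler stand-in for a trie-based approach, and the function names and the `_tailN` naming scheme are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List

EPSILON = "EPSILON"  # same epsilon marker the grammar file uses


def common_prefix(bodies: List[List[str]]) -> List[str]:
    """Longest sequence of symbols shared by the start of every body."""
    prefix: List[str] = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor_once(head: str, bodies: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Apply one round of left factoring to the non-empty bodies of `head`."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for body in bodies:
        groups[body[0]].append(body)

    result: Dict[str, List[List[str]]] = {head: []}
    counter = 0
    for group in groups.values():
        if len(group) == 1:
            result[head].append(group[0])     # nothing to factor
            continue
        prefix = common_prefix(group)
        counter += 1
        new_head = f"{head[:-1]}_tail{counter}>"  # hypothetical naming scheme
        result[head].append(prefix + [new_head])
        # What follows the prefix becomes the alternatives of the new non-terminal.
        result[new_head] = [body[len(prefix):] or [EPSILON] for body in group]
    return result


factored = left_factor_once(
    "<selection_statement>",
    [
        ["if", "(", "<expression>", ")", "<compound_statement>"],
        ["if", "(", "<expression>", ")", "<compound_statement>",
         "else", "<compound_statement>"],
    ],
)
for lhs, alternatives in factored.items():
    for body in alternatives:
        print(lhs, ":", " ".join(body))
```

A full implementation would repeat the pass on the newly created non-terminals until no two alternatives share a first symbol.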
    enhancement ready-to-be-squashed 
    opened by bisakhmondal 3
  • Fixed: linking errors

    Fix: Lexer on Main

    Description

    Fixing linking and formatting issues in main branch.

    Remaining tasks

    • [x] Resolve clang-tidy issues
    • [x] Add more tests
    documentation enhancement 
    opened by noob77777 3
  • Duplicate Symbols in Symbol Table

    Bug Report

    Summary

    Symbol Table bug

    Environment

    To address the bug, especially if it is environment-specific, we need to know what kind of configuration you are running on. Please include the following:

    OS: Ubuntu (LTS) 20.04 or macOS 10.14+ (please specify version).

    Compiler: GCC 7.0+ or Clang 8.0+.

    CMake Profile: all

    Steps to Reproduce

    
    int main() {
        int x, y;
        cin >> x >> y;
        if (x != 0) {
            if (y > 0) {
                cout << y;
            } else {
                cout << -y;
            }
        }
        float z = 1 + 2 + 3 + 1000/ 50 * 23.2 * (x * y * 10);
        // cout << x << y << z;
        float z0 = 1 + 2 + 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z1 = 1 + 2 - 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z2 = 1 + 2 / 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z3 = 1 + 2 * 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z4 = 1 + 2 % 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z5 = 1 + 2 > 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z6 = 1 + 2 == 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z7 = 1 + 2 != 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z8 = 1 + 2 >= 3 + 1000/ 50 * 23.2 * (x * y * 10);
        float z9 = 1 + 2 <= 3 + 1000/ 50 * 23.2 * (x * y * 10);
        cout << z0 << z1 << z2 << z3 << z4;
        cout << z5 << z6 << z7 << z8 << z9;
    }
    
    

    Expected Behavior

    ok

    Actual Behavior

    jucc: duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y duplicate symbol: x duplicate symbol: y

    bug 
    opened by noob77777 2
  • [Snyk] Security upgrade ubuntu from 20.04 to rolling

    Keeping your Docker base image up-to-date means you’ll benefit from security fixes in the latest version of your chosen image.

    Changes included in this PR

    • Dockerfile

    We recommend upgrading to ubuntu:rolling, as this image has only 19 known vulnerabilities. To do this, merge this pull request, then verify your application still works as expected.

    Some of the most important vulnerabilities in your base image include:

    | Severity | Priority Score / 1000 | Issue | Exploit Maturity |
    | :------: | :-------------------- | :---- | :--------------- |
    | medium severity | 586 | CVE-2021-3996 SNYK-UBUNTU2004-UTILLINUX-2387723 | No Known Exploit |
    | medium severity | 586 | CVE-2021-3996 SNYK-UBUNTU2004-UTILLINUX-2387723 | No Known Exploit |
    | medium severity | 586 | CVE-2021-3995 SNYK-UBUNTU2004-UTILLINUX-2387728 | No Known Exploit |
    | medium severity | 586 | CVE-2021-3995 SNYK-UBUNTU2004-UTILLINUX-2387728 | No Known Exploit |
    | medium severity | 586 | CVE-2021-3995 SNYK-UBUNTU2004-UTILLINUX-2387728 | No Known Exploit |


    Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.

    For more information: 🧐 View latest project report

    🛠 Adjust project settings

    opened by snyk-bot 0
  • Add CI for macOS

    Currently, the CI runs only on Ubuntu 20.04. However, JuCC has been tested on macOS with the required dependencies.

    Maybe it's time to add a macOS CI and put the project on a temporary hold 🍺. Only bug fixes, no new features for some time.

    enhancement 
    opened by bisakhmondal 0
  • Improve Symbol Table

    Improvement of Symbol Table

    Summary

    The current implementation of the symbol table deletes lexemes at scope end, so the table cannot be used for lookup after the lexer phase.

    Solution

    A better implementation would introduce visibility flags that are set to false at scope end, rather than deleting the entries.
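The visibility-flag idea can be sketched as follows. This is a minimal illustration, assuming a flat list of entries tagged with a nesting depth; the `Symbol` and `SymbolTable` names and their methods are hypothetical and do not mirror the project's classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Symbol:
    name: str
    type_name: str
    depth: int            # nesting level at which the symbol was declared
    visible: bool = True  # cleared at scope end instead of deleting the entry


class SymbolTable:
    def __init__(self) -> None:
        self.entries: List[Symbol] = []
        self.depth = 0

    def enter_scope(self) -> None:
        self.depth += 1

    def exit_scope(self) -> None:
        # Hide, rather than delete, everything declared in the closing scope.
        for sym in self.entries:
            if sym.visible and sym.depth == self.depth:
                sym.visible = False
        self.depth -= 1

    def declare(self, name: str, type_name: str) -> bool:
        """Add a symbol; return False if it already exists in the current scope."""
        for sym in self.entries:
            if sym.visible and sym.depth == self.depth and sym.name == name:
                return False
        self.entries.append(Symbol(name, type_name, self.depth))
        return True

    def lookup(self, name: str) -> Optional[Symbol]:
        """Innermost visible declaration of `name`, or None."""
        for sym in reversed(self.entries):
            if sym.visible and sym.name == name:
                return sym
        return None


table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")      # shadows the outer x
inner = table.lookup("x")
table.exit_scope()
outer = table.lookup("x")
print(inner.type_name, outer.type_name, len(table.entries))
```

Because hidden entries stay in `entries`, later phases can still inspect every symbol that was ever declared.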

    enhancement 
    opened by TheSYNcoder 1
  • Improvement of lexer to report detailed errors.

    Improve lexer

    Summary

    The lexer should report the errors it encounters, together with the line numbers of the input file where they occur.

    Solution

    A properly structured error consisting of the line number, the column number (if possible), and the associated error message.
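One possible shape for such a structured error is sketched below: a small record holding line, column, and message, filled in by a toy tokenizer. The token pattern covers only the operators and literals mentioned in these issues, and `LexError`, `TOKEN_RE`, and `tokenize` are hypothetical names, not part of the existing lexer.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LexError:
    line: int      # 1-based line number in the input file
    column: int    # 1-based column of the offending character
    message: str


# Longer operators come first so that ">=" is not split into ">" and "=".
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+\.\d+|\d+|<<|>>|<=|>=|==|!=|[-+*/%<>=(){};,]"
)


def tokenize(source: str) -> Tuple[List[str], List[LexError]]:
    """Return the recognized tokens plus a structured error per bad character."""
    tokens: List[str] = []
    errors: List[LexError] = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = 0
        while col < len(line):
            if line[col].isspace():
                col += 1
                continue
            match = TOKEN_RE.match(line, col)
            if match:
                tokens.append(match.group())
                col = match.end()
            else:
                errors.append(
                    LexError(line_no, col + 1, f"unexpected character {line[col]!r}")
                )
                col += 1   # skip the character and keep scanning
    return tokens, errors


tokens, errors = tokenize("int x = 1;\nx = y @ 2;")
for err in errors:
    print(f"{err.line}:{err.column}: {err.message}")
```

Continuing after a bad character lets a single run report every lexical error rather than only the first.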

    enhancement 
    opened by TheSYNcoder 0
Owner
Shuvayan Ghosh Dastidar
Life is all about coffee and hence earning to afford some more.
A Small C Compiler

8cc C Compiler Note: 8cc is no longer an active project. The successor is chibicc. 8cc is a compiler for the C programming language. It's intended to

Rui Ueyama 5.8k Jan 8, 2023
A C compiler for LLVM. Supports C++11/14/1z C11

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies

LLVM 17.5k Jan 8, 2023
GNU Prolog is a native Prolog compiler

GNU Prolog is a native Prolog compiler with constraint solving over finite domains (FD)

Daniel Diaz 64 Dec 13, 2022
Take your first step in writing a compiler.

first-step Take your first step in writing a compiler. Building from Source Before building first-step, please make sure you have installed the follow

PKU Compiler Course 28 Aug 20, 2022
NCC is an ANSI/ISO-compliant optimizing C compiler.

The compiler is retargetable by design, but, at present, it only produces binaries for Linux/x86_64. As the compiler ABI differs somewhat from the System V ABI used by Linux, its code cannot be linked against Linux system libraries. It does, however, provide its own (incomplete) standard ANSI/Posix C library.

Charles E. Youse 0 Apr 1, 2022
nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines.

nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines. Tiny? The only types are: int (32-bit signed integer) char (8-

Ajay Tatachar 19 Nov 28, 2022
Smaller C is a simple and small single-pass C compiler

Smaller C is a simple and small single-pass C compiler, currently supporting most of the C language common between C89/ANSI C and C99 (minus some C89 and plus some C99 features).

Alexey Frunze 1.2k Jan 7, 2023
Microvm is a virtual machine and compiler

The aim of this project is to create a stack-based language and virtual machine for microcontrollers. A mix of approaches is used. Separate memory is used for program and variable space (Harvard architecture). An interpreter, virtual machine and compiler are available. A demonstration of the interpreter in action is presented below.

null 10 Aug 14, 2022
yadcc - Yet Another Distributed C++ Compiler

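Yet Another Distributed C++ Compiler. yadcc is a distributed compilation system developed in-house by Tencent Ads to support its day-to-day development and pipelines. Compared with existing solutions of the same kind, we have optimized performance, reliability, and ease of use for real industrial production environments.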

Tencent 276 Dec 29, 2022
Aheui JIT compiler for PC and web

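Aheui JIT overview: Aheui JIT is a JIT (Just in Time) compiler for the Aheui language. It implements the JIT from scratch, with no dependency on external libraries other than an assembler and utility libraries. Supported environments: 64-bit Windows, Mac, Linux (x86 architecture), web assem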

Sunho Kim 28 Sep 23, 2022
C implementation of the Tiny BASIC compiler found in an article by Dr. Austin Henley

Teeny Tiny Basic A C implementation of the Tiny BASIC compiler found in this article and this github repo by Dr. Austin Henley. I did pretty well in A

Gavin Morris 7 Oct 4, 2022
bcc is an interactive compiler of a language called b.

bcc is an interactive compiler of a language called b.

kparc 18 Nov 7, 2022
This is a compiler written from scratch in C

C Compiler This is a compiler written from scratch in C, with fully supporting C18 as a goal. It can currently compile itself, and most simple program

null 29 Jan 6, 2023
mrcceppc is a reimplementation project for the Metrowerks mwcceppc compiler.

Compiler | mrcceppc mrcceppc is a reimplementation project for the Metrowerks mwcceppc compiler. Compiling Run generate_{version}.bat for which versio

null 9 Nov 21, 2022
A C header that allows users to compile brainfuck programs within a C compiler.

brainfuck.h A C header that allows users to compile brainfuck programs within a C compiler. You can insert the header into the top of your brainfuck so

null 1 Dec 30, 2022
Gilbraltar is a version of the OCaml compiler to be able to build a MirageOS for RaspberryPi 4.

Gilbraltar is a version of the OCaml compiler to be able to build a MirageOS for RaspberryPi 4. It's a work in progress repository to provide a dune's toolchain (as ocaml-freestanding) specialized for Raspberry Pi 4.

Calascibetta Romain 49 Oct 30, 2022
NaiveCC: a compiler frontend for a subset of C

NaiveCC: a compiler frontend for a subset of C This is a toy compiler frontend for a subset of the C programming language based on the LR(1) parsing t

Yuxiang Wei 1 Nov 15, 2021
A LLVM and Clang compiler toolchain built for kernel development

Cosmic-Clang Toolchain This is a LLVM and Clang compiler toolchain built for kernel development. Builds are always made from the latest LLVM sources r

Ǥђ๏ຮ₮⌁Ⲙครtє࿐ 0 Apr 12, 2022
A compiler written in C++ to convert .hoi code to hoi4 code

What is HC4? HC4 is a compiler that converts .hoic filenames to Hearts of Iron IV's .txt. Usage Use hc4 in the terminal (./hc4 if on Unix) and it will

SaCode 1 Jul 31, 2022