The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 0.9, working as a coprocessor to CORE-V's CVA6 core

Overview

Ara

Ara is a vector unit working as a coprocessor for the CVA6 core. It supports the RISC-V Vector Extension, version 0.9.

Dependencies

Check DEPENDENCIES.md for a list of hardware and software dependencies of Ara.

Supported instructions

Check FUNCTIONALITIES.md to see which instructions are currently supported by Ara.

Get started

Make sure you clone this repository recursively to get all the necessary submodules:

git submodule update --init --recursive

If the repository path of any submodule changes, run the following command to change your submodule's pointer to the remote repository:

git submodule sync --recursive
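
To confirm that every submodule was actually checked out, the output of `git submodule status --recursive` can be inspected: git prefixes a submodule that has not been initialized with `-`. The sketch below counts such entries. The helper name `count_uninitialized` and the sample status lines are illustrative assumptions, not part of the Ara repository.

```shell
#!/bin/sh
# Count submodules that are not yet initialized.
# Expects the output of `git submodule status --recursive` on stdin;
# git marks an uninitialized submodule with a leading '-'.
# count_uninitialized is a hypothetical helper, not part of Ara.
count_uninitialized() {
  # grep -c prints the number of matching lines (and exits 1 on zero matches).
  grep -c '^-' || true
}

# Self-contained demo with made-up status lines (hashes and paths are placeholders).
printf '%s\n' \
  '-1111111111111111111111111111111111111111 deps/example_a' \
  ' 2222222222222222222222222222222222222222 deps/example_b (v1.0)' \
  | count_uninitialized
```

In a real checkout the same helper would be fed from git directly, for example `git submodule status --recursive | count_uninitialized`; a result other than 0 suggests rerunning the update command above.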

Toolchain

Ara requires a RISC-V GCC toolchain capable of understanding the vector extension, version 0.9.x.

To build this toolchain, run the following command in the project's root directory.

# Build the GCC toolchain
make toolchain

Verilator

Ara requires an updated version of Verilator for RTL simulations.

To build it, run the following command in the project's root directory.

# Build Verilator
make verilator
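
After the build, it can be useful to check that a Verilator binary is really present before moving on to the simulation steps. The sketch below assumes the binary lands in `install/verilator/bin/verilator` under the project root, a path taken from the build logs quoted in the issues further down rather than from the Makefile itself. `verilator_built` and `ARA_ROOT` are hypothetical names.

```shell
#!/bin/sh
# Check whether a Verilator binary exists under an Ara checkout.
# ASSUMPTION: the relative path install/verilator/bin/verilator is inferred
# from build logs; adjust it if your checkout differs.
# verilator_built and ARA_ROOT are hypothetical names, not part of Ara.
verilator_built() {
  [ -x "$1/install/verilator/bin/verilator" ]
}

if verilator_built "${ARA_ROOT:-.}"; then
  echo "Verilator binary found"
else
  echo "Verilator binary missing: run 'make verilator' in the project root"
fi
```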

Configuration

Ara's parameters are centralized in the config folder, in the config.mk file. Please check config/README.md for more details.

Software

Build Applications

The apps folder contains example applications that work on Ara. Run the following commands to build an application, e.g., hello_world:

cd apps
make bin/hello_world

RISC-V Tests

The apps folder also contains the RISC-V tests repository, including a few unit tests for the vector instructions. Run the following command to build the unit tests:

cd apps
make riscv_tests

RTL Simulation

To simulate the Ara system with ModelSim, go to the hardware folder, which contains all the SystemVerilog files. Use the following command to run your simulation:

# Go to the hardware folder
cd hardware
# Apply the patches (only need to run this once)
make apply-patches
# Only compile the hardware without running the simulation.
make build
# Run the simulation with the *hello_world* binary loaded
app=hello_world make sim
# Run the simulation with the *some_binary* binary. This allows specifying the full path to the binary
preload=/some_path/some_binary make sim
# Run the simulation without starting the gui
app=hello_world make simc

It is also possible to simulate the unit tests compiled in the apps folder. Given the number of unit tests, we use Verilator. Use the following command to install Verilator, verilate the design, and run the simulation:

# Go to the hardware folder
cd hardware
# Apply the patches (only need to run this once)
make apply-patches
# Verilate the design
make verilate
# Run the tests
make riscv_tests_simv

Alternatively, you can also use the riscv_tests target at Ara's top-level Makefile to both compile the RISC-V tests and run their simulation.
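
A batch of unit-test simulations can take a long time, and the Verilator testbench keeps running until the program ends or it is interrupted. One hedged way to keep an unattended run from blocking forever is to wrap it in the coreutils `timeout` command, which returns exit status 124 when the limit is reached. `run_bounded` is a hypothetical helper and the time limits are arbitrary.

```shell
#!/bin/sh
# Run a command with a wall-clock limit using coreutils `timeout`.
# Prints a short message on timeout and returns the command's exit status
# (124 is the status `timeout` uses when the limit is hit).
# run_bounded is a hypothetical helper name, not part of Ara.
run_bounded() {
  limit=$1; shift
  status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*"
  fi
  return "$status"
}

# Demo with a stand-in command; in the hardware folder one might instead run:
#   run_bounded 3600 make riscv_tests_simv
run_bounded 5 true && echo "finished within the limit"
```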

Publication

If you want to use Ara, you can cite us:

@Article{Ara2020,
  author = {Matheus Cavalcante and Fabian Schuiki and Florian Zaruba and Michael Schaffner and Luca Benini},
  journal= {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
  title  = {Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI},
  year   = {2020},
  volume = {28},
  number = {2},
  pages  = {530-543},
  doi    = {10.1109/TVLSI.2019.2950087}
}
Issues
  • Kernels update

    Merge https://github.com/pulp-platform/ara/pull/81 and https://github.com/pulp-platform/ara/pull/101 before this one.

    Add baseline Jacobi2d, Dropout, Convolution3D benchmark

    The convolution is now defined by its data type and its dimensions. fconv3d, for example, processes double-precision floating-point data, using 3D filters with depth ch (channels): (i*i*ch) ∗ (f*f*ch) = (o*o)
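
As a worked example of the shape notation above, the sketch below computes the output edge length and the number of multiply-accumulate operations for one output plane. It assumes an unpadded, stride-1 convolution, so that o = i - f + 1; that relation is an assumption and is not stated in this description. The function names and the sample sizes are illustrative only.

```shell
#!/bin/sh
# Output edge length of an unpadded, stride-1 convolution: o = i - f + 1.
# ASSUMPTION: the relation between i, f and o is not given in the text.
conv_out_dim() {
  echo $(( $1 - $2 + 1 ))
}

# Multiply-accumulate count for one o*o output plane with f*f*ch filter taps.
# Arguments: input edge i, filter edge f, channel count ch.
conv_macs() {
  o=$(conv_out_dim "$1" "$2")
  echo $(( o * o * $2 * $2 * $3 ))
}

# Example: 70x70 input, 7x7 filter, 3 channels (values chosen for illustration).
echo "o = $(conv_out_dim 70 7)"
echo "MACs = $(conv_macs 70 7 3)"
```

With these sample values the output plane is 64x64 and each output element needs 7*7*3 = 147 multiply-accumulates.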

    Even if fconv3d is parameterized on the number of channels and can also be used with ch = 1 becoming a fconv2d, the code for fconv2d is kept since it is more optimized for that particular case.

    fconv3d: F = {7}, optimized with an enhanced algorithm

    fconv2d: F = {3, 7}. F == 3 is optimized, F == 7 is optimized with an enhanced algorithm

    iconv2d: F = {3, 5, 7}. F == 3 is optimized, F == 7 is optimized with an enhanced algorithm. F == 5 is not optimized

    We will support and optimize the other filter sizes in the future.

    The roofline plots produced for the convolutions are produced with the following parameters: iconv2d = F = 3 fconv2d = F = 3 fconv3d = F = 7

    Changelog

    Fixed

    • Generate data.S files before compiling the programs
    • Clean intermediate app object files with make clean
    • Add a fence before stopping the cycle counter, to let the last vector store complete

    Added

    • Add fconv3d kernel, optimized for 7x7 filters
    • Optimize fconv2d and iconv2d kernels for 3x3 filters
    • Add convolutions to the benchmark app, and print the related roofline plots
    • Add corner case test to vslidedown instruction

    Changed

    • Update README with instructions on how to compile convolutions
    • Refactor benchmark app
    • Double the testbench memory size
    • Update the python-requirements list

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by mp-17 8
  • "make verilator" fails when "CC=$(CLANG_CC) CXX=$(CLANG_CXX) CXXFLAGS=$(CLANG_CXXFLAGS) LDFLAGS=$(CLANG_LDFLAGS) \" is configured

    Failure information:

    In file included from ../V3Combine.cpp:27:
    ../V3DupFinder.h:50:5: error: constructor for 'V3DupFinder' must explicitly initialize the const member 'm_hasher'
        V3DupFinder(){};
        ^
    ../V3DupFinder.h:46:20: note: 'm_hasher' declared here
        const V3Hasher m_hasher;
                       ^
    1 error generated.
    ../Makefile_obj:297: recipe for target 'V3Combine.o' failed
    make[3]: *** [V3Combine.o] Error 1

    But if I delete "CC=$(CLANG_CC) CXX=$(CLANG_CXX) CXXFLAGS=$(CLANG_CXXFLAGS) LDFLAGS=$(CLANG_LDFLAGS)", every version of Verilator compiles successfully!

    Best Wishes!

    opened by dongdeji 8
  • Make bin/hello_world failed (library not found)

    Hi, the libgloss library cannot be found when compiling hello_world. When I add the toolchain install directory to the path, the same problem still exists.

    [email protected]:/share/zhuxuanlong/Vector_Work/ara/apps# make bin/hello_world
    chmod +x /share/zhuxuanlong/Vector_Work/ara/apps/common/script/align_sections.sh
    rm -f /share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld && cp /share/zhuxuanlong/Vector_Work/ara/apps/common/arch.link.ld /share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld
    /share/zhuxuanlong/Vector_Work/ara/apps/common/script/align_sections.sh 4 /share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c hello_world/main.c -o hello_world/main.c.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/crt0.S -o common/crt0-llvm.S.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/printf.c -o common/printf-llvm.c.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/string.c -o common/string-llvm.c.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/serial.c -o common/serial-llvm.c.o
    mkdir -p bin/
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -Iinclude -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -o bin/hello_world hello_world/main.c.o common/crt0-llvm.S.o common/printf-llvm.c.o common/string-llvm.c.o common/serial-llvm.c.o -static -nostartfiles -lm -T/share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld
    ld.lld: error: unable to find library -lgloss
    clang-13: error: ld command failed with exit code 1 (use -v to see invocation)
    make: *** [Makefile:59: bin/hello_world] Error 1
    rm hello_world/main.c.o common/string-llvm.c.o common/crt0-llvm.S.o common/printf-llvm.c.o common/serial-llvm.c.o

    Thank you.

    opened by zhuxuanlong 7
  • Stuck at the compile flow `make riscv_tests_simv`

    Hi @mp-17 @suehtamacv, when I run make riscv_tests_simv as described in the README file, my terminal gets stuck with no new output for a long while, on the order of a few hours.

    (base) ➜ hardware git:(main) ✗ make riscv_tests_simv
    build/verilator/Vara_tb_verilator -l ram,/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd,elf &> build/rv64uv-ara-vadd.trace

    I checked the contents of the build/rv64uv-ara-vadd.trace file several times. They are listed below and also stay the same for a long while.

    Program header number 0 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' low is 80000000
    Program header number 0 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' high is 80004179
    Program header number 1 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' high is 80004877
    Program header number 2 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' high is 80004b17
    Program header number 3 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' is not of type PT_LOAD; ignoring.
    Set `ram TOP.ara_tb_verilator.dut.i_ara_soc.i_dram 10 0x80000000 0x80000 write with offset: 0x0 write with size: 0x4b18
    Simulation of Ara
    =================
    
    
    Simulation running, end by pressing CTRL-c.
    
    

    Note that my QuestaSim version is Mentor Graphics QuestaSim 10.6c instead of Mentor Graphics QuestaSim 2020.1. I merely created a soft link that fakes version 2020.1, with no modification to hardware/Makefile.

    Is this behavior normal? If so, could you please tell me approximately how long this process takes? If not, could you please help me check whether something is wrong with my environment?

    Thanks in advance!!!

    opened by fantasysee 7
  • Verilator Simulation Error

    When I run this command:

    ~/ara/hardware$make apply-patches 
    

    I face this error:

    Makefile:62: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/hpc-user/xilinx/Vivado/2016.2/bin:/home/hpc-user/intelFPGA_pro/21.2/modelsim_ase/bin:/usr/bin/sbt:/home/hpc-user/riscv-gnu-toolchain/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    cd deps/tech_cells_generic && git apply ../../patches/0001-tech-cells-generic-sram.patch
    error: patch failed: src/rtl/tc_sram.sv:124
    error: src/rtl/tc_sram.sv: patch does not apply
    make: *** [Makefile:101: apply-patches] Error 1
    

    I ignored this error, and I ran the next one:

    ~/ara/hardware$make verilate
    

    Again, I face this error:

    Makefile:62: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/hpc-user/xilinx/Vivado/2016.2/bin:/home/hpc-user/intelFPGA_pro/21.2/modelsim_ase/bin:/usr/bin/sbt:/home/hpc-user/riscv-gnu-toolchain/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    mkdir -p build
    rm -rf build/verilator; mkdir -p build/verilator
    ./bender script verilator -t rtl -t ara_test -t cva6_test -t verilator --define NR_LANES=4 --define VLEN=4096 --define RVV_ARIANE=1 > build/verilator/bender_script_default
    bash: ./bender: No such file or directory
    make: *** [Makefile:145: build/verilator/Vara_tb_verilator] Error 127
    make: *** Waiting for unfinished jobs....
    Successfully installed bender 0.21.0 in '/home/hpc-user/ara/hardware'.
    bender 0.21.0 available.
    

    The second time I run this command, I get:

    Makefile:62: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/hpc-user/xilinx/Vivado/2016.2/bin:/home/hpc-user/intelFPGA_pro/21.2/modelsim_ase/bin:/usr/bin/sbt:/home/hpc-user/riscv-gnu-toolchain/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    rm -rf build/verilator; mkdir -p build/verilator
    ./bender script verilator -t rtl -t ara_test -t cva6_test -t verilator --define NR_LANES=4 --define VLEN=4096 --define RVV_ARIANE=1 > build/verilator/bender_script_default
    /home/hpc-user/ara/install/verilator/bin/verilator -f build/verilator/bender_script_default           \
      -GNrLanes=4                                                         \
      -O3                                                                           \
      -Wno-BLKANDNBLK                                                               \
      -Wno-CASEINCOMPLETE                                                           \
      -Wno-CMPCONST                                                                 \
      -Wno-LATCH                                                                    \
      -Wno-LITENDIAN                                                                \
      -Wno-UNOPTFLAT                                                                \
      -Wno-UNPACKED                                                                 \
      -Wno-UNSIGNED                                                                 \
      -Wno-WIDTH                                                                    \
      -Wno-WIDTHCONCAT                                                              \
      --hierarchical                                                                \
      tb/verilator/waiver.vlt                                                       \
      --Mdir build/verilator                                                       \
      -Itb/dpi                                                                      \
      --compiler clang                                                              \
      -CFLAGS "-DTOPLEVEL_NAME=ara_tb_verilator"                                        \
      -CFLAGS "-DNR_LANES=4"                                              \
      -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp       \
      -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp \
      -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp \
      ""                                                             \
      -LDFLAGS "-lelf"                                                              \
      ""                                                              \
      --exe                                                                         \
      /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/*.cc            \
      /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/*.cc      \
      /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/*.cc      \
      /home/hpc-user/ara/hardware/tb/verilator/ara_tb.cpp                                           \
      --cc                                                                          \
      --top-module ara_tb_verilator &&                                                  \
    cd build/verilator && OBJCACHE='' make -j4 -f Vara_tb_verilator.mk
    %Error: Verilator internal fault, sorry. Suggest trying --debug --gdbbt
    %Error: Command Failed /home/hpc-user/ara/install/verilator/bin/verilator_bin -f build/verilator/bender_script_default -GNrLanes=4 -O3 -Wno-BLKANDNBLK -Wno-CASEINCOMPLETE -Wno-CMPCONST -Wno-LATCH -Wno-LITENDIAN -Wno-UNOPTFLAT -Wno-UNPACKED -Wno-UNSIGNED -Wno-WIDTH -Wno-WIDTHCONCAT --hierarchical tb/verilator/waiver.vlt --Mdir build/verilator -Itb/dpi --compiler clang -CFLAGS -DTOPLEVEL_NAME=ara_tb_verilator -CFLAGS -DNR_LANES=4 -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp  -LDFLAGS -lelf  --exe /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/dpi_memutil.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/sv_scoped.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/verilator_memutil.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/verilated_toplevel.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/verilator_sim_ctrl.cc /home/hpc-user/ara/hardware/tb/verilator/ara_tb.cpp --cc --top-module ara_tb_verilator
    make: *** [Makefile:146: build/verilator/Vara_tb_verilator] Error 255
    

    The version of the Verilator is 4.210.

    opened by mohammadhosein1997 7
  • RTL simulation fails with Verilator

    I am trying to run the Makefile target make verilate, but I get the following errors with the default settings.

    $ make verilate 
    Makefile:43: "Specified QuestaSim version (questa-2020.1) not found in PATH 
    rm -rf build/verilator; mkdir -p build/verilator
    ./bender script verilator -t rtl -t ara_test -t cva6_test -t verilator --define NR_LANES=4 --define VLEN=4096 --define RVV_ARIANE=1 > build/verilator/bender_script
    /home/ara/install/verilator/bin/verilator -f build/verilator/bender_script                     \
      -GNrLanes=4                                                         \
      -O3                                                                           \
      -Wno-BLKANDNBLK                                                               \
      -Wno-CASEINCOMPLETE                                                           \
      -Wno-CMPCONST                                                                 \
      -Wno-LITENDIAN                                                                \
      -Wno-MODDUP                                                                   \
      -Wno-PINMISSING                                                               \
      -Wno-SYMRSVDWORD                                                              \
      -Wno-UNOPTFLAT                                                                \
      -Wno-UNPACKED                                                                 \
      -Wno-UNSIGNED                                                                 \
      -Wno-WIDTH                                                                    \
      -Wno-WIDTHCONCAT                                                              \
      --Mdir build/verilator --trace                                               \
      -Itb/dpi                                                                      \
      -CFLAGS "-std=c++11 -Wall -DTOPLEVEL_NAME=ara_tb_verilator"                       \
      -CFLAGS "-DNR_LANES=4"                                              \
      -CFLAGS -I/home/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp       \
      -CFLAGS -I/home/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp \
      -CFLAGS -I/home/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp \
      -LDFLAGS "-lelf"                                                              \
      --exe                                                                         \
      /home/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/*.cc            \
      /home/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/*.cc      \
      /home/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/*.cc      \
      /home/ara/hardware/tb/verilator/ara_tb.cpp                                           \
      --cc                                                                          \
      --top-module ara_tb_verilator &&                                                  \
    cd build/verilator && OBJCACHE='' make -j4 -f Vara_tb_verilator.mk
    %Error: /home/ara/hardware/deps/tech_cells_generic/src/rtl/tc_sram.sv:93:38: Unsupported or unknown PLI call: $urandom
       93 |           "random": init_val[i][j] = $urandom();
          |                                      ^~~~~~~~
    %Error: /home/ara/hardware/tb/ara_tb.sv:30:28: syntax error, unexpected TIME NUMBER, expecting TYPE-IDENTIFIER
       30 |   localparam ClockPeriod = 1ns;
          |                            ^~~
    %Warning-STMTDLY: /home/ara/hardware/tb/ara_tb.sv:48:10: Unsupported: Ignoring delay on this delayed statement.
       48 |   always #(ClockPeriod/2) clk = !clk;
          |          ^
                      ... Use "/* verilator lint_off STMTDLY */" and lint_on around source to disable this message.
    %Warning-STMTDLY: /home/ara/hardware/tb/ara_tb.sv:56:7: Unsupported: Ignoring delay on this delayed statement.
       56 |       #(ClockPeriod);
          |       ^
    %Warning-STMTDLY: /home/ara/hardware/tb/ara_tb.sv:98:7: Unsupported: Ignoring delay on this delayed statement.
       98 |       #ClockPeriod;
          |       ^
    %Error: Exiting due to 2 error(s), 3 warning(s)
            ... See the manual and https://verilator.org for more assistance.
    Makefile:123: recipe for target 'build/verilator/Vara_tb_verilator' failed
    make: *** [build/verilator/Vara_tb_verilator] Error 1
    

    What could be the possible cause for that?

    question 
    opened by mahmoodulhassan-lm 7
  • Error when compiling hello_world

    I'm trying to run make bin/hello_world, but I get the following errors:

    make bin/hello_world
    /home/workspace/pulp/ara_lenovo_unix/install/riscv-gcc/bin/riscv64-unknown-elf-gcc -mcmodel=medany -march=rv64gcv -mabi=lp64 -I/home/workspace/pulp/ara_lenovo_unix/apps/common -static -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -c hello_world/main.c -o hello_world/main.c.o
    /home/workspace/pulp/ara_lenovo_unix/install/riscv-gcc/bin/riscv64-unknown-elf-gcc -mcmodel=medany -march=rv64gcv -mabi=lp64 -I/home/workspace/pulp/ara_lenovo_unix/apps/common -static -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -c common/crt0.S -o common/crt0.S.o
    common/encoding.h: Assembler messages:
    common/encoding.h:1: Error: unknown pseudo-op: `..'
    common/crt0.S:80: Error: illegal operands `li t0,PMP_NAPOT|PMP_R|PMP_W|PMP_X'
    common/crt0.S:92: Error: illegal operands `li t0,(1<<CAUSE_LOAD_PAGE_FAULT)|(1<<CAUSE_STORE_PAGE_FAULT)|(1<<CAUSE_FETCH_PAGE_FAULT)|(1<<CAUSE_MISALIGNED_FETCH)|(1<<CAUSE_USER_ECALL)|(1<<CAUSE_BREAKPOINT)'
    common/crt0.S:102: Error: illegal operands `li t0,(MSTATUS_FS&(MSTATUS_FS>>1))'
    common/crt0.S:106: Error: illegal operands `li t0,(MSTATUS_VS&(MSTATUS_VS>>1))'
    /home/workspace/pulp/ara_lenovo_unix/apps/common/runtime.mk:66: recipe for target 'common/crt0.S.o' failed
    make: *** [common/crt0.S.o] Error 1
    rm hello_world/main.c.o

    What could be the reason for this?

    opened by strongwind312 6
  • ara core hangs with 2 lanes and 4 lanes configuration

    We saw that the same program (binary) runs fine on the 8-lane and 16-lane configurations but hangs on the 2-lane and 4-lane configurations. In the waveform, the PC stops advancing when the hang occurs. The hang is not triggered by any particular vector instruction; it seems to be caused by a mix of scalar and vector instructions. We tested on the latest commits and saw the same result.

    Just wondering whether this is a known issue, and whether you need a minimal sequence that reproduces it?

    bug 
    opened by yanghao 5
  • Hotfixes

    Priority PR - No dependencies on other PRs

    Description of the fixes in the commits.

    Changelog

    Fixed

    • AXI transactions on an opposite channel w.r.t. the channel currently in use are started only after the completion of the previous transactions.
    • Fix the number of elements to be requested for a vslidedown instruction.

    Changed

    • Cut a timing-critical path from Addrgen to Sequencer (1 cycle more to start an AXI transaction)
    • Cut a timing-critical path in the VSTU, relative to the calculation of the pointer to the VRF word received from the lanes

    Checklist

    • [ ] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed
    opened by mp-17 5
  • make toolchain-gcc failed

    I followed the guidelines in the README. When I reached the step make riscv_tests, it seemed that I needed to run make toolchain-gcc first. [screenshot: Feishu 20211113-122207]

    When I ran make toolchain-gcc, another problem came up. [screenshot: Feishu 20211113-130512]

    The log shows several similar errors like this: "cannot stat 'xxx.gmo': No such file or directory". [screenshot: Feishu 20211113-131020]

    How should I solve this problem?

    opened by kaitoukito 4
  • run RTL simulation with make verilate report no such file or directory

    Hi, I am trying to run the Makefile, but there were some errors:

    ~/riscv/ara/hardware$ make verilate
    Makefile:43: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/wu/riscv/gcc/riscv-unknown-elf-gcc/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    rm -rf build/verilator; mkdir -p build/verilator
    ./bender script verilator -t rtl -t ara_test -t cva6_test -t verilator --define NR_LANES=4 --define VLEN=4096 --define RVV_ARIANE=1 > build/verilator/bender_script
    /home/wu/riscv/ara/install/verilator/bin/verilator -f build/verilator/bender_script
    -GNrLanes=4
    -O3
    -Wno-BLKANDNBLK
    -Wno-CASEINCOMPLETE
    -Wno-CMPCONST
    -Wno-LITENDIAN
    -Wno-MODDUP
    -Wno-PINMISSING
    -Wno-SYMRSVDWORD
    -Wno-UNOPTFLAT
    -Wno-UNPACKED
    -Wno-UNSIGNED
    -Wno-WIDTH
    -Wno-WIDTHCONCAT
    --Mdir build/verilator --trace
    -Itb/dpi
    -CFLAGS "-std=c++11 -Wall -DTOPLEVEL_NAME=ara_tb_verilator"
    -CFLAGS "-DNR_LANES=4"
    -CFLAGS -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp
    -CFLAGS -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp
    -CFLAGS -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp
    -LDFLAGS "-lelf"
    --exe
    /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/*.cc
    /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/*.cc
    /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/*.cc
    /home/wu/riscv/ara/hardware/tb/verilator/ara_tb.cpp
    --cc
    --top-module ara_tb_verilator &&
    cd build/verilator && OBJCACHE='' make -j4 -f Vara_tb_verilator.mk
    make[1]: Entering directory '/home/wu/riscv/ara/hardware/build/verilator'
    g++ -I. -MMD -I/home/wu/riscv/ara/install/verilator/share/verilator/include -I/home/wu/riscv/ara/install/verilator/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -std=c++11 -Wall -DTOPLEVEL_NAME=ara_tb_verilator -DNR_LANES=4 -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp -std=gnu++14 -Os -c -o ara_tb.o /home/wu/riscv/ara/hardware/tb/verilator/ara_tb.cpp
    g++ -I. -MMD -I/home/wu/riscv/ara/install/verilator/share/verilator/include -I/home/wu/riscv/ara/install/verilator/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -std=c++11 -Wall -DTOPLEVEL_NAME=ara_tb_verilator -DNR_LANES=4 -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp -std=gnu++14 -Os -c -o dpi_memutil.o /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/dpi_memutil.cc
    g++ -I. -MMD -I/home/wu/riscv/ara/install/verilator/share/verilator/include -I/home/wu/riscv/ara/install/verilator/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -std=c++11 -Wall -DTOPLEVEL_NAME=ara_tb_verilator -DNR_LANES=4 -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp -std=gnu++14 -Os -c -o sv_scoped.o /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/sv_scoped.cc
    g++ -I. -MMD -I/home/wu/riscv/ara/install/verilator/share/verilator/include -I/home/wu/riscv/ara/install/verilator/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -std=c++11 -Wall -DTOPLEVEL_NAME=ara_tb_verilator -DNR_LANES=4 -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp -I/home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp -std=gnu++14 -Os -c -o verilator_memutil.o /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/verilator_memutil.cc
    /home/wu/riscv/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/dpi_memutil.cc:11:10: fatal error: libelf.h: No such file or directory
       11 | #include <libelf.h>
          |          ^~~~~~~~~~
    compilation terminated.
    make[1]: *** [Vara_tb_verilator.mk:75: dpi_memutil.o] Error 1
    make[1]: *** Waiting for unfinished jobs....
    make[1]: Leaving directory '/home/wu/riscv/ara/hardware/build/verilator'
    make: *** [Makefile:127: build/verilator/Vara_tb_verilator] Error 2

    thanks

    opened by Fog-cake 4
  • [hardware] Floating-Point Reductions

    WIP PR

    opened by mp-17 0
  • [hardware] :bug: Fix reductions + Rework the VALU

    The previous mechanism to handle the commit during a reduction was confusing and led to bugs. Now, the reduction triggers its commit only after the inter-lane phase is over. Also, some recurring lines of code have been grouped into macros.

    opened by mp-17 0
  • Check whether we can access vs1 and vs2 in VMADC/VMSBC

    In ara_dispatcher, when a VMADC/VMSBC instruction is decoded, the accessibility of vs1 and vs2 is checked with:

        unique case (ara_req_d.emul)
          LMUL_2: if ((insn.varith_type.rs2 & 5'b00001) == (insn.varith_type.rd & 5'b00001)) illegal_insn = 1'b1;
          LMUL_4: if ((insn.varith_type.rs2 & 5'b00011) == (insn.varith_type.rd & 5'b00011)) illegal_insn = 1'b1;
          LMUL_8: if ((insn.varith_type.rs2 & 5'b00111) == (insn.varith_type.rd & 5'b00111)) illegal_insn = 1'b1;
          default: if (insn.varith_type.rs2 == insn.varith_type.rd) illegal_insn = 1'b1;
        endcase

    I don't understand what this is meant to check. With LMUL_2, rd = v6 and vs2 = v2 raise an illegal instruction according to this code, but that combination is legal in RVV. Could you explain it in detail? Thanks for your time!
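
To make the question concrete, the quoted check can be modeled in a few lines of Python. This is only a sketch: `quoted_check` mirrors the masks in the snippet above, while `group_overlap` is a hypothetical plain range test (assuming `rd` names a single mask register and `vs2` names a group of LMUL consecutive registers) added purely for comparison. Neither function is taken from Ara's sources beyond the quoted lines.

```python
# Low-bit masks used by the quoted unique case, keyed by LMUL.
# LMUL = 1 stands in for the default branch (full 5-bit compare).
MASKS = {1: 0b11111, 2: 0b00001, 4: 0b00011, 8: 0b00111}


def quoted_check(lmul: int, rd: int, rs2: int) -> bool:
    """Return True when the quoted code would set illegal_insn."""
    mask = MASKS[lmul]
    return (rs2 & mask) == (rd & mask)


def group_overlap(lmul: int, rd: int, rs2: int) -> bool:
    """Hypothetical comparison: does rd fall inside rs2..rs2+lmul-1?"""
    return rs2 <= rd < rs2 + lmul


if __name__ == "__main__":
    for rd, rs2 in [(6, 2), (3, 2), (2, 2)]:
        print(f"LMUL=2 rd=v{rd} vs2=v{rs2}: "
              f"quoted={quoted_check(2, rd, rs2)} "
              f"overlap={group_overlap(2, rd, rs2)}")
```

With LMUL = 2, rd = v6 and vs2 = v2 trip the quoted check even though the two register ranges are disjoint, which is the case the question is about.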

    opened by Zissi-Lei 0
  • scale_vl in vslide/vstore

    When the new EEW is not equal to the old EEW, a RESHUFFLE is inserted into Ara, and scale_vl is used to scale the number of elements of the source register. However, I notice that scale_vl is also asserted for vslide and vstore instructions. Why? I think the vl of these instructions doesn't change.

    opened by Zissi-Lei 0
  • Hazard with different LMUL

    I found that only a single vd is marked in write_list in ara_sequencer. However, multiple registers are written when LMUL > 1. Consider the following case:

        vsetvli a0, x0, e32, m8
        vadd v8, v16, v24
        vsetvli x0, x0, e32, m1
        vmul v4, v12, v16

    v8~v15 are written by vadd, so all eight registers should be marked in write_list/global_hazard_list, and vmul should stall until v12 is written back. But ara_sequencer does not seem to work this way.

    Could you help clarify this issue? Thanks in advance.
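
The hazard described above can be illustrated with a small Python sketch. It assumes that an operand at a given LMUL covers LMUL consecutive registers starting at its base register; the helper names `reg_group` and `raw_hazard` are hypothetical and say nothing about how ara_sequencer actually tracks hazards.

```python
from typing import Iterable, Set


def reg_group(base: int, lmul: int) -> Set[int]:
    """Registers v<base>..v<base+lmul-1> covered by one operand at this LMUL."""
    return set(range(base, base + lmul))


def raw_hazard(writer_vd: int, writer_lmul: int,
               reader_srcs: Iterable[int], reader_lmul: int) -> Set[int]:
    """Registers written by the older instruction and read by the younger one."""
    written = reg_group(writer_vd, writer_lmul)
    read: Set[int] = set()
    for src in reader_srcs:
        read |= reg_group(src, reader_lmul)
    return written & read


if __name__ == "__main__":
    # vadd v8, v16, v24 under m8, followed by vmul v4, v12, v16 under m1.
    print(sorted(raw_hazard(8, 8, [12, 16], 1)))  # prints [12]
    # Tracking only v8 (as if LMUL were 1) would miss the dependency.
    print(sorted(raw_hazard(8, 1, [12, 16], 1)))  # prints []
```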

    opened by bonewp 1
  • Questions about Compiling and CodeGen

    Since Ara supports the RISC-V Vector Extension, I would like to know how a normal C program is compiled into an executable with vector instructions. I see that the Makefile in apps/riscv-tests/benchmarks uses "riscv64-unknown-elf-gcc" to compile the C program. How does riscv64-unknown-elf-gcc know about the ISA extension when compiling? And how can I compile my own C code into an executable that uses your implementation of the RISC-V Vector Extension?

    opened by tangcy98 1
Releases(v2.2.0)
  • v2.2.0(Nov 2, 2021)

    Fixed

    • Fix typo on the build instructions of the README
    • Fix Gnuplot installation on GitHub's CI
    • The number of elements requested by the Store Unit and the Element Requester now depends on both the requested eew and the previous eew of the vector register being used
    • When the VRF is written and EMUL > 1, the eew of all the affected registers is updated
    • Memory operations can change EMUL when EEW != VSEW
    • The LSU now correctly handles bursts with a saturated length of 256 beats
    • AXI transactions on an opposite channel w.r.t. the channel currently in use are started only after the completion of the previous transactions
    • Fix the number of elements to be requested for a vslidedown instruction
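
As an illustration of the burst-length saturation mentioned in the list above, the following Python sketch splits a transfer into bursts of at most 256 beats, the AXI4 limit for INCR bursts. It is not Ara's address generator: `split_bursts` is a hypothetical helper, and the sketch ignores alignment and the 4 KiB boundary rule.

```python
# AXI4 INCR bursts carry at most 256 beats; AxLEN is 8 bits and encodes beats - 1.
MAX_BEATS = 256


def split_bursts(total_beats: int) -> list:
    """Split a transfer into (beats, axlen) pairs, saturating each burst at MAX_BEATS."""
    bursts = []
    remaining = total_beats
    while remaining > 0:
        beats = min(remaining, MAX_BEATS)
        bursts.append((beats, beats - 1))
        remaining -= beats
    return bursts


if __name__ == "__main__":
    # 600 beats -> two saturated bursts plus a remainder of 88 beats.
    print(split_bursts(600))  # prints [(256, 255), (256, 255), (88, 87)]
```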

    Added

    • benchmarks app to benchmark Ara
    • CI task to create roofline plots of imatmul and fmatmul, available as artifacts
    • Vector floating-point compare instructions (vmfeq, vmfne, vmflt, vmfle, vmfgt, vmfge)
    • Vector single-width floating-point/integer type-convert instructions (vfcvt.xu.f, vfcvt.x.f, vfcvt.rtz.xu.f, vfcvt.rtz.x.f, vfcvt.f.xu, vfcvt.f.x)
    • Vector widening floating-point/integer type-convert instructions (vfwcvt.xu.f, vfwcvt.x.f, vfwcvt.rtz.xu.f, vfwcvt.rtz.x.f, vfwcvt.f.xu, vfwcvt.f.x, vfwcvt.f.f)
    • Vector narrowing floating-point/integer type-convert instructions (vfncvt.xu.f, vfncvt.x.f, vfncvt.rtz.xu.f, vfncvt.rtz.x.f, vfncvt.f.xu, vfncvt.f.x, vfncvt.f.f)
    • Vector whole-register move instruction vmv<nr>
    • Vector whole-register load/store vl1r, vs1r
    • Vector load/store mask vle1, vse1
    • Whole-register instructions are also executed if vtype.vl == 0
    • Makefile option (trace=1) to generate waveform traces when running simulations with Verilator

    Changed

    • Add spill register at the lane edge, to cut the timing-critical interface between the Mask unit and the VFUs
    • Increase latency of the 16-bit multiplier from 0 to 1 to cut an in-lane timing-critical path
    • Widen CVA6's cache lines
    • Implement back-to-back accelerator instruction issue mechanism on CVA6
    • Use https protocol when cloning DTC from main Makefile
    • Use https protocol for newlib-cygwin in .gitmodules
    • Cut a timing-critical path from Addrgen to Sequencer (1 cycle more to start an AXI transaction)
    • Cut a timing-critical path in the VSTU, relative to the calculation of the pointer to the VRF word received from the lanes
    • Create ara_system wrapper containing Ara, Ariane, and an AXI mux, instantiated from within Ara's SoC
    • Retime address calculation of the addrgen
    • Push MASKU operand muxing from the lanes to the Mask Unit
    • Reduce CVA6's default cache size
    • Update Verilator to v4.214
    • Update bender to v0.23.1
  • v2.1.0(Jul 16, 2021)

    Fixed

    • Fix calculation of vstu's vector length
    • Fix vslideup and vslidedown operand's vector length trimming
    • Mute mask requests on idle lanes
    • Mute instructions with vector length zero on the respective lane_sequencer and operand_requester
    • Fix simd_div's offset calculation
    • Delay acknowledgment of memory requests if the axi_inval_filter is busy

    Added

    • Format source files in the apps folder with clang-format by running make format
    • Support for the 2_lanes, 8_lanes, and 16_lanes configurations, besides the default 4_lanes one

    Changed

    • Compile Verilator and Ara's verilated model with LLVM, for a faster compile time.
    • Verilator updated to version v4.210.
    • Verilation is done with a hierarchical verilation flow
    • Replace ara_soc's LLC with a simple main memory
    • Reduce number of words on the main memory, for faster Verilation
    • Update common_cells to v1.22.1
    • Update axi to v0.29.1
  • v2.0.0(Jun 24, 2021)

    Added

    • Script to align all the elf sections to the AXI Data Width (the testbench requires it)
    • RISC-V V intrinsics can now be compiled
    • Add support for vsetivli, vmv<nr>r.v instructions
    • Add support for strided memory operations
    • Add support for stores misaligned w.r.t. the AXI Data Width

    Changed

    • Alignment with lowRISC's coding guidelines
    • Update Ara support for RISC-V V extension to V 0.10, with the exception of the instructions that were already missing
    • Switch the toolchain from GCC to LLVM when compiling for the RISC-V V extension
    • Update toolchain and SPIKE support to RISC-V V 0.10
    • Patches for GCC and SPIKE are no longer required
    • Ara benchmarks are now compatible with RISC-V V 0.10

    Fixed

    • Fix vrf_seq_byte definition in the Load Unit
    • Fix check to discriminate a valid byte in the VRF word, in the Load Unit
    • Fix axi_addrgen_d.len calculation in the Address Generation Unit
    • Correctly check whether the generated address corresponds to the vector load or the store unit
    • Typos on the ChangeLog's dates
    • Remove unwanted latches in the addrgen, simd_div, instr_queue, and decoder
    • Fix a bug in memory operations with vl == 0. Ara now correctly tells Ariane that the memory operation is over
  • v1.2.0(Apr 12, 2021)

    Added

    • Hardware support for:
      • Vector slide instructions (vslideup, vslide1up, vfslide1up, vslidedown, vslide1down, vfslide1down)
    • Software implementation of an integer 2D convolution kernel
    • CI job to check the conv2d execution on Ara

    Fixed

    • Removed dependency on a specific gcc/g++ version in the Makefile
    • Arithmetic and memory vector instructions with vl == 0 are considered as a NOP
    • Increment bit width of the vector length type (vlen_t), accounting for vectors whose length is VLMAX
    • Fix vector length calculation for the MaskB operand, which depends on vsew
    • Fix typo on the vrf_pnt updating logic at the Mask Unit
    • Update README to highlight the dependency on Spike
    • Update Bender's link dependency to the public CVA6 repository
    • Retrigger the compile module if the ModelSim compilation did not succeed
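
The vlen_t entry in the list above comes down to a counting argument: a vector length can take every value from 0 to VLMAX inclusive, so one more bit is needed than for an index into VLMAX elements when VLMAX is a power of two. A minimal Python sketch follows; the VLMAX value is hypothetical, and Ara's actual vlen_t definition is not shown here.

```python
def bits_for_vl(vlmax: int) -> int:
    """Bits needed to hold every value in 0..vlmax inclusive."""
    return max(1, vlmax.bit_length())


if __name__ == "__main__":
    vlmax = 512  # hypothetical VLMAX, chosen only for illustration
    # Indices 0..511 fit in 9 bits, but the value 512 itself needs 10.
    print(bits_for_vl(vlmax - 1), bits_for_vl(vlmax))  # prints 9 10
```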

    Changed

    • The encoding.h in the common Ara runtime is now a copy from the encoding.h in the Spike submodule
  • v1.1.1(Mar 25, 2021)

  • v1.1.0(Mar 18, 2021)

    1.1.0 - 2021-03-18

    Added

    • GitHub Actions-based CI
    • Hardware support for:
      • Vector single-width floating-point fused multiply-add instructions (vfnmacc, vfmsac, vfnmsac, vfnmadd, vfmsub, vfnmsub)
      • Vector floating-point sign-injection instructions (vfsgnj, vfsgnjn, vfsgnjx)
      • Vector widening floating-point add/subtract instructions (vfwadd, vfwsub, vfwadd.w, vfwsub.w)
      • Vector widening floating-point multiply instructions (vfwmul)
      • Vector widening floating-point fused multiply-add instructions (vfwmacc, vfwnmacc, vfwmsac, vfwnmsac)
      • Vector floating-point merge instruction (vfmerge)
      • Vector floating-point move instruction (vfmv)

    Changed

    • Contributing guidelines updated to include commit message and C++ code style guidelines
  • v1.0.0(Mar 10, 2021)

    Added

    • Hardware support for:
      • Vector single-width floating-point add/subtract instructions (vfadd, vfsub, vfrsub)
      • Vector single-width floating-point multiply instructions (vfmul)
      • Vector single-width floating-point fused multiply-add instructions (vfmacc, vfmadd)
      • Vector single-width floating-point min/max instructions (vfmin, vfmax)
    • Software implementation of a floating-point matrix multiplication kernel
  • v0.6.0(Mar 9, 2021)

    Added

    • Support for a coherent mode between Ara and Ariane
      • Snoop AW channel from Ara to L2
      • Invalidate Ariane's L1 cache sets accordingly
      • Coherent mode can be toggled together with consistent mode using the LSB of CSR 0x702

    Changed

    • Ariane's data cache is active by default
    • The matrix multiplication kernel achieves better performance
      • It reports the performance and the utilization for several matrix sizes
  • v0.5.0(Mar 9, 2021)

    Added

    • Hardware support for:
      • Vector single-width integer divide instructions (vdivu, vdiv, vremu, vrem)
      • Vector integer comparison instructions (vmseq, vmsne, vmsltu, vmslt, vmsleu, vmsle, vmsgtu, vmsgt)
    • Runtime measurement functions
    • Consistent mode which orders scalar and vector loads/stores.
      • Conservative ordering without address comparison
      • Consistent mode is enabled by default and can be disabled by clearing the LSB of CSR 0x702.

    Fixed

    • Ariane's accelerator dispatcher module was rewritten, fixing a bug where instructions would get skipped.
    • The Vector Store unit takes the EEW of the source vector register into account to shuffle the elements before writing them to memory.

    Changed

    • Vector mask instructions (vmand, vmnand, vmandnot, vmxor, vmor, vmnor, vmornot, vmxnor) no longer require the non-compliant constraint that the vector length is divisible by eight.
  • v0.4.0(Mar 9, 2021)

    Added

    • Hardware compilation with Verilator
    • Software implementation of a matrix multiplication kernel

    Changed

    • The riscv_tests_simc Makefile target was deprecated. The riscv-tests are now run with the Verilated design, which can be called through the riscv_tests_simv Makefile target.
    • The operand queues now take as a parameter the type conversions they support (currently, SupportIntExt2, SupportIntExt4, and SupportIntExt8)
    • The Vector Multiplier unit now has independent pipelines for each element width.
  • v0.3.0(Mar 9, 2021)

    Added

    • Hardware support for:
      • Vector single-width integer multiply instructions (vmul, vmulh, vmulhu, vmulhsu)
      • Vector single-width integer multiply-add instructions (vmacc, vnmsac, vmadd, vnmsub)
      • Vector integer add-with-carry/subtract-with-borrow instructions (vadc, vsbc)
      • Vector widening integer multiply instructions (vwmul, vwmulu, vwmulsu)
      • Vector widening integer multiply-add instructions (vwmaccu, vwmacc, vwmaccsu, vwmaccus)

    Changed

    • Explicit scan chain signals added to the lane's and Ara's interfaces

    Fixed

    • Miscellaneous fixes for compatibility with Synopsys DC
  • v0.2.0(Mar 9, 2021)

    Added

    • Hardware support for:
      • Bit-shift instructions (vsll, vsrl, vsra)
      • Vector widening integer add/subtract (vwadd, vwaddu, vwsub, vwsubu)
      • Vector integer extension (vzext, vsext)
      • Vector integer merge and move instructions (vmerge, vmv)
      • Vector narrowing integer right shift instructions (vnsrl, vnsra)

    Changed

    • Bender updated to version 0.21.0

    Fixed

    • CVA6's forwarding mechanism of operand B for accelerator instructions
  • v0.1.0(Mar 9, 2021)

    Added

    • Hardware support for vector configuration instructions: vsetvl, vsetvli.
    • Hardware support for basic arithmetic and logic instructions: vadd, vsub, vrsub, vmin(u), vmax(u), vand, vor, vxor.
    • Hardware support for vector mask operations: vmand, vmnand, vmandnot, vmor, vmnor, vmornot, vmxor, vmxnor.
    • Hardware support for masked instructions.
    • Hardware support for vector length multipliers.
    • Software support for vector code running on Ara.