The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 0.9, working as a coprocessor to CORE-V's CVA6 core

Overview

Ara

Ara is a vector unit working as a coprocessor for the CVA6 core. It supports the RISC-V Vector Extension, version 0.9.

Dependencies

Check DEPENDENCIES.md for a list of hardware and software dependencies of Ara.

Supported instructions

Check FUNCTIONALITIES.md to see which instructions are currently supported by Ara.

Get started

Make sure you clone this repository recursively. If you have already cloned it without its submodules, run the following command to get all the necessary submodules:

git submodule update --init --recursive

If the repository path of any submodule changes, run the following command to change your submodule's pointer to the remote repository:

git submodule sync --recursive

Toolchain

Ara requires a RISC-V GCC toolchain capable of understanding the vector extension, version 0.9.x.

To build this toolchain, run the following command in the project's root directory.

# Build the GCC toolchain
make toolchain

Verilator

Ara requires an updated version of Verilator, for RTL simulations.

To build it, run the following command in the project's root directory.

# Build Verilator
make verilator

Configuration

Ara's parameters are centralized in the config folder, in the config.mk file. Please check config/README.md for more details.

Software

Build Applications

The apps folder contains example applications that work on Ara. Run the following commands to build an application, e.g., hello_world:

cd apps
make bin/hello_world

RISC-V Tests

The apps folder also contains the RISC-V tests repository, including a few unit tests for the vector instructions. Run the following command to build the unit tests:

cd apps
make riscv_tests

RTL Simulation

To simulate the Ara system with ModelSim, go to the hardware folder, which contains all the SystemVerilog files. Use the following command to run your simulation:

# Go to the hardware folder
cd hardware
# Apply the patches (only need to run this once)
make apply-patches
# Only compile the hardware without running the simulation.
make build
# Run the simulation with the *hello_world* binary loaded
app=hello_world make sim
# Run the simulation with the *some_binary* binary. This allows specifying the full path to the binary
preload=/some_path/some_binary make sim
# Run the simulation without starting the gui
app=hello_world make simc
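
The commands above pass the binary to make through an environment variable (app or preload) and pick the GUI or console flow through the target (sim or simc). A minimal Python sketch of that selection follows; the helper names sim_invocation and run_sim are hypothetical and not part of the repository, and combining preload with simc is an assumption, since the commands above only show preload with sim.

```python
import os
import subprocess


def sim_invocation(app=None, preload=None, gui=True):
    """Return (extra_env, argv) for one simulation run.

    Exactly one of `app` (name of a binary built in the apps folder) or
    `preload` (full path to a binary) must be given.
    """
    if (app is None) == (preload is None):
        raise ValueError("pass exactly one of app= or preload=")
    extra_env = {"app": app} if app is not None else {"preload": preload}
    target = "sim" if gui else "simc"  # simc runs without starting the GUI
    return extra_env, ["make", target]


def run_sim(hardware_dir="hardware", **kwargs):
    """Run make inside the hardware folder with the chosen variable set."""
    extra_env, argv = sim_invocation(**kwargs)
    env = {**os.environ, **extra_env}
    return subprocess.run(argv, cwd=hardware_dir, env=env, check=True)
```

For example, run_sim(app="hello_world", gui=False) would correspond to the last command of the listing.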

It is also possible to simulate the unit tests compiled in the apps folder. Given the number of unit tests, we use Verilator. Use the following command to install Verilator, verilate the design, and run the simulation:

# Go to the hardware folder
cd hardware
# Apply the patches (only need to run this once)
make apply-patches
# Verilate the design
make verilate
# Run the tests
make riscv_tests_simv

Alternatively, you can also use the riscv_tests target at Ara's top-level Makefile to both compile the RISC-V tests and run their simulation.

Publication

If you want to use Ara, you can cite us:

@Article{Ara2020,
  author = {Matheus Cavalcante and Fabian Schuiki and Florian Zaruba and Michael Schaffner and Luca Benini},
  journal= {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
  title  = {Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI},
  year   = {2020},
  volume = {28},
  number = {2},
  pages  = {530-543},
  doi    = {10.1109/TVLSI.2019.2950087}
}
Comments
  • [HW] Add support for vector fixed-point instructions

    Add support for vssra, vssrl, vnclip, vnclipu and vsmul vector fixed-point instructions

    Changelog

    Fixed

    • N/A

    Added

    • Support for vector single-width fractional multiply with rounding and saturation instruction: vsmul
    • Support for vector single-width scaling shift instructions: vssra, vssrl
    • Support for vector narrowing fixed-point clip instructions: vnclip, vnclipu
    • Test for vnclip and vnclipu vector fixed-point instructions

    Changed

    • Updated tests for vssra, vssrl and vsmul vector fixed-point instructions

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by M-Ijaz-10x 20
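
To illustrate what the scaling-shift and narrowing-clip instructions from the pull request above compute, here is a small Python reference sketch of vssrl and vnclipu on single elements. It follows the rounding rule of the RISC-V Vector specification as this sketch reads it (rounding mode taken from vxrm); the function names and the (result, saturated) return convention are illustrative choices, and none of this is taken from Ara's RTL.

```python
RNU, RNE, RDN, ROD = 0, 1, 2, 3  # vxrm rounding-mode encodings


def rounding_increment(v, d, vxrm):
    """Rounding increment r added after shifting v right by d bits."""
    if d == 0:
        return 0
    bit_d = (v >> d) & 1                       # v[d]
    bit_dm1 = (v >> (d - 1)) & 1               # v[d-1]
    low_below_dm1 = v & ((1 << (d - 1)) - 1)   # v[d-2:0]
    low_below_d = v & ((1 << d) - 1)           # v[d-1:0]
    if vxrm == RNU:   # round to nearest, ties up
        return bit_dm1
    if vxrm == RNE:   # round to nearest, ties to even
        return bit_dm1 & int(low_below_dm1 != 0 or bit_d == 1)
    if vxrm == RDN:   # truncate
        return 0
    if vxrm == ROD:   # round to odd
        return int(bit_d == 0 and low_below_d != 0)
    raise ValueError("vxrm must be 0..3")


def vssrl(v, shamt, sew, vxrm):
    """Scaling logical shift right of one SEW-bit element."""
    mask = (1 << sew) - 1
    v &= mask
    d = shamt & (sew - 1)  # only the low log2(SEW) bits of the shift are used
    return ((v >> d) + rounding_increment(v, d, vxrm)) & mask


def vnclipu(v, shamt, sew, vxrm):
    """Narrow one unsigned 2*SEW-bit element to SEW bits.

    Returns (result, saturated); saturated models setting vxsat.
    """
    wide = 2 * sew
    v &= (1 << wide) - 1
    d = shamt & (wide - 1)
    rounded = (v >> d) + rounding_increment(v, d, vxrm)
    max_val = (1 << sew) - 1
    if rounded > max_val:
        return max_val, True
    return rounded, False
```

With an 8-bit element, vssrl(0b0110, 2, 8, RNE) shifts 1.5 to the even value 2, while RDN truncates it to 1.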
  • [HW] Adding support for Fixed-Point vector instructions

    Description of PR that completes issue here...

    Changelog

    Fixed

    • Description of changes

    Added

    • Support for Fixed-Point vector instructions: vaadd, vaaddu, vsadd, vsaddu, vssub, vssubu, vasub, vasubu
    • Two status registers for the fixed-point unit: vxrm, vxsat

    Checklist

    • [x] Automated tests pass
    • [x] No frequency degradation
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by hossein1387 16
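
The saturating and averaging adds from the pull request above interact with the two status registers it mentions: saturating adds set vxsat when they clip, and averaging adds round according to vxrm. The following Python sketch models single elements under those rules as read from the RISC-V Vector specification; the function names and the (result, saturated) tuples are illustrative, not Ara's implementation.

```python
RNU, RNE, RDN, ROD = 0, 1, 2, 3  # vxrm rounding-mode encodings


def vsaddu(a, b, sew):
    """Unsigned saturating add; returns (result, saturated)."""
    max_val = (1 << sew) - 1
    total = (a & max_val) + (b & max_val)
    if total > max_val:
        return max_val, True   # clipped: vxsat would be set
    return total, False


def vsadd(a, b, sew):
    """Signed saturating add on Python ints already in the SEW range."""
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    total = a + b
    if total > hi:
        return hi, True
    if total < lo:
        return lo, True
    return total, False


def vaaddu(a, b, sew, vxrm):
    """Unsigned averaging add: (a + b) >> 1 with vxrm rounding."""
    mask = (1 << sew) - 1
    total = (a & mask) + (b & mask)      # kept at full precision (SEW+1 bits)
    bit0, bit1 = total & 1, (total >> 1) & 1
    if vxrm == RNU:
        r = bit0                         # ties round up
    elif vxrm == RNE:
        r = bit0 & bit1                  # ties round to even
    elif vxrm == RDN:
        r = 0                            # truncate
    elif vxrm == ROD:
        r = bit0 & (1 - bit1)            # round to odd
    else:
        raise ValueError("vxrm must be 0..3")
    return (total >> 1) + r
```

Averaging never saturates because the sum is halved before it is written back, which is why vaaddu returns a plain value.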
  • [HW] Add support for vector mask instructions

    Add support for vmsbf, vmsof, vmsif, viota and vid vector mask instructions

    Changelog

    Fixed

    • Test for viota vector mask instruction

    Added

    • Support for vmsbf, vmsof, vmsif, viota and vid vector mask instructions

    Changed

    • Tests for vmsbf, vmsof, vmsif and vid vector mask instructions

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by M-Ijaz-10x 13
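
As a reading aid for the mask instructions named in the pull request above, this Python sketch models their unmasked behavior on a list of mask bits (index 0 is element 0). The semantics follow the RISC-V Vector specification as this sketch understands it; the list-based representation is an illustrative simplification and says nothing about how Ara's mask unit is built.

```python
def vmsbf(mask):
    """Set-before-first: 1 for every element before the first set bit."""
    out, seen = [], False
    for bit in mask:
        seen = seen or bool(bit)
        out.append(0 if seen else 1)
    return out


def vmsif(mask):
    """Set-including-first: like vmsbf, but the first set bit is included."""
    out, seen = [], False
    for bit in mask:
        out.append(0 if seen else 1)
        seen = seen or bool(bit)
    return out


def vmsof(mask):
    """Set-only-first: 1 only at the position of the first set bit."""
    out, seen = [], False
    for bit in mask:
        out.append(1 if bit and not seen else 0)
        seen = seen or bool(bit)
    return out


def viota(mask):
    """Exclusive prefix sum of the mask bits."""
    out, count = [], 0
    for bit in mask:
        out.append(count)
        count += 1 if bit else 0
    return out


def vid(vl):
    """Element indices 0 .. vl-1."""
    return list(range(vl))
```

For the mask [0, 0, 1, 0, 1, 0, 0, 1], vmsbf marks elements 0 and 1, vmsif marks elements 0 to 2, and vmsof marks only element 2.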
  • [HW] Add support for vcpop and vfirst instructions

    Added support for vector population count vcpop and vector find first set bit vfirst instructions

    Changelog

    Fixed

    • N/A

    Added

    • Support for vector population count vcpop and vector find first set bit vfirst instructions
    • Test for vector population count vcpop instruction

    Changed

    • Test for find first set bit vfirst instruction

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by M-Ijaz-10x 9
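
The two instructions added by the pull request above reduce a mask register to a scalar. A minimal Python sketch of their unmasked behavior, assuming the mask is given as a list of bits and vl as the active vector length (both representation choices are illustrative):

```python
def vcpop(mask, vl):
    """Population count: number of set mask bits among the first vl elements."""
    return sum(1 for bit in mask[:vl] if bit)


def vfirst(mask, vl):
    """Index of the first set mask bit among the first vl elements, or -1."""
    for index, bit in enumerate(mask[:vl]):
        if bit:
            return index
    return -1
```

Both results depend on vl: bits at or beyond index vl are ignored.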
  • Minimal example that reproduces a hang on ara-2-lanes

    Note: this minimal example is weaker than the full-blown program, which hangs not only on ara-2-lanes but also on ara-4-lanes. There seems to be a pattern: VL==128 hangs ara-2-lanes and VL==256 hangs ara-4-lanes (both reproduced), so VL==512 might hang ara-8-lanes and VL==1024 might hang ara-16-lanes (not tried yet).

    opened by yanghao 8
  • Kernels update

    Merge https://github.com/pulp-platform/ara/pull/81 and https://github.com/pulp-platform/ara/pull/101 before this one.

    Add baseline Jacobi2d, Dropout, Convolution3D benchmark

    The convolution is now defined by its data type and its dimensions. fconv3d, for example, processes double-precision floating-point data, using 3D filters with depth ch (channels): (i*i*ch) ∗ (f*f*ch) = (o*o)

    Even if fconv3d is parameterized on the number of channels and can also be used with ch = 1 becoming a fconv2d, the code for fconv2d is kept since it is more optimized for that particular case.

    fconv3d: F = {7}, optimized with an enhanced algorithm

    fconv2d: F = {3, 7}. F == 3 is optimized, F == 7 is optimized with an enhanced algorithm

    iconv2d: F = {3, 5, 7}. F == 3 is optimized, F == 7 is optimized with an enhanced algorithm. F == 5 is not optimized

    We will support and optimize the other filter sizes in the future.

    The roofline plots for the convolutions are produced with the following parameters: iconv2d with F = 3, fconv2d with F = 3, and fconv3d with F = 7.

    Changelog

    Fixed

    • Generate data.S files before compiling the programs
    • Clean intermediate app object files with make clean
    • Add a fence before stopping the cycle counter, to let the last vector store complete

    Added

    • Add fconv3d kernel, optimized for 7x7 filters
    • Optimize fconv2d and iconv2d kernels for 3x3 filters
    • Add convolutions to the benchmark app, and print the related roofline plots
    • Add corner case test to vslidedown instruction

    Changed

    • Update README with instructions on how to compile convolutions
    • Refactor benchmark app
    • Double the testbench memory size
    • Update the python-requirements list

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by mp-17 8
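
The shape relation (i*i*ch) ∗ (f*f*ch) = (o*o) from the kernels update above can be made concrete with a scalar reference. The Python sketch below assumes stride 1 and no implicit padding, so that o = i - f + 1, and it computes a sliding dot product without flipping the filter; these are assumptions of this sketch, and the code is not the vectorized fconv3d kernel from the apps folder.

```python
def conv3d_reference(inp, flt):
    """Reduce a ch x i x i input with a ch x f x f filter to an o x o output.

    inp[c][y][x] and flt[c][y][x] are nested lists; o = i - f + 1.
    """
    ch = len(inp)
    i = len(inp[0])
    f = len(flt[0])
    o = i - f + 1
    out = [[0.0] * o for _ in range(o)]
    for y in range(o):
        for x in range(o):
            acc = 0.0
            for c in range(ch):            # accumulate over all channels
                for fy in range(f):
                    for fx in range(f):
                        acc += inp[c][y + fy][x + fx] * flt[c][fy][fx]
            out[y][x] = acc
    return out
```

With ch = 1 the same function degenerates to the 2D case, which mirrors the remark above that fconv3d with one channel becomes an fconv2d.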
  • "make verilator" fails when "CC=$(CLANG_CC) CXX=$(CLANG_CXX) CXXFLAGS=$(CLANG_CXXFLAGS) LDFLAGS=$(CLANG_LDFLAGS) \" is configured

    Failure output:

    In file included from ../V3Combine.cpp:27:
    ../V3DupFinder.h:50:5: error: constructor for 'V3DupFinder' must explicitly initialize the const member 'm_hasher'
        V3DupFinder(){};
        ^
    ../V3DupFinder.h:46:20: note: 'm_hasher' declared here
        const V3Hasher m_hasher;
                       ^
    1 error generated.
    ../Makefile_obj:297: recipe for target 'V3Combine.o' failed
    make[3]: *** [V3Combine.o] Error 1

    But if I delete "CC=$(CLANG_CC) CXX=$(CLANG_CXX) CXXFLAGS=$(CLANG_CXXFLAGS) LDFLAGS=$(CLANG_LDFLAGS) ", every version of Verilator compiles successfully!

    Best Wishes!

    opened by dongdeji 8
  • [HW] Floating-Point classify, division, sqrt

    This PR depends on https://github.com/pulp-platform/ara/pull/133, https://github.com/pulp-platform/ara/pull/142.

    Add floating-point classify, division, sqrt instructions, delayed because of the FPU. I updated the CVA6 submodule, which now points to the modified FPU, able to classify as expected. Moreover, the bug related to the synchronization of the FPU lanes is now solved.

    Changelog

    Added

    • Vector floating-point classify instruction (vfclass)
    • Vector floating-point divide instructions (vfdiv, vfrdiv)
    • Vector floating-point square-root instruction (vfsqrt)

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed
    • [x] No frequency degradation
    opened by mp-17 7
  • Make bin/hello_world failed (library not found)

    Hi, the libgloss library cannot be found when compiling hello_world. When I add the toolchain install directory to the path, the same problem still exists.

    [email protected]:/share/zhuxuanlong/Vector_Work/ara/apps# make bin/hello_world
    chmod +x /share/zhuxuanlong/Vector_Work/ara/apps/common/script/align_sections.sh
    rm -f /share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld && cp /share/zhuxuanlong/Vector_Work/ara/apps/common/arch.link.ld /share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld
    /share/zhuxuanlong/Vector_Work/ara/apps/common/script/align_sections.sh 4 /share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c hello_world/main.c -o hello_world/main.c.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/crt0.S -o common/crt0-llvm.S.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/printf.c -o common/printf-llvm.c.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/string.c -o common/string-llvm.c.o
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -c common/serial.c -o common/serial-llvm.c.o
    mkdir -p bin/
    /share/zhuxuanlong/Vector_Work/ara/install/riscv-llvm/bin/clang -Iinclude -march=rv64gcv0p10 -mabi=lp64d -menable-experimental-extensions -mno-relax -fuse-ld=lld -mcmodel=medany -I/share/zhuxuanlong/Vector_Work/ara/apps/common -std=gnu99 -O3 -ffast-math -fno-common -fno-builtin-printf -DNR_LANES=4 -Wunused-variable -Wall -Wextra -Wno-unused-command-line-argument -o bin/hello_world hello_world/main.c.o common/crt0-llvm.S.o common/printf-llvm.c.o common/string-llvm.c.o common/serial-llvm.c.o -static -nostartfiles -lm -T/share/zhuxuanlong/Vector_Work/ara/apps/common/link.ld
    ld.lld: error: unable to find library -lgloss
    clang-13: error: ld command failed with exit code 1 (use -v to see invocation)
    make: *** [Makefile:59: bin/hello_world] Error 1
    rm hello_world/main.c.o common/string-llvm.c.o common/crt0-llvm.S.o common/printf-llvm.c.o common/serial-llvm.c.o

    Thank you.

    opened by zhuxuanlong 7
  • Stuck at the compile flow `make riscv_tests_simv`

    Hi, @mp-17 @suehtamacv. When I run make riscv_tests_simv as described in the README file, my terminal gets stuck with no new messages for a long while (a few hours).

    (base) ➜ hardware git:(main) ✗ make riscv_tests_simv
    build/verilator/Vara_tb_verilator -l ram,/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd,elf &> build/rv64uv-ara-vadd.trace

    I checked the contents of the build/rv64uv-ara-vadd.trace file several times; they are listed below and also remain unchanged for a long while.

    Program header number 0 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' low is 80000000
    Program header number 0 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' high is 80004179
    Program header number 1 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' high is 80004877
    Program header number 2 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' high is 80004b17
    Program header number 3 in `/home/fantasysee/Projects/ara/apps/bin/rv64uv-ara-vadd' is not of type PT_LOAD; ignoring.
    Set `ram TOP.ara_tb_verilator.dut.i_ara_soc.i_dram 10 0x80000000 0x80000 write with offset: 0x0 write with size: 0x4b18
    Simulation of Ara
    =================
    
    
    Simulation running, end by pressing CTRL-c.
    
    

    Note that my QuestaSim version is Mentor Graphics QuestaSim 10.6c instead of Mentor Graphics QuestaSim 2020.1. I merely created a soft link that fakes version 2020.1, without modifying hardware/Makefile.

    Is this experimental phenomenon normal? If yes, could you please tell me how long this process approximately lasts? If no, would you please help me check if there is something wrong with my experimental environment?

    Thanks in advance!!!

    opened by fantasysee 7
  • Verilator Simulation Error

    When I run this command:

    ~/ara/hardware$make apply-patches 
    

    I face this error:

    Makefile:62: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/hpc-user/xilinx/Vivado/2016.2/bin:/home/hpc-user/intelFPGA_pro/21.2/modelsim_ase/bin:/usr/bin/sbt:/home/hpc-user/riscv-gnu-toolchain/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    cd deps/tech_cells_generic && git apply ../../patches/0001-tech-cells-generic-sram.patch
    error: patch failed: src/rtl/tc_sram.sv:124
    error: src/rtl/tc_sram.sv: patch does not apply
    make: *** [Makefile:101: apply-patches] Error 1
    

    I ignored this error, and I ran the next one:

    ~/ara/hardware$make verilate
    

    Again, I face this error:

    Makefile:62: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/hpc-user/xilinx/Vivado/2016.2/bin:/home/hpc-user/intelFPGA_pro/21.2/modelsim_ase/bin:/usr/bin/sbt:/home/hpc-user/riscv-gnu-toolchain/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    mkdir -p build
    rm -rf build/verilator; mkdir -p build/verilator
    ./bender script verilator -t rtl -t ara_test -t cva6_test -t verilator --define NR_LANES=4 --define VLEN=4096 --define RVV_ARIANE=1 > build/verilator/bender_script_default
    bash: ./bender: No such file or directory
    make: *** [Makefile:145: build/verilator/Vara_tb_verilator] Error 127
    make: *** Waiting for unfinished jobs....
    Successfully installed bender 0.21.0 in '/home/hpc-user/ara/hardware'.
    bender 0.21.0 available.
    

    The second time I run this command, I face:

    Makefile:62: "Specified QuestaSim version (questa-2020.1) not found in PATH /home/hpc-user/xilinx/Vivado/2016.2/bin:/home/hpc-user/intelFPGA_pro/21.2/modelsim_ase/bin:/usr/bin/sbt:/home/hpc-user/riscv-gnu-toolchain/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    rm -rf build/verilator; mkdir -p build/verilator
    ./bender script verilator -t rtl -t ara_test -t cva6_test -t verilator --define NR_LANES=4 --define VLEN=4096 --define RVV_ARIANE=1 > build/verilator/bender_script_default
    /home/hpc-user/ara/install/verilator/bin/verilator -f build/verilator/bender_script_default           \
      -GNrLanes=4                                                         \
      -O3                                                                           \
      -Wno-BLKANDNBLK                                                               \
      -Wno-CASEINCOMPLETE                                                           \
      -Wno-CMPCONST                                                                 \
      -Wno-LATCH                                                                    \
      -Wno-LITENDIAN                                                                \
      -Wno-UNOPTFLAT                                                                \
      -Wno-UNPACKED                                                                 \
      -Wno-UNSIGNED                                                                 \
      -Wno-WIDTH                                                                    \
      -Wno-WIDTHCONCAT                                                              \
      --hierarchical                                                                \
      tb/verilator/waiver.vlt                                                       \
      --Mdir build/verilator                                                       \
      -Itb/dpi                                                                      \
      --compiler clang                                                              \
      -CFLAGS "-DTOPLEVEL_NAME=ara_tb_verilator"                                        \
      -CFLAGS "-DNR_LANES=4"                                              \
      -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp       \
      -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp \
      -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp \
      ""                                                             \
      -LDFLAGS "-lelf"                                                              \
      ""                                                              \
      --exe                                                                         \
      /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/*.cc            \
      /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/*.cc      \
      /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/*.cc      \
      /home/hpc-user/ara/hardware/tb/verilator/ara_tb.cpp                                           \
      --cc                                                                          \
      --top-module ara_tb_verilator &&                                                  \
    cd build/verilator && OBJCACHE='' make -j4 -f Vara_tb_verilator.mk
    %Error: Verilator internal fault, sorry. Suggest trying --debug --gdbbt
    %Error: Command Failed /home/hpc-user/ara/install/verilator/bin/verilator_bin -f build/verilator/bender_script_default -GNrLanes=4 -O3 -Wno-BLKANDNBLK -Wno-CASEINCOMPLETE -Wno-CMPCONST -Wno-LATCH -Wno-LITENDIAN -Wno-UNOPTFLAT -Wno-UNPACKED -Wno-UNSIGNED -Wno-WIDTH -Wno-WIDTHCONCAT --hierarchical tb/verilator/waiver.vlt --Mdir build/verilator -Itb/dpi --compiler clang -CFLAGS -DTOPLEVEL_NAME=ara_tb_verilator -CFLAGS -DNR_LANES=4 -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp -CFLAGS -I/home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp  -LDFLAGS -lelf  --exe /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/dpi_memutil.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_dpi/cpp/sv_scoped.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_memutil_verilator/cpp/verilator_memutil.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/verilated_toplevel.cc /home/hpc-user/ara/hardware/tb/verilator/lowrisc_dv_verilator_simutil_verilator/cpp/verilator_sim_ctrl.cc /home/hpc-user/ara/hardware/tb/verilator/ara_tb.cpp --cc --top-module ara_tb_verilator
    make: *** [Makefile:146: build/verilator/Vara_tb_verilator] Error 255
    

    The version of the Verilator is 4.210.

    opened by mohammadhosein1997 7
  • Documentation of Ara

    This PR adds the basic structure for the Ara documentation, starting with complete documentation for the Vector Register File. The Sphinx documentation generator converts the reStructuredText (.rst) files into HTML pages with the Read the Docs theme. CI support is added to generate the documentation from the sources (.rst files) and publish it on GitHub Pages. The following changes add support for the features above:

    Changelog

    Added

    • Add the basic structure of Ara documentation
    • Add Vector Register File docs
    • Add CI support to automatically generate docs for github pages

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by sharafat-10xEngineers 0
  • [HW] Shift popcount logic for vcpop.m instruction to lanes

    Shifted popcount logic for vcpop.m instruction from mask unit to lanes

    Changelog

    Fixed

    • N/A

    Added

    • N/A

    Changed

    • Shifted popcount logic for vcpop.m instruction from mask unit to lanes

    Checklist

    • [x] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed

    Please check our contributing guidelines before opening a Pull Request.

    opened by M-Ijaz-10x 0
  • Failing to generate waveform

    The problem I am experiencing is that every time I compile the hardware with trace=1 to generate waveform traces, the compilation succeeds, but when I then run the simulation with an arbitrary binary loaded, the program crashes with the message make: *** [Makefile:208: simv] Segmentation fault (core dumped). If I don't use trace=1, the program runs fine. I also have a problem generating the vector trace: I am not sure which exact command I am supposed to use. The instructions say to compile a program like this: make bin/${program}.ideal, but I can't figure out what the "{program}" is supposed to be. Is it a binary like rv64uv-ara-vxor or something else? Can anyone please help with these problems? Thank you.

    opened by StyleDiablo 0
  • Can't run a single simulation of any kind

    Hello, I am trying to run some RTL simulations with Verilator, but I can't. When I run make riscv_tests_simv, it crashes with make: *** [Makefile:214: rv64uv-ara-vaadd] Error 139. When I run app=rv64uv-ara-vadd make simv or app=hello_world make simv, it gives me make: *** [Makefile:208: simv] Segmentation fault (core dumped). Can anyone please help? I don't know what to do. Thank you.

    opened by StyleDiablo 3
  • [HW] Add support for single lane configuration

    PR for the single lane support

    This PR focuses on the single-lane configuration with VLEN = 128, 256, 512. Note that the default rv64uv tests are not yet ready to check the functionality of the single-lane configuration.

    Changelog

    Changed

    • Changes from this PR handle single lane configuration in ara slide unit and lane sequencer.
    • Handle NrLanes == 1 case in the ara sequencer as there is no lane desynchronization problem in case of single lane.
    • Corrected vector length calculation for single lane by handling NrLanes == 1 case in lane sequencer where division of vector length is not required.
    • Corrected vector start calculation for single lane by handling NrLanes == 1 case in lane_sequencer where division of vstart is not required.
    • Handle a single lane case in the mask unit.
    • Fixes datapath for slide instructions and NrLanes == 1 case to support single lane configuration.
    • Update address generation logic of strided and indexed load store for AXIDataWidth = 32 when loading element of SEW=64.

    Checklist

    • [ ] Automated tests pass
    • [x] Changelog updated
    • [x] Code style guideline is observed
    opened by sharafat-10xEngineers 3
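
The single-lane pull request above notes that the vector length no longer needs to be divided when NrLanes == 1. The Python sketch below shows one way such a per-lane split could look, assuming elements are distributed round-robin so that lower-numbered lanes receive the remainder; this distribution is an assumption of the sketch and is not taken from the lane sequencer's source.

```python
def elements_per_lane(vl, nr_lanes):
    """Return how many of the vl elements each lane would process."""
    if nr_lanes < 1:
        raise ValueError("nr_lanes must be at least 1")
    if nr_lanes == 1:
        return [vl]  # single lane: no division needed
    base, remainder = divmod(vl, nr_lanes)
    # Assumed round-robin mapping: the first `remainder` lanes get one extra.
    return [base + (1 if lane < remainder else 0) for lane in range(nr_lanes)]
```

Whatever the lane count, the per-lane lengths add up to vl, which is a cheap invariant to check in a testbench.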
  • [HW] :bug: Stall `vfdiv`/`vfsqrt` not to violate insn ordering

    Hotfix for vmfpu.

    Changelog

    Fixed

    • Preventively stall vfdiv and vfsqrt if there is a latency problem within vmfpu

    Checklist

    • [ ] Automated tests pass
    • [ ] Changelog updated
    • [x] Code style guideline is observed
    opened by mp-17 0
Releases(v2.2.0)
  • v2.2.0(Nov 2, 2021)

    Fixed

    • Fix typo on the build instructions of the README
    • Fix Gnuplot installation on GitHub's CI
    • The number of elements requested by the Store Unit and the Element Requester now depends both on the requested eew and the past eew of the vector of the used register
    • When the VRF is written and EMUL > 1, the eew of all the interested registers is updated
    • Memory operations can change EMUL when EEW != VSEW
    • The LSU now correctly handles bursts with a saturated length of 256 beats
    • AXI transactions on an opposite channel w.r.t. the channel currently in use are started only after the completion of the previous transactions
    • Fix the number of elements to be requested for a vslidedown instruction
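
The fix above about bursts saturating at 256 beats relates to the AXI4 limit of 256 beats per incrementing burst, encoded as AxLEN = beats - 1 in 8 bits. The Python sketch below splits a longer transfer into bursts that respect that limit; it deliberately ignores the 4 KiB address-boundary rule and is only an illustration, not the address generator's actual logic.

```python
MAX_BEATS_PER_BURST = 256  # AXI4 INCR burst limit; AxLEN holds beats - 1


def split_into_bursts(total_beats):
    """Return the AxLEN value of each burst needed to move total_beats beats."""
    if total_beats < 0:
        raise ValueError("total_beats must be non-negative")
    axlen_values = []
    remaining = total_beats
    while remaining > 0:
        beats = min(remaining, MAX_BEATS_PER_BURST)
        axlen_values.append(beats - 1)   # 255 is the saturated length field
        remaining -= beats
    return axlen_values
```

A transfer of exactly 256 beats yields a single burst with AxLEN = 255, which is the saturated case the changelog entry refers to.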

    Added

    • benchmarks app to benchmark Ara
    • CI task to create roofline plots of imatmul and fmatmul, available as artifacts
    • Vector floating-point compare instructions (vmfeq, vmfne, vmflt, vmfle, vmfgt, vmfge)
    • Vector single-width floating-point/integer type-convert instructions (vfcvt.xu.f, vfcvt.x.f, vfcvt.rtz.xu.f, vfcvt.rtz.x.f, vfcvt.f.xu, vfcvt.f.x)
    • Vector widening floating-point/integer type-convert instructions (vfwcvt.xu.f, vfwcvt.x.f, vfwcvt.rtz.xu.f, vfwcvt.rtz.x.f, vfwcvt.f.xu, vfwcvt.f.x, vfwcvt.f.f)
    • Vector narrowing floating-point/integer type-convert instructions (vfncvt.xu.f, vfncvt.x.f, vfncvt.rtz.xu.f, vfncvt.rtz.x.f, vfncvt.f.xu, vfncvt.f.x, vfncvt.f.f)
    • Vector whole-register move instruction vmv<nr>
    • Vector whole-register load/store vl1r, vs1r
    • Vector load/store mask vle1, vse1
    • Whole-register instructions are executed also if vtype.vl == 0
    • Makefile option (trace=1) to generate waveform traces when running simulations with Verilator

    Changed

    • Add spill register at the lane edge, to cut the timing-critical interface between the Mask unit and the VFUs
    • Increase latency of the 16-bit multiplier from 0 to 1 to cut an in-lane timing-critical path
    • Widen CVA6's cache lines
    • Implement back-to-back accelerator instruction issue mechanism on CVA6
    • Use https protocol when cloning DTC from main Makefile
    • Use https protocol for newlib-cygwin in .gitmodules
    • Cut a timing-critical path from Addrgen to Sequencer (1 cycle more to start an AXI transaction)
    • Cut a timing-critical path in the VSTU, relative to the calculation of the pointer to the VRF word received from the lanes
    • Create ara_system wrapper containing Ara, Ariane, and an AXI mux, instantiated from within Ara's SoC
    • Retime address calculation of the addrgen
    • Push MASKU operand muxing from the lanes to the Mask Unit
    • Reduce CVA6's default cache size
    • Update Verilator to v4.214
    • Update bender to v0.23.1
  • v2.1.0(Jul 16, 2021)

    Fixed

    • Fix calculation of vstu's vector length
    • Fix vslideup and vslidedown operand's vector length trimming
    • Mute mask requests on idle lanes
    • Mute instructions with vector length zero on the respective lane_sequencer and operand_requester
    • Fix simd_div's offset calculation
    • Delay acknowledgment of memory requests if the axi_inval_filter is busy

    Added

    • Format source files in the apps folder with clang-format by running make format
    • Support for the 2_lanes, 8_lanes, and 16_lanes configurations, besides the default 4_lanes one

    Changed

    • Compile Verilator and Ara's verilated model with LLVM, for a faster compile time.
    • Verilator updated to version v4.210.
    • Verilation is done with a hierarchical verilation flow
    • Replace ara_soc's LLC with a simple main memory
    • Reduce number of words on the main memory, for faster Verilation
    • Update common_cells to v1.22.1
    • Update axi to v0.29.1
  • v2.0.0(Jun 24, 2021)

    Added

    • Script to align all the elf sections to the AXI Data Width (the testbench requires it)
    • RISC-V V intrinsics can now be compiled
    • Add support for vsetivli, vmv<nr>r.v instructions
    • Add support for strided memory operations
    • Add support for stores misaligned w.r.t. the AXI Data Width

    Changed

    • Alignment with lowRISC's coding guidelines
    • Update Ara support for RISC-V V extension to V 0.10, with the exception of the instructions that were already missing
    • Switch the toolchain from GCC to LLVM when compiling for the RISC-V V extension
    • Update toolchain and SPIKE support to RISC-V V 0.10
    • Patches for GCC and SPIKE are no longer required
    • Ara benchmarks are now compatible with RISC-V V 0.10

    Fixed

    • Fix vrf_seq_byte definition in the Load Unit
    • Fix check to discriminate a valid byte in the VRF word, in the Load Unit
    • Fix axi_addrgen_d.len calculation in the Address Generation Unit
    • Correctly check whether the generated address corresponds to the vector load or the store unit
    • Fix typos in the ChangeLog's dates
    • Remove unwanted latches in the addrgen, simd_div, instr_queue, and decoder
    • Fix a bug in memory operations with vl == 0: Ara now correctly tells Ariane that the memory operation is over
  • v1.2.0(Apr 12, 2021)

    Added

    • Hardware support for:
      • Vector slide instructions (vslideup, vslide1up, vfslide1up, vslidedown, vslide1down, vfslide1down)
    • Software implementation of an integer 2D convolution kernel
    • CI job to check the conv2d execution on Ara

    Fixed

    • Remove the dependency on a specific gcc and g++ version in the Makefile
    • Arithmetic and memory vector instructions with vl == 0 are treated as NOPs
    • Increment bit width of the vector length type (vlen_t), accounting for vectors whose length is VLMAX
    • Fix vector length calculation for the MaskB operand, which depends on vsew
    • Fix a typo in the vrf_pnt update logic of the Mask Unit
    • Update README to highlight the dependency on Spike
    • Update Bender's dependency link to point to the public CVA6 repository
    • Retrigger the compile module if the ModelSim compilation did not succeed
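
The vlen_t fix listed above presumably follows from a counting argument: a vector length can take any value from 0 to VLMAX inclusive, so the type must hold VLMAX + 1 distinct values. When VLMAX is a power of two, that takes log2(VLMAX) + 1 bits, one more than a plain index would need. The C sketch below only illustrates this arithmetic; the helper name `vl_bits_needed` is hypothetical and is not taken from Ara's sources.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to hold every value in [0, vlmax].
 * Hypothetical helper for illustration; not part of Ara's sources. */
static inline unsigned vl_bits_needed(uint64_t vlmax) {
  unsigned bits = 0;
  assert(vlmax > 0);
  /* Count the bits of the largest value that must be representable. */
  while (vlmax != 0) {
    bits++;
    vlmax >>= 1;
  }
  return bits;
}
```

For instance, with an assumed VLMAX of 4096 elements, 12 bits cover only the values 0 to 4095, whereas `vl_bits_needed(4096)` returns 13.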

    Changed

    • The encoding.h in the common Ara runtime is now a copy of the encoding.h in the Spike submodule
  • v1.1.1(Mar 25, 2021)

  • v1.1.0(Mar 18, 2021)

    1.1.0 - 2021-03-18

    Added

    • GitHub Actions-based CI
    • Hardware support for:
      • Vector single-width floating-point fused multiply-add instructions (vfnmacc, vfmsac, vfnmsac, vfnmadd, vfmsub, vfnmsub)
      • Vector floating-point sign-injection instructions (vfsgnj, vfsgnjn, vfsgnjx)
      • Vector widening floating-point add/subtract instructions (vfwadd, vfwsub, vfwadd.w, vfwsub.w)
      • Vector widening floating-point multiply instructions (vfwmul)
      • Vector widening floating-point fused multiply-add instructions (vfwmacc, vfwnmacc, vfwmsac, vfwnmsac)
      • Vector floating-point merge instruction (vfmerge)
      • Vector floating-point move instruction (vfmv)

    Changed

    • Contributing guidelines updated to include commit message and C++ code style guidelines
  • v1.0.0(Mar 10, 2021)

    Added

    • Hardware support for:
      • Vector single-width floating-point add/subtract instructions (vfadd, vfsub, vfrsub)
      • Vector single-width floating-point multiply instructions (vfmul)
      • Vector single-width floating-point fused multiply-add instructions (vfmacc, vfmadd)
      • Vector single-width floating-point min/max instructions (vfmin, vfmax)
    • Software implementation of a floating-point matrix multiplication kernel
  • v0.6.0(Mar 9, 2021)

    Added

    • Support for a coherent mode between Ara and Ariane
      • Snoop AW channel from Ara to L2
      • Invalidate Ariane's L1 cache sets accordingly
      • Coherent mode can be toggled together with consistent mode using the LSB of CSR 0x702

    Changed

    • Ariane's data cache is active by default
    • The matrix multiplication kernel achieves better performance
      • It reports the performance and the utilization for several matrix sizes
  • v0.5.0(Mar 9, 2021)

    Added

    • Hardware support for:
      • Vector single-width integer divide instructions (vdivu, vdiv, vremu, vrem)
      • Vector integer comparison instructions (vmseq, vmsne, vmsltu, vmslt, vmsleu, vmsle, vmsgtu, vmsgt)
    • Runtime measurement functions
    • Consistent mode which orders scalar and vector loads/stores.
      • Conservative ordering without address comparison
      • Consistent mode is enabled by default and can be disabled by clearing the LSB of CSR 0x702.
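
The CSR toggle described above can be sketched in C. This is a minimal sketch, not Ara's runtime API: only the CSR address 0x702 and the use of its LSB come from the changelog, while the macro and function names are hypothetical. The pure helpers compute the new register value and run on any host. The wrappers guarded by `__riscv` use the standard `csrsi` and `csrci` instructions and assume the code runs at a privilege level allowed to access this CSR.

```c
#include <assert.h>
#include <stdint.h>

/* CSR address taken from the changelog; the macro names are hypothetical. */
#define ARA_MODE_CSR_ADDR 0x702
#define ARA_MODE_ENABLE_BIT UINT64_C(1) /* LSB of the CSR */

/* Return the CSR value with the mode bit set (mode enabled). */
static inline uint64_t ara_mode_enable(uint64_t csr_value) {
  return csr_value | ARA_MODE_ENABLE_BIT;
}

/* Return the CSR value with the mode bit cleared (mode disabled). */
static inline uint64_t ara_mode_disable(uint64_t csr_value) {
  return csr_value & ~ARA_MODE_ENABLE_BIT;
}

/* Report whether the mode bit is set in a CSR value. */
static inline int ara_mode_is_enabled(uint64_t csr_value) {
  return (csr_value & ARA_MODE_ENABLE_BIT) != 0;
}

/* Host-side self-check of the helpers above. */
static inline void ara_mode_self_check(void) {
  assert(ara_mode_is_enabled(ara_mode_enable(0)));
  assert(!ara_mode_is_enabled(ara_mode_disable(1)));
}

#if defined(__riscv)
/* On a RISC-V target, set or clear the LSB of CSR 0x702 directly. */
static inline void ara_mode_enable_hw(void) {
  __asm__ volatile("csrsi 0x702, 1");
}
static inline void ara_mode_disable_hw(void) {
  __asm__ volatile("csrci 0x702, 1");
}
#endif
```

Only the LSB is modified, so any other bits of the register keep their value.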

    Fixed

    • Ariane's accelerator dispatcher module was rewritten, fixing a bug where instructions would get skipped.
    • The Vector Store unit takes the EEW of the source vector register into account to shuffle the elements before writing them to memory.

    Changed

    • Vector mask instructions (vmand, vmnand, vmandnot, vmxor, vmor, vmnor, vmornot, vmxnor) no longer require the non-compliant constraint that the vector length is divisible by eight.
  • v0.4.0(Mar 9, 2021)

    Added

    • Hardware compilation with Verilator
    • Software implementation of a matrix multiplication kernel

    Changed

    • The riscv_tests_simc Makefile target was deprecated. The riscv-tests are now run with the Verilated design, which can be called through the riscv_tests_simv Makefile target.
    • The operand queues now take as a parameter the type conversions they support (currently, SupportIntExt2, SupportIntExt4, and SupportIntExt8)
    • The Vector Multiplier unit now has independent pipelines for each element width.
  • v0.3.0(Mar 9, 2021)

    Added

    • Hardware support for:
      • Vector single-width integer multiply instructions (vmul, vmulh, vmulhu, vmulhsu)
      • Vector single-width integer multiply-add instructions (vmacc, vnmsac, vmadd, vnmsub)
      • Vector integer add-with-carry/subtract-with-borrow instructions (vadc, vsbc)
      • Vector widening integer multiply instructions (vwmul, vwmulu, vwmulsu)
      • Vector widening integer multiply-add instructions (vwmaccu, vwmacc, vwmaccsu, vwmaccus)

    Changed

    • Explicit scan chain signals added to the lane's and Ara's interfaces

    Fixed

    • Miscellaneous fixes for compatibility with Synopsys DC
  • v0.2.0(Mar 9, 2021)

    Added

    • Hardware support for:
      • Bit-shift instructions (vsll, vsrl, vsra)
      • Vector widening integer add/subtract (vwadd, vwaddu, vwsub, vwsubu)
      • Vector integer extension (vzext, vsext)
      • Vector integer merge and move instructions (vmerge, vmv)
      • Vector narrowing integer right shift instructions (vnsrl, vnsra)

    Changed

    • Bender updated to version 0.21.0

    Fixed

    • CVA6's forwarding mechanism of operand B for accelerator instructions
  • v0.1.0(Mar 9, 2021)

    Added

    • Hardware support for vector configuration instructions: vsetvl, vsetvli.
    • Hardware support for basic arithmetic and logic instructions: vadd, vsub, vrsub, vmin(u), vmax(u), vand, vor, vxor.
    • Hardware support for vector mask operations: vmand, vmnand, vmandnot, vmor, vmnor, vmornot, vmxor, vmxnor.
    • Hardware support for masked instructions.
    • Hardware support for vector length multipliers.
    • Software support for vector code running on Ara.