Want a faster ML processor? Do it yourself! -- A framework for playing with custom opcodes to accelerate TensorFlow Lite for Microcontrollers (TFLM).

Overview

CFU Playground

Want a faster ML processor? Do it yourself!

This project provides a framework that an engineer, intern, or student can use to design and evaluate enhancements to an FPGA-based “soft” processor, specifically to increase the performance of machine learning (ML) tasks. The goal is to abstract away most infrastructure details so that the user can get up to speed quickly and focus solely on adding new processor instructions, exploiting them in the computation, and measuring the results.

This project enables rapid iteration on processor improvements -- multiple iterations per day.

This is how it works:

  • Choose a TensorFlow Lite model; a quantized person detection model is provided, or bring your own.
  • Execute the inference on the Arty FPGA board to get cycle counts per layer.
  • Choose a TFLite operator to accelerate, and dig into that code.
  • Design new instruction(s) that can replace multiple basic operations.
  • Build a custom function unit (a small amount of hardware) that performs the new instruction(s).
  • Modify the TFLite/Micro library kernel to use the new instruction(s), which are available as intrinsics with function call syntax.
  • Rebuild the FPGA SoC, recompile the TFLM library, and rerun to measure the improvement.
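To make the "design new instruction(s)" step concrete, here is a host-side C sketch of one kind of instruction a CFU might implement: a hypothetical 4-lane int8 multiply-accumulate that replaces four multiplies and four adds in a kernel's inner loop. The names `pack4`, `simd_mac_model` and `dot_packed` are illustrative only and are not part of TFLM or this project; on real hardware the operation would run in the CFU and be invoked through an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Plain scalar int8 dot product: what a kernel inner loop does today. */
int32_t dot_scalar(const int8_t *a, const int8_t *b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Pack four int8 values into one 32-bit word, lane 0 in the low byte. */
static uint32_t pack4(const int8_t *p) {
    return (uint32_t)(uint8_t)p[0] | ((uint32_t)(uint8_t)p[1] << 8) |
           ((uint32_t)(uint8_t)p[2] << 16) | ((uint32_t)(uint8_t)p[3] << 24);
}

/* Sign-extend byte lane i of a packed word. */
static int32_t lane(uint32_t word, int i) {
    int32_t v = (int32_t)((word >> (8 * i)) & 0xFFu);
    return v >= 128 ? v - 256 : v;
}

/* Software model of the hypothetical instruction: four multiplies and
 * four adds collapsed into a single operation. */
static int32_t simd_mac_model(int32_t acc, uint32_t rs1, uint32_t rs2) {
    for (int i = 0; i < 4; i++) {
        acc += lane(rs1, i) * lane(rs2, i);
    }
    return acc;
}

/* Same dot product, issuing one "instruction" per four elements. */
int32_t dot_packed(const int8_t *a, const int8_t *b, int n) {
    assert(n % 4 == 0); /* the sketch handles whole groups of four only */
    int32_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        acc = simd_mac_model(acc, pack4(a + i), pack4(b + i));
    }
    return acc;
}
```

Because an R-type instruction has only two source registers, a real CFU would likely keep the running accumulator in an internal register rather than passing it in as this model does.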

The focus here is performance, not demos. The inputs to the ML inference are canned/faked, and the only output is cycle counts. It would be possible to export the improvements made here to an actual demo, but currently no pathway is set up for doing so.

With the exception of Vivado, everything used by this project is open source.

Disclaimer: This is not an officially supported Google project. Support and/or new releases may be limited.

This is an early prototype of an ML exploration framework; expect a lack of documentation and occasional breakage. If you want to collaborate on building out this framework, reach out to [email protected]! See "Contribution guidelines" below.

Required hardware/OS

  • Currently, the only supported target is the Arty 35T board from Digilent.
  • The only supported host OS is Linux (Debian / Ubuntu).

If you want to try things out using Renode simulation, then you don't need either the Arty board or Vivado software. You can also perform Verilog-level cycle-accurate simulation with Verilator, but this is much slower.

Assumed software

  • Vivado must be manually installed.

Other required packages will be checked for and, if on a Debian-based system, automatically installed by the setup script below.

Setup

Clone this repo, cd into it, then run:

scripts/setup

Use

Build the SoC and load the bitstream onto Arty:

cd proj/proj_template
make prog

This builds the SoC with the default CFU from proj/proj_template. Later you'll copy this and modify it to make your own project.

Build a RISC-V program and execute it on the SoC that you just loaded onto the Arty:

make load

To run on the Renode simulator on the host machine (no Vivado or Arty board required), execute:

make renode

To use Verilator to execute on a cycle-accurate RTL-level simulator (no Vivado or Arty board required), execute:

make PLATFORM=sim load

Underlying open-source technology

  • LiteX: Open-source framework for assembling the SoC (CPU + peripherals)
  • VexRiscv: Open-source RISC-V soft CPU optimized for FPGAs
  • nMigen: Python toolbox for building digital hardware

Licensed under Apache-2.0 license

See the file LICENSE.

Contribution guidelines

If you want to contribute to CFU Playground, be sure to review the contribution guidelines. This project adheres to Google's code of conduct. By participating, you are expected to uphold this code.

Issues
  • proj_template/ make prog doesn't select which python3 it needs

    I've run sauron:~/fpga/CFU-Playground$ ./scripts/setup and it worked fine.

    Then

    sauron:~/fpga/CFU-Playground/proj/proj_template$ make prog
    (...)
    INFO:SoC:IRQ Handler (up to 32 Locations).
    IRQ Locations: (2)
    - uart   : 0
    - timer0 : 1
    INFO:SoC:--------------------------------------------------------------------------------
    Traceback (most recent call last):
      File "./common_soc.py", line 54, in <module>
        main()
      File "./common_soc.py", line 50, in main
        workflow.run()
      File "/home/merlin/fpga/CFU-Playground/soc/board_specific_workflows/general.py", line 114, in run
        self.load(soc, soc_builder)
      File "/home/merlin/fpga/CFU-Playground/soc/board_specific_workflows/general.py", line 103, in load
        prog.load_bitstream(bitstream_filename)
      File "/home/merlin/fpga/CFU-Playground/third_party/python/litex/litex/build/openocd.py", line 21, in load_bitstream
        config = self.find_config()
      File "/home/merlin/fpga/CFU-Playground/third_party/python/litex/litex/build/generic_programmer.py", line 72, in find_config
        import requests
    ModuleNotFoundError: No module named 'requests'
    sauron:~/fpga/CFU-Playground/proj/proj_template$ type python3
    python3 is hashed (/opt/symbiflow/xc7/conda/envs/xc7/bin/python3)
    

    ../../soc/common_soc.py runs env python3, but I found nothing in https://cfu-playground.readthedocs.io/en/latest/setup-guide.html that selects which python3 should be run.

    The system python3 fares even worse:

        raise YosysError("Could not find an acceptable Yosys binary. The `nmigen-yosys` PyPI "
    nmigen._toolchain.yosys.YosysError: Could not find an acceptable Yosys binary. The `nmigen-yosys` PyPI package, if available for this platform, can be used as fallback
    

    And the Fomu toolchain's python3 doesn't work either:

    sauron:~/fpga/CFU-Playground/proj/proj_template$ type python3
    python3 is /home/merlin/fpga/fomu-toolchain-linux/bin/python3
    sauron:~/fpga/CFU-Playground/proj/proj_template$ make prog
    /home/merlin/fpga/CFU-Playground/scripts/pyrun /home/merlin/fpga/CFU-Playground/proj/proj_template/cfu_gen.py
    Traceback (most recent call last):
      File "/home/merlin/fpga/CFU-Playground/proj/proj_template/cfu_gen.py", line 38, in <module>
        main()
      File "/home/merlin/fpga/CFU-Playground/proj/proj_template/cfu_gen.py", line 31, in main
        new_verilog = verilog.convert(cfu, name='Cfu', ports=cfu.ports)
      File "/home/merlin/fpga/CFU-Playground/third_party/python/nmigen/nmigen/back/verilog.py", line 61, in convert
        return _convert_rtlil_text(rtlil_text, strip_internal_attrs=strip_internal_attrs)
      File "/home/merlin/fpga/CFU-Playground/third_party/python/nmigen/nmigen/back/verilog.py", line 10, in _convert_rtlil_text
        yosys = find_yosys(lambda ver: ver >= (0, 9))
      File "/home/merlin/fpga/CFU-Playground/third_party/python/nmigen/nmigen/_toolchain/yosys.py", line 228, in find_yosys
        raise YosysError("Could not find an acceptable Yosys binary. The `nmigen-yosys` PyPI "
    nmigen._toolchain.yosys.YosysError: Could not find an acceptable Yosys binary. The `nmigen-yosys` PyPI package, if available for this platform, can be used as fallback
    make: *** [../proj.mk:187: generate_cfu] Error 1
    
    opened by marcmerlin 24
  • Verilator compile in Conda env can't find the compiler

    To reproduce: go to this colab: https://colab.research.google.com/drive/1_GlX-pO4rune8GMIK4q_IuhxpIhnPTni?resourcekey=0-jMbF8wzg0csZu0fxpOy78A&usp=sharing (currently only readable within Google)

    Then select "Runtime" --> "Run all"

    Output:

    make -j -C /content/CFU-Playground/soc/build/sim.proj_template_v/gateware/obj_dir -f Vsim.mk Vsim
    make[3]: Entering directory '/content/CFU-Playground/soc/build/sim.proj_template_v/gateware'
    make[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
    Vsim.mk:13: warning: NUL character seen; rest of line ignored
    ccache x86_64-conda-linux-gnu-c++ -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /content/CFU-Playground/env/conda/envs/cfu-common/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /content/CFU-Playground/env/conda/envs/cfu-common/include -I.  -MMD -I/content/CFU-Playground/env/conda/envs/cfu-common/share/verilator/include -I/content/CFU-Playground/env/conda/envs/cfu-common/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=1 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow     -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /content/CFU-Playground/env/conda/envs/cfu-common/include -ggdb -Wall -O0   -DTRACE_FST -I/content/CFU-Playground/third_party/python/litex/litex/build/sim/core  -std=gnu++14 -Os -c -o veril.o /content/CFU-Playground/third_party/python/litex/litex/build/sim/core/veril.cpp
    ccache: error: Could not find compiler "x86_64-conda-linux-gnu-c++" in PATH
    Vsim.mk:71: recipe for target 'veril.o' failed
    make[3]: *** [veril.o] Error 1
    make[3]: Leaving directory '/content/CFU-Playground/soc/build/sim.proj_template_v/gateware/obj_dir'
    /content/CFU-Playground/third_party/python/litex/litex/build/sim/core/Makefile:38: recipe for target 'sim' failed
    make[2]: *** [sim] Error 2
    make[2]: *** Waiting for unfinished jobs....
    make[4]: Leaving directory '/content/CFU-Playground/soc/build/sim.proj_template_v/gateware/modules/gmii_ethernet'
    cp gmii_ethernet/gmii_ethernet.so gmii_ethernet.so
    make[3]: Leaving directory '/content/CFU-Playground/soc/build/sim.proj_template_v/gateware/modules'
    make[2]: Leaving directory '/content/CFU-Playground/soc/build/sim.proj_template_v/gateware'
    /content/CFU-Playground/soc/sim.mk:56: recipe for target 'run' failed
    make[1]: *** [run] Error 1
    make[1]: Leaving directory '/content/CFU-Playground/soc'
    ../proj.mk:354: recipe for target 'load' failed
    make: *** [load] Error 2
    

    To highlight the error:

    ccache: error: Could not find compiler "x86_64-conda-linux-gnu-c++" in PATH
    

    Looking under env/conda/envs/cfu-common/bin/, I see:

    /content/CFU-Playground/env/conda/bin/x86_64-conda-linux-gnu-ld
    /content/CFU-Playground/env/conda/bin/x86_64-conda_cos6-linux-gnu-ld
    

    and nothing else with an x86_64 prefix.

    opened by tcal-x 23
  • Add support for verilated CFU in Renode

    This is related to #11.

    What's new:

    1. New subdirectory common/renode-verilator-integration, which includes CMakeLists.txt, sim_main.cpp and renode_h.patch. These files are used to build the verilated CFU library. Since they are adapted to work as a Renode plugin, renode_h.patch changes the include path in renode.h to match our custom structure (we download a few files; see 2.).
    2. VerilatorIntegrationLibrary: the files that are part of the library are downloaded to third_party/renode/verilator-integration-library. These are the minimum files needed to build the CFU library, and they must be downloaded because they are not present in Renode portable. As soon as Renode includes them in the portable version, we will drop this workaround.
    3. Renode scripts now generate Verilated.CFUVerilatedPeripheral with the settings required to run it. It is also appended to the predefined scripts, so new scripts should not include the CFU themselves. Our scripts (proj.mk with generate_renode_scripts.py) decide whether the CFU is included in the Renode scripts (if the flag SW_ONLY=1 is passed, the CFU is not added).
    4. Projects added to CI are now tested without any additional build parameters by default. You can add other build variants by adding ci_build_params.txt.{0-9}, but this does not turn off the default tests. This is currently done for mnv2_first.
    5. example_cfu is added to the CI workflow.
    opened by robertszczepanski 20
  • Enable QPI mode for Crosslink-NX Evaluation board

    This PR enables the use of QPI mode on the CNX EVN board. It depends on the changes made in litex-hub/litespi#53 and in enjoy-digital/litex#979. Since we're switching to the flashboot process that LiteX provides, an additional step is required before flashing the software: running the MiSoC image file writer (mkmscimg) to insert the length and CRC32:

    python <path_to_litex>/litex/soc/software/mkmscimg.py -o output.bin --little --fbi <path_to_cfu_proj>/build/software.bin
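The length/CRC32 step can be sketched in C. This assumes our reading of mkmscimg.py in --fbi --little mode: a 4-byte little-endian payload length, then the 4-byte little-endian CRC-32 of the payload (the same CRC as Python's binascii.crc32), then the payload itself. Check this layout against the LiteX checkout you actually use; `crc32_ieee` and `make_fbi_header` are illustrative names, not LiteX functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR
 * 0xFFFFFFFF): the same checksum as Python's binascii.crc32. */
uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, zero otherwise. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store a 32-bit value as little-endian bytes. */
static void put_le32(uint8_t *dst, uint32_t v) {
    dst[0] = (uint8_t)v;
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
}

/* Assumed --fbi --little header: payload length, then payload CRC-32. */
void make_fbi_header(uint8_t header[8], const uint8_t *payload, size_t len) {
    put_le32(header, (uint32_t)len);
    put_le32(header + 4, crc32_ieee(payload, len));
}
```

Under that assumption, the flashed image is the 8-byte header followed by the unmodified software.bin.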
    

    P.S. I've noticed that this repo is not using the upstream version of LiteX but instead a fork provided by @tcal-x. What is the reason behind this, and are there plans to upstream these changes?

    opened by fkokosinski 18
  • Enable running benchmarks on CrossLink NX Evaluation board

    Currently it is not possible to run benchmarks (or any other software) from this repository on the CNX EVN board. This PR adds a new platform (based on the existing hps platform) that enables this by using the on-board SPI flash chip. Building gateware/software (in a project directory, e.g. proj/proj_template):

    UART_SPEED=115200 PLATFORM=cnx_evn TARGET=lattice_crosslink_nx_evn make bitstream software
    

    Flashing/uploading using ecpprog (in a project directory, e.g. proj/proj_template):

    ecpprog -so 2097152 build/software.bin
    ecpprog -S ../../soc/build/cnx_evn.proj_template/gateware/cnx_evn_platform.bit
    
    opened by fkokosinski 18
  • Dynamic clock control between CPU and CFU

    Following the tests done in #514, we want to use the ability to disable/enable clocks to minimize power consumption during inference.

    This PR introduces dynamic clock control between the CPU and the CFU. When the CPU receives confirmation that a command has been accepted (cmd.ready asserted), its clock is disabled until the CFU is ready to respond (rsp.valid asserted). Then the CPU's clock is enabled and the CFU's clock is disabled.
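The clock-switching policy described above can be written as a small cycle-level model, which may help when reasoning about edge cases before touching the RTL. This is a hypothetical C sketch, not the PR's implementation. It additionally assumes that the CFU clock must run whenever a command is offered, and that a response arriving in the same cycle the command is accepted (a combinational CFU) needs no gating at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clock-enable state for the CPU and CFU domains. */
typedef struct {
    bool cpu_clk_en;
    bool cfu_clk_en;
    bool waiting; /* command accepted, response not yet seen */
} clk_ctrl_t;

void clk_ctrl_reset(clk_ctrl_t *c) {
    c->cpu_clk_en = true;
    c->cfu_clk_en = false;
    c->waiting = false;
}

/* Evaluate one cycle from the sampled handshake signals. */
void clk_ctrl_step(clk_ctrl_t *c, bool cmd_valid, bool cmd_ready, bool rsp_valid) {
    if (!c->waiting) {
        /* Assumption: the CFU clock runs whenever a command is offered. */
        c->cfu_clk_en = cmd_valid;
        if (cmd_valid && cmd_ready && !rsp_valid) {
            /* Command accepted, result not ready yet: gate the CPU. */
            c->waiting = true;
            c->cpu_clk_en = false;
        }
    } else if (rsp_valid) {
        /* Response is ready: wake the CPU, gate the CFU. */
        c->waiting = false;
        c->cpu_clk_en = true;
        c->cfu_clk_en = false;
    }
    /* Invariant: at least one domain is always clocked. */
    assert(c->cpu_clk_en || c->cfu_clk_en);
}
```

The invariant check is the kind of property that, if violated in hardware, would leave both domains stopped, which is one thing worth ruling out when a CFU appears to hang.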

    This is a draft for now: even though clock control between the SoC and the CFU works, the CFU hangs for some reason after receiving a command from the CPU. Maybe it requires a few additional setup cycles at system boot?

    There is also an odd anomaly: after boot, power consumption is about 44-45 mW (~38 mW with the CFU clock disabled), but after running some tests (e.g. 1: TfLM Models menu -> 1: HPS models -> g: Golden tests (check for expected outputs)) it rises to around 54 mW and stays at that level after each operation. It drops to 42-44 mW during other executions of CFU functions (e.g. HPS golden tests) but rises again once they finish. It falls back to 44 mW after the reset signal in the CFU clock domain is asserted.

    opened by robertszczepanski 17
  • Use dcache metrics CSRs in benchmarks

    This PR adds display of additional dcache-related metrics, read from the CPU's CSRs, to the benchmarks:

    Running sequential stores benchmark
    Hello, Store!
    Val:28  Cycles: 11389084   Cycles/store: 10
    [Dcache accesses] Before: 3223075888, After: 3224127310, Diff: 1051422
    [Dcache refills]  Before: 952539, After: 952545, Diff: 6
    [Dcache stalls]   Before: 89447457, After: 96635863, Diff: 7188406
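The Diff column is simply After minus Before. A C sketch, assuming the dcache counters are free-running 32-bit values (all the figures above fit in 32 bits): unsigned subtraction gives the right answer even if a counter wraps once between the two reads. `counter_diff` and `print_metric` are hypothetical helpers, and reading the actual CSRs is left out.

```c
#include <stdint.h>
#include <stdio.h>

/* Events between two samples of a free-running 32-bit counter.
 * Unsigned arithmetic is modulo 2^32, so a single wrap is handled. */
uint32_t counter_diff(uint32_t before, uint32_t after) {
    return after - before;
}

/* Print one metric in the same shape as the benchmark output above. */
void print_metric(const char *name, uint32_t before, uint32_t after) {
    printf("[%s] Before: %lu, After: %lu, Diff: %lu\n", name,
           (unsigned long)before, (unsigned long)after,
           (unsigned long)counter_diff(before, after));
}
```

For example, `print_metric("Dcache accesses", 3223075888u, 3224127310u)` prints `[Dcache accesses] Before: 3223075888, After: 3224127310, Diff: 1051422`.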
    
    opened by fkokosinski 16
  • hps_accel: max pool fails golden tests

    hps_accel is exhibiting strange behavior.

    To reproduce:

    1. Use code at PR #450
    2. Build and run on NX/17
    3. From the menu, select 1 (TfLM models), 1 (HPS models), 3 (presence model), g (golden tests) - tests fail
    4. Build and run in simulator (make load PLATFORM=sim) - tests pass.
    opened by alanvgreen 14
  • `mnv2_first` CFU timeout in Renode

    CFU support was recently added to Renode, and I am working on using it in CFU-Playground's CI. example_cfu_v seems to work fine, but I've encountered a problem with the mnv2_first project.

    The verilated peripheral is built using the Renode Verilator Integration repository and the Verilator Integration Library inside Renode's VerilatorPlugin. This generates a library that is then bound to Renode by adding Verilated.CFUVerilatedPeripheral to a platform. After that, whenever an opcode matches the CFU pattern, Renode executes the CFU by calling a function from the verilated peripheral library.

    I've already noticed that functionID is incorrectly retrieved from the instruction pattern, and a fix is about to be merged: it combines funct7 with funct3 as (funct7 << 3) + funct3. The problem is that everything works fine for example_cfu_v but not for mnv2_first. For the mnv2 project, every CFU execution ends with an Operation timeout from our CFU, which means that rsp_valid is never set to 1. I also tried checking rsp_payload_response_ok instead of rsp_valid; then there's no timeout, but all tests fail anyway, so I'm not sure that's a good approach.
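The function ID combination described above can be sketched in C, assuming the standard RISC-V R-type layout (funct3 in bits 14:12, funct7 in bits 31:25). The helper names are hypothetical and are not taken from Renode or CFU Playground.

```c
#include <stdint.h>

/* RISC-V R-type fields:
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]. */
static inline uint32_t insn_funct3(uint32_t insn) { return (insn >> 12) & 0x7u; }
static inline uint32_t insn_funct7(uint32_t insn) { return (insn >> 25) & 0x7Fu; }

/* 10-bit function ID: funct7 in the upper bits, funct3 in the low three. */
uint32_t cfu_function_id(uint32_t insn) {
    return (insn_funct7(insn) << 3) + insn_funct3(insn);
}
```

Since funct3 never exceeds 7, the `+` here is equivalent to a bitwise OR of the two shifted fields.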

    Maybe the CFU expects the CPU to set something more in the case of mnv2_first, and it's not properly handled in execute()? If you have any ideas, I would be grateful for help :)

    FYI @tcal-x

    opened by robertszczepanski 14
  • add framebuffer enable-related settings and utility functions

    Hi,

    In this pull request, I add framebuffer support for testing:

    1. add framebuffer enable-related settings in board_specific_workflows and common_soc.mk.
    2. add simple framebuffer utility functions, including draw line, draw box, draw string and fill box.
    3. add an SPI card enable setting, though I haven't actually tried it yet.
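The draw-line and fill-box helpers listed above could look roughly like the following C sketch on a row-major, 32-bit-per-pixel framebuffer in memory. This is not the code from the pull request: `fb_t` and the `fb_*` names are hypothetical, and the line drawing uses the standard integer Bresenham algorithm with per-pixel clipping.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t *pixels; /* row-major, width * height entries */
    int width, height;
} fb_t;

/* Write one pixel, silently clipping anything off-screen. */
static void fb_set_pixel(fb_t *fb, int x, int y, uint32_t color) {
    if (x < 0 || y < 0 || x >= fb->width || y >= fb->height) return;
    fb->pixels[y * fb->width + x] = color;
}

/* Fill an inclusive rectangle; assumes x0 <= x1 and y0 <= y1. */
void fb_fill_box(fb_t *fb, int x0, int y0, int x1, int y1, uint32_t color) {
    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++)
            fb_set_pixel(fb, x, y, color);
}

/* Bresenham's line algorithm: integer-only, works in all octants. */
void fb_draw_line(fb_t *fb, int x0, int y0, int x1, int y1, uint32_t color) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        fb_set_pixel(fb, x0, y0, color);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; } /* step in y */
    }
}
```

A "draw box" outline could then be four `fb_draw_line` calls along the rectangle's edges.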

    To enable framebuffer output, some add-on modules can be used:

    1. VGA version: please check the following URLs. https://raspi.tv/2014/gert-vga-666-review-and-video https://hackaday.com/2016/02/21/5-vga-for-raspberry-pi/

    2. HDMI version: buy an FPGA board with HDMI output, such as the QMTech Wukong board or the Nexys Video. https://reference.digilentinc.com/learn/programmable-logic/tutorials/nexys-video-hdmi-demo/start

    BR, Akio

    opened by akioolin 14
  • gen2 CFU flakiness

    The gen2 CFU seems to work about 90% of the time. The other 10% of the time, it fails in a fairly, but not completely, consistent manner.

    To reproduce

    1. Build with #383 patched
    2. Run on proto2, selecting menu option "3" (project menu) then "5" (test layer 05)
    3. Keep pressing "5" until experiencing a failure. Typically, between 10 and 50 runs are needed to see the first failure.

    Usually the failure reports "8576 differences", though on some proto2 boards other numbers of failures are sometimes seen.

    Observations:

    In the most common form of failure, examining the first 32 words shows that many of the failures result from output appearing exactly 4 words after it should (i.e. the value expected at index n appears at index n+4). However, not all of the wrong words follow this pattern exactly.

    Building for different target clock speeds, with different seeds, and with/without abc9 did not appear to affect the behaviour.
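One way to quantify the "shifted by 4 words" observation is to count how many mismatches are explained by the expected value showing up `shift` words later in the output. A hypothetical C helper for that check; the names are illustrative and not part of the project's test code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t mismatches;         /* words where actual != expected */
    size_t explained_by_shift; /* of those, expected[i] found at actual[i + shift] */
} shift_report_t;

shift_report_t check_shift(const int32_t *expected, const int32_t *actual,
                           size_t n, size_t shift) {
    shift_report_t r = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (actual[i] == expected[i]) continue;
        r.mismatches++;
        /* Did the value that belonged here land `shift` words later? */
        if (i + shift < n && actual[i + shift] == expected[i]) {
            r.explained_by_shift++;
        }
    }
    return r;
}
```

Comparing `explained_by_shift` against `mismatches` for shifts other than 4 would show whether the offset is really constant.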

    opened by alanvgreen 13
  • Accuracy of perf_get_mcycle() and perf_get_mcycle64() using Arty a7 100T board

    Hi @tcal-x, I've tried to profile some CFU function calls for my accelerator design, as well as for a "test setup". I'm seeing some unusually high cycle counts. For example, using the basic template cfu.v file:

    // Trivial handshaking for a combinational CFU
      assign rsp_valid = cmd_valid;
      assign cmd_ready = rsp_ready;
    
      //
      // select output -- note that we're not fully decoding the 3 function_id bits
      //
      assign rsp_payload_outputs_0 = cmd_payload_function_id[0] ? 
                                               cmd_payload_inputs_1 :
                                               cmd_payload_inputs_0 ;
    

    And using the "test-setup" code in the header file:

    /* Single-cycle CFU test */
      unsigned start_cfu_comb = perf_get_mcycle();
      int test_3 = cfu_op1(0, 1, 2);
      int test_4 = cfu_op0(2, 1, 2);
      int test_5 = cfu_op1(1, 3, 4);
      int test_6 = cfu_op0(0, 4, 5);
      unsigned end_cfu_comb = perf_get_mcycle();
      printf("test_3 %d\n", test_3); // Exp: 2
      printf("test_4 %d\n", test_4); // Exp: 1
      printf("test_5 %d\n", test_5); // Exp: 4
      printf("test_6 %d\n", test_6); // Exp: 4
      // Unsigned subtraction stays correct even if the 32-bit counter wraps.
      printf("Time taken (perf) %u\n", end_cfu_comb - start_cfu_comb);
      printf("Done\n");
      // Halt here; anything placed after this loop would never run.
      while (1) {
      }
    

    When running the above and profiling via perf_get_mcycle() (or perf_get_mcycle64()) on the Arty A7 100T board at a 70 MHz clock rate, I'm getting cycle counts of around 180 (perf_get_mcycle64()) and 92 (perf_get_mcycle()) for this setup. Ideally it should be around 4-6 cycles, so is there an issue with perf_get_mcycle() or perf_get_mcycle64()? I'll also try using custom counters in the CFU file to check. Please let me know soon. Thanks, Bala.
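The expected values in the test code follow from the template CFU: the function ID is funct7 followed by funct3, and bit 0 selects inputs_1 over inputs_0. A C sketch, assuming an R-type instruction on the RISC-V custom-0 opcode, can model that on the host and emit the instruction on target with GNU as's `.insn` directive. The `sketch_cfu_op0`/`sketch_cfu_op1` names are hypothetical stand-ins; the project's own cfu.h is the authoritative definition of `cfu_op0()`/`cfu_op1()`.

```c
#include <stdint.h>

/* Host model of the template CFU: function_id = {funct7, funct3}, and
 * bit 0 selects inputs_1 over inputs_0 (the other bits are ignored). */
static inline uint32_t template_cfu_model(uint32_t funct3, uint32_t funct7,
                                          uint32_t in0, uint32_t in1) {
    uint32_t function_id = (funct7 << 3) | funct3;
    return (function_id & 1u) ? in1 : in0;
}

#if defined(__riscv)
/* On target (assumption: the CFU decodes custom-0): emit an R-type
 * instruction. funct3/funct7 must be integer literals; they are stringified. */
#define SKETCH_CFU_OP(funct3, funct7, a, b) ({                       \
    uint32_t rd_;                                                    \
    __asm__ volatile(".insn r CUSTOM_0, " #funct3 ", " #funct7       \
                     ", %0, %1, %2"                                  \
                     : "=r"(rd_)                                     \
                     : "r"((uint32_t)(a)), "r"((uint32_t)(b)));      \
    rd_; })
#else
/* On the host: fall back to the software model. */
#define SKETCH_CFU_OP(funct3, funct7, a, b) \
    template_cfu_model((funct3), (funct7), (uint32_t)(a), (uint32_t)(b))
#endif

/* Hypothetical wrappers mirroring the cfu_op0()/cfu_op1() calls above. */
#define sketch_cfu_op0(funct7, a, b) SKETCH_CFU_OP(0, funct7, a, b)
#define sketch_cfu_op1(funct7, a, b) SKETCH_CFU_OP(1, funct7, a, b)
```

Keeping a host model next to the intrinsic lets the expected outputs be checked without a board, separately from the cycle-count question.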

    opened by bala122 8
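On RV32, the 64-bit mcycle counter is exposed as two 32-bit CSRs, mcycle (low half) and mcycleh (high half), so a consistent 64-bit read takes several instructions. The sketch below shows the usual high/low/high retry pattern, with the CSR reads abstracted as function pointers so it can run on a host against a mock counter. It is not CFU Playground's perf_get_mcycle64(), whose implementation may differ; on target, the two readers would wrap csrr of mcycleh and mcycle.

```c
#include <stdint.h>

typedef uint32_t (*read32_fn)(void);

/* Read a 64-bit counter split into two 32-bit halves. If the high half
 * changed while the low half was being read, the low half may have
 * wrapped, so retry until both high-half reads agree. */
uint64_t read_split_counter64(read32_fn read_hi, read32_fn read_lo) {
    uint32_t hi, lo;
    do {
        hi = read_hi();
        lo = read_lo();
    } while (read_hi() != hi);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side mock: a counter that advances by one on every CSR read. */
static uint64_t mock_counter;
static uint32_t mock_read_hi(void) { return (uint32_t)(mock_counter++ >> 32); }
static uint32_t mock_read_lo(void) { return (uint32_t)mock_counter++; }
```

Starting the mock just below a 32-bit boundary forces one retry, which exercises the wrap-around path.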
  • Keyword Spotting doesn't fit on Fomu with current Yosys due to memory R/W check (but there's a fix)

    After the recent dependency bump (#619), I thought that it caused the KWS project to no longer fit on Fomu (over the limit by ~80 LCs). But then I was puzzled that the Fomu CI job was still passing.

    I found that the actual difference was which Yosys was being used. In CI (and also running locally after a normal setup), the build actually uses a Yosys v0.14 binary directly downloaded (this is being removed: #623). But during my testing, I had removed the local Yosys v0.14, which resulted in the build instead using the Conda-provided Yosys v0.19, which resulted in significantly higher LC count.

    The increase in LC count is not because of worse optimization; it's because of a new check for simultaneous writes/reads to a memory block, which adds extra logic to ensure proper semantics. There is some discussion about it here: https://github.com/YosysHQ/yosys/pull/3351. If the check and extra logic are not needed, because you know that the design will never read and write the same address during the same cycle, you can add the attribute (* no_rw_check *).

    When I add this attribute to the regfile and ICache memory blocks in VexRiscv, the LC count drops to 100 LCs under the number available. I will check with Charles that this is ok, and if so, how to get the attribute added to VexRiscv verilog generation.

    opened by tcal-x 1
  • Usage of fpga DRAM in synthesis

    Hi @tcal-x, I have a general question about synthesizing the VexRiscv core with the accelerator peripheral on an FPGA. Are the memory resources used in synthesis restricted to flip-flops, BRAM, registers, and other single-cycle-access elements on the FPGA, or is DRAM also used by default? Do we need to write our own memory interface IPs to access DRAM/off-chip memory, or does Vivado handle this by default?

    Essentially, I want my setup to have multi-cycle memory accesses when running on an FPGA, so that with more data reuse in buffers I see performance benefits (because I avoid these high-latency accesses). Thanks, Bala.

    opened by bala122 6
  • symbiflow_synth fails in Yosys Techmap stage

    Hello,

    If this error has been observed before, can someone please point me in the right direction?

    Thanks, Raj

    make[4]: Entering directory '/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/build/digilent_arty.mnv2/gateware'
    symbiflow_synth -t digilent_arty -v /home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/proj/mnv2/cfu.v /home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/third_party/python/pythondata_cpu_vexriscv/pythondata_cpu_vexriscv/verilog/VexRiscv_FullCfu.v /home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/build/digilent_arty.mnv2/gateware/digilent_arty.v -d artix7 -p xc7a100tcsg324-1 -x digilent_arty.xdc > /dev/null
    /home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/env/symbiflow/share/symbiflow/techmaps/xc7_vpr/techmap/cells_map.v:0: ERROR: Can't find object for defparam RAM_EXTENSION_A!
    make[4]: *** [Makefile:34: digilent_arty.eblif] Error 1
    make[4]: Leaving directory '/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/build/digilent_arty.mnv2/gateware'
    Traceback (most recent call last):
      File "./common_soc.py", line 57, in <module>
        main()
      File "./common_soc.py", line 53, in main
        workflow.run()
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/board_specific_workflows/general.py", line 125, in run
        soc_builder = self.build_soc(soc)
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/board_specific_workflows/digilent_arty.py", line 73, in build_soc
        return super().build_soc(soc, **kwargs)
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/board_specific_workflows/general.py", line 102, in build_soc
        soc_builder.build(run=self.args.build, **kwargs)
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/third_party/python/litex/litex/soc/integration/builder.py", line 350, in build
        vns = self.soc.build(build_dir=self.gateware_dir, **kwargs)
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/third_party/python/litex/litex/soc/integration/soc.py", line 1147, in build
        return self.platform.build(self, *args, **kwargs)
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/third_party/python/litex/litex/build/xilinx/platform.py", line 73, in build
        return self.toolchain.build(self, *args, **kwargs)
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/third_party/python/litex/litex/build/xilinx/symbiflow.py", line 241, in build
        _run_make()
      File "/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/third_party/python/litex/litex/build/xilinx/symbiflow.py", line 84, in _run_make
        raise OSError("Error occured during Symbiflow's script execution.")
    OSError: Error occured during Symbiflow's script execution.
    make[3]: *** [/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc/common_soc.mk:115: build/digilent_arty.mnv2/gateware/digilent_arty.bit] Error 1
    make[3]: Leaving directory '/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/soc'
    make[2]: *** [../proj.mk:310: prog] Error 2
    make[2]: Leaving directory '/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/proj/mnv2'

    The yosys version called by Symbiflow underneath is: Yosys 0.19+18 (git sha1 08c319fc3, x86_64-conda-linux-gnu-cc 11.2.0 -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -fdebug-prefix-map=/home/runner/work/conda-eda/conda-eda/workdir/conda-env/conda-bld/yosys_1657668832012/work=/usr/local/src/conda/yosys-0.19_19_g08c319fc3 -fdebug-prefix-map=/home/rajsaktish/repos/all_CFU_space/synth_BG/CFU-FLOW/env/conda/envs/cfu-symbiflow=/usr/local/src/conda-prefix -fPIC -Os -fno-merge-constants)

    opened by rajsaktish 0
  • Add PLL to control clocks

    This introduces a PLL into the HPS design, with an additional option to enable/disable output clocks. It makes power consumption much higher: initial tests with a multimeter show 70 mW when the CFU is disabled, about 30 mW more than the DCC-only version presented in #597.

    CI will fail because nextpnr-nexus requires a small fix to work: an unneeded tile is generated in the output .fasm. If you want to test it locally, build it as usual; prjoxide will fail to generate a bitstream with the error: thread 'main' panicked at 'No enum named PLL_LLC.MODE in tilegroup R28C1_PLL_LLC. Then open the generated .fasm, search for PLL_LLC.MODE, and remove the line that contains it. Run prjoxide again and it should generate the bitstream:

    cd soc/build/hps.hps_accel/gateware
    prjoxide pack hps_proto2_platform.fasm hps_proto2_platform.bit
    
    opened by robertszczepanski 0
Owner: Google