Xilinx Webpack 2015.4 HLx edition

Posted: Tue Dec 29, 2015 1:52 pm
by theover
As mentioned in the "minimal setup" thread here, there does indeed appear to be a free C-based IP design module in the latest Vivado download.

I haven't tried it yet, but I read through the main documentation, which looks promising.

In principle, you can turn a function (and its sub-functions) in a C program into an FPGA design, mostly automatically, and there are HDL libraries for various C functions (including math, though I don't know how much of it, video processing, etc). Vivado will create state machines, and even Zynq interfaces, automatically for your C program turned HDL. Arrays turn into block memories, and there are optimization and source code block functions.

A "minimal" approach with this addition will be very useful, I suppose.

Theo V.

Re: Xilinx Webpack 2015.4 HLx edition

Posted: Sat Jan 02, 2016 11:03 pm
by theover
So I've installed Vivado 2015.4 HLx, and have come to a few conclusions thus far.

vivado_hls works as advertised in a couple of example projects I tried: it turns example C code into readable, sensible Verilog/VHDL files, including special C types for various computation accuracies, and (on my very fast Linux machine) does so pretty fast.

I'd like to make bit files out of these C examples and try them out on the 7010 Zynq of my Parallella board, so I face two issues: getting the "empty example project" from one of the threads here to work on Vivado 2015.4 (I think there are IP locks when just reading the previous version's project), and somehow integrating the vivado_hls results.

It is possible to create an HLx project that creates AXI interfaces, I think including ARM initialization and example driver programs, which I'll look into. Maybe the description of the "empty project" can be adapted to use this.

I tried importing a little IP example from vivado_hls into vivado, which at first sight appears to work, but that leaves me with making an interface adaptation somehow. We'll see if that can work easily enough, or requires intelligently plowing through some Xilinx course material from early HLx just to find it doesn't really work...

Anyhow, it is an interesting prospect to be able to use C code to make accelerated hardware design language and bit files for the Parallella board.


Re: Xilinx Webpack 2015.4 HLx edition

Posted: Sun Jan 03, 2016 1:04 pm
by kirill
I think the easiest thing is to replicate the interface of the "my_mult" IP, which uses AXI-Lite. That way you can keep the whole testing side the same. I think the code below should be equivalent (I only looked at the generated code and have not tried to instantiate the IP; you might need more pragmas to match the interface exactly, check the HLS reference).

Code: Select all
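#include <inttypes.h>

void my_mult_hlx(uint32_t const * in, uint32_t * out)
{
/* No block-level handshake; both ports become AXI-Lite registers. */
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE s_axilite register port=out
#pragma HLS INTERFACE s_axilite register port=in
    uint32_t ab = *in;
    uint16_t a = ab>>16;     /* upper 16 bits: first operand */
    uint16_t b = ab&0xFFFF;  /* lower 16 bits: second operand */

    /* Cast first so the product is computed as unsigned 32-bit,
       not as a (possibly overflowing) signed int. */
    *out = (uint32_t)a*b;
}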
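
Before building any hardware, the packing convention can be checked in plain C on the host. The following is only a sketch: pack_operands, mult_model and check_mult_model are hypothetical names that are not part of the sample project, and mult_model simply mirrors the arithmetic of my_mult_hlx (first operand in the upper 16 bits, second in the lower 16 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit operands into one 32-bit word,
   first operand in the upper half, second in the lower half. */
uint32_t pack_operands(uint16_t a, uint16_t b)
{
    return ((uint32_t)a << 16) | b;
}

/* Hypothetical software model of the multiplier: unpack and multiply. */
uint32_t mult_model(uint32_t ab)
{
    uint16_t a = (uint16_t)(ab >> 16);
    uint16_t b = (uint16_t)(ab & 0xFFFF);
    return (uint32_t)a * b;   /* unsigned 32-bit product */
}

/* Small self-check of the packing and the model. */
void check_mult_model(void)
{
    assert(pack_operands(3, 5) == 0x00030005u);
    assert(mult_model(pack_operands(3, 5)) == 15u);
    /* Largest case still fits in 32 bits: 0xFFFF * 0xFFFF = 0xFFFE0001. */
    assert(mult_model(pack_operands(0xFFFF, 0xFFFF)) == 0xFFFE0001u);
}
```

A model like this can serve as the expected-value side of a testbench, so the same inputs can later be compared against what the board returns.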

Using 2015.4 instead of 2015.3 is no problem at all. You do need to remove the version check in the TCL script. I think I already removed this check from the script, but I still need to update the readme to mention that it works with 2015.4 as well.

I'm working on making an AXI-Stream + DMA version of the same sample IP using HLS (same code as above but with different pragmas). Using DMA is a bit more involved, since you need to get access to physical memory and control the DMA block. But the payoff is much better throughput.

Re: Xilinx Webpack 2015.4 HLx edition

Posted: Sun Jan 03, 2016 2:33 pm
by theover
Indeed, loading the project via the script worked, except for an "already present" warning about some .v file. I'll do it again to check some more things out, because I haven't yet loaded the resulting 2015.4 bit.bin file onto the Parallella board, due to a warning that a group of MIOs was considered a mixture of 1.8 V and 3.3 V pins and was forced to 3.3 V by Vivado.

I'll try some more tonight, probably.

Theo V.

Re: Xilinx Webpack 2015.4 HLx edition

Posted: Tue Jan 05, 2016 1:06 am
by theover
Well, I created and exported IP with the vivado_hls tool from the little C program above, which works: it shows up in the IP list in vivado 2015.4 on the same machine. So I deleted the multiply example block in the "empty project" and created an instance of the HLS-based mult function, which works and makes a block with "HLS" graphics in it. I connected it in place of the old mul block, after which the compile works.

I could load the resulting bit file into the Parallella with the patched device tree, and the board still runs, but the multiply C code from the "empty project" doesn't work. It doesn't hang or anything, but the multiply only gives zero. Now, I had to change the interface addresses by hand, because the vivado_hls tool used different addresses, but more importantly, maybe there is a difference between "slave interface S00_AXI" and "s_axi_AXILiteS", or the different base names (S00_AXI_reg / Reg) are the problem.


Re: Xilinx Webpack 2015.4 HLx edition

Posted: Tue Jan 05, 2016 11:44 am
by kirill
First, make sure you assign the same address to the HLS-based IP in the address editor in vivado as was assigned in the original sample design (7002_0000 with a 4K address space); otherwise you will need to update the devicetree.

Second, you'll need to modify the test app slightly. I looked at the driver code HLS generates, and the register mapping is different from what the test app expects. Basically, the input register is at a byte offset of 0x10 and the output register is at a byte offset of 0x18.

In uio_mult_test.c add these two lines:

Code: Select all
const size_t IN  = 0x10/4;
const size_t OUT = 0x18/4;

Then replace p[0] with p[IN] and p[1] with p[OUT]. This should make it work.
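
To make the index change concrete, here is a minimal sketch of what the register access then looks like. It assumes the pointer (called regs here) refers to the start of the IP's mapped register block as 32-bit words, the way p does in the test app; mult_via_regs is a hypothetical helper name, not code from uio_mult_test.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices of the HLS-generated registers (byte offsets 0x10 and 0x18). */
static const size_t IN  = 0x10/4;
static const size_t OUT = 0x18/4;

/* Hypothetical helper: write the packed operands to the input register
   and read the product back from the output register.
   regs is assumed to point at the mapped register block of the IP. */
uint32_t mult_via_regs(volatile uint32_t *regs, uint16_t a, uint16_t b)
{
    assert(regs != NULL);
    regs[IN] = ((uint32_t)a << 16) | b;  /* first operand high, second low */
    return regs[OUT];
}
```

With the original my_mult IP the same accesses would go to word indices 0 and 1; only the two constants differ.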

Re: Xilinx Webpack 2015.4 HLx edition

Posted: Tue Jan 05, 2016 6:44 pm
by theover
That makes that nice array access a bit more, well, prone to cache problems, I suppose.

But with the above changes, to put it mildly: IT WORKS, so I have happily device-tested my first automated C-to-FPGA design on the Parallella board.


Re: Xilinx Webpack 2015.4 HLx edition

Posted: Wed Jan 06, 2016 8:46 am
by kirill
Good to hear it worked, Theo.

Regarding your comment about cache, it doesn't really make any difference. Here is why:

1. The UIO driver disables caching on the memory it exposes to user space. It's a sensible approach, since most of the time you don't want cache coherency to get in the way; if you care about efficiency you write a custom driver anyway rather than use generic_uio.

2. A cache line on Zynq is 32 bytes wide, which is eight 4-byte registers. Assuming register 0 is cache-line aligned (it has to be anyway), all 8 registers fit into one cache line. So even if you use a custom driver that doesn't disable caching and use a cache-coherent interface to the PL, using registers 0,1 vs 4,6 won't make a difference throughput-wise.
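
The arithmetic in point 2 can be written out as a small sketch. The 32-byte line size and 4-byte register size are the figures given above; the function names are hypothetical, and register 0 is assumed to sit at the start of a cache line.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 32u  /* cache line size stated above */
#define REG_BYTES         4u  /* one 32-bit register */

/* Index of the cache line that holds a given register,
   assuming register 0 is aligned to the start of a line. */
size_t cache_line_of_reg(size_t reg_index)
{
    return (reg_index * REG_BYTES) / CACHE_LINE_BYTES;
}

/* Nonzero if both registers share one cache line. */
int same_cache_line(size_t reg_a, size_t reg_b)
{
    return cache_line_of_reg(reg_a) == cache_line_of_reg(reg_b);
}

/* Self-check of the two layouts discussed in this thread. */
void check_register_layout(void)
{
    assert(same_cache_line(0, 1));   /* original my_mult layout */
    assert(same_cache_line(4, 6));   /* HLS layout: offsets 0x10 and 0x18 */
    assert(!same_cache_line(7, 8));  /* register 8 starts the next line */
}
```

Both register pairs land in line 0, so the extra offset costs nothing in terms of cache lines touched.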

Re: Xilinx Webpack 2015.4 HLx edition

Posted: Tue May 31, 2016 11:00 am
by theover
I didn't update my Linux to the latest version, but I did download the latest vivado_hls, and will see whether it can still create a module that I can connect to the existing project as in this thread.

Maybe it will be interesting to try simple things like creating a memory module in the FPGA (in C that should be easy) that can be read from a Zynq/ARM C program, and some other types of AXI access to the FPGA that might come in handy. It would also be interesting to make a C program able to communicate with user pins; I've not done that yet. There are examples with the Vivado HLx version that cover all kinds of C programs that compile properly, and the latest version, which I wrote about trying in some other thread, does good C-to-Verilog simulation comparisons that appear to work in the free version.
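
As a rough idea of what such a memory module could look like in C, here is an untested sketch. Everything in it is an assumption: the function name hls_mem, the 256-word size, and the choice of putting all ports (including the return value, and with it the block-level control) on AXI-Lite. The pragmas follow the same INTERFACE form used earlier in this thread; whether the tool maps the array to a block memory as hoped would need to be checked in the synthesis report.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256u  /* assumed size, chosen only for illustration */

/* Hypothetical HLS top function: a small word-addressed memory.
   we != 0 writes wdata at addr; the word at addr is always returned. */
uint32_t hls_mem(uint32_t addr, uint32_t wdata, uint32_t we)
{
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE s_axilite port=addr
#pragma HLS INTERFACE s_axilite port=wdata
#pragma HLS INTERFACE s_axilite port=we
    /* Static array keeps its contents between calls; zero-initialized. */
    static uint32_t mem[MEM_WORDS];
    uint32_t idx = addr % MEM_WORDS;  /* wrap out-of-range addresses */

    if (we)
        mem[idx] = wdata;
    return mem[idx];
}

/* Plain-C check of the behaviour, runnable on the host. */
void check_hls_mem(void)
{
    assert(hls_mem(5, 0xDEADBEEFu, 1) == 0xDEADBEEFu);  /* write */
    assert(hls_mem(5, 0, 0) == 0xDEADBEEFu);            /* read back */
}
```

An ordinary compiler ignores the HLS pragmas, so the function can be exercised on the host first and only then handed to the HLS tool.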

Did anyone else get into this subject, or is it considered too spooky to mess with C-to-FPGA compilation?

I know the Verilog it creates is sketchy, contains lots of assignments that are hard to keep track of (at least the previous vivado_hls did), isn't efficient in every way, and even using one trigonometric function can create more HDL than fits in my 7010 board. Still, after the times of Silicon Valley and the Mead & Conway book, it's interesting to finally be able to program a sea of gates using an actual silicon compiler. It should even be easy to put the whole Parallella FPGA project in C, at least as a much shorter program than the HDL defs...