How about computing a cosine in FPGA

Using Zynq Programmable Logic and Xilinx tools to create custom board configurations

Post by theover » Tue Jun 14, 2016 2:30 pm

This is not a question thread, just a result I thought others might find encouraging. Using the latest (2016.1) Vivado_HLS (WebPACK version), I was able to take a cosine function (cosf()) with a single-precision floating-point angle as input and a single-precision output, and get it through the automated C-to-Verilog tool chain, including an AXI-lite interface. It compiled to this amount of FPGA space on my 7010 board:

Resource   Used   Available   Utilization %
LUT        2163   17600       12.29
LUTRAM      151    6000        2.52
FF         1838   35200        5.22
DSP           9      80       11.25
IO           53     100       53.00
BUFG          1      32        3.13

I also made a test program that works with the project from Kirrill (see viewtopic.php?f=51&t=3297). It runs on the Parallella (at least on the Zynq part of it) and communicates with the FPGA accelerator function.

The accelerator function put into Vivado_HLS is this:
Code:
#include <inttypes.h>
#include "Testmul1/cpp_math.h"   /* project header, not shown; presumably defines data_t and pulls in math.h */

/* Single-precision cosine of an angle in radians */
data_t cpp_math(data_t angle) {
    data_t c = cosf(angle);
    return c;
}


/* Top-level function: both ports become AXI-lite registers */
void my_mult_hlx(uint32_t const * in, uint32_t * out)
{
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE s_axilite register port=out
#pragma HLS INTERFACE s_axilite register port=in
    /* the unions reinterpret the 32-bit register words as floats and back */
    union {
        uint32_t ui;
        data_t   d;
    } un1, un2;

    un1.ui = *in;
    un2.d  = cpp_math(un1.d);
    *out   = un2.ui;
}
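
The unions in my_mult_hlx do not convert anything; they only reinterpret the 32 bits of a register word as a float and back. On the ARM side the same reinterpretation can be done with memcpy, which sidesteps pointer-cast aliasing problems. A minimal sketch (the helper names float_to_bits and bits_to_float are made up for this illustration and are not part of the project):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not from the original project. */

/* Return the IEEE-754 bit pattern of a single-precision float. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the 4 bytes unchanged */
    return u;
}

/* Interpret a 32-bit word (e.g. read from a register) as a float. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, float_to_bits(1.0f) gives 0x3f800000, which is the word that appears in the output register for an input angle of 0.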


After setting up kernel support and loading the final .bit (.bin) file, this is the test program:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <inttypes.h>
#include <math.h>

/* AXI-lite register offsets, as indices into an array of 32-bit words */
const size_t IN  = 0x10/4;
const size_t OUT = 0x18/4;

/* compile with   gcc -o uio_mult_Csin_ar -std=c99 -Wall -g uio_mult_Csin_ar.c -lm */
/* compares cosine computations from ARMs and FPGA project */

/* reinterpret the bits of a float as a 32-bit word and back */
typedef union { float f; uint32_t u; } fbits_t;

int main(int argc, char *argv[])
{
    float af, bf, control, maxdiff = 0.0f;
    fbits_t conv;
    size_t len = 1<<(10+2); //4K

    int fd = open("/dev/uio0", O_SYNC|O_RDWR);

    if (fd < 0) {
        perror("Failed to open /dev/uio0");
        fprintf(stderr, "  Does it exist?\n"
                        "  Check permissions\n"
                        "  Check devicetree\n");
        return -1;
    }

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (mem == MAP_FAILED) {
        perror("Can't map memory");
        close(fd);
        return -2;
    }

    volatile uint32_t *p = (volatile uint32_t *)mem;

    printf("Initial memory state:\n");
    printf("%08" PRIx32 " %08" PRIx32 " \n", p[IN], p[OUT]);

    for (int i = -1000000; i <= 1000000; i++) {
        af = 2.0*3.1415926535*((float) i)/1000000.0;

        /* write the bit pattern of the angle to the input register */
        conv.f = af;
        p[IN] = conv.u;

        /* The core has no handshake (ap_ctrl_none), so the output register
           is read ten times; the extra reads only add reading time. */
        for (int k = 0; k < 10; k++)
            conv.u = p[OUT];
        bf = conv.f;

        control = cosf(af);
        if (maxdiff < fabsf(bf-control)) maxdiff = fabsf(bf-control);

        if ((i%40000) == 0) {
            printf("#% 08d in=% 05f  (%08" PRIx32 "=>%08" PRIx32 ") % 5f expect: % 5f (diff=% e max % e)\n",
                   i, af, p[IN], conv.u, bf, control, bf-control, maxdiff);
        }
    }

    printf("Final memory state:\n");
    for (size_t j = 0; j < 4; ++j) { printf("%08" PRIx32 " ", p[IN+j]); }
    printf("\n");

    //cleanup
    munmap(mem, len);
    close(fd);

    return 0;
}


And it works. This is the output from the 7010 board:
Code:
parallella@parallella:~/Kir/test_app$ time ./uio_mult_Csin_ar
Initial memory state:
40c90fdb 3f7fffff
#-1000000 in=-6.283185  (c0c90fdb=>3f7fffff)  1.000000 expect:  1.000000 (diff=-5.960464e-08 max  5.960464e-08)
#-0960000 in=-6.031858  (c0c104fb=>3f77f511)  0.968583 expect:  0.968583 (diff= 0.000000e+00 max  5.960464e-08)
....
#-0120000 in=-0.753982  (bf4104fb=>3f3a9db0)  0.728969 expect:  0.728969 (diff= 0.000000e+00 max  1.192093e-07)
#-0080000 in=-0.502655  (bf00adfd=>3f6055a2)  0.876307 expect:  0.876307 (diff=-5.960464e-08 max  1.192093e-07)
#-0040000 in=-0.251327  (be80adfd=>3f77f511)  0.968583 expect:  0.968583 (diff= 0.000000e+00 max  1.192093e-07)
# 0000000 in= 0.000000  (00000000=>3f800000)  1.000000 expect:  1.000000 (diff= 0.000000e+00 max  1.192093e-07)
# 0040000 in= 0.251327  (3e80adfd=>3f77f511)  0.968583 expect:  0.968583 (diff= 0.000000e+00 max  1.192093e-07)
# 0080000 in= 0.502655  (3f00adfd=>3f6055a2)  0.876307 expect:  0.876307 (diff=-5.960464e-08 max  1.192093e-07)
# 0120000 in= 0.753982  (3f4104fb=>3f3a9db0)  0.728969 expect:  0.728969 (diff= 0.000000e+00 max  1.192093e-07)
# 0160000 in= 1.005310  (3f80adfd=>3f092bf1)  0.535827 expect:  0.535827 (diff= 0.000000e+00 max  1.192093e-07)
# 0200000 in= 1.256637  (3fa0d97c=>3e9e3778)  0.309017 expect:  0.309017 (diff=-2.980232e-08 max  1.192093e-07)
...
# 0840000 in= 5.277875  (40a8e45b=>3f092bee)  0.535827 expect:  0.535827 (diff=-5.960464e-08 max  1.192093e-07)
# 0880000 in= 5.529203  (40b0ef3b=>3f3a9daf)  0.728969 expect:  0.728969 (diff= 0.000000e+00 max  1.192093e-07)
# 0920000 in= 5.780530  (40b8fa1b=>3f6055a2)  0.876307 expect:  0.876307 (diff=-5.960464e-08 max  1.192093e-07)
# 0960000 in= 6.031858  (40c104fb=>3f77f511)  0.968583 expect:  0.968583 (diff= 0.000000e+00 max  1.192093e-07)
# 1000000 in= 6.283185  (40c90fdb=>3f7fffff)  1.000000 expect:  1.000000 (diff=-5.960464e-08 max  1.192093e-07)
Final memory state:
40c90fdb 00000000 3f7fffff 00000001

real   0m4.403s
user   0m4.390s
sys   0m0.000s
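
The largest difference in this run, 1.192093e-07, equals 2^-23. For results between 0.5 and 1, one unit in the last place (ULP) of a single-precision float is 2^-24 (5.960464e-08), so if the maximum occurred in that range it corresponds to two ULPs. One way to look at this directly is to compare the bit patterns. A small sketch, assuming both values are finite; the function name ulp_distance is invented here and is not part of the test program:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: how many representable floats lie between a and b.
   Assumes both inputs are finite (no NaN/Inf handling). */
static uint32_t ulp_distance(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    /* Map the sign-magnitude bit patterns onto one monotonic unsigned
       scale, with +0 and -0 both landing on 0x80000000. */
    ua = (ua & 0x80000000u) ? 0x80000000u - (ua & 0x7fffffffu)
                            : 0x80000000u + ua;
    ub = (ub & 0x80000000u) ? 0x80000000u - (ub & 0x7fffffffu)
                            : 0x80000000u + ub;

    return (ua > ub) ? ua - ub : ub - ua;
}
```

In the first output line the FPGA returned 3f7fffff where the ARM computed 1.0 (3f800000), which is a distance of one ULP.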


So this acceleration might well be useful for my audio projects!
theover
 
Posts: 174
Joined: Mon Dec 17, 2012 4:50 pm

Re: How about computing a cosine in FPGA

Post by theover » Wed Jun 15, 2016 10:27 pm

Today I tried to fit a double-precision version, (double) cos((double) angle), into the Zynq FPGA. At the expense of about half the LUTs, it fits, and (with a similar test program) it appears to agree with the ARM's double-precision math to within about 1e-16. It computes double cosine values at roughly CD audio rate (about 44.1 kHz), but I read there are pragmas to do pipelining.

T.V.

Re: How about computing a cosine in FPGA

Post by theover » Tue Jun 21, 2016 2:38 pm

I've spent a few days studying the Vivado_HLS tutorial in:

ug871-vivado-high-level-synthesis-tutorial.pdf

and part of the manual in:

ug902-vivado-high-level-synthesis.pdf

And after trying some of the examples and having loaded a few examples in the existing Parallella project referenced above, it is safe to say there is serious potential in Vivado2016.1 + Vivado_HLS to get a C function to act as a piece of FPGA code that can also be connected up to the AXI (in my case "lite") bus to communicate with the Zynq's ARM cores.

The Tcl interpreter in Vivado can be extended with external code, and the IP library is considerable and contains powerful blocks (FFT, CORDIC, memory constructs, etc.) that in this version are mostly free to use for WebPACK users! That's a big thumbs up for Xilinx! The RTL simulator also works both in Vivado_HLS and in the main Vivado (though I don't know whether it is possible to connect it to the "logic analyzer" block that is available from the IP library), and the C code cross-check, that is, compiling and simulating the plain C version against the hardware-compiled version, works smoothly and efficiently.

So after doing some of the tutorial examples in practice, I want to find out whether, with the right directives, a C function can also be used to do user I/O and to connect with self-made IP blocks.

Anyhow, it's recommended for those who are into using the Zynq as a fast(er)-turnaround prototyping machine.

T.

Re: How about computing a cosine in FPGA

Post by NeilKeiding » Tue Aug 09, 2016 3:52 pm

Hi, I am new here. I want to ask you something about this: "And after trying some of the examples and having loaded a few examples in the existing Parallella project referenced above, it is safe to say there is serious potential in Vivado2016.1 + Vivado_HLS to get a C function to act as a piece of FPGA code that can also be connected up to the AXI (in my case "lite") bus to communicate with the Zynq's ARM cores." Can you please tell me which examples you used?
NeilKeiding
 
Posts: 1
Joined: Sun Aug 07, 2016 6:18 pm

