I would like to implement a simple GEMM (single and double precision) as a benchmark on the Parallella. To be portable, the implementation must be in OpenCL. This is my first experience with the Parallella, and the initial results are very frustrating. Any ideas for achieving better results?
Here is the source:
https://www.dropbox.com/s/1sv8a4lu4uio1 ... ar.gz?dl=0
https://www.dropbox.com/s/vfc04w1gz6t3l ... ar.gz?dl=0
Both were built with:
cc -I/usr/local/browndeer_new/include -o gemm_OpenCL gemm_OpenCL.c -L/usr/local/browndeer_new/lib -lcoprthr_opencl -O3 -lm -fopenmp -std=c99
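For context, the operation being benchmarked is the standard GEMM, C = alpha * A * B + beta * C. Below is a minimal plain-C sketch of the single-precision case. It is not the code from the archives above; the function name sgemm_ref and the assumption of square, row-major matrices are only for illustration. Something like this could serve as a host-side reference for checking the OpenCL results.

```c
#include <assert.h>
#include <stddef.h>

/* Naive single-precision GEMM: C = alpha * A * B + beta * C.
 * Illustrative assumptions: square n x n matrices, row-major storage.
 * sgemm_ref is a hypothetical name, not taken from the linked sources. */
void sgemm_ref(size_t n, float alpha, const float *a, const float *b,
               float beta, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t k = 0; k < n; ++k)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

The double-precision variant would be the same loop nest with float replaced by double.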