Hi there,
I am trying to build a custom barrier for the Epiphany PEs. To test the performance of my barrier, I ran some tests and compared it with the default e_barrier. I visited the site with the implementation of e_barrier:
and copied the code of 'e_barrier' so I would have a starting point for my implementation. I named this implementation 'my_barrier'.
I noticed that the e_barrier function in libe-lib.a (e_mutex_barrier.o) is 5 times faster, measured in clock cycles, than my_barrier, even though my_barrier is the code copied from the link above. Note that I use the same barrier variables and the default e_barrier_init function for both.
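A minimal sketch of how such a cycle comparison could be set up, assuming the e-lib CTIMER calls (e_ctimer_set, e_ctimer_start, e_ctimer_get, e_ctimer_stop) and a my_barrier with the same signature as e_barrier. The helper names elapsed_cycles and time_barrier are hypothetical and only for illustration. The device-only part is guarded so that the arithmetic helper can also be compiled on a host.

```c
#include <stdint.h>

#ifdef __epiphany__
#include "e-lib.h"
#endif

/* The Epiphany CTIMER counts DOWN, so elapsed cycles = start - end.
 * Unsigned subtraction keeps the result correct modulo 2^32. */
static inline uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return start - end;
}

#ifdef __epiphany__
/* Common signature of e_barrier and a copied my_barrier (assumption). */
typedef void (*barrier_fn)(volatile e_barrier_t *, e_barrier_t **);

/* Hypothetical helper: average cycles per barrier call over `reps` calls.
 * Every core in the workgroup must call this with the same `reps`,
 * otherwise the barrier never completes. */
uint32_t time_barrier(barrier_fn fn, volatile e_barrier_t *bars,
                      e_barrier_t **tgt_bars, unsigned reps)
{
    uint32_t t0, t1;
    unsigned i;

    if (reps == 0)
        return 0;

    e_ctimer_set(E_CTIMER_0, E_CTIMER_MAX);   /* load the maximum count */
    e_ctimer_start(E_CTIMER_0, E_CTIMER_CLK); /* count clock cycles     */

    t0 = e_ctimer_get(E_CTIMER_0);
    for (i = 0; i < reps; i++)
        fn(bars, tgt_bars);
    t1 = e_ctimer_get(E_CTIMER_0);

    e_ctimer_stop(E_CTIMER_0);
    return elapsed_cycles(t0, t1) / reps;
}
#endif
```

With this, time_barrier(e_barrier, bars, tgt_bars, 100) and time_barrier(my_barrier, bars, tgt_bars, 100) could be compared on each core after e_barrier_init has been called. The measured time also includes waiting for the slowest core, so averaging over many repetitions makes the comparison fairer.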
Where does this difference in performance come from? Are special flags used when compiling the e_mutex_barrier.c file in order to create libe-lib.a?
In my Makefile I use the following flags: -O3 -funroll-loops -ffast-math.