What you are measuring is not the time your program takes to run on the Epiphany.
Instead, you are measuring the time it takes to copy your program into the Epiphany. Copying the code into 16 cores naturally takes about 16 times as long as copying it into a single core, since there is 16 times as much data to transfer.
You cannot measure the Epiphany execution time directly from the host. The usual approach is to keep a flag in shared memory that the core sets when it has finished, and to poll that flag from the host in a loop. You then measure the time from when e_load_group() returns until your core sets the flag.
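Here is a minimal, self-contained sketch of that timing pattern. It does not use the eSDK: the Epiphany core is replaced by a hypothetical `simulated_core` thread, and the shared-memory flag by a C11 atomic, so the code runs on any POSIX host. On real hardware the host would instead read the flag word from shared memory (for example with `e_read()` on a buffer obtained from `e_alloc()`), and the clock would start right after `e_load_group()` returns, assuming the cores are started by that call. The names `time_until_flag`, `simulated_core` and `done_flag` are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Stand-in for the completion flag in shared memory (hypothetical).
   On a Parallella the host would read this word via e_read(). */
static atomic_int done_flag;

/* Hypothetical simulated "core": works for work_ms, then sets the flag. */
static void *simulated_core(void *arg)
{
    long work_ms = *(const long *)arg;
    struct timespec ts = { work_ms / 1000, (work_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);          /* pretend to compute */
    atomic_store(&done_flag, 1);   /* signal completion to the host */
    return NULL;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Launch the simulated core and return the seconds elapsed until it sets
   the flag, or -1.0 if the thread could not be created. As with timing
   after e_load_group(), the clock starts only once the launch call has
   returned, so any loading/copying cost is excluded. */
double time_until_flag(long work_ms)
{
    assert(work_ms >= 0);
    pthread_t core;
    atomic_store(&done_flag, 0);                /* clear flag before launch */
    if (pthread_create(&core, NULL, simulated_core, &work_ms) != 0)
        return -1.0;
    double start = now_seconds();
    const struct timespec poll_interval = { 0, 100000L };  /* 100 us */
    while (atomic_load(&done_flag) == 0)
        nanosleep(&poll_interval, NULL);        /* poll the flag in a loop */
    double elapsed = now_seconds() - start;
    pthread_join(core, NULL);                   /* work_ms stays valid until here */
    return elapsed;
}
```

Calling `time_until_flag(50)` should report roughly 0.05 seconds, i.e. the (simulated) execution time, plus at most one poll interval of latency. The short sleep in the polling loop keeps the host from spinning a CPU at 100%; with a real board you would choose the interval so it is small compared with the kernel's expected run time.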