Speed comparison between Epiphany and ARM NEON

Posted:
Mon Oct 14, 2013 4:12 pm
by ati
Hi!
Can you tell me how hand-coded ARM NEON code compares in speed with the equivalent Epiphany code?
Assume the calculations are on 16/32-bit integers and/or floating-point numbers, for example FFT, JPEG decoding, etc.
In other words, how many ARM cores would it take to match the 16 Epiphany cores in these situations?
Another question: is there an assembly instruction set reference manual for the Epiphany processor, and if so, where can I find it?
Thanks in advance,
Ati
Re: Speed comparison between Epiphany and ARM NEON

Posted:
Tue Oct 15, 2013 4:26 pm
by theover
If a known computation or application is being compared, the programs have been properly optimized, and power draw is taken into account, then there is a definitive answer for that application.
I can encode high-quality 28 Mbit/s H.264/MP4 with a camera drawing just a few watts. The MIPS/FLOPS power of my i7 Extreme running 12 threads at 4.5 GHz with GTX 770 CUDA parallel processing can be seriously huge, but it will draw hundreds of watts. So there are three main comparison criteria: can a certain problem be programmed on either type of processor; can it be done quickly (and also: can the programming be done quickly, or is optimizing a hell of a job, done once by the manufacturer for instance); and is it resource efficient (how many chips, how many mm² on the chips, how much power for starting up and running the computations, how much memory is used, etc.).
If I recall correctly, NEON is a parallel unit used in, for instance, video encoding, which is essentially a form of Single Instruction Multiple Data (SIMD) processing, probably with some local memory, and it can make use of the memory connections of the ARM processor, probably including its cache.
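To make the SIMD idea concrete, here is a minimal sketch in C that adds two float arrays, once one element at a time and once four elements per step. The function names add_scalar and add_simd4 are made up for this illustration. The NEON path uses the standard arm_neon.h intrinsics and is only compiled when the compiler defines __ARM_NEON; elsewhere a plain C fallback does the same four additions, so this says nothing about actual speed on either chip.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* SIMD-style version (hypothetical name): four floats per step. */
void add_simd4(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
#if defined(__ARM_NEON)
        /* One instruction each: load 4 lanes, add 4 lanes, store 4 lanes. */
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
#else
        /* Portable fallback so the sketch also builds off ARM. */
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
#endif
    }

    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions should produce identical results; the difference is only in how many elements each instruction handles, which is where a hand-coded NEON routine would get its speed-up over plain ARM code.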
The Epiphany chip, unless I'm very much mistaken, has a number of independent Reduced Instruction Set cores on a shared network, with some local memory, only an indirect connection to the main memory (off-chip via the network), and no special caching.
All of those properties differ. It may be hard to match the area and power efficiency of the NEON unit in the Xilinx chip, but OTOH there are some pretty good chip designers on board at Parallella, I've understood, so I don't know what the score should be, apart from generalities: the scaling should be better with the Epiphany if the network latency and bandwidth are good. It may well be that vastly different efficiency predictions are possible!
T.V.