One thing to bear in mind: these are all "guaranteed never to exceed" performance figures (for the Epiphany E16G03 it's actually 1.4 GFlops/core and $3/GFlop).
Application overheads, often related to the memory hierarchy and/or the communication structure, can only push performance below these levels.
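As a rough illustration of why those overheads matter for price/performance: if an application only sustains some fraction of peak, the effective cost per GFlop goes up by the inverse of that fraction. This is a minimal sketch. The 1.4 GFlops/core and $3/GFlop values are the E16G03 figures quoted above. The efficiency values are hypothetical and chosen for illustration only.

```python
PEAK_GFLOPS_PER_CORE = 1.4  # E16G03 peak figure quoted in the post
COST_PER_PEAK_GFLOP = 3.0   # $ per peak GFlop, also from the post


def effective_gflops(peak_gflops: float, efficiency: float) -> float:
    """Sustained GFlops when only `efficiency` (0 < e <= 1) of peak is achieved."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return peak_gflops * efficiency


def cost_per_effective_gflop(cost_per_peak_gflop: float, efficiency: float) -> float:
    """Dollars per *sustained* GFlop: the peak price divided by the achieved fraction."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return cost_per_peak_gflop / efficiency


for eff in (1.0, 0.5, 0.25):  # hypothetical fractions of peak actually sustained
    print(f"{eff:4.0%} of peak: "
          f"{effective_gflops(PEAK_GFLOPS_PER_CORE, eff):.2f} GFlops/core, "
          f"${cost_per_effective_gflop(COST_PER_PEAK_GFLOP, eff):.2f}/GFlop")
```

At 25% of peak, the headline $3/GFlop effectively becomes $12/GFlop.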
Now the realities surrounding these parts:
RK3188
- very high-volume product for mobile devices, so development costs are spread over millions of units sold
- BUT there is no high-speed inter-chip communication capability. If you want more performance than you can get from one RK3188, you're out of luck
TI 66AK2H12
- medium volume (but getting better now that HP are selling them in servers)
- very expensive if you want a board with one, somewhere north of $1000 IIRC
- lots of on-chip memory and/or cache: 18MB vs 512KB for the RK3188 & E16G03
- lots of parallelism in each DSP core, but _very_ hard to exploit due to the VLIW architecture (here's a paper by some experts: http://www.cs.utexas.edu/users/flame/pubs/FLAWN61.pdf). You can't just recompile and enjoy the performance!
- it has a proper communication fabric (SRIO), so it can go multi-chip, but SRIO is quite power-hungry and has relatively high latency
- BTW, this part also has 4 ARM cores, each with a NEON unit, so the overall performance is even higher than you quoted
E16G03
- small volume, hence high overhead costs per chip
- nice cheap board available: the Parallella is an absolute bargain.
- good on-chip and chip-to-chip network; the low latency is very nice for parallel programming
- but that's a bit academic, because AFAIK no-one has built a nice multi-chip, 2D mesh-based Epiphany system!
- applicability is limited by the small per-core RAM and the need to partner it with an FPGA.
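To make the per-core RAM limit concrete, here is a back-of-the-envelope check of how large a square single-precision matrix-multiply tile fits in one core's local memory, and how much compute each loaded byte buys. This is a sketch under stated assumptions:

- The 32KB per core is the 512KB quoted above divided by an assumed 16 cores.
- The 8KB reserved for code and stack is a made-up figure.
- Keeping one tile each of A, B and C resident is just one simple blocking scheme.

```python
import math

BYTES_PER_FLOAT = 4  # single precision


def max_square_tile(local_bytes: int, reserved_bytes: int, tiles_resident: int = 3) -> int:
    """Largest b such that `tiles_resident` b x b float32 tiles fit in the
    memory left after `reserved_bytes` (code, stack, etc.)."""
    usable = local_bytes - reserved_bytes
    if usable <= 0:
        return 0
    # tiles_resident * b^2 * 4 <= usable  =>  b <= sqrt(usable / (4 * tiles_resident))
    return math.isqrt(usable // (BYTES_PER_FLOAT * tiles_resident))


def flops_per_byte_loaded(b: int) -> float:
    """Flops per byte fetched when one b x b tile of A and one of B are loaded
    and multiplied into a resident C tile: 2*b^3 flops / (2 * b^2 * 4 bytes)."""
    return (2 * b**3) / (2 * b * b * BYTES_PER_FLOAT)


LOCAL_BYTES = 512 * 1024 // 16  # assumption: 512KB total spread over 16 cores
RESERVED_BYTES = 8 * 1024       # hypothetical code/stack reserve

for resident in (3, 6):  # 6 ~ double-buffering all three tiles
    b = max_square_tile(LOCAL_BYTES, RESERVED_BYTES, resident)
    print(f"{resident} tiles resident: b = {b}, {flops_per_byte_loaded(b):.2f} flops/byte")
```

Compute per byte loaded only grows linearly with the tile size `b`. Small per-core memory therefore means more traffic over the network for the same amount of arithmetic.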
I've compared the TI C6678 against the Epiphany for matrix multiplication (without using the tricks in the paper referenced above), and the price/performance comparison is not in Adapteva's favour. How I wish it were different...
However, if you have an application with "perfect" data locality (I think someone mentioned neural networks as an example), simple logic, and a need for > 20 GFlops at low power consumption, there's not much around that can compete with the Epiphany.
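Assuming 16 cores per chip, the "> 20 GFlops" threshold sits close to one chip's 22.4 GFlops peak. Any shortfall from peak therefore pushes you onto multiple chips, which is where the chip-to-chip network matters. This is a rough sketch. The 1.4 GFlops/core figure is from the post. The 16 cores per chip and the efficiency values are assumptions for illustration.

```python
import math

PEAK_GFLOPS_PER_CORE = 1.4  # E16G03 peak figure quoted in the post
CORES_PER_CHIP = 16         # assumption: an "E16" part has 16 cores


def cores_needed(target_gflops: float, efficiency: float) -> int:
    """Cores required to sustain `target_gflops` if each core achieves
    `efficiency` (0 < e <= 1) of its peak."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(target_gflops / (PEAK_GFLOPS_PER_CORE * efficiency))


def chips_needed(target_gflops: float, efficiency: float) -> int:
    """Number of (assumed 16-core) chips needed for the same target."""
    return math.ceil(cores_needed(target_gflops, efficiency) / CORES_PER_CHIP)


for eff in (1.0, 0.75, 0.5):  # hypothetical sustained fractions of peak
    print(f"{eff:.0%} of peak: {cores_needed(20.0, eff)} cores, "
          f"{chips_needed(20.0, eff)} chip(s)")
```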