Parallella Community

by **aolofsson** » Mon Aug 03, 2015 9:13 pm

So your routine on a simple scalar Epiphany core at 600MHz runs 39% faster than FFTW running on a 4-way A9 core at 667MHz?
I'd say that's pretty darn impressive!

Andreas

by **tnt** » Tue Aug 04, 2015 7:43 am

Yeah, I'm pretty happy about it.

The big advantages of the Epiphany in this case are:
- Large register file: except for the FFT data loads/stores, there is no memory access for temporary results. Despite using loop pipelining and processing 4 data points per loop iteration (2 radix-2 ops in parallel), I only ever use registers, and only the "caller-saved" ones at that, so I don't even need to save/restore them.
- BITR opcode: infinitely useful for this :p
- Easy-to-predict low-level behavior: because I can understand exactly how the CPU will execute things, I can tailor the operations by hand much better. Optimizing for ARM (or even worse, Intel) has so many rules to follow that I can't keep them all in my head ...
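
For readers wondering why a bit-reverse opcode matters here: an in-place radix-2 FFT needs its data reordered into bit-reversed index order, and without hardware support that costs a small loop per index. The sketch below is a portable C stand-in, not the routine discussed in this thread; the function names `bit_reverse` and `bit_reverse_permute` are made up for illustration, and it assumes BITR does what its name suggests (reverses the bit order of a register), so that on the Epiphany the inner loop would collapse to roughly one instruction plus a shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the low `bits` bits of x in software.
 * Hypothetical helper: a hardware bit-reverse instruction would replace
 * this whole loop. */
uint32_t bit_reverse(uint32_t x, unsigned bits)
{
    assert(bits <= 32);
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (x & 1u);  /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}

/* Reorder n = 2^log2n complex samples (split re/im arrays) into
 * bit-reversed index order, as a radix-2 FFT needs on input or output. */
void bit_reverse_permute(float *re, float *im, unsigned log2n)
{
    size_t n = (size_t)1 << log2n;
    for (size_t i = 0; i < n; i++) {
        size_t j = bit_reverse((uint32_t)i, log2n);
        if (i < j) {              /* swap each pair only once */
            float t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
}
```

For an 8-point transform (`log2n = 3`), index 1 (binary 001) swaps with index 4 (binary 100) and index 3 with index 6, while indices 0, 2, 5 and 7 stay in place.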

Next step will probably be to extend this to higher-point FFTs using multiple cores. (The current one uses local memory only, so you can do at most 2048 points, or more realistically 1024 when using double-buffering.)
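
The 2048 / 1024 point limits follow from a quick memory budget. The sketch below is only back-of-the-envelope arithmetic under assumptions that are not stated in the thread: 32 KB of local memory per core and single-precision complex samples (8 bytes per point). The names `fft_buffer_bytes` and `bytes_left` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions (not from the thread): 32 KB local memory per core,
 * 8 bytes per complex point (two 4-byte floats). */
enum { LOCAL_MEM_BYTES = 32 * 1024, BYTES_PER_POINT = 8 };

/* Bytes needed for `buffers` data buffers of `points` complex samples. */
size_t fft_buffer_bytes(size_t points, unsigned buffers)
{
    return points * BYTES_PER_POINT * buffers;
}

/* Local memory left over for code, twiddle factors and stack. */
size_t bytes_left(size_t points, unsigned buffers)
{
    size_t used = fft_buffer_bytes(points, buffers);
    assert(used <= LOCAL_MEM_BYTES);
    return LOCAL_MEM_BYTES - used;
}
```

Under these assumptions a single 2048-point buffer takes 16 KB, and double-buffering it would consume the entire 32 KB with nothing left for code or twiddles. Two 1024-point buffers take 16 KB in total and leave the other half free, which matches the "more realistically 1024" estimate.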

by **aolofsson** » Tue Aug 04, 2015 12:26 pm

That's great to hear! Looking forward to your input in the following topic:

viewtopic.php?f=23&t=3127

Very Fast Fourrier Transform