Small update ...
I had a pile of code to sew together in order to make a (more) complete implementation, and I finally sat down to it tonight.
I'm just using the CPU to do the scaling and SAT (summed-area table) generation synchronously, and then using the Epiphany to do the detection. I'm using 5 resample scales from 1/2 to 1/4 inclusive with a 512x512 source, and probing every location.
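For anyone who hasn't met one, here's a rough sketch of what the SAT step buys the detector: once the table is built, the sum over any rectangle costs four lookups. This is plain Python for illustration, not the code I'm actually running; the function names are made up, and the geometric spacing of the 5 scales is only a guess at one reasonable way to step from 1/2 to 1/4.

```python
def build_sat(img):
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            # everything above (previous table row) plus this row so far
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of pixels in the half-open rectangle [x0, x1) x [y0, y1)."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


def geometric_scales(first, last, count):
    """Hypothetical helper: 'count' scales from 'first' to 'last', evenly spaced in ratio."""
    ratio = (last / first) ** (1.0 / (count - 1))
    return [first * ratio ** i for i in range(count)]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = build_sat(img)
scales = geometric_scales(0.5, 0.25, 5)
```

Each scale needs its own resampled image and its own table, which is why this stage adds up when it's all done on the ARM.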
Single-core ARM only takes about 600ms. Using one Epiphany core for the detection stage is a little bit faster, and 4 cores is about 2x faster, but by then it's spending 60% or so of the time just on the scaling and SAT tables.
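A back-of-envelope sketch of why more cores alone won't help much. The numbers are assumptions read off the rough figures above, not measurements: I'm taking "about 2x faster" to mean roughly 300ms total with 4 cores, taking 60% of that as the serial scaling/SAT part, and pretending detection scales perfectly with core count (it won't).

```python
def stage_split(total_ms, serial_fraction):
    """Split a total time into (serial part, parallel part) in milliseconds."""
    serial_ms = total_ms * serial_fraction
    return serial_ms, total_ms - serial_ms


def projected_total(serial_ms, parallel_ms, cores_now, cores_new):
    """Total time if only the parallel part speeds up, assuming ideal scaling."""
    return serial_ms + parallel_ms * cores_now / cores_new


# Assumed figures: ~300ms total on 4 cores, ~60% of it in scaling/SAT on the ARM.
serial_ms, detect_ms = stage_split(300.0, 0.60)

for cores in (4, 8, 16):
    total = projected_total(serial_ms, detect_ms, 4, cores)
    print(f"{cores:2d} cores: ~{total:.0f} ms")
```

Under those assumptions the total can never drop below the serial part, however many cores are thrown at detection, so the scaling and SAT work is the thing to attack next.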
So I'll have to move more onto the EPU before there's anything worth talking about. It's an opportunity to play with pipelined processing anyway, but I think I will come up against SDK limitations.