I'm not sure why you are getting errors. I believe you should use the threaded MPI COPRTHR beta download rather than the COPRTHR source on GitHub.
The kernel already uses e_dma_copy for off-chip DMA transfers (bringing data in and writing results out). The on-chip MPI_Sendrecv_replace routine in the n-body code uses the DMA engine on the back end, although that's an implementation detail the application developer doesn't need to know. Regardless of the copy method, the n-body algorithm is not communication bound for large problems, so even a more efficient copy method would not increase performance significantly (less than 5%).
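To illustrate why a large n-body run isn't communication bound, here is a rough cost model in C. It assumes the common ring-pass scheme, where each core keeps its own block of particles and passes a visiting block to its neighbor (e.g. with MPI_Sendrecv_replace) between rounds. I haven't checked that the COPRTHR example is structured exactly this way, and the names ring_cost and ring_step_cost are made up for illustration. It only counts work; it makes no MPI calls.

```c
#include <assert.h>

/* Hypothetical cost model for one time step of a ring-pass n-body
   algorithm on n_cores cores. Counts work only; no MPI calls. */
typedef struct {
    long interactions; /* pairwise force evaluations done by one core */
    long copied;       /* particles one core receives over the ring */
} ring_cost;

ring_cost ring_step_cost(long n_total, int n_cores)
{
    /* Assumes particles divide evenly across cores. */
    assert(n_cores > 0 && n_total % n_cores == 0);
    long n_local = n_total / n_cores;
    ring_cost c = {0, 0};

    for (int round = 0; round < n_cores; round++) {
        /* Every local particle is evaluated against every particle
           in the block currently visiting this core. */
        c.interactions += n_local * n_local;
        /* Assumes one block shift (e.g. MPI_Sendrecv_replace)
           between rounds, so n_cores - 1 shifts in total. */
        if (round < n_cores - 1)
            c.copied += n_local;
    }
    return c;
}
```

With the 16 Epiphany cores on a Parallella, ring_step_cost(1024, 16) gives 65536 interactions against 960 particles copied per core. The compute-to-copy ratio works out to N/(P-1), so it grows linearly with the problem size, and the speed of the copy matters less and less.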
I see you have some OpenMPI stuff installed. OpenMPI is different from COPRTHR threaded MPI: OpenMPI is strictly for the ARM host cores, not the Epiphany cores.