by theover » Mon Jan 02, 2017 8:25 pm
How about efficiency? I mean, bandwidth and nomenclature here aren't necessarily optimal: it's DMA in the sense that it transfers data from one memory or device to another, but the data has to flow over the Epiphany network, through the FPGA serializer blocks, the Epiphany memory-to-network unit, and the (most appropriately so called) DMA unit of the ARM Zynq processor.
I get that accessing a large in-memory data set as columns or rows of a fixed-size array holds some interest, but the operations involved don't carry the same tradeoffs as a "regular" DMA engine, where the concerns are memory banks and pages, startup time, possibly the data element size, and the cache provisions that can accelerate certain parts of the data access.
Purely as a tool to instruct the Parallella's infrastructure to feed you a stride vector, fine, of course.
Theo V.