by mhonman » Wed Oct 09, 2013 6:25 pm
I'm hoping I've got the wrong end of the stick here, but my understanding from earlier discussions is that when many cores attempt DMA to a single target, their writes may be interleaved. If that is true, then each DMA write needs to be atomic - max 64 bits, I guess?
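To make that concrete, here is a minimal sketch in C of how such a message might be packed - the field widths are my own guesses (Epiphany core IDs are 12 bits, local pointers 32 bits), anticipating the fields used in the handshake below:

[code]
#include <stdint.h>

/* One "request to communicate" packed into 64 bits so it can be
 * delivered as a single, indivisible transfer.  Field widths are
 * illustrative guesses, not a fixed format. */
typedef union {
    struct {
        uint32_t coreid : 12;  /* core ID of the sender             */
        uint32_t target : 12;  /* which RPC/service is requested    */
        uint32_t slot   :  8;  /* sender-side slot to match replies */
        uint32_t buffer;       /* address of the sender's buffer    */
    } f;
    uint64_t raw;              /* what actually goes over the wire  */
} rpc_msg_t;
[/code]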
So the receiver would have one DMA channel in permanent slave mode, receiving "requests to communicate" from other cores. These can then be dispatched in an interrupt-driven mode.
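Something along these lines, perhaps - this assumes the e-lib names e_irq_attach()/e_irq_mask() and that the slave channel raises E_DMA0_INT when a message lands; the mailbox layout and enqueue_request() are invented for illustration:

[code]
#include <stdint.h>
#include <e_lib.h>

#define MAILBOX_SLOTS 8

/* Remote cores each write their 64-bit request into a slot here. */
volatile uint64_t mailbox[MAILBOX_SLOTS];

/* Hypothetical hand-off to the main loop's work queue. */
static void enqueue_request(uint64_t msg) { (void)msg; /* ... */ }

/* Dispatch any requests that have arrived; a slot is cleared once
 * its request has been queued for service. */
static void dma0_isr(int irq)
{
    (void)irq;
    for (int i = 0; i < MAILBOX_SLOTS; i++) {
        if (mailbox[i] != 0) {
            enqueue_request(mailbox[i]);
            mailbox[i] = 0;
        }
    }
}

int main(void)
{
    e_irq_attach(E_DMA0_INT, dma0_isr);  /* DMA channel 0 -> ISR       */
    e_irq_mask(E_DMA0_INT, E_FALSE);     /* unmask the DMA0 interrupt  */
    e_irq_global_mask(E_FALSE);          /* enable interrupts globally */
    for (;;) { /* serve queued requests */ }
}
[/code]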
Take the most general case, where A is making an RPC to B - i.e. it's a send-and-receive operation:
A[DMA 1]->B[DMA 0]: RPC request (coreid A, target, slot, result buffer) // max 64 bits
B[DMA 1]->A[DMA 0]: RPC request acknowledge (coreid B, target, slot, request buffer) // max 64 bits
A[DMA 1]->B[RAM]: request parameters placed in buffer in B's internal RAM
.... time passes ....
.... (hopefully nothing goes bump in the night) ....
B[DMA 1]->A[RAM]: results placed in buffer in A's internal RAM
B[DMA 1]->A[DMA 0]: RPC request complete (coreid B, target, slot, result buffer) // max 64 bits
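A rough sketch of what core A's side of that handshake could look like, reusing the rpc_msg_t union from above - all the names are illustrative, e_dma_copy() is the e-lib blocking copy, and B is assumed to deliver its acknowledge and completion each as a single 64-bit store:

[code]
#include <stddef.h>
#include <stdint.h>
#include <e_lib.h>

/* rpc_msg_t: the 64-bit packed message union sketched earlier. */
typedef union {
    struct { uint32_t coreid : 12, target : 12, slot : 8; uint32_t buffer; } f;
    uint64_t raw;
} rpc_msg_t;

/* B writes its acknowledge and completion here, each as a single
 * 64-bit store (so the zero -> non-zero transition is indivisible). */
volatile rpc_msg_t ack, done;

int rpc_call(volatile uint64_t *b_mailbox, rpc_msg_t req,
             const void *params, size_t params_len)
{
    ack.raw  = 0;
    done.raw = 0;

    /* Step 1: request to communicate - one aligned 64-bit write,
     * which is what keeps it safe against interleaving. */
    *b_mailbox = req.raw;

    /* Step 2: spin until B's acknowledge arrives with a buffer address. */
    while (ack.raw == 0) ;

    /* Step 3: DMA the parameters into the buffer B advertised; the
     * address is assumed to already be a global (coreid-qualified) one. */
    e_dma_copy((void *)(uintptr_t)ack.f.buffer, (void *)params, params_len);

    /* Steps 4-5: B DMAs the results into the buffer named in req,
     * then signals completion with another single 64-bit message. */
    while (done.raw == 0) ;
    return 0;
}
[/code]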
There are other handshakes that would take fewer steps; however, the above protocol avoids read operations.
Some precautions to prevent buffer overrun would probably be appreciated by users of such an API.
There is also the tradeoff between DMA setup time and read/write costs to consider - i.e. if writing only a couple of words, it is probably cheaper to do so programmatically than to go through setting up the DMA registers. For reads I really don't know - that will be something interesting to test.
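Something like this could be used to probe the crossover point for writes - the threshold value is a pure placeholder, to be replaced by measurement:

[code]
#include <stddef.h>
#include <stdint.h>
#include <e_lib.h>

/* Hypothetical crossover point below which plain stores beat the
 * cost of setting up the DMA registers; to be found by experiment. */
#define DMA_THRESHOLD_WORDS 8

static void smart_write(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    if (nwords <= DMA_THRESHOLD_WORDS) {
        /* Small transfer: plain stores avoid the DMA setup cost. */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];
    } else {
        /* Large transfer: worth the cost of loading the DMA registers. */
        e_dma_copy(dst, (void *)src, nwords * sizeof(uint32_t));
    }
}
[/code]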