by mhonman » Wed Oct 09, 2013 6:25 pm
I'm hoping I've got the wrong end of the stick here, but my understanding from earlier discussions is that when many cores attempt DMA to a single target, their writes may be interleaved. If that is true, then each DMA write needs to be atomic - max 64 bits, I guess?
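To make that concrete, here is a minimal sketch in C of how such a message might be packed - the field widths are my own guesses (Epiphany core IDs are 12 bits, local pointers 32 bits), anticipating the fields used in the handshake below:

[code]
#include <stdint.h>

/* One "request to communicate" packed into 64 bits so it can be
 * delivered as a single, indivisible transfer.  Field widths are
 * illustrative guesses, not a fixed format. */
typedef union {
    struct {
        uint32_t coreid : 12;  /* core ID of the sender             */
        uint32_t target : 12;  /* which RPC/service is requested    */
        uint32_t slot   :  8;  /* sender-side slot to match replies */
        uint32_t buffer;       /* address of the sender's buffer    */
    } f;
    uint64_t raw;              /* what actually goes over the wire  */
} rpc_msg_t;
[/code]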
So the receiver would have one DMA channel in permanent slave mode, receiving "requests to communicate" from other cores. These can then be dispatched in an interrupt-driven mode.
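Something along these lines, perhaps - this assumes the e-lib names e_irq_attach()/e_irq_mask() and that the slave channel raises E_DMA0_INT when a message lands; the mailbox layout and enqueue_request() are invented for illustration:

[code]
#include <stdint.h>
#include <e_lib.h>

#define MAILBOX_SLOTS 8

/* Remote cores each write their 64-bit request into a slot here. */
volatile uint64_t mailbox[MAILBOX_SLOTS];

/* Hypothetical hand-off to the main loop's work queue. */
static void enqueue_request(uint64_t msg) { (void)msg; /* ... */ }

/* Dispatch any requests that have arrived; a slot is cleared once
 * its request has been queued for service. */
static void dma0_isr(int irq)
{
    (void)irq;
    for (int i = 0; i < MAILBOX_SLOTS; i++) {
        if (mailbox[i] != 0) {
            enqueue_request(mailbox[i]);
            mailbox[i] = 0;
        }
    }
}

int main(void)
{
    e_irq_attach(E_DMA0_INT, dma0_isr);  /* DMA channel 0 -> ISR       */
    e_irq_mask(E_DMA0_INT, E_FALSE);     /* unmask the DMA0 interrupt  */
    e_irq_global_mask(E_FALSE);          /* enable interrupts globally */
    for (;;) { /* serve queued requests */ }
}
[/code]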
Take the most general case, where A is making an RPC to B - i.e. it's a send-and-receive operation:
A[DMA 1]->B[DMA 0]: RPC request (coreid A, target, slot, result buffer) // max 64 bits
B[DMA 1]->A[DMA 0]: RPC request acknowledge (coreid B, target, slot, request buffer) // max 64 bits
A[DMA 1]->B[RAM]: request parameters placed in buffer in B's internal RAM
.... time passes ....
.... (hopefully nothing goes bump in the night) ....
B[DMA 1]->A[RAM]: results placed in buffer in A's internal RAM
B[DMA 1]->A[DMA 0]: RPC request complete (coreid B, target, slot, result buffer) // max 64 bits
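A rough sketch of what core A's side of that handshake could look like, reusing the rpc_msg_t union from above - all the names are illustrative, e_dma_copy() is the e-lib blocking copy, and B is assumed to deliver its acknowledge and completion each as a single 64-bit store:

[code]
#include <stddef.h>
#include <stdint.h>
#include <e_lib.h>

/* rpc_msg_t: the 64-bit packed message union sketched earlier. */
typedef union {
    struct { uint32_t coreid : 12, target : 12, slot : 8; uint32_t buffer; } f;
    uint64_t raw;
} rpc_msg_t;

/* B writes its acknowledge and completion here, each as a single
 * 64-bit store (so the zero -> non-zero transition is indivisible). */
volatile rpc_msg_t ack, done;

int rpc_call(volatile uint64_t *b_mailbox, rpc_msg_t req,
             const void *params, size_t params_len)
{
    ack.raw  = 0;
    done.raw = 0;

    /* Step 1: request to communicate - one aligned 64-bit write,
     * which is what keeps it safe against interleaving. */
    *b_mailbox = req.raw;

    /* Step 2: spin until B's acknowledge arrives with a buffer address. */
    while (ack.raw == 0) ;

    /* Step 3: DMA the parameters into the buffer B advertised; the
     * address is assumed to already be a global (coreid-qualified) one. */
    e_dma_copy((void *)(uintptr_t)ack.f.buffer, (void *)params, params_len);

    /* Steps 4-5: B DMAs the results into the buffer named in req,
     * then signals completion with another single 64-bit message. */
    while (done.raw == 0) ;
    return 0;
}
[/code]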
There are other handshakes that would take fewer steps; however, the above protocol avoids read operations.
Some precautions to prevent buffer overrun would probably be appreciated by users of such an API.
There is also the tradeoff between DMA setup time and read/write costs to consider - i.e. if writing only a couple of words, it is probably cheaper to do so programmatically than to go through setting up the DMA registers. For reads I really don't know - that will be something interesting to test.
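Something like this could be used to probe the crossover point for writes - the threshold value is a pure placeholder, to be replaced by measurement:

[code]
#include <stddef.h>
#include <stdint.h>
#include <e_lib.h>

/* Hypothetical crossover point below which plain stores beat the
 * cost of setting up the DMA registers; to be found by experiment. */
#define DMA_THRESHOLD_WORDS 8

static void smart_write(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    if (nwords <= DMA_THRESHOLD_WORDS) {
        /* Small transfer: plain stores avoid the DMA setup cost. */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];
    } else {
        /* Large transfer: worth the cost of loading the DMA registers. */
        e_dma_copy(dst, (void *)src, nwords * sizeof(uint32_t));
    }
}
[/code]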