Parallella Community

by **SKyd3R** » Wed Nov 15, 2017 3:09 pm

Hello there!

I'm doing some experiments with the behavior of the eMesh and its predictability:

When one core stores data in another core's memory, I get predictable behavior. The eMesh can send data at a rate of 1.5 cycles per message, so if I store 1000 pieces of data, the delay until they reach the target core is about 1500+ cycles. This is what I expected from the documentation.

Things get interesting when two cores send messages to a third core and share at least one eMesh link. In this example, cores 0x0 and 0x1 are the senders and core 0x3 is the receiver. The time until the receiver has the 1000 messages differs between the senders: 2226 cycles for core 0x0 and 1738 cycles for core 0x1.

I don't understand why the cores don't see the same delay, given that the arbitration is round robin, or why the delay isn't closer to the total number of messages times the eMesh rate: (1000 + 1000) x 1.5 = 3000 cycles.
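For reference, the arithmetic behind that expectation can be written out as a small sketch. The variable names are illustrative, and the 1.5 cycles per message figure is simply the rate quoted above, not an independently measured value.

```python
# Figures quoted in the post; all names here are illustrative only.
CYCLES_PER_MESSAGE = 1.5     # eMesh rate observed with a single sender
MESSAGES_PER_SENDER = 1000
SENDERS = 2

# One sender alone: 1000 messages at 1.5 cycles each.
single_sender_cycles = MESSAGES_PER_SENDER * CYCLES_PER_MESSAGE

# Both senders fully serialized on the shared link.
serialized_cycles = SENDERS * MESSAGES_PER_SENDER * CYCLES_PER_MESSAGE

# Measured completion times per sender (cycles).
measured = {"0x0": 2226, "0x1": 1738}

# How far each measurement falls below the fully serialized estimate.
shortfall = {core: serialized_cycles - cycles for core, cycles in measured.items()}

print(f"single sender:    {single_sender_cycles:.0f} cycles")
print(f"fully serialized: {serialized_cycles:.0f} cycles")
for core, gap in shortfall.items():
    print(f"core {core}: {measured[core]} cycles ({gap:.0f} below the serialized estimate)")
```

Both measurements land between the single-sender figure and the fully serialized one, which is the gap the question is about.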

All the sending code is written in assembly to avoid any unexpected behavior.

Any clue?

Thanks!

by **jar** » Wed Nov 15, 2017 5:29 pm

Hi!

Assumed core layout:
0x0 0x1 0x2 0x3
0x4 0x5 0x6 0x7
0x8 0x9 0xa 0xb
0xc 0xd 0xe 0xf

Cores 0x0, 0x1, and 0x3 are on the same row. Do you observe the same behavior when sending data to a core from a neighboring row? Say, cores 0x1 and 0x4 both sending to 0x0?

Without contention, you would expect cores 0x0 and 0x1 to each take ~1500 cycles. So they were delayed by 726 and 238 cycles, respectively. The sum of the delays is suspiciously close to 1000 cycles, which is the additional number of messages sent.

I believe the receiving core should be able to receive messages at 1 cycle per message.
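The numbers above can be checked with a short sketch. The 1 cycle per message receive rate is only the belief stated here, not a confirmed figure, so the bound it produces is an assumption; the names are illustrative.

```python
# Uncontended baseline: ~1000 messages at 1.5 cycles each.
BASELINE_CYCLES = 1500

# Measured completion times per sender (cycles), from the first post.
measured = {"0x0": 2226, "0x1": 1738}

# Extra cycles each sender saw compared with the uncontended baseline.
extra = {core: cycles - BASELINE_CYCLES for core, cycles in measured.items()}
total_extra = sum(extra.values())

# ASSUMPTION: the receiver accepts one message per cycle.
RECEIVE_CYCLES_PER_MESSAGE = 1
total_messages = 2 * 1000
receiver_bound = total_messages * RECEIVE_CYCLES_PER_MESSAGE

print(extra)           # {'0x0': 726, '0x1': 238}
print(total_extra)     # 964
print(receiver_bound)  # 2000
```

Under that assumption, 2000 messages could not all arrive in fewer than 2000 cycles, and the slower sender's 2226 cycles is consistent with such a bound.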

by **SKyd3R** » Thu Nov 16, 2017 12:05 pm

Hi!

Thanks a lot for your answer!

I ran that example and it gives nearly the same timings:
0x1: 2230
0x4: 1742
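A quick side-by-side of the two runs, as a sketch with illustrative names, using only the cycle counts reported in this thread:

```python
# Completion times in cycles, as reported in this thread.
first_run = {"slow": 2226, "fast": 1738}    # senders 0x0 and 0x1, receiver 0x3
second_run = {"slow": 2230, "fast": 1742}   # senders 0x1 and 0x4, receiver 0x0

# Gap between the slower and the faster sender within each run.
gap_first = first_run["slow"] - first_run["fast"]
gap_second = second_run["slow"] - second_run["fast"]

# Shift of each timing from the first run to the second.
shift = {key: second_run[key] - first_run[key] for key in first_run}

# Extra cycles over the ~1500-cycle uncontended baseline in the second run.
BASELINE_CYCLES = 1500
extra_second = {key: cycles - BASELINE_CYCLES for key, cycles in second_run.items()}

print(gap_first, gap_second)        # 488 488
print(shift)                        # {'slow': 4, 'fast': 4}
print(sum(extra_second.values()))   # 972
```

Both timings moved by the same 4 cycles, so the gap between the two senders is 488 cycles in either layout, and the extra delays again add up to a bit under 1000 cycles.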

I think your point is that each message takes 1.5 cycles on its own, but that traffic from other cores adds a delay of 1 cycle per message on shared resources. That could make sense, but I still don't understand why the delay added by the surrounding traffic differs from the per-message delay.

Assuming your point is right, the question remains of why the delay isn't the same (or at least closer) for each core. Even when the distance is the same, as in the case you proposed, one core sees a heavier delay than the other.
