Everyone,
Great discussion! I will try to feed some of these questions back into the manual.
More eMesh design details (much of this was already correctly guessed in the thread, but I figure it can't hurt to repeat it):
Arbitration:
On every outgoing link (south, east, west, north) there is basically a 4-to-1 mux whose inputs (one of which carries the traffic from the core at that node) compete for the outgoing link. The arbitration scheme is round robin. "Fixed priority arbitration" refers to everything that happens inside the eCore or at the interface between the eCore and the eMesh.
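To make the round robin behavior concrete, here is a minimal software sketch of one output port's arbiter. It is only an illustration: the class name, the input ordering, and the "start searching just after the last winner" pointer policy are assumptions, not a description of the actual hardware.

```python
class RoundRobinArbiter:
    """Toy model of the arbiter in front of one outgoing link.

    Assumption: priority rotates so that the input just after the most
    recent winner is checked first. The real pointer policy is not
    described in this post.
    """

    def __init__(self, inputs):
        self.inputs = list(inputs)
        # Start so that the first listed input is checked first.
        self.last = len(self.inputs) - 1

    def grant(self, requests):
        """Return the input that wins this cycle, or None if nobody asks.

        `requests` is the set of input names that want the link.
        Every requester that is not returned has to wait.
        """
        n = len(self.inputs)
        for offset in range(1, n + 1):
            idx = (self.last + offset) % n
            if self.inputs[idx] in requests:
                self.last = idx
                return self.inputs[idx]
        return None


# East output port: the four inputs that can compete for it.
east = RoundRobinArbiter(["core", "west", "north", "south"])
everyone = {"core", "west", "north", "south"}
winners = [east.grant(everyone) for _ in range(5)]
```

With all four inputs requesting every cycle, each one gets the link once every four cycles, which is the fair-share case.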
Routing:
Packets always travel along the row first and only then in the column direction.
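The row-first rule can be sketched as a per-hop decision. The function names and the coordinate convention (columns increase to the east, rows increase to the south) are assumptions made for this example only.

```python
def next_hop(cur, dst):
    """Pick the output port for a packet at node `cur` headed to `dst`.

    Nodes are (row, col) tuples. Assumed convention for this sketch:
    columns increase to the east, rows increase to the south.
    """
    cur_row, cur_col = cur
    dst_row, dst_col = dst
    # Move along the row (east/west) until the column matches.
    if dst_col != cur_col:
        return "east" if dst_col > cur_col else "west"
    # Only then move along the column (north/south).
    if dst_row != cur_row:
        return "south" if dst_row > cur_row else "north"
    # Arrived: hand the packet to the local core.
    return "core"


_STEP = {"east": (0, 1), "west": (0, -1), "south": (1, 0), "north": (-1, 0)}


def route(src, dst):
    """Return the list of nodes visited from `src` to `dst`, inclusive."""
    path = [src]
    cur = src
    while True:
        port = next_hop(cur, dst)
        if port == "core":
            return path
        d_row, d_col = _STEP[port]
        cur = (cur[0] + d_row, cur[1] + d_col)
        path.append(cur)


path = route((0, 0), (2, 3))
```

Because every packet between the same two nodes takes the same path, the only source of variable delay is losing arbitration along that path.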
Buffering:
There is a register at each output and each input to the mesh node. Each node is clocked on the opposite clock edge of the nodes around it.
Flow Control:
There are "wait" signals going in each direction that push back at the same pace that the buffers fill up, so packets are never lost. The flow control propagates throughout the network and of course extends all the way back to any master generating traffic on the eMesh (including the off-chip links, DMAs, and LDR/STR instructions).
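A rough way to picture the push-back is a chain of one-entry buffers where a stage only accepts data when it is empty. The sketch below is a simplified software model under that assumption; the function name, the single-entry buffers, and the `sink_ready` callback are hypothetical and do not reflect the actual register timing.

```python
def simulate(packets, stages, sink_ready):
    """Push `packets` through a chain of `stages` one-entry buffers.

    `sink_ready(cycle)` says whether the final consumer accepts data in
    that cycle; it must eventually return True or the loop never ends.
    Returns (delivered_packets, cycles_used).
    """
    pipe = [None] * stages
    pending = list(packets)
    delivered = []
    cycle = 0
    while pending or any(slot is not None for slot in pipe):
        # Drain the last buffer if the consumer is ready.
        if pipe[-1] is not None and sink_ready(cycle):
            delivered.append(pipe[-1])
            pipe[-1] = None
        # Walk backwards so a slot freed this cycle can be refilled by
        # its upstream neighbour. A full downstream slot acts as "wait".
        for i in range(stages - 1, 0, -1):
            if pipe[i] is None and pipe[i - 1] is not None:
                pipe[i] = pipe[i - 1]
                pipe[i - 1] = None
        # The master is stalled whenever the first buffer is still full.
        if pending and pipe[0] is None:
            pipe[0] = pending.pop(0)
        cycle += 1
    return delivered, cycle


# A slow consumer that only accepts data every third cycle.
delivered, cycles = simulate(list(range(5)), stages=3,
                             sink_ready=lambda c: c % 3 == 0)
```

The slow consumer stretches the total time, but every packet still arrives and arrives in order, because the stall travels back to the master instead of data being dropped.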
Mesh Interaction:
Note that there are three completely independent meshes (read requests, on-chip write, and off-chip write). These meshes interact only in one place, at the "network interface" between the eMesh and the eCore. The arbitration priority should be described in the priority table 6.3?
Deterministic Latencies:
The intent of the sentence on page 27 was to explain that if the application can guarantee that there are never any round robin arbitration losers, then you would have deterministic delays and bandwidths. Some examples of latency and bandwidth tests that we have run are here:
https://github.com/adapteva/epiphany-examples
Blocking:
The mesh network is truly bidirectional and only "blocks" in the direction of the "losers" in the round robin arbitration. For example, if core, west, north, and south all want to exit to the east and core wins the round robin arbitration, then west, north, and south will be told to wait.
Off-chip links:
As notzed pointed out, the arbitration is such that the bandwidth allocation becomes unfair in some scenarios. Something that may not be clear is that the current implementation of the eLink only reaches peak bandwidth when there is a stream of write transactions with addresses incrementing by 8 (i.e. 0x0, 0x8, 0x10, etc.). If there is random interleaving of transactions, the effective off-chip bandwidth drops to less than 1/3 of peak. For off-chip access, it's definitely recommended to use some kind of token mechanism (for now) to get the best performance. This would take care of the unfair arbitration mechanism as well.
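To illustrate why a token helps, the sketch below compares the address pattern seen on the link when two cores write without coordination against the pattern when only the token holder writes. It only counts how many transactions follow the "previous address + 8" pattern; it does not model eLink timing or reproduce the 1/3 figure, and all function names and the base addresses are made up for the example.

```python
STRIDE = 8  # bytes between consecutive writes, as in 0x0, 0x8, 0x10, ...


def stream(base, count):
    """Addresses of `count` consecutive 8-byte writes starting at `base`."""
    return [base + i * STRIDE for i in range(count)]


def sequential_fraction(addrs):
    """Fraction of transactions whose address is exactly previous + STRIDE."""
    if len(addrs) < 2:
        return 1.0
    hits = sum(1 for a, b in zip(addrs, addrs[1:]) if b - a == STRIDE)
    return hits / (len(addrs) - 1)


def interleaved(streams):
    """No coordination: writes from the cores alternate on the link."""
    return [addr for group in zip(*streams) for addr in group]


def token_ordered(streams):
    """Token scheme: the holder sends its whole burst, then passes the token."""
    return [addr for s in streams for addr in s]


core_a = stream(0x0000, 4)   # hypothetical destination buffers
core_b = stream(0x1000, 4)

mixed = sequential_fraction(interleaved([core_a, core_b]))
with_token = sequential_fraction(token_ordered([core_a, core_b]))
```

In the interleaved case no transaction follows its predecessor by 8 bytes, while with the token only the handover between the two bursts breaks the pattern.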
Andreas