by piotr5 » Sat Jun 13, 2015 9:59 pm
So, as I understand it, the only job of this Epiphany MMU in the FPGA is to translate a handful of consecutive e-core positions into actual physical memory. Well, my request here wasn't about the memory stuff; I was rather trying to spark some inspiration for a new Parallella board. Let's call it the scaleable Parallella.
The idea is: it would have only a network connector for debugging, no USB or HDMI, maybe not even an SD card (I'm thinking of net-boot, or booting from another Epiphany). There would still be a Zynq and RAM, but the memory structure would differ slightly between Parallella and Epiphany. The scaleable Parallella must have a third eLink connector, and this one should feature a counter-clockwise rotation of the address space. That is, a message heading west travels towards lower column numbers; as it exits the Epiphany, the destination address must be rotated around the coordinates of the chip it just left. A message that previously went west will then appear to go south, and conversely for messages travelling in the other direction. The coordinates of the Epiphany it just left stay roughly the same, although they too are rotated. This way, something addressed far off to the west keeps moving towards the south, and a message for something to the northwest will again leave through the west eLink of the next Parallella. The scaleable connector from the other thread could then keep the current Parallella in a fixed position, or maybe this rotation should be built in there too, if such a connector turns out to be too difficult to implement cheaply.
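To make the rotation concrete, here is a minimal C sketch of what I mean by rotating the destination on a west exit. The macros, the function name and the pivot handling are only my illustration, assuming the core ID sits in the top 12 address bits (6 bits row, 6 bits column); the real logic would of course live in the FPGA:

[code]
/* Minimal sketch of the proposed counter-clockwise rotation applied to a
 * destination address as it leaves through the west eLink.  Names and the
 * pivot handling are illustrative only; re-basing the coordinates on the
 * neighbouring board is omitted. */
#include <stdint.h>

#define ROW(addr)  (((addr) >> 26) & 0x3Fu)   /* core row, bits [31:26]    */
#define COL(addr)  (((addr) >> 20) & 0x3Fu)   /* core column, bits [25:20] */
#define MKADDR(r, c, off) \
    ((((uint32_t)(r) & 0x3Fu) << 26) | (((uint32_t)(c) & 0x3Fu) << 20) | ((off) & 0xFFFFFu))

/* Rotate the destination 90 degrees counter-clockwise around the chip the
 * message is leaving (pivot_r, pivot_c): west becomes south, north becomes
 * west, so "far to the west" keeps drifting towards the south. */
static uint32_t rotate_west_exit(uint32_t dest, int pivot_r, int pivot_c)
{
    int dr = (int)ROW(dest) - pivot_r;   /* positive = south of the pivot */
    int dc = (int)COL(dest) - pivot_c;   /* positive = east of the pivot  */
    int new_dr = -dc;                    /* west (dc < 0) -> south        */
    int new_dc =  dr;                    /* north (dr < 0) -> west        */
    return MKADDR(pivot_r + new_dr, pivot_c + new_dc, dest & 0xFFFFFu);
}
[/code]

Messages travelling in the other direction would get the inverse rotation applied on entry, as described above.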
The use case is again my tree-like cyclic network, but with an ordinary Parallella desktop or embedded in the centre. On the software side, the FPGA should map the current Epiphany into local RAM, and the neighbouring ones into the same space but with an offset -- hopefully the next generations of Epiphany will use only 256K or less. The RAM shadowed by such an assignment can then be used as cache memory for the 1G of RAM on the neighbouring Parallellas. Since each Epiphany core can only transfer as much as it has local memory (32K currently), this will be the size of each cache page being pre-buffered. One such page probably must be sacrificed for storing each page's address. The goal is to have all three neighbours and the board's own memory as a flat 4G virtual memory, where the current Parallella and a neighbour agree on the 2G the two of them store. Even the Epiphany chips of the neighbours are visible in that 4G, so that you can initiate a DMA transfer from the neighbour's neighbour's neighbour. Writes to the cache are dispatched immediately, and reads happen when the memory is actually requested. But then, maybe that part of the memory logic should be configurable at runtime...
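Just to illustrate the bookkeeping, here is a rough C sketch of the 32K cache pages with one page of the window given up for the tags. The window address, the number of slots and the dma_fetch() helper are all hypothetical, and the mapping itself would really be done by the FPGA; this only shows the arithmetic:

[code]
/* Hypothetical direct-mapped cache over a neighbour's DRAM, with 32K pages
 * matching an Epiphany core's local memory.  Write-through is implied by
 * the "writes are dispatched immediately" rule and not shown here. */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   (32u * 1024u)            /* one core's local memory        */
#define NUM_SLOTS   255u                     /* 256-page window, one page kept
                                                aside for the tag table        */
#define CACHE_BASE  ((uint8_t *)0x30000000u) /* hypothetical shadowed window   */

static uint32_t page_tag[NUM_SLOTS];         /* which remote page each slot holds */

extern void dma_fetch(uint8_t *dst, uint32_t remote_addr, uint32_t len); /* assumed helper */

static void cache_init(void)
{
    memset(page_tag, 0xFF, sizeof page_tag); /* mark every slot as empty */
}

/* Return a local pointer for a remote address, filling the slot on a miss. */
static uint8_t *cache_lookup(uint32_t remote_addr)
{
    uint32_t page = remote_addr / PAGE_SIZE;
    uint32_t slot = page % NUM_SLOTS;        /* direct-mapped for simplicity */
    if (page_tag[slot] != page) {
        dma_fetch(CACHE_BASE + slot * PAGE_SIZE, page * PAGE_SIZE, PAGE_SIZE);
        page_tag[slot] = page;
    }
    return CACHE_BASE + slot * PAGE_SIZE + (remote_addr % PAGE_SIZE);
}
[/code]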
An interesting addition to such a design would be direct links between one FPGA and the FPGAs of the three neighbours, i.e. somewhere you could plug in three connectors. With HDMI and USB gone, this should be quite a good connection, maybe even better suited for feeding the cache.
I still need to think a bit about this design, but I guess the west connector would be used for the root direction of the tree. That is, to send a message down the tree towards the leaves, you go north or south; there are 8 Parallellas in each of those directions. When the message goes back to the root, you'd send it northwest at each hop. After all, the USB and the main SD card are at the root. But maybe there is some design which offers communication with fewer hops. After all, you can reach a total of 8x8 = 64 Parallella boards just by going northwest, plus the 3x8 = 24 boards accessible to the north, south and west; counting the current board, that's 89 Parallellas living on the fast lane...
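Expressed as code, the per-hop decision I have in mind might look roughly like this; the enum and the sign conventions are just my own sketch of the scheme above, not a fixed routing spec:

[code]
/* Rough sketch of the per-hop routing decision: boards are addressed by
 * their (row, column) offset from the current board, and each hop picks
 * one of the three eLinks.  With the west-exit rotation from the earlier
 * sketch, repeatedly leaving west corresponds to the "northwest at each
 * hop" path back towards the root. */
typedef enum { LINK_NORTH, LINK_SOUTH, LINK_WEST } elink_t;

static elink_t pick_link(int drow, int dcol)
{
    if (dcol < 0)                        /* anything to the west, incl. the root */
        return LINK_WEST;
    return (drow < 0) ? LINK_NORTH : LINK_SOUTH;  /* down the tree, to the leaves */
}
[/code]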
I should emphasize that in my design the memory, as seen from the Epiphany's point of view, is: 1.875G to the east, 900M to the south-west (located on the western neighbour), and Epiphany cores to the south, north, west and northwest. Seen from the ARM, nearly the whole 4G of main memory is available, from this Parallella and its neighbours, but only 10x16 Epiphany cores are visible, those on the neighbouring Epiphany chips and the current one. Therefore my design makes sense once you have more than 10 scaleable Parallellas. Maybe replacing the FPGA with something fixed could reduce production costs, so that people can actually afford those 89 Parallellas for which it scales up nicely?
The next step after the design is actually implementing my idea, just with the FPGA instead of the west connection. Before going into production this needs thorough testing, and as was said in this thread, such testing requires no new boards. So my request is that someone implements it and tests it with a few Parallellas. For example, there exists no RAID driver combining multiple SD cards located on Parallellas connected through eLink. Just imagine: two other SD cards at 2G/s each, in addition to the local one. And then maybe another such SD card accessible through an FPGA connected to the north eLink...
As for the casing, that's complicated. There is no case fit for holding a tree topology, even less so when the tree loops back through its leaves. But I suspect the case should be a triangle, with connectors on the edges or corners, flexibly tilted up or down...