Re: Occam Programming Language for Parallella?

Posted: Sat Apr 27, 2013 1:32 pm
by mhonman
Thanks Andreas, that's encouraging info. IMO message-passing makes for a very clean software design.

There is something (probably not for this thread) that has left me puzzled - how to tackle CFD-type problems in the 32KB of memory that is available to each core. From my distant recollections of the Transputer work we did, there was a per-processor overhead of about 128K - mostly "payload" code compiled from Fortran, but also message-passing buffers.

But that's another question for the day that a Parallella board lands in my hands, with a suitably interesting problem to solve!

Mark

Re: Occam Programming Language for Parallella?

Posted: Sun May 12, 2013 1:02 am
by voidptr
aolofsson wrote:Some folks at Halmstad University in Sweden have already used a mix of Occam and C for programming the Epiphany.
Andreas


:-) noob here ...

Many, many years ago the Inmos T800, OCCAM and CSP were so awesome!

But now Parallella will be so much more awesome!!!

OCCAM is a must.
Is there some plan to have CSP implemented also?
It would be so nice!

now I need money to buy boards :o)

Re: Occam Programming Language for Parallella?

Posted: Thu Jul 25, 2013 6:56 pm
by Dr.BeauWebber
I fully agree, having Occam to supply the communication harness is exactly what I would like to see on the Epiphany.
The main problem is that Occam has a model of communicating channels to join the processing nodes together, and that is something that the Epiphany does not yet have.

I have recently ported the Transterpreter version of the University of Kent KRoC Occam to the Raspberry Pi, as a first-stage model for porting it to the Parallella - works fine. (A really large chunk of source and tool-chain is needed, though.)

However the real hold-up for Occam on the Epiphany is the need for the communicating channels - doable, just needs someone who knows what they are doing!
cheers,
Beau Webber

Re: Occam Programming Language for Parallella?

Posted: Thu Jul 25, 2013 8:46 pm
by mhonman
I wouldn't say I know what I'm doing, but do have some ideas for very crude channel and PAR functionality. If a single process is mapped onto a Parallella core, there is no need to implement a scheduler, and the channel primitive can be based on busy-wait synchronisation in the same mould as the e-barrier routine provided by the SDK.
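The busy-wait channel idea above can be modelled in portable C11 as a minimal sketch. It assumes exactly one writer and one reader per channel and a one-word payload; the `chan_*` names are hypothetical and are not part of the Epiphany SDK, and on real hardware the structure would have to live in memory that both cores can address.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-word channel: one writer core, one reader core. */
typedef struct {
    atomic_bool full;  /* false: slot is empty, true: slot holds a value */
    uint32_t    slot;  /* the payload */
} chan_t;

void chan_init(chan_t *c)
{
    atomic_init(&c->full, false);
    c->slot = 0;
}

/* Non-blocking send: fails if the previous value has not been taken yet. */
bool chan_try_send(chan_t *c, uint32_t v)
{
    if (atomic_load_explicit(&c->full, memory_order_acquire))
        return false;
    c->slot = v;
    /* Release ordering publishes the payload before the flag. */
    atomic_store_explicit(&c->full, true, memory_order_release);
    return true;
}

/* Non-blocking receive: fails if no value is waiting. */
bool chan_try_recv(chan_t *c, uint32_t *out)
{
    if (!atomic_load_explicit(&c->full, memory_order_acquire))
        return false;
    *out = c->slot;
    atomic_store_explicit(&c->full, false, memory_order_release);
    return true;
}

/* Blocking send: spins until the slot is free, then spins until the
   reader has taken the value, so the writer does not run ahead
   (an Occam-style rendezvous). */
void chan_send(chan_t *c, uint32_t v)
{
    while (!chan_try_send(c, v)) { /* busy-wait */ }
    while (atomic_load_explicit(&c->full, memory_order_acquire)) { /* busy-wait */ }
}

/* Blocking receive: spins until a value arrives. */
uint32_t chan_recv(chan_t *c)
{
    uint32_t v;
    while (!chan_try_recv(c, &v)) { /* busy-wait */ }
    return v;
}
```

With one process per core the spinning costs nothing that a scheduler could have used, which is why no scheduler is needed in this simple case.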

Is there an internet-accessible Parallella board on which I can give this a try?

For a more general solution it looks like the communication would need to be based on master-slave DMA - the documentation is not at all clear how that will behave if more than one master is simultaneously attempting to communicate with a particular slave.

BTW a feature request for future versions of the Epiphany chip - for grid-partitioned parallel solvers, there is usually a 5-point stencil, i.e. data must be exchanged with 4 neighbours. Thus if the mesh network can handle 4 parallel transfers to adjacent cores, it would be useful to have 4 DMA engines available to perform those transfers concurrently.
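For readers unfamiliar with the stencil mentioned above, here is a minimal sketch of one Jacobi sweep (the Laplace case, where the centre point of the 5-point stencil has weight zero) over a core's local tile. The halo layout and the `jacobi_sweep` name are assumptions for illustration only; the four halo edges are the data that would be exchanged with the four neighbouring cores between sweeps.

```c
#include <stddef.h>

/* One Jacobi sweep over a tile of rows x cols interior points.
   Both arrays are (rows + 2) x (cols + 2), row-major; the outermost
   ring is the halo holding copies of the neighbours' edge values.
   Only interior points of `out` are written. */
void jacobi_sweep(const float *in, float *out, size_t rows, size_t cols)
{
    size_t stride = cols + 2;
    for (size_t i = 1; i <= rows; ++i) {
        for (size_t j = 1; j <= cols; ++j) {
            size_t k = i * stride + j;
            out[k] = 0.25f * (in[k - stride]   /* north */
                            + in[k + stride]   /* south */
                            + in[k - 1]        /* west  */
                            + in[k + 1]);      /* east  */
        }
    }
}
```

Each sweep reads one row or column from each of the four neighbours, so the per-sweep traffic grows with the tile perimeter while the arithmetic grows with its area.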

Mark

Re: Occam Programming Language for Parallella?

Posted: Fri Jul 26, 2013 7:23 am
by Sundance_Parallella
Hi Mark,

I have a board that I can give you access to if that would help?
I shall send you the details via PM shortly.

Regards,
Ben

Re: Occam Programming Language for Parallella?

Posted: Fri Jul 26, 2013 7:28 am
by timpart
mhonman wrote:I wouldn't say I know what I'm doing, but do have some ideas for very crude channel and PAR functionality. If a single process is mapped onto a Parallella core, there is no need to implement a scheduler, and the channel primitive can be based on busy-wait synchronisation in the same mould as the e-barrier routine provided by the SDK.

Is there an internet-accessible Parallella board on which I can give this a try?

Yes, I think two groups have kindly made theirs available.

mhonman wrote:For a more general solution it looks like the communication would need to be based on master-slave DMA - the documentation is not at all clear how that will behave if more than one master is simultaneously attempting to communicate with a particular slave.


My understanding is that the slave just looks for writes to one of its registers and stores the data in the next memory location when it gets one. If two masters attempted this at the same time the two streams would be interleaved in an arbitrary way. I think a mutex on the receiving core is needed.

mhonman wrote:BTW a feature request for future versions of the Epiphany chip - for grid-partitioned parallel solvers, there is usually a 5-point stencil, i.e. data must be exchanged with 4 neighbours. Thus if the mesh network can handle 4 parallel transfers to adjacent cores, it would be useful to have 4 DMA engines available to perform those transfers concurrently.


If I remember the Transputer correctly each core has a link with its neighbours. The Epiphany works differently. There is a grid network which exchanges data and each core has a link onto it. The link can only exchange a double word (both onto and from network) in each clock cycle. So with the current design extra DMAs don't help, the bottleneck would be the route to the network.

Several people expressed a need for messaging support at the "Preparing for Parallella" meet. One person told me he had some ideas for doing it. Perhaps we should start a separate topic somewhere. SDK forum perhaps?

Tim

Re: Occam Programming Language for Parallella?

Posted: Fri Jul 26, 2013 8:48 pm
by mhonman
First, thanks Ben, I'll definitely take you up on that. Will probably aim to work in the early mornings.

timpart wrote:
My understanding is that the slave just looks for writes to one of its registers and stores the data in the next memory location when it gets one. If two masters attempted this at the same time the two streams would be interleaved in an arbitrary way. I think a mutex on the receiving core is needed.


Unfortunately e-mutex in its present form is a busy-wait loop round a test-and-set instruction. About the only thing I could spot that would deliver data and an accompanying interrupt to a remote core is master-slave DMA. However the behaviour you have described might yet come in useful - can't see why it should not be atomic at the word or doubleword level. So perhaps it could be used to advise the peer that a channel has become ready.

timpart wrote:
mhonman wrote:BTW a feature request for future versions of the Epiphany chip - for grid-partitioned parallel solvers, there is usually a 5-point stencil, i.e. data must be exchanged with 4 neighbours. Thus if the mesh network can handle 4 parallel transfers to adjacent cores, it would be useful to have 4 DMA engines available to perform those transfers concurrently.


If I remember the Transputer correctly each core has a link with its neighbours. The Epiphany works differently. There is a grid network which exchanges data and each core has a link onto it. The link can only exchange a double word (both onto and from network) in each clock cycle. So with the current design extra DMAs don't help, the bottleneck would be the route to the network.


You're right - I jumped to conclusions when I saw the grid network. The good news is that since it is a doubleword per cycle the ratio of compute to communication speeds is about the same as the T800 (3 Mflops @ 20MHz, 4 x 2.4Mbps links (full duplex) => approx 40 bits in and out per flop).

timpart wrote:Several people expressed a need for messaging support at the "Preparing for Parallella" meet. One person told me he had some ideas for doing it. Perhaps we should start a separate topic somewhere. SDK forum perhaps?


I was probably one of them! That's a good suggestion about a specific topic because as someone (Roger?) said at the meet, it is better to get started with the most valuable parts of CSP, message-passing being the first of them.

Mark

Re: Occam Programming Language for Parallella?

Posted: Fri Jul 26, 2013 9:31 pm
by ysapir
FWIW:

mhonman wrote:Unfortunately e-mutex in its present form is a busy-wait loop round a test-and-set instruction


The e_mutex_lock() works this way. If you need a non-blocking operation, use e_mutex_trylock().
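The difference between a blocking lock and a try-lock built on test-and-set can be sketched in portable C11. This is only a model of the idea, not the SDK's implementation; the `spin_*` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set mutex, modelled with a C11 atomic_flag. */
typedef struct {
    atomic_flag held;
} spin_mutex_t;

void spin_init(spin_mutex_t *m)
{
    atomic_flag_clear(&m->held);
}

/* Non-blocking: one test-and-set attempt. Returns true if the lock
   was free and is now owned by the caller. */
bool spin_trylock(spin_mutex_t *m)
{
    return !atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire);
}

/* Blocking: repeats the test-and-set until it succeeds (busy-wait). */
void spin_lock(spin_mutex_t *m)
{
    while (!spin_trylock(m)) { /* spin */ }
}

void spin_unlock(spin_mutex_t *m)
{
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}
```

A caller using the try form can go and do other work when the attempt fails, instead of spinning on the lock.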

mhonman wrote:About the only thing I could spot that would deliver data and an accompanying interrupt to a remote core is master-slave DMA


Remember that:

1. When using DMA, you could use the E_DMA_MSGMODE mode to trigger the E_MESSAGE_INT event on the destination core.

2. Similarly, you could use e_write() followed by e_irq_set() to get the same effect. The E_USER_INT irq type may be useful here.

Re: Occam Programming Language for Parallella?

Posted: Sun Jul 28, 2013 8:37 pm
by timpart
ysapir wrote:FWIW:

1. When using DMA, you could use the E_DMA_MSGMODE mode to trigger the E_MESSAGE_INT event on the destination core.

2. Similarly, you could use e_write() followed by e_irq_set() to get the same effect. The E_USER_INT irq type may be useful here.


I'm having difficulty finding documentation about these. The Arch Ref manuals 3 and 4 don't mention DMA triggering an interrupt on the destination core. (They mention an interrupt but don't say where. Only local calls are mentioned in the interrupt table.) From the position of the Message interrupt, is it the reserved priority 5 one? Is the user interrupt the reserved priority 8 one, or an alias for the software interrupt? The latter isn't mentioned on p50 of SDK ref manual 5.13.07.10.

Tim

Re: Occam Programming Language for Parallella?

Posted: Sun Jul 28, 2013 9:18 pm
by aolofsson
I am afraid that Yaniv let the cat out of the bag a little early. :D We are just now getting around to testing some of these experimental features (two years old by now!) and I didn't want to release the spec until they were completely tested.

The message interrupt is indeed #5. Interrupt #8 is another secret one that we will hopefully disclose by the middle of August. Interrupt #9 is the user interrupt.

Andreas