The Apl to C compiler aplc is now ported to the Parallella

Discussion about Parallella (and Epiphany) Software Development

Postby Dr.BeauWebber » Tue May 28, 2013 2:14 am

The Apl to C compiler aplc is now ported to the Parallella.
Strictly this belongs on the Software => Parallel Programming section, but as yet there is not a section for the array processing language Apl.

What is still needed is the harness to link different processes on nodes.

The aplc library is now running on the Parallella one-node simulator:

The Parallella Epiphany C library has no sleep, microsleep, nanosleep, pipe, dup or ttyname, so I hacked in dummy do-nothing replacement functions. These need to be replaced by specific e- functions, or possibly occam structures.
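Here is a minimal sketch of what such do-nothing stand-ins might look like in C. The `stub_` prefix is hypothetical. It is there only so the sketch also builds on a host whose C library already has the real functions; on the Epiphany toolchain the stubs would carry the real names. Having pipe, dup and ttyname fail with `ENOSYS` ("function not implemented") is an assumption about the desired behaviour. It is chosen so that any caller that checks return values fails visibly rather than silently.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical do-nothing stand-ins for libc calls that the Epiphany
   C library lacks. The stub_ prefix is illustrative only. */

/* sleep(): no timer is used, so this returns immediately and reports
   that no time was left unslept. */
unsigned int stub_sleep(unsigned int seconds)
{
    (void)seconds;
    return 0;
}

/* pipe(): there is no kernel pipe, so report "not implemented". */
int stub_pipe(int fds[2])
{
    (void)fds;
    errno = ENOSYS;
    return -1;
}

/* dup(): likewise unavailable without a file-descriptor table. */
int stub_dup(int oldfd)
{
    (void)oldfd;
    errno = ENOSYS;
    return -1;
}

/* ttyname(): there is no terminal to name. */
char *stub_ttyname(int fd)
{
    (void)fd;
    errno = ENOSYS;
    return NULL;
}
```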

$ aplcc helloworld.apl
$ ./e-aplcc aplc_helloworld.c
$ ../../INSTALL/bin/e-run a.out
Hello World from Aplc

In principle the whole compiler is ported, but everything has to be run on the simulator by calling e-run.

Here are some Apl matrix expressions that I have had running on Altera NiosII soft processors on field-programmable gate arrays, on the XMOS four-processors-on-a-chip devices, and now on a functional simulator for the 16-processor Parallella chip:

AplX Source code :
Function matrix :

'A:'
A ← ⍳ 5
A
' '
'B:'
B ← ⍳ 9
B
' '
'C: outer product:'
C ← A ∘.× B
C
' '
'T: transpose:'
T ← ⍉ C
T
' '
'O: inner product (matrix multiply):'
O ← C +.× T
O
' '
'S: solve matrix:'
S ← C ⌹ A
S

And here it is running on the NiosII soft-processor :

A:
1 2 3 4 5

B:
1 2 3 4 5 6 7 8 9

C: outer product:
1  2  3  4  5  6  7  8  9
2  4  6  8 10 12 14 16 18
3  6  9 12 15 18 21 24 27
4  8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45

T: transpose:
1  2  3  4  5
2  4  6  8 10
3  6  9 12 15
4  8 12 16 20
5 10 15 20 25
6 12 18 24 30
7 14 21 28 35
8 16 24 32 40
9 18 27 36 45

O: inner product (matrix multiply):
 285  570  855 1140 1425
 570 1140 1710 2280 2850
 855 1710 2565 3420 4275
1140 2280 3420 4560 5700
1425 2850 4275 5700 7125

S: solve matrix:
1 2 3 4 5 6 7 8 9

The XMOS chip is integer-only, and the extra code needed for the matrix multiply and matrix solve means that only the A, B, C and T calculations fit into one core.

So far I have used the one-node Parallella simulator for testing:
$ aplcc -c matrix.apl
$ ./e-aplcc matrix.c
$ ../../INSTALL/bin/e-run a.out
A:
1 2 3 4 5

B:
1 2 3 4 5 6 7 8 9

C: outer product:
1  2  3  4  5  6  7  8  9
2  4  6  8 10 12 14 16 18
3  6  9 12 15 18 21 24 27
4  8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45

T: transpose:
1  2  3  4  5
2  4  6  8 10
3  6  9 12 15
4  8 12 16 20
5 10 15 20 25
6 12 18 24 30
7 14 21 28 35
8 16 24 32 40
9 18 27 36 45

O: inner product (matrix multiply):
 285  570  855 1140 1425
 570 1140 1710 2280 2850
 855 1710 2565 3420 4275
1140 2280 3420 4560 5700
1425 2850 4275 5700 7125

S: solve matrix:
1 2 3 4 5 6 7 8 9

cheers,
Dr. Beau Webber
Director, Lab-Tools Ltd.
http://www.Lab-Tools.com

Postby ysapir » Tue May 28, 2013 8:07 am

That's very nice. What does the C program (matrix.c) look like? What is the size of the executable?

Do you always need standard output for using this language? Does it support low-level operations like memory referencing, etc, so one can use an APL program to read data, calculate and write a result to a shared buffer in memory, or is it bound to data hard-coded into the program? Then, can you share data among cores to support parallel processing?

Postby Dr.BeauWebber » Tue May 28, 2013 10:15 am

Thanks Ysapir,
ysapir wrote: What does the C program (matrix.c) look like? What is the size of the executable?

There is a lot of verbiage in the .c code produced, as Apl is very particular about keeping track of types, data sizes, structure, etc. However, the core sections of the code that do the array operations are fairly compact; as I always say, if you can do it better, it is simple enough to teach this to the compiler. Declaring the variable and function types and structures also improves the code.
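To give a feel for how compact those core loops can be, here is a hand-written C sketch of the outer product (∘.×) and the inner product (+.×) on flat row-major arrays. It is illustrative only and is not aplc's generated code. The function names are hypothetical, the element type is fixed to `long`, and none of Apl's type and shape bookkeeping is included.

```c
#include <stddef.h>

/* Outer product  C <- A o.x B  for vectors of length m and n.
   C is m-by-n, stored row-major in a flat array. */
void outer_times(const long *a, size_t m, const long *b, size_t n, long *c)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = a[i] * b[j];
}

/* Inner product  O <- X +.x Y  (matrix multiply):
   X is m-by-k, Y is k-by-n, O is m-by-n, all row-major. */
void plus_times(const long *x, const long *y, size_t m, size_t k, size_t n,
                long *o)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            long sum = 0;
            for (size_t p = 0; p < k; p++)
                sum += x[i * k + p] * y[p * n + j];
            o[i * n + j] = sum;
        }
}
```

With A = ⍳5 and B = ⍳9, these functions reproduce the C and O matrices from the matrix example above. For instance, O[1;1] = 285 and O[5;5] = 7125.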

matrix.apl : 242 bytes
matrix.c : 17 kbytes (much of this is comments on what the code is doing and why)
a.out : 1.31 Mbytes

Much of the executable's size comes from very general run-time routines that are copied in directly. This overhead does not grow much for much larger and more complex source programs, and it can be stripped back if wanted. I have had Aplc-derived expressions running on 8-bit processors.

ysapir wrote: Do you always need standard output for using this language? Does it support low-level operations like memory referencing, etc, so one can use an APL program to read data, calculate and write a result to a shared buffer in memory, or is it bound to data hard-coded into the program?

No, standard input and standard output are not needed; data can be read from and written to files or pipes. That said, I have a large number of programs that act as filters, with data and commands piped in and data and status piped out.
The data is always held as Apl arrays of arbitrary size and shape, and indexing into and manipulating these arrays is where Apl is strongest. Here is Apl indexing and manipulating numerical, text and mixed arrays:
'A:'
A ← ⍳ 5
A
' '
'B:'
B ← ⍳ 9
B
' '
C ← A ∘.× B
      C[;3 5] ← 5 2 ⍴ 0
      C
1  2 0  4 0  6  7  8  9
2  4 0  8 0 12 14 16 18
3  6 0 12 0 18 21 24 27
4  8 0 16 0 24 28 32 36
5 10 0 20 0 30 35 40 45
      Txt ← 'hello'
      Txt
hello
      C[3;3 4 5 6 7] ← Txt
      C
1  2 0  4 0  6  7  8  9
2  4 0  8 0 12 14 16 18
3  6 h  e l  l  o 24 27
4  8 0 16 0 24 28 32 36
5 10 0 20 0 30 35 40 45
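This sketch assumes one common way to hold such an array: a shape plus the elements in row-major order. aplc's own run-time layout is not shown in this thread. With that layout, element C[i;j] of an r-by-n matrix sits at offset (i-1)×n + (j-1), using Apl's default index origin of 1. Below is a sketch of the numeric assignment C[;3 5] ← 5 2 ⍴ 0 under that assumption, with a hypothetical function name. The mixed character/number assignment that follows it would need a tagged element type, which is left out.

```c
#include <stddef.h>

/* Emulates the Apl indexed assignment  M[;cols] <- 0  for a matrix of
   `rows` rows and `ncols` columns held row-major in a flat array.
   Column numbers in `cols` are 1-based, as with Apl's default index
   origin, so column c maps to offset c - 1 within each row.
   Assumes every column number lies in 1..ncols (no bounds check). */
void zero_columns(long *m, size_t rows, size_t ncols,
                  const size_t *cols, size_t ncount)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t k = 0; k < ncount; k++)
            m[r * ncols + (cols[k] - 1)] = 0;
}
```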

And here it is running on the Parallella simulator :
A:
 1 2 3 4 5

B:
 1 2 3 4 5 6 7 8 9

C: outer product:
 1  2  3  4  5  6  7  8  9
 2  4  6  8 10 12 14 16 18
 3  6  9 12 15 18 21 24 27
 4  8 12 16 20 24 28 32 36
 5 10 15 20 25 30 35 40 45
 1  2 0  4 0  6  7  8  9
 2  4 0  8 0 12 14 16 18
 3  6 0 12 0 18 21 24 27
 4  8 0 16 0 24 28 32 36
 5 10 0 20 0 30 35 40 45
hello
+--+---+--+---+--+---+---+---+---+
| 1| 2 | 0| 4 | 0| 6 | 7 | 8 | 9 |
+--+---+--+---+--+---+---+---+---+
| 2| 4 | 0| 8 | 0| 12| 14| 16| 18|
+--+---+--+---+--+---+---+---+---+
| 3| 6 |h |e  |l |l  |o  | 24| 27|
+--+---+--+---+--+---+---+---+---+
| 4| 8 | 0| 16| 0| 24| 28| 32| 36|
+--+---+--+---+--+---+---+---+---+
| 5| 10| 0| 20| 0| 30| 35| 40| 45|
+--+---+--+---+--+---+---+---+---+

Now it is an interesting question whether two Apl programs can manipulate data in shared memory. I think we may need a simple pseudo-file-handling interface.
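As a rough illustration of what a simple pseudo-file interface over shared memory might involve, here is a C sketch of a file-like handle (open, seek, read, write) over a fixed memory region. All names are hypothetical. The region is an ordinary array standing in for memory shared between cores. There is no locking, so the sketch assumes the writer has finished before a reader starts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pseudo-file handle over a fixed memory region, e.g. a
   buffer shared between cores. Only the bookkeeping is shown; there is
   no locking between concurrent readers and writers. */
typedef struct {
    unsigned char *base;  /* start of the region */
    size_t size;          /* capacity in bytes */
    size_t pos;           /* current read/write offset */
} mem_file;

void mf_open(mem_file *f, void *base, size_t size)
{
    f->base = base;
    f->size = size;
    f->pos = 0;
}

/* Moves the offset, clamped to the end of the region. */
void mf_seek(mem_file *f, size_t offset)
{
    f->pos = offset < f->size ? offset : f->size;
}

/* Copies up to len bytes into the region; returns bytes written. */
size_t mf_write(mem_file *f, const void *src, size_t len)
{
    size_t room = f->size - f->pos;
    if (len > room)
        len = room;
    memcpy(f->base + f->pos, src, len);
    f->pos += len;
    return len;
}

/* Copies up to len bytes out of the region; returns bytes read. */
size_t mf_read(mem_file *f, void *dst, size_t len)
{
    size_t avail = f->size - f->pos;
    if (len > avail)
        len = avail;
    memcpy(dst, f->base + f->pos, len);
    f->pos += len;
    return len;
}
```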
I think that this point, your following one,

ysapir wrote: Then, can you share data among cores to support parallel processing?

and the question of how Apls running on separate cores pass data between them all need careful consideration.
Currently aplc assumes an underlying Unix system, and it has built-in facilities that use pipes for such multiprocessing. However, the Parallella version of aplc reports (not surprisingly) that it cannot find pipe or dup; is that correct? These might be replaceable by FIFOs between the cores.
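A FIFO between two cores could be as simple as a ring buffer in memory that both can see. The sketch below is a single-producer, single-consumer ring of doubles with hypothetical names. It is not an Epiphany SDK facility. On real hardware, the head and tail updates would also need appropriate memory ordering between cores, which this sketch does not address.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SLOTS 8  /* one slot stays empty to tell full from empty */

/* Hypothetical single-producer/single-consumer FIFO of doubles. */
typedef struct {
    volatile size_t head;   /* next slot to read (consumer-owned) */
    volatile size_t tail;   /* next slot to write (producer-owned) */
    double slot[FIFO_SLOTS];
} fifo;

void fifo_init(fifo *q)
{
    q->head = 0;
    q->tail = 0;
}

/* Producer side: returns false if the FIFO is full. */
bool fifo_put(fifo *q, double v)
{
    size_t next = (q->tail + 1) % FIFO_SLOTS;
    if (next == q->head)
        return false;
    q->slot[q->tail] = v;
    q->tail = next;
    return true;
}

/* Consumer side: returns false if the FIFO is empty. */
bool fifo_get(fifo *q, double *v)
{
    if (q->head == q->tail)
        return false;
    *v = q->slot[q->head];
    q->head = (q->head + 1) % FIFO_SLOTS;
    return true;
}
```

One slot is always left empty, so that `head == tail` unambiguously means the queue is empty. With `FIFO_SLOTS` at 8, the queue therefore holds at most 7 values.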
So we will have to use the native e- functions of the Epiphany to provide replacements. I can see how to pass data between Apls on XMOS cores, though there is still some work to do, so I have good hopes that it will be straightforward for the Parallella.
I will think about the shared memory question, and have a chat with Sam Sirlin, who is the main maintainer of aplc.
AplX certainly has shared variables that exist outside the workspaces, to which the multitasking workspaces can request read/write access; this is almost certainly the way to go. The variables are still held as Apl arrays, but a piece of code polices the data common to all the other, more localised workspaces.
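The "policing" code could follow a readers/writer rule: any number of workspaces may read a shared variable, or exactly one may write it. The sketch below is an assumption for illustration and is not AplX's actual shared-variable protocol. The names are hypothetical. Across real cores, the checks would have to run under a hardware mutex or a similar atomic primitive.

```c
#include <stdbool.h>

/* Hypothetical access policy for one shared variable held outside any
   workspace: many readers, or a single writer. Plain C, not atomic. */
typedef struct {
    int readers;   /* workspaces currently holding read access */
    int writer;    /* id of the workspace holding write access, or -1 */
} sv_policy;

void sv_init(sv_policy *p)
{
    p->readers = 0;
    p->writer = -1;
}

/* Grants read access unless a writer holds the variable. */
bool sv_request_read(sv_policy *p)
{
    if (p->writer != -1)
        return false;
    p->readers++;
    return true;
}

void sv_release_read(sv_policy *p)
{
    if (p->readers > 0)
        p->readers--;
}

/* Grants write access to workspace ws only if nobody else holds it. */
bool sv_request_write(sv_policy *p, int ws)
{
    if (p->writer != -1 || p->readers > 0)
        return false;
    p->writer = ws;
    return true;
}

/* Only the current holder can release write access. */
void sv_release_write(sv_policy *p, int ws)
{
    if (p->writer == ws)
        p->writer = -1;
}
```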
Another question: the compiler finds timeofday, but does not find sleep, microsleep or nanosleep; is that correct?
cheers, Beau

Postby ysapir » Tue May 28, 2013 11:35 am

The *sleep functions should be implemented with e_ctimer*() function calls. I am surprised about timeofday()... does it return any meaningful data?
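Here is one possible shape for a timer-based microsleep, written so it can be tried on a host. The caller supplies the core clock rate and two hooks that load and read a 32-bit count-down timer. The hooks stand in for the e_ctimer*() calls on the Epiphany, which are not called here. The struct and function names are hypothetical. The timer is assumed to count down and stop at zero.

```c
#include <stdint.h>

/* Hypothetical count-down timer hooks: on the Epiphany these would wrap
   e_ctimer*() calls; on a host they can be fakes. */
typedef struct {
    uint32_t clock_mhz;                   /* core clock in MHz (assumed) */
    void (*timer_start)(uint32_t count);  /* load count, start counting down */
    uint32_t (*timer_get)(void);          /* read the current count */
} countdown_timer;

/* Microseconds to clock cycles: cycles = usec * MHz. */
uint64_t usec_to_cycles(uint32_t usec, uint32_t clock_mhz)
{
    return (uint64_t)usec * clock_mhz;
}

/* Busy-waits for usec microseconds. A 32-bit timer covers at most
   UINT32_MAX cycles per load, so longer waits use several loads. */
void timer_usleep(const countdown_timer *t, uint32_t usec)
{
    uint64_t remaining = usec_to_cycles(usec, t->clock_mhz);
    while (remaining > 0) {
        uint32_t chunk = remaining > UINT32_MAX ? UINT32_MAX
                                                : (uint32_t)remaining;
        t->timer_start(chunk);
        while (t->timer_get() != 0)
            ;                             /* spin until the count reaches zero */
        remaining -= chunk;
    }
}
```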

Postby 9600 » Tue May 28, 2013 12:32 pm

Dr.BeauWebber wrote: The Apl to C compiler aplc is now ported to the Parallella.
Strictly this belongs on the Software => Parallel Programming section, but as yet there is not a section for the array processing language Apl.


Great work and thanks for sharing details of the compiler port!

I've just created a new forum for APL and perhaps we could move further discussion there.

Cheers,

Andrew
Andrew Back (a.k.a. 9600 / carrierdetect)