
Why port array languages such as Apl to Parallella?

PostPosted: Tue May 28, 2013 8:20 pm
by Dr.BeauWebber
For the first time we are faced with cheap multiprocessor systems.
The dominant cost and difficulty of using them is no longer the hardware, but the time needed to turn an initial concept into a working solution. For a number of problem types, the ability to develop a solution quickly - and accurately - is crucial.

The essential feature of high-level array languages such as Apl, Matlab, Maple, and LabVIEW is that it is possible to express array-based operations using array notation such as:
Code: Select all
C ← A + B
where A, B, C may all be multidimensional arrays.
That is, there is no need to manually code the individual word-sized operations, as one must in languages such as C.

This results in both an increased speed of development and, often more importantly, a much greater certainty that the code that has been written will really do what one wants - "proof by inspection" is possible with 10 lines of array-notation code in a way it is not with the equivalent 10 or possibly 100 pages of C code. It is possible to hold and develop complete programs in one's head. The generic C-style loop:
Code: Select all
for (int i = 1; i < 10; i++)
   {
       printf("%d ", i);
   }
1 2 3 4 5 6 7 8 9

in Apl becomes
Code: Select all
⍳ 9
1 2 3 4 5 6 7 8 9
with no need to carefully check that all the semicolons are in the correct place.

In Apl one can flip or transpose an array A (of any dimension) with ⌽A, ⊖A, or ⍉A - and I do not need to tell you which does which.
However, the real power of Apl comes from having not only a wide range of functions that act on data and its structure, but also a number of operators that act on these functions:
A + B adds two arrays
but +/ A does a grand sum along the rows of A
while +\A does a running or scanned sum along the rows of A.
The operators act on a whole range of functions: thus the reduction operator / can be applied to find the greatest value along each row, ⌈/, or the least value down each column, ⌊/[1], as shown in the session below.
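
For example, in an interpreter session (using a 3 4 ⍴ ⍳ 12 test array purely for illustration):
Code: Select all
      A ← 3 4 ⍴ ⍳ 12
      +/ A          ⍝ grand sum along each row
10 26 42
      +\ A          ⍝ running (scanned) sum along each row
1  3  6 10
5 11 18 26
9 19 30 42
      ⌈/ A          ⍝ greatest value along each row
4 8 12
      ⌊/[1] A       ⍝ least value down each column
1 2 3 4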

If you don't like all these symbols, that is OK: aplc natively uses plain-text abbreviated names for the symbols:
A ← ⌽ ⍳ 9 becomes A .is .rv .iota 9 .
I myself usually develop algorithms using an Apl interpreter on my laptop, and when the code is well tested I use a program to translate it to the dot notation for aplc. However, there is no difficulty in working from scratch using the dot notation.

So using Apl to develop the code on a particular processor can greatly reduce the effort needed, and lead to a greater certainty that the expensively acquired data will actually mean what one intended.
But the Parallella is a multi-processing system, so how do we connect these Apl nodes together?
See the next posting, which will include a plea for help ....
Dr. Beau Webber
http://www.Lab-Tools.com

Re: Why port array languages such as Apl to Parallella?

PostPosted: Wed May 29, 2013 1:31 am
by aolofsson
Dr. Webber,
Thank you so much for porting APL to Parallella! This totally makes my day!
If I understand correctly, you are suggesting something like pipes to communicate between APL programs running on different cores. Any thoughts on the possibility of actually extracting data parallelism and computation from the APL program? Does APL have properties that make it a better starting point for "auto-parallelizing compilers" compared to standard procedural languages like C? I know there have been some publications to this effect, but I am not sure how practical it is.
Andreas

Re: Why port array languages such as Apl to Parallella?

PostPosted: Wed May 29, 2013 9:47 am
by Dr.BeauWebber
Thanks indeed aolofsson, your comment is a good start to my day!
If I understand correctly, you are suggesting something like pipes to communicate between APL programs running on different cores.


Well 'pipe' is the mechanism that I added to aplc back in the Transputer days, as well as 'spawn', which is a method for a master processor to start up and connect pipes to an arbitrary set of other processes on other processors.

Here is an example pipeline of aplc expressions, joined as a c-shell pipeline, running on the multiple cores of my laptop, as a demonstration of what I mean:
Code: Select all
$ ( echo "1000" | ./filt_gauss 2 > stde ) |& ( ./filt_binr 2 >> stde )

The first ‘node’ just emits the string “1000” from its std-out.
The second node uses aplc to generate a Gaussian (normal) distribution of 1000 data points, with a mean of 0 and standard deviation of 10 (uses von Neumann’s method).
The third node uses aplc to run a binning algorithm, binning the random data into 11 bins between -11 and +11, giving the expected normal-distribution histogram.
Well, currently we have one node on the Parallella simulator, so here it is running as a single program, with similar output:

Code: Select all
$ aplcc binr_run.apl
$ ./e-aplcc binr_run.c
$ ../../INSTALL/bin/e-run ./a.out
Started binr_run
Enter a number say 1000 to 100000 :
.bx:
10000
-10   16
  -8  130
  -6  394
  -4 1170
  -2 2050
   0 2464
   2 2034
   4 1161
   6  456
   8  110
  10   14


If we had the standard library functions 'pipe' and 'dup' in newlib, this should work 'out of the box' on the Parallella. However, these two functions seem to be missing, though related functions that do similar jobs appear to exist for the Parallella. It is just a question of how to do it: either create 'e-pipe' and 'e-spawn' in aplc using the Parallella e-functions, or build a standard 'pipe' and 'spawn' on top of the e-functions, if such a thing can be done with reasonable efficiency. I certainly need advice from other people, and I am hoping that someone with a better knowledge of the Parallella might have a look at the 'pipe' question.
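
For reference, here is a minimal sketch of what 'pipe' and 'dup' provide on a full POSIX system - this is just an illustration of the missing plumbing, not part of aplc, and the ./filt_binr file name simply echoes the pipeline above:
Code: Select all
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    pipe(fd);                      /* fd[0] = read end, fd[1] = write end */
    if (fork() == 0) {             /* child becomes the downstream node */
        dup2(fd[0], STDIN_FILENO); /* splice the pipe onto the child's stdin */
        close(fd[0]);
        close(fd[1]);
        execlp("./filt_binr", "filt_binr", "2", (char *)NULL);
        _exit(1);                  /* only reached if the exec fails */
    }
    dup2(fd[1], STDOUT_FILENO);    /* parent's stdout now feeds the pipe */
    close(fd[0]);
    close(fd[1]);
    printf("1000\n");              /* the upstream node emits its data */
    fflush(stdout);
    close(STDOUT_FILENO);          /* gives the downstream node its EOF */
    wait(NULL);
    return 0;
}
It is exactly this pipe()/dup2() pairing that would have to be recreated on top of the e-functions.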
Any thoughts on the possibility of actually extracting data parallelism and computation from the APL program? Does APL have properties that make it a better starting point for "auto-parallelizing compilers" compared to standard procedural languages like C?

Well, certainly APL should be preferable to C, since APL already knows the structure and types of the data. Much of the verbiage in aplc code is doing the necessary processing to determine that the operands are of the correct type and shape to fit the specified function and/or operator, and also calculating the resultant type and shape.
So one already has this to build on. Aplc then proceeds to build a fairly optimised chunk of lazy-evaluation nested C code to calculate the result. This is what one would have to modify to distribute the computation over many cores - but as I say, the analysis of data types and structures has already been done, and one can draw on it as needed. This is not a trivial task, and not one I have looked at in detail, but some APL vendors have made a reasonable start on parts of it. The way they seem to have started is to identify the functions that could benefit the most from such parallelisation/distribution, and treat them one at a time. The nice thing about doing this inside an APL context is that for any particular data transformation, one only has to write it down once (and test it), and then it is done for all similar transformations; one is not writing the same code (and testing it) again in that or another program.
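
To make that concrete, here is a hypothetical sketch of the kind of scalar loop an array compiler can emit for C ← A + B once the type and shape analysis has passed - the names are illustrative only, not aplc's actual generated code:
Code: Select all
/* Sketch of compiler-emitted code for  C <- A + B,  after the
 * compiler has already verified that A and B are conforming
 * double arrays of n elements.  Illustrative, not aplc output. */
void emitted_add(const double *a, const double *b, double *c, long n)
{
    /* Every iteration is independent, so once the shape analysis
     * has been done this loop could be split across cores. */
    for (long i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
It is loops of this kind, whose bounds and independence the compiler already knows, that are the natural candidates for distribution.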
cheers, Beau

Re: Why port array languages such as Apl to Parallella?

PostPosted: Wed May 29, 2013 11:58 am
by Dr.BeauWebber
Thinking about the option of multiple Apls making use of the flat, globally shared address space - see sections 4.1 & 5.4 in the architecture reference manual.

This discusses a particularly interesting feature, where a process in one core (local addresses beginning 0x8200...) writes directly into the local memory of another core (addresses beginning 0x9200...). Somewhat scary, but remarkably efficient.
Code: Select all
//vecA array at 0x82002000 (this core's local memory)
//vecB array at 0x82004000 (this core's local memory)
//remote_res at 0x92004000 (another core's local memory)
float *vecA = (float *)0x82002000;
float *vecB = (float *)0x82004000;
volatile float *remote_res = (volatile float *)0x92004000;
float loc_sum = 0.0f;

for (int i = 0; i < 100; i++) {
    loc_sum += vecA[i] * vecB[i];
}
*remote_res = loc_sum;  // an ordinary store, but it lands in the other core

I wonder if one could create a process something like that available in AplX, where one can have shared variables that reside outside the local process's workspace (i.e. outside local core memory) but are globally read/write accessible from any process - essentially a workspace whose address space covers the whole machine, and which reads or writes variables on behalf of the local processes, at their request.
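
As a very rough sketch of the addressing such a shared-variable layer would sit on (the helper name is hypothetical; the layout - upper 12 bits of an address selecting the core as a 6-bit row and 6-bit column, lower 20 bits the local offset - is from section 4.1 of the reference manual):
Code: Select all
/* Hypothetical helper for a machine-wide shared-variable workspace:
 * compose the global address of a variable held in another core's
 * local memory.  Upper 12 bits = core ID (6-bit row, 6-bit column),
 * lower 20 bits = offset within that core's local memory. */
volatile int *shared_var(unsigned row, unsigned col, unsigned offset)
{
    unsigned coreid = (row << 6) | col;
    return (volatile int *)((coreid << 20) | (offset & 0xFFFFF));
}

/* e.g.  *shared_var(36, 32, 0x4000) = loc_sum;
 * writes to global address 0x92004000, as in the example above. */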

Cheers,
Beau

Re: Why port array languages such as Apl to Parallella?

PostPosted: Wed May 29, 2013 5:08 pm
by Dr.BeauWebber
I have started a new topic, Inter-process communication on the Parallella, to continue this communication discussion.

Re: Why port array languages such as Apl to Parallella?

PostPosted: Sun Aug 04, 2013 10:41 pm
by shr
Array programming languages should be a good fit for the Parallella, though efficiently mapping APL and friends to its architecture will take some work.