Perplexed by PAL

Post by cmcconnell » Wed Jun 03, 2015 6:14 pm

I've never been entirely happy that I understand the design underlying the PAL library and the plans for its evolution. I don't know if others share some of my confusion. In part, it's this lack of understanding of the big picture that has been putting me off contributing. [Plus, to be honest, a worry that the big picture may be as confused as it is confusing!]

For what it's worth, here is a slightly random selection of queries and observations -


The functions currently being worked on seem to be serial in nature. Each has a vector-style signature, but then contains a for loop to invoke the underlying code n times.

How is this eventually intended to fit into a parallel-invocation mechanism? Is the idea that at a higher level, a request to add N floats (for example) might be partitioned into n requests to add N/n floats, each of which will equate to a call of the p_add(..) function on n different cores? Or are all these functions themselves going to be modified at some point, replacing the for loops with something else?
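To make the first option concrete, here is a rough sketch of the kind of partitioning I have in mind. The names leaf_add, chunk_for and partitioned_add are made up for illustration and are not PAL's API; the "cores" are simply visited one after another, standing in for whatever dispatch mechanism PAL ends up with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leaf function: serial add of n floats on one core.
   This is NOT PAL's real p_add signature, just a stand-in. */
static void leaf_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Work out the slice [start, start + len) of `total` elements that
   worker `id` out of `nworkers` should handle. The first
   (total % nworkers) workers each take one extra element. */
static void chunk_for(size_t total, size_t nworkers, size_t id,
                      size_t *start, size_t *len)
{
    assert(nworkers > 0 && id < nworkers);
    size_t base = total / nworkers;
    size_t rem  = total % nworkers;
    *len   = base + (id < rem ? 1 : 0);
    *start = id * base + (id < rem ? id : rem);
}

/* Stand-in for the higher-level layer: one leaf call per worker.
   A real implementation would launch these on different cores. */
static void partitioned_add(const float *a, const float *b, float *c,
                            size_t total, size_t nworkers)
{
    for (size_t id = 0; id < nworkers; id++) {
        size_t start, len;
        chunk_for(total, nworkers, id, &start, &len);
        leaf_add(a + start, b + start, c + start, len);
    }
}
```

With this shape, the leaf functions keep their for loops and the partitioning lives entirely in the layer above them.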

I think 'vanilla C, compatible with gcc' is a bit too vague. I presume gcc compatibility is necessary but not sufficient, and that the maximum portability possible ought to be sought. Can C99 constructs be used? (but not C11, I guess.) One option would be to mandate a set of compiler flags to be used with gcc, disabling non-portable features.

Then again, a better option might be to settle on a standard set of macros to be used for conditional compilation purposes, detecting the compiler id and/or various features of the target architecture. That way, every available facility (short of assembly language) could be used for optimisation purposes.

I'm thinking of something like what is done in this example - http://www.saphir2.com/sphlib/ (See the 'Tuning' section.)
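As a rough illustration of the macro idea, something along these lines could sit in a single tuning header. The PALX_* names are invented for this sketch and do not exist in PAL; the tests rely only on macros that gcc-style compilers predefine for x86 and ARM targets (__AVX512F__, __AVX__, __SSE2__, __ARM_NEON, __GNUC__).

```c
#include <stddef.h>

/* Hypothetical PALX_* tuning macros (not part of PAL).
   Widest SIMD register the compiler was told it may use. */
#if defined(__AVX512F__)
#  define PALX_SIMD_BITS 512
#elif defined(__AVX__)
#  define PALX_SIMD_BITS 256
#elif defined(__SSE2__) || defined(__ARM_NEON)
#  define PALX_SIMD_BITS 128
#else
#  define PALX_SIMD_BITS 0      /* scalar only */
#endif

/* Floats per SIMD register; 1 when there is no SIMD unit. */
#define PALX_F32_LANES (PALX_SIMD_BITS ? PALX_SIMD_BITS / 32 : 1)

/* Use the gcc spelling of restrict where available, else nothing. */
#if defined(__GNUC__)
#  define PALX_RESTRICT __restrict__
#else
#  define PALX_RESTRICT
#endif

/* Example leaf function written against the macros above. */
static void palx_scale(const float *PALX_RESTRICT in,
                       float *PALX_RESTRICT out, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

The leaf code stays plain C; only the header knows about compilers and targets, which is roughly how sphlib's tuning section is organised.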

And the features of the target architecture are surely a very important detail. The benchmarking section has placeholders for Epiphany, ARM and x86, but the latter two have 32 and 64-bit variants, plus a range of different SIMD extensions.

If code is written with elements of gcc-specific conditional compilation, then compiler intrinsics might be useful, as the next step up from pure 'vanilla' C, without getting into the specifics of NEON, SSE, AVX, etc.

[Is runtime detection perhaps even an option for the SIMD stuff, without getting into asm??]
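On x86 at least, gcc does seem to offer this without any asm, via its __builtin_cpu_init and __builtin_cpu_supports built-ins. The sketch below assumes a gcc-compatible compiler on x86 and falls back to "unknown" everywhere else; the palx_* names are hypothetical, and I have not looked into what the ARM equivalent would be.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PAL): widest SIMD width in bits
   that the CPU we are running on reports, or 0 if unknown. */
static int palx_runtime_simd_bits(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                 /* gcc built-in, x86 only */
    if (__builtin_cpu_supports("avx2"))
        return 256;
    if (__builtin_cpu_supports("sse2"))
        return 128;
    return 0;
#else
    return 0;                             /* no portable query here */
#endif
}

/* How many floats to hand to a leaf function at a time. */
static size_t palx_runtime_f32_chunk(void)
{
    int bits = palx_runtime_simd_bits();
    return bits ? (size_t)bits / 32 : 1;
}
```

Detecting the feature is only half the job, of course: the wider code paths would still have to be compiled in somehow, so this is a sketch of the query and not of the dispatch.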


Going back to my original query about the parallel mechanism, I think part of what I am struggling to get my head around is how it will be ensured that a call of a serial (i.e. runs on one core) function to, say, add N floats will be given an optimal amount of work to do, given its SIMD capabilities. [If that makes sense.]

And, in terms of the leaf functions currently being implemented, maybe it would make more sense to define a set of standard vector lengths, such as 1 (i.e. scalar), 2, 4, and 8; then use those as the building blocks for the multicore, parallel world. We start by implementing the scalar version. For E32, that is all we ever need, and it is the PAL plumbing that is responsible for invoking it multiple times across multiple cores. For other architectures, we can start with just the scalar version, but fill in the missing pieces later, so that cores with 128, 256 or 512-bit SIMD registers can be fed work in larger chunks.
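Roughly what I mean by building blocks, as a sketch with made-up names (add_x1, add_x4, add_n) and only two of the lengths filled in. The width-4 kernel is a plain fixed-length loop here; on a SIMD target it could later be swapped for intrinsics without touching the driver.

```c
#include <stddef.h>

/* Hypothetical fixed-length kernels (not PAL functions). */

/* Length 1: the scalar version, all that E32 would ever need. */
static void add_x1(const float *a, const float *b, float *c)
{
    c[0] = a[0] + b[0];
}

/* Length 4: matches a 128-bit register holding four floats. */
static void add_x4(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];
}

/* Driver: feed the widest available kernel first, then mop up the
   remaining 0..3 elements with the scalar one. */
static void add_n(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        add_x4(a + i, b + i, c + i);
    for (; i < n; i++)
        add_x1(a + i, b + i, c + i);
}
```

An architecture that has only add_x1 implemented would just skip the first loop, which is the "fill in the missing pieces later" part.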


On a separate, but related note - ought there not to be double (and perhaps also long double) versions of some of the functions, since it is not just E32 being targeted? C99 has type-generic macros for its math functions, which call the functions that correspond to the parameter type. Maybe the same idea could be applied to the PAL API?
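One caveat: C99's <tgmath.h> gets its type-generic behaviour from compiler magic, so user code wanting the same effect needs something like C11's _Generic (or a gcc extension), which would be at odds with a strict C99 rule. Assuming C11 were allowed, a sketch might look like this; palx_add, palx_add_f32 and palx_add_f64 are hypothetical names, not PAL's.

```c
#include <stddef.h>

/* Hypothetical per-type implementations (not PAL functions). */
static void palx_add_f32(const float *a, const float *b,
                         float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

static void palx_add_f64(const double *a, const double *b,
                         double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Type-generic front end: picks the implementation from the type of
   the output pointer. Requires C11 _Generic. */
#define palx_add(a, b, c, n)            \
    _Generic((c),                       \
        float *:  palx_add_f32,         \
        double *: palx_add_f64)((a), (b), (c), (n))
```

Callers would then write palx_add(x, y, out, n) regardless of whether the buffers hold float or double.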


I hope the above isn't too rambling. I've wanted to post something along these lines for a while, but always found it too difficult to collect my thoughts; I'm not sure I've entirely succeeded this time, either. :)
Colin.