Parallella Community • View topic - brainstorming - distinguish fine/coarse grain?

Forum for PAL Users and Developers

Post by dobkeratops » Sun Jul 05, 2015 10:07 am

brainstorming:

[1] On many systems there is a distinction between coarse and fine grains of parallelism, specifically cores x SIMD (but also ILP visible to the programmer through unrolling, plus hyper-threading and nested caches).

Would it make sense to further divide the parallel functions, via a naming convention, into _coarse and _fine versions that hint you definitely mean one or the other? Unhinted functions would just do whatever is most sensible.

If architecting for a conventional CPU, you would parallelise large, outer tasks into worker threads, within which fine-grain tasks are parallelised by SIMD;
conversely, on a manycore chip like Parallella you might prefer to spawn yet more subtasks.

This could also relate to data locality: a '_fine()' hint could also mean that the data set in question is already in local memory and doesn't require DMA streaming.

On some platforms the _fine() version *might* spawn more tasks; however, you know it *never* does on a SIMD machine. Similarly, '_coarse()' would be a hint that you definitely mean 'use more cores/threads'. You would save code size and branches compared with making a dynamic decision every time.
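To make the branch-saving point concrete, here is a rough sketch in C. Everything in it is an assumption for illustration: the `ex_sum_u32*` names, the threshold and the task count are made up and are not PAL API. The "coarse" pieces run one after another here; a real backend would hand them to threads or cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tuning constants for illustration only; not taken from PAL. */
#define EX_SUM_TASKS     4u      /* pieces the coarse version splits into */
#define EX_SUM_THRESHOLD 1024u   /* below this the unhinted version stays fine */

/* Fine: one tight loop in the calling task; never spawns anything.
   A compiler may vectorise this loop. */
static uint64_t ex_sum_u32_fine(const uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Coarse: always splits into up to EX_SUM_TASKS pieces, no size check.
   The pieces run sequentially in this sketch. */
static uint64_t ex_sum_u32_coarse(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    size_t step = (n + EX_SUM_TASKS - 1) / EX_SUM_TASKS;  /* ceiling division */
    for (size_t lo = 0; lo < n; lo += step) {
        size_t len = (n - lo < step) ? n - lo : step;
        s += ex_sum_u32_fine(a + lo, len);
    }
    return s;
}

/* Unhinted: "do whatever is most sensible", at the cost of a runtime branch
   and of linking both code paths. */
static uint64_t ex_sum_u32(const uint32_t *a, size_t n)
{
    return (n >= EX_SUM_THRESHOLD) ? ex_sum_u32_coarse(a, n)
                                   : ex_sum_u32_fine(a, n);
}
```

A caller that already knows its data is small and local calls `ex_sum_u32_fine()` directly and skips the size check; all three variants return the same result.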

The distinction might help cross-platform implementations, e.g. a 'p_sort_u32(..)' called from the main task could fan out and spawn smaller tasks that use p_sort_u32_fine() on subsets, after which the outer task merges the results. The fine-grain sort is also available as a useful component in its own right, for implementors of other PAL functions.
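A minimal sketch of that fan-out-and-merge shape, in C. The `ex_` functions are hypothetical stand-ins for the `p_sort_u32` / `p_sort_u32_fine` names above, with signatures I have assumed; insertion sort stands in for whatever SIMD or local-memory kernel a real fine-grain sort would use, and the chunk loop is sequential where a real backend would spawn tasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fine-grain sort: runs entirely in the calling task and never
   spawns work. Insertion sort is only a placeholder kernel. */
static void ex_sort_u32_fine(uint32_t *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Merge the sorted runs a[0..mid) and a[mid..n) using scratch space tmp. */
static void ex_merge_u32(uint32_t *a, size_t mid, size_t n, uint32_t *tmp)
{
    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < n)   tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
}

/* Hypothetical coarse-grain sort: split into `chunks` subsets, sort each with
   the fine-grain version, then merge in the outer task.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int ex_sort_u32_coarse(uint32_t *a, size_t n, size_t chunks)
{
    assert(chunks > 0);
    if (n < 2) return 0;
    if (chunks > n) chunks = n;

    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;

    size_t step = (n + chunks - 1) / chunks;   /* ceiling division */

    /* Fan out: each iteration is what a worker task would do. */
    for (size_t lo = 0; lo < n; lo += step) {
        size_t len = (n - lo < step) ? n - lo : step;
        ex_sort_u32_fine(a + lo, len);
    }
    /* Outer task: fold each sorted chunk into the sorted prefix a[0..lo). */
    for (size_t lo = step; lo < n; lo += step) {
        size_t hi = (n - lo < step) ? n : lo + step;
        ex_merge_u32(a, lo, hi, tmp);
    }
    free(tmp);
    return 0;
}
```

The fine-grain function is usable on its own, which is the "component in its own right" point: another routine can call `ex_sort_u32_fine()` on a small local buffer without pulling in any task-spawning machinery.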

I would envisage that a set of postfixes can be tweaked by a user to supply more information, but in a way that does not change the actual behaviour or result.

I realise that manycore may suit a more general approach; however, the intent of this library appears to be to make a good compromise for moving a single codebase between manycore, GPGPU, SMP x SIMD, and even FPGA. You could start out with a good-enough assumption across the board, e.g. 'coarse means 4 threads', 'fine means 4-way SIMD', which might still be better than no distinction when given 8- or 16-way SIMD.
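As a sketch of how such across-the-board defaults might be written down, assuming build-time macros that a platform port could override (the `EX_` names and the `ex_plan` struct are hypothetical, not anything PAL defines):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaults, overridable per platform at build time,
   e.g. -DEX_FINE_LANES=8 on a machine with 8-way SIMD. */
#ifndef EX_COARSE_WAYS
#define EX_COARSE_WAYS 4   /* 'coarse means 4 threads' */
#endif
#ifndef EX_FINE_LANES
#define EX_FINE_LANES 4    /* 'fine means 4-way SIMD' */
#endif

/* How n elements would be carved up under those defaults. */
struct ex_plan {
    size_t per_task;   /* elements per coarse task (the last may get fewer) */
    size_t vec_iters;  /* full SIMD-width iterations inside one full task */
    size_t tail;       /* leftover scalar elements inside one full task */
};

static struct ex_plan ex_make_plan(size_t n)
{
    assert(EX_COARSE_WAYS > 0 && EX_FINE_LANES > 0);
    struct ex_plan p;
    p.per_task  = (n + EX_COARSE_WAYS - 1) / EX_COARSE_WAYS;  /* ceiling */
    p.vec_iters = p.per_task / EX_FINE_LANES;
    p.tail      = p.per_task % EX_FINE_LANES;
    return p;
}
```

With the defaults, 1000 elements become 4 tasks of 250, each doing 62 four-wide iterations plus 2 leftover elements. On wider SIMD the split is merely suboptimal, not wrong, which is the "still better than no distinction" argument.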

eg

