Parallella Community • View topic - Linpack

Linpack


Postby MiguelTasende » Thu Jul 16, 2015 9:03 pm

Hi!
I wanted to know... has anyone run the linpack with a parallella cluster?
I saw there was a challenge about this, some time ago. Anyone succeeded?

So far, I've run the linpack on the ARM cluster using OpenBLAS.
Now I'm working on adding the Epiphany.

If anyone is interested in sharing thoughts, experiences, or linpack results, I would like to hear them.

Right now, I have instantiated the BLAS via BLIS on a virtual cluster, modified a "gemm" kernel, built the linpack on top of it, and run it successfully (and checked that my kernel was being run by the linpack... with some "printf"s).
That is... I managed to get a container for a super Epiphany-powered matmul kernel. Now I'm trying to fill it with an actual Epiphany-powered matmul kernel :)
MiguelTasende
 
Posts: 51
Joined: Tue Jun 30, 2015 12:44 pm

Re: Linpack

Postby aolofsson » Sun Aug 02, 2015 11:19 pm

Very cool! How did things go with Epiphany BLIS?
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Linpack

Postby MiguelTasende » Thu Aug 06, 2015 4:47 pm

Still not there....
I managed to offload "dotproduct" to the Epiphany (in a terribly unoptimized way), and built the BLAS that uses that.
I completely rethought a Matmul algorithm that I think may solve all performance problems (uuuhhh... well, you have to be optimistic... we'll see), and so far I have implemented it partly: I'm offloading half and calculating half on the ARM for now.

Tasks for today:
- studying the linker scripts to allocate memory efficiently
- offload the second part of the algorithm to the Epiphany

After that I will have an initial prototype that I hope will be far more efficient than previous implementations (but I may be wrong).

More tasks, for after the first prototype test:
- generalize the kernel for non-square matrices (it's an optimization since BLIS lets you use only "square kernels" and divides the original matrix accordingly, but if you let it use arbitrary matrices it optimizes more)
- create a double buffer
- optimize algorithm
- optimize code
- try to find a way not to hardware reset the Epiphany every time a call to the BLIS kernel is made

We'll be sending news...
MiguelTasende
 
Posts: 51
Joined: Tue Jun 30, 2015 12:44 pm

Re: Linpack

Postby aolofsson » Thu Aug 06, 2015 5:46 pm

That's awesome!

Doubt you will get good performance sending dot-product to Epiphany. (too BW limited).

You should never have to mess around with linker descriptor files. :-(
Recommend combining work with this (or something similar) if you will be doing offloading from ARM:

https://github.com/USArmyResearchLab/mp ... ter/cannon

Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Linpack

Postby MiguelTasende » Fri Aug 07, 2015 2:28 pm

Yes, the "dotproduct" part was only to show that I could combine all the things, get a BLAS library that does some real calculations on the Epiphany, and get correct results when running linpack. No good performance expected.
In my actual algorithm I am not using any dot product at all.

After looking at the linker descriptor files I decided that I will use internal.ldf for now and not make any changes... but I may need to understand better how it works anyway (so as not to put some variables in the wrong place). At the moment I can multiply 64x64 matrices using 2 memory banks to store the local results (I use 1 for the code and 1 for "small variables"). I'm being conservative on that (also, adding 1 more bank doesn't improve things much: I could get 78x78).
I have an idea to try to enhance it 16 times, because I could theoretically use 2 banks for each core, reusing space for partial-partial-partial-...-final results (short explanation... :) ), but that will be a bit more complicated...
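
The 64x64 and 78x78 figures above can be checked with a bit of arithmetic. This is only a sketch, assuming the Epiphany-III layout of 8 KB per local memory bank and 4-byte single-precision elements; the function name is made up for illustration.

```python
import math

BANK_BYTES = 8 * 1024   # assumption: 8 KB per local memory bank (Epiphany-III)
FLOAT_BYTES = 4         # single-precision element size


def max_square_dim(banks: int) -> int:
    """Largest n such that an n x n float matrix fits in `banks` banks."""
    elements = banks * BANK_BYTES // FLOAT_BYTES
    return math.isqrt(elements)


for banks in (2, 3):
    n = max_square_dim(banks)
    print(f"{banks} banks -> {n}x{n}")
```

Under those assumptions 2 banks hold exactly 4096 floats (64x64), and a third bank only raises the side to 78, which is why the extra bank buys so little.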

I will look at the code you mention, but they seem to be using "coprthr", and I am not (for now). I'm using only the ESDK for this first prototype. Later, it can be improved if necessary.
MiguelTasende
 
Posts: 51
Joined: Tue Jun 30, 2015 12:44 pm

Re: Linpack

Postby MiguelTasende » Thu Mar 03, 2016 5:56 pm

These are the first Linpack benchmark results... still a lot to optimize (it's a poor result, for now):

NOTE1: It was necessary to run the "Epiphany caller" algorithm from a different process for two reasons:
1) To avoid mmap() problems when the kernel is called many times
2) To avoid performance losses, due to initialization times

NOTE2: To accommodate the HPL Linpack benchmark, which uses double precision, I created a hybrid version of the kernel. BLIS calls to dgemm go to a custom dgemm kernel, which in fact does some casting and calls the sgemm inner kernel.
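
The "cast down, call sgemm, cast up" idea from NOTE2 can be sketched in a few lines. This is not the actual kernel (that code is not shown in this thread): `to_f32`, `sgemm_inner` and `false_dgemm` are hypothetical names, and the inner kernel here is a plain triple loop standing in for the Epiphany one.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgemm_inner(alpha, a, b, beta, c):
    """Stand-in single-precision kernel: alpha*A*B + beta*C,
    rounding to float32 after every multiply and add."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc = to_f32(acc + to_f32(a[i][p] * b[p][j]))
            out[i][j] = to_f32(to_f32(alpha * acc) + to_f32(beta * c[i][j]))
    return out


def false_dgemm(alpha, a, b, beta, c):
    """Double-precision interface that casts inputs down and calls sgemm_inner."""
    def down(mat):
        return [[to_f32(v) for v in row] for row in mat]

    # Python floats are already doubles, so the "cast up" is implicit here.
    return sgemm_inner(to_f32(alpha), down(a), down(b), to_f32(beta), down(c))


result = false_dgemm(1.0, [[0.1]], [[1.0]], 0.0, [[0.0]])
print(result[0][0] == 0.1)  # False: the value went through single precision
```

The caller sees a dgemm-shaped interface, but the result only carries single-precision accuracy, which is worth keeping in mind when HPL checks its residuals.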

A(192x4096), B(4096x256), C(192x256)
Operation: C_out = alpha * A * B + beta * C_in
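
For reading the GFLOPS figures that follow, here is a small sketch of the usual flop accounting for this operation. It assumes the common convention of 2*M*N*K operations for a gemm (the alpha and beta scalings are ignored); the helper names are hypothetical.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Conventional gemm flop count: one multiply and one add per (i, j, p)."""
    return 2 * m * n * k


def seconds_at(flops: int, gflops: float) -> float:
    """Time needed for `flops` operations at a sustained rate of `gflops`."""
    return flops / (gflops * 1e9)


kernel = gemm_flops(192, 256, 4096)   # A(192x4096) times B(4096x256)
print(kernel)                          # 402653184
for rate in (3.4, 3.04, 2.17):
    print(f"{rate} GFLOPS -> {seconds_at(kernel, rate) * 1e3:.1f} ms per call")
```

With this convention, one call at the kernel size above is about 0.4 Gflop, so the rates quoted correspond to roughly a tenth to a fifth of a second per call.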

---My own measurements--------------------------------------------------------------------------------------------------------------
My Matmul algorithm (run once, don't count initialization times): 3.4 GFLOPS
Same Matmul algorithm modified to run as a separate process (run from the same process): 3.04 GFLOPS
Same Matmul (run from another process: includes transfers to shared memory and synchronization times): 2.17 GFLOPS
------------------------------------------------------------------------------------------------------------------------------------------------
Possible improvements (on that "phase"): Algorithm itself (great potential on improving "e_read" times from ARM to shared RAM), Adopting the new e-link, Improving Inter-process communications.

---BLIS measurements--------------------------------------------------------------------------------------------------------------
sgemm kernel (M=192, N=256, K=4096): 2.63 GFLOPS (run from separate process)
sgemm complete operation (M=N=K=4096): Between 2.035 GFLOPS and 2.456 GFLOPS (depending on the transpose,conjugate, etc, operations requested)
"false dgemm" kernel (M=192, N=256, K=4096): 2.073 GFLOPS
"false dgemm" complete operation (M=N=K=4096): Between 1.575 GFLOPS and 1.829 GFLOPS
-------------------------------------------------------------------------------------------------------------------------------------------
Possible improvements (on that "phase"): I think the BLIS process is very efficient as it is, and is not wasting many FLOPS, for now.

---HPL Linpack benchmark--------------------------------------------------------------------------------------------------------------
Linpack (N = 4608, NB=768, many other options tweaked...): 495.7 MFLOPS
------------------------------------------------------------------------------------------------------------------------------------------------
Possible improvements (on that "phase"): The many parameters of the HPL configuration file change the way the algorithm is run, and have a great impact on performance. Also, HPL is calling the "false dgemm" of the BLIS-BLAS library (it would be better if I had a native single-precision algorithm to compile with the BLIS-BLAS).
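
To put the HPL number in terms of wall-clock time, here is a rough sketch. It assumes the operation count HPL conventionally uses when reporting results, 2/3*N^3 + 3/2*N^2; the function names are hypothetical and the run time itself was not posted, so the output is only an estimate.

```python
def hpl_flops(n: int) -> float:
    """Operation count assumed for an N x N HPL solve: 2/3*N^3 + 3/2*N^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_seconds(n: int, mflops: float) -> float:
    """Estimated run time for problem size `n` at a reported rate in MFLOPS."""
    return hpl_flops(n) / (mflops * 1e6)


print(f"{hpl_flops(4608) / 1e9:.1f} Gflop")
print(f"{hpl_seconds(4608, 495.7):.0f} s at 495.7 MFLOPS")
```

Under that assumption, N = 4608 is about 65 Gflop of work, so 495.7 MFLOPS corresponds to a run of a bit over two minutes.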

That's it for now; I hope to improve what I can. I'll also run it on a cluster (maybe first, and then improve other things).
MiguelTasende
 
Posts: 51
Joined: Tue Jun 30, 2015 12:44 pm

Re: Linpack

Postby snim2 » Thu Aug 04, 2016 9:24 pm

Hi @MiguelTasende this looks really interesting! Is your version of Linpack available publicly at all?

Thanks,

Sarah
snim2
 
Posts: 53
Joined: Mon Feb 03, 2014 5:02 pm

Re: Linpack

Postby MiguelTasende » Mon Aug 15, 2016 7:17 pm

I am really sorry. The problem is that I am very inexperienced in these matters (and maybe I am on shaky ground, also...).
The work was published at a conference (last week): IEEE DataCom 2016 (Auckland).
The original paper was 8 pages long, but I had to reduce it to 4 pages for publication. I would like to publish the 8-page version (with more explanation) on arXiv or similar, but I am asking the IEEE copyright section whether that is possible.

It is the first paper I publish, and also the first time I will try to release software code from within the company I work for, so I am lost in a legal mess :)
The code is still not available.
By the end of this month I will have news (or die trying... :P ). In principle, there will be support for the release of the code, from the company.

Any general advice (from experienced publishers) could help, also.

For now I can tell you the title of the paper was: "Generation of the Single Precision BLAS library for the Parallella platform, with Epiphany co-processor acceleration, using the BLIS framework"
And I still can't find it in IEEE Xplore, or anywhere on the web (maybe it is too soon).
MiguelTasende
 
Posts: 51
Joined: Tue Jun 30, 2015 12:44 pm

Re: Linpack

Postby jar » Tue Aug 16, 2016 3:45 am

jar
 
Posts: 295
Joined: Mon Dec 17, 2012 3:27 am

Re: Linpack

Postby MiguelTasende » Fri Aug 19, 2016 1:22 am

OK, thanks.
After dealing with some arXiv issues (initial problems with "endorsement"...), now it is done.
Can be downloaded here:



(It is the 8-page manuscript. It later had to be reduced to 4 pages for the final submission to the conference. The final submission is still not published, as far as I know.)

About the code, I hope to be able to release it soon.
MiguelTasende
 
Posts: 51
Joined: Tue Jun 30, 2015 12:44 pm
