paper on extended "double-single" precision

Postby notzed » Sat Jun 14, 2014 11:05 am

The paper "Extended-Precision Floating-Point Numbers for GPU Computation" by Andrew Thall may be of use for the Epiphany.

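It outlines the basic floating-point operations for a "double-single" format, where each value is stored as the unevaluated sum of two single-precision floats. I presume it should be faster (and smaller) than a software IEEE double library: every operation is decomposed into 32-bit floating-point operations, so it can run on the existing single-precision hardware. It roughly doubles the mantissa precision (about 48 bits instead of 24) but doesn't extend the exponent range.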
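To make the format concrete, here is a minimal C sketch (my own, not taken from the paper) of a double-single value held as hi + lo, a host-side conversion from double, and an addition in the usual double-double style: two error-free twoSums followed by renormalization with quickTwoSum. The type name `df64` and all function names are hypothetical. It assumes IEEE binary32 with round-to-nearest, no excess precision (FLT_EVAL_METHOD == 0) and no fused multiply-add contraction by the compiler.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical double-single type: the value is hi + lo, with |lo| at most
   half an ulp of hi. Assumes IEEE binary32, round-to-nearest, no excess
   precision (FLT_EVAL_METHOD == 0) and no FMA contraction. */
typedef struct { float hi, lo; } df64;

/* Host-side conversion from double. The exponent range is still that of
   float: |d| must not exceed FLT_MAX, and for tiny |d| the low part
   underflows, so the extra precision is lost there. */
static df64 df64_from_double(double d) {
    assert(d >= -FLT_MAX && d <= FLT_MAX);  /* exponent is not extended */
    df64 r;
    r.hi = (float)d;
    r.lo = (float)(d - (double)r.hi);       /* d - hi is exact in double */
    return r;
}

static double df64_to_double(df64 a) {
    return (double)a.hi + (double)a.lo;
}

/* Knuth's twoSum: hi + lo == a + b exactly, for any magnitudes. 6 flops. */
static df64 two_sum(float a, float b) {
    df64 r;
    r.hi = a + b;
    float bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);
    return r;
}

/* Faster variant (Dekker's fast two-sum), exact only when |a| >= |b|. 3 flops. */
static df64 quick_two_sum(float a, float b) {
    df64 r;
    r.hi = a + b;
    r.lo = b - (r.hi - a);
    return r;
}

/* Addition: exact sums of the high and low parts, then renormalize.
   20 single-precision flops in total. */
static df64 df64_add(df64 a, df64 b) {
    df64 s = two_sum(a.hi, b.hi);
    df64 t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);
}
```

For example, adding 1.0 and 1e-10 this way leaves hi == 1.0f with the 1e-10 carried in lo, which a plain float addition would round away.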

I downloaded it from here: http://andrewthall.net/papers/df64_qf128.pdf

Re: paper on extended "double-single" precision

Postby upcFrost » Sat May 27, 2017 11:28 pm

Actually, it might be worth trying to implement. I'll probably try, at least the basic ops.
Current LLVM backend for Epiphany: https://github.com/upcFrost/Epiphany. Commits and code reviews are welcome.

Re: paper on extended "double-single" precision

Postby jar » Sun May 28, 2017 12:42 am

I had experimented with this some time ago and didn't find it very worthwhile. Consider the df64_mult routine, which by my count requires 9 multiplications, 9 subtractions and 6 additions (24 single-precision flops for one extended-precision product). It should be used very selectively.

float2 split(float a) { // 1 mul + 3 sub
   const float splitter = 4097.0f; // (1<<12)+1: splits a 24-bit mantissa into two halves
   float t = a * splitter;
   float ahi = t - (t - a);
   float alo = a - ahi;
   return float2(ahi, alo);
}

float2 quickTwoSum(float a, float b) { // 2 sub + 1 add; exact only if |a| >= |b|
   float s = a + b;
   float e = b - (s - a);
   return float2(s, e);
}

float2 twoProd(float a, float b) { // 7 mul + 7 sub + 3 add
   float p = a * b;
   float2 aS = split(a); // 1 mul + 3 sub
   float2 bS = split(b); // 1 mul + 3 sub
   float err = ((aS.x * bS.x - p) + aS.x * bS.y + aS.y * bS.x) + aS.y * bS.y;
   return float2(p, err); // p + err == a * b exactly
}

float2 df64_mult(float2 a, float2 b) { // 9 mul + 9 sub + 6 add
   float2 p;
   p = twoProd(a.x, b.x); // 7 mul + 7 sub + 3 add
   p.y += a.x * b.y;      // cross terms; a.y * b.y is too small to matter and is dropped
   p.y += a.y * b.x;
   p = quickTwoSum(p.x, p.y); // 2 sub + 1 add
   return p;
}
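For anyone wanting to try the four routines above in plain C on the Epiphany, here is a hedged sketch with float2 replaced by a small struct. The type name `df64` and the snake_case function names are my own (hypothetical), and the assert in `split` is an added sanity check, not part of the original. The third cross term of the twoProd error is written as the product aS.y * bS.x (here `aS.lo * bS.hi`), as Dekker's algorithm requires. It assumes IEEE binary32, round-to-nearest, no excess precision, no overflow or underflow, and no FMA contraction (with GCC, for example, `-ffp-contract=off`); a contracted `t - a` in the split would break the error terms.

```c
#include <assert.h>

/* Hypothetical C port: value = hi + lo. Assumes IEEE binary32,
   round-to-nearest, FLT_EVAL_METHOD == 0 and no FMA contraction. */
typedef struct { float hi, lo; } df64;

/* Veltkamp split: a == hi + lo, each part fits in about 12 bits. */
static df64 split(float a) {               /* 1 mul + 3 sub */
    const float splitter = 4097.0f;        /* 2^12 + 1 */
    float t = a * splitter;
    df64 r;
    r.hi = t - (t - a);
    r.lo = a - r.hi;
    assert(r.hi + r.lo == a);  /* also fails if a * splitter overflows */
    return r;
}

/* Dekker's product: hi + lo == a * b exactly. 7 mul + 7 sub + 3 add. */
static df64 two_prod(float a, float b) {
    float p = a * b;
    df64 aS = split(a), bS = split(b);
    df64 r;
    r.hi = p;
    r.lo = ((aS.hi * bS.hi - p) + aS.hi * bS.lo + aS.lo * bS.hi) + aS.lo * bS.lo;
    return r;
}

/* Exact only if |a| >= |b|. 2 sub + 1 add. */
static df64 quick_two_sum(float a, float b) {
    df64 r;
    r.hi = a + b;
    r.lo = b - (r.hi - a);
    return r;
}

/* df64 product: 9 mul + 9 sub + 6 add; a.lo * b.lo is dropped. */
static df64 df64_mult(df64 a, df64 b) {
    df64 p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo;
    p.lo += a.lo * b.hi;
    return quick_two_sum(p.hi, p.lo);
}
```

A quick sanity check for such a port is that hi + lo from `two_prod` equals the product computed in double, since the product of two floats is exact in double precision.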