pipelined floating point compare

Post by notzed » Tue Jul 29, 2014 1:36 am

Because the CPU uses a condition code register for floating-point operations such as subtract, any conditional logic based on those flags cannot be pipelined. For example, you can't unroll multiple tests with branchless movCC logic.

You can't even issue another, unrelated floating-point operation to fill the FPU slot, because they all affect the flags. So even if there are no pipeline stalls, there will always be forced holes in the FPU pipeline. I think this is an oversight in the instruction set, but so be it.

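For some limited cases, though, the FPU B flags can simply be ignored and the IALU flags used instead, namely when testing for negative, which is the most useful case anyway. The sign of a float lives in bit 31, the same bit the integer negative flag reports.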

e.g. an unrolled 'min' function:

Code:
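 ; compute the differences; the sign of each result says which operand is smaller
 fsub r16,r0,r48
 fsub r17,r1,r49
 fsub r18,r2,r50
 fsub r19,r3,r51
 ; ... etc

 ; copy the sign of each difference into the integer flags, then select
 add r16,r16,#0
 movgt r0,r48
 add r17,r17,#0
 movgt r1,r49
 ; ... etc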


This requires an extra instruction to copy the N bit into the integer condition-code flags, but that can be done at any later point.

The same trick can help in C too, even for integer arithmetic, because of the way it forms booleans.
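
For anyone who wants to check the reasoning off-target, here is a minimal C model of the sign-bit trick. It is not Epiphany code, only a sketch: it assumes IEEE-754 single precision, the names float_bits_as_int and min_via_int_flags are made up for illustration, and NaN inputs are not handled. The integer test d > 0 stands in for movgt issued after add rN,rN,#0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE-754 value with the sign in bit 31. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Reinterpret the bits of a float as a signed integer, which is how the
 * IALU sees the register contents after the fsub has completed. */
static int32_t float_bits_as_int(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Model of:  fsub d,a,b ; add d,d,#0 ; movgt a,b
 * Integer "gt" here means the difference is neither zero nor negative,
 * so b replaces a only when a - b is strictly positive. */
static float min_via_int_flags(float a, float b)
{
    int32_t d = float_bits_as_int(a - b);
    return (d > 0) ? b : a;
}
```

Calling min_via_int_flags(2.0f, 1.0f) takes the movgt path and returns b, while equal inputs leave a untouched because the difference is +0 and its bit pattern is integer zero.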

Re: pipelined floating point compare

Post by aolofsson » Tue Jul 29, 2014 1:45 pm

notzed,
The way it is currently implemented, are the "b-flags" useful at all?
Andreas

Re: pipelined floating point compare

Post by notzed » Tue Jul 29, 2014 4:25 pm

aolofsson wrote:
notzed,
The way it is currently implemented, are the "b-flags" useful at all?
Andreas


Good question. The compiler guys might have something to say about it. But no doubt having them is better than nothing.

Looking at the ARM ISA, there are two ways to compare floating-point values: either via vcmp, which writes to a shadow FPU flags register that can then be copied over the integer flags so the same conditional instructions work, or through a set of instructions that write their results as boolean values to SIMD register lanes.

The former is an improvement because other FPU ops don't interfere with the result, so they can be freely scheduled in the hole left while waiting to use it. The latter is required for SIMD algorithms and has some additional support, such as select-bits, which can implement c = (t?a:b) more flexibly than movCC (and more besides).

A pair of compare instructions would be nice:

fcmpCC rd,rn,rm
cmpCC rd,rn,rm

rd = testCC(rn, rm) ? ~0 : 0

That is the ideal form, but even a 2-register version would be useful, destroying the content of rd in the process.

These could then be used together with the existing boolean operators to perform complex logic without branches, much more easily than with movCC.
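
To make the idea concrete, here is a rough C model of mask-producing compares. fcmp_lt and cmp_lt are hypothetical stand-ins for one condition of the proposed fcmpCC and cmpCC (nothing like them exists in the instruction set discussed here), and in_range_mask is just an illustrative name. The point of the sketch is that a compound test falls out of plain AND/NOT on the masks, with no flags involved.

```c
#include <assert.h>
#include <stdint.h>

/* The mask is meant to be applied to 32-bit register contents,
 * floats included, so both must have the same width. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical "fcmplt rd,rn,rm": all ones when a < b, otherwise zero. */
static uint32_t fcmp_lt(float a, float b)
{
    return (a < b) ? UINT32_MAX : 0;
}

/* Hypothetical "cmplt rd,rn,rm": the same result for signed integers. */
static uint32_t cmp_lt(int32_t a, int32_t b)
{
    return (a < b) ? UINT32_MAX : 0;
}

/* Compound test (lo <= x && x < hi) built only from masks and boolean
 * operators: "x < hi" AND NOT "x < lo". */
static uint32_t in_range_mask(float x, float lo, float hi)
{
    uint32_t below_hi = fcmp_lt(x, hi);
    uint32_t below_lo = fcmp_lt(x, lo);
    return below_hi & ~below_lo;
}
```

Because each compare writes an ordinary register, several of them can be in flight at once, which is exactly what a shared flags register prevents.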

Also, rather than 'movCC':

selb rd,rn,rm

rd = (rd & ~rm) | (rn & rm)

This is more useful than movCC because the mask can come from other sources, such as compound logic equations or tables. It can also be used to implement bitfield operations, amongst others.
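
A small C model of the select, as a sketch only. selb follows the formula given above; bitfield_insert is a made-up helper showing the bitfield use, where the mask comes from a shift rather than from a comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the proposed "selb rd,rn,rm":
 * take bits from rn where the mask rm is 1, keep rd where it is 0. */
static uint32_t selb(uint32_t rd, uint32_t rn, uint32_t rm)
{
    return (rd & ~rm) | (rn & rm);
}

/* Hypothetical helper: overwrite `width` bits of `word`, starting at bit
 * `shift`, with the low bits of `value`. The mask is built arithmetically,
 * which a flag-driven movCC has no way to express. */
static uint32_t bitfield_insert(uint32_t word, uint32_t value,
                                unsigned shift, unsigned width)
{
    assert(width >= 1 && width <= 32 && shift <= 32 - width);
    uint32_t field = (width == 32) ? UINT32_MAX
                                   : ((UINT32_C(1) << width) - 1);
    uint32_t mask = field << shift;
    return selb(word, value << shift, mask);
}
```

With an all-ones or all-zero mask selb behaves like a taken or untaken movCC, so it covers the conditional-move case as well as the partial-word one.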

IMHO condition codes don't really work well with deep pipelines (latency of results, expensive branches). Even ARM has the 'S' flag on most integer instructions, so that flag updates can be partially detached from scheduling.