Parallella Community

by **alexrp** » Mon Dec 30, 2013 8:01 am

It's not really clear whether they treat the input as signed or unsigned. The immediate versions appear to treat it as signed, while it's not specified what the register versions do. Either way, it's a bit odd, seeing as IADD/ISUB exist, but maybe I'm overlooking something.

by **mhonman** » Mon Dec 30, 2013 10:56 am

IADD/ISUB are actually the same as FADD/FSUB (see instruction encodings). The secondary ALU can function as either a floating-point or an integer unit - set in the ARITHMODE bits of the CONFIG register (doc for ARITHMODE states that IADD/ISUB are signed).

But as far as I'm aware there is no need for separate signed and unsigned instructions when 2's complement is used - the carry and AN/BN flags can be used to implement signed or unsigned semantics around the basic ADD/SUB instructions.
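To make the flag point concrete, here is a minimal sketch in Python (not Epiphany code) of a single 32-bit add that produces one result plus two flags. The names `add32`, `carry` and `overflow` are made up for illustration and are not the Epiphany flag names; the idea is only that the carry tells you about unsigned overflow and a separate overflow bit tells you about signed overflow, while the result bits are identical either way.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Add two 32-bit patterns; return (result, carry, overflow).

    carry    -> the unsigned sum did not fit in 32 bits
    overflow -> the signed sum did not fit in 32 bits
    (generic flag names, assumed for illustration only)
    """
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    carry = full > MASK32
    # Signed overflow: both operands have the same sign bit and the
    # result's sign bit differs from it.
    overflow = bool(~(a ^ b) & (a ^ result) & 0x80000000)
    return result, carry, overflow

def as_signed32(x):
    """Reinterpret a 32-bit pattern as a signed value."""
    return x - (1 << 32) if x & 0x80000000 else x

print(add32(0xFFFFFFFF, 1))   # unsigned wraps, signed -1 + 1 is fine
print(add32(0x7FFFFFFF, 1))   # unsigned is fine, signed overflows
```

Software that wants unsigned semantics would look at the carry, and software that wants signed semantics would look at the overflow bit; the adder itself does not need to know which one you meant.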

by **alexrp** » Mon Dec 30, 2013 2:20 pm

by **mhonman** » Mon Dec 30, 2013 2:50 pm

by **timpart** » Tue Dec 31, 2013 1:26 am

Yes, the timings are the same as for FPU instructions, but since "rounding" isn't applicable to integer operations, you can save a cycle on the secondary ones by setting config to truncate. More on this .

Tim

by **cmcconnell** » Thu Aug 28, 2014 1:49 pm

by **notzed** » Fri Aug 29, 2014 3:19 am

Look up two's complement arithmetic. There is no physical or mathematical difference between a signed and an unsigned add/sub for a fixed number of bits. The only difference is the interpretation of the results and what constitutes an overflow.

If the operands differ in length then signed/unsigned affects how the inputs are formed (either duplicates the msb across all higher bits, or it doesn't) as in the immediate variants which are all signed. But since the register variants are all 32-bit operands then there is no difference between signed or unsigned add/sub.
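A rough sketch of the "how the inputs are formed" part, in Python. The helper names and the 8-bit field width are just assumptions for the example (they are not the actual Epiphany immediate widths); the point is only that a short operand has to be widened somehow before a 32-bit add, and that is the one place signedness matters.

```python
def zero_extend(value, bits):
    """Widen a `bits`-wide field by filling the upper bits with 0."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits, width=32):
    """Widen a `bits`-wide field by copying its msb into all higher bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        # msb is set: fill bits [bits, width) with ones
        value |= ((1 << width) - 1) & ~((1 << bits) - 1)
    return value

# The same 8-bit pattern becomes two different 32-bit operands.
print(hex(zero_extend(0xFF, 8)))   # treated as 255
print(hex(sign_extend(0xFF, 8)))   # treated as -1
```

Once both operands are full width, the add that follows is the same operation in either case.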

e.g. (using 16-bits here to save typing)

unsigned: 1 + 65534 = 65535, signed: 1 + (-2) = -1

In hex they are both: 0x0001 + 0xfffe = 0xffff

unsigned: 65534 + 65535 = 65533 (wrapped; the true sum is 131069), signed: (-2) + (-1) = -3

In hex they are both: 0xfffe + 0xffff = 0xfffd (the carry to bit 16 is lost).

But obviously the unsigned version has overflowed.
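If you want to check the two examples above without doing the hex by hand, a small Python sketch like this reproduces them. `add16` and `as_signed16` are names made up for the example; the 16-bit width just matches the numbers above.

```python
MASK16 = 0xFFFF

def add16(a, b):
    """16-bit add: keep the low 16 bits, drop the carry to bit 16."""
    return (a + b) & MASK16

def as_signed16(x):
    """Reinterpret a 16-bit pattern as a signed value."""
    return x - 0x10000 if x & 0x8000 else x

for a, b in [(0x0001, 0xFFFE), (0xFFFE, 0xFFFF)]:
    r = add16(a, b)
    carry_lost = (a + b) > MASK16
    print(f"0x{a:04x} + 0x{b:04x} = 0x{r:04x}  "
          f"unsigned {r}, signed {as_signed16(r)}, carry lost: {carry_lost}")
```

The result bits are the same on each line; only the two interpretations, and whether a lost carry counts as an overflow, differ.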

If there was an "Add byte" then it would need a signed/unsigned specifier since 0xff unsigned is 255 and 0xff signed is -1.

IADD/ISUB is just a "freebie", a side-effect of the way integer multiply is implemented so as to require no extra silicon. They could be used in integer-heavy code for up to 2x the throughput, but the cost of changing mode is so high it's not really useful if the code does any flops.

The pipeline diagram details where the instructions are executed and retired (figure 14), and yes I believe it is one cycle less when using truncate mode. i.e. they will retire in E3 and not E4.

You might find the following posts and static analysis tool of interest. The tool has some bugs (it was really just an extended-weekend hack) and is hard-coded to the rounding timing, but it's enough to help understand what's going on. My labelling of the pipeline stages might be out by one because I wasn't sure on which edge specific events occur, but it doesn't really make any difference to the relationships.

http://a-hackers-craic.blogspot.com.au/ ... -tool.html
http://a-hackers-craic.blogspot.com.au/ ... round.html
http://www.users.on.net/~notzed/software/ezetool.html

I did a lot of testing with snippets of code on the hardware to confirm the numbers, because whilst the tables are accurate and mostly complete there are some fiddly details missing and I wasn't sure how to interpret them either. Andreas also kindly provided further internal details in some posts here (in the assembly section, I think) which clarified some otherwise odd results I saw with the dual-issue rates (the dependencies on all load/stores are for dword register pairs, for example).

by **cmcconnell** » Fri Aug 29, 2014 10:38 am

by **notzed** » Fri Aug 29, 2014 1:44 pm

ok no worries - it's always hard to tell from one post on a forum. And until sometime in the last week or so there'd only been one download of the tool.

iadd probably just doesn't implement integer flags properly. I don't think the Bxx flags are that useful to start with because of the delay before they're ready and since every flop updates them you can't put any floating point ops between the test and their use. (PS the timing tool doesn't handle them at all, but that's just because I forgot about them at the time).

On to the pipeline stuff. The dword thing only affects loads/stores and I guess exists because they could be ldrd/strd but the size hasn't been decoded when the dual-issue decision takes place.

In the first case the add is dual-issuing with the fmadd because that is the first dual-issue pair candidate (due to sequencing) and there's nothing stopping that pair (in isolation) dual-issuing. But it then stalls at RA because the fmadd needs a 1 cycle separation from the preceding lsr, and although there is another instruction in between, the dual-issue means that cycle isn't actually there. In the second case you have the requisite separation even with the dual-issue.

I know it seems a bit weird but that's what I saw from the cycle counters and a whole lot of tiny tests I used to isolate the precise behaviour.

The way I modelled dual issue is that the next two instructions in the decoder are checked to see if they can dual-issue in isolation, with no reference to other instructions. If they can, they are issued together, but if either stalls in the RA stage they both stall together as a pair. Guessing, this seems to be to preserve instruction order.
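As a toy version of that model (this is not the timing tool's code), something like the Python below captures the two rules: pair the next two instructions if they can pair in isolation, and make the pair wait together. Everything here is an assumption for illustration: the `Insn` class, the unit labels, the pairing rule in `can_pair` (exactly one of the two is a flop), and the fact that RA stalls are supplied by hand instead of being derived from register dependencies.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    name: str
    unit: str          # "fpu", "ialu" or "ls" (hypothetical labels)
    ra_stall: int = 0  # cycles this instruction would wait in RA (hand-supplied)

def can_pair(a, b):
    # ASSUMPTION: a pair dual-issues when exactly one of the two is a flop.
    # The check looks at the two instructions only, nothing around them.
    return (a.unit == "fpu") != (b.unit == "fpu")

def issue(insns):
    """Return [(names, issue_cycle), ...] for an in-order instruction list."""
    groups, cycle, i = [], 0, 0
    while i < len(insns):
        group = [insns[i]]
        if i + 1 < len(insns) and can_pair(insns[i], insns[i + 1]):
            group.append(insns[i + 1])
        # If either member stalls in RA, both wait: the pair moves as a unit.
        cycle += max(x.ra_stall for x in group)
        groups.append((tuple(x.name for x in group), cycle))
        cycle += 1
        i += len(group)
    return groups

# lsr, then add paired with an fmadd that needs one cycle of separation.
prog = [Insn("lsr", "ialu"), Insn("add", "ialu"),
        Insn("fmadd", "fpu", ra_stall=1), Insn("eor", "ialu")]
for names, cycle in issue(prog):
    print(cycle, " + ".join(names))
```

In this sketch the add gets held back a cycle purely because its partner has to wait, which is the "stall together" behaviour described above.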

The eor's are just waiting for the fmadd results. The fmadd 'stage numbers' displayed should probably just go to 5 for rounding mode because you need 4 whole cycles between "1" of the first and "1" of the dependent instruction in this case. With that set of instructions you're probably not going to get any better timing: you need more work to fill all those holes.

The "dual issue" thing adds a lot more slots to fill than it appears at first glance. For example if you only have two dependent flops you need up to 6 ialu ops to fill the 4 delay holes if the first and last dual issue with the given flops.
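The arithmetic behind that "up to 6" is just holes plus dual-issue slots. A tiny Python sketch, where `ialu_slots` is a made-up helper and the extension to chains longer than two flops is my assumption, not something measured:

```python
def ialu_slots(flops, holes_per_gap=4):
    """Upper bound on IALU ops that fit around a chain of dependent flops.

    Each gap between consecutive dependent flops leaves `holes_per_gap`
    empty cycles, and each flop can also dual-issue with one IALU op.
    ASSUMPTION: the same counting applies to chains longer than two.
    """
    gaps = flops - 1
    return gaps * holes_per_gap + flops

print(ialu_slots(2))   # 4 delay holes + 2 dual-issue slots
```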

As for ezetime: there's a big bug with str rx,... followed by flop rx,rx,rx, and the aforementioned missing B flags, but if you spot anything else that looks wrong (or want explaining) let me know. I haven't played with it for a while, mind you. Probably use the assembly forum if you do, as I'm more likely to see it.

by **cmcconnell** » Fri Aug 29, 2014 11:13 pm

Are ADD/SUB signed or unsigned operations?
