Quick reality checks, if you will.
(A) Constructs like:
MOV R0, %low(90000000);
MOVT R0, %high(90000000);
exist because there are no 32-bit immediates in the opcodes, so you rely on the assembler to split the constant and then do two 16-bit loads?
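To make sure I have (A) straight, here's a toy Python model of what I assume the pair does. It assumes %low()/%high() are simply the bottom and top 16 bits of the constant, that MOV zero-extends its 16-bit immediate, and that MOVT overwrites only the upper half of the register. The helper names are mine, not anything from the toolchain.

```python
def split_imm32(value: int) -> tuple[int, int]:
    """Split an unsigned 32-bit constant into (low16, high16), which is
    what I assume %low() and %high() hand to the assembler."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    low = value & 0xFFFF            # operand for MOV
    high = (value >> 16) & 0xFFFF   # operand for MOVT
    return low, high


def mov_movt(low: int, high: int) -> int:
    """Model R0 after MOV R0, low then MOVT R0, high.
    ASSUMPTION: MOV zero-extends, MOVT replaces only bits 31..16."""
    r0 = low & 0xFFFF                                   # MOV R0, %low(...)
    r0 = (r0 & 0x0000FFFF) | ((high & 0xFFFF) << 16)    # MOVT R0, %high(...)
    return r0


low, high = split_imm32(90000000)
print(hex(low), hex(high))          # 0x4a80 0x55d
assert mov_movt(low, high) == 90000000
```

So 90000000 = 0x055D4A80 ends up as MOV with 0x4A80 followed by MOVT with 0x055D.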
(B) Apropos of that, I've just discovered the hardware loop facility. Neat! You could phrase just about any variation of a for-loop that way...
However: is it subject to any "branch taken" penalty at the loop end for the non-final iterations? (i.e. at PC = LE, is it just a straight load of PC = LS while LC > 0, with no fuss?)
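Here's how I'm picturing the mechanics, as a toy Python model rather than anything cycle-accurate (it says nothing about the penalty question). It assumes LC holds the trip count, and that when execution reaches LE the hardware decrements LC and reloads PC from LS while LC is still nonzero. That decrement-then-test order is my guess and needs checking against the reference manual. `for_loop_count` is a hypothetical helper showing how a `for (i = start; i < stop; i += step)` would map onto an LC value.

```python
def for_loop_count(start: int, stop: int, step: int) -> int:
    """Trip count of for (i = start; i < stop; i += step), step > 0.
    This is the value I'd load into LC (hypothetical helper)."""
    if step <= 0:
        raise ValueError("model only covers positive steps")
    return max(0, -(-(stop - start) // step))   # integer ceiling division


def run_hw_loop(lc: int, body) -> int:
    """Toy model of a zero-overhead loop spanning LS..LE.
    ASSUMPTION: at PC == LE, LC is decremented and PC = LS while LC != 0."""
    if lc < 1:
        # Assumed: a zero trip count must be skipped with an ordinary
        # branch before the loop is entered.
        raise ValueError("model assumes LC >= 1")
    executed = 0
    while True:
        body(executed)       # instructions LS..LE
        executed += 1
        lc -= 1              # reached PC == LE
        if lc == 0:
            break            # fall through past LE
        # else PC = LS, with no branch instruction in the body itself
    return executed


seen = []
n = for_loop_count(0, 10, 3)             # i = 0, 3, 6, 9
run_hw_loop(n, lambda k: seen.append(k * 3))
print(n, seen)                            # 4 [0, 3, 6, 9]
```

The only loop-control work left in software is computing LC up front and guarding the zero-trip case.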
FWIW (curiosity): are there any particular design reasons for the double-word alignment, the 32-bit instruction sizes, the minimum of 8 instructions in the loop block, and the interrupt disabling?
Cheers, Mike