Thank you all for the comments and additional ideas! Here are some more from me:
Instead of a 32-bit rotate instruction, we could introduce one that is more generic and even more suitable for crypto: ARX, which is the acronym for add-rotate-XOR (as ). The instruction would compute:
RD ^= rotate(RN + RM, imm5)
It would need 3*6 = 18 bits for register indices and 5 bits for the rotate count, leaving up to 9 bits for the opcode.
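To make the semantics concrete, here is a minimal C model of the proposed instruction. The function names `rotl32`, `arx` and `salsa20_quarterround` are mine, and I am assuming 32-bit registers and a left rotate (the formula above does not fix the direction). The quarter-round is included because each of Salsa20's four steps has exactly the RD ^= rotate(RN + RM, imm) shape, so it would map to four ARX instructions.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by r bits (r is taken modulo 32).
 * Left rotation is an assumption; the proposal only says "rotate". */
static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31;
    return (x << r) | (x >> ((32 - r) & 31));
}

/* Hypothetical model of the proposed ARX instruction:
 *   RD ^= rotate(RN + RM, imm5)
 * Returns the new value of RD. */
static inline uint32_t arx(uint32_t rd, uint32_t rn, uint32_t rm, unsigned imm5)
{
    return rd ^ rotl32(rn + rm, imm5);
}

/* Salsa20 quarter-round expressed as four ARX operations on y[0..3]. */
static inline void salsa20_quarterround(uint32_t y[4])
{
    y[1] = arx(y[1], y[0], y[3], 7);
    y[2] = arx(y[2], y[1], y[0], 9);
    y[3] = arx(y[3], y[2], y[1], 13);
    y[0] = arx(y[0], y[3], y[2], 18);
}
```

This is only a software sketch of the data flow, not a statement about how the hardware would implement it.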
Maybe the regular ADD and XOR instructions should be converted into forms of this instruction, sharing some circuitry with its implementation. ADD is the same as the ARX instruction proposed above, but with a rotate count of 0 and with the XOR (register file update mode?) disabled. Disabling it could take one bit in the instruction encoding, or maybe a 0 in the 5-bit immediate value would be treated specially, enabling this mode. Then we don't even have to spend a new opcode on this instruction: it would be an extension/replacement of ADD.
XOR is also implementable as a special mode of ARX, but for the full 3-register form it's a bit trickier: we'd need a bit in the instruction encoding that turns RM into an input to the XOR rather than to the ADD.
If this is too tricky, then the fallback option is to implement just an ADD_ROT, which would also be usable as a plain ADD in the straightforward way (with a rotate count of 0). Still, a full ARX instruction would provide a speedup for ciphers that use all 3 of its components, or just the ADD and XOR components.
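The modes discussed above could be modeled roughly as follows. This is just one possible reading: the `arx_mode` enum, its member names and `arx_exec` are hypothetical, the rotate is assumed to be a left rotate, and in XOR mode I assume the adder simply passes RN through while RM is routed to the XOR stage in place of RD.

```c
#include <stdint.h>

/* Rotate left by r bits (r taken modulo 32); direction is an assumption. */
static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31;
    return (x << r) | (x >> ((32 - r) & 31));
}

/* Hypothetical mode selector, standing in for extra encoding bit(s). */
enum arx_mode {
    ARX_FULL,    /* RD ^= rotate(RN + RM, imm5)                    */
    ARX_ADD_ROT, /* RD  = rotate(RN + RM, imm5); imm5 == 0 is ADD  */
    ARX_XOR      /* RD  = rotate(RN, imm5) ^ RM; imm5 == 0 is XOR  */
};

/* Returns the value written back to RD for the given mode. */
static inline uint32_t arx_exec(enum arx_mode mode, uint32_t rd,
                                uint32_t rn, uint32_t rm, unsigned imm5)
{
    switch (mode) {
    case ARX_ADD_ROT:
        /* XOR with the old RD disabled: result overwrites RD. */
        return rotl32(rn + rm, imm5);
    case ARX_XOR:
        /* RM bypasses the adder and feeds the XOR stage instead of RD. */
        return rotl32(rn, imm5) ^ rm;
    case ARX_FULL:
    default:
        return rd ^ rotl32(rn + rm, imm5);
    }
}
```

With a rotate count of 0, `ARX_ADD_ROT` behaves as a regular 3-register ADD and `ARX_XOR` as a regular 3-register XOR, which is the point of folding them into one instruction.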
If opcode space permitted, we could even do:
RD ^= ((RN + RM) << n) ^ ((RN + RM) >> m)
where "n" and "m" are separate 5-bit immediate values. But then the operands take 18 + 2*5 = 28 bits, which leaves only 4 bits for the opcode. That is clearly unacceptable (we'd waste 1/16th of our opcode space on this one instruction). It's a pity, since this instruction could also be usable for some bit permutations, with rotation being only a special case, yet it would not consume much, if any, more logic.
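A small C model shows why rotation is only a special case of this form. The name `add_shift_xor` is mine and 32-bit registers are assumed. When n + m = 32 (with n between 1 and 31), the two shifted halves do not overlap, so their XOR is a left rotate by n. Other (n, m) pairs give shift-and-XOR combinations that a plain rotate cannot express.

```c
#include <stdint.h>

/* Hypothetical model of the generalized form:
 *   RD ^= ((RN + RM) << n) ^ ((RN + RM) >> m)
 * n and m are separate 5-bit immediates (0..31). Returns the new RD. */
static inline uint32_t add_shift_xor(uint32_t rd, uint32_t rn, uint32_t rm,
                                     unsigned n, unsigned m)
{
    uint32_t sum = rn + rm;
    return rd ^ (sum << (n & 31)) ^ (sum >> (m & 31));
}
```

For example, `add_shift_xor(0, x, 0, 8, 24)` rotates `x` left by 8, while `add_shift_xor(0, x, 0, 13, 0)` computes `x ^ (x << 13)`. One limitation of this encoding as modeled: a rotate by 0 is not expressible, since n = m = 0 makes the two terms cancel.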
Speaking of bit permutations, we need to check out . 243 pages on optimal choice of bit permutation instruction(s) to implement.