Permute instruction

Permute (and shuffle) instructions, used in both bit manipulation and vector processing, copy elements unaltered from a source array to a destination array, at positions specified by a second source array of indices.[1] The size (bit width) of the source elements is not restricted, but it is the same as the size of the destination elements.

There are two important permute variants, known as gather and scatter. The gather variant is as follows:

for i = 0 to length-1
    dest[i] = src[indices[i]]

whereas the scatter variant is:

for i = 0 to length-1
    dest[indices[i]] = src[i]
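
The two loops above can be sketched in Python as a rough illustration. The function names `gather` and `scatter`, the `dest_len` parameter, and the use of `None` for destination elements that are never written are assumptions made for this example; they do not model any particular instruction set.

```python
def gather(src, indices):
    """Gather: dest[i] = src[indices[i]]."""
    return [src[j] for j in indices]


def scatter(src, indices, dest_len):
    """Scatter: dest[indices[i]] = src[i].

    Destination elements that no index targets stay None
    (an assumption of this sketch, not hardware behaviour).
    """
    dest = [None] * dest_len
    for i, j in enumerate(indices):
        dest[j] = src[i]
    return dest


src = ["a", "b", "c", "d"]
indices = [2, 0, 3, 1]  # a true permutation: every index appears once

gathered = gather(src, indices)
scattered = scatter(src, indices, len(src))

print(gathered)   # ['c', 'a', 'd', 'b']
print(scattered)  # ['b', 'd', 'a', 'c']

# With a true permutation, scatter undoes gather for the same indices.
print(scatter(gathered, indices, len(src)) == src)  # True
```

When the indices form a true permutation, as here, scattering with the same indices reverses the gather, which shows that the two variants are inverses of each other in that case.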

Note that, unlike in memory-based gather-scatter, all three of dest, src, and indices are registers (or parts of registers, in the case of bit-level permute), not memory locations.

The scatter variant can be seen to "scatter" the source elements across (into) the destination, whereas the gather variant "gathers" data from the indexed source elements.

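Given that the indices may be repeated in both variants, the result is not a strict mathematical permutation: in the gather variant a source element can appear more than once in the output, and in the scatter variant several source elements can target the same destination element, so that not every source element survives.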
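
The effect of repeated indices can be illustrated with a small Python sketch. The helper `is_permutation`, the "last write wins" behaviour for colliding scatter writes, and the `None` marker for unwritten destination elements are assumptions of this example; real instructions define their own collision and merge behaviour.

```python
def is_permutation(indices):
    """True if every index 0..n-1 appears exactly once (hypothetical helper)."""
    return sorted(indices) == list(range(len(indices)))


src = [10, 20, 30, 40]
indices = [1, 1, 3, 3]  # indices 1 and 3 repeat; 0 and 2 never appear

# Gather: a repeated index copies the same source element several times.
gathered = [src[j] for j in indices]

# Scatter: a repeated index makes several writes land on one destination.
# This sketch lets the last write win and leaves untouched elements as None.
scattered = [None] * len(src)
for i, j in enumerate(indices):
    scattered[j] = src[i]

print(is_permutation(indices))  # False
print(gathered)                 # [20, 20, 40, 40]
print(scattered)                # [None, 20, None, 40]
```

In neither result do all four source values appear exactly once, which is why the operation is only loosely called a permutation.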

A special case of permute is also used in GPU "swizzling" (again, not strictly a permutation), which reorders subvector data on the fly so as to align elements with, or duplicate them into, the appropriate SIMD lanes.
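
Swizzling can be pictured as a gather whose indices are written as component letters, in the style of the `x`, `y`, `z`, `w` selectors found in GPU shading languages. The following Python sketch is only an illustration: the function `swizzle` and the `LANES` table are hypothetical names, and a real GPU applies the selection in hardware as part of an instruction's operand fetch rather than as a separate loop.

```python
# Map component letters to lane numbers of a 4-element vector.
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(vec, pattern):
    """Gather the components of vec named by pattern, e.g. 'wzyx' or 'xxyy'."""
    return [vec[LANES[c]] for c in pattern]


v = [1.0, 2.0, 3.0, 4.0]

reversed_v = swizzle(v, "wzyx")    # pure reordering
duplicated_v = swizzle(v, "xxyy")  # duplication, so not a permutation
broadcast_v = swizzle(v, "zzzz")   # one component copied to every lane

print(reversed_v)    # [4.0, 3.0, 2.0, 1.0]
print(duplicated_v)  # [1.0, 1.0, 2.0, 2.0]
print(broadcast_v)   # [3.0, 3.0, 3.0, 3.0]
```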

  1. Intel® 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4 (PDF). Intel. June 2021. p. 5-356, Vol. 2C.

From Wikipedia, the free encyclopedia