Single instruction, multiple data
SIMD (single instruction, multiple data) is a way to process data in parallel at the hardware level: a single instruction is applied to many data elements at once.
The most common form is vector operations. An ordinary (scalar) operation looks like 2 + 3 = 5: the two operands 2 and 3 are fed to the processor's adder, and after the addition the result is stored in a register. In vector operations, the operands are vectors. A vector is held in a register that is wider than an ordinary one: ordinary registers are 32 or 64 bits wide, while vector registers can be up to 512 bits wide. It is easiest to think of vectors as arrays of operands.
For example, adding two 128-bit vectors, each holding four int32 values, looks like this: [1,2,3,4] + [3,1,3,5] = [4,3,6,9]. Preparing a vector takes time, so vector operations do not pay off on small arrays, but the more data there is and the more complex the calculation, the greater the speedup over scalar operations. Enabling the wider registers also takes more power: when SIMD is used, power consumption can increase by up to 50%.
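The lane-by-lane addition above can be modeled in portable C. This is only a sketch of the semantics: the type vec4i32 and the function vec4i32_add are illustrative names, not part of any real instruction set, and the loop stands in for what a SIMD unit would do with one instruction.

```c
#include <stdint.h>

/* A 128-bit vector modeled as four 32-bit integer lanes (illustrative type). */
typedef struct {
    int32_t lane[4];
} vec4i32;

/* Add two vectors lane by lane. A real SIMD unit performs all four
   additions with a single instruction; this loop only models the result. */
vec4i32 vec4i32_add(vec4i32 a, vec4i32 b) {
    vec4i32 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}
```

Called with the vectors {1, 2, 3, 4} and {3, 1, 3, 5}, the function returns {4, 3, 6, 9}: each lane is computed independently of the others.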
Consider the implementation of summing array elements using Advanced Vector Extensions (AVX), an instruction set for working with vectors.
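A minimal sketch of such a sum in C might look as follows. It assumes an x86 compiler that provides the AVX intrinsics in immintrin.h and is invoked with AVX enabled (for example, -mavx on GCC or Clang); without that, the same function falls back to a plain scalar loop. The function name sum_array is an assumption made for illustration.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Sum n floats. Uses AVX when the compiler targets it,
   otherwise everything is handled by the scalar loop at the end. */
float sum_array(const float *data, size_t n) {
    float total = 0.0f;
    size_t i = 0;
#ifdef __AVX__
    __m256 acc = _mm256_setzero_ps();          /* eight partial sums, all zero */
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);  /* unaligned load of 8 floats */
        acc = _mm256_add_ps(acc, v);           /* eight additions at once */
    }
    float partial[8];
    _mm256_storeu_ps(partial, acc);            /* spill the partial sums */
    for (int k = 0; k < 8; ++k) {
        total += partial[k];
    }
#endif
    for (; i < n; ++i) {                       /* tail, or all elements without AVX */
        total += data[i];
    }
    return total;
}
```

Because the vector version adds elements in a different order than a scalar loop, the floating-point result may differ slightly in the last bits for general inputs. The scalar tail loop handles arrays whose length is not a multiple of eight.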
SIMD stands for single instruction stream, multiple data streams. Such systems usually have a large number of processing units that all execute the same instruction on different data in lockstep. Examples of SIMD machines are the CPP DAP Gamma II and the Quadrics Apemille. Vector processors are another subclass of SIMD systems. They operate on arrays of similar data in much the same way that scalar machines operate on the individual elements of such arrays. This is made possible by specially designed vector central processing units.
When data can be handled by the vector units, results can be delivered at a rate of one, two, or three per clock cycle. Vector processors work on their data in parallel when operating in vector mode, which makes them faster than when they operate in scalar mode.
Not all algorithms can be vectorized easily. Tasks that are heavy on flow control cannot easily be accelerated with SIMD. In theory it is possible to vectorize the comparisons to achieve the best cache behavior, but this technique requires more intermediate state.
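One common way to vectorize a comparison is to replace the branch with a mask and a bitwise select. The sketch below shows the idea in portable C for a single element at a time; the function name relu_i32 is hypothetical, and whether a given compiler turns this loop into SIMD instructions is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace negative values with zero without a branch per element. */
void relu_i32(const int32_t *in, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        /* The comparison yields 0 or 1; negating it gives an all-zeros or
           all-ones mask, the form SIMD compare instructions produce per lane. */
        int32_t mask = -(int32_t)(in[i] > 0);
        out[i] = in[i] & mask;   /* keep the value where the mask is all ones */
    }
}
```

Every element goes through the same instructions regardless of its value, which is what lets the work be spread across vector lanes. The cost is the extra intermediate state: the mask has to be computed and stored for every element.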
SIMD units have large register files, which increases power consumption.
SIMD instructions may also place restrictions on data alignment, which programmers familiar with only one particular architecture may not expect.
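As an illustration of an alignment restriction, the AVX aligned load _mm256_load_ps expects its address to be a multiple of 32 bytes. The sketch below assumes a C11 standard library that provides aligned_alloc; the helper names is_aligned and alloc_floats_32 are made up for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if ptr is aligned to `alignment` bytes (must be a power of two). */
int is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Allocate room for n floats on a 32-byte boundary.
   The caller releases the memory with free(). */
float *alloc_floats_32(size_t n) {
    size_t bytes = n * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (bytes + 31) / 32 * 32;
    return aligned_alloc(32, rounded);
}
```

A buffer obtained from plain malloc is not guaranteed to satisfy such a requirement, so code that relies on aligned loads has to either allocate as above or use the unaligned load variants.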
Gathering data into SIMD registers and then scattering it to the correct destination locations is tricky and can be inefficient.
The adoption of SIMD in personal computer software was slow at first because of a number of problems. One was that many of the early SIMD instruction sets could degrade overall system performance because they reused the existing floating-point registers.