How can I find a list of all SSE instructions? What happens if a CPU doesn't support SSE?

I've seen the names of some instructions that were added with SSE, but I can't find an explanation of all of them (SSE4, maybe? They're not even listed on Wikipedia). Where can I read about what they do?

The best source would be straight from the people who designed the extensions: Intel. The definitive references are the Intel® 64 and IA-32 Architectures Software Developer's Manuals; I would recommend that you download the combined Volumes 1 through 3C (first download link on that page). You may want to look at Vol. 1, Ch. 12 - Programming with SSE3, SSSE3, SSE4 and AESNI. For specific instructions, see Vol. 2, Ch. 3-4. (Appendix B is also helpful.)


How do I know which of these instructions are being used?

The instructions are only used if a program you're running actually uses them (i.e. the machine code corresponding to the various SSE4 instructions is actually being executed). To find out what instructions a program uses, you need to use a disassembler.


If we do know which ones are being used, let's say I'm doing a comparison (this may be the stupidest question I've ever asked; I don't know assembly, though): is it possible to use the instruction directly in assembly code? (I've been looking at this: http://asm.inightmare.org/opcodelst/index.php?op=CMP)

How does the processor interpret the instructions?

You may want to have a look at my answer to the question, "How does a CPU 'know' what commands and instructions actually mean?". When you write out assembly code by hand, to make an executable, you pass the "human readable" assembly code to an assembler, which turns the instructions into the actual 0's and 1's the processor executes.


What would happen if I have a processor without any of the SSE instructions? (I suppose that if we wanted to do a comparison, we wouldn't be able to, right?)

Since your computer is Turing complete, it can compute any computable function with a software algorithm even when it lacks dedicated hardware for it. Obviously, doing intense parallel or matrix mathematics in hardware is much faster than in software (which requires many loops of instructions), so the software fallback means a slowdown for the end user. Depending on how the program was built, it may require a particular instruction (e.g. one from the SSE4 set), although since the same thing can be done in software (making the program usable on more processors), this practice is rare.
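To make the "hardware if available, software otherwise" idea concrete, here is a minimal sketch of runtime dispatch. It is only an illustration under a few assumptions: the function names (popcount_software, popcount_hardware, popcount_dispatch) are made up for this example, it relies on the GCC/Clang extensions __builtin_cpu_supports and __attribute__((target(...))), and it uses POPCNT (a bit-counting instruction that arrived alongside SSE4.2 and has its own CPUID feature bit) simply because it has an easy software equivalent.

```c
#include <stdint.h>

/* Assumption: the dispatch path is only enabled for GCC/Clang on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define X86_GNU_DISPATCH 1
#endif

/* Software fallback: clear the lowest set bit until none remain. */
static unsigned popcount_software(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* drops the lowest set bit */
        count++;
    }
    return count;
}

#ifdef X86_GNU_DISPATCH
/* Compiled with POPCNT enabled for this one function only, so the
   compiler is allowed to emit the dedicated instruction here. */
__attribute__((target("popcnt")))
static unsigned popcount_hardware(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
#endif

/* Picks the hardware path only if the running CPU reports the feature. */
static unsigned popcount_dispatch(uint32_t x)
{
#ifdef X86_GNU_DISPATCH
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hardware(x);
#endif
    return popcount_software(x);
}
```

Calling popcount_dispatch(0xFF) gives 8 on any machine; the only difference between CPUs is which path computes it. A program that skips the check and calls the hardware version unconditionally is the "requires a particular instruction" case: on a CPU without the instruction it would fault with an illegal-instruction error rather than run slowly.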


As an example of the above, you may recall when processors first came out with the MMX instruction set extension. Let's say we want to add two 8-element, signed 8-bit vectors together (so each vector is 64 bits, equal to a single MMX register), or in other words, A + B = C. This could be done with a single MMX instruction called paddsb (packed add of signed bytes, with saturation). For brevity, let's say our vectors are held at memory locations A, B, and C as well. Our equivalent assembly code would be:

movq   MM0, [A]
paddsb MM0, [B]
movq   [C], MM0

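However, this operation could also easily be done in software. For example, the following C code performs essentially the same operation (a char is 8 bits wide on typical platforms). The one difference is that paddsb saturates on overflow, while a plain + on chars typically wraps around: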

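#define LEN 8
/* signed char, since plain char may be unsigned on some platforms */
signed char A[LEN], B[LEN], C[LEN];
int i;

/* Code to initialize vectors A and B... */

/* Add the vectors one element per iteration */
for (i = 0; i < LEN; i++)
{
    C[i] = A[i] + B[i];
}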

You can probably guess how the assembly code of the above loop would look, but it's clear that it would contain significantly more instructions (as we now need a loop to handle adding the vectors), and thus, we would need to perform that many more fetches. This is similar to how the word length of a processor affects a computer's performance (the purpose of MMX/SSEx is to provide both larger registers and the ability to perform the same instruction on multiple pieces of data).
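For comparison, here is a rough sketch that puts the loop and the single-instruction form side by side in C. Some assumptions to flag: the helper names (saturate_i8, add_sat_scalar, add_sat_vector) are invented for this example, and instead of the MMX form it uses the SSE2 intrinsic _mm_adds_epi8, which is the same paddsb operation on the wider XMM registers (only the low 8 bytes are loaded and stored here). The scalar version saturates explicitly so that both paths give identical results.

```c
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

#define LEN 8

/* Clamp a wider intermediate sum into the signed 8-bit range. */
static int8_t saturate_i8(int sum)
{
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Software version: one element per loop iteration. */
static void add_sat_scalar(const int8_t *a, const int8_t *b, int8_t *c)
{
    for (int i = 0; i < LEN; i++)
        c[i] = saturate_i8(a[i] + b[i]);
}

/* Vector version: all eight elements in one saturating add, when the
   compiler has SSE2 enabled; otherwise falls back to the loop. */
static void add_sat_vector(const int8_t *a, const int8_t *b, int8_t *c)
{
#ifdef __SSE2__
    __m128i va = _mm_loadl_epi64((const __m128i *)a);  /* load low 64 bits */
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    _mm_storel_epi64((__m128i *)c, _mm_adds_epi8(va, vb));
#else
    add_sat_scalar(a, b, c);
#endif
}
```

Both functions should fill c with the same values for any input; for instance 100 + 100 clamps to 127 and -100 + -100 clamps to -128 in either path. The vector path just gets there without a loop.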


Answering in the same order as your questions:

  1. The easiest way would be to go to Intel's site and download the whitepapers. Even the processor's developer manual will have all the required details. Here is one such link. Here is another link to the SSE instruction set's mnemonics and explanations.
  2. What exactly do you mean which of these instructions are being used? Are you looking for information about your processor or a particular application?
    For processors, I don't know about Windows, but on Linux you simply read its processor flags (listed in /proc/cpuinfo). This is easier done through the # lshw command.
    On the other hand, for a specific application, I'm not really sure; you could always disassemble an executable and check out the instructions being used. Because most applications are compiled for the mass audience, they will use only the generic x86 instruction set. To use the more processor-specific instructions, you should compile the application manually on your system.
  3. You could always run a simulator. If you want to use assembly code within your programming projects, you can do it in C and C++. I have only used ASM code inside C, so I don't know if any other language supports it. For help on using inline ASM, refer to this SO question.
  4. That question lies heavily in the field of computer architecture. While I could explain it here, it will not be easy. There was another SU question that dealt with this subject.
  5. To answer your specific question, the SSE instruction set only came out in 1999, while the CMP instruction has been around since long before that. It was part of the 8080's instruction set too. In any case, with our machines being Turing-complete, even the older microprocessors could perform comparisons; it was just harder to do without an explicit instruction. An instruction set extension is only a faster, easier and more optimized way to carry out certain operations. It barely adds new functionality, since a Turing-complete machine can always compute everything that is computable.
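To tie points 3 and 5 together, here is a small sketch of using CMP directly from C through inline assembly. Treat it as an illustration only: the function name less_than_asm is made up, the syntax is GCC/Clang extended asm in AT&T operand order (which is compiler-specific, not standard C), and on anything other than x86 with those compilers it simply falls back to the ordinary < operator.

```c
/* Returns 1 if a < b (signed comparison), 0 otherwise. */
static int less_than_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char result;
    __asm__ ("cmpl %2, %1\n\t"   /* sets the flags from a - b        */
             "setl %0"           /* result = 1 if a < b, else 0      */
             : "=q"(result)      /* output: a byte-addressable reg   */
             : "r"(a), "r"(b)    /* inputs: any general registers    */
             : "cc");            /* the flags register is modified   */
    return result;
#else
    return a < b;                /* portable fallback */
#endif
}
```

Note that nothing here involves SSE: CMP is a plain x86 integer instruction, which is why a processor without SSE can still compare values. In practice the compiler emits the same cmp for a < b on its own, so inline assembly like this is mainly useful for experimenting.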