How does a polymorphic virus identify whether it's already infected a file?

All viruses are different and operate in various ways. There is no single blanket method by which a virus detects whether it has already infected a file or device. If there were, it would make the job of the anti-virus quite easy, wouldn't it? As such, there are several different answers to your question.

A quick note on terminology: Malware is defined as any type of malicious software. A virus is defined as a type of malware which replicates itself by modifying other computer programs and inserting its own code (as per Wikipedia). So a virus is a subclass of malware. There are also several different types of viruses: file resident viruses, memory resident viruses, boot sector viruses, and macro viruses.

Since you asked specifically about viruses that infect files, we will only look at file resident viruses.

File Resident Viruses

These look to install themselves into executable files, or programs. This is because other files, like text documents or photos, aren't interpreted as code by the machine, and the virus payload needs to be interpreted as instructions. When a virus infects an executable and writes to the binary, it will either overwrite the preexisting code, tack its code onto the end, place its code at the beginning, or install its code into 'cavities' (unused spaces in the binary that are filled with null bytes or NOP instructions).

Once a file is infected, the virus needs to somehow later recognize it. Some viruses are coded to be dumb, and will just reinfect the file. This could mean either the virus code is overwritten by the same virus code (or a slightly altered version) leading to no real change, or the file just keeps getting bigger because the virus keeps tacking onto the existing executable. The smarter versions of viruses will implement methods to recognize their spawn.

One way a virus can tell if an executable has been infected is via tagging. When writing its code into a binary, the virus will place a value in a specific spot. Every time the virus attempts to infect an executable, it checks that spot for the value. If it's there, it doesn't infect.
A simple example: a virus might modify the first byte of a program to read 0xFF. If it sees 0xFF as the first byte of a file (even if that file isn't infected and just happens to have its first byte as 0xFF) it won't infect it.
A more complex example: The virus uses an algorithm similar to how a hash works. It takes in the file name and generates a byte value and a placement for that value, then writes that byte to that placement. Since it uses a hashing style algorithm, the same byte and placement will always be generated for the same file name, so the check gives the same answer every time. This makes it harder for anyone else to identify an infection, since there is no single fixed marker to look for.
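To make the two tagging schemes concrete, here is a minimal read-only sketch of the check side only, the same test a scanner could run against a suspect file. Everything in it is an assumption for illustration: the function names are made up, SHA-256 stands in for the "hashing style algorithm", and the derived offset is limited to the first 256 bytes so that it doesn't shift if the file grows.

```python
import hashlib
from typing import Tuple


def fixed_marker_present(data: bytes, marker: int = 0xFF) -> bool:
    """Simple scheme: the first byte of the file equals a fixed value."""
    return len(data) > 0 and data[0] == marker


def derived_marker(file_name: str) -> Tuple[int, int]:
    """Derive (offset, byte value) deterministically from the file name.

    SHA-256 is an assumption here; any deterministic function would do.
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    offset = digest[0]  # placement, somewhere in the first 256 bytes
    value = digest[1]   # the byte expected at that placement
    return offset, value


def derived_marker_present(file_name: str, data: bytes) -> bool:
    """Hash-style scheme: check the name-derived byte at the name-derived offset."""
    offset, value = derived_marker(file_name)
    if len(data) <= offset:
        return False  # file too short to hold the marker
    return data[offset] == value
```

Note that both checks can produce false positives: a clean file that happens to hold the expected byte in the expected spot looks tagged, which is exactly the 0xFF caveat described above.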

Another interesting idea would be that the virus creates a special data file hidden on the machine, which lists the executables it has infected. The file location could be static or generated. If the file isn't found, the virus would assume the system hasn't been infected at all yet.
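As a rough sketch of that lookup logic, assuming the list is a plain text file with one path per line (the format, file name, and function names here are all hypothetical):

```python
from pathlib import Path
from typing import Optional, Set


def load_registry(registry_path: Path) -> Optional[Set[str]]:
    """Return the set of listed paths, or None if the list file doesn't exist."""
    if not registry_path.is_file():
        return None  # no list found: treated as "nothing recorded yet"
    lines = registry_path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def is_listed(registry: Optional[Set[str]], target: str) -> bool:
    """True only if a list exists and the target path appears in it."""
    return registry is not None and target in registry
```

The weakness of this approach is visible in the sketch: everything hinges on one file, so if that file is deleted or found by an investigator, the whole record is lost or exposed.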

Really, any system you can think of could be a possible way for a piece of malware to identify its infected spawn. There are no rules around this sort of thing; all that matters is whether it works. Some methods will end up being better than others though, depending on what the attacker is trying to achieve.

Polymorphic vs Metamorphic viruses

I won't go into lengthy details on the difference between the two, but it's important to know that a Metamorphic virus rewrites its own code to avoid detection, whereas a Polymorphic virus encrypts or encodes itself to make it appear different. In other words, with Polymorphic viruses, not all of the code changes. In the simplest case, some of the code (namely the decryption routine) stays the same with each iteration. In this sense, the fingerprint only partially changes, and the virus could search for the static portions of its code that don't change in order to recognize an infected executable.
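Here is a toy illustration of that "partially changing fingerprint", using harmless placeholder bytes. The layout (a constant stub, then a one-byte key, then a single-byte XOR-encoded body) and all the names are assumptions made up for the example, not how any particular virus is laid out. Searching for the constant part is the same thing a signature scanner does.

```python
STUB = b"DECODER-STUB"  # hypothetical stand-in for the unchanging decoder


def xor_encode(body: bytes, key: int) -> bytes:
    """Encode (or decode) a body with a single-byte XOR key."""
    return bytes(b ^ key for b in body)


def make_variant(body: bytes, key: int) -> bytes:
    """Build a toy variant: constant stub, then the key, then the encoded body."""
    return STUB + bytes([key]) + xor_encode(body, key)


def contains_stub(data: bytes) -> bool:
    """Recognize any variant by the portion that never changes."""
    return STUB in data


if __name__ == "__main__":
    v1 = make_variant(b"placeholder body", 0x21)
    v2 = make_variant(b"placeholder body", 0x5A)
    print(v1 != v2)                              # the variants differ byte for byte
    print(contains_stub(v1), contains_stub(v2))  # but both carry the same stub
```

This is also why the static part is the weak point: whatever the virus can search for, an anti-virus can search for too.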

Also, a Polymorphic virus only has a finite number of permutations. If this number is reasonably small, it could store the keys to all other possible permutations inside of the encrypted portion of its code. That way, it can decrypt all infections and recognize itself. This probably isn't the best option, since if one virus is decrypted by researchers, all permutations will be uncovered, but it's definitely a solution.
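A small sketch of the "try every known key" idea, again on placeholder data. The key list, the single-byte XOR encoding, and the known body prefix are all assumptions for illustration:

```python
from typing import Optional

KNOWN_KEYS = (0x21, 0x5A, 0xC3)  # hypothetical: the full, finite set of keys
KNOWN_PREFIX = b"BODY"           # hypothetical: how every decoded body starts


def xor_decode(data: bytes, key: int) -> bytes:
    """Decode data that was encoded with a single-byte XOR key."""
    return bytes(b ^ key for b in data)


def matching_key(encoded: bytes) -> Optional[int]:
    """Return the key under which the data decodes to the known prefix, if any."""
    head = encoded[:len(KNOWN_PREFIX)]
    for key in KNOWN_KEYS:
        if xor_decode(head, key) == KNOWN_PREFIX:
            return key
    return None  # no key fits: not one of the known permutations
```

The sketch makes the downside obvious: anyone who recovers the key list from one sample can run the same loop against every other permutation.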

As for Metamorphic viruses, it's even more complex, but they would be stuck using some sort of tagging system like the one described above. In these cases, it might even turn out that two different metamorphic strains of the same virus don't recognize each other and install themselves over one another, leading to a war between the two strains (lol). But this could be said for any virus that has a bug in the identification portion of its code.
