How does a hard drive know where the data starts?

Data is not written as an arbitrary stream of ones and zeros; it is written in sectors. Each sector consists of a header and a payload of user data. The header contains error-correcting codes, a special sync field that identifies the start of the sector, and the sector number, so the drive can tell when it has found the start of a sector and which sector it is.
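To make the idea concrete, here is a minimal sketch of scanning a track for a sync field and then checking the sector number. Everything in it is an assumption made for illustration: the `SYNC` bytes, the 8-byte payload, the position of the check byte, and the simple sum used as a stand-in for real error-correcting codes. It also assumes the payload never contains the sync pattern.

```python
SYNC = b"\xa1\xfe"   # hypothetical sync pattern, not a real drive's value
PAYLOAD_SIZE = 8     # toy size; real sectors hold 512 or 4096 bytes


def checksum(data: bytes) -> int:
    """Stand-in for real error-correcting codes: a sum modulo 256."""
    return sum(data) % 256


def make_sector(number: int, payload: bytes) -> bytes:
    """Lay out one toy sector: sync, sector number, payload, check byte."""
    assert len(payload) == PAYLOAD_SIZE
    body = bytes([number]) + payload
    return SYNC + body + bytes([checksum(body)])


def find_sector(stream: bytes, wanted: int):
    """Scan for sync fields until the wanted sector number turns up."""
    body_len = 1 + PAYLOAD_SIZE
    pos = stream.find(SYNC)
    while pos != -1:
        start = pos + len(SYNC)
        body = stream[start:start + body_len]
        check = stream[start + body_len:start + body_len + 1]
        if len(body) == body_len and len(check) == 1:
            if body[0] == wanted and check[0] == checksum(body):
                return body[1:]          # the payload
        pos = stream.find(SYNC, pos + 1)
    return None


gap = b"\x00" * 3
track = b"".join(gap + make_sector(n, bytes([65 + n]) * PAYLOAD_SIZE)
                 for n in range(3))

# The head arrives at an arbitrary point, so model two revolutions of the
# circular track starting partway through sector 0.
stream = (track * 2)[5:]

print(find_sector(stream, 1))   # payload of sector 1
print(find_sector(stream, 0))   # found on the second revolution
```

The reader never needs to know where it landed on the track: it waits for the next sync field, reads the sector number, and keeps going until the number matches.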


Psusi is correct (the data on the disk is structured, and different parts of the computer use different parts of that structure) but doesn't really get at your question.

The drive doesn't really "know" anything. It has low-level electronics that can read markers on the disk (generally written at the factory, or by the drive head itself), read data blocks from the disk, write data blocks to the disk, tell whether a particular spot on the disk is bad or damaged, and move the head to a particular location on the disk when told to. That's about all it "knows". The read head doesn't decide to move someplace else by itself; something higher up in the machine tells it to...


It reads it from the disk.

Data on the disk is not only structured (as @psusi says), but also encoded. The encoding ensures that the recorded data cannot be mistaken for the position markers in the sector headers, so the circular track can be read as a stream until the target position marker is found.

As I understand it, modern hard drives don't quite do that; they read the entire circle into a buffer, keeping track of where each sector is, and use the buffers to send back requested data.
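A rough sketch of that buffering idea follows. The `TrackBuffer` class and the `read_track` callback are hypothetical names invented here; real firmware is not public and certainly differs.

```python
class TrackBuffer:
    """Toy model: read a whole track once, then serve sectors from memory."""

    def __init__(self, read_track):
        # read_track is a hypothetical callback: track number -> {sector: bytes}
        self._read_track = read_track
        self._tracks = {}
        self.media_reads = 0   # how many times the platter was actually read

    def read_sector(self, track: int, sector: int) -> bytes:
        if track not in self._tracks:
            # One full revolution fills the buffer for every sector on the track.
            self._tracks[track] = self._read_track(track)
            self.media_reads += 1
        return self._tracks[track][sector]


def fake_read_track(track: int) -> dict:
    """Stand-in for the physical read: four labelled sectors per track."""
    return {s: f"t{track}s{s}".encode() for s in range(4)}


buf = TrackBuffer(fake_read_track)
first = buf.read_sector(2, 1)
second = buf.read_sector(2, 3)   # served from the buffer, no second media read
print(first, second, buf.media_reads)
```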

UPDATE:

The magnetic medium is a material whose magnetic field has two key properties: 1) it does not change on its own, and 2) the recording device can change the orientation of the field at any point on the surface. When reading the medium, the sensor detects where the field is oriented toward the sensor and where it is oriented away from the sensor. As the sensor moves across the surface it detects the timings of these polarity transitions; the first layer of decoding is translating these timings into bit values. Because these timings cannot be measured exactly, the encoding must not require long stretches of the same polarity; that is, it must be a run-length limited (RLL) code.
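As an illustration of that first decoding layer, the sketch below turns transition timestamps into bits and then checks a run-length limit. It assumes a simple convention where each transition is a 1 and each bit cell without a transition is a 0, with a fixed cell length; the function names and the sample timings are made up, and real drives use far more sophisticated clock recovery.

```python
def timings_to_bits(transition_times, cell):
    """Each transition is a 1; every empty bit cell before it is a 0."""
    bits = []
    previous = 0.0
    for t in transition_times:
        # Round the gap to a whole number of cells to absorb timing jitter.
        cells = max(1, round((t - previous) / cell))
        bits.extend([0] * (cells - 1) + [1])
        previous = t
    return bits


def satisfies_rll(bits, d, k):
    """True if every run of 0s between two 1s has length from d to k."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    runs = [later - earlier - 1 for earlier, later in zip(ones, ones[1:])]
    return all(d <= run <= k for run in runs)


# Slightly jittery transition times, measured in bit cells.
times = [1.02, 3.97, 7.05, 9.0]
bits = timings_to_bits(times, cell=1.0)
print(bits)                       # [1, 0, 0, 1, 0, 0, 1, 0, 1]
print(satisfies_rll(bits, 1, 7))  # True
print(satisfies_rll(bits, 2, 7))  # False: one run of 0s has length 1
```

The upper limit `k` is what keeps the decoder honest: the longer the gap between transitions, the harder it is to tell whether it was seven cells or eight.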

The particulars of hard drive designs are generally trade secrets, but there are essentially two ways to ensure that sector markers never appear in sector content:

  1. Design an RLL that allows special values which will never result from encoding content data. These special values could be used not only for marking sector boundaries but also for error correction or any other secondary purpose.

  2. Use a second layer of encoding that ensures the marker values only appear at the markers. This is a bit like URL encoding to allow special characters to be "hidden" in URLs, but with an additional constraint equivalent to limiting how many characters can be added, so it ends up more like base64 encoding.
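The second approach can be illustrated with bit stuffing, the technique used by HDLC serial framing. This is not a claim about what any hard drive does; it is just a well-known way to keep a marker out of the payload at a bounded cost. The marker here is the HDLC flag `01111110`, and the helper names are invented for this sketch.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]   # marker: the only place six 1s may appear


def stuff(bits):
    """Insert a 0 after every run of five 1s so the flag cannot occur."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            out.append(0)   # the stuffed bit
            run = 0
    return out


def unstuff(bits):
    """Reverse stuff(): drop the bit that follows every run of five 1s."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:            # this is a stuffed 0, discard it
            skip, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            skip = True
    return out


def find_all(seq, pattern):
    """Every index at which pattern occurs in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


# A worst-case payload that itself contains the marker, twice.
payload = FLAG + [1] * 12 + FLAG
stuffed = stuff(payload)
frame = FLAG + stuffed + FLAG

print(find_all(frame, FLAG))        # only the first and last positions
print(unstuff(stuffed) == payload)  # True
```

The overhead is at most one extra bit per five payload bits, which is the "limit on how many characters can be added" mentioned in point 2.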

So, the read head moves across the surface detecting magnetic polarity changes. The timings of those changes are used to determine the corresponding sequence of bit values (possibly including some exceptional values that don't represent stored data), and that sequence is used to determine which sectors are being read and what they contain. As the content of each sector is determined, the data may be stored in a solid-state buffer, stored in a RAM buffer, and/or sent back to fulfill a request.