Mathematical difference between white and black notes in a piano

The first thing you have to understand is that notes are not uniquely defined. Everything depends on what tuning you use. I'll assume we're talking about equal temperament here. In equal temperament, a half-step is the same as a frequency ratio of $\sqrt[12]{2}$; that way, twelve half-steps make up an octave. Why twelve?

At the end of the day, what we want out of our musical frequencies are nice ratios of small integers. For example, a perfect fifth is supposed to correspond to a frequency ratio of $3 : 2$, or $1.5 : 1$, but in equal temperament it doesn't; instead, it corresponds to a ratio of $2^{ \frac{7}{12} } : 1 \approx 1.498 : 1$. As you can see, this is not a perfect fifth; however, it is quite close.

Similarly, a perfect fourth is supposed to correspond to a frequency ratio of $4 : 3$, or $1.333... : 1$, but in equal temperament it corresponds to a ratio of $2^{ \frac{5}{12} } : 1 \approx 1.335 : 1$. Again, this is not a perfect fourth, but is quite close.

And so on. What's going on here is a massively convenient mathematical coincidence: several of the powers of $\sqrt[12]{2}$ happen to be good approximations to ratios of small integers, and there are enough of these to play Western music.
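To see how close these approximations are, here is a minimal Python sketch that compares a few equal-tempered intervals with their small-integer targets. The function names are made up for illustration, and the error is reported in cents (1200 cents per octave), a standard unit that the answer above does not itself use.

```python
import math

# Just-intonation targets (ratios of small integers) paired with the number
# of equal-tempered half-steps used to approximate each one.
INTERVALS = {
    "major third": (5 / 4, 4),
    "perfect fourth": (4 / 3, 5),
    "perfect fifth": (3 / 2, 7),
    "octave": (2 / 1, 12),
}


def equal_tempered_ratio(half_steps):
    """Frequency ratio of `half_steps` half-steps in 12-tone equal temperament."""
    return 2 ** (half_steps / 12)


def cents(ratio):
    """Size of a frequency ratio in cents (an octave is 1200 cents)."""
    return 1200 * math.log2(ratio)


def error_in_cents(just_ratio, half_steps):
    """Positive if the equal-tempered interval is wider than the just one."""
    return cents(equal_tempered_ratio(half_steps)) - cents(just_ratio)


for name, (just, steps) in INTERVALS.items():
    print(
        f"{name}: just {just:.4f}, "
        f"equal-tempered {equal_tempered_ratio(steps):.4f}, "
        f"error {error_in_cents(just, steps):+.2f} cents"
    )
```

The fifth and fourth come out roughly 2 cents off, while the major third is roughly 14 cents sharp, which is why the thirds are the weakest part of the compromise.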

Here's how this coincidence works. You get the white keys from $C$ using (part of) the circle of fifths. Start with $C$ and go up a fifth to get $G$, then $D$, then $A$, then $E$, then $B$. Then go down a fifth to get $F$. These are the "neighbors" of $C$ in the circle of fifths. You get the black keys from here using the rest of the circle of fifths. After you've gone up a "perfect" perfect fifth twelve times, you get a frequency ratio of $3^{12} : 2^{12} \approx 129.7 : 1$. This happens to be rather close to $2^7 : 1$, or seven octaves! And if we replace $3 : 2$ by $2^{ \frac{7}{12} } : 1$, then we get exactly seven octaves. In other words, the reason you can afford to identify these intervals is because $3^{12}$ happens to be rather close to $2^{19}$. Said another way,

$$\log_2 3 \approx \frac{19}{12}$$

happens to be a good rational approximation, and this is the main basis of equal temperament. (The other main coincidence here is that $\log_2 \frac{5}{4} \approx \frac{4}{12}$; this is what allows us to squeeze major thirds into equal temperament as well.)
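To make the size of this coincidence concrete, here is the arithmetic worked out (the decimal values are computed for this note, not quoted from elsewhere):

$$\frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.0136, \qquad \log_2 3 \approx 1.58496, \qquad \frac{19}{12} \approx 1.58333$$

So twelve pure fifths overshoot seven octaves by a little over one percent, and equal temperament spreads that small discrepancy evenly over the twelve fifths.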

It is a fundamental fact of mathematics that $\log_2 3$ is irrational, so it is impossible for any kind of equal temperament to have "perfect" perfect fifths regardless of how many notes you use. However, you can write down good rational approximations by looking at the continued fraction of $\log_2 3$ and writing down convergents, and these will correspond to equal-tempered scales with more notes.
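As a rough illustration of the continued-fraction idea, the following Python sketch lists the first few convergents of $\log_2 3$. It works in floating point, so only the first handful of convergents can be trusted, and the function name `convergents` is just a label chosen here.

```python
import math
from fractions import Fraction


def convergents(x, count):
    """Return up to `count` continued-fraction convergents of x as Fractions."""
    a = math.floor(x)
    frac = x - a
    h_prev, h = 1, a  # numerators   h_{-1}, h_0
    k_prev, k = 0, 1  # denominators k_{-1}, k_0
    result = [Fraction(h, k)]
    for _ in range(count - 1):
        if frac == 0:
            break  # x was rational and the expansion has terminated
        x = 1 / frac
        a = math.floor(x)
        frac = x - a
        # Standard recurrence: h_n = a_n * h_{n-1} + h_{n-2}, same for k_n.
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


for c in convergents(math.log2(3), 7):
    # A convergent p/q suggests q equal steps per octave,
    # with the fifth approximated by p - q of those steps.
    print(f"{c.numerator}/{c.denominator}")
```

The convergent $19/12$ is the familiar 12-note scale (a fifth is $19 - 12 = 7$ half-steps); the next two, $65/41$ and $84/53$, correspond to equal-tempered scales with 41 and 53 notes per octave.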

Of course, you can use other types of temperament, such as well temperament; if you stick to $12$ notes (which not everybody does!), you will be forced to make some intervals sound better and some intervals sound worse. In particular, if you don't use equal temperament, then different keys sound different. This is a major reason many Western composers composed in different keys; during their time, this actually made a difference. As a result, when you're playing certain sufficiently old pieces, you aren't actually playing them as they were intended to be heard - you're using the wrong tuning.


Edit: I suppose it is also good to say something about why we care about frequency ratios which are ratios of small integers. This has to do with the physics of sound, and I'm not particularly knowledgeable here, but this is my understanding of the situation.

You probably know that sound is a wave. More precisely, sound is a longitudinal wave carried by air molecules. You might think that there is a simple equation for the sound created by a single note, perhaps $\sin 2\pi f t$ if the corresponding tone has frequency $f$. Actually this only occurs for tones which are produced electronically; any tone you produce in nature carries with it overtones and has a Fourier series

$$\sum \left( a_n \sin 2 \pi n f t + b_n \cos 2 \pi n f t \right)$$

where the coefficients $a_n, b_n$ determine the timbre of the sound; this is why different instruments sound different even when they play the same notes, and has to do with the physics of vibration, which I don't understand too well. So any tone which you hear at frequency $f$ almost certainly also has components at frequency $2f, 3f, 4f, ...$.

If you play two notes of frequencies $f, f'$ together, then the resulting sound corresponds to what you get when you add their Fourier series. Now it's not hard to see that if $\frac{f}{f'}$ is a ratio of small integers, then many (but not all) of the overtones will match in frequency with each other; the result sounds like a more complex note with certain overtones. Otherwise, you get dissonance as you hear both types of overtones simultaneously and their frequencies will be similar, but not similar enough.
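A small Python sketch can make the overtone matching explicit. It assumes idealized tones whose overtones sit at exact integer multiples of the fundamental, and it only looks at the first twelve overtones of each note; the function name and the cutoff are choices made for this example.

```python
from fractions import Fraction


def shared_overtones(ratio, max_harmonic=12):
    """Pairs (n, m) with n * f == m * f', where f' = ratio * f.

    `ratio` should be a Fraction so the comparison is exact.
    """
    return [
        (n, m)
        for n in range(1, max_harmonic + 1)
        for m in range(1, max_harmonic + 1)
        if n == m * ratio
    ]


fifth = shared_overtones(Fraction(3, 2))      # ratio 3 : 2
fourth = shared_overtones(Fraction(4, 3))     # ratio 4 : 3
awkward = shared_overtones(Fraction(45, 32))  # a ratio of larger integers

print(len(fifth), len(fourth), len(awkward))
```

For the ratio $3 : 2$, every third overtone of the lower note lands on every second overtone of the upper one, whereas for $45 : 32$ none of the first twelve overtones coincide.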


Edit: You should probably check out David Benson's "Music: A Mathematical Offering", the book Rahul Narain recommended in the comments for the full story. There was a lot I didn't know, and I'm only in the introduction!


The first answer is great, so I'll try to approach the question from another angle.

First, there are several different scales, and different cultures use different ones. It depends on the mathematics of the instruments as much as on cultural factors. Our scale has a very long history that can be traced to the ancient Greeks and Pythagoras in particular. They noticed (by hearing) that stringed instruments could produce different notes by adjusting the length of the string, and that some combinations sounded better.

The Greeks had a lot of interest in mathematics, and it seemed "right" for them to search for "perfect" combinations, perfect meaning that they should be expressed in terms of fractions of small integer numbers. They noticed that if you double or halve the string length, you get the same note (the concept of an octave); other fractions, such as $2/3$, $3/4$, also produced "harmonic" combinations. Physics also explains why some combinations sound better. When you combine several sine waves, you hear several different notes that are the result of the interference between the original waves. Some combinations sound better while others produce what we call "dissonance".

So, in theory, you can start from an arbitrary frequency (or note) and build a scale of "harmonic" notes using these ratios (I'm using quotes because the term harmonic has a very specific meaning in music, and I'm talking in broad and imprecise terms). The major and minor scales of Western music can be approximately derived from this scheme. Both scales (major and minor) have $7$ notes. The white keys in the piano correspond to the major scale, starting from the C note.

Now, if you get the C note and use the "perfect" fractions, you'll get the "true" C major scale. And that's where the fun begins.

If you take any note in the C major scale, you can treat that note as the start of another scale. Take for instance the fifth of C (it's the G), and build a new major scale, now starting from G instead of C. You'll get another seven notes. Some of them are also on the scale of C; others are very close, but not exactly equal; and some fall in the middle of the notes in the scale of C.

If you repeat this exercise with all notes, you'll end up building $12$ different scales. The problem is that the interval is not regular, and there are some imprecisions. You need to retune the instrument if you want to have the perfect scale.

The concept of "chromatic" scale (with $12$ notes, equally spaced) was invented to solve this "problem". The chromatic scale is a mathematical approximation that is close enough for MOST people (but not all). People with a "perfect" ear can hear the imperfections. In the chromatic scale, notes are evenly spaced using the twelfth root of two. It's a geometric progression that matches with good precision all possible major and minor scales. The invention of the chromatic scale allows players to play music in arbitrary scales without retuning the instrument: you only need to adjust the scale by "offsetting" a fixed number of positions, or semitones, from the base one of the original scale.
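The "offsetting" idea can be sketched in a few lines of Python. The reference pitch of 440 Hz for the A above middle C is a common convention assumed here, not something stated in this answer, and the sharp-only note spelling is a simplification.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a major scale


def frequency(semitones_from_a, a_hz=440.0):
    """Equal-tempered frequency a given number of semitones away from A."""
    return a_hz * 2 ** (semitones_from_a / 12)


def major_scale(root):
    """Major scale on `root`: the same offsets, shifted along the 12 notes."""
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in MAJOR_STEPS]


print(major_scale("C"))  # only white keys
print(major_scale("G"))  # needs one black key
print(frequency(12))     # one octave above the reference A
```

Transposing is just a shift of the starting index: the pattern of offsets never changes, which is exactly what a scale built from unequal "perfect" fractions cannot offer.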

All in all, that's just convention, and a bit of luck. The white keys are an "historical accident", being the keys of the major scale of C. The other ones are needed to allow for transposition. Also bear in mind that (1) the keys need to have a minimum width to allow for a single finger, and (2) if you didn't have the black keys, the octave would be too wide for "normal" hands to play. So the scheme with a few intermediate keys is needed anyway, and the chromatic scale that we use is at least as good as (or better than) any other possible scale.


The answers given are pretty good from a musical, mathematical, and sociological / historical standpoint. But they miss the fundamental reason why there are $12$ notes in a western scale (or $5$ notes in an eastern pentatonic, etc.), and why it's those particular $12$ notes (or $5$).

Qiaochu almost nailed it by pointing out that we like notes which are simple integer ratios. But why? The fundamental reason stems from the physics of common early instruments -- flutes (including the human voice) and plucked strings -- and from the physics of the tympanum in the ear.

As Qiaochu noted, sound is not composed of a single sine wave frequency but rather a sum of many sine waves. The "note" we hear is the frequency of the primary (loudest) wave coming from these instruments. But other frequencies exist in that wave as well, albeit largely masked by the primary. These are known informally as harmonics or overtones.

The first several harmonics of flutes and plucked strings are similar and very straightforward: If the primary is normalized to wavelength $1$ (so that a wavelength of $1/n$ means $n$ times the frequency), then the second loudest harmonic is typically $1/2$ (an octave above), the third is usually $1/3$ (an octave and a fifth above), the fourth is usually $1/4$ (two octaves), the fifth is usually $1/5$ (two octaves and a major third), and the sixth is usually $1/6$ (two octaves and a fifth). If the primary note is C1, these translate roughly into C2, G2, C3, E3, and G3. If the harmonics continued in this way -- and they don't always -- various other notes appear.
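As a quick check on which notes the harmonics land on, here is a hedged Python sketch that rounds the $n$-th harmonic (frequency $n$ times the primary, wavelength $1/n$) to the nearest equal-tempered note. It assumes the primary is a C and uses sharp-only note names; `harmonic_note` is a name invented for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def harmonic_note(n, root_octave=1):
    """Nearest equal-tempered note to the n-th harmonic of a C in `root_octave`."""
    # The n-th harmonic lies 12 * log2(n) half-steps above the primary.
    semitones = round(12 * math.log2(n))
    name = NOTE_NAMES[semitones % 12]
    octave = root_octave + semitones // 12
    return f"{name}{octave}"


for n in range(1, 7):
    print(n, harmonic_note(n))
```

The rounding hides how far each harmonic is from the piano note: the third and sixth are within a couple of cents, while the fifth harmonic is noticeably flatter than the equal-tempered major third.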

This matters because if you want to play TWO instruments together, you'd like their harmonics to coincide even if they're playing different notes. Otherwise the excess of harmonics sounds bad to the ear. In the worst case, very close but not entirely overlapping harmonics create "beats" -- seeming alternating loud and soft periods of time -- which are irritating to listen to and tough on the ear.

To get harmonics to coincide in multiple instruments or even successive notes, you have to pick notes for them to play where their harmonics have a strong overlap. This is also why the perfect fourth is useful even though it doesn't appear among the early harmonics: if one instrument is playing C and the other is playing the perfect fourth but lower by an octave, their harmonics will overlap nicely.

I believe these note selections (guaranteeing harmonics in harmony, so to speak) influenced the evolution of scale choices (especially the pentatonic, that is, the black notes), and the division of the octave into $12$ pieces.

One early instrument which is totally out of whack from this is the bell. Bells and gongs can be tuned to have a variety of harmonics, but the most common ones -- foundry bells -- have a very loud, unusual third harmonic: minor third or E flat. It is so loud and incongruous that they sound terrible, even disturbing, when played along with strings, flutes, voices, etc. In fact, entire musical pieces have to be written specially for carillons (large multibell instruments) in order to guarantee proper overlap of harmonics. Generally this means that the entire piece has to be written in fully diminished chords. Major chords sound among the worst because of the clash between the major third in the chord and the minor third coming from the root's loud third harmonic.
