Unwanted frequencies in sawtooth tone

I can't be certain that this is generating your peaks, but any tone that starts and stops won't be 100% pure; a pure tone has no beginning and no end. Consider a tone that starts at $0$ at $t=0$, vibrates for time $\tau$, and then turns off. As an equation, that looks like this: $$y(t) = \sin(2\pi f t)\, \Theta\left(\tau-t\right)\, \Theta(t),$$ where $\Theta(x)$ is the Heaviside (unit step) function. If we integrate that against $\mathrm{e}^{i\omega t} (2\pi)^{-1/2}$, we get: \begin{align} \tilde{y}(\omega) & \equiv \int_{-\infty}^\infty \sin(2\pi f t)\, \Theta\left(\tau-t\right)\, \Theta(t) \frac{\mathrm{e}^{i\omega t}}{\sqrt{2\pi}} \operatorname{d}t \\ &= \int_0^\tau \left(\frac{\mathrm{e}^{i2\pi f t}-\mathrm{e}^{-i2\pi f t}}{2i}\right)\frac{\mathrm{e}^{i\omega t}}{\sqrt{2\pi}} \operatorname{d}t \\ & = \frac{1}{2i\sqrt{2\pi}}\left[\frac{\mathrm{e}^{i2\pi f t + i\omega t}}{i(2\pi f+\omega)} - \frac{\mathrm{e}^{-i2\pi f t + i\omega t}}{i(-2\pi f+\omega)}\right]_{t=0}^\tau \\ & = -\frac{1}{2\sqrt{2\pi}} \left[\frac{\mathrm{e}^{i[2\pi f + \omega ]\tau}}{(2\pi f+\omega)} - \frac{\mathrm{e}^{i[-2\pi f + \omega ]\tau}}{(-2\pi f+\omega)} - \frac{1}{(2\pi f+\omega)} + \frac{1}{(-2\pi f+\omega)}\right] \end{align} If $ f\tau=N$ is an integer, then we can simplify the above to: $$\tilde{y}(\omega) = \frac{\sqrt{2\pi}f}{\omega^2-(2\pi f)^2} \left(\mathrm{e}^{i\omega\tau} - 1\right).$$

Taking the absolute square to get something proportional to the power yields: $$P \propto \tilde{y}^* \tilde{y}= \frac{(2\pi f)^2}{\pi (\omega^2 - (2\pi f)^2)^2} (1 - \cos(\tau\omega)).$$
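As a sanity check on the algebra, here is a minimal Python sketch that compares the closed-form power above with a brute-force numerical integral. The function names, the trapezoid rule, and the test values ($f = 55\operatorname{Hz}$, four whole periods, probe frequency $20\operatorname{Hz}$) are my own choices for illustration, not anything taken from the question.

```python
import cmath
import math

def transform_numeric(f, tau, omega, steps=20000):
    """Trapezoid-rule estimate of (2*pi)^(-1/2) * int_0^tau sin(2*pi*f*t) e^{i*omega*t} dt."""
    h = tau / steps
    total = 0j
    for n in range(steps + 1):
        t = n * h
        weight = 0.5 if n in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * math.sin(2 * math.pi * f * t) * cmath.exp(1j * omega * t)
    return total * h / math.sqrt(2 * math.pi)

def power_closed_form(f, tau, omega):
    """Closed-form |y~(omega)|^2, valid only when f * tau is an integer."""
    a = 2 * math.pi * f
    return a ** 2 / (math.pi * (omega ** 2 - a ** 2) ** 2) * (1 - math.cos(tau * omega))

f = 55.0
tau = 4 / f                  # assumed: exactly four periods of the tone
omega = 2 * math.pi * 20.0   # assumed probe frequency, away from the 55 Hz pole

numeric = abs(transform_numeric(f, tau, omega)) ** 2
closed = power_closed_form(f, tau, omega)
print(numeric, closed)
```

The two numbers should agree to several significant figures. The factor $1 - \cos(\tau\omega)$ vanishes whenever $\omega\tau$ is a multiple of $2\pi$, so the side lobes are spaced by $1/\tau$ in frequency.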

To see if this is the dominant contributor to your unwanted harmonics, try producing a tone that lasts twice as long, and one that lasts half as long, and see how that affects their locations. You should also expect unwanted harmonics from sampling and digitization, though I don't know how to describe where they'll pop up.

Edit: I didn't notice you were talking about a sawtooth wave. As @EmilioPisanty noted, sawtooth waves are not sine waves. The sawtooth is responsible for the dominant harmonics on the right side of the graph. Drop those (to get a pure sine wave), and you get something that is plausibly of a squared Lorentzian form. Also, the unwanted harmonics don't start around $5\operatorname{Hz}$; notice the edge of a purple lobe off the end of the graph. I'd bet the first lobe is near $0 \operatorname{Hz}$, exactly as you'd expect if you didn't have an integer number of wavelengths in your waveform. $0\operatorname{Hz}$ represents a net constant offset to the signal, and that sort of unbalancing is what happens when you don't spend an equal time above and below equilibrium.

Edit: Sparked by @WetSavannaAnimalakaRodVance's comment, I decided to code up my own tone generator. The sawtooth wave I generated using Golang, writing out to text and importing into Audacity, produces the same spectrum. Just like the Audacity-generated spectrum, the $5\operatorname{Hz}$ peak vanishes when the "Size" parameter is reduced to 16384 or lower. What's going on here, I think, is that there is additional windowing imposed by how the spectrum is generated:

Plot Spectrum takes the audio in blocks of 'Size' samples, does the FFT, and averages all the blocks together.
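To make that description concrete, here is a rough Python sketch of block-averaged spectrum estimation. It is not Audacity's code: it assumes non-overlapping blocks and no window function, uses a slow textbook DFT instead of an FFT so that it needs only the standard library, and the names `block_power` and `averaged_spectrum` are made up for this example.

```python
import cmath
import math

def block_power(block):
    """Power in each non-negative-frequency bin, via a naive O(n^2) DFT."""
    n = len(block)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))
        powers.append(abs(acc) ** 2 / n ** 2)
    return powers

def averaged_spectrum(samples, size):
    """Average the per-bin power over consecutive, non-overlapping blocks of `size` samples."""
    n_blocks = len(samples) // size
    if n_blocks == 0:
        raise ValueError("need at least `size` samples")
    total = [0.0] * (size // 2 + 1)
    for b in range(n_blocks):
        powers = block_power(samples[b * size:(b + 1) * size])
        total = [t + p for t, p in zip(total, powers)]
    return [t / n_blocks for t in total]

# Toy input: a sine with exactly 4 periods per 64-sample block, 4 blocks long.
size = 64
tone = [math.sin(2 * math.pi * 4 * i / size) for i in range(4 * size)]
spectrum = averaged_spectrum(tone, size)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin)
```

With a whole number of periods per block, all the power lands in a single bin. When the period does not divide the block size, each block starts at a different phase and the power leaks into neighbouring bins.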

The windowing from generating the spectrum seems primarily to affect a sort of noise floor, though, so this just disguises the peak. I don't know the details of what's going on, but there's a hint in comparing the square wave with "square wave, no aliasing". Judging by the zoomed in version, where ringing is apparent, the "no aliasing" wave is generated using a sum of sine waves, as opposed to the simple mathematical algorithm with sharp cutoffs.
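For comparison, here is a minimal Python sketch of the two ways of generating a sawtooth: the sharp-edged formula, and a sum of sine harmonics that stops at half the sampling rate. This is only my guess at what a "no aliasing" generator might do, not Audacity's implementation, and both function names are hypothetical.

```python
import math

def naive_sawtooth(t, freq_hz):
    """Ideal ramp from -1 to 1 with a sharp jump once per period."""
    cycles = freq_hz * t
    return 2 * (cycles - math.floor(cycles + 0.5))

def bandlimited_sawtooth(t, freq_hz, sample_rate_hz):
    """Fourier-series sawtooth truncated at half the sampling rate."""
    n_harmonics = int((sample_rate_hz / 2) // freq_hz)
    phase = 2 * math.pi * freq_hz * t
    total = sum((-1) ** (k + 1) * math.sin(k * phase) / k
                for k in range(1, n_harmonics + 1))
    return 2 / math.pi * total

# A quarter of the way through one period of a 55 Hz tone, both are close to 0.5.
t = 0.25 / 55
print(naive_sawtooth(t, 55), bandlimited_sawtooth(t, 55, 48_000))
```

Away from the jump the two agree closely. Near the jump the truncated series rings, which is the ringing visible in the zoomed-in "no aliasing" waveform.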

Point being, this is probably a case of aliasing: square and sawtooth waves contain frequency content above half the sampling rate, which is the highest frequency that sampling can faithfully represent, producing the audio equivalent of a Moiré pattern (i.e. a low-frequency tone/beat frequency).
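To illustrate how aliasing could put energy at very low frequencies, here is a small Python sketch that folds the harmonics of an ideal sawtooth back into the representable band. It assumes the figures used elsewhere in this thread (a 55 Hz tone sampled at 48 kHz); `alias_frequency` is a hypothetical helper written for this example.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a component at freq_hz shows up after sampling (folded into 0..fs/2)."""
    remainder = freq_hz % sample_rate_hz
    return min(remainder, sample_rate_hz - remainder)

SAMPLE_RATE_HZ = 48_000   # assumed sampling rate
FUNDAMENTAL_HZ = 55       # assumed sawtooth frequency

# Harmonics above half the sampling rate (24 kHz) that fold back below 20 Hz.
# Integer arithmetic keeps the modular folding exact.
low_aliases = [
    (n, alias_frequency(n * FUNDAMENTAL_HZ, SAMPLE_RATE_HZ))
    for n in range(1, 4000)
    if n * FUNDAMENTAL_HZ > SAMPLE_RATE_HZ // 2
    and alias_frequency(n * FUNDAMENTAL_HZ, SAMPLE_RATE_HZ) < 20
]
for n, alias in low_aliases:
    print(f"harmonic {n:4d} at {n * FUNDAMENTAL_HZ:6d} Hz appears at {alias:2d} Hz")
```

Under these assumptions, harmonic 3491 (at 192005 Hz) folds down to 5 Hz. An ideal sawtooth's harmonic amplitudes fall as $1/n$, so that component alone would sit roughly 71 dB below the fundamental.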

Your sampling rate is 48 kHz and your tone is 55 Hz, so each period is 872.73 samples. The size of your FFT is 65536 samples, which fits 75.093 periods of the signal. The algorithm takes 75 periods to plot the chart, which leaves 0.093 periods between consecutive FFT transforms. 0.093 periods at 55 Hz corresponds to a frequency of 5.1 Hz, which matches the ghost frequency that you see to within the margin of error.

This frequency and its harmonics are not present in the sound; they are a mathematical artifact created by the size of the FFT (2^16) not being a whole-number multiple of the number of samples in one period of the signal.

Furthermore, by the same logic, the 5.1 Hz ghost creates a secondary artifact at 0.04 Hz that you also see. Specifically, 5.1 Hz is 9,350.6 samples per period at 48 kHz. The 65536 FFT size fits 7.0087 periods. 7 periods are displayed at about 5 Hz, and the 0.0087-period error at 5 Hz creates the 0.04 Hz ghost that you see on the left. At about -80 dB, the values are small and are affected by rounding and other errors, by the precision of the computer clock, and by other factors, so the actual values that you see may vary slightly.
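The arithmetic above can be reproduced with a few lines of Python. This is only a restatement of the reasoning in this answer, using the same figures (55 Hz, 48 kHz, FFT size 65536); `leftover_frequency` is a name invented for the sketch.

```python
import math

def leftover_frequency(signal_hz, sample_rate_hz, fft_size):
    """Fractional number of periods left over in one FFT block, expressed in Hz."""
    periods_per_block = fft_size * signal_hz / sample_rate_hz
    leftover_periods = periods_per_block - math.floor(periods_per_block)
    return leftover_periods * signal_hz

first_ghost = leftover_frequency(55.0, 48_000, 65_536)
second_ghost = leftover_frequency(first_ghost, 48_000, 65_536)
print(f"first ghost:  {first_ghost:.3f} Hz")
print(f"second ghost: {second_ghost:.4f} Hz")
```

Carried to more digits, this gives about 5.13 Hz for the first ghost and about 0.045 Hz for the second, consistent with the rounded 5.1 Hz and 0.04 Hz quoted here.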

The calculation shows that the artifacts should diminish if you reduce the generated frequency from 55 Hz to approximately 54.93 Hz, so that exactly 75 periods fit in one FFT block. Alternatively, you can increase the sampling frequency to about 48.06 kHz. However, you cannot change the FFT size just slightly, because it must be a power of 2.
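A short Python sketch of where those two numbers come from, assuming the goal is to fit a whole number of periods (here 75) into one 65536-sample block. The constant names are mine.

```python
import math

SAMPLE_RATE_HZ = 48_000
FFT_SIZE = 65_536
TONE_HZ = 55.0

# Largest whole number of periods that fits in one FFT block at the original settings.
whole_periods = math.floor(FFT_SIZE * TONE_HZ / SAMPLE_RATE_HZ)

# Option 1: keep the sampling rate and lower the tone slightly.
adjusted_tone_hz = whole_periods * SAMPLE_RATE_HZ / FFT_SIZE

# Option 2: keep the tone and raise the sampling rate slightly.
adjusted_sample_rate_hz = TONE_HZ * FFT_SIZE / whole_periods

print(whole_periods, adjusted_tone_hz, adjusted_sample_rate_hz)
```

Either adjustment makes the FFT block an exact multiple of the signal period, so the leftover fraction in the argument above goes to zero.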