How to export Speak output?

This is probably not the best idea, but there is a post by Szabolcs that describes some undocumented features of record sound. You can string these together to play the sound, record it back, then save to a file. First, here is a shortened version of your rep function, followed by the "start record", then the call to rep.

rep[extra___String] := Speak["Go! " <> extra <> " interval"]; 
FrontEndExecute[FrontEnd`RecordSound[1, 0, 0]]; (* start recording *)
rep["twenty"]

After the sound has stopped, you can end recording and save to the variable snd:

FrontEndExecute[FrontEnd`RecordSound[2]];       (* end recording *)
snd = FrontEndExecute[FrontEnd`RecordSound[3]]  (* save to sound format *)

Then export to an mp3 file

Export["test.mp3", snd]

If you wish to remove the room ambiance, don't use the computer's speaker and microphone: instead, connect to an I/O that let's you connect a wire from line out to line in. Then you'll get a much better recording.

Note the warning at the end of Szabolcs message: "use at your own risk."


Update:

These are some interesting infos since my previous post:

  1. Contrary to what I thought and wrote in the first version of this answer, Speak can handle pauses inside a text and many other features to fine tune the speech (see this recent post for more infos).

  2. The commands to fine tune the speech (like pause, speed of the speech, ...) depend on your OS because Mathematica actually sends the text to your OS specific speech engine. It is probably why it is not documented in Mathematica (but it could be because there are some simple options which are in common).

  3. On OSX, I found the command to run directly the speech engine : this is the say command. It is particularly interesting as you can also choose the voice, ... and it can directly save the speech in a file and into different formats (mp4, wav, ... but however no mp3). Mathematica can run this external program to get exactly what the OP wanted (see below).

  4. On Windows OS, most proably you can do the same thing as there seems to be a similar text to speech command line system program or at least there are alternative programs (see this SO post), but I couldn't test.

  5. The text to speech internet API (http://tts-api.com/) that I used in my first answer and which i found in an example inside Mathematica documentation, seems clearly to handle only OSX type control commands to fine tune the speech ... Also, I noticed that it produces good quality and small size mp3 files.

Application to the OP`s problem

As previously :

rep[extra___String] := {speak["Go! " <> extra <> " interval"],
  pause[15], speak["Five"], pause[5], speak["Stop"], pause[5], 
  speak["Five"], pause[5]}

then

gym = {rep["first"], rep["second"]};

gymString = 
 gym /. {speak -> Sequence, 
     pause[t_] :> " [[slnc " <> ToString[t*1000] <> "]] "} // 
   Flatten // StringJoin @@ # &

To Play :

URLExecute["http://tts-api.com/tts.mp3", {"q" -> gymString}]

To Save directly into a mp3 :

Directory[]
URLSave["http://tts-api.com/tts.mp3", "gym.mp3", "Parameters" -> {"q" -> gymString}]

For OS X only ************** :

To Play :

Speak@gymString

Run["say -v Zarvox "<>gymString]

To Save :

Run["say -o gym.mp4 -v Zarvox "<>gymString]


Previous

Actually, the documentation almost tells you how to do it ;)

I just came across an interesting example in V10 documentation which shows how to play directly in the notebook a mp3 synthesized speech retrieved from an internet free API service.

Let's try that :

mysound = 
 URLExecute["http://tts-api.com/tts.mp3", {"q" -> "I like gym"}]

enter image description here

Then, you can check that it can be exported as an mp3 file like this Export["filename.mp3",mysound], but to prevent the import then export you could also directly write :

URLSave["http://tts-api.com/tts.mp3", "gymDirect.mp3", "Parameters" -> {"q" -> "I like gym"}]

By the way, you'll notice that Mathematica has strangely the same "voice" (at least on my Mac OSX)

Speak["I like gym"]

(it looks that on Mac, Mma uses a OSX local library (speech synthesis framework) to perform that.

Unfortunately, the internet synthesizer does not understand "pauses" (like Speak) otherwise you could have just sent a request with all your gym text.

This is not really a problem, to include the pauses we'll have to tweak a little bit.

Gym example

(Here I take your original code and just change Speak->speak, Pause->pause, so you can use it as before to create your sequences)

rep[extra___String] := {speak["Go! " <> extra <> " interval"],
  pause[15], speak["Five"], pause[5], speak["Stop"], pause[5], 
  speak["Five"], pause[5]};

Let's create a gym sequence example (shorter than yours for the purpose of the test) :

gym = {rep["first"], rep["second"]};

then some processing :

gymText = 
  Flatten[gym] /. speak[x_String] :> speak /@ StringSplit[x] // Flatten;

gymWord2Sound = 
  Rule[#, URLExecute["http://tts-api.com/tts.mp3", {"q" -> #}] // 
      First] & /@ Union[Cases[gymText, speak[str_] -> str]];

gymSound = 
 Sound[gymText /. {speak[str_] -> str, pause[t_] -> SoundNote[None, t]} /. gymWord2Sound]

enter image description here

Then to export the whole sequence (including the pauses) as a mp3 :

Directory[]
Export["gym.mp3", gymSound]

(You can also add mp3 specific export options in order to set the compression of the file )

Good luck !