Where to find body of email depending on mimeType

I think it will make sense if you think of the payload as a part in and of itself. Let's say I send a message with just a subject and a plain text body:

From: [email protected]
To: [email protected]
Subject: Example Subject

This is the plain text message

This will result in the following parsed message:

{
 "id": "154ecb53c10b74d8",
 "threadId": "154ecb53c10b74d8",
 "labelIds": [
  "INBOX",
  "SENT"
 ],
 "snippet": "This is the plain text message",
 "historyId": "38877",
 "internalDate": "1464260181000",
 "payload": {
  "partId": "",
  "mimeType": "text/plain",
  "filename": "",
  "headers": [
   ...
  ],
  "body": {
   "size": 31,
   "data": "VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg=="
  }
 },
 "sizeEstimate": 355
}
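
To make the `data` field less mysterious, here is a minimal sketch of decoding it, assuming Node.js (for `Buffer`). The helper name `decodeBodyData` is made up for illustration and is not part of the Gmail API. The value is base64url, so the URL-safe characters are mapped back before decoding:

```javascript
// Hypothetical helper: decode the base64url-encoded body.data of a message part.
function decodeBodyData(data) {
  // base64url uses '-' and '_' where standard base64 uses '+' and '/'
  const base64 = data.replace(/-/g, '+').replace(/_/g, '/');
  // Assumes the part is UTF-8 text, as in the examples here
  return Buffer.from(base64, 'base64').toString('utf8');
}

// The data value from the parsed message above
const decoded = decodeBodyData('VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg==');
console.log(JSON.stringify(decoded)); // prints "This is the plain text message\n"
```

The decoded string is 31 bytes long including the trailing newline, which matches the `size` reported in the body.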

If I send a message with a plain text part, an HTML part and an image, it will look like this when parsed:

{
 "id": "154ed5ccaa12f3df",
 "threadId": "154ed5ccaa12f3df",
 "labelIds": [
  "SENT",
  "INBOX",
  "IMPORTANT"
 ],
 "snippet": "This is a plain/html message with an image.",
 "historyId": "841379",
 "internalDate": "1464271162000",
 "payload": {
  "mimeType": "multipart/mixed",
  "filename": "",
  "headers": [
     ...
  ],
  "body": {
   "size": 0
  },
  "parts": [
   {
    "mimeType": "multipart/alternative",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "multipart/alternative; boundary=089e0122896c7c80d80533bf3205"
     }
    ],
    "body": {
     "size": 0
    },
    "parts": [
     {
      "partId": "0.0",
      "mimeType": "text/plain",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/plain; charset=UTF-8"
       }
      ],
      "body": {
       "size": 47,
       "data": "VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo="
      }
     },
     {
      "partId": "0.1",
      "mimeType": "text/html",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/html; charset=UTF-8"
       }
      ],
      "body": {
       "size": 73,
       "data": "PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg=="
      }
     }
    ]
   },
   {
    "partId": "1",
    "mimeType": "image/png",
    "filename": "smile.png",
    "headers": [
       ...
    ],
    "body": {
     "attachmentId": "ANGjdJ-OrSy7VAYL-UbRyNtmySbZLlV-fV43zJF0_neNGZ8yKugsZAxb32eSb-CrbYIhF9NvjGwBVEjSkRrUWoCS7aDpgoQnt9WR7f2sa17qVEyOg_JVSbrGrunirvQw2dY-SxxB3Y0JP3aYDHSBXpNO6fFCByVFWQDw1et5Mh9di7bGO4AWOLKFVe_Yb2RmdDwuazGXGb8zA88TTMaiEPIacPTNiVtBrIWG0EKGxHBhep9j8ujyWeCS5P9X80dBHvBNj4T9XjUwcrN6FvwegRewRMM9cBupY7jQESR7915OcbhCNyi5l64x6vVh1ZU",
     "size": 2002
    }
   }
  ]
 },
 "sizeEstimate": 3077
}

As you can see, it is just the RFC 822 message parsed to JSON. If you traverse the parts, and treat the payload as a part itself, you will find what you are looking for.

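var parts = [response.payload];

while (parts.length) {
  var part = parts.shift();
  // Queue nested parts so the whole tree gets visited
  if (part.parts) {
    parts = parts.concat(part.parts);
  }

  // Attachments carry an attachmentId instead of inline data, so check for data first
  if (part.mimeType === 'text/html' && part.body && part.body.data) {
    // body.data is base64url: restore the standard alphabet before calling atob
    var base64 = part.body.data.replace(/-/g, '+').replace(/_/g, '/');
    // atob yields a binary string; decode its bytes as UTF-8
    var bytes = Uint8Array.from(atob(base64), function (c) { return c.charCodeAt(0); });
    var decodedPart = new TextDecoder('utf-8').decode(bytes);
    console.log(decodedPart);
  }
}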

Many MIME types can be returned; here are a few:

  • text/plain: the message body only in plain text
  • text/html: the message body only in HTML
  • multipart/alternative: contains two or more parts that are alternatives to each other, for example:
    • a text/plain part for the message body in plain text
    • a text/html part for the message body in HTML
  • multipart/mixed: contains several unrelated parts, which can be:
    • multipart/alternative as above, or text/plain or text/html as above
    • application/octet-stream, or other application/* types, for application-specific attachments
    • image/png or other image/* types for images, which could be embedded in the message.

The definitive reference for all this is RFC 2046 https://www.ietf.org/rfc/rfc2046.txt (you might also want to see RFC 2045 and RFC 2047).

To answer your question, build a tree of the message, and look either for:

  • the first text/plain or text/html part (either as the message body itself or inside a multipart/mixed)
  • the first text/plain or text/html inside of a multipart/alternative, which may itself be part of a multipart/mixed.
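
A minimal sketch of that lookup in JavaScript, assuming the message has already been fetched as JSON in the format shown in the API response. The names `findFirstPart` and `findBody` and the trimmed-down `payload` are made up for illustration, and skipping parts that have a filename is my own assumption for telling body parts from text attachments:

```javascript
// Depth-first search for the first part with the given MIME type.
// Parts with a filename are treated as attachments and skipped (an assumption).
function findFirstPart(part, mimeType) {
  if (part.mimeType === mimeType && !part.filename && part.body && part.body.data) {
    return part;
  }
  for (const child of part.parts || []) {
    const found = findFirstPart(child, mimeType);
    if (found) return found;
  }
  return null;
}

// Prefer the HTML body and fall back to plain text.
function findBody(payload) {
  return findFirstPart(payload, 'text/html') || findFirstPart(payload, 'text/plain');
}

// Hypothetical, trimmed-down payload with the same shape as a Gmail API response
const payload = {
  mimeType: 'multipart/mixed', filename: '', body: { size: 0 },
  parts: [
    { mimeType: 'multipart/alternative', filename: '', body: { size: 0 },
      parts: [
        { partId: '0.0', mimeType: 'text/plain', filename: '', body: { size: 5, data: 'aGVsbG8=' } },
        { partId: '0.1', mimeType: 'text/html', filename: '', body: { size: 12, data: 'PGI-aGVsbG88L2I-' } },
      ] },
    { partId: '1', mimeType: 'image/png', filename: 'smile.png',
      body: { attachmentId: 'hypothetical-id', size: 2002 } },
  ],
};

console.log(findBody(payload).partId);
```

Swap the two calls in `findBody` if you would rather have the plain text version when both exist.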

An example of a complex message:

  • multipart/mixed

    • multipart/alternative
      • text/plain <- message body in plain text
      • text/html <- message body in HTML
    • application/zip <- a zip file attachment
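
If it helps to see the structure of a real message in this form, a small recursive function can print the same kind of outline. This is only a sketch: `outline` and the `example` object are hypothetical names, and the object mimics the shape of a Gmail API payload with everything but `mimeType`, `filename` and `parts` left out.

```javascript
// Build an indented outline of a part and everything nested inside it.
function outline(part, depth = 0) {
  const label = part.filename ? `${part.mimeType} (${part.filename})` : part.mimeType;
  const lines = ['  '.repeat(depth) + label];
  for (const child of part.parts || []) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// Hypothetical payload matching the complex message above
const example = {
  mimeType: 'multipart/mixed',
  parts: [
    { mimeType: 'multipart/alternative',
      parts: [{ mimeType: 'text/plain' }, { mimeType: 'text/html' }] },
    { mimeType: 'application/zip', filename: 'archive.zip' },
  ],
};

console.log(outline(example).join('\n'));
```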

I know this question is not new, but I wrote a PHP script which correctly parses messages pulled from the Gmail API, including any type of attachment.

The script includes a recursive "iterateParts" function which iterates all message parts so we can be sure we extracted all available data from each message.

Script steps are:

  1. Pull all message ids from the API
  2. Get some important headers (subject & from address)
  3. If the body is directly on the payload, read it there; otherwise pass the payload to iterateParts
  4. iterateParts parses each message part into $msgArr, keeping its data base64 encoded
  5. Push $msgArr to the master array $allmsgArr
  6. Traverse the master array and save each part as a file according to its MIME type and filename
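
For readers who are not using PHP, the core of steps 3 and 4 can be sketched in JavaScript against the plain JSON of a message. This is a rough equivalent under my own assumptions, not a port of the script: `collectParts` and `sample` are hypothetical names, the result is returned instead of written to a global, and fetching the attachment data (a separate API call) is left out so the attachments only carry their ids.

```javascript
// Walk a payload and collect the text bodies and attachment references.
function collectParts(part, result = { attachments: [] }) {
  const body = part.body || {};
  if (body.attachmentId) {
    // Attachment: the data itself has to be fetched separately by id
    result.attachments.push({
      mimeType: part.mimeType,
      filename: part.filename,
      attachmentId: body.attachmentId,
    });
  } else if (body.data && part.mimeType === 'text/plain') {
    result.textplain = body.data; // still base64url encoded
  } else if (body.data && part.mimeType === 'text/html') {
    result.texthtml = body.data; // still base64url encoded
  }
  // Do not descend into attachments such as forwarded emails
  if (!body.attachmentId) {
    for (const child of part.parts || []) {
      collectParts(child, result);
    }
  }
  return result;
}

// Hypothetical message payload
const sample = {
  mimeType: 'multipart/mixed', body: { size: 0 },
  parts: [
    { mimeType: 'text/plain', filename: '', body: { size: 5, data: 'aGVsbG8=' } },
    { mimeType: 'image/png', filename: 'smile.png',
      body: { attachmentId: 'hypothetical-id', size: 2002 } },
  ],
};

console.log(collectParts(sample));
```

Because the payload is treated as a part itself, the same function also covers the case where the body sits directly on the payload.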

    $maxToPull = 1;
    $gmailQuery = "ALL";

    // Initializing Google API
    $service = new Google_Service_Gmail($client);

    // Pulling all gmail messages into $messages array
    $user = 'me';
    $msglist = $service->users_messages->listUsersMessages($user, ["maxResults"=>$maxToPull, "q"=>$gmailQuery]);
    $messages = $msglist->getMessages();

    // Master array that will hold all parsed messages data, including attachments
    $allmsgArr = array();

    // Traverse each message
    foreach($messages as $message)
    {
        $msgArr = array();
        $single_message = $service->users_messages->get('me', $message->getId());
        $payload = $single_message->getPayload();

        // Nice to have the gmail msg id, can be used to direct access the message in Gmail's web gui
        $msgArr['gmailmsgid'] = $message->getId();

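        // Retrieving the subject and "from" email address
        foreach($payload->getHeaders() as $oneheader)
        {
            if($oneheader['name'] == 'Subject')
                $msgArr['subject'] = $oneheader['value'];
            if($oneheader['name'] == 'From')
            {
                // "Name <address>" form: take what is between the angle brackets;
                // a bare address without angle brackets is kept as-is
                if(preg_match('/<([^>]+)>/', $oneheader['value'], $matches))
                    $msgArr['fromaddress'] = $matches[1];
                else
                    $msgArr['fromaddress'] = trim($oneheader['value']);
            }
        }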

        // If body is directly in the message payload (only for plain text messages where there's no HTML part and no attachments, normally this is not the case)
        if($payload['body']['size'] > 0)
            $msgArr['textplain'] = $payload['body']['data'];     
        // Else, iterate over each message part and continue to dig if necessary
        else
            iterateParts($payload, $message->getId());

        // Push the parsed $msgArr (parsed by iterateParts) to master array
        array_push($allmsgArr, $msgArr);
    }


    // Traverse each parsed message and save its content and attachments to files
    foreach($allmsgArr as $onemsgArr)
    {

        $folder = "messages/".$onemsgArr['gmailmsgid'];
        // Create the folder (and the parent "messages" folder) if it does not exist yet
        if(!is_dir($folder))
            mkdir($folder, 0777, true);

        // !empty() avoids "undefined index" notices when a message lacks that part
        if(!empty($onemsgArr['textplain']))
            file_put_contents($folder."/textplain.txt", decodeData($onemsgArr['textplain']));
        if(!empty($onemsgArr['texthtml']))
            file_put_contents($folder."/texthtml.html", decodeData($onemsgArr['texthtml']));
        if(!empty($onemsgArr['attachments']))
        {
            foreach($onemsgArr['attachments'] as $oneattachment)
            {
                if(!empty($oneattachment['filename']))
                    $filename = basename($oneattachment['filename']); // drop any directory components
                else if($oneattachment['mimetype'] == "message/rfc822") // email attachments
                    $filename = "noname.eml";
                else
                    $filename = "unknown";
                file_put_contents($folder."/".$filename, decodeData($oneattachment['data']));
            }
        }
    }


    function iterateParts($obj, $msgid) {

        global $msgArr;
        global $service;
        foreach($obj as $parts)
        {
            // if found body data
            if($parts['body']['size'] > 0)
            {
                // plain text representation of message body
                if($parts['mimeType'] == 'text/plain')
                {
                    $msgArr['textplain'] = $parts['body']['data'];
                }
                // html representation of message body
                else if($parts['mimeType'] == 'text/html')
                {
                    $msgArr['texthtml'] = $parts['body']['data'];
                }
                // if it's an attachment
                else if(!empty($parts['body']['attachmentId']))
                {
                    // start from a clean array so nothing leaks over from a previous attachment
                    $attachArr = array();
                    $attachArr['mimetype'] = $parts['mimeType'];
                    $attachArr['filename'] = $parts['filename'];
                    $attachArr['attachmentId'] = $parts['body']['attachmentId'];

                    // the message holds the attachment id, retrieve its data from users_messages_attachments
                    $attachmentId_base64 = $parts['body']['attachmentId'];
                    $single_attachment = $service->users_messages_attachments->get('me', $msgid, $attachmentId_base64);

                    $attachArr['data'] = $single_attachment->getData();

                    $msgArr['attachments'][] = $attachArr;
                }       
            }

            // if there are other parts inside, go get them
            if(!empty($parts['parts']) && !empty($parts['mimeType']) && empty($parts['body']['attachmentId']))
            {
                iterateParts($parts->getParts(), $msgid);
            }

        }
    }

    // All data returned from the API is base64url encoded (URL-safe alphabet),
    // so map '-' and '_' back to '+' and '/' before decoding
    function decodeData($data)
    {
        $sanitizedData = strtr($data,'-_', '+/');
        return base64_decode($sanitizedData);
    }

This is what $allmsgArr will look like (when only one message was pulled):


Array
(
    [0] => Array
        (
            [gmailmsgid] => 25k1asfa556x2da
            [fromaddress] => [email protected]
            [subject] => Fwd: Sea gulls picture
            [textplain] => UE5SIDQxQzAwMg0KDQpBUkJFTFRFU1QxDQoNCg0K
            [texthtml] => PGRpdiBkaXI9Imx0ciI-PHNwYW4gc3R5bGU9ImZi
            [attachments] => Array
                (
                    [0] => Array
                        (
                            [mimetype] => image/png
                            [filename] => sea_gulls.png
                            [attachmentId] => ANGjdJ9tmy4d8vPXhU_BjNEFEaDODOpu29W2u5OTM7a0
                            [data] => iVBORw0KGgoAAAANSUhEUgAABSYAAAKWCAYAAABUP
                        )

                    [1] => Array
                        (
                            [mimetype] => image/jpeg
                            [filename] => Outlook_Signature.jpg
                            [attachmentId] => ANGjdJ-CgZTK0oK44Q8j7TlN_JlaexxGKZ_wHFfoEB
                            [data] => 6jRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEa
                        )

                )
        )
)
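
The file name choice from step 6 is worth isolating, because attachment names come from the sender and should not be trusted as paths. A minimal JavaScript sketch of the same fallback rules, assuming Node.js for the `path` module; `attachmentFilename` is a hypothetical helper name:

```javascript
const path = require('path');

// Pick a file name for a collected attachment, mirroring the fallbacks used
// when saving: real name, then "noname.eml" for attached emails, then "unknown".
function attachmentFilename(attachment) {
  if (attachment.filename) {
    // Keep only the last path segment so "../" cannot escape the target folder
    return path.basename(attachment.filename);
  }
  if (attachment.mimetype === 'message/rfc822') {
    return 'noname.eml';
  }
  return 'unknown';
}

console.log(attachmentFilename({ mimetype: 'image/png', filename: 'sea_gulls.png' }));
console.log(attachmentFilename({ mimetype: 'message/rfc822', filename: '' }));
```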
