Need help optimizing a google apps script that labels emails

From the documentation :

method getInboxThreads()

Retrieve all Inbox threads irrespective of labels This call will fail when the size of all threads is too large for the system to handle. Where the thread size is unknown, and potentially very large, please use the 'paged' call, and specify ranges of the threads to retrieve in each call.*

So you should handle a certain number of threads, label the messages and set up a time trigger to run each "page" every 10 minutes or so until all the messages are labelled.


EDIT : I have given this a try , please consider as a draft to start with :

The script will process 100 threads at a time and send you an email to inform you on its progress and show the log.

When it's finished it will warn you with an email as well. It uses scriptProperties to store its state. (don't forget to update the mail adress at the end of the script). I tried it with a time trigger set to 5 minutes and it seems to run smoothly for now...

function inboxLabeller() {

  if(ScriptProperties.getKeys().length==0){ // this is to create keys on the first run
    ScriptProperties.setProperties({'threadStart':0, 'itemsprocessed':0, 'notF':true})
    }
    var items = Number(ScriptProperties.getProperty('itemsprocessed'));// total counter
    var tStart = Number(ScriptProperties.getProperty('threadStart'));// the value to start with
    var notFinished = ScriptProperties.getProperty('notF');// the "main switch" ;-)
    Logger.clear()

  while (notFinished){ // the main loop
    var threads = GmailApp.getInboxThreads(tStart,100);
    Logger.log('Number of threads='+Number(tStart+threads.length));
      if(threads.length==0){
      notFinished=false ;
      break
      }
      for(t=0;t<threads.length;++t){
       var mCount = threads[t].getMessageCount();
       var mSubject = threads[t].getFirstMessageSubject();
       var labels = threads[t].getLabels();
       var labelsNames = '';
         for(var l in labels){labelsNames+=labels[l].getName()}
       Logger.log('subject '+mSubject+' has '+mCount+' msgs with labels '+labelsNames)
         for(var l in labels){
             labels[l].addToThread(threads[t])
      }
      }
        tStart = tStart+100;
        items = items+100
        ScriptProperties.setProperties({'threadStart':tStart, 'itemsprocessed':items})
        break
      }
   if(notFinished){
      GmailApp.sendEmail('mymail', 'inboxLabeller progress report', 'Still working, '+items+' processed \n - see logger below \n \n'+Logger.getLog());
      }else{
      GmailApp.sendEmail('mymail', 'inboxLabeller End report', 'Job completed : '+items+' processed');
      ScriptProperties.setProperties({'threadStart':0, 'itemsprocessed':0, 'notF':true})
      }
}

This will find individual messages that do not have a label and apply the label of the associated thread. It takes much less time because it's not relabeling every single message.

function label_unlabeled_messages() {
  var unlabeled = GmailApp.search("has:nouserlabels -label:inbox -label:sent -label:chats -label:draft -label:spam -label:trash");

  for (var i = 0; i < unlabeled.length; i++) {
    Logger.log("thread: " + i + " " + unlabeled[i].getFirstMessageSubject());
    labels = unlabeled[i].getLabels();
    for (var j = 0; j < labels.length; j++) {
      Logger.log("labels: " + i + " " + labels[j].getName());
      labels[j].addToThread(unlabeled[i]);
    }
  }
}

OK, I've been working on this for a few days because I was really frustrated with the strange way that Gmail labels/doesn't label messages in conversations.

I'm flabbergasted actually that labels aren't automatically applied to new messages in a conversation. This is not reflected at all in the Gmail UI. There's no way to look at a thread and determine that the labels only apply to some messages in the thread, and you cannot add labels to a single message in the UI. As I was working through my script below, I noticed that you can't even programmatically add labels to a single message. So there really is no reason for the current behavior.

With my rant out of the way, I have a few notes about the script.

  1. I sort of combined Saqib's code with Serge's code.
  2. The script has two parts: an initial run that relabels all threads that have a user label attached, and a maintenance run that labels recent emails (currently looks back 4 days). Only one part executes during a single run. Once the initial run is completed, only the maintenance part will run. You can set a trigger to it run once per day, or more or less often, depending on your needs.
  3. The initial run halts after 4 minutes to avoid being terminated by the 5 minute script time limit. It sets a trigger to run again after 4 minutes (both of these times can be changed using constants in the script). The trigger gets deleted at the next run.
    • There is no run-time check in the maintenance section. If you have lots of emails in the last 4 days, the maintenance section might hit the script time limit. I could probably change the script to be more efficient here, but so far it's worked for me so I am not really motivated to improve on it.
  4. There's a try/catch statement in the initial run to try to catch the Gmail "write quota error" and exit gracefully (i.e. writing the current progress so it can be picked up again later), but I don't know if it works because I couldn't get the error to happen.
  5. You'll get an email when the time limit is reached, and when the initial run is finished.
  6. For some reason, the log doesn't always clear fully between runs, even when using the Logger.clear() command. So the status logs that it emails to the user have more than just the most recent run info. I don't know why this occurs.

I have used this to process 20,000 emails in around half an hour (including wait times). I actually ran it twice, so it processed 40,000 emails in one day. I guess the Gmail read/write limit of 10,000 isn't what is being applied here (maybe applying a label to 100 threads at a time counts as a single write event instead of 100?). It gets through about 5,000 threads in a 4 minute run, according to the status email it sends.

Sorry for the long lines. I blame the widescreen monitors. Let me know what you think!

function relabelGmail() {

  var startTime= (new Date()).getTime(); // Time at start of script
  var BATCH=100; // total number of threads to apply label to at once.
  var LOOKBACKDAYS=4; // Days to look back for maintenance section of script. Should be at least 2
  var MAX_RUN_TIME=4*60*1000; // Time in ms for max execution. 4 minutes is a good start.
  var WAIT_TIME=4*60*1000; // Time in ms to wait before starting the script again.
  Logger.clear();



//  ScriptProperties.deleteAllProperties(); return; // Uncomment this line and run once to start over completely

  if(ScriptProperties.getKeys().length==0){ // this is to create keys on the first run
    ScriptProperties.setProperties({'itemsProcessed':0, 'initFinished':false, 'lastrun':'20000101', 'itemsProcessedToday':0, 
                                    'currentLabel':'null-label-NOTREAL', 'currentLabelStart':0, 'autoTrig':0, 'autoTrigID':'0'});
  }

  var itemsP = Number(ScriptProperties.getProperty('itemsProcessed')); // total counter
  var initTemp = ScriptProperties.getProperty('initFinished'); // keeps track of when initial run is finished. 
  var initF = (initTemp.toLowerCase() == 'true'); // Make it boolean

  var lastR = ScriptProperties.getProperty('lastrun'); // String of date corresponding to itemsProcessedToday in format yyyymmdd
  var itemsPT = Number(ScriptProperties.getProperty('itemsProcessedToday')); // daily counter
  var currentL = ScriptProperties.getProperty('currentLabel'); // Label currently being processed
  var currentLS = Number(ScriptProperties.getProperty('currentLabelStart')); // Thread number to start on

  var autoT = Number(ScriptProperties.getProperty('autoTrig')); // Number to say whether the last run made an automatic trigger
  var autoTID = ScriptProperties.getProperty('autoTrigID'); // Unique ID of last written auto trigger

  // First thing: google terminates scripts after 5 minutes. 
  // If 4 minutes have passed, this script will terminate, write some data, 
  // and create a trigger to re-schedule itself to start again in a few minutes. 
  // If an auto trigger was created last run, it is deleted here.
  if (autoT) {
    var allTriggers = ScriptApp.getProjectTriggers();

    // Loop over all triggers. If trigger isn't found, then it must have ben deleted.
    for(var i=0; i < allTriggers.length; i++) {
      if (allTriggers[i].getUniqueId() == autoTID) {
        // Found the trigger and now delete it
        ScriptApp.deleteTrigger(allTriggers[i]);
        break;
      }
    }
    autoT = 0;
    autoTID = '0';
  }

  var today = dateToStr_();
  if (today == lastR) { // If new day, reset daily counter
    // Don't do anything
  } else {
    itemsPT = 0;
  }

  if (!initF) { // Don't do any of this if the initial run has been completed
    var labels = GmailApp.getUserLabels();

    // Find position of last label attempted
    var curLnum=0;
    for ( ; curLnum < labels.length; curLnum++) { 
      if (labels[curLnum].getName() == currentL) {break};
    }
    if (curLnum == labels.length) { // If label isn't found, start over at the beginning
      curLnum = 0;
      currentLS = 0;
      itemsP=0;
      currentL=labels[0].getName();
    }

    // Now start working through the labels until the quota is hit.
    // Use a try/catch to stop execution if your quota has been hit. 
    // Google can actually automatically email you, but we need to clean up a bit before terminating the script so it can properly pick up again tomorrow.
    try {
      for (var i = curLnum; i < labels.length; i++) {
        currentL = labels[i].getName(); // Next label
        Logger.log('label: ' + i + ' ' + currentL);

        var threads = labels[i].getThreads(currentLS,BATCH);

        for (var j = Math.floor(currentLS/BATCH); threads.length > 0; j++) {
          var currTime = (new Date()).getTime();
          if (currTime-startTime > MAX_RUN_TIME) {

            // Make the auto-trigger
            autoT = 1; // So the auto trigger gets deleted next time.

            var autoTrigger = ScriptApp.newTrigger('relabelGmail')
            .timeBased()
            .at(new Date(currTime+WAIT_TIME))
            .create();

            autoTID = autoTrigger.getUniqueId();

            // Now write all the values.
            ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT, 
                                            'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID});

            // Send an email
            var emailAddress = Session.getActiveUser().getEmail();
            GmailApp.sendEmail(emailAddress, 'Relabel job in progress', 'Your Gmail Relabeller has halted to avoid termination due to excess ' +
                               'run time. It will run again in ' + WAIT_TIME/1000/60 + ' minutes.\n\n' + itemsP + ' threads have been processed. ' + itemsPT + 
                               ' have been processed today.\n\nSee the log below for more information:\n\n' + Logger.getLog());
            return;
          } else {
            // keep on going
            var len = threads.length;
            Logger.log( j * BATCH + len);

            labels[i].addToThreads(threads);

            currentLS = currentLS + len;
            itemsP = itemsP + len;
            itemsPT = itemsPT + len;
            threads = labels[i].getThreads( (j+1) * BATCH, BATCH);
          }
        }

        currentLS = 0; // Reset LS counter
      }

      initF = true; // Initial run is done

    } catch (e) { // Clean up and send off a notice. 
      // Write current values back to ScriptProperties
      ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT, 
                                      'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID});

      var emailAddress = Session.getActiveUser().getEmail();
      var errorDate = new Date();
      GmailApp.sendEmail(emailAddress, 'Error "' + e.name + '" in Google Apps Script', 'Your Gmail Relabeller has failed in the following stack:\n\n' + 
                         e.stack + '\nThis may be due to reaching your daily Gmail read/write quota. \nThe error message is: ' + 
                         e.message + '\nThe error occurred at the following date and time: ' + errorDate + '\n\nThus far, ' + 
                         itemsP + ' threads have been processed. ' + itemsPT + ' have been processed today. \nSee the log below for more information:' + 
                         '\n\n' + Logger.getLog());
      return;
    }

    // Write current values back to ScriptProperties. Send completion email.
    ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT, 
                                    'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigNumber':autoTID});

    var emailAddress = Session.getActiveUser().getEmail();
    GmailApp.sendEmail(emailAddress, 'Relabel job completed', 'Your Gmail Relabeller has finished its initial run.\n' + 
                       'If you continue to run the script, it will skip the initial run and instead relabel ' + 
                       'all emails from the previous ' + LOOKBACKDAYS + ' days.\n\n' + itemsP + ' threads were processed. ' + itemsPT + 
                       ' were processed today. \nSee the log below for more information:' + '\n\n' + Logger.getLog());

    return; // Don't run the maintenance section after initial run finish

  } // End initial run section statement


  // Below is the 'maintenance' section that will be run when the initial run is finished. It finds all new threads
  // (as defined by LOOKBACKDAYS) and applies any existing labels to all messages in each thread. Note that this 
  // won't miss older threads that are labeled by the user because all messages in a thread get the label
  // when the label action is first performed. If another message is then sent or received in that thread, 
  // then this maintenance section will find it because it will be deemed a "new" thread at that point. 
  // You may need to search further back the first time you run this if it took more than 3 days to finish
  // the initial run. For general maintenance, though, 4 days should be plenty.

  // Note that I have not implemented a script-run-time check for this section. 

  var threads = GmailApp.search('newer_than:' + LOOKBACKDAYS + 'd', 0, BATCH); // 
  var len = threads.length;

  for (var i=0; len > 0; i++) {

    for (var t = 0; t < len; t++) {
      var labels = threads[t].getLabels();

      for (var l = 0; l < labels.length; l++) { // Add each label to the thread
        labels[l].addToThread(threads[t]);
      }
    }

    itemsP = itemsP + len;
    itemsPT = itemsPT + len;

    threads = GmailApp.search('newer_than:' + LOOKBACKDAYS + 'd', (i+1) * BATCH, BATCH); 
    len = threads.length;
  }
  // Write the property data
  ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT, 
                                  'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID});
}


// Takes a date object and turns it into a string of form yyyymmdd
function dateToStr_(dateObj) { //takes in a date object, but uses current date if not a date

  if (!(dateObj instanceof Date)) {
    dateObj = new Date();
  }

  var dd = dateObj.getDate();
  var mm = dateObj.getMonth()+1; //January is 0!
  var yyyy = dateObj.getFullYear();

  if(dd<10){dd='0'+dd}; 
  if(mm<10){mm='0'+mm};
  dateStr = ''+yyyy+mm+dd;

  return dateStr;

}

Edit: 3/24/2017 I guess I should turn on notifications or something, because I never saw the question from user29020. In case anyone ever has the same question, here's what I do: I run it as a maintenance function by setting a daily trigger to run each night between 1 and 2 AM.

An additional note: It seems that at some point in the last year or so, labeling calls to Gmail have slowed down significantly. It now takes around 0.2 seconds per thread, so I would expect an initial run of 20k emails to take at least 20 runs or so before it makes it all the way through. This also means that if you typically receive more than 100-200 emails a day, the maintenance section might also start to take too long and start to fail. Now that's a lot of emails, but I bet there are some people that receive that many, and it seems much more likely that you would hit that than the 1000 or so daily emails that would have been needed for failure back when I first wrote the script.

Anyway, one mitigation would be to reduce the LOOKBACKDAYS to less than 4, but I wouldn't recommend putting it less than 2.