Fast way to transform datetime strings with timezones into UNIX timestamps in C++

There are some things you can do to optimize your use of Howard Hinnant's date library:

auto tbase = make_zoned("UTC", local_days{January/1/1970});

The lookup of a timezone (even "UTC") involves doing a binary search of the database for a timezone with that name. It is quicker to do a lookup once, and reuse the result:

// outside of loop:
auto utc_tz = locate_zone("UTC");

// inside of loop:
auto tbase = make_zoned(utc_tz, local_days{January/1/1970});

Moreover, I note that tbase is loop-independent, so the whole thing could be moved outside of the loop:

// outside of loop:
auto tbase = make_zoned("UTC", local_days{January/1/1970});

Here's a further minor optimization to be made. Change:

auto dp = tcurr.get_sys_time() - tbase.get_sys_time() + 0s;

To:

auto dp = tcurr.get_sys_time().time_since_epoch();

This gets rid of the need for tbase altogether. tcurr.get_sys_time().time_since_epoch() is the duration of time since 1970-01-01 00:00:00 UTC, in seconds. The precision of seconds is just for this example, since the input has seconds precision.
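To see why the subtraction is redundant, here is a minimal sketch that uses only `<chrono>` and no time zone library. The helper names `via_base` and `via_epoch` are made up for this illustration, and the sketch assumes that the `system_clock` epoch is 1970-01-01 00:00:00 UTC (guaranteed from C++20 on, and true in practice before that).

```cpp
#include <chrono>

// Local alias for illustration; date.h and C++20 provide a type with the same name.
using sys_seconds = std::chrono::time_point<std::chrono::system_clock,
                                            std::chrono::seconds>;

// The original approach: subtract a time_point that represents the epoch.
inline std::chrono::seconds via_base(sys_seconds tp)
{
    const sys_seconds base{std::chrono::seconds{0}};  // assumed 1970-01-01 00:00:00 UTC
    return tp - base;
}

// The shortcut: ask the time_point for its distance from the epoch directly.
inline std::chrono::seconds via_epoch(sys_seconds tp)
{
    return tp.time_since_epoch();
}
```

Both functions return the same duration for any input, so the base time point carries no information.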

Style nit: Try to avoid putting conversion factors in your code. This means changing:

auto tcurr = make_zoned(tz, local_days{ymd} + 
        seconds{time_str.tm_hour*3600 + time_str.tm_min*60 + time_str.tm_sec}, choose::earliest);

to:

auto tcurr = make_zoned(tz, local_days{ymd} + hours{time_str.tm_hour} + 
                        minutes{time_str.tm_min} + seconds{time_str.tm_sec},
                        choose::earliest);
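As a small illustration of the nit, the two styles can be compared with `<chrono>` alone. The function names `time_of_day_manual` and `time_of_day_typed` are hypothetical and exist only for this sketch.

```cpp
#include <chrono>

// Hand-written factors: easy to mistype (for example 360 instead of 3600).
inline std::chrono::seconds time_of_day_manual(int h, int m, int s)
{
    return std::chrono::seconds{h*3600 + m*60 + s};
}

// Typed durations: the library performs the unit conversions.
inline std::chrono::seconds time_of_day_typed(int h, int m, int s)
{
    return std::chrono::hours{h} + std::chrono::minutes{m} +
           std::chrono::seconds{s};
}
```

The two produce the same value; the typed version simply leaves no factor for you to get wrong.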

Is there a way to avoid this binary search if the time zone is also fixed? I mean, can we get the time zone offset and DST offset and manually adjust the time point?

If you are not on Windows, try compiling with -DUSE_OS_TZDB=1. This uses a compiled form of the database, which can be faster.

There is a way to get the offset and apply it manually (https://howardhinnant.github.io/date/tz.html#local_info), however unless you know that your offset doesn't change with the value of the time_point, you're going to end up reinventing the logic under the hood of make_zoned.

But if you are confident that your UTC offset is constant, here's how you can do it:

auto tz = current_zone();
// Use a sample time_point to get the utc_offset:
auto info = tz->get_info(
    local_days{year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday}
      + hours{time_str.tm_hour} + minutes{time_str.tm_min}
      + seconds{time_str.tm_sec});
seconds utc_offset = info.first.offset;
for( int i=0; i<RUNS; i++){

    genrandomdate(&time_str);
    // Apply the offset manually:
    auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
    auto tp = sys_days{ymd} + hours{time_str.tm_hour} +
              minutes{time_str.tm_min} + seconds{time_str.tm_sec} - utc_offset;
    auto dp = tp.time_since_epoch();
}
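If you want to check the arithmetic without the library, the same computation can be sketched with `<chrono>` and `<ctime>` only. Here `days_from_civil` is the well-known public-domain civil-calendar algorithm standing in for `sys_days{ymd}`, and `to_unix_fixed_offset` is a hypothetical helper name. The constant offset is an assumption: it holds only if the zone has no DST or other transitions over your input range.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Days since 1970-01-01 for a proleptic Gregorian date (m in [1, 12]).
inline long long days_from_civil(long long y, unsigned m, unsigned d)
{
    assert(m >= 1 && m <= 12);
    y -= m <= 2;                                        // treat Jan/Feb as end of previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);   // [0, 399]
    const unsigned mp  = m > 2 ? m - 3 : m + 9;                  // March == 0
    const unsigned doy = (153 * mp + 2) / 5 + d - 1;             // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Convert a broken-down local time to seconds since the UNIX epoch,
// assuming utc_offset (local = UTC + utc_offset) never changes.
inline std::chrono::seconds
to_unix_fixed_offset(const std::tm& t, std::chrono::seconds utc_offset)
{
    using namespace std::chrono;
    const long long d = days_from_civil(t.tm_year + 1900,
                                        static_cast<unsigned>(t.tm_mon + 1),
                                        static_cast<unsigned>(t.tm_mday));
    return hours{24 * d} + hours{t.tm_hour} + minutes{t.tm_min} +
           seconds{t.tm_sec} - utc_offset;
}
```

For example, 2019-04-23 12:00:00 local time with an offset of -4h corresponds to 16:00:00 UTC on the same day.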

Update -- My own timing tests

I'm running macOS 10.14.4 with Xcode 10.2.1. I've set up a relatively quiet machine: Time Machine backup is not running. Mail is not running. iTunes is not running.

I have the following application which implements the desired conversion using several different techniques, depending upon preprocessor settings:

#include "date/tz.h"
#include <cassert>
#include <ctime>
#include <iostream>
#include <vector>

constexpr int RUNS = 1'000'000;
using namespace date;
using namespace std;
using namespace std::chrono;

vector<tm>
gendata()
{
    vector<tm> v;
    v.reserve(RUNS);
    auto tz = current_zone();
    auto tp = floor<seconds>(system_clock::now());
    for (auto i = 0; i < RUNS; ++i, tp += 1s)
    {
        zoned_seconds zt{tz, tp};
        auto lt = zt.get_local_time();
        auto d = floor<days>(lt);
        year_month_day ymd{d};
        auto s = lt - d;
        auto h = floor<hours>(s);
        s -= h;
        auto m = floor<minutes>(s);
        s -= m;
        tm x{};
        x.tm_year = int{ymd.year()} - 1900;
        x.tm_mon = unsigned{ymd.month()} - 1;
        x.tm_mday = unsigned{ymd.day()};
        x.tm_hour = h.count();
        x.tm_min = m.count();
        x.tm_sec = s.count();
        x.tm_isdst = -1;
        v.push_back(x);
    }
    return v;
}


int
main()
{

    auto v = gendata();
    vector<time_t> vr;
    vr.reserve(v.size());
    auto tz = current_zone();  // Using date
    sys_seconds begin;         // Using date, optimized
    sys_seconds end;           // Using date, optimized
    seconds offset{};          // Using date, optimized

    auto t0 = steady_clock::now();
    for(auto const& time_str : v)
    {
#if 0  // Using mktime
        auto t = mktime(const_cast<tm*>(&time_str));
        vr.push_back(t);
#elif 1  // Using date, easy
        auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
        auto tp = local_days{ymd} + hours{time_str.tm_hour} +
                  minutes{time_str.tm_min} + seconds{time_str.tm_sec};
        zoned_seconds zt{tz, tp};
        vr.push_back(zt.get_sys_time().time_since_epoch().count());
#elif 0  // Using date, optimized
        auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
        auto tp = local_days{ymd} + hours{time_str.tm_hour} +
                  minutes{time_str.tm_min} + seconds{time_str.tm_sec};
        sys_seconds zt{(tp - offset).time_since_epoch()};
        if (!(begin <= zt && zt < end))
        {
            auto info = tz->get_info(tp);
            offset = info.first.offset;
            begin = info.first.begin;
            end = info.first.end;
            zt = sys_seconds{(tp - offset).time_since_epoch()};
        }
        vr.push_back(zt.time_since_epoch().count());
#endif
    }
    auto t1 = steady_clock::now();

    cout << (t1-t0)/v.size() << " per conversion\n";
    auto i = vr.begin();
    for(auto const& time_str : v)
    {
        auto t = mktime(const_cast<tm*>(&time_str));
        assert(t == *i);
        ++i;
    }
}

Each solution is timed, and then checked for correctness against a baseline solution. Each solution converts 1,000,000 timestamps, all relatively close together temporally, and outputs the average time per conversion.

I present four solutions, and their timings in my environment:

1. Use mktime.

Output:

3849ns per conversion

2. Use tz.h in the easiest way with USE_OS_TZDB=0

Output:

3976ns per conversion

This is slightly slower than the mktime solution.

3. Use tz.h in the easiest way with USE_OS_TZDB=1

Output:

55ns per conversion

This is much faster than the above two solutions. However, this solution is not available on Windows (at this time), and on macOS it does not support the leap seconds part of the library (not used in this test). Both of these limitations are caused by how each OS ships its time zone database.

4. Use tz.h in an optimized way, taking advantage of a priori knowledge that the time stamps are temporally grouped. If the assumption is false, performance suffers, but correctness is not compromised.

Output:

15ns per conversion

This result is roughly independent of the USE_OS_TZDB setting. But the performance relies on the fact that the input data does not change UTC offsets very often. This solution is also careless with local time points that are ambiguous or non-existent. Such local time points don't have a unique mapping to UTC. Solutions 2 and 3 throw exceptions if such local time points are encountered.
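The caching idea behind solution 4 can be shown in isolation. The sketch below is a toy model, not the library: `ToyZone` is a made-up stand-in for `tz->get_info(tp).first` with two invented offset periods, and `CachedConverter` is a hypothetical name. All times are plain `std::chrono::seconds` counted from the epoch.

```cpp
#include <chrono>
#include <cstddef>

using secs = std::chrono::seconds;

// One UTC-offset period, valid for UTC instants in [begin, end).
struct OffsetPeriod
{
    secs begin;   // UTC, inclusive
    secs end;     // UTC, exclusive
    secs offset;  // local = UTC + offset
};

// Hypothetical zone with a single transition at UTC second 10000.
// It counts lookups so the effect of the cache is observable.
struct ToyZone
{
    mutable std::size_t lookups = 0;

    OffsetPeriod get_info(secs local) const
    {
        ++lookups;
        const OffsetPeriod before{secs{0},     secs{10000}, secs{3600}};
        const OffsetPeriod after {secs{10000}, secs{20000}, secs{7200}};
        return (local - before.offset < before.end) ? before : after;
    }
};

class CachedConverter
{
    const ToyZone& zone_;
    OffsetPeriod cached_{secs{0}, secs{0}, secs{0}};  // empty range forces the first lookup

public:
    explicit CachedConverter(const ToyZone& z) : zone_(z) {}

    secs to_utc(secs local)
    {
        secs utc = local - cached_.offset;            // optimistic: reuse the last offset
        if (!(cached_.begin <= utc && utc < cached_.end))
        {
            cached_ = zone_.get_info(local);          // miss: refresh the cached period
            utc = local - cached_.offset;
        }
        return utc;
    }
};
```

Consecutive inputs that fall in the same period cost one subtraction and two comparisons; only crossing a transition triggers a new lookup. Like solution 4, this sketch makes no attempt to detect ambiguous or non-existent local times.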

Run time error with USE_OS_TZDB

The OP got this stack dump when running on Ubuntu. This crash happens on first access to the time zone database. The crash is caused by empty stub functions provided by the OS for the pthread library. The fix is to explicitly link to the pthreads library (include -lpthread on the command line).

==20645== Process terminating with default action of signal 6 (SIGABRT)
==20645==    at 0x5413428: raise (raise.c:54)
==20645==    by 0x5415029: abort (abort.c:89)
==20645==    by 0x4EC68F6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4ECCA45: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4ECCA80: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4ECCCB3: __cxa_throw (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4EC89B8: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x406AF9: void std::call_once<date::time_zone::init() const::{lambda()#1}>(std::once_flag&, date::time_zone::init() const::{lambda()#1}&&) (mutex:698)
==20645==    by 0x40486C: date::time_zone::init() const (tz.cpp:2114)
==20645==    by 0x404C70: date::time_zone::get_info_impl(std::chrono::time_point<date::local_t, std::chrono::duration<long, std::ratio<1l, 1l> > >) const (tz.cpp:2149)
==20645==    by 0x418E5C: date::local_info date::time_zone::get_info<std::chrono::duration<long, std::ratio<1l, 1l> > >(std::chrono::time_point<date::local_t, std::chrono::duration<long, std::ratio<1l, 1l> > >) const (tz.h:904)
==20645==    by 0x418CB2: std::chrono::time_point<std::chrono::_V2::system_clock, std::common_type<std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1l> > >::type> date::time_zone::to_sys_impl<std::chrono::duration<long, std::ratio<1l, 1l> > >(std::chrono::time_point<date::local_t, std::chrono::duration<long, std::ratio<1l, 1l> > >, date::choose, std::integral_constant<bool, false>) const (tz.h:947)
==20645==