Merge by date multiple log files that also include un-dated lines (e.g. stack traces)

Tricky. While it is possible using date and bash arrays, this really is the kind of thing that would benefit from a real programming language. In Perl for example:

$ perl -ne '$d=$1 if /(.+?),/; $k{$d}.=$_; END{print $k{$_} for sort keys(%k);}' log*
01:02:03.6497,2224,0022 foo
foo1
2foo
foo3
01:03:03.6497,2224,0022 FOO
FOO1
2FOO
FOO3
01:04:03.6497,2224,0022 bar
1bar
bar2
3bar

Here's the same thing uncondensed into a commented script:

#!/usr/bin/env perl

## Read each input line, saving it 
## as $_. This while loop is equivalent
## to perl -ne 
while (<>) {
    ## If this line has a comma
    if (/(.+?),/) {
        ## Save everything up to the 1st 
        ## comma as $date
        $date=$1;
    }
    ## Add the current line to the %k hash.
    ## The hash's keys are the dates and the 
    ## contents are the lines.
    $k{$date}.=$_;
}

## Get the sorted list of hash keys
@dates=sort(keys(%k));
## Now that we have them sorted, 
## print each set of lines.
foreach $date (@dates) {
    print "$k{$date}";
}

Note that this assumes that all date lines and only the date lines contain a comma. If that's not the case, you can use this instead:

perl -ne '$d=$1 if /^(\d+:\d+:\d+\.\d+),/; $k{$d}.=$_; END{print $k{$_} for sort keys(%k);}' log*

The approach above needs to keep the entire contents of the files in memory. If that is a problem, here's one that doesn't:

$ perl -pe 's/\n/\0/; s/^/\n/ if /^\d+:\d+:\d+\.\d+/' log* | 
    sort -n | perl -lne 's/\0/\n/g; printf'
01:02:03.6497,2224,0022 foo
foo1
2foo
foo3    
01:03:03.6497,2224,0022 FOO
FOO1
2FOO
FOO3    
01:04:03.6497,2224,0022 bar
1bar
bar2
3bar

This one simply puts all lines between successive timestamps on to a single line by replacing newlines with \0 (if this can be in your log files, use any sequence of characters you know will never be there). This passed to sort and then tr to get the lines back.


As very correctly pointed out by the OP, all of the above solutions need to be resorted and don't take into account that the files can be merged. Here's one that does but which unlike the others will only work on two files:

$ sort -m <(perl -pe 's/\n/\0/; s/^/\n/ if /^\d+:\d+:\d+\.\d+/' log1) \
            <(perl -pe 's/\n/\0/; s/^/\n/ if /^\d+:\d+:\d+\.\d+/' log2) | 
    perl -lne 's/[\0\r]/\n/g; printf'

And if you save the perl command as an alias, you can get:

$ alias a="perl -pe 's/\n/\0/; s/^/\n/ if /^\d+:\d+:\d+\.\d+/'"
$ sort -m <(a log1) <(a log2) | perl -lne 's/[\0\r]/\n/g; printf'