Sort a list of domain names (FQDNs) starting from the TLD and working left

Solution 1:

This simple Python script will do what you want. In this example, I name the file domain-sort.py:

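#!/usr/bin/env python3
from fileinput import input
# Split each line into labels, reverse them (TLD first), sort the label lists,
# then print each name with its labels back in the original order.
for y in sorted([x.strip().split('.')[::-1] for x in input()]): print('.'.join(y[::-1]))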

To run it use:

cat file.txt | ./domain-sort.py

Note that this looks a little uglier because I wrote it as more or less a one-liner. I had to use the slice notation [::-1], where the negative step makes a reversed copy of the list, instead of the more declarative reverse(), which reverses the list in place and returns None, so it can't be composed inside an expression.
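A quick way to see the difference between the two kinds of reversal is to run them side by side. This is a minimal illustration with a made-up name, not part of the original script:

```python
# Split a sample name into its dot-separated labels.
labels = "www.example.com".split(".")

# Slice notation with a negative step builds a new, reversed list and leaves
# the original untouched, so it can be used inside an expression.
copy = labels[::-1]
print(copy)    # ['com', 'example', 'www']
print(labels)  # ['www', 'example', 'com']

# list.reverse() mutates the list in place and returns None, so it cannot be
# chained inside a comprehension or passed straight to join().
result = labels.reverse()
print(result)  # None
print(labels)  # ['com', 'example', 'www']
```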

And here's a slightly longer, but maybe more readable, version that uses reversed(). It returns an iterator, hence the need to also wrap it in list() to consume the iterator and produce a list:

#!/usr/bin/env python3
from fileinput import input
# reversed() yields an iterator; list() turns it into a list that sorted() can compare
for y in sorted([list(reversed(x.strip().split('.'))) for x in input()]): print('.'.join(list(reversed(y))))
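If you don't need a one-liner, sorted() can do the reverse-and-restore work for you through its key parameter: each name is reversed only to build its comparison key and is returned unchanged. This is a sketch of that alternative, not the code above; domain_key and sort_domains are hypothetical names, and the sample list stands in for fileinput.input():

```python
def domain_key(name):
    """Sort key: the labels of a domain name, TLD first."""
    return name.split(".")[::-1]


def sort_domains(names):
    """Return the stripped names ordered by their reversed labels."""
    return sorted((n.strip() for n in names), key=domain_key)


domains = ["www.example.com", "example.org", "mail.example.com", "example.com"]
for name in sort_domains(domains):
    print(name)
# example.com
# mail.example.com
# www.example.com
# example.org
```

To use it as a filter like the scripts above, pass fileinput.input() instead of the sample list; its lines keep their trailing newline, which sort_domains strips.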

On a file with 1,500 randomly sorted lines it takes ~0.02 seconds:

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Maximum resident set size (kbytes): 21632

On a file with 150,000 randomly sorted lines it takes a little over 3 seconds:

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.20
Maximum resident set size (kbytes): 180128

Here is an arguably more readable version that does the reverse() and sort() in place. It runs in about the same amount of time and actually takes slightly more memory.

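#!/usr/bin/env python3
from fileinput import input

data = []
for x in input():
    d = x.strip().split('.')
    d.reverse()          # reverse the labels in place so the TLD comes first
    data.append(d)
data.sort()              # sort the list of label lists in place
for y in data:
    y.reverse()          # restore the original label order
    print('.'.join(y))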
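The reason sorting the reversed label lists gives the hierarchy you want is that Python compares lists element by element, and a list that is a prefix of another sorts first. A small illustration with made-up labels:

```python
# Reversed label lists for example.com, www.example.com and examples.com.
parent = ["com", "example"]
child = ["com", "example", "www"]
sibling = ["com", "examples"]

print(parent < child)    # True: parent is a prefix of child
print(child < sibling)   # True: "example" < "examples" decides it
print(sorted([sibling, child, parent]))
# [['com', 'example'], ['com', 'example', 'www'], ['com', 'examples']]
```

So a domain is immediately followed by its own subdomains before the next sibling domain.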

On a file with 1,500 randomly sorted lines it takes ~0.02 seconds:

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Maximum resident set size (kbytes): 22096

On a file with 150,000 randomly sorted lines it takes a little over 3 seconds:

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.08
Maximum resident set size (kbytes): 219152

Solution 2:

Here's a PowerShell script that should do what you want. Basically, it reads all the domain names (called TLDs in the script) into an array, splits each one on periods and reverses the parts, sorts the result, reverses each one back to its original order, and then saves it to another file.

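# Read one domain name per line
$TLDs = Get-Content .\TLDsToSort-In.txt
$TLDStrings = @();

foreach ($TLD in $TLDs){
    # Split into labels and reverse them so the TLD comes first
    $split = $TLD.split(".")
    [array]::Reverse($split)
    # The unary comma adds the array as a single element instead of flattening it
    $TLDStrings += ,$split
}

# Sort by the reversed labels joined into one string, since Sort-Object
# does not compare arrays element by element
$TLDStrings = $TLDStrings | Sort-Object { [string]::Join('.', $_) }

# Restore each name's original label order
foreach ($TLD in $TLDStrings){[array]::Reverse($TLD)}

$TLDStrings | %{[string]::join('.', $_)} | Out-File .\TLDsToSort-Out.txt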

I ran it on 1,500 records; it took 5 seconds on a reasonably powerful desktop.
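For comparison, here is a rough Python translation of the same read, reverse, sort, restore, write flow. It's a sketch: sort_domain_file is a hypothetical name, the file paths are parameters, and unlike the PowerShell version it skips blank lines.

```python
from pathlib import Path


def sort_domain_file(in_path, out_path):
    """Read one domain name per line, sort by reversed labels, write the result."""
    lines = Path(in_path).read_text().splitlines()
    # Unlike the PowerShell script, blank lines are dropped here.
    names = [line.strip() for line in lines if line.strip()]
    # Reverse each name's labels so the TLD comes first, then sort the label lists.
    keyed = sorted(name.split(".")[::-1] for name in names)
    # Put the labels back in their original order and write one name per line.
    Path(out_path).write_text("".join(".".join(k[::-1]) + "\n" for k in keyed))
```

Called as sort_domain_file("TLDsToSort-In.txt", "TLDsToSort-Out.txt"), it reads and writes the same files as the script above.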


Solution 3:

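cat domain.txt | rev | sort | rev

Note that rev reverses characters rather than labels, so the result is not a label-by-label sort: for example, ba.com sorts before ab.com. sort also collates according to the current locale, which may ignore punctuation such as dots; LC_ALL=C sort compares plain bytes.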
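To see how character reversal differs from label reversal, here is a small Python sketch of both orderings. The function names are hypothetical, and Python compares strings by code point, which corresponds to LC_ALL=C sort rather than a locale-aware sort:

```python
def char_reversed_sort(names):
    """Mimic `rev | sort | rev`: sort on the names spelled backwards."""
    return [s[::-1] for s in sorted(n[::-1] for n in names)]


def label_reversed_sort(names):
    """Sort on the list of labels, TLD first."""
    return sorted(names, key=lambda n: n.split(".")[::-1])


names = ["ab.com", "ba.com"]
print(char_reversed_sort(names))   # ['ba.com', 'ab.com']
print(label_reversed_sort(names))  # ['ab.com', 'ba.com']
```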


Solution 4:

Slightly less cryptic, or at least prettier, Perl:

use warnings;
use strict;

my @lines = <>;
chomp @lines;

@lines =
    map { join ".", reverse split /\./ }
    sort
    map { join ".", reverse split /\./ }
    @lines;

print "$_\n" for @lines;

This is a simple example of a Guttman–Rosler transform: we convert the lines into the appropriate sortable form (here, split the domain name on periods and reverse the order of the parts), sort them using the native lexicographic sort and then convert the lines back to their original form.
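The same transform, sort, untransform idea in Python, as a sketch mirroring the Perl above (flip and transform_sort are hypothetical names). One consequence of sorting the joined strings rather than lists of labels is that a hyphen, which sorts before a dot, can change the order:

```python
def flip(name):
    """Reverse the order of the dot-separated labels: a.b.c -> c.b.a."""
    return ".".join(reversed(name.split(".")))


def transform_sort(names):
    """Transform to a sortable string, sort natively, transform back."""
    return [flip(s) for s in sorted(flip(n) for n in names)]


names = ["x.a.com", "a-b.com", "example.com", "www.example.com"]
print(transform_sort(names))
# ['a-b.com', 'x.a.com', 'example.com', 'www.example.com']
```

Here a-b.com comes before x.a.com because "com.a-b" < "com.a.x". Comparing label lists instead, as the Python scripts in Solution 1 do, would put x.a.com first, because the label "a" sorts before "a-b".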


Solution 5:

In a Unix shell script: reverse the labels, sort, and reverse them back:

awk -F "." '{for(i=NF; i > 1; i--) printf "%s.", $i; print $1}' file |
  sort |
  awk -F "." '{for(i=NF; i > 1; i--) printf "%s.", $i; print $1}'
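The same awk program appears on both sides of sort because reversing the order of the labels is its own inverse: doing it twice gives back the original name. A Python sketch of the same steps (awk_reverse and reverse_sort_reverse are hypothetical names):

```python
def awk_reverse(line):
    """Like the awk program: fields $NF..$2 each followed by a dot, then $1."""
    fields = line.split(".")
    out = "".join(f + "." for f in reversed(fields[1:]))
    return out + fields[0]


def reverse_sort_reverse(lines):
    """reverse | sort | reverse, as in the shell pipeline."""
    return [awk_reverse(s) for s in sorted(awk_reverse(l) for l in lines)]


name = "www.example.com"
print(awk_reverse(name))               # com.example.www
print(awk_reverse(awk_reverse(name)))  # www.example.com
```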