How do I concatenate all the files in a given directory in order of date, where I want the newest file on top?

To concatenate files you use

cat file1 file2 file3 ...

To get a list of filenames sorted by modification time, newest first, you use

ls -t

Putting it all together,

cat $(ls -t) > outputfile
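To check the ordering, here is a small sketch that builds a scratch directory with three files whose modification times are set with touch -t. The filenames, contents and dates are made up. The output file is written outside the directory: if it already existed inside it, ls would list it too and cat would end up reading its own output.

```shell
#!/bin/sh
# Scratch directory with three files of known (hypothetical) ages.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'old\n'    > a.txt
printf 'middle\n' > b.txt
printf 'new\n'    > c.txt

# touch -t takes [[CC]YY]MMDDhhmm[.ss]; the newest file gets the latest stamp.
touch -t 202001010000 a.txt
touch -t 202101010000 b.txt
touch -t 202201010000 c.txt

# Newest first; the result lives outside the directory being listed.
cat $(ls -t) > "$dir.out"
cat "$dir.out"
```

This should print new, middle and old, in that order.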

You might want to pass some arguments to ls (e.g., *.html) to limit which files are included.

But if you have filenames with spaces in them, this will not work: My file.html will be treated as two filenames, My and file.html. You can make ls quote the filenames (-Q, a GNU ls option) and then use xargs, which understands the quoting, to pass the arguments to cat. That handles spaces, but names containing double quotes, backslashes or newlines can still break it.

ls -tQ | xargs cat
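A sketch contrasting the two commands on a name with a space in it. It assumes GNU ls (for -Q); the filenames and contents are hypothetical.

```shell
#!/bin/sh
# Sketch assuming GNU ls (for -Q); filenames and contents are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'first\n'  > 'My file.html'
printf 'second\n' > other.html
touch -t 202001010000 'My file.html'
touch -t 202201010000 other.html

# Unquoted: the shell splits "My file.html" into "My" and "file.html",
# so cat prints other.html and then fails on the two bogus names.
cat $(ls -t) > "$dir.split" 2>/dev/null || true

# ls -Q wraps each name in double quotes; xargs strips them again.
ls -tQ | xargs cat > "$dir.out"
cat "$dir.out"
```

The first command only manages to copy other.html; the second prints second, then first.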

As for your second question, filtering out parts of files isn't difficult, but it depends on what exactly you want to strip out. What are the “redundant headers”?


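The easiest way of listing files in an order other than lexicographic is with zsh glob qualifiers; the qualifier om sorts by modification time, newest first. Without zsh, you can use ls, but parsing the output of ls is fraught with dangers, because filenames can contain spaces, quotes and even newlines.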

cat *(om)
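Without zsh, one alternative that avoids parsing ls is to let find print each file's modification time followed by its NUL-terminated name, sort on the time, and hand the names to cat with xargs -0. This technique is not from the answer. The sketch below assumes GNU find (-printf), GNU sort (-z) and GNU sed (-z); the test files are hypothetical and deliberately have awkward names.

```shell
#!/bin/sh
# Sketch assuming GNU find (-printf), GNU sort (-z) and GNU sed (-z).
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical files with awkward names.
printf 'oldest\n' > 'a "quoted" name.html'
printf 'middle\n' > 'name with spaces.html'
printf 'newest\n' > plain.html
touch -t 202001010000 'a "quoted" name.html'
touch -t 202101010000 'name with spaces.html'
touch -t 202201010000 plain.html

out="$dir.out"   # outside the directory, so it is never picked up itself

# Each record is "<mtime in epoch seconds> <path>" terminated by NUL:
# sort on the number (newest first), strip the number, hand names to cat.
find . -maxdepth 1 -type f -name '*.html' -printf '%T@ %p\0' |
  LC_ALL=C sort -z -r -n |
  sed -z 's/^[^ ]* //' |
  xargs -0 cat -- > "$out"
cat "$out"
```

Because every name travels NUL-terminated from find to xargs, quotes, spaces and newlines in filenames never get reinterpreted along the way.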

If you want to strip some lines, use sed, awk or perl. For example, to keep the <head> (and body) of the newest file and append the <body> contents of the other files, assuming that the <body> and </body> tags are alone on a line in every file:

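{
  sed -e '/<\/body>/,$ d' *.html(om[2])
  for f in *.html(om[3,-1]); do
    sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"
  done
  echo '</body>'
  echo '</html>'
} >concatenated.html

(The loop matters: given several files at once, sed reads them as one continuous stream, so the addresses 1 and $ would refer only to the first and last file. Deleting from </body> onward, rather than quitting there with q, also keeps the newest file's </body> out of the output, since q prints the current line before quitting.)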

Explanation:

  • First, the redirection creates concatenated.html before any command inside the braces runs. It is therefore the youngest *.html file (assuming no file has a date in the future).
  • Then copy from the second-youngest *.html file, but quit at the </body> line.
  • Then copy from the other files, but skip everything up to and including the <body> line, and everything from the </body> line onward.
  • Finally, output the closing </body> and </html> tags.
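The zsh glob qualifiers are only needed to pick the file order; sed itself is portable. As a sketch for a plain POSIX shell, the hypothetical function merge_html below takes the files already in the desired order (newest first) and processes them as the explanation describes, running one sed per file. Like the answer, it assumes <body> and </body> sit alone on their own lines. The order could come from ls -t, with the caveats about unusual filenames mentioned earlier.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): merge_html FILE...
# The files must be given newest first. Keeps everything before </body>
# from the first file, then only the <body> contents of each later file.
merge_html() {
  first=$1
  shift
  sed -e '/<\/body>/,$ d' "$first"
  for f in "$@"; do
    # One sed per file, so addresses 1 and $ refer to that file alone.
    sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"
  done
  echo '</body>'
  echo '</html>'
}

# Example with two made-up pages; page2.html is treated as the newest.
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 1 2; do
  printf '<html>\n<head><title>%s</title></head>\n<body>\n<p>page %s</p>\n</body>\n</html>\n' \
    "$n" "$n" > "page$n.html"
done
merge_html page2.html page1.html > "$dir.merged"
cat "$dir.merged"
```

The result is a single document with page 2's head, both paragraphs (page 2 first) and exactly one closing </body> and </html>.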