extract from line to line and then save to separate file

sed -n '2762818,2853648p' /var/log/logfile > /var/log/output.txt

The -n option suppresses sed's default output, and the p command prints only the lines in the given range.
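On a large file it may help to tell sed to quit once the last wanted line has been printed, since otherwise it keeps reading to the end of the file. A minimal sketch on a small sample, where the file name /tmp/sed_demo and the line numbers 5 and 8 are only illustrative:

```shell
# Build a 20-line sample in which each line holds its own number.
seq 20 >/tmp/sed_demo

# Print lines 5 through 8, then quit at line 8 instead of scanning the rest.
sed -n '5,8p;8q' /tmp/sed_demo >/tmp/sed_demo.out

cat /tmp/sed_demo.out
```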

Probably the best way to do this is with shell redirection, as others have mentioned. sed, though a personal favorite, is probably not going to do this as efficiently as head, which is designed to grab only a set number of lines from a file.

Other answers on this site show that, for large files, head -n[num] | tail -n[num] tends to outperform sed, but it is probably faster still to eschew the pipe altogether.
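For comparison, the piped form mentioned above might look like the following sketch. The file name /tmp/pipe_demo and the line numbers are assumptions for illustration only.

```shell
seq 20 >/tmp/pipe_demo
first=5 last=8

# head stops after line $last; tail keeps only the final (last - first + 1)
# lines of what head passed along.
head -n "$last" /tmp/pipe_demo |
tail -n "$((last - first + 1))" >/tmp/pipe_demo.out

cat /tmp/pipe_demo.out
```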

I created a test file in which each line holds its own line number:

seq 5000000 >/tmp/5mil_lines

And I ran it through:

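{ head -n "$((ignore=2762817))" >&2   # skip the first 2762817 lines (stderr is discarded below)
  head -n "$((2853648-ignore))"       # print the next 90831 lines
} </tmp/5mil_lines 2>/dev/null  |
sed -n '1p;$p'                        # show only the first and last line of the result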

I used sed there only to print the first and last lines of the result, to show you...

2762818
2853648

This works because when you group commands with { ... ; } and redirect the group's input, as in ... ; } <input, all of the commands share the same input. Most commands read their input through to the end, so in a { cmd1 ; cmd2; } <infile case cmd1 usually reads the infile from head to tail and cmd2 is left with nothing.
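A quick way to see this sharing in action is to let the first command in a group read everything. In this sketch (the file name /tmp/group_demo is illustrative) cat consumes the whole input, so wc has nothing left to count:

```shell
seq 10 >/tmp/group_demo

# Both commands inherit the group's stdin; cat reads it to the end first.
left=$({ cat >/dev/null; wc -l; } </tmp/group_demo)

# Strip the padding some wc implementations add, then report the count.
left=$((left))
echo "lines left for the second command: $left"
```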

head, however, consumes only as much of its infile as it is instructed to (when the infile is a regular, seekable file, it leaves the read offset just past the last line it printed), and so in a...

{ head -n [num] >/dev/null
  head -n [num]
} <infile

...case the first reads through [num] lines and dumps its output to /dev/null, and the second is left to begin its read where the first left off.

So to write the range straight to a file, you can do...

{ head -n "$((ignore=2762817))" >/dev/null
  head -n "$((2853648-ignore))" >/path/to/outfile
} <infile
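Wrapped up as a function, the same idea might look like this. The name extract_lines and its argument order are hypothetical, and the sketch assumes the input is a regular file and that head leaves the file offset just past the last line it printed, as GNU head does:

```shell
# extract_lines FIRST LAST INFILE OUTFILE (hypothetical helper)
# Copies lines FIRST through LAST of INFILE into OUTFILE.
extract_lines() {
    first=$1 last=$2 infile=$3 outfile=$4
    {
        # Discard everything before FIRST (a count of 0 skips nothing).
        head -n "$((first - 1))" >/dev/null
        # The second head resumes where the first one stopped.
        head -n "$((last - first + 1))" >"$outfile"
    } <"$infile"
}

# Usage on a small sample file:
seq 20 >/tmp/extract_demo
extract_lines 5 8 /tmp/extract_demo /tmp/extract_demo.out
cat /tmp/extract_demo.out
```

This will not work when the input is a pipe, because the first head may then read more than it prints and there is no way to hand the excess back.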

This construct also works with other kinds of compound commands. For example:

set "$((n=2762817))" "$((2853648-n))"      # $1: lines to skip, $2: lines to keep
for n do head "-n$n" >&"$#"; shift         # first pass writes to fd 2, second to fd 1
done </tmp/5mil_lines 2>/dev/null |
sed -n '1p;$p'

...which prints...

2762818
2853648
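The redirection target >&"$#" works because $# is the count of remaining positional parameters, which drops by one with every shift. A small sketch with placeholder arguments (skip and keep are just labels) makes the sequence visible:

```shell
set -- skip keep
for arg do
    # $# is 2 on the first pass and 1 on the second.
    echo "fd $# <- $arg"
    shift
done >/tmp/argc_demo.out

cat /tmp/argc_demo.out
```

So the first head writes to file descriptor 2, which is discarded, and the second writes to file descriptor 1, which feeds the pipe.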

The same approach can also split a file into pieces, like:

d=$(((  n=$(wc -l </tmp/5mil_lines))/43 ))      &&
until   [ "$(((n-=d)>=(!(s=143-n/d))))" -eq 0 ] &&
        head "-n$d" >>"/tmp/${s#1}.split"
do      head "-n$d" > "/tmp/${s#1}.split"       || ! break
done    </tmp/5mil_lines

Above, the shell initially sets the $n and $d variables to...

$n
- The line count as reported by wc for my test file, /tmp/5mil_lines.
$d
- The quotient of $n/43 (here 116279), where 43 is just an arbitrarily selected divisor.

It then loops, decrementing $n by $d on each pass, until $n drops below zero. While doing so it saves its split count in $s (offset by 100, so that stripping the leading 1 with ${s#1} leaves a zero-padded two-digit number) and uses that value to name the output file /tmp/[num].split. The result is that each iteration reads an equal number of newline-delimited lines from the infile into a new outfile, splitting it 43 ways over the course of the loop; the few lines left over by the integer division are appended to the last outfile by the head in the until condition. It manages this without reading its infile more than twice: once when wc counts its lines, and after that it only reads as many lines as it writes to each outfile.
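Since the loop above is dense, here is a more explicit sketch of the same splitting idea. The function name split_equal, its arguments, and the output naming are hypothetical; like the original it assumes a regular input file and a head that stops reading exactly after the lines it prints. Unlike the original, it hands the leftover lines to the last piece with cat rather than a second head.

```shell
# split_equal INFILE PARTS PREFIX (hypothetical helper)
split_equal() {
    infile=$1 parts=$2 prefix=$3
    total=$(wc -l <"$infile")        # first pass: count the lines
    per=$((total / parts))           # lines per piece, rounded down
    i=1
    while [ "$i" -le "$parts" ]; do
        out=$(printf '%s%02d.split' "$prefix" "$i")
        if [ "$i" -lt "$parts" ]; then
            head -n "$per" >"$out"   # next $per lines
        else
            cat >"$out"              # last piece takes whatever remains
        fi
        i=$((i + 1))
    done <"$infile"                  # second pass: one shared input
}

# Usage on a 10-line sample split three ways:
seq 10 >/tmp/split_demo
split_equal /tmp/split_demo 3 /tmp/split_demo.
tail -n 1 /tmp/split_demo.0?.split
```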

After running it I checked my results like...

tail -n1 /tmp/*split | grep .

OUTPUT:

==> /tmp/01.split <==
116279  
==> /tmp/02.split <==
232558  
==> /tmp/03.split <==
348837  
==> /tmp/04.split <==
465116  
==> /tmp/05.split <==
581395  
==> /tmp/06.split <==
697674  
==> /tmp/07.split <==
813953  
==> /tmp/08.split <==
930232  
==> /tmp/09.split <==
1046511 
==> /tmp/10.split <==
1162790 
==> /tmp/11.split <==
1279069 
==> /tmp/12.split <==
1395348 
==> /tmp/13.split <==
1511627 
==> /tmp/14.split <==
1627906 
==> /tmp/15.split <==
1744185 
==> /tmp/16.split <==
1860464 
==> /tmp/17.split <==
1976743 
==> /tmp/18.split <==
2093022 
==> /tmp/19.split <==
2209301 
==> /tmp/20.split <==
2325580 
==> /tmp/21.split <==
2441859 
==> /tmp/22.split <==
2558138 
==> /tmp/23.split <==
2674417 
==> /tmp/24.split <==
2790696 
==> /tmp/25.split <==
2906975 
==> /tmp/26.split <==
3023254 
==> /tmp/27.split <==
3139533 
==> /tmp/28.split <==
3255812 
==> /tmp/29.split <==
3372091 
==> /tmp/30.split <==
3488370 
==> /tmp/31.split <==
3604649 
==> /tmp/32.split <==
3720928 
==> /tmp/33.split <==
3837207 
==> /tmp/34.split <==
3953486 
==> /tmp/35.split <==
4069765 
==> /tmp/36.split <==
4186044 
==> /tmp/37.split <==
4302323 
==> /tmp/38.split <==
4418602 
==> /tmp/39.split <==
4534881 
==> /tmp/40.split <==
4651160 
==> /tmp/41.split <==
4767439 
==> /tmp/42.split <==
4883718 
==> /tmp/43.split <==
5000000
