How to understand pipes

On your performance question: pipes are generally more efficient than temporary files because no disk I/O is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile. (This might not hold if tmpfile lives on a RAM disk or another memory-backed device, or if it is a named pipe; but if it is a named pipe, cmd1 should be run in the background, since it blocks until a reader opens the pipe and again whenever the pipe fills up.) If you need to keep the result of cmd1 and still send its output to cmd2, use cmd1 | tee tmpfile | cmd2, which lets cmd1 and cmd2 run in parallel and spares cmd2 from reading the data back from disk.
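As a rough sketch of the tee variant (printf and wc -l are only stand-ins for cmd1 and cmd2, and the temporary file comes from mktemp; none of these names are from the question):

```shell
# Hypothetical stand-ins: printf plays cmd1, wc -l plays cmd2.
tmpfile=$(mktemp)

# tee writes a copy of its input to tmpfile while passing it on to cmd2.
# tr -d ' ' strips the padding some wc implementations add.
count=$(printf 'a\nb\nc\n' | tee "$tmpfile" | wc -l | tr -d ' ')
saved=$(cat "$tmpfile")
rm -f "$tmpfile"

echo "cmd2 counted $count lines"
echo "tmpfile kept:"
echo "$saved"
```

cmd2 gets its data straight from the pipe, and the copy on disk is only written, never read back, during the pipeline.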

Named pipes are useful when several processes read from or write to the same pipe. They are also useful when a program is not designed to use stdin/stdout for its I/O and insists on using files. I say "files" loosely because, from a storage point of view, named pipes are not exactly files: the data resides in memory and passes through a buffer of limited size, even though the pipe has a filesystem entry (for reference purposes). Other things in UNIX have filesystem entries without being regular files: just think of /dev/null or other entries in /dev or /proc.
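A minimal sketch of a named pipe in use, assuming mkfifo is available (the directory and the name demo.fifo are made up for the example):

```shell
# Hypothetical location for the example FIFO.
dir=$(mktemp -d)
fifo="$dir/demo.fifo"
mkfifo "$fifo"

# The writer runs in the background: opening a FIFO for writing blocks
# until some process opens it for reading.
printf 'hello\nworld\n' > "$fifo" &

# The reader refers to the FIFO by name, as it would to an ordinary file.
received=$(tr 'a-z' 'A-Z' < "$fifo")
wait

echo "$received"
rm -rf "$dir"
```

Nothing is stored on disk here; the filesystem entry only gives the two processes a name to meet at, and it has to be removed explicitly afterwards.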

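Because pipes (named and unnamed) have a limited buffer size, read and write operations on them can block: a writer sleeps while the buffer is full, and a reader sleeps while it is empty. And when do you receive an EOF when reading from a memory buffer? A read returns EOF once the buffer is empty and every write end of the pipe has been closed. The rules for this behavior are well defined and can be found in the man pages (on Linux, pipe(7)).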
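The reader side of this can be observed with a small experiment (a sketch only; the one-second sleep is an arbitrary choice to make the pause visible):

```shell
# The writer emits one line, pauses, then emits another and exits.
# cat blocks on the empty pipe during the pause; it sees EOF only
# after the writer has exited and thereby closed the write end.
out=$( { echo first; sleep 1; echo second; } | cat )

echo "$out"
```

cat does not give up when the pipe is momentarily empty, which is why both lines arrive.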

One thing you cannot do with pipes (named or unnamed) is seek back in the data. This is understandable, since they are implemented as a memory buffer: once data has been read, it is gone from the buffer.

As for "everything in Linux/Unix is a file", I do not agree. Named pipes have filesystem entries but are not exactly files. Unnamed pipes do not have filesystem entries (except maybe in /proc). However, most I/O operations on UNIX are done with the read/write functions, which need a file descriptor, and that includes unnamed pipes (and sockets). I do not think we can say that "everything in Linux/Unix is a file", but we can surely say that "most I/O in Linux/Unix is done using a file descriptor".


Two fundamentals of the UNIX philosophy are:

  1. Make small programs that do one thing well.
  2. Expect the output of every program to become the input to another, as yet unknown, program.

The use of pipes lets you leverage these two design principles to create extremely powerful chains of commands that achieve your desired result.

Most command-line programs that operate on files can also accept input on standard input (by default, the keyboard) and write output to standard output (by default, the screen).

Some commands read only from standard input and cannot be given file names directly, so they are typically used in a pipeline. One example is the tr command:

    ls -C | tr 'a-z' 'A-Z'

In general, cmd1 | cmd2:

  • Sends STDOUT of cmd1 to STDIN of cmd2 instead of the screen.

  • Does not forward STDERR across the pipe.
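To see the two points above in action, here is a small sketch (emit is a made-up helper function, and the temporary file exists only so that stderr can be inspected afterwards):

```shell
# Hypothetical helper that writes one line to each stream.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

errfile=$(mktemp)

# Only stdout goes through the pipe to tr; stderr bypasses it and is
# captured in a file here so that it can be inspected afterwards.
piped=$(emit 2>"$errfile" | tr 'a-z' 'A-Z')
bypassed=$(cat "$errfile")
rm -f "$errfile"

# 2>&1 points stderr at the pipe as well, so tr now sees both lines.
both=$(emit 2>&1 | tr 'a-z' 'A-Z')

echo "piped:    $piped"
echo "bypassed: $bypassed"
echo "both:     $both"
```

The stderr line stays lowercase in the first case because tr never sees it. If you do want errors to travel down the pipeline, 2>&1 before the | is the usual way to ask for that.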

In short, the pipe character (|) connects commands.

Any command that writes to STDOUT can be used on the left-hand side of a pipe.

    ls /etc | less

Any command that reads from STDIN can be used on the right-hand side of a pipe.

    echo "test print" | lpr

A traditional pipe is "unnamed" because it exists anonymously and persists only for as long as the process is running. A named pipe is system-persistent and exists beyond the life of the process and must be deleted once it is no longer being used. Processes generally attach to the named pipe (usually appearing as a file) to perform inter-process communication (IPC).

Source: http://en.wikipedia.org/wiki/Named_pipe


To supplement the other answers...

stdin and stdout are file descriptors and are read from and written to as if they were files. Therefore, when you run echo hi | grep hi, the shell replaces echo's stdout with the write end of a pipe and grep's stdin with the read end of that same pipe.
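You can check this from inside a pipeline. The sketch below assumes the system provides /dev/stdin (Linux and most modern Unix systems do); stdin_kind is a made-up helper name:

```shell
# Hypothetical helper: report what stdin (file descriptor 0) is attached to.
stdin_kind() {
  if [ -p /dev/stdin ]; then
    echo pipe        # fd 0 is the read end of a pipe (or a FIFO)
  elif [ -t 0 ]; then
    echo terminal    # fd 0 is an interactive terminal
  else
    echo other       # a regular file, a device such as /dev/null, ...
  fi
}

# On the right-hand side of |, stdin has been replaced by a pipe.
piped=$(echo hi | stdin_kind)

# With a redirection, the same descriptor points at /dev/null instead.
from_null=$(stdin_kind < /dev/null)

echo "after a pipe:    $piped"
echo "after </dev/null: $from_null"
```

The program itself does not change between the two calls; only what file descriptor 0 refers to does.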