Use sed to encapsulate the first word of each paragraph with <i> </i>?

Using sed,

  • if there's a letter at the beginning of the line, then
  • capture the leading run of non-space characters and
  • wrap those captured characters in <i> ... </i>.

like this:

sed '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' < file > file.new

On this sample input:

Snapdragon  Plant with a two-lipped flower.

Snap-fastener  = *press-stud.

Snapper  Any of several edible marine fish.

Snappish  1 curt; ill-tempered; sharp. 2 inclined to snap.

The output is:

<i>Snapdragon</i>  Plant with a two-lipped flower.

<i>Snap-fastener</i>  = *press-stud.

<i>Snapper</i>  Any of several edible marine fish.

<i>Snappish</i>  1 curt; ill-tempered; sharp. 2 inclined to snap.

To break down the pieces of the sed command:

  • /^[a-zA-Z]/ -- this is an address filter; it means to apply the subsequent command only to lines that match this regular expression. The regular expression requires that a letter (either lower-case a-z or upper-case A-Z) must follow the beginning of the line ^.

  • s!\([^ ]*\)!<i>\1</i>! -- this is the search and replace command. It uses a delimiter between the search and the replacement; the usual delimiter is a forward slash, but since the replacement text contains a forward slash, I changed the delimiter to an exclamation point !. The search term has two pieces: the capturing parentheses, which have to be escaped, and the regular expression [^ ]*, which says: "match anything except a space, zero or more times *". Because the address has already established that the line starts with a letter, the first match is the first word. The replacement text refers back to that captured group with \1 and surrounds it with the HTML tag.
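To see the address filter at work, here is a small sketch on made-up input (the two lines and the variable name out are only for illustration). The second line starts with a digit, so /^[a-zA-Z]/ should leave it alone:

```shell
# Hypothetical two-line input: only the first line begins with a letter.
out=$(printf '%s\n' 'Snapper  Any fish.' '2 inclined to snap.' |
  sed '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!')

# Prints:
#   <i>Snapper</i>  Any fish.
#   2 inclined to snap.
printf '%s\n' "$out"
```

Without the address, the substitution would also run on the second line and wrap the leading 2.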

To additionally wrap each non-empty line with paragraph tags, add another sed expression:

sed -e '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' -e '/./ { s/^/<p>/; s!$!</p>!; }' < file

The additional expression says:

  • match lines that have at least one character (any character) -- this skips blank lines
  • { group the next two commands together
  • search and replace the beginning of line ^ with an opening paragraph tag
  • search and replace the end of line $ with a closing paragraph tag
  • } end the grouping
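As a quick check of the two expressions together, here is a sketch on a shortened version of the sample input (the variable name wrapped is just for illustration). It puts a semicolon before the closing brace, on the assumption that some non-GNU sed implementations want one there:

```shell
# Two entries separated by a blank line, taken from the sample input.
wrapped=$(printf '%s\n' \
    'Snapdragon  Plant with a two-lipped flower.' \
    '' \
    'Snapper  Any of several edible marine fish.' |
  sed -e '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' \
      -e '/./ { s/^/<p>/; s!$!</p>!; }')

# Prints:
#   <p><i>Snapdragon</i>  Plant with a two-lipped flower.</p>
#   (blank line, left untouched)
#   <p><i>Snapper</i>  Any of several edible marine fish.</p>
printf '%s\n' "$wrapped"
```

The blank line comes through unchanged because /./ needs at least one character to match.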

You can do this with sed:

$ sed '/^$/n;s#^\([^ ]*\)#<i>\1</i>#' input.txt
<i>Snapdragon</i>  Plant with a two-lipped flower.

<i>Snap-fastener</i>  = *press-stud.

<i>Snapper</i>  Any of several edible marine fish.

<i>Snappish</i>  1 curt; ill-tempered; sharp. 2 inclined to snap.

Explanation

The sed command above has two parts. The first, /^$/n, detects blank lines and skips them: n prints the current (blank) line and loads the next line of input before the substitution runs.

  • skip any blank lines /^$/n

The second part, s#..#..#, does all the heavy lifting. It matches the run of non-space characters at the start of the line, ^\([^ ]*\). This pattern is 'saved' via the \(..\) that wraps it, so we can reuse it in the replacement via \1.

  • match sub-string up to first space \([^ ]*\)
  • save match, \1, and wrap it with <i>...</i>
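To see why the blank-line check matters, here is a rough comparison on a shortened, made-up sample (the variable names are only for illustration). Since [^ ]* can match zero characters, the bare substitution also fires on empty lines:

```shell
# Shortened sample: two entries separated by one blank line.
input=$(printf '%s\n' 'Snapper  Any fish.' '' 'Snappish  curt.')

# Without the /^$/n guard, the empty line becomes <i></i>.
unguarded=$(printf '%s\n' "$input" | sed 's#^\([^ ]*\)#<i>\1</i>#')

# With the guard, the blank line is printed as-is and the next line is processed.
guarded=$(printf '%s\n' "$input" | sed '/^$/n;s#^\([^ ]*\)#<i>\1</i>#')

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
```

Note that n only steps over one line, so with two blank lines in a row the second one would still reach the substitution.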