I chose awk. Alternation via \| only works in a few variants of grep and sed.
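If you only need the plain filtering part, here is a small sketch of two ways to write the alternation without \|. As far as I know, both -E and repeated -e options are specified by POSIX for grep. The three-line sample is made up for the illustration.

```shell
# Made-up sample data, not taken from the question
sample='foo
bar
baz'

# Extended regular expressions: a plain | is the alternation operator
printf '%s\n' "$sample" | grep -E 'bar|baz'

# Basic regular expressions: give each alternative its own -e pattern
printf '%s\n' "$sample" | grep -e bar -e baz
```

Both commands print the lines bar and baz. Neither can list which alternative matched in front of the line, which is why the rest of this answer uses awk.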
awk -v "regex=bar|baz" '
    $0 ~ regex {                  # if there is a match
        a = $0                    # store current line in a
        printf "Match! "          # print the marker
        # as long as something still matches
        while (match(a, regex)) {
            # print the matched text
            printf "(%s)", substr(a, RSTART, RLENGTH)
            # remove the processed part from a
            a = substr(a, RSTART + RLENGTH)
        }
        printf " "                # separator
    }
    # finally print the line, regardless of whether there
    # was a match or not
    {
        print
    }
' your_file
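To see what match() does on each pass through the loop, here is a minimal sketch that prints RSTART and RLENGTH for every hit in a single string. The function name show_matches and the sample text are only there for the illustration.

```shell
# show_matches REGEX TEXT: hypothetical helper, prints one line per match
show_matches() {
    awk -v "regex=$1" -v "text=$2" 'BEGIN {
        a = text
        # match() returns 0 when nothing matches, which ends the loop
        while (match(a, regex)) {
            printf "RSTART=%d RLENGTH=%d text=%s\n", RSTART, RLENGTH, substr(a, RSTART, RLENGTH)
            # continue searching right after the current match
            a = substr(a, RSTART + RLENGTH)
        }
    }'
}

show_matches 'bar|baz' 'foo baz foo bar'
```

This prints RSTART=5 for baz and then RSTART=6 for bar. The second position is relative to the shortened string, because a is cut off after every match.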
An example run, including the script as a one-liner:
$ cat input
foo foo foo foo foo foo bar foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo
foo foo foo foo foo foo foo bar foo foo foo
foo foo foo baz foo foo bar foo foo foo foo
$ awk -v "regex=bar|baz" '$0 ~ regex { a = $0; printf "Match! "; while (match(a, regex)) { printf "(%s)", substr(a, RSTART, RLENGTH); a = substr(a, RSTART + RLENGTH) } printf " " } { print }' input
Match! (bar) foo foo foo foo foo foo bar foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo
Match! (bar) foo foo foo foo foo foo foo bar foo foo foo
Match! (baz)(bar) foo foo foo baz foo foo bar foo foo foo foo
The matches in the last line are listed in the order they appear in the input. Your example output orders them differently, so let me know how that should really be handled.
And a solution in sed, if only to show that the awk variant is easier:
$ cat input
foo foo foo foo foo foo bar foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo
foo foo foo foo foo foo foo bar foo foo foo
foo foo foo baz foo foo bar foo foo foo foo
$ sed -e '/bar/ba' -e '/baz/!b' -e :a -e 'h;s/^/|/;:b' -e 's/|\(.*\)\(bar\)/(\2)|\1/;s/|\(.*\)\(baz\)/(\2)|\1/;tb' -e 's/^/Match! /;G;s/|.*\n/ /' input
Match! (bar) foo foo foo foo foo foo bar foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo
Match! (bar) foo foo foo foo foo foo foo bar foo foo foo
Match! (bar)(baz) foo foo foo baz foo foo bar foo foo foo foo
It can be shortened a bit for sed variants that support extended regular expressions (-r in GNU sed, -E in OS X and *BSD sed):
sed -E -e '/bar|baz/!b' -e 'h;s/^/#/;:a' -e 's/#(.*)(bar|baz)/(\2)#\1/;ta' -e 's/^/Match! /;G;s/#.*\n/ /'
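The shortened command is shown without an input file, so here is a sketch of a run on two made-up lines. The wrapper name highlight_ere is hypothetical, and I am assuming a sed that accepts -E.

```shell
# highlight_ere [FILE...]: hypothetical wrapper around the shortened sed command
highlight_ere() {
    sed -E -e '/bar|baz/!b' -e 'h;s/^/#/;:a' \
        -e 's/#(.*)(bar|baz)/(\2)#\1/;ta' \
        -e 's/^/Match! /;G;s/#.*\n/ /' "$@"
}

# One line with a match, one without
printf '%s\n' 'foo foo bar foo' 'foo foo foo' | highlight_ere
```

The first line comes out as "Match! (bar) foo foo bar foo" and the second is printed unchanged. Because .* is greedy, a line that contains several of the words lists the one nearest the end of the line first.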
With commentary:
/bar/ba       # "bar" found: jump to label a
/baz/!b       # no "baz" either: end cycle, print line
:a            # label a (only reached if "bar" or "baz" was found)
h             # save the line in the hold buffer
# The | character is used as a marker. Occurrences of "|" in
# the input are fine, they don't interfere.
s/^/|/        # prepend a |
:b            # label b for a loop
# find "bar" and move it before |
s/|\(.*\)\(bar\)/(\2)|\1/
# same with baz
s/|\(.*\)\(baz\)/(\2)|\1/
tb            # jump back to b if there was a match
s/^/Match! /  # prepend "Match! "
G             # append the remembered line
s/|.*\n/ /    # remove the data we worked with between | and
              # the linefeed inserted by G, so there's only
              # the prefix and the real line left, separated
              # by one space character
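The marker trick is easier to follow when you look at a single pass in isolation. Here is a minimal sketch that runs only the "prepend a |" step and the "move bar before |" step on one made-up line. The function name move_bar is hypothetical.

```shell
# move_bar: hypothetical helper showing one pass of the marker technique
move_bar() {
    # 1. prepend the | marker
    # 2. copy "bar" in parentheses to the left of the marker and
    #    delete it from the working text on the right
    sed -e 's/^/|/' -e 's/|\(.*\)\(bar\)/(\2)|\1/'
}

printf '%s\n' 'foo baz foo bar end' | move_bar
```

The result is "(bar)|foo baz foo  end": the collected match sits to the left of the marker, and the text to the right has lost the word, so the next pass cannot find the same occurrence again. That is what makes the loop terminate.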