How to use regex with AWK for string replacement?

Try this (gawk is needed).

awk '{a=gensub(/.*#([0-9]+)(\").*/,"\\1","g",$0);if(a~/[0-9]+/) {gsub(/[0-9]+\"/,a+11"\"",$0);}print $0}' YourFile

Test with your example:

kent$  echo '(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 2" "#2")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)
'|awk '{a=gensub(/.*#([0-9]+)(\").*/,"\\1","g",$0);if(a~/[0-9]+/) {gsub(/[0-9]+\"/,a+11"\"",$0);}print $0}'   
(bookmarks
("Chapter 1 Introduction 12" "#12"
("1.1 Problem Statement and Basic Definitions 13" "#13")
("Exercises 41" "#41")
("Notes and References 45" "#45"))
)

Note that this command won't work if the two numbers on a line (e.g. 1" and "#1") are different, or if a line contains more numbers matching this pattern (e.g. 23" ... 32" ... "#123").


UPDATE

Since @Tim (the OP) said that the two numbers followed by " on the same line could be different, I made some changes to my previous solution so that it works with your new example.

BTW, from the example it looks like a table of contents, so I don't see how the two numbers could be different. The first would be the printed page number, and the second (with #) would be the page index. Am I right?

Anyway, you know your requirements best. Here is the new solution, still with gawk (I broke the command into several lines to make it easier to read):

awk 'BEGIN{FS=OFS="\" \"#"}{if(NF<2){print;next;}
        a=gensub(/.* ([0-9]+)$/,"\\1","g",$1);
        b=gensub(/([0-9]+)\"/,"\\1","g",$2); 
        gsub(/[0-9]+$/,a+11,$1);
        gsub(/^[0-9]+/,b+11,$2);
        print $1,$2
}' yourFile

Test with your new example:

kent$  echo '(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 23" "#2")
("Exercises 31" "#30")
("Notes and References 42" "#34"))
)
'|awk 'BEGIN{FS=OFS="\" \"#"}{if(NF<2){print;next;}
        a=gensub(/.* ([0-9]+)$/,"\\1","g",$1);
        b=gensub(/([0-9]+)\"/,"\\1","g",$2); 
        gsub(/[0-9]+$/,a+11,$1);
        gsub(/^[0-9]+/,b+11,$2);
        print $1,$2
}'                        
(bookmarks
("Chapter 1 Introduction 12" "#12"
("1.1 Problem Statement and Basic Definitions 34" "#13")
("Exercises 42" "#41")
("Notes and References 53" "#45"))
)


EDIT2, based on @Tim's comment

(1) Does FS=OFS="\" \"#" mean the separator of field in both input and output is double quote, space, double quote and #? Why specify double quote twice?

You are right: that is the field separator for both input and output. It defines the separator as:

" "#

There are two double quotes, because it is easier to catch the two numbers you want (based on your example input).
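To see what that separator does, here is a small sketch that prints the fields awk sees for one line of the sample input. The function name show_fields is made up for this illustration. A multi-character FS is treated as a regular expression, but " "# contains no special characters, so it matches literally.

```shell
# show_fields is a hypothetical helper name, used only for this illustration.
show_fields() {
  awk 'BEGIN { FS = "\" \"#" }
       { print "NF=" NF
         for (i = 1; i <= NF; i++) print "$" i " = " $i }'
}

printf '%s\n' '("Exercises 31" "#30")' | show_fields
# NF=2
# $1 = ("Exercises 31
# $2 = 30")
```

The first number ends up at the end of $1 and the second at the start of $2, which is why the two gsub() calls can anchor on $ and ^. Lines without the separator have NF=1 and are printed unchanged.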

(2) In /.* ([0-9]+)$/, does $ mean the end of the string?

Exactly!
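A quick way to check the effect of the anchor, as a sketch only: the helper below (last_number is a made-up name) uses plain match() instead of gensub(), so it should also run in a non-GNU awk. Because of the trailing $, only a number at the very end of the string is picked up, and the earlier "1.1" is ignored.

```shell
# last_number is a hypothetical helper name for this illustration.
last_number() {
  awk '{
    if (match($0, / [0-9]+$/)) {
      print substr($0, RSTART + 1)   # skip the leading space
    } else {
      print "no match"
    }
  }'
}

printf '%s\n' '("1.1 Problem Statement and Basic Definitions 23' | last_number
# 23
```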

(3) In the third argument of gensub(), what is the difference between "g" and "G"?

There is no difference between G and g. Check this out:

gensub(regexp, replacement, how [, target])
    Search the target string target for matches of the regular expression regexp. 
    If "how" is a string beginning with ‘g’ or ‘G’ (short for “global”), then 
        replace all matches of regexp with replacement.

This is from http://www.gnu.org/s/gawk/manual/html_node/String-Functions.html. Read it for the detailed usage of gensub.
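A minimal sketch to confirm this, assuming gawk is installed (gensub() is a GNU extension); the function name gensub_demo and the sample string are made up for the illustration. It also shows the other form of "how", a number that selects which match to replace.

```shell
# Requires GNU awk, since gensub() is a gawk extension.
# gensub_demo is a hypothetical helper name.
gensub_demo() {
  gawk 'BEGIN {
    s = "a1 a2 a3"
    print gensub(/a/, "b", "g", s)   # replace all matches
    print gensub(/a/, "b", "G", s)   # same result as "g"
    print gensub(/a/, "b", 2, s)     # replace only the second match
  }'
}
```

Calling gensub_demo prints "b1 b2 b3" twice, then "a1 b2 a3".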


Unlike just about every tool that provides regexp substitutions, awk does not allow backreferences such as \1 in replacement text. GNU Awk gives access to matched groups if you use the match function, but not with ~ or sub or gsub.

Note also that even if \1 were supported, your snippet would append the string +11, not perform a numerical computation. Also, your regexp isn't quite right: you're matching things like "42"" and not "#42".
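The difference between appending text and doing arithmetic can be seen in a couple of lines of plain awk. This is only an illustration; the function name concat_vs_add and the value 42 are made up.

```shell
# concat_vs_add is a hypothetical helper name for this illustration.
concat_vs_add() {
  awk 'BEGIN {
    n = "42"
    print n "+11"   # string concatenation: prints 42+11
    print n + 11    # numeric addition: prints 53
  }'
}

concat_vs_add
```

In awk, the + operator forces its operands to numbers, while putting two values next to each other concatenates them as strings.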

Here's an awk solution (warning, untested). It only performs a single replacement per line.

awk '
  match($0, /"#[0-9]+"/) {
    n = substr($0, RSTART+2, RLENGTH-3) + 11;
    $0 = substr($0, 1, RSTART+1) n substr($0, RSTART+RLENGTH-1)
  }
  1 {print}'

It would be simpler in Perl.

perl -pe 's/(?<="#)[0-9]+(?=")/$&+11/e'

awk can do it, but it isn't direct, even using backreferences.
GNU awk has (partial) backreferencing, in the form of gensub.

Instances of 123" are temporarily wrapped in \x01 and \x02 to mark them as unmodified (for sub()).

Or you could just step through the loop changing candidates as you go, in which case, the backreferencing and "brackets" aren't needed; but keeping track of the character index is needed.

awk '{$0=gensub(/([0-9]+)\"/, "\x01\\1\"\x02", "g", $0 )
      while ( match($0, /\x01[0-9]+\"\x02/) ) {
        temp=substr( $0, RSTART, RLENGTH )
        numb=substr( temp, 2, RLENGTH-3 ) + 11
        sub( /\x01[0-9]+\"\x02/, numb "\"" ) 
      } print }'
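The step-through alternative mentioned above (no markers, no backreferences) could look roughly like this. It is a sketch, not a tested drop-in: the function name add11_stepwise is made up, and instead of tracking a character index into $0 it keeps the not-yet-processed remainder of the line in a variable, which amounts to the same thing. It uses only match() and substr(), so it should not need gawk.

```shell
# add11_stepwise is a hypothetical helper name for this illustration.
add11_stepwise() {
  awk '{
    out = ""; rest = $0
    # Repeatedly find the next run of digits followed by a double quote.
    while (match(rest, /[0-9]+"/)) {
      num = substr(rest, RSTART, RLENGTH - 1) + 11      # digits only, plus 11
      out = out substr(rest, 1, RSTART - 1) num "\""    # text before + new number + quote
      rest = substr(rest, RSTART + RLENGTH)             # continue after the match
    }
    print out rest
  }'
}

printf '%s\n' '("Exercises 31" "#30")' | add11_stepwise
# ("Exercises 42" "#41")
```

Because each replaced number is moved into out and never scanned again, there is no risk of adding 11 to the same number twice.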

Here is another way, using gensub and split, with \x01 as a field delimiter (for split). \x02 marks an array element as a candidate for the arithmetic addition.

awk 'BEGIN{ ORS="" } {
     $0=gensub(/([0-9]+)\"/, "\x01\x02\\1\x01\"", "g", $0 )
     n=split( $0, a, "\x01" )
     for (i=1; i<=n; i++) { 
       if( substr(a[i],1,1)=="\x02" ) { a[i]=substr(a[i],2) + 11 }
       print a[i]
     } print "\n" }'