Using pcre2 in a c++ project

I don't know if this is still something you're looking at or not... but just-in-case does this help?

From pcre2api man page:

In a Windows environment, if you want to statically link an application program against a non-dll PCRE2 library, you must define PCRE2_STATIC before including pcre2.h.


In case someone wants to build the library using visual studio

  1. Download pcre2 from the website, (http://www.pcre.org/)
  2. in Visual Studio 2015, (and maybe others), create an empty project "Win32 project" and call it pcre2.
  3. Copy all the files in \pcre2\src\ to your newly created empty project.
  4. Add all the files listed in "NON-AUTOTOOLS-BUILD", (located in the base folder)
    • pcre2_auto_possess.c
    • pcre2_chartables.c
    • pcre2_compile.c
    • pcre2_config.c
    • etc...
  5. Rename the file config.h.generic to config.h
  6. Add the config.h file to the project.
  7. In your project, select all the *.c file Go Properties > C/C++ > Precompiled Header > "Not Using Precompiled header"
  8. Select the project, Go to Properties > Preprocessor > Preprocessor Definition and select the drop down list, and add...
    • PCRE2_CODE_UNIT_WIDTH=8
    • HAVE_CONFIG_H

Compile and the lib file should be created fine.


If you don't mind using a wrapper, here's mine: JPCRE2

You need to select the basic character type (char, wchar_t, char16_t, char32_t) according to the string classes you will use (respectively std::string, std::wstring, std::u16string, std::u32string):

typedef jpcre2::select<char> jp;
//Selecting char as the basic character type will require
//8 bit PCRE2 library where char is 8 bit,
//or 16 bit PCRE2 library where char is 16 bit,
//or 32 bit PCRE2 library where char is 32 bit.
//If char is not 8, 16 or 32 bit, it's a compile error.

Match Examples:

Check if a string matches a pattern:

if(jp::Regex("(\\d)|(\\w)").match("I am the subject")) 
    std::cout<<"\nmatched";
else
    std::cout<<"\nno match";

Match all and get the match count:

size_t count = 
jp::Regex("(\\d)|(\\w)","mi").match("I am the subject", "g");
// 'm' modifier enables multi-line mode for the regex
// 'i' modifier makes the regex case insensitive
// 'g' modifier enables global matching

Get numbered substrings/captured groups:

jp::VecNum vec_num;
count = 
jp::Regex("(\\w+)\\s*(\\d+)","im").initMatch()
                                  .setSubject("I am 23, I am digits 10")
                                  .setModifier("g")
                                  .setNumberedSubstringVector(&vec_num)
                                  .match();
std::cout<<"\nTotal match of first match: "<<vec_num[0][0];      
std::cout<<"\nCaptrued group 1 of first match: "<<vec_num[0][1]; 
std::cout<<"\nCaptrued group 2 of first match: "<<vec_num[0][2]; 

std::cout<<"\nTotal match of second match: "<<vec_num[1][0];
std::cout<<"\nCaptrued group 1 of second match: "<<vec_num[1][1];
std::cout<<"\nCaptrued group 2 of second match: "<<vec_num[1][2]; 

Get named substrings/captured groups:

jp::VecNas vec_nas;
count = 
jp::Regex("(?<word>\\w+)\\s*(?<digit>\\d+)","m")
                         .initMatch()
                         .setSubject("I am 23, I am digits 10")
                         .setModifier("g")
                         .setNamedSubstringVector(&vec_nas)
                         .match();
std::cout<<"\nCaptured group (word) of first match: "<<vec_nas[0]["word"];
std::cout<<"\nCaptured group (digit) of first match: "<<vec_nas[0]["digit"];

std::cout<<"\nCaptured group (word) of second match: "<<vec_nas[1]["word"];
std::cout<<"\nCaptured group (digit) of second match: "<<vec_nas[1]["digit"];

Iterate through all matches and substrings:

//Iterating through numbered substring
for(size_t i=0;i<vec_num.size();++i){
    //i=0 is the first match found, i=1 is the second and so forth
    for(size_t j=0;j<vec_num[i].size();++j){
        //j=0 is the capture group 0 i.e the total match
        //j=1 is the capture group 1 and so forth.
        std::cout<<"\n\t("<<j<<"): "<<vec_num[i][j]<<"\n";
    }
}

Replace/Substitute Examples:

std::cout<<"\n"<<
///replace all occurrences of a digit with @
jp::Regex("\\d").replace("I am the subject string 44", "@", "g");

///swap two parts of a string
std::cout<<"\n"<<
jp::Regex("^([^\t]+)\t([^\t]+)$")
             .initReplace()
             .setSubject("I am the subject\tTo be swapped according to tab")
             .setReplaceWith("$2 $1")
             .replace();

Replace with Match Evaluator:

jp::String callback1(const jp::NumSub& m, void*, void*){
    return "("+m[0]+")"; //m[0] is capture group 0, i.e total match (in each match)
}
int main(){
    jp::Regex re("(?<total>\\w+)", "n");
    jp::RegexReplace rr(&re);
    String s3 = "I am ঋ আা a string 879879 fdsjkll ১ ২ ৩ ৪ অ আ ক খ গ ঘ আমার সোনার বাংলা";
    rr.setSubject(s3)
      .setPcre2Option(PCRE2_SUBSTITUTE_GLOBAL);
    std::cout<<"\n\n### 1\n"<<
            rr.nreplace(jp::MatchEvaluator(callback1));
            //nreplace() treats the returned string from the callback as literal,
            //while replace() will process the returned string
            //with pcre2_substitute()

    #if __cplusplus >= 201103L
    //example with lambda
    std::cout<<"\n\n### Lambda\n"<<
            rr.nreplace(
                jp::MatchEvaluator(
                    [](const jp::NumSub& m1, const jp::MapNas& m2, void*){
                        return "("+m1[0]+"/"+m2.at("total")+")";
                    }
                ));
    #endif
    return 0;
}

You can read the complete documentation here.


PCRE2_SPTR pattern = (PCRE2_SPTR)std::string("([a-z]+)|\\s").c_str();

Using this pointer with any of the PCRE functions will result in undefined behavior. The std::string temporary is destroyed at the end of the definition of pattern, causing pattern to dangle.

My recommendation is to change pattern's type to std::string and call c_str() when passing arguments to a PCRE function. It is a very fast operation in C++11 (and you are not using the old GCC 4 ABI).

There are also several C++ wrappers for PCRE that might help you avoid such issues and make PCRE easier to use, but I do not the status of Windows support.