Regular Expressions - WinMerge Development Wiki

Regular Expressions

Regular expressions are a key feature of WinMerge filtering. They can also be used in the editor for searches.

PCRE

The PCRE regular expression implementation is in the WinMerge source tree under /Externals/pcre/. All regular expression code must use PCRE.

One advantage of PCRE is that it supports UTF-8 natively. In line filters, for example, this avoids converting data between different Unicode encodings.

The PCRE documentation is available in the source repository.

Usage Areas

Filtering

Filtering is based on regular expressions:

  • file filtering uses regular expression rules to include or exclude items in the compare
  • line filtering uses regular expressions to exclude differences in files

Note that even if a *.ext-style filter is given in the Open dialog, it is still converted to a regular expression before the compare.
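
The conversion of a *.ext-style mask could look roughly like the sketch below. This is not WinMerge's actual converter: the function name wildcard_to_regex and its buffer-based interface are assumptions made for illustration. It maps * to .*, ? to ., escapes regular expression metacharacters, and anchors the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not WinMerge's actual converter: translate a simple
 * wildcard mask such as "*.ext" into an anchored regular expression.
 * '*' becomes ".*", '?' becomes ".", and regex metacharacters are escaped.
 * Returns 0 on success, -1 if the output buffer is too small. */
int wildcard_to_regex(const char *mask, char *out, size_t outSize)
{
    static const char meta[] = ".^$+()[]{}|\\";
    size_t pos = 0;

    assert(mask != NULL && out != NULL);
    if (outSize < 3)            /* need room for "^$" plus the terminator */
        return -1;
    out[pos++] = '^';
    for (; *mask != '\0'; ++mask)
    {
        /* Each input char emits at most 2 chars; keep 2 spare for "$\0". */
        if (pos + 4 > outSize)
            return -1;
        if (*mask == '*')
        {
            out[pos++] = '.';
            out[pos++] = '*';
        }
        else if (*mask == '?')
        {
            out[pos++] = '.';
        }
        else
        {
            if (strchr(meta, *mask) != NULL)
                out[pos++] = '\\';
            out[pos++] = *mask;
        }
    }
    out[pos++] = '$';
    out[pos] = '\0';
    return 0;
}
```

With this sketch, the mask *.ext becomes ^.*\.ext$. File masks on Windows are usually matched without regard to case, so such an expression would probably be compiled with PCRE_CASELESS.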

Filtering will be almost completely rewritten.

Search / Replace

The editor allows using regular expressions in the Find/Replace dialogs.

Using PCRE

Unicode

Because WinMerge handles Unicode data, PCRE needs to handle it too. PCRE supports Unicode in UTF-8 form, so all Unicode data (both expressions and the strings to match) must be passed as UTF-8.

Ucs2Utf8.h provides utility functions for the necessary conversions.
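
To show what such a conversion involves, here is a minimal standalone sketch. It is not the code from Ucs2Utf8.h, whose function names and signatures are not shown on this page. The name ucs2_to_utf8 is hypothetical, and the sketch handles only Basic Multilingual Plane code units without combining surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only; the real helpers live in Ucs2Utf8.h and their
 * names and signatures are not shown here. Encodes `len` UCS-2 code units
 * (BMP only, surrogates are not combined) as UTF-8.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if `out` is too small. */
int ucs2_to_utf8(const unsigned short *src, size_t len,
                 char *out, size_t outSize)
{
    size_t pos = 0;
    size_t i;

    assert(src != NULL && out != NULL);
    for (i = 0; i < len; ++i)
    {
        unsigned int c = src[i];
        if (pos + 4 > outSize)      /* up to 3 bytes plus the terminator */
            return -1;
        if (c < 0x80)
        {
            out[pos++] = (char)c;                           /* 1 byte  */
        }
        else if (c < 0x800)
        {
            out[pos++] = (char)(0xC0 | (c >> 6));           /* 2 bytes */
            out[pos++] = (char)(0x80 | (c & 0x3F));
        }
        else
        {
            out[pos++] = (char)(0xE0 | (c >> 12));          /* 3 bytes */
            out[pos++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[pos++] = (char)(0x80 | (c & 0x3F));
        }
    }
    if (pos >= outSize)
        return -1;
    out[pos] = '\0';
    return (int)pos;
}
```

The returned byte count is the length that would be passed to PCRE along with the converted string, since PCRE works with byte lengths and byte offsets.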

Circumventing EOL Chars Issue

In WinMerge, files can have different EOL characters, and sometimes the EOL characters vary even within one file. PCRE can handle different EOL characters, but it must be told which EOL type to expect. This is not always practical.
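
The sketch below illustrates why telling PCRE the EOL type is awkward: the type has to be detected first, and a file with mixed EOLs has no single answer. The enum and the function detect_eol_style are hypothetical and are not part of WinMerge or PCRE.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED } EolStyle;

/* Hypothetical helper: classify the EOL style used in a buffer.
 * Returns EOL_MIXED as soon as two different styles are seen. */
EolStyle detect_eol_style(const char *buf, size_t len)
{
    EolStyle style = EOL_NONE;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; ++i)
    {
        EolStyle cur;
        if (buf[i] == '\r')
        {
            if (i + 1 < len && buf[i + 1] == '\n')
            {
                cur = EOL_CRLF;
                ++i;            /* consume the LF of the CRLF pair */
            }
            else
            {
                cur = EOL_CR;
            }
        }
        else if (buf[i] == '\n')
        {
            cur = EOL_LF;
        }
        else
        {
            continue;           /* not an EOL character */
        }

        if (style == EOL_NONE)
            style = cur;
        else if (style != cur)
            return EOL_MIXED;
    }
    return style;
}
```

A single-style result could be mapped to one of PCRE's PCRE_NEWLINE_* compile options, but for EOL_MIXED no single fixed option fits, which is the case the next paragraph works around.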

Experimental release 2.7.1.5 and later implement new post-compare line filtering. This code first compares the files normally and emits the results to the difference list. The lines in each difference are then matched against the given regular expressions. If all lines on both sides match, the difference is marked as ignored. Because the line data comes from WinMerge's own buffers, it contains no EOL bytes, so the matching is independent of EOL style.
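
The rule "ignore the difference only if every line on both sides matches" can be sketched as follows. The DiffBlock structure, the LineMatcher callback, and the function names are hypothetical stand-ins, not WinMerge's real difference list classes. The matcher here is a plain string test so that the sketch compiles without PCRE. In real code it would run the compiled expression on the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; WinMerge's real difference list
 * uses its own classes. Lines are stored without EOL bytes. */
typedef int (*LineMatcher)(const char *line);

typedef struct
{
    const char **leftLines;
    size_t leftCount;
    const char **rightLines;
    size_t rightCount;
    int ignored;            /* set to 1 when the filter covers every line */
} DiffBlock;

static int all_lines_match(const char **lines, size_t count, LineMatcher m)
{
    size_t i;
    for (i = 0; i < count; ++i)
        if (!m(lines[i]))
            return 0;
    return 1;
}

/* Mark the difference as ignored only if every line on both sides matches. */
void apply_line_filter(DiffBlock *diff, LineMatcher matches)
{
    assert(diff != NULL && matches != NULL);
    diff->ignored =
        all_lines_match(diff->leftLines, diff->leftCount, matches) &&
        all_lines_match(diff->rightLines, diff->rightCount, matches);
}

/* Stand-in for a compiled expression such as "^//"; real code would
 * call pcre_exec() on the line instead. */
int is_comment_line(const char *line)
{
    return strncmp(line, "//", 2) == 0;
}
```

A difference whose lines are all // comments on both sides would be marked as ignored, while a single non-matching line on either side keeps the difference visible.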

As a result, line filters now work with different EOL styles, and $ in a regular expression matches the line end.

PCRE Structures

Be aware that PCRE structures such as pcre and pcre_extra are complex structures containing pointers. Don't try to copy them as simple structs.

In the Code

Start by including pcre.h and Ucs2Utf8.h:

#include "pcre.h"
#include "Ucs2Utf8.h"

Add PCRE structures and related variables:

const char * errormsg = NULL;
int erroroffset = 0;
char regexString[200] = {0};
int regexLen = 0;
int pcre_opts = 0;

Copy the regular expression into regexString and store its length in regexLen. For a Unicode build, set the PCRE options to use UTF-8:

#ifdef UNICODE
      pcre_opts |= PCRE_UTF8;
#endif

Set other PCRE options if needed, such as PCRE_CASELESS and PCRE_BSR_ANYCRLF (open question: is the latter still needed with PCRE 7.7?).

Compile and study the regular expression:

pcre *regexp = pcre_compile(regexString, pcre_opts, &errormsg,
    &erroroffset, NULL);
pcre_extra *pe = NULL;
if (regexp)
{
      errormsg = NULL;
      pe = pcre_study(regexp, 0, &errormsg);
}

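Add variables for the result table and the string to match:

int ovector[30];            /* room for 10 offset pairs; the last third is workspace */
char compString[200] = {0}; /* string to match, UTF-8 in a Unicode build */
int stringLen = 0;          /* length of compString in bytes */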

Execute the regular expression:

int result = pcre_exec(regexp, pe, compString, stringLen,
          0, 0, ovector, 30);

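If the result value is >= 0, the expression matched. A value of 0 means that ovector was too small to hold all captured substrings. A negative value means either no match (PCRE_ERROR_NOMATCH) or an error. ovector contains the matches as pairs of start and end byte offsets into the matched string, where the end offset points just past the last matched byte.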
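
Reading a match out of ovector can be sketched as below. The helper copy_group is hypothetical and written in plain C so that it compiles without pcre.h. It assumes rc is the value returned by pcre_exec() and that ovector was filled by that call. PCRE ships its own functions for this job, such as pcre_copy_substring(), which are likely the better choice in real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy capture group `group` (0 = whole match) out of
 * `subject`, given the return value `rc` and the `ovector` filled by
 * pcre_exec(). Returns the substring length in bytes, or -1 if the group
 * is unavailable or `out` is too small. */
int copy_group(const char *subject, const int *ovector, int rc,
               int group, char *out, size_t outSize)
{
    int start, end;
    size_t len;

    assert(subject != NULL && ovector != NULL && out != NULL);
    /* rc > 0 is the number of pairs set (whole match plus captures).
     * rc == 0 (ovector too small) is treated as unavailable here. */
    if (rc <= 0 || group < 0 || group >= rc)
        return -1;
    start = ovector[2 * group];
    end = ovector[2 * group + 1];
    if (start < 0 || end < start)   /* unset groups are reported as -1 */
        return -1;
    len = (size_t)(end - start);
    if (len + 1 > outSize)
        return -1;
    memcpy(out, subject + start, len);
    out[len] = '\0';
    return (int)len;
}
```

For example, matching (\w+)=(\w+) against key=value would give a result of 3 and the pairs (0,9), (0,3) and (4,9), so group 2 yields value.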
