Ignore a word in a regular expression

Today a friend of mine asked me a tricky regexp-related question. He wanted to match against a set of strings like:

WORD_FOO
WORD_BAR
WORD_FOOBAR
WORD_QUUX
WORD_FAR

He wanted to match and capture any string starting with WORD_ and followed by a valid word, unless that word was FOO. So it should match all of the lines above except WORD_FOO.

Tricky.

He was using something like /^(WORD_[^\s]+)$/, so I suggested using [^\s|FOO] (I had only woken up a few moments earlier), but of course that doesn’t work: a character class excludes individual characters, not a word, so it would reject every string with an F, an O or a | after the prefix, including WORD_FOOBAR and WORD_FAR.
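To see the character-class problem concretely, here is a minimal sketch in Python (my choice of language for illustration; the post itself only mentions Rubular). It assumes FOO is the word to exclude and reuses the five sample strings from above; the variable names are made up for the example.

```python
import re

words = ["WORD_FOO", "WORD_BAR", "WORD_FOOBAR", "WORD_QUUX", "WORD_FAR"]

# [^\s|FOO] means "any single character that is not whitespace, '|', 'F' or 'O'".
# It has no notion of the three-letter sequence "FOO".
char_class = re.compile(r"^(WORD_[^\s|FOO]+)$")

matched = [w for w in words if char_class.match(w)]
print(matched)  # ['WORD_BAR', 'WORD_QUUX']
```

WORD_FOO is rejected, but so are WORD_FOOBAR and WORD_FAR, simply because they contain an F.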

It took me some digging, but I finally found this answer on Stack Overflow that documents the use of negative look-arounds.

Using these constructs I arrived at this regex: WORD_((?!FOO\W)\S+), which satisfies the requirements (you can check it on Rubular).

How does it work?

(?!FOO\W) is a negative lookahead: it inspects the next characters of the string without consuming them. If they are NOT (!) the word FOO followed by a non-word character (\W: whitespace, punctuation, etc.), then matching continues with \S+ (one or more non-whitespace characters). So you’ll get the part of the word after WORD_ in your \1, $1, etc.
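A small Python sketch of the lookahead in action, assuming the sample strings arrive one per line as they would on Rubular. The names strict, bounded and text are hypothetical. The second half illustrates an edge case that is my own observation rather than part of the original answer: \W needs an actual character to look at, so a WORD_FOO at the very end of the input, with nothing after it, slips through. Using \b instead is one possible workaround, not necessarily the only one.

```python
import re

text = "WORD_FOO\nWORD_BAR\nWORD_FOOBAR\nWORD_QUUX\nWORD_FAR\n"

# The regex from the post: reject FOO only when a non-word character follows it.
strict = re.compile(r"WORD_((?!FOO\W)\S+)")
suffixes = strict.findall(text)  # findall returns the contents of group 1
print(suffixes)  # ['BAR', 'FOOBAR', 'QUUX', 'FAR']

# Edge case: no character follows FOO, so FOO\W cannot match
# and the negative lookahead lets the string through.
edge = strict.search("WORD_FOO")
print(edge.group(1))  # FOO

# Assumed alternative: a word boundary also matches at the end of the input.
bounded = re.compile(r"WORD_((?!FOO\b)\S+)")
print(bounded.search("WORD_FOO"))  # None
print(bounded.findall(text))  # ['BAR', 'FOOBAR', 'QUUX', 'FAR']
```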

If you want to ignore every string whose second part starts with FOO (WORD_FOOBAR as well as WORD_FOO), you can get rid of that \W.
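For comparison, a sketch of the variant without \W, again in Python with made-up variable names and the same sample strings:

```python
import re

words = ["WORD_FOO", "WORD_BAR", "WORD_FOOBAR", "WORD_QUUX", "WORD_FAR"]

# Without \W the lookahead rejects anything that merely starts with FOO.
loose = re.compile(r"WORD_((?!FOO)\S+)")

kept = [w for w in words if loose.fullmatch(w)]
print(kept)  # ['WORD_BAR', 'WORD_QUUX', 'WORD_FAR']
```

WORD_FAR still matches, because the lookahead compares against the whole sequence FOO rather than against single letters.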
