Position Matching RegEx

faltutech

6 months ago

Using Boundaries :

Position matching is used to specify where within a string of text a match should occur.

To understand the need for position matching, consider the following example:
TEXT
The cat scattered his food all over the room.

REGEX
cat

RESULT

The cat scattered his food all over the room.

Here cat is matched twice: once as the word cat and once inside scattered.

But what if we want to match only the word cat, and not cat embedded in a longer string such as xxcatxx, where xx is a string of any length?

To solve this, we use the metacharacter \b, which denotes a word boundary.

For the example above, we use \bcat\b, which matches only the word cat.

It is important to realize that to match a whole word, \b must be used both before and after the text to be matched. If it is used only before the expression, the pattern matches any word that starts with the specified text. E.g. in the test case cat, cation, caption, \bcat matches the word cat and the cat at the start of cation (caption does not contain cat), while \bcat\b matches only the standalone word cat.
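The difference between the three patterns can be checked with a short sketch. It assumes Python's re module (the article does not name a language), and the variable names are illustrative only. It records the starting index of every match.

```python
import re

text = "The cat scattered his food all over the room."

# Plain pattern: matches "cat" anywhere, including inside "scattered".
plain = [m.start() for m in re.finditer(r"cat", text)]

# \b on both sides keeps only the standalone word.
whole = [m.start() for m in re.finditer(r"\bcat\b", text)]

# Leading \b only: matches "cat" at the start of any word.
words = "cat, cation, caption"
prefix = [m.start() for m in re.finditer(r"\bcat", words)]

print(plain)   # [4, 9]
print(whole)   # [4]
print(prefix)  # [0, 5]
```

Raw strings (r"...") are used so that Python does not interpret \b as a backspace character before the regex engine sees it.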

To specifically not match at a word boundary, use \B.

E.g. Test Case – cat, ecation, caption

RegEx – \Bcat\B

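Result – only the cat inside ecation is matched. The standalone word cat is not matched because it sits on word boundaries, and caption does not contain cat.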
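As a quick check of the \B example, here is a minimal sketch, again assuming Python's re module. It lists each match with its starting index.

```python
import re

words = "cat, ecation, caption"

# \B requires that the position is NOT a word boundary, so "cat" must have
# a word character directly before it and directly after it.
inner = [(m.start(), m.group()) for m in re.finditer(r"\Bcat\B", words)]

print(inner)  # [(6, 'cat')]
```

Index 6 is the c in ecation, which confirms that the standalone cat at index 0 is skipped.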

Defining String Boundaries :

Word boundaries are used to locate matches based on word position (start
of word, end of word, entire word, and so on). String boundaries perform
a similar function but are used to match patterns at the start or end of an
entire string. The string boundary metacharacters are ^ for start of string
and $ for end of string.

E.g. – Test Case – cathelloby

RegEx – ^cat.*by$

Result – cathelloby

The above RegEx will not match 'cathelloby anything else', because that string does not end with by. To match it, the RegEx would have to be '^cat.*else$'.
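The string boundary behaviour can be sketched as follows, assuming Python's re module. The variable names are hypothetical and exist only for this illustration.

```python
import re

pattern = re.compile(r"^cat.*by$")

# The whole string starts with "cat" and ends with "by".
matches_exact = bool(pattern.search("cathelloby"))

# Same start, but the string no longer ends with "by".
matches_longer = bool(pattern.search("cathelloby anything else"))

# Changing the end anchor to "else" matches the longer string.
matches_fixed = bool(re.search(r"^cat.*else$", "cathelloby anything else"))

print(matches_exact, matches_longer, matches_fixed)  # True False True
```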

Using Multiline Mode

^ matches the start of a string and $ matches the end of a string, usually. There is an exception, or rather, a way to change this behavior. Many regular expression implementations support special metacharacters that modify the behavior of other metacharacters, and one of these is (?m), which enables multiline mode. Multiline mode forces the regular expression engine to treat line breaks as string separators, so that ^ matches the start of the string or the position right after a line break (the start of a new line), and $ matches the end of the string or the position right before a line break (the end of a line). If used, (?m) must be placed at the very front of the pattern.
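A minimal sketch of multiline mode, assuming Python's re module, which accepts (?m) at the front of a pattern. The sample text is made up for illustration.

```python
import re

text = "cat one\ndog two\ncat three"

# Without multiline mode, ^ anchors only at the start of the whole string.
single = re.findall(r"^cat", text)

# With (?m) at the front, ^ also matches right after each line break.
multi = re.findall(r"(?m)^cat", text)

# Likewise, $ matches at the end of each line in multiline mode.
line_ends = re.findall(r"(?m)\w+$", text)

print(single)     # ['cat']
print(multi)      # ['cat', 'cat']
print(line_ends)  # ['one', 'two', 'three']
```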

Note : Many regular expression implementations do not support (?m). Some implementations also support \A to mark the start of a string and \Z to mark the end of a string. If supported, these metacharacters function much like ^ and $, respectively, but unlike ^ and $, they are not modified by (?m) and will therefore not operate in multiline mode.
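The contrast between ^ and $ on one side and \A and \Z on the other can be sketched as below. This assumes Python's re module, where \Z matches only at the very end of the string. Other implementations may treat \Z slightly differently, so check the documentation of the one you use.

```python
import re

text = "cat one\ncat two"

# In multiline mode, ^ matches at the start of every line.
caret = re.findall(r"(?m)^cat", text)

# \A ignores (?m) and still matches only at the start of the whole string.
start_only = re.findall(r"(?m)\Acat", text)

# In multiline mode, $ matches at the end of every line.
dollar = re.findall(r"(?m)\w+$", text)

# \Z ignores (?m) and still matches only at the end of the whole string.
end_only = re.findall(r"(?m)\w+\Z", text)

print(caret)       # ['cat', 'cat']
print(start_only)  # ['cat']
print(dollar)      # ['one', 'two']
print(end_only)    # ['two']
```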