Xxaxx's Xperimints #2
Regular Expressions
"Regular expressions" can look like Greek or some other totally incomprehensible language. I'm told they are actually reasonably simple once you get the hang of them. Well, I can tell you that in the beginning they look like a cross between gibberish and magical incantations. Hopefully this tutorial will shed some light on the subject.
Regular expressions are used for "matching": testing whether a piece of text contains a given pattern, or is exactly equal to it.
Any character that isn't a special character mentioned below matches itself. This includes all letters and numbers, and some punctuation.
For example:
- Newbie -- matches the string "Newbie". It's important to realize that this is case-sensitive; this expression won't match "newbie".
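To see literal matching in action, here is a minimal sketch. It assumes Python's re module, which this tutorial does not itself name; other regex engines behave the same way for this simple case. The sample strings are made up for illustration.

```python
import re

# A pattern made only of ordinary characters matches exactly those characters.
pattern = re.compile(r"Newbie")

# search() scans the whole string looking for the pattern anywhere inside it.
found_exact = pattern.search("Hello, Newbie!") is not None
found_lower = pattern.search("hello, newbie!") is not None

print(found_exact)  # True
print(found_lower)  # False, because matching is case-sensitive by default
```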
One of the first special characters is the dot '.'. A dot matches any single character except a newline. So, the expression
- .ewbie -- will match the string "Newbie", and "Rewbie" and "Sewbie", etc. The program doesn't care if it makes sense. It just looks for a match.
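The dot can be tried out with a short sketch like the one below. It assumes Python's re module (not something the tutorial specifies), and the word list is invented for the test. Note that fullmatch() requires the whole string to fit the pattern.

```python
import re

pattern = re.compile(r".ewbie")

# The last entry starts with a newline character, which the dot refuses to match.
words = ["Newbie", "Rewbie", "Sewbie", "ewbie", "\newbie"]

# Keep only the words where the entire string fits the pattern.
matches = [w for w in words if pattern.fullmatch(w)]

print(matches)  # ['Newbie', 'Rewbie', 'Sewbie']
```

"ewbie" is left out because the dot must match exactly one character; it cannot match nothing.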
The next special character is the star '*'. Any character followed by a star matches that character repeated 0 or more times. Thus,
- N*ewbie -- matches "Newbie", "NNewbie", or "NNNNNewbie", or "ewbie".
One special form, called dot-star ".*", matches any number of unspecified characters, including none at all. This is a totally useful expression, and you will find it as part of many larger expressions.
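Here is a small sketch of the star and of dot-star, assuming Python's re module. The pattern "New.*bie" and the test words are made up for illustration and are not from the tutorial.

```python
import re

# The star applies only to the character right before it (the N).
star = re.compile(r"N*ewbie")
star_words = ["ewbie", "Newbie", "NNNNNewbie", "Mewbie"]
star_matches = [w for w in star_words if star.fullmatch(w)]
print(star_matches)  # ['ewbie', 'Newbie', 'NNNNNewbie']

# Dot-star stands in for "anything, or nothing at all" between two fixed parts.
dot_star = re.compile(r"New.*bie")
dot_star_words = ["Newbie", "New-and-improved-bie", "Oldbie"]
dot_star_matches = [w for w in dot_star_words if dot_star.fullmatch(w)]
print(dot_star_matches)  # ['Newbie', 'New-and-improved-bie']
```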
Related to "*" are the "+" and "?" special characters.
The plus "+" will match one or more of the preceding character.
- N+ewbie -- matches "Newbie", "NNewbie", or "NNNNNewbie". "ewbie" is not matched.
The question mark "?" will match zero or one of the preceding character, never more than one.
- N?ewbie -- matches "Newbie" or "ewbie". "NNewbie" is not matched.
- o* -- zero or more o's (i.e., "" or "o" or "oo" or "ooo" or "oooo" or ...)
- o+ -- one or more o's (i.e., "o" or "oo" or "ooo" or "oooo" or ...)
- o? -- zero or one o (i.e., "" or "o")
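The three repetition characters can be compared side by side with a sketch like this one. It assumes Python's re module; the dictionary name "table" is just a label chosen for the example.

```python
import re

words = ["ewbie", "Newbie", "NNewbie"]
patterns = ["N*ewbie", "N+ewbie", "N?ewbie"]

# For each pattern, list the words that it matches in full.
table = {p: [w for w in words if re.fullmatch(p, w)] for p in patterns}

for p in patterns:
    print(p, "->", table[p])
# N*ewbie -> ['ewbie', 'Newbie', 'NNewbie']
# N+ewbie -> ['Newbie', 'NNewbie']
# N?ewbie -> ['ewbie', 'Newbie']
```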
You can use parentheses to group an expression for use with a modifier. So, the expression
- N(ew)+bie -- matches "Newbie", "Newewbie", "Newewewbie", etc.
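The difference parentheses make is easiest to see by comparing the grouped pattern with an ungrouped one. This is a sketch assuming Python's re module; the ungrouped pattern "New+bie" is added here purely for contrast.

```python
import re

words = ["Newbie", "Newewbie", "Newwbie", "Nbie"]

# With parentheses, the plus repeats the whole group "ew".
grouped = [w for w in words if re.fullmatch(r"N(ew)+bie", w)]
print(grouped)  # ['Newbie', 'Newewbie']

# Without parentheses, the plus repeats only the "w".
ungrouped = [w for w in words if re.fullmatch(r"New+bie", w)]
print(ungrouped)  # ['Newbie', 'Newwbie']
```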
If one character in a pattern could be one of several, you can use a character class. This is defined by listing the allowed characters between square brackets, [ and ]. For example:
- N[aeiou]wbie -- matches "Nawbie", "Newbie", "Niwbie", "Nowbie", "Nuwbie".
A special case of the [ and ] character class definition is created by using '^' as the first character of a class. The class then matches any one character that is NOT listed.
For example:
- N[^aeiou]wbie -- matches "Nbwbie", "Ncwbie", "Ndwbie", etc., as long as the second character is NOT a, e, i, o or u.
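A character class and its negated twin can be checked against the same words. The sketch below assumes Python's re module, and the test words (including "N7wbie") are invented to show that a negated class accepts anything not listed, not just consonants.

```python
import re

words = ["Newbie", "Nowbie", "Nxwbie", "N7wbie"]

# [aeiou] accepts exactly one of the listed vowels in the second position.
in_class = [w for w in words if re.fullmatch(r"N[aeiou]wbie", w)]
print(in_class)  # ['Newbie', 'Nowbie']

# [^aeiou] accepts exactly one character that is not a listed vowel.
not_in_class = [w for w in words if re.fullmatch(r"N[^aeiou]wbie", w)]
print(not_in_class)  # ['Nxwbie', 'N7wbie']
```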
You can combine a class definition with the repetition characters (*, + and ?) to get something like:
- [aeiou]+ -- matches any series of one or more vowel characters, such as "aeeiaouaaeuui".
- [^aeiou]+ -- matches any series of one or more non-vowel characters, such as "jjdklmnw".
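To watch these two patterns carve up a piece of text, here is a sketch assuming Python's re module. The phrase "beautiful queue" is just a sample chosen for its vowel runs.

```python
import re

phrase = "beautiful queue"

# findall() returns every non-overlapping match, left to right.
vowel_runs = re.findall(r"[aeiou]+", phrase)
print(vowel_runs)  # ['eau', 'i', 'u', 'ueue']

# The negated class also counts the space as a "non-vowel" character.
other_runs = re.findall(r"[^aeiou]+", phrase)
print(other_runs)  # ['b', 't', 'f', 'l q']
```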
There is one more extremely important special character -- the '|' (vertical-bar) character. It is used to match either of two expressions. For example:
- Newbie|Oldbie -- will match "Newbie" or "Oldbie".
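The vertical bar can be tried with the sketch below, which assumes Python's re module. The sample sentence and the word "Midbie" are made up. The second pattern, "(New|Old)bie", is not from the tutorial; it is included to show how parentheses can limit the bar to part of an expression.

```python
import re

sentence = "Every Oldbie was once a Newbie."

# The bar splits the whole expression into two alternatives.
found = re.findall(r"Newbie|Oldbie", sentence)
print(found)  # ['Oldbie', 'Newbie']

# Parentheses restrict the choice to "New" or "Old", followed by "bie".
words = ["Newbie", "Oldbie", "Midbie"]
either = [w for w in words if re.fullmatch(r"(New|Old)bie", w)]
print(either)  # ['Newbie', 'Oldbie']
```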
When not used inside a class definition [], the "^" indicates the beginning of a line. (I guess they didn't figure this was confusing enough.)
The '$' indicates the end of a line.
For example:
- ^Newbie -- matches Newbie at the beginning of a line.
- Newbie$ -- matches Newbie at the end of a line.
- ^Newbie.*Newbie$ -- matches a line with Newbie at the beginning and at the end.
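The anchors can be tested one line at a time, as in this sketch. It assumes Python's re module, and the sample lines are invented. One Python-specific detail worth knowing: by default ^ only matches at the very start of the string, so text containing several lines needs the re.MULTILINE flag for ^ to match at the start of each line.

```python
import re

lines = ["Newbie here", "here is a Newbie", "Newbie meets Newbie", "Newbie"]

starts = [s for s in lines if re.search(r"^Newbie", s)]
ends = [s for s in lines if re.search(r"Newbie$", s)]
both = [s for s in lines if re.search(r"^Newbie.*Newbie$", s)]

print(starts)  # ['Newbie here', 'Newbie meets Newbie', 'Newbie']
print(ends)    # ['here is a Newbie', 'Newbie meets Newbie', 'Newbie']
print(both)    # ['Newbie meets Newbie']

# One string holding two lines: the flag decides whether ^ sees the second line.
text = "Newbie one\nNewbie two"
whole_string = re.findall(r"^Newbie", text)
per_line = re.findall(r"^Newbie", text, re.MULTILINE)
print(len(whole_string), len(per_line))  # 1 2
```

A lone "Newbie" does not satisfy the last pattern, because that pattern needs the word twice: once at the start and once at the end.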