Regular Expressions

aka "regex"

May 31, 2015


I've been interested in regular expressions for a while now, especially since I keep seeing them used in solutions on Stack Overflow. Very briefly defined, a regular expression is a sequence of characters that defines a search pattern. You'll see them used mostly for string matching.


If you've ever seen anything like this: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b (a common pattern for matching email addresses), it can look pretty intimidating! But I'll try to break it down with some simple examples sourced from a couple of great websites.
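To make that pattern a little less abstract, here is a minimal sketch of it in use. The constant name EMAIL_PATTERN and the helper email_like? are made up for this illustration, and the i flag is an addition so that [A-Z] also accepts lowercase letters. Treat it as a rough check, not a full email validator.

```ruby
# The pattern from above, plus the i flag (an addition) for case-insensitive matching.
EMAIL_PATTERN = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

# Hypothetical helper: true when the text contains something shaped like an email address.
def email_like?(text)
  !!(EMAIL_PATTERN =~ text)
end

puts email_like?("Write to jane.doe@example.com today")
puts email_like?("no at-sign here")
```

The first call prints true and the second prints false.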


Firstly, it's VERY important to know what the symbols of regex stand for. Jeffrey Way of Tuts Plus suggests memorizing them. No other way around it:


  • . - Matches any character, except newline.
  • * - Matches 0 or more of the preceding character (or group).
  • + - Matches 1 or more of the preceding character (or group).
  • ? - Preceding character (or group) is optional. Matches 0 or 1 occurrence.
  • \d - Matches any single digit.
  • \w - Matches any word character (alphanumeric & underscore).
  • [XYZ] - Matches any single character from the character class.
  • [XYZ]+ - Matches one or more of any of the characters in the set.
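  • $ - Matches the end of the string (in Ruby, the end of each line; use \z for the end of the whole string).
  • ^ - Matches the beginning of a string (in Ruby, the beginning of each line; use \A for the start of the whole string).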
  • [^a-z] - When inside of a character class, the ^ means NOT; in this case, match anything that is NOT a lowercase letter.
Source: TutsPlus
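
Here is a small sketch that runs each symbol from the list against a sample string. The sample strings and variable names are made up for illustration. String#scan returns every match as an array, which makes it easy to see what each pattern picks up.

```ruby
samples = "ct cat caat"

any_char  = "cat cot c\nt".scan(/c.t/)        # . matches any character except newline
zero_more = samples.scan(/ca*t/)              # * allows zero or more "a"
one_more  = samples.scan(/ca+t/)              # + requires at least one "a"
optional  = "color colour".scan(/colou?r/)    # ? makes the "u" optional
digits    = "a1b22c333".scan(/\d+/)           # \d+ matches runs of digits
words     = "hello_world 42!".scan(/\w+/)     # \w+ matches runs of word characters
in_class  = "gray grey groy".scan(/gr[ae]y/)  # [ae] matches one character from the set
not_lower = "abc123".scan(/[^a-z]+/)          # [^a-z] matches anything that is not a-z
starts_at = "ruby on rails" =~ /^ruby/        # ^ anchors the match at the start
ends_at   = "ruby on rails" =~ /rails$/       # $ anchors the match at the end

p any_char   # ["cat", "cot"]
p zero_more  # ["ct", "cat", "caat"]
p one_more   # ["cat", "caat"]
p optional   # ["color", "colour"]
p digits     # ["1", "22", "333"]
p words      # ["hello_world", "42"]
p in_class   # ["gray", "grey"]
p not_lower  # ["123"]
p starts_at  # 0
p ends_at    # 8
```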

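Let's start with a really simple example. Here, we are looking for the word "ruby" in the string on the right side of the =~ operator. If there is a match, =~ returns the index where the first match starts; if not, it returns nil. Matching is case-sensitive by default, so the pattern needs the i flag to find "Ruby". You can also use .match() in place of =~; it returns a MatchData object rather than an index.


/ruby/i =~ "I'm using Ruby!" #=> 10
/costa rica/.match("We're going to New York.") #=> nil
/st/.match('haystack') #=> #<MatchData "st">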
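
To see the difference between =~ and .match() side by side, here is a short sketch. The variable names are mine, and the pattern uses the i flag so that it ignores case. The MatchData object that .match() returns carries more detail than the bare index from =~.

```ruby
sentence = "I'm using Ruby!"

index = sentence =~ /ruby/i        # Integer index of the first match, or nil
found = /ruby/i.match(sentence)    # MatchData object, or nil

puts index             # 10
puts found[0]          # Ruby
puts found.begin(0)    # 10
puts found.pre_match   # I'm using

missing = /costa rica/.match("We're going to New York.")
p missing              # nil
```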

Now things get a little trickier. Let's look at an example using regex match and replace. I found an excellent example on the lovely Liz Abanante's blog! We want to separate an integer with a comma after every third digit, counting from the right, so that 100000 would output as 100,000. [This is one of the challenges we were given!]


Let's say Liz has a method called separate_comma, and it could look like this:


          def separate_comma(number)
            comma_num = number.to_s
            comma_num.reverse.gsub(/...(?=.)/, '\&,').reverse
          end
           

How does this work? Well, firstly, gsub means global substitution. It takes two arguments: the first is the pattern to match, and the second is what we want to replace each match with. The . (period) in regex matches any single character (other than a newline)! So:


          comma_num.reverse.gsub(/...
        

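calls for 3 consecutive "anything" characters. The (?=.) that follows is a lookahead: it checks that at least one more character comes after those three, without including that character in the match, so no comma is added after the final group. We then replace each match of /...(?=.)/ with '\&,', which is the matched text followed by a comma (\& is the matched text!). Because the string is reversed first, the groups of three are counted from the right, and the final reverse puts the digits back in order.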
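
Here is a sketch that traces each stage for one number. The intermediate variable names are mine, and the method is condensed to one line but does the same thing. It also shows what happens if the lookahead is left out. Note that it assumes non-negative integers; a minus sign would be treated as just another character.

```ruby
def separate_comma(number)
  number.to_s.reverse.gsub(/...(?=.)/, '\&,').reverse
end

# Trace the stages for 1234567.
reversed    = 1234567.to_s.reverse                 # "7654321"
with_commas = reversed.gsub(/...(?=.)/, '\&,')     # "765,432,1"
result      = with_commas.reverse                  # "1,234,567"

# Without the lookahead, the last group of three also gets a comma.
no_lookahead = "000001".gsub(/.../, '\&,').reverse # ",100,000"

puts result
puts no_lookahead
puts separate_comma(100000)  # 100,000
puts separate_comma(999)     # 999
```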

Hope that made sense! For more examples, see Ruby's official docs: https://ruby-doc.org/core-2.2.0/Regexp.html


That's it for now! Thanks for reading.