Tuesday, November 4, 2014

Advanced Regular Expression Features

If you want to use every regular expression construct and feature, a simple shell command or tool won't do; you need a programming language. I use Perl, because it has the best and longest-established support for regular expressions.

Parenthesized Subexpressions

Earlier, we saw that regex engines support metacharacters \1, \2, \3, etc. to refer to the text matched by parenthesized subexpressions. Perl and most other modern regex-endowed languages also provide a way to refer to the text matched by parenthesized subexpressions from code outside of the regular expression, after a match has been successfully completed.
  • Use the metacharacters \1, \2, \3 within the regular expression to refer to text matched earlier during the same match attempt
  • Use the variables $1, $2, $3 in subsequent code to refer to the text matched by the first, second, and third parenthesized subexpressions
This section uses the validation of user input as a sample problem. We'll prompt the user to enter a length in either centimeters or inches, read that value, and then check it against a regular expression to make sure it's a whole number followed by a unit. If it matches the regex, we calculate and display both centimeters and inches. Otherwise, we issue a warning message:
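To make the difference between the two notations concrete, here is a minimal sketch. The helper name doubled_word and the sample sentence are made up for illustration; the regex uses \1 inside the pattern to find a repeated word, and the code reads $1 once the match has succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first doubled lowercase word, or undef.
sub doubled_word {
    my ($text) = @_;

    # \1 inside the regex refers to whatever the first set of
    # parentheses matched earlier in the same match attempt.
    if ($text =~ m/\b([a-z]+) \1\b/) {
        # After a successful match, $1 holds that same text.
        return $1;
    }
    return undef;
}

my $word = doubled_word("this is is a test");
print "Doubled word: $word\n" if defined $word;
```

Note that $1 is only meaningful after a successful match, which is why the sketch reads it inside the if block.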
 print "Enter a length (for example 20CM, 5IN):\n";
 $input = <STDIN>;
 chomp($input);   # remove the trailing newline

 # ([0-9]+) and (CM|IN) are capturing parentheses
 if ($input =~ m/^([0-9]+)(CM|IN)$/) {

   # If it matches, $1 is the number and $2 is "CM" or "IN".
   # We save $1 and $2 to named variables.
   $InputNum = $1;
   $type = $2;
   if ($type eq "CM") {
     # The input was centimeters, so calculate inches
     $centimeters = $InputNum;
     $inches = $centimeters / 2.54;
   } else {
     # If not "CM", it must be "IN", so calculate centimeters
     $inches = $InputNum;
     $centimeters = $inches * 2.54;
   }

   # We have both lengths, so display the results:
   printf "%.2f CM is %.2f IN\n", $centimeters, $inches;
 } else {
   # The initial regex did not match, so issue a warning.
   print "Expecting a number followed by \"CM\" or \"IN\",\n";
   print "so I don't understand \"$input\".\n";
 }
You can use the program like this:
 max$ perl conversion
 Enter a length (for example 20CM, 5IN):
 30CM
 30.00 CM is 11.81 IN
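The same conversion can also be sketched as a subroutine, which makes it easier to try without typing at a prompt. This is only one possible arrangement, not the program above: the name convert_length is hypothetical, and the sketch relies on the fact that a match in list context returns the captured texts directly, so they can be assigned to named variables without going through $1 and $2.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (centimeters, inches), or an empty list
# if the input is not a whole number followed by CM or IN.
sub convert_length {
    my ($input) = @_;

    # In list context a successful match returns the captured texts,
    # the same values that $1 and $2 would hold.
    my ($number, $unit) = $input =~ m/^([0-9]+)(CM|IN)$/
        or return;

    return $unit eq "CM"
        ? ($number, $number / 2.54)    # input was centimeters
        : ($number * 2.54, $number);   # input was inches
}

my ($cm, $in) = convert_length("30CM");
printf "%.2f CM is %.2f IN\n", $cm, $in;
```

An empty return list signals invalid input, so the caller can test the result before printing anything.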
