Tuesday, November 4, 2014

JavaScript Methods for Pattern Matching

Returned value    String methods
position | -1     string.search(regexp)
new_string        string.replace(regexp, replacement_str)
array | null      string.match(regexp)
array             string.split(regexp)

Returned value    RegExp methods
array | null      regexp.exec(string)
boolean           regexp.test(string)

1. String Methods for Pattern Matching

Strings support four methods using regular expressions to perform pattern matching and search-and-replace operations.
1.1 The search method
The search(regexp) method takes a regular-expression argument and returns either the character position of the start of the first matching substring or −1 if there is no match. If the argument to search() is not a regular expression, it is converted to one by the RegExp constructor. The search() method does not support global searches: it ignores the g flag of its argument and always returns the position of the first match.
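Here is a small sketch of the less obvious cases described above. The sample strings and variable names are only illustrative.

```javascript
// search() returns -1 when nothing matches.
var missing = "JavaScript".search(/python/i);     // -1

// A non-RegExp argument is converted with new RegExp(), so "a.a" is a
// pattern in which the dot matches any character ("ava" here).
var converted = "JavaScript".search("a.a");       // 1

// The g flag makes no difference: every call starts from the beginning.
var globalRegexp = /a/g;
var first = "JavaScript".search(globalRegexp);    // 1
var second = "JavaScript".search(globalRegexp);   // 1 again
```

The basic case, a match found somewhere inside the string, looks like this: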
"JavaScript".search(/script/i); // 4
1.2 The replace method
The replace method performs search-and-replace operations. The replace(regexp, replacement_str) method searches the string on which it is called for matches with the specified pattern. If the regular expression has the global (g) flag, the replace method replaces all matches in the string with the replacement string. Otherwise, it replaces only the first match.
 var string = "I am going to the dentist today.";
 var new_string = string.replace(/today/ig, "tomorrow"); // "I am going to the dentist tomorrow."
Recall that parenthesized subexpressions remember the text matched by the subexpression. If a $ followed by a digit appears in the replacement string, replace() replaces those two characters with the text that matches the specified subexpression.
 var string = "I need to fill up my car tank.";
 // $1 is the text before "car", $2 is the text after it
 var new_string = string.replace(/([\w\s]+)car([\s\w]+)/ig, "$1motorcycle$2"); // "I need to fill up my motorcycle tank."
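To make the effect of the g flag and of the $ references easier to see, here is a minimal sketch with made-up sample strings:

```javascript
var sentence = "one car, two cars";

// Without the g flag only the first match is replaced.
var firstOnly = sentence.replace(/car/, "bike");    // "one bike, two cars"

// With the g flag every match is replaced.
var all = sentence.replace(/car/g, "bike");         // "one bike, two bikes"

// $1 and $2 stand for the first and second parenthesized subexpressions,
// so they can be used to reorder parts of the match.
var swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");  // "John Doe"
```

Note that replace() returns a new string; the original string is left unchanged.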
1.3 The match method
The match() method takes a regular expression as its argument (or converts its argument to a regular expression) and returns an array that contains the results of the match, or null if there is no match.

1. If the regular expression has the g flag set, the method returns an array of all matches that appear in the string.
 "1 plus 2 equals 3".match(/\d+/g) // returns ["1", "2", "3"]
2. If the regular expression does not have the g flag set, match() searches for the first match. In this case, the first element of the array is the matching string, and any remaining elements are the parenthesized subexpressions of the regular expression.
var regexp = /(\w+):\/\/([\w.]+)\/(\S*)/;
var string = "Check out the web page at http://www.example.com/interesting_page.htm";
var array = string.match(regexp);
if (array !== null) {
   var fullurl = array[0]; // "http://www.example.com/interesting_page.htm"
   var protocol = array[1]; // "http"
   var host = array[2]; // "www.example.com"
   var path = array[3]; // "interesting_page.htm"
}
Passing a nonglobal regular expression to the match() method of a string is the same as passing the string to the exec() method of the regular expression.
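This equivalence can be checked with a short sketch. The date string and the variable names are assumptions chosen for the illustration.

```javascript
var text = "Released on 2014-11-04";
var dateRegexp = /(\d{4})-(\d{2})-(\d{2})/;   // no g flag

var fromMatch = text.match(dateRegexp);
var fromExec = dateRegexp.exec(text);

// Both arrays hold the full match followed by the three subexpressions.
console.log(fromMatch[0]);   // "2014-11-04"
console.log(fromMatch[1]);   // "2014"
console.log(fromExec[2]);    // "11"
console.log(fromMatch.index === fromExec.index);   // true

// With the g flag, match() returns only the full matches.
var allNumbers = text.match(/\d+/g);   // ["2014", "11", "04"]
```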
1.4 The split method
The split() method breaks the string on which it is called into an array of substrings, using its argument as the separator. The separator can be a plain string or a regular expression.
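A regular-expression separator is useful when the delimiter is not always written the same way. The following is a sketch with invented input strings:

```javascript
// The separator is a comma with optional whitespace on either side.
var fields = "1, 2 ,3 ,  4".split(/\s*,\s*/);    // ["1", "2", "3", "4"]

// The separator is any run of whitespace characters.
var words = "split   on \t whitespace".split(/\s+/);
// ["split", "on", "whitespace"]
```

When the separator never varies, a plain string is enough: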
 "17:45:59".split(":"); // ["17","45","59"]

2. The RegExp Object

In JavaScript, regular expressions are represented as RegExp objects. The RegExp() constructor takes one or two string arguments and creates a new RegExp object; the literal syntax creates an equivalent object:
regexp = new RegExp('pattern','flags');
regexp_literal = /pattern/flags;
  • The first argument to the RegExp() constructor is a string that contains the regular expression.
  • The second, optional argument to RegExp() specifies the regular-expression flags: any combination of the letters g, i, and m.
Let's build a regular expression to search for American phone numbers in a text. An American phone number looks like 707-827-7019. We use both the RegExp() constructor and the literal syntax:
// using literal syntax
var regexp = /(\d{3}[.-]?){2}\d{4}/g;
// using RegExp constructor
var regexp = new RegExp("(\\d{3}[.-]?){2}\\d{4}","g");
Note that both string literals and regular expressions use the \ character for escape sequences, so when you pass a regular expression to RegExp() as a string literal, you must replace each \ character with \\.
Question: when should you use a regexp literal, and when the RegExp() constructor?
Answer: use the RegExp() constructor to create a regular expression dynamically, for example to search for a string entered by the user.
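As a sketch of that situation, the code below builds a regular expression from a string typed by the user. Both escapeRegExp() and countOccurrences() are hypothetical helpers written for this example, not built-in functions. The escaping step is an assumption about what you usually want: without it, characters such as . or $ in the input would be interpreted as regular-expression syntax.

```javascript
// Hypothetical helper: puts a backslash in front of every character that has
// a special meaning in a regular expression. "$&" in the replacement string
// stands for the matched text.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: counts the case-insensitive occurrences of userInput.
function countOccurrences(haystack, userInput) {
    var regexp = new RegExp(escapeRegExp(userInput), "gi");
    var matches = haystack.match(regexp);
    return matches === null ? 0 : matches.length;
}

var count = countOccurrences("Price: $5.00, was $5.00", "$5.00");   // 2
```

A literal cannot do this, because the pattern is not known when the script is written.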
2.1 RegExp Object Properties
Each RegExp object has five properties.
  • The source property is a read-only string that contains the text of the regular expression.
  • The global property is a read-only boolean value that specifies whether the regular expression has the g flag.
  • The ignoreCase property is a read-only boolean value that specifies whether the regular expression has the i flag.
  • The multiline property is a read-only boolean value that specifies whether the regular expression has the m flag.
  • The lastIndex property is a read/write integer. When the g flag is set, this property stores the position in the string at which the next search is to begin.
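These properties can be inspected directly. Here is a minimal sketch; the pattern and the variable name javaRegexp are arbitrary:

```javascript
var javaRegexp = /java\d*/gi;

console.log(javaRegexp.source);       // "java\d*"
console.log(javaRegexp.global);       // true
console.log(javaRegexp.ignoreCase);   // true
console.log(javaRegexp.multiline);    // false
console.log(javaRegexp.lastIndex);    // 0

// After a successful search with the g flag, lastIndex points just past
// the matched substring ("Java8" occupies positions 0 to 4).
javaRegexp.exec("Java8 and java");
var afterFirstMatch = javaRegexp.lastIndex;   // 5

// lastIndex is writable, so the next search can be restarted from 0.
javaRegexp.lastIndex = 0;
```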
2.2 RegExp Object's Methods for Pattern Matching

2.2.1 The exec() method


exec() is the main pattern-matching method of RegExp objects and is similar to the string match() method.

The method regexp.exec(string) executes a regular expression on the argument string. If it does not find a match, it returns null. If it finds a match, it returns an array like the array returned by string.match(regexp) for nonglobal searches: the first element is the string that matched the regular expression, and any subsequent elements are the substrings that matched any parenthesized subexpressions. The array.index property contains the position at which the match occurred and the array.input property is the searched string.

The string.match(regexp) method returns an array of all matches for a global regular expression, and an array with complete information about the single match for a non-global regular expression. In contrast, regexp.exec(string) always returns an array describing a single match, whether or not the regular expression has the global g flag.

When you call regexp.exec() on a regular expression that has the g flag, it sets the regexp.lastIndex property to the character position immediately following the matched substring, and the next call starts searching from that position.
To find all the matches of a regular expression in a string, call exec() repeatedly until it returns null.
var regexp = /Java/g;
var string = "JavaScript is more fun than Java!";
var array;
while ((array = regexp.exec(string)) !== null) {
    console.log("Matched '" + array[0] + "'" + " at position " + array.index);
}
// Matched 'Java' at position 0
// Matched 'Java' at position 28
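The same loop can be wrapped in a reusable function. findAll() below is a hypothetical helper, shown only as a sketch. It assumes the caller passes a regular expression with the g flag and refuses to run otherwise, because without that flag lastIndex never advances and the loop would never end.

```javascript
// Hypothetical helper: collects every match together with its position and
// the value of lastIndex right after the match.
function findAll(regexp, string) {
    if (!regexp.global) {
        throw new Error("findAll() needs a regular expression with the g flag");
    }
    regexp.lastIndex = 0;   // always start from the beginning of the string
    var results = [];
    var array;
    while ((array = regexp.exec(string)) !== null) {
        results.push({ text: array[0], index: array.index, next: regexp.lastIndex });
        // An empty match does not move lastIndex, so step forward by hand.
        if (array[0] === "") {
            regexp.lastIndex++;
        }
    }
    return results;
}

var results = findAll(/Java/g, "JavaScript is more fun than Java!");
// results[0] is { text: "Java", index: 0,  next: 4 }
// results[1] is { text: "Java", index: 28, next: 32 }
```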

2.2.2 The test() method

The regexp.test(string) method takes a string and returns true if the string contains a match for the regular expression, and false otherwise:
 var regexp = /java/i;
 regexp.test("JavaScript"); // Returns true
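One thing to keep in mind is that test() uses lastIndex in the same way as exec() when the regular expression has the g flag. The following sketch, with arbitrary variable names, shows the difference:

```javascript
// Without the g flag every call starts at the beginning of the string.
var hasDigit = /\d/;
var yes = hasDigit.test("abc1");   // true
var no = hasDigit.test("abc");     // false

// With the g flag a successful test() moves lastIndex forward, so a second
// call on the same string resumes after the first match.
var globalRegexp = /a/g;
var firstCall = globalRegexp.test("a");    // true, lastIndex is now 1
var secondCall = globalRegexp.test("a");   // false, lastIndex is reset to 0
```

For a simple yes-or-no check it is therefore usually easier to leave out the g flag.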
