Thursday, March 28, 2013

Basic Regular Expression Patterns II

In this post I'll discuss some common problems that can be solved with regular expressions in JavaScript.


1. Problem : Check whether a given phone number is in (xxx)xxx-xxx format.
   Answer :
   var patt = /^\([0-9]{3}\)[0-9]{3}-[0-9]{3}$/;
   patt.test("(123)456-678");  // true

  
   Note : The characters "(" and ")" have a special meaning in regular expressions, so they are escaped here as "\(" and "\)". Alternatively, each can be placed inside a character class, as "[(]" and "[)]", as shown in the example below.
  
   var patt = /^[(][0-9]{3}[)][0-9]{3}-[0-9]{3}$/;
   patt.test("(123)456-678");  // true
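
   As a quick check that the two spellings behave the same, here is a minimal sketch that runs both patterns over a few inputs. The variable names and the sample strings are chosen for illustration only and are not part of the original answer.

```javascript
// Two spellings of the same pattern: escaped parentheses vs. character classes.
const escaped = /^\([0-9]{3}\)[0-9]{3}-[0-9]{3}$/;
const bracketed = /^[(][0-9]{3}[)][0-9]{3}-[0-9]{3}$/;

// Sample inputs (illustrative assumptions).
const samples = [
  "(123)456-678",  // the expected format
  "123-456-678",   // parentheses missing
  "(123)456-6789", // one digit too many before the end anchor $
  "x(123)456-678", // extra leading character, rejected by the start anchor ^
];

// Run every sample through both patterns and collect the outcomes.
const results = samples.map((s) => ({
  input: s,
  escaped: escaped.test(s),
  bracketed: bracketed.test(s),
}));

console.log(results);
```

   Only the first sample should pass, and both patterns should agree on every input.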


   The equivalent PHP statements are shown below :
  
   $patt = '/^\([0-9]{3}\)[0-9]{3}-[0-9]{3}$/';
   echo preg_match($patt, "(234)234-456");  // 1
   $patt = '/^[(][0-9]{3}[)][0-9]{3}-[0-9]{3}$/';
   echo preg_match($patt, "(234)234-456");  // 1

  
2. Problem : Check whether a given location is in city_name, state format. Example : New York, NY
   

   Answer :
   var patt = /^(.*),\s[A-Z]{2}$/;
   patt.test("Saint George Island, AK"); // true
   patt.test("Saint George Island,AK");  // false, a space was expected after comma
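
   The pattern already captures the city name in a group, so the same idea can be used to pull the pieces apart. The sketch below is an assumption-laden variation: it adds a second capture group around the state code, and the helper name parseLocation is hypothetical.

```javascript
// Same pattern as above, with an extra capture group around the state code
// (an addition made for this sketch).
const locationPattern = /^(.*),\s([A-Z]{2})$/;

// Hypothetical helper: returns { city, state } or null when the format is wrong.
function parseLocation(text) {
  const match = locationPattern.exec(text);
  if (match === null) {
    return null;
  }
  return { city: match[1], state: match[2] };
}

console.log(parseLocation("Saint George Island, AK")); // city "Saint George Island", state "AK"
console.log(parseLocation("Saint George Island,AK"));  // null, no space after the comma
console.log(parseLocation(", NY"));                    // city is "", because .* may match nothing
```

   The last call shows a limitation: since (.*) accepts zero characters, a string with an empty city name still passes. Using (.+) instead would require at least one character before the comma.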


3. Problem : Check whether a given email is in name@domain.com format.
   

    Email addresses come in many forms, and it is difficult to build a small regex that accepts every correctly formatted address. Domain names can look like gmail.com, yahoo.ac.in, or 1234.com. An email address can be as complex as !#$%&'*+-/=?^_`{}|~@example.org or "()<>[]:,;@\\\"!#$%&'*+-/=?^_`{}| ~.a"@example.org or even " "@example.org. We will restrict ourselves to simple addresses like john.smith@dert.com or john.smith@yahoo.ac.in, which cover the most commonly used email formats.
   

   Answer :
   var patt = /^[A-Za-z0-9]{1}([A-Za-z0-9_]*)(\.[A-Za-z0-9_]+)*[a-zA-Z0-9_]*@([A-Za-z0-9]+)([A-Za-z0-9_]*)(\.[A-Za-z0-9]+)+$/;
   patt.test("john.smith@edu.co.au");  // true

   

   Explanation :
   a. ^[A-Za-z0-9]{1} means that the string must start with an alphanumeric character. No dot or underscore is allowed at the beginning of the email address.
   b. ([A-Za-z0-9_]*) means that, next, any number of alphanumeric characters or underscores may occur.
   c. (\.[A-Za-z0-9_]+)* means that, next, zero or more groups may occur, each made of a dot (.) followed by one or more alphanumeric characters or underscores. For example, in j.g.h@gmail.com, the ".g" and ".h" portions are matched by this group. Under this pattern a._a._@gmail.com is accepted but a._a._.@gmail.com is rejected, because the local part (the part before @) cannot end with a dot (.). The [a-zA-Z0-9_]* that follows this group in the pattern is redundant, since parts b and c already consume those characters.
   d. @ means that the string must contain a '@' character at this position.
   e. ([A-Za-z0-9]+) means that only alphanumeric characters may appear immediately after '@'. No dot (.) or underscore may appear there. For example, abc@_23.com is rejected because an underscore appears immediately after '@'.
   f. ([A-Za-z0-9_]*) means that, after that, any number of alphanumeric characters or underscores may appear.
   g. (\.[A-Za-z0-9]+)+ means that, next, a dot (.) followed by alphanumeric characters, i.e. ".ac", ".in" etc., must occur at least once.
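
   To make the walkthrough easier to follow, the sketch below rebuilds the same pattern from its pieces with the RegExp constructor and tries it on the addresses mentioned in the explanation. The names parts and emailPattern are hypothetical, and splitting the pattern this way is only an illustration, not a different technique.

```javascript
// The pieces of the pattern, in the order they are explained (a to g).
// Backslashes are doubled because these are string literals.
const parts = [
  "^[A-Za-z0-9]{1}",      // a. first character must be alphanumeric
  "([A-Za-z0-9_]*)",      // b. then any alphanumerics or underscores
  "(\\.[A-Za-z0-9_]+)*",  // c. then zero or more ".something" groups
  "[a-zA-Z0-9_]*",        // present in the original pattern, not covered in the list
  "@",                    // d. the '@' separator
  "([A-Za-z0-9]+)",       // e. domain must start with alphanumerics
  "([A-Za-z0-9_]*)",      // f. then any alphanumerics or underscores
  "(\\.[A-Za-z0-9]+)+$",  // g. one or more ".something" groups, then end of string
];
const emailPattern = new RegExp(parts.join(""));

console.log(emailPattern.test("john.smith@edu.co.au")); // true
console.log(emailPattern.test("j.g.h@gmail.com"));      // true
console.log(emailPattern.test("a._a._@gmail.com"));     // true
console.log(emailPattern.test("a._a._.@gmail.com"));    // false, local part ends with a dot
console.log(emailPattern.test("abc@_23.com"));          // false, underscore right after '@'
```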
  
   However, the pattern does not check the following points.
   i. It does not require the domain name to end with a known suffix such as ".com", ".biz", ".me", ".org", ".net" etc.
   ii. Complex email addresses like "()<>[]:,;@\\\"!#$%&'*+-/=?^_`{}| ~.a"@example.org cannot be matched by the pattern above.
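
   Both limitations can be seen in a small experiment. The sketch below layers a suffix check on top of the pattern; the allowedEndings list and the hasAllowedEnding helper are hypothetical, and the list is deliberately short and incomplete.

```javascript
// The email pattern from the answer above.
const emailPattern = /^[A-Za-z0-9]{1}([A-Za-z0-9_]*)(\.[A-Za-z0-9_]+)*[a-zA-Z0-9_]*@([A-Za-z0-9]+)([A-Za-z0-9_]*)(\.[A-Za-z0-9]+)+$/;

// Hypothetical, incomplete list of accepted endings (an assumption for this sketch).
const allowedEndings = ["com", "biz", "me", "org", "net"];

// Hypothetical helper: the address must match the pattern AND end in a listed suffix.
function hasAllowedEnding(email) {
  if (!emailPattern.test(email)) {
    return false;
  }
  const ending = email.slice(email.lastIndexOf(".") + 1).toLowerCase();
  return allowedEndings.includes(ending);
}

console.log(hasAllowedEnding("john.smith@dert.com"));    // true
console.log(hasAllowedEnding("john.smith@yahoo.ac.in")); // false, "in" is not in the short list
console.log(emailPattern.test('" "@example.org'));       // false, quoted local parts are not supported
```

   The second call shows why a fixed list is a trade-off: it rejects a legitimate address simply because its suffix was not listed.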


Check the 1st part of this article here 
Check the 3rd part of this article here
