Saturday, August 2, 2008

isValidEmail

This week we're going to build on the regular expression we wrote last week to validate e-mail addresses. What do IP addresses have to do with e-mail addresses? Just like domain names map to IP addresses, so also the domain part of an e-mail address can be substituted with an IP address, so instead of person@example.com you could have person@192.168.1.1


ASP

  1. function isValidEmail(email)
  2.     dim regEx
  3.     dim result
  4.     set regEx = new RegExp
  5.     with regEx
  6.         .IgnoreCase = True
  7.         .Global = True
  8.         .Pattern = "^[^@]{1,64}@[^@]{1,255}$"
  9.     end with
  10.     result = false
  11.     ' Test length.
  12.     if regEx.Test(email) then
  13.         regEx.Pattern = "^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\"))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$"
  14.         ' Test syntax.
  15.         if regEx.Test(email) then
  16.             result = true
  17.         end if
  18.     end if
  19.     isValidEmail = result
  20.     set regEx = nothing
  21. end function

PHP

  1. function isValidEmail($email)
  2. {
  3.     $lengthPattern = "/^[^@]{1,64}@[^@]{1,255}$/";
  4.     $syntaxPattern = "/^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\"))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$/";
  5.     return ((preg_match($lengthPattern, $email) > 0) && (preg_match($syntaxPattern, $email) > 0)) ? true : false;
  6. }

The validation is broken down into two steps: checking the length of each part, and checking the syntax of each part.


^[^@]{1,64}@[^@]{1,255}$


The part before the @ symbol is called the local part, and cannot exceed 64 characters. The part after the @ symbol is called the domain part, and cannot exceed 255 characters.


^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\"))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$


In the check for syntax, the local part is validated by ((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\")). This is actually two patterns separated by the pipe character. The first, (([\w\+\-]+)(\.[\w\+\-]+)*), allows letters, numbers, the plus sign, and the hyphen (or minus sign if you prefer). It also allows periods, but not as the first or last character. The second, (\"[^(\\|\")]{0,62}\"), allows just about anything, provided the local part is enclosed in quotation marks (which is valid, but you'll probably never encounter it).


The domain part is validated by (([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?). Once again, this is two different patterns separated by the pipe character. The second pattern is our IP address checker from last week with optional enclosure in square brackets. The first pattern, ([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,}), allows a slightly smaller range of characters (no plus signs or underscores), any number of subdomains, and a top-level domain of at least 2 characters (the minimum). Some regular expressions will impose a maximum of six characters on the top-level domain (the longest at the moment is .museum), but that wouldn't allow for longer top-level domains that could be created in the future.


No comments: