Saturday, August 30, 2008

Roman Numerals, Part 3

Back in March, we wrote a function to turn an Arabic number into a Roman numeral. That function was limited to numbers smaller than 5000. There are ways of writing Roman numerals larger than 5000, but they are not as well accepted by purists because they evolved during later time periods. Out of respect for the purists, this new function will build on the old one rather than replace it.

In the system we will be using, a bar is placed over the numeral to indicate that it is multiplied by 1000:

  • V = 5000
  • X = 10,000
  • L = 50,000
  • C = 100,000
  • D = 500,000
  • M = 1 million

Since there are no Unicode characters for this purpose, we will have to cheat a little bit by using some HTML and CSS.


  function bigroman(ByVal arabic)
      dim thousands
      ' Indexed by the thousands digit (0-9); parentheses mark barred numerals.
      thousands = Array("", "M", "MM", "MMM", "M(V)", "(V)", "(V)M", "(V)MM", "(V)MMM", "M(X)")
      ' Multiples of 10,000 are divided by 1000 and written as one barred numeral.
      if arabic >= 10000 then
          bigroman = "(" & roman((arabic - (arabic mod 10000)) / 1000) & ")"
          arabic = arabic mod 10000
      end if
      bigroman = bigroman & thousands((arabic - (arabic mod 1000)) / 1000)
      arabic = arabic mod 1000
      ' Whatever is left (0-999) goes through the original function.
      bigroman = bigroman & roman(arabic)
      ' Convert parentheses to <span> tags.
      bigroman = replace(bigroman, "(", "<span style=""text-decoration: overline"">")
      bigroman = replace(bigroman, ")", "</span>")
  end function


  function bigroman($arabic)
  {
      // Indexed by the thousands digit (0-9); parentheses mark barred numerals.
      $thousands = array("", "M", "MM", "MMM", "M(V)", "(V)", "(V)M", "(V)MM", "(V)MMM", "M(X)");
      $bigroman = "";
      if ($arabic >= 10000)
      {
          $bigroman = "(" . roman(($arabic - fmod($arabic, 10000)) / 1000) . ")";
          $arabic = fmod($arabic, 10000);
      }
      // fmod() returns a float, so cast the index back to an integer.
      $bigroman .= $thousands[(int) (($arabic - fmod($arabic, 1000)) / 1000)];
      $arabic = fmod($arabic, 1000);
      $bigroman .= roman($arabic);
      // Convert parentheses to <span> tags.
      $bigroman = str_replace("(", '<span style="text-decoration: overline">', $bigroman);
      $bigroman = str_replace(")", "</span>", $bigroman);
      return $bigroman;
  }
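
To trace what the function builds before the HTML step, here is a minimal, self-contained sketch. The names roman_sketch(), bigroman_plain() and overline_markup() are made up for this illustration, and roman_sketch() is only a stand-in for the roman() function from the March post, which is not reproduced here.

```php
<?php
// Hypothetical stand-in for the earlier roman() function (handles small values only).
function roman_sketch(int $n): string
{
    $map = [1000 => 'M', 900 => 'CM', 500 => 'D', 400 => 'CD', 100 => 'C', 90 => 'XC',
            50 => 'L', 40 => 'XL', 10 => 'X', 9 => 'IX', 5 => 'V', 4 => 'IV', 1 => 'I'];
    $out = '';
    foreach ($map as $value => $numeral) {
        while ($n >= $value) {
            $out .= $numeral;
            $n -= $value;
        }
    }
    return $out;
}

// Builds the intermediate string in which parentheses mark barred numerals.
function bigroman_plain(int $arabic): string
{
    $thousands = ['', 'M', 'MM', 'MMM', 'M(V)', '(V)', '(V)M', '(V)MM', '(V)MMM', 'M(X)'];
    $out = '';
    if ($arabic >= 10000) {
        // Multiples of 10,000, divided by 1000, become one barred numeral.
        $out = '(' . roman_sketch(intdiv($arabic, 10000) * 10) . ')';
        $arabic %= 10000;
    }
    $out .= $thousands[intdiv($arabic, 1000)];
    return $out . roman_sketch($arabic % 1000);
}

// Swaps the parentheses for the overline <span> tags.
function overline_markup(string $plain): string
{
    return str_replace(
        ['(', ')'],
        ['<span style="text-decoration: overline">', '</span>'],
        $plain
    );
}
```

With these helpers, bigroman_plain(12345) gives "(X)MMCCCXLV", bigroman_plain(6789) gives "(V)MDCCLXXXIX", and bigroman_plain(1000000) gives "(M)".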

Saturday, August 23, 2008


In this final installment of the postal code trilogy, we turn our attention to the United Kingdom. Postal codes in the UK are called postcodes. They are similar to postal codes in Canada in that they contain both letters and numbers, but unlike Canadian postal codes, they are variable in length.

A postcode can have any of the following formats:

  • A9 9AA
  • A99 9AA
  • A9A 9AA
  • AA9 9AA
  • AA99 9AA
  • AA9A 9AA

To match all of these formats, we'll use the following regular expression: [A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}. You may notice that, like Canadian postal codes, certain letters are only allowed in certain positions, or not at all.
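
As a quick sanity check, the sketch below runs that pattern (anchored at both ends) against one sample postcode per format. The sample postcodes and variable names are only illustrative choices, not part of the original function.

```php
<?php
// The regular-format pattern from the text, anchored and case-insensitive.
$regularPostcode = '/^[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}$/i';

// One illustrative sample for each of the six formats.
$samples = [
    'A9 9AA'   => 'M1 1AE',
    'A99 9AA'  => 'B33 8TH',
    'A9A 9AA'  => 'W1A 0AX',
    'AA9 9AA'  => 'CR2 6XH',
    'AA99 9AA' => 'DN55 1PT',
    'AA9A 9AA' => 'EC1A 1BB',
];

$results = [];
foreach ($samples as $format => $postcode) {
    $results[$format] = preg_match($regularPostcode, $postcode) === 1;
}

// Letters that are not allowed in a given position are rejected.
$badFirstLetter = preg_match($regularPostcode, 'Q1 1AA') === 1; // Q never starts a postcode
$badLastLetters = preg_match($regularPostcode, 'M1 1CA') === 1; // C is not used in the final pair
```

Every entry in $results comes out true, while $badFirstLetter and $badLastLetters are both false.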

There are also a few special cases that are valid postcodes but deviate from the regular format:

  • Girobank - (GIR\ 0AA)
  • Father Christmas - (SAN\ TA1)
  • British Forces Post Office - (BFPO\ (C\/O\ )?[0-9]{1,4})
  • Overseas territories - ((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ)
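
The special cases can be tried on their own in the same way. This is only a sketch that joins the four patterns above into one alternation; the variable names and sample values are assumptions for illustration.

```php
<?php
// The four special-case patterns from the list, combined and anchored.
$specialPostcode = '/^((GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$/i';

$accepted = ['GIR 0AA', 'SAN TA1', 'BFPO 1234', 'BFPO C/O 12', 'ASCN 1ZZ', 'FIQQ 1ZZ'];
$rejected = ['BFPO 12345', 'XIQQ 1ZZ', 'GIR 0AB'];

$acceptedResults = [];
foreach ($accepted as $code) {
    $acceptedResults[] = preg_match($specialPostcode, $code) === 1;
}

$rejectedResults = [];
foreach ($rejected as $code) {
    $rejectedResults[] = preg_match($specialPostcode, $code) === 1;
}
```

All of the $accepted values match. The $rejected values fail because BFPO numbers are capped at four digits, only B, F, or S may precede IQQ, and the Girobank code is one fixed string.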


  function isValidPostCode(postCode)
      dim regEx
      set regEx = new RegExp
      with regEx
          .IgnoreCase = true
          .Global = true
          .Pattern = "^([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$"
      end with
      if regEx.Test(trim(CStr(postCode))) then
          isValidPostCode = true
      else
          isValidPostCode = false
      end if
      set regEx = nothing
  end function


  function isValidPostCode($postCode)
  {
      $pattern = "/^([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$/i";
      return preg_match($pattern, trim($postCode)) === 1;
  }

Saturday, August 16, 2008


This week we're turning our attention to my own country, Canada, and writing a function to validate postal codes. Unlike a ZIP code, a postal code contains letters too; the format is A1A 1A1. The simplest regular expression to validate this would be:

^[A-Z]{1}[\d]{1}[A-Z]{1}[ ]?[\d]{1}[A-Z]{1}[\d]{1}$

But not all letters are used, and some letters can only appear in certain positions. Taking this into account gives us a slightly more complex pattern of:

^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$
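
To see what the stricter pattern buys us, the sketch below runs both patterns over a few sample values. The samples and variable names are chosen only for illustration. Note that neither pattern is case-insensitive as written.

```php
<?php
// The two patterns from the text, unchanged.
$simple = '/^[A-Z]{1}[\d]{1}[A-Z]{1}[ ]?[\d]{1}[A-Z]{1}[\d]{1}$/';
$strict = '/^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$/';

$samples = ['K1A 0B1', 'K1A0B1', 'W1A 0B1', 'K1D 0B1', 'k1a 0b1'];

$comparison = [];
foreach ($samples as $code) {
    $comparison[$code] = [
        'simple' => preg_match($simple, $code) === 1,
        'strict' => preg_match($strict, $code) === 1,
    ];
}
```

"K1A 0B1" and "K1A0B1" pass both patterns. "W1A 0B1" and "K1D 0B1" pass only the simple one, because W is not used as a first letter and D is not used at all. The lowercase sample fails both.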

But I want to take this a step further and check the validity of the combination of the first three characters. The postal code C2B 4S3 has a valid format, but the code itself is not valid because there is no C2B postal area. I also want to check the postal code against the province to ensure that they match.
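
The full function checks the first three characters, but the province match can be roughed out from the first letter alone, since each first letter belongs to one province or territory (X is shared by the Northwest Territories and Nunavut). The sketch below is a simplified stand-in under that assumption, not the code published on Snipplr; the function name and the use of two-letter province abbreviations are my own choices for illustration.

```php
<?php
// Hypothetical helper: coarse province check based only on the first letter.
function postalCodeMatchesProvince(string $postalCode, string $province): bool
{
    $provincesByLetter = [
        'A' => ['NL'], 'B' => ['NS'], 'C' => ['PE'], 'E' => ['NB'],
        'G' => ['QC'], 'H' => ['QC'], 'J' => ['QC'],
        'K' => ['ON'], 'L' => ['ON'], 'M' => ['ON'], 'N' => ['ON'], 'P' => ['ON'],
        'R' => ['MB'], 'S' => ['SK'], 'T' => ['AB'], 'V' => ['BC'],
        'X' => ['NT', 'NU'], 'Y' => ['YT'],
    ];
    $letter = strtoupper(substr(trim($postalCode), 0, 1));
    // Unused first letters (D, F, I, O, Q, U, W, Z) never match any province.
    return isset($provincesByLetter[$letter])
        && in_array(strtoupper($province), $provincesByLetter[$letter], true);
}
```

This does not catch a code like C2B 4S3, whose first letter is fine but whose three-character prefix does not exist. That still needs a list of valid prefixes.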

The full source code is too long to post here, but is available in its entirety from Snipplr in the language of your choice:

Alternatively, if you wanted the extended validation, but didn't care about the province matching, you could combine everything into one gigantic regular expression:


Saturday, August 9, 2008


Another week, another validation function. Last week's was a little long, so this week we'll do a shorter one: validating a United States ZIP code. The pattern is simple enough that I don't think an explanation is warranted.


  function isValidZIPCode(zipCode)
      dim regEx
      set regEx = new RegExp
      with regEx
          .IgnoreCase = True
          .Global = True
          .Pattern = "^[0-9]{5}(-[0-9]{4})?$"
      end with
      if regEx.Test(trim(CStr(zipCode))) then
          isValidZIPCode = True
      else
          isValidZIPCode = False
      end if
      set regEx = nothing
  end function


  function isValidZIPCode($zipCode)
  {
      return preg_match("/^[0-9]{5}(-[0-9]{4})?$/i", trim($zipCode)) === 1;
  }
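
For anyone who does want to see the pattern in action, here is a small sketch with a few sample inputs (the samples and variable names are just for illustration). It accepts five digits, optionally followed by a hyphen and four more.

```php
<?php
// The ZIP code pattern from the function above.
$zipPattern = '/^[0-9]{5}(-[0-9]{4})?$/';

$samples = ['90210', '12345-6789', '1234', '12345-678', '123456789', 'ABCDE'];

$zipResults = [];
foreach ($samples as $zip) {
    $zipResults[$zip] = preg_match($zipPattern, $zip) === 1;
}
```

Only "90210" and "12345-6789" come out true. The nine-digit form without a hyphen is rejected.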

Saturday, August 2, 2008


This week we're going to build on the IP address regular expression we wrote last week in order to validate e-mail addresses. What do IP addresses have to do with e-mail addresses? Just as domain names map to IP addresses, the domain part of an e-mail address can be replaced with an IP address, so instead of you could have person@


  function isValidEmail(email)
      dim regEx
      dim result
      set regEx = new RegExp
      with regEx
          .IgnoreCase = True
          .Global = True
          .Pattern = "^[^@]{1,64}@[^@]{1,255}$"
      end with
      result = false
      ' Test length.
      if regEx.Test(email) then
          ' A literal quotation mark inside a VBScript string is written as "".
          regEx.Pattern = "^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\""[^(\\|\"")]{0,62}\""))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$"
          ' Test syntax.
          if regEx.Test(email) then
              result = true
          end if
      end if
      isValidEmail = result
      set regEx = nothing
  end function


  function isValidEmail($email)
  {
      $lengthPattern = '/^[^@]{1,64}@[^@]{1,255}$/';
      // Single quotes pass the backslashes through to the regex engine;
      // \\\\ is needed so that the pattern itself contains \\ (a literal backslash).
      $syntaxPattern = '/^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\\\|\")]{0,62}\"))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$/';
      return preg_match($lengthPattern, $email) === 1 && preg_match($syntaxPattern, $email) === 1;
  }

The validation is broken down into two steps: checking the length of each part, and checking the syntax of each part.


The part before the @ symbol is called the local part, and cannot exceed 64 characters. The part after the @ symbol is called the domain part, and cannot exceed 255 characters.
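
The length limits are easy to probe on their own. The sketch below uses the length pattern from the function with generated strings; the variable names and the example.com domain are only placeholders for illustration.

```php
<?php
// The length pattern from isValidEmail(): 1-64 characters, @, 1-255 characters.
$lengthPattern = '/^[^@]{1,64}@[^@]{1,255}$/';

$localAtLimit   = preg_match($lengthPattern, str_repeat('a', 64) . '@example.com') === 1;
$localTooLong   = preg_match($lengthPattern, str_repeat('a', 65) . '@example.com') === 1;
$domainAtLimit  = preg_match($lengthPattern, 'a@' . str_repeat('b', 255)) === 1;
$domainTooLong  = preg_match($lengthPattern, 'a@' . str_repeat('b', 256)) === 1;
$twoAtSigns     = preg_match($lengthPattern, 'a@b@c') === 1;
$emptyLocalPart = preg_match($lengthPattern, '@example.com') === 1;
```

Only $localAtLimit and $domainAtLimit are true. Because both halves are written as [^@], this step also rejects anything with more than one @ symbol.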


In the check for syntax, the local part is validated by ((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\")). This is actually two patterns separated by the pipe character. The first, (([\w\+\-]+)(\.[\w\+\-]+)*), allows letters, numbers, the plus sign, and the hyphen (or minus sign if you prefer). It also allows periods, but not as the first or last character. The second, (\"[^(\\|\")]{0,62}\"), allows just about anything, provided the local part is enclosed in quotation marks (which is valid, but you'll probably never encounter it).
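
Here is the local-part pattern pulled out and anchored by itself, as a sketch for experimenting. The variable names and sample values are assumptions for illustration only.

```php
<?php
// Local-part pattern from the text, anchored so it can be tested alone.
// In a single-quoted PHP string, \\\\ produces the regex escape \\ (a literal backslash).
$localPattern = '/^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\\\|\")]{0,62}\"))$/';

$samples = [
    'john.smith'   => true,  // periods are fine in the middle
    'john+tag'     => true,  // plus signs are allowed
    '.john'        => false, // period as the first character
    'john.'        => false, // period as the last character
    'john..smith'  => false, // each period must be followed by more characters
    'john smith'   => false, // spaces are not allowed unquoted
    '"john smith"' => true,  // but they are inside quotation marks
];

$localResults = [];
foreach ($samples as $local => $expected) {
    $localResults[$local] = preg_match($localPattern, $local) === 1;
}
```

Each entry in $localResults agrees with the expected value shown next to its sample.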

The domain part is validated by (([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?). Once again, this is two different patterns separated by the pipe character. The second pattern is our IP address checker from last week with optional enclosure in square brackets. The first pattern, ([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,}), allows a slightly smaller range of characters (no plus signs or underscores), any number of subdomains, and a top-level domain of at least 2 characters (the minimum). Some regular expressions will impose a maximum of six characters on the top-level domain (the longest at the moment is .museum), but that wouldn't allow for longer top-level domains that could be created in the future.
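
The domain-part pattern can be tried by itself in the same way. This is a sketch only, and the variable names and sample domains are placeholders chosen for illustration.

```php
<?php
// Domain-part pattern from the text, anchored so it can be tested alone.
$domainPattern = '/^(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$/';

$samples = [
    'example.com'         => true,  // ordinary domain
    'mail.example.museum' => true,  // subdomain plus a long top-level domain
    'example.c'           => false, // top-level domain shorter than 2 characters
    'exa_mple.com'        => false, // underscores are not allowed here
    '192.168.0.1'         => true,  // bare IP address
    '[192.168.0.1]'       => true,  // IP address in square brackets
    '256.1.1.1'           => false, // first octet out of range
];

$domainResults = [];
foreach ($samples as $domain => $expected) {
    $domainResults[$domain] = preg_match($domainPattern, $domain) === 1;
}
```

Each entry in $domainResults agrees with the expected value shown next to its sample.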