Reusable Code: August 2008

Saturday, August 30, 2008

Roman Numerals, Part 3

Back in March, we wrote a function to turn an Arabic number into a Roman numeral. That function was limited to numbers smaller than 5000. There are ways of writing larger Roman numerals, but they are not as well accepted by purists because they evolved during later time periods. Out of respect for the purists, this new function will build on the old one rather than replace it.

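In the system we will be using, a bar is placed over the numeral to indicate that it is multiplied by 1000. Plain text cannot show the bar, so barred numerals are written here in parentheses, the same placeholder the code uses:

(V) = 5000
(X) = 10,000
(L) = 50,000
(C) = 100,000
(D) = 500,000
(M) = 1 million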

Since there are no Unicode characters for this purpose, we will have to cheat a little bit by using some HTML and CSS.

ASP

function bigroman(ByVal arabic)
    dim thousands
    thousands = Array("", "M", "MM", "MMM", "M(V)", "(V)", "(V)M", "(V)MM", "(V)MMM", "M(X)")
    bigroman = ""
    if arabic >= 10000 then
        bigroman = "(" & roman((arabic - (arabic mod 10000)) / 1000) & ")"
        arabic = arabic mod 10000
    end if
    bigroman = bigroman & thousands((arabic - (arabic mod 1000)) / 1000)
    arabic = arabic mod 1000
    bigroman = bigroman & roman(arabic)
    ' Convert parentheses to tags (overline markup assumed; the original tags were lost).
    bigroman = replace(bigroman, "(", "<span style=""text-decoration: overline;"">")
    bigroman = replace(bigroman, ")", "</span>")
end function

PHP

function bigroman($arabic)
{
    $thousands = array("", "M", "MM", "MMM", "M(V)", "(V)", "(V)M", "(V)MM", "(V)MMM", "M(X)");
    // Start with an empty string so the .= below never touches an undefined variable.
    $bigroman = "";
    if ($arabic >= 10000)
    {
        $bigroman = "(" . roman(($arabic - fmod($arabic, 10000)) / 1000) . ")";
        $arabic = fmod($arabic, 10000);
    }
    $bigroman .= $thousands[(int) (($arabic - fmod($arabic, 1000)) / 1000)];
    $arabic = fmod($arabic, 1000);
    $bigroman .= roman($arabic);
    // Convert parentheses to tags (overline markup assumed; the original tags were lost).
    $bigroman = str_replace("(", "<span style=\"text-decoration: overline;\">", $bigroman);
    $bigroman = str_replace(")", "</span>", $bigroman);
    return $bigroman;
}
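
To see what the function produces before the parentheses become markup, here is a minimal Python sketch of the same arithmetic. It is only an illustration: `bigroman_plain` and the stand-in `roman` helper are names assumed for this example (the March function is not reproduced in this post), and the sketch stops at the parenthesised form instead of emitting HTML.

```python
def roman(arabic: int) -> str:
    """Hypothetical stand-in for the earlier roman() function (0 to 4999)."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = ""
    for value, symbol in numerals:
        while arabic >= value:
            result += symbol
            arabic -= value
    return result


# Same lookup table as the ASP and PHP versions: index = number of thousands.
THOUSANDS = ["", "M", "MM", "MMM", "M(V)", "(V)", "(V)M", "(V)MM", "(V)MMM", "M(X)"]


def bigroman_plain(arabic: int) -> str:
    """Return the numeral with barred groups written in parentheses."""
    result = ""
    if arabic >= 10000:
        # Everything from the ten-thousands up goes under one bar.
        result = "(" + roman(arabic // 10000 * 10) + ")"
        arabic %= 10000
    result += THOUSANDS[arabic // 1000]
    result += roman(arabic % 1000)
    return result
```

With these definitions, bigroman_plain(25000) gives "(XX)(V)" and bigroman_plain(9999) gives "M(X)CMXCIX". Each parenthesised group is what the final step would wrap in a bar.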

Saturday, August 23, 2008

isValidPostCode

In this final installment of the postal code trilogy, we turn our attention to the United Kingdom. Postal codes in the UK are called postcodes. They are similar to postal codes in Canada in that they contain both letters and numbers, but unlike Canadian postal codes, they are variable in length.

A postcode can have any of the following formats:

A9 9AA
A99 9AA
A9A 9AA
AA9 9AA
AA99 9AA
AA9A 9AA

To match all of these formats, we'll use the following regular expression: [A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}. You may notice that, like Canadian postal codes, certain letters are only allowed in certain positions, or not at all.

There are also a few special cases that are valid postcodes but deviate from the regular format:

Girobank - (GIR\ 0AA)
Father Christmas - (SAN\ TA1)
British Forces Post Office - (BFPO\ (C\/O\ )?[0-9]{1,4})
Overseas territories - ((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ)

ASP

function isValidPostCode(postCode)
    dim regEx
    set regEx = new RegExp
    with regEx
        .IgnoreCase = true
        .Global = true
        .Pattern = "^([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$"
    end with
    if regEx.Test(trim(CStr(postCode))) then
        isValidPostCode = true
    else
        isValidPostCode = false
    end if
    set regEx = nothing
end function

PHP

function isValidPostCode($postCode)
{
    $pattern = "/^([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$/i";
    return (preg_match($pattern, trim($postCode)) > 0) ? true : false;
}
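
For readers who want to try the pattern outside ASP or PHP, the following is a rough Python sketch of the same check. The name `is_valid_postcode` is an assumption made for this example. The pattern is the one above, split over three lines for readability, and `fullmatch` takes the place of the ^ and $ anchors.

```python
import re

# The regular format, then the special cases, exactly as in the post.
POSTCODE_PATTERN = re.compile(
    r"([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}"
    r"|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})"
    r"|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))",
    re.IGNORECASE,
)


def is_valid_postcode(post_code: str) -> bool:
    """Return True when the trimmed input matches the whole pattern."""
    return POSTCODE_PATTERN.fullmatch(post_code.strip()) is not None


if __name__ == "__main__":
    for sample in ["M1 1AA", "EC1A 1BB", "GIR 0AA", "BFPO 801", "Q1 1AA", "M11AA"]:
        print(sample, is_valid_postcode(sample))
```

Note that the single space is mandatory in this pattern, so "M11AA" is rejected even though the characters themselves are fine.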

Saturday, August 16, 2008

isValidPostalCode

This week we're turning our attention to my own country, Canada, and writing a function to validate postal codes. Unlike a ZIP code, a postal code contains letters too; the format is A1A 1A1. The simplest regular expression to validate this would be:

^[A-Z]{1}[\d]{1}[A-Z]{1}[ ]?[\d]{1}[A-Z]{1}[\d]{1}$

But not all letters are used, and some letters can only appear in certain positions. Taking this into account gives us a slightly more complex pattern of:

^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$
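
The difference between the two patterns is easiest to see side by side. Here is a small Python sketch, with `SIMPLE`, `STRICT` and `matches` as names assumed for this example, that runs the same input through both. The input is upper-cased first, which is an assumption of the sketch rather than something the patterns do themselves.

```python
import re

# Any letter in any letter position.
SIMPLE = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")
# Only the letters that are actually used in each position.
STRICT = re.compile(r"[ABCEGHJ-NPRSTVXY][0-9][ABCEGHJ-NPRSTV-Z] ?[0-9][ABCEGHJ-NPRSTV-Z][0-9]")


def matches(pattern: re.Pattern, postal_code: str) -> bool:
    """Return True when the trimmed, upper-cased code matches the whole pattern."""
    return pattern.fullmatch(postal_code.strip().upper()) is not None


if __name__ == "__main__":
    for sample in ["K1A 0B1", "k1a0b1", "D1A 1A1", "W1A 1A1"]:
        print(sample, matches(SIMPLE, sample), matches(STRICT, sample))
```

D1A 1A1 and W1A 1A1 pass the simple pattern but fail the stricter one, because D is never used and W cannot appear in the first position.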

But I want to take this a step further and check the validity of the combination of the first three characters. The postal code C2B 4S3 has a valid format, but the code itself is not valid because there is no C2B postal area. I also want to check the postal code against the province to ensure that they match.

The full source code is too long to post here, but is available in its entirety from Snipplr in the language of your choice.

Alternatively, if you wanted the extended validation, but didn't care about the province matching, you could combine everything into one gigantic regular expression:

^(A(0[ABCEGHJ-NPR]|1[ABCEGHK-NSV-Y]|2[ABHNV]|5[A]|8[A])|B(0[CEHJ-NPRSTVW]|1[ABCEGHJ-NPRSTV-Y]|2[ABCEGHJNRSTV-Z]|3[ABEGHJ-NPRSTVZ]|4[ABCEGHNPRV]|5[A]|6[L]|9[A])|C(0[AB]|1[ABCEN])|E(1[ABCEGHJNVWX]|2[AEGHJ-NPRSV]|3[ABCELNVYZ]|4[ABCEGHJ-NPRSTV-Z]|5[ABCEGHJ-NPRSTV]|6[ABCEGHJKL]|7[ABCEGHJ-NP]|8[ABCEGJ-NPRST]|9[ABCEGH])|G(0[ACEGHJ-NPRSTV-Z]|1[ABCEGHJ-NPRSTV-Y]|2[ABCEGJ-N]|3[ABCEGHJ-NZ]|4[ARSTVWXZ]|5[ABCHJLMNRTVXYZ]|6[ABCEGHJKLPRSTVWXZ]|7[ABGHJKNPSTXYZ]|8[ABCEGHJ-NPTVWYZ]|9[ABCHNPRTX])|H(0[HM]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NPRSTV-Z]|4[ABCEGHJ-NPRSTV-Z]|5[AB]|7[ABCEGHJ-NPRSTV-Y]|8[NPRSTYZ]|9[ABCEGHJKPRSWX])|J(0[ABCEGHJ-NPRSTV-Z]|1[ACEGHJ-NRSTXZ]|2[ABCEGHJ-NRSTWXY]|3[ABEGHLMNPRTVXYZ]|4[BGHJ-NPRSTV-Z]|5[ABCJ-MRTV-Z]|6[AEJKNRSTVWYXZ]|7[ABCEGHJ-NPRTV-Z]|8[ABCEGHLMNPRTVXYZ]|9[ABEHJLNTVXYZ])|K(0[ABCEGHJ-M]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-MPRSTVW]|4[ABCKMPR]|6[AHJKTV]|7[ACGHK-NPRSV]|8[ABHNPRV]|9[AHJKLV])|L(0[ABCEGHJ-NPRS]|1[ABCEGHJ-NPRSTV-Z]|2[AEGHJMNPRSTVW]|3[BCKMPRSTVXYZ]|4[ABCEGHJ-NPRSTV-Z]|5[ABCEGHJ-NPRSTVW]|6[ABCEGHJ-MPRSTV-Z]|7[ABCEGJ-NPRST]|8[EGHJ-NPRSTVW]|9[ABCGHK-NPRSTVWYZ])|M(1[BCEGHJ-NPRSTVWX]|2[HJ-NPR]|3[ABCHJ-N]|4[ABCEGHJ-NPRSTV-Y]|5[ABCEGHJ-NPRSTVWX]|6[ABCEGHJ-NPRS]|7[AY]|8[V-Z]|9[ABCLMNPRVW])|N(0[ABCEGHJ-NPR]|1[ACEGHKLMPRST]|2[ABCEGHJ-NPRTVZ]|3[ABCEHLPRSTVWY]|4[BGKLNSTVWXZ]|5[ACHLPRV-Z]|6[ABCEGHJ-NP]|7[AGLMSTVWX]|8[AHMNPRSTV-Y]|9[ABCEGHJKVY])|P(0[ABCEGHJ-NPRSTV-Y]|1[ABCHLP]|2[ABN]|3[ABCEGLNPY]|4[NPR]|5[AEN]|6[ABC]|7[ABCEGJKL]|8[NT]|9[AN])|R(0[ABCEGHJ-M]|1[ABN]|2[CEGHJ-NPRV-Y]|3[ABCEGHJ-NPRSTV-Y]|4[AHJKL]|5[AGH]|6[MW]|7[ABCN]|8[AN]|9[A])|S(0[ACEGHJ-NP]|2[V]|3[N]|4[AHLNPRSTV-Z]|6[HJKVWX]|7[HJ-NPRSTVW]|9[AHVX])|T(0[ABCEGHJ-MPV]|1[ABCGHJ-MPRSV-Y]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NPRZ]|4[ABCEGHJLNPRSTVX]|5[ABCEGHJ-NPRSTV-Z]|6[ABCEGHJ-NPRSTVWX]|7[AENPSVXYZ]|8[ABCEGHLNRSVWX]|9[ACEGHJKMNSVWX])|V(0[ABCEGHJ-NPRSTVWX]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NRSTV-Y]|4[ABCEGK-NPRSTVWXZ]|5[ABCEGHJ-NPRSTV-Z]|6[ABCEGHJ-NPRSTV-Z]|7[ABCEGHJ-NPRSTV-Y]|8[ABCGJ-NPRSTV-Z]|9[ABCEGHJ-NPRSTV-Z])|X(0[ABCGX]|1[A])|Y(0[AB]|1[A]))[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$
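
The province check mentioned earlier is not shown in this post, so the following Python sketch is only a coarse approximation of the idea, not the Snipplr source. It assumes the province is supplied as a two-letter abbreviation and looks only at the first letter of the postal code, which identifies the province or territory. The names `FIRST_LETTER_PROVINCES` and `postal_code_matches_province` are invented for this example.

```python
# First letter of the postal code -> provinces/territories it can belong to.
FIRST_LETTER_PROVINCES = {
    "A": {"NL"}, "B": {"NS"}, "C": {"PE"}, "E": {"NB"},
    "G": {"QC"}, "H": {"QC"}, "J": {"QC"},
    "K": {"ON"}, "L": {"ON"}, "M": {"ON"}, "N": {"ON"}, "P": {"ON"},
    "R": {"MB"}, "S": {"SK"}, "T": {"AB"}, "V": {"BC"},
    "X": {"NT", "NU"},  # the two territories share the letter X
    "Y": {"YT"},
}


def postal_code_matches_province(postal_code: str, province: str) -> bool:
    """Return True when the code's first letter is used by the given province."""
    code = postal_code.strip().upper()
    if not code:
        return False
    allowed = FIRST_LETTER_PROVINCES.get(code[0], set())
    return province.strip().upper() in allowed


if __name__ == "__main__":
    print(postal_code_matches_province("K1A 0B1", "ON"))
    print(postal_code_matches_province("K1A 0B1", "QC"))
```

This sketch does not validate the format at all, so it would be run after one of the patterns above has already accepted the code.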

Saturday, August 9, 2008

isValidZipCode

Another week, another validation function. Last week's was a little long, so this week we'll do a shorter one: validating a United States ZIP code. The pattern is simple enough that I don't think an explanation is warranted.

ASP

function isValidZIPCode(zipCode)
    dim regEx
    set regEx = new RegExp
    with regEx
        .IgnoreCase = True
        .Global = True
        .Pattern = "^[0-9]{5}(-[0-9]{4})?$"
    end with
    if regEx.Test(trim(CStr(zipCode))) then
        isValidZIPCode = True
    else
        isValidZIPCode = False
    end if
    set regEx = nothing
end function

PHP

function isValidZIPCode($zipCode)
{
    return (preg_match("/^[0-9]{5}(-[0-9]{4})?$/i", trim($zipCode)) > 0) ? true : false;
}
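
For anyone who does want the pattern spelled out: five digits, optionally followed by a hyphen and four more. A minimal Python sketch of the same check follows, with `is_valid_zip_code` as a name assumed for this example. It uses `fullmatch` in place of the ^ and $ anchors, since in Python a bare $ also matches just before a trailing newline.

```python
import re

# Five digits, then an optional hyphen plus four digits (ZIP+4).
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_valid_zip_code(zip_code) -> bool:
    """Return True when the trimmed input is a 5-digit or ZIP+4 code."""
    return ZIP_PATTERN.fullmatch(str(zip_code).strip()) is not None


if __name__ == "__main__":
    for sample in ["90210", "90210-1234", "9021", "90210-123", "902101234"]:
        print(sample, is_valid_zip_code(sample))
```

ZIP codes are best kept as strings. A code such as 00501 loses its leading zeros when stored as a number and would then fail the check.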

Saturday, August 2, 2008

isValidEmail

This week we're going to build on the IP address regular expression we wrote last week in order to validate e-mail addresses. What do IP addresses have to do with e-mail addresses? Just as domain names map to IP addresses, the domain part of an e-mail address can be replaced with an IP address, so instead of person@example.com you could have person@192.168.1.1

ASP

function isValidEmail(email)
    dim regEx
    dim result
    set regEx = new RegExp
    with regEx
        .IgnoreCase = True
        .Global = True
        .Pattern = "^[^@]{1,64}@[^@]{1,255}$"
    end with
    result = false
    ' Test length.
    if regEx.Test(email) then
        ' Quotation marks inside a VBScript string are doubled, not backslash-escaped.
        regEx.Pattern = "^((([\w\+\-]+)(\.[\w\+\-]+)*)|(""[^(\\|"")]{0,62}""))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$"
        ' Test syntax.
        if regEx.Test(email) then
            result = true
        end if
    end if
    isValidEmail = result
    set regEx = nothing
end function

PHP

function isValidEmail($email)
{
    $lengthPattern = "/^[^@]{1,64}@[^@]{1,255}$/";
    // The backslash pair is doubled again here because a double-quoted PHP string collapses \\ to \ before the regex engine sees it.
    $syntaxPattern = "/^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\\\|\")]{0,62}\"))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$/";
    return ((preg_match($lengthPattern, $email) > 0) && (preg_match($syntaxPattern, $email) > 0)) ? true : false;
}

The validation is broken down into two steps: checking the length of each part, and checking the syntax of each part.

^[^@]{1,64}@[^@]{1,255}$

The part before the @ symbol is called the local part, and cannot exceed 64 characters. The part after the @ symbol is called the domain part, and cannot exceed 255 characters.
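
The length step can be tried on its own. This is a small Python sketch of it, with `has_valid_lengths` as a name assumed for this example. Because neither side may contain an @, the pattern also insists on exactly one @ in the address.

```python
import re

# 1 to 64 non-@ characters, one @, then 1 to 255 non-@ characters.
LENGTH_PATTERN = re.compile(r"[^@]{1,64}@[^@]{1,255}")


def has_valid_lengths(email: str) -> bool:
    """Return True when the local and domain parts are within the length limits."""
    return LENGTH_PATTERN.fullmatch(email) is not None


if __name__ == "__main__":
    print(has_valid_lengths("person@example.com"))
    print(has_valid_lengths("a" * 65 + "@example.com"))
```

This step says nothing about which characters are allowed. That is left to the syntax pattern.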

^((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\"))@(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)$

In the check for syntax, the local part is validated by ((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\")). This is actually two patterns separated by the pipe character. The first, (([\w\+\-]+)(\.[\w\+\-]+)*), allows letters, numbers, underscores, the plus sign, and the hyphen (or minus sign if you prefer). It also allows periods, but not as the first or last character, and not two in a row. The second, (\"[^(\\|\")]{0,62}\"), allows just about anything, provided the local part is enclosed in quotation marks (which is valid, but you'll probably never encounter it).
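
To get a feel for what the local-part pattern accepts, here is a hedged Python sketch that tests it in isolation. The name `is_valid_local_part` is an assumption of this example, and `re.ASCII` is passed so that \w means only letters, digits and the underscore, since Python would otherwise accept accented letters as well.

```python
import re

# Local-part pattern from the post: dotted words, or a quoted string.
LOCAL_PART_PATTERN = re.compile(
    r'((([\w\+\-]+)(\.[\w\+\-]+)*)|(\"[^(\\|\")]{0,62}\"))',
    re.ASCII,
)


def is_valid_local_part(local_part: str) -> bool:
    """Return True when the whole string matches the local-part pattern."""
    return LOCAL_PART_PATTERN.fullmatch(local_part) is not None


if __name__ == "__main__":
    for sample in ["john.doe", "john+tag", ".john", "john.", "john..doe", '"john doe"']:
        print(sample, is_valid_local_part(sample))
```

The quoted form is what lets a space through: "john doe" with the quotation marks passes, while john doe without them does not.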

The domain part is validated by (([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?). Once again, this is two different patterns separated by the pipe character. The second pattern is our IP address checker from last week with optional enclosure in square brackets. The first pattern, ([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,}), allows a slightly smaller range of characters (no plus signs or underscores), any number of subdomains, and a top-level domain of at least 2 characters (the minimum). Some regular expressions will impose a maximum of six characters on the top-level domain (the longest at the moment is .museum), but that wouldn't allow for longer top-level domains that could be created in the future.
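
The domain-part pattern can be exercised the same way. The Python sketch below uses `is_valid_domain_part` as a name assumed for this example, and again passes `re.ASCII` so that \d means only the digits 0 to 9.

```python
import re

# Domain-part pattern from the post: a dotted domain name, or an IP address
# with optional square brackets.
DOMAIN_PART_PATTERN = re.compile(
    r"(([a-zA-Z0-9\-]+\.)+([a-zA-Z0-9]{2,})"
    r"|\[?([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})"
    r"(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3}\]?)",
    re.ASCII,
)


def is_valid_domain_part(domain: str) -> bool:
    """Return True when the whole string matches the domain-part pattern."""
    return DOMAIN_PART_PATTERN.fullmatch(domain) is not None


if __name__ == "__main__":
    for sample in ["example.com", "example.museum", "192.168.1.1", "[192.168.1.1]", "localhost"]:
        print(sample, is_valid_domain_part(sample))
```

A bare host name such as localhost is rejected because the first pattern requires at least one period, and exa_mple.com is rejected because the underscore is not in the allowed set.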
