Rejecting unwanted characters from input

Some common programming tasks stump us from time to time. Take the task of filtering an input search string in PHP to remove unwanted characters. Many developers find it easy to use a regular expression to search for a substring, but difficult to use one to reject particular characters from a string. A simple solution is shown below; it removes every character from the input except alphanumeric characters and spaces.

$search = "the great /%&&world ,fair of 1964";
$cleaned = preg_replace("/[^A-Za-z0-9 ]/", "", $search);

Returns:

the great world fair of 1964

The important part of the regular expression is the caret ^ together with the character class […]. A normal character class matches the characters listed in it. For example, the following matches the alphanumeric characters and spaces and replaces them with an empty string, effectively removing them, because we have specified an empty string as the second parameter to preg_replace.

$search = "the great /%&&world ,fair of 1964";
$cleaned = preg_replace("/[A-Za-z0-9 ]/", "", $search);

Returns:

/%&&,

However, if we use a negated character class, which is a character class starting with a caret ^, we invert the meaning of the class. It now matches every character not listed in the class, so replacing the matches with an empty string removes all the unwanted characters.
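As a minimal sketch of the point above, the snippet below wraps the negated class in a small helper and contrasts it with a class where the caret is not the first character. The function name clean_search is hypothetical and only used for illustration; it is not part of PHP.

```php
<?php
// Hypothetical helper (name chosen for this example): keeps only ASCII
// letters, digits, and spaces, and removes everything else.
function clean_search(string $input): string
{
    // The caret directly after "[" negates the class.
    // preg_replace returns null on error, so fall back to an empty string.
    return preg_replace('/[^A-Za-z0-9 ]/', '', $input) ?? '';
}

// Caret first: every character NOT listed in the class is matched and removed.
echo clean_search("the great /%&&world ,fair of 1964"), "\n"; // the great world fair of 1964

// Caret anywhere else in the class is a literal "^", so only "a" and "^" are removed.
echo preg_replace('/[a^]/', '', "a^b"), "\n"; // b
```

The caret only inverts the class when it is the first character after the opening bracket, which is why its position matters.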

If you are using ereg_replace, you need to use the following, which does not have any delimiters. Note that this function has been deprecated as of PHP 5.3.0 and raises an E_DEPRECATED notice when called; it was removed entirely in PHP 7.0.0.

$cleaned = ereg_replace("[^A-Za-z0-9 ]", "", $search);
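On PHP versions where ereg_replace is unavailable, the same pattern can be passed to preg_replace once delimiters are added. The following is a rough sketch of that conversion; the variable names $posixPattern and $keepSlashes are made up for this example, and the second call assumes you also want to keep forward slashes.

```php
<?php
$search = "the great /%&&world ,fair of 1964";

// The pattern as it would be passed to ereg_replace (no delimiters).
$posixPattern = "[^A-Za-z0-9 ]";

// preg_replace needs delimiters around the same pattern.
$cleaned = preg_replace("/" . $posixPattern . "/", "", $search);
echo $cleaned, "\n"; // the great world fair of 1964

// The delimiter does not have to be a slash. Using "#" avoids escaping
// when the class itself contains a slash, as it does here.
$keepSlashes = preg_replace("#[^A-Za-z0-9 /]#", "", $search);
echo $keepSlashes, "\n"; // the great /world fair of 1964
```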

