Easy manipulation of URLs


Whether you are creating URLs dynamically or changing existing ones, URL manipulation is a frequent requirement during development. It is easy for short URLs, but quickly becomes complex for URLs with many query parameters.
In this post we will see how to use the Net_URL2 PEAR package to manipulate URLs.

General URL syntax

Before we start, a general URL syntax review will be useful. The most general form of a URL contains only two elements:

<scheme>:<scheme-specific-part>

The term scheme refers to a type of access method, such as ftp, http, telnet or file, which describes how the resource that follows is to be accessed. The rest of the URL, after the scheme, depends on the scheme type.

A complete generalized syntax for http and ftp is shown below.

<scheme>://<user>:<password>@<host>:<port>/<url-path>;<params>?<query>#<fragment>
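
To make the pieces of this syntax concrete, here is a small sketch that splits a URL into its components using PHP's built-in parse_url() function rather than Net_URL2. The sample URL is made up for illustration. Note that parse_url() does not split out the ;<params> part; it stays inside the path.

```php
<?php
// Hypothetical sample URL that exercises most components of the generic syntax.
$sample = 'ftp://user:secret@ftp.example.com:2121/pub/files/readme.txt?type=a#top';

// parse_url() returns an associative array holding only the components present.
$parts = parse_url($sample);

$names = array('scheme', 'user', 'pass', 'host', 'port', 'path', 'query', 'fragment');
foreach ($names as $name) {
    // Components missing from the URL are simply absent from the array.
    $value = isset($parts[$name]) ? $parts[$name] : '(none)';
    echo str_pad($name, 10) . ': ' . $value . "\n";
}
```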

Installation

Since Net_URL2 is a PEAR package, we will install it with the PEAR installer as shown below. I recommend always using the PEAR installer rather than downloading packages manually, as the installer automatically downloads any dependent packages.

pear install Net_URL2-0.3.0

Reading url data

Now that we have seen what a general URL looks like, it's time to move on to real examples. In this example we will use the following sample URL.

http://www.some-domain.com:80/search.php?q=beatles&id=56&cat=music

Below is an example using the Net_URL2 library and its output for the above url:

<?php
 
include('Net/URL2.php');
 
/* The URL must be a single unbroken string: no line breaks or spaces */
$url = new Net_URL2('http://www.some-domain.com:80/search.php?q=beatles&id=56&cat=music');
 
echo "Host      :    " . $url->host . "\n";
echo "Protocol  :    " . $url->scheme. "\n";
echo "Port      :    " . $url->port . "\n";
echo "Path      :    " . $url->path . "\n";
 
echo "Query Variables: \n";
print_r($url->QueryVariables);
 
?>

Which will output the following:

Host      :    www.some-domain.com
Protocol  :    http
Port      :    80
Path      :    /search.php
Query Variables: 
Array
(
    [q] => beatles
    [id] => 56
    [cat] => music
)
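
For comparison, the query part of the same URL can also be decoded with PHP's built-in functions. This is only a sketch of the plain PHP route, not what Net_URL2 does internally; the variable names are my own.

```php
<?php
$sampleUrl = 'http://www.some-domain.com:80/search.php?q=beatles&id=56&cat=music';

// Pull out just the raw query string: "q=beatles&id=56&cat=music".
$query = parse_url($sampleUrl, PHP_URL_QUERY);

// Decode the query string into an associative array of name => value.
$vars = array();
parse_str($query, $vars);

print_r($vars);
```

All values come back as strings, so 'id' is the string "56" rather than an integer.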

Changing url data

We can change the various URL parameters as easily as we can read them.

.
.
$url->scheme = "https";
$url->path = "/my_search";
 
/* Get the query variables array */
$queryVars = $url->QueryVariables;
 
/* Change some url parameters */
$queryVars['q'] = "Scarlett Johansson";
$queryVars['cat'] = "movies";
$queryVars['pics'] = 1;
 
/* Save back the query variables array */
$url->QueryVariables = $queryVars;
 
/* Display the changed url */
echo $url->getURL();

Which will change the example url to the following:

https://www.some-domain.com:80/my_search?q=Scarlett%20Johansson&id=56&cat=movies&pics=1

Note the changed parameter values. Also note that we have added a new ‘pics’ parameter to the URL.
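
To see what the library saves you, here is a rough sketch of a similar change done by hand with PHP's built-in parse_url(), parse_str() and http_build_query(). It assumes the URL has a scheme, host, port, path and query, and it ignores user credentials and fragments, so treat it as an illustration rather than a general replacement.

```php
<?php
$parts = parse_url('http://www.some-domain.com:80/search.php?q=beatles&id=56&cat=music');

// Decode the query string, then change and add parameters.
parse_str($parts['query'], $vars);
$vars['q'] = 'Scarlett Johansson';
$vars['cat'] = 'movies';
$vars['pics'] = 1;

// Change the scheme and the path.
$parts['scheme'] = 'https';
$parts['path'] = '/my_search';

// Reassemble the URL. PHP_QUERY_RFC3986 encodes spaces as %20 instead of +.
$rebuilt = $parts['scheme'] . '://' . $parts['host'] . ':' . $parts['port']
         . $parts['path'] . '?' . http_build_query($vars, '', '&', PHP_QUERY_RFC3986);

echo $rebuilt . "\n";
```

Every optional component needs its own handling when a URL is glued back together this way, which is the kind of bookkeeping a URL class takes off your hands.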

We can also change a parameter value by passing a name and value pair.

/* Change the 'cat' parameter value to 'books' */
$url->setQueryVariable('cat', "books");

Or unset a parameter:

/* This will remove the 'pics' parameter from the url */
$url->unsetQueryVariable('pics');

You can also easily get the fragment identifier from a URL. A fragment identifier points to a sub-location within a resource. If you have a URL like the following:

http://www.some-domain.com/index.php#book_id

The fragment id can be reached by:

$url = new Net_URL2('http://www.some-domain.com/index.php#book_id');
 
/* Will return 'book_id' from the url */
echo $url->fragment;

If you are accessing a url using some credentials as below:

ftp://username:password@some-domain.com

You can get the username and password with:

.
.
echo "Username  :    " . $url->user . "\n";
echo "Password  :    " . $url->password . "\n";

Normalizing URLs

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. Normalization helps you determine if two syntactically different URLs are equivalent.

We normalize a url as below.

$url = new Net_URL2('http://www.example.com/../a/b/../c/./d.html');
 
/* Returns 'http://www.example.com/a/c/d.html' */
echo $url->getNormalizedURL();
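
To give an idea of what normalization involves, below is a minimal sketch of the "remove dot segments" step described in RFC 3986, section 5.2.4, together with lowercasing of the scheme and host. The function names removeDotSegments() and normalizeSimpleUrl() are my own and are not part of Net_URL2. The sketch handles only the scheme, host and path, and is not how the package is necessarily implemented.

```php
<?php
// Remove "." and ".." segments from a path, following RFC 3986 section 5.2.4.
function removeDotSegments($path)
{
    $output = '';
    while ($path !== '') {
        if (strpos($path, '../') === 0) {
            $path = (string) substr($path, 3);          // drop leading "../"
        } elseif (strpos($path, './') === 0) {
            $path = (string) substr($path, 2);          // drop leading "./"
        } elseif (strpos($path, '/./') === 0) {
            $path = (string) substr($path, 2);          // "/./" becomes "/"
        } elseif ($path === '/.') {
            $path = '/';
        } elseif (strpos($path, '/../') === 0 || $path === '/..') {
            // "/../" becomes "/" and the last output segment is removed.
            $path = ($path === '/..') ? '/' : (string) substr($path, 3);
            $pos = strrpos($output, '/');
            $output = ($pos === false) ? '' : substr($output, 0, $pos);
        } elseif ($path === '.' || $path === '..') {
            $path = '';
        } else {
            // Move the first segment (with its leading "/") to the output.
            $next = strpos($path, '/', 1);
            if ($next === false) {
                $output .= $path;
                $path = '';
            } else {
                $output .= substr($path, 0, $next);
                $path = substr($path, $next);
            }
        }
    }
    return $output;
}

// Simplified normalizer: assumes the URL has a scheme and a host,
// and ignores port, credentials, query and fragment.
function normalizeSimpleUrl($url)
{
    $parts = parse_url($url);
    $path = isset($parts['path']) ? removeDotSegments($parts['path']) : '/';
    return strtolower($parts['scheme']) . '://' . strtolower($parts['host']) . $path;
}

// Prints http://www.example.com/a/c/d.html
echo normalizeSimpleUrl('HTTP://www.Example.com/../a/b/../c/./d.html') . "\n";
```

Two URLs can then be compared for equivalence by normalizing both and comparing the resulting strings.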

In conclusion

The Net_URL2 package helps you quickly process URLs without resorting to complex regular expressions or string manipulation.

Additional information

RFC 3986
Uniform Resource Locator
URL normalization

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development and integration of various Open Source API's. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.

2 Responses


Guy Patterson

November 10th, 2009 at 6:28 am

Is this something one would only use during the development phase? I’m having a hard time coming up with reasons or scenarios to use this library … ? Where or when might someone use this on a production site?

Thanks,

Guy
http://www.nullamatix.com

sameer

November 12th, 2009 at 2:26 am

There are many: generating SEO-friendly URLs, manipulating URLs during redirection, logging URLs, etc.
