

Get hostname from a URL using JavaScript

  Peter      2012-06-15 09:16:45      10,080    0

Sometimes we have strings that contain URLs, and we want to retrieve the hostname from each URL, for example to gather statistics. Given a URL, we may want to extract only its hostname. How? Use a regular expression. Here I give an example using JavaScript. If you want to check whether a string is a URL or not, refer to Detect URLs in a Block of Text.

In JavaScript, we can use a regular expression like this:

var pattern=/(.+:\/\/)?([^\/]+)(\/.*)*/i;

This regular expression pattern can be used to get the hostname. There are three pairs of parentheses in the pattern. They group parts of the string together, and when the pattern is run against the target string, the matched string blocks are remembered and returned as an array. We can then retrieve the hostname from the returned array. The first pair of parentheses matches the protocol of the URL, such as http://, https://, ftp:// or file://. A URL has zero or one occurrence of the protocol. The second pair matches the hostname: anything before the first occurrence of '/' after the protocol string belongs to the hostname. If no '/' is present, the whole string after the protocol string is the hostname. The third pair matches everything that remains after the hostname.

For example, if we have a URL string

var url="";

After we run

var arr=pattern.exec(url);

The returned array arr will contain 4 elements. arr[0] is the whole matched URL string; arr[1] contains http://, which is the string block matched by the first pair of parentheses; arr[2] is the hostname, which is the string block matched by the second pair; arr[3] is /aboutus.html, which is the string block matched by the third pair.
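The walkthrough above can be tried with a short sketch like the one below. The URL in sampleUrl is a hypothetical placeholder chosen only for illustration (it is not the URL from the original example), and the variable name sampleUrl is likewise an assumption.

```javascript
// Pattern from the article: optional protocol, hostname, optional path.
var pattern = /(.+:\/\/)?([^\/]+)(\/.*)*/i;

// Hypothetical sample URL, used only for illustration.
var sampleUrl = "http://www.example.com/aboutus.html";

var arr = pattern.exec(sampleUrl);

console.log(arr.length); // full match plus the three groups
console.log(arr[0]);     // the whole matched string
console.log(arr[1]);     // protocol block (first group)
console.log(arr[2]);     // hostname block (second group)
console.log(arr[3]);     // path block (third group)
```

With this placeholder URL, arr[2] should hold www.example.com and arr[3] should hold /aboutus.html.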

What if we don't have http:// at the beginning of a URL? We can still use this pattern, and it will still return an array of 4 items. The only difference is that arr[1] holds no value (in JavaScript it is undefined), since the first pair of parentheses matched nothing. The same applies if the URL doesn't have /index.html or a similar path appended: in that case arr[3] holds no value.
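As a rough sketch of how these cases might be handled together, the pattern can be wrapped in a small helper. The function name getHostname and the sample URLs are hypothetical, introduced here only for illustration. The helper also guards against exec returning null, which happens when the input contains nothing the pattern can match (for example, an empty string).

```javascript
// Pattern from the article: optional protocol, hostname, optional path.
var pattern = /(.+:\/\/)?([^\/]+)(\/.*)*/i;

// Hypothetical helper: returns the second group (the hostname block),
// or null when the pattern does not match at all.
function getHostname(url) {
  var arr = pattern.exec(url);
  return arr ? arr[2] : null;
}

// Hypothetical sample inputs covering the cases discussed above.
var withProtocol = getHostname("http://www.example.com/aboutus.html");
var withoutProtocol = getHostname("www.example.com/aboutus.html");
var withoutPath = getHostname("http://www.example.com");
var emptyInput = getHostname("");

// Groups that matched nothing show up as undefined in the array.
var bare = pattern.exec("www.example.com");

console.log(withProtocol, withoutProtocol, withoutPath, emptyInput);
console.log(bare.length, bare[1], bare[3]);
```

Returning null for a non-match is just one possible design choice; a caller could equally treat it as an error.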

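So for a URL of this form, we can get the hostname with arr[2]. Note that everything between the protocol and the first '/' lands in arr[2], so a port number, if present, is included. I hope this helps when you want to know which host a URL belongs to. This pattern can also be used in other programming languages.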







