Regular expression to get HTML meta description

  Peter      2012-07-03 10:09:20

When we process the source code of an HTML page, we often need to retrieve the page's meta description in addition to the links in the page. This description is located in a <meta> tag of the HTML page, and search engines rely on it when indexing the page. How can we retrieve it? A regular expression makes it easy to extract the meta description.

In JavaScript, the regular expression looks like this:

var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

The description is the value of the content attribute in the <meta> tag whose name attribute has the value description. So we need to find this tag and then use parentheses to capture the description for later retrieval. The | character separates two sub-patterns, because the name and content attributes can appear in either order and a meta tag can match either sub-pattern.
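As a quick check of the two sub-patterns, the sketch below runs the pattern against one tag for each attribute order. The sample strings and the variable names nameFirst, contentFirst, m1 and m2 are made up for illustration.

```javascript
// Same pattern as in the article.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Hypothetical sample tags, one for each attribute order.
var nameFirst = '<meta name="description" content="Name comes first">';
var contentFirst = '<meta content="Content comes first" name="description">';

var m1 = pattern.exec(nameFirst);
var m2 = pattern.exec(contentFirst);

console.log(m1[1]); // description captured by the first sub-pattern
console.log(m2[2]); // description captured by the second sub-pattern
```

Whichever order the attributes appear in, exactly one of the two capture groups holds the description.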

Suppose we have a sample snippet stored in a variable:

var data='<meta name="description" content="This is a sample code snippet">';

Then we run:

var arr=pattern.exec(data);

If the pattern matches, the returned arr is an array with 3 elements; if it does not match, exec returns null. The first element arr[0] is the matched text in the data variable, arr[1] is the text captured by the first pair of parentheses in the pattern, and arr[2] is the text captured by the second pair. If the first sub-pattern matches, arr[1] contains the description and arr[2] is undefined. Otherwise, arr[1] is undefined and arr[2] contains the description.
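The shape of the result can be inspected directly. The sketch below reuses the pattern and sample data from the article; the variable noMatch and its input string are hypothetical. Note that JavaScript reports a capture group that took no part in the match as undefined, and that exec returns null when nothing matches at all.

```javascript
// Same pattern and sample data as in the article.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;
var data = '<meta name="description" content="This is a sample code snippet">';

var arr = pattern.exec(data);
console.log(arr.length);           // whole match plus two capture groups
console.log(arr[1]);               // the description
console.log(arr[2] === undefined); // the second group did not participate

// Hypothetical input that contains no meta description.
var noMatch = pattern.exec('<title>No description here</title>');
console.log(noMatch === null);     // no match at all
```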

In the above case, arr[0] will be <meta name="description" content="This is a sample code snippet">, arr[1] will be This is a sample code snippet, and arr[2] will be undefined.
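The sample above contains a single tag. One caveat worth knowing: because . matches any character except a line break, .*? can run across the closing > of one tag into the next when several <meta> tags sit on the same line (minified HTML, for example). The sketch below shows the effect and one possible tightening, which is a variation and not part of the original pattern: [^>]*? stays inside a single tag and ([^"]*) stays inside a single quoted value. The names original, tightened and html are hypothetical.

```javascript
// Pattern from the article.
var original = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Hypothetical tightened variation: [^>]*? cannot cross a tag's closing >,
// and ([^"]*) cannot cross the closing quote of the content value.
var tightened = /<meta[^>]*?name="description"[^>]*?content="([^"]*)"[^>]*?>|<meta[^>]*?content="([^"]*)"[^>]*?name="description"[^>]*?>/i;

// Two meta tags on one line, content attribute first in both.
var html = '<meta content="a, b" name="keywords"><meta content="Real description" name="description">';

var a = original.exec(html);
var b = tightened.exec(html);

console.log(a[2]); // the keywords value, picked up from the wrong tag
console.log(b[2]); // the actual description
```

Neither pattern handles single-quoted or unquoted attribute values, so this remains a sketch for well-behaved markup.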

In conclusion, to get the meta description you only need to check whether arr[1] is undefined. If it is, the description is arr[2]; otherwise it is arr[1].
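That check can be wrapped in a small function. This is a minimal sketch: the function name getMetaDescription is hypothetical, and it also guards against the case where exec finds no match and returns null.

```javascript
// Same pattern as in the article.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Hypothetical helper: returns the description string,
// or null when the HTML has no matching meta tag.
function getMetaDescription(html) {
  var arr = pattern.exec(html);
  if (arr === null) {
    return null; // no meta description found
  }
  // Exactly one of the two groups participates in a match;
  // the other one is undefined.
  return arr[1] !== undefined ? arr[1] : arr[2];
}

console.log(getMetaDescription('<meta name="description" content="This is a sample code snippet">'));
console.log(getMetaDescription('<meta content="Other order" name="description">'));
console.log(getMetaDescription('<p>No meta tag</p>'));
```

Comparing against undefined rather than testing for truthiness keeps an empty description (content="") from being mistaken for a missing one.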


