Regular expression to get the HTML meta description
When we process the source code of an HTML page, we often need to retrieve the page's meta description in addition to the links in the page. This description is usually located in a <meta> tag in the page, and search engines use it when indexing the page. How can we retrieve the meta description? With a regular expression, we can get it easily.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;
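The description is the value of the content attribute in the <meta> tag whose name attribute has the value description. So we need to find this tag and then use parentheses to group the description for later retrieval. We also use a | character to separate two sub-patterns, because the two attributes can appear in either order: name before content (first sub-pattern) or content before name (second sub-pattern). Note that the pattern expects attribute values in double quotes, and the i flag makes the match case-insensitive.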
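One thing to be aware of is that .*? can run past the closing > of a tag, so on a line with several <meta> tags the second sub-pattern may pick up the content of a different tag. The sketch below is one possible stricter variant, not the pattern from this article: it assumes double-quoted attributes and uses [^>]*? so that each sub-pattern stays inside a single tag. The names loosePattern, strictPattern and twoTags are made up for illustration.

```javascript
// The article's pattern, kept here for comparison.
var loosePattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Assumed stricter variant: [^>]*? cannot cross the ">" that closes a tag,
// and [^"]* cannot cross the closing quote of the attribute value.
var strictPattern = /<meta[^>]*?name="description"[^>]*?content="([^"]*)"[^>]*>|<meta[^>]*?content="([^"]*)"[^>]*?name="description"[^>]*>/i;

// Two meta tags on one line; only the second one is the description.
var twoTags = '<meta property="og:title" content="Title">' +
              '<meta content="Real description" name="description">';

var looseMatch = loosePattern.exec(twoTags);
var strictMatch = strictPattern.exec(twoTags);

console.log(looseMatch[2]);  // prints "Title" (content taken from the first tag)
console.log(strictMatch[2]); // prints "Real description"
```

For pages where each <meta> tag sits on its own line, both patterns should behave the same, since . does not match a line break.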
Suppose now we have a sample code snippet which contains:
var data='<meta name="description" content="This is a sample code snippet">';
When we run the pattern against data, for example with
var arr = pattern.exec(data);
the returned arr is an array with 3 elements if the pattern matches (and null if it does not). The first element, arr[0], is the matched content in the data variable, arr[1] is the content matched by the first pair of parentheses in the pattern, and arr[2] is the content matched by the second pair. If the first sub-pattern is matched, then arr[1] will contain the description and arr[2] will be empty (undefined in JavaScript). Otherwise, arr[1] will be empty and arr[2] will contain the description.
In the above case, arr[0] will be <meta name="description" content="This is a sample code snippet">, arr[1] will be This is a sample code snippet, and arr[2] will be empty.
In conclusion, to get the meta description you only need to check whether arr[1] is empty or not. If it's empty, the description is arr[2]; otherwise it's arr[1].
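The whole procedure can be wrapped in a small function. This is a minimal sketch rather than a definitive implementation: getMetaDescription is a hypothetical helper name, and it assumes the description tag is written with double-quoted attributes as in the sample above.

```javascript
// Pattern from the article: one sub-pattern per attribute order.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Hypothetical helper: returns the description, or null when no tag matches.
function getMetaDescription(html) {
  var arr = pattern.exec(html);
  if (arr === null) {
    return null; // no meta description tag found
  }
  // Only one of the two groups takes part in a match;
  // the other one is undefined.
  return arr[1] !== undefined ? arr[1] : arr[2];
}

var nameFirst = getMetaDescription('<meta name="description" content="This is a sample code snippet">');
var contentFirst = getMetaDescription('<meta content="Another snippet" name="description">');
var missing = getMetaDescription('<title>No description here</title>');

console.log(nameFirst);    // prints "This is a sample code snippet"
console.log(contentFirst); // prints "Another snippet"
console.log(missing);      // prints null
```

Checking against undefined instead of a general "empty" test keeps a tag with content="" distinguishable from a group that did not take part in the match.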