The best answer on StackOverflow: Using RegEx to parse HTML
(stackoverflow.com)
It can't be done, as an opening tag in HTML can contain anything in its attributes, even JavaScript (e.g. an onclick handler).
??? Non sequitur
You can't parse every HTML opening tag with regex, because an HTML opening tag doesn't have a set structure. How would you match this opening tag with a regex?
<mytag myattribute="<value of \"myattribute\">" >
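To make the problem concrete, here is a minimal Python sketch. The pattern `NAIVE_OPENING_TAG` is a hypothetical "obvious" regex chosen for illustration, not something anyone in the thread proposed. It shows the match ending at the first `>`, which sits inside the attribute value.

```python
import re

# The tag from the comment above, as a raw string so the backslashes survive.
tag = r'<mytag myattribute="<value of \"myattribute\">" >'

# Hypothetical naive pattern: "<", then anything that is not ">", then ">".
NAIVE_OPENING_TAG = re.compile(r"<[^>]*>")

match = NAIVE_OPENING_TAG.match(tag)
matched_text = match.group(0)

# The pattern stops at the first ">" it meets, so the match is cut short
# inside the attribute value instead of covering the whole tag.
print(matched_text)
print(matched_text == tag)
```

A smarter pattern can skip over quoted strings, but it then has to know the quoting rules, which is where the regex starts turning into a parser.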
Is this valid HTML? My understanding is that the attribute value needs to be escaped, i.e.:
<value of \"myattribute\">
The double quote does not have to be escaped when the attribute value is delimited with single quotes. The rest doesn't have to be escaped either. This is valid and tested:
<img alt='my "<img>"'>
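For comparison, a small sketch of how a real parser copes with that tag, using `html.parser` from Python's standard library. The class name `StartTagCollector` is made up for this example. The parser knows the quoting rules, so it should report the whole attribute value instead of stopping at the `>` inside it.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes stripped.
        self.start_tags.append((tag, attrs))


collector = StartTagCollector()
collector.feed("<img alt='my \"<img>\"'>")
collector.close()

# One start tag is collected; the "<img>" inside the alt text stays
# part of the attribute value and is not treated as a second tag.
print(collector.start_tags)
```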