The best answer on StackOverflow: Using RegEx to parse HTML
(stackoverflow.com)
You can't parse every HTML opening tag with a regex, because an HTML opening tag doesn't have a fixed structure: attribute values can themselves contain characters such as < and >. How would you match this opening tag with a regex?
<mytag myattribute="<value of \"myattribute\">" >
Is this valid HTML? My understanding is that the attribute value needs to be escaped, i.e.:
<value of \"myattribute\">
The double quote doesn't need to be escaped when the attribute value is delimited by single quotes. The rest doesn't need escaping either. This is valid and tested:
<img alt='my "<img>"'>
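For anyone who wants to try it, here is a rough sketch in Python comparing a naive tag regex with the standard library's `html.parser` on that exact tag. The names `NAIVE_TAG`, `StartTagCollector`, `naive_match` and `parsed_tags` are made up for illustration, and the regex is just one common naive attempt, not the only one people write.

```python
import re
from html.parser import HTMLParser

# The tag from the comment above: a single-quoted attribute value
# containing double quotes and angle brackets.
TAG = """<img alt='my "<img>"'>"""

# Naive pattern (illustrative): "<", anything that is not ">", then ">".
NAIVE_TAG = re.compile(r"<[^>]*>")


class StartTagCollector(HTMLParser):
    """Records every start tag and its attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def naive_match(html):
    """Return what the naive regex considers the first tag, or None."""
    m = NAIVE_TAG.match(html)
    return m.group(0) if m else None


def parsed_tags(html):
    """Return (tag, attrs) pairs as seen by html.parser."""
    collector = StartTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(naive_match(TAG))   # stops at the first ">" inside the attribute value
    print(parsed_tags(TAG))   # the parser keeps the attribute value intact
```

The naive regex cuts the tag off at the `>` inside the `alt` value, while the parser reports a single `img` tag whose `alt` attribute is `my "<img>"`. A more elaborate regex can handle quoted values for this one case, but that only moves the problem to the next edge case.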