2009-07-26

Regular Expressions - Negative Lookahead

/a(?!b)/ matches an "a" that is not followed by a "b". The lookahead is zero-width, so the character after the "a" is not included in the match.
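
A quick way to check this in Ruby is to scan a string and look at what comes back. This is only a sketch: the sample string and the helper name a_without_b_offsets are made up for illustration.

```ruby
# Hypothetical helper for this sketch: offsets of every "a" not followed by "b".
def a_without_b_offsets(text)
  offsets = []
  # Inside the block, Regexp.last_match describes the current match.
  text.scan(/a(?!b)/) { offsets << Regexp.last_match.begin(0) }
  offsets
end

# The lookahead consumes nothing, so each match is just the "a" itself.
p "ab ac a".scan(/a(?!b)/)        # => ["a", "a"]

# The "a" at offset 0 is skipped because a "b" follows it.
p a_without_b_offsets("ab ac a")  # => [3, 6]
```

Note that the final "a" still matches: at the end of the string there is no "b" to follow, so the negative lookahead succeeds.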

This can be built upon to match stretches of text that don't contain a given string.

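/^((?!DONT_MATCH).)*$/ matches whole lines that don't contain the string "DONT_MATCH". Each character is consumed only if "DONT_MATCH" does not start at that position, so the match can reach the end of the line only when the string never appears.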
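
As a rough sketch of how that pattern might be used to filter lines in Ruby (the constant NO_DONT_MATCH, the helper lines_without_marker and the sample text are all invented here, and String#match? assumes Ruby 2.4 or later):

```ruby
# The line-level pattern from above, kept in a constant for reuse.
NO_DONT_MATCH = /^((?!DONT_MATCH).)*$/

# Hypothetical helper: keep only the lines that never contain "DONT_MATCH".
def lines_without_marker(text)
  text.lines.map(&:chomp).select { |line| line.match?(NO_DONT_MATCH) }
end

sample = "keep this line\nthis one has DONT_MATCH in it\nkeep this too"
p lines_without_marker(sample)  # => ["keep this line", "keep this too"]
```

For a fixed string like this, something such as `!line.include?("DONT_MATCH")` would be simpler. The lookahead form earns its keep when the "doesn't contain" test has to live inside a larger regex.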


An example using negative lookahead to close <p> tags in poorly formed HTML:
$ irb
>> x="<p>an un-closed paragraph tag <p>and a properly formed paragraph</p>"
=> "<p>an un-closed paragraph tag <p>and a properly formed paragraph</p>"
>> x.sub(/<p>((?!<\/p>|<p>).)*/,"")
=> "<p>and a properly formed paragraph</p>"
>> x.sub(/<p>(((?!<\/p>|<p>).)*)/,"<p>\\1</p>")
=> "<p>an un-closed paragraph tag </p><p>and a properly formed paragraph</p>"
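
The sub calls above only touch the first <p>. Swapping in gsub would also tack an extra </p> onto paragraphs that are already closed. One possible way around that, going a little beyond the example above, is to add a positive lookahead so a paragraph is only rewritten when it runs straight into another <p> or the end of the string. This is a sketch, not a general HTML fixer: the helper name close_open_paragraphs is made up, and it assumes each paragraph sits on a single line, since "." does not match newlines by default.

```ruby
# Hypothetical helper: add </p> only to paragraphs that are missing it.
def close_open_paragraphs(html)
  # <p>, then text containing neither </p> nor <p>,
  # and only if the next thing is another <p> or the end of the string.
  html.gsub(/<p>((?:(?!<\/p>|<p>).)*)(?=<p>|\z)/, '<p>\1</p>')
end

x = "<p>an un-closed paragraph tag <p>and a properly formed paragraph</p>"
puts close_open_paragraphs(x)
# prints: <p>an un-closed paragraph tag </p><p>and a properly formed paragraph</p>
```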


!@#$%^!! Blogger seems to have "helpfully" corrected my bad HTML example when I tried to use the WYSIWYG editor. Attempt 2, as plain HTML.
