Fixing invalid HTML tags with RegEx. Can this be done?

Post by Ahhhk » Mon Jul 07, 2008 10:38 pm

Hi!

I have the content for over 1000 articles in a db. Unfortunately, the person(s) that originally entered it all left off the closing paragraph tags and used random case for the P's.

Now, is it possible to do a regex replace to correct the missing tags/case?

In other words, I have:

<p>This is a title
<P>this is a paragraph
<P>this is another paragraph
<p>this is even more text

And want to change/replace it to (via PHP):

<p>This is a title</p>
<p>this is a paragraph</p>
<p>this is another paragraph</p>
<p>this is even more text</p>

The paragraphs are much longer than the samples I gave and are not all on one line like that (or it'd be too easy...<sigh>)

I was trying to do it by adding a </p> and a line break before each <p>, then lowering the case of the <P>'s and removing the first </p>. But I couldn't get it to work, and it's quite ugly anyway.

Any help would be greatly appreciated!

Thanks.

Ahhhk!
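
For reference, the step-by-step approach described in this post can be made to work. What follows is only a minimal sketch, not tested against the real articles: the function name close_paragraphs_stepwise is made up for illustration, and it assumes the content holds nothing but bare <p>/<P> tags (no attributes, no existing </p>) and no text before the first <p>.

```php
<?php
// Hypothetical helper: closes unclosed <p>/<P> tags in four small steps.
// Assumes bare tags only, no existing </p>, and no text before the first <p>.
function close_paragraphs_stepwise(string $html): string
{
    // Nothing to fix if there is no paragraph tag at all.
    if (stripos($html, '<p>') === false) {
        return $html;
    }
    // Step 1: normalise case, so <P> becomes <p>.
    $html = str_ireplace('<p>', '<p>', $html);
    // Step 2: put a closing tag and a line break in front of every <p>,
    // swallowing the whitespace that used to separate the paragraphs.
    $html = preg_replace('/\s*<p>/', "</p>\n<p>", $html);
    // Step 3: drop the stray </p> that now sits before the first paragraph.
    $html = preg_replace('/^<\/p>\n/', '', $html, 1);
    // Step 4: the last paragraph still needs its closing tag.
    return rtrim($html) . '</p>';
}
```

Because the closing tag is placed in front of the next <p> rather than at a line end, paragraphs that run over several lines stay in one piece.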

Post by richardk » Wed Jul 09, 2008 2:44 pm

You can try
$html_after = preg_replace('/(<p>.*?)(<p>.*)?$/im', '\\1</p>\\2', $html_before);

but I haven't tested it.
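
A note on how that pattern behaves: the m modifier makes $ match at the end of every line, and without the s modifier "." never crosses a line break, so the closing tag lands at the end of the line on which the paragraph starts. The \\1 backreference also copies the opening tag back in whatever case it was found. For paragraphs that span several lines, a variant along the following lines may be closer to what was asked. It is a sketch under assumptions, not a tested fix: close_paragraphs is a made-up name, and it assumes bare <p>/<P> tags with no attributes and no closing tags already present.

```php
<?php
// Hypothetical helper: wraps every unclosed paragraph in <p>...</p>.
// Assumes bare <p>/<P> tags, no attributes, and no existing </p>.
function close_paragraphs(string $html): string
{
    // i: match <p> and <P> alike.
    // s: let "." match line breaks, so a paragraph may span several lines.
    // The lazy (.*?) stops at the next <p> (lookahead) or at the end of input (\z);
    // \s* swallows the whitespace between paragraphs.
    $fixed = preg_replace(
        '/<p>(.*?)\s*(?=<p>|\z)/is',
        '<p>$1</p>' . "\n",
        $html
    );
    // preg_replace returns null on error; fall back to the untouched input.
    return rtrim($fixed ?? $html);
}
```

The opening tag is written back as a literal lowercase <p>, which takes care of the mixed case at the same time. Anything that appears before the first <p> is left as it is.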

Post by Ahhhk » Wed Jul 09, 2008 8:06 pm

Freegin' beautiful!!

Thanks richardk. I was in the middle of trying to do it with Tidy, but that just ended up creating a complete, valid document.

Your regex skills never cease to amaze me.

Ahhhk!
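
On the Tidy route mentioned in this post: Tidy has a show-body-only option that, as far as I know, makes it output only the contents of the body instead of a complete document. The sketch below assumes the PHP tidy extension is installed; tidy_fragment is a made-up helper name, and how Tidy formats the repaired paragraphs has not been checked here.

```php
<?php
// Hypothetical helper: repairs an HTML fragment with Tidy without
// wrapping it in a full document. Requires the PHP tidy extension.
function tidy_fragment(string $html): string
{
    // Extension not loaded: hand the input back untouched.
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }
    $config = [
        'show-body-only' => true, // emit only the body contents, no doctype/html/head
        'wrap'           => 0,    // do not re-wrap long lines
    ];
    $fixed = tidy_repair_string($html, $config, 'utf8');
    // Older PHP versions may return false on failure; fall back to the input.
    return is_string($fixed) ? $fixed : $html;
}
```

This trades the regex for a real HTML parser, which should cope better with content that has more than just missing </p> tags wrong with it.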