How to clean certain HTML tags with PHP

If you have a string containing HTML and want to remove only certain tags from it while keeping their contents, use the snippet below.

DOMDocument

PHP ships with the DOMDocument class, which can parse an HTML string into a tree of nodes that we can then modify: https://secure.php.net/manual/en/class.domdocument.php
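Before the full function, here is a minimal sketch of the two DOMDocument calls it relies on: loadHTML() to parse a string and getElementsByTagName() to find elements by name. The sample markup is made up for illustration.

```php
<?php
// Parse a small HTML string (sample markup, chosen only for illustration).
$dom = new DOMDocument();
$dom->loadHTML('<p>Hello <span>world</span></p>');

// Find every <span> element in the parsed document.
$spans = $dom->getElementsByTagName('span');

echo $spans->length, "\n";               // 1
echo $spans->item(0)->textContent, "\n"; // world
```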

function removeTag($content, $tagName) {
    $dom = new DOMDocument();
    // The XML declaration prefix makes loadHTML() treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $content);

    // Replace every element with the given name by its own child nodes.
    $unwrap = function ($name) use ($dom) {
        $nodes = $dom->getElementsByTagName($name);
        // The node list is live, so item(0) is always the next remaining match.
        while ($node = $nodes->item(0)) {
            $replacement = $dom->createDocumentFragment();
            while ($inner = $node->childNodes->item(0)) {
                $replacement->appendChild($inner);
            }
            $node->parentNode->replaceChild($replacement, $node);
        }
    };

    $unwrap($tagName);

    // Remove the <!DOCTYPE> that loadHTML() adds.
    if ($dom->doctype !== null) {
        $dom->removeChild($dom->doctype);
    }

    // Remove the <html> and <body> wrappers that loadHTML() adds.
    $unwrap('html');
    $unwrap('body');

    // Drop the encoding prefix added above before returning the markup.
    return str_replace('<?xml encoding="utf-8" ?>', '', $dom->saveHTML());
}

$content = '<span>This <b>is</b> an <span>example</span></span>';

echo removeTag($content, 'span'); // "This <b>is</b> an example"
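As a possible variation, the sketch below wraps the input in an explicit <body> and serializes only the children of that element, so the doctype, the <html>/<body> wrappers and the encoding prefix never reach the output and do not have to be stripped afterwards. The name unwrapTag is hypothetical, and the libxml_use_internal_errors() calls are an addition of mine, on the assumption that you want to silence the warnings libxml can emit for markup it does not recognize.

```php
<?php
// Hypothetical variant of removeTag(): same unwrapping idea, different cleanup.
function unwrapTag(string $content, string $tagName): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="utf-8" ?><body>' . $content . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Replace each matching element with its own child nodes.
    $nodes = $dom->getElementsByTagName($tagName);
    while ($node = $nodes->item(0)) {
        $fragment = $dom->createDocumentFragment();
        while ($node->firstChild) {
            $fragment->appendChild($node->firstChild);
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        // No <body> left (for example when $tagName is 'body');
        // this sketch simply returns an empty string in that case.
        return '';
    }

    // Serialize only what is inside <body>.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }

    return $html;
}

echo unwrapTag('<span>This <b>is</b> an <span>example</span></span>', 'span');
// This <b>is</b> an example
```

The trade-off is that this version only returns what ends up inside <body>, which is fine for fragments like the example but not meant for full documents with a <head>.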

About Rick

Senior Front-end Software Engineer from Barcelona, Haidong Gumdo Instructor (Korean martial art of the sword), street photographer, travel lover, TV addict, Boston Red Sox fan, and privacy advocate.
