preg_match_all

Split HTML tags

Here is a sample HTML file (sample.html) whose block elements are all "p" and "div" tags:

 
<p>hello world</p>
<div class="codeclass">
  <p>ipsum</p>
  <p><img src=""/></p>
</div>
<p>Ipsom de la</p>
<p>end</p>
 

Suppose we want to insert content into this HTML without breaking its structure. The insertion point should fall on an element boundary, not at an arbitrary character, so we need to know the position of every boundary where content could go. preg_match_all with the PREG_OFFSET_CAPTURE flag gives us exactly that:

 
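function test_pregmatch() {
    // println() and myprint_r() are assumed to be helper functions defined
    // elsewhere (they are not PHP built-ins); echo and print_r() would also work.
    println("testing test_pregmatch");
    $content = file_get_contents("sample.html");
    // Show the raw HTML, escaped so the browser displays the tags as text.
    echo "<pre>";
    echo htmlspecialchars($content);
    echo "</pre>";
    // Closing tags that mark the end of an element.
    $delimiters = array("</p>", "</div>");
    // PREG_OFFSET_CAPTURE stores each match as array(text, byte offset).
    preg_match_all('~' . implode("|", $delimiters) . '~', $content, $matches, PREG_OFFSET_CAPTURE);
    myprint_r($matches);
}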
 
test_pregmatch();
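
The pattern above is built by joining the delimiters as they are, which works for "</p>" and "</div>". As a hedged sketch for delimiters that might contain regex metacharacters, each one can be passed through preg_quote first. The helper name build_delimiter_pattern is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: builds an alternation pattern from literal strings.
function build_delimiter_pattern(array $delimiters) {
    // Escape regex metacharacters and the '~' pattern delimiter in each string.
    $quoted = array_map(function ($d) {
        return preg_quote($d, '~');
    }, $delimiters);
    return '~' . implode('|', $quoted) . '~';
}

$pattern = build_delimiter_pattern(array("</p>", "</div>"));
// Count the closing tags in a short inline string.
$found = preg_match_all($pattern, "<p>a</p><div>b</div>");
echo $found, "\n";
// Prints: 2
```

The resulting pattern can be passed to preg_match_all together with PREG_OFFSET_CAPTURE in the same way as the hand-built one.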
 
 

The output should look like this:

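Array
    [0] => Array
        [0] => Array
            [0] => </p>
            [1] => 14
        [1] => Array
            [0] => </p>
            [1] => 55
        [2] => Array
            [0] => </p>
            [1] => 79
        [3] => Array
            [0] => </div>
            [1] => 85
        [4] => Array
            [0] => </p>
            [1] => 107
        [5] => Array
            [0] => </p>
            [1] => 119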

With PREG_OFFSET_CAPTURE, $matches[0] is an array with one entry per match. Each entry is a pair: element 0 is the matched text, and element 1 is the offset (in bytes) in $content where that text starts.
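
To make the pair structure concrete, here is a minimal sketch that walks $matches[0] and prints each closing tag with its offset. It uses a short inline string in place of sample.html (an assumption, so that the snippet runs on its own), and the variable names are only illustrative:

```php
<?php
// Inline stand-in for sample.html so the snippet is self-contained.
$content = "<p>one</p><div><p>two</p></div>";

preg_match_all('~</p>|</div>~', $content, $matches, PREG_OFFSET_CAPTURE);

$lines = array();
foreach ($matches[0] as $match) {
    // Each match is a pair: the matched text and its byte offset.
    list($text, $offset) = $match;
    $lines[] = $text . " at offset " . $offset;
}
echo implode("\n", $lines), "\n";
// Prints:
// </p> at offset 6
// </p> at offset 21
// </div> at offset 25
```

The offsets count bytes from the start of the string, so the same markup with different line endings or leading whitespace gives different numbers.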

For example, to insert new content right after the middle closing tag:

 
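    // Index of the middle match, cast to int so it is a clean array key.
    $afterSection = (int) floor(count($matches[0]) / 2);
    // $replace holds the content to insert. The closing tag is replaced by
    // "tag + new content", so the new content lands right after the element.
    echo substr_replace($content,
                        $matches[0][$afterSection][0] . $replace,
                        $matches[0][$afterSection][1],
                        strlen($matches[0][$afterSection][0]));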
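
The same idea can be wrapped in a small function. The following is a sketch under assumptions, not the original author's code: the name insert_after_middle is hypothetical, the input is an inline string, and appending when no closing tag is found is an arbitrary design choice. It uses intdiv, so it needs PHP 7 or later. Instead of rewriting the tag, it inserts at the position just past the tag with a replacement length of 0:

```php
<?php
// Hypothetical helper: inserts $insert right after the middle closing tag.
function insert_after_middle($content, $insert) {
    $count = preg_match_all('~</p>|</div>~', $content, $matches, PREG_OFFSET_CAPTURE);
    if (!$count) {
        // No element boundary found: append at the end (an assumption).
        return $content . $insert;
    }
    // Pick the middle match and unpack its (text, offset) pair.
    list($tag, $offset) = $matches[0][intdiv($count, 2)];
    // Length 0 means "insert here, replace nothing".
    return substr_replace($content, $insert, $offset + strlen($tag), 0);
}

$html = "<p>one</p><div><p>two</p></div>";
$result = insert_after_middle($html, "<p>NEW</p>");
echo $result, "\n";
// Prints: <p>one</p><div><p>two</p><p>NEW</p></div>
```

With three closing tags the middle index is 1, so the new paragraph goes after the second "</p>" and stays inside the "div".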