How To Create An HTML Parser Module

In Drupal 6, the need for a good HTML parser first became apparent to me when I wanted to insert some custom code into the content area of all pages of a certain content type. I soon realized how complex it would be to write a decent HTML parser myself, for example because of the different ways a tag can be closed. So I looked around and found an HTML parser written in PHP called "PHP Simple HTML DOM Parser". Despite the "simple" in its name, I've used it extensively and so far it has covered all of my needs. In this blog post I will show how to write a module that brings this excellent HTML parser into Drupal!

1. Create a folder called "html_parser" in sites/all/modules and, inside it, create the files "html_parser.info" and "html_parser.module".

2. Content for the info file:

name = PHP HTML Parser
description = PHP HTML Parser
core = 6.x

3. Content for the module file:

<?php
module_load_include('php', 'html_parser', 'simple_html_dom');

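4. Download PHP Simple HTML DOM Parser and extract it into the module folder, so that simple_html_dom.php sits directly in sites/all/modules/html_parser. That is where the module_load_include() call in the module file looks for it.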

5. Install the module as usual!
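
With the module enabled, the parser's functions are available to any other module. As a rough illustration of the use case from the introduction, here is a minimal sketch of a Drupal 6 hook_nodeapi() implementation that inserts custom markup after the first paragraph of a node's body. The module name "mymodule", the content type "story" and the inserted markup are assumptions made up for this example; str_get_html(), find(), save() and clear() come from PHP Simple HTML DOM Parser.

```php
<?php
/**
 * Implementation of hook_nodeapi() for a hypothetical module "mymodule".
 */
function mymodule_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // Only act when a node of the assumed content type "story" is being
  // rendered and it actually has body markup.
  if ($op != 'view' || $node->type != 'story' || empty($node->content['body']['#value'])) {
    return;
  }

  // str_get_html() is defined in simple_html_dom.php, which
  // html_parser.module loads.
  $html = str_get_html($node->content['body']['#value']);
  if (!$html) {
    return;
  }

  // Find the first paragraph and append the custom markup right after it.
  $first_paragraph = $html->find('p', 0);
  if ($first_paragraph) {
    $first_paragraph->outertext = $first_paragraph->outertext . '<div class="mymodule-insert">Custom code</div>';
    $node->content['body']['#value'] = $html->save();
  }

  // Free the parser's internal references to avoid memory leaks.
  $html->clear();
}
```

A module using the parser like this would probably also want to list html_parser as a dependency in its own .info file (dependencies[] = html_parser), so that the parser is guaranteed to be loaded.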

Done this way, there is a possible namespace conflict, because the parser's functions don't begin with "html_parser_". So far I haven't had any problems, though, and I haven't had the time to create a Libraries-based solution. Read the manual included with the parser for information about usage, or visit this page for a practical usage example.
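
If you want to find out whether such a conflict exists on your site, one option is to check for the parser's global function names before the library is included. The following is only a sketch: the helper html_parser_find_conflicts() is a hypothetical name, and the list of function names is an assumption that depends on the version of the library you downloaded.

```php
<?php
/**
 * Returns the names from $names that are already defined as functions.
 * (Hypothetical helper, not part of Drupal or of the parser library.)
 */
function html_parser_find_conflicts($names) {
  return array_values(array_filter($names, 'function_exists'));
}

// Global helper functions assumed to be declared by simple_html_dom.php;
// check the file you downloaded, as the list varies between versions.
$parser_functions = array('file_get_html', 'str_get_html', 'dump_html_tree');

$conflicts = html_parser_find_conflicts($parser_functions);
if ($conflicts) {
  // Something else already declares these names, so including the library
  // now would fail with a fatal "cannot redeclare" error.
  print 'Conflicting functions: ' . implode(', ', $conflicts) . "\n";
}
```

The check only makes sense if it runs before simple_html_dom.php is loaded, for example just above the module_load_include() call in html_parser.module. Once the library has been included, its own functions would be reported as conflicts.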