Splitting a Large HTML File


Some time ago I created a guide to the U3A’s Beacon membership management system. It was intended to provide an introduction to Beacon for the leaders of the groups in my local U3A and it was called the Beacon Cookbook. The original document was created using the Apple Pages word processor app, exported to PDF and loaded onto our U3A website.

The Cookbook attracted one or two flattering comments from other U3As and I gave blanket permission for any U3A to use it. One U3A member, however, wanted to tailor it for their local U3A and neither a Pages document nor a PDF is suitable for that. So, I had a go at creating an HTML version of it. The initial draft is on the Coding Café’s Slack system as a ZIP file containing one large HTML file, a couple of small CSS files and about 25 image files.

A Problem

This first version of the Cookbook in HTML contains all the necessary details but it could be improved with a few tweaks. In particular, the original idea of presenting the information as a series of ‘recipes’ has been lost and some of the screenshots have artefacts that could be removed with some image manipulation. It would be easier to apply these tweaks if the large HTML file could be split so that each recipe was in its own file. But the HTML standard does not specify a file inclusion mechanism.

A Potential Solution

The W3Schools website (which, despite its name, is not run by the World Wide Web Consortium) provides some excellent tutorial material and, in its How To section, there is a page that explains how to include HTML snippets in an HTML document. The method described there relies on two things: HTML elements in the main document that are tagged with a special attribute, and a JavaScript function that scans for those tagged elements and replaces the contents of each one with HTML read from the file named in the attribute. The JavaScript function is provided, so this would appear to be a complete solution to my problem.
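The general idea can be sketched as follows. This is an illustration of the technique, not the tutorial's own code: the attribute name data-include, the function names and the loader callback are all assumptions made for the example.

```javascript
// Marker attribute; the name is an assumption made for this sketch.
const INCLUDE_ATTR = "data-include";

// Find the first element that still carries the marker, load the file it
// names, replace the element's content, then start again from the top.
// `doc` is normally `document`; `load(file, callback)` is a hypothetical
// loader that calls back with the file's text, or with null on failure.
function includeHTML(doc, load) {
  for (const el of doc.getElementsByTagName("*")) {
    const file = el.getAttribute(INCLUDE_ATTR);
    if (file) {
      load(file, (text) => {
        el.innerHTML = text === null ? "Include file not found." : text;
        el.removeAttribute(INCLUDE_ATTR);
        includeHTML(doc, load); // rescan the document for the next marker
      });
      return; // only one include is handled per pass
    }
  }
}

// A browser loader built on XMLHttpRequest (only usable inside a browser).
function xhrLoad(file, callback) {
  const request = new XMLHttpRequest();
  request.onload = () =>
    callback(request.status === 200 ? request.responseText : null);
  request.onerror = () => callback(null);
  request.open("GET", file);
  request.send();
}
```

In the main document, a placeholder such as <div data-include="recipe1.html"></div> (a hypothetical file name) marks where a snippet belongs, and a call to includeHTML(document, xhrLoad) at the end of the page fills the placeholders in.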

A Stumble

To me, the function provided by the tutorial seemed a little clunky and potentially inefficient. Having loaded the HTML from a file into the first tagged element, it then removed the tag and scanned the whole document again. With each pass it would process the next tag until there were none left. And, because it uses a recursive function call, the later parts of the main document would be scanned multiple times even when no include tags remained.
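One way to avoid the repeated scans can be sketched as follows: collect the tagged elements once and, after filling each one, look for further tags only inside the newly inserted fragment. It is an illustration of the idea only; the attribute name data-include, the function names and the use of fetch() are all assumptions.

```javascript
const MARKER = "data-include"; // assumed attribute name

// Browser loader: fetch a file and return its text (rejects on HTTP errors).
async function fetchText(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`${url}: HTTP ${response.status}`);
  return response.text();
}

// Fill each tagged element found below `root`, scanning `root` only once.
async function includeOnce(root, loadText = fetchText) {
  const marked = Array.from(root.querySelectorAll(`[${MARKER}]`));
  await Promise.all(marked.map(async (el) => {
    const file = el.getAttribute(MARKER);
    el.removeAttribute(MARKER);
    try {
      el.innerHTML = await loadText(file);
    } catch (err) {
      el.innerHTML = `Could not include ${file}`;
      return;
    }
    // An included file may contain includes of its own. They can only be
    // inside el, so that is the only place searched.
    await includeOnce(el, loadText);
  }));
}
```

In a browser this would be started with includeOnce(document). The sketch has no guard against a file that includes itself, which would recurse without end.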

So I decided to write a better includeHTML() function, based on the one from the tutorial. My first attempt didn’t work. After some head scratching and repeated reading of the relevant parts of the JavaScript manual, I realised that the loop I had copied from the tutorial’s function was using a global variable as the loop index. Not only is that generally bad practice, but each recursive call was also resetting the index that its caller was still using, which made a mess of the loop.

Lesson 1:

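In non-strict JavaScript, assigning to a loop variable that has not been declared creates (or reuses) a global variable, so every call to the function shares the same loop index. Declaring the variable at local scope gives each call its own.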

So, don’t write “for (i = 0; …)”; write “for (let i = 0; …)” instead.
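A small demonstration of the effect, using a made-up tree of objects rather than a real document. The function names are invented for the example, and the shared index is declared explicitly with var so that the sketch also runs in strict mode; in non-strict code, writing for (i = 0; …) with no declaration at all has the same effect.

```javascript
// Shared loop index. In non-strict code this global would be created
// implicitly by "for (i = 0; ...)"; it is declared here so the example
// also runs where implicit globals are an error.
var i;

// Count a node and everything below it, using the shared index.
function countShared(node) {
  let total = 1;
  for (i = 0; i < node.children.length; i++) {
    total += countShared(node.children[i]); // the inner call overwrites i
  }
  return total;
}

// The same function with a loop index local to each call.
function countLocal(node) {
  let total = 1;
  for (let i = 0; i < node.children.length; i++) {
    total += countLocal(node.children[i]);
  }
  return total;
}

// A root with two children; the first child has one child of its own.
const tree = {
  children: [
    { children: [{ children: [] }] },
    { children: [] },
  ],
};

console.log(countShared(tree)); // 3: the root's second child is skipped
console.log(countLocal(tree));  // 4: every node is counted
```

With other tree shapes the shared index can do worse than skip nodes: a call that resets it to a lower value can keep the caller's loop going for ever.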

An Obstacle

Having corrected the loop bug, my improved includeHTML() function worked. But my browser rejected the requests to read the include files. It claimed it was being asked to perform a cross-origin request, which is a security hazard and not allowed.

Well, of course, there was no cross-origin request. That was obvious to me, but not to the Firefox browser. On investigation, it seems that browsers refuse to load HTML documents supplied by a web server (with an http:// address) into local files (loaded from a file:// address). More precisely, under the browser’s same-origin policy, if the protocol, host and port of the current document do not all match those of the include file, the browser refuses to load it.
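The comparison of protocol, host and port can be sketched with the standard URL class. The function name sameOrigin and the example addresses are made up for illustration, and real browsers apply further rules (for instance to file:// addresses) that this sketch does not model.

```javascript
// Two addresses share an origin when protocol, host and port all match.
function sameOrigin(a, b) {
  const first = new URL(a);
  const second = new URL(b);
  return first.protocol === second.protocol &&
         first.hostname === second.hostname &&
         first.port === second.port; // "" when the default port is used
}

// Hypothetical addresses for a main document and an include file.
const localPage = "file:///Users/me/cookbook/index.html";
const servedPage = "http://localhost/cookbook/index.html";
const servedInclude = "http://localhost/cookbook/recipe1.html";

console.log(sameOrigin(localPage, servedInclude));  // false: file: versus http:
console.log(sameOrigin(servedPage, servedInclude)); // true
console.log(sameOrigin("http://localhost:8000/index.html", servedInclude)); // false: different port
```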

At first, this didn’t seem to be a significant issue. It just meant that both the main document and the include files would have to be specified by URLs with the same protocol part, either file:// or http://. I tried file:// first. But the includeHTML() function sends a request to the web server at the host given in the URL and that has to use the http:// protocol. So, I tried that. And it failed because my computer wasn’t running a local web server.

Clearing the Hurdle

Running a local web server just to be able to read a local HTML file was never going to be a practical solution to my problem, but I thought I’d try it anyway, just to see if it works.

The secure version of the HyperText Transfer Protocol (https://) failed, at a guess because my local web server doesn’t have a valid TLS certificate. (I didn’t investigate the error message.) However, the unencrypted variant (http://) does work: my JavaScript function inserts the HTML from the include files and the browser displays the resulting document as expected.
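Any static web server will do for this kind of experiment. As one illustration, assuming Node.js is installed, a few lines are enough to serve a folder over plain http://. The folder name, port number and function names below are assumptions, and the sketch is meant for local experiments only.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Content types for the kinds of file the Cookbook uses.
const TYPES = {
  ".html": "text/html",
  ".css": "text/css",
  ".js": "text/javascript",
  ".png": "image/png",
  ".jpg": "image/jpeg",
};

// Map a request URL onto a file below rootDir, or null if it points outside.
function resolveRequest(rootDir, requestUrl) {
  const pathname = decodeURIComponent(
    new URL(requestUrl, "http://localhost").pathname
  );
  const root = path.resolve(rootDir);
  const target = path.resolve(root, "." + pathname);
  const inside = target === root || target.startsWith(root + path.sep);
  return inside ? target : null;
}

// Create (but do not start) a server that hands out files from rootDir.
function createCookbookServer(rootDir) {
  return http.createServer((req, res) => {
    let file = null;
    try {
      file = resolveRequest(rootDir, req.url);
    } catch {
      // malformed URL: file stays null
    }
    if (file === null) {
      res.writeHead(400);
      res.end("Bad request");
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) { // missing file, or a folder rather than a file
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      const type = TYPES[path.extname(file).toLowerCase()] || "application/octet-stream";
      res.writeHead(200, { "Content-Type": type });
      res.end(data);
    });
  });
}

// To try it: createCookbookServer("cookbook").listen(8000);
// then open http://localhost:8000/index.html in the browser.
```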

In short, the include technique works for files delivered by a web server, but not for files opened directly from disk.

By stoneyfish

Humanist and retired software engineer with a love of music.
