Mark Embling

Shrinking HTML

Recently I was looking into ways of making HTML 5 documents a little smaller. It turns out that as well as the usual space-saving tweaks, such as cutting out unnecessary divs and keeping class names and IDs short, you can remove entire chunks of markup that are not technically required.

To some of you this won't be news at all, and you will wonder what all the fuss is about. However, from what I can tell it is not that widely known, and it might be useful on occasion, especially if you need to optimise your HTML's download size heavily.

Optional Tags

As we all know, the root element of an HTML document is <html>, which contains <head> and <body> elements. What you might not know is that none of those tags strictly need to be there, unless you need to give them attributes (such as lang on <html>) or a comment comes immediately after one of them. There is also often no need to include the closing tag for your <p> elements, as long as the element that follows is one of a predefined set of block-level elements (which it most likely will be) or the paragraph is the last thing inside its parent.
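To make the "predefined set of elements" a little more concrete, here is a rough Python sketch of the </p> rule. The element lists are paraphrased from the current WHATWG HTML Living Standard rather than the 2010 draft (elements such as main and search have been added since), and the function name and the "#text" marker for a text node are hypothetical conventions for this sketch, not part of any library.

```python
from typing import Optional

# Start tags that implicitly close a preceding <p> (paraphrased from the
# WHATWG HTML Living Standard; this list has changed over the years).
P_CLOSERS = frozenset({
    "address", "article", "aside", "blockquote", "details", "dialog", "div",
    "dl", "fieldset", "figcaption", "figure", "footer", "form",
    "h1", "h2", "h3", "h4", "h5", "h6", "header", "hgroup", "hr", "main",
    "menu", "nav", "ol", "p", "pre", "search", "section", "table", "ul",
})

# Parents in which a trailing </p> must still be written out.
P_PARENT_EXCEPTIONS = frozenset({"a", "audio", "del", "ins", "map", "noscript", "video"})


def p_end_tag_may_be_omitted(next_node: Optional[str], parent: str) -> bool:
    """Return True if a <p>'s end tag can be left out.

    next_node: tag name of the node immediately after the <p>, "#text" for
        a text node, or None if the <p> is the last thing in its parent.
    parent: tag name of the element that contains the <p>.
    """
    if next_node is not None:
        return next_node.lower() in P_CLOSERS
    parent = parent.lower()
    # Autonomous custom elements (names containing a hyphen) are excluded too.
    return parent not in P_PARENT_EXCEPTIONS and "-" not in parent


print(p_end_tag_may_be_omitted("p", "div"))     # True: next paragraph starts
print(p_end_tag_may_be_omitted(None, "div"))    # True: last thing in a <div>
print(p_end_tag_may_be_omitted("span", "div"))  # False: inline content follows
print(p_end_tag_may_be_omitted(None, "a"))      # False: parent is an <a>
```

Every </p> dropped in the revised example below falls under one of the first two cases: either another <p> follows, or the paragraph is the last thing in its <div>.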

The HTML 5 spec lists all the optional start and end tags that you can omit, and it is well worth reading through that section to get a feel for how relaxed HTML is. Better still, when you leave these tags out they don't simply vanish and turn your document into gibberish: the browser's parser infers their existence. That means the DOM is still as you would expect, and all your CSS which relies on those elements will work just as it should.
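The inference itself is done by the HTML parsing algorithm. As a much-simplified illustration, the Python sketch below uses the standard library's html.parser tokenizer plus a small stack of open elements to record where an omitted </p> gets implied: either when a block-level start tag arrives while a <p> is open, or when the paragraph's parent is closed. It is a toy that assumes well-nested input; it models only <p>, uses an abridged list of closing elements, and the class name is hypothetical. A real parser also handles many other elements and error cases.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed onto the stack.
VOID = frozenset({"area", "base", "br", "col", "embed", "hr", "img",
                  "input", "link", "meta", "source", "track", "wbr"})

# Abridged set of start tags that implicitly close an open <p>.
P_CLOSERS = frozenset({"address", "article", "aside", "blockquote", "div",
                       "dl", "footer", "form", "h1", "h2", "h3", "h4", "h5",
                       "h6", "header", "hr", "nav", "ol", "p", "pre",
                       "section", "table", "ul"})


class ImpliedParagraphEnds(HTMLParser):
    """Toy tree builder that records where an omitted </p> is inferred."""

    def __init__(self) -> None:
        super().__init__()
        self.open_elements = []   # stack of currently open tag names
        self.implied_p_ends = []  # source line of each inferred </p>

    def _imply_p_end(self) -> None:
        self.open_elements.pop()
        self.implied_p_ends.append(self.getpos()[0])

    def handle_starttag(self, tag, attrs):
        # A block-level start tag closes a <p> that is still open.
        if tag in P_CLOSERS and self.open_elements and self.open_elements[-1] == "p":
            self._imply_p_end()
        if tag not in VOID:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag: ignored in this sketch
        # Closing a parent also closes any <p> still open inside it.
        while self.open_elements:
            top = self.open_elements[-1]
            if top == tag:
                self.open_elements.pop()
                return
            if top == "p":
                self._imply_p_end()
            else:
                self.open_elements.pop()


tracker = ImpliedParagraphEnds()
tracker.feed('<div class="post">\n<p>Summary...\n<p class="read-more">More\n</div>\n')
tracker.close()
print(tracker.implied_p_ends)  # [3, 4]
```

The fragment follows the style of the revised example below: on line 3 a new <p> closes the first paragraph, and on line 4 the </div> closes the second one, so no element is left open at the end.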

Quick Demo

Here is a quick example which demonstrates the sort of size difference removing all these optional tags can make.

Original

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Example</title>
        <link href="styles/styles.css" rel="stylesheet">
        <script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.min.js"></script>
        <script src="js/behavior.js"></script>
    </head>
    <body>
        <div id="head">
            <h1>Example</h1>
            <p id="slogan">Some kind of slogan or tagline</p>
        </div>

        <div id="main">
            <div class="post">
                <h2>Post 1</h2>
                <p>Summary of post number 1. Well worth clicking on...</p>
                <p class="read-more"><a href="/read/post1"></a></p>
            </div>
            <div class="post">
                <h2>Post 2</h2>
                <p>Summary of post number 2. Also worth clicking on...</p>
                <p class="read-more"><a href="/read/post2"></a></p>
            </div>
        </div>

        <div id="footer">
            <p><a href="http://bit.ly/bqvxlG">Mark Embling</a> 2010</p>
        </div>
    </body>
</html>

33 lines, 857 characters

Revised Version

In this version I have removed all the </p> end tags, along with the start and end tags for the <html>, <head> and <body> elements. Since this example has no lists, tables or other elements with optional tags, that is as far as we can go. Even so, it makes a noticeable difference to a small page like this, and it still validates fine.
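The conditions for dropping the <html>, <head> and <body> start tags can be sketched in Python as well. This is a rough paraphrase of the WHATWG rules for start tags only (the matching end tags can generally go unless a comment, or in the case of </head> whitespace, follows them). The function name and the "#comment", "#whitespace" and "#text" markers are hypothetical conventions for the sketch, and attributes are not modelled: a tag that carries attributes has to stay.

```python
from typing import Optional

# Elements that would otherwise be parsed into <head>, so <body> must be
# written out explicitly when the body content starts with one of them.
HEAD_CONTENT = frozenset({"meta", "noscript", "link", "script", "style", "template"})


def start_tag_may_be_omitted(tag: str, first_child: Optional[str]) -> bool:
    """Rough check for leaving out an <html>, <head> or <body> start tag.

    first_child describes the first node inside the element: None if the
    element is empty, "#comment", "#whitespace" for text starting with
    whitespace, "#text" for any other text, or a lower-case tag name.
    """
    if tag == "html":
        return first_child != "#comment"
    if tag == "head":
        # Empty, or the first thing inside is an element.
        return first_child is None or not first_child.startswith("#")
    if tag == "body":
        if first_child is None:
            return True
        if first_child in ("#whitespace", "#comment"):
            return False
        return first_child not in HEAD_CONTENT
    raise ValueError(f"no rule modelled for <{tag}>")


# First element inside each of the three elements in the revised page.
print(start_tag_may_be_omitted("html", "head"))    # True
print(start_tag_may_be_omitted("head", "meta"))    # True
print(start_tag_may_be_omitted("body", "div"))     # True
print(start_tag_may_be_omitted("body", "script"))  # False
```

In the revised page the first element inside <head> is the <meta> and the first element inside <body> is <div id="head">, so all three start tags can go. Had the body begun with a <script>, its <body> tag would have had to stay, since the parser would otherwise keep that script in <head>.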

<!DOCTYPE html>
<meta charset="utf-8">
<title>Example</title>
<link href="styles/styles.css" rel="stylesheet">
<script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.min.js"></script>
<script src="js/behavior.js"></script>

<div id="head">
    <h1>Example</h1>
    <p id="slogan">Some kind of slogan or tagline
</div>

<div id="main">
    <div class="post">
        <h2>Post 1</h2>
        <p>Summary of post number 1. Well worth clicking on...
        <p class="read-more"><a href="/read/post1"></a>
    </div>
    <div class="post">
        <h2>Post 2</h2>
        <p>Summary of post number 2. Also worth clicking on...
        <p class="read-more"><a href="/read/post2"></a>
    </div>
</div>

<div id="footer">
    <p><a href="http://bit.ly/bqvxlG">Mark Embling</a> 2010
</div>

28 lines, 733 characters (85.5% of the original)
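The percentage is simply the revised size divided by the original. A small Python helper makes it easy to repeat the comparison on your own pages. The post doesn't say exactly how the characters were counted (for example, whether line breaks are included), so size_stats below is only one plausible way to measure; it also reports UTF-8 bytes, which is what actually goes over the wire before any compression. Both function names are hypothetical.

```python
from typing import Tuple


def size_stats(markup: str) -> Tuple[int, int, int]:
    """Return (lines, characters, UTF-8 bytes) for a chunk of markup."""
    return len(markup.splitlines()), len(markup), len(markup.encode("utf-8"))


def saving(original: int, revised: int) -> Tuple[int, float]:
    """Return (units saved, revised size as a percentage of the original)."""
    return original - revised, round(100 * revised / original, 1)


print(saving(857, 733))               # (124, 85.5)
print(size_stats("<p>Hi\n<p>Café"))   # (2, 13, 14): "é" takes two bytes
```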

Disadvantages

Technically, there aren't really any disadvantages to this. However, the markup is definitely less readable, and you always need to bear in mind where the omitted start and end tags actually belong. It is also important to stick to the rules given in the spec; otherwise the browser may end up building a different DOM from the one you intended. I would suggest removing the optional parts as the very last step in producing the page, and then checking that the page still validates without any problems.
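If you do leave the removal until the end of your build, the </p> part can be automated. The Python sketch below is deliberately naive: it uses a regular expression (a shortcut swapped in for a real HTML parser) to drop a </p> only when it is followed, after optional whitespace, by one of the start tags that close a paragraph, or by a </div> that closes the paragraph's parent. It does not understand comments, <pre>, scripts or attribute values, so its output should still go through a validator. The names are hypothetical.

```python
import re

# Start tags that implicitly close an open <p> (paraphrased from the WHATWG list).
CLOSERS = (
    r"address|article|aside|blockquote|details|dialog|div|dl|fieldset|"
    r"figcaption|figure|footer|form|h[1-6]|header|hgroup|hr|main|menu|nav|"
    r"ol|p|pre|search|section|table|ul"
)

# A </p> followed (after optional whitespace) by one of those start tags,
# or by </div>, which closes the paragraph's <div> parent.
OPTIONAL_P_END = re.compile(
    rf"</p>(?=\s*(?:<(?:{CLOSERS})[\s>/]|</div>))", re.IGNORECASE
)


def strip_optional_p_end_tags(html: str) -> str:
    """Remove </p> end tags that are clearly optional (naive sketch)."""
    return OPTIONAL_P_END.sub("", html)


before = "<div>\n<p>One</p>\n<p>Two</p>\n</div>\n<p>Three</p><span>x</span>"
print(strip_optional_p_end_tags(before))
```

The first two </p> tags are removed, while the one before <span> is kept, because inline content after it would otherwise end up inside the paragraph.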

Why?

In most circumstances this is probably not something you would worry about. In this (small) example we stripped out 124 characters, which is hardly a huge number. However, combined with stripping out whitespace (which I didn't do here) and minifying your CSS and JavaScript, it all helps.
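Whitespace stripping can be done very conservatively. The Python sketch below only removes the indentation at the start of each line and leaves line breaks alone, because collapsing all whitespace between tags can change how inline content renders (two links separated by a space would end up touching). It assumes the page has no <pre> or <textarea> content and no CSS that preserves whitespace; the function name is hypothetical.

```python
import re

# Leading spaces and tabs at the start of any line.
INDENT = re.compile(r"^[ \t]+", re.MULTILINE)


def strip_indentation(html: str) -> str:
    """Remove leading indentation from every line (unsafe inside <pre>/<textarea>)."""
    return INDENT.sub("", html)


print(strip_indentation('<div id="main">\n    <h2>Post 1</h2>\n</div>'))
```

Run over the revised example, this would remove the leading spaces from every indented line while leaving the markup itself untouched.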

In my case, I specifically wanted to make things as small as they could reasonably be, because I was working on my entries for the 10K Apart contest, and both of them were a little bit too big. With the first entry I could have shrunk it further by minifying the CSS and JS more aggressively, but for my second entry I really did need the extra bytes. Oh, and if you like the two entries, please vote for them :).

To sum up, this is not something that will change your life, but it is worth knowing if you are often involved in design or development for the web. Just don't try to use it on XHTML documents: XHTML has to be well-formed XML, so every element needs its explicit start and end tags.
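To see why, try the shortened style on an XML parser. In this sketch Python's xml.etree.ElementTree stands in for the XML parser a browser uses for XHTML served as application/xhtml+xml (an assumption for illustration, not a browser test), and is_well_formed_xml is a hypothetical helper. The parser accepts the fully tagged fragment but rejects the one with omitted </p> tags, because XML parsers never infer missing end tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml("<div><p>One</p><p>Two</p></div>"))  # True
print(is_well_formed_xml("<div><p>One<p>Two</div>"))          # False: mismatched tag
```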