Tidying up HTML with jSoup

Robert shows how to implement tidy HTML output using jSoup in CommandBox and CFML.

April 24, 2024 / Robert Zehnder

One of the pet peeves I have with commandbox-ssg is that the output HTML has not been very tidy. If you have been developing with ColdFusion for a while, you are probably familiar with the chunks of whitespace it leaves in your pages. Figuring out a way to remove that whitespace from generated output has been on the back burner for me. I considered post-processing the HTML, and I am sure Node has modules that would help with this, but I really wanted something ColdFusion-specific to keep things simple.

Ben Nadel recently posted about his CF_SaveFile Custom Tag in ColdFusion, which was intriguing because there are a few parallels between how he uses his tag to generate static content and how commandbox-ssg writes templates. His custom tag also includes a dedentContent function that attempts to normalize the indentation. I wanted to see how well this would work in commandbox-ssg, but the results were not quite right. No slight to Ben; my use case has to take into account text in pre and code blocks. Back to the drawing board.

Then last night while I was eating dinner I thought, "I bet jSoup can handle that." After reading through the documentation this morning I found out I was right. jSoup's parse() function returns a document that, when written back out as a string, is tidy HTML, because pretty printing is enabled by default. There was some trial and error getting jSoup working properly in commandbox-ssg, but I am fairly certain I have worked out the kinks. As of version 0.2.0, commandbox-ssg pipes all output through jSoup to tidy the HTML before writing it to the file system.
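To make the idea concrete, here is a minimal sketch of tidying a string of HTML with jSoup from CFML. It is not the actual commandbox-ssg code. It assumes the jSoup jar is already loadable by the engine (how you load it depends on your setup), and the rawHtml and tidyHtml variable names and the sample markup are made up for illustration.

```cfml
<cfscript>
// Assumption: the jSoup jar is already on the classpath, so the class can be
// created by name. commandbox-ssg may load it differently.
jsoup = createObject( "java", "org.jsoup.Jsoup" );

// Hypothetical untidy input, similar to what a rendered CFML template can produce.
rawHtml = "<html><head><title>Demo</title></head><body>
        <div>      <p>Hello,   world</p>


   </div>
<pre>  keep
    this   spacing</pre></body></html>";

// parse() builds a jSoup Document from the markup.
doc = jsoup.parse( rawHtml );

// Pretty printing is on by default; it is set explicitly here along with a
// two-space indent. javaCast() avoids ambiguity between CFML and Java types.
doc.outputSettings()
	.prettyPrint( javaCast( "boolean", true ) )
	.indentAmount( javaCast( "int", 2 ) );

// outerHtml() serializes the whole document using the output settings above.
tidyHtml = doc.outerHtml();

// Escape the result so the markup itself is visible in the browser.
writeOutput( "<pre>" & encodeForHTML( tidyHtml ) & "</pre>" );
</cfscript>
```

In my experience jSoup leaves the whitespace inside pre blocks alone while re-indenting everything around them, which is exactly the case a plain dedent struggles with. In a static site generator you would write tidyHtml to disk with fileWrite() instead of echoing it.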

Honestly, jSoup is probably one of my favorite libraries to work with because of its versatility. Now I can add formatting HTML for static pages to the list of things I use it for.