Reading page metadata using jSoup in ColdFusion
The ability to query a web page for metadata is one of the main
features I use for creating content here. ColdFusion (and the jSoup
ColdBox module) makes the task trivial.
https://gist.github.com/robertz/2a5885fe0ccbb48d8ffc8cf633b9c995
I am using the jSoup connect() method to read the document, although
it is also possible to pull in the HTML using cfhttp. The code follows
redirects and overrides the user agent to avoid potential filters. I
have noticed that some web application firewalls will reject the
request. I assume that is because they block spider requests from my
VPS provider (DigitalOcean), since the same request works fine when
run locally.
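As a rough illustration of the fetch step (not the gist's exact code),
here is a minimal sketch of both approaches. It assumes the jsoup jar
is already on the class path and loads it with createObject() rather
than through ColdBox injection. The target URL and user agent string
are made up for the example.

```cfml
<cfscript>
// Assumption: the jsoup jar is on the class path. Inside ColdBox the
// module would normally provide this object instead.
jsoup = createObject( "java", "org.jsoup.Jsoup" );

// Hypothetical URL and user agent, for illustration only.
targetUrl = "https://example.com/some-article";
userAgent = "Mozilla/5.0 (compatible; ExampleMetaReader/1.0)";

// Option 1: let jsoup fetch and parse the page in one go.
doc = jsoup
    .connect( targetUrl )
    .userAgent( userAgent )
    .followRedirects( true )
    .get();

// Option 2: fetch with cfhttp, then hand the HTML to jsoup to parse.
// The second argument to parse() is the base URI for relative links.
cfhttp(
    url       = targetUrl,
    method    = "GET",
    redirect  = true,
    userAgent = userAgent,
    result    = "httpResult"
);
docFromHttp = jsoup.parse( httpResult.fileContent, targetUrl );

writeOutput( encodeForHTML( doc.title() ) );
</cfscript>
```

Either way the result is a jsoup Document, so the rest of the code
does not need to care how the HTML was retrieved.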
The next step is iterating through the meta properties to pull out
the Open Graph and Twitter metadata, which is then returned to the
caller so that it can be acted upon.
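The iteration could look something like the sketch below. It is an
assumption about the general shape, not a copy of the gist: the inline
HTML stands in for a fetched page, and the result struct layout
(separate "og" and "twitter" keys) is my own choice for the example.

```cfml
<cfscript>
// Inline HTML stands in for a fetched page so no network is needed.
html = '<html><head>
    <meta property="og:title" content="Example title">
    <meta property="og:image" content="https://example.com/a.png">
    <meta name="twitter:card" content="summary">
    <meta name="viewport" content="width=device-width">
</head><body></body></html>';

// Assumption: the jsoup jar is on the class path.
doc = createObject( "java", "org.jsoup.Jsoup" ).parse( html );

// Hypothetical return shape: one struct per metadata family.
result = { "og" : {}, "twitter" : {} };

tags = doc.select( "meta" ).iterator();
while ( tags.hasNext() ) {
    tag = tags.next();

    // Open Graph tags usually use "property" and Twitter tags "name".
    // attr() returns an empty string when the attribute is missing.
    key = len( tag.attr( "property" ) ) ? tag.attr( "property" ) : tag.attr( "name" );

    // Skip tags without a "prefix:name" style key, such as viewport.
    if ( listLen( key, ":" ) < 2 ) continue;

    prefix = listFirst( key, ":" );
    if ( structKeyExists( result, prefix ) ) {
        result[ prefix ][ listRest( key, ":" ) ] = tag.attr( "content" );
    }
}

writeDump( result );
</cfscript>
```

With the sample HTML above, the "og" struct should end up holding the
title and image values and the "twitter" struct the card value, while
the viewport tag is ignored.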
Although this example is specific to ColdBox, it should be easily
portable to a non-ColdBox app.
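For a non-ColdBox app, one way to make jsoup available is to load the
jar through the application's Java settings. This is a sketch under
the assumption that the jsoup jar has been copied into a lib folder
next to Application.cfc. The application name and folder are
hypothetical.

```cfml
// Application.cfc
component {

    this.name = "metaReaderExample"; // hypothetical application name

    // Assumption: the jsoup jar lives in ./lib next to this file.
    this.javaSettings = {
        loadPaths      : [ "./lib" ],
        reloadOnChange : false
    };

}
```

Any template or component in the app can then create the class with
createObject( "java", "org.jsoup.Jsoup" ) and call connect() or
parse() on it directly.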