Transforming Text with Hugo (featuring plainify, htmlUnescape, and more)
Updated  2023-March-26

News

2023-February-14  Published this evolving⁠[1] article.

Prerequisites

This article assumes you know about…

 

Text can be HTML, Markdown, etc.

In Hugo, a web page is built from text that resides in…

  • layout files (including shortcodes),

  • the page’s primary source file (front matter and main matter),

  • data files,

  • resource files,

  • and static files.

 

Text can be interpreted as…

  • HTML,

  • Go Template code (also known as Go HTML),⁠[3]

  • Markdown,

  • Org-mode markup,

  • raw literal (uninterpreted) plain text,

  • or other things.

 

To be safe, default Hugo automatically transforms some text

For text that can be defined by a user, for example the title of a page, Hugo escapes characters that have special meaning in HTML so that the HTML it generates is safe. For example, suppose this is in the front matter of a page’s primary source file:

title: >-
 Portals: Doorways to Infinite Ink's Website & Beyond

 

And if one of the page’s layout files includes {{ .Title }}, Hugo (with its default settings) will generate this:

Portals: Doorways to Infinite Ink&#39;s Website &amp; Beyond

 

Note that…

  • ' was transformed into &#39; and

  • & was transformed into &amp;.

This is called entity escaping.
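The escaping described above can be reproduced outside Hugo with Go’s html/template package, which Hugo’s default HTML templates are built on. This is a minimal sketch, not Hugo itself: the function name renderTitle and the one-line template are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderTitle executes the template {{ .Title }} with Go's html/template
// package, which escapes string values for the HTML context they appear in.
// (renderTitle is a hypothetical helper, not part of Hugo.)
func renderTitle(title string) string {
	tmpl := template.Must(template.New("title").Parse("{{ .Title }}"))
	var out strings.Builder
	if err := tmpl.Execute(&out, map[string]string{"Title": title}); err != nil {
		panic(err)
	}
	return out.String()
}

func main() {
	fmt.Println(renderTitle("Portals: Doorways to Infinite Ink's Website & Beyond"))
	// Portals: Doorways to Infinite Ink&#39;s Website &amp; Beyond
}
```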

 

To prevent Hugo from doing the above entity escaping, you can do either of the following.

  1. In the layout file, use {{ .Title | safeHTML }} (instead of {{ .Title }}) or

  2. in the Hugo project’s config file, specify the following.

    outputFormats:
      html:
        isPlainText: true

In the Infinite Ink Hugo project, I do #2, which means I do not need to use | safeHTML in Infinite Ink’s layout files, but I do sometimes need to use | plainify, | htmlUnescape, and other text-⁠transformation functions.
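The two options can be pictured in plain Go. This sketch rests on two assumptions about Hugo’s internals: that safeHTML marks a string as trusted by converting it to the template.HTML type, and that isPlainText makes Hugo parse the layout with text/template instead of html/template. The helper names renderHTML, renderTrusted, and renderText are hypothetical.

```go
package main

import (
	"fmt"
	htmltemplate "html/template"
	"strings"
	texttemplate "text/template"
)

// layout stands in for a layout file that contains only {{ .Title }}.
const layout = "{{ .Title }}"

// renderHTML executes the layout with html/template, which escapes plain
// strings but passes values of type template.HTML through unchanged.
func renderHTML(title any) string {
	tmpl := htmltemplate.Must(htmltemplate.New("html").Parse(layout))
	var out strings.Builder
	if err := tmpl.Execute(&out, map[string]any{"Title": title}); err != nil {
		panic(err)
	}
	return out.String()
}

// renderTrusted mimics option 1: the title is marked as trusted HTML
// (assumed to be what safeHTML does), so it is not escaped.
func renderTrusted(title string) string {
	return renderHTML(htmltemplate.HTML(title))
}

// renderText mimics option 2: the same layout is executed with
// text/template, which does no HTML escaping at all.
func renderText(title string) string {
	tmpl := texttemplate.Must(texttemplate.New("text").Parse(layout))
	var out strings.Builder
	if err := tmpl.Execute(&out, map[string]any{"Title": title}); err != nil {
		panic(err)
	}
	return out.String()
}

func main() {
	title := "Ink's Website & Beyond"
	fmt.Println(renderHTML(title))    // Ink&#39;s Website &amp; Beyond
	fmt.Println(renderTrusted(title)) // Ink's Website & Beyond
	fmt.Println(renderText(title))    // Ink's Website & Beyond
}
```

Both options leave the title untouched, which is consistent with | safeHTML having no visible effect once isPlainText is set.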

 

Example: Transforming a page’s title

Suppose the following is in the front matter of the primary source file of an Infinite Ink page.

title: >-
 Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website <nobr>& Beyond🚪</nobr>

 

Note that this title variable contains…

  • HTML named entities (&hairsp; and &nbsp;),

  • HTML numeric entities (&#x1F6AA; and &#xFE0E;),

  • HTML tags (<br>, <nobr>, and </nobr>),⁠[4]

  • a literal ASCII single quote⁠[5] ('),

  • a literal ampersand (&),

  • and a literal Unicode⁠[2] emoji character (🚪).

 

This title can be used in many places on the Infinite Ink website, for example…

  • in a <title> element,

  • in an <h1> element,

  • in lists of pages on relevant see-⁠also sections and portals,

  • in relevant “next” and “previous” link titles,

  • and in a “Share on Mastodon” link.

In the Infinite Ink layout files, I sometimes use {{ .Title }}, but other times I need to remove some or all of the HTML in the title. The following sections show how to use some Hugo functions to transform this title.

 

Assumption: The Hugo config file specifies isPlainText for HTML output

As I wrote above, the Infinite Ink website’s Hugo config file includes this:

outputFormats:
  html:
    isPlainText: true

 

The results that I get in the following transformations depend on this setting.

 

My goal: Use plain text in the document’s <title> element

The <title> element is displayed in a browser’s title bar and should contain only plain text (i.e., no HTML). For details about the <title> element, see developer.mozilla.org/en-US/docs/Web/HTML/Element/title.

For the following front-matter title variable…

title: >-
 Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website <nobr>& Beyond🚪</nobr>

…my goal is for Hugo to generate the web page’s <head> section and have it contain something like this:

<title>Portals 🚪︎: Doorways to Infinite Ink's Website & Beyond🚪</title>

Note that there are no HTML entities or HTML tags in this.

 

Some transformations

{{ .Title }} (no transformation)


Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website <nobr>& Beyond🚪</nobr>

 

{{ .Title | safeHTML }}


Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website <nobr>& Beyond🚪</nobr>

Note that this is the same as {{ .Title }} because Infinite Ink’s config has isPlainText set to true. To learn about safeHTML, see gohugo.io/functions/safehtml/.

 

{{ .Title | plainify }}


Portals&hairsp;&#x1F6AA;&#xFE0E;:
Doorways to Infinite&nbsp;Ink's Website & Beyond🚪

Note that plainify removed the <br>, <nobr>, and </nobr> tags. To learn about plainify, see gohugo.io/functions/plainify/.
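The tag removal described above can be approximated outside Hugo. The following Go sketch is not Hugo’s implementation of plainify: the function name stripTags and both regular expressions are assumptions made for illustration. It turns <br> into a newline, as the output above suggests plainify does, and deletes any other tag while leaving entities alone.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// brTag matches <br>, <br/>, and <br /> in any letter case.
	brTag = regexp.MustCompile(`(?i)<br\s*/?>`)
	// anyTag matches any remaining opening or closing tag.
	anyTag = regexp.MustCompile(`<[^>]*>`)
)

// stripTags is a rough, hypothetical stand-in for plainify: it replaces
// <br> with a newline and removes every other tag. Entities such as
// &nbsp; are deliberately left as they are.
func stripTags(s string) string {
	s = brTag.ReplaceAllString(s, "\n")
	return anyTag.ReplaceAllString(s, "")
}

func main() {
	title := "Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website <nobr>& Beyond🚪</nobr>"
	fmt.Println(stripTags(title))
}
```

A regular expression like this is only good enough for simple, well-formed titles; it is not a general HTML parser.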

 

{{ .Title | htmlUnescape }}


Portals 🚪︎:<br>Doorways to Infinite Ink's Website <nobr>& Beyond🚪</nobr>

Note that htmlUnescape transformed the HTML numeric and named entities to their corresponding literal Unicode⁠[2] characters. To learn about htmlUnescape, see gohugo.io/functions/htmlUnescape/.
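Go’s standard library has a function, html.UnescapeString, that performs this kind of transformation. The sketch below assumes it behaves like htmlUnescape for the entities in this title; the wrapper name unescapeTitle is hypothetical. It prints the result with %+q so that the invisible characters (hair space, no-break space, and variation selector) show up as escape sequences.

```go
package main

import (
	"fmt"
	"html"
)

// unescapeTitle converts named and numeric HTML entities to the literal
// Unicode characters they stand for. Tags are not touched.
// (unescapeTitle is a hypothetical wrapper around html.UnescapeString.)
func unescapeTitle(s string) string {
	return html.UnescapeString(s)
}

func main() {
	title := "Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website"
	// %+q prints non-ASCII characters as \u or \U escapes.
	fmt.Printf("%+q\n", unescapeTitle(title))
}
```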

 

{{ .Title | htmlEscape }}


Portals&amp;hairsp;&amp;#x1F6AA;&amp;#xFE0E;:&lt;br&gt;Doorways to Infinite&amp;nbsp;Ink&#39;s Website &lt;nobr&gt;&amp; Beyond🚪&lt;/nobr&gt;

Note that htmlEscape transformed <, >, ', and & to their corresponding HTML named or numeric entities. This is called entity escaping. To learn about htmlEscape, see gohugo.io/functions/htmlescape/.
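The same escaping can be seen with html.EscapeString from Go’s standard library, which escapes <, >, &, ', and ". This is a sketch under the assumption that it matches htmlEscape for this title; the wrapper name escapeTitle is hypothetical. It also shows that unescaping the escaped text returns the original string.

```go
package main

import (
	"fmt"
	"html"
)

// escapeTitle replaces <, >, &, ' and " with HTML entities.
// (escapeTitle is a hypothetical wrapper around html.EscapeString.)
func escapeTitle(s string) string {
	return html.EscapeString(s)
}

func main() {
	title := "Infinite&nbsp;Ink's Website <nobr>& Beyond</nobr>"
	escaped := escapeTitle(title)
	fmt.Println(escaped)
	// Infinite&amp;nbsp;Ink&#39;s Website &lt;nobr&gt;&amp; Beyond&lt;/nobr&gt;

	// Escaping followed by unescaping is a round trip.
	fmt.Println(html.UnescapeString(escaped) == title) // true
}
```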

 

{{ .Title | plainify | htmlUnescape }} (my goal)


Portals 🚪︎:
Doorways to Infinite Ink's Website & Beyond🚪

Since browsers collapse whitespace such as the line break in the <title> element, this satisfies my goal!

 

{{ .Title | plainify | htmlUnescape | urlquery }} (sometimes used in share buttons)


Portals%E2%80%8A%F0%9F%9A%AA%EF%B8%8E%3A%0ADoorways+to+Infinite%C2%A0Ink%27s+Website+%26+Beyond%F0%9F%9A%AA

This transformation is useful for some share buttons and links, for example in the “Share this page on Mastodon” link below. To learn about urlquery, see gohugo.io/functions/urlquery/.
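The whole pipeline can be sketched in Go to check the percent-encoded string above. The assumptions: a regular-expression tag stripper stands in for plainify, html.UnescapeString stands in for htmlUnescape, and url.QueryEscape stands in for urlquery. The names stripTags and shareText are hypothetical.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
	"regexp"
)

var (
	brTag  = regexp.MustCompile(`(?i)<br\s*/?>`) // <br> in any form
	anyTag = regexp.MustCompile(`<[^>]*>`)       // any other tag
)

// stripTags is a rough stand-in for plainify: <br> becomes a newline and
// every other tag is removed.
func stripTags(s string) string {
	return anyTag.ReplaceAllString(brTag.ReplaceAllString(s, "\n"), "")
}

// shareText mirrors {{ .Title | plainify | htmlUnescape | urlquery }}:
// strip tags, convert entities to literal characters, then encode the
// result for use in a URL query string.
func shareText(title string) string {
	plain := html.UnescapeString(stripTags(title))
	return url.QueryEscape(plain)
}

func main() {
	title := "Portals&hairsp;&#x1F6AA;&#xFE0E;:<br>Doorways to Infinite&nbsp;Ink's Website <nobr>& Beyond🚪</nobr>"
	fmt.Println(shareText(title))
}
```

In the encoded result, each space becomes +, the newline becomes %0A, and every non-ASCII character becomes the percent-encoded bytes of its UTF-8 form.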

 

Endnotes


1. Many Infinite Ink articles, including this one, are evergreen and regularly updated.
3. Go Template code is also known as Go HTML. For details, see gohugo.io/categories/templates/, golang.org/pkg/html/template/, and golang.org/pkg/text/template/.
4. As discussed in developer.mozilla.org/en-US/docs/Web/HTML/Element/nobr, <nobr> is deprecated, but, for now, I still use it.🤷
5. The ASCII quotation marks ' and " are also known as dumb, neutral, straight, typewriter, or vertical quotation marks.

Discuss or share 📝 🤔 🐘