The website you are reading this on is made up of a set of static assets rendered with Hugo. This post walks through adding a basic search feature to this site that runs locally in your browser. I wanted to build out search in a way that aligned with the current statically generated approach, avoiding any server-side logic and executing queries entirely within the browser. Additionally, I wanted to keep the build process for the site simple, so avoiding the addition of any special steps in the local development or production build processes. This post walks through how I accomplished it for this site and how you could do the same.
To make this work, we need to go through three basic steps:
- Augment the site’s static page structure to add pages and UI elements for the search feature. We can do this using Hugo native functionality.
- Implement the search result rendering layer to display results from the generated index. For this I chose to use Preact, which is a lighter-weight version of React that provides similar functionality.
Adding the search UI
The basic structure I used for the search implementation is as follows:
- Add a search query form to every page on the site. I chose to include it as part the navigation bar. You can see it on the top right of this page!
- Add a search results page that users are redirected to when the results
render. This is hosted at
- Support for search-as-you-type when on the search results page.
The search query form is quite simple. It’s written as a Hugo Partial that can be injected into anywhere we want on the site, consisting of two basic elements, an input field and a submit button. For my site, I’ll be placing this partial into the navigation bar.
Next, we need a page that this form will redirect the user to after a query is
submitted. The code snippet above references this new page in the
of the form, where we use the
.RelPermalink page variable to
specify the location of the redirect.
Creating a single root level page in Hugo can be done in a few different ways, but the basic idea is that you need two files: a markdown file that represents the existence of the page, and a layout template that corresponds with that markdown file via the template lookup order. The following two code snippets show those two files as they have been created for this site.
The markdown file causes the page to be rendered. It is empty because there is
no particular content that we are rendering there by default. All the content is
in the layout file. In order to make this approach work, you need the
type = "search" line in the file so that the lookup order finds the corresponding
This template provides the basic scaffolding for the results page. It is empty
<div id="results"></div> element provides the attachment point where the results
will be injected into the page. We’ll cover that in detail in just a moment.
For the basic page setup in Hugo, that’s it! We can now start work on creating the actual search index.
Building the search index
The next step in our process is to generate the index that we will be searching. For this use case, I want to be able to search across the pages within the website, which are written in Markdown, before being rendered to HTML by Hugo. Using Hugo templating, we can iterate over the pages directly and pull together the data needed to construct the index.
Gathering content from Hugo
First, we need to use Hugo to gather together all the information we need about the pages from within the website that we want to make searchable.
with the name
index.tpl.js. At build time, this file gets rendered into one
with the name format of
index.js with the data in it all filled out from Hugo.
By moving the templating logic from the search logic file into its own file, we
get some quality of life improvements, such as disabling error linting for that
The templating logic within this file is fairly straightforward if you go line
by line. First, we create a
$store template variable to capture the page
content that we want, keyed on their URL to enable quick lookups of the original
entries later on. We then iterate over the pages of interest within the site,
which in my case uses the
where function to walk through the
pages within the blog section. For each of those pages, we create an object
using the built in
dict function, and then use the
function to add it to
$store. So far, all of this has been
happening inside of the internal template variable state, with nothing being
rendered out to the final JS file. As the last step, we take the store that we
have constructed and run it through the
and store it globally as
window.store, which we will reference to build the
actual search index with Lunr in a moment.
To get Hugo to actually execute this template, we need to leverage the
resources.ExecuteAsTemplate feature within Hugo Pipes. The following code
snippet shows how we do that for the
index.tpl.js file and add it as a script
tag. The end result here is that we have the JSON object for
available throughout the page for use in building the actual search index
Building the Lunr index
Now that we have the content available to us on
window.store, we can reference
snippet constructs a
lunr.Index customized to the content that we
are making searchable for this website. Fields are boosted according to relative
importance. These numbers are mostly based on an intuitive sense of what seems
most important to a post, and have been tweaked a bit with experimentation,
which we will talk about a bit later on. We additionally enable an undocumented
feature in Lunr to enable position information for use in
highlighting as a bit of extra credit, which we will look at in more detail
shortly. For now, we have a working index up and running!
If you want to learn more about Lunr and the options it offers straight from the source, I recommend checking out their getting started guide which walks through the fundamentals of a very simple search implementation and includes links to more detailed Lunr resources.
Executing search queries
We now have an index, but we’re not actually doing anything with it yet. In
order to interact with the index, we create a
getResults function. To implement basic
search with Lunr, you can just call the
search function on your index and pass
in your query string. Lunr also supports a variety of query syntax that uses
special characters. We check for the use of any of those with the
lunrQueryChars regex, and if so pass the query directly to Lunr.
For all other cases however, we will customize the default behavior a bit to provide a “search as you type” functionality on the search result page itself. For this to work well, we need to support prefix matching on at least the last word in the query, which is what the parsing logic in the latter portion of this code snippet does.
Rendering results with Preact
The below snippet shows the
Result functional component
that renders a single search result, as well as the
ResultList component that
renders a list of results. These elements will be the foundational pieces of our
search result interface.
It’s worth noting that at first glance this isn’t the most DRY
approach imaginable, as it does replicate the Hugo content
view template we already have for rendering a
of blog posts here. However as these results need to be
built dynamically, and we are performing custom highlighting with the
Highlight component, there does not seem to be a clear way out of this, and we
will still leverage the same CSS to give consistency to the general styling here
and throughout the site.
Diving a bit deeper into the implementation, this snippet shows how we highlight
the terms in the result content that match the query. We create a new
Highlight functional component that transforms text input into highlighted
snippets. Lunr has an undocumented feature to provide
positional information as part of the returned matches, which we enabled at
indexing time above. What this provides to us is a pair of numbers, an offset
and length, for every term in the items that matches the input query. In this
component we re-construct the snippet text while inserting
mark elements which provide us with highlighting, indicating
to the user what terms matched their query. After iterating through the whole
string in this manner, the resulting values are returned to be rendered into the
Result component to be rendered into the DOM as a contiguous block of text.
With those components in place, the end result looks like this! The inserted
<mark> elements provide highlighting within the result.
One neat capability of Preact (and React itself) is that it can be integrated within the context of an existing site that is not fully based on the framework, which is exactly what we are doing here.
The below function shows the initialization procedure for the page which runs at load time. First we parse any query out of the URL, and then we register the logic to re-render the page to facilitate a “search as you type” experience. Because this site is built around Bootstrap 4, we already have jQuery available, so we use that to update the query rendering when the user types an update to the query itself.
update function referenced therein is fairly straightforward, taking in
the query from the result box if it is changed, retrieving the corresponding
results and rendering them anew.
Compiling JSX with Preact and esbuild in Hugo
- Define the
- Override the
JSXFragmentoptions that are passed to esbuild to customize the compilation of JSX to work with Preact rather than vanilla React.
- In development environments only, we enable inline source maps. These are disabled in production because they greatly increase bundle size when they are inserted inline. In the future if separate source maps files are supported, we can switch to that more and remove the conditional.
- We then apply an SRI hash to the file and put it in a script tag!
With that, we have a fully operational search page. Feel free to give it a try!
While we have made a lot of progress here, more code also means more things that can wrong! In an effort to keep this site low-maintenance and stable, I wanted to add tests to ensure that future iterations do not break things with the search capability. I took a dual approach to this, first validating the search functionality with unit testing, and then building on that with browser-based testing.
To set up the unit tests, we mock out two key pieces of the website itself, the DOM and the search index. I already had jest set up, which we will be using as our test runner. This will be integrated with testing based on jsdom that jest provides for us via the configuration seen in this configuration file. This file also contains a few customizations for Preact that mirror the configuration we did for esbuild that facilitate JSX parsing in the tests.
To set up the DOM environment, we first import the rendered HTML file from the
hugo (this means that the build step must be run for the tests to
work). We also want a bit of example search data that we can assert
deterministic results from. For this, I pulled in descriptions from a TV series
that I spent more time watching in my earlier days than I might care to admit.
This will mock out the index on the site itself, providing us with the ability
to assert various specific behaviors.
With that in place, we can start writing tests. Here we import the search
functionality and provide basic assertions on the
getResults function and then
update function, the latter of which is where we start referencing the
DOM that is available to us through the injected jsdom.
To round things off, we also add a test for the initialization and “search as
you type” functionality. One of the limitations of jsdom is that navigation is
not supported within it. To work around this, I found an approach in a blog
post covering how to mock the
window.location object in
jest, which is utilized here. With that in place, we can set the URL query
parameter to emulate the search form submission on other site pages without
complaints from jsdom. The latter tests are then also able to verify that the
jQuery-based update mechanism properly re-renders results when the query is
updated, with a bit of a nudge to have the events within jsdom be properly
The above testing provides us with a pretty high level of confidence that search is working as expected. As I already had browser-based testing through puppeteer up and running for this site, it was straightforward enough to add an additional test case which asserts that when we submit the search form on a page, we end up on a page that displays results, and that within those results, there is a snippet from my prior post on building a neural network from scratch.
Alternatives and future work
While the basic search functionality provided here is perfectly fine for the amount of content currently on this site, there are plenty of ways that the basic approach taken here could be implemented differently, or expanded and further refined, which I’ll briefly mention here.
A precursor to any meaningful iterations on a search system is a way to evaluate the quality of results. The “seriousness” of that evaluation can scale with the importance of the application. For example, to refine the query building logic above I used the test cases shown above as well as manual testing with various queries against specific pages in my site that I would expect to be returned. To provide a more formal evaluation, we could augment the above test suites to connect to the actual built index from the site and add various offline measures that assert quality characteristics above a certain threshold and in a variety of different cases. This would require building out a dataset of test cases for search queries and expected results that the site’s performance could be measured against. As this site has more content added, or if your site already does, setting something like that up would be a great next step!
I mentioned highlighting as a feature of Stork above, and at the moment the highlighting implementation we have here is fairly basic. The highlighting algorithm could be improved by more intelligently choosing snippets from within the content of various results rather than just using the beginning of the text. This could more reliably give the user information as to the reason for relevance of particular documents. This is a rich area with a fair amount of research available on query-based text summarization which could be interesting to experiment with for this application3.
On the performance front, one note about our implementation here is that the index itself is being built at load time from the raw documents built into the browser. We could alternatively compile the index at the time the site is built, and then serve the pre-built index. Lunr provides documentation on pre-building indexes that we could follow to do this. A fancy approach to integrate this with the site build could be to leverage experimental esbuild plugins but Hugo does not support them yet and apparently adding support is hard. This would cut out on compute time in the browser, however, it comes at the expense of a larger index file that needs to be served over the network (about 4x before compression for my site at the time of this writing, though compression and bigger sites likely yield a better ratio), which is what dissuaded me from doing this for my implementation, as index build times take less than 50ms. While I do pay for bandwidth, I do not pay for those few extra CPU cycles in your browser, thus I’ve decided to stick with the current approach for now.
If others seem to agree with me that this is a good way to go about implementing Lunr search into your Hugo site, this approach may be also worth listing in the directory of search approaches listed here: gohugo.io/tools/search/, perhaps in a simplified form.
Either way, I would love to hear thoughts on this approach! Feel free to leave a comment, question, or perhaps a complaint below.
Two features that seem like they might provide this sort of capability are hooks and context which I did experiment with. However these both have the perspective that the Preact application is the centerpiece of your UI, and when you try calling those methods to access or modify state from outside of a particular component, errors are thrown. Even the section on side-effects is talking about the other way around, triggering external side effects, rather than external operations having side effects on Preact components themselves. ↩︎
For further exploration of text summarization, this repository seems to provide a fantastic set of resources, if a bit overwhelming as a starting point: https://github.com/icoxfog417/awesome-text-summarization. I also came across this blog post that gives a helpful explanation of applying the TextRank algorithm to summarization: https://towardsdatascience.com/understanding-automatic-text-summarization-1-extractive-methods-8eb512b21ecc. ↩︎