Searching a Jamstack site with Pagefind

Fulltext search with 11ty and Pagefind

Pagefind
Setup
Configure the index
1. Attributes
2. Selectors
A sketch of how to search
Resources

Pagefind

What is a good way of searching content on a static website? Pagefind can be an answer to that question. It is an excellent solution if you want to stay static without executing any code on the server during runtime! Pagefind is optimized for static websites. The basic idea is to prepare the search index after you’ve built your static site, by analyzing the output, and then adding the static search index to the output.

Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other website framework. The installation process is always the same: Pagefind only requires a folder containing the built static files of your website, so in most cases no configuration is needed to get started.
The Pagefind homepage

My previous search solution was relying on MiniSearch and Netlify Edge Functions. It was running quick with minimal data to download for the user, but it had one downside: Running JavaScript in Edge Functions on the server for a static site is not so static anymore, and it makes you depending on a provider. Pagefind requires nothing like that. You can use it with any static site and it does not require runtime functionality on the server.

I never had issues with Netlify Edge Functions, but I was curious to see what can be done with Pagefind. Over the weekend I´ve set up Pagefind to allow searching content on ulfschneider.io. Here is what I found:

Pagefind is easy to setup. It’s more simple than my previous setup of MiniSearch and Edge Functions.
The search is quick.
Pagefind provides dynamic excerpts for search results with marked matches out of the box. The excerpts are of high value when screening for the right document.
I think the search accuracy is fine. In my experiments Pagefind returned valid matches that helped to get the relevant documents quickly.

The search index generation after your site-build is fast. On my MacBook Pro M1 it takes 0.582 seconds to index 514 pages, as you can see from the log output below:

[Walking source directory]
Found 632 files matching **/*.{html}
[Parsing files]
Found a data-pagefind-body element on the site.
↳ Ignoring pages without this tag.
[Reading languages]
Discovered 1 language: en
[Building search indexes]
Total:
  Indexed 1 language
  Indexed 514 pages
  Indexed 8926 words
  Indexed 0 filters
  Indexed 0 sorts
Finished in 0.582 seconds

The data to index can be tailored (index the entire page or parts of it). The data to carry in addition as metadata (such as dates and tags) is configurable.
Pagefind comes with a ready-made interactive frontend, which is good, but I did not use it.
As said before: The Pagefind search index can be hosted along with your static data on any machine and any CDN. Pagefind doesn’t need serverlogic at runtime.

On my website I use Pagefind in the following way: While the user is typing, a list of max. 7 suggested document titles is shown to the user and any of those documents can directly be jumped to. When the user submits a search without directly opening a document, all matches are presented with their excerpts.

Before you can display any content from a search result, you have to tell Pagefind to download the full content and calculate the dynamic excerpt, based on the search term and the downloaded content. This works well for showing dynamic excerpts, but it is unnecessarily costly when you only want to show a document title or a static excerpt that is already prepared on the server. I´d prefer to have a configuration option for telling Pagefind what content to send immediately with the search results. The makers are open for discussing this topic.

Setup

You will install Pagefind as a dev dependency in your npm project.

npm install --save-dev pagefind

Then you can run it after your regular build. E.g., for 11ty, it could look like follows (but I do not recommend it this way):

//package.json
...
"scripts": {
  //...
  "build": "eleventy && pagefind --site _site",
  //...
}

pagefind --site _site means to run Pagefind and let it look for the build output inside of the _site directory, which is the default output directory for 11ty. After Pagefind has analyzed the site output, it will add the search index data and some JavaScript into a sub-directory of the output, named pagefind (e.g.: _site/pagefind). The JavaScript provides the search API (pagefind.js) and the user interface (pagefind-ui.js and pagefind-ui.css) which will be used inside the browser to run the client-side search. It´s on you to decide if you want to use the search API code only and provide all user interface code by yourself (which I did), or if you want to leverage the ready-to-use Pagefind search frontend.

Note
When using 11ty, I recommend not to trigger the Pagefind run from within your package.json, but to use the eleventy.after event inside of your .eleventy.js config file. This way your Pagefind will run after each 11ty build, even when working in your local dev environment with --watch or --serve. Then you have a current search index available during development and after a production build.

//.eleventy.js
const { execSync } = require("child_process"); //this comes with node

module.exports = function (eleventyConfig) {
  //...

  eleventyConfig.on(
    "eleventy.after",
    async ({ dir, results, runMode, outputMode }) => {
      console.log(
        "******** eleventy after build event, configured in .eleventy.js config file"
      );
      execSync(`npx pagefind --site ${dir.output}`, {
        encoding: "utf-8",
        stdio: "inherit", //see the output of the process in your log
      });
    }
  );

  //...
};

Configure the index

Likely you want to tell Pagefind what content to consider for the search index. Pagefind has several options to achieve this. Use Configuring what content is indexed as an entry into the topic.

Attributes

Your first option is to assign attributes to your HTML template files to control how Pagefind will proceed your build output.

data-pagefind-body

This attribute allows you to mark an element and all its children content inside of your HTML documents to be used for the index.

Note
Once a data-pagefind-body attribute exists on any page of your site, any pages without this attribute will not be indexed.

I´ve set up my blog so that I can exclude certain documents from the index by setting a frontmatter variable named nosearch in the Markdown content file. Setting nosearch: true will omit the document from the search index. You can achieve that by adjusting your template in the following way:

<!--default.html template file -->
{%- if no search == nil and draft == nil -%}
{%- assign search_attribute = 'data-pagefind-body' -%}
{%- else -%}

<body {{search_attribute}}>
<!--...-->

The above Liquid template code will check if one of the frontmatter variables nosearch or draft exists. If any of those variables exists, the document body will receive the attribute data-pagefind-ignore="all" which leads to not using the document for the index. If the document doesn´t have a draft and nosearch frontmatter variable, the attribute data-pagefind-body is assigned to the HTML body and will mark it for index use.

data-pagefind-ignore

Even within the indexed body you can ignore certain content parts and refrain from putting them into the search index by marking the content to ignore with the data-pagefind-ignore attribute. E.g., for a comments section that always has a heading Comments, you can ignore the heading with:

<details>
  <summary data-pagefind-ignore>
    <strong>Comments</strong>
  </summary>
//...

Assigning data-pagefind-ignore to an element will exclude the element and all its children.

data-pagefind-meta

This attribute is very useful to include certain data and have it directly accessible in a search result. You can use it even inside of content that has been marked with data-pagefind-ignore. E.g., to make the date of when a post has been updated accessible in the metadata, do:

<time
  datetime="{{updated | isoDate}}"
  data-pagefind-meta="updated:{{updated | isoDate}}">
{{ updated }}
</time>

In your search results you can then access the updated property as follows:

const pagefind = await import("/pagefind/pagefind.js");
const search = await pagefind.search("static");
const oneResult = await search.results[0].data();
console.log(oneResult.meta.updated); //access the updated metadata

There are more options available for what you can do with metadata. Please refer to Setting up metadata.

data-pagefind-index-attrs

Use this to add the contents of HTML attributes to the index. E.g.:

<img src="/hero.png" title="Image Title" alt="Image Alt" data-pagefind-index-attrs="title,alt" />

I did not use that.

Selectors

Additional configuration is possible in the pagefind.yml file.

Pagefind will look for a pagefind.toml, pagefind.yml, pagefind.yaml, or pagefind.json file in the directory that you have run the [pagefind] command in.
Pagefind CLI configuration sources

To exclude certain selectors from the index, do:

exclude_selectors:
  - "#my_navigation"
  - "blockquote > span"
  - "[id^='prefix-']"

The root selector to be used for building the index is html. Any data outside of this selector will not be detected for the index.

A sketch of how to search

Below is a sketch for how to leverage the search API for your own search frontend.

<script>
  async function search(query) {
    try {
      let start = Date.now();

      pagefind = await import("/pagefind/pagefind.js");
      const search = await pagefind.search(query);
      const promiseCollector = [];
      const results = Array(search.results.length)
      for (const i = 0; i < search.results.length; i++) {
        promiseCollector.push(
          search.results[i].data().then((data) => results[i] = data)
        );
      }
      await Promise.allSettled(promiseCollector);

      let duration = Date.now() - start;

      console.log(
        "The search for [" +
          query +
          "] returned " +
          results.length +
          " results within " +
          (duration / 1000).toFixed(2) +
          " seconds"
      );

      printSearchResults(results);
    } catch (error) {
      printError(error);
    }
  }

//...
</script>