Searching a Jamstack site with Pagefind
What is a good way of searching content on a static website? Pagefind can be an answer to that question. It is an excellent solution if you want to stay static without executing any code on the server during runtime! Pagefind is optimized for static websites. The basic idea is to prepare the search index after you’ve built your static site, by analyzing the output, and then adding the static search index to the output.
Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other website framework. The installation process is always the same: Pagefind only requires a folder containing the built static files of your website, so in most cases no configuration is needed to get started.
My previous search solution relied on MiniSearch and Netlify Edge Functions. It ran quickly, with minimal data to download for the user, but it had one downside: running JavaScript in Edge Functions on the server means a static site is not so static anymore, and it makes you dependent on a provider. Pagefind requires nothing like that. You can use it with any static site, and it does not require runtime functionality on the server.
I never had issues with Netlify Edge Functions, but I was curious to see what can be done with Pagefind. Over the weekend I've set up Pagefind to allow searching content on ulfschneider.io. Here is what I found:
- Pagefind is easy to set up. It's simpler than my previous setup of MiniSearch and Edge Functions.
- The search is quick.
- Pagefind provides dynamic excerpts for search results with marked matches out of the box. The excerpts are of high value when screening for the right document.
- I think the search accuracy is fine. In my experiments Pagefind returned valid matches that helped to get the relevant documents quickly.
- The search index generation after your site-build is fast. On my MacBook Pro M1 it takes 0.582 seconds to index 514 pages, as you can see from the log output below:
[Walking source directory]
Found 632 files matching **/*.{html}
[Parsing files]
Found a data-pagefind-body element on the site.
↳ Ignoring pages without this tag.
[Reading languages]
Discovered 1 language: en
[Building search indexes]
Total:
Indexed 1 language
Indexed 514 pages
Indexed 8926 words
Indexed 0 filters
Indexed 0 sorts
Finished in 0.582 seconds
- The data to index can be tailored (index the entire page or only parts of it). The data to carry along as metadata (such as dates and tags) is configurable.
- Pagefind comes with a ready-made interactive frontend, which is good, but I did not use it.
- As said before: The Pagefind search index can be hosted along with your static data on any machine and any CDN. Pagefind doesn't need server logic at runtime.
On my website I use Pagefind in the following way: While the user is typing, a list of max. 7 suggested document titles is shown to the user and any of those documents can directly be jumped to. When the user submits a search without directly opening a document, all matches are presented with their excerpts.
Before you can display any content from a search result, you have to tell Pagefind to download the full content and calculate the dynamic excerpt, based on the search term and the downloaded content. This works well for showing dynamic excerpts, but it is unnecessarily costly when you only want to show a document title or a static excerpt that was already prepared at build time. I'd prefer to have a configuration option for telling Pagefind what content to send immediately with the search results. The makers are open to discussing this topic.
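The suggestion list described above can be sketched as follows. This is a minimal sketch, not the code running on my site: suggestTitles and limit are hypothetical names, and it assumes each loaded result carries a title in its metadata. Loading data for only the first few results keeps the download cost down, because the remaining matches are never fetched.

```javascript
// Sketch: build a short list of title suggestions while the user is typing.
// `pagefind` is the module obtained via `await import("/pagefind/pagefind.js")`.
// `suggestTitles` and `limit` are hypothetical names, not part of Pagefind.
async function suggestTitles(pagefind, query, limit = 7) {
  const search = await pagefind.search(query);
  // Each result loads its content lazily through data(); calling it only
  // for the first `limit` entries avoids downloading every matching page.
  const firstResults = search.results.slice(0, limit);
  const loaded = await Promise.all(firstResults.map((result) => result.data()));
  // Assumption: a title is present in the metadata of each result.
  return loaded.map((data) => ({ title: data.meta.title, url: data.url }));
}
```

A call such as suggestTitles(pagefind, "static") resolves to an array of at most 7 objects with a title and a url, which can be rendered as links.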
Setup
You will install Pagefind as a dev dependency in your npm project.
npm install --save-dev pagefind
Then you can run it after your regular build. E.g., for 11ty, it could look as follows (but I do not recommend doing it this way):
//package.json
...
"scripts": {
//...
"build": "eleventy && pagefind --site _site",
//...
}
pagefind --site _site means to run Pagefind and let it look for the build output inside of the _site directory, which is the default output directory for 11ty. After Pagefind has analyzed the site output, it adds the search index data and some JavaScript into a sub-directory of the output named pagefind (e.g., _site/pagefind). The JavaScript provides the search API (pagefind.js) and the user interface (pagefind-ui.js and pagefind-ui.css), which are used inside the browser to run the client-side search. It's up to you to decide whether to use only the search API code and provide all user interface code yourself (which I did), or to leverage the ready-to-use Pagefind search frontend.
Note
When using 11ty, I recommend not triggering the Pagefind run from within your package.json, but using the eleventy.after event inside of your .eleventy.js config file. This way Pagefind will run after each 11ty build, even when working in your local dev environment with --watch or --serve. You then have a current search index available both during development and after a production build.
//.eleventy.js
const { execSync } = require("child_process"); //this comes with node
module.exports = function (eleventyConfig) {
//...
eleventyConfig.on(
"eleventy.after",
async ({ dir, results, runMode, outputMode }) => {
console.log(
"******** eleventy after build event, configured in .eleventy.js config file"
);
execSync(`npx pagefind --site ${dir.output}`, {
encoding: "utf-8",
stdio: "inherit", //see the output of the process in your log
});
}
);
//...
};
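One possible refinement, sketched under assumptions: execSync throws when the command exits with an error, so a failing Pagefind run could interrupt a --watch or --serve session. The hypothetical helper runPagefind below only lets the error through for a production build and logs a warning otherwise. The exec parameter is not needed in real use; it exists only so the function can be tried without running the CLI.

```javascript
const { execSync } = require("child_process"); // ships with Node

// Hypothetical helper: run Pagefind, but keep a dev session alive if indexing fails.
// `runMode` is the value 11ty passes to the eleventy.after event ("build", "watch" or "serve").
function runPagefind(outputDir, runMode, exec = execSync) {
  const command = `npx pagefind --site "${outputDir}"`;
  try {
    exec(command, { encoding: "utf-8", stdio: "inherit" });
    return true;
  } catch (error) {
    if (runMode === "build") {
      throw error; // a production build should fail loudly
    }
    console.warn(`Pagefind failed, search index not updated: ${error.message}`);
    return false;
  }
}
```

Inside the config file it could be wired up with eleventyConfig.on("eleventy.after", ({ dir, runMode }) => runPagefind(dir.output, runMode)).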
Configure the index
You will likely want to tell Pagefind what content to consider for the search index. Pagefind has several options to achieve this. The page "Configuring what content is indexed" in the Pagefind documentation is a good entry point into the topic.
Attributes
Your first option is to assign attributes in your HTML template files to control how Pagefind will process your build output.
data-pagefind-body
- This attribute marks an element, together with all of its child content, as the content to be used for the index.
- Note: Once a data-pagefind-body attribute exists on any page of your site, any pages without this attribute will not be indexed.
- I've set up my blog so that I can exclude certain documents from the index by setting a frontmatter variable named nosearch in the Markdown content file. Setting nosearch: true will omit the document from the search index. You can achieve that by adjusting your template in the following way:
<!--default.html template file -->
{%- if nosearch == nil and draft == nil -%}
{%- assign search_attribute = 'data-pagefind-body' -%}
{%- else -%}
{%- assign search_attribute = 'data-pagefind-ignore="all"' -%}
{%- endif -%}
<body {{search_attribute}}>
<!--...-->
- The above Liquid template code checks whether one of the frontmatter variables nosearch or draft exists. If either of those variables exists, the document body receives the attribute data-pagefind-ignore="all", which keeps the document out of the index. If the document has neither a draft nor a nosearch frontmatter variable, the attribute data-pagefind-body is assigned to the HTML body and marks it for index use.
data-pagefind-ignore
- Even within the indexed body you can keep certain content parts out of the search index by marking them with the data-pagefind-ignore attribute. E.g., for a comments section that always has a heading Comments, you can ignore the heading with:
<details>
<summary data-pagefind-ignore>
<strong>Comments</strong>
</summary>
//...
- Assigning data-pagefind-ignore to an element will exclude the element and all its children.
data-pagefind-meta
- This attribute is very useful to include certain data and have it directly accessible in a search result. You can use it even inside of content that has been marked with data-pagefind-ignore. E.g., to make the date when a post was last updated accessible in the metadata, do:
<time
datetime="{{updated | isoDate}}"
data-pagefind-meta="updated:{{updated | isoDate}}">
{{ updated }}
</time>
- In your search results you can then access the updated property as follows:
const pagefind = await import("/pagefind/pagefind.js");
const search = await pagefind.search("static");
const oneResult = await search.results[0].data();
console.log(oneResult.meta.updated); //access the updated metadata
- There are more options available for what you can do with metadata. Please refer to "Setting up metadata" in the Pagefind documentation.
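Metadata values arrive as strings, so a date usually needs a little handling before it is shown. The following is a minimal sketch: describeResult is a hypothetical helper, and it assumes that the isoDate filter in the template emits a string the JavaScript Date constructor can parse, and that a title is available in the metadata.

```javascript
// Sketch: build a one-line label from a loaded search result.
// `describeResult` is a hypothetical helper; `data` is what `await result.data()` returns.
function describeResult(data) {
  const title = data.meta.title ?? data.url; // fall back to the URL if no title was captured
  const updated = new Date(data.meta.updated); // assumption: an ISO 8601 string
  if (Number.isNaN(updated.getTime())) {
    return title; // missing or unparsable date: show the title only
  }
  // toISOString() is always UTC; keep only the YYYY-MM-DD part
  return `${title} (updated ${updated.toISOString().slice(0, 10)})`;
}
```

Falling back to the plain title keeps pages without an updated date usable in the result list.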
data-pagefind-index-attrs
- Use this to add the contents of HTML attributes to the index. E.g.:
<img src="/hero.png" title="Image Title" alt="Image Alt" data-pagefind-index-attrs="title,alt" />
- I did not use that.
Selectors
Additional configuration is possible in the pagefind.yml file.
Pagefind will look for a pagefind.toml, pagefind.yml, pagefind.yaml, or pagefind.json file in the directory that you have run the pagefind command in.
To exclude certain selectors from the index, do:
exclude_selectors:
- "#my_navigation"
- "blockquote > span"
- "[id^='prefix-']"
The default root selector used for building the index is html. Any data outside of this selector will not be detected for the index.
A sketch of how to search
Below is a sketch for how to leverage the search API for your own search frontend.
<script>
async function search(query) {
try {
const results = [];
const start = Date.now();
const pagefind = await import("/pagefind/pagefind.js"); //declare the variable instead of creating an implicit global
const search = await pagefind.search(query);
const promiseCollector = [];
for (const entry of search.results) {
promiseCollector.push(
entry.data().then((data) => results.push(data))
);
}
await Promise.allSettled(promiseCollector);
let duration = Date.now() - start;
console.log(
"The search for [" +
query +
"] returned " +
results.length +
" results within " +
(duration / 1000).toFixed(2) +
" seconds"
);
printSearchResults(results);
} catch (error) {
printError(error);
}
}
...
</script>
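The printSearchResults function is not shown above. One way to fill that gap is a function that turns the loaded results into an HTML string, sketched here with the hypothetical names escapeHtml and renderResults. To my understanding Pagefind delivers the excerpt as HTML, with entities already encoded and mark elements around the matches, while titles and other metadata are raw, so the sketch escapes the title and URL but inserts the excerpt as is. Treat that as an assumption to verify against the Pagefind documentation.

```javascript
// Escape text before placing it into HTML content or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: `results` is the array of loaded result data objects.
function renderResults(results) {
  if (results.length === 0) {
    return "<p>No results found.</p>";
  }
  const items = results.map((data) => {
    const title = escapeHtml(data.meta.title ?? data.url);
    // Assumption: data.excerpt is already safe HTML containing <mark> elements.
    return `<li><a href="${escapeHtml(data.url)}">${title}</a><p>${data.excerpt}</p></li>`;
  });
  return `<ul>${items.join("")}</ul>`;
}
```

In the browser the returned string could be assigned to the innerHTML of a container element.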