Site Search in a CMS: From LIKE Queries to Full-Text
The four-tier implementation ladder, relevance basics, search UX, and why most sites should stop at database full-text
Type a query into the search box on most CMS-powered sites and one of two things happens: a SQL LIKE query someone wrote in ten minutes runs against the posts table, or nothing happens because nobody built search at all. Both are more common than you'd expect, and both are fine — until they're not. This guide walks the full ladder of CMS site search, from LIKE '%term%' to Algolia, with honest advice on when each step is worth taking. Spoiler: most sites should stop at step two.
TL;DR: Most CMS sites start with a SQL LIKE query for search and that is fine — until it is not. This guide walks the full ladder from LIKE to Meilisearch to Algolia, with honest advice on when to climb each rung.
Why site search deserves more than ten minutes
Visitors who use your search box are your highest-intent users. They didn't land on a page and bounce — they told you, in their own words, exactly what they want. On e-commerce sites, searchers convert at a multiple of browsers. On content sites, they read more pages and come back more often.
That makes search a strange thing to neglect. Teams will spend a week tuning a homepage hero and leave search returning unranked results from a query that can't match "caching" when someone types "cache."
There's a second payoff most teams miss: search queries are free user research. Every query is a visitor telling you what they expected to find on your site. Every zero-result query is content you haven't written yet. More on that below.
The implementation ladder
Site search has four tiers. Each one buys you something specific and costs you something specific. The mistake isn't picking the wrong tier — it's jumping to tier 3 or 4 when tier 2 would have done the job with zero extra infrastructure.
Tier 1: SQL LIKE — the ten-minute version
The simplest possible search is a LIKE query with wildcards:
SELECT id, title, slug
FROM posts
WHERE status = 'published'
AND (title LIKE '%caching%' OR body LIKE '%caching%')
ORDER BY published_at DESC
LIMIT 20;
This works. It's also the version running on a surprising share of production sites, because it ships in ten minutes and nobody complains until the content library grows.
The problems:
- No relevance ranking.
ORDER BY published_atisn't relevance — a post that mentions "caching" once in a footnote ranks above the definitive caching guide if it's newer. There's no score to sort by. - The leading wildcard kills indexes.
LIKE '%caching%'can't use a B-tree index, so every query is a full table scan over every published post's body. - No stemming or tokenization. "cache" doesn't match "caching." "databases" doesn't match "database" unless it's a substring hit.
- Multi-word queries get ugly. "laravel caching" needs to be split and matched per-term, with AND/OR logic you now maintain by hand.
Honest assessment: LIKE is fine up to roughly 1,000 posts. Full table scans over a few megabytes of text finish in milliseconds, and a small content library means weak ranking hurts less — there are only so many results to mis-order. If your blog has 80 posts, LIKE is not your bottleneck. Ship it and move on.
Tier 2: Database full-text — the tier most sites should stop at
Both MySQL and SQLite ship real full-text search engines inside the database you already run. This is the most underused tier in CMS search, and it's where the price-to-power ratio peaks: you get relevance ranking, tokenization, and prefix search with zero new services, zero sync pipelines, zero new monthly bills.
This tier also assumes your content lives in a database in the first place. If you're weighing flat files against a database for your CMS, search is one of the strongest arguments for the database side — see flat-file CMS vs database CMS for the full trade-off.
MySQL FULLTEXT takes two statements:
ALTER TABLE posts ADD FULLTEXT INDEX ft_posts (title, body);
SELECT id, title,
MATCH(title, body) AGAINST('caching strategy') AS score
FROM posts
WHERE status = 'published'
AND MATCH(title, body) AGAINST('caching strategy')
ORDER BY score DESC
LIMIT 20;
Now you have a real relevance score — MySQL ranks by term frequency and how rare the term is across your corpus, so a post that's actually about caching outranks one that mentions it in passing. Boolean mode adds prefix search and operators:
-- 'cach*' matches cache, caching, cached
SELECT id, title
FROM posts
WHERE MATCH(title, body) AGAINST('+cach* -wordpress' IN BOOLEAN MODE);
SQLite FTS5 is even better than people expect. It implements BM25 ranking — the same baseline algorithm Elasticsearch uses — in a virtual table:
CREATE VIRTUAL TABLE posts_fts USING fts5(
title, body,
content='posts', content_rowid='id'
);
SELECT p.id, p.title, bm25(posts_fts) AS rank
FROM posts_fts
JOIN posts p ON p.id = posts_fts.rowid
WHERE posts_fts MATCH 'caching'
ORDER BY rank
LIMIT 20;
The content= option makes it an external-content table, so you're not duplicating your post bodies — just the index. Keep it in sync with three small triggers on insert, update, and delete, and you're done. Prefix queries ('cach*'), phrase queries ('"static site"'), and column filters ('title: caching') all work out of the box.
This is the tier UnfoldCMS ships at: built-in database-driven search that runs on both SQLite and MySQL, with no external service to install — which is exactly what you want when your CMS needs to run on shared hosting where you can't spin up a search daemon. The same search is exposed over the REST API at /api/v1/search, so a headless frontend gets it for free.
What tier 2 still can't do: typo tolerance. Type "cachng" and you get nothing. No faceted filtering, no synonyms, no "did you mean." For a blog or docs site, that's usually an acceptable trade. For a store with 40,000 SKUs and fat-fingered mobile shoppers, it isn't — and that's your signal to climb.
Tier 3: Dedicated search engines — Meilisearch, Typesense, Elasticsearch
Dedicated engines are where search becomes a product feature instead of a query. Meilisearch and Typesense are the modern picks: typo tolerance by default, sub-50ms responses, faceted filtering, synonyms, and sane APIs you can have running locally in minutes. Elasticsearch remains the heavyweight — more powerful, more tunable, and considerably more work to operate.
What you gain:
- Typo tolerance. "cachng" finds "caching." On mobile, this alone changes how usable search feels.
- Facets. Filter results by category, author, year — counts included.
- Speed at scale. Millions of documents with consistent latency.
What you pay:
- A new service to run. Install it, monitor it, upgrade it, give it RAM, back it up. Your search is now a second system that can be down while your site is up.
- A sync pipeline. Every post create, update, and delete must propagate to the index. Laravel Scout and similar libraries make the wiring short, but the failure modes — stale index, dropped jobs, full reimports after schema changes — are now yours.
- Hosting constraints. Shared hosting is out. You need a VPS or a managed instance, which often costs more than the site's actual hosting.
Be honest with yourself here: most blogs and marketing sites never need this tier. A documentation site with 300 pages doesn't have a typo-tolerance problem worth running a daemon for. The cases that justify tier 3 are large catalogs, search-as-core-UX products, and sites where facets genuinely drive navigation.
Tier 4: SaaS — Algolia and friends
Algolia is the polished end of the ladder. The developer experience is the best in the business: drop-in UI libraries, instant results as you type, an excellent dashboard, relevance tuning without touching ranking math. You skip the ops burden of tier 3 entirely — no service to run, no RAM to provision.
The catch is the meter. Algolia bills per search request (and per record stored), which means your costs scale with your traffic, not your content. A search box on a popular site fires requests on every keystroke unless you debounce and cache aggressively. Plenty of teams have watched a "free tier" project graduate into a real monthly line item the moment traffic arrived. You're also coupling a core feature of your site to a third party's availability and pricing decisions.
Tier 4 makes sense when search is central to your product, the team is small, and engineering time costs more than the subscription. It makes much less sense for a self-hosted content site that chose self-hosting to avoid exactly this kind of dependency.
The four tiers side by side
| Tier | Relevance ranking | Typo tolerance | Extra infrastructure | Realistic ceiling |
|---|---|---|---|---|
SQL LIKE |
None | No | None | ~1,000 posts |
| DB full-text (FULLTEXT / FTS5) | Yes (TF-IDF / BM25) | No | None | Tens of thousands of documents |
| Dedicated engine (Meilisearch, Typesense, Elasticsearch) | Yes, tunable | Yes | One more service to run | Millions of documents |
| SaaS (Algolia) | Yes, tunable | Yes | None — but per-search pricing | Whatever your budget allows |
The pattern worth noticing: tiers 1 and 2 cost the same to operate — nothing. The jump from 2 to 3 is the expensive one, and it's an ops jump, not a code jump. Make that jump because users need typo tolerance or facets, not because Elasticsearch looks better on the architecture diagram.
Relevance basics: small rules, big difference
Whatever tier you're on (above LIKE), two cheap adjustments fix most "why is this result first?" complaints.
Weight the title above the body. A query matching a post's title is a far stronger signal than the same words buried in paragraph twelve. With separate MySQL indexes on title and body, it's one expression:
SELECT id, title,
(MATCH(title) AGAINST('caching') * 3
+ MATCH(body) AGAINST('caching')) AS score
FROM posts
WHERE MATCH(title, body) AGAINST('caching')
ORDER BY score DESC;
In SQLite FTS5 it's a one-liner — bm25(posts_fts, 3.0, 1.0) weights the title column at 3x. The exact multiplier matters less than having one; anywhere from 2x to 5x reads as "right" to users.
Boost recent content, gently. For news-like content, a decay factor keeps stale posts from squatting on top spots: multiply the score by something like 1 / (1 + days_old / 365). Keep it gentle — a great post from 2023 should still beat a mediocre one from last week. Evergreen docs sites can skip this entirely.
That's genuinely most of it. Stemming, synonyms, and field-level analyzers are real tools, but title weighting plus a mild recency boost gets a content site 90% of the way to "search feels right."
Search UX: instant results vs a results page
You have two delivery patterns, and the good news is they share a backend.
Instant results — a dropdown that populates as the user types — feels modern and keeps users in flow. The implementation rules: debounce keystrokes by 150–300ms so you're not firing a query per character, show the top 5–7 hits with titles and section labels, and end the list with a "See all results" link. Prefix search (tier 2 and up) is what makes this work mid-word.
A results page is still worth having even with instant search: it's linkable, paginated, and the natural landing spot for that "see all" link. If your CMS exposes search over an API — UnfoldCMS's /api/v1/search endpoint is one example — the same endpoint backs both the dropdown and the page, whether your frontend is server-rendered Blade or a React app. That API-first shape is also a fair item to weigh when comparing headless CMS options.
Highlight the match. Wrap matched terms in <mark> in titles and snippets. Users scan for their own query words to judge relevance; highlighting is how they confirm a result is worth clicking in under a second. SQLite FTS5 ships highlight() and snippet() functions that do this in the query itself.
Measure it: zero-result queries are content ideas
Most teams ship search and never look at what people type into it. That's leaving the best part on the table. Log every query with a timestamp and a result count — one small table:
CREATE TABLE search_logs (
id INTEGER PRIMARY KEY,
query VARCHAR(255) NOT NULL,
results_count INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Then review two reports monthly:
- Top queries tell you what visitors expect your site to cover and what your navigation is failing to surface. If "pricing" is a top search, your pricing link is too hard to find.
- Zero-result queries are a ranked list of content you should write. Real users asked your site a question and got nothing back. No keyword tool gives you intent data this clean, and it costs you a one-line insert.
Normalize queries (lowercase, trim) before aggregating, and cap stored query length — search boxes attract pasted garbage and probe strings.
Search and SEO: keep result pages out of the index
One thing search adds that can hurt you: internal search result URLs (/search?q=...) are infinite, thin, auto-generated pages. Google's guidelines explicitly call out internal search results as content to keep out of the index, and letting them in wastes crawl budget on pages with no standalone value.
The fix is a noindex directive on the results page:
<meta name="robots" content="noindex, follow">
Use noindex, not a robots.txt disallow — if crawlers are blocked from fetching the page, they never see the noindex, and the URL can still end up indexed from external links. follow keeps the links on the page crawlable so any link equity flows onward. This is one item on a longer list of search-engine hygiene a CMS should handle for you — the CMS SEO checklist covers the other nine.
Climb the ladder when users push you
The ladder, compressed: LIKE is fine for small sites and honest about being temporary. Database full-text — MySQL FULLTEXT or SQLite FTS5 — is the sweet spot for most content sites: real ranking, prefix search, zero new infrastructure, works on shared hosting. Dedicated engines earn their ops cost when you need typo tolerance and facets at scale. Algolia buys best-in-class DX with a meter attached.
Start at tier 2 unless you have a measured reason not to. Weight titles over bodies, debounce your instant results, log zero-result queries, and noindex the results page. Then let real user behavior — not architecture envy — tell you when to climb.
Frequently Asked Questions
What is the simplest way to add search to a CMS site?
A SQL LIKE '%term%' query on the title and body columns. It works for sites with under ~5,000 posts and requires no extra infrastructure. The main limit is it does not handle typos, stemming, or relevance ranking.
When should I switch from database search to a dedicated search engine? When zero-result rate climbs above 15–20%, when response time exceeds 200ms, or when users tell you search is not finding things they know are there. Meilisearch or Algolia are the two most common upgrades.
Does site search affect SEO?
Yes — if search result pages are indexed by Google. Block them with noindex or Disallow in robots.txt. Having thousands of thin search-result pages indexed can dilute crawl budget and hurt rankings.
How do I measure whether my search is working? Track zero-result queries (searches that returned nothing), top search terms, and click-through from search results. Most analytics tools and server logs give you the raw data; the insight is in the zero-result list.
Should I use client-side or server-side search? Server-side for most CMS sites — it keeps the index private, works without JavaScript, and scales better. Client-side (Pagefind, Fuse.js) is a good fit for static sites where you can ship a pre-built index at deploy time.
Free & Open Source
Own your CMS. No subscriptions.
Unfold CMS is free to download and self-host. Built on Laravel + React, full source code included.
Share this post: