XML Sitemap Guide: What Still Matters in 2026

Half the sitemap advice on the internet is from 2009. People still set priority="0.8" on every page, still tweak changefreq to "daily" hoping Google crawls faster, and still wonder why neither does anything. Google has been telling us for years which parts of the sitemap protocol it reads and which parts it throws away — most guides just never caught up.

This is an XML sitemap guide for 2026: what a sitemap actually does, which fields Google still reads, what belongs in the file, and the mistakes that quietly hurt crawling. Short version up front:

TL;DR: A sitemap is a discovery aid, not a ranking factor. Google reads loc and lastmod (when the dates are honest) and ignores priority and changefreq entirely. Put only canonical, indexable, 200-status URLs in it. Submit it once in Search Console, reference it in robots.txt, and let your CMS regenerate it automatically so it never goes stale.

What Does an XML Sitemap Actually Do?

A sitemap is a list of URLs you hand to crawlers so they can find your pages without crawling every link. That's the whole job. It helps discovery. It does not boost rankings, it does not guarantee indexing, and it does not override your site's actual link structure.

Google's own documentation is blunt about this: "Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed." A sitemap is a hint, and Google treats every part of it as a hint.

So when does it matter? Google's docs list the real cases:

Large sites — hundreds or thousands of pages where some get buried deep in the click path.
New sites — few external links pointing in, so crawlers have no path to follow yet.
Sites with weak internal linking — pages that aren't reachable from navigation (they shouldn't exist, but they do).
Fast-changing sites — blogs, news, e-commerce, where lastmod tells Google what changed since the last crawl.

If you run a 12-page brochure site with clean navigation, Google will find everything through links alone. The sitemap still costs you nothing, so ship one anyway — but don't expect it to move rankings. It won't.

What it does give you that's underrated: Search Console reporting. Once you submit a sitemap, the Pages report can filter indexing status by sitemap. That's how you find out 40% of your blog posts are sitting in "Crawled — currently not indexed" instead of guessing.

Sitemap Protocol Basics: Limits and Index Files

The protocol from sitemaps.org hasn't changed in years, and the limits are hard:

50,000 URLs per sitemap file, maximum
50 MB uncompressed per file, maximum
UTF-8 encoding, special characters entity-escaped
Gzip compression allowed (sitemap.xml.gz) — the 50 MB limit applies to the uncompressed size

A minimal valid sitemap looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/xml-sitemap-guide/</loc>
    <lastmod>2026-06-10</lastmod>
  </url>
</urlset>

Notice what's missing: no priority, no changefreq. That's deliberate — more on why below.
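If you generate the file yourself, a minimal sketch in Python might look like the following. The function name build_sitemap and the (url, date) input shape are assumptions for illustration, not part of any particular CMS; the point is that only loc and lastmod are written, and the XML library handles entity escaping.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_date) pairs (hypothetical input shape)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in entries:
        node = ET.SubElement(urlset, "url")
        # ElementTree entity-escapes &, < and > in text when serializing
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = modified.isoformat()
        # Deliberately no priority or changefreq elements
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

xml_bytes = build_sitemap([
    ("https://example.com/blog/xml-sitemap-guide/", date(2026, 6, 10)),
])
print(xml_bytes.decode("utf-8"))
```

Building the tree with a library rather than string concatenation means a URL containing & still produces valid XML.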

When Do You Need a Sitemap Index?

Past 50,000 URLs (or 50 MB), you split into multiple sitemap files and point a sitemap index at them:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-06-12</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-05-30</lastmod>
  </sitemap>
</sitemapindex>

Even under the limit, splitting by content type (posts, pages, categories) is worth it: when Search Console reports an indexing problem, you can see which bucket it's in. One giant file gives you one giant undifferentiated error count.

You submit the index file once; Google follows it to the child sitemaps. An index can list up to 50,000 sitemaps, so the theoretical ceiling is 2.5 billion URLs (50,000 files × 50,000 URLs each). You'll run out of crawl budget long before you run out of protocol.
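The splitting logic is simple enough to sketch. This Python example assumes a flat list of URLs and made-up child file names (sitemap-posts-1.xml and so on). It only enforces the 50,000-URL limit, not the 50 MB one, and the child lastmod is a placeholder where a real generator would use the newest date inside each file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file

def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_index(child_sitemaps):
    """Build a sitemap index from (sitemap_url, lastmod_string) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in child_sitemaps:
        node = ET.SubElement(index, "sitemap")
        ET.SubElement(node, "loc").text = loc
        ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)

# Hypothetical data: 120,000 post URLs need three child files
urls = [f"https://example.com/post-{n}/" for n in range(120_000)]
chunks = chunk_urls(urls)
children = [
    (f"https://example.com/sitemap-posts-{i + 1}.xml", "2026-06-12")  # placeholder lastmod
    for i in range(len(chunks))
]
index_xml = build_index(children)
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```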

Which Sitemap Fields Matter in 2026?

Two fields matter: loc and lastmod. Google ignores priority and changefreq completely — this is confirmed, not speculation. Gary Illyes said it plainly, and Google's own sitemap documentation now states: "Google ignores priority and changefreq values."

| Field | How Google treats it in 2026 | Verdict |
| --- | --- | --- |
| `loc` | Required. The URL itself. | Always include |
| `lastmod` | Used as a crawl signal, if your dates are consistently accurate | Include, keep honest |
| `priority` | Ignored | Drop it |
| `changefreq` | Ignored | Drop it |

Why lastmod Still Earns Its Place

lastmod is the one optional field with a confirmed job. Google uses it to decide whether a known URL is worth recrawling. A page that says it changed yesterday gets a recrawl sooner than one untouched since 2023.

The catch — and Google has said this on the record — is trust. If your sitemap claims every URL changed today, every day, Google learns your dates are noise and stops reading them for your whole site. One dishonest field poisons the rest. Set lastmod from your database's real updated_at timestamp, not from the time the sitemap file was generated. Those are very different things, and confusing them is the single most common sitemap bug I see.

Use W3C datetime format: 2026-06-10 is fine; 2026-06-10T14:30:00+00:00 is fine too if you actually track time-of-day edits.
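Both forms are one-liners in most languages. A small Python sketch, assuming updated_at arrives as a datetime from your database layer, and that naive timestamps are stored in UTC (an assumption you should verify for your own schema):

```python
from datetime import datetime, timezone

def lastmod_date(updated_at):
    """Date-only W3C form, e.g. 2026-06-10."""
    return updated_at.date().isoformat()

def lastmod_datetime(updated_at):
    """Full W3C form with an explicit UTC offset."""
    if updated_at.tzinfo is None:
        # Assumption: naive timestamps from the database are in UTC
        updated_at = updated_at.replace(tzinfo=timezone.utc)
    # Drop microseconds; second precision is plenty for a sitemap
    return updated_at.replace(microsecond=0).isoformat()

updated_at = datetime(2026, 6, 10, 14, 30, tzinfo=timezone.utc)
print(lastmod_date(updated_at))      # 2026-06-10
print(lastmod_datetime(updated_at))  # 2026-06-10T14:30:00+00:00
```

Whichever form you pick, the input has to be the row's real modification time, never the moment the sitemap was rendered.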

Why priority and changefreq Died

priority was always relative to your own site — and everyone set everything to 0.8 or 1.0, which made it meaningless. changefreq was wishful thinking: marking a page "hourly" never made Google crawl hourly, and Google can compare crawl snapshots itself to learn real change rates. Both fields have been dead weight for over a decade. Including them doesn't hurt, but it bloats the file and signals that whoever built the generator hasn't read the docs since 2010.

What Belongs in a Sitemap (and What Doesn't)

Every URL in your sitemap should be canonical, indexable, and return HTTP 200. Anything else wastes crawl budget and muddies your Search Console reports. Google also treats sitemap inclusion as a canonicalization signal (a weaker one than redirects or rel="canonical", but still a signal), so listing the wrong variants actively works against you.

Include:

Canonical URLs only — one version per page, matching your rel="canonical" tags
Pages that return 200 when fetched
Pages you want indexed — no noindex anywhere in the file
The trailing-slash (or non-trailing) variant you've standardized on, consistently

Leave out:

Noindexed pages — a sitemap says "index this," noindex says "don't." Sending both signals for the same URL tells Google your site contradicts itself.
Redirects — list the destination, never the redirecting URL
404s and 410s — dead URLs in a sitemap are pure noise
Paginated junk — /blog/page/7/, filtered category URLs, ?sort=price variants. Crawlers find pagination through links; it doesn't need sitemap promotion.
Parameter duplicates, print versions, staging URLs — anything that isn't the one true version of a page

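A clean test: pick 10 random URLs from your sitemap and check each one. curl -I shows the status code and any X-Robots-Tag header, but it sends a HEAD request, so you need a full GET (plain curl) to see a noindex meta tag in the HTML. Every response should be 200 with no redirect, and none should carry noindex in either place. If even one fails, your generator has a filtering bug.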
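The same spot check can be scripted. What follows is a rough Python sketch using only the standard library; the function names (audit, fetch, audit_sample) are made up for this example, and the meta-tag regex is a simplification that only matches when name="robots" comes before content in the tag.

```python
import random
import re
import urllib.error
import urllib.request

# Simplified pattern: assumes name="robots" appears before content="..."
META_NOINDEX = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

def audit(status, headers, body):
    """Return a list of problems for one fetched URL (empty list means clean)."""
    problems = []
    if status != 200:
        problems.append(f"status {status}")
    if "noindex" in headers.get("x-robots-tag", "").lower():
        problems.append("noindex header")
    if META_NOINDEX.search(body):
        problems.append("noindex meta tag")
    return problems

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface 3xx responses instead of silently following them

def fetch(url):
    """GET a URL; return (status, lowercased headers, body text)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=10) as resp:
            headers = {k.lower(): v for k, v in resp.headers.items()}
            return resp.status, headers, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (including unfollowed redirects) land here
        return err.code, {k.lower(): v for k, v in err.headers.items()}, ""

def audit_sample(urls, k=10, fetcher=fetch):
    """Audit up to k random URLs; return {url: problems} for failures only."""
    sample = random.sample(urls, min(k, len(urls)))
    results = {url: audit(*fetcher(url)) for url in sample}
    return {url: problems for url, problems in results.items() if problems}
```

Calling audit_sample(your_sitemap_urls) should return an empty dict for a clean sitemap; anything else names the URL and what is wrong with it.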

How Do You Submit a Sitemap?

Two mechanisms, use both:

Search Console — Indexing → Sitemaps → paste the URL → Submit. One time. Google remembers it and refetches on its own schedule. This is also what turns on per-sitemap indexing reports, which is the real prize.
robots.txt — add a Sitemap: line. This is how Bing, and any other crawler you've never configured, finds the file:

Sitemap: https://example.com/sitemap.xml

The line takes a full absolute URL and can appear anywhere in robots.txt. Multiple Sitemap: lines are valid if you have several files outside an index.
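To confirm what crawlers will actually see, you can parse your robots.txt the way a client would. This sketch uses Python's standard urllib.robotparser (site_maps() requires Python 3.8 or later); the robots.txt content and the helper name sitemaps_in are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with two Sitemap: lines outside any index
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml
"""

def sitemaps_in(robots_txt):
    """Return every sitemap URL declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: lines
    return parser.site_maps() or []

print(sitemaps_in(ROBOTS_TXT))
```

The Sitemap: lines are picked up regardless of which User-agent group they sit near, which matches the "anywhere in the file" rule.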

What you don't need anymore: the old ping endpoint. Google deprecated google.com/ping?sitemap= in 2023 and now relies on regular refetches plus the lastmod dates inside the file. Any plugin still "pinging Google" on every publish is doing nothing.

In UnfoldCMS the robots.txt is editable from the admin — the Sitemap: line ships in the default file, so both pieces are wired up before you think about them. The same admin area manages the llms.txt file for AI crawlers, which is becoming the sitemap's younger sibling: a discovery file, but for language models instead of search bots.

Dynamic CMS Sitemaps vs Static Files

Generate the sitemap dynamically from your database. Static sitemap files go stale the moment someone publishes, unpublishes, or renames anything.

A static sitemap.xml you regenerate "when you remember" is a liability. The failure mode is silent: you publish six posts, the sitemap doesn't know, discovery slows, and nothing errors anywhere. Dynamic generation — the sitemap route queries the database at request time (or rebuilds on a short cache) — makes staleness impossible by construction.

The dynamic approach also fixes lastmod for free: each URL's date comes from the row's real updated_at, so editing one post changes one date. Compare that to static generators that stamp generation-time on every URL — the exact "everything changed today" pattern that makes Google distrust your dates.

Scheduled content is the sneaky edge case. A post scheduled for next Tuesday must not appear in the sitemap until next Tuesday — it 404s publicly until then, and a 404 in your sitemap is an own goal. UnfoldCMS handles this correctly: the auto-generated sitemap only includes published, live posts, so scheduled publishing and the sitemap can't disagree. If you're evaluating platforms, this kind of plumbing is exactly what to check — it's item one in our CMS SEO checklist.

When is static fine? Genuinely static sites — a docs site rebuilt by CI on every deploy, where the build step regenerates the sitemap from the same content it renders. The principle holds either way: the sitemap must be a byproduct of publishing, never a separate chore.
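To make the dynamic approach concrete, here is a minimal Python sketch of the query behind a sitemap route. The posts table, its columns (status, published_at, updated_at), and the URL pattern are all assumptions for illustration, not UnfoldCMS's actual schema. It shows the two properties described above: only live posts are listed, and each lastmod comes from the row's own updated_at.

```python
import sqlite3

# Hypothetical schema and rows, held in memory for the example
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (slug TEXT, status TEXT, published_at TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", [
    ("xml-sitemap-guide", "published", "2026-06-01T09:00:00", "2026-06-10T14:30:00"),
    ("next-tuesday-post", "scheduled", "2026-06-16T09:00:00", "2026-06-11T08:00:00"),
    ("old-draft",         "draft",     None,                  "2026-05-02T10:00:00"),
])

def sitemap_entries(conn, now):
    """Return (url, lastmod) pairs for posts that are live at `now` (ISO 8601 string)."""
    rows = conn.execute(
        "SELECT slug, updated_at FROM posts "
        "WHERE status = 'published' AND published_at <= ? "
        "ORDER BY updated_at DESC",
        (now,),
    )
    # lastmod is the row's own updated_at (date part), never the render time
    return [(f"https://example.com/blog/{slug}/", updated_at[:10]) for slug, updated_at in rows]

print(sitemap_entries(conn, "2026-06-12T00:00:00"))
# [('https://example.com/blog/xml-sitemap-guide/', '2026-06-10')]
```

The scheduled post and the draft never reach the output, so the sitemap and the publishing state cannot disagree.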

Common Sitemap Mistakes (Ranked by Damage)

Stale or fake lastmod on every URL. Generation-time stamps on all URLs teach Google to ignore your dates site-wide. Worst because it breaks the one field that still works.
Forgetting the sitemap after a migration. You move domains or restructure slugs, set up redirects, and the sitemap keeps serving the old URLs — so Google keeps crawling 301s for months. After any migration, the sitemap must list final destination URLs only. This pairs with redirect hygiene: UnfoldCMS keeps slug history and auto-redirects old post URLs, and its SEO redirects support an optional expiry date for redirects that should die after the migration settles — but the sitemap should never lean on either. List the new URL, redirect the old one, done.
404s in the file. Deleted a page, sitemap didn't notice. Every crawl of that URL is wasted budget, and Search Console flags it forever.
Noindexed and redirected URLs included. Contradictory signals; Google resolves them by trusting your sitemap less.
Wrong URL variants. http:// instead of https://, missing trailing slashes, www mismatches. Each one is a soft duplicate that competes with your canonical.
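Submitting and never looking again. The Sitemaps report in Search Console shows how many URLs Google discovered in each file and links to the Pages report filtered to that sitemap, where you can see how many are actually indexed. A widening gap between the two is your earliest warning of indexing problems, but only if you check it.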
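The wrong-variant mistake in the list above is easy to catch mechanically. A small Python sketch, assuming your standard is https, a bare (non-www) host, trailing slashes, and no query strings; adjust the defaults to whatever you have standardized on. The function name variant_problems is made up for this example.

```python
from urllib.parse import urlsplit

def variant_problems(url, host="example.com", trailing_slash=True):
    """Return a list of ways `url` deviates from the site's standard URL form."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("not https")
    if parts.hostname != host:
        problems.append(f"host is {parts.hostname}, expected {host}")
    if parts.query:
        problems.append("has query string")
    # Assumption: every page URL follows one trailing-slash policy
    if parts.path.endswith("/") != trailing_slash:
        problems.append("trailing-slash mismatch")
    return problems

print(variant_problems("https://example.com/blog/post/"))
# []
print(variant_problems("http://www.example.com/blog/post?sort=price"))
# ['not https', 'host is www.example.com, expected example.com', 'has query string', 'trailing-slash mismatch']
```

Run it over every loc in the file as part of your build or a scheduled check, and fail loudly on any non-empty result.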

Do You Need an Image Sitemap?

Briefly: probably not as a separate file. Google supports an image extension (<image:image>) inside your regular sitemap entries, which helps it associate images with pages. That is useful for sites where image search drives real traffic (recipes, products, photography). For a typical blog or company site, properly used <img> tags with descriptive alt text get crawled fine without it. If you do add image entries, the same rule applies: only live, canonical image URLs. News and video sitemaps are their own specialized formats with their own rules. They are out of scope here, and only relevant if you're a publisher or video platform.

FAQ

Does an XML sitemap improve rankings?

No. Google has confirmed sitemaps are a discovery mechanism, not a ranking signal. A sitemap helps pages get found faster; what happens after crawling depends on content quality and links like everything else.

How often should a sitemap update?

Every time content changes — which is why it should be generated dynamically by your CMS rather than rebuilt by hand. There's no "refresh schedule" to optimize; correctness is the goal.

Should small sites bother with a sitemap?

Yes, because it's free and it unlocks per-sitemap indexing reports in Search Console. Just don't expect discovery gains — Google crawls a well-linked 20-page site fully without one.

Can a sitemap contain URLs from a different domain?

Only if you prove ownership of both in Search Console, or reference the sitemap from the robots.txt file of the domain whose URLs it lists. In practice: keep each domain's sitemap on that domain and skip the headache.

Is a missing sitemap why my page isn't indexed?

Almost never. If a page is crawled but not indexed, the sitemap already did its job — the problem is page quality, internal linking, or duplication. Check the URL Inspection tool before touching the sitemap.

The Bottom Line

Treat the sitemap like infrastructure, not strategy. Two fields, honest dates, clean URLs, generated automatically by the platform — then forget it exists and spend the saved hours on content and links, which actually move rankings.

If your current platform makes you regenerate sitemaps by hand or ships fake lastmod dates, that's a platform problem. UnfoldCMS — a self-hosted Laravel CMS that runs on shared hosting — generates the sitemap, JSON-LD structured data, and redirect handling automatically, and exposes content over a REST API at /api/v1/* if you go headless. See how it compares in our breakdown of Contentful alternatives for 2026, or check the feature list yourself.

Sources: Google Search Central sitemap documentation (developers.google.com), sitemaps.org protocol specification, Google Search Central blog on the sitemap ping deprecation (2023), and Google's public statements on lastmod usage via Search Off the Record and Gary Illyes.
