I audit 15-20 websites per month. The pattern I see most often: businesses investing thousands in content and link building while their technical foundation is cracked. Half their pages aren’t being crawled properly. Their Core Web Vitals are failing. And they’re completely invisible to AI crawlers like GPTBot and PerplexityBot.

Technical SEO isn’t the exciting part of search marketing. Nobody’s bragging about their XML sitemap on LinkedIn. But it’s the multiplier that makes everything else work. A site with great content marketing on a broken technical foundation is like a Formula 1 engine in a car with flat tires.

This guide covers every technical SEO factor that matters in 2026, with real audit examples, code fixes, and the complete 47-point checklist we use for every client.

  • 68% of websites have critical crawlability issues (Screaming Frog, 2026)
  • 53% of mobile visits are abandoned if load time exceeds 3 seconds
  • 40% of large sites have pages Google can’t efficiently crawl

Crawlability and Indexation

Before Google can rank your content, Googlebot needs to find it, crawl it, and render it. Before ChatGPT can cite you, GPTBot needs the same access. I find crawlability problems on the majority of sites I audit.

Site Architecture

Your site architecture determines how crawlers discover and prioritize pages. The rule: no important page should be more than 3 clicks from your homepage.

BAD: Deep Architecture (6 clicks)
Home > About > Our Services > Marketing > Digital Marketing > SEO Services > Technical SEO
Result: Google barely crawls this page. Zero rankings.
GOOD: Flat Architecture (2 clicks)
Home > Services > Technical SEO
Result: Crawled daily. Rankings within 8 weeks.

What to check:

  • Flat hierarchy: Homepage links to hub pages, hubs link to individual pages. Three levels maximum.
  • Logical URLs: /services/technical-seo/ tells crawlers what to expect. /page-id-4827/ tells them nothing.
  • Orphan pages: Pages with zero internal links are invisible. Run Screaming Frog’s “Orphan Pages” report to find them. We found 340 orphan pages on one client’s eCommerce site that had never been indexed.
  • Internal link depth: Use Sitebulb or Screaming Frog’s crawl depth report. Any page at depth 4+ needs a shortcut link from a higher-level page.
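The crawl-depth rule above is easy to check yourself once you have an internal link export: model pages as a graph and breadth-first search from the homepage. A minimal sketch in Python, where the link map is a hypothetical stand-in for a real crawl export:

```python
from collections import deque

def click_depth(links, home="/"):
    """BFS over internal links from the homepage; returns {url: click depth}."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first discovery = shortest click path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical link map: homepage -> hub pages -> individual pages
links = {
    "/": ["/services/", "/blog/"],
    "/services/": ["/services/technical-seo/"],
    "/blog/": ["/blog/some-post/"],
}
depths = click_depth(links)
too_deep = [url for url, d in depths.items() if d >= 4]  # candidates for shortcut links
```

Pages missing from the result entirely are your orphans: they exist in the sitemap or CMS but are unreachable through internal links.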

XML Sitemaps

Your sitemap should be a curated list of important, indexable pages. Not a dump of every URL.

  • Only include canonical, indexable URLs (no redirects, no noindexed pages, no parameterized URLs)
  • Keep each sitemap under 50,000 URLs (use sitemap indexes for larger sites)
  • Update lastmod dates only when content actually changes (Google ignores sites that update lastmod artificially)
  • Split by content type: pages-sitemap.xml, posts-sitemap.xml, products-sitemap.xml
  • Submit through Google Search Console AND reference in robots.txt: Sitemap: https://yoursite.com/sitemap_index.xml
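Past the 50,000-URL limit, the split sitemaps get tied together by a sitemap index. A minimal example of the structure (domain, filenames, and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/pages-sitemap.xml</loc>
    <lastmod>2026-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/posts-sitemap.xml</loc>
    <lastmod>2026-02-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/products-sitemap.xml</loc>
    <lastmod>2026-02-10</lastmod>
  </sitemap>
</sitemapindex>
```

Each child sitemap follows the same rules as above: canonical, indexable URLs only, honest lastmod dates.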

Robots.txt Configuration

Your robots.txt controls which crawlers access which parts of your site. In 2026, this means managing access for Google, Bing, AND AI crawlers.

RECOMMENDED ROBOTS.TXT FOR 2026
User-agent: *
Allow: /
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /?s=
Disallow: /search/
Disallow: /tag/
Disallow: /author/

# AI Crawlers - ALLOW (you want AI visibility)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

Sitemap: https://yoursite.com/sitemap_index.xml
Pro Tip

Check your robots.txt right now. Many hosting providers and security plugins block AI crawlers by default. If you see User-agent: GPTBot / Disallow: /, you’re invisible to ChatGPT’s search features. For most businesses, that’s traffic you want. Read our AI search optimization guide for the full strategy.
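If you'd rather script the check than eyeball the file, Python's standard-library robots.txt parser can answer "is GPTBot allowed?" directly. A sketch using an inline robots.txt that shows the blocked case you want to catch (swap in your own file's contents):

```python
from urllib.robotparser import RobotFileParser

# The bad configuration described above: GPTBot fully blocked
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# can_fetch() applies the most specific matching User-agent group;
# crawlers with no matching group fall through to "allowed"
gptbot_ok = rp.can_fetch("GPTBot", "https://yoursite.com/services/")
googlebot_ok = rp.can_fetch("Googlebot", "https://yoursite.com/services/")
print(gptbot_ok, googlebot_ok)  # False True
```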

Canonical Tags

Canonical tags tell Google which version of a page is the “official” one. They’re critical for handling duplicate content.

Common mistakes we find in audits:

  • Self-referencing canonicals missing: Every page should have a canonical tag pointing to itself
  • HTTP/HTTPS mismatch: Canonical says http:// but page loads on https://
  • Trailing slash inconsistency: /services/seo vs /services/seo/ are different URLs to Google
  • Paginated pages: Page 2, 3, 4 of a category should canonical to themselves, NOT to page 1
  • Parameter duplicates: ?utm_source=google creates a duplicate. Canonical should strip parameters.
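Parameter duplicates are easiest to kill at the template level: normalize every URL before it's written into the canonical tag. A minimal Python sketch (forcing https, a lowercase host, and trailing slashes is an assumption here; match the rules to your own URL policy):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Normalize a URL for a canonical tag: force https, lowercase the host,
    drop the query string and fragment, enforce a trailing slash."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(("https", parts.netloc.lower(), path, "", ""))

print(canonical_url("http://YourSite.com/services/seo?utm_source=google"))
# https://yoursite.com/services/seo/
```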

Redirect Chains and Loops

A redirect chain is when URL A redirects to B, which redirects to C, which redirects to D. Each hop wastes crawl budget and leaks link equity (roughly 15% per redirect according to Moz testing). If your site was hit by redirect-related issues, our penalty recovery guide covers the full remediation process.

Real example: We audited a site with 1,200 redirects. 340 of them were chains (2-5 hops). After cleaning these to single direct redirects, crawl efficiency improved 28% and 12 pages that had been stuck on page 2 moved to page 1 within 6 weeks.

How to find them: Screaming Frog > Response Codes > Redirect Chains. Fix every chain to a single 301 redirect pointing directly to the final destination.
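Once you export the redirects to a simple source-to-target map, collapsing the chains is mechanical. A sketch with a toy map standing in for real export data:

```python
def collapse_redirects(redirects, max_hops=10):
    """Rewrite every source URL to point directly at its final destination.
    redirects: {source_url: target_url}. Raises on loops or runaway chains."""
    collapsed = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        while target in redirects:  # follow the chain hop by hop
            if target in seen or len(seen) >= max_hops:
                raise ValueError(f"Redirect loop or runaway chain starting at {source}")
            seen.add(target)
            target = redirects[target]
        collapsed[source] = target
    return collapsed

# A 3-hop chain: /a -> /b -> /c -> /final
chain = {"/a": "/b", "/b": "/c", "/c": "/final"}
collapsed = collapse_redirects(chain)  # every source now points straight at /final
```

The collapsed map is exactly the rewrite you want in your redirect rules: one 301 per source, no intermediate hops.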

Index Management

  • Check index coverage: Google Search Console > Pages report. Look for “Crawled but not indexed” and “Discovered but not indexed” issues.
  • Noindex thin pages: Tag pages, author archives, internal search results, and filtered views should be noindexed to prevent index bloat. Use meta robots noindex rather than robots.txt here: Google can’t see a noindex directive on a URL it’s blocked from crawling.
  • Remove from index: Use the URL Removal tool for urgent removals, or meta robots noindex for permanent exclusions.

Core Web Vitals: Real Fixes With Real Results

Core Web Vitals measure user experience. They’re a confirmed ranking factor, though I’ll be honest: they’re a tiebreaker, not a catapult. But when competing for positions 1-5, these metrics determine who wins. Our dedicated CWV deep-dive covers every fix with code examples.

Client Case Study: eCommerce Speed Optimization

Client: WooCommerce store, 4,200 products, shared hosting

LCP
4.2s to 1.8s
INP
380ms to 145ms
CLS
0.24 to 0.04
What we changed: Migrated from shared hosting to Cloudways VPS ($28/mo), converted all images to WebP with ShortPixel, implemented WP Rocket for caching, deferred 14 non-critical scripts, added explicit image dimensions, switched to font-display: swap. Total timeline: 2 weeks.

LCP (Largest Contentful Paint) – Target: Under 2.5s

LCP measures when the largest visible content element finishes rendering. Usually your hero image or main heading.

Fix 1: Optimize the LCP element

LCP IMAGE OPTIMIZATION CODE
<!-- Hero image: use fetchpriority + explicit dimensions -->
<img src="hero.webp"
     width="1200" height="630"
     fetchpriority="high"
     alt="Descriptive alt text"
     style="aspect-ratio: 1200/630;">

<!-- Preload the LCP image in <head> -->
<link rel="preload" as="image" href="hero.webp"
      fetchpriority="high">

<!-- Below-the-fold images: lazy load -->
<img src="photo.webp" loading="lazy"
     decoding="async" width="600" height="400">

Fix 2: Eliminate render-blocking resources

  • Defer non-critical CSS: Load above-fold styles inline, defer the rest with media="print" onload="this.media='all'"
  • Defer JavaScript with defer or async attributes
  • Move third-party scripts (analytics, chat, ads) to load after main content
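Put together, the deferral pattern from these three bullets looks like this in the page head (filenames are placeholders; the noscript fallback covers users with JavaScript disabled):

```html
<head>
  <!-- Critical above-the-fold CSS: inline it -->
  <style>/* critical styles here */</style>

  <!-- Non-critical CSS: loads without blocking first render -->
  <link rel="stylesheet" href="noncritical.css"
        media="print" onload="this.media='all'">
  <noscript><link rel="stylesheet" href="noncritical.css"></noscript>

  <!-- JavaScript: defer preserves execution order; async does not -->
  <script src="app.js" defer></script>
  <script src="analytics.js" async></script>
</head>
```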

Fix 3: Server response time (TTFB)

  • Shared hosting: 800-2000ms TTFB (too slow for good LCP)
  • VPS (DigitalOcean, Cloudways): 150-400ms (acceptable)
  • CDN-backed (Cloudflare, Fastly): 50-200ms (ideal)
  • If your TTFB exceeds 600ms, no amount of frontend optimization will save your LCP

INP (Interaction to Next Paint) – Target: Under 200ms

INP replaced First Input Delay in March 2024 and is significantly harder to pass. It measures responsiveness across the entire session, not just the first click.

Common INP killers and fixes:

  • Long JavaScript tasks (>50ms): Break into smaller chunks using requestIdleCallback or scheduler.yield()
  • Heavy event handlers: Debounce scroll and input handlers. Never run expensive DOM queries on every keystroke.
  • Third-party scripts: A/B testing tools (Optimizely, VWO), live chat widgets, and analytics scripts are the #1 INP killer. Load them on interaction, not on page load.
  • WordPress plugin bloat: Elementor, Slider Revolution, and WooCommerce Cart Fragments are common offenders. Deactivate what you don’t need.
Pro Tip

Open Chrome DevTools > Performance panel > click “Interactions” track. Record a session, click around your site, and look for red interaction markers. Each red flag shows exactly which user action was slow and what JavaScript caused the delay. This is the fastest way to diagnose INP problems.

CLS (Cumulative Layout Shift) – Target: Under 0.1

CLS measures unexpected page jumps. The easiest metric to fix.

CLS FIX: FONT LOADING WITHOUT LAYOUT SHIFT
/* Prevent font-swap layout shift: metric-match a local fallback font */
@font-face {
  font-family: 'Outfit';
  src: url('outfit.woff2') format('woff2');
  font-display: swap;
}

/* The overrides belong on the fallback, so it occupies the same
   space as the web font it stands in for during loading */
@font-face {
  font-family: 'Outfit Fallback';
  src: local('Arial');
  size-adjust: 105%;
  ascent-override: 95%;
  descent-override: 22%;
  line-gap-override: 0%;
}

body {
  font-family: 'Outfit', 'Outfit Fallback', sans-serif;
}

/* Always set width/height attributes on images; browsers then
   reserve the space via the intrinsic aspect ratio */
img {
  max-width: 100%;
  height: auto;
}

Structured Data and Schema Markup

Structured data has evolved from “nice to have” to essential infrastructure. AI search engines parse structured data to understand your business entity. A page with proper schema is far more likely to be cited by ChatGPT or Perplexity than one without it.

Required Schema by Business Type

Business Type      | Required Schema                               | Rich Result
All websites       | Organization, WebSite, BreadcrumbList         | Sitelinks, Knowledge Panel
Service businesses | Service, ProfessionalService, FAQPage         | FAQ dropdowns, Service details
Local businesses   | LocalBusiness, OpeningHoursSpecification      | Map pack, Hours, Reviews
eCommerce          | Product, AggregateRating, Offer, Review       | Star ratings, Price, Availability
Content/Blog       | Article, HowTo, VideoObject                   | Date, Author, How-to steps
Healthcare         | MedicalCondition, Physician, MedicalProcedure | Health cards, Provider details

Read our dedicated schema markup implementation guide for full JSON-LD code examples for each type.

AI Crawler Management

In 2026, your site isn’t just crawled by Googlebot. GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), and Google-Extended are indexing the web for AI search responses.

What to Do

  • Allow AI crawlers in robots.txt (see code example above)
  • Create an llms.txt file: This emerging standard tells AI crawlers what your business does, your services, pricing, and key pages. Think of it as robots.txt for AI.
  • Implement comprehensive schema: AI engines parse structured data to understand entities
  • Ensure fast, accessible pages: AI crawlers have timeout limits. JavaScript-rendered content may not be parsed.
  • Structure content for citation: Clear H2/H3 headings, direct answer paragraphs, factual claims with numbers
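llms.txt is still an emerging convention rather than a ratified standard, but the common proposal is a markdown file at your domain root: an H1 with the site name, a one-line blockquote summary, then linked sections. A hypothetical example (every name and URL below is a placeholder):

```markdown
# Acme Digital
> Technical SEO and digital marketing agency for small and mid-size businesses.

## Services
- [Technical SEO](https://yoursite.com/services/technical-seo/): Audits, Core Web Vitals, crawlability fixes
- [Content Marketing](https://yoursite.com/services/content/): Strategy, production, optimization

## Key Pages
- [Pricing](https://yoursite.com/pricing/): Plans and engagement models
- [Contact](https://yoursite.com/contact/): Book a free technical audit
```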

Mobile-First Indexing

Google uses the mobile version of your content for indexing and ranking. What this means practically:

  • Your mobile site IS your site as far as Google is concerned
  • Content hidden behind tabs/accordions on mobile IS crawled and indexed with full weight (a change Google confirmed with mobile-first indexing)
  • Mobile page speed directly impacts rankings
  • Touch targets need 48x48px minimum (Lighthouse audits this)

HTTPS and Security Headers

  • HTTPS is non-negotiable in 2026
  • Implement HSTS (HTTP Strict Transport Security) headers
  • Ensure zero mixed content (all resources load over HTTPS)
  • Use 301 redirects from HTTP to HTTPS (not 302)
  • Add security headers: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options
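As a concrete reference, a reasonable starting header set looks like this (shown as raw HTTP response headers; the Content-Security-Policy in particular must be tailored to your site's actual scripts and assets or it will break them):

```http
Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Security-Policy: default-src 'self'; img-src 'self' data:
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
```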

JavaScript SEO

If your site uses React, Angular, Vue, or Next.js, you have a rendering challenge that affects both Google and AI search visibility. Google uses a two-pass indexing system: first it crawls HTML, then renders JavaScript (sometimes days later). AI crawlers often skip JS rendering entirely. This is especially critical for on-page SEO since content that isn’t rendered can’t be ranked.

Solutions by Priority

  1. Server-Side Rendering (SSR): Gold standard. Pages rendered on server before sending to client.
  2. Static Site Generation (SSG): Pre-render at build time. Best for content that changes infrequently.
  3. Hybrid approach: SSR for critical pages (homepage, service pages), client-side for less important pages.
  4. Dynamic rendering: Serve pre-rendered HTML to crawlers, JavaScript to users. A workaround, not ideal long-term.

Advanced Technical Issues

Log File Analysis

Server logs show you exactly how Googlebot behaves on your site, which pages it crawls most, which it ignores, and where it encounters errors. Tools: Screaming Frog Log File Analyser, Oncrawl, Botify.

What to look for:

  • Pages Googlebot crawls frequently (high priority in Google’s eyes)
  • Pages Googlebot never visits (may need better internal linking)
  • 5xx errors during crawl (server issues Google sees that you don’t)
  • Crawl budget waste on low-value URLs (parameterized, paginated, filtered)
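Even without a dedicated analyzer, a short script over raw access logs answers the basic questions. A sketch assuming combined log format (the sample lines are fabricated; for real verification also reverse-DNS the client IPs, since the user-agent string is trivially spoofed):

```python
from collections import Counter

def googlebot_hits(log_lines):
    """Count Googlebot requests per URL path in combined-format access logs."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        fields = line.split('"')
        if len(fields) < 2:
            continue
        request = fields[1].split()  # e.g. ['GET', '/services/', 'HTTP/1.1']
        if len(request) >= 2:
            hits[request[1]] += 1
    return hits

# Fabricated sample lines for illustration
sample = [
    '66.249.66.1 - - [10/Feb/2026:06:25:24 +0000] "GET /services/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '66.249.66.1 - - [10/Feb/2026:06:25:30 +0000] "GET /blog/post/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.5 - - [10/Feb/2026:06:26:01 +0000] "GET /services/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
hits = googlebot_hits(sample)  # hits['/services/'] == 1, hits['/blog/post/'] == 1
```

URLs with zero hits over a month of logs are your "pages Googlebot never visits" list; URLs dominating the counts show where your crawl budget actually goes.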

Crawl Budget Optimization

For sites with 10,000+ pages, crawl budget matters. Google won’t crawl everything every day.

  • Block low-value URLs in robots.txt (search results, filtered views, session URLs)
  • Fix redirect chains (each hop wastes crawl budget)
  • Return proper 404s for dead pages (not soft 404s that waste crawl resources)
  • Keep your sitemap clean (only include pages you want indexed)

International SEO: Hreflang

If you serve multiple countries or languages, hreflang tags tell Google which version to show each audience.

HREFLANG IMPLEMENTATION
<link rel="alternate" hreflang="en-us" href="https://example.com/services/" />
<link rel="alternate" hreflang="en-au" href="https://example.com.au/services/" />
<link rel="alternate" hreflang="en-gb" href="https://example.co.uk/services/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/services/" />

Common hreflang mistakes: missing return tags (every page must reference every other version), wrong language codes, mixing hreflang with canonical incorrectly.
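Return-tag errors are mechanical to detect once you've collected each page's alternate list. A sketch in Python (the URL map mirrors the example above and is illustrative):

```python
def missing_return_tags(alternates):
    """alternates: {page_url: set of hreflang alternate URLs, including self}.
    Returns (page_missing_the_tag, url_it_should_reference) pairs."""
    missing = []
    for page, alts in alternates.items():
        for alt in alts:
            if alt != page and page not in alternates.get(alt, set()):
                missing.append((alt, page))  # alt never links back to page
    return sorted(missing)

alternates = {
    "https://example.com/services/": {
        "https://example.com/services/",
        "https://example.com.au/services/",
    },
    # The .com.au page forgot its return tag to the .com page:
    "https://example.com.au/services/": {"https://example.com.au/services/"},
}
errors = missing_return_tags(alternates)
```

Each flagged pair means the first URL must add an hreflang reference to the second, or Google will ignore the annotation on both sides.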

Pagination and Faceted Navigation

  • Pagination: Google stopped using rel="next"/rel="prev" as an indexing signal in 2019 (Bing still reads it as a hint). Keep paginated pages self-canonical and reachable through crawlable links, and implement load-more/infinite scroll with proper URL handling
  • Faceted navigation: Color/size/price filters create thousands of parameterized URLs. Solution: canonical to base URL, or robots meta noindex on filtered views. Never block with robots.txt (this hides the links on those pages too).

The Complete 47-Point Technical SEO Checklist

Here’s every item we check during a client technical audit. This isn’t a preview. This is the full checklist.

Crawlability (12 items)

  1. XML sitemap exists, is valid, and submitted to Search Console
  2. Robots.txt properly configured (not blocking important content)
  3. AI crawler access enabled (GPTBot, PerplexityBot, ClaudeBot)
  4. No orphan pages (every page has at least one internal link)
  5. Site architecture depth under 4 clicks for all important pages
  6. No redirect chains or loops (all redirects are single-hop 301s)
  7. Canonical tags present and correct on all pages
  8. No duplicate content issues (parameterized URLs handled)
  9. Pagination handled correctly
  10. Faceted navigation not creating index bloat
  11. Internal links using descriptive anchor text
  12. Broken internal links: zero (check monthly)

Indexation (8 items)

  1. Google Search Console: zero “Crawled but not indexed” important pages
  2. Thin pages noindexed (tag archives, author pages, internal search)
  3. No accidental noindex on important pages
  4. Meta robots tags correctly configured per page type
  5. URL Inspection in Search Console shows the current rendered version of key pages (Google retired the public cache in 2024)
  6. No soft 404 errors (Search Console > Pages report)
  7. Hreflang tags correct for multi-language/multi-region sites
  8. URL parameters handled in Search Console (if applicable)

Page Speed and Core Web Vitals (12 items)

  1. LCP under 2.5 seconds on mobile (field data in CrUX)
  2. INP under 200ms on mobile
  3. CLS under 0.1
  4. TTFB under 600ms (server response time)
  5. All images in WebP or AVIF format
  6. All images have explicit width and height attributes
  7. Hero/LCP image has fetchpriority="high"
  8. Below-fold images use loading="lazy"
  9. Critical CSS inlined, non-critical CSS deferred
  10. JavaScript deferred or async (no render-blocking scripts)
  11. Web fonts use font-display: swap with preload
  12. Third-party scripts loaded after main content (or on interaction)

Structured Data (7 items)

  1. Organization schema with logo, contact, social profiles
  2. WebSite schema with SearchAction
  3. BreadcrumbList schema on all inner pages
  4. Page-specific schema (Article, Product, Service, FAQPage, LocalBusiness)
  5. All schema validates in Google Rich Results Test
  6. Schema data matches visible page content
  7. No deprecated schema types or properties

Security and Accessibility (4 items)

  1. HTTPS enabled with valid SSL certificate
  2. HSTS header configured
  3. Zero mixed content (all resources over HTTPS)
  4. Security headers present (CSP, X-Frame-Options)

AI Search Readiness (4 items)

  1. AI crawlers allowed in robots.txt
  2. llms.txt file created with business information
  3. Content structured for AI parsing (clear headings, direct answer paragraphs)
  4. Entity information consistent across site and web (NAP, brand mentions)

Download This Checklist as a Spreadsheet

47-Point Technical SEO Checklist (Google Sheet + PDF)

Editable spreadsheet with status tracking, priority scoring, and notes column. Use it for your own audits.

#  | Check Item                      | Status            | Priority
1  | XML sitemap valid and submitted | Pass / Fail / N/A | High / Med / Low
2  | Robots.txt not blocking content | Pass / Fail / N/A | High / Med / Low
3  | AI crawler access enabled       | Pass / Fail / N/A | High / Med / Low
…  | 44 more items with notes column |                   |

We’ll email the Google Sheet link + PDF. No spam. Join 3,200+ subscribers who get our weekly SEO insights.

Pro Tip

Don’t try to fix everything at once. Prioritize by impact: crawlability issues first (Google can’t rank what it can’t find), then indexation, then speed, then schema. A site with perfect CWV scores but broken crawlability won’t rank. A site with good crawlability and decent speed will outperform one with the reverse.

Tools We Use for Technical Audits

  • Screaming Frog: The workhorse. Crawl analysis, broken links, redirect chains, duplicate content, schema validation. $259/year.
  • Google Search Console: Free. Index coverage, CWV data, manual actions, crawl stats. Irreplaceable.
  • PageSpeed Insights: Free. Lab and field CWV data. Use field data for ranking signals (what Google actually uses).
  • Sitebulb: Visual crawl analysis. Better than Screaming Frog for site architecture visualization. $35-65/month.
  • Ahrefs Site Audit: Cloud-based, good for ongoing monitoring. Included with Ahrefs subscription.
  • Chrome DevTools: Free. Performance panel for INP debugging, Network panel for waterfall analysis, Lighthouse for automated audits.

Want Us to Run This Audit for You?

Our senior strategists use this exact 47-point checklist plus AI visibility analysis. We walk you through every finding on a live screen share. No commitment, no sales pitch.

Claim Your Free Technical Audit