I audit 15-20 websites per month. The pattern I see most often: businesses investing thousands in content and link building while their technical foundation is cracked. Half their pages aren’t being crawled properly. Their Core Web Vitals are failing. And they have zero visibility to AI crawlers like GPTBot and PerplexityBot.
Technical SEO isn’t the exciting part of search marketing. Nobody’s bragging about their XML sitemap on LinkedIn. But it’s the multiplier that makes everything else work. A site with great content marketing on a broken technical foundation is like a Formula 1 engine in a car with flat tires.
This guide covers every technical SEO factor that matters in 2026, with real audit examples, code fixes, and the complete 47-point checklist we use for every client.
Crawlability and Indexation
Before Google can rank your content, Googlebot needs to find it, crawl it, and render it. Before ChatGPT can cite you, GPTBot needs the same access. I find crawlability problems on the majority of sites I audit.
Site Architecture
Your site architecture determines how crawlers discover and prioritize pages. The rule: no important page should be more than 3 clicks from your homepage.
The difference is stark in practice: a page buried four or more clicks deep barely gets crawled and earns zero rankings, while the same page linked within three clicks of the homepage gets crawled daily. In one client case, that restructure alone produced rankings within 8 weeks.
What to check:
- Flat hierarchy: Homepage links to hub pages, hubs link to individual pages. Three levels maximum.
- Logical URLs: /services/technical-seo/ tells crawlers what to expect; /page-id-4827/ tells them nothing.
- Orphan pages: Pages with zero internal links are invisible. Run Screaming Frog’s “Orphan Pages” report to find them. We found 340 orphan pages on one client’s eCommerce site that had never been indexed.
- Internal link depth: Use Sitebulb or Screaming Frog’s crawl depth report. Any page at depth 4+ needs a shortcut link from a higher-level page.
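The crawl-depth check above can be scripted as a breadth-first search over your internal-link map. A minimal sketch (the link data is hypothetical; in practice you would export it from Screaming Frog or Sitebulb). Any page missing from the result has no internal path from the homepage, i.e. it is an orphan:

```javascript
// Compute each page's click depth from the homepage via BFS.
// `links` maps a URL to the URLs it links to (hypothetical data).
function clickDepths(links, home = '/') {
  const depths = { [home]: 0 };
  const queue = [home];
  while (queue.length) {
    const page = queue.shift();
    for (const target of links[page] || []) {
      if (!(target in depths)) {
        depths[target] = depths[page] + 1; // first visit = shortest path
        queue.push(target);
      }
    }
  }
  return depths; // pages absent from this map are orphans
}
```

Flag anything at depth 4 or more for a shortcut link from a higher-level page.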
XML Sitemaps
Your sitemap should be a curated list of important, indexable pages. Not a dump of every URL.
- Only include canonical, indexable URLs (no redirects, no noindexed pages, no parameterized URLs)
- Keep each sitemap under 50,000 URLs (use sitemap indexes for larger sites)
- Update lastmod dates only when content actually changes (Google learns to ignore lastmod on sites that update it artificially)
- Split by content type: pages-sitemap.xml, posts-sitemap.xml, products-sitemap.xml
- Submit through Google Search Console AND reference in robots.txt: Sitemap: https://yoursite.com/sitemap_index.xml
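A sitemap index that ties those per-type sitemaps together is a small XML file (filenames and dates below are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/pages-sitemap.xml</loc>
    <lastmod>2026-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/posts-sitemap.xml</loc>
    <lastmod>2026-02-03</lastmod>
  </sitemap>
</sitemapindex>
```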
Robots.txt Configuration
Your robots.txt controls which crawlers access which parts of your site. In 2026, this means managing access for Google, Bing, AND AI crawlers.
```
User-agent: *
Allow: /
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /?s=
Disallow: /search/
Disallow: /tag/
Disallow: /author/

# AI Crawlers - ALLOW (you want AI visibility)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: anthropic-ai
Allow: /

Sitemap: https://yoursite.com/sitemap_index.xml
```
Check your robots.txt right now. Many hosting providers and security plugins block AI crawlers by default. If you see User-agent: GPTBot / Disallow: /, you’re invisible to ChatGPT’s search features. For most businesses, that’s traffic you want. Read our AI search optimization guide for the full strategy.
Canonical Tags
Canonical tags tell Google which version of a page is the “official” one. They’re critical for handling duplicate content.
Common mistakes we find in audits:
- Self-referencing canonicals missing: Every page should have a canonical tag pointing to itself
- HTTP/HTTPS mismatch: Canonical says http:// but the page loads on https://
- Trailing slash inconsistency: /services/seo vs /services/seo/ are different URLs to Google
- Paginated pages: Page 2, 3, 4 of a category should canonical to themselves, NOT to page 1
- Parameter duplicates: ?utm_source=google creates a duplicate. Canonical should strip parameters.
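Those canonical rules can be expressed as one small normalization function. A sketch (the tracking-parameter list and the trailing-slash policy are illustrative choices for this example, not universal rules; match them to your own site's conventions):

```javascript
// Illustrative list of parameters that should never appear in a canonical URL
const TRACKING_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'gclid', 'fbclid'];

// Normalize a raw URL to its canonical form: force https, strip tracking
// parameters, and enforce a consistent trailing slash.
function canonicalUrl(raw) {
  const url = new URL(raw);
  url.protocol = 'https:';                              // fix http/https mismatch
  TRACKING_PARAMS.forEach((p) => url.searchParams.delete(p));
  if (!url.pathname.endsWith('/')) url.pathname += '/'; // consistent trailing slash
  return url.href;
}
```

Whatever form this function outputs is the form every canonical tag, sitemap entry, and internal link should use.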
Redirect Chains and Loops
A redirect chain is when URL A redirects to B, which redirects to C, which redirects to D. Each hop wastes crawl budget and leaks link equity (roughly 15% per redirect according to Moz testing). If your site was hit by redirect-related issues, our penalty recovery guide covers the full remediation process.
Real example: We audited a site with 1,200 redirects. 340 of them were chains (2-5 hops). After cleaning these to single direct redirects, crawl efficiency improved 28% and 12 pages that had been stuck on page 2 moved to page 1 within 6 weeks.
How to find them: Screaming Frog > Response Codes > Redirect Chains. Fix every chain to a single 301 redirect pointing directly to the final destination.
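Flattening chains is mechanical once you have the redirect map exported. A sketch with hypothetical data (in practice the map comes from the Screaming Frog export):

```javascript
// Given a map of redirects (source -> target), flatten every chain so each
// source points directly at its final destination. Detects loops.
function flattenRedirects(redirects) {
  const resolved = {};
  for (const source of Object.keys(redirects)) {
    let target = redirects[source];
    const seen = new Set([source]);
    // Follow the chain until the target no longer redirects anywhere
    while (redirects[target] !== undefined) {
      if (seen.has(target)) throw new Error(`Redirect loop at ${target}`);
      seen.add(target);
      target = redirects[target];
    }
    resolved[source] = target;
  }
  return resolved;
}
```

The output is the rule set to deploy: every source URL 301s straight to its final destination in one hop.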
Index Management
- Check index coverage: Google Search Console > Pages report. Look for “Crawled but not indexed” and “Discovered but not indexed” issues.
- Noindex thin pages: Tag pages, author archives, internal search results, and filtered views should be noindexed to prevent index bloat.
- Remove from index: Use the URL Removal tool for urgent removals, or meta robots noindex for permanent exclusions.
Core Web Vitals: Real Fixes With Real Results
Core Web Vitals measure user experience. They’re a confirmed ranking factor, though I’ll be honest: they’re a tiebreaker, not a catapult. But when competing for positions 1-5, these metrics determine who wins. Our dedicated CWV deep-dive covers every fix with code examples.
Client Case Study: eCommerce Speed Optimization
Client: WooCommerce store, 4,200 products, shared hosting. Fixes applied: the LCP, INP, and CLS techniques detailed below, including font-display: swap for web fonts. Total timeline: 2 weeks.
LCP (Largest Contentful Paint) – Target: Under 2.5s
LCP measures when the largest visible content element finishes rendering. Usually your hero image or main heading.
Fix 1: Optimize the LCP element
```html
<!-- Hero image: use fetchpriority + explicit dimensions -->
<img src="hero.webp" width="1200" height="630"
     fetchpriority="high" alt="Descriptive alt text"
     style="aspect-ratio: 1200/630;">

<!-- Preload the LCP image in <head> -->
<link rel="preload" as="image" href="hero.webp" fetchpriority="high">

<!-- For below-fold images: lazy load -->
<img src="photo.webp" loading="lazy" decoding="async" width="600" height="400">
```
Fix 2: Eliminate render-blocking resources
- Defer non-critical CSS: Load above-fold styles inline, defer the rest with media="print" onload="this.media='all'"
- Defer JavaScript with defer or async attributes
- Move third-party scripts (analytics, chat, ads) to load after main content
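Put together, the three fixes above look like this in the page head (a minimal sketch; filenames are placeholders):

```html
<!-- 1. Inline the critical above-the-fold CSS -->
<style>/* critical styles here */</style>

<!-- 2. Defer the full stylesheet: loaded at print priority, then swapped to all -->
<link rel="stylesheet" href="styles.css" media="print" onload="this.media='all'">
<noscript><link rel="stylesheet" href="styles.css"></noscript>

<!-- 3. Defer scripts so they never block rendering -->
<script src="app.js" defer></script>
```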
Fix 3: Server response time (TTFB)
- Shared hosting: 800-2000ms TTFB (too slow for good LCP)
- VPS (DigitalOcean, Cloudways): 150-400ms (acceptable)
- CDN-backed (Cloudflare, Fastly): 50-200ms (ideal)
- If your TTFB exceeds 600ms, no amount of frontend optimization will save your LCP
INP (Interaction to Next Paint) – Target: Under 200ms
INP replaced First Input Delay in March 2024 and is significantly harder to pass. It measures responsiveness across the entire session, not just the first click.
Common INP killers and fixes:
- Long JavaScript tasks (>50ms): Break into smaller chunks using
requestIdleCallbackorscheduler.yield() - Heavy event handlers: Debounce scroll and input handlers. Never run expensive DOM queries on every keystroke.
- Third-party scripts: A/B testing tools (Optimizely, VWO), live chat widgets, and analytics scripts are the #1 INP killer. Load them on interaction, not on page load.
- WordPress plugin bloat: Elementor, Slider Revolution, and WooCommerce Cart Fragments are common offenders. Deactivate what you don’t need.
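The "break long tasks into chunks" fix above can be sketched as follows. This is a minimal illustration (the function name and chunk size are my own choices, not a standard API); it uses scheduler.yield() where the browser supports it, with a setTimeout fallback:

```javascript
// Process a large array in chunks, yielding to the event loop between
// chunks so user interactions are handled promptly (good for INP).
async function processInChunks(items, processItem, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(processItem(item));
    }
    // Yield between chunks: scheduler.yield() is the modern API,
    // setTimeout(0) is the broadly supported fallback.
    if (globalThis.scheduler?.yield) {
      await scheduler.yield();
    } else {
      await new Promise((resolve) => setTimeout(resolve, 0));
    }
  }
  return results;
}
```

Each chunk stays well under the 50ms long-task threshold, so a click that lands mid-processing gets a paint within the 200ms INP budget.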
Open Chrome DevTools > Performance panel > click “Interactions” track. Record a session, click around your site, and look for red interaction markers. Each red flag shows exactly which user action was slow and what JavaScript caused the delay. This is the fastest way to diagnose INP problems.
CLS (Cumulative Layout Shift) – Target: Under 0.1
CLS measures unexpected page jumps. The easiest metric to fix.
```css
/* Web font, swapped in when it loads */
@font-face {
  font-family: 'Outfit';
  src: url('outfit.woff2') format('woff2');
  font-display: swap;
}

/* Fallback face with metric overrides so the swap doesn't shift layout */
@font-face {
  font-family: 'Outfit-fallback';
  src: local('Arial');
  size-adjust: 105%;    /* match fallback width to the web font */
  ascent-override: 95%;
  descent-override: 22%;
  line-gap-override: 0%;
}

body { font-family: 'Outfit', 'Outfit-fallback', sans-serif; }

/* Always set width/height attributes in the HTML; the browser then
   derives the aspect ratio and reserves space before the image loads */
img {
  max-width: 100%;
  height: auto;
}
```
Structured Data and Schema Markup
Structured data has evolved from “nice to have” to essential infrastructure. AI search engines parse structured data to understand your business entity. A page with proper schema is far more likely to be cited by ChatGPT or Perplexity than one without it.
Required Schema by Business Type
| Business Type | Required Schema | Rich Result |
|---|---|---|
| All websites | Organization, WebSite, BreadcrumbList | Sitelinks, Knowledge Panel |
| Service businesses | Service, ProfessionalService, FAQPage | FAQ dropdowns, Service details |
| Local businesses | LocalBusiness, OpeningHoursSpecification | Map pack, Hours, Reviews |
| eCommerce | Product, AggregateRating, Offer, Review | Star ratings, Price, Availability |
| Content/Blog | Article, HowTo, VideoObject | Date, Author, How-to steps |
| Healthcare | MedicalCondition, Physician, MedicalProcedure | Health cards, Provider details |
Read our dedicated schema markup implementation guide for full JSON-LD code examples for each type.
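As a starting point, the Organization schema every site needs is a short JSON-LD block in the page head (all values below are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Business",
  "url": "https://yoursite.com/",
  "logo": "https://yoursite.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/yourbusiness",
    "https://x.com/yourbusiness"
  ]
}
</script>
```

Validate it in the Google Rich Results Test before deploying.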
AI Crawler Management
In 2026, your site isn’t just crawled by Googlebot. GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), and Google-Extended are indexing the web for AI search responses.
What to Do
- Allow AI crawlers in robots.txt (see code example above)
- Create an llms.txt file: This emerging standard tells AI crawlers what your business does, your services, pricing, and key pages. Think of it as robots.txt for AI.
- Implement comprehensive schema: AI engines parse structured data to understand entities
- Ensure fast, accessible pages: AI crawlers have timeout limits. JavaScript-rendered content may not be parsed.
- Structure content for citation: Clear H2/H3 headings, direct answer paragraphs, factual claims with numbers
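A minimal llms.txt sketch following the emerging proposal (markdown with an H1 name, a blockquote summary, and linked sections; the format is not yet a formal standard, and the URLs below are placeholders):

```
# Your Business
> Technical SEO and AI search optimization for mid-market businesses.

## Services
- [Technical SEO audits](https://yoursite.com/services/technical-seo/)
- [AI search optimization](https://yoursite.com/services/ai-search/)

## Key pages
- [Pricing](https://yoursite.com/pricing/)
- [Contact](https://yoursite.com/contact/)
```

Serve it at https://yoursite.com/llms.txt, alongside robots.txt.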
Mobile-First Indexing
Google uses the mobile version of your content for indexing and ranking. What this means practically:
- Your mobile site IS your site as far as Google is concerned
- Content hidden behind tabs/accordions on mobile IS crawled and given full weight under mobile-first indexing
- Mobile page speed directly impacts rankings
- Touch targets need 48x48px minimum (Lighthouse audits this)
HTTPS and Security Headers
- HTTPS is non-negotiable in 2026
- Implement HSTS (HTTP Strict Transport Security) headers
- Ensure zero mixed content (all resources load over HTTPS)
- Use 301 redirects from HTTP to HTTPS (not 302)
- Add security headers: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options
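Here is what those response headers might look like (values are illustrative; a real Content-Security-Policy must be tuned to the scripts, styles, and images your site actually loads, or it will break the page):

```
Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Security-Policy: default-src 'self'; img-src 'self' data: https:
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
```

Test your configuration with a header scanner before and after deployment, and roll out CSP in report-only mode first.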
JavaScript SEO
If your site uses React, Angular, Vue, or Next.js, you have a rendering challenge that affects both Google and AI search visibility. Google uses a two-pass indexing system: first it crawls HTML, then renders JavaScript (sometimes days later). AI crawlers often skip JS rendering entirely. This is especially critical for on-page SEO since content that isn’t rendered can’t be ranked.
Solutions by Priority
- Server-Side Rendering (SSR): Gold standard. Pages rendered on server before sending to client.
- Static Site Generation (SSG): Pre-render at build time. Best for content that changes infrequently.
- Hybrid approach: SSR for critical pages (homepage, service pages), client-side for less important pages.
- Dynamic rendering: Serve pre-rendered HTML to crawlers, JavaScript to users. A workaround, not ideal long-term.
Advanced Technical Issues
Log File Analysis
Server logs show you exactly how Googlebot behaves on your site: which pages it crawls most, which it ignores, and where it encounters errors. Tools: Screaming Frog Log File Analyser, Oncrawl, Botify.
What to look for:
- Pages Googlebot crawls frequently (high priority in Google’s eyes)
- Pages Googlebot never visits (may need better internal linking)
- 5xx errors during crawl (server issues Google sees that you don’t)
- Crawl budget waste on low-value URLs (parameterized, paginated, filtered)
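A minimal sketch of the kind of question log analysis answers: counting Googlebot hits per URL from combined-format log lines. (Simplified parsing; a real analysis should also verify Googlebot via reverse DNS, since the user-agent string is easily spoofed.)

```javascript
// Count hits per URL for lines whose user-agent claims to be Googlebot.
function googlebotHitsByUrl(logLines) {
  const counts = {};
  for (const line of logLines) {
    if (!line.includes('Googlebot')) continue;
    // Extract the request path from the "GET /path HTTP/1.1" portion
    const m = line.match(/"(?:GET|POST) (\S+) HTTP/);
    if (m) counts[m[1]] = (counts[m[1]] || 0) + 1;
  }
  return counts;
}
```

Sort the result descending to see what Google considers high priority; diff it against your sitemap to find important pages Googlebot never visits.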
Crawl Budget Optimization
For sites with 10,000+ pages, crawl budget matters. Google won’t crawl everything every day.
- Block low-value URLs in robots.txt (search results, filtered views, session URLs)
- Fix redirect chains (each hop wastes crawl budget)
- Return proper 404s for dead pages (not soft 404s that waste crawl resources)
- Keep your sitemap clean (only include pages you want indexed)
International SEO: Hreflang
If you serve multiple countries or languages, hreflang tags tell Google which version to show each audience.
```html
<link rel="alternate" hreflang="en-us" href="https://example.com/services/" />
<link rel="alternate" hreflang="en-au" href="https://example.com.au/services/" />
<link rel="alternate" hreflang="en-gb" href="https://example.co.uk/services/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/services/" />
```
Common hreflang mistakes: missing return tags (every page must reference every other version), wrong language codes, mixing hreflang with canonical incorrectly.
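Return-tag validation is easy to automate once you have each page's declared alternates. A sketch with hypothetical data (in practice, the alternates come from a crawler export):

```javascript
// `pages` maps each URL to its declared alternates ({ lang: url }).
// Every alternate must link back, or the hreflang cluster is invalid.
function findMissingReturnTags(pages) {
  const missing = [];
  for (const [url, alternates] of Object.entries(pages)) {
    for (const altUrl of Object.values(alternates)) {
      if (altUrl === url) continue; // self-reference needs no return tag
      const back = pages[altUrl];
      if (!back || !Object.values(back).includes(url)) {
        missing.push({ from: url, to: altUrl });
      }
    }
  }
  return missing;
}
```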
Pagination and Faceted Navigation
- Pagination: Google retired rel="next"/rel="prev" as an indexing signal in 2019. Ensure every paginated URL is independently crawlable and self-canonicalizing, or implement load-more/infinite scroll with proper URL handling
- Faceted navigation: Color/size/price filters create thousands of parameterized URLs. Solution: canonical to base URL, or robots meta noindex on filtered views. Never block with robots.txt (this hides the links on those pages too).
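The noindex-on-filtered-views option is a one-line meta tag in the head of each filtered URL (example filter parameter is illustrative):

```html
<!-- On filtered views (e.g. /shoes/?color=red): keep links crawlable, block indexing -->
<meta name="robots" content="noindex, follow">
```

The follow directive is why this beats a robots.txt block: crawlers still pass through the page's links while the page itself stays out of the index.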
The Complete 47-Point Technical SEO Checklist
Here’s every item we check during a client technical audit. This isn’t a preview. This is the full checklist.
Crawlability (12 items)
- XML sitemap exists, is valid, and submitted to Search Console
- Robots.txt properly configured (not blocking important content)
- AI crawler access enabled (GPTBot, PerplexityBot, ClaudeBot)
- No orphan pages (every page has at least one internal link)
- Site architecture depth under 4 clicks for all important pages
- No redirect chains or loops (all redirects are single-hop 301s)
- Canonical tags present and correct on all pages
- No duplicate content issues (parameterized URLs handled)
- Pagination handled correctly
- Faceted navigation not creating index bloat
- Internal links using descriptive anchor text
- Broken internal links: zero (check monthly)
Indexation (8 items)
- Google Search Console: zero “Crawled but not indexed” important pages
- Thin pages noindexed (tag archives, author pages, internal search)
- No accidental noindex on important pages
- Meta robots tags correctly configured per page type
- URL Inspection in Search Console shows the current rendered version of key pages
- No soft 404 errors (Search Console > Pages report)
- Hreflang tags correct for multi-language/multi-region sites
- URL parameters handled via canonicals and robots rules (Search Console’s URL Parameters tool was retired in 2022)
Page Speed and Core Web Vitals (12 items)
- LCP under 2.5 seconds on mobile (field data in CrUX)
- INP under 200ms on mobile
- CLS under 0.1
- TTFB under 600ms (server response time)
- All images in WebP or AVIF format
- All images have explicit width and height attributes
- Hero/LCP image has fetchpriority="high"
- Below-fold images use loading="lazy"
- Critical CSS inlined, non-critical CSS deferred
- JavaScript deferred or async (no render-blocking scripts)
- Web fonts use font-display: swap with preload
- Third-party scripts loaded after main content (or on interaction)
Structured Data (7 items)
- Organization schema with logo, contact, social profiles
- WebSite schema with SearchAction
- BreadcrumbList schema on all inner pages
- Page-specific schema (Article, Product, Service, FAQPage, LocalBusiness)
- All schema validates in Google Rich Results Test
- Schema data matches visible page content
- No deprecated schema types or properties
Security and Accessibility (4 items)
- HTTPS enabled with valid SSL certificate
- HSTS header configured
- Zero mixed content (all resources over HTTPS)
- Security headers present (CSP, X-Frame-Options)
AI Search Readiness (4 items)
- AI crawlers allowed in robots.txt
- llms.txt file created with business information
- Content structured for AI parsing (clear headings, direct answer paragraphs)
- Entity information consistent across site and web (NAP, brand mentions)
Download This Checklist as a Spreadsheet
47-Point Technical SEO Checklist (Google Sheet + PDF)
Editable spreadsheet with status tracking, priority scoring, and notes column. Use it for your own audits.
| # | Item | Status | Priority |
|---|---|---|---|
| 1 | XML sitemap valid and submitted | Pass / Fail / N/A | High / Med / Low |
| 2 | Robots.txt not blocking content | Pass / Fail / N/A | High / Med / Low |
| 3 | AI crawler access enabled | Pass / Fail / N/A | High / Med / Low |
| … | 44 more items with notes column | | |
We’ll email the Google Sheet link + PDF. No spam. Join 3,200+ subscribers who get our weekly SEO insights.
Don’t try to fix everything at once. Prioritize by impact: crawlability issues first (Google can’t rank what it can’t find), then indexation, then speed, then schema. A site with perfect CWV scores but broken crawlability won’t rank. A site with good crawlability and decent speed will outperform one with the reverse.
Tools We Use for Technical Audits
- Screaming Frog: The workhorse. Crawl analysis, broken links, redirect chains, duplicate content, schema validation. $259/year.
- Google Search Console: Free. Index coverage, CWV data, manual actions, crawl stats. Irreplaceable.
- PageSpeed Insights: Free. Lab and field CWV data. Use field data for ranking signals (what Google actually uses).
- Sitebulb: Visual crawl analysis. Better than Screaming Frog for site architecture visualization. $35-65/month.
- Ahrefs Site Audit: Cloud-based, good for ongoing monitoring. Included with Ahrefs subscription.
- Chrome DevTools: Free. Performance panel for INP debugging, Network panel for waterfall analysis, Lighthouse for automated audits.
Want Us to Run This Audit for You?
Our senior strategists use this exact 47-point checklist plus AI visibility analysis. We walk you through every finding on a live screen share. No commitment, no sales pitch.