---
name: internal-linking
description: Audit a site's internal link graph — find orphan pages, hub-and-authority structure, anchor-text quality issues, isolated clusters, and link-equity distribution. The agent crawls pages on the user's machine (their IP, no Worker subrequest cap) and the MCP does the graph computation. Returns prescriptive fixes with specific (source, target, anchor) recommendations. Built and maintained by Momentic.
version: 1.0.0
---

# internal-linking

Internal linking is one of the highest-leverage on-site SEO/GEO levers — and the one most teams have no map of. This skill builds the map: it crawls a sample of the site, computes the link graph, and returns a prescriptive list of pages to link, anchors to use, and clusters to consolidate.

**Architecture note:** the agent (you) does the actual crawling using its own HTTP/fetch capability. The Momentic MCP only does the graph math (`compute_link_graph`) and provides a per-page link extractor (`extract_links`). This is by design — your IP, your concurrency, no Worker subrequest cap.

## Prerequisites

- Server: `https://momenticmarketing.com/mcp`
- Tools used: `analyze_site` (for sitemap), `extract_links`, `compute_link_graph`
- Agent capability needed: HTTP fetch (Claude WebSearch + fetch, Claude Code, etc.)

## Process

### Step 1 — Identify scope and crawl plan

Get the user's root domain. Decide on a sample size:

| User intent | Max pages |
|---|---|
| Quick sanity check | 25 |
| Standard audit | 100 |
| Full site (≤500 URLs) | min(sitemap count, 500) |
| Large site (>500 URLs) | 250-page sample, plus a warning to the user that mention-based suggestions need full-corpus coverage to be reliable |
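
The table above can be read as a small decision function. This is a minimal sketch, not part of the MCP: `planCrawl` and the intent labels `'quick'`, `'standard'`, and `'full'` are hypothetical names chosen for illustration.

```js
// Hypothetical helper: turns the sample-size table into a page budget.
// The intent labels are illustrative, not MCP parameters.
function planCrawl(intent, sitemapCount) {
  if (intent === 'quick') {
    return { maxPages: Math.min(sitemapCount, 25), warnAboutMentions: false };
  }
  if (intent === 'standard') {
    return { maxPages: Math.min(sitemapCount, 100), warnAboutMentions: false };
  }
  // 'full': crawl everything up to 500 URLs...
  if (sitemapCount <= 500) {
    return { maxPages: sitemapCount, warnAboutMentions: false };
  }
  // ...otherwise fall back to a 250-page sample and flag the coverage caveat.
  return { maxPages: 250, warnAboutMentions: true };
}
```

When `warnAboutMentions` is true, tell the user up front that mention-based suggestions will be incomplete.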

### Step 2 — Get the URL list

Call `analyze_site(domain)`. From the returned `sitemapXml.urlCount` and `sitemapXml.sections`, decide how to sample:

- **If sitemap ≤ 100 URLs:** crawl them all
- **If sitemap > 100 URLs:** prefer a stratified sample — top-level pages (home, about, services), one or two pages per major section, plus a random tail. The goal is a representative graph, not exhaustive coverage.

If you need the actual URLs (not just counts), fetch `<domain>/sitemap.xml` directly with your fetch tool and parse the `<loc>` entries.
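
One way to do the parsing and sampling described above is sketched below. It assumes a plain `<urlset>` sitemap (sitemap index files and CDATA-wrapped `<loc>` values are not handled) and treats the first path segment as the "section". The function names are hypothetical, not MCP tools.

```js
// Pull every <loc> value out of a sitemap XML string.
function parseSitemapLocs(xml) {
  const locs = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match;
  while ((match = re.exec(xml)) !== null) {
    locs.push(match[1].replace(/&amp;/g, '&')); // sitemaps XML-escape ampersands
  }
  return locs;
}

// Pages with at most one path segment (/, /about) count as top-level;
// deeper pages are grouped by their first segment (/blog/post -> "blog").
function sectionOf(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  return segments.length > 1 ? segments[0] : '(top-level)';
}

// Top-level pages first, then a few pages per section, then a random tail.
function stratifiedSample(urls, maxPages, perSection = 2, random = Math.random) {
  const bySection = new Map();
  for (const url of urls) {
    const key = sectionOf(url);
    if (!bySection.has(key)) bySection.set(key, []);
    bySection.get(key).push(url);
  }
  const picked = new Set(bySection.get('(top-level)') ?? []);
  for (const [key, group] of bySection) {
    if (key === '(top-level)') continue;
    for (const url of group.slice(0, perSection)) picked.add(url);
  }
  // Fisher-Yates shuffle of whatever is left, used to fill the remaining budget.
  const rest = urls.filter((u) => !picked.has(u));
  for (let i = rest.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [rest[i], rest[j]] = [rest[j], rest[i]];
  }
  for (const url of rest) {
    if (picked.size >= maxPages) break;
    picked.add(url);
  }
  return [...picked].slice(0, maxPages);
}
```

If the top-level and per-section picks already exceed `maxPages`, the final `slice` truncates them, so lower `perSection` on sites with many sections.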

### Step 3 — Crawl

For each URL in the sample, fetch the page using your own HTTP capability:

- **User-Agent:** identify yourself honestly (`MomenticInternalLinkingSkill/1.0` is fine, or use whatever your platform sends)
- **Concurrency:** 3–5 parallel fetches is polite. Don't hammer.
- **Timeout:** 8 seconds per page, then move on
- **Skip:** non-HTML responses (PDFs, images, sitemaps), 4xx/5xx pages, anything > 5 MB
- **Capture:** raw HTML, response status, final URL after redirects, and (if you can) the `<title>` and a stripped body-text excerpt for mention detection
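
The crawl rules above could look roughly like this. This is a sketch under assumptions: a runtime with global `fetch` and `AbortSignal.timeout` (for example Node 18+). The helper names are hypothetical, and `fetchImpl` is only there so the logic can be exercised without network access.

```js
// Apply the skip rules: 4xx/5xx, non-HTML, and anything over 5 MB.
function isCrawlable({ status, contentType, byteLength }) {
  if (status >= 400) return false;
  if (!/text\/html|application\/xhtml\+xml/i.test(contentType ?? '')) return false;
  return byteLength <= 5 * 1024 * 1024;
}

// Run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}

// Fetch one page; returns null for anything that should be skipped.
async function fetchPage(url, { timeoutMs = 8000, fetchImpl = fetch } = {}) {
  try {
    const res = await fetchImpl(url, {
      redirect: 'follow',
      headers: { 'User-Agent': 'MomenticInternalLinkingSkill/1.0' },
      signal: AbortSignal.timeout(timeoutMs),
    });
    const html = await res.text();
    const meta = {
      status: res.status,
      contentType: res.headers.get('content-type'),
      byteLength: new TextEncoder().encode(html).length,
    };
    // res.url is the final URL after redirects.
    return isCrawlable(meta) ? { url: res.url || url, status: res.status, html } : null;
  } catch {
    return null; // timeout or network error: move on
  }
}

async function crawl(urls, options = {}) {
  const pages = await mapWithConcurrency(urls, options.concurrency ?? 4, (u) => fetchPage(u, options));
  return pages.filter(Boolean);
}
```

Note that this version checks the size only after the body has been read. If your fetch tool exposes `Content-Length`, checking it first avoids downloading oversized pages.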

For each successful fetch, call `extract_links(html, baseUrl)` on the MCP. Collect the results into:

```js
[
  {
    url: 'https://example.com/about',
    internalLinks: [/* links from the extract_links result */],
    title: 'About Us',      // optional, enables mention-not-linked suggestions
    bodyText: '...',        // optional, enables mention-not-linked suggestions (first ~5000 chars is enough)
  },
  // ...one object per successfully fetched page
]
```
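
The optional `title` and `bodyText` fields can be derived from the raw HTML with a few regular expressions. The sketch below is deliberately crude (it does not decode HTML entities and is not a real HTML parser), and the helper names are hypothetical. The shape of the `extract_links` result is not specified here, so `internalLinks` is passed through untouched.

```js
// Strip scripts, styles, and tags; collapse whitespace; keep the first ~5000 chars.
function toBodyText(html, maxChars = 5000) {
  return html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, maxChars);
}

// Return the contents of the first <title> element, or undefined if there is none.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].replace(/\s+/g, ' ').trim() : undefined;
}

// Combine one fetched page with whatever extract_links returned for it.
function buildPageRecord(fetched, internalLinks) {
  return {
    url: fetched.url,
    internalLinks, // passed through as-is from the extract_links result
    title: extractTitle(fetched.html),
    bodyText: toBodyText(fetched.html),
  };
}
```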

### Step 4 — Compute the graph

Call `compute_link_graph` with a single argument object containing `pages` and `options`:

```js
compute_link_graph({
  pages: [/* page objects collected in Step 3 */],
  options: {
    findMentions: true,    // set true ONLY if pages have title + bodyText (and mind the 250-page limit under Notes)
    hubCount: 10,
    authorityCount: 10,
    rankingLimit: 50,
  }
})
```

The result includes: orphans, hubs, authorities, PageRank-lite ranking, anchor issues, connected components, isolated clusters, and (if `findMentions`) mention-based suggestions.
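
To sanity-check the returned orphans and ranking, it can help to recompute them locally on a small sample. The sketch below is an illustration of the general technique, not the MCP's implementation. It assumes `internalLinks` has been reduced to an array of absolute URL strings, and it uses a textbook damped PageRank iteration with a fixed iteration count.

```js
// Build out-edges and in-degrees, ignoring self-links, duplicates,
// and links to pages outside the crawled sample.
function buildGraph(pages) {
  const known = new Set(pages.map((p) => p.url));
  const out = new Map(pages.map((p) => [p.url, new Set()]));
  const inDegree = new Map(pages.map((p) => [p.url, 0]));
  for (const page of pages) {
    for (const target of page.internalLinks) {
      if (target === page.url || !known.has(target) || out.get(page.url).has(target)) continue;
      out.get(page.url).add(target);
      inDegree.set(target, inDegree.get(target) + 1);
    }
  }
  return { out, inDegree };
}

// Orphans: pages that no other crawled page links to.
function findOrphans({ inDegree }) {
  return [...inDegree].filter(([, degree]) => degree === 0).map(([url]) => url);
}

// Simple PageRank: damped power iteration, dangling mass spread uniformly.
function pageRankLite({ out }, { damping = 0.85, iterations = 20 } = {}) {
  const nodes = [...out.keys()];
  const n = nodes.length;
  let rank = new Map(nodes.map((u) => [u, 1 / n]));
  for (let k = 0; k < iterations; k++) {
    const next = new Map(nodes.map((u) => [u, (1 - damping) / n]));
    let dangling = 0;
    for (const u of nodes) {
      const targets = out.get(u);
      if (targets.size === 0) {
        dangling += rank.get(u);
        continue;
      }
      const share = (damping * rank.get(u)) / targets.size;
      for (const t of targets) next.set(t, next.get(t) + share);
    }
    for (const u of nodes) next.set(u, next.get(u) + (damping * dangling) / n);
    rank = next;
  }
  return rank;
}
```

Remember that an "orphan" here only means no inbound links from the crawled sample. On a partial crawl, confirm before reporting it as a true orphan.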

### Step 5 — Report

Format the verdict in this structure:

```
## Internal Linking Audit: <domain>

**Pages crawled:** N (of M in sitemap)
**Edges:** E
**Average internal links per page:** X.X

### Health checks

| Check | Status | Detail |
|---|---|---|
| Orphan pages (in-degree 0) | <count> | <comma-sep first 5 URLs, "..." if more> |
| Connected components | <count> | <largest size>, <isolated cluster count> |
| Anchor diversity issues | <count> | <count of all-same, count of generic-only> |

### Top hubs (most-linked-to)

| URL | In-degree | Out-degree | PageRank |
|---|---|---|---|
| ... |

### Top authorities (link out to many hubs)

| URL | Out-degree | Hubs linked to |
|---|---|---|

### Pages losing the most equity (orphan or weakly linked)

For each orphan or page with in-degree ≤ 1: state which hubs SHOULD link to it and what anchor text to use.

### Specific fixes (do these in order)

1. **Add link from <hub URL> to <orphan URL>** with anchor `<suggested anchor>` — reason: <orphan + topical match>
2. ...
```
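
The header numbers in the report can be cross-checked against the crawl data. This is a minimal sketch, assuming `internalLinks` is an array of absolute URL strings and counting only links between crawled pages. `summarizeGraph` is a hypothetical helper; prefer the values returned by `compute_link_graph` if the two disagree.

```js
// Count edges between crawled pages and find weakly connected components.
function summarizeGraph(pages) {
  const known = new Set(pages.map((p) => p.url));
  const neighbours = new Map(pages.map((p) => [p.url, new Set()]));
  let edges = 0;
  for (const page of pages) {
    const seen = new Set();
    for (const target of page.internalLinks) {
      if (target === page.url || !known.has(target) || seen.has(target)) continue;
      seen.add(target);
      edges++;
      // Treat links as undirected when grouping pages into components.
      neighbours.get(page.url).add(target);
      neighbours.get(target).add(page.url);
    }
  }
  const componentSizes = [];
  const visited = new Set();
  for (const start of known) {
    if (visited.has(start)) continue;
    let size = 0;
    const stack = [start];
    visited.add(start);
    while (stack.length > 0) {
      const current = stack.pop();
      size++;
      for (const other of neighbours.get(current)) {
        if (!visited.has(other)) {
          visited.add(other);
          stack.push(other);
        }
      }
    }
    componentSizes.push(size);
  }
  componentSizes.sort((a, b) => b - a);
  return {
    pagesCrawled: known.size,
    edges,
    avgLinksPerPage: known.size ? Number((edges / known.size).toFixed(1)) : 0,
    componentSizes, // largest first; more than one entry means isolated clusters
  };
}
```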

### Step 6 — Be specific

The output is only useful if every recommendation is concrete:

- ✗ Bad: "Improve internal linking on the blog"
- ✓ Good: "Add a link from `/blog/seo-basics` to `/services/seo-audit` with anchor *'a free SEO audit'* — currently the blog post mentions audits 4 times but never links to the services page"

Use the `suggestions` array from `compute_link_graph` (when `findMentions: true`) as your primary source for these. Each suggestion has `from`, `to`, and `reason: 'mentioned-not-linked'`. Augment with anchor recommendations based on:
- The target page's `<title>`
- The first H1
- Common phrases in the target page's body
- The brand/product name when relevant
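
A rough way to turn those sources into anchor candidates is sketched below. `suggestAnchors` and the generic-anchor list are hypothetical, and the "common phrases in the body" source is left out because it needs real text analysis. Treat the output as candidates to review, not final copy.

```js
// Anchors that say nothing about the target page.
const GENERIC_ANCHORS = new Set([
  'click here', 'here', 'more info', 'read more', 'learn more', 'this page',
]);

function suggestAnchors({ title, h1, brand }, { maxLength = 60 } = {}) {
  const candidates = [];
  if (title) {
    // Keep the part before a " | Brand" or " - Brand" style suffix.
    candidates.push(title.split(/\s+[|\-\u2013\u2014]\s+/)[0]);
  }
  if (h1) candidates.push(h1);
  if (brand) candidates.push(brand);

  const seen = new Set();
  return candidates
    .map((text) => text.replace(/\s+/g, ' ').trim())
    .filter((text) => {
      const key = text.toLowerCase();
      if (!text || text.length > maxLength) return false;
      if (GENERIC_ANCHORS.has(key) || seen.has(key)) return false;
      seen.add(key);
      return true;
    });
}
```

For example, a page titled `SEO Audit Services | Momentic` with a matching H1 yields `SEO Audit Services` once, followed by the brand name if one is supplied.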

### Step 7 — Note what's not covered

Be honest in the report's "What was NOT verified" section (see the template under Output format) about:
- External backlink equity (we don't see those)
- 301/302 redirect chains internal to the site (`compute_link_graph` doesn't follow them)
- Pages that exist but aren't in the sitemap (orphans of a different kind)
- JavaScript-rendered links (depends on the agent's fetch capability)

## Output format

```
## Internal Linking Audit: <domain>

**Audit date:** <YYYY-MM-DD>
**Pages crawled:** <N> of <M> in sitemap
**Internal edges:** <E>
**Average internal links per page:** <X.X>

### Graph health

- ✓/⚠/✗ Connectivity — <count> components, largest is <size>, <count> isolated clusters
- ✓/⚠/✗ Hub structure — <comment on hub concentration>
- ✓/⚠/✗ Anchor diversity — <count of all-same, generic-only issues>

### Orphans (no inbound links from crawled pages)

<numbered list of up to 10, each with a 1-line reason it's an orphan>

### Top hubs

<table: URL | in-degree | out-degree | PageRank>

### Top authorities

<table: URL | out-degree | hubs-linked-to>

### Anchor issues

<list any all-anchors-same and no-descriptive-anchors findings>

### Specific link recommendations (top 10)

<numbered list. Each: "Add link from [from URL] → [to URL] with anchor "[suggested anchor]". Reason: [mention found in source body / topical match / orphan rescue]">

### What was NOT verified

- External backlinks
- Redirect chains
- JavaScript-rendered links
- Pages outside the sitemap
```

## Notes for the agent

- **You do the crawling, not the MCP.** The MCP is stateless graph math + per-page link extraction. Calling `extract_links` from the agent on N pages is N MCP requests; the actual fetches happen on your end. This respects target-site rate limits and removes any subrequest cap.
- **Don't use `findMentions: true` on > 250 pages** without warning the user — it's O(N²) and Workers will time out. For larger sites, run mention detection in batches or skip it.
- **Stratify your sample.** A random sample of 100 pages from a 5000-URL site won't surface the orphan blog cluster from 2019 unless you specifically include it. Make sure you sample at least one URL from every top-level section returned by `analyze_site → sitemapXml.sections`.
- **Suggested-anchor quality matters.** Don't return "click here" or "more info" as a suggestion — the whole point is anchor diversity.
- **Pair with `geo-aeo-readiness`:** internal-linking issues often surface as Stage 2 (Extractability) failures in the GEO audit when orphan pages can't be reached from the index.
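
To see why mention detection is quadratic, here is a naive sketch of the idea with a page-count guard. It is an illustration only, not the MCP's algorithm. It assumes `internalLinks` is an array of URL strings and matches the target's exact title, which is cruder than anything you would ship.

```js
// Naive mention detection: every source body is checked against every target title.
function findMentionSuggestions(pages, { maxPages = 250 } = {}) {
  if (pages.length > maxPages) {
    throw new Error(
      `Mention detection over ${pages.length} pages exceeds the ${maxPages}-page guard; batch it or skip it.`
    );
  }
  const suggestions = [];
  for (const source of pages) {                 // N sources...
    const body = (source.bodyText ?? '').toLowerCase();
    const alreadyLinked = new Set(source.internalLinks);
    for (const target of pages) {               // ...times N targets = O(N^2)
      if (target === source || !target.title) continue;
      if (alreadyLinked.has(target.url)) continue;
      if (body.includes(target.title.toLowerCase())) {
        suggestions.push({ from: source.url, to: target.url, reason: 'mentioned-not-linked' });
      }
    }
  }
  return suggestions;
}
```

Batching keeps each call under the guard, but mentions that cross batch boundaries are missed, so say so in the report when you batch.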
