News snippet from Tyler

Tyler

July 24, 2025 12:15 PM

posted in #tools

Explanation of chunking techniques used in information retrieval, via Sean Forde on LinkedIn.

Obvious take: the way we write can impact what search engines, chatbots, and AI tools are able to find.

Based on my research in IR so far, which I started to dig into in April, most production RAG systems use recursive character splitting at 256-512 tokens with ~15-20% overlap. Complex semantic chunking rarely justifies its computational cost. Structure and metadata might matter more.

Here’s a test I’m setting up that would likely be technique agnostic, but also a user BOREFEST:

#1 - Write a complete overview sentence to include company, location, service, and key metric in under 25 words.

“Momentic is a boutique SEO and web‑development agency in Milwaukee that serves fast‑growing brands.”

#2 - Structure content with labeled sections Headers create natural chunk boundaries for document-based chunking.

Expertise / Momentic focuses on technical SEO, content strategy, and performance web builds. The team tunes Core Web Vitals and automates metadata to improve crawl rates.
Clients / Momentic supports fast-growing B2B and B2C brands, including Google Ventures portfolio companies. Typical engagements run 12+ months and target double-digit revenue gain.

#3 - Repeat entity names when topics shift. Avoid ambiguous pronouns. This makes the content so dry, boring, and kills style :disappointed:

“Momentic reports results in dashboards and weekly production meetings. Momentic also conducts quarterly strategy reviews.”

#4 - Include metrics and causal relationships Use specific numbers and verbs like “drives,” “reduces,” “results in.”

“Momentic helps to drive an average of 40% revenue for clients in 12 months.”

Implement expanded structured Schema.org with the info (not something new)

Add a summary statement to reinforce key value proposition for retrieval scoring. Something like:
“Momentic pairs technical depth with clear reporting so marketing leaders act fast and see measurable returns.”

*Here’s why I think it will work:*

Recursive splitters respect heading boundaries
256-512 token chunks balance context and precision
Schema.org is a language written for machines, and while most LLMs don’t execute JS, early testing makes me think Gemini responds well to even unsupported types.
Clear structure should improve retrieval accuracy by 15-25%

Tyler

Recent content from the team