Keyspider
Index What Matters. Exclude What Doesn't.

Content Type to Crawl

Not everything on a site or server needs to be in your search index. Keyspider lets you specify exactly which file types and content formats to crawl — web pages, PDFs, Word documents, XML feeds, JSON APIs, and more. Precise control means a cleaner index and more relevant results.

Supported types: HTML, PDF, DOCX, XLSX, XML, JSON, plain text, and more
Include or exclude specific MIME types per data source
Crawl depth control — define how many levels deep to follow links
URL pattern matching to include or exclude specific paths
Authentication support for gated PDFs and documents
Metadata extraction from document properties (author, created date, title)
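Crawl depth control can be pictured as a breadth-first traversal that stops following links once a page sits at the configured depth. The sketch below is only an illustration in Python, not Keyspider's implementation: the `crawl_with_depth_limit` function and the in-memory `SITE` link map are hypothetical stand-ins for real fetching and link extraction.

```python
from collections import deque


def crawl_with_depth_limit(start, links, max_depth):
    """Breadth-first traversal that stops following links past max_depth.

    `links` is a hypothetical in-memory map of page -> outgoing links,
    standing in for real HTTP fetching and link extraction.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # keep this page, but do not follow its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure used only for illustration.
SITE = {
    "/": ["/docs", "/about"],
    "/docs": ["/docs/install", "/"],
    "/docs/install": ["/docs/install/linux"],
}

print(crawl_with_depth_limit("/", SITE, max_depth=1))
```

With a depth of 1, only the start page and the pages it links to directly are visited. Raising the limit to 2 adds `/docs/install` but still leaves `/docs/install/linux` out.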

How it works

1. Define which content types to include

In the crawler settings for each source, select which file types to index. Enable PDFs but exclude images and scripts, for example — keeping the index clean and focused.
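As a rough illustration of what a content type rule does, the Python sketch below checks a response's `Content-Type` header against include and exclude lists. Keyspider configures this through its crawler settings; the `should_index` function, the `INCLUDE` and `EXCLUDE` lists, and the "exclude wins" precedence are assumptions made for the example, not the product's documented behavior.

```python
from fnmatch import fnmatchcase


def normalize(content_type):
    """Strip parameters such as '; charset=utf-8' and lowercase the type."""
    return content_type.split(";", 1)[0].strip().lower()


def should_index(content_type, include, exclude=()):
    """Exclude rules win; otherwise the type must match an include rule."""
    mime = normalize(content_type)
    if any(fnmatchcase(mime, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(mime, pattern) for pattern in include)


# Hypothetical rule set: index web pages and PDFs, skip images and scripts.
INCLUDE = ["text/html", "application/pdf"]
EXCLUDE = ["image/*", "application/javascript", "text/javascript"]

for header in ["text/html; charset=UTF-8", "application/pdf", "image/png"]:
    print(header, should_index(header, INCLUDE, EXCLUDE))
```

Anything that matches neither list, such as `text/css`, is skipped in this sketch because it is not explicitly included.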

2. Set URL and path rules

Use include/exclude patterns to crawl only specific sections of a site. Index /support/* but exclude /blog/* if your search is focused on support content.
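The `/support/*` and `/blog/*` example can be sketched as a simple glob match on the URL path. This is a minimal Python illustration under stated assumptions: the `path_allowed` function is hypothetical, and Keyspider's actual pattern syntax and precedence rules may differ from the shell-style wildcards used here.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def path_allowed(url, include, exclude=()):
    """Match the URL path against glob patterns; exclude rules win."""
    path = urlsplit(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


# Patterns taken from the example above; example.com is a placeholder host.
INCLUDE_PATHS = ["/support/*"]
EXCLUDE_PATHS = ["/blog/*"]

for url in [
    "https://example.com/support/reset-password?lang=en",
    "https://example.com/blog/2024/launch",
    "https://example.com/pricing",
]:
    print(url, path_allowed(url, INCLUDE_PATHS, EXCLUDE_PATHS))
```

Only the path is matched, so query strings and hostnames do not affect the result in this sketch.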

3. Keyspider extracts content from every supported format

Text is extracted from PDFs, DOCX files, and other document formats automatically. No pre-processing or format conversion needed on your side.
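Conceptually, format-aware extraction means routing each document to an extractor chosen by its content type. The Python sketch below shows that dispatch pattern using only the standard library, so it covers HTML, JSON, and plain text; PDF and DOCX would need dedicated parsers that are not shown. All names here (`extract_text`, `EXTRACTORS`, and the helper functions) are hypothetical and say nothing about how Keyspider does this internally.

```python
import json
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(body):
    parser = _TextCollector()
    parser.feed(body)
    parser.close()
    return " ".join(parser.parts)


def json_to_text(body):
    """Join every string value found in a JSON document."""
    def walk(node):
        if isinstance(node, str):
            yield node
        elif isinstance(node, dict):
            for value in node.values():
                yield from walk(value)
        elif isinstance(node, list):
            for item in node:
                yield from walk(item)
    return " ".join(walk(json.loads(body)))


# Hypothetical registry mapping MIME types to extractor functions.
EXTRACTORS = {
    "text/html": html_to_text,
    "application/json": json_to_text,
    "text/plain": lambda body: body.strip(),
}


def extract_text(content_type, body):
    mime = content_type.split(";", 1)[0].strip().lower()
    extractor = EXTRACTORS.get(mime)
    if extractor is None:
        raise ValueError(f"no extractor registered for {mime}")
    return extractor(body)


print(extract_text("text/html", "<h1>Reset</h1><script>var x=1;</script><p>Click here</p>"))
```

A registry like this keeps the crawler loop independent of individual formats: supporting another type means registering one more function.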

Use cases

Document management systems

Index only PDFs and Word documents from a SharePoint library — excluding images, videos, and spreadsheets that don't contain searchable text content.

Technical documentation portals

Crawl HTML pages and PDF manuals simultaneously. Users search once and find answers from both web documentation and downloadable technical guides.

Government information portals

Index published policy PDFs, legislation documents, and web pages from the same site. Content type and URL rules keep internal drafts and working files out of the index.
