How to Train Your AI Chatbot from a Website URL
Paste a URL and let ChatbotsHub crawl, clean, and index your website into a chatbot — no copy-paste, no uploads.
Uploading documents is great, but your most up-to-date knowledge usually already lives on your website. With website URL training, you can point ChatbotsHub at your site and it will crawl the pages, clean the content, and turn it into a trained chatbot — without copying and pasting a single paragraph.
This feature sits right alongside document training: you can mix uploaded files and crawled websites in the same knowledge base.
What website URL training does
You enter a single URL — for example your homepage, help center, or docs site. ChatbotsHub then visits the public pages on that domain, extracts the meaningful text, and indexes it using the same retrieval pipeline that powers document training. The result is a chatbot that can answer questions using everything published on your site.
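To make "visits the public pages on that domain" concrete, here is a minimal sketch of same-domain link discovery using only the Python standard library. It is an illustration, not ChatbotsHub's actual crawler: the names `LinkCollector` and `same_domain_links` are hypothetical, and it assumes "same domain" means an exact hostname match.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return absolute, fragment-free links that share page_url's hostname."""
    host = urlparse(page_url).hostname
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL, then drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        is_web = parsed.scheme in ("http", "https")
        # Assumption: only an exact hostname match counts as the same site.
        if is_web and parsed.hostname == host and absolute not in links:
            links.append(absolute)
    return links
```

Under this assumption, a link to a subdomain such as a separate docs host would be treated as a different site, so it would need its own URL.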
How the crawler works, step by step
1. You submit a website URL from the Knowledge Sources tab in your dashboard.
2. The URL is validated and checked for safety before any request is made.
3. The crawler fetches the page HTML and discovers links on the same domain.
4. Navigation, scripts, styles, cookie banners, and footers are stripped out.
5. The remaining clean text is chunked, embedded, and stored for retrieval.
6. Your chatbot is ready to answer from the crawled content.
Under the hood this is the same retrieval-augmented generation flow used for files, so answers stay grounded in your real content.
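The cleaning and chunking steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the product's real pipeline: the tag list in `SKIPPED_TAGS`, the word-based chunking, and the `size` and `overlap` defaults are all hypothetical. Cookie-banner detection and the embedding step are left out because they depend on site markup and on the embedding model in use.

```python
from html.parser import HTMLParser

# Assumption: boilerplate is identified purely by tag name.
SKIPPED_TAGS = {"script", "style", "nav", "footer", "noscript"}


class TextExtractor(HTMLParser):
    """Keeps text that is not nested inside any skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_text(html):
    """Strip boilerplate tags and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Overlapping chunks are a common choice in retrieval pipelines because a sentence that straddles a chunk boundary still appears whole in at least one chunk.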
Built-in safety
Crawling the open web has to be done responsibly. ChatbotsHub validates every URL and blocks requests to localhost, internal IP addresses, and private networks to prevent server-side request forgery (SSRF). Crawling endpoints are also rate-limited so the feature stays safe and predictable.
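As a rough idea of what this kind of URL validation can look like, the sketch below rejects non-HTTP schemes and any hostname that resolves to a non-public address. It is an assumption-laden example rather than ChatbotsHub's actual check: `is_safe_url` and `resolve_host` are hypothetical names, and the injectable `resolver` parameter exists only so the logic can be exercised without real DNS lookups.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolve_host(hostname):
    """Resolve a hostname to the set of IP address strings it maps to."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}


def is_safe_url(url, resolver=resolve_host):
    """Return True only if the URL is http(s) and every resolved IP is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
    except OSError:
        return False  # hostname could not be resolved
    if not addresses:
        return False
    for address in addresses:
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            return False
        # is_global is False for loopback, private, link-local,
        # and other non-public ranges.
        if not ip.is_global:
            return False
    return True
```

A check like this is only one layer. A production crawler would also need to re-validate redirect targets and connect to the same IP it validated, since a hostname can resolve differently between the check and the request.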
Crawl limits by plan
- Free: up to 10 pages per website.
- Starter: up to 100 pages per website.
- Pro: up to 1,000 pages per website.
Each crawled website counts as one knowledge source, just like an uploaded document. Not sure which plan fits? Read our AI chatbot pricing guide.
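To show how a per-plan page cap might interact with link discovery, here is a small sketch of a capped crawl. The limits come from the list above; everything else is an assumption made for illustration, including the breadth-first visiting order, the `crawl` function, and the `get_links` callback that stands in for fetching a page and extracting its links.

```python
from collections import deque

# Page caps per plan, taken from the limits listed above.
PAGE_LIMITS = {"free": 10, "starter": 100, "pro": 1000}


def crawl(start_url, plan, get_links):
    """Visit pages breadth-first, stopping at the plan's page limit."""
    limit = PAGE_LIMITS[plan]
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        # get_links is a hypothetical callback returning a page's links.
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

With a breadth-first order, pages closest to the starting URL are crawled first, so on a small plan it helps to start from the section that matters most rather than the homepage.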
Best practices for clean results
- Point the crawler at content-rich sections like your docs or help center.
- Keep your pages updated — re-crawl after major content changes.
- Remove or de-index thin pages that add noise instead of answers.
- Combine a website crawl with focused FAQ documents for the best coverage.
Your website is already your best knowledge base — website training just makes it answer questions.
Website training vs document upload
Use website training when your knowledge already lives online and changes often. Use document upload for PDFs, contracts, and internal files that aren’t published on the web. Most teams use both together.