FastFind NX can crawl a lot of pages

To test FastFind I needed a large enough sample of content to crawl. At first I ran the tests against our own site, but that was skewing our web stats, since I had to do dozens of crawls a day.

The next thing I tried was to write a simple program to generate web pages with random content. That worked for a while, but I ended up with a few very large files that linked to all the sub-pages, when what I wanted was something more natural: lots of pages with lots of links to lots of other pages.
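
A generator that avoids the giant-index problem could look something like the sketch below, where every page links to a handful of randomly chosen other pages. It is not the original program: the function names (build_link_graph, render_page, write_site), the word list, the file naming and the default counts are all assumptions made up for illustration.

```python
import random
from pathlib import Path

# Hypothetical vocabulary used to fill pages with random text.
WORDS = "crawl index search page link content site query result test".split()


def build_link_graph(page_count, links_per_page, seed=0):
    """For every page, pick a few distinct other pages to link to."""
    rng = random.Random(seed)
    graph = {}
    for page in range(page_count):
        others = [p for p in range(page_count) if p != page]
        graph[page] = rng.sample(others, min(links_per_page, len(others)))
    return graph


def render_page(page, targets, rng, word_count=200):
    """Build one small HTML page: random words plus a short list of links."""
    body = " ".join(rng.choice(WORDS) for _ in range(word_count))
    links = "\n".join(
        f'<li><a href="page{t}.html">Page {t}</a></li>' for t in targets
    )
    return (
        f"<html><head><title>Page {page}</title></head>\n"
        f"<body><h1>Page {page}</h1>\n<p>{body}</p>\n"
        f"<ul>\n{links}\n</ul></body></html>\n"
    )


def write_site(out_dir, page_count=1000, links_per_page=10, seed=0):
    """Write page0.html .. page{N-1}.html into out_dir and return the link graph."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rng = random.Random(seed)
    graph = build_link_graph(page_count, links_per_page, seed)
    for page, targets in graph.items():
        html = render_page(page, targets, rng)
        (out / f"page{page}.html").write_text(html, encoding="utf-8")
    return graph
```

Calling write_site("testsite") would produce many small pages that all point at each other, instead of a few huge pages that link to everything. Random text and random links are still a poor stand-in for real content, though.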

Then I remembered reading that someone had made a version of Wikipedia that you could put on a CD or DVD. So I went looking and found Wikipedia for Schools, which is exactly what I was after.

Here's a screenshot of the crawl of that content. The pages that couldn't be indexed are just links back to the wiki itself, which wasn't in the download. Crawling over 30,000 pages of content took a little over an hour, or roughly eight pages a second.



If you are an existing FastFind customer and would like to help us make FastFind even better, why not sign up to become a beta tester of FastFind NX? To sign up, just email me at rodney at interspire dot com. Make sure you do it quickly, since the beta program only has a limited number of places.
