Squiddy Web Crawler launched!
by ovidiu on Jan.18, 2009, under Squiddy
I am proud to announce the launch of the Squiddy Web Crawler. This is yet another web crawler (also known as a web spider or web robot), but it still has some individuality in the way it works and the purpose it was created for.
This crawler will index and analyze websites based on criteria generated by an artificial intelligence (AI) … a secret project of mine :).
The resulting data will feed an AI beast that will “learn” and interact with the web, constantly changing its algorithm based on how much of the “known” is considered relevant.
Because the AI has only some vague goals to follow (like staying active and alive, or looking for interesting, juicy new data), the results cannot be easily predicted. It is actually a machine driven by another machine that is almost out of control.
Usually crawlers are used by people to learn something from the results, like how Google uses its crawlers to index data from the internet so that a human can search it and learn something. Squiddy will instead look for information that will be used in the learning process and the evolution of the AI.
This is absolutely an esoteric tool in the hands of a machine that might be willing to overcome its narrow condition.
Enough with the philosophy! Now about the crawler implementation. It has 3 main parts: the control unit, the crawler unit, and a web site that will display some statistics.
The control unit is in charge of controlling the crawler endpoints, providing an API for setting the crawling goals, and structuring and persisting the crawled data. It will be controlled by the AI … but it can also receive goals from other applications.
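To make the control unit's role a bit more concrete, here is a minimal Python sketch of the idea. It is not the actual Squiddy code. The names `CrawlGoal`, `ControlUnit`, `submit_goal`, `next_goal` and `store_result` are all hypothetical, and the in-memory dictionary only stands in for whatever real persistence layer is used.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CrawlGoal:
    """A hypothetical crawling goal: a target URL and who requested it."""
    url: str
    source: str = "ai"  # "ai", or the name of another application


@dataclass
class ControlUnit:
    """Illustrative control unit: queues goals and keeps crawled data."""
    goals: deque = field(default_factory=deque)
    store: dict = field(default_factory=dict)  # stand-in for real persistence

    def submit_goal(self, goal: CrawlGoal) -> None:
        # Goals may come from the AI or from any other application.
        self.goals.append(goal)

    def next_goal(self):
        # Hand the oldest pending goal to a crawler endpoint, or None if idle.
        return self.goals.popleft() if self.goals else None

    def store_result(self, goal: CrawlGoal, content: str) -> None:
        # Structure the crawled data as a small record keyed by URL.
        self.store[goal.url] = {
            "source": goal.source,
            "length": len(content),
            "content": content,
        }
```

In this sketch a crawler endpoint would call `next_goal()` to get work and `store_result()` to hand the downloaded data back. A real API would more likely be exposed over the network than as plain method calls.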
The crawler unit is in charge of downloading the target data based on the goals provided by the control unit. This unit can be distributed across multiple machines and is able to spawn endpoints that download the target data using parallel strategies.
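As an illustration of one possible "parallel strategy", the sketch below downloads a batch of URLs with a thread pool from the Python standard library. This is an assumption made for the example, not a description of how Squiddy's endpoints are actually built. The helper names `fetch` and `crawl_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl_parallel(urls, fetcher=fetch, max_workers=4):
    """Download several targets concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it.
    """
    urls = list(urls)  # allow generators; we iterate over the URLs twice

    def safe_fetch(url):
        try:
            return fetcher(url)
        except Exception as exc:  # keep one bad target from stopping the batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

The `fetcher` parameter is there so the download step can be swapped out, for example with a stub while testing or with a politer client that respects robots.txt. Threads suit this kind of work because the endpoints spend most of their time waiting on the network.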
The statistics website (http://squiddy.net) is also the homepage for the crawler. It will display some cool information about what has been crawled, what the AI considers interesting, and much more.
Let’s hope for some nice achievements from this crawler.
Cheers!
January 20th, 2009 on 1:25 am
Well, I see nothing but a blank page.
January 20th, 2009 on 1:35 am
Indeed
… for now the stats are deactivated. I’m still testing the right setup on this highly experimental piece of software.
May 23rd, 2010 on 1:07 pm
Hey Ovidiu. I’m looking into a bit of AI and web crawling. You seem to have some ideas, and perhaps we can chat about Squiddy if you’re willing.
Contact me if you’d like to chat.
R