High CPU usage by web crawler

I'm running a web crawler that is now using close to 100% CPU.

https://github.com/KrasnayaSecurity/WikiDefenseSuite/blob/master/webscanners/wikicrawler.py

Earlier designs didn't use this much CPU, and I suspect the increase comes from the extra features I added to prevent recrawling the same pages. Memory management is better now that redundant URLs are removed; earlier versions were projected to run for no more than a day before memory filled up with the URLs in the crawl list.
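To be concrete, the dedup works roughly like this (a simplified sketch, not the actual code in wikicrawler.py; `frontier`, `visited`, and `fetch_links` are illustrative names). If the visited check were against a list instead of a set, each lookup would be O(n), which could explain CPU climbing as the crawl list grows:

```python
from collections import deque

def crawl(seed_urls, fetch_links):
    """Breadth-first crawl that skips any URL it has already fetched."""
    frontier = deque(seed_urls)  # URLs waiting to be crawled
    visited = set()              # URLs already crawled; set lookups are O(1)
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue             # dedup: never recrawl the same page
        visited.add(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return visited
```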

If there were multiple crawlers or other important processes running, would the scheduler just give them CPU time and make the crawlers run slightly slower, or would it cause a problem?
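One thing I've considered is deprioritizing the crawler myself rather than relying on the scheduler to sort it out. Just a sketch of the idea (Unix-only; `os.nice` raises the process's niceness, which lowers its scheduling priority):

```python
import os

# Increase this process's niceness by 19 (the usual maximum), so the
# kernel scheduler gives CPU time to other processes first. The crawler
# can still use idle CPU but yields under contention.
os.nice(19)
```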

~~![](http://sturmkrieg.ru/img/src/1407937333203.png)

EDIT

Memory usage seems to be starting to plateau, and the crawler seems to be slowing down, as if it has fetched almost all of the pages on that site.~~
