High CPU usage by web crawler

I'm running a web crawler that is now using close to 100% CPU.

https://github.com/KrasnayaSecurity/WikiDefenseSuite/blob/master/webscanners/wikicrawler.py

Earlier designs didn't use this much CPU, and I suspect the increase comes from the extra features I added to prevent recrawling the same pages. Memory management is better now that redundant URLs are removed; earlier versions were projected to run for no more than a day before memory filled up with the URLs in the crawl list.
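To be concrete, the dedup works roughly like this (a simplified sketch, not the actual code in wikicrawler.py; `frontier`, `visited`, and `fetch_links` are illustrative names). If the visited check were against a list instead of a set, each lookup would be O(n), which could explain CPU climbing as the crawl list grows:

```python
from collections import deque

def crawl(seed_urls, fetch_links):
    """Breadth-first crawl that skips any URL it has already fetched."""
    frontier = deque(seed_urls)  # URLs waiting to be crawled
    visited = set()              # URLs already crawled; set lookups are O(1)
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue             # dedup: never recrawl the same page
        visited.add(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return visited
```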

If there were multiple crawlers or other important processes running, would the scheduler just give them CPU time and make the crawlers run slightly slower, or would it cause a problem?
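One thing I've considered is deprioritizing the crawler myself rather than relying on the scheduler to sort it out. Just a sketch of the idea (Unix-only; `os.nice` raises the process's niceness, which lowers its scheduling priority):

```python
import os

# Increase this process's niceness by 19 (the usual maximum), so the
# kernel scheduler gives CPU time to other processes first. The crawler
# can still use idle CPU but yields under contention.
os.nice(19)
```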

~~![](http://sturmkrieg.ru/img/src/1407937333203.png)

EDIT

Memory usage seems to be starting to plateau, and the crawler seems to be slowing down, as if it has fetched almost all of the pages on that site.~~
