Hint: MemoryError suggests that his dicts have filled up his address
space (probably). That is roughly 1-3GB on Linux and 2GB on Windows, at
least for 32-bit versions.
So storing the whole URL list in memory is probably out of the question,
and storing it only in some form of file might be slightly slow, so one
compromise would be to keep only a compact hash of each URL in memory
and the full data on disk.
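A minimal sketch of that compromise (the function and file names here are
made up for illustration, and it assumes an MD5 digest is a good enough
identity for a URL):

import hashlib

seen = set()                              # holds 16-byte digests, not full URLs
url_log = open("visited_urls.txt", "a")   # full URLs are appended to disk

def mark_visited(url):
    """Return True if url is new; record it, keeping only its hash in RAM."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    if digest in seen:
        return False      # seen before (or, very rarely, a hash collision)
    seen.add(digest)
    url_log.write(url + "\n")
    return True

Each visited URL then costs a fixed 16 bytes (plus set overhead) in memory
instead of the whole string, which postpones the MemoryError considerably.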
I don't have a lot of experience, but I would suggest dictionaries
(which use hash values).
A possible scenario would be something similar to Andreas's suggestion:
visited = dict()
url = "http://www.monty.com"
filename = "/spam/holyhandgrenade/three.html"   # avoid shadowing the built-in name 'file'
visited[url] = filename
unvisited = dict()
url = "http://
On Monday, 07.04.2008 at 00:32 -0500, Luke Paireepinart wrote:
> devj wrote:
> > Hi,
> > I am making a web crawler using Python. To avoid duplication of urls, I have to
> > maintain lists of downloaded urls and to-be-downloaded urls, of which the
> > latter grows exponentially, resulting in a MemoryError exception. What are
> > the possible ways to avoid this?
devj wrote:
> Hi,
> I am making a web crawler using Python. To avoid duplication of urls, I have to
> maintain lists of downloaded urls and to-be-downloaded urls, of which the
> latter grows exponentially, resulting in a MemoryError exception. What are
> the possible ways to avoid this?
>
Get more RAM.
Hi,
I am making a web crawler using Python. To avoid duplication of urls, I have to
maintain lists of downloaded urls and to-be-downloaded urls, of which the
latter grows exponentially, resulting in a MemoryError exception. What are
the possible ways to avoid this?