RE: [PHP-DB] Real Killer App!

2003-03-12 Thread Rich Gray
> I'm having a heck of a time trying to write a little web crawler for my intranet. I've got everything functionally working it seems like, but there is a very strange problem I can't nail down. If I put in an entry and start the crawler it goes great through the first loop. It gets the url

Re: [PHP-DB] Real Killer App!

2003-03-12 Thread Nicholas Fitzgerald
Rich Gray wrote: I'm having a heck of a time trying to write a little web crawler for my intranet. I've got everything functionally working it seems like, but there is a very strange problem I can't nail down. If I put in an entry and start the crawler it goes great through the first loop. It ge

Re: [PHP-DB] Real Killer App!

2003-03-12 Thread Jim Hunter
from a web page. Jim Hunter ---Original Message--- From: Nicholas Fitzgerald Date: Wednesday, March 12, 2003 10:15:52 AM To: [EMAIL PROTECTED] Subject: Re: [PHP-DB] Real Killer App! Rich Gray wrote: >>I'm having a heck of a time trying to write a little web crawler f

Re: [PHP-DB] Real Killer App!

2003-03-12 Thread Peter Beckman
Could it be that a certain web server sees you connect, you request a file, and that file happens to take forever to load, leaving your script hanging until memory runs out or something else? Do you have timeouts set properly to stop the HTTP GET/POST if nothing is happening on that connection? P
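The timeout idea above can be sketched as follows. This is a minimal illustration assuming a current PHP version and a fetch done through `file_get_contents()`; the helper name `fetch_with_timeout` is hypothetical and is not taken from the crawler discussed in the thread.

```php
<?php
// Hypothetical helper: fetch a URL, giving up if the remote side stalls.
function fetch_with_timeout(string $url, float $seconds = 10.0): ?string
{
    // 'timeout' is the http wrapper's read timeout in seconds; without it
    // the default_socket_timeout ini setting applies.
    $context = stream_context_create([
        'http' => ['timeout' => $seconds],
    ]);

    // Suppress the warning so an unreachable or stalled page comes back as null.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

A crawler loop could then skip any URL for which the helper returns null instead of hanging on it.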

Re: [PHP-DB] Real Killer App!

2003-03-12 Thread Nicholas Fitzgerald
do you have an index on the table that you are using to store the URLs that still need to be parsed? This table is going to get huge! And if you do not delete the URL that you just parsed from the list it will grow even faster. And if you do not have an index on that table and you are doing a table sca
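The two points above (look URLs up by key rather than by scanning, and remove a URL from the to-do list once it has been parsed) can be sketched in memory. This is only an analogy and an assumption on my part: the class `UrlFrontier` is hypothetical, and the PHP array key plays the role that an index on the URL column would play in the database table.

```php
<?php
// Hypothetical in-memory stand-in for the "URLs still to be parsed" table.
final class UrlFrontier
{
    /** @var array<string, true> URLs waiting to be crawled, keyed by URL */
    private array $pending = [];

    /** @var array<string, true> URLs that were already handed out */
    private array $done = [];

    // Queue a URL unless it is already pending or done (keyed lookup, no scan).
    public function add(string $url): bool
    {
        if (isset($this->pending[$url]) || isset($this->done[$url])) {
            return false;
        }
        $this->pending[$url] = true;
        return true;
    }

    // Hand out the oldest pending URL and remove it from the pending list.
    public function next(): ?string
    {
        $url = array_key_first($this->pending);
        if ($url === null) {
            return null;
        }
        unset($this->pending[$url]);
        $this->done[$url] = true;
        return (string) $url;
    }

    public function pendingCount(): int
    {
        return count($this->pending);
    }
}
```

Because a finished URL moves to the done set, adding it again is rejected, which also keeps the crawler from revisiting the same page.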

Re: [PHP-DB] Real Killer App!

2003-03-13 Thread Paul Burney
on 3/12/03 5:45 PM, Nicholas Fitzgerald at [EMAIL PROTECTED] appended the following bits to my mbox:
> is that entire prog as it now exists. Notice I have NOT configured it as yet to go into the next level. I did this on purpose so I wouldn't have to kill it in the middle of operation and potenci

Re: [PHP-DB] Real Killer App!

2003-03-13 Thread Nicholas Fitzgerald
Ok, here's something else I've just noticed about this problem. I noticed that when this thing gets to a certain point, somewhere in the 4th run through, it hits a certain url, then the next, at which point it seems to pause for several seconds, then it goes back and hits that first certain ur

Re: [PHP-DB] Real Killer App!

2003-03-13 Thread Brent Baisley
I missed the beginning of this whole thing so excuse me if this has been covered. Have you looked at how much time elapses before it dies? If the same amount of time elapses before it dies, then that's your problem. I don't know what you have your maximum script execution run time set to, but yo

Re: [PHP-DB] Real Killer App!

2003-03-13 Thread Nicholas Fitzgerald
Actually, that was the first thing I thought of. So I set set_time_limit(0); right at the top of the script. Is it possible there is another setting for this in php.ini that I also need to deal with? I have other scripts running and I don't want to have an unlimited execution time on all of the
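On the question of scope: `set_time_limit()` applies only to the script that calls it, while `max_execution_time` in php.ini is the default for every script. A minimal sketch, assuming a current PHP version; the wrapper name `allow_long_run` is hypothetical.

```php
<?php
// Hypothetical helper: lift the execution time limit for this script only.
function allow_long_run(): bool
{
    // set_time_limit(0) overrides max_execution_time for the current run.
    // The php.ini value, and therefore every other script, is left alone.
    return set_time_limit(0);
}
```

After the call, `ini_get('max_execution_time')` inside the same script reports the overridden value, which is a quick way to confirm the setting took effect.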

Re: [PHP-DB] Real Killer App!

2003-03-13 Thread Nicholas Fitzgerald
As you guys know I've been going around in circles with this spider app problem for a couple days. I think I finally found where the screwup is, and I'm sure you'll be interested to hear about it. I had been testing with three sites because they were fairly diverse and gave me a lot of differen

Re: [PHP-DB] Real Killer App!

2003-03-14 Thread W. D.
At 01:58 3/14/2003, Nicholas Fitzgerald wrote:
>As you guys know I've been going around in circles with this spider app problem for a couple days.
>
>How would you do it?
http://www.hotscripts.com/PHP/Scripts_and_Programs/Search_Engines/more3.html
Start Here to Find It Fast!© -> http://www.U

Re: [PHP-DB] Real Killer App!

2003-03-14 Thread Nicholas Fitzgerald
I've already looked at all of these, well most of them anyway. The only ones I haven't looked at are the ones that just do real time searches. Nothing of what I've seen is as functional as what I've designed, and for the most part built. Which is why I built it. This spider issue is the only

Re: [PHP-DB] Real Killer App!

2003-03-14 Thread Brent Baisley
Have you tried adding a flush() statement in at certain points? Perhaps put one in after every page is processed. Technically this is designed for browsers, but perhaps it will help here. It will most likely slow things down, but if it works, then you can adjust it from there (like every 10 pag
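The "flush every so many pages" suggestion could look roughly like this. It is a sketch under assumptions: the function name `crawl_with_flush` and the `$processPage` callback are hypothetical stand-ins for whatever the crawler does per page.

```php
<?php
// Hypothetical loop: process each URL and flush output every $flushEvery pages.
function crawl_with_flush(array $urls, callable $processPage, int $flushEvery = 10): int
{
    $flushEvery = max(1, $flushEvery); // guard against zero or negative intervals
    $flushes = 0;

    foreach (array_values($urls) as $i => $url) {
        $processPage($url);

        // Push buffered output out after every $flushEvery-th page.
        if (($i + 1) % $flushEvery === 0) {
            flush();
            $flushes++;
        }
    }

    return $flushes; // number of times output was flushed
}
```

Starting with an interval of 1 and raising it once the crawl is stable follows the adjust-as-you-go approach described above.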

Re: [PHP-DB] Real Killer App!

2003-03-14 Thread Nicholas Fitzgerald
I did try a flush table on the table, even down to for every read or write with no luck. Haven't tried the clearstatcache() though, sounds like an idea whose time has come. Nick Brent Baisley wrote: Have you tried adding a flush() statement in at certain points? Perhaps put one in after every

Re: [PHP-DB] Real Killer App!

2003-03-15 Thread Nicholas Fitzgerald
Well, I've gotten a long way on this, and here's the results up to now: On Red Hat 8.0 Apache 2.0.40, php 4.22, mysql 3.23.54a PII 366 30gig hdd, 256meg mem: Everything works flawlessly. Have spidered several HUGE sites. Goes fast, goes accurate. On Windows 2000 sp3, apache 2.0.44, php 4.3.1, m

Re: [PHP-DB] Real Killer App!

2003-03-15 Thread VolVE
- Original Message - From: "Nicholas Fitzgerald" <[EMAIL PROTECTED]> To: "PHP Database List" <[EMAIL PROTECTED]> Sent: Saturday, March 15, 2003 19:55 Subject: Re: [PHP-DB] Real Killer App! > Well, I've gotten a long way on this, and here's the results

Re: [PHP-DB] Real Killer App!

2003-03-16 Thread Paul Burney
on 3/15/03 7:55 PM, Nicholas Fitzgerald at [EMAIL PROTECTED] appended the following bits to my mbox:
> Spider still dies, but now it's finally given me an error: "FATAL: erealloc(): unable to allocate 11 bytes". This is interesting, as I'm not using erealloc() anywhere in the script. When I we
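erealloc() is one of PHP's internal memory allocation routines, not a userland function, which is why it shows up in the error without appearing in the script. One way to see whether memory grows as the spider runs is to log usage after each page. A minimal sketch, assuming a current PHP version where `memory_get_usage()` is always available; the helper name `memory_report` is hypothetical.

```php
<?php
// Hypothetical helper: one log line describing the script's memory use.
function memory_report(string $label): string
{
    return sprintf(
        '%s: %d bytes in use, peak %d, memory_limit=%s',
        $label,
        memory_get_usage(),       // bytes currently allocated to this script
        memory_get_peak_usage(),  // highest allocation seen so far
        ini_get('memory_limit')   // per-script cap, separate from the machine's RAM
    );
}
```

Writing this line to a log after every fetched page shows whether usage climbs steadily or jumps on one particular URL.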

Re: [PHP-DB] Real Killer App!

2003-03-16 Thread Nicholas Fitzgerald
That sounds about right I think. I'm using 4.3.1 so I really am beginning to think it's a bug of some kind. I'm definitely not running into memory problems, this server has 1.5 gig and isn't coming anywhere close to using all of it even when everything on the box is jumping. It's most likely no