Re: Speeding up get URL

2008-08-05 Thread Dave Cragg
On 4 Aug 2008, at 13:00, Alex Tweedly wrote: You should be able to achieve that using 'load URL' - set off a number of 'load's going and then by checking the URLstatus you can process them as they have finished arriving to your machine; and as the number of outstanding requested URLs decr
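
Alex's 'load URL' approach might be sketched like this; the handler and callback names are illustrative, not from the original message:

```livecode
on startDownloads pURLList
   -- start every download in the background; none of these calls block
   repeat for each line tURL in pURLList
      load URL tURL with message "urlDone"
   end repeat
end startDownloads

on urlDone pURL, pStatus
   -- LiveCode sends this message as each load finishes
   if pStatus is "cached" then
      put URL pURL into tPage  -- served from the cache, no new request
      -- parse tPage here
   end if
   unload URL pURL  -- free the cache entry
end urlDone
```

Instead of a callback, the URLStatus function can also be polled in a loop; it reports states such as "loading bytes,total", "cached", or "error ..." for each outstanding request.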

Re: Speeding up get URL

2008-08-05 Thread Alex Tweedly
Sarah Reichelt wrote: On Mon, Aug 4, 2008 at 12:35 AM, Shari <[EMAIL PROTECTED]> wrote: Goal: Get a long list of website URLS, parse a bunch of data from each page, if successful delete the URL from the list, if not put the URL on a different list. I've got it working but it's slow. It tak

Re: Speeding up get URL

2008-08-05 Thread Brian Yennie
Shari, I'm not sure there is much you can do to speed up the fetching of URLs, but my two suggestions would be: 1) See if you can process more than one download at a time - this will be more complex to code, but may be a bit faster so that 1 slow download doesn't affect another. Of course

Re: Speeding up get URL

2008-08-04 Thread Shari
Cable modem, yes. CGI, I don't know a word of the language. So even your hosting ISP can get involved? Lordy, I had no idea there were so many pitfalls. I can understand their issues however, knowing how much spam I get from people who are probably using similar searches for bad things. I

Re: Speeding up get URL

2008-08-04 Thread Shari
Search engines have APIs? I did not know that. I will definitely look into this. I didn't realize I had so many different options to choose from. Options are good, very good indeed :-) Thank you! Shari I believe most of the major search engines have APIs for returning search results as

Re: Speeding up get URL

2008-08-04 Thread Mark Smith
Very good point about doing it from a remote server - if the speed difference were great, then an hourly-paid Amazon EC2 server might be just the job... Mark On 4 Aug 2008, at 13:13, Alex Tweedly wrote: If so, you might get a big improvement by converting the script into a CGI script, an

Re: Speeding up get URL

2008-08-04 Thread Bernard Devlin
I think you've had a lot of good suggestions for solving this problem. However, depending on the kind of data you're trying to parse out (and the frequency with which that data changes), you might be better to let Google or Yahoo do the search (using the kind of advanced search like: "some meaning

Re: Speeding up get URL

2008-08-04 Thread Rick Harrison
On Aug 4, 2008, at 4:17 AM, Shari wrote: One service provider that I extract data from does not want more than one hit every 50 seconds in order to be of service to hundreds of simultaneous users, so they protect themselves from "denial of service attacks" that overload their machines.

Re: Speeding up get URL

2008-08-04 Thread Alex Tweedly
Sorry if this message comes through twice - first attempt might have failed, so I'm resending from a different account. Sarah Reichelt wrote: On Mon, Aug 4, 2008 at 12:35 AM, Shari <[EMAIL PROTECTED]> wrote: Goal: Get a long list of website URLS, parse a bunch of data from each page, if s

Re: Speeding up get URL

2008-08-04 Thread Shari
One service provider that I extract data from does not want more than one hit every 50 seconds in order to be of service to hundreds of simultaneous users, so they protect themselves from "denial of service attacks" that overload their machines. I did notice that even with their affiliate XML fi
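
For a provider with a one-hit-every-50-seconds rule, a non-blocking rate limiter could be sketched with LiveCode's "send ... in" timer. The 50-second figure is taken from the message above; the handler and variable names are hypothetical:

```livecode
local sURLQueue

on startPoliteFetch pURLList
   put pURLList into sURLQueue
   fetchNext
end startPoliteFetch

on fetchNext
   if sURLQueue is empty then exit fetchNext
   put line 1 of sURLQueue into tURL
   delete line 1 of sURLQueue
   put URL tURL into tPage  -- one request only
   -- parse tPage here ...
   -- schedule the next hit 50 seconds out, keeping the UI responsive
   send "fetchNext" to me in 50 seconds
end fetchNext
```

Because the next fetch is scheduled rather than looped, the stack stays responsive between hits and the request rate can never exceed the provider's limit.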

Re: Speeding up get URL

2008-08-04 Thread Shari
Good suggestions, Sarah. Thank you! I've settled on a solution that's going to partly go in the back door (retrieving their XML data via their affiliate door) and partly go in the front door (get or load url). So I'll parse what I can from their affiliate XML files and do the rest the other

Re: Speeding up get URL

2008-08-03 Thread Jim Ault
The major limitation for your case is that each request sent to a web server is dependent on the response time from that web server. Some servers intentionally return with a delay to control bandwidth demands and load balancing, especially if some of their hosted customers are downloading videos or

Re: Speeding up get URL

2008-08-03 Thread Sarah Reichelt
On Mon, Aug 4, 2008 at 12:35 AM, Shari <[EMAIL PROTECTED]> wrote: Goal: Get a long list of website URLS, parse a bunch of data from each page, if successful delete the URL from the list, if not put the URL on a different list. I've got it working but it's slow. It takes about an hour per

Re: Speeding up get URL

2008-08-03 Thread Shari
I'd do that in a heartbeat if they had a way. They used to, but at this time the only offering they have is for affiliates, and it has severe limitations. I just got done checking it out and it isn't designed for what I need. I might be able to "fudge" it and I will give fudging a try. But

Re: Speeding up get URL

2008-08-03 Thread Shari
Noel, I've done a bit of research and I don't think they have such issues. Several folks are doing similar things very publicly (the website is aware of it) and it doesn't seem to be a problem. Usually if something is disallowed you'll find it referenced very clearly in their user forums.

Re: Speeding up get URL

2008-08-03 Thread Jim Ault
Noel is correct. Even Google will ban IP addresses of those machines that will execute too many searches in a short time. One answer is to use proxy servers, but that is a more complex process. One suggestion is to send an email to the support group for the "one domain" and ask if there is a bet

Re: Speeding up get URL

2008-08-03 Thread Noel
Yes, something like what you are describing could easily be confused with a DOS attack. DOS attacks are done by flooding a server with requests for webpages to the point that the server crashes due to its inability to process all the requests. Even if you are not considered a DOS attack, the

Re: Speeding up get URL

2008-08-03 Thread Shari
It's always one domain, the same domain, and I have no control over the domain or its hosting company. The domain itself probably has millions of pages. Anybody can sell products thru them, and they make it very easy to do so. So there are probably thousands (or more) folks with massive quan

Re: Speeding up get URL

2008-08-03 Thread Shari
I wonder if using "load" URL might be faster? sims I haven't tried it. The docs made it seem like the wrong choice as the url must be fully loaded for the handler to continue. I check this by looking for in the fetched url. According to the docs "load" downloads the url in the background

Re: Speeding up get URL

2008-08-03 Thread Jim Sims
On Aug 3, 2008, at 4:35 PM, Shari wrote: Goal: Get a long list of website URLS, parse a bunch of data from each page, if successful delete the URL from the list, if not put the URL on a different list. I've got it working but it's slow. It takes about an hour per 10,000 urls. I sell ts

Speeding up get URL

2008-08-03 Thread Shari
Goal: Get a long list of website URLs, parse a bunch of data from each page, if successful delete the URL from the list, if not put the URL on a different list. I've got it working but it's slow. It takes about an hour per 10,000 URLs. I sell tshirts. Am using this to create informational
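
The workflow described above (fetch each page, parse it, keep failures on a separate list) might look roughly like this simple blocking version; the variable names are invented for illustration:

```livecode
on processURLs pURLList
   local tFailedList
   repeat for each line tURL in pURLList
      put URL tURL into tPage  -- blocks until this page has downloaded
      if the result is empty then
         -- success: parse tPage here; the URL drops off the work list
      else
         put tURL & return after tFailedList  -- keep for a retry pass
      end if
   end repeat
   return tFailedList
end processURLs
```

Since each "put URL" waits for the server before the next one starts, total time is dominated by per-request latency, which is what the parallel 'load URL' suggestions elsewhere in the thread address.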