Redirection cycle detected using wget 1.8.2

2002-06-24 Thread David Woodyard

I got the message 'Redirection cycle detected' when I tried to download a 
file. The download aborted. I have looked for a solution and have not found 
one. Any help will be greatly appreciated.
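If it helps to narrow this down, the redirect chain can be inspected without 
saving anything, roughly like this (example.com/file.zip only stands in for 
the real URL I am fetching):

    # -S prints the server's response headers, --spider checks the URL
    # without downloading it; the URL below is a placeholder.
    wget -S --spider http://example.com/file.zip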

Please 'CC' me on reply as I am not currently subscribed.

Thanks again,

David



escaping characters in the saved files

2002-06-24 Thread Martin Tsachev

Hi,

This is not a bug report but rather a feature request. I sometimes want to 
copy downloaded sites to my FAT partition, but there is a problem with 
dynamic sites that use query strings, since '?' is an illegal filename 
character on the FAT filesystem. I see that wget already escapes some 
characters as it saves files, so I wondered whether it could escape this one 
too, or at least offer it as an option.
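A rough sketch of the kind of post-download rename that would do the trick 
(the mirror/ directory and the '@' replacement character are only examples):

    # Rename every saved file or directory whose name contains '?' so that
    # '?' becomes '@' before copying to FAT; 'mirror/' and '@' are
    # placeholder choices. -depth renames children before their parents.
    find mirror/ -depth -name '*\?*' | while read -r f; do
        d=$(dirname "$f")
        b=$(basename "$f" | tr '?' '@')
        mv "$f" "$d/$b"
    done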

I am using wget 1.8.1 compiled from source if this matters.

Thanks in advance.

-- 

Martin Tsachev
Web developer

http://members.evolt.org/shaggy/





Re: Honestly, wget as a webcrawler?

2002-06-24 Thread Jason Davis

Hi Hack,

> - can you install software on the server e.g. rsync

You mean on the server that will do the spidering? Yes, everything, but it
runs a Windows OS.

> - does the server offer the same files via a service better suited for
> mirroring than HTTP

I don't really understand that. I use this computer to do the web crawling,
spidering and fetching for me, and I'm interested in using the mirrored and
fetched content in an external application that parses that information into
other storage devices.

> - do you access different webservers (wget only uses one connection)

What do you mean? The list of URLs I need to spider is about 30,000 records
long at the moment. When you say one connection, do you mean that wget
downloads one file at a time? I used to play around with offline
explorers/browsers that use up to 100 simultaneous HTTP "threads"; where
does wget stand in comparison?
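One workaround that comes to mind, sketched here with made-up file names, is 
to split the URL list and run several wget processes side by side:

    # Split a long URL list into chunks and run one wget per chunk in the
    # background; urls.txt and the chunk.* names are placeholders.
    split -l 5000 urls.txt chunk.
    for f in chunk.*; do
        wget -x -i "$f" -o "$f.log" &
    done
    wait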

> - are the servers load balanced

Again, I don't understand :( The content I need to fetch and mirror is not
mine; it lives on other servers. I want to find a stable solution for
fetching information from the list of URLs I've composed onto one machine,
which will eventually be able to parse the info.

> Hack Kampbjørn

Thanks!



Re: Honestly, wget as a webcrawler?

2002-06-24 Thread Hack Kampbjørn

> Jason Davis wrote:
> 
> I'm trying to find the most efficient solution for mirroring,
> spidering and/or crawling (however I should put this)
> of hundreds of thousands of websites; a solution that can handle
> literally millions of files.
> I've read that wget gets delayed on incremental mirroring of huge sites
> and I wonder if that's true.
> If so, can fwget (http://bay4.de/FWget/) be a solution? Or is there a
> totally different place I should look?

As the page says, FWget is Wget with hashtables. Wget itself has used
hashtables since version 1.7.

Kalium: do you have anything to add to this? If not, would you mind adding
a note about wget now using hashtables internally?

> 
> I appreciate your help and would love to hear any tip!

Some things to think about are:
- can you install software on the server, e.g. rsync (see the sketch after
this list)
- does the server offer the same files via a service better suited for
mirroring than HTTP
- do you access different webservers (wget only uses one connection)
- are the servers load balanced
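To make the rsync point concrete, here is a rough sketch (the host, module 
and paths are invented for illustration) of pulling a site over rsync 
instead of HTTP:

    # Hypothetical rsync pull; mirror.example.org and the module/path names
    # are placeholders. -a preserves timestamps and permissions, -z
    # compresses on the wire, --delete removes local files the server no
    # longer has.
    rsync -az --delete rsync://mirror.example.org/site/ /local/mirror/site/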

> 
> please keep me CC:d on the replies as I wasn't able to subscribe
> myself..
> 
> Thanks!
> 
> 

-- 
Med venlig hilsen / Kind regards

Hack Kampbjørn



wget and <meta name="robots" content="noindex">

2002-06-24 Thread Cédric Rosa

Hello,

Is it normal that wget saves web pages which contain 
<meta name="robots" content="noindex">?
Or does wget consider that it is not a search engine and respect only the 
"follow/nofollow" rules?
Or is it a bug? :)

Thanks.

Cedric.




Re: Honestly, wget as a webcrawler?

2002-06-24 Thread Cédric Rosa

Hello Jason,

I'm working on a search engine project and I'm thinking about using wget 
as the web crawler.
I've just tested fwget and I'm not sure it is better than wget, because it 
has not been updated for a year ...

I'm interested in any ideas on this subject.

Cedric.

At 10:55 24/06/2002 +0200, Jason Davis wrote:
>I'm trying to find the most efficient solution for mirroring, spidering 
>and/or crawling (however I should put this)
>of hundreds of thousands of websites; a solution that can handle 
>literally millions of files.
>I've read that wget gets delayed on incremental mirroring of huge sites and 
>I wonder if that's true.
>If so, can fwget (http://bay4.de/FWget/) be a 
>solution? Or is there a totally different place I should look?
>
>I appreciate your help and would love to hear any tip!
>
>Please keep me CC:d on the replies as I wasn't able to subscribe myself.
>
>Thanks!
>
>




Honestly, wget as a webcrawler?

2002-06-24 Thread Jason Davis



I'm trying to find the most efficient solution for mirroring, spidering 
and/or crawling (however I should put this) of hundreds of thousands of 
websites; a solution that can handle literally millions of files.
I've read that wget gets delayed on incremental mirroring of huge sites and 
I wonder if that's true.
If so, can fwget (http://bay4.de/FWget/) be a solution? Or is there a 
totally different place I should look? I appreciate your help and would 
love to hear any tip!

Please keep me CC:d on the replies as I wasn't able to subscribe myself.

Thanks!
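P.S. For concreteness, the sort of plain wget run I mean, with urls.txt 
standing in for the real list, would be roughly:

    # Sketch of a plain mirroring run over a URL list; urls.txt is a
    # placeholder. -m turns on recursion, infinite depth and timestamping,
    # so later runs only re-fetch files that have changed.
    wget -m -i urls.txt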