wget problems, asp, vars
I'm trying to fetch the following page:
http://overlord.hig.se/schema/schemav8.asp?KI=3DV20BKT=245+KN=Algoritmer+och+datastrukturer+B

I've tried really hard but failed. Could someone more experienced help me with this, please? The site is an ASP site that fetches information from a database when you feed it those variables. Actually, the last variable isn't needed.
Re: wget and asp
Try the source I sent you.

Dominique wrote:

thank you Max,

np

Is it different than the one I CVS-ed yesterday? I mean, does it have changes in creating filenames? Please note that I finally compiled it and could run it.

No changes... I did run autoconf, so you can go straight to configure (as your autoconf version is too new).

CVS = a lot of breaks during checking out

What breaks?

It was freezing during checkout, something I had never seen before with CVS. I had to Ctrl-C and restart a few times.

Weird... I had no such problems, but I was only running a 'cvs upd', having checked it out before.

Something has just occurred to me: by default, wget restricts recursion to 5 levels. Perhaps that is the problem? If so, -l0 will fix it.

If not, could you have another go at describing it?

Max.
Re: wget and asp
No changes... I did run autoconf, so you could go straight to configure (as you have too new an autoconf version).

It compiled just fine now.

Something has just occurred to me: by default, wget restricts recursion to 5 levels. Perhaps that is the problem? If so, -l0 will fix it.

Well, I don't think so. I have all the files on my disk, but there is confusion with names (filenames vs. HTML content vs. Mozilla's interpretation). The directory I am downloading is 2 levels deep at most anyway.

If not, could you have another go at describing it?

1) Some filenames are not consistent (although there are no slashes anymore) between the names on disk, the names in the HTML code, and the names Mozilla shows or wants.
2) The parent script that dynamically generates the example files was not found (neither by wget nor by hand); I mean the ASP file used to generate the ?filename stuff. This has to do with the site's structure, I guess, but I may be wrong.

I think that rather than explaining the whole big story again, it would be easier if someone more experienced with wget than me tried to get the /html structure of the site I am practicing on, or at least those 3 files I described lately. If you want to test wget against it, of course; otherwise I will give up here.

thank you
Dominique
Re: wget and asp
The problem is that with a ?x=y, where y contains slashes, wget passes them unchanged to the OS, causing directories to be created, but fails to adjust relative links to account for the fact that the page is in a deeper directory than it should be. The solution is to map / to _ or something.

Max.

That was my naive thinking as well... yes, the filename breaks at /

dominique
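The mapping suggested above could be sketched in Python roughly as follows. This is only an illustration, not wget's code: the helper name flat_local_name and the example.com URL are made up, and it assumes the local name is built from the last path segment plus the query string.

```python
from urllib.parse import urlsplit


def flat_local_name(url: str, replacement: str = "_") -> str:
    """Hypothetical helper: derive a flat local file name from a URL.

    Slashes inside the query string are replaced so that no extra
    directories are created on disk for them.
    """
    parts = urlsplit(url)
    # Last path segment, or a default name when the path ends in "/".
    name = parts.path.rsplit("/", 1)[-1] or "index.html"
    if parts.query:
        name += "?" + parts.query.replace("/", replacement)
    return name


print(flat_local_name("http://example.com/dir/page.asp?x=a/b/c"))
# page.asp?x=a_b_c
```

Because the saved file stays in the same directory as its siblings, relative links inside the page would keep resolving as before.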
Re: wget and asp
Max Bowsher wrote:

The problem is that with a ?x=y, where y contains slashes, wget passes them unchanged to the OS, causing directories to be created, but fails to adjust relative links to account for the fact that the page is in a deeper directory than it should be. The solution is to map / to _ or something.

Thomas Lussnig wrote:

a) The OS does not automatically create directories; wget has to do this.

Looking back at my last email, I think "Did I really say that?!". That part is, of course, nonsense. However, the directories are created regardless.

b) The idea of creating directories even for parameters is not wrong! This is only an example (not working), but you can see why directories were chosen instead of _:
http://download.com/get.php?filename=/windows/putty.exe
http://download.com/get.php?filename=/linux/putty.tgz

I don't understand your example, but regardless, wget 1.9-beta from CVS URL-encodes slashes in a query string.

Dominique: if I understand your problem correctly, then wget 1.9 has solved it.

Max.
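For readers unsure what URL-encoding the slashes in a query string looks like, here is a minimal Python sketch. It uses the standard urllib.parse module and does not claim to reproduce the exact escaping rules of wget 1.9; the helper name encode_query is made up for the example.

```python
from urllib.parse import quote, unquote


def encode_query(query: str) -> str:
    """Percent-encode a query string so it is usable as part of a file name.

    "=" and "&" are kept readable; "/" becomes %2F, so the saved name
    no longer contains a path separator.
    """
    return quote(query, safe="=&")


encoded = encode_query("filename=/windows/putty.exe")
print(encoded)           # filename=%2Fwindows%2Fputty.exe
print(unquote(encoded))  # filename=/windows/putty.exe
```

The encoding is reversible, so the original query can still be recovered from the name on disk.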
Re: wget and asp
To invoke HTML examples they use calls like (just the first example):
http://www.w3schools.com/html/tryit.asp?filename=tryhtml_basic

What filename did you expect for this?
- tryit.asp
- tryit.asp?filename=tryhtml_basic
- tryhtml_basic

Wget saves a file and a directory with this very name, but it gets stuck at this one:
http://www.w3schools.com/html/tryit_edit.asp?filename=tryhtml_basicreferer=http://www.w3schools.com/html/html_examples.asp

- tryit_edit.asp
- tryit_edit.asp?filename=tryhtml_basic
- tryit_edit.asp?filename=tryhtml_basicreferer=http:__www.w3schools.com_html_html_examples.asp

The problem is general: how should wget know which parameters are relevant enough to put in the filename and which are not? As you can see, something that looks simple at first glance is not so easy for a computer. And the hardest problem arises when you land on a page whose owner does not want you to download anything.

Hard to get the pictures: http://www.flirtface.de/
Extremely hard to get anything: http://suche.org/chater-treff/

On the last page I think your chance of getting the content with wget is near 0%.

Cu
Thomas
Re: wget and asp
What filename did you expect for this?
- tryit.asp
- tryit.asp?filename=tryhtml_basic
- tryhtml_basic

Once again, the location is:
http://www.w3schools.com/html/tryit.asp?filename=tryhtml_basic

It is a frameset, which requires frames. One of them is a problem because it has special characters. I was expecting tryit.asp?filename=tryhtml_basic for the frameset, and that file is downloaded and is correct. But this is only the frameset. The two frames are tryit_view.asp?filename=tryhtml_basic (which is also downloaded, and also correct) and
tryit_edit.asp?filename=tryhtml_basicreferer=http://www.w3schools.com/html/html_examples.asp
and just this third file is truncated.

I think some regexp, pattern, or explicit list of characters at which not to break the string would solve the problem. Or maybe it is already possible, but I don't know how?

thanks a lot!
dominique
Re: wget and asp
Dominique wrote:

tryit_edit.asp?filename=tryhtml_basicreferer=http://www.w3schools.com/html/html_examples.asp
and just this one is truncated. I think some regexp or pattern or explicit list of where_not_to_break_a_string characters would solve the problem. Or maybe it is already possible, but I don't know how?

I think that some URL encoding has not happened somewhere. Whether wget or the web server is at fault, I don't know, but the solution would be to URL-encode the slashes.

Max.
Re: wget and asp
Is it something I can do myself, or does the code have to be changed?

Domi

I think that some URL encoding has not happened somewhere. Whether wget or the web server is at fault, I don't know, but the solution would be to URL-encode the slashes.

Max.
Re: wget and asp
tryit_edit.asp?filename=tryhtml_basicreferer=http://www.w3schools.com/html/html_examples.asp
and just this one is truncated. I think some regexp or pattern or explicit list of where_not_to_break_a_string characters would solve the problem. Or maybe it is already possible, but I don't know how?

I think that some URL encoding has not happened somewhere. Whether wget or the web server is at fault, I don't know, but the solution would be to URL-encode the slashes.

Why should there be URL encoding? / is a legal character in a URL and in the GET string. It is used, for example, for Path2Query translation. The main problem is that wget needs to translate a URL into a filesystem name. Filesystem names are path and file names, and I think wget does it right.

Example: http://my.domain/dyn_page.sql/content_id/1891/session/0815

Server:
File: /dyn_page.sql
Query string: /content_id/1891/session/0815

Client:
0. dyn_page.sql/content_id/1891/session/0815 (current, I think)
1. dyn_page.sql_content_id_1891_session_0815
2. 0815

Only the author of the web page could tell you what a good translation from URL to filesystem is when there is a query string on the page; otherwise ALL solutions have their bad sides! The only solution, if you want to make more people happy, is to add several translation styles and an option for selecting one.

This is the same problem as with JavaScript URLs. But here a solution for the URL naming could, in theory, be defined:
- set up a web server
- a wildcard domain *.mirror
- an error handler page in the document root
- a table, filled by wget, that maps each original URL to a filename
- the error handler can strip off the .mirror and then return the proper content

Pro: works for all URL styles.
Con: new concept, more work, does not meet the original intention.
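The three client-side translations listed above can be computed side by side with a short Python sketch. The function name candidate_names and the dictionary keys are hypothetical labels chosen for this illustration; nothing here is taken from wget's source.

```python
from urllib.parse import urlsplit


def candidate_names(url: str) -> dict:
    """Return three possible local names for a URL with a path-style query."""
    path = urlsplit(url).path.lstrip("/")
    return {
        # 0. keep the slashes: creates nested directories on disk
        "nested": path,
        # 1. flatten: replace every slash with an underscore
        "flattened": path.replace("/", "_"),
        # 2. keep only the last segment
        "last_segment": path.rsplit("/", 1)[-1],
    }


url = "http://my.domain/dyn_page.sql/content_id/1891/session/0815"
for style, name in candidate_names(url).items():
    print(style, "->", name)
# nested -> dyn_page.sql/content_id/1891/session/0815
# flattened -> dyn_page.sql_content_id_1891_session_0815
# last_segment -> 0815
```

A selectable translation style, as proposed in the message, would amount to picking one of these keys through an option.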
Re: wget and asp
Thomas Lussnig wrote:

Why should there be URL encoding? / is a legal character in a URL and in the GET string. It is used, for example, for Path2Query translation. The main problem is that wget needs to translate a URL into a filesystem name.

Yes, you are right, I wasn't thinking clearly.

Filesystem names are path and file names, and I think wget does it right.

Example: http://my.domain/dyn_page.sql/content_id/1891/session/0815

Server:
File: /dyn_page.sql
Query string: /content_id/1891/session/0815

Client:
0. dyn_page.sql/content_id/1891/session/0815 (current, I think)
1. dyn_page.sql_content_id_1891_session_0815
2. 0815

Only the author of the web page could tell you what a good translation from URL to filesystem is when there is a query string on the page; otherwise ALL solutions have their bad sides!

??

The problem is that with a ?x=y, where y contains slashes, wget passes them unchanged to the OS, causing directories to be created, but fails to adjust relative links to account for the fact that the page is in a deeper directory than it should be. The solution is to map / to _ or something.

Max.
Re: wget and asp
You don't give a whole lot of information. It's kind of impossible to help when we don't know what the problem is. Posting the URL of the problem site would be a good idea.

Max.

Dominique wrote:

Is it possible at all?

dominique

Dominique wrote:

Hi, I have a problem trying to wget a site for off-line usage. The site is made in ASP and uses lots of stuff like xxx.asp?filename=yyy. When I download a sample ASP page, it seems to be almost empty and has no links! I don't understand... I tried mirroring, HTML extensions, and everything relevant I could find in the man page. Would someone please help me solve this?

thank you
dominique
Re: wget and asp
Posting the URL of the problem site would be a good idea.

Well, I have quite a few. Let's start with this:
http://www.w3schools.com/html/default.asp
or just anything from such a page. I hacked around for a while with no apparent success.

thanks
dominique

--
Dominik Szczerba, Dr.
CO-ME, D-ITET, ETZ F85
ETH Zentrum, Gloriastr. 35
CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi
--
ii swear i never use vi^[:wqZZ
Re: wget and asp
Dominique wrote:

Posting the URL of the problem site would be a good idea.

well, I have quite a few. let's start with this:
http://www.w3schools.com/html/default.asp
or just anything from such a page. I hacked around for a while with no apparent success.

Try this, it works!

wget -U "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)" http://www.w3schools.com/html/default.asp

The problem is that this site blocks wget.

Cu
Thomas Lußnig
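The same idea, sending a browser-like User-Agent instead of the client's default one, can be sketched with Python's standard library. This is an illustration of the technique only, under the assumption that the site filters on that header; build_request is a made-up helper name, and the request is constructed but not sent here.

```python
from urllib.request import Request

# User-Agent string taken from the wget command in the message above.
USER_AGENT = "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Build a request that identifies itself with the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.w3schools.com/html/default.asp")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would actually perform the download with that header.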
Re: wget and asp
Yes! It works!! I had just missed the -U option.

thanks a lot!
dominique

Thomas Lussnig wrote:

Try this, it works!

wget -U "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)" http://www.w3schools.com/html/default.asp

The problem is that this site blocks wget.

Cu
Thomas Lußnig
wget and asp
Hi, I have a problem trying to wget a site for off-line usage. The site is made in ASP and uses lots of stuff like xxx.asp?filename=yyy. When I download a sample ASP page, it seems to be almost empty and has no links! I don't understand... I tried mirroring, HTML extensions, and everything relevant I could find in the man page. Would someone please help me solve this?

thank you
dominique