Re: Reading web pages

2012-01-22 Thread Bystroushaak
Fixed. The bug was caused by an HTTP/1.0 reply ("HTTP/1.0 200 OK"). On 21.1.2012 13:14, Xan xan wrote: It works with the PNG but not with the PDF: ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png [a lot of output] $ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf [Length: [Exception:
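The thread does not show the actual patch. As a hedged sketch of this class of bug (the `HTT` in the ConvException presumably means the version token was read where a number was expected), the D code below gets the status code by splitting the status line on spaces instead of slicing at fixed offsets, so both `HTTP/1.0` and `HTTP/1.1` replies parse. `parseStatusCode` is a hypothetical helper, not part of DHTTPClient.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.conv : to;
import std.stdio : writeln;
import std.string : strip;

/// Hypothetical helper (not DHTTPClient's API): returns the numeric status
/// code of a status line such as "HTTP/1.0 200 OK" or "HTTP/1.1 404 Not Found".
/// Splitting on spaces means the version token, whatever its length,
/// never ends up in the number.
uint parseStatusCode(string statusLine)
{
    auto line = statusLine.strip;
    if (!line.startsWith("HTTP/"))
        throw new Exception("not an HTTP status line: " ~ line);

    // "HTTP/1.0" | " " | "200 OK"
    auto afterVersion = line.findSplit(" ");
    if (afterVersion[1].length == 0)
        throw new Exception("missing status code: " ~ line);

    // "200" | " " | "OK"  (the reason phrase may be absent)
    auto codeAndReason = afterVersion[2].findSplit(" ");
    return codeAndReason[0].to!uint; // throws ConvException on non-digits
}

void main()
{
    writeln(parseStatusCode("HTTP/1.0 200 OK"));        // 200
    writeln(parseStatusCode("HTTP/1.1 404 Not Found")); // 404
}
```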

Re: Reading web pages

2012-01-21 Thread Xan xan
The full code is: //D 2.0 //gdmd-4.6 file dhttpclient = outputs a file with the same name plus .o //Uses https://github.com/Bystroushaak/DHTTPClient //version 0.0.3 import std.stdio, std.string, std.conv, std.stream; import std.socket, std.socketstream; import dhttpclient; int main(string [] args)

Re: Reading web pages

2012-01-21 Thread Xan xan
It works with the PNG but not with the PDF: ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png [a lot of output] $ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf [Length: [Exception: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to

Re: Reading web pages

2012-01-21 Thread Bystroushaak
That is really strange; for me, it works with both files. Are you sure that you can download that PDF file manually? Maybe your provider is blocking your connection, or something like that. What compiler did you use? On 21.1.2012 13:14, Xan xan wrote: It works with the PNG but not with the PDF:

Re: Reading web pages

2012-01-21 Thread xancorreu
On 21/01/12 14:28, Bystroushaak wrote: That is really strange; for me, it works with both files. Are you sure that you can download that PDF file manually? Maybe your provider is blocking your connection, or something like that. I don't think so. It's an arXiv PDF. What type of

Re: Reading web pages

2012-01-20 Thread Xan xan
Thanks for that. The standard library should include something like it; please, it would make high-level tasks easier. On the other hand, how do I specify the protocol? http://foo is not the same as ftp://foo. Thanks, Xan. 2012/1/20 Bystroushaak <bystrou...@kitakitsune.org>: You can always use my module:
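DHTTPClient only handles HTTP, so a caller that accepts arbitrary URLs has to look at the scheme first. A minimal sketch, assuming only the `scheme://host[:port][/path]` form (no user info, no IPv6 literals); `UrlParts` and `splitUrl` are hypothetical names, and the defaults are the well-known ports 80, 443 and 21.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;
import std.stdio : writeln;
import std.uni : toLower;

/// Hypothetical type: the URL pieces needed to open a connection.
struct UrlParts
{
    string scheme; // "http", "ftp", ...
    string host;   // the name DNS resolves, e.g. "www.google.com"
    ushort port;   // explicit port or the scheme's well-known default
    string path;   // always starts with "/"
}

/// Hypothetical helper: splits "scheme://host[:port][/path]".
UrlParts splitUrl(string url)
{
    auto s = url.findSplit("://");
    if (s[1].length == 0)
        throw new Exception("URL has no scheme: " ~ url);

    UrlParts p;
    p.scheme = s[0].toLower;

    // Everything up to the first '/' is host[:port]; the rest is the path.
    auto authorityAndPath = s[2].findSplit("/");
    p.path = "/" ~ authorityAndPath[2];

    auto hostPort = authorityAndPath[0].findSplit(":");
    p.host = hostPort[0];
    if (hostPort[1].length)
        p.port = hostPort[2].to!ushort;
    else if (p.scheme == "http")
        p.port = 80;
    else if (p.scheme == "https")
        p.port = 443;
    else if (p.scheme == "ftp")
        p.port = 21;
    else
        throw new Exception("no default port for scheme: " ~ p.scheme);
    return p;
}

void main()
{
    auto p = splitUrl("http://static.arxiv.org/pdf/1109.4897.pdf");
    // prints: http static.arxiv.org 80 /pdf/1109.4897.pdf
    writeln(p.scheme, " ", p.host, " ", p.port, " ", p.path);
}
```

The returned `scheme` lets the caller refuse (or hand off) anything that is not `http`, and `host` is exactly the name that goes into the DNS lookup.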

Re: Reading web pages

2012-01-20 Thread Xan xan
I get errors: xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 spider.d spider.o: In function `_Dmain': spider.d:(.text+0x4d): undefined reference to `_D11dhttpclient10HTTPClient7__ClassZ' spider.d:(.text+0x5a): undefined reference to

Re: Reading web pages

2012-01-20 Thread Bystroushaak
With dmd 2.057 on my Linux machine: bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org [Content: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML> ... On 20.1.2012 15:37, Xan

Re: Reading web pages

2012-01-20 Thread Xan xan
Yes. I did not know that I had to compile both .d files, although it makes sense ;-) Perfect. On the other hand, I see that dhttpclient identifies itself as Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13. How can I change that? 2012/1/20 Bystroushaak

Re: Reading web pages

2012-01-20 Thread Bystroushaak
This module is very simple and only supports the HTTP protocol, but there is a way to add HTTPS: public void setTcpSocketCreator(TcpSocket function(string domain, ushort port) fn) You can pass in a lambda function that returns an SSL socket; it will be called for every connection. FTP is not supported -

Re: Reading web pages

2012-01-20 Thread Bystroushaak
The first version was buggy. I've updated the code on GitHub, so if you want to try it, pull the new version (git pull). I've also added a new example, examples/user_agent_change.d. On 20.1.2012 16:08, Bystroushaak wrote: There are two ways: Change the global variable for the module:

Re: Reading web pages

2012-01-20 Thread Xan xan
Thank you very much, Bystroushaak. I see you limit httpclient to XML/HTML documents. Is it possible to download any kind of file (not only HTML or XML)? Something like: HTTPClient navegador = new HTTPClient(); auto file = navegador.download("http://www.google.com/myfile.pdf"); Thanks a lot,

Re: Reading web pages

2012-01-20 Thread Bystroushaak
It is not limited; you just have to cast the output to ubyte[]: std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png")); On 20.1.2012 17:53, Xan xan wrote: Thank you very much, Bystroushaak. I see you limit httpclient to XML/HTML documents. Is there
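A hedged sketch of the same idea with a fabricated body instead of a real download: the bytes go through `std.file.write`, which takes `const void[]` and does no UTF-8 checking, so a PNG or PDF body is stored unchanged. `saveBody` is a hypothetical helper; the four bytes are the start of the standard PNG signature and stand in for what a client's `get()` would return.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical helper: stores a response body byte-for-byte.
/// Treating it as ubyte[] avoids any text (UTF-8) interpretation.
void saveBody(string fileName, const(char)[] responseBody)
{
    write(fileName, cast(const(ubyte)[]) responseBody);
}

void main()
{
    // First four bytes of the PNG signature; 0x89 alone is not valid UTF-8.
    immutable ubyte[] pngMagic = [0x89, 0x50, 0x4E, 0x47];
    auto fakeBody = cast(string) pngMagic; // stand-in for a downloaded body

    auto path = buildPath(tempDir, "logo3w-demo.png");
    saveBody(path, fakeBody);
    writeln(cast(ubyte[]) read(path)); // [137, 80, 78, 71]
    remove(path);
}
```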

Re: Reading web pages

2012-01-20 Thread Bystroushaak
If you want to know what type of file you just downloaded, look at .getResponseHeaders(): std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png")); writeln(cl.getResponseHeaders()["Content-Type"]); In this case it prints: image/png Here is the full
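`getResponseHeaders()["Content-Type"]` returns the raw header value, and for text responses that value often carries parameters such as `; charset=UTF-8`. A hedged sketch for normalising it: the helper names and the header data are hypothetical, and the lookup is case-insensitive because HTTP field names are (whether DHTTPClient normalises key case is not shown in the thread).

```d
import std.algorithm.searching : findSplit;
import std.stdio : writeln;
import std.string : strip;
import std.uni : toLower;

/// Hypothetical helper: media type without parameters,
/// e.g. "text/html; charset=UTF-8" -> "text/html".
string mediaType(string contentType)
{
    return contentType.findSplit(";")[0].strip.toLower;
}

/// Hypothetical helper: case-insensitive header lookup; null if absent.
string headerValue(string[string] headers, string name)
{
    foreach (key, value; headers)
        if (key.toLower == name.toLower)
            return value;
    return null;
}

void main()
{
    // Made-up headers standing in for a client's response headers.
    string[string] headers = ["content-type": "text/html; charset=UTF-8"];
    writeln(mediaType(headerValue(headers, "Content-Type"))); // text/html
}
```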

Re: Reading web pages

2012-01-20 Thread Xan xan
Thanks, but why does that fail? I downloaded it as a collection of bytes. It shouldn't matter whether the file is a PDF, a PNG or anything else if I download it as bytes, should it? Thanks, 2012/1/20 Bystroushaak <bystrou...@kitakitsune.org>: If you want to know what type of file you just downloaded, look at

Re: Reading web pages

2012-01-20 Thread Bystroushaak
That's because you are trying to writeln binary data, and that is impossible because writeln, IMHO, checks UTF-8 validity. On 20.1.2012 18:08, Xan xan wrote: Both before and now, I get this error: $ ./spider http://static.arxiv.org/pdf/1109.4897.pdf [Exception:

Re: Reading web pages

2012-01-20 Thread Bystroushaak
Use rawWrite(): stdout.rawWrite(cast(ubyte[]) navegador.get(a)); On 20.1.2012 18:18, Xan xan wrote: Mmmm... I understand. But is there any way to circumvent it? Perhaps I could write it to a file instead? 2012/1/20 Bystroushaak <bystrou...@kitakitsune.org>: That's because you are trying to writeln
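A sketch of choosing between the two at run time, assuming the body arrives as a string as in the thread. Rather than relying on how `writeln` reacts to invalid UTF-8, it checks explicitly with `std.utf.validate`, which throws `UTFException` on malformed input; `isUtf8Text` and `printBody` are hypothetical names.

```d
import std.stdio : stdout, writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `data` is well-formed UTF-8 text.
bool isUtf8Text(const(char)[] data)
{
    try
    {
        validate(data); // throws UTFException on malformed UTF-8
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper: prints text with writeln and dumps anything else
/// (PNG, PDF, ...) verbatim with rawWrite, which does not decode the bytes.
void printBody(const(char)[] responseBody)
{
    if (isUtf8Text(responseBody))
        writeln(responseBody);
    else
        stdout.rawWrite(cast(const(ubyte)[]) responseBody);
}

void main()
{
    printBody("<html>hello</html>");
    immutable ubyte[] pngMagic = [0x89, 0x50, 0x4E, 0x47];
    printBody(cast(string) pngMagic); // raw bytes, no exception
}
```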

Re: Reading web pages

2012-01-20 Thread Xan xan
Thank you very much. I should buy you a beer ;-) On the other hand, I get this error: [Exception: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint] if I only want the length: //D 2.0 //gdmd-4.6 file dhttpclient = outputs

Re: Reading web pages

2012-01-20 Thread Xan xan
The same error with: [...] foreach (a; args[1..$]) { write("[Length: "); stdout.rawWrite(cast(ubyte[]) navegador.get(a)); writeln("]"); } [...] 2012/1/20 Bystroushaak <bystrou...@kitakitsune.org>: rawWrite(): stdout.rawWrite(cast(ubyte[])

Re: Reading web pages

2012-01-20 Thread Bystroushaak
On 20.1.2012 18:42, Xan xan wrote: Thank you very much. I should buy you a beer ;-) Write to me if you are ever in Prague/Czech Republic :) On the other hand, I get this error: [Exception: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string

Reading web pages

2012-01-19 Thread Xan xan
Hi, I want to write a simple script in D 2.0 that gets a URL as a string. I have this code: //D 2.0 //gdmd-4.6 import std.stdio, std.string, std.conv, std.stream; import std.socket, std.socketstream; int main(string [] args) { if (args.length < 2) { writeln("Usage:");

Re: Reading web pages

2012-01-19 Thread Timon Gehr
On 01/19/2012 04:30 PM, Xan xan wrote: Hi, I want to write a simple script in D 2.0 that gets a URL as a string. I have this code: //D 2.0 //gdmd-4.6 import std.stdio, std.string, std.conv, std.stream; import std.socket, std.socketstream; int main(string [] args) { if (args.length < 2) {

Re: Reading web pages

2012-01-19 Thread Kapps
The host is www.google.com; http is only the web protocol. The DNS lookup is independent of HTTP, so the host name should not include the http:// prefix. Note that you're also missing a space after the GET. Also, regarding the example given, some servers won't like it if you don't send the Host header, some won't like the
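The points above (a bare host name, a space after `GET`, a `Host` header) are easiest to see in the request string itself. A minimal sketch, not the thread's code: `buildGetRequest` and its default `User-Agent` value are hypothetical. HTTP/1.0 is used so the server will not answer with chunked transfer encoding, which servers may only send to HTTP/1.1 clients.

```d
import std.stdio : write;

/// Hypothetical helper (not DHTTPClient's API): builds a minimal GET request.
string buildGetRequest(string host, string path,
                       string userAgent = "spider-sketch/0.1") // made-up default
{
    return "GET " ~ path ~ " HTTP/1.0\r\n"      // space after GET, then the path
         ~ "Host: " ~ host ~ "\r\n"             // bare host name, no "http://"
         ~ "User-Agent: " ~ userAgent ~ "\r\n"  // how the client identifies itself
         ~ "\r\n";                              // empty line ends the headers
}

void main()
{
    // "www.google.com" is also the name to resolve via DNS.
    write(buildGetRequest("www.google.com", "/"));
}
```

Sending it would use the `std.socket` module the original script already imports, for example by connecting a `TcpSocket` to `new InternetAddress("www.google.com", 80)` and passing this string to `send`. The `User-Agent` header is also what carries the client identification that Xan later asked how to change.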

Re: Reading web pages

2012-01-19 Thread Bystroushaak
You can always use my module: https://github.com/Bystroushaak/DHTTPClient On 19.1.2012 20:24, Timon Gehr wrote: On 01/19/2012 04:30 PM, Xan xan wrote: Hi, I want to write a simple script in D 2.0 that gets a URL as a string. I have this code: //D 2.0 //gdmd-4.6 import std.stdio, std.string,