Re: Best practise to add SOCKS proxy support?

2009-09-02 Thread Peter Paul
On Tue, 1 Sep 2009 17:54:42 +0200 Oleg Kalnichevski wrote: > This is the expected behaviour. You can override it, though, by using > this workaround: > > http://svn.apache.org/repos/asf/httpcomponents/oac.hc3x/trunk/src/contrib/org/apache/commons/httpclient/contrib/ssl/HostConfigurationWithStick

Re: HttpClient 3.1 to 4.0 migration

2009-09-02 Thread Oleg Kalnichevski
Gerald Turner wrote: Hello HttpClient Users List, I have spent the last couple days upgrading a dozen applications from HttpClient 3.1 to 4.0. First off, I must say that I'm very pleased that MultiThreadedHttpConnectionManager (now ThreadSafeClientConnManager) is using synchronization rather tha

HttpClient 3.1 to 4.0 migration

2009-09-02 Thread Gerald Turner
Hello HttpClient Users List, I have spent the last couple days upgrading a dozen applications from HttpClient 3.1 to 4.0. First off, I must say that I'm very pleased that MultiThreadedHttpConnectionManager (now ThreadSafeClientConnManager) is using synchronization rather than thread interrupts. B

Re: Charset trouble, questionmarks

2009-09-02 Thread Ken Krugler
Hi Magnus, I used curl to grab the file, and the bytes at 0x1845...0x1847 are 0xC3 0xA5, which is valid UTF-8 for the U+00E5 code point (latin small letter a with ring above). I also used Bixo (http://bixo.101tec.com) to crawl the same page, and wound up with the same raw data. Bixo uses H
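
A minimal sketch of the point about those two bytes, using only the JDK: 0xC3 0xA5 decodes to a single character (U+00E5) when read as UTF-8, but to two characters when read as ISO-8859-1. The class and method names are illustrative and do not come from the thread, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode raw bytes as UTF-8.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode the same raw bytes as ISO-8859-1 (every byte becomes one char).
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA5};
        // Print lengths rather than the characters themselves, so the
        // result does not depend on the console encoding.
        System.out.println(decodeUtf8(raw).length());   // prints 1
        System.out.println(decodeLatin1(raw).length()); // prints 2
    }
}
```

If the raw bytes are correct, as reported here, a wrong result most likely comes from the charset chosen when the bytes are turned into a String.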

Re: Charset trouble, questionmarks

2009-09-02 Thread Oleg Kalnichevski
On Wed, Sep 02, 2009 at 11:54:42AM -0400, NBW wrote: > No use of things like InputStreamReaders then I take it. > InputStreamReader is used by one utility method in HttpCore (EntityUtils#toString()). However, it uses an InputStreamReader constructor that explicitly takes the charset name to be
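
As a rough illustration of reading a stream with an explicitly named charset, independent of system properties such as file.encoding, here is a JDK-only sketch. It is not the HttpCore source; the class name and both helper methods are hypothetical. It uses the standard InputStreamReader(InputStream, String) constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ExplicitCharsetReader {
    // Read the whole stream into a String, decoding with the named charset
    // instead of the platform default.
    static String readAll(InputStream in, String charsetName) throws IOException {
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    // Convenience wrapper for in-memory bytes; rethrows I/O errors unchecked.
    static String decode(byte[] raw, String charsetName) {
        try {
            return readAll(new ByteArrayInputStream(raw), charsetName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA5};
        System.out.println(decode(raw, "UTF-8").length());      // prints 1
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 2
    }
}
```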

Re: Charset trouble, questionmarks

2009-09-02 Thread NBW
No use of things like InputStreamReaders then I take it. On Wed, Sep 2, 2009 at 11:41 AM, Oleg Kalnichevski wrote: > On Wed, Sep 02, 2009 at 11:39:35AM -0400, NBW wrote: > > What about passing -Dfile.encoding=utf-8? > > > > HttpClient does not use system properties (per design) > > Oleg > > >

Re: Charset trouble, questionmarks

2009-09-02 Thread Oleg Kalnichevski
On Wed, Sep 02, 2009 at 11:39:35AM -0400, NBW wrote: > What about passing -Dfile.encoding=utf-8? > HttpClient does not use system properties (per design) Oleg > On Wed, Sep 2, 2009 at 10:58 AM, Magnus Olstad Hansen wrote: > > > > > > But when you call httpclient.execute(httpget, responseHandl

Re: Charset trouble, questionmarks

2009-09-02 Thread NBW
What about passing -Dfile.encoding=utf-8? On Wed, Sep 2, 2009 at 10:58 AM, Magnus Olstad Hansen wrote: > > > But when you call httpclient.execute(httpget, responseHandler), the > > BasicResponseHandler will call EntityUtils.toString, and that in turn > > uses ISO-8859-1 as its default charset whe

Re: Charset trouble, questionmarks

2009-09-02 Thread Magnus Olstad Hansen
Thanks for the reply, Ken. > > The basic problem is that determining the character set of a web page > is complex, and not something that HttpClient is designed to handle. > > If you check out (for example) the Nutch source, you'll see that it > has a multi-step process, where it uses the Content-t

Re: Charset trouble, questionmarks

2009-09-02 Thread Florent Blondeau
Hi Magnus, I don't know exactly where you can plug this in, but this project helped me a lot with parsing non-ISO charsets: http://jchardet.sourceforge.net/ Hope that helps. Florent Pingwy 27, rue des arènes 49100 Angers Magnus Olstad Hansen wrote: Hello, I'm using HttpClient 4.0 to download a

Re: Charset trouble, questionmarks

2009-09-02 Thread Ken Krugler
Hi Magnus, On Sep 2, 2009, at 1:22am, Magnus Olstad Hansen wrote: Hello, I'm using HttpClient 4.0 to download a webpage the same way as shown in one of the examples. This is my method to return a webpage as a string: protected static String leechUrl(String url) throws IOException

Re: Charset trouble, questionmarks

2009-09-02 Thread Oleg Kalnichevski
On Wed, Sep 02, 2009 at 10:22:16AM +0200, Magnus Olstad Hansen wrote: > Hello, > > I'm using HttpClient 4.0 to download a webpage the same way as shown in > one of the examples. This is my method to return a webpage as a string: > >protected static String leechUrl(String url) throws IOExc

Charset trouble, questionmarks

2009-09-02 Thread Magnus Olstad Hansen
Hello, I'm using HttpClient 4.0 to download a webpage the same way as shown in one of the examples. This is my method to return a webpage as a string: protected static String leechUrl(String url) throws IOException { HttpClient httpclient = new DefaultHttpClient();