Hi,
I just noticed that neither protocol-http nor protocol-httpclient can fetch
the complete page source of a large HTML file. For instance, I want to get the
full page source of this URL (http://www.taobao.com/), but the code below
doesn't do the job:
1) protocol-httpclient
    try {
        byte[] buffer = new byte[HttpBase.BUFFER_SIZE];
        //byte[] buffer = new byte[contentLength];
        int bufferFilled = 0;
        int totalRead = 0;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
                && totalRead < contentLength) {
            totalRead += bufferFilled;
            out.write(buffer, 0, bufferFilled);
        }
        content = out.toByteArray();
    }
2) protocol-http
    String contentEncoding = getHeader(Response.CONTENT_ENCODING);
    if ("gzip".equals(contentEncoding) ||
            "x-gzip".equals(contentEncoding)) {
        content = http.processGzipEncoded(content, url);
    } else if ("deflate".equals(contentEncoding)) {
        content = http.processDeflateEncoded(content, url);
    } else {
        if (Http.LOG.isTraceEnabled()) {
            Http.LOG.trace("fetched " + content.length + " bytes from " + url);
        }
    }
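To show what I suspect is happening in 1), here is a minimal standalone
sketch (not Nutch code). The class ReadLoopSketch, its method names, and the
buffer size are made up for illustration. It assumes contentLength ends up
smaller than the actual body, for example because it is capped by some
configured limit; in that case the loop stops early and the rest of the page
is dropped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical standalone class for illustration; not part of Nutch.
class ReadLoopSketch {
    // Assumed stand-in for HttpBase.BUFFER_SIZE.
    static final int BUFFER_SIZE = 8 * 1024;

    // Same loop shape as in 1): stops once totalRead reaches the limit,
    // even if the stream still has data.
    static byte[] readWithLimit(InputStream in, int limit) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int bufferFilled;
        int totalRead = 0;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
                && totalRead < limit) {
            totalRead += bufferFilled;
            out.write(buffer, 0, bufferFilled);
        }
        return out.toByteArray();
    }

    // Reads until end of stream, with no limit applied.
    static byte[] readAll(InputStream in) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int bufferFilled;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, bufferFilled);
        }
        return out.toByteArray();
    }
}
```

Called on a 100000-byte ByteArrayInputStream with a limit of 65536,
readWithLimit returns only 65536 bytes, while readAll returns all 100000.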
How can I fetch the complete page source in this case?
Thanks