If you're crawling web pages, you need to limit the amount of data any
single page can return.
Otherwise you'll eventually run into a site that returns an unbounded
amount of data, which will kill your JVM.
See SimpleHttpFetcher in Bixo for an example of one way to do this type of
limiting.
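A minimal sketch of that kind of cap, assuming you already have the
response's HttpEntity in hand; readCapped and maxBytes are illustrative
names, not Bixo's API:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import org.apache.http.HttpEntity;

    // Read at most maxBytes from the entity, then stop; anything past
    // the cap is simply never consumed.
    static byte[] readCapped(HttpEntity entity, int maxBytes) throws IOException {
        InputStream in = entity.getContent();
        try {
            byte[] buf = new byte[maxBytes];
            int total = 0;
            while (total < maxBytes) {
                int n = in.read(buf, total, maxBytes - total);
                if (n < 0) break;   // stream ended under the cap
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.close();
        }
    }

In practice you would also abort the underlying request once the cap is
hit, so the client doesn't try to drain the remaining (possibly unbounded)
body before reusing the connection.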
jmap result:
Debugger attached successfully.
Server compiler detected.
JVM version is 22.1-b02
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 2147483648 (2048.0MB)
   NewSize          =
I am using HttpClient 4.3 to crawl web pages.
I start 200 threads and a PoolingHttpClientConnectionManager with a
max total of 1000 connections and a per-host max of 5.
I give Java 2 GB of memory, and one thread throws an exception (the
others keep running; this thread is dead):
Exception in thread "Thread-156" java.lang.OutOfMemoryError
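For reference, a sketch of the pool setup described above (HttpClient 4.3
APIs; the limits mirror the numbers in the question):

    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    cm.setMaxTotal(1000);          // total connections across all routes
    cm.setDefaultMaxPerRoute(5);   // per-host cap

    CloseableHttpClient client = HttpClients.custom()
            .setConnectionManager(cm)
            .build();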
If a server supports NTLM and Kerberos authentication, but I only provide
Basic credentials when setting up the client, I get a log entry for each
of the NTLM and NEGOTIATE authentication schemes.
Taking the example from:
https://hc.apache.org/httpcomponents-client-4.3.x/httpclient/examples/org/apach
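For context, a sketch of the kind of setup meant by "only Basic
credentials" (host, port, and credentials are placeholders; HttpClient 4.3
APIs):

    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.client.CredentialsProvider;
    import org.apache.http.impl.client.BasicCredentialsProvider;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    CredentialsProvider credsProvider = new BasicCredentialsProvider();
    credsProvider.setCredentials(
            new AuthScope("example.com", 80),                    // placeholder host/port
            new UsernamePasswordCredentials("user", "secret"));  // Basic-style credentials only

    CloseableHttpClient client = HttpClients.custom()
            .setDefaultCredentialsProvider(credsProvider)
            .build();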
Hi,
I have a custom Entity stream class as below:

public class GcsCustomInputStreamEntity extends InputStreamEntity {

    private InputStream sourceStream = null;

    public GcsCustomInputStreamEntity(InputStream sourceStream) {
        super(sourceStream); // InputStreamEntity has no no-arg constructor
        this.sourceStream = sourceStream;
    }
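A usage sketch, assuming the class compiles as above (the URL and the
source stream are placeholders):

    import java.io.ByteArrayInputStream;
    import org.apache.http.client.methods.HttpPost;

    HttpPost post = new HttpPost("http://example.com/upload"); // placeholder URL
    post.setEntity(new GcsCustomInputStreamEntity(
            new ByteArrayInputStream("payload".getBytes())));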