Re: Concurrent HTTP requests?

Michele Mostarda Tue, 03 Mar 2015 01:53:23 -0800

Hi Luca,

  happy to see new users!


On 2 March 2015 at 22:41, Luca Matteis <[email protected]> wrote:

> Thanks Lewis,
>
> Would you know if the HTTP client in Any23 does any user agent testing
> on a resource, and redirection to obtain the triples it needs?
>

As far as I remember, the user agent sent by the Any23 client has a fixed
default value, which can be changed via a system property.
The client decides how to handle the retrieved content based on the declared
MIME type and on the content itself.
Redirections are handled transparently by the Apache HttpClient used to
perform requests.

>
> I could use the official HTTP client by apache which supports async
> requests to run them concurrently, and then only use Any23 for the
> parsing, however, I want to make sure I'm applying the appropriate
> redirection and headers on resources I encounter.
>

Any23 already uses an HTTP client to handle requests and redirects. However,
if you need to customize behavior that depends on headers, the quickest
option is to use (as you suggested) an external HTTP client such as Apache
HttpClient and then pass the retrieved data to Any23 programmatically.
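
A minimal sketch of that approach, under a few assumptions: it uses the JDK's
built-in java.net.http.HttpClient (Java 11+) instead of Apache HttpClient, only
to stay dependency-free. The class name ConcurrentFetch, the Fetched holder,
the Accept header value and the user agent argument are all hypothetical. The
hand-off to Any23 is left as a comment, since it depends on how you build your
document source and triple handler.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ConcurrentFetch {

    /** Hypothetical holder for one response: URL, declared content type, body. */
    public static final class Fetched {
        public final String url;
        public final String contentType;
        public final String body;

        public Fetched(String url, String contentType, String body) {
            this.url = url;
            this.contentType = contentType;
            this.body = body;
        }
    }

    /** Builds a client that follows redirects, so callers only see the final response. */
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Returns a non-blocking fetcher that sends the given User-Agent (an assumed value). */
    public static Function<String, CompletableFuture<Fetched>> httpFetcher(
            HttpClient client, String userAgent) {
        return url -> {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("User-Agent", userAgent)
                    // Assumed content negotiation preferences; adjust to your sources.
                    .header("Accept", "text/turtle, application/rdf+xml, text/html;q=0.8")
                    .GET()
                    .build();
            return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenApply(resp -> new Fetched(
                            url,
                            resp.headers().firstValue("Content-Type").orElse(""),
                            resp.body()));
        };
    }

    /**
     * Starts all fetches at once and completes when every one has finished.
     * Results keep the order of the input list. If any fetch fails, the
     * returned future completes exceptionally.
     */
    public static CompletableFuture<List<Fetched>> fetchAll(
            List<String> urls, Function<String, CompletableFuture<Fetched>> fetcher) {
        List<CompletableFuture<Fetched>> pending =
                urls.stream().map(fetcher).collect(Collectors.toList());
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
        // Each Fetched.body, together with its url and contentType, can then be
        // handed to Any23 for extraction (not shown here).
    }
}
```

Calling fetchAll(urls, httpFetcher(newClient(), "my-agent/1.0")).join() blocks
only at the end, after all requests have been issued. The fetcher is passed in
as a function so that it can be replaced with a stub when no network is
available.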

>
> Thanks,
> Luca
>

Best
Michele

>
> On Mon, Mar 2, 2015 at 10:27 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
> > Hey Luca,
> >
> > On Mon, Mar 2, 2015 at 1:08 PM, <[email protected]>
> wrote:
> >>
> >>
> >> I'm new to using Any23, and it's already been a great library to use.
> >
> >
> > great
> >
> >>
> >> However I'm stuck with something rather basic. I followed this example
> >> on how to simply GET a URL and return the triples it contains:
> >> http://any23.apache.org/dev-data-extraction.html
> >
> >
> > OK
> >
> >>
> >>
> >> I'd like to run many HTTP requests in a non-blocking fashion,
> >> concurrently. Are there facilities to do this using the HTTP code
> >> contained in Any23?
> >>
> > There is no code in Any23 for this. You may wish to investigate the Any23
> > Basic HTTP crawler plugin however
> > https://github.com/apache/any23/tree/master/plugins/basic-crawler
> > You can define the number of crawlers on the command line
> >
> https://github.com/apache/any23/blob/master/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java#L67
> > As an alternative you could investigate using something like Crawler
> Commons
> > [0] or Apache Nutch [1] for dealing with the HTTP logic
> >
> > [0] https://code.google.com/p/crawler-commons/
> > [1] http://nutch.apache.org
> >
> >
>



-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: [email protected]
