Hi Luca, happy to see new users!
On 2 March 2015 at 22:41, Luca Matteis <[email protected]> wrote:

> Thanks Lewis,
>
> Would you know if the HTTP client in Any23 does any user agent testing
> on a resource, and redirection to obtain the triples it needs?
>

From what I remember, the user agent sent by the Any23 client is fixed, but it
can be changed via a system property. Any23 decides how to handle the retrieved
content based on the declared MIME type and on the content itself. Redirects
are handled transparently by the Apache HttpClient used to perform the requests.

>
> I could use the official HTTP client by apache which supports async
> requests to run them concurrently, and then only use Any23 for the
> parsing, however, I want to make sure I'm applying the appropriate
> redirection and headers on resources I encounter.
>

Any23 already uses an HTTP client to handle requests and redirects. However, if
you need to customize header-driven behavior, the quickest option is (as you
suggested) to use an external HTTP client such as Apache HttpClient and then
process the data with Any23 programmatically.

>
> Thanks,
> Luca
>

Best,
Michele

>
> On Mon, Mar 2, 2015 at 10:27 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
> > Hey Luca,
> >
> > On Mon, Mar 2, 2015 at 1:08 PM, <[email protected]> wrote:
> >>
> >> I'm new to using Any23, and it's already been a great library to use.
> >
> > great
> >
> >> However I'm stuck with something rather basic. I followed this example
> >> on how to simply GET a URL and return the triples it contains:
> >> http://any23.apache.org/dev-data-extraction.html
> >
> > OK
> >
> >> I'd like to run many HTTP requests in a non-blocking fashion,
> >> concurrently. Are there facilities to do this using the HTTP code
> >> contained in Any23?
> >
> > There is no code in Any23 for this. You may wish to investigate the Any23
> > Basic HTTP crawler plugin however
> > https://github.com/apache/any23/tree/master/plugins/basic-crawler
> > You can define the number of crawlers on the command line
> > https://github.com/apache/any23/blob/master/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java#L67
> > As an alternative you could investigate using something like Crawler Commons
> > [0] or Apache Nutch [1] for dealing with the HTTP logic
> >
> > [0] https://code.google.com/p/crawler-commons/
> > [1] http://nutch.apache.org

-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: [email protected]
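
P.S. To make the external-client route a bit more concrete, here is a rough
sketch, not tested against Any23 itself. It assumes Java 11+ and uses the JDK's
built-in java.net.http.HttpClient in place of Apache HttpClient, only to keep
the example dependency-free. The class name, the user agent string and the
Accept header are made-up placeholders. The requests run concurrently, redirects
are followed by the client, and each response exposes the final URI and the
Content-Type that you would then hand to Any23 for parsing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.stream.Collectors;

final class ConcurrentFetch {

    // Placeholder values (assumptions): use whatever your target servers expect.
    static final String USER_AGENT = "my-crawler/0.1";
    static final String ACCEPT =
            "text/html, application/rdf+xml;q=0.9, text/turtle;q=0.9, */*;q=0.1";

    // One shared client. Redirect.NORMAL follows redirects, except HTTPS -> HTTP.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Builds a GET request carrying the custom headers.
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", USER_AGENT)
                .header("Accept", ACCEPT)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Starts all requests without blocking; each future completes independently.
    static List<CompletableFuture<HttpResponse<String>>> fetchAll(List<URI> uris) {
        return uris.stream()
                .map(u -> CLIENT.sendAsync(buildRequest(u),
                        HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // URLs are taken from the command line; they must be absolute http(s) URIs.
        List<URI> uris = Arrays.stream(args).map(URI::create)
                .collect(Collectors.toList());
        List<CompletableFuture<HttpResponse<String>>> pending = fetchAll(uris);

        for (int i = 0; i < pending.size(); i++) {
            try {
                HttpResponse<String> r = pending.get(i).join();
                String contentType =
                        r.headers().firstValue("Content-Type").orElse("unknown");
                // r.uri() is the URI the response came from, i.e. after redirects.
                System.out.println(r.uri() + " " + r.statusCode() + " "
                        + contentType + " " + r.body().length() + " chars");
                // Here you would pass r.body(), r.uri() and contentType to Any23,
                // for instance through an in-memory document source.
            } catch (CompletionException e) {
                System.err.println(uris.get(i) + " failed: " + e.getCause());
            }
        }
    }
}
```

On the Any23 side, an in-memory source such as StringDocumentSource should let
you feed the body, URI and content type to the extractors, but please check the
API of the version you are using.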
