Thanks Lewis,

Would you know whether the HTTP client in Any23 does any user-agent testing on a resource, and follows redirects, to obtain the triples it needs?
I could use the official Apache HTTP client, which supports async requests, to run them concurrently, and then use Any23 only for the parsing. However, I want to make sure I'm applying the appropriate redirection and headers to the resources I encounter.

Thanks,
Luca

On Mon, Mar 2, 2015 at 10:27 PM, Lewis John Mcgibbney <[email protected]> wrote:
> Hey Luca,
>
> On Mon, Mar 2, 2015 at 1:08 PM, <[email protected]> wrote:
>>
>> I'm new to using Any23, and it's already been a great library to use.
>
> great
>
>> However I'm stuck with something rather basic. I followed this example
>> on how to simply GET a URL and return the triples it contains:
>> http://any23.apache.org/dev-data-extraction.html
>
> OK
>
>> I'd like to run many HTTP requests in a non-blocking fashion,
>> concurrently. Are there facilities to do this using the HTTP code
>> contained in Any23?
>
> There is no code in Any23 for this. You may wish to investigate the Any23
> Basic HTTP crawler plugin however
> https://github.com/apache/any23/tree/master/plugins/basic-crawler
> You can define the number of crawlers on the command line
> https://github.com/apache/any23/blob/master/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java#L67
> As an alternative you could investigate using something like Crawler Commons
> [0] or Apache Nutch [1] for dealing with the HTTP logic
>
> [0] https://code.google.com/p/crawler-commons/
> [1] http://nutch.apache.org
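
For illustration, here is a minimal sketch of the "fetch concurrently, parse separately" idea from the top of this message. It is only a sketch under assumptions: it uses the JDK's built-in java.net.http.HttpClient (Java 11+) as a stand-in for the Apache async client, and the class name ConcurrentFetch, the Accept value, and the User-Agent string are made up for the example rather than taken from Any23. It does not show what Any23's own HTTP client sends, and the Any23 parsing step is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ConcurrentFetch {
    // Assumed header values (hypothetical, not Any23's defaults).
    static final String ACCEPT =
            "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.8, */*;q=0.1";
    static final String USER_AGENT = "my-rdf-fetcher/0.1";

    // Client that follows redirects (except HTTPS -> HTTP downgrades).
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // GET request carrying explicit Accept and User-Agent headers.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", ACCEPT)
                .header("User-Agent", USER_AGENT)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Starts every request before waiting on any of them.
    static List<CompletableFuture<HttpResponse<String>>> fetchAll(
            HttpClient client, List<String> urls) {
        return urls.stream()
                .map(u -> client.sendAsync(buildRequest(u),
                        HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        for (CompletableFuture<HttpResponse<String>> f
                : fetchAll(client, List.of(args))) {
            try {
                HttpResponse<String> r = f.join();
                // r.uri() is the final URI, which may differ after redirects.
                System.out.println(r.uri() + " -> " + r.statusCode() + " "
                        + r.headers().firstValue("Content-Type").orElse("?"));
                // r.body() would be handed to the Any23 parsing step here.
            } catch (Exception e) {
                System.err.println("fetch failed: " + e);
            }
        }
    }
}
```

Run it with the URLs to fetch as command-line arguments. Each response body, together with its final URI and Content-Type, is what would then be passed on to Any23 for extraction.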
