Thanks, Lewis,

Do you know whether the HTTP client in Any23 does any user-agent
handling or redirection on a resource in order to obtain the triples
it needs?

I could use the official Apache HTTP client, which supports async
requests, to run them concurrently and use Any23 only for the
parsing. However, I want to make sure I'm applying the appropriate
redirection and headers to the resources I encounter.
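
To make the idea concrete, here is a rough sketch of the fetching side
only. It is an assumption-laden illustration, not anything taken from
Any23: it swaps in the JDK's built-in java.net.http.HttpClient
(Java 11+) in place of the Apache client, and the class name, the
Accept value, and the User-Agent string are all placeholders I made up.
The parsing step is deliberately left out; each response body and its
final URI would be handed to the parser afterwards.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class ConcurrentFetch {

    // Assumed content negotiation preferences; adjust to the formats you want.
    static final String ACCEPT =
            "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.8, */*;q=0.1";

    // Placeholder User-Agent string, not one used by any real tool.
    static final String USER_AGENT = "example-fetcher/0.1";

    // One shared client that follows redirects (except HTTPS to HTTP downgrades).
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // Build a GET request carrying the headers we want on every resource.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", ACCEPT)
                .header("User-Agent", USER_AGENT)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Start all requests without blocking; each future completes independently.
    static List<CompletableFuture<HttpResponse<String>>> fetchAll(
            HttpClient client, List<String> urls) {
        return urls.stream()
                .map(u -> client.sendAsync(buildRequest(u),
                        HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        List<CompletableFuture<HttpResponse<String>>> futures =
                fetchAll(client, List.of(args));
        for (CompletableFuture<HttpResponse<String>> f : futures) {
            f.thenAccept(resp -> {
                // resp.uri() is the final URI after any redirects;
                // resp.body() is what would be passed on for parsing.
                System.out.println(resp.statusCode() + " " + resp.uri());
            }).exceptionally(err -> {
                System.err.println("request failed: " + err);
                return null;
            }).join();
        }
    }
}
```

All requests are already in flight before the loop starts waiting, so
the join() calls only collect results in order; they do not serialize
the fetches.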

Thanks,
Luca

On Mon, Mar 2, 2015 at 10:27 PM, Lewis John Mcgibbney
<[email protected]> wrote:
> Hey Luca,
>
> On Mon, Mar 2, 2015 at 1:08 PM, <[email protected]> wrote:
>>
>>
>> I'm new to using Any23, and it's already been a great library to use.
>
>
> great
>
>>
>> However I'm stuck with something rather basic. I followed this example
>> on how to simply GET a URL and return the triples it contains:
>> http://any23.apache.org/dev-data-extraction.html
>
>
> OK
>
>>
>>
>> I'd like to run many HTTP requests in a non-blocking fashion,
>> concurrently. Are there facilities to do this using the HTTP code
>> contained in Any23?
>>
> There is no code in Any23 for this. You may wish to investigate the Any23
> Basic HTTP crawler plugin however
> https://github.com/apache/any23/tree/master/plugins/basic-crawler
> You can define the number of crawlers on the command line
> https://github.com/apache/any23/blob/master/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java#L67
> As an alternative you could investigate using something like Crawler Commons
> [0] or Apache Nutch [1] for dealing with the HTTP logic
>
> [0] https://code.google.com/p/crawler-commons/
> [1] http://nutch.apache.org
>
>
