Subscribe

2015-09-29 Thread Manali Shah
Hello, I would like to subscribe to the mailing list. Best, Manali

Trying to work Rotating agent id plugin

2015-10-02 Thread Manali Shah
Hello, I am currently trying to crawl the web using the Nutch 1.11 trunk version from https://github.com/apache/nutch. I am trying to use a particular property from nutch-default.xml named http.agent.rotate (default: false): "If true, instead of http.agent.name, alternating agent names are chosen from a list p ...

Can't retrieve Tika parser for mime-type text/aspdotnet

2015-10-04 Thread Manali Shah
Hello, I am trying to crawl a website using Nutch trunk along with the latest Tika. It gives me an error: "Can't retrieve Tika parser for mime-type text/aspdotnet". But when I try to parse the same URL using tika-app-1.10.jar with the command $ java -jar tika-app-1.10.jar -m url, it prints the me ...