I customized the Solr browse GUI for Nutch based on Solr 3.3. Here is the
link to the war file I created as well as instructions:
http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html
Have fun!
On 7/20/11, Markus Jelsma markus.jel...@openindex.io wrote:
There are
As promised, I customized the Solr browse GUI for Nutch based on Solr 3.3.
Here is the link to the war file I created as well as instructions:
http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html
Have fun with Nutch and Solr!
On 7/26/11, Geek Gamer
Hello,
I am modifying htmlparser for my own purposes. After lots of coding
and testing, I pretty much know what to do.
I was wondering whether we could use, let's say, the LingPipe library to
do some named entity recognition at the parse stage. Many libraries, such
as but not limited to LingPipe, have some
Hello,
Looking at my crawler output, I noticed that some pages are not
captured because they do some sort of JS loading on pageLoad().
These are not, per se, let's say, an Ajax request that gets some JSON
and renders it within the DOM with JS; rather, they are XHR calls that
return plain HTML.
Could
Hi Alexander,
I don't want to state the obvious here but this will depend directly on what
type of loading your Nutch implementation deals with...
You are correct in stating that we store data in segments, namely
/crawl_fetch
/content
/crawl_parse
/parse_data
/crawl_generate
/parse_text
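As a rough illustration of the segment layout listed above, here is a minimal Python sketch that reports which of those subdirectories are missing from one segment directory. The function name `missing_parts` and the idea of checking a local filesystem path are assumptions for illustration; a segment stored on HDFS would need a different check.

```python
from pathlib import Path

# Subdirectory names exactly as listed in the message above.
SEGMENT_PARTS = [
    "crawl_fetch",
    "content",
    "crawl_parse",
    "parse_data",
    "crawl_generate",
    "parse_text",
]


def missing_parts(segment_dir):
    """Return the listed subdirectories that are absent from one segment.

    `segment_dir` is a hypothetical local path to a single segment.
    """
    segment = Path(segment_dir)
    return [name for name in SEGMENT_PARTS if not (segment / name).is_dir()]
```

An empty list means every listed part is present; a segment that has been fetched but not yet parsed would be expected to report the parse-related names.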
I
Has this been solved?
If your http.content.limit has not been increased in nutch-site.xml then you
will not be able to store this data and index with Solr.
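One way to check what http.content.limit is currently set to is to read it out of nutch-site.xml. The sketch below is only an illustration: it assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children, the helper name `read_property` is made up, and the sample value is arbitrary rather than a recommendation.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative stand-in for the contents of nutch-site.xml (assumed layout).
SAMPLE = """<configuration>
  <property>
    <name>http.content.limit</name>
    <value>1048576</value>
  </property>
</configuration>"""

print(read_property(SAMPLE, "http.content.limit"))
```

If the function returns None, the property is not overridden in that file and whatever default Nutch ships with applies.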
On Mon, Jul 25, 2011 at 6:18 PM, Chip Calhoun ccalh...@aip.org wrote:
I'm still having trouble. I've set a Windows environment variable,
Hi Markus,
I follow you up to the last part of your comments.
cope with non-edited... edited by whom? and for what purpose? To give a
better relative tf score...
To comment on the first part, and please ignore or correct me if I am wrong,
but do we not give each page and therefore each
To be honest, I am not a Nutch guru. If I were you, I would run solrindex
for each segment one by one for now (or write a script to automate that).
Solr will combine the results for each segment. I tested that before. You
will have to test it yourself because your system is in production. :-)
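The "script to automate that" could look something like the Python sketch below, which builds one solrindex command per segment. It assumes the Nutch 1.3 argument order `solrindex <solr url> <crawldb> <linkdb> <segment>`; please check that against the usage message of your own installation. All paths and the Solr URL in the usage note are placeholders.

```python
import subprocess
from pathlib import Path


def solrindex_commands(solr_url, crawldb, linkdb, segments_dir):
    """Build one `bin/nutch solrindex` command per segment directory.

    The argument order is an assumption based on Nutch 1.3.
    """
    commands = []
    for segment in sorted(Path(segments_dir).iterdir()):
        if segment.is_dir():  # skip stray files next to the segments
            commands.append(
                ["bin/nutch", "solrindex", solr_url, crawldb, linkdb, str(segment)]
            )
    return commands


def run_all(commands):
    """Run the commands one by one, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(solrindex_commands("http://localhost:8983/solr/", "crawl/crawldb", "crawl/linkdb", "crawl/segments"))` would index the segments in name order from the Nutch home directory. Printing the commands first and running them by hand is the safer option on a production system.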
Down the
Hi Cheng Li,
Please experiment with this. We have been gradually getting the
pluginCentral section of the wiki updated as it needed a total face lift, so
would appreciate any additional input you may have for updating the writing
Plugin example which is already there. Apart being completely out
Hi Marseld,
I'm just putting my thoughts out here, but Hadoop is no longer shipped with
Nutch 1.3, so I don't know where you would set this specific
property within your Nutch instances...
How are you running Hadoop?
What version of Nutch?
What mode are you running Nutch in?
On Tue,
You can run Jetty (for example, mvn jetty:run, or java -jar start.jar for
Solr). :-)
On Sun, Jul 24, 2011 at 5:12 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
You need Solr for indexing.
Hi Everyone
I have Nutch-Gora-Hbase configuration and I've crawled some urls.
I want to perform
And Solr generates a Lucene index anyway.
2011/7/19 lewis john mcgibbney lewis.mcgibb...@gmail.com
Hi Kelvin,
I see you are posting on a couple of threads with regards to the Lucene
index generated by Nutch which you correctly point out is not there. It is
not possible to create a Lucene