[ ] +1 - yes, I vote for the proposal
This is awesome
--- On Thu, 4/1/10, Andrzej Bialecki wrote:
From: Andrzej Bialecki
Subject: [VOTE] Nutch to become a top-level project (TLP)
To: nutch-user@lucene.apache.org
Date: Thursday, April 1, 2010, 12:23 PM
Hi all,
According to an earlier [DISCUSS
Long time since I wrote plugin.
You could simply embed different logic in the same plugin, can't you?
sudhi
--- On Tue, 7/28/09, Koch Martina wrote:
From: Koch Martina
Subject: Host specific parsing
To: "nutch-user@lucene.apache.org"
Date: Tuesday, July 28, 2009, 2:24 AM
Hi,
has anyone bu
As a long-time Nutch user and plugin developer who has even implemented Nutch in
some products, I could help you.
I am based in Houston, Texas -- skype me on hooduku
sudhi
--- On Mon, 7/27/09, sf30098 wrote:
From: sf30098
Subject: Support needed
To: nutch-user@lucene.apache.org
Date: Monday, J
Michael,
Yes you can run just the crawler part. Lucene provides the API to index the
crawled data and search the indexed data. So short answer is you can do it.
Thanks
sudhi
--- On Mon, 7/28/08, Michael Chan <[EMAIL PROTECTED]> wrote:
From: Michael Chan <[EMAIL PROTECTED]>
Subject: Running Nutc
Check the http plugins. It should have a way. Just search the list, this
question has been addressed already.
Thanks
Sudhi
Deepa Devanathan <[EMAIL PROTECTED]> wrote: hi guys,
I have a site i need to crawl but the very first page asks for a username,
password.
Is there a way I can supply these t
For http://www.myopensourcejobs.com, we are using something similar to OpensourceXML.
That works like a champ.
I am not sure if the PHP-Java bridge would make any difference in terms of perf.
Thanks
Sudhi
Stefan Neufeind <[EMAIL PROTECTED]> wrote:
Chris Stephens wrote:
> Has anyone had succes
I do not believe we have a nsf-plugin.
You will have to write a plugin to be able to handle nsf. There is a plugin
tutorial on the Nutch wiki. Please refer to it.
Thanks
Sudhi
Deepa Devanathan <[EMAIL PROTECTED]> wrote:
hi guys,
Can Nutch parse thru Lotus notes databases - .nsf files
Hello Nutchians
I am sure many of you would have experienced the same problem as me right now.
I have a domain name http://www.myopensourcejobs.com
I have my app hosted on a server (virtual dedicated server) 68.x.x.x at
GoDaddy.
I want to configure and associate IPaddress and domain n
Please try this command
bin/nutch crawl search -dir /usr/data/crawl -depth 2 &> crawl.log &
where search folder contains the list of files containing URLs.
The crawler will crawl data into /usr/data/crawl/crawldb folder.
crawl.log being the log file.
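A minimal sketch of the whole sequence, in case it helps. The seed file name urls.txt, the example.com URL and the CRAWL_DIR override are my own placeholders, not anything Nutch requires, and the script only starts the crawl if it finds bin/nutch in the current directory:

```shell
#!/usr/bin/env bash
# Sketch only: the seed file name and seed URL are illustrative assumptions.
set -eu

SEED_DIR="search"                          # folder holding the URL list files
CRAWL_DIR="${CRAWL_DIR:-/usr/data/crawl}"  # output folder passed to -dir
LOG_FILE="crawl.log"                       # crawl output is collected here

# Create the seed folder with one URL per line.
mkdir -p "$SEED_DIR"
printf '%s\n' "http://www.example.com/" > "$SEED_DIR/urls.txt"

# Start the crawl in the background only if the Nutch launcher is present.
if [ -x bin/nutch ]; then
  bin/nutch crawl "$SEED_DIR" -dir "$CRAWL_DIR" -depth 2 > "$LOG_FILE" 2>&1 &
  echo "crawl started with PID $!; follow it with: tail -f $LOG_FILE"
else
  echo "bin/nutch not found; run this from the Nutch install directory"
fi
```

The redirection is written as "> file 2>&1" rather than "&>" so that it also works in shells other than bash.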
Hope this helps.
Thanks
S
Oops, ignore my previous mail. Just check search.jsp. There is a parameter
"lang". By default it is set to en. You could change it based on the locale
settings. Accordingly, you could manage the search directories too.
Please refer to search.jsp and OpenSearchServlet. It is pretty straight
for
There are a couple of ways that this could be done, as per the mailing lists.
One is to give the user the choice of selecting the directory; or you could
deploy two war files, each with a different searcher.dir configured to the
corresponding conf folder
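
A rough sketch of the two-war option, under a few assumptions of mine: the webapp names (search-en, search-de) and index paths are hypothetical, and I am assuming each expanded war reads its nutch-site.xml from WEB-INF/classes. Note that it overwrites any nutch-site.xml already there, so merge by hand if you have other properties set:

```shell
#!/usr/bin/env bash
# Sketch only: webapp names and crawl paths below are hypothetical.
set -eu

WEBAPPS_DIR="${WEBAPPS_DIR:-./webapps}"   # where the expanded wars live

# write_site_config <webapp-name> <crawl-dir>
# Writes a nutch-site.xml that points searcher.dir at the given crawl folder.
write_site_config() {
  local conf_dir="$WEBAPPS_DIR/$1/WEB-INF/classes"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# One webapp per index, each with its own searcher.dir.
write_site_config "search-en" "/usr/data/crawl-en"
write_site_config "search-de" "/usr/data/crawl-de"
```

Each webapp then searches only the index its own searcher.dir points to.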
Thanks
Sudhi
nasm <[EMAIL PROTECTED]> wrote:
h
in add same banners ads to rentavilce the server
On 7/18/06, Sudhi Seshachala wrote:
> Thanks.
> I have written Parse plugins which pretty much customize the crawling and parse
> according to the rules defined in the Parse plugin. I have an index and Query
> plugin specific to the domain I opera
to run my
crawlers). I have two machines running legacy fedora core2.
Hope that helps.
Thanks
Sudhi
Nutch Newbie <[EMAIL PROTECTED]> wrote:
Good work!
On 7/17/06, Sudhi Seshachala wrote:
In addition for crawling, I have customized the process of crawling.
>
Just curiou
Hello Nutchians,
Please visit the site http://www.myopensourcejobs.com. The site is built
using LAMP and Nutch.
I use the Nutch crawler to crawl jobs from commercial sites such as Hotjobs,
DICE and CareerBuilder (As of today), specifically for opensource skill sets.
Basically the site filter
p://tonalweb.com
-----Original Message-----
From: Sudhi Seshachala [mailto:[EMAIL PROTECTED]
Sent: Monday, June 26, 2006 11:44 PM
To: nutch-user@lucene.apache.org
Subject: Re: Title: search?
You should be looking if title is indexed. Make sure index-basic plugin is
included. If url is okay, it should be inc
You should be looking if title is indexed. Make sure index-basic plugin is
included. If url is okay, it should be included. But to be doubly sure, I would
go and investigate index-plugin. I am assuming you are using 0.8.
Sudhi Seshachala <[EMAIL PROTECTED]> wrote:
You should be l
You should be looking if title is
Tonal Web Design - Stijn <[EMAIL PROTECTED]> wrote: I have all the plugins
enabled for nutch,
yet I'm not able to do queries like "title:keyword"? I can only do "url:"
and "site:"
what else should I look at to fix this?
I also can't do ? Wild card searches.
Hello folks,
I am working on adopting nutch for a vertical.
I have been able to get it up and running in pretty basic scenarios.
I need some help in getting up to speed in trying to crawl sites which have
some weird encoding on the URLs.
I am kind of lost, how to go about it? If some one can share s
19 matches