I am using HTMLParser to parse all HTML pages and to
extract the required information from them.
Let me describe my crawler.
I have to search all pages of a group website (e.g.
www.group.com, which contains links to
news.groups.com, forum.news.com, etc.). So I will get
all links of the group website using P
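For the link-gathering step, here is a rough breadth-first sketch using only the JDK. It is not HTMLParser's API: a regex stands in for the parser, a Map of URL to HTML stands in for real HTTP fetching, and the names (SiteCrawler, crawl, allowedHosts) are made up for illustration. Relative links are not resolved here, so they are simply skipped.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.*;
import java.util.regex.*;

/** Hypothetical breadth-first link collector (sketch, not HTMLParser's API). */
class SiteCrawler {
    // Matches href="..." or href='...'; a simplification of what a real HTML parser does
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Extracts all href targets from an HTML string. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Host of an absolute URL, or null for relative or unparseable URLs. */
    static String host(String url) {
        try {
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /**
     * Visits every page reachable from startUrl whose host is in allowedHosts.
     * 'pages' maps URL to HTML and stands in for an HTTP fetch (assumption).
     */
    static Set<String> crawl(String startUrl, Set<String> allowedHosts, Map<String, String> pages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String h = host(url);
            // Skip already-seen pages and anything outside the group's hosts
            if (visited.contains(url) || h == null || !allowedHosts.contains(h)) continue;
            visited.add(url);
            String html = pages.get(url);
            if (html == null) continue; // page not available
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) queue.add(link);
            }
        }
        return visited;
    }
}
```

The returned set, in visiting order, is the list of pages to download and index; pages on hosts outside allowedHosts are never fetched.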
"
-Original Message-
From: Ranjan K. Baisak [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 12:06 PM
To: java-user@lucene.apache.org
Subject: Re: Hi Experts
For internet searching, Nutch is the best tool. However,
as you don't want to use Cygwin, you need
to use Lucene in the following way.
You need to download the whole page and create an index
out of that page. Then use Lucene to search the offline
content rather than the online content.
I have used Lucene in this way and I h
The way Lucene works, you need to have the index first.
Only then can you search it.
So if you want to search within a given URL, you need to somehow create
an index of all the web pages within that URL. If the web server hosting
that URL is also yours, then that would not be a big deal.
But i
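To make the "index first, then search" point concrete, here is a toy in-memory inverted index in plain Java. It is not Lucene's API; TinyIndex and its methods are hypothetical names. Downloaded page text goes in through add(), and only afterwards can search() answer queries, using AND semantics over a crude lowercase-and-split tokenizer.

```java
import java.util.*;

/** Hypothetical toy inverted index: documents must be indexed before they can be searched. */
class TinyIndex {
    // term -> sorted set of ids of the documents containing it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Lowercases and splits on anything that is not a letter or digit (simplified analyzer). */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Adds a document (e.g. a downloaded page's text) and returns its id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Returns the ids of documents containing every query term. */
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> ids = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids); // intersect posting lists
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Searching a page that was never added simply finds nothing, which is why the pages behind a URL have to be fetched and indexed up front.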
Hi Experts,
I am a newbie. I am supposed to select a search engine for my project, which
should search from the given URL and display the results.
I must use the search engine on Windows only. I should not use any other
external tool like Cygwin, which is used by Nutch.
Any expert guidance w
Otis Gospodnetic wrote:
This was the old behaviour in Lucene. You may want to check the
ML archives to see why the lock was moved to temp dir by default, I
remember there being some discussion around that. This might have
been 2 years ago?
To answer my own question, I've just discovered
It seems to me that Lucene doesn't use a B-tree for its index storage.
Is there any paper/article that explains the theory behind the data structure of a
single index (segment)? I am not referring to the merge algorithm; I am
curious to know the storage structure of a single optimized Lucene index.
Any po
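As far as I know, the Lucene releases of this period indeed do not use a B-tree. The term dictionary is a file of terms in sorted order, plus a much smaller index holding every Nth term (the interval defaults to 128, if I remember correctly) that is kept in memory. A lookup binary-searches the small index, then scans forward through at most N entries. The sketch below is a toy in-memory model of that idea only, not Lucene's file format; SparseTermDictionary is a hypothetical name.

```java
import java.util.*;

/** Hypothetical toy model of a sorted term dictionary with a sparse index. */
class SparseTermDictionary {
    private final String[] terms;      // all terms, sorted (stands in for the on-disk dictionary)
    private final String[] indexTerms; // every interval-th term (stands in for the in-memory index)
    private final int interval;

    SparseTermDictionary(Collection<String> allTerms, int interval) {
        if (interval <= 0) throw new IllegalArgumentException("interval must be positive");
        this.terms = new TreeSet<>(allTerms).toArray(new String[0]);
        this.interval = interval;
        int n = (terms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) indexTerms[i] = terms[i * interval];
    }

    /** Returns the position of term in the sorted dictionary, or -1 if absent. */
    int lookup(String term) {
        // 1. Binary search the small index for the last index term <= term.
        int pos = Arrays.binarySearch(indexTerms, term);
        int block = pos >= 0 ? pos : -pos - 2; // insertion point minus one
        if (block < 0) return -1;              // term sorts before the first term
        // 2. Scan at most 'interval' entries of the full dictionary from that block.
        int start = block * interval;
        int end = Math.min(start + interval, terms.length);
        for (int i = start; i < end; i++) {
            int cmp = terms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break; // passed the place where term would be
        }
        return -1;
    }
}
```

The appeal of this layout for a write-once segment is that the sorted file can be written sequentially in one pass, and only the sparse index needs to stay in memory.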
This was the old behaviour in Lucene. You may want to check the ML
archives to see why the lock was moved to temp dir by default, I remember there
being some discussion around that. This might have been 2 years ago?
Otis
- Original Message
From: Daniel Noll <[EMAIL PROTECTED]>
Gopikrishnan Subramani wrote:
With Lucene 1.4.3 I used to use a different lock directory from the default
by setting org.apache.lucene.lockdir system property. But as per
http://lucene.apache.org/java/docs/systemproperties.html this is not
supported in Lucene 1.9 and above. I don't see any equiva
Comments inline below.
On Tuesday 28 March 2006 18:29, Ramana Jelda wrote:
>
> Hi,
> I have got a strange problem.
> My searchterm : "mp3 player"
> Lucene Query :
> +(
> +(
> spanNear([productName:mp, productName:3], 3, true)
> spanNear([subName:mp, subName:3], 3, true)
>)
> +(p
On Tuesday 28 March 2006 13:24, Gopikrishnan Subramani wrote:
> With Lucene 1.4.3 I used to use a different lock directory from the
> default by setting org.apache.lucene.lockdir system property. But as per
> http://lucene.apache.org/java/docs/systemproperties.html this is not
> supported in Lucen
I use Ant for many other builds, so I know that's not the problem.
When I run 'ant' from the directory I untarred Lucene into, I get a 'build
failed' error. It says:
Cannot find common-build.xml imported from /root/lucene-1.9.1/build.xml
Where could I find common-build.xml, and build-deprec
Of course they are monitoring this mailing list; Lucene rocks and it is
beating them. Do yourself a favor and dedicate some time to testing
Lucene vs. any commercial application. A little time spent up front
testing the tools can save you significant time later optimizing,
hacking in a new tool, or re
Weird, I was just about to comment on the fact that, since posting that
my organization has decided to use Lucene, I have gotten calls from two
commercial vendors that didn't give me the time of day while I was
doing my comparison analysis.
Both of them referred to some random "colleague" in the busine
I don't have any experience with Red-Piranha.
I'm wondering if the Red-Piranha project is still going strong. The
so-called "community edition" seems to be lacking development according
to the CVS repository:
http://cvs.sourceforge.net/viewcvs.py/red-piranha/CVSROOT/
(The last commits seem
I much prefer the one catch-all field approach, personally. As long
as scoring works out how you'd like using this technique, then it'll
make for simpler (and thus faster) queries. However, reindexing is
necessary to bring another field into the mix, whereas run-time ORing
is more flexibl
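A plain-Java toy of the two options follows. The field names title/body and the CatchAllDemo class are hypothetical, and it does boolean matching only, with no scoring. The catch-all approach copies every field into one extra "all" field at index time, so a query touches a single field; the run-time approach ORs the term across each field. Adding a field to the catch-all means rebuilding it for every document, which is the reindexing cost mentioned above.

```java
import java.util.*;

/** Hypothetical toy comparing a catch-all field with run-time ORing across fields. */
class CatchAllDemo {
    /** Index-time approach: copy every field's text into one extra "all" field. */
    static Map<String, String> withCatchAll(Map<String, String> fields) {
        Map<String, String> doc = new LinkedHashMap<>(fields);
        doc.put("all", String.join(" ", fields.values()));
        return doc;
    }

    /** Whitespace tokenization, case-insensitive term match (simplified analyzer). */
    static boolean containsTerm(String text, String term) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"))
                     .contains(term.toLowerCase(Locale.ROOT));
    }

    /** Matches if any of the listed fields contains the term (an OR across fields). */
    static boolean matchesAny(Map<String, String> doc, List<String> fieldNames, String term) {
        for (String f : fieldNames) {
            String text = doc.get(f);
            if (text != null && containsTerm(text, term)) return true;
        }
        return false;
    }
}
```

With the catch-all, a query is matchesAny(doc, Arrays.asList("all"), term), one field lookup; the run-time version passes every field name and can be changed without touching the index.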
On 3/28/06, Michael Levy <[EMAIL PROTECTED]> wrote:
> I'm looking for advice on selecting a search application. I'm
> responsible for developing a new search platform for use in a historical
> research organization and museum. I've pretty much decided on Lucene as
> the library for custom servlet
Thanks very much for your thoughts, a lot to think about. I'm currently doing
some benchmark tests on typical usage scenarios with Lucene. I'm actually
using Lucene through its integration with the Jackrabbit DMS, so it may not be
easy, or even possible, to use a different search engine anyway. Of course I'd rather
b
Hi Thomas,
Sounds like FUD to me. No concrete numbers, and the benchmark they mention,
eh, haven't we all seen "funny" benchmarks before? Lucene is used in many
large operations (e.g. Technorati, Simpy) that involve a LOT of indexing and
searching, large indices, etc. I suggest you try bot
thomasg wrote:
Hi, we are currently intending to implement a document storage / search tool
using Jackrabbit and Lucene. We have been approached by a commercial search
and indexing organisation called ISYS who are suggesting the following
problems with using Lucene. We do have a requirement to st
Hi,
I have got a strange problem.
My searchterm : "mp3 player"
Lucene Query :
+(
+(
spanNear([productName:mp, productName:3], 3, true)
spanNear([subName:mp, subName:3], 3, true)
)
+(productName:player subName:player)
)
Throws following lucene BooleanScorer2 exception.
Caused by:
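For reference, spanNear([a, b], 3, true) asks for a followed by b, in order, with at most 3 positions between them. Below is a toy check of that semantics over a token list. It is not Lucene's implementation and says nothing about the BooleanScorer2 exception itself; the tokens mp and 3 assume the analyzer splits "mp3" the way the query above suggests, and SpanNearToy is a hypothetical name.

```java
import java.util.*;

/** Hypothetical toy check for an ordered two-term "span near" match. */
class SpanNearToy {
    /**
     * True if some occurrence of 'first' is followed, in order, by 'second'
     * with at most 'slop' intervening positions.
     */
    static boolean orderedNear(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) continue;
            // j - i - 1 is the number of positions between the two terms
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals(second)) return true;
            }
        }
        return false;
    }
}
```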
Hi Otis
Thanks for the information. I'm actually writing something to search files
containing code (such as JSP files) so I do expect there will be a few
problems like this because I guess Lucene's out-of-the-box analyzers are
really suited to natural languages. But, I was wondering if you could
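One common workaround for code is a custom tokenizer that keeps whole identifiers and also splits camelCase. Below is a JDK-only sketch of that tokenization logic; wiring it into a Lucene Analyzer is left out, and CodeTokenizer is a hypothetical name.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical tokenizer for source code such as JSP/Java. */
class CodeTokenizer {
    // Java-style identifiers: letter, '_' or '$' followed by letters, digits, '_' or '$'
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    /** Emits each identifier (lowercased), plus its camelCase parts when there are several. */
    static List<String> tokenize(String code) {
        List<String> out = new ArrayList<>();
        Matcher m = IDENT.matcher(code);
        while (m.find()) {
            String ident = m.group();
            out.add(ident.toLowerCase(Locale.ROOT)); // the whole identifier
            // Split at lowercase/digit -> uppercase boundaries, e.g. getParameter -> get, Parameter
            String[] parts = ident.split("(?<=[a-z0-9])(?=[A-Z])");
            if (parts.length > 1) {
                for (String p : parts) out.add(p.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }
}
```

Emitting both the whole identifier and its parts lets a search for getParameter and a search for parameter both find the line.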
I'm looking for advice on selecting a search application. I'm
responsible for developing a new search platform for use in a historical
research organization and museum. I've pretty much decided on Lucene as
the library for custom servlet apps that would use the Lucene API directly.
At the sa
Hi Gopi,
I don't know whether org.apache.lucene.lockDir is available in 2.0.
You may want to see the release notes for the system property:
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_1_9_1/CHANGES.txt
regards,
Koji
> -Original Message-
> From: Gopikrishnan Subramani [mailto:[EM
Thanks, Koji. I tried org.apache.lucene.lockDir and found that to be
working. But my only concern is whether this is the suggested approach or a
deprecated one. I wanted to know this to make sure my application is Lucene
2.0 compliant.
Thanks again..
Gopi
On 3/28/06, Koji Sekiguchi <[EMAIL PROTECT
Hi Gopi,
The name of the system property has been changed to
org.apache.lucene.lockDir and it should be still available.
Thank you,
Koji
> -Original Message-
> From: Gopikrishnan Subramani [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 28, 2006 8:24 PM
> To: java-user@lucene.apache.o
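As far as I can tell, the 1.9 FSDirectory reads org.apache.lucene.lockDir only once, falling back to java.io.tmpdir, so the property has to be set before Lucene's classes are loaded (for example with -Dorg.apache.lucene.lockDir=/path/to/locks on the java command line). The JDK-only sketch below just mirrors that lookup so the fallback is visible; LockDirConfig is a hypothetical helper, not part of Lucene.

```java
import java.io.File;

/** Hypothetical helper mirroring a lock-directory property lookup with a temp-dir fallback. */
class LockDirConfig {
    // Property name as given in this thread for Lucene 1.9
    static final String LOCK_DIR_PROPERTY = "org.apache.lucene.lockDir";

    /** Returns the configured lock directory, or java.io.tmpdir if the property is unset. */
    static File effectiveLockDir() {
        return new File(System.getProperty(LOCK_DIR_PROPERTY,
                                           System.getProperty("java.io.tmpdir")));
    }
}
```

Setting it programmatically with System.setProperty(LockDirConfig.LOCK_DIR_PROPERTY, dir) works only if it runs before the first Lucene directory is opened.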
Hello,
With Lucene 1.4.3 I used to use a different lock directory from the default
by setting org.apache.lucene.lockdir system property. But as per
http://lucene.apache.org/java/docs/systemproperties.html this is not
supported in Lucene 1.9 and above. I don't see any equivalent API for
lockDir pro
thomasg wrote:
1) By default, Lucene only indexes the first 10,000 words from each
document. When this default is increased, out-of-memory errors can occur. This
implies that documents, or large sections thereof, are loaded into memory.
ISYS has a very small memory footprint which is not affected by
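For context, the 10,000 figure is, as far as I know, Lucene's default per-field term limit (maxFieldLength, settable on IndexWriter); tokens past it are simply not indexed. Whether a whole document sits in memory depends on how the field is supplied (a String versus a Reader), not on the cap itself. Here is a plain-Java toy of such a cap over a lazily produced token stream; FieldLengthCap is a hypothetical name.

```java
import java.util.*;

/** Hypothetical toy of a per-field token cap, similar in spirit to maxFieldLength. */
class FieldLengthCap {
    static final int DEFAULT_MAX_FIELD_LENGTH = 10_000; // figure quoted in this thread

    /** Pulls at most maxTokens tokens; the rest of the stream is never consumed. */
    static List<String> capTokens(Iterator<String> tokens, int maxTokens) {
        List<String> kept = new ArrayList<>();
        while (kept.size() < maxTokens && tokens.hasNext()) {
            kept.add(tokens.next());
        }
        return kept;
    }
}
```

Because the cap pulls tokens one at a time, it works even on an endless stream; only the tokens actually kept are held in memory.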
Hi, we are currently intending to implement a document storage / search tool
using Jackrabbit and Lucene. We have been approached by a commercial search
and indexing organisation called ISYS who are suggesting the following
problems with using Lucene. We do have a requirement to store and search
l
Hi Nick
Have you tried the Lucene Index Accessor contribution?
We have a similar update/search pattern and it works very well.
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049
Patrick
On 28/03/06, Nick Atkins <[EMAIL PROTECTED]> wrote:
> I'm using Lucene runn