I am currently dealing with Lucene indexes of the size of 8 GB. Searching
is fast, but retrieving documents slows down the process of returning results
to the user. Also the index is updated very frequently, about 3 times a
minute and more. This leads to an index that grows very fast in number o
Hi Doron,
Yes, we can do it using MultiSearcher.subSearcher(int n). But with that we
cannot get the hit count for an individual searcher. For this we would have
to apply a loop from (i = 0 to 3000), which we do not want in our case.
We need to show the number of hits from each searcher (without looping over
the hits).
S
Just like your Winamp, Trillian, or other excellent software, the free
version can satisfy most of your needs, while for advanced features, like
scripting, your boss needs to pay.
DBSight is still improving. If we could get some support, like how
Doug is supported by Yahoo, we would be happy
Does it mean DBSight is Free?
2007/4/24, Chris Lu <[EMAIL PROTECTED]>:
For those who may be interested,
DBSight 1.4.0 now has unlimited index size with Free version!
Basically DBSight is more like Solr + a database adapter. You just point
it at any database with one or several SQLs, and you can
Hi Doron,
Thanks for the help. I think you're right. I haven't yet tried this,
and I didn't notice that CachingWrapperFilter cached multiple BitSets by
IndexReader. So this may be simpler than I thought. I'll give it a
whirl and see what happens.
Regards,
Peter
-Original Message-
Hi Peter,
I think this is already taken care of by CachingWrapperFilter - because its
caching is (like filtering) done per IndexReader, and a search by a
MultiSearcher eventually filters against each underlying reader, so those
"sub-" filters are being cached.
So it seems to me that if you ju
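The per-reader caching described above can be sketched in plain Java (a simplified model, not Lucene's actual CachingWrapperFilter source: the IndexReader is stood in for by a plain Object key, and computeBits is a placeholder for the real, expensive filter computation):

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Simplified model of per-reader filter caching: one cached BitSet per
// reader, so each sub-reader of a MultiSearcher gets its own entry.
public class PerReaderCache {
    private final Map<Object, BitSet> cache = new WeakHashMap<Object, BitSet>();

    public synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = computeBits(reader); // computed once per reader
            cache.put(reader, cached);
        }
        return cached;
    }

    // placeholder for the real per-reader filter computation
    protected BitSet computeBits(Object reader) {
        return new BitSet();
    }
}
```

Because entries are keyed per reader in a WeakHashMap, each underlying reader's filter bits are computed once and then reused, and the entry can be garbage-collected once the reader itself is closed and unreferenced.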
I guess there are a few points
- it is impossible to stem with total accuracy using rules alone
- combining a rule-based stemmer with a dictionary could also be error
prone. Unrelated words can have the same stem - consider the past tense of
"see" and the stem of "sawing" (cutting wood)
- Stemming
El mar, 24-04-2007 a las 21:49 +0100, [EMAIL PROTECTED]
escribió:
>>
> >> For example, if I search for "eat", I'd like Lucene to find "eating",
> >> "eaten", "ate", etc.
>
> Hi Andrew,
>
> The example you provide can only partially be performed using a rule-based
> stemmer, such as those used by S
Hi Andrew,
The example you provide can only partially be performed using a rule-based
stemmer, such as those used by Snowball. Most stemmers are capable of
stemming eating, eats, and eaten to eat. However, they will not stem ate to
eat.
While in theory you could construct some form of dictionary
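The limitation described here can be illustrated with a toy stemmer (purely illustrative, not the Snowball algorithm: a few crude suffix rules, plus an exception map playing the role of the dictionary for irregular forms):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of why "ate" needs a dictionary: suffix rules handle
// regular forms, an exception map handles irregular ones. Note the hazard
// mentioned in the thread: such a dictionary could map "sawing" to "saw"
// while "saw" (past tense of "see") wants "see" - unrelated words can
// share a surface form.
public class ToyStemmer {
    private static final Map<String, String> EXCEPTIONS = new HashMap<String, String>();
    static {
        EXCEPTIONS.put("ate", "eat"); // irregular past tense
    }

    public static String stem(String word) {
        String irregular = EXCEPTIONS.get(word);
        if (irregular != null) return irregular;
        // crude rule-based suffix stripping
        if (word.endsWith("ing")) return word.substring(0, word.length() - 3);
        if (word.endsWith("en")) return word.substring(0, word.length() - 2);
        if (word.endsWith("s")) return word.substring(0, word.length() - 1);
        return word;
    }
}
```

With these rules, eating, eats, and eaten all reduce to eat, but ate only reaches eat through the exception map - which is exactly the gap a pure rule-based stemmer leaves.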
Hi Sawan,
If I understand the question correctly, you use MultiSearcher over three
searchers s[0], s[1], s[2], get some 3000 search results, and for result x
(0<=x<3000) need to know if it came from s[0], s[1], or s[2]. If so, take
a look at that MultiSearcher.subSearcher(int n) (n would be the
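What MultiSearcher.subSearcher(int n) does can be sketched in plain Java (a simplified stand-in, assuming three sub-searchers of 1000 documents each; the starts values below are hypothetical):

```java
// Simplified stand-in for MultiSearcher.subSearcher(int n): starts[i]
// is the first global doc number owned by sub-searcher i, with a final
// entry giving the total. A binary search over starts maps a global
// result number back to the searcher it came from.
public class SubSearcherLookup {
    static final int[] starts = {0, 1000, 2000, 3000}; // hypothetical

    // returns the index of the sub-searcher that produced global doc n
    static int subSearcher(int n) {
        int lo = 0, hi = starts.length - 1;
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= n) lo = mid; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        System.out.println(subSearcher(0));    // first searcher
        System.out.println(subSearcher(1500)); // second searcher
        System.out.println(subSearcher(2999)); // third searcher
    }
}
```

So for result x with 0<=x<3000, a single lookup tells you whether it came from s[0], s[1], or s[2], without scanning all the hits.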
Hi, jaf,
This is not new and I learned it from Doug.
Basically you maintain a mapping of "document id" to
values, and collect all the values for each hit in a
hit collector.
Chris Lu
-
Instant Scalable Full-Text Search On Any
Database/Application
site: http://www.dbsight.n
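The technique Chris describes might be sketched like this (a minimal stand-in: docCategory is a hypothetical cache mapping document id to a category value, and collect mimics the HitCollector.collect(doc, score) callback):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the idea: keep a docId -> category mapping outside the
// index, and tally a count per category while hits are collected,
// so one query yields both the hits and the categorized counts.
public class CategoryCounter {
    private final Map<Integer, String> docCategory; // hypothetical cache
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    public CategoryCounter(Map<Integer, String> docCategory) {
        this.docCategory = docCategory;
    }

    // called once per hit, like HitCollector.collect(doc, score)
    public void collect(int doc, float score) {
        String cat = docCategory.get(doc);
        if (cat == null) return;
        Integer n = counts.get(cat);
        counts.put(cat, n == null ? 1 : n + 1);
    }

    public Map<String, Integer> counts() { return counts; }
}
```

The mapping itself would be built once (e.g. when the index is loaded) and reused across queries; only the counters are reset per search.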
Hi Andrew,
ahg <[EMAIL PROTECTED]> wrote on 24/04/2007 12:18:22:
> Hi, all,
>
> I'm looking for a simple, straightforward example of how to use the
> Snowball stemmer to make Lucene search results return all variants of
> the terms searched for.
>
> For example, if I search for "eat", I'd like Lu
Hi Chris,
Can you explain how? I know the source is available but perhaps a short
summary would be very useful for the list readers.
--jaf
On 4/24/07, Chris Lu <[EMAIL PROTECTED]> wrote:
Hi, Saurabh,
It's just one query and returns both hits and
categorized counts.
Chris
--- Saurabh Dani <
Hi, all,
I'm looking for a simple, straightforward example of how to use the
Snowball stemmer to make Lucene search results return all variants of
the terms searched for.
For example, if I search for "eat", I'd like Lucene to find "eating",
"eaten", "ate", etc.
In particular, I'm not clear on wh
All,
I'm looking to solve the following problem and I could use some help.
My preferred approach appears to be blocked by Java permissioning, and
I'm not sure if that's by design or by accident.
I have a set of fixed search indices that get built on a 5-hour cycle -
these indices are not up
Hi, Saurabh,
It's just one query and returns both hits and
categorized counts.
Chris
--- Saurabh Dani <[EMAIL PROTECTED]> wrote:
> Hi Chris,
>
> How are you showing the hit counts in "Narrow By
> Year" options on the left? Is this one query for
> each year, or does a single query return both the to
Would this be costlier than an fssync (filesystem) of the index folder
from the primary site to the backup site? How is it different from a
normal file sync operation? Would there be any data consistency issues?
One option, is to incrementally reindex the files on primary site to
replicate the i
Consider reducing the size of each file. Splitting them into smaller pieces
will definitely help the indexer work faster.
A 50 MB pure text file is an amazing size; very few text files reach that size.
You must have a very good reason to keep all the information in one such big
file.
What do you think?
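Splitting as suggested above could look like this (a minimal sketch; the chunk size is arbitrary, and each returned piece would be indexed as its own document):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Cut a large text stream into fixed-size pieces so the indexer can
// process many small documents instead of one 50 MB one.
public class TextChunker {
    public static List<String> chunk(Reader in, int chunkChars) {
        List<String> pieces = new ArrayList<String>();
        BufferedReader r = new BufferedReader(in);
        char[] buf = new char[chunkChars];
        try {
            int n;
            while ((n = r.read(buf, 0, chunkChars)) != -1) {
                pieces.add(new String(buf, 0, n));
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return pieces;
    }
}
```

A real splitter would probably cut on line or paragraph boundaries rather than at a fixed character count, so that no token is sliced in half across two pieces.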
Hi Ivan!
BTW, maybe forbidding the sorted search when there are too many results is
an option? I did it this way in my case.
Regards,
Artem.
On 4/24/07, Artem Vasiliev <[EMAIL PROTECTED]> wrote:
Ahhh, you said in your original post that your search matches _all_ the
results.. Yup my patch will not h
Ahhh, you said in your original post that your search matches _all_ the
results.. Yup, my patch will not help much in this case - after all, all the
values have to be read to be compared while sorting! :)
The LUCENE-769 patch helps only if the result set is significantly smaller
than the full index size.
Regards
Hello Ivan!
It's so sad to me that you had bad results with that patch. :)
The discussion in the ticket is out-of-date - the patch was initially spread
over several classes and used a WeakHashMap, but it has since evolved to what
it is now - one StoredFieldSortFactory class. I use it in my sharehound app in pretty m
But I am facing the problem even with -Xms256m and -Xmx1024m.
Yes, the file was not added to the index because the java process was
already using 1156m of memory, which is much higher than the max heap
size.
But even after waiting for a few minutes until the memory came below the
max heap value,
Use java -Xms50m to start your program; that gives a 50 MB initial heap size.
The OutOfMemoryError is because the default heap size is not enough for your
application.
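To confirm the flags actually took effect, one could print the heap limits at runtime (a quick hypothetical check, not from the original thread):

```java
// Print the heap limits the JVM actually started with, to confirm
// that -Xms/-Xmx flags were picked up. Values are approximate and
// JVM-dependent.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap   ~" + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("total heap ~" + rt.totalMemory() / (1024 * 1024) + " MB");
    }
}
```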
-Original Message-
From: Divya Rajendranath [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 24, 2007 7:01 PM
To: java-use
Hello All,
Could anyone help me find a solution to the following problem?
I am facing problems while trying to add files of size 50 MB to my
application. The application has on-demand indexing of documents in
place. Whenever we add a file to our application, we first put the file
details/metadata
Hi,
How do I get the term frequency of multiple terms in a particular document?
Is there any API method other than TermVector that may help?
Also, how do I calculate the term frequency over a time range? I.e., if my
index has a field "TIME" with values in millis (like 1176281188000), and I
want to calculate the term freq. of
Hi,
Does anybody have an idea about my previous post?
Regards
RSK
On 4/23/07, SK R <[EMAIL PROTECTED]> wrote:
Hi,
In my application, sometimes I need to find the doc ID along with the term
frequency of my terms, in an index of multiple lines tokenized & indexed with
StandardAnalyzer. For this, I'm now using *
Hi all,
I am using MultiSearcher to search more than one index folder. I have one
IndexSearcher array which contains 3 IndexSearchers...
01. C:\IndexFolder1
02. C:\IndexFolder2
03. C:\IndexFolder3
When I searched in 3 index folders using a MultiSearcher then I got 3000
hits.
1 to 1000 from C
Hi Chris,
How are you showing the hit counts in the "Narrow By Year" options on the
left? Is this one query for each year, or does a single query return both the
top 30 results and the hit counts for every category?
Thanks
Saurabh
[EMAIL PROTECTED]>
Sent: Tuesday,
For those who may be interested,
DBSight 1.4.0 now has unlimited index size with Free version!
Basically DBSight is more like Solr + a database adapter. You just point
it at any database with one or several SQLs, and you can have Lucene
search!
It has Incremental Indexing, Recreating Index, Synch
Chris Hostetter wrote:
: Basically I'm thinking of writing a different kind of IndexReader which
: uses a database to return fake terms for things like tags. The idea
: would be that it can be slotted in alongside a real index via
: ParallelReader in order to provide the fast-changing part. So