On Nov 11, 2008, at 8:32 AM, Stefan Trcek wrote:
On Tuesday 11 November 2008 02:18:39 Erik Hatcher wrote:
The integration won't be too painful... the main thing is that Solr
requires* some configuration files, literally on the filesystem, in
order to fire up and be happy. And you'll need to
On Monday 10 November 2008 14:58:15 Mark Miller wrote:
> > But: it's slow to load a field for the first time. LUCENE-1231
> > (column-stride fields) aims to greatly speed up the load time.
>
> Test it out though. In some recent testing I was doing it was *way*
> faster than I thought it would be b
Probably a question for Mike M.
Is it possible/sensible to use IndexDeletionPolicy to remove the *newest*
commit points (as opposed to the usual scenario of deleting old commit points)?
I experimented with this:
class RollbackDeletionPolicy implements IndexDeletionPolicy
{
pub
On Tuesday 11 November 2008 11:29:27 Michael McCandless wrote:
>
> The other part of your proposal was to somehow "number" term text
> such that term range comparisons can be implemented as fast int
> comparisons.
...
>
> http://fontoura.org/papers/paramsearch.pdf
>
> However that'd be quite a bit
On Tuesday 11 November 2008 02:18:39 Erik Hatcher wrote:
>
> The integration won't be too painful... the main thing is that Solr
> requires* some configuration files, literally on the filesystem, in
> order to fire up and be happy. And you'll need to craft Solr's
> schema.xml to jive with how you
Also, one nice optimization we could do with the "term number column-
stride array" is do bit packing (borrowing from the PFOR code)
dynamically.
Ie since we know there are X unique terms in this segment, when
populating the array that maps docID to term number we could use
exactly the r
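
The bit-packing idea above can be sketched roughly as follows. This is only an illustration of fixed-width packing, not the PFOR code or anything from Lucene; the class name PackedOrds and its methods are hypothetical. It assumes each document has exactly one term number in the range 0..X-1, so every slot needs only enough bits to hold X-1.

```java
// Hypothetical sketch: a docID -> term number array packed at a fixed bit width.
// Not Lucene's or PFOR's implementation; names are made up for illustration.
final class PackedOrds {
    private final long[] blocks;     // packed storage, 64 bits per element
    private final int bitsPerValue;  // width of every slot
    private final long mask;         // low bitsPerValue bits set

    /** Bits needed to store term numbers 0..uniqueTerms-1 (at least 1). */
    static int bitsRequired(int uniqueTerms) {
        if (uniqueTerms <= 1) {
            return 1;
        }
        return 32 - Integer.numberOfLeadingZeros(uniqueTerms - 1);
    }

    PackedOrds(int docCount, int uniqueTerms) {
        this.bitsPerValue = bitsRequired(uniqueTerms);
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) docCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    void set(int docId, int ord) {
        long value = (long) ord & mask;
        long bitPos = (long) docId * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Clear the slot, then write the bits that fit into this block.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The slot straddles two longs: the high bits go into the next block.
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask)
                    | (value >>> (bitsPerValue - spill));
        }
    }

    int get(int docId) {
        long bitPos = (long) docId * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }

    public static void main(String[] args) {
        PackedOrds ords = new PackedOrds(8, 1000);
        ords.set(3, 742);
        System.out.println(bitsRequired(1000) + " bits per doc, doc 3 -> " + ords.get(3));
    }
}
```

With 1000 unique terms each slot takes 10 bits instead of the 32 of a plain int[], at the cost of some shifting and masking on every read.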
It seems like for many of your examples (age, zip code, country),
simply computing & storing the mapping yourself (your first option
below) would actually be viable?
Also: I think in fact you never need to merge the term numbering for
many segments during searching? Ie, the search runs one Inde
On Tuesday 11 November 2008 21:55:45 Michael McCandless wrote:
> Also, one nice optimization we could do with the "term number column-
> stride array" is do bit packing (borrowing from the PFOR code)
> dynamically.
>
> Ie since we know there are X unique terms in this segment, when
> populating t
Paul Elschot wrote:
On Tuesday 11 November 2008 21:55:45 Michael McCandless wrote:
Also, one nice optimization we could do with the "term number column-
stride array" is do bit packing (borrowing from the PFOR code)
dynamically.
Ie since we know there are X unique terms in this segment, whe
The other part of your proposal was to somehow "number" term text such
that term range comparisons can be implemented as fast int comparisons.
I like the idea of building dynamic filters on top of a
"column-stride" array of field values. You could extend it to be a
real Scorer, too. EG I could imag
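
As a rough illustration of "numbering" term text so that a range check becomes an int comparison, here is a small standalone sketch. It is not Lucene code and not the design from LUCENE-1231; the class TermOrdinals and its methods are hypothetical, and it assumes every document has exactly one non-null value for the field.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch: number the unique terms in sorted order, keep a
// docID -> term number array, and answer range filters with int comparisons.
final class TermOrdinals {
    private final String[] sortedTerms; // term number -> term text
    private final int[] docToOrd;       // docID -> term number

    TermOrdinals(String[] fieldValueByDoc) {
        // A TreeSet gives the unique terms in natural (String.compareTo) order.
        sortedTerms = new TreeSet<>(Arrays.asList(fieldValueByDoc)).toArray(new String[0]);
        docToOrd = new int[fieldValueByDoc.length];
        for (int doc = 0; doc < fieldValueByDoc.length; doc++) {
            docToOrd[doc] = Arrays.binarySearch(sortedTerms, fieldValueByDoc[doc]);
        }
    }

    /** Smallest term number whose text is >= the given text. */
    private int ceilingOrd(String text) {
        int i = Arrays.binarySearch(sortedTerms, text);
        return i >= 0 ? i : -i - 1;
    }

    /** Largest term number whose text is <= the given text, or -1 if none. */
    private int floorOrd(String text) {
        int i = Arrays.binarySearch(sortedTerms, text);
        return i >= 0 ? i : -i - 2;
    }

    /** Inclusive range filter: one flag per docID. */
    boolean[] matchRange(String lower, String upper) {
        int lo = ceilingOrd(lower);
        int hi = floorOrd(upper);
        boolean[] hits = new boolean[docToOrd.length];
        for (int doc = 0; doc < docToOrd.length; doc++) {
            int ord = docToOrd[doc];
            hits[doc] = ord >= lo && ord <= hi; // no string comparison per document
        }
        return hits;
    }

    public static void main(String[] args) {
        TermOrdinals countries = new TermOrdinals(new String[] {"de", "us", "fr", "us", "at", "nl"});
        System.out.println(Arrays.toString(countries.matchRange("de", "nl")));
    }
}
```

The two binary searches are done once per query; the per-document work is just two int comparisons against the docID array.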
Hi,
I'm pretty new to Lucene, so please bear with me if this has been
covered before.
The wiki suggests sharing a single IndexSearcher between threads for
best performance
(http://wiki.apache.org/lucene-java/ImproveSearchingSpeed). I've
tested running the same set of queries with: multiple threa
Nice! An 8 core machine with a test ready to go!
How about trying the read only mode that was added to 2.4 on your
IndexReader?
And if you are on unix and could try trunk and use the new
NIOFSDirectory implementation...that would be awesome.
Those two additions are our current hope for
And if you are on unix and could try trunk and use the new
NIOFSDirectory implementation...that would be awesome.
Woah...that made 2.4 too. A 2.4 release will allow both optimizations.
Many thanks!
Nice results, thanks!
The poor disk-based scaling may be fixed by NIOFSDirectory, if you are
on Unix. If you are on Windows it won't help (and will likely be
worse than FSDirectory), because of an apparent bug in Sun's JVM on
Windows whereby NIO positional file reads seem to share a loc
Just to follow up... I opened these three issues:
https://issues.apache.org/jira/browse/LUCENE-1441 (fixed in 2.9)
https://issues.apache.org/jira/browse/LUCENE-1442 (fixed in 2.9)
https://issues.apache.org/jira/browse/LUCENE-1448 (still iterating)
Mike
Christian Reuschling wrote:
Hi Guy
I re-ran the no-readonly ram tests:
      thread   shared
 1    64043    53610
 2    26999    25260
 3    27173    17265
 4    22205    13222
 5    20795    11098
 6    17593     9852
 7    17163     8987
 8    17275     9052
 9    19392    10266
10    27809    10397
11    25987    10724
32 cores, actually :)
I reran the test with readonly turned on (I changed how the time is
measured a little, it should be more consistent):
     fs-thread  ram-thread  fs-shared  ram-shared
1    71877      54739       73986      61595
2    34949      26735       43719      28935
3    25581
Dmitri Bichko wrote:
32 cores, actually :)
Glossed over that - even better! Killer machine to be able to test this on.
I reran the test with readonly turned on (I changed how the time is
measured a little, it should be more consistent):
fs-thread ram-thread fs-shared
Mark Miller wrote:
That's a good point, and points out a bug in solr trunk for me. Frankly
I don't see how it's done. There is no code I can see/find to use it
rather than FSDirectory. Still assuming there must be a way, but I
don't see it...
Ah - brain freeze. What else is new :) You have to s
I think you should use NumberTools to format timestamp first, otherwise sort
will not work correctly
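
To see why plain numeric strings sort incorrectly, here is a minimal standalone sketch. It does not use NumberTools and does not reproduce its encoding; it only shows the underlying idea that fixed-width strings compare in the same order as the numbers they represent. The class name SortableTimestamps and the zero-padding scheme are assumptions made for this example, and it handles non-negative values only.

```java
import java.util.Locale;

// Hypothetical sketch: zero-pad non-negative longs to a fixed width so that
// String.compareTo agrees with numeric order. Not the NumberTools encoding.
final class SortableTimestamps {
    // Long.MAX_VALUE has 19 decimal digits.
    static final int WIDTH = 19;

    static String pad(long timestamp) {
        if (timestamp < 0) {
            throw new IllegalArgumentException("negative timestamps are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", timestamp);
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "10" because '9' > '1'.
        System.out.println("9".compareTo("10") > 0);
        // Padded: the string order matches the numeric order.
        System.out.println(pad(9).compareTo(pad(10)) < 0);
    }
}
```

The same padded form has to be used both when indexing the field and when building the range or sort terms, otherwise the two sides will not line up.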
On Mon, Nov 10, 2008 at 8:00 PM, Cool The Breezer
<[EMAIL PROTECTED]> wrote:
> I was able to do that using a range query
>
> String end = "25337325126";//i.e. 11/30/, assume that this is max
How can I use multiple Boolean operators in a search query?
For example, from the search text field I usually get queries that
look like
Any (word or phrase) and (a list of URIs)
example:: rice land http\://www.wtr.org/wordlist#c_2379
http\://www.wtr.org/wordlist#c_65748 http\://w
Hello,
I wanted to know if there are classes in Lucene that support parsing MSWord
documents.
Many thanks,
Dipesh
"Help Ever Hurt Never"- Baba
--- On Tue, 11/11/08, dipesh wrote:
> I wanted to know if there are classes in Lucene that support
> parsing MSWord documents.
Searching the web might help:
http://www.google.com/search?q=lucene+%2Bword
The Apache Tika project (http://incubator.apache.org/tika/) might also be of
interest.
Dav
Hi,
You can use BooleanQuery for this.
BooleanQuery is meant for combining a series of queries with boolean
operators.
For example,
let's say you have 3 different queries A, B, C and you want a final query which
behaves as A && (B || C)
BooleanQuery query = new BooleanQuery();
BooleanQuery te
Dipesh,
Start here.
http://poi.apache.org/
John G.
-----Original Message-----
From: dipesh [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2008 8:38 PM
To: java-user@lucene.apache.org
Subject: Parsing MSWord
Hello,
I wanted to know if there are classes in Lucene that support parsing MSW
Thank you,
It was really helpful. I also found some similar work being done in the
Nutch project.
Regards,
Dipesh
On Wed, Nov 12, 2008 at 12:52 PM, Dave Newton <[EMAIL PROTECTED]> wrote:
> --- On Tue, 11/11/08, dipesh wrote:
> > I wanted to know if there are classes in Lucene that support
> > pa
Hi Prabin,
Thanks for the suggestion; it worked for me.
I wasn't aware of BooleanQuery, since I'm new to Lucene.
I modified the code like this:
BooleanQuery textQuery = new BooleanQuery();
BooleanQuery uriQuery = new BooleanQuery();
Yes, I think it is. I think the only catch will be those log timestamps, how
fine you really need them to be, and if you want them very fine what happens
when you do range queries on timestamps. If you have a pile of log files lying
around, it should be pretty easy to get them indexed. You do