Hi Mike,
I'd simply store a doctype field with the values pdf, txt and html,
and perform a separate search for each type. Although I'd be
interested if anyone has a cooler way of doing this.
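Roughly what I have in mind (just a sketch against the 2.9 API; writer,
searcher and userQuery are assumed to already exist):

// index time: one extra keyword field per document
Document doc = new Document();
doc.add(new Field("doctype", "pdf", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("contents", text, Field.Store.NO, Field.Index.ANALYZED));
writer.addDocument(doc);

// search time: one search per type
for (String type : new String[] {"pdf", "txt", "html"}) {
    BooleanQuery q = new BooleanQuery();
    q.add(userQuery, BooleanClause.Occur.MUST);
    q.add(new TermQuery(new Term("doctype", type)), BooleanClause.Occur.MUST);
    TopDocs hits = searcher.search(q, 10);
    // ... handle the hits for this type
}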
Cheers,
Phil
On Thu, Oct 1, 2009 at 9:56 AM, Michael Masters mmast...@gmail.com wrote:
I was wondering if
Hi Mark,
Are there any Lucene 2.9 versions of this in development that I could
get my hands on? I'd be happy to be an alpha tester.
Cheers,
Phil
LucidGaze for Lucene works as a drop-in replacement for the Lucene JAR;
it requires no changes to the source code of the application, or even
Sorry, just realised my mistake. I should read the docs more
carefully. IndexReader.reopen() does not reopen the existing
IndexReader, but returns a new one.
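So the pattern should be more like this (a sketch; reader and searcher are
assumed to be the fields I keep around):

IndexReader newReader = reader.reopen();
if (newReader != reader) {
    // reopen() returned a fresh reader, so swap it in and close the old one
    searcher.close();
    reader.close();
    reader = newReader;
    searcher = new IndexSearcher(reader);
}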
Phil
On Mon, Sep 14, 2009 at 3:20 PM, Phil Whelan phil...@gmail.com wrote:
Hi,
I'm not sure why my IndexReader.reopen() call
Hi,
I'm not sure why my IndexReader.reopen() call is not working.
The latest results are not coming back, meaning the reader / searcher
has not been re-opened for the new Documents that have been added.
IndexReader openReader = searcher.getIndexReader();
searcher.close();
Hi Uwe,
Thanks for the explanation! It really helps. That makes sense that for
a small number of values, such as hour, NumericField is not going to
help me. I'm experimenting with using an epoch NumericField for sorting,
which, funnily enough, is where I started with 2.4.1, before going down the
usual
Hi,
I've used NumericField to store my hour field.
Example...
doc.add(new NumericField("hour").setIntValue(Integer.parseInt("12")));
Before I was using plain string Field and enumerating them with
TermEnum, which worked fine.
Now that I'm using NumericFields, I'm not sure how to port this
Hi Jim,
Are you using the same Analyzer for indexing and searching? .yyy
will be seen as a HOSTNAME by StandardAnalyzer, which will keep it as one
term, whereas another analyzer might split it into 2 terms. This
should not matter either way, as long as you are using the same
Analyzer for both.
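A quick way to check what terms an analyzer actually produces is to dump the
token stream (a sketch against the 2.9 attribute API; the input string below
is just a made-up example):

void dumpTerms(Analyzer analyzer, String text) throws IOException {
    TokenStream ts = analyzer.tokenStream("path", new StringReader(text));
    TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
        System.out.println(termAtt.term());
    }
}

// e.g. dumpTerms(new StandardAnalyzer(Version.LUCENE_29), "www.example.com")
// prints the whole hostname as a single term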
select path is just the filename part,
without the extension, i.e., the part.
That's why I said in my original post that I was kind of surprised that
doing a web query for path:.yyy succeeded, i.e., in the path field in
the index, there is no .yyy, just .
Jim
Phil
On Tue, Aug 4, 2009 at 8:31 AM, Shai Erera ser...@gmail.com wrote:
Hi Darren,
The question was how, given the string "aboutus" in a document, you can return
that document as a result to the query "about us" (note the space). So we're
mostly discussing how to detect and then break the word "aboutus" to
(sorry, tangent. I'll be quick)
On Tue, Aug 4, 2009 at 8:42 AM, Shai Erera ser...@gmail.com wrote:
Interesting ... I don't have access to a Japanese dictionary, so I just
extract bi-grams.
Shai - if you're interested in parsing Japanese, check out Kakasi. It
can split into words and convert
Hi Jim,
On Sun, Aug 2, 2009 at 1:32 AM, oh...@cox.net wrote:
I first noticed the problem that I'm seeing while working on this latter app.
Basically, what I noticed was that although I was adding 13 documents to the
index, when I listed the path terms, there were only 12 of them.
Field text
Hi Jim,
On Sun, Aug 2, 2009 at 9:08 AM, Phil Whelan phil...@gmail.com wrote:
So then, I reviewed the index using Luke, and what I saw was that
there were indeed only 12 path terms (under Term Count on the left),
but when I clicked Show Top Terms in Luke, there were 13 terms
Hi Prashant,
I agree with Shai that using Luke, and printing out what the Document
looks like before it goes into the index, are going to be your best
bet for debugging this problem.
The problem you're having is that StandardAnalyzer does not break up
the hostname into separate terms, as it has
On Sun, Aug 2, 2009 at 10:58 AM, Andrzej Bialecki a...@getopt.org wrote:
Thank you Phil for spotting this bug - this fix will be included in the next
release of Luke.
Glad to help. Thanks for building this great tool!
Phil
Hi Jim,
On Sun, Aug 2, 2009 at 12:12 PM, oh...@cox.net wrote:
i.e., I was ignoring the 1st term in the TermEnum (since the .next() bumps
the TermEnum to the 2nd term, initially).
Great! Glad you found the problem. I couldn't see it.
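For anyone else who hits this: reader.terms(t) leaves the enum positioned on
the first matching term, so the usual pattern is a do/while rather than a
plain while (a sketch; the field name is just an example):

TermEnum te = reader.terms(new Term("path", ""));
try {
    do {
        Term t = te.term();
        if (t == null || !"path".equals(t.field())) break;
        System.out.println(t.text());   // process the term, including the 1st one
    } while (te.next());
} finally {
    te.close();
}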
Phil
Hi Prashant,
Take a look at this...
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
Cheers,
Phil
On Sun, Aug 2, 2009 at 9:33 PM, prashant ullegaddi prashullega...@gmail.com wrote:
Hi,
I've a single index of size 87GB containing around 50M documents. When I
search for any query,
Hi Mike,
It's Jibo, not me, having the problem. But thanks for the link. I was
interested to look at the code. Will be buying the book soon.
Phil
On Sat, Aug 1, 2009 at 2:08 AM, Michael McCandless
luc...@mikemccandless.com wrote:
(Please note that ThreadedIndexWriter is source code available
Hi Jim,
I cannot see anything obvious, but both open() and terms() throw
IOExceptions. You could try putting these in separate try..catch
blocks to see which one it's coming from. Or using e.printStackTrace()
in the catch block will give more info to help you debug what's
happening.
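Something along these lines (just a sketch; indexDir stands in for whatever
you are opening today):

IndexReader reader = null;
try {
    reader = IndexReader.open(indexDir);
} catch (IOException e) {
    System.err.println("open() failed:");
    e.printStackTrace();
}

if (reader != null) {
    try {
        TermEnum terms = reader.terms();
        // ... iterate over terms here
    } catch (IOException e) {
        System.err.println("terms() failed:");
        e.printStackTrace();
    }
}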
On Sat, Aug
include stop word removal in the
processing of your token stream.
Matt
Phil Whelan wrote:
Hi Matthew / Paul,
On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan co...@aconex.com wrote:
Matthew Hall wrote:
Place a delimiter between the email addresses that doesn't get removed in your
Hi Jim,
There should not be much difference, from the Lucene end, between a new
index and an index you want to update (add more documents to). As stated
in the Lucene docs, IndexWriter will create the index if it does not
already exist.
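For example (a sketch against the 2.9 API; indexDir, analyzer and doc are
assumed to already exist):

// this constructor appends to an existing index, or creates it if it
// is not there yet
IndexWriter writer = new IndexWriter(indexDir, analyzer,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.addDocument(doc);
writer.close();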
Hi Jibo,
Have you tried optimizing the indexes? I do not know anything about the
implementation of ThreadedIndexWriter, but if they both optimize down
to the same size, it could just mean that the index ThreadedIndexWriter
produces simply isn't as merged/optimized yet.
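i.e. run something like this on both indexes before comparing the directory
sizes (a sketch, assuming ThreadedIndexWriter exposes the usual IndexWriter
optimize() call):

writer.optimize();   // merges everything down to a single segment
writer.close();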
Thanks,
Phil
On Fri, Jul 31, 2009 at 11:38 AM, Jibo
Hi Jibo,
Your mergeFactor is different, and the resulting numFiles (segment
files) is different. Maybe each thread is responsible for a segment
file. Just curious - do you have 3 threads?
Phil
Hi,
I know you can use Field.Store.YES, but I want to inspect the terms /
tokens and their order related to the field name at search time. Is
this possible? Obviously this information is stored in the index, but
I cannot find any API to access it. I'm guessing the answer might be
that Terms
Hi,
We have a very large Lucene index that we're developing that has a
field of email addresses. (Actually multiple fields with multiple
email addresses, but I'll simplify here.)
Each document will have one email field containing multiple email addresses.
I am indexing email addresses only
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall
mh...@informatics.jax.org wrote:
1. Sure, just have an analyzer that splits on all non-letter characters.
2. Phrase queries keep the order intact. (And yes, the positional
information for the terms is kept, which is what allows span queries to
Hi Matthew / Paul,
On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan co...@aconex.com wrote:
Matthew Hall wrote:
Place a delimiter between the email addresses that doesn't get removed in
your analyzer. (preferably something you know will never be searched on)
Or add them separately (rather than:
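The add-them-separately option would look roughly like this (a sketch; the
field name is just an example):

// one Field instance per address, all under the same field name,
// instead of concatenating them into one big string
for (String address : addresses) {
    doc.add(new Field("email", address, Field.Store.NO, Field.Index.ANALYZED));
}

An Analyzer that overrides getPositionIncrementGap("email") to return a large
number would then keep phrase and span queries from matching across two
different addresses.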
On Thu, Jul 30, 2009 at 7:12 PM, oh...@cox.net wrote:
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by special is characters that the analyzer considers break
characters.
For example, if I have something like foo=something, apparently the
Hi Don,
On Wed, Jul 29, 2009 at 1:42 PM, Donal Murtagh domur...@yahoo.co.uk wrote:
Course.name    Attendance.mandatory    Student.name
-----------    --------------------    ------------
cooking        N                       Bob
art            Y
If there are only a few thousand documents, and the number of
results is quite small, is this a case where post-search filtering can be
done?
I have not done anything like this myself with Lucene, so is this a
bad idea? If not, what would be the best way to do this?
On Wed, Jul 22, 2009 at 5:46 AM, m.harig m.ha...@gmail.com wrote:
Is there any article or forum for using Hadoop with Lucene? Please can
anyone help me?
Hi M,
Katta is a project that is combining Lucene and Hadoop. Check it out here...
http://katta.sourceforge.net/
Thanks,
Phil
Hi Ganesh,
I'm not sure whether this will work for you, but one way I got around
this was with multiple searches. I only needed the first 50 results,
but wanted to sort by date, hour, min, sec. This could result in 5
results or millions of results.
I added the date to the query, so I'd search for
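Roughly, it ends up looking like this (just a sketch; the date format and the
field types here are made up for illustration):

Sort sort = new Sort(new SortField[] {
    new SortField("date", SortField.STRING, true),
    new SortField("hour", SortField.INT, true),
    new SortField("min",  SortField.INT, true),
    new SortField("sec",  SortField.INT, true)
});

BooleanQuery q = new BooleanQuery();
q.add(userQuery, BooleanClause.Occur.MUST);
q.add(new TermQuery(new Term("date", "20090722")), BooleanClause.Occur.MUST);

TopFieldDocs hits = searcher.search(q, null, 50, sort);
// if hits.totalHits < 50, repeat the search with the previous day's date

That keeps each individual search small even when the full result set would
be millions of documents.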
On Wed, Jul 22, 2009 at 12:28 PM, Matthew Hall mh...@informatics.jax.org wrote:
Not sure if this helps you, but some of the issues you are facing seem
similar to those in the real-time search threads.
Hi Matthew,
Do you have a pointer to where I can find the real-time search threads?
Thanks,
Phil