meaningful
token.
Input text:
"The President of the United States lives in the White House"
Tokens:
"The"
"President of the United States"
"lives"
"in"
"the"
"White House"
Term: "President"
Result:
"President of a Com
Ah, OK. Interesting problem there as well.
I'll think on that one some too!
cheers.
> Hi Darren,
>
> The question was, how given a string "aboutus" in a document, you can
> return
> that document as a result to the query "about us" (note the space
Hi,
Pardon the noob question, but which approach is going to be faster
over extremely large document sets: A or B?
A) Multiple field values, Stored.NO, TOKENIZED.
word: one
word: two
word: three
B) Single field value, Stored.NO, TOKENIZED.
word: one two three
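To make the A/B difference concrete, here is a toy inverted index in plain Python (not Lucene's implementation). It assumes a whitespace tokenizer and a hypothetical `position_gap` added between successive values of the same field, similar in spirit to an analyzer's position increment gap. In this sketch both layouts produce the same set of terms; only the token positions differ, which would mainly matter for phrase and proximity queries.

```python
from collections import defaultdict

def index_field(values, position_gap=0):
    """Build term -> list of positions for one field of one document.

    `values` is a list of field values (a single value is a one-element list).
    `position_gap` is a hypothetical extra position increment inserted
    between successive values of a multi-valued field.
    """
    postings = defaultdict(list)
    position = 0
    for i, value in enumerate(values):
        if i > 0:
            position += position_gap
        for token in value.split():  # assumed whitespace tokenizer
            postings[token].append(position)
            position += 1
    return dict(postings)

# Layout A: three separate values; layout B: one value.
a = index_field(["one", "two", "three"], position_gap=100)
b = index_field(["one two three"])
print(sorted(a) == sorted(b))  # True: same set of terms
print(a["two"], b["two"])      # [101] [1]: only positions differ
```

Any speed difference on a real index would come from how Lucene stores and scores these, which this toy does not model.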
Thanks for the tip.
Darren
How does the synonym filter work internally? I configured it with a very
large synonym file (90,000 lines) running Solr in GlassFish, and it started
fine, but when I queried, it hung and ran out of memory. The file wasn't big
enough to exhaust the heap, and I was never able to get it to run smoothly.
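A toy illustration of multi-word synonym matching (not Solr's SynonymFilter, whose internals differ): parse the file into a map from token sequences to their equivalents, then walk the token stream taking the longest match at each position. The comma-separated "equivalence group per line" format is an assumption of this sketch.

```python
def build_synonym_map(lines):
    """Parse 'a, b c, d' style lines into {token tuple: set of equivalent tuples}.

    Each line is assumed to be one comma-separated equivalence group.
    """
    synonyms = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        phrases = [tuple(p.split()) for p in line.split(",") if p.strip()]
        for phrase in phrases:
            others = (p for p in phrases if p != phrase)
            synonyms.setdefault(phrase, set()).update(others)
    return synonyms

def expand(tokens, synonyms):
    """Greedy longest-match pass returning (matched_phrase, sorted alternatives)."""
    max_len = max((len(k) for k in synonyms), default=1)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in synonyms:
                out.append((phrase, sorted(synonyms[phrase])))
                i += n
                break
        else:
            out.append(((tokens[i],), []))  # no synonym starts here
            i += 1
    return out

syn = build_synonym_map(["usa, united states, united states of america"])
for phrase, alternatives in expand("i live in the united states".split(), syn):
    print(phrase, alternatives)
```

Note that in this naive map every phrase stores a set of all the others, so a group of k equivalents costs roughly k² entries; wide groups in a large file grow quickly.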
Couldn't one write a custom filter that modified the inbound term
semantics before doing the search? Then, wildcard behavior can be added to
terms without doing query string splicing.
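Roughly what such a filter could look like, sketched as a chain of Python generators rather than a Lucene TokenFilter: each query token is lowercased, then replaced by every indexed term that starts with it, which is approximately what a prefix query does when it is expanded against the term dictionary. The sorted `vocabulary` list stands in for the index's term dictionary and is an assumption of the sketch.

```python
import bisect

def lowercase_filter(tokens):
    """First stage: normalize case."""
    for t in tokens:
        yield t.lower()

def prefix_expand_filter(tokens, vocabulary):
    """Second stage: replace each token with all vocabulary terms it prefixes.

    `vocabulary` must be sorted, so terms sharing a prefix are contiguous
    and bisect can find the start of that run.
    """
    for t in tokens:
        start = bisect.bisect_left(vocabulary, t)
        end = start
        while end < len(vocabulary) and vocabulary[end].startswith(t):
            end += 1
        matches = vocabulary[start:end]
        yield matches if matches else [t]  # keep unknown terms as-is

vocab = sorted(["search", "searcher", "searching", "sea", "index"])
print(list(prefix_expand_filter(lowercase_filter(["Searc", "Index"]), vocab)))
```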
> You might take a look at Ngrams. These can be used to find partial
> matches without resorting to wildcards, alt
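As a rough illustration of the n-gram idea (plain Python, not Lucene's n-gram tokenizers): break each token into character bigrams and measure how many of the query's bigrams also occur in the document. With this scheme, the query "about us" is fully covered by the single token "aboutus", with no wildcards involved.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of each whitespace-separated token."""
    grams = set()
    for token in text.lower().split():
        if len(token) < n:
            grams.add(token)  # keep tokens shorter than n whole
        else:
            grams.update(token[i:i + n] for i in range(len(token) - n + 1))
    return grams

def ngram_coverage(query, document_text, n=2):
    """Fraction of the query's n-grams that also appear in the document."""
    q = char_ngrams(query, n)
    d = char_ngrams(document_text, n)
    return len(q & d) / len(q) if q else 0.0

print(ngram_coverage("about us", "aboutus"))  # 1.0
print(ngram_coverage("about us", "contact"))  # 0.0
```

The trade-off is precision: n-gram overlap ignores order, so unrelated text that happens to share the same grams can also score well.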
I just parse the text into sentences and put those in a multi-valued field
and then search that.
On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan
wrote:
> I have browsed many suggestions on how to implement 'search within a
> sentence', but all seem to have drawbacks. For example, from
>
http:/
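The per-sentence approach can be sketched in plain Python: split the text into sentences (each would become one value of the multi-valued field) and require all query terms to co-occur in a single sentence. The regex-based splitter is a naive assumption; real sentence detection is harder.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def matches_within_sentence(sentences, terms):
    """True if every term occurs together in at least one sentence."""
    wanted = {t.lower() for t in terms}
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        if wanted <= words:
            return True
    return False

sentences = split_sentences("Lucene indexes text. Solr wraps Lucene in a server.")
print(matches_within_sentence(sentences, ["solr", "server"]))     # True
print(matches_within_sentence(sentences, ["indexes", "server"]))  # False
```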
Hi,
We switched from MMAP to NIOFS due to high memory usage.
Now seeing java.nio.channels.ClosedChannelException and
java.nio.channels.ClosedByInterruptException during search.
Stack traces:
Exception details: IQQG0020E java.io.IOException: null: NIOFSIndexInput
(path="/opt/css-store/Collections
Hi,
I'm trying to perform a query and need to specify a string pattern occurring
at the end of a line.
Is this possible? Thanks.
Darren
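One workaround, sketched here in plain Python rather than Lucene, is to index a hypothetical sentinel token (here `_eol_`) after the last token of each line, then look for the pattern immediately followed by that sentinel. In Lucene this would correspond to a phrase query such as `"pattern _eol_"`, assuming the analyzer keeps the sentinel intact.

```python
EOL = "_eol_"  # hypothetical end-of-line sentinel token

def tokenize_lines(text):
    """Tokenize each line and append the EOL sentinel after its last token."""
    tokens = []
    for line in text.splitlines():
        words = line.lower().split()
        if words:
            tokens.extend(words + [EOL])
    return tokens

def ends_a_line(tokens, pattern):
    """True if the pattern's tokens appear directly before an EOL sentinel."""
    phrase = pattern.lower().split() + [EOL]
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

tokens = tokenize_lines("error in module\nmodule loaded ok")
print(ends_a_line(tokens, "module"))  # True: "module" ends line 1
print(ends_a_line(tokens, "loaded"))  # False: "loaded" is mid-line
```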
You can also leverage the 'fields' capability in lucene and perhaps match them
against columns to do field-based searching.
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Wed 5/11/2005 12:50 PM
To: java-user@lucene.apache.org
Subject: Re: indexing relational ta
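A minimal sketch of the "columns as fields" idea, using Python's built-in `sqlite3` in place of the real database and plain dicts in place of Lucene Documents. The table and column names are made up for illustration.

```python
import sqlite3

def rows_as_documents(conn, table):
    """Turn each row of `table` into a dict of column name -> string value."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name must be trusted
    columns = [c[0] for c in cursor.description]
    return [
        {col: "" if val is None else str(val) for col, val in zip(columns, row)}
        for row in cursor
    ]

def field_search(documents, field, term):
    """Return documents whose `field` contains `term` as a whitespace token."""
    term = term.lower()
    return [d for d in documents if term in d.get(field, "").lower().split()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Ada Lovelace", "London"), ("Alan Turing", "Manchester")])
docs = rows_as_documents(conn, "person")
print(field_search(docs, "city", "london"))  # only the Ada Lovelace row
```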
query it like
before. What's the proper order to do this?
Also, if anyone has any empirical data on the performance or reliability
of InstantiatedIndex, I'd be curious.
Thanks for the tips!
Darren
-
To unsubscri
er)
ireader = iindex.indexReaderFactory()
isearcher = IndexSearcher(ireader)
Kind of a roundabout way to get an InstantiatedIndex, I guess, but maybe
there's a briefer way?
Thank you.
Darren
On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
> Check out the docs at:
> http://lucene.apache.
Yeah. That makes sense. It's not too hard to wrap those extra steps so I
can end up with something simpler too. Like:
iindex = InstantiatedIndex("path/to/my/index")
I'm lazy so the intermediate hoops to jump through clutter my code.
Hehe.
:)
Darren
On Sun, 2008-11-16 at 11
t its graph and getting the expected
speed?
thanks to anyone who can verify this.
On Sun, 2008-11-16 at 12:37 -0500, Darren Govoni wrote:
> Yeah. That makes sense. It's not too hard to wrap those extra steps so I
> can end up with something simpler too. Like:
>
> iindex = Instanti
mance characteristics with a high number of fields
and is anyone using indexes this way?
thank you for any thoughts.
Darren
"\-2 Word" and it still
doesn't work. I've used all the analyzers.
What's the trick here?
Thanks,
Darren
er
> you're looking for.
>
> Cheers
> Rob
>
> On Thu, Dec 11, 2008 at 3:59 PM, Darren Govoni <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > This might be a dumb question, but I have a simple field like this
> >
> > field: 0 -2 Word
> >
> >
ionFile" = NO (thought
this one would work).
Same results for the other analyzers more or less.
Weird.
Darren
On Thu, 2008-12-11 at 23:02 +0530, prabin meitei wrote:
> Hi, While constructing the query give the query string in quotes.
> eg: query = queryparser.parse("\"-2 wo
> toostep.com
>
> On Thu, Dec 11, 2008 at 11:28 PM, Darren Govoni wrote:
>
> > I'm using Luke to find the right combination of quotes,\'s and
> > analyzers.
> >
> > No combination can produce a positive result for "-2 String" for the
>
Hi Matt,
Thanks for the thought. Yeah, I see it there in Luke, but the other
gentleman's idea that Luke might be behaving differently from the code
could be a clue. It would be odd if true, but nothing else works, so I will
see if that is it.
Darren
On Fri, 2008-12-12 at 08:03 -0500, Matthew
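For building such query strings in code, a small escaping helper avoids hand-placing backslashes. The character set below is my assumption of what the classic Lucene query syntax treats specially; the exact list varies by version, so compare it with `QueryParser.escape` in the Lucene release you use.

```python
# Assumed set of characters with special meaning in classic Lucene query
# syntax (check against QueryParser.escape for your Lucene version).
SPECIAL = set('\\+-!():^[]"{}~*?|&')

def escape_query(text):
    """Backslash-escape query-syntax characters so they are searched literally."""
    return "".join("\\" + c if c in SPECIAL else c for c in text)

print(escape_query("-2 Word"))  # \-2 Word
```

Escaping only affects parsing. If the field's analyzer drops "-" at index time, the escaped query still won't find "-2", so the same analyzer must keep that character at both index and query time.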
Indexing with lucene/nutch on top of/instead of DB indexing for:
1) relevance scoring
2) alias searching (i.e. a large amount of aliases, like first names)
3) highlighting
4) cross-datasource searching (multi DB, DB + XML files, etc).
As for best approach to externally index, I do not have any d
Hi,
I want to do a query such as
word: first*
where I want 'first' to be the start of the string value contained in the word
field and not somewhere inside it.
What's the best way to do this?
thanks for any tips,
Darren
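The distinction is between a prefix of any token and a prefix of the whole value. One common approach is to also index the value untokenized (as a single term) and run the prefix query against that copy. The toy Python below contrasts the two behaviours without using Lucene; lowercasing stands in for whatever normalization the real field would apply.

```python
def token_prefix_match(value, prefix):
    """Like 'word:first*' on a tokenized field: any token may start with prefix."""
    prefix = prefix.lower()
    return any(tok.startswith(prefix) for tok in value.lower().split())

def value_prefix_match(value, prefix):
    """Like a prefix query on an untokenized copy: the whole value must start with it."""
    return value.lower().startswith(prefix.lower())

print(token_prefix_match("the first step", "first"))  # True
print(value_prefix_match("the first step", "first"))  # False
print(value_prefix_match("First step", "first"))      # True
```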
One interpretation of the query with ~5 is that your text has 5 words,
so ~5 would allow a word in any position to match. Could it be this?
- Original Message -
From: "Ivan Vasilev" <[EMAIL PROTECTED]>
To: "LUCENE MAIL LIST"
Sent: Thursday, April 03, 2008 6:03 AM
Subject: PhraseQuery
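For reference, my understanding of sloppy phrase matching in the two-term case is that slop counts how far the second term sits from where an exact phrase would put it, so swapping two adjacent words costs a slop of 2. The Python below computes that distance for a two-term phrase only; it is a simplification, not Lucene's SloppyPhraseScorer.

```python
def two_term_slop(positions_a, positions_b):
    """Minimum slop needed for the phrase "a b" given each term's positions.

    For an exact match b sits at a + 1, so the cost is |(pos_b - pos_a) - 1|.
    Returns None if either term is missing.
    """
    if not positions_a or not positions_b:
        return None
    return min(abs((pb - pa) - 1) for pa in positions_a for pb in positions_b)

def positions(tokens, term):
    """All positions at which `term` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == term]

tokens = "w0 w1 w2 w3 w4".split()  # a five-word text
# Worst case: the phrase "w4 w0", i.e. the terms as far apart and reversed as possible.
print(two_term_slop(positions(tokens, "w4"), positions(tokens, "w0")))  # 5
```

Under this model, in a five-word text no two-term phrase needs more than a slop of 5, which fits the interpretation above that ~5 lets the terms match in any position.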
I guess I meant searching the index, size of index etc.
So they would search essentially the same?
Sorry that wasn't clear from my original email.
Darren
- Original Message -
From: "Erick Erickson" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, April 15, 2008 1:15
Hi,
Is there a lucene index reader that will load a disk-based index into
memory and perform searches on it from RAM? Sorry if I missed this in
the docs somewhere.
Darren
Hi,
Is it possible to read a disk-based index into RAM (entirely) and
have all searches operate on it there? I saw some RAMDirectory examples,
but it didn't look like it will transfer a disk index into RAM.
thanks
D
one" was present 3 times. This way I
can manipulate the presence of tokens in a document without having to
waste space for them?
Thank you for any thought on this.
Darren
rmB^2.0
I want ALL termA results (ordered by score) to come before ANY termB
results (also ordered by score). Is there a way to do this in the query
syntax, or does this simply require multiple queries?
thank you,
Darren
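If it ends up as two queries (or one query whose hits are post-processed), the ordering itself is a two-key sort: first by whether a hit matched termA, then by score. A boost only shifts scores, so it can't strictly guarantee this partition. A minimal Python sketch, assuming each hit carries a hypothetical `matched_a` flag:

```python
from typing import NamedTuple

class Hit(NamedTuple):
    doc_id: str
    score: float
    matched_a: bool  # hypothetical flag: did this hit match termA?

def tiered_order(hits):
    """All termA hits (best score first) before any other hit (best score first)."""
    # False sorts before True, so "not matched_a" puts termA hits first.
    return sorted(hits, key=lambda h: (not h.matched_a, -h.score))

hits = [Hit("d1", 0.9, False), Hit("d2", 0.2, True),
        Hit("d3", 0.5, True), Hit("d4", 0.95, False)]
print([h.doc_id for h in tiered_order(hits)])  # ['d3', 'd2', 'd4', 'd1']
```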
new RAMDirectory instance from a different
> Directory implementation. This can be used to load a disk-based index
> into memory.
>
> Seems like exactly what you're asking for...
>
> Best
> Erick
>
> On Thu, Jun 26, 2008 at 3:40 PM, Darren Govoni <[EMAIL PROTE
's a very, very long time, all things
considered. I understand about the OS paging and such, but in
doing some variations of this to "throw the OS off", I still saw
no difference between on-disk and RAM times. Even so, the
times are really slow.
Any ideas?
thanks again,
Darren
On
too long for a simple query as this. Do those
figures sound right for Lucene doing this kind of single field match?
Darren
On Wed, 2008-08-13 at 10:24 -0400, Erick Erickson wrote:
> How are you measuring? There is a bunch of setup work for the first
> few queries that go through the system.
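A generic way to separate that setup cost from steady-state query time is to run a few untimed warm-up queries first and then time each query individually. This Python sketch takes any callable as `run_query` (a hypothetical stand-in for whatever executes one search); it does not depend on Lucene.

```python
import time

def time_queries(run_query, queries, warmup=3):
    """Return per-query wall-clock times (seconds) after untimed warm-up runs.

    Warm-up lets caches (and, for a JVM-backed searcher, JIT compilation)
    settle before anything is measured.
    """
    for q in queries[:warmup]:
        run_query(q)  # untimed warm-up
    timings = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        timings.append(time.perf_counter() - start)
    return timings

timings = time_queries(lambda q: sum(range(10_000)), ["a", "b", "c", "d"])
print(f"median: {sorted(timings)[len(timings) // 2]:.6f}s")
```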
Hi,
I combed through the API and some of the mailing list. I need
to get the id of a Document just added. How should this be done?
I'm using Lucene 2.3.2.
thank you,
Darren
Yeah, you are right. Was looking for a lazy way to avoid writing 5 lines
of code. Hehe.
Thanks,
Darren
On Sat, 2008-08-16 at 10:44 -0400, Mark Miller wrote:
> Darren Govoni wrote:
> > Hi,
> > I combed through the API and some of the mailing list. I need
> > to get the
Hi,
Sorry if I missed this somewhere or maybe it's not released yet, but I
was anxiously curious about lucene 3.0's expected features/improvements.
Is there a list yet?
thanks!
Darren
:59 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> >
> > 27 aug 2008 kl. 00.52 skrev Darren Govoni:
> >
> > Hi,
> >> Sorry if I missed this somewhere or maybe its not released yet, but I
> >> was anxiously curious about lucene 3.0's expected fe
threads?
thanks for any tips! You guys rock.
Darren
Glen,
Thank you for the details there. It's really great what you've done,
and I will study it some more! I too thought about using multiple writers
into separate indexes and then combining them into one and optimizing,
but haven't tried it yet.
Darren
On Fri, 2008-10-10 at 22:17 -
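The split-then-combine idea (in Lucene terms, separate IndexWriters whose indexes are later combined, e.g. with something like addIndexes followed by an optimize) can be illustrated with a toy in-memory index in Python. Note that Python threads won't actually speed up CPU-bound indexing because of the GIL; the sketch only shows the structure: build shards independently, then merge their postings.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def build_shard(docs):
    """Index one shard: term -> sorted doc ids. `docs` is a list of (id, text)."""
    shard = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            shard[term].add(doc_id)
    return {t: sorted(ids) for t, ids in shard.items()}

def merge_shards(shards):
    """Combine per-shard postings into a single term -> sorted doc id list."""
    merged = defaultdict(set)
    for shard in shards:
        for term, ids in shard.items():
            merged[term].update(ids)
    return {t: sorted(ids) for t, ids in merged.items()}

def parallel_index(docs, n_shards=2):
    """Round-robin docs into shards, build each in a worker, then merge."""
    chunks = [docs[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        shards = list(pool.map(build_shard, chunks))
    return merge_shards(shards)

index = parallel_index([(1, "red fox"), (2, "blue fox"), (3, "red hen")])
print(index["fox"], index["red"])  # [1, 2] [1, 3]
```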
Congratulations!
A truly stellar achievement.
Can't wait to dive in!
On Sat, 2008-10-11 at 11:50 -0400, Michael McCandless wrote:
> Release 2.4.0 of Lucene is now available!
>
> With 2.4.0 we have relaxed the backwards compatibility policy of the
> Fieldable interface: we now allow changes on
?
thank you for any help. I will keep reading/looking.
Darren
in the results they got back. Sort of
like latent relationships.
Does that help?
I thought this could be done using term frequency vectors in Lucene, but
I've never used TFVs before. The counts could then be limited to just a set of
results.
HTH,
Darren
On Thu, 2008-10-16 at 14:09 -0400, G
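The aggregation step behind a result-set tag cloud can be sketched in Python: sum term counts over only the documents in the result set and keep the most frequent terms. Here plain token counts stand in for per-document term frequency vectors; in Lucene the counts would come from stored term vectors instead of re-tokenizing text.

```python
from collections import Counter

def tag_cloud(result_texts, stopwords=frozenset(), top_n=10):
    """Aggregate term counts over a result set; return the top_n (term, count) pairs."""
    counts = Counter()
    for text in result_texts:
        counts.update(t for t in text.lower().split() if t not in stopwords)
    return counts.most_common(top_n)

results = ["lucene index", "lucene search", "Lucene index the docs"]
print(tag_cloud(results, stopwords={"the"}, top_n=2))  # [('lucene', 3), ('index', 2)]
```

Scaling each count into a font size (or a shade) gives the "shades of gray" that discrete clusters lack.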
does, but even with it, the
clusters are more discrete than a tag cloud which has "shades of gray".
Darren
On Thu, 2008-10-16 at 17:39 -0400, Glen Newton wrote:
> See also:
> http://zzzoot.blogspot.com/2007/10/drill-clouds-for-search-refinement-id.html
> and
> http://zzzoo
rious if
Lucene made this easier with information built into the Document objects
(which would be logical to me).
Darren
On Thu, 2008-10-16 at 17:37 -0400, Glen Newton wrote:
> Yes, tag clouds.
>
> I've implemented them using Lucene here for NRC Research Press articles:
> http:
Has anyone gotten some initial performance observations about
instantiated index?
I replaced my RAMDirectory searcher with one and it was slower or about
the same. The note about it claims 100x possible performance
improvement. Maybe there is a data size beyond which its performance
excels.
thank
My two cents is no, not to use lucene as a primary datastore. Although
there are some datastores that look similar to lucene who define
themselves as primary datastores (the 'nosql' style datastores), I would
put lucene beside the likes of RRD and other special-purpose
information stores th
If you are going to end up either copying or moving all the data to lucene
(even if you hook lucene up to the existing mysql data, it will still
create its own copy of the data), you might really want to look at other
options:
*column oriented databases (analytical databases). If ope
Hey all,
As you can tell by the subject, I'm interested in 'name searching' and
'nearby name' searching. Scenarios include Genealogy and
Similar-Person-from-Different-Datasources matchings. Assuming
java-based lucene, and more than likely the Solr project.
*nickname: would it be feasible to create
Thank you for the link to the previous thread, lot of information there!
*Synonym use of nicknames - that sounds quite feasible. Do you
specifically mean the WordNet module in the Sandbox, or something
different?
> -Original Message-
> From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
>
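Treating nicknames as synonyms boils down to bidirectional equivalence groups: any name in a group expands to the whole group, at index or query time. A Python sketch with a tiny, made-up nickname table (a real deployment would load a curated list into the synonym file instead):

```python
# Hypothetical nickname table for illustration only.
NICKNAMES = {
    "william": {"bill", "will", "billy", "liam"},
    "robert": {"bob", "rob", "bobby"},
}

def build_name_groups(table):
    """Map every name (formal or nickname) to its full equivalence group."""
    groups = {}
    for formal, nicks in table.items():
        group = {formal} | set(nicks)
        for name in group:
            groups.setdefault(name, set()).update(group)
    return groups

def expand_name(name, groups):
    """All equivalent names for `name`, or just the name itself if unknown."""
    name = name.lower()
    return sorted(groups.get(name, {name}))

groups = build_name_groups(NICKNAMES)
print(expand_name("Bob", groups))  # ['bob', 'bobby', 'rob', 'robert']
print(expand_name("Zed", groups))  # ['zed']
```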
One side note: various content management tools already handle a lot
of data extraction (POI/PDFBox/etc).
In the case of Jakarta Slide and Apache Jackrabbit, both use Lucene
under the covers to index this data.
Not sure if you want to take the approach of putting your documents as
'managed' und
best approach to accomplish this. I am also currently on Lucene 3.6 but am
looking to upgrade to 4.2.
Thanks in advance.
Darren Hoffman
the highlighted version.
>
> Best
> Erick
>
> On Sat, Apr 6, 2013 at 11:57 PM, Darren Hoffman wrote:
>> I am creating a Bible search app that indexes each verse of the bible as a
>> separate document. When a user selects a verse from search results, I am
Thanks, Erick. I'll try that.
Darren
On 2013-04-07 3:25 PM, "Erick Erickson" wrote:
>Well, at that point you have a doc ID presumably. When you format your
>responses to the initial query, the link you provide for each verse is
>something like
>
>yourse
using IntelliJ to build the APK file with the separate Lucene
library JARs.
Thanks,
Darren
On 8/12/13 1:02 AM, "Lingviston" wrote:
>Hi, I'm trying to use Lucene in my Android project. To start with I've
>created a small demo app. It works with .txt files but I need to w
trying to upgrade to 4.4 but IntelliJ
does not currently support SPI services.
Does 4.4 offer substantial performance improvements that I should take the
time to upgrade and work around the IntelliJ shortfall?
Thanks,
Darren
'd say so. The CHANGES.txt is where
>I'd
>look to see if anything mentioned is worth your time.
>
>Not to mention SolrCloud...
>
>Erick
>
>
>On Fri, Sep 6, 2013 at 3:41 PM, Darren Hoffman wrote:
>
>> I am using the SmartChineseAnalyzer in v3.6 but accessing o
that
does not return results in "natural order" has much larger documents even
though the number of documents is about the same magnitude.
I am currently using version 3.6.
Thanks in advance,
Darren
I asked here[1] and it said "Ask again later."
[1] http://8ball.tridelphia.net/
On 12/06/2011 08:46 PM, Jamie Johnson wrote:
Thanks Robert. Is there a timetable for that? I'm trying to gauge
whether it is appropriate to push for my organization to move to the
current lucene 4.0 implementation
structor that accepts a codec. The exception is being thrown when I try
to instantiate IndexWriterConfig.
Thank you,
Darren Hoffman