Thanks a lot!
Liat
On 14 February 2012 20:06, Uwe Schindler wrote:
> Hi,
>
> To merge TopDocs with "compatible scores", you can use the new Lucene
> (since 3.3) method: TopDocs.merge(). Just execute the query on different
> indexes with the same topDocs count and then call this method.
>
> Uwe
>
> ---
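For reference, Uwe's suggestion might look roughly like this (a minimal sketch against the Lucene 3.3+ API; the index paths, field name, and query term are placeholders, not from the thread):

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class MergeTopDocsSketch {
    public static void main(String[] args) throws Exception {
        IndexSearcher s1 = new IndexSearcher(IndexReader.open(FSDirectory.open(new File("index1"))));
        IndexSearcher s2 = new IndexSearcher(IndexReader.open(FSDirectory.open(new File("index2"))));
        Query q = new TermQuery(new Term("contents", "lucene"));

        int topN = 200; // same topDocs count on every index
        TopDocs[] shardHits = { s1.search(q, topN), s2.search(q, topN) };

        // A null Sort merges by descending score, as if one combined index had been searched.
        TopDocs merged = TopDocs.merge(null, topN, shardHits);
        System.out.println("total hits: " + merged.totalHits);
    }
}
```

Scores from different indexes are only comparable when their term statistics are similar, which is the point of Uwe's "compatible scores" caveat.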
Hi,
I have a similar problem - only in my case, the query is the same, but I
use different indexes with different field names (this is why I need to
separate the searches). Is it possible to merge these two sets of results?
Thanks a lot,
Liat
On 14 February 2012 19:46, Uwe Schindler wrote:
> Sco
Ok, thanks a lot
On 15 June 2011 11:36, Ian Lea wrote:
> Don't think so. The boost info is encoded and stored at index time.
>
>
> --
> Ian.
>
>
> On Wed, Jun 15, 2011 at 10:42 AM, liat oren wrote:
> > Hi,
> >
> > I indexed 4 million documents a
Hi,
I indexed 4 million documents and used boosting factors for each document at
indexing time.
I would like to cancel that boosting. Is there a way to do that without
re-indexing all of them?
Many thanks,
Liat
The simplest way would be a CollectorDelegate that wraps an existing
> collector and checks a boolean before calling the delegate's collect
> method.
>
> simon
>
> On Mon, May 23, 2011 at 8:09 AM, liat oren wrote:
> > Thank you very much.
> >
> > So the best solution w
tml
>
> simon
> >
> > --dho
> >
> > 2011/5/22 Simon Willnauer :
> >> you can impl. your own collector and notify the collector to stop if you
> need to.
> >> simon
> >>
> >> On Sun, May 22, 2011 at 12:06
Hi Everyone,
Is there a way to stop a multi search in the middle?
Thanks a lot,
Liat
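A sketch of the wrapping collector Simon describes (against the Lucene 3.x Collector API; the class and exception names are made up for illustration):

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Wraps an existing Collector and checks a volatile "stop" flag on every
// hit; throwing an unchecked exception aborts the running search.
public class StoppableCollector extends Collector {
    public static class StopSearchException extends RuntimeException {}

    private final Collector delegate;
    private volatile boolean stopped = false;

    public StoppableCollector(Collector delegate) { this.delegate = delegate; }

    public void stop() { stopped = true; } // call from another thread

    @Override public void setScorer(Scorer scorer) throws IOException {
        delegate.setScorer(scorer);
    }
    @Override public void setNextReader(IndexReader reader, int docBase) throws IOException {
        delegate.setNextReader(reader, docBase);
    }
    @Override public void collect(int doc) throws IOException {
        if (stopped) throw new StopSearchException();
        delegate.collect(doc);
    }
    @Override public boolean acceptsDocsOutOfOrder() {
        return delegate.acceptsDocsOutOfOrder();
    }
}
```

Catch StopSearchException around searcher.search(query, collector); Lucene's own TimeLimitingCollector uses the same abort-by-exception idea for time-based limits.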
that is not the case
> then you'd need another solution.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Mon, Feb 14, 2011 at 11:52 PM, liat oren wrote:
>
> > Hey All,
> >
> > I try to construct a boolean query that has to run on 3 dif
Hey All,
I am trying to construct a boolean query that has to run on 3 different sets of
indexes:
in two of them, it should query a field named "contents", and in one of them,
it should query a field named "text".
How can I use MultiSearcher to support this structure?
Thanks a lot,
Liat
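Since MultiSearcher assumes one query for all sub-indexes, one workaround (a sketch with hypothetical names; it assumes you are willing to merge the hits yourself) is to run a per-index query against the field each index actually has:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class PerIndexFieldSearch {
    // Same search text, but a different field name for each index.
    public static TopDocs[] searchAll(IndexSearcher[] searchers, String[] fields,
                                      String text, int topN) throws Exception {
        TopDocs[] results = new TopDocs[searchers.length];
        for (int i = 0; i < searchers.length; i++) {
            Query q = new TermQuery(new Term(fields[i], text));
            results[i] = searchers[i].search(q, topN);
        }
        return results; // raw scores across different indexes are not directly comparable
    }
}
```

For the case above this would be called with fields {"contents", "contents", "text"} across the three searchers.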
Hi,
Is it possible to use WhitespaceAnalyzer in one field and another analyzer
in a different field?
If it is, how should it be written?
Many Thanks,
Liat
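Yes - PerFieldAnalyzerWrapper does exactly this. A sketch (field names and the Lucene version constant are placeholders; in 3.x the wrapper lives in org.apache.lucene.analysis):

```java
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class PerFieldAnalyzerSketch {
    public static PerFieldAnalyzerWrapper build() {
        // Fields not registered explicitly fall back to the default analyzer.
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_36));
        wrapper.addAnalyzer("ids", new WhitespaceAnalyzer(Version.LUCENE_36));
        return wrapper; // pass the same wrapper to IndexWriter and QueryParser
    }
}
```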
Hi,
I need to give the user the total number of results when running a query.
Currently I use the TopDocCollector to get the top 200 documents.
How can I know the total number of results?
Thanks a lot,
Liat
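TopDocs already carries this: totalHits is the full match count even when only the top N ScoreDocs are returned. A sketch (searcher and query assumed to come from the surrounding code):

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TotalHitsSketch {
    public static int totalMatches(IndexSearcher searcher, Query query) throws Exception {
        TopDocs topDocs = searcher.search(query, 200);
        ScoreDoc[] top200 = topDocs.scoreDocs; // at most 200 entries
        return topDocs.totalHits;              // total number of matching documents
    }
}
```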
---
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: liat oren [mailto:oren.l...@gmail.com]
> > Sent: Sunday, September 12, 2010 2:04 PM
> > To: java-user@l
Hi,
I am running a query and get some unexpected results.
When I run a boolean query on a text field for the word X, using occur =
SHOULD, the results contain the word X.
However, when I add another boolean query on another field (country) for the
word Y, using occur = MUST, in the results I get o
Ok, thanks, I will try that
On 6 July 2010 11:57, Uwe Schindler wrote:
> Both must be "must", else it makes no sense.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > ---
_do_a_.22search_within_search.22.2C_so_that_the_second_search_is_constrained_by_the_results_of_the_first_query.3F
>
> --
> Ian.
>
>
> On Tue, Jul 6, 2010 at 9:25 AM, liat oren wrote:
> > Hi all,
> >
> > Is it possible to run a search over top 100,000 (for example)
Hi all,
Is it possible to run a search over the top 100,000 (for example) results of a
prior search?
So the user first runs a search and gets results; if they press the search
button again, I would like the new search to run only over the top 100,000
results of the first.
Thanks,
Liat
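One way to constrain the second search by the first (a sketch; note it restricts to all documents matching the first query, not just its top 100,000 - capping at exactly N would need a custom collector that records the top-N doc IDs into a bitset filter):

```java
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TopDocs;

public class SearchWithinSearchSketch {
    public static TopDocs refine(IndexSearcher searcher, Query first, Query second)
            throws Exception {
        // Cache the first query's matches so repeated refinements stay cheap.
        Filter firstPass = new CachingWrapperFilter(new QueryWrapperFilter(first));
        return searcher.search(second, firstPass, 200);
    }
}
```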
de to a current version of Lucene and
> reindex everything with the date as a NumericField.
>
>
> --
> Ian.
>
>
> On Tue, Jun 29, 2010 at 8:32 AM, liat oren wrote:
> > Hi all,
> >
> > I made a mistake, finished indexing all my database (millions o
Hi all,
I made a mistake: I finished indexing my whole database (millions of
documents...) while treating the date fields as ordinary fields.
Instead of doing:
doc.add(Field.Keyword("indexDate", new Date()));
I added it as:
doc.add(new Field("indexDate", String.valueOf(new Date().toString()),
Field.Store.YES, Fiel
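The fix Ian refers to, sketched against the NumericField API (available since Lucene 2.9; the field name follows the thread):

```java
import java.util.Date;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;

public class DateFieldSketch {
    public static void addIndexDate(Document doc) {
        // Store the date as a long (epoch millis) so NumericRangeQuery
        // can run efficient date-range searches over it.
        doc.add(new NumericField("indexDate", Field.Store.YES, true)
                .setLongValue(new Date().getTime()));
    }
}
```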
Thanks a lot!
Have a nice day,
Liat
On 25 May 2010 15:39, Ian Lea wrote:
> org.apache.lucene.misc.ChainedFilter in contrib.
>
>
> --
> Ian.
>
>
>
> On Tue, May 25, 2010 at 1:15 PM, liat oren wrote:
> > Hi,
> >
> > I would like to have mor
Hi,
I would like to have more than 1 filter on a query - I have two range
filters, and some filters on other fields.
What is the best way to do it?
Many thanks,
liat
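The ChainedFilter Ian points to combines any number of filters; a sketch (the three filters are assumed to exist already, and ChainedFilter lives in contrib/misc):

```java
import org.apache.lucene.misc.ChainedFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class CombinedFilterSketch {
    public static TopDocs search(IndexSearcher searcher, Query query,
                                 Filter range1, Filter range2, Filter fieldFilter)
            throws Exception {
        // AND semantics: a document must pass every filter in the chain.
        Filter combined = new ChainedFilter(
            new Filter[] { range1, range2, fieldFilter }, ChainedFilter.AND);
        return searcher.search(query, combined, 200);
    }
}
```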
n), and then try to recover the index by re-running
> CheckIndex with -fix. But, more importantly, you need to figure out
> why your filesystem has developed corruption because it will likely
> happen again!
>
> Mike
>
> On Thu, Apr 15, 2010 at 3:31 AM, liat oren wrote:
> >
Hi All,
I got the following error while trying to optimize an index of 31 GB:
Exception in thread "Lucene Merge Thread #3"
P.Lucene.Expert.index.MergePolicy$MergeException: java.io.IOException: Data
error (cyclic redundancy check)
at
P.Lucene.Expert.index.ConcurrentMergeScheduler.handleMerg
Hi Joel,
I encounter the same problem.
Could you please elaborate a bit on this?
Many thanks,
Liat
2009/11/2 Joel Halbert
> I opted to use the following query to solve this problem, since it meets
> my requirements, for the time being.
>
> +(cheese sandwich) "cheese sandwich"~slop
>
> This inc
2009/7/13 Erick Erickson
> What are you trying to do? I think you'd get a better response if you
> explained what higher-level task/feature you're trying to
> implement.
>
> Best
> Erick
>
> On Mon, Jul 13, 2009 at 4:54 AM, liat oren wrote:
>
> > Hi
Hi all,
I have a list of synonyms for every word.
Is there a good way to use these synonyms?
Currently I use a boost query so if 'a' is the queried word, and 'b' (0.5)
and 'c' (0.2) are its synonyms, I query for:
a^1 + b^0.5 + c^0.2.
Is there a better way of doing it?
Thanks,
Liat
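For reference, the a^1 b^0.5 c^0.2 expansion built programmatically rather than through the query parser (the field name is a placeholder):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class SynonymBoostSketch {
    public static BooleanQuery expand() {
        BooleanQuery q = new BooleanQuery();
        TermQuery a = new TermQuery(new Term("contents", "a")); // boost defaults to 1.0
        TermQuery b = new TermQuery(new Term("contents", "b"));
        TermQuery c = new TermQuery(new Term("contents", "c"));
        b.setBoost(0.5f); // synonym weight
        c.setBoost(0.2f);
        q.add(a, Occur.SHOULD);
        q.add(b, Occur.SHOULD);
        q.add(c, Occur.SHOULD);
        return q;
    }
}
```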
Ok, thanks a lot - I will try that tomorrow
Best,
Liat
2009/6/30 Simon Willnauer
> Hi,
> On Sun, Jun 28, 2009 at 2:39 PM, liat oren wrote:
> > Hi,
> >
> > I have an index that is a multi-segment index (how come it is created
> this
> > way?)
> >
> >
Ohh, right. It resolves the problem I mentioned in the second email I sent.
However, in the first mail I sent, the "current" of the multi-segment reader
is null, which causes that problem.
Thanks
Liat
2009/6/30 Simon Willnauer
> On Mon, Jun 29, 2009 at 9:55 AM, liat oren wrote:
> >
lucene-2.4.1
Thanks,
Liat
2009/6/29 Simon Willnauer
> Quick question, which version of lucene do you use?!
>
> simon
>
> On Mon, Jun 29, 2009 at 9:55 AM, liat oren wrote:
> > The full error is:
> > Exception in thread "main" java.la
ates to this one??
Though it is closed since 2007.
Hope anyone can help with it - even if I try
double totalFreqT = ir.termDocs().freq(); - to get the freq using termDocs
of a multi-segment, I get the same error..
Thanks a lot,
Liat
2009/6/28 liat oren
> Hi,
>
> I have an index that is a
Hi,
I have an index that is a multi-segment index (how come it is created this
way?)
When I try to get the freq of a term in the following way:
TermDocs tDocs = this.indexReader.termDocs(term);
tf = tDocs.freq();
the freq method:
public int freq()
{
return current.freq();
}
is in
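For what it's worth, freq() is only valid after next() has positioned the enumeration on a document; before that the multi-segment reader's internal "current" is null, which matches the error described above. Summing a term's total frequency would look like this (a sketch against the 2.4 TermDocs API):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class TermFreqSketch {
    public static long totalFreq(IndexReader reader, Term term) throws Exception {
        TermDocs tDocs = reader.termDocs(term);
        long total = 0;
        while (tDocs.next()) {     // must advance before calling freq()
            total += tDocs.freq(); // occurrences of the term in the current doc
        }
        tDocs.close();
        return total;
    }
}
```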
fernt weight for different terms.
2009/5/26 Grant Ingersoll
> What's a BoostingBooleanQuery?
>
>
> On May 24, 2009, at 7:09 AM, liat oren wrote:
>
> Hi,
>> I have an index of 3 million documents.
>> I perform a regular search, using an analyzer and get the r
Hi,
I have an index of 3 million documents.
I perform a regular search, using an analyzer and get the results within 1-2
minutes.
When I create a BoostingBooleanQuery and search within the index using a
similarity whose scorePayload returns the boosting value, the search
takes about 10 minutes.
rrelevant document could be first. You want to use
> a HitCollector, see the link in my last e-mail. That link includes an
> example of using a bitset, which you can create pretty easily from your
> list of document IDs.
>
> Best
> Erick
>
> On Mon, May 18, 2009 at 2:55 AM
sets bits for docs in your list. See:
>
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/HitCollector.html#collect(int,%20float)
>
> Best
> Erick
>
>
> On Sun, May 17, 2009 at 3:57 AM, liat oren wrote:
>
> > Yes, this is what I need - I don't n
the scores reflect all the documents. But you don't
> care because scores are not relevant between different queries, and
> if they are calculated only within the query you're running, all the
> documents returned have scores that rank them relative to each other.
>
> Bes
uments, couldn't you form a Filter? Then
> your results would only be the documents you care about.
>
> If this is irrelevant, perhaps you could explain a bit more about
> the problem you're trying to solve.
>
> Best
> Erick
>
> On Thu, May 14, 2009 at 5:03 AM, l
Hi,
I have a big index and I want to get, for a specific search, only the scores
of a list of documents.
Is there a better way to get these scores than looping over the whole result
set?
Thanks,
Liat
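One sketch of Erick's bitset suggestion (3.x Filter API; it assumes the doc IDs are top-level IDs of an unsegmented reader, otherwise per-segment docBase offsets would have to be handled):

```java
import java.util.BitSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.DocIdBitSet;

public class DocListScoresSketch {
    public static TopDocs scoreOnly(IndexSearcher searcher, Query query, final BitSet wanted)
            throws Exception {
        // Only the documents whose bits are set ever get scored/returned.
        Filter docFilter = new Filter() {
            @Override
            public DocIdSet getDocIdSet(IndexReader reader) {
                return new DocIdBitSet(wanted);
            }
        };
        return searcher.search(query, docFilter, Math.max(1, wanted.cardinality()));
    }
}
```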
() after
> each addition? This way if a crash happens, the index will just
> fallback to the last commit.
>
> Mike
>
> On Wed, May 13, 2009 at 3:08 AM, liat oren wrote:
> > Hi all,
> >
> > I ran a code that joins a list of indexes to one index.
> > The commi
No, as I wrote above:
for finlin, 6621468 * 6 and 5265266 * 12 (I use payloads for this),
and for TTD, 6621468 * 3 (I use payloads for this).
I search for 6621468 * 3, and finlin gets the higher score.
2009/5/13 Grant Ingersoll
>
> On May 13, 2009, at 3:04 AM, liat oren wrote:
>
>
Hi all,
I ran a code that joins a list of indexes to one index.
The commit and close of the writer is done when we finish looping on the
list of the original paths.
An error occurred before it reached the commit part.
The index size is big, so it looks like the data is there, but when I check
th
documents
(words) that contain 6621468?
I don't think this is the case, as I checked and the index doesn't have 35433
documents that contain 6621468 or 5265266
2009/5/11 Grant Ingersoll
>
> On May 10, 2009, at 5:59 AM, liat oren wrote:
>
>>
>> The output is the fo
and in TTD, its 20?
I didn't set any fieldNorm or DocumentNorm.
Thanks a lot,
Liat
2009/5/7 Grant Ingersoll
> Hi Liat,
>
> Can you post the code you are using to generate the info below?
>
> -Grant
>
>
> On May 3, 2009, at 11:43 PM, liat oren wrote:
>
> I look
) fieldWeight(worlds:66 in 1), product of:
2.1213202 = (MATCH) btq, product of:
0.70710677 = tf(phraseFreq=0.5)
3.0 = scorePayload(...)
0.7768564 = idf(worlds: 66=4)
1.0 = fieldNorm(field=worlds, doc=1)
Thanks again,
Liat
2009/5/3 liat oren
> Hi,
>
> I try to debug
Hi,
I am trying to debug a boosting query.
Is there a way to see the term boost in the documents? I see them in spans
in BoostingTermQuery, yet, from there I can't see which document I am in.
If I want to copy some of the document in an index that saves the boosting -
how can it be done?
The problem I am
Please see in a new thread - Boosting query - debuging
Thanks!
2009/5/2 Chris Hostetter
>
> : Sorry, you can see the script below:
>
> uh ... ok. so now you've posted a bunch of your code, but you still
> haven't addresed the root of what Erick and I were both getting at...
>
> : > Erick means
rg/lucene-java/ImproveIndexingSpeed.
>
>
> --
> Ian.
>
>
> On Thu, Apr 30, 2009 at 8:29 AM, liat oren wrote:
> > Hi,
> >
> > I noticed that when I start to index, it indexes 7 documents a second.
> After
> > 30 minutes it goes down to 3 documents a second.
> >
Hi,
I noticed that when I start to index, it indexes 7 documents a second. After
30 minutes it goes down to 3 documents a second.
After two hours it became very slow (I stopped it when the index reached 320MB
and it was doing 1 document in almost a minute).
As you can see, it happens only after 2000, 3000 doc
t; Murat Yakici
> Department of Computer & Information Sciences
> University of Strathclyde
> Glasgow, UK
> ---
> The University of Strathclyde is a charitable body, registered in Scotland,
> with registration number SC015263.
>
>
&
Yes, I agree with you - I also tried this approach in the past and it was
terribly slow - looping over the term vectors.
What I have done is divide the indexes into steps - which, of course, if it
can be avoided, would be more than great!!
As for my problem - it was a code problem, I solved it, th
Yes, for this specific part, I have this prior knowledge which is based on a
training set.
About the things you raise here, there are two things you might mean, I am
not sure:
1. If you don't have that "prior" knowledge, then all it means is that you
need to modify the formula of the score, no? To give mo
Thanks, Murat.
It was very useful - I also tried to override IndexWriter and
DocumentsWriter instead, but it didn't work well. DocumentsWriter can't be
overridden.
So, I didn't find a better way to make the changes.
What I need is for every term to have different values in different documents.
So, l
Dear Murat,
I saw your question and wondered how you implemented these changes.
The requirement below are the same ones as I am trying to code now.
Did you modify the source code itself or only used Lucene's jar and just
override code?
I would very much appreciate it if you could give me a short
ry.
> Regarding the synonyms - it looks quite OK to me. Maybe you should try to
> use ony Occur.MUST for all TermQuery instances. A simple debugging should
> also give you some clue about what is the problem.
> Good luck, Eran.
>
> On Wed, Apr 22, 2009 at 1:52 PM, liat o
t is the problem?
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
Process exited.
Thanks,
Liat
2009/4/21 Eran Sevi
> Hi,
>
> You might want to take a look at Payloads. If you know the freq
each synonym,
> but set the boost of the TermQuery according to the synonym score.
> This is also where you could "punish" synonyms compared to the
> original word. This will only help with queries with a construction API
> that takes (sub) queries as input (so it will not help
g (?) values for term
> frequencies.
> If you can explain what you are trying to solve, people on the list may
> be able to suggest such alternatives.
> - Doron
>
> On Sun, Apr 19, 2009 at 2:39 PM, liat oren wrote:
>
> > Hi,
> >
> > I would like to be able to set th
Sorry, you can see the script below:
Thanks
// Index Method
**/
public void index(DoubleMap doubleMap, String dirPath, String originalPath)
throws IOException
{
File f = new File(dirPath);
IndexWriter writer = null;
if(f.exists(
Hi,
I saw a very old thread that suggests an implementation for Synonyms that
takes into account different weights for different synonyms and gives a
penalty factor to synonyms, to avoid ranking documents with the synonyms
above documents with the original words.
http://mail-archives.apache.org/mod
Hi,
I would like to be able to set the term freq to different values at index
time, or at search time.
So if a document has the following text: 1 2, the freq of 1 will get 100 and
the freq of 2 will get 200. I want to avoid expanding it by writing 1 100
times.
I looked at Similarity class and wan
>
> Best
> Erick
>
> On Fri, Apr 17, 2009 at 1:07 AM, liat oren wrote:
>
> > Thanks for the answer.
> >
> > In Luke, I used the WhiteSpaceAnalyzer as well. The scores AND the
> explain
> > method worked perfectly.
> >
> > In my appli
also has an
> explain (tab?) that will show you what Luke does, which may
> be useful.
>
> The default operator should be "OR", but looking at the actual
> query should help you figure out whether that's happening.
>
> Best
> Erick
>
>
> On Thu, Apr 16, 2
I wanted to add also that I index it tokenized and that when I use Luke to
do this search, it gives the correct results.
Should I run the query differently than the way I do?
2009/4/16 liat oren
> Hi,
>
> I try to understand why the following query gives the scoring below:
>
> d
Hi,
I try to understand why the following query gives the scoring below:
document 1: a b c
document 2: g k a h u c
0.0 = (NON-MATCH) product of:
0.0 = (NON-MATCH) sum of:
0.0 = coord(0/3)
0.06155877
The query code is:
IndexSearcher searcher = new IndexSearcher(path);
Analyzer analy
of the synonyms? I guess I would start smaller, say
> maybe 3, and then evaluate your results with different numbers.
>
>
> On Mar 22, 2009, at 2:40 PM, liat oren wrote:
>
> Ok, thanks. I will look how to use it.
>>
>> 10 words look too many? How many would you suggest?
&g
Ok, thanks. I will look how to use it.
10 words look too many? How many would you suggest?
Thanks again,
Liat
2009/3/19 Grant Ingersoll
>
> On Mar 19, 2009, at 5:13 AM, liat oren wrote:
>
> I am looking for a quick solution to expand queries so they will look for
>> synon
Ingersoll
>
> On Mar 17, 2009, at 5:44 AM, liat oren wrote:
>
> Thanks for all the answers.
>>
>> I am new to Lucene and in the emails its the first time I heard of the
>> bigrams and thus read about them a bit.
>>
>> Question - if I query for "cat animal&qu
r maybe your classpathentry is some Windows thing -
> in which case I can't help.
>
>
> --
> Ian.
>
>
> On Tue, Mar 17, 2009 at 10:55 AM, liat oren wrote:
> > Hi Ian,
> >
> > Thanks for the answer.
> > Yes, I meant running in from command l
the command line e.g. $ java
> -jar lukexxx.jar, it simply sounds like your classes are not on the
> classpath. Add them.
>
>
> --
> Ian.
>
>
> On Tue, Mar 17, 2009 at 10:20 AM, liat oren wrote:
> > Hi,
> >
> > I edited Luke's code so it also uses
Hi,
I edited Luke's code so it also uses my classes (I added the jar to the
class-path and put it in the lib folder).
When I run it from Java, it works fine.
Now I try to build it and invoke Luke's jar outside Java, and I get the
following error:
Exception in thread "main" java.lang.NoClassDefFoundError
ed--which
> >> reduces the potential # of bi-grams in your data set by a factor of
> >> 1/2.
> >>
> >> -Babak
> >>
> >> Tangent: Liat's example brings up an interesting issue about n-grams,
> >> namely that indexing only internally s
>
> You can always bump the allowed, see BooleanQuery.setMaxClauseCount()
>
> Best
> Erick
>
> On Mon, Mar 16, 2009 at 6:52 AM, liat oren wrote:
>
> > Hi,
> >
> > I try to search a long query and get the following error:
> > org.apache.lucene.queryP
Hi,
Does anyone have an idea of how to make it work?
Many thanks,
Liat
2009/3/9 liat oren
> I have an index that has for every two words a score.
> I would like my analyzer - that is a combination of whitespace tokenizer, a
> stop words analyzer and stemming.
>
> The regular score
Hi,
I try to search a long query and get the following error:
org.apache.lucene.queryParser.ParseException: Cannot parse too many
boolean clauses
Is there any way for Lucene to receive a long query?
Thanks,
Liat
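The usual workaround (the one Erick points to in the reply below) is to raise the clause limit; it trades memory and scoring speed for the ability to run very long queries:

```java
import org.apache.lucene.search.BooleanQuery;

public class MaxClauseSketch {
    public static void raiseLimit() {
        // The default is 1024; TooManyClauses is thrown when a (rewritten)
        // BooleanQuery exceeds this count.
        BooleanQuery.setMaxClauseCount(10000);
    }
}
```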
in the problem you are trying to solve at a higher level (instead
> of the current solution)? I imagine it is something related to
> co-occurrence analysis.
>
>
>
> On Mar 8, 2009, at 8:05 AM, liat oren wrote:
>
> Hi Grant,
>>
>> No, you can only have two words -
Thanks!
2009/3/9 Andrzej Bialecki
> liat oren wrote:
>
>> Yes, I changed it to TOKENIZED and its working now, Thanks!
>>
>> About Luke, what do you mean by saying that the analyzer is in the
>> classpath?
>> It exists in a package in my computer - it also
Yes, I changed it to TOKENIZED and it's working now, thanks!
About Luke, what do you mean by saying that the analyzer is on the
classpath?
It exists in a package on my computer - it also has its filter and other
classes. How can it be used in Luke?
2009/3/8 Andrzej Bialecki
> liat oren wr
0) and animal 5
times(0.5*10).
But I hope to have another better solution.
Thanks
2009/3/8 Grant Ingersoll
> Hi Liat,
>
> Some questions inline below.
>
> On Mar 8, 2009, at 5:49 AM, liat oren wrote:
>
> Hi,
>>
>> I have scores between words, for example - d
Hi,
I have scores between words, for example - dog and animal have a score of
0.5 (and not 0), dog and cat have a score of 0.2, etc.
These scores are stored in an index:
Doc1: field words: dog animal
field score: 0.5
Doc2: field words: dog cat
field score: 0.2
If the user searc
*strongly* advise that you get a copy of Luke. It is a wonderful
> tool
> that allows you to examine your index, analyze queries, test queries, etc.
>
> But be aware that the site that maintains Luke was having problems
> yesterday,
> look over the user list messages from yeste
Hi,
I would like to do a search that will return documents that contain a given
word.
For example, I created the following index:
IndexWriter writer = new IndexWriter("C:/TryIndex", new StandardAnalyzer());
Document doc = new Document();
doc.add(new Field(WordIndex.FIELD_WORLDS, "111 222 333", F
d on that score, might do the trick for you.
>
> Other boosting and custom score options include BoostingQuery,
> BoostingTermQuery and CustomScoreQuery.
>
>
> A google search for "lucene boosting" throws up lots of hits.
>
>
> --
> Ian.
>
>
>
> O
Hi,
I would like to add to lucene's score another factor - a score between
words.
I have an index that holds couple of words with their score.
How can I take it into account when using Lucene search?
Many thanks,
Liat
inal indexes.
>
> Maybe a higher-level problem statement would help generate
> more suggestions.
>
> Best
> Erick
>
> On Thu, Feb 26, 2009 at 7:07 AM, liat oren wrote:
>
> > Hi,
> >
> > I have two indexes, each has a tokenized field and I would lik
Hi,
I have two indexes, each has a tokenized field and I would like to combine
them both into one field in a new index.
How can it be done?
(Is this a good approach, or is it better to hold them as untokenized text
and tokenize it only when I create the new index?)
Many thanks,
Liat