> I'm indexing logs from a transaction-based application.
> ...
> millions documents per month, the size of the indices is ~35 gigs per month
> (that's the lower bound). I have no choice but to 'store' each field values
> (as well as indexing/tokenizing them) because I'll need to retrieve them in
Hi Martin,
----- Original Message -----
From: Martin Braun <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, October 23, 2006 4:29:03 AM
Subject: experiences with lingpipe
hi all,
does anybody have practical experience with LingPipe's spellchecker
(http://www.alias-i.com/lingpipe
You can also use Term Vectors, at the cost of extra storage. Search
this list for Term Vectors for info on how to implement.
On Oct 23, 2006, at 5:50 AM, beatriz ramos wrote:
Hello,
I'm working with Lucene. I need to get the number of occurrences of the term
in the document. I had seen the
Martin Braun wrote:
hi all,
does anybody have practical experience with LingPipe's spellchecker
(http://www.alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html)?
I wrote the demo and I am the company 'system tuner' so I can perhaps
help out here.
With Lucene's spellcheck
That may be a good idea. Is it possible to do this efficiently, like inside
of the collect() call of a hitCollector? Right now, that's how my reporting
tool works:
Searcher searcher = new MultiSearcher(directories[] ...);
HitCollector myHC = new MyHitCollector(searcher, ...);
Searcher.search(myQ
> Thanks. What I was looking for is this: if I do not need to boost
> docs, what will be the difference in a) query results, b) time
> for indexing, and c) time to run the query and collect results?
There is also some precision loss with index time boosting. Also see the
"Score Boosting"
For your cases a, b, and c, there won't be much difference.
Boosting at indexing time can be more flexible. You can use one field's
value to boost the document's ranking. For example, you could boost
your products' ranking by their prices, or rating scores.
--
Chris Lu
-
Instant F
Thanks. What I was looking for is this: if I do not need to boost
docs, what will be the difference in a) query results, b) time
for indexing, and c) time to run the query and collect results?
Daniel Naber wrote:
On Monday 23 October 2006 19:39, Rupinder Singh Mazara wrote:
wher
On Monday 23 October 2006 19:39, Rupinder Singh Mazara wrote:
> where can i get info on how boosting terms at index time compares to
> boosting terms at query time ?
At index time you can boost fields and/or documents. Only at query time can
you boost terms.
Regards
Daniel
--
http://www.da
Yeah, but I haven't used the termfreq thingy enough to think of it
automatically ...
Besides, I'm learning that if I put a foolish answer out there, someone'll
correct me.
Thanks
Erick
On 10/23/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
On Monday 23 October 2006 21:16, Erick Erickson wrote:
>
On Monday 23 October 2006 21:16, Erick Erickson wrote:
> Use TermDocs.seek(Term) to get to the term. That'll position your TermDocs
> variable at a list, ordered by document ID, of the occurrences of a term. Then
> TermDocs.skipTo(doc ID) will get you to the list of terms for that document
> (you hav
Use TermDocs.seek(Term) to get to the term. That'll position your TermDocs
variable at a list, ordered by document ID, of the occurrences of a term. Then
TermDocs.skipTo(doc ID) will get you to the list of terms for that document
(you have to know what Lucene DocId you care about here).
Now TermDo
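
The seek-then-skip idea described above can be pictured with a toy model in
plain Java. This is only a sketch under stated assumptions: ToyPostings and
termFreq are hypothetical names, not Lucene classes, and the model holds the
postings of a single term as two parallel arrays. Its skipTo is simplified
(it may stay on the current entry), so it illustrates the lookup pattern
rather than Lucene's exact behaviour.

```java
import java.util.Arrays;

// Toy stand-in for the postings of ONE term; not Lucene's TermDocs.
final class ToyPostings {
    private final int[] docIds; // document IDs, sorted ascending
    private final int[] freqs;  // freqs[i] = occurrences of the term in docIds[i]
    private int pos = -1;       // current position, -1 before the first skipTo

    ToyPostings(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    // Move to the first entry at or after the current position whose
    // document ID is >= target. Returns false when no such entry exists.
    boolean skipTo(int target) {
        int from = Math.max(pos, 0);
        int i = Arrays.binarySearch(docIds, from, docIds.length, target);
        pos = (i >= 0) ? i : -i - 1; // negative result encodes the insertion point
        return pos < docIds.length;
    }

    int doc()  { return docIds[pos]; }
    int freq() { return freqs[pos]; }

    // Occurrences of the term in docId, or 0 when the document does not
    // contain the term (skipTo landed on a later document or ran out).
    static int termFreq(ToyPostings postings, int docId) {
        return (postings.skipTo(docId) && postings.doc() == docId)
                ? postings.freq()
                : 0;
    }
}
```

The check on doc() matters: landing on a later document means the requested
document does not contain the term at all, so the frequency is reported as 0.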
I thought I'd update folks on the continuing saga. Many thanks to all who've
contributed to my education.
Here's our current resolution:
It turns out that the PM will cope with restricting wildcards two ways.
1> there must be at least 3 non-wildcard characters
2> wildcards cannot appear in the fi
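
Rule 1 above could be checked before a term ever reaches the query parser.
The following is a minimal sketch in plain Java, assuming that '*' and '?'
are the only wildcard characters; WildcardRule and hasEnoughLiteralChars are
hypothetical names, not part of Lucene. Rule 2 is cut off in this listing,
so it is deliberately not covered.

```java
// Hypothetical helper, not part of Lucene: checks rule 1 only.
final class WildcardRule {
    static final int MIN_LITERAL_CHARS = 3;

    // Returns true when the term contains at least MIN_LITERAL_CHARS
    // characters that are not wildcards. Assumes '*' and '?' are the
    // only wildcard characters.
    static boolean hasEnoughLiteralChars(String term) {
        int literals = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c != '*' && c != '?') {
                literals++;
            }
        }
        return literals >= MIN_LITERAL_CHARS;
    }
}
```

A term such as "luc*" passes this check, while "lu*" would be rejected
before any expensive wildcard expansion takes place.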
hi all
where can i get info on how boosting terms at index time compares to
boosting terms at query time ?
case 1: if i have an index with all terms with the default boost value
and i apply a boost value to terms at query time
versus
case 2: i boost individual terms at index time with a boost
hi all,
does anybody have practical experience with LingPipe's spellchecker
(http://www.alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html)?
With Lucene's spellcheck contribution I am not really satisfied, because
the index has some (many?) misspelled words, so the "did you mean" clas
I have a requirement to highlight phrases. I came across a reference to
this alternate highlighter implementation. But I am unable to see the
source files for the same. Can someone please point me to it?
Thanks,
Harini
mark harwood wrote:
See here for a thread reviewing the challenges and po
No, I wasn't using NFS. It was difficult to diagnose, since we had
no access to the file system of the production machine. Since it occurred on
production only (live on a busy web site), we decided to circumvent the
problem by making an alternative implementation that would not use mixed
r
Hello,
I'm working with Lucene. I need to get the number of occurrences of the term
in the document. I have looked through the documentation and I didn't find anything.
Do you have any idea?
Thanks.
I am e-mailed almost daily about tackling Lucene consulting gigs, and
I simply do not make the time to even give the time of day what with
kids, day job, daydreaming about eventually getting LIA2 done, and
did I mention kids? Kids rock! I typically refer folks to Otis,
and he likely say
I'd like to suggest a minor change in the QueryParser.jj. I thought
I'd describe it here and get some feedback before posting a patch.
The issue is that I can't get my hands on some clauses (typically
PhraseQuery instances) in my subclass of MultiFieldQueryParser, which
I'd like to do to implemen
Kalpesh,
Are you using sorting? If you are, then the patch attached to LUCENE-651 may
help. It fixes a race condition that exists in the initialization of the
FieldCache (which is used to accelerate sorting).
Cheers,
Ollie
> -----Original Message-----
> From: kalpesh patel [mailto:[EMAIL PROTEC
21 matches