>
> In my particular case I add album catalog numbers to my index as a keyword
> field, but of course if the catalog number contains a space, as they often
> do (e.g. cad 6), there is a mismatch. I've now changed my indexing to index
> the value as 'cad6', removing spaces. Now if the query sent to the quer
On 01/02/2012 22:03, Robert Muir wrote:
On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor wrote:
So it seems like it just broke the text up at spaces, and does text analysis
within getFieldQuery(), but how can it make the assumption that text should
only be broken at whitespace ?
you are right, see
: So it seems like it just broke the text up at spaces, and does text analysis
: within getFieldQuery(), but how can it make the assumption that text should
: only be broken at whitespace ?
whitespace is a significant metacharacter to the Queryparser - it is used
to distinguish multiple clauses
On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor wrote:
>
> So it seems like it just broke the text up at spaces, and does text analysis
> within getFieldQuery(), but how can it make the assumption that text should
> only be broken at whitespace ?
you are right, see this bug report:
https://issues.apa
> int gap = (pp[pp.length-1] - pp[0]) - (pp.length - 1);
>
> Don't want to cause an IndexOutOfBoundsException
Right... that's what I meant with "(boundary cases)"...
>Doron wrote:
> > int gap = (pp[pp.length] - pp[0]) - (pp.length - 1);
int gap = (pp[pp.length-1] - pp[0]) - (pp.length - 1);
Don't want to cause an IndexOutOfBoundsException
-Paul
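The boundary case Paul mentions can be made explicit. A small self-contained sketch of the corrected gap computation, assuming (as the thread implies) that `pp` is the sorted array of term positions in the phrase:

```java
// Sketch: extra positional gap inside a phrase, guarding the
// boundary case of an empty or single-element position array.
public final class PhraseGap {
    private PhraseGap() {}

    public static int gap(int[] pp) {
        if (pp.length < 2) {
            return 0; // no gap possible, and avoids an IndexOutOfBoundsException
        }
        // Span covered by the positions, minus the minimum span a
        // phrase of pp.length terms would occupy.
        return (pp[pp.length - 1] - pp[0]) - (pp.length - 1);
    }
}
```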
So I subclass QueryParser and give it the query
dug up
then debugging shows it calls getFieldQuery(String field, String
queryText, boolean quoted) twice
once with
queryText=dug
and once with
queryText=up
but then when I run it with the query dúg up the first call is
queryText=dúg
even though the
Thanks for the discussion, I really appreciate you pointing out that the
> Code here ignores PhraseQuery (PQ)'s positions:
And by "here" you mean my original code not your suggestion.
> To accommodate this, the overall extra gap can be added to the slop:
> int gap = (pp[pp.length] -
Hi Prasad,
I was looking through the documentation a few days ago and found helpful
information in the Lucene FAQ.
Here are the links:
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_file_formats_like_OpenDocument
Hi,
Please find our requirement below; this is what we are trying to accomplish.
Our client is looking for an extended search engine that searches for the given
text inside documents (PDF, Msg, Excel, XML, Word, TXT etc.) and
returns the list of file names where it finds the text. Using the returned list we
Hi
We have added all the files including PDF/Word/Excel/Txt files, but it is
only searching and finding those that are text files. How do we strip text from
these documents (PDF/HTML/XML/MSWord/PPT/XLS)?
Thanks,
Prasad K.V.S.H. * Project Manager *
PACIFIC COAST STEEL (Pinnacle) Project
Ness T
Assume we have a Lucene index over which several types of analyses are
performed.
Assume that the conclusions of some analysis require that new tokens be added
to existing documents in the index.
For example, a repeating pattern p (sequence of words) that appears in a large
part of the documen
What did you try and what exceptions did you get? You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Wed, Feb 1, 2012 at 8:54 AM, Prasad KVSH wrote:
> It will be great if you provide some working examples on this. We tried
> to deploy solr.war but getting exceptions.
It will be great if you provide some working examples on this. We tried
to deploy solr.war but getting exceptions.
Thanks
Prasad
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Wednesday, February 01, 2012 7:22 PM
To: java-user@lucene.apache.org
Subject: Re: lucene-3.0.
Hi Karthik,
I appreciate your quick response.
I guess the next question is how to strip the text from
PDF/HTML/XML/MSWord/PPT/XLS and where it will be stored for indexing.
What are the other scenarios (like adding files, deleting files) where
we need to execute the IndexFiles class?
Thanks
Prasa
You could also take a look at Solr. From
http://lucene.apache.org/solr/features.html
* Easy ways to pull in data from databases and XML files from local
disk and HTTP sources
* Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
Sounds just what you need.
--
Ian.
O
Hi
>>lucene-3.0.3 can be used for searching a text from
Lucene's primary job is to do text search,
be it over PDF/HTML/XML/MSWord/PPT/XLS.
You have to have plugin code to do 2 things:
1) Strip text from each of the documents (PDF/HTML/XML/MSWord/PPT/XLS)
2) Index this processed text us
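The two steps above can be sketched with Apache Tika, which a later reply in this thread also points to (via Solr's Rich Document Parsing). This is a minimal sketch, assuming the Tika jar is on the classpath; the field names "filename" and "contents" are illustrative, not from the original posts:

```java
import java.io.File;
import org.apache.tika.Tika;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch: step 1 (strip text via Tika), step 2 (build a Lucene Document
// from the extracted text, using the Lucene 3.x Field API).
public class TikaIndexing {
    public static Document toDocument(File f) throws Exception {
        // Tika detects the format (PDF/Word/Excel/...) and returns plain text.
        String text = new Tika().parseToString(f);
        Document doc = new Document();
        doc.add(new Field("filename", f.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("contents", text,
                Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}
```

Adding the returned Document to an IndexWriter then makes the file's text searchable alongside its name.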
Hi,
lucene-3.0.3 can be used for searching text in PDF, xlsx, docx, doc,
xls, msg, TXT files. Is there any common function to accomplish
this? Please help me with this.
Thanks
Prasad
On 28/01/2012 11:22, Uwe Schindler wrote:
-Original Message-
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Saturday, January 28, 2012 10:33 AM
To: 'java-user@lucene.apache.org'
Subject: Does Fuzzy Search scores the same as Exact Match
All things being equal does a fuzzy match gi
Thanks for the quick response.
Will try to do it this way:
Query q = null;
MultiFieldQueryParser par = new
MultiFieldQueryParser(Version.LUCENE_29, searchFields, analyzer, boosts);
par.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Hi, one addition:
In the coming Lucene 3.6 there are more safety checks in MMapDirectory, so the
SIGSEGV is less likely (it tracks cloned index inputs in a thread-safe list on
close). But this only *helps* to find the issue; it does not guarantee that
your JVM won't crash, sorry.
As Robert and Mike
On Tue, Jan 31, 2012 at 9:42 PM, Trejkaz wrote:
> So when we close() our own TextIndex wrapper class, it would call
> decRef() - but if another thread is still using the index, this call
> to decRef() wouldn't actually close the reader. IMO, this wouldn't
> really satisfy the meaning of "close" f
Right, you have to ensure (by using the "right" IndexDeletionPolicy)
that no commit is ever removed until all readers open against that
commit have been closed.
"Normally" the filesystem ensures this for us (protects still-open
files from being deleted), but NFS (unfortunately!) lacks such
semanti
Hi,
all MultiTermQueries are constant score by default since Lucene 2.9, you can
change that back to scoring mode:
WildcardQuery.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
This slows down the query immensely or throws TooManyClauses exceptions if too
many terms match the wildcar
Hi,
I have an issue with Lucene 2.9.4 and sorting of wildcard queries.
If I set a boost to some documents during indexing like this:
doc.setBoost(1000.0f);
and execute a query like this:
PRODUCT_GROUP:2020*
I don't get results with a high boost value returned before the documents with
no b
The javadocs for ParallelReader say that all indexes must have same
number of docs and all be created and modified the same way. Doesn't
sound like your shards.
I think you need to create a MultiReader on top of the readers for
your individual shards and pass that to the IndexSearcher constructor
I'm not clear exactly what you are asking but I think you will have to
build your TermQuery instances one at a time and that sounds fine, if
it does what you want and is sufficiently fast.
--
Ian.
On Tue, Jan 31, 2012 at 1:34 PM, Pedro Lacerda wrote:
> For the first strategy i'm using MoreLike
Thanks Ian.
>>The deprecation warning in the javadocs says "Please pass an ExecutorService
>>to IndexSearcher, instead" so I'd do that.
I may need to use IndexSearcher(Reader, ExecutorService). I have sharded my
index. Say if I have 10 indexes, then I will have 10 IndexSearchers. How to use
this
> I am upgrading from 3.0.3 to 3.5.0.
>
> 1) NumberTools is deprecated. I am converting long to string and storing it
> in the index. Now this is deprecated. If I replace this API with NumericUtils /
> NumericField, will it work for an existing index? Do I need to rebuild the
> index?
You will ne
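For reference, the replacement looks roughly like the sketch below (Lucene 3.x API; the field name is illustrative). Note that NumericField indexes trie-encoded terms, which are not compatible with the string terms NumberTools produced, so queries against an old index will not match without reindexing:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;

// Sketch: store a long with NumericField instead of a NumberTools string.
public class NumericMigration {
    public static Document withTimestamp(long value) {
        Document doc = new Document();
        // Indexed and stored; the indexed terms use a trie encoding, so
        // old NumberTools-encoded terms in an existing index won't match.
        doc.add(new NumericField("timestamp", Field.Store.YES, true)
                .setLongValue(value));
        return doc;
    }
}
```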
I suggest you look at Solr instead of lucene. http://lucene.apache.org/solr/
--
Ian.
On Wed, Feb 1, 2012 at 7:40 AM, Dheeraj Kv wrote:
> Hi
> I learnt about Lucene from Google and thought of implementing it in my
> company.
> I don't want to use Lucene as a web search application. I hav
And have you used Luke to see exactly what is being indexed, as Erick suggested?
See
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F.
for other things to check.
--
Ian.
On Wed, Feb 1, 2012 at 6:43 AM, Gal Mainzer wrote:
> I tried to use escapin
Hi
I learnt about Lucene from Google and thought of implementing it in my
company.
I don't want to use Lucene as a web search application. I have a large backup
storage which consists of html files, doc files and pdf files.
I need to search inside a file as well as search for file names
Hello all,
I am upgrading from 3.0.3 to 3.5.0.
1) NumberTools is deprecated. I am converting long to string and storing it in
the index. Now this is deprecated. If I replace this API with NumericUtils /
NumericField, will it work for an existing index? Do I need to rebuild the
index?
2) I am u
I would recommend using TermsFilter (http://goo.gl/BC9eQ, possibly wrapped
in a ConstantScoreQuery). You must do the query building by hand; the query
*parser* cannot do that:
TermsFilter tf = new TermsFilter(); // it is in lucene-queries.jar
tf.addTerm(new Term("id", val1));
tf.addTerm(new Term("id"
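The pattern above can be rounded out into a small sketch, wrapped in ConstantScoreQuery as suggested (the class name and the loop over an id array are illustrative; TermsFilter lives in the lucene-queries contrib jar):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermsFilter; // contrib: lucene-queries.jar

// Sketch: an OR over many exact "id" terms, built by hand rather than
// through the query parser.
public class IdFilterExample {
    public static Query idsQuery(String[] ids) {
        TermsFilter tf = new TermsFilter();
        for (String id : ids) {
            tf.addTerm(new Term("id", id)); // terms are OR'ed together
        }
        // ConstantScoreQuery makes the filter usable wherever a Query is
        // expected; all matches get the same score.
        return new ConstantScoreQuery(tf);
    }
}
```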