real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Breu
Hello, I'm looking for an infix suggester that allows infix search for a given term. This might not be that important in English. However, in German we have quite complex compound words like Donaudampfschifffahrtsgesellschaftskapitän, which is composed of the nouns Donau (Danube), Dampf (steam)

Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Olivier Binda
On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote: Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index? Yes, it is possible. I do it by feeding a custom Dictionary with a custom InputIterator in the lookup.build() me

Re: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Sokolov
Have you considered combining the AnalyzingInfixSuggester with a German decompounding filter? If you break compound words into their constituent parts during analysis, then the suggester will be able to do what you want (prefix matches on the word-parts). I found this project with a quick goo
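A minimal sketch of the decompounding idea described above, assuming a tiny hand-written dictionary and greedy longest-match splitting. The class, the method names, and the dictionary are illustrative assumptions, not the algorithm of any particular decompounding filter and not a Lucene API. Real German decompounders also have to deal with linking elements between parts, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DecompoundSketch {

    // Greedy longest-match split against a (hypothetical) dictionary of word parts.
    // If any position cannot be matched, the whole lowercased word is returned unsplit.
    static List<String> split(String word, Set<String> dict) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < w.length()) {
            int end = -1;
            // Try the longest candidate first, then shrink.
            for (int e = w.length(); e > pos; e--) {
                if (dict.contains(w.substring(pos, e))) {
                    end = e;
                    break;
                }
            }
            if (end < 0) {
                return Collections.singletonList(w);
            }
            parts.add(w.substring(pos, end));
            pos = end;
        }
        return parts;
    }

    // An infix query on the compound becomes a prefix match on one of its parts.
    static boolean matchesPartPrefix(String word, String query, Set<String> dict) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String part : split(word, dict)) {
            if (part.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        System.out.println(split("Donaudampfschiff", dict));
        System.out.println(matchesPartPrefix("Donaudampfschiff", "Dam", dict));
    }
}
```

Note that this only matches at part boundaries: a query such as "ampf" would not match, which is the difference from a true infix suggester.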

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
The hard way may be to use the standard Analyzing Suggester but to add each (analyzed) suffix of the surface string (mapping to the full surface form) during automaton generation. I.e. when adding "Donau...", you add all analyzed suffixes "donau...", "onau...", "nau...", ... - all mapping to "
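The suffix-expansion idea above can be sketched in plain Java, without the suggester's automaton. This is only an illustration under stated assumptions: `analyze` is a hypothetical stand-in that merely lowercases, and a linear scan over a map replaces the prefix lookup the real suggester would perform. None of the names below are Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SuffixExpansionSketch {

    // Hypothetical stand-in for the analysis chain: lowercasing only.
    static String analyze(String surface) {
        return surface.toLowerCase(Locale.ROOT);
    }

    // For every surface form, register each suffix of its analyzed form
    // as a key that maps back to the full surface form.
    static Map<String, List<String>> expand(List<String> surfaces) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        for (String surface : surfaces) {
            String analyzed = analyze(surface);
            for (int i = 0; i < analyzed.length(); i++) {
                entries.computeIfAbsent(analyzed.substring(i), k -> new ArrayList<>())
                       .add(surface);
            }
        }
        return entries;
    }

    // An infix query is answered as a prefix match over the suffix keys.
    static List<String> suggest(Map<String, List<String>> entries, String query) {
        String q = analyze(query);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            if (e.getKey().startsWith(q)) {
                for (String surface : e.getValue()) {
                    if (!result.contains(surface)) {
                        result.add(surface);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> entries =
                expand(Arrays.asList("Donaudampfschiff", "Dampflok"));
        System.out.println(suggest(entries, "dampf"));
    }
}
```

The cost is visible even in the sketch: a surface form of length n contributes n keys, so the structure grows much faster than one built from whole terms only.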

Lucene not showing Low Score Doc

2014-10-27 Thread Priyanka Tufchi
Hi all, I have a set of 10 docs which I submitted for comparison through Apache Lucene. When I check the scores for the set, I get only 8 of the 10 in my database; the remaining 2 are not showing. Even if the score is very low, Lucene should still show something. How can I handle this, as I have to show all 10

AW: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Clemens Wyss DEV
Salut Olivier, would you mind providing me your Suggester-class code (or the relevant snippets) as an ideal jump-start? -Clemens -Original Message- From: Olivier Binda [mailto:olivier.bi...@wanadoo.fr] Sent: Monday, 27 October 2014 11:51 To: java-user@lucene.apache.org Subj

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Shai Erera
Hi, Your question is a bit fuzzy -- what do you mean by not showing "low scores"? Are you sure that these 2 documents are matched by the query? Can you boil it down to a short test case that demonstrates the problem? In general though, when you search through IndexSearcher.search(Query, int), you wo

Re: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Breu
Hello Michael, Thank you for your kind support. I had a look at elasticsearch-analysis-decompound and tried to integrate it. However, it seemed to me that it is somewhat hard to integrate into our work based on lucene-core. I have managed to set up a test environment, however I was not su

Re: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Breu
Hello Oliver, I already had a look into the AnalyzingSuggester before. I was not able to spot the location where it generates the prefixes. It works with some path analysis based on automaton (both for analysis and query). It is not really clear to me how to extend this automaton. Could you give

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Priyanka Tufchi
Hi, Actually, it should return 10 matching docs from the index, but it returns only 8. I checked: the remaining 2 are non-matching docs with very low scores. Is there any way I can get those two docs that have not matched? I have set hitpage = 10. Thanks, Priyanka On Mon, Oct 27, 2014 at 6:14 AM, Shai Erera wrot

Weighted tags for document instances (at index time)

2014-10-27 Thread Ralf Bierig
I want to index documents together with a list of tags (usually between 10 and 30) that represent meta information about the document. Normally, I would create an extra field "tag", store every tag, by its name, inside that field, creating my 10-30 fields that way and adding them to the document before

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
Hi Michael, There may be several entry points, I'm not sure which one still works - the suggester data processing chain has changed quite a bit since I looked at it about two years ago, maybe Mike or Robert can chime in if I'm totally off. One way I experimented with was to implement a custom T

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 minutes for 70 MB of docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found a solution for it? Thank you, Jason

RE: Making lucene indexing multi threaded

2014-10-27 Thread Fuad Efendi
I believe there were many reports of many thousands of docs per second on average. I experienced similar SOLR speeds many years ago too, with small documents (512 bytes each). First, check hard drive performance (use an SSD, etc.); second, check your indexing architecture: is it multithrea

RE: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Fuad, Thanks for your suggestions and quick response. I am using a single-threaded approach to add docs. I will try multi-threaded indexing to see if it resolves my issue. This issue has only existed since I upgraded Lucene from 2.4.1 (with Java 1.6) to 4.8.1 (with Java 1.7).

Re: Making lucene indexing multi threaded

2014-10-27 Thread G.Long
Like Nischal, did you check that you don't call the commit() method after each indexed document? :) Regards, Gary Long On 27/10/2014 16:47, Jason Wu wrote: Hi Fuad, Thanks for your suggestions and quick response. I am using a single-threaded indexing way to add docs. I will try the multipl

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Gary, Thanks for your response. I only call the commit when all my docs are added. Here is the procedure of my Lucene indexing and re-indexing: 1. If index data exists inside index directory, remove all the index data. 2. Create IndexWriter with 256MB RAMBUFFERSIZE 3. Process D

Indexing Weighted Tags per Document

2014-10-27 Thread Ralf Bierig
I want to index documents together with a list of tags (usually between 10 and 30) that represent meta information about the document. Normally, I would create an extra field "tag", store every tag, by its name, inside that field, creating my 10-30 fields that way and adding them to the document before

Re: AW: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Olivier Binda
Here you are. This is written in Kotlin, but it is similar enough to Java to be usable: private fun buildTermsFromIndex(indexReader:IndexReader, field: String, file: File, bits:Bits): WFSTCompletionLookup { val lookup = WFSTCompletionLookup(true) lookup.build(WeightedLuceneDict

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Shai Erera
I'm sorry, I still don't feel like I have all the information in order to help with the problem that you're seeing. Can you at least paste the contents of the documents and the query? Can you search with a TotalHitCountCollector only, and print the total number of hits? Shai On Mon, Oct 27, 2014

Questions about the Lucene query language

2014-10-27 Thread Prad Nelluru
Hi everyone, I'm trying to understand how to use the Lucene query language. 1. Does Lucene support negative phrase queries like -"hello dolly" ? Or do I need to subtract from some other term like: joy -"hello dolly" ? My intention is to find all documents that do not have the words "hell

Re: Questions about the Lucene query language

2014-10-27 Thread Jack Krupansky
Pure negative queries are not supported, but all you need to do is include *:*, which translates into MatchAllDocsQuery. "hello dolly" is the same as "hello dolly"~0 -- Jack Krupansky -Original Message- From: Prad Nelluru Sent: Monday, October 27, 2014 8:57 PM To: java-user@lucene.ap