english dictionary for spelling

2009-12-06 Thread m.harig
-- View this message in context: http://old.nabble.com/english-dictionary-for-spelling-tp26672045p26672045.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail:

english dictionary for spelling

2009-12-06 Thread m.harig
Hello all, I have a question about the spell checker. I am creating the spell index from my original index, but the original index itself contains some misspelled words, so I decided to use words from a proper English dictionary for my spell checker. Can anyone tell me whether there is any option in Lucene to do my

updating index

2009-12-04 Thread m.harig
Hello all, how do I update my existing index so that I avoid duplicates? This is how I am doing my indexing: doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED)); doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES,

splitting words

2009-11-30 Thread m.harig
Hello all, I have a question about split-word search in Lucene. For example, if I search for dualcore it should return dual core. How do I split this word? Is there any analyzer in Lucene that does it? Please, can anyone help me?
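One way to picture the splitting asked about above is a dictionary lookup over every possible cut point. The following is only a minimal sketch in plain Java, not Lucene code: the class name `CompoundSplitter` and the tiny word list are hypothetical, and it only tries two-part splits.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Sketch only: splits a compound query term into two known words. */
final class CompoundSplitter {
    private final Set<String> dictionary;

    CompoundSplitter(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Returns "left right" if both halves are dictionary words, else empty. */
    Optional<String> split(String word) {
        // Try every cut point; the first one where both sides are known wins.
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                return Optional.of(left + " " + right);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical word list; a real one would come from a dictionary file.
        Set<String> words = new HashSet<>(Arrays.asList("dual", "core", "meta", "city"));
        CompoundSplitter splitter = new CompoundSplitter(words);
        System.out.println(splitter.split("dualcore").orElse("(no split)"));
    }
}
```

In a search application the split form would typically be added to the query alongside the original term rather than replacing it, so that documents containing either spelling can match.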

Re: did you mean issue

2009-11-24 Thread m.harig
What should I do now? Could you make it clear for me? Grant Ingersoll-6 wrote: On Nov 24, 2009, at 1:16 AM, m.harig wrote: String[] suggestions = spellChecker.suggestSimilar("hoem", 3, indexReader, "contents", true); This is how I am retrieving my "did you mean" words. And which distance

updating spell index

2009-11-23 Thread m.harig
Hello all, is there any way to update the spell index directory? Please, can anyone help me with this?

Re: did you mean issue

2009-11-23 Thread m.harig
String[] suggestions = spellChecker.suggestSimilar("hoem", 3, indexReader, "contents", true); This is how I am retrieving my "did you mean" words. Grant Ingersoll-6 wrote: How are you invoking the spell checker? On Nov 19, 2009, at 1:22 AM, m.harig wrote: hello all i've a doubt

did you mean issue

2009-11-18 Thread m.harig
Hello all, I have a question about the spell checker. When I search for the keyword hoem, I get the spelling suggestions in the following order (I am retrieving 4 suggested words): form, hold, home, them. I need the word home to be returned first, but it is in the third position,
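One possible reason for this ordering, assuming the suggestions are ranked mainly by plain Levenshtein distance (an assumption; the excerpt does not say which distance measure is configured), is that all four candidates are equally far from hoem under that measure. The sketch below is plain Java, not Lucene code, and the class name `EditDistanceSketch` is hypothetical. It compares plain Levenshtein with a variant that counts an adjacent character swap as a single edit.

```java
/** Sketch only: compares plain Levenshtein with a transposition-aware variant. */
final class EditDistanceSketch {

    /** Classic Levenshtein distance: insert, delete and substitute each cost 1. */
    static int levenshtein(String a, String b) {
        return distance(a, b, false);
    }

    /** Optimal string alignment: additionally counts an adjacent swap as 1 edit. */
    static int withTranspositions(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean allowSwap) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                    d[i - 1][j - 1] + cost);
                // Adjacent swap, e.g. "em" versus "me".
                if (allowSwap && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"form", "hold", "home", "them"}) {
            System.out.println(candidate
                    + " levenshtein=" + levenshtein("hoem", candidate)
                    + " withTranspositions=" + withTranspositions("hoem", candidate));
        }
    }
}
```

Under plain Levenshtein every candidate is two edits away from hoem, so ties have to be broken by something else, such as term frequency. With swaps counted as one edit, home is the only candidate at distance 1.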

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
) this will delete the old document and add the new one. simon On Tue, Nov 10, 2009 at 10:05 AM, m.harig m.ha...@gmail.com wrote: Hello all, this is my situation: I have multiple indexes, for example index1, index2, index3 ... I have to update the indexes every night. If I open my IndexWriter

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
Thanks again, this is my code: doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED)); doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED)); doc.add(new Field("contents",

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
Thanks Simon, this is my code: doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED)); doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED)); doc.add(new Field("contents",

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
Thanks Ian, it works, thanks a lot. Ian Lea wrote: Try updateDocument(new Term("id", "" + i), doc). See javadocs for Term constructors. -- Ian. On Tue, Nov 10, 2009 at 9:47 AM, m.harig m.ha...@gmail.com wrote: Thanks again, this is my code: doc.add(new Field("id", "" + i
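The thread describes updateDocument as deleting the old document and adding the new one. As a rough illustration of why that avoids duplicates, here is a toy in-memory model in plain Java. It is not Lucene code: `ToyIndex` and its methods are hypothetical stand-ins, and documents are just string maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index; not Lucene code. */
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Blind append: re-adding a document with the same id leaves two copies. */
    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Deletes every document whose field equals value, then adds the new one. */
    void update(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    int size() {
        return docs.size();
    }
}
```

The key design point is that the field used for the lookup (id in the thread) must hold exactly one unanalyzed value per document, which matches the Field.Index.NOT_ANALYZED setting in the posted code.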

Re: search problem

2009-10-29 Thread m.harig
Thanks Erick, I understand the issue, but my question is about a keyword that is originally a single word. For example, metacity is really a single keyword. When I search for meta city I am not able to get the results. This is my question: if you go to Google and search for

singular and plural search

2009-10-21 Thread m.harig
Hello all, I have a question about searching for singular and plural word forms. I got this code snippet from the Nabble forum: private static Analyzer createEnglishAnalyzer() { return new Analyzer() { public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result =

Re: singular and plural search

2009-10-21 Thread m.harig
Thanks Erick. A little more information would help here. 1. Are you using the same analyzer at both index and query time? No, sorry; I am using StandardAnalyzer at index time, and at query time I am using the code snippet I found on Nabble. 2. Assuming 1 is yes, did you re-index your data
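The first question in this exchange, whether the same analyzer is used at index and query time, can be illustrated without Lucene. The sketch below is plain Java with a deliberately naive stand-in for a stemmer; `AnalyzerMismatchSketch`, `normalize` and `indexTerms` are hypothetical names, and real stemming is considerably more involved.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch only: why index-time and query-time normalization must agree. */
final class AnalyzerMismatchSketch {

    /** Deliberately naive stand-in for a stemmer: lower-case, drop one trailing "s". */
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 1 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    /** Builds the set of indexed terms, with or without normalization. */
    static Set<String> indexTerms(boolean normalized, String... tokens) {
        Set<String> terms = new HashSet<>();
        for (String token : tokens) {
            terms.add(normalized ? normalize(token) : token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> plainIndex = indexTerms(false, "Cards", "loans");
        Set<String> stemmedIndex = indexTerms(true, "Cards", "loans");
        String query = normalize("cards"); // the query side always normalizes here
        System.out.println("plain index matches:   " + plainIndex.contains(query));
        System.out.println("stemmed index matches: " + stemmedIndex.contains(query));
    }
}
```

When only the query side reduces cards to card, the unnormalized index still holds cards and the lookup misses. Re-indexing with the same normalization on both sides makes the terms line up.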

Re: singular and plural search

2009-10-21 Thread m.harig
Thanks Erick, it works fine if I use the same analyzer (the code snippet found on Nabble) for both indexing and querying. But highlighting has stopped working for plural words. I need to search some more; I'll come back to you if I can't find the answer. Thanks again, Erick.

RE: index reader for multiple indexes

2009-10-03 Thread m.harig
is an IndexReader on top of various Sub-IndexReaders. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: m.harig [mailto:m.ha...@gmail.com] Sent: Friday, October 02, 2009 6:52 PM To: java-user@lucene.apache.org

index reader for multiple indexes

2009-10-02 Thread m.harig
Hello all, I am merging more than one index to search for a document. How do I use IndexReader here to open multiple indexes (since IndexReader opens one directory at a time)? Could anyone please advise?

get all tokens from index

2009-09-09 Thread m.harig
Hello all, is there any way to get all tokens from my index? Please, can anyone advise?

Re: get all tokens from index

2009-09-09 Thread m.harig
Thanks Ahmet, I found the solution. Thanks a lot. Ahmet Arslan wrote: hello all, is there any way to get all tokens from my index ? please anyone suggest me The code below prints all terms of a field. String path = "E:\\ThesaurusSolrHome\\data\\index"; String field =

RE: reading index

2009-08-08 Thread m.harig
Hello, will my reader.reopen() call work on a Windows machine when the index gets updated? I mean, will my Tomcat server allow the reader to pick up the updated index? Please help me.

reading index

2009-08-07 Thread m.harig
Hello all, thanks to Lucene. I am using Lucene 2.4.0 for my application. My question is: can I read the index many times? I mean, I have a search application which reads the index, which is 300MB in size, and I am reading the index every time the user hits the page. Is it

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks. This is my code snippet: IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); WildcardQuery query = new WildcardQuery(new Term(DEFAULT_FIELD));

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks for your reply, my original code snippet is IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); BooleanClause.Occur[] flags = { BooleanClause.Occur.SHOULD,

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks, I've noticed that, but the code is for known tokens. How do I do it for dynamic tokens? Meaning, I don't know the URLs; someone picks the URLs and I index them. Is there any technique to use while indexing? I am using Lucene version 2.4.0. Please advise.

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks all, but how does Nutch handle this problem? I am aware of Nutch, but not in depth. If I search for the keyword about us, Nutch gives me exactly what I want. Are there any scoring techniques? Please let me know.

Re: A Presentation on Building a Hadoop + Lucene System Architecture

2009-08-04 Thread m.harig
Hello, do you have any idea about integrating Lucene with Hadoop? BrickMcLargeHuge wrote: Hey all, I just wanted to send a link to a presentation I made on how my company is building its entire core BI infrastructure around Hadoop, HBase, Lucene, and more. It features

RE: indexing 100GB of data

2009-07-23 Thread m.harig
Thanks all, I am very thankful to everyone. I am tired of the Hadoop settings. Is it fine to read such a large index with Lucene alone? Will it run into OOM? Can anyone please advise?

indexing 100GB of data

2009-07-22 Thread m.harig
Hello all, we've got 100GB of data in doc, txt, pdf, ppt, etc. formats, and we have a separate parser for each file format, so we're going to index the data with Lucene. (We were scared of the Nutch setup, which is why we didn't use it.) My question is: will it be scalable when I index those documents?

Re: indexing 100GB of data

2009-07-22 Thread m.harig
Thanks Shai. So there won't be a problem when searching that kind of large index, am I right? Can anyone tell me whether it is possible to use Hadoop with Lucene?

Re: indexing 100GB of data

2009-07-22 Thread m.harig
Is there any article or forum about using Hadoop with Lucene? Please, can anyone help me?

.net lucene doubt

2009-07-16 Thread m.harig
Hello all, I am using .NET Lucene for my search application. How do I index non-English pages? Are there any analyzers that do it? I am struggling with a UTF-8 problem. Please, can anyone help me?

optimized searching

2009-06-30 Thread m.harig
Hello all, I've gone through most of the posts on this forum. I need a code snippet for searching a large index. Currently I am iterating like this: hits = searcher.search(query); for (int inc = 0; inc < hits.length(); inc++) { Document doc = hits.doc(inc);

Re: Read large size index

2009-06-30 Thread m.harig
Thanks Simon, it's working now, thanks a lot. I have a question: I've got 30,000 PDF files indexed, but the code you sent returns only 200 results, because I am setting TopDocs topDocs = searcher.search(query, 200); As I said, if I use Integer.MAX_VALUE, it

Re: Read large size index

2009-06-30 Thread m.harig
Hi there, On Tue, Jun 30, 2009 at 12:41 PM, m.harig m.ha...@gmail.com wrote: Thanks Simon, it's working now, thanks a lot. I have a question: I've got 30,000 PDF files indexed, but the code you sent returns only 200 results, because I am setting TopDocs

Re: optimized searching

2009-06-30 Thread m.harig
Thanks Erick. In Ian's link, particularly see the section "Don't iterate over more hits than necessary". A couple of other things: 1. Loading the entire document just to get a field or two isn't very efficient; think about lazy loading (see FieldSelector). I have done it, but I have a couple of

RE: Read large size index

2009-06-30 Thread m.harig
Thanks Uwe, can you please give me a code snippet so that I can resolve my issue? The correct way to iterate over all results is to use a custom HitCollector (Collector in 2.9) instance. The HitCollector's method collect(docid, score) is called for every hit. No need to
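The collector idea quoted above, a callback invoked once per hit with a document id and a score, can be sketched in plain Java without Lucene. The class `TopNCollector` and its nested `Hit` type below are hypothetical, and the `collect(docId, score)` shape is only modeled on the description in the thread. The sketch keeps the n best-scoring hits in a bounded min-heap, so memory use does not grow with the total number of matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch only: keeps the n best-scoring hits; the callback shape is assumed. */
final class TopNCollector {
    /** A (document id, score) pair; hypothetical helper type. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int n;
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

    TopNCollector(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Called once per matching document. */
    void collect(int docId, float score) {
        if (heap.size() < n) {
            heap.add(new Hit(docId, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the current weakest hit
            heap.add(new Hit(docId, score));
        }
    }

    /** Retained hits, best score first. */
    List<Hit> topHits() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort((x, y) -> Float.compare(y.score, x.score));
        return result;
    }
}
```

Only document ids and scores are held while collecting. Stored fields would be loaded afterwards, and only for the hits that are actually displayed.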

Read large size index

2009-06-29 Thread m.harig
Hello all, I am building a search application on Lucene. It works fine when my index is small, but I get a Java heap space error when I use a large index. I came to know about using Hadoop with Lucene to solve this problem, but I don't have any idea about Hadoop. I've searched through

Re: Read large size index

2009-06-29 Thread m.harig
Simon Willnauer wrote: Hey there, before going out to use hadoop (hadoop mailing list would help you better I guess) you could provide more information about your situation. For instance: - how big is your index - version of lucene - which java vm - how much heap space - where does the

Re: Read large size index

2009-06-29 Thread m.harig
Simon Willnauer wrote: On Mon, Jun 29, 2009 at 1:48 PM, m.harig m.ha...@gmail.com wrote: Simon Willnauer wrote: Hey there, before going out to use hadoop (hadoop mailing list would help you better I guess) you could provide more information about your situation. For instance: - how

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. I don't run any other application on Tomcat; moreover, I restarted it, and I am not doing any jobs except searching. We have a 500GB drive and have indexed around 100,000 documents, which gives me an index of around 1GB. When I tried to search for pdf I got the heap space error,

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. This is how I am indexing my documents: indexWriter.addDocument(doc, new StopAnalyzer()); indexWriter.setMergeFactor(10); indexWriter.setMaxBufferedDocs(100);

Re: Read large size index

2009-06-29 Thread m.harig
Thanks again. Did I index my files correctly? I need some tips, please. The following is the error I get when I run my keyword search. I typed pdf, that's it, because I have around 30,000 files named pdf: HTTP Status 500 - type Exception report message description The server encountered

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. Hey there, that makes things easier. :) OK, here are some questions: Do you iterate over all docs calling hits.doc(i)? If so, do you have to load all fields to render your results? If not, you should not retrieve all of them. Yes, I am iterating over all docs by calling

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. Example: IndexReader open = IndexReader.open("/tmp/testindex/"); IndexSearcher searcher = new IndexSearcher(open); final String fName = "test"; Is fName a field like summary or contents? TopDocs topDocs = searcher.search(new TermQuery(new Term(fName, "lucene")),

query doc boost difference

2009-03-25 Thread m.harig
Hello all, can anyone tell me the difference between query.setBoost() and doc.setBoost()? Moreover, if I use query.setBoost(4.0f) I am not able to boost my results. Which one makes my results better? Please, can anyone help me with this?

need scoring help

2009-03-20 Thread m.harig
Hello all, I have a search application running on lucene-2.3.0. Say, for example, I am indexing 10 URLs as input. When I search, I am not able to get the expected result at the best ranking, i.e., unrelated hits come up ahead of related hits. I've been working on this for a

boosting query

2009-03-19 Thread m.harig
Hello all, I have a search application which uses lucene-2.3.0, and my application runs in a banking domain. I am indexing some banking URLs as input and searching for some keywords. My question is: when I search for cards, a URL with a lower count of the keyword comes up. I mean, for

Number range search

2008-08-13 Thread m.harig
Hi all, I am indexing a price field with: doc.add(new Field("price", "1450", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("price", "3800", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("price",
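The post is cut off before the actual question, so the following is a guess at the likely issue based on the thread title. When prices are indexed as plain strings, a range comparison on terms is lexicographic, so "900" sorts after "1450". A common workaround is to index fixed-width, zero-padded values. The sketch below is plain Java, not Lucene code; `PaddedNumbers` and the width of 10 are hypothetical choices.

```java
import java.util.Locale;

/** Sketch only: fixed-width encoding so string order matches numeric order. */
final class PaddedNumbers {
    /** Assumed width; it must cover the largest value that will ever be indexed. */
    static final int WIDTH = 10;

    /** Left-pads a non-negative value with zeros up to WIDTH digits. */
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    public static void main(String[] args) {
        // Lexicographic comparison of raw strings gets the order wrong.
        System.out.println("raw:    900 before 1450? " + ("900".compareTo("1450") < 0));
        // Padded strings compare in the same order as the numbers.
        System.out.println("padded: 900 before 1450? " + (pad(900).compareTo(pad(1450)) < 0));
    }
}
```

The same padding has to be applied to the lower and upper bounds of the range at query time, and the field should be indexed without tokenization that could alter the padded value.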