Beginner: Best way to index and display original text of pdfs in search results

2008-12-12 Thread maxmil
Hi, this is the first time I am using Lucene. I need to index PDFs with very few fields: title, date and body (a long field) for a web-based search. The results I need to display have to show not only the documents found but, for each document, a snapshot of the text where the search term has

Re: Lucene SpellChecker returns no suggestions after changing Server

2008-12-12 Thread Matthias W.
Yes, I'm passing the same index for Spellchecker and IndexReader. I'm going to test whether this is the reason for my problem. But I still don't understand why the same code works on the test server. I think this could be because of Tomcat's permissions. Is there any tutorial about the tomcat

Re: Beginner: Best way to index and display original text of pdfs in search results

2008-12-12 Thread Ian Lea
Hi, Lucene can store the original text of the document. You set up the Lucene fields to do what you need. Have a look at the apidocs for Field.Store and you'll see that you've got three choices: Yes, No or Compress. For your display snapshots, have a look at the Lucene highlighter package. And
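
To illustrate the "display snapshot" idea mentioned above, here is a minimal plain-Java sketch of cutting a fragment out of stored text around the first occurrence of a search term. It is not the Lucene highlighter package and does not use any Lucene API; the class name SnippetSketch, the method names, and the <b> markup are all assumptions made up for this illustration. It only shows why the original text has to be stored (or otherwise retrievable) for a snapshot to be built.

```java
final class SnippetSketch {

    // Case-insensitive search that works on the original string, so the
    // returned offset is always valid for the stored text.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns up to `context` characters on each side of the first match,
    // with the match wrapped in <b>...</b>, or null when there is no match.
    static String snippet(String storedText, String term, int context) {
        if (term.isEmpty()) {
            return null;
        }
        int hit = indexOfIgnoreCase(storedText, term);
        if (hit < 0) {
            return null;
        }
        int matchEnd = hit + term.length();
        int start = Math.max(0, hit - context);
        int end = Math.min(storedText.length(), matchEnd + context);

        StringBuilder sb = new StringBuilder();
        if (start > 0) {
            sb.append("...");               // text was cut on the left
        }
        sb.append(storedText, start, hit);
        sb.append("<b>").append(storedText, hit, matchEnd).append("</b>");
        sb.append(storedText, matchEnd, end);
        if (end < storedText.length()) {
            sb.append("...");               // text was cut on the right
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "Lucene can store the original text of the document.";
        System.out.println(snippet(body, "ORIGINAL", 10));
    }
}
```

A real highlighter would typically work from the analyzed query terms rather than a raw substring match, so treat this only as a picture of the output shape.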

Re: Taxonomy in Lucene

2008-12-12 Thread Michael McCandless
John, can you describe some of these changes? They sound cool! Mike John Wang wrote: We are doing lotsa internal changes for performance. Also upgrading the api to support for features. So my suggestion is to wait for 2.0. (should release this month, at the latest mid jan) We can take

Re: Beginner: Best way to index and display original text of pdfs in search results

2008-12-12 Thread maxmil
Thanks very much. Looks like Field.Store.COMPRESS is what I want. I'll also have a look at the search highlight stuff and at getting Lucene in Action. Ian Lea wrote: Hi Lucene can store the original text of the document. You make the lucene fields to do what you need. Have a look at

Re: Spell check of a large text

2008-12-12 Thread Lucene User no 1981
Grant, It's definitely dictionary based spell checker. A bit fleshing out, currently the document gets indexed and then it's analysed (bad words, repetitions etc), spell check - no corrections - would be yet another step in the process. It's all read-only stuff, the document content is not

Re: Beginner: Best way to index and display original text of pdfs in search results

2008-12-12 Thread Paul Libbrecht
I also encountered these options of the Field constructor but I never found a way to be sure that the field is really not loaded in RAM and only return with Field.reader(). There seems to be no contract in the javadoc. Moreover the reader access methods went away between 1.9 and 2.2 if I

Re: Taxonomy in Lucene

2008-12-12 Thread Karsten F.
Hi John, I will take a look at the bobo-browse source code at the weekend. Do you know the xtf implementation of faceted browsing: starting point is org.cdlib.xtf.textEngine.facet.GroupCounts#addDoc ? (It works with millions of facet values on millions of hits) What is the starting point in

Re: How to search for -2 in field?

2008-12-12 Thread Darren Govoni
Tried them all, with quotes, without. Doesn't work. At least in Luke it doesn't. On Fri, 2008-12-12 at 07:03 +0530, prabin meitei wrote: whitespace analyzer will tokenize on white space irrespective of quotes. Use standard analyzer or keyword analyzer. Prabin meitei toostep.com On Thu, Dec

Re: How to search for -2 in field?

2008-12-12 Thread prabin meitei
One more thing: a few times I have found that I get different results in Luke than in my actual code. Try using the standard analyzer and a quoted query string directly in your code. Print your query to check that the query formed is correct (that is, formed with the quoted string). Can you tell what

Re: How to search for -2 in field?

2008-12-12 Thread Matthew Hall
Are you absolutely, 100% sure that the -2 token has actually made it into your index? As a VERY basic way to check this try something like this: import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.TermEnum; public class IndexTerms { public static void

Re: How to search for -2 in field?

2008-12-12 Thread Greg Shackles
I admit I only read through this thread quickly so maybe I missed something, but it sounds like you're trying different Analyzers for searching, when what you really need is to use the right analyzer during indexing. Generally you want to use the same analyzer for both indexing and searching so
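
The point about using the same analyzer for indexing and searching can be sketched with two toy analyzers in plain Java. Neither of them is a Lucene class, and they make no claim about how any particular Lucene analyzer treats "-2"; the class AnalyzerMismatchSketch, its method names, and the sample text are assumptions invented for this illustration. One analyzer splits on whitespace only, the other lowercases and splits on every non-alphanumeric character.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

final class AnalyzerMismatchSketch {

    // Toy analyzer 1: split on whitespace only, so "-2" stays "-2".
    static Set<String> whitespaceTokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Toy analyzer 2: lowercase and split on anything that is not a-z or 0-9,
    // so "-2" becomes "2".
    static Set<String> strippingTokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // A document matches when every analyzed query token was also produced
    // when the document was analyzed for indexing.
    static boolean matches(String doc,
                           Function<String, Set<String>> indexAnalyzer,
                           String query,
                           Function<String, Set<String>> queryAnalyzer) {
        Set<String> indexed = indexAnalyzer.apply(doc);
        Set<String> wanted = queryAnalyzer.apply(query);
        return !wanted.isEmpty() && indexed.containsAll(wanted);
    }

    public static void main(String[] args) {
        String doc = "offset -2 applied";
        Function<String, Set<String>> ws = AnalyzerMismatchSketch::whitespaceTokens;
        Function<String, Set<String>> strip = AnalyzerMismatchSketch::strippingTokens;

        System.out.println("same analyzer:      " + matches(doc, ws, "-2", ws));
        System.out.println("different analyzer: " + matches(doc, ws, "-2", strip));
    }
}
```

With the same analyzer on both sides the token "-2" is found; with the mismatched pair the query is reduced to "2", which was never indexed, so nothing matches. Changing only the search-side analyzer therefore cannot fix a term that was indexed in a different form.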

How to add an Arabic and Farsi language analyzer to Lucene

2008-12-12 Thread Ian Vink
Anyone heard of one for Lucene.NET ? Ian

.NET list?

2008-12-12 Thread Ian Vink
I am using java-user@lucene.apache.org for help, but sometimes I'd like Lucene.net specific help. Is there a mailing list for Lucene.NET on apache? Ian

Re: .NET list?

2008-12-12 Thread Erik Hatcher
On Dec 12, 2008, at 9:43 AM, Ian Vink wrote: I am using java-user@lucene.apache.org for help, but sometimes I'd like Lucene.net specific help. Is there a mailing list for Lucene.NET on apache? Yes, see the mail list section here: http://incubator.apache.org/lucene.net/ Erik

RE: Beginner: Best way to index and display original text of pdfs in search results

2008-12-12 Thread Sudarsan, Sithu D.
You can use PDFBOX. http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html Sincerely, Sithu D Sudarsan sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu -Original Message- From: maxmil [mailto:m...@alwayssunny.com] Sent: Friday, December 12, 2008 3:34 AM To:

Re: Taxonomy in Lucene

2008-12-12 Thread John Wang
wiki:http://bobo-browse.wiki.sourceforge.net/ this describes the upcoming 2.0 release, which is in the ill-named branch: BR_DEV_1_5_0 We are still doing some development work on that, feel free to check out the branch and we will be doing a release shortly. some features we aimed for 2.0 and

Re: Taxonomy in Lucene

2008-12-12 Thread John Wang
Hi Karsten: I will check out the xtf library. There is no connection between solr and browseengine other than Lucene and java. Thanks -John On Fri, Dec 12, 2008 at 3:52 AM, Karsten F. karsten-luc...@fiz-technik.de wrote: Hi John, I will take a look in the bobo-browse source code at

Re: Spell check of a large text

2008-12-12 Thread Grant Ingersoll
On Dec 12, 2008, at 5:36 AM, Lucene User no 1981 wrote: Grant, It's definitely dictionary based spell checker. A bit fleshing out, currently the document gets indexed and then it's analysed (bad words, repetitions etc), spell check - no corrections - would be yet another step in the process.

Re: How to add an Arabic and Farsi language analyzer to Lucene

2008-12-12 Thread Grant Ingersoll
I just added an Arabic Analyzer to contrib/analysis. No clue as to when that will percolate to .NET version. I believe you can search the archives for help w/ Persian, as I recall someone offering something in the past. On Dec 12, 2008, at 9:40 AM, Ian Vink wrote: Anyone heard of one

Re: How to search for -2 in field?

2008-12-12 Thread Darren Govoni
Hi Matt, Thanks for the thought. Yeah, I see it there in Luke, but the other gentleman's idea that maybe Luke is producing different results than the code might be a clue. It would be odd, if true, but nothing else works so I will see if that is it. Darren On Fri, 2008-12-12 at 08:03 -0500, Matthew Hall

Lucene - Authentication

2008-12-12 Thread Aaron Schon
Hi, if I have a Lucene index (or Solr) that is installed on a client's premises, how would you go about securing the index from being queried in an unauthorized fashion? For example, from malicious users or hackers, or for that matter internal users trying to reengineer the system and use it for

Re: Lucene - Authentication

2008-12-12 Thread Chris Hostetter
: X-Mailer: YahooMailRC/1155.45 YahooMailWebService/0.7.260.1 : References: 1229011161.7448.10.ca...@nuraku : 32a1c320812110848u302dd645h4143205068fe3...@mail.gmail.com : 1229015253.7448.12.ca...@nuraku : 295da8fe0812110932x3b31380dla64b09f1b09be...@mail.gmail.com :