Hi,
This is the first time I am using Lucene.
I need to index PDFs with very few fields: title, date and body (a long
field) for a web-based search.
The results I need to display have to show not only the documents found but
for each document a snapshot of the text where the search term has
Yes, I'm passing the same index for Spellchecker and IndexReader.
I'm going to test if this is a reason for my problem.
But I still don't understand why the same code is working on the testserver.
I think this could be because of Tomcat's permissions.
Is there any tutorial about the Tomcat
Hi
Lucene can store the original text of the document. You set up the
Lucene fields to do what you need. Have a look at the API docs for
Field.Store and you'll see that you've got three choices: YES, NO or
COMPRESS.
For your display snapshots, have a look at the Lucene highlighter package.
And
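To make the "snapshot" idea concrete, here is a minimal plain-Java sketch of what a highlighter does conceptually: find the term in the stored body text and return a short window around it. It is not the Lucene highlighter API; the class name `SnippetSketch`, the `snippet` method and the `<b>` markers are assumptions made up for illustration, and it assumes the body text was stored so it can be read back.

```java
public class SnippetSketch {

    /**
     * Returns up to `context` characters on each side of the first
     * case-insensitive occurrence of `term` in `body`, with the match
     * wrapped in <b>...</b>. Returns null when the term does not occur.
     * Assumes lowercasing does not change the string length (true for
     * plain ASCII text).
     */
    public static String snippet(String body, String term, int context) {
        int hit = body.toLowerCase().indexOf(term.toLowerCase());
        if (hit < 0) {
            return null;
        }
        int matchEnd = hit + term.length();
        int start = Math.max(0, hit - context);
        int end = Math.min(body.length(), matchEnd + context);
        // Mark the fragment as cut off where it does not reach the text boundary.
        String prefix = start > 0 ? "..." : "";
        String suffix = end < body.length() ? "..." : "";
        return prefix
                + body.substring(start, hit)
                + "<b>" + body.substring(hit, matchEnd) + "</b>"
                + body.substring(matchEnd, end)
                + suffix;
    }

    public static void main(String[] args) {
        String body = "Lucene can store the original text of the document.";
        System.out.println(snippet(body, "original", 10));
    }
}
```

The real highlighter package works on analyzed tokens and scores fragments, so treat this only as a picture of the output you are after.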
John can you describe some of these changes? They sound cool!
Mike
John Wang wrote:
We are making a lot of internal changes for performance, and also
upgrading the API to support new features. So my suggestion is to wait
for 2.0 (it should be released this month, mid-January at the latest).
We can take
Thanks very much. Looks like Field.Store.COMPRESS is what I want.
I'll also have a look at the search highlight stuff and getting Lucene in
Action.
Ian Lea wrote:
Hi
Lucene can store the original text of the document. You set up the
Lucene fields to do what you need. Have a look at
Grant,
It's definitely a dictionary-based spell checker. To flesh it out a bit:
currently the document gets indexed and then it's analysed (bad words,
repetitions etc.); spell check, with no corrections, would be yet another
step in the process. It's all read-only stuff; the document content is
not
I also encountered these options of the Field constructor but I never
found a way to be sure that the field is really not loaded into RAM and
is only returned through Field.reader(). There seems to be no contract in the
javadoc.
Moreover the reader access methods went away between 1.9 and 2.2 if I
Hi John,
I will take a look in the bobo-browse source code at the weekend.
Do you know the XTF implementation of faceted browsing:
starting point is
org.cdlib.xtf.textEngine.facet.GroupCounts#addDoc
?
(It works with millions of facet values on millions of hits)
What is the starting point in
Tried them all, with quotes, without. Doesn't work. At least in Luke it
doesn't.
On Fri, 2008-12-12 at 07:03 +0530, prabin meitei wrote:
The whitespace analyzer will tokenize on whitespace irrespective of quotes. Use
the standard analyzer or the keyword analyzer.
Prabin meitei
toostep.com
On Thu, Dec
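As a rough illustration of the point about quotes, the plain-Java sketch below splits text purely on whitespace. It is only a stand-in for that behaviour, not Lucene's analyzer code; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceSplitSketch {

    /**
     * Splits on runs of whitespace and nothing else. Quote characters are
     * not treated specially, so they stay attached to the neighbouring token.
     */
    public static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // The quoted phrase is broken into two tokens, each keeping one quote.
        for (String token : splitOnWhitespace("\"foo bar\" baz")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

A quoted phrase therefore does not survive as one token under a whitespace-only split, which is why a different analyzer is suggested above.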
One more thing: a few times I have found that I get different results in
Luke than in my actual code. Try using the standard analyzer and a quoted
query string directly in your code. Print your query to check that the query
formed is correct (i.e. that it is formed with the quoted string).
Can you tell what
Are you absolutely, 100% sure that the -2 token has actually made it
into your index?
As a VERY basic way to check this try something like this:
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
public class IndexTerms {
public static void
I admit I only read through this thread quickly so maybe I missed something,
but it sounds like you're trying different Analyzers for searching, when
what you really need is to use the right analyzer during indexing.
Generally you want to use the same analyzer for both indexing and searching
so
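A small plain-Java sketch of why the analyzer used at index time matters: terms are stored in whatever form the indexing analyzer produced, so a query analyzed differently can miss them. The two "analyzers" here are hypothetical stand-ins written for this example, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

public class AnalyzerMismatchSketch {

    // Hypothetical analyzer A: lowercase, then split on anything that is not a-z or 0-9.
    public static List<String> lowercasing(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Hypothetical analyzer B: split on whitespace only, keeping case and punctuation.
    public static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // "Indexing": the set of terms the chosen analyzer emits for the document.
    public static Set<String> index(String doc, Function<String, List<String>> analyzer) {
        return new HashSet<>(analyzer.apply(doc));
    }

    // "Searching": every query token must be present among the indexed terms.
    public static boolean matches(Set<String> indexed, String query,
                                  Function<String, List<String>> analyzer) {
        List<String> queryTokens = analyzer.apply(query);
        return !queryTokens.isEmpty() && indexed.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        Set<String> terms = index("Hello World", AnalyzerMismatchSketch::lowercasing);
        // Different analyzer at search time: "Hello" is looked up, but "hello" was indexed.
        System.out.println(matches(terms, "Hello", AnalyzerMismatchSketch::whitespaceOnly));
        // Same analyzer on both sides: the query token is normalized the same way.
        System.out.println(matches(terms, "Hello", AnalyzerMismatchSketch::lowercasing));
    }
}
```

Running `main` prints `false` and then `true`: changing the search-time analyzer alone cannot find a term that was never indexed in that form.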
Anyone heard of one for Lucene.NET ?
Ian
I am using java-user@lucene.apache.org for help, but sometimes I'd like
Lucene.net specific help. Is there a mailing list for Lucene.NET on apache?
Ian
On Dec 12, 2008, at 9:43 AM, Ian Vink wrote:
I am using java-user@lucene.apache.org for help, but sometimes I'd like
Lucene.net specific help. Is there a mailing list for Lucene.NET on apache?
Yes, see the mail list section here: http://incubator.apache.org/lucene.net/
Erik
You can use PDFBOX.
http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
Sincerely,
Sithu D Sudarsan
sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu
-Original Message-
From: maxmil [mailto:m...@alwayssunny.com]
Sent: Friday, December 12, 2008 3:34 AM
To:
wiki:http://bobo-browse.wiki.sourceforge.net/
this describes the upcoming 2.0 release, which is in the ill-named
branch: BR_DEV_1_5_0
We are still doing some development work on that, feel free to check out the
branch and we will be doing a release shortly.
some features we aimed for 2.0 and
Hi Karsten,
I will check out the XTF library.
There is no connection between Solr and the browse engine other than Lucene
and Java.
Thanks
-John
On Fri, Dec 12, 2008 at 3:52 AM, Karsten F.
karsten-luc...@fiz-technik.de wrote:
Hi John,
I will take a look in the bobo-browse source code at
On Dec 12, 2008, at 5:36 AM, Lucene User no 1981 wrote:
Grant,
It's definitely a dictionary-based spell checker. To flesh it out a bit:
currently the document gets indexed and then it's analysed (bad words,
repetitions etc.); spell check, with no corrections, would be yet another
step in the process.
I just added an Arabic Analyzer to contrib/analysis. No clue as to
when that will percolate to .NET version. I believe you can search
the archives for help w/ Persian, as I recall someone offering
something in the past.
On Dec 12, 2008, at 9:40 AM, Ian Vink wrote:
Anyone heard of one
Hi Matt,
Thanks for the thought. Yeah, I see it there in Luke, but the other
gentleman's idea that Luke may be producing different results than the code
might be a clue. It would be odd, if true, but nothing else works so I will
see if that is it.
Darren
On Fri, 2008-12-12 at 08:03 -0500, Matthew Hall
Hi, if I have a Lucene index (or Solr) that is installed on client premises,
how would you go about securing the index from being queried in an unauthorized
fashion? For example, by malicious users or hackers, or for that matter
internal users trying to re-engineer the system and use it for
: X-Mailer: YahooMailRC/1155.45 YahooMailWebService/0.7.260.1
: References: 1229011161.7448.10.ca...@nuraku
: 32a1c320812110848u302dd645h4143205068fe3...@mail.gmail.com
: 1229015253.7448.12.ca...@nuraku
: 295da8fe0812110932x3b31380dla64b09f1b09be...@mail.gmail.com
: