date:20110328

Japanese/Chinese language support

2011-03-28 Thread Vinaya Kumar Thimmappa

Hello all, I am looking for a Japanese/Chinese stemmer. Does one exist? Do we need one? (Analyzers are already present in Lucene.) I did a Google search and did not find any conclusive answer. Thanks in advance, Vinaya - To unsub

a faster way to addDocument and get the ID just added?

2011-03-28 Thread Trejkaz

Hi all. I'm trying to parallelise writing documents into an index. Let's set aside the fact that 3.1 is much better at this than 3.0.x... but I'm using 3.0.3. One of the things I need to know is the doc ID of each document added so that we can add them into auxiliary database tables which are ke

Re: comparing lucene scores across queries

2011-03-28 Thread Chris Hostetter

: I see, well if you say the norm isn't a problem for my case, I will just : disable the coord factor by initializing BooleanQuery(true); and I should be : done. queryNorm shouldn't be a problem (since your BooleanQueries all have the same structure, and don't use query boosts ... I assume) but

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco

I see, well if you say the norm isn't a problem for my case, I will just disable the coord factor by initializing BooleanQuery(true); and I should be done. If this is not correct, could anybody please let me know? On 28 March 2011 11:44, Uwe Schindler wrote: > Hi, > > As you seem to want to do very s

RE: comparing lucene scores across queries

2011-03-28 Thread Uwe Schindler

Hi, As you seem to want to do very specific things, it might still be interesting to provide a modified Similarity (by subclassing DefaultSimilarity). You could then e.g. also return 1.0 from queryNorm() to disable it, since it may also be a problem (but it isn't for your queries). Theoretically, you can c

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco

OK thanks, I will pass it. Well, I don't know how to verify it. Even if I try, I get some scores, but I don't know whether comparing them is reliable. On 28 March 2011 11:36, Uwe Schindler wrote: > Hi, > > You don't need to extend BooleanQuery, you can just pass "true" in its > ctor, > see: http://s.apache.org

RE: comparing lucene scores across queries

2011-03-28 Thread Uwe Schindler

Hi, You don't need to extend BooleanQuery, you can just pass "true" in its ctor, see: http://s.apache.org/QvK Of course you can also subclass DefaultSimilarity and return 1 as coord, but that is more work than passing true to a ctor. For your type of queries, disabling coord should be enough, bu
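
To illustrate what disabling coord changes, here is a minimal plain-Java sketch. It does not use Lucene; `CoordSketch`, `coord` and `score` are hypothetical names, and the scoring is a deliberately simplified model (sum of matching clause scores times the coord factor, with coord taken as matched clauses divided by total clauses, which is the default formula as far as I know). In Lucene 3.x itself the switch is simply the `new BooleanQuery(true)` constructor call mentioned above.

```java
public class CoordSketch {
    // Fraction of the query's clauses that matched the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Simplified model: sum of the matching clause scores, scaled by coord
    // unless coord is disabled (in which case the factor is 1.0).
    static float score(float[] matchedClauseScores, int totalClauses, boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        float factor = disableCoord ? 1.0f : coord(matchedClauseScores.length, totalClauses);
        return sum * factor;
    }

    public static void main(String[] args) {
        float[] matched = {0.5f, 0.25f}; // two of three clauses matched
        System.out.println("with coord:    " + score(matched, 3, false));
        System.out.println("without coord: " + score(matched, 3, true));
    }
}
```

With coord enabled, a document that matches only some clauses is penalised by the matched fraction; with coord disabled, its score is just the sum of the clauses that did match.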

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco

One more thing: instead of extending the BooleanQuery class to remove the coord factor, can I also extend the Similarity class to do it? Still, the other question is open: just to be sure, if I disable the coord factor, can I finally compare my BooleanQuery results? Thanks > > > > On 28 March 20

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco

Cool, so just to be sure: if I disable the coord factor, can I finally compare my BooleanQuery results? On 28 March 2011 10:11, Uwe Schindler wrote: > Hi Patrick, > > You can disable the coord factor in the constructor of BooleanQuery. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63,

RE: comparing lucene scores across queries

2011-03-28 Thread Uwe Schindler

Hi Patrick, You can disable the coord factor in the constructor of BooleanQuery. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > Sent: Monday,

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco

Hi, thanks for the reply. Yeah, I've read the Similarity class documentation several times, but I need a tip. My queries are BooleanQueries but they always have the same structure (the same structure as the docs; they are actually docs from the collection): 3 fields. What if I simplify the similarity

RE: comparing lucene scores across queries

2011-03-28 Thread Uwe Schindler

No, scores are in general not comparable between different queries. The problem lies in many things:
- Each query has a norm factor that makes queries more comparable when they are sub-clauses of a BooleanQuery. But you are right, this norm factor should be the same.
- Some queries like FuzzyQuery rely

comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco

Hi, sorry, I already asked this a few days ago, but I got no reply and I really need some help on this. I'm running several queries against a doc collection. The queries are documents of the collection itself; I need to measure how similar each document is to the rest of the collection. Now, Lucene

Re: file formats: MacRoman and UTF-8...

2011-03-28 Thread Patrick Diviacco

thanks, solved On 28 March 2011 09:30, Uwe Schindler wrote: > Hi, > > Replace the "stupid": > writer = new BufferedWriter(new FileWriter(fileOutput)); > > by: > writer = new BufferedWriter(new OutputStreamWriter(new > FileOutputStream(fileOutput), "UTF-8")); > > Unfortunately, you cannot give a

RE: file formats: MacRoman and UTF-8...

2011-03-28 Thread Uwe Schindler

Hi, Replace the "stupid":

    writer = new BufferedWriter(new FileWriter(fileOutput));

by:

    writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileOutput), "UTF-8"));

Unfortunately, you cannot give a charset to FileWriter itself. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213
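
The replacement suggested above can be tried in isolation with a small sketch like the following. It assumes Java 8 or later (for try-with-resources, `StandardCharsets` and `UncheckedIOException`, none of which were used in the thread); `Utf8WriteSketch`, `writeUtf8` and `roundTrip` are hypothetical names, and the temp file stands in for the poster's `fileOutput`.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8WriteSketch {
    // Writes the text as UTF-8 regardless of the platform default charset.
    static void writeUtf8(File file, String contents) {
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(contents);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file and returns the raw bytes on disk.
    static byte[] roundTrip(String contents) {
        try {
            File file = File.createTempFile("output", ".trectext");
            file.deleteOnExit();
            writeUtf8(file, contents);
            return Files.readAllBytes(file.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = roundTrip("caf\u00e9"); // "café"
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        System.out.println(bytes.length + " bytes on disk");
    }
}
```

Naming the charset on the `OutputStreamWriter` is what makes the output independent of the machine the program runs on.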

Re: file formats: MacRoman and UTF-8...

2011-03-28 Thread Patrick Diviacco

hi, I'm using my own code:

    Writer writer = null;
    try {
        //File fileOutput = new File("output.trectext");
        File fileOutput = new File(args[1]);
        writer = new BufferedWriter(new FileWriter(fileOutput));
        writer.write(contents.toString());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } cat

RE: file formats: MacRoman and UTF-8...

2011-03-28 Thread Uwe Schindler

Hi, You have to give the Charset when creating the Writer. If you give no charset, Java uses the platform default. This question has nothing to do with Lucene; it is better suited to a general XML or Java forum. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
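
A short sketch of why the platform default matters: the same character encodes to different bytes under different charsets. ISO-8859-1 is used here only as a stand-in for a non-UTF-8 default, because MacRoman is not guaranteed to be available on every JVM; `DefaultCharsetSketch` and `encodedLength` are hypothetical names, and Java 7 or later is assumed for `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetSketch {
    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // "é"
        // The charset a Writer falls back to when none is given.
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("UTF-8 bytes:      " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + encodedLength(text, StandardCharsets.ISO_8859_1));
    }
}
```

A file written with one of these encodings and read back with the other will not round-trip non-ASCII characters such as "é" correctly.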

Re: file formats: MacRoman and UTF-8...

2011-03-28 Thread Paul Libbrecht

java -Dfile.encoding=utf-8 should do the trick. Or... which Java app are you using? paul On 28 March 2011 at 09:03, Patrick Diviacco wrote: > When I run my Lucene app and I parse an XML file I get the following error > due to some characters such as "é" written in the text file. > > If I save the

file formats: MacRoman and UTF-8...

2011-03-28 Thread Patrick Diviacco

When I run my Lucene app and parse an XML file, I get the following error due to some characters such as "é" written in the text file. If I save the text file as UTF-8 with my text editor I don't have this issue, but when I create it with a Java app, it is saved as MacRoman. How can I specify a differ

19 matches
