When I run my Lucene app and parse an XML file, I get the following error
because of some characters, such as é, written in the text file.
If I save the text file as UTF-8 with my text editor, I don't have this
issue, but when I create it with a Java app, it is saved as MacRoman.
How can I specify a
java -Dfile.encoding=utf-8
should do the trick.
Or... which Java app are you using?
paul
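To check what -Dfile.encoding actually changes, here is a minimal sketch (the class name is only illustrative). It prints the file.encoding property and the JVM's default charset. That default charset is what FileWriter, String.getBytes() and similar calls use when no charset is given:

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Name of the charset used by FileWriter, InputStreamReader,
    // String.getBytes() etc. when no charset is passed explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The file.encoding system property, e.g. as set with
        // java -Dfile.encoding=utf-8 on the command line.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Run it once plainly and once as java -Dfile.encoding=utf-8 DefaultCharsetCheck to compare. The Charset Javadoc says the default charset is determined at JVM startup, so pass the flag on the command line. Calling System.setProperty("file.encoding", ...) from inside the program is not a reliable substitute. Since JDK 18 (JEP 400), the default charset is UTF-8 on all platforms unless overridden.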
On 28 March 2011 at 09:03, Patrick Diviacco wrote:
Hi,
You have to pass the Charset when creating the Writer. If you give no
charset, Java uses the platform default. This question has nothing to do
with Lucene; it is better suited to a general XML or Java forum.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
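Here is why the platform default matters. The same é becomes different bytes depending on the charset, and decoding with the wrong one breaks it. An XML file with neither an encoding declaration nor a byte-order mark is supposed to be UTF-8, which would plausibly explain a parse error on é. The minimal sketch below writes text in a single-byte charset and reads it back as UTF-8. Class and method names are illustrative. ISO-8859-1 stands in for MacRoman because every JVM ships it and, like MacRoman, it stores é as one byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Lenient decoding: the String constructor silently replaces
    // malformed input with U+FFFD (the replacement character).
    static String decodeLeniently(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Strict decoding: a fresh CharsetDecoder reports malformed input by
    // throwing, closer to how a strict parser reacts to a mis-encoded file.
    static boolean decodesStrictly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        // ISO-8859-1 stands in for MacRoman: both store é as a single byte.
        byte[] singleByte = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(singleByte.length); // 4
        System.out.println(utf8.length);       // 5 (é is 0xC3 0xA9 in UTF-8)

        // Reading single-byte output as UTF-8 either fails or corrupts the é...
        System.out.println(decodesStrictly(singleByte, StandardCharsets.UTF_8)); // false
        System.out.println(decodeLeniently(singleByte, StandardCharsets.UTF_8).contains("\uFFFD")); // true

        // ...while decoding with the charset it was written in works.
        System.out.println(decodeLeniently(singleByte, StandardCharsets.ISO_8859_1).equals(text)); // true
    }
}
```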
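Hi, I'm using my own code:
Writer writer = null;
try {
    //File fileOutput = new File("output.trectext");
    File fileOutput = new File(args[1]);
    // FileWriter always uses the platform default charset (MacRoman on this machine)
    writer = new BufferedWriter(new FileWriter(fileOutput));
    writer.write(contents.toString());
} catch (IOException e) { // covers FileNotFoundException; write() can also throw IOException
    e.printStackTrace();
} finally {
    if (writer != null) {
        try { writer.close(); } catch (IOException e) { e.printStackTrace(); } // close() flushes the buffer
    }
}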
Hi,
Replace the stupid line:
writer = new BufferedWriter(new FileWriter(fileOutput));
with:
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileOutput), "UTF-8"));
Unfortunately, you cannot give a charset to FileWriter itself (before Java 11, which added FileWriter constructors that take a Charset).
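Here is the same fix as a complete, self-contained sketch (class and method names are illustrative). It writes an XML snippet through an OutputStreamWriter with an explicit UTF-8 charset, then reads it back the same way. It uses StandardCharsets.UTF_8 and try-with-resources (Java 7+) instead of the "UTF-8" string from the thread. With the string form, the OutputStreamWriter constructor also declares the checked UnsupportedEncodingException.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriterSketch {
    // Writes text as UTF-8 regardless of the platform default charset.
    static void writeUtf8(File file, String contents) throws IOException {
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(contents);
        } // close() flushes the buffered text to disk
    }

    // Reads the file back, again naming the charset explicitly.
    static String readUtf8(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-sketch", ".xml");
        file.deleteOnExit();
        String xml = "<doc>caf\u00e9</doc>";
        writeUtf8(file, xml);
        System.out.println(file.length());              // 16: é takes two bytes in UTF-8
        System.out.println(readUtf8(file).equals(xml)); // true
    }
}
```

Closing the writer matters as well. A BufferedWriter only guarantees that its text reaches the file once flush() or close() is called.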
Thanks, solved.
On 28 March 2011 09:30, Uwe Schindler u...@thetaphi.de wrote:
Hi,
Sorry, I already asked a few days ago, but I got no reply and I really need
some help with this.
I'm running several queries against a document collection. The queries are
documents from the collection itself: I need to measure how similar each
document is to the rest of the collection.
Now, Lucene
No, scores are in general not comparable between different queries. There
are several reasons:
- Each query has a norm factor that makes scores more comparable when the queries are
subclauses of a BooleanQuery. But you are right, this norm factor should be
the same.
- Some queries like FuzzyQuery rely
Hi, thanks for the reply.
Yes, I've read the Similarity class documentation several times, but I need
some tips.
My queries are BooleanQueries, but they always have the same structure (the
same structure as the docs, since they actually are docs from the collection):
3 fields.
What if I simplify the
Hi Patrick,
You can disable the coord factor in the constructor of BooleanQuery.
Uwe
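To see what the coord factor does, here is a simplified plain-Java model (not Lucene code). In Lucene 3.x, DefaultSimilarity.coord(overlap, maxOverlap) returns overlap / (float) maxOverlap, and a BooleanQuery multiplies the summed scores of the matching clauses by it. Passing true to the BooleanQuery constructor (its disableCoord argument) skips that multiplication. The class name and clause scores are hypothetical. Real scores also include tf, idf, field norms and queryNorm, which this sketch leaves out.

```java
public class CoordSketch {
    // Same formula as DefaultSimilarity.coord in Lucene 3.x: the fraction
    // of the query's clauses that this document matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Simplified BooleanQuery score: sum the scores of the clauses that
    // matched, then scale by coord unless coord is disabled
    // (what new BooleanQuery(true) does in Lucene 3.x).
    static float booleanScore(float[] matchedClauseScores, int totalClauses, boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        if (disableCoord) {
            return sum;
        }
        return sum * coord(matchedClauseScores.length, totalClauses);
    }

    public static void main(String[] args) {
        // Hypothetical scores: the document matched 2 of the query's 3 field clauses.
        float[] matched = {0.5f, 0.25f};
        System.out.println(booleanScore(matched, 3, false)); // 0.75 * 2/3, about 0.5
        System.out.println(booleanScore(matched, 3, true));  // 0.75
    }
}
```

With three fields per query, coord can only be 1/3, 2/3 or 1, so in this model it mainly penalises documents that match fewer fields.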
Cool, so just to be sure: if I disable the coord factor, can I finally
compare my BooleanQuery results?
On 28 March 2011 10:11, Uwe Schindler u...@thetaphi.de wrote:
One more thing: instead of extending the BooleanQuery class to remove the
coord factor, can I also extend the Similarity class to do it?
The other question is still open: just to be sure, if I disable the coord
factor, can I finally compare my BooleanQuery results?
Thanks
On 28 March 2011
Hi,
You don't need to extend BooleanQuery; you can just pass true to its constructor,
see: http://s.apache.org/QvK
Of course you can also subclass DefaultSimilarity and return 1 as coord, but
that is more work than passing true to a constructor.
For your type of queries, disabling coord should be enough, but
OK, thanks, I will pass true. Still, I don't know how to verify it: even if I try,
I get some scores, but I don't know whether comparing them is reliable.
On 28 March 2011 11:36, Uwe Schindler u...@thetaphi.de wrote:
Hi,
As you seem to want to do very specific things, it might still be
worth providing a modified Similarity (by subclassing
DefaultSimilarity). You could then, for example, also return 1.0 from
queryNorm() to disable it, since that may also be a problem (though it isn't for your queries).
Theoretically, you can
I see. If you say the norm isn't a problem in my case, I will just
disable the coord factor by constructing new BooleanQuery(true), and I should be
done.
If this is not correct, please let me know.
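For the queryNorm() discussed here, a similar plain-Java model follows (not Lucene code). In Lucene 3.x, DefaultSimilarity.queryNorm(sumOfSquaredWeights) returns 1 / sqrt(sumOfSquaredWeights). For unboosted TermQuery clauses, each clause contributes its idf squared to that sum. The class name and idf values are hypothetical. Per the Lucene Similarity Javadoc, this factor multiplies every document's score for a given query equally. It therefore does not change the ranking within one query; it is meant to make scores from different queries more comparable. Overriding it to return 1.0, as Uwe suggests, corresponds to dropping this factor from the model.

```java
public class QueryNormSketch {
    // Same formula as DefaultSimilarity.queryNorm in Lucene 3.x.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // For unboosted term clauses, each clause contributes idf^2.
    static float sumOfSquaredWeights(float[] idfs) {
        float sum = 0f;
        for (float idf : idfs) {
            sum += idf * idf;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] idfs = {2.0f, 1.0f, 2.0f}; // hypothetical idf values
        float s = sumOfSquaredWeights(idfs); // 4 + 1 + 4 = 9
        System.out.println(queryNorm(s));    // 1/3
    }
}
```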
On 28 March 2011 11:44, Uwe Schindler u...@thetaphi.de wrote:
queryNorm shouldn't be a problem (since your BooleanQueries all have the
same structure and don't use query boosts... I assume), but
Hi all.
I'm trying to parallelise writing documents into an index. Let's set
aside the fact that 3.1 is much better at this than 3.0.x... but I'm
using 3.0.3.
One of the things I need to know is the doc ID of each document added
so that we can add them into auxiliary database tables which are