fetch all the search results along with their corresponding values for all
the terms used in scoring, then use those values, play around with them,
and re-rank your results to your heart's content.
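A rough sketch of that approach, assuming the classic Lucene Java API (IndexSearcher.explain is a real method; searcher and query come from context, and myCustomScore is purely hypothetical):

```java
import java.util.Arrays;
import org.apache.lucene.search.*;

// Fetch a generous window of results, look at the per-term score
// breakdown via IndexSearcher.explain(), then re-order them yourself.
TopDocs top = searcher.search(query, 100);
for (ScoreDoc sd : top.scoreDocs) {
    Explanation exp = searcher.explain(query, sd.doc); // scoring details per doc
    sd.score = myCustomScore(exp);                     // hypothetical re-scoring
}
Arrays.sort(top.scoreDocs,
        (a, b) -> Float.compare(b.score, a.score));    // highest score first
```

Explanation gives you the raw tf/idf/boost contributions per document, so any re-ranking formula can be applied without touching the index.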
--kk
On Wed, Jul 15, 2009 at 11:28 AM, henok sahilu wrote:
> what i want to do
languages mixed with
English content. As you can see, for English it applies the usual process of
stemming/stop-word removal etc. Try it out and do let us know if you face
any issues.
Thanks,
KK.
On Sat, Jul 11, 2009 at 8:05 AM, Robert Muir wrote:
> there is really no default in lucene
you might go thru this as
well,
http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
HTH,
KK
On Thu, Jul 9, 2009 at 10:52 AM, Aditya R wrote:
>
> Hi all,
>
> I am new to lucene. In my sample application I have used lucene to index my
> 17 field d
No idea about the webapp demo app, but are you sure you have all the
required files, like the jar, in the right place?
On Sat, Jun 27, 2009 at 9:50 PM, mayank juneja wrote:
> Hi
>
> I am a new user to Lucene.
>
> I tried running the Lucene web application demo provided with the source. I
> am able
Thank you very much Yonik. I downloaded the latest Solr build, pulled out the
WordDelimiterFilter, and used it with the same options as the Solr defaults,
and it worked like a charm. Thanks to Robert also.
Thanks,
KK
On Tue, Jun 9, 2009 at 7:01 PM, Yonik Seeley wrote:
> I just cut'n
t unicoded word
endings and behaving as expected.
Any idea on this issue is welcome. Help me fix it. BTW, Lucene folks,
when is that basic WordDelimiterFilter going to be added to Lucene as well?
Any idea?
Thanks,
KK.
On Tue, Jun 9, 2009 at 7:01 PM, Yonik Seeley wrote:
> I just cut
t;साल";
int length = hindiStr.length();
System.out.println("str length " + length);
for (int i=0; i wrote:
> KK can you give me an example of some indian text for which it is doing
> this?
>
> Thanks!
>
> On Mon, Jun 8, 2009 at 1:03 AM, KK wrote:
Hi All,
I'm trying to index some Indian web page content, which is basically a mix
of Indian and say 5% English content in the same page itself. For all
this I cannot use the standard or simple analyzer, as they break the non-English
words in wrong places say[because the isLetter(ch) happens to be
rs therein. I hope I
made it clear. What could be the reason for this? Any ideas on fixing the
same?
Thanks,
KK
On Sat, Jun 6, 2009 at 9:45 PM, Robert Muir wrote:
> kk, i haven't had that experience with worddelimiterfilter on indian
> languages, is it possible you could provide me
g 5 values, that's fine, but somehow
it's messing with Unicode content. How do I get rid of that? Any thoughts? It
seems setting those values in some proper way might fix the problem; I'm not
sure, though.
Thanks,
KK.
On Fri, Jun 5, 2009 at 7:37 PM, Robert Muir wrote:
> kk an easier solution to
st it
here.
Thanks,
KK.
On Fri, Jun 5, 2009 at 7:37 PM, Robert Muir wrote:
> kk an easier solution to your first problem is to use
> worddelimiterfilterfactory if possible... you can get an instance of
> worddelimiter filter from that.
>
> thanks,
> robert
>
> On Fri, Ju
e of the other
one. Anyway, can you guide me in getting rid of the above error? And yes, I'll
change the order of applying the filters as you said.
Thanks,
KK.
On Fri, Jun 5, 2009 at 5:48 PM, Robert Muir wrote:
> KK, you got the right idea.
>
> though I think you might want to ch
y out the delimiter. Will update you on that.
Thanks a lot.
KK
On Fri, Jun 5, 2009 at 5:30 PM, Robert Muir wrote:
> i think you are on the right track... once you build your analyzer, put it
> in your classpath and play around with it in luke and see if it does what
> you want.
>
>
ts = new PorterFilter(ts);
return ts;
}
}
Does this sound OK? I think it will do the job... let me try it out.
I don't need a custom filter for my requirements, at least not for these
basic things I'm doing. I think so...
Thanks,
KK.
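For reference, a minimal custom analyzer along the lines of the snippet above might look like this, assuming the Lucene 2.x-era API this thread uses (StandardTokenizer, LowerCaseFilter, StopFilter, and PorterStemFilter are real classes in that codebase; the class name and filter order here are my own illustration):

```java
import java.io.Reader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MixedContentAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new StandardTokenizer(reader);
        ts = new LowerCaseFilter(ts);   // case folding (affects Latin script only)
        ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS);
        ts = new PorterStemFilter(ts);  // English stemming; non-English tokens pass through
        return ts;
    }
}
```

The point Robert makes elsewhere in the thread holds here: the English stemmer and stop list leave non-Latin tokens untouched, so one analyzer can serve the mixed content.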
On Thu, Jun 4, 2009 at 6:36 PM, Robert Muir wrote:
thing. What do you say?
Thanks,
KK.
On Thu, Jun 4, 2009 at 6:36 PM, Robert Muir wrote:
> KK well you can always get some good examples from the lucene contrib
> codebase.
> For example, look at the DutchAnalyzer, especially:
>
> TokenStream tokenStream(String fieldName, Reader rea
ndEdn. Is something there? Do
let me know.
Thanks,
KK.
On Thu, Jun 4, 2009 at 6:19 PM, Robert Muir wrote:
> KK, for your case, you don't really need to go to the effort of detecting
> whether fragments are english or not.
> Because the English stemmers in lucene will not modify y
Uwe, thanks for your lightning-fast response :-).
I'm looking into that; let me see how far I can go... Also I request Muir
to point me to the exact analyzer he mentioned in the previous mail.
Thanks,
KK
On Thu, Jun 4, 2009 at 6:10 PM, Uwe Schindler wrote:
> > I request Uwe to g
om analyzer only if that's not going to be too
complex. LOL, I'm a new user of Lucene and know the basics of Java coding.
Thank you very much.
--KK.
On Thu, Jun 4, 2009 at 5:30 PM, Robert Muir wrote:
> yes this is true. for starters KK, might be good to startup solr and look
> a
ntifiers, but do we really need that?
Because we've only non-English content mixed with English [and not French or
Russian etc.].
What is the best way of approaching the problem? Any thoughts?
Thanks,
KK.
On Wed, Jun 3, 2009 at 9:42 PM, Robert Muir wrote:
> KK, is all of your latin script t
nt intermingled with
non-English content. I must mention that we don't have stemming and case
folding for this non-English content. I'm stuck with this. Someone please let
me know how to proceed with fixing this issue.
Thanks,
KK.
Thanks for your response.
BTW, I got it done using TopDocs in place of Hits and used this
String content = searcher.doc(topDocs.scoreDocs[i].doc).get("content");
instead of
String content = hits.doc(i).get("content");
Thanks,
KK
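Spelled out, the Hits-to-TopDocs switch looks roughly like this (a sketch assuming a Lucene version where search(Query, int) is available; the field name "content" matches the snippet above, and the limit of 10 is arbitrary):

```java
import org.apache.lucene.search.*;

// Deprecated style this replaces:
//   Hits hits = searcher.search(query);
//   String content = hits.doc(i).get("content");

// TopDocs replacement: an explicit result window, no lazy Hits object.
TopDocs topDocs = searcher.search(query, 10);
for (int i = 0; i < topDocs.scoreDocs.length; i++) {
    String content = searcher.doc(topDocs.scoreDocs[i].doc).get("content");
    System.out.println(content);
}
```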
On Tue, Jun 2, 2009 at 6:52 PM, Erick Eric
not figure out how to plug the
same thing into the above code fragment; a good example would be helpful.
As of now I think it's the highlighter that's taking the major part of the
time consumed by search. So we can restrict the whole thing to only the
part that we are going to show on the first page. Any idea on the same is
very welcome. Thank you.
--KK.
or I've to convert the same to \u
format [this is just replacing &# with \u and replacing the 4-digit number
with its hex equivalent]. This manual
method doesn't sound good to me. If there is any standard way of doing
the same, please someone let me know. Thank you.
--KK.
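For what it's worth, one standard way to avoid that manual replacement is to decode the decimal character references straight into characters, so no \u form is needed at all. A small plain-Java helper (no Lucene involved; the class name is mine):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrDecoder {
    private static final Pattern NCR = Pattern.compile("&#(\\d+);");

    /** Replace decimal character references like &#2360; with the real characters. */
    public static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(new String(Character.toChars(codePoint))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // 2360/2366/2354 are the Devanagari code points for "साल".
        System.out.println(decode("&#2360;&#2366;&#2354;"));
    }
}
```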
One question?
Is it mand
the end of the day, then
try to get that done.
--KK
On Thu, May 28, 2009 at 1:16 PM, Ritu choudhary wrote:
> Is this possible through lucene or has anybody tried such thing?
>
> On 28/05/2009, Ritu choudhary wrote:
> > well friend let me explain the whole thing to you then:
> >
Yes, the getBestFragment() returns "fragmentcount" matched fragments,
each separated by the "fragmentseparator".
What exactly do you mean by "highlight the searched word in the document"? What
is this document?
First let us know what exactly you want.
What exactly is your requirement?
Displaying the final search results in a webpage, or anything else?
The results that you are getting are correct. Now you have to decide what you
want to do with them.
I thought you were trying to show the results in a webpage.
--KK
On Thu, May 28, 2009 at 11:54
Forgot:
Are you trying all this from the command line? Because that's when you get the
output as unprocessed HTML, those span tags; when you pass the same along to
display the content as a webpage, they will be processed by the browser and
you will see the colored matches.
--KK
On Thu, May 28, 2009 at 11:49
Yes, that's the expected output.
Now put that full content [whatever the searcher returned] in the HTML page
along with the styling for the same, and you will see the matches in yellow
[you chose yellow as the color for highlighting].
--KK
On Thu, May 28, 2009 at 11:42 AM, Ritu choudhary wrote:
>
ghter highlighter = new Highlighter(new QueryScorer(query));
You missed the formatter altogether, though you added the styler at the end.
Add it and it will work like a charm.
--KK
On Wed, May 27, 2009 at 10:40 PM, Ritu choudhary wrote:
> Am i coding it wrongly ...please reply.
>
@Ritu
Wouter's reply must have fixed the problem, right? Or still stuck?
--KK
On Wed, May 27, 2009 at 1:46 PM, Wouter Heijke wrote:
> Hi,
> It sounds to me that you are highlighting the query string and not the
> document. You will have to pass the document's content to
classpath. As you can see in
the last part of the code, the final output is being written to a file. As
per your requirement, remove that code as well as the part that adds HTML and
style tags.
Now the code adds the highlight span wherever there is a match. So now
we've to put the style script in
le and will definitely go through the examples
of LIA 2ndEdn. Thank you.
--KK
On Tue, May 26, 2009 at 6:55 PM, Erick Erickson wrote:
> It's fairly easy to construct your own analyzer by stringing together some
> filters and tokenizers. LIA (1st ed)
> had a SynonymAnalyzer. You probably
Thank you very much @Grant.
I used the WhitespaceAnalyzer and the other highlighter methods provided for
all the Unicode docs and it's working fine. Thank you all.
The book LIA 2ndEdn helped me a lot, specifically the examples in the
highlighting section.
Thanks,
KK.
On Tue, May 26, 2009 at 4:43 PM
more information you can always
post it in this mailing list.
HTH
KK
On Tue, May 26, 2009 at 12:39 PM, StanleyTan wrote:
>
>
> Hi Alexander,
>
> thanks for your advice. but I'm thinking, how am I supposed to integrate it in?
> because I tried to Google, and after downloading the zip folder
port both English and non-English indexing/searching/highlighting. Thank
you all. Any ideas on the same are always welcome.
Thanks,
KK.
On Tue, May 26, 2009 at 1:24 AM, Robert Muir wrote:
> as mentioned previously, i dont think your text is being analyzed the way
> you want.
>
ormation on the net regarding
the advanced features of Lucene, all of which are clearly explained in this
book with examples.
Thank you.
KK.
On Mon, May 25, 2009 at 7:47 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> I would do some googling to find examples, or rea
a lucene index from the command line using the raw
unicode texts like this,
[...@kk-laptop]$ java LuceneSearcher "\u0BAA\u0BBF\u0BB0\u0B9A\u0BC1\u0BB0"
and it gives me the page that matches the above query. Now I tried to do the
same along with highlighting. So in the code I posted above you
Thanks @Michael.
I've no idea about this contrib, though I'm looking into the highlighter. Can
you throw some light on the same? The steps to be taken for achieving it?
I'm completely new to this thing. Can you point me to some examples for the
same? Thank you.
KK.
On Mon, May 2
Thanks for your response @Seid.
Can any Lucene user give me directions in this regard? I'm stuck.
Really appreciate your help.
Thanks,
KK
On Mon, May 25, 2009 at 2:43 PM, Seid Muhie wrote:
> actually I used the normal java standard libraries for this work. I
> used lucene only to r
One more piece of information I would like to add:
# I'm building the index mostly for non-English texts/documents, and searching
is done using Unicode UTF-8 texts [it's obvious, right?]
Thanks
KK
On Mon, May 25, 2009 at 10:58 AM, KK wrote:
> Hi All.
> I want to do the same thing with say a wind
s per your mail, you used Java to extract the neighbors. Is that using the
standard techniques, i.e. those SpanQueries/TermVectors, or something
else?
If you can elaborate on all this a bit, it'd be very helpful.
Thank you.
KK
On Mon, May 25, 2009 at 10:51 AM, Seid Muhie wrote:
> fo
to make use of SpanQuery, TermVector and
TermVectorMapper for these purposes, right?
NB:I also want to add hit highlighting after fixing the neighbor problem.
Thanks,
KK.
On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll wrote:
> See
> http://www.lucidimagination.com/search/docu
words after that and will show
that to the end user. Any ideas on doing the same will be very helpful. Thank
you.
KK.
e pointers on doing Unicode normalization, please let me
know. If you think that might help, I'd definitely give it a try.
Thanks,
KK
On Thu, May 21, 2009 at 7:40 PM, Robert Muir wrote:
> hello, your example (hindi), is probably suffering from a number of search
> issues:
've to get the UTF-8
encoding for them. Is this the way to fix this, or are there other and
better ways of doing the same?
I need proper guidance from someone who has faced similar problems earlier.
All are welcome to give their views/ideas on the same.
Thanks,
KK
ive me ideas on this. Along
with this I would also like to do hit highlighting, irrespective of language.
Ideas on this will be equally helpful.
Is SimpleAnalyzer good enough for indexing and searching?
Thanks,
KK
some write-ups on the
same, do give me the pointers. It'll help me a lot.
Pointers to the unicode default algorithms mentioned in your mail will be
equally helpful.
Thanks,
KK.
On Thu, May 21, 2009 at 8:03 PM, Robert Muir wrote:
> its definitely an area in lucene that could use some i
the analyzer Solr was using in the default
setting [I used the default setting only, and I'm pretty sure it was using
many analyzers/filter factories]. Thanks for all your time and appreciation.
Thanks,
KK.
nguages. Can't we have a single indexer that handles non-English and English
equally well? Or any other ideas on the same?
Thanks,
KK.
On Thu, May 21, 2009 at 6:18 PM, Joel Halbert wrote:
> The highlighter should be language independent. So long as you are
> consistent with your use
Thank you very much. As you told me, I just added a single line in the JSP
page mentioning the charset as UTF-8 and it worked like a charm. Thank you.
KK
On Thu, May 21, 2009 at 5:47 PM, Uwe Schindler wrote:
> If you print the result e.g. to a webpage through the servlet API, the
> out
s for these regional languages. Any other ideas for doing the
same would be helpful as well.
Thanks,
KK.
'm able
to see the regional text, but not through the browser. How to do the decoding
when fetching the search results through the searcher?
Thanks
KK
On Thu, May 21, 2009 at 1:05 PM, KK wrote:
> Thanks @Uwe.
> #To answer your last mails query, textOnly is the output of the method
> downloadPa
'm not
using the encoding for HTTP parameters, I'll use that and let you know.
Thank you very much.
KK,
On Thu, May 21, 2009 at 12:50 PM, Uwe Schindler wrote:
> I forgot:
>
> > byte [] utfEncodeByteArray = textOnly.getBytes();
> > String utfString = new S
alyzer used during
> indexing and searching. Often analyzers written for specific languages
> cannot correctly handle characters from foreign languages. But e.g.
> StandardAnalyzer or WhitespaceAnalyzer does not modify the tokens in any
> way
> (if making them lowercase is not a problem).
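The pitfall Uwe is pointing at can be shown in plain Java: getBytes() with no argument uses the platform default charset, which may not be UTF-8, while passing the charset explicitly keeps the round trip lossless (a minimal sketch, no Lucene involved, using the later StandardCharsets API for brevity):

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    /** Explicit UTF-8 on both sides: lossless for any string. */
    public static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hindi = "\u0938\u093E\u0932"; // "साल"
        System.out.println(hindi.equals(roundTrip(hindi))); // prints true
        // By contrast, text.getBytes() with no argument uses the platform
        // default charset and can mangle non-ASCII text on non-UTF-8 systems.
    }
}
```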
laces. Earlier I was
using Solr and I was posting using the same method, and retrieval was also
working fine, but I don't know what the issue with Lucene is; maybe I'm
missing something. Can someone tell me what could be the issue? Thank you.
KK,
Thank you ag...@john.
This is even better. I don't have to bother about the 3rd argument, right?
I'll use the same one every time, both for registering a new core and for
adding docs to an existing one.
Thanks,
KK.
On Wed, May 20, 2009 at 6:54 PM, John Byrne wrote:
> Hi KK,
>
it fixed my
problem right away. Thank you all, and special thanks to the Lucene guys.
Thanks,
KK.
On Wed, May 20, 2009 at 6:28 PM, John Byrne wrote:
> I think the problem is that you are creating an new index every time you
> add a document:
>
> IndexWriter writer = new IndexWriter(tru
hits = searcher.search(query);
} catch (Exception ex) {
ex.printStackTrace();
}
int hitCount = hits.length();
System.out.println("Results found :" + hitCount);
for (int ix=0; (ix wrote:
> Hi KK,
>
> Easier still, you cou
How do I create a new index? Every time I need to do so, I've to create a new
directory and point to its path, right? How do I automate the creation of the
new directory?
I'm a new user of Lucene. Please help me out.
Thanks,
KK.
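One way to automate that, purely illustrative (plain java.nio.file from later Java versions; the timestamp naming scheme is my own choice, not anything Lucene requires):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexDirFactory {
    /** Create a fresh directory under baseDir, e.g. baseDir/index-1243500000000. */
    public static Path newIndexDir(Path baseDir) {
        Path dir = baseDir.resolve("index-" + System.currentTimeMillis());
        try {
            Files.createDirectories(dir); // creates parent directories as needed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }

    public static void main(String[] args) {
        Path dir = newIndexDir(Paths.get(System.getProperty("java.io.tmpdir")));
        System.out.println(Files.isDirectory(dir)); // prints true
        // Hand dir.toString() (or a Directory opened on it) to your IndexWriter.
    }
}
```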
rJ user to switch to lucene?
an estimate on this?
Thanks,
KK.