Re: Simple Web Search

2008-06-17 Thread Lukas Vlcek
Hi, if your content is stored in a database, then you might also be interested in Compass (I have had a very positive experience with it). Hibernate Search could be another interesting product for you (I don't have any experience with it, so I can't say more). Lukas On Tue, Jun

Re: distributed lucene progress

2008-06-02 Thread Lukas Vlcek
FYI: Ning's code seems to be part of the Hadoop contrib package now. On Sat, May 31, 2008 at 5:35 AM, Matt Ronge [EMAIL PROTECTED] wrote: On May 21, 2008, at 3:19 PM, Otis Gospodnetic wrote: No, that's a separate project on SF, IIRC. I am also interested in distributed Lucene. I took a

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Lukas Vlcek
Does it make sense to consider using OpenOffice to convert from MS formats to PDF or HTML before indexing? Would this give me a lower failure rate than a pure POI approach? I don't care about formatting now; I care about the content first. Formatting would be important only in the case

Re: Does Lucene save an offline version of web pages?

2008-04-27 Thread Lukas Vlcek
Hi, this sounds like a job for Nutch (one of the Lucene family projects). On Sun, Apr 27, 2008 at 8:26 PM, Legolas wood [EMAIL PROTECTED] wrote: Hi, thank you for reading my post. I have to design a system with the following requirements; I think Lucene or one of the projects which are based on

Re: Compass

2008-01-22 Thread Lukas Vlcek
Hi, I am using Compass with Spring and JPA. It works pretty nicely. I don't store the index in a database; I use a traditional file-system-based Lucene index. Updates work very well, but you have to be careful about properly mapping your objects into the search engine (especially parent-child mappings).

Nutch - Microsoft Search Server integration

2008-01-14 Thread Lukas Vlcek
Hi, is it possible to integrate Nutch into MS Search Server via the OpenSearch API? (MS Search Server supports OpenSearch: http://www.microsoft.com/enterprisesearch/serverproducts/searchserver/features.aspx) I think it should be possible to pass the user query from the MS server to Nutch and integrate
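OpenSearch describes a search endpoint with a URL template, so handing a user query over to another engine largely comes down to filling in that template. The sketch below only illustrates the template syntax from the OpenSearch 1.1 specification (`{searchTerms}`, and a trailing `?` for optional parameters such as `{startIndex?}`); the endpoint `nutch.example.com/opensearch` and its `query`/`start` parameter names are hypothetical, not Nutch's actual servlet layout.

```python
import re
from urllib.parse import quote


def fill_opensearch_template(template: str, **params: str) -> str:
    """Substitute OpenSearch URL-template parameters like {searchTerms}.

    A trailing '?' inside the braces marks an optional parameter, which is
    replaced by an empty string when no value is supplied; a missing required
    parameter raises KeyError. Values are percent-encoded.
    """
    def substitute(m: re.Match) -> str:
        name, optional = m.group(1), m.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    # Only text inside {...} is treated as a parameter; a literal '?' in the
    # URL itself (e.g. before the query string) is left untouched.
    return re.sub(r"\{([^}?]+)(\??)\}", substitute, template)


# Hypothetical template for a Nutch OpenSearch endpoint.
TEMPLATE = "http://nutch.example.com/opensearch?query={searchTerms}&start={startIndex?}"
```

For example, `fill_opensearch_template(TEMPLATE, searchTerms="distributed lucene")` yields a URL ending in `query=distributed%20lucene&start=`.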

Re: Wikia search goes live today

2008-01-08 Thread Lukas Vlcek
Kubes [EMAIL PROTECTED] wrote: Star ratings are being stored but not accounted for in the score as of yet. The plan is to include them in future indexing scores. :) Dennis Mike Klaas wrote: On 7-Jan-08, at 11:49 PM, Lukas Vlcek wrote: This would be great! I am particularly

Re: Wikia search goes live today

2008-01-08 Thread Lukas Vlcek
I should note that this technique is probably not easily applicable to the current Lucene scoring mechanism without additional development. On 1/8/08, Lukas Vlcek [EMAIL PROTECTED] wrote: After checking the Lucene API of ParallelReader, it seems that the star score could be stored in different
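A minimal sketch of the idea, assuming the star ratings live in a separate structure aligned by document number (the way Lucene's ParallelReader pairs parallel indexes that have the same number of documents). This is plain Python rather than Lucene's scoring API, and the combination formula `relevance * (1 + weight * stars)` and the `weight` value are arbitrary illustrative choices.

```python
from typing import List, Sequence, Tuple


def combine_scores(relevance: Sequence[float], stars: Sequence[float],
                   weight: float = 0.1) -> List[Tuple[int, float]]:
    """Combine per-document relevance with star ratings from a parallel store.

    Position i in both sequences refers to the same document number. Returns
    (doc_number, combined_score) pairs, best first.
    """
    if len(relevance) != len(stars):
        raise ValueError("parallel structures must have the same number of documents")
    combined = [(doc, r * (1 + weight * s))
                for doc, (r, s) in enumerate(zip(relevance, stars))]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)
```

Because the ratings are read at query time in this sketch, changing a rating would affect ranking immediately; whether that is feasible inside Lucene's own scoring is exactly the open question raised in the thread.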

Re: Wikia search goes live today

2008-01-08 Thread Lukas Vlcek
Lukas Vlcek wrote: So starring will be accommodated only during the indexing phase. Does it mean it will be a pretty static value, not a dynamically changing variable... correct? In other words, if I add my stars to some document, it won't affect the scoring immediately but after the indexing cycle

Wikia search goes live today

2008-01-07 Thread Lukas Vlcek
Hi, I noticed that Wikia search goes live today (see http://www.devxnews.com/article.php/3719906). Does anybody know where I could find more technical information about their solution? Are they going to contribute their enhancements back to Lucene/Nutch/Hadoop code? My understanding is that as

Re: Wikia search goes live today

2008-01-07 Thread Lukas Vlcek
of the ASF. But, that is a discussion for somewhere else... On Jan 7, 2008, at 8:13 AM, Grant Ingersoll wrote: On Jan 7, 2008, at 7:48 AM, Lukas Vlcek wrote: Hi, I noticed that Wikia search goes live today (see http://www.devxnews.com/article.php/3719906). Does anybody know where

Re: Wikia search goes live today

2008-01-07 Thread Lukas Vlcek
in the future. Obviously I don't see the big picture, but I think they don't have any option other than contributing back to the community if they are serious about it. On Jan 8, 2008 8:49 AM, Lukas Vlcek [EMAIL PROTECTED] wrote: This would be great! I am particularly interested how they are going

Re: Lucene jdbc

2007-11-26 Thread Lukas Vlcek
AFAIK, no. Lucene is a relevance-based query engine, not a relational engine like an SQL database. However, if you really want to use SQL on top of a Lucene index, then there may be a way. You need to store the index in a database (see

ApacheCon 2008 Europe - Lucene stuff

2007-11-26 Thread Lukas Vlcek
Hi, is anybody going to present anything about Lucene (and related technologies: Solr, Hadoop, ...) at ApacheCon 2008 Europe? Any training sessions, invited talks and/or a specific track? The conference pages (http://www.eu.apachecon.com/) do not contain any details yet. Regards, Lukas --

Re: Customized search with Lucene?

2007-10-25 Thread Lukas Vlcek
just before clicking those docs. This means updating the documents, so again it should be done carefully. It seems I'm adding more questions than answers, so I'd better stop here... Doron Lukas Vlcek [EMAIL PROTECTED] wrote on 24/10/2007 23:45:21: Doron, Sorry for the late reply. I got

Re: Customized search with Lucene?

2007-10-24 Thread Lukas Vlcek
searches for query Q7, boost doc D5 by B17; if user U2 searches for query Q3, boost doc D15 by B2. That seems like lots of info, and it must be persistent. Perhaps o.a.l.search.function can help, assuming you have this info available at search time and can use it to create a ValueSource. Doron Lukas Vlcek
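The per-user, per-query boost table described above can be sketched in a few lines. This is a plain-Python illustration of re-ordering hits by a user profile, not Lucene's `o.a.l.search.function` / ValueSource API; the table layout, the names `BoostTable` and `rerank`, and the choice to multiply the relevance score by the boost are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical boost table: (user, query) -> {doc_id: boost factor}.
# As noted in the thread, a real table would have to be persisted somewhere.
BoostTable = Dict[Tuple[str, str], Dict[str, float]]


def rerank(hits: List[Tuple[str, float]], user: str, query: str,
           boosts: BoostTable) -> List[Tuple[str, float]]:
    """Multiply each hit's score by the user/query boost (default 1.0)
    and return the hits sorted by the adjusted score, best first."""
    per_doc = boosts.get((user, query), {})
    adjusted = [(doc, score * per_doc.get(doc, 1.0)) for doc, score in hits]
    # sorted() is stable, so equally scored hits keep their original order.
    return sorted(adjusted, key=lambda hit: hit[1], reverse=True)
```

With `boosts = {("U2", "Q3"): {"D15": 2.0}}`, user U2 searching for Q3 sees D15 moved ahead of higher-scoring documents, while every other user gets the original order.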

Re: getting summary from lucene index

2007-10-16 Thread Lukas Vlcek
Hi, see the highlighter package in the Lucene contrib folder. Regards, Lukas On 10/16/07, mic1099 [EMAIL PROTECTED] wrote: I used Nutch to index my application. I wanted to handle indexing myself, so I used the Lucene API to index. Everything went OK except getting the summary. By the term summary I

Customized search with Lucene?

2007-10-13 Thread Lukas Vlcek
Hi, I am looking for an easy (preferably) way of implementing customized search with Lucene. What I mean by this is changing the order of returned hits according to the user profile. In simple words, I would like to be able to tweak the order of documents in the Hits collection before it is presented to the

Search in SharePoint Server 2007

2007-08-24 Thread Lukas Vlcek
Hi, does anybody here have any experience with the search technology used in Microsoft Office SharePoint Server 2007? (More info can be found here: http://office.microsoft.com/en-us/sharepointserver/HA102261451033.aspx) I am particularly interested in a comparison with Lucene. Does it

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
IBM T.J. Watson Research Center (914) 945-2472 http://www.research.ibm.com/people/g/donnagresh [EMAIL PROTECTED] Lukas Vlcek [EMAIL PROTECTED] 08/15/2007 03:49 PM Please respond to java-user@lucene.apache.org To java-user@lucene.apache.org cc Subject Re: Question about

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
in the rewritten query. Cheers Mark - Original Message From: Lukas Vlcek [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, 16 August, 2007 4:06:36 PM Subject: Re: Question about highlighting returning nothing Donna, Now I understand what you are saying (seems that I

Re: Question about highlighting returning nothing

2007-08-15 Thread Lukas Vlcek
Donna, I have been investigating highlighters in Lucene a bit recently. The humble lesson I've learned so far is that highlighting is a completely different task from the indexing/searching tandem. This simple fact is not obvious to a lot of people. In your particular case it would be helpful if

Re: How to keep user search history and how to turn it into information?

2007-08-13 Thread Lukas Vlcek
Enis, thanks for the excellent answer! Lukas On 8/13/07, Enis Soztutar [EMAIL PROTECTED] wrote: Hi, Lukas Vlcek wrote: Enis, Thanks for your time. I took a quick glance at Pig and it seems good (it seems to be built directly on Hadoop, which I am starting to play with :-). It obvious

Re: Nested Fields

2007-08-10 Thread Lukas Vlcek
Hi, have you checked the Compass framework (http://www.opensymphony.com/compass/), which is built on top of Lucene? This might be interesting for you: http://www.opensymphony.com/compass/versions/1.2M3/html/core-xsem.html BR Lukas On 8/10/07, Jeff French [EMAIL PROTECTED] wrote: Spencer, it seems

How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Hi, I would like to keep user search history data and I am looking for some ideas/advice/recommendations. In general, I would like to talk about methods of storing such data, its structure, and how to turn it into valuable information. As for the structure: For now I don't have
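One possible starting point, purely as an assumption since the message does not define a structure: log one record per search (who, what, when, which results were clicked) and derive information by aggregating over those records. The `SearchEvent` fields, the lowercase/trim normalization of queries, and both helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SearchEvent:
    """One hypothetical history record: who searched for what, and what was clicked."""
    user: str
    query: str
    timestamp: float  # seconds since the epoch
    clicked_docs: List[str] = field(default_factory=list)


def _normalize(query: str) -> str:
    # Naive normalization so "Lucene" and "lucene " count as the same query.
    return query.strip().lower()


def top_queries(events: List[SearchEvent], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent normalized queries across all users."""
    return Counter(_normalize(e.query) for e in events).most_common(n)


def click_counts(events: List[SearchEvent], query: str) -> Counter:
    """How often each document was clicked for a given query."""
    q = _normalize(query)
    return Counter(doc for e in events if _normalize(e.query) == q
                   for doc in e.clicked_docs)
```

Click counts per query are one example of turning raw history into something usable, for instance as input to per-user or per-query boosting.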

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
, ... etc) in the Lucene community, I am still wondering if one can use user search history data for such a purpose and, if the answer is yes, then how (practical examples are welcome). Lukas On 8/10/07, Enis Soztutar [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Hi Enis, Hi again, On 8/10/07, Enis

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Hi Enis, On 8/10/07, Enis Soztutar [EMAIL PROTECTED] wrote: Hi, Lukas Vlcek wrote: Hi, I would like to keep user search history data and I am looking for some ideas/advice/recommendations. In general, I would like to talk about methods of storing such data, its structure and how

Re: Lucene in large database contexts

2007-08-10 Thread Lukas Vlcek
You can also look at Hibernate Search (http://search.hibernate.org/). BR Lukas On 8/10/07, Lukas Vlcek [EMAIL PROTECTED] wrote: Hi, did you have a chance to look at Compass (http://www.opensymphony.com/compass/)? It can do exactly what you want. Lukas On 8/10/07, Antonello Provenzano [EMAIL

Re: Lucene in large database contexts

2007-08-10 Thread Lukas Vlcek
Hi, did you have a chance to look at Compass (http://www.opensymphony.com/compass/)? It can do exactly what you want. Lukas On 8/10/07, Antonello Provenzano [EMAIL PROTECTED] wrote: Hi There! I've been working for a while on the implementation of a website oriented to contents that would

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
concerning User Search / History / Retrieval History / Cache Management... Thanks, dt www.ejinz.com Search News ----- Original Message ----- From: Lukas Vlcek [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Friday, August 10, 2007 2:28 AM Subject: How to keep user search history and how

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-08-05 Thread Lukas Vlcek
, you may be trying to do something that is best done without the Highlighter. In summary, you should use Document.getFields (more efficient if you are getting more than one field anyway) and get around the offset issues above. - Mark Lukas Vlcek wrote: Mark, thank you for this. I

Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
Hi Lucene experts, the following is a simple piece of Lucene code that throws a StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0 release. Can anyone tell me what is wrong with this code? Is this a bug or a feature of Lucene? Any comments/hints are highly welcome! In a nutshell

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
problems down the road. I will look into this further. - Mark Lukas Vlcek wrote: Hi Lucene experts, the following is a simple piece of Lucene code that throws a StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0 release. Can anyone tell me what is wrong with this code

Re: multi-field and wildcard query highlighter questions

2007-07-26 Thread Lukas Vlcek
document which match the query. For example, if the user provides a query like [111 333], then I would like to get [<b>111</b> <b>333</b>]. I don't want to get anything like [<b>111</b> 222 <b>333</b>]. Any idea how to do that? - Mark Lukas Vlcek wrote: Hi, I have two questions: 1) Is it possible to get some
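The requested output (only the matching terms, each wrapped in `<b>...</b>`, with everything in between dropped) can be sketched outside the Highlighter. This is not the Lucene Highlighter API; the naive word-regex tokenization and case-insensitive matching are assumptions and will differ from what a real Lucene Analyzer produces.

```python
import re
from typing import Iterable


def matched_terms_only(text: str, query_terms: Iterable[str]) -> str:
    """Return only the tokens of `text` that match a query term, each wrapped
    in <b></b>, in document order; non-matching tokens are dropped entirely."""
    wanted = {term.lower() for term in query_terms}
    # Naive tokenization: runs of word characters, compared case-insensitively.
    hits = [m.group(0) for m in re.finditer(r"\w+", text)
            if m.group(0).lower() in wanted]
    return " ".join(f"<b>{hit}</b>" for hit in hits)
```

For the example above, `matched_terms_only("111 222 333", ["111", "333"])` returns `<b>111</b> <b>333</b>`.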

multi-field and wildcard query highlighter questions

2007-07-20 Thread Lukas Vlcek
Hi, I have two questions: 1) Is it possible to get some highlighted text when using a wildcard query? (I am using query rewrite.) I found that it works for queries like [prefix*suffix] or [prefix?suffix], but I was not able to get results for queries like [prefix*]. 2) What kind of problems I
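Conceptually, rewriting a wildcard or prefix query means expanding the pattern into the concrete indexed terms it matches, and those concrete terms are what a term-based highlighter can work with. The sketch below shows that expansion against a sorted Python list standing in for an index's term dictionary; it is not Lucene's rewrite code. `fnmatch` is used for `*` and `?`, and note that it also treats `[...]` as a character class, unlike Lucene wildcard syntax.

```python
import bisect
import fnmatch
from typing import List


def expand_prefix(sorted_terms: List[str], prefix: str) -> List[str]:
    """All terms starting with `prefix`, located by binary search in a
    sorted term list (a stand-in for an index's term dictionary)."""
    start = bisect.bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        out.append(term)
    return out


def expand_wildcard(sorted_terms: List[str], pattern: str) -> List[str]:
    """Terms matching a pattern with * (any run) and ? (one character).
    The literal part before the first wildcard narrows the scan."""
    cut = min((i for i, c in enumerate(pattern) if c in "*?"), default=len(pattern))
    candidates = expand_prefix(sorted_terms, pattern[:cut])
    return [t for t in candidates if fnmatch.fnmatchcase(t, pattern)]
```

For a [prefix*] query the expansion is just `expand_prefix`, so if highlighting still returns nothing for such queries, it is worth checking whether the query that reaches the highlighter has really been rewritten into concrete terms.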

Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
Hi, can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoretical question, but how do Luceners determine which words should be excluded when creating Analyzers for new languages? And which technique was used for validation of stop

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
There is a handy class in contrib/misc.../ that will show you the most frequent terms in an index. Handy dandy. Otis -- Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Lukas Vlcek [EMAIL PROTECTED
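The same idea, ranking terms by how many documents contain them, can be sketched over a small in-memory corpus. This is a plain-Python stand-in, not the contrib/misc class mentioned above; the lowercase word-regex tokenizer is an assumption, and terms that show up in nearly every document are simply candidates that still need human review before going into a stop-word list.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_terms(docs: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Rank terms by document frequency (the number of documents that contain
    the term at least once), highest first."""
    df: Counter = Counter()
    for doc in docs:
        # A set per document, so repeated words in one document count once.
        df.update(set(re.findall(r"\w+", doc.lower())))
    return df.most_common(n)
```

Document frequency is used here rather than raw occurrence counts so that a word repeated many times in a single document does not look like a general-purpose stop word.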

Re: Merging Indeces

2007-04-17 Thread Lukas Vlcek
Hi, try looking at Compass (http://www.opensymphony.com/compass/). It is built on top of Lucene but provides additional concepts (transactions are one of them). You might find this useful depending on your needs. Regards, Lukas On 4/16/07, Erick Erickson [EMAIL PROTECTED] wrote: See below.

Re: why Apache doesnt create a nice forum like the others???

2007-03-27 Thread Lukas Vlcek
Eric, how do you manage the Reply-To field in your Gmail? I always have to change the Reply-To field in Settings (which requires more than three clicks!), and since this is a manual (and tedious) process, it can introduce mistakes (misaddressed messages). The problem is that I am signed up to more

Re: ensuring search String availability in the content returned by lucene

2007-03-12 Thread Lukas Vlcek
Hi, I am not sure I can help you a lot, but you can check how Nutch does this (although it does not do exactly what you want). See *org.apache.nutch.summary.basic.BasicSummarizer* or *org.apache.nutch.summary.lucene.LuceneSummarizer*. You should also check the Highlighter API (

Re: Indexing clarification , please advice

2006-12-14 Thread Lukas Vlcek
Hi, maybe you can consider using Compass (http://www.opensymphony.com/compass/), which could help you in your situation. They claim that some actions (like updating the index very often) are handled very efficiently (due to caching, which is not a native part of the Lucene library). Regards,

Re: lucene - general question

2006-12-04 Thread Lukas Vlcek
, Eshwaramoorthy Babu [EMAIL PROTECTED] wrote: Hi Lukas, Thanks for your response. I was planning to search for the IDs from the 1st XML in the 2nd XML, so I thought of using Lucene for search. Can you please suggest some scripting solution? Is Perl the right solution? Thanks, Babu On 12/4/06, Lukas Vlcek
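Finding which IDs from one XML file also occur in another is a set intersection and needs no search engine, which is in line with the scripting-language suggestion made in this thread. A minimal sketch, assuming the IDs are stored as an `id` attribute on `<item>` elements; both names are hypothetical placeholders for the real XML layout.

```python
import xml.etree.ElementTree as ET
from typing import Set


def ids_in(xml_text: str, tag: str = "item", attr: str = "id") -> Set[str]:
    """Collect the `attr` values of every <tag> element in the document.
    The default tag and attribute names are hypothetical."""
    root = ET.fromstring(xml_text)
    return {el.get(attr) for el in root.iter(tag) if el.get(attr) is not None}


def common_ids(first_xml: str, second_xml: str) -> Set[str]:
    """IDs from the first document that also occur in the second one."""
    return ids_in(first_xml) & ids_in(second_xml)
```

For files on disk, reading each file's text (or using `ET.parse(path).getroot()`) feeds the same logic.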

Re: Lucene on SQL 2005

2006-12-04 Thread Lukas Vlcek
Hi, you should consider using Compass (http://www.opensymphony.com/compass/). Lukas On 12/5/06, Saroj K M [EMAIL PROTECTED] wrote: Dear All, I am new to Lucene. I have the following requirement. I am using an SQL Server 2005 database; the database has a table named --- Product

Re: lucene - general question

2006-12-03 Thread Lukas Vlcek
Hi Babu, sorry, but I don't see any point in using Lucene if you don't need search functionality. Also, for parsing XML files I would consider using some scripting language (as opposed to a pure Java-based solution). The reason is that scripting languages can be more effective when simplicity of

Re: Fwd: Hibernate Lucene trademark issues

2006-11-22 Thread Lukas Vlcek
%40lists.jboss.org/msg00392.html and for the future (but flexible) http://www.mail-archive.com/hibernate-dev%40lists.jboss.org/msg00393.html HTH Emmanuel Lukas Vlcek wrote: Emmanuel, I would be glad to hear your answer here (on the user list). Regards, Lukas -- Forwarded

Fwd: Hibernate Lucene trademark issues

2006-11-17 Thread Lukas Vlcek
Lukas, I'd be happy to answer your question, but I don't think Lucene dev is the appropriate place for that kind of discussion. Let's move this discussion here: http://forum.hibernate.org/viewforum.php?f=9 (or to the Lucene user list if you want). Emmanuel Lukas Vlcek wrote: Hi Emmanuel, I am

Re: Kneobase: open source, enterprise search

2006-05-02 Thread Lukas Vlcek
I took a quick look at its web page earlier today, and it looks good so far! Good news! However, I have one question: does Kneobase contain any kind of web crawler functionality (like Nutch), or do I have to feed it all sources *manually*? How much of the gathering of web data can be automated?

Re: how do I connect to the SVN repository to grab the latest source?

2006-01-03 Thread Lukas Vlcek
I use the following URL: http://svn.apache.org/repos/asf/lucene/java/trunk and it works well for me. Lukas On 1/4/06, gekkokid [EMAIL PROTECTED] wrote: if you're using Windows, just download Subversion from subversion.tigris.org and install it, then just enter the command found on the Lucene