Re: Thoughts on Search Analytics?

2011-05-06 Thread N. Hira
On 06-May-2011, at 2:04 AM, Paul Libbrecht wrote: On 6 May 2011 at 00:20, Otis Gospodnetic wrote: thus far, only search-testing has provided some analytics measures for us (precision and recall ones). We, of course, construct the test-suites from the logs. Interesting. It

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread N. Hira
Where do you get your Lucene/Solr downloads from? [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors

Re: Lucene Newbie Questions

2010-05-31 Thread N Hira
Frank -- Lucene can definitely do this stuff. This review of the Query Syntax might offer you some insight: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html Specifically, you can look up Fuzzy Searches and Synonyms. There are a couple of key ways to handle synonyms, so you might

Re: Lucene Newbie Questions

2010-05-31 Thread N Hira
:33 PM, N Hira nh...@cognocys.com wrote: Frank -- Lucene can definitely do this stuff. This review of the Query Syntax might offer you some insight: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html Specifically, you can look up Fuzzy Searches and Synonyms. There are a couple

Re: Solr tutorial

2010-05-31 Thread N Hira
I don't know of a single tutorial that puts it all together, but the rich documents feature implemented in Solr-284 would be where I would start: https://issues.apache.org/jira/browse/SOLR-284 Look here if you're using Solr 1.4 -- it should address your needs:

Re: Searching Subversion comments:

2010-03-08 Thread N Hira
I use svn diff --change revisionNumber to get the list of files associated with a given commit. You might also want to look at http://freshmeat.net/projects/svnweb/ HTH -h From: Erick Erickson erickerick...@gmail.com To: java-user java-user@lucene.apache.org

Re: If you could have one feature in Lucene...

2010-02-25 Thread N. Hira
I think it speaks to the maturity of the project ... Lucene has solved some of the easier problems in the problem space and the ones that remain are ... difficult. I recently introduced Lucene/Nutch to a group of ~10 relatively capable Java developers. While they find it easy to use,

Re: How to get a apache public license

2009-12-23 Thread N Hira
To you as well, Weiwei Wang. You can theoretically release your project under a license that is very similar to the Apache license at any time, presuming you are licensing rights related to your project. To create a project that is maintained by the Apache Software Foundation, you should

Re: lucene 2.4.1 : document in index but not returned in search

2009-10-02 Thread N Hira
Which analyzer do you use in luke? The general practice is to use the same analyzer for indexing and searching. Good luck. -h - Original Message From: Rathinapriya Nagalingam rnaga...@in.ibm.com To: java-user@lucene.apache.org Sent: Friday, October 2, 2009 10:51:42 AM Subject:

Re: Searching doubt

2009-08-04 Thread N Hira
Good summary, Shai. I've missed some of this thread as well, but does anyone know what happened to the suggestion about query manipulation? e.g., query (about us) = query(about us, aboutus) query(credit card) = query(credit card, creditcard) Regards, -h - Original Message
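
The query manipulation asked about above can be sketched in plain Java. This is only an illustration of the idea, not code from the thread: the class QueryExpander, its methods, and the field name "body" are all hypothetical, and the output is merely a string in the style of the classic Lucene query syntax (quoted phrase, OR), not something built with Lucene's own API.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: expands a phrase into itself
// plus a run-together variant with all whitespace removed.
final class QueryExpander {

    // "about us" -> ["about us", "aboutus"]; a single word yields just itself.
    static Set<String> variants(String phrase) {
        String trimmed = phrase.trim();
        Set<String> out = new LinkedHashSet<>();
        out.add(trimmed);
        out.add(trimmed.replaceAll("\\s+", ""));
        return out;
    }

    // Joins the variants with OR, quoting any variant that contains whitespace.
    // Special query-syntax characters are NOT escaped here (assumption: plain words only).
    static String toQueryString(String field, String phrase) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String v : variants(phrase)) {
            boolean isPhrase = v.chars().anyMatch(Character::isWhitespace);
            joiner.add(field + ":" + (isPhrase ? "\"" + v + "\"" : v));
        }
        return joiner.toString();
    }
}
```

Calling QueryExpander.toQueryString("body", "credit card") would produce a phrase clause and a single-term clause joined by OR. Whether the run-together form is better handled at query time like this, or at index time by the analyzer, depends on the application.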

Re: Searching index problems with tomcat

2009-05-27 Thread N. Hira
Lazzara marco.lazz...@gmail.com wrote: I attach the file testIndex.zip. Run the query with: PHILIPCIMIANO or RESEARCHER. I use StandardAnalyzer. Is it a problem? Marco Lazzara 2009/5/27 N. Hira nh...@cognocys.com Not sure if this applies here, but that tends to happen when

Re: Searching index problems with tomcat

2009-05-27 Thread N. Hira
/ testIndex,fieldsearch); try { this.paths = this.rdfind.Search(text, path); } catch (ParseException e1) { e1.printStackTrace(); } catch (IOException e1) { e1.printStackTrace(); } Marco Lazzara 2009/5/27 N. Hira nh

Re: Searching index problems with tomcat

2009-05-27 Thread N. Hira
);*/ } isearcher.close(); return resultingpaths; } 2009/5/27 N. Hira nh...@cognocys.com Thanks. Could you also post the code for RDFinder.Search() and the output from query.toString() when text is PHILIPCIMIANO? -h On 27-May-2009, at 12:40 PM, Marco Lazzara wrote

Re: Searching index problems with tomcat

2009-05-27 Thread N Hira
Cool! 1. So you are creating a parser with { name, synonyms, propIn }, correct? 2. Sorry -- I meant the output of query.toString(); I'm expecting to see something like this when the sentence parameter is set to philipcimiano: name:philipcimiano synonyms:philipcimiano propIn:philipcimiano

Re: Searching index problems with tomcat

2009-05-26 Thread N Hira
Marco, Does the part of the web app that is responsible for searching have permissions to read /home/marco/testIndex? Could you add some code to your searching app to print out the directory listing to confirm? Also, I may have missed this posting, but could you provide the answer from Step

Re: Searching index problems with tomcat

2009-05-26 Thread N Hira
marco marco 58 2009-05-24 12:00 segments_c -rw-r--r-- 1 marco marco 20 2009-05-24 12:00 segments.gen 2009/5/26 N Hira nh...@cognocys.com Marco, Does the part of the web app that is responsible for searching have permissions to read /home/marco/testIndex? Could you add some code

Re: Link map over results? or term freq

2008-10-16 Thread N. Hira
I think I understand what you're describing as a link map to be a tag cloud where each tag is a frequent or strong term. We did something like this as an experiment (without Lucene): http://www.cognocys.com/prospector/news.html If you're talking about something similar, then I think you can

Re: Search all Related Documents

2008-09-18 Thread N. Hira
You can search the lucene and solr mailing lists for denormalize but the general response is to try one of: 1. de-normalize the data while indexing - advantage: one query - disadvantage: data repetition 2. use 2 indices - advantage: no need for repetition; this is

Re: Lucene Memory Leak

2008-09-05 Thread N. Hira
I'm not an expert, so please take this with a grain of salt, but if you return the Hits object, you are inadvertently holding on to that IndexSearcher, right? According to the FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), iterating over all Hits will result in

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
I don't know how much of this is a Lucene problem, but -- as I'm sure you will inevitably hear from others on the list -- it depends on what your definition of similar is. By similar, do you mean: 1. Identical, except for variations in case (upper/lower) 2. Allow 1., but also allow

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
. For Life. N. Hira wrote: I don't know how much of this is a Lucene problem, but -- as I'm sure you will inevitably hear from others on the list -- it depends on what your definition of similar is. By similar, do you mean: 1. Identical, except for variations in case (upper/lower) 2. Allow 1

Re: searching for C++

2008-06-24 Thread N. Hira
This isn't ideal, but if you have a defined list of such terms, you may find it easier to filter these terms out into a separate field for indexing. -h -- Hira, N.R. Solutions Architect Cognocys, Inc. (773) 251-7453 On

Re: Transaction semantics in Document addition

2008-05-19 Thread N Hira
How about an attribute (fullyIndexed=true/false) to keep track of whether the indexing was successful? We used a similar attribute for a similar problem, but stored it in the accompanying database instead. -h - Original Message From: Michael McCandless [EMAIL PROTECTED] To:
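
A rough sketch of the fullyIndexed idea, assuming the flag lives outside the index as described above. Nothing here comes from the thread: IndexingLedger and its methods are hypothetical names, and an in-memory map stands in for the accompanying database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "accompanying database":
// tracks a fullyIndexed flag per document id.
final class IndexingLedger {
    private final Map<String, Boolean> fullyIndexed = new HashMap<>();

    // Record the document as not yet fully indexed before any index writes begin.
    void begin(String docId) {
        fullyIndexed.put(docId, Boolean.FALSE);
    }

    // Flip the flag only after every index write for the document has succeeded.
    void complete(String docId) {
        fullyIndexed.put(docId, Boolean.TRUE);
    }

    // Keep only hits whose flag is true; unknown ids are treated as not fully indexed.
    List<String> visibleHits(List<String> hitIds) {
        List<String> visible = new ArrayList<>();
        for (String id : hitIds) {
            if (fullyIndexed.getOrDefault(id, Boolean.FALSE)) {
                visible.add(id);
            }
        }
        return visible;
    }
}
```

Filtering after the search, as visibleHits does, keeps the index untouched but means a page of results may come back shorter than requested; storing the flag as an indexed field and adding it to the query is the other obvious option.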

Re: Transaction semantics in Document addition

2008-05-19 Thread N Hira
in Document addition In your scenario, it might work, but I wonder how you generate hits, excluding the fullyindexed=false. -Original Message- From: N Hira [mailto:[EMAIL PROTECTED] Sent: 19 May 2008 18:31 To: java-user@lucene.apache.org Subject: Re: Transaction semantics in Document addition

Re: Search Order

2008-05-05 Thread N. Hira
Please review: http://wiki.apache.org/lucene-java/LuceneFAQ I suspect your question is answered as: How do I make sure that a match in a document title has greater weight than a match in a document body? -h -- Hira,

Re: [ANN] Luke 0.8 released

2008-02-04 Thread N. Hira
Thank you for this. Luke has been *extremely* helpful. -h -- Hira, N.R. Solutions Architect Cognocys, Inc. On 04-Feb-2008, at 10:17 PM, Andrzej Bialecki wrote: Hi all, I just released Luke 0.8, the Lucene Index Toolbox. As

Re: StopWords problem

2007-12-27 Thread N. Hira
Hi Liaqat, Are you sure that the Urdu characters are being correctly interpreted by the JVM even during the file I/O operation? I would expect Unicode characters to be encoded as multi-byte sequences and so, the string-matching operations would fail (if the literals are different from
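
To illustrate the encoding concern, here is a small self-contained Java sketch. It is an assumption-laden example rather than the poster's code: StopWordDecoding and matchesStopWord are hypothetical names, and the stop word is a sample three-letter Urdu word written with Unicode escapes. It shows that the same bytes compare equal to the literal only when they are decoded with the charset they were written in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares decoded file bytes against a stop-word literal.
final class StopWordDecoding {

    // Sample Urdu stop word, written with Unicode escapes so the
    // encoding of this source file cannot affect the literal.
    static final String STOP_WORD = "\u0627\u0648\u0631";

    // Decodes the raw bytes with the given charset and compares to the literal.
    static boolean matchesStopWord(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset).equals(STOP_WORD);
    }
}
```

If the stop-word file is UTF-8, the bytes for this word are a six-byte sequence; matchesStopWord succeeds with StandardCharsets.UTF_8 and fails with StandardCharsets.ISO_8859_1, because the latter turns each byte into its own character. Passing an explicit charset to the reader, instead of relying on the platform default, avoids that mismatch.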

Re: Time of processing hits.doc()

2007-11-18 Thread N. Hira
Can you explain the problem you're trying to address from the user's perspective? From the description you've provided, you may want to look up Faceted Searching. Another option may be to use a HitCollector, but it would help us if you could describe the problem at a higher level.

Re: How to implement cut of score ?

2007-08-13 Thread N. Hira
Donna, If I understand the problem correctly, it is: given a [job description], find [candidates] that we would not otherwise find. That seems to be a user-weighted similarity problem more than a simple search problem. IOW: 1. Given a [job description], create a set of queries that look for

Re: Fine Tuning Lucene implementation

2007-07-24 Thread N. Hira
Could you show us the relevant source from doBodySearch()? -h On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote: I ran some tests and it seems that the slowness is from Lucene calls when I do doBodySearch, if I remove that call, Lucene gives me results in 5 seconds. otherwise it takes

Re: Fine Tuning Lucene implementation

2007-07-24 Thread N. Hira
; } I really need to optimize doBodySearch(...) as this takes the most time. thanks guys, Askar On 7/24/07, N. Hira [EMAIL PROTECTED] wrote: Could you show us the relevant source from doBodySearch()? -h On Tue, 2007-07-24 at 19:58

Re: Lucene Index of Oracle RDBMS

2007-01-08 Thread N. Hira
Max, We use a (customized version of) Lucene as part of our Cognocys IAM product, which is also available with Oracle RDBMS. I can tell you that the software is used at Medtronic, a global medical technology company. -h ---

Re: JVM Crash

2006-06-13 Thread N Hira
We had a similar problem. We discovered that it was basically that eden/from was out of memory and made two changes and that seems to have helped: 1. Reduce [Max]PermSize to 128M 2. Use the concurrent garbage collector Good luck. -h --- Ross Rankin [EMAIL PROTECTED] wrote: We keep getting

Re: Problems with Lucene

2006-06-01 Thread N Hira
Alberto, It might be helpful if you would provide the full stack-trace. We use Lucene with our web application like many other projects. I can assure you that there is no basic incompatibility, but you may indeed be experiencing something specific to your environment. -h Alberto