Re: Use SOLR like the MySQL LIKE

2008-11-18 Thread Aleksander M. Stensby
Hi there, You should use LowerCaseTokenizerFactory as you point out yourself. As far as I know, the StandardTokenizer recognizes email addresses and internet hostnames as one token. In your case, I guess you want an email, say [EMAIL PROTECTED] to be split into four tokens: average joe

Re: Use SOLR like the MySQL LIKE

2008-11-18 Thread Carsten L
Thanks for the quick reply! It is supposed to work a little like the Google Suggest or field autocompletion. I know I mentioned email and userid, but the problem lies with the name field, because of the whitespaces in combination with the wildcard. I looked at the

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Erik Hatcher
Glen, The thing is, Solr has a database integration built-in with the new DataImportHandler. So I'm not sure how much interest Solr users would have in LuSql by itself. Maybe there are LuSql features that DIH could borrow from? Or vice versa? Erik On Nov 17, 2008, at 11:03

Error in indexing timestamp format.

2008-11-18 Thread con
Hi guys, I have timestamp fields in my database in the format ddmmyyhhmmss.Z AM, e.g. 26-05-08 10:45:53.66100 AM. But I think that, since the Solr date format is different, I am unable to index the document with solr.DateField. So is there any option by which I can give my timestamp
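The mismatch described above can be illustrated outside Solr. The following is only a minimal Python sketch, not what DIH does internally: it parses the sample value from the message and prints it in the ISO 8601 style that Solr date fields expect. The function name `to_solr_date` is hypothetical, and treating the database value as UTC is an assumption; the original message does not say which time zone the column uses.

```python
from datetime import datetime


def to_solr_date(raw: str) -> str:
    """Convert e.g. '26-05-08 10:45:53.66100 AM' to '2008-05-26T10:45:53.661Z'.

    Assumption: the input is already in UTC, so a literal 'Z' is appended.
    """
    # %I is the 12-hour clock, %p the AM/PM marker, %f the fractional seconds
    # (1 to 6 digits, zero-padded on the right by strptime).
    parsed = datetime.strptime(raw, "%d-%m-%y %I:%M:%S.%f %p")
    millis = parsed.microsecond // 1000
    return parsed.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


print(to_solr_date("26-05-08 10:45:53.66100 AM"))
```

If the source column carries a real time zone, the value would need converting to UTC before the trailing `Z` is appended.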

Re: Error in indexing timestamp format.

2008-11-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
How are you indexing the data? By posting XML, or using DIH? On Tue, Nov 18, 2008 at 3:53 PM, con [EMAIL PROTECTED] wrote: Hi guys, I have timestamp fields in my database in the format ddmmyyhhmmss.Z AM, e.g. 26-05-08 10:45:53.66100 AM. But I think that, since the Solr date format is

TextProfileSigature using deduplication

2008-11-18 Thread Marc Sturlese
Hey there, I've been testing and checking the source of TextProfileSignature.java to avoid similar entries at indexing time. What I understood is that it is useful for huge texts where the frequency of the tokens (the words in lowercase, with just numbers and letters in that case) is important.

Re: Use SOLR like the MySQL LIKE

2008-11-18 Thread Aleksander M. Stensby
Ah, okay! Well, then I suggest you index the field in two different ways if you want both possible ways of searching. One, where you treat the entire name as one token (in lowercase); then you can search for avera* and match on, for instance, average joe. And then another field where you
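The two-field idea can be sketched in plain Python, without any Solr API. The description of the second field is cut off in the archive, so the per-word tokenization below is an assumption, as are all function names. The sketch only illustrates why a prefix query behaves differently against a single lowercase token than against individual word tokens.

```python
def whole_name_token(name: str) -> list[str]:
    """Field 1: the entire name as a single lowercase token."""
    return [" ".join(name.lower().split())]


def word_tokens(name: str) -> list[str]:
    """Field 2 (assumed): one lowercase token per whitespace-separated word."""
    return name.lower().split()


def prefix_match(tokens: list[str], query: str) -> bool:
    """Emulate a trailing-wildcard query such as 'avera*' against a token list."""
    prefix = query.lower().rstrip("*")
    return any(token.startswith(prefix) for token in tokens)


name = "Average Joe"
print(prefix_match(whole_name_token(name), "average j*"))  # prefix spans the space
print(prefix_match(word_tokens(name), "jo*"))              # prefix of the second word
print(prefix_match(whole_name_token(name), "jo*"))         # no match on the whole-name field
```

Searching both fields together would cover prefixes that span the whitespace as well as prefixes of any single word.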

Re: TextProfileSigature using deduplication

2008-11-18 Thread Mark Miller
Have you tried the tuning params for TextProfileSignature? I probably have to update the dedupe wiki. You can set the quantRate and the minTokenLength. Those are the variable names, and you set them right alongside signatureClass, signatureField, fields, etc. Whether or not you can tune it to

Re: TextProfileSigature using deduplication

2008-11-18 Thread Mark Miller
I have my own duplication system to detect that but I use String comparison so it works really slow... What are you doing for the String comparison? Not exact right?

EmbeddedSolrServer questions

2008-11-18 Thread Thierry Templier
Hello, I have some questions regarding the use of the EmbeddedSolrServer in order to embed a Solr instance into a Java application. 1°) Is an instance of the EmbeddedSolrServer class threadsafe when used by several concurrent threads? 2°) Regarding transactions, can an instance of the

Re: TextProfileSigature using deduplication

2008-11-18 Thread Marc Sturlese
I have my own duplication system to detect that but I use String comparison so it works really slow... What are you doing for the String comparison? Not exact right? hey, My comparison method looks for similar (not just exact)... what I do is to compare two text word to word. What I do

Re: TextProfileSigature using deduplication

2008-11-18 Thread Andrzej Bialecki
Marc Sturlese wrote: Hey there, I've been testing and checking the source of TextProfileSignature.java to avoid similar entries at indexing time. What I understood is that it is useful for huge texts where the frequency of the tokens (the words in lowercase, with just numbers and letters in that

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Glen Newton
Erik, Right now there is no real abstraction like DIH in LuSql. But as indicated in the TODO section of the documentation, I was planning on implementing or straight borrowing DIH in the near future. I am assuming that Solr is all multi-threaded and as performant as it can be. Is there a test SQL

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Shalin Shekhar Mangar
Hi Glen, There is an issue open for making DIH API friendly. Take a look and let us know what you think. https://issues.apache.org/jira/browse/SOLR-853 On Tue, Nov 18, 2008 at 8:26 PM, Glen Newton [EMAIL PROTECTED] wrote: Erik, Right now there is no real abstraction like DIH in LuSql. But

Deadlock with DirectUpdateHandler2

2008-11-18 Thread Toby Cole
Has anyone else experienced a deadlock when the DirectUpdateHandler2 does an autocommit? I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading data the server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2: // Check if

Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Mark Miller
Toby Cole wrote: Has anyone else experienced a deadlock when the DirectUpdateHandler2 does an autocommit? I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading data the server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2:

Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Mark Miller
Mark Miller wrote: Toby Cole wrote: Has anyone else experienced a deadlock when the DirectUpdateHandler2 does an autocommit? I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading data the server (tomcat 6) gets stuck at line 469 of

Re: Programatic way to know when an optimize is finished?

2008-11-18 Thread Phillip Farber
I'm using Perl LWP which has a default 30 sec timeout on the http request. I can set it to a larger number like 24 hours :-) I guess. How do you set your timeout? Phil Lance Norskog wrote: The 'optimize' http command blocks. If you script your automation, you can just call the http and then

Re: Error in indexing timestamp format.

2008-11-18 Thread con
Hi Noble, I am using DIH. Noble Paul നോബിള്‍ नोब्ळ् wrote: How are you indexing the data ? by posting xml? or using DIH? On Tue, Nov 18, 2008 at 3:53 PM, con [EMAIL PROTECTED] wrote: Hi Guys I have timestamp fields in my database in the format, ddmmyyhhmmss.Z AM eg: 26-05-08

Re: Error in indexing timestamp format.

2008-11-18 Thread Shalin Shekhar Mangar
Take a look at the DateFormatTransformer. You can find documentation on the DataImportHandler wiki. http://wiki.apache.org/solr/DataImportHandler On Tue, Nov 18, 2008 at 10:41 PM, con [EMAIL PROTECTED] wrote: Hi Noble, I am using DIH. Noble Paul നോബിള്‍ नोब्ळ् wrote: How are you

Re: specifying Sort criteria through Solr admin ui ...

2008-11-18 Thread Chris Hostetter
: Is there a way to specify sort criteria through Solr admin ui. I tried : doing it through the query statement box but it did not work. The search box on the admin GUI is fairly limited ... it's just a quick and dirty way to run test queries. Other options like sorting, filtering, and faceting

RE: Query Response Doc Score - Int Value

2008-11-18 Thread Nguyen, Joe
You don't need to hack the code, since you can effectively treat the scores 2.3518934 and 2.2173865 as if they were equal (ignoring digits after the decimal point). Score = original score (2.3518934) + function(date_created). You can scale the value of function(date_created) so that digits
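The message is cut off before it explains the scaling, so the following Python sketch is only one possible reading, not the poster's method: the relevance score is truncated to its integer part and a recency value scaled into [0, 1) is added, so that recency decides the order only among documents whose integer scores tie. The function name, the timestamp parameters, and the linear scaling are all assumptions.

```python
import math


def combined_score(score: float, created: float, oldest: float, newest: float) -> float:
    """Integer part of the relevance score plus a recency fraction in [0, 1).

    created / oldest / newest are timestamps in any consistent unit
    (hypothetical inputs, e.g. epoch seconds).
    """
    span = newest - oldest
    recency = 0.0 if span <= 0 else (created - oldest) / span
    # Clamp strictly below 1 so recency can never outweigh a whole score point.
    recency = min(max(recency, 0.0), 0.999999)
    return math.floor(score) + recency


# The two scores from the message tie at 2, so the newer document ranks first.
older = combined_score(2.3518934, created=150, oldest=100, newest=200)
newer = combined_score(2.2173865, created=200, oldest=100, newest=200)
print(newer > older)
```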

Re: EmbeddedSolrServer questions

2008-11-18 Thread Jeryl Cook
I am using EmbeddedSolrServer and simply have a queue that documents are sent to, and a listener on that queue that writes them to the index. Or just keep it simple, and put a synchronized block around the method in the write server that writes the document to the index. Jeryl Cook /^\ Pharaoh
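The queue-plus-listener pattern described above can be sketched independently of Solr. This is a minimal Python illustration under stated assumptions: `QueuedIndexWriter` and `write_fn` are hypothetical names, and `write_fn` stands in for whatever call actually adds a document to the index. Many threads may submit documents, but only the single listener thread ever writes.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the listener thread to exit


class QueuedIndexWriter:
    """Funnels documents from many producers through one writer thread."""

    def __init__(self, write_fn):
        self._write_fn = write_fn      # hypothetical: performs the real index write
        self._queue = queue.Queue()    # thread-safe FIFO
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, doc):
        """Called from any thread; returns immediately."""
        self._queue.put(doc)

    def close(self):
        """Process everything queued so far, then stop the listener."""
        self._queue.put(_STOP)
        self._worker.join()

    def _drain(self):
        while True:
            doc = self._queue.get()
            if doc is _STOP:
                break
            self._write_fn(doc)        # only this thread ever calls write_fn


written = []
writer = QueuedIndexWriter(written.append)
producers = [threading.Thread(target=writer.submit, args=(i,)) for i in range(5)]
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.close()
print(sorted(written))
```

The simpler alternative mentioned in the message, a lock around the write method, trades this decoupling for less code: producers then block while another thread is writing.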

Is there a DTD/XSD for XML response?

2008-11-18 Thread Simon Hu
Hi, I assume there is a schema definition or DTD for XML response but could not find it anywhere. Is there one? thanks -Simon

solr-ruby gem

2008-11-18 Thread Kashyap, Raghu
Does anyone know if the solr-ruby gem is compatible with Solr 1.3? Also, is anyone using the acts_as_solr plugin? Of late the website has been down and I can't find any recent activity on it. -Raghu

Re: solr-ruby gem

2008-11-18 Thread Erik Hatcher
On Nov 18, 2008, at 2:41 PM, Kashyap, Raghu wrote: Anyone knows if the solr-ruby gem is compatible with solr 1.3?? Yes, the gem at rubyforge is compatible with 1.3. Also, the library itself is distributed with the binary release of Solr, in client/ruby/solr-ruby/lib Also anyone using

Re: solr-ruby gem

2008-11-18 Thread Matt Mitchell
I've been using solr-ruby with 1.3 for quite a while now. It's powering our experimental, open-source OPAC, Blacklight: blacklight.rubyforge.org I've got a custom query builder and response wrapper, but it's using solr-ruby underneath. Matt On Tue, Nov 18, 2008 at 2:57 PM, Erik Hatcher [EMAIL

Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Mike Klaas
On 18-Nov-08, at 8:54 AM, Mark Miller wrote: Mark Miller wrote: Toby Cole wrote: Has anyone else experienced a deadlock when the DirectUpdateHandler2 does an autocommit? I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading data the

Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Mark Miller
Mike Klaas wrote: autoCommitCount is written in a CommitTracker.synchronized block only. It is read to print stats in an unsynchronized fashion, which perhaps could be fixed, though I can't see how it could cause a problem lastAddedTime is only written in a call path within a

Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Toby Cole
On 18 Nov 2008, at 20:18, Mark Miller wrote: Mike Klaas wrote: autoCommitCount is written in a CommitTracker.synchronized block only. It is read to print stats in an unsynchronized fashion, which perhaps could be fixed, though I can't see how it could cause a problem lastAddedTime

Processing of prx file for phrase queries: Whole position list for term read?

2008-11-18 Thread Burton-West, Tom
Hello, We are working with a very large index and with large documents (300+ page books). It appears that the bottleneck on our system is the disk I/O involved in reading position information from the prx file for commonly occurring terms. An example slow query is the new economics. To

Re: Using properties from core configuration in data-config.xml

2008-11-18 Thread gistolero
Very cool :-) Both suggestions work fine! But only with solr version 1.4: https://issues.apache.org/jira/browse/SOLR-823 Use a nightly build (e.g. 2008-11-17 works): http://people.apache.org/builds/lucene/solr/nightly/ See below for examples for both solutions... ((( 1 ))) There may be one

Re: Processing of prx file for phrase queries: Whole position list for term read?

2008-11-18 Thread Erik Hatcher
Rather than attempt an answer to your questions directly, I'll mention how other projects have dealt with the very-common-word issue. Nutch, for example, has a list of high frequency terms and concatenates them with the successive word in order to form less-frequent aggregate terms. The
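The concatenation idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Nutch's actual analyzer: it replaces each high-frequency term with an aggregate of that term and the following word, and the function name, the underscore separator, and the example term list are assumptions.

```python
def aggregate_common_terms(tokens: list[str], common: set[str]) -> list[str]:
    """Replace each common term with '<term>_<next word>' when a next word exists."""
    result = []
    for i, token in enumerate(tokens):
        if token in common and i + 1 < len(tokens):
            # The aggregate is far rarer than the common term alone, so its
            # postings (and position data) are much shorter to read.
            result.append(f"{token}_{tokens[i + 1]}")
        else:
            result.append(token)
    return result


print(aggregate_common_terms(["the", "new", "economics"], common={"the"}))
```

For the slow query mentioned earlier in the thread, the phrase would then be matched against the aggregate term rather than against the full position list for "the". The same transformation has to be applied at both index time and query time for this to work.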

Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Mike Klaas
On 18-Nov-08, at 12:18 PM, Mark Miller wrote: Mike Klaas wrote: autoCommitCount is written in a CommitTracker.synchronized block only. It is read to print stats in an unsynchronized fashion, which perhaps could be fixed, though I can't see how it could cause a problem lastAddedTime

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Glen Newton
Yes, I've found it. Do you want my comments here or in solr-dev or on jira? Glen 2008/11/18 Shalin Shekhar Mangar [EMAIL PROTECTED]: Hi Glen, There is an issue open for making DIH API friendly. Take a look and let us know what you think. https://issues.apache.org/jira/browse/SOLR-853

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Mike Klaas
On 18-Nov-08, at 6:56 AM, Glen Newton wrote: Erik, Right now there is no real abstraction like DIH in LuSql. But as indicated in the TODO section of the documentation, I was planning on implementing or straight borrowing DIH in the near future. I am assuming that Solr is all multi-threaded

Wait Flush, Wait Searcher and commit Scenarios

2008-11-18 Thread Grant Ingersoll
Was wondering if anyone can fill me in on the when and why I would set waitFlush and waitSearcher to false when sending a commit command? I think I understand what they do technically (I've looked at the code), but I am not clear about why I would want to do it. Is there a risk in

Re: Wait Flush, Wait Searcher and commit Scenarios

2008-11-18 Thread Ryan McKinley
waitFlush I'm not sure... waitSearcher=true: it will wait until a new searcher is opened after your commit, so that the client is guaranteed to have the results that were just sent in the index. If waitSearcher=false, a query could hit a searcher that does not have the new documents in

Re: Wait Flush, Wait Searcher and commit Scenarios

2008-11-18 Thread Mark Miller
Does waitFlush do anything now? I only see it being set if eclipse is not missing a reference... Ryan McKinley wrote: waitFlush I'm not sure... waitSearcher=true it will wait until a new searcher is opened after your commit, that way the client is guaranteed to have the results that were

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Glen , You can post all the queries first on solr-dev and all the valid ones can be moved to JIRA thanks, Noble On Wed, Nov 19, 2008 at 3:26 AM, Glen Newton [EMAIL PROTECTED] wrote: Yes, I've found it. Do you want my comments here or in solr-dev or on jira? Glen 2008/11/18 Shalin

Re: Using properties from core configuration in data-config.xml

2008-11-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Thanks gistolero. I have added this to the FAQ http://wiki.apache.org/solr/DataImportHandlerFaq On Wed, Nov 19, 2008 at 2:34 AM, [EMAIL PROTECTED] wrote: Very cool :-) Both suggestions work fine! But only with solr version 1.4: https://issues.apache.org/jira/browse/SOLR-823 Use a nightly

Re: Wait Flush, Wait Searcher and commit Scenarios

2008-11-18 Thread Grant Ingersoll
That explains true, but what about false? Why would I ever set it to false? If I don't wait, how will I ever know when the new searcher is ready? On Nov 18, 2008, at 10:27 PM, Ryan McKinley wrote: waitFlush I'm not sure... waitSearcher=true it will wait until a new searcher is opened

Re: Wait Flush, Wait Searcher and commit Scenarios

2008-11-18 Thread Ryan McKinley
I am using waitSearcher=false with a crawler. The crawling thread finishes a set of stuff, and calls <commit/>. It does not want to search; it gets back to crawling ASAP. On Nov 18, 2008, at 11:35 PM, Grant Ingersoll wrote: That explains true, but what about false? Why would I ever set it
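For reference, the two flags discussed in this thread are attributes on the XML commit message sent to the update handler. The Python sketch below only builds that message as a string and sends nothing to a server; the helper name `commit_message` is hypothetical, and the `True` defaults are an assumption about the server's default behaviour.

```python
def commit_message(wait_flush: bool = True, wait_searcher: bool = True) -> str:
    """Build the XML body of a commit request with explicit wait flags."""
    flags = {"waitFlush": wait_flush, "waitSearcher": wait_searcher}
    # XML expects lowercase 'true' / 'false', not Python's 'True' / 'False'.
    attrs = " ".join(f'{name}="{str(value).lower()}"' for name, value in flags.items())
    return f"<commit {attrs}/>"


# Crawler case from the thread: commit and return to crawling without waiting.
print(commit_message(wait_flush=False, wait_searcher=False))
```

A client that needs to query its own changes immediately afterwards would keep `wait_searcher=True`, which is the case Ryan describes earlier in the thread.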

Re: Error in indexing timestamp format.

2008-11-18 Thread con
Hi Thanks for your quick reply Shalin I have updated my data-config like: entity name=employees transformer=TemplateTransformer,DateFormatTransformer pk=EMP_ID query=select EMP_ID, CREATED_DATE, CUST_ID FROM EMP, CUST where EMP.EMP_ID = CUST.EMP_ID field

Re: Error in indexing timestamp format.

2008-11-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Do you have a stacktrace? On Wed, Nov 19, 2008 at 10:24 AM, con [EMAIL PROTECTED] wrote: Hi Thanks for your quick reply Shalin I have updated my data-config like: entity name=employees transformer=TemplateTransformer,DateFormatTransformer pk=EMP_ID query=select EMP_ID, CREATED_DATE,

Re: Error in indexing timestamp format.

2008-11-18 Thread con
Hi Shalin Please find the log data. 10:18:30,819 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() 10:18:30,838 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No

Re: Error in indexing timestamp format.

2008-11-18 Thread con
Hi Noble, I have cross-checked. This is my copyField in schema.xml: <copyField source="CREATED_DATE" dest="date"/> I am still getting that error. thanks con Noble Paul നോബിള്‍ नोब्ळ् wrote: your copyField has the wrong source field name. Field name is not date it is 'CREATED_DATE'

Re: Is there a DTD/XSD for XML response?

2008-11-18 Thread Ryan McKinley
nope... solr does not have a DTD. On Nov 18, 2008, at 1:44 PM, Simon Hu wrote: Hi, I assume there is a schema definition or DTD for XML response but could not find it anywhere. Is there one? thanks -Simon