Hi there,
You should use LowerCaseTokenizerFactory as you point out yourself. As far
as I know, the StandardTokenizer recognizes email addresses and internet
hostnames as one token. In your case, I guess you want an email, say
[EMAIL PROTECTED] to be split into four tokens: average joe
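For reference, a schema.xml field type built on that tokenizer could look like the sketch below (the type name and the sample address are made up, not from this thread). LowerCaseTokenizer splits at every non-letter and lowercases, so an address like average.joe@example.com comes out as the four tokens average / joe / example / com:

```xml
<!-- Hypothetical field type: LowerCaseTokenizer splits at every
     non-letter character and lowercases each token. -->
<fieldType name="text_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
```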
Thanks for the quick reply!
It is supposed to work a little like the Google Suggest or field
autocompletion.
I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespace in combination with the wildcard.

I looked at the
Glen,
The thing is, Solr has a database integration built-in with the new
DataImportHandler. So I'm not sure how much interest Solr users
would have in LuSql by itself.
Maybe there are LuSql features that DIH could borrow from? Or vice
versa?
Erik
On Nov 17, 2008, at 11:03
Hi Guys
I have timestamp fields in my database in the format,
ddmmyyhhmmss.Z AM
eg: 26-05-08 10:45:53.66100 AM
But I think since the Solr date format is different, I am unable to
index the document with solr.DateField.
So is there any option by which I can give my timestamp
How are you indexing the data? by posting xml? or using DIH?
On Tue, Nov 18, 2008 at 3:53 PM, con [EMAIL PROTECTED] wrote:
Hi Guys
I have timestamp fields in my database in the format,
ddmmyyhhmmss.Z AM
eg: 26-05-08 10:45:53.66100 AM
But I think since the Solr date format is
Hey there, I've been testing and checking the source of the
TextProfileSignature.java to avoid similar entries at indexing time.
What I understood is that it is useful for huge texts where the frequency of
the tokens (the words in lowercase, with just numbers and letters in that case)
is important.
Ah, okay!
Well, then I suggest you index the field in two different ways if you want
both possible ways of searching: one where you treat the entire name as
one token (in lowercase), so you can search for avera* and match, for
instance, average joe, etc. And then another field where you
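A hedged schema.xml sketch of that two-field setup (all field and type names here are invented, and the second field's analyzer is just one plausible choice): the whole-name field keeps the name as a single lowercased token via KeywordTokenizer, while a copyField feeds a second, normally tokenized field.

```xml
<!-- Hypothetical sketch: index the same name two ways. -->
<fieldType name="name_whole" class="solr.TextField">
  <analyzer>
    <!-- Whole value as one token, lowercased: avera* matches "average joe" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_exact" type="name_whole" indexed="true" stored="true"/>
<field name="name_split" type="text"       indexed="true" stored="false"/>
<!-- Populate the word-split field from the same source value -->
<copyField source="name_exact" dest="name_split"/>
```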
Have you tried the tuning params for TextProfileSignature? I probably
have to update the dedupe wiki.
You can set the quantRate and the minTokenLength. Those are the
variable names, and you set them alongside signatureClass,
signatureField, fields, etc.
Whether or not you can tune it to
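Put together, a dedupe chain in solrconfig.xml might look roughly like this sketch. The parameter names quantRate and minTokenLength come from the thread; the rest (field list, values, exact class paths) are assumptions and may differ between versions, so check the dedupe wiki page before relying on it:

```xml
<!-- Hypothetical dedupe configuration sketch -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,body</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <!-- Tuning knobs mentioned in this thread -->
    <str name="quantRate">0.2</str>
    <str name="minTokenLength">3</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```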
I have my own duplication system to detect that, but I use String
comparison, so it runs really slowly...
What are you doing for the String comparison? Not exact right?
Hello,
I have some questions regarding the use of the EmbeddedSolrServer in order to
embed a solr instance into a Java application.
1) Is an instance of the EmbeddedSolrServer class thread-safe when used by
several concurrent threads?
2) Regarding transactions, can an instance of the
I have my own duplication system to detect that, but I use String
comparison, so it runs really slowly...
What are you doing for the String comparison? Not exact right?
hey,
My comparison method looks for similar matches (not just exact)... what I do
is compare two texts word by word. What I do
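Word-by-word comparison of this kind is often implemented as set overlap. A minimal, hypothetical Java sketch (not the poster's actual code) using Jaccard similarity over lowercased word sets:

```java
import java.util.*;

public class WordOverlap {
    // Tokenize to lowercase words (letters and digits only), roughly
    // mirroring the thread's "compare two texts word by word" idea.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, in [0, 1].
    static double similarity(String a, String b) {
        Set<String> wa = words(a), wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(wa);
        inter.retainAll(wb);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // 3 shared words out of 5 distinct words -> 0.6
        System.out.println(similarity("Average Joe writes email",
                                      "average joe reads email"));
    }
}
```

This is still pairwise comparison, so it does not fix the speed problem by itself; that is exactly what signature-based approaches like TextProfileSignature avoid.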
Marc Sturlese wrote:
Hey there, I've been testing and checking the source of the
TextProfileSignature.java to avoid similar entries at indexing time.
What I understood is that it is useful for huge text where the frequency of
the tokens (the words in lowercase just with number and leters in taht
Erik,
Right now there is no real abstraction like DIH in LuSql. But as
indicated in the TODO section of the documentation, I was planning on
implementing or straight borrowing DIH in the near future.
I am assuming that Solr is fully multi-threaded and as performant as it
can be. Is there a test SQL
Hi Glen,
There is an issue open for making DIH API friendly. Take a look and let us
know what you think.
https://issues.apache.org/jira/browse/SOLR-853
On Tue, Nov 18, 2008 at 8:26 PM, Glen Newton [EMAIL PROTECTED] wrote:
Erik,
Right now there is no real abstraction like DIH in LuSql. But
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson (apache-
solr-2008-11-12_08-06-21), and quite often when I'm loading data the
server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2:
// Check if
Toby Cole wrote:
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson
(apache-solr-2008-11-12_08-06-21), and quite often when I'm loading
data the server (tomcat 6) gets stuck at line 469 of
DirectUpdateHandler2:
Mark Miller wrote:
Toby Cole wrote:
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson
(apache-solr-2008-11-12_08-06-21), and quite often when I'm loading
data the server (tomcat 6) gets stuck at line 469 of
I'm using Perl LWP which has a default 30 sec timeout on the http
request. I can set it to a larger number like 24 hours :-) I guess.
How do you set your timeout?
Phil
Lance Norskog wrote:
The 'optimize' http command blocks. If you script your automation, you can
just call the http and then
Hi Noble,
I am using DIH.
Noble Paul നോബിള് नोब्ळ् wrote:
How are you indexing the data ? by posting xml? or using DIH?
On Tue, Nov 18, 2008 at 3:53 PM, con [EMAIL PROTECTED] wrote:
Hi Guys
I have timestamp fields in my database in the format,
ddmmyyhhmmss.Z AM
eg: 26-05-08
Take a look at the DateFormatTransformer. You can find documentation on the
DataImportHandler wiki.
http://wiki.apache.org/solr/DataImportHandler
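As a rough illustration for a timestamp like "26-05-08 10:45:53 AM", a data-config.xml entity using DateFormatTransformer could look like the sketch below. The entity name, query, and pattern are assumptions; dateTimeFormat takes a Java SimpleDateFormat pattern, so adjust it to the exact column format (the fractional seconds in the original value may need extra handling):

```xml
<!-- Hypothetical data-config.xml fragment -->
<entity name="employees" transformer="DateFormatTransformer"
        query="select EMP_ID, CREATED_DATE from EMP">
  <!-- Parse "26-05-08 10:45:53 AM" into a Solr date -->
  <field column="CREATED_DATE" dateTimeFormat="dd-MM-yy hh:mm:ss a"/>
</entity>
```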
On Tue, Nov 18, 2008 at 10:41 PM, con [EMAIL PROTECTED] wrote:
Hi Noble,
I am using DIH.
Noble Paul നോബിള് नोब्ळ् wrote:
How are you
: Is there a way to specify sort criteria through the Solr admin UI. I tried
: doing it through the query statement box but it did not work.
the search box on the admin gui is fairly limited ... it's just a quick and
dirty way to run test queries. other options like sorting, filtering, and
faceting
You don't need to hack the code, since you can treat these
scores, 2.3518934 and 2.2173865, as if they were both equal (ignoring
digits after the decimal point).
Score = original score(2.3518934) + function(date_created)
You can scale the value of function(date_created) so that digits
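One hypothetical way to realize that scaling, sketched in plain Java (names invented, not a Solr API): keep the integer part of the relevance score and let a recency value in [0, 1) decide the fractional digits, so documents with the same whole-number score sort by date.

```java
public class RecencyScore {
    // Hypothetical sketch: Math.floor keeps only the integer part of the
    // relevance score; a recency value in [0, 1) fills the fractional
    // digits, so equally relevant docs are ordered newest-first while
    // scores that differ by a whole point still sort by relevance.
    static double combined(double score, double recency) {
        // recency should be in [0, 1), e.g. 1 - age/maxAge, clamped.
        return Math.floor(score) + recency * 0.999;
    }

    public static void main(String[] args) {
        // Two docs whose scores differ only after the decimal point:
        double older = combined(2.3518934, 0.1);
        double newer = combined(2.2173865, 0.9);
        System.out.println(newer > older); // the newer doc ranks first
    }
}
```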
I am using EmbeddedSolrServer and simply have a queue that documents
are sent to, and a listener on that queue that writes them to the
index..
or just keep it simple, and use a synchronized block around the
method in the write server that writes the document to the index.
Jeryl Cook
/^\ Pharaoh
Hi,
I assume there is a schema definition or DTD for the XML response, but I
could not find it anywhere.
Is there one?
thanks
-Simon
--
View this message in context:
http://www.nabble.com/Is-there-a-DTD-XSD-for-XML-response--tp20565773p20565773.html
Sent from the Solr - User mailing list
Anyone knows if the solr-ruby gem is compatible with solr 1.3??
Also, is anyone using the acts_as_solr plugin? Of late the website has been
down and I can't find any recent activity on it
-Raghu
On Nov 18, 2008, at 2:41 PM, Kashyap, Raghu wrote:
Anyone knows if the solr-ruby gem is compatible with solr 1.3??
Yes, the gem at rubyforge is compatible with 1.3. Also, the library
itself is distributed with the binary release of Solr, in
client/ruby/solr-ruby/lib
Also anyone using
I've been using solr-ruby with 1.3 for quite a while now. It's powering our
experimental, open-source OPAC, Blacklight:
blacklight.rubyforge.org
I've got a custom query builder and response wrapper, but it's using
solr-ruby underneath.
Matt
On Tue, Nov 18, 2008 at 2:57 PM, Erik Hatcher [EMAIL
On 18-Nov-08, at 8:54 AM, Mark Miller wrote:
Mark Miller wrote:
Toby Cole wrote:
Has anyone else experienced a deadlock when the
DirectUpdateHandler2 does an autocommit?
I'm using a recent snapshot from hudson (apache-
solr-2008-11-12_08-06-21), and quite often when I'm loading data
the
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion, which
perhaps could be fixed, though I can't see how it could cause a problem.
lastAddedTime is only written in a call path within a
On 18 Nov 2008, at 20:18, Mark Miller wrote:
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion,
which perhaps could be fixed, though I can't see how it could cause
a problem
lastAddedTime
Hello,
We are working with a very large index and with large documents (300+
page books). It appears that the bottleneck on our system is the disk
IO involved in reading position information from the prx file for
commonly occurring terms.
An example slow query is "the new economics".
To
Very cool :-)
Both suggestions work fine! But only with solr version 1.4:
https://issues.apache.org/jira/browse/SOLR-823
Use a nightly build (e.g. 2008-11-17 works):
http://people.apache.org/builds/lucene/solr/nightly/
See below for examples for both solutions...
((( 1 )))
There may be one
Rather than attempt an answer to your questions directly, I'll mention
how other projects have dealt with the very-common-word issue. Nutch,
for example, has a list of high frequency terms and concatenates them
with the successive word in order to form less-frequent aggregate
terms. The
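A minimal, hypothetical sketch of that concatenation idea (the stop-word list, separator, and method names are invented here; Nutch's actual implementation differs in the details): each high-frequency term is fused with the word that follows it, producing much rarer compound tokens.

```java
import java.util.*;

public class CommonGrams {
    // Invented high-frequency word list for illustration.
    static final Set<String> COMMON =
        new HashSet<>(Arrays.asList("the", "new", "of"));

    // Emit ordinary words as-is; emit each common word only fused with
    // its successor ("the new economics" -> the-new, new-economics,
    // economics), so no posting list exists for the bare common word.
    static List<String> gram(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (COMMON.contains(t)) {
                if (i + 1 < tokens.size()) out.add(t + "-" + tokens.get(i + 1));
            } else {
                out.add(t);
            }
        }
        return out;
    }
}
```

The payoff for the slow-query case above is that a phrase like "the new economics" can be matched against the rare fused tokens instead of reading huge position lists for "the" and "new".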
On 18-Nov-08, at 12:18 PM, Mark Miller wrote:
Mike Klaas wrote:
autoCommitCount is written in a CommitTracker.synchronized block
only. It is read to print stats in an unsynchronized fashion,
which perhaps could be fixed, though I can't see how it could cause
a problem
lastAddedTime
Yes, I've found it.
Do you want my comments here or in solr-dev or on jira?
Glen
2008/11/18 Shalin Shekhar Mangar [EMAIL PROTECTED]:
Hi Glen,
There is an issue open for making DIH API friendly. Take a look and let us
know what you think.
https://issues.apache.org/jira/browse/SOLR-853
On 18-Nov-08, at 6:56 AM, Glen Newton wrote:
Erik,
Right now there is no real abstraction like DIH in LuSql. But as
indicated in the TODO section of the documentation, I was planning on
implementing or straight borrowing DIH in the near future.
I am assuming that Solr is fully multi-threaded
Was wondering if anyone can fill me in on the when and why I would set
waitFlush and waitSearcher to false when sending a commit command? I
think I understand what they do technically (I've looked at the code),
but I am not clear about why I would want to do it. Is there a risk
in
waitFlush I'm not sure about...
waitSearcher=true: it will wait until a new searcher is opened after
your commit; that way the client is guaranteed to have the results
that were just sent in the index. If waitSearcher=false, a query could
hit a searcher that does not have the new documents in
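For reference, these flags are attributes on the XML update message posted to the update handler; a minimal sketch of the fire-and-forget variant discussed below:

```xml
<!-- Don't block on flushing or on warming/opening the new searcher -->
<commit waitFlush="false" waitSearcher="false"/>
```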
Does waitFlush do anything now? I only see it being set if eclipse is
not missing a reference...
Ryan McKinley wrote:
waitFlush I'm not sure...
waitSearcher=true it will wait until a new searcher is opened after
your commit, that way the client is guaranteed to have the results
that were
Hi Glen ,
You can post all the queries first on solr-dev and all the valid ones
can be moved to JIRA
thanks,
Noble
On Wed, Nov 19, 2008 at 3:26 AM, Glen Newton [EMAIL PROTECTED] wrote:
Yes, I've found it.
Do you want my comments here or in solr-dev or on jira?
Glen
2008/11/18 Shalin
Thanks gistolero.
I have added this to the FAQ
http://wiki.apache.org/solr/DataImportHandlerFaq
On Wed, Nov 19, 2008 at 2:34 AM, [EMAIL PROTECTED] wrote:
Very cool :-)
Both suggestions work fine! But only with solr version 1.4:
https://issues.apache.org/jira/browse/SOLR-823
Use a nightly
That explains true, but what about false? Why would I ever set it to
false? If I don't wait, how will I ever know when the new searcher
is ready?
On Nov 18, 2008, at 10:27 PM, Ryan McKinley wrote:
waitFlush I'm not sure...
waitSearcher=true it will wait until a new searcher is opened
I am using waitSearcher=false with a crawler. The crawling thread
finishes a set of stuff and calls <commit/>. It does not want to
search; it gets back to crawling ASAP
On Nov 18, 2008, at 11:35 PM, Grant Ingersoll wrote:
That explains true, but what about false? Why would I ever set it
Hi
Thanks for your quick reply Shalin
I have updated my data-config like:
<entity name="employees"
    transformer="TemplateTransformer,DateFormatTransformer" pk="EMP_ID"
    query="select EMP_ID, CREATED_DATE, CUST_ID FROM EMP, CUST where EMP.EMP_ID
= CUST.EMP_ID">
  <field
Do you have a stacktrace?
On Wed, Nov 19, 2008 at 10:24 AM, con [EMAIL PROTECTED] wrote:
Hi
Thanks for your quick reply Shalin
I have updated my data-config like:
<entity name="employees"
    transformer="TemplateTransformer,DateFormatTransformer" pk="EMP_ID"
    query="select EMP_ID, CREATED_DATE,
Hi Shalin
Please find the log data.
10:18:30,819 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
10:18:30,838 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: No
Hi Noble
I have cross checked. This is my copy field of schema.xml
<copyField source="CREATED_DATE" dest="date"/>
I am still getting that error.
thanks
con
Noble Paul നോബിള് नोब्ळ् wrote:
your copyField has the wrong source field name. The field name is not
'date'; it is 'CREATED_DATE'
nope... solr does not have a DTD.
On Nov 18, 2008, at 1:44 PM, Simon Hu wrote:
Hi,
I assume there is a schema definition or DTD for XML response but
could not
find it anywhere.
Is there one?
thanks
-Simon