Hi Jerome,
A latency of 5 seconds is too aggressive for Solr 1.3 or 1.4 because of caching
and segment warming. If you turn off caching and segment
warming, then you may be able to do 5s latency using either a
RAMDirectory or an SSD. In the future these issues will be fixed
and less than 1s will be possible.
-J
Are there supposed to be multiple parsedquery entries for a
distributed query when debugQuery=true?
Hi Zhong,
For #2 the existing patch SOLR-1395 is a good start. It should be
fairly simple to deploy indexes and distribute them to Solr Katta
nodes/servers.
-J
On Wed, Sep 9, 2009 at 11:41 PM, Zhenyu Zhong zhongresea...@gmail.com wrote:
Jason,
Thanks for the reply.
In general, I would
Hi Zhong,
It's a very new patch. I'll update the issue as we start the
wiki page.
I've been working on indexing in Hadoop in conjunction with
Katta, which is different (it sounds) than your use case where
you have prebuilt indexes you simply want to distribute using
Katta?
-J
On Wed, Sep 9, 2009 at 2:02 PM, Jason Rutherglen jason.rutherg...@gmail.com
wrote:
Is there a quick way to view index files?
For HDFS, failover, sharding you may want to use Solr with Katta.
There's an issue open at:
http://issues.apache.org/jira/browse/SOLR-1301
Near realtime search needs to be added incrementally to Solr. Today I
wouldn't recommend it.
On Wed, Sep 2, 2009 at 10:14 AM, Zhenyu
Agreed, Solr uses random access bitsets everywhere so I'm thinking
this could be an improvement or at least a great option to enable and
try out. I'll update LUCENE-1536 so we can benchmark.
On Thu, Aug 27, 2009 at 4:06 AM, Michael
McCandless luc...@mikemccandless.com wrote:
On Thu, Aug 27, 2009
Andrew,
Which version of Solr are you using?
There's an open issue to fix caching filters at the segment
level, which will not clear the caches on each commit, you can
vote to indicate your interest.
http://issues.apache.org/jira/browse/SOLR-1308
-J
On Thu, Aug 27, 2009 at 7:06 AM, Andrew
This will be implemented as you're stating when
IndexWriter.getReader is incorporated. This will carry over
deletes in RAM until IW.commit is called (i.e. Solr commit).
It's a fairly simple change though perhaps too late for 1.4
release?
On Tue, Aug 25, 2009 at 3:10 PM,
a new
searcher from IW.getReader without calling IW.commit.
On Tue, Aug 25, 2009 at 4:37 PM, KaktuChakarabati jimmoe...@gmail.com wrote:
Jason,
sounds like a very promising change to me - so much that I would gladly work
toward creating a patch myself.
Are there any specific points in the code u
I guess you'd set the rows parameter in the HTTP request to a high
number? Check out /solr/admin/form.jsp which has the rows text field
visible.
On Fri, Aug 21, 2009 at 9:30 AM, Elaine Li elaine.bing...@gmail.com wrote:
I want the query to return all the found docs, not just 10 of them by
It seems possible to cache the results of facet queries on a per
segment basis, providing the caching you're describing.
On Fri, Aug 21, 2009 at 8:42 AM, Fuad Efendi f...@efendi.ca wrote:
actually a hybrid that goes back to DocSet intersections when it's more
efficient
I noticed that too when I
We should probably move to using Lucene's Filters/DocIdSets
instead of DocSets and merge the two. Then we will not need to
maintain two separate but similar and confusing functionality
classes. This will make seamlessly integrating searching with
Solr's Filters/DocSets into Lucene's new per
Hi Cameron,
You'll need to upgrade to Solr 1.4 as the 1.3 method of faceting
is quite slow (i.e. intersecting bitsets). 1.4 uses
UnInvertedField which caches the terms per doc and
iterates/counts them. The 1.3 method is slow because for every
term (i.e. unique field value) there needs to be a
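The uninverted approach mentioned above (cache the terms per doc and iterate/count them) can be sketched in a few lines. This toy `UninvertedFacet` class is an illustration of the idea only, not Solr's actual UnInvertedField; the field values and doc ids in the usage are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the uninverted-counting idea: instead of intersecting
// a bitset per unique term (the 1.3 approach), keep each document's
// facet value directly and count by iterating the matching docs once.
public class UninvertedFacet {
    // docValues[doc] holds the facet value for that doc (single-valued field)
    public static Map<String, Integer> countFacets(String[] docValues,
                                                   int[] matchingDocs) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc : matchingDocs) {
            // one array lookup per matching doc, no per-term bitset work
            counts.merge(docValues[doc], 1, Integer::sum);
        }
        return counts;
    }
}
```

The cost is one pass over the matching docs rather than one bitset intersection per unique field value, which is why it wins on high-cardinality fields.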
Fuad,
I'd recommend indexing in Hadoop, then copying the new indexes to Solr
slaves. This removes the need for Solr master servers. Of course
you'd need a Hadoop cluster larger than the number of master servers
you have now. The merge indexes command (which can be taxing on the
servers because
Your term dictionary will grow somewhat, which means the term
index could consume more memory. Because the term dictionary has
grown, term lookups could be somewhat slower, but
that is unlikely to affect your application. How many unique
terms will there be?
On Mon, Aug 17, 2009 at
This would be good! Especially for NRT where this problem is
somewhat harder. I think we may need to look at caching readers
per corresponding http session. The pitfall is expiring them
before running out of RAM.
On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeley yo...@lucidimagination.com wrote:
You could implement optimistic concurrency? Where a version is
stored in the document? Or using the date system you described.
Override DirectUpdateHandler2.addDoc with the custom logic.
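A minimal sketch of that version-check idea, using an in-memory store; `VersionedStore` and its method names are hypothetical stand-ins for the custom logic one would put in a DirectUpdateHandler2.addDoc override:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of optimistic concurrency for document updates: each stored
// document carries a version, and an update is rejected unless the
// caller presents the version it last read.
public class VersionedStore {
    static final class Doc {
        final String body;
        final long version;
        Doc(String body, long version) { this.body = body; this.version = version; }
    }

    private final Map<String, Doc> docs = new ConcurrentHashMap<>();

    // Add or replace a document. expectedVersion is the version the
    // caller last saw (0 for a brand-new doc). Returns true on success.
    public synchronized boolean addDoc(String id, String body, long expectedVersion) {
        Doc current = docs.get(id);
        long currentVersion = (current == null) ? 0 : current.version;
        if (currentVersion != expectedVersion) {
            return false; // stale update: another writer committed first
        }
        docs.put(id, new Doc(body, currentVersion + 1));
        return true;
    }

    public long versionOf(String id) {
        Doc d = docs.get(id);
        return d == null ? 0 : d.version;
    }
}
```

A date field works the same way: the comparison just uses the stored timestamp instead of a counter.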
It seems like we should have a way of performing this without
custom code and/or an easier way to plug logic
not a Java developer but I am in charge of implementing
Solr. Any more detailed/straightforward instructions are very much
appreciated.
Thank you.
Jason Rutherglen-2 wrote:
You could implement optimistic concurrency? Where a version is
stored in the document? Or using the date system you
be interesting to
compare performance with SOLR... Distributed?
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips
For your fields with many terms you may want
at 10:51 AM, Fuad Efendi f...@efendi.ca wrote:
SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to
be); check this
http://issues.apache.org/jira/browse/SOLR-475
(and probably http://issues.apache.org/jira/browse/SOLR-711)
-Original Message-
From: Jason Rutherglen
Is there a way to do this via a URL?
Hi Alan,
Solr 1.4 does not contain near realtime search capabilities and
calling commit too often can be detrimental, as indexing
and search performance could precipitously degrade. That being
said, most of the NRT functionality is not too difficult to add,
except for per segment caching
For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.
On Wed, Aug 12, 2009 at 12:02 PM, Fuad Efendi f...@efendi.ca wrote:
I am currently faceting on tokenized multi-valued field at
http://www.tokenizer.org (25 mlns
Is there a way to do this currently? If a shard takes an
inordinate amount of time compared to the other shards, it's useful
to see the various qtimes per shard, with the aggregated results.
Fuad,
The lock indicates to external processes that the index is in use; it
does not cause ConcurrentMergeScheduler to block.
ConcurrentMergeScheduler does merge in its own thread, however
if the merges are large then they can spike IO, CPU, and cause
the machine to be somewhat unresponsive.
wrote:
Hi Jason,
I am using Master/Slave (two servers);
I monitored for a few hours today - 1 minute of document updates (about 100,000
documents) and then SOLR stops for at least 5 minutes to do background jobs
like RAM flush, segment merge...
Documents are small; about 10Gb of total index size
Autosuggest is something that would be very useful to build into
Solr as many search projects require it.
I'd recommend indexing relevant terms/phrases into a Ternary
Search Tree which is compact and performant. Using a wildcard
query will likely not be as fast as a Ternary Tree, and I'm not
sure
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/hyphenation/TernaryTree.html
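For illustration, here is a from-scratch ternary search tree with prefix lookup; this is a sketch of the idea only, not Lucene's TernaryTree, and the class name `Tst` is invented:

```java
import java.util.ArrayList;
import java.util.List;

// Ternary search tree for autosuggest: phrases are inserted ahead of
// time, and suggestions are the stored phrases starting with a prefix.
public class Tst {
    private Node root;

    private static final class Node {
        char c;
        boolean end;     // a stored phrase ends at this node
        Node lo, eq, hi; // less-than, equal (next char), greater-than
        Node(char c) { this.c = c; }
    }

    public void insert(String s) { root = insert(root, s, 0); }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i < s.length() - 1) n.eq = insert(n.eq, s, i + 1);
        else n.end = true;
        return n;
    }

    // All stored phrases beginning with prefix, in sorted order.
    public List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        Node n = find(root, prefix, 0);
        if (n == null) return out;
        if (n.end) out.add(prefix);
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    private Node find(Node n, String s, int i) {
        if (n == null) return null;
        char c = s.charAt(i);
        if (c < n.c) return find(n.lo, s, i);
        if (c > n.c) return find(n.hi, s, i);
        return i == s.length() - 1 ? n : find(n.eq, s, i + 1);
    }

    private void collect(Node n, StringBuilder prefix, List<String> out) {
        if (n == null) return;
        collect(n.lo, prefix, out);
        prefix.append(n.c);
        if (n.end) out.add(prefix.toString());
        collect(n.eq, prefix, out);
        prefix.setLength(prefix.length() - 1);
        collect(n.hi, prefix, out);
    }
}
```

The root branches on the first letters typed, and phrases fall out of the in-order walk, which is why a prefix lookup here is much cheaper than a wildcard query over the term dictionary.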
On Wed, Jul 29, 2009 at 12:08 PM, Jason
Rutherglen jason.rutherg...@gmail.com wrote:
Autosuggest is something that would be very useful to build into
Solr as many search projects require
using a
different search engine. It'd sure be kewl if this became a core
feature of solr!
I like the idea of the tree approach, sounds much faster. The root is
the least letters to start suggestions and the leaves are the full
phrases?
-Original Message-
From: Jason Rutherglen
I'd like to be able to define within a single Solr core, a set
of indexes in multiple directories. This is really useful for
indexing in Hadoop or integrating with Katta where an
EmbeddedSolrServer is distributed to the Hadoop cluster and
indexes are generated in parallel and returned to Solr
I am referring to setting properties on the *existing* policy
available in Lucene such as LogByteSizeMergePolicy.setMaxMergeMB
On Tue, Jul 21, 2009 at 5:11 PM, Chris
Hostetter hossman_luc...@fucit.org wrote:
: SolrIndexConfig accepts a mergePolicy class name, however how does one
: inject
Are we currently supporting this or in 1.4? (i.e.
IndexReader.open and IndexWriter.setTermIndexInterval) It's
useful for trie range, shingles, etc, where many terms are
potentially created.
, 2009 at 2:21 PM, Jason
Rutherglen jason.rutherg...@gmail.com wrote:
Yeah that's what I was thinking of as an alternative, use enwiki
and randomly generate facet data along with it. However for
consistent benchmarking the random data would need to stay the
same so that people could execute
the
WikipediaTokenizer and then Tee the Categories, etc. off to separate fields
automatically for faceting, etc.
-Grant
On Jul 17, 2009, at 10:48 AM, Jason Rutherglen wrote:
The question that comes to mind is how it's different than
http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages
Interesting, many sites don't store text in Lucene/Solr and so
need a way to highlight text stored in a database (or some
equivalent), they have two options, re-analyze the doc for the
term positions or access the term vectors from Solr and hand
them to the client who then performs the
From the Solr admin page, solr/admin/file/?file=schema.xml and
/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on
renders improperly (meaning the XML isn't formatted). Maybe
Chrome doesn't support XML?
On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:
You think enwiki has enough data for faceting?
On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll gsing...@apache.org
wrote:
At a min, it is trivial to use the EnWikiDocMaker and then send the doc
over
SolrJ...
On Jul 14, 2009, at 4:07 PM
Kind of regrettable, I think we can look at changing this in Lucene.
On Tue, Jul 14, 2009 at 12:08 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Tue, Jul 14, 2009 at 2:30 AM, Charlie Jackson
charlie.jack...@cision.com
wrote:
The wiki page for merging solr cores
Is there a standard index like what Lucene uses for contrib/benchmark for
executing faceted queries over? Or maybe we can randomly generate one that
works in conjunction with wikipedia? That way we can execute real world
queries against faceted data. Or we could use the Lucene/Solr mailing lists
, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Is there a standard index like what Lucene uses for contrib/benchmark for
executing faceted queries over? Or maybe we can randomly generate one
that
works in conjunction with wikipedia? That way we can execute real world
queries against
SOLR 1.4 has a new feature
https://issues.apache.org/jira/browse/SOLR-475 that speeds up faceting
on fields with many terms by adding
an UnInvertedField.
Bobo uses a custom field cache as well. It may be useful to benchmark the 3
different approaches (bitsets, SOLR-475, Bobo). This could be a good
SolrIndexConfig accepts a mergePolicy class name, however how does one
inject properties into it?
Shall we create an issue for this so we can list out desirable features?
On Sun, Jul 12, 2009 at 7:01 AM, Yonik Seeley ysee...@gmail.com wrote:
On Sat, Jul 11, 2009 at 7:38 PM, Jason
Rutherglen jason.rutherg...@gmail.com wrote:
Are we planning on implementing caching (docsets, documents
Is there a way to set this in SOLR 1.3 using solrconfig? Otherwise one
needs to instantiate a class that statically
calls BooleanQuery.setAllowDocsOutOfOrder?
Are we planning on implementing caching (docsets, documents, results) per
segment reader or is this something that's going to be in 1.4?
Not quite yet, there is the IndexReader.clone patch that needs to be
completed that Ocean depends on
https://issues.apache.org/jira/browse/LUCENE-1314. I had it completed
but then things changed in IndexReader so now it doesn't work and I
have not had time to complete it again. Otherwise the
the
trade-off between frequency and edit distance. I'll file a jira and look at
submitting a patch.
Cheers,
Jason
On Thu, Oct 9, 2008 at 9:22 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:
Sorting in the SpellChecker is handled by the SuggestWord.compareTo()
method in Lucene. It looks like:
public
Hi Grant,
Here are solr config files (attached) and java code (included below) to
recreate the test case.
Jason
List<Pair<String, Integer>> terms = new ArrayList<Pair<String,
Integer>>();
terms.add(new Pair<String, Integer>("chanel", 834));
terms.add(new Pair<String, Integer>("chant
do better than the two-step sort.
Cheers,
Jason
, as it is a bit more
sophisticated than Levenshtein when it comes to scoring.
I just tried J-W and *yes* it seems to do a much better job! I'd certainly
vote for that becoming the default :)
Thanks for all the help! Much appreciated.
Jason
--
Jason Rennie
Head of Machine Learning Technologies
On Wed, Oct 8, 2008 at 3:31 PM, Jason Rennie [EMAIL PROTECTED] wrote:
I just tried J-W and *yes* it seems to do a much better job! I'd certainly
vote for that becoming the default :)
Ack! I did some more testing and J-W results started to get weird
(including suggesting courses for coursets
?
Jason
Sure. I just sent the relevant files/code directly to you. Let me know if
you don't get them or have any trouble with them.
Jason
On Tue, Oct 7, 2008 at 3:27 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
Can you share your spellchecker setup and the code for the test case? I
would like
you, I'd assume that some of the documents will be lost, so I'd reindex any
recently added documents.
Jason
On Wed, Oct 1, 2008 at 12:58 PM, Michael Tedesco [EMAIL PROTECTED]wrote:
Hey,
I just want to copy over my indexes from my production server running 1.2
to another test server running 1.3
Sounds like the DisMax handler would work well for you. I'm no expert, but
I'm fairly certain that if you created a solr.DisMaxRequestHandler handler
with qf containing those three fields, you could issue the query +france
+flag +french and get the desired results.
Jason
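As a sketch only — the field names (title, description, keywords) and boosts below are invented for illustration, not taken from this thread — a DisMax handler along those lines could be declared in solrconfig.xml like this:

```xml
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- qf: the fields searched, with per-field boosts -->
    <str name="qf">title^2.0 description^1.0 keywords^1.5</str>
    <!-- require every query term to match in some field -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

A query of +france +flag +french against this handler would then need each term to match in at least one of the three fields.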
On Mon, Oct 6, 2008
chanel always top chant and chang since
they all have the same edit distance yet chanel is two orders of
magnitude more popular?
Is there anything I could be doing wrong to create these problems? If not,
are these known issues? If not, should I create JIRAs for them?
Thanks,
Jason
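For what it's worth, the tie described above is easy to reproduce with a plain Levenshtein implementation; the query string "chane" below is hypothetical, chosen so that all three suggestions sit at distance 1:

```java
// Minimal Levenshtein edit distance via dynamic programming, to show
// why edit distance alone cannot rank "chanel" above "chant"/"chang":
// from a query like "chane", all three are one edit away.
public class EditDistance {
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // delete
                                            d[i][j - 1] + 1),  // insert
                                   d[i - 1][j - 1] + cost);    // substitute
            }
        }
        return d[a.length()][b.length()];
    }
}
```

Since the distances tie, some second signal (such as document frequency) is needed to break the tie in the suggester.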
due to the higher frequency?
- query is yello. 53 document hits. No suggestions. yellow yields
36560 documents. Does the spellchecker only run when there are no document
hits?
Btw, is there a better place to be posting comments/questions like this?
Jason
On Mon, Oct 6, 2008 at 4:08 PM
be going on here?
Thanks,
Jason
On Wed, Sep 24, 2008 at 4:22 PM, Jason Rennie [EMAIL PROTECTED] wrote:
On Wed, Sep 24, 2008 at 4:07 PM, Grant Ingersoll [EMAIL PROTECTED]wrote:
Just mimic the configuration for the spellCheckCompRH in the handler that
you use for querying.
Sounds even better
weirdness earlier (no
inserts/deletes), but that disappeared now that I'm using the
StandardTokenizer.
Cheers,
Jason
On Fri, Sep 26, 2008 at 9:33 AM, Shalin Shekhar Mangar
[EMAIL PROTECTED] wrote:
Jason, can you please open a jira issue to add this feature?
Done.
https://issues.apache.org/jira/browse/SOLR-795
Jason
The question I have is what is the optimal approach for integrating
realtime into SOLR? What classes should be extended or created?
On Sat, Sep 27, 2008 at 9:40 AM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Solr today is not suited for real-time search (seeing newly added docs in
search
like a more appropriate schedule for rebuilding the spelling index.
Is there or could there be an option to rebuild the spelling index on
optimize?
Thanks,
Jason
/spellCheckCompRH.
I found a post from a month ago indicating that it can be done. Can someone
fill me in on what I'm missing?
I see that solrj has support for the spellcheck response, so it looks like
dealing with the response will be easy once I get the query right.
Jason
P.S. Thanks to everyone who had
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>
Thanks,
Jason
PROTECTED] wrote:
why to restart solr ? reloading a core may be sufficient.
SOLR-561 already supports this
-
On Thu, Sep 18, 2008 at 5:17 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
Servlets is one thing. For SOLR the situation is different. There
are always small changes people want
, and so either a dynamic schema, or no schema at all is best.
Otherwise the documents would need to have a schemaVersion field, this
gets messy; I looked at this.
Jason
On Wed, Sep 17, 2008 at 5:10 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote
at 1:27 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
If the configuration code is going to be rewritten then I would like
to see the ability to dynamically update the configuration and schema
without needing to reboot the server.
Exactly. Actually, multi-core allows you to instantiate
with the rsync based batch replication.
On Wed, Sep 17, 2008 at 2:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
If the configuration code is going to be rewritten then I would like
to see the ability to dynamically update
off in production in servlet containers imo as well.
This can really be such a pain in the ass on a live site...someone touches
web.xml and the app server reboots *shudder*. Seen it, don't dig it.
Jason Rutherglen wrote:
This should be done. Great idea.
On Wed, Sep 17, 2008 at 3:41 PM
:56 AM, Mark Miller [EMAIL PROTECTED] wrote:
Dynamic changes are not what I'm against...I'm against dynamic changes that
are triggered by the app noticing that the config has changed.
Jason Rutherglen wrote:
Servlets is one thing. For SOLR the situation is different. There
are always small
If the configuration code is going to be rewritten then I would like
to see the ability to dynamically update the configuration and schema
without needing to reboot the server. Also I would like the
configuration classes to just contain data and not have so many
methods that operate on the
16, 2008 at 10:12 AM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
SQL database such as H2
Mainly to offer joins and be able to perform hierarchical queries.
Can you define or give an example of what you mean by hierarchical queries?
A downside of any type of cross-document queries (like joins
integrate it, it
should be the easiest on the list.
Jason
On Mon, Sep 15, 2008 at 11:44 AM, Ryan McKinley [EMAIL PROTECTED] wrote:
Here are my gut reactions to this list... in general, most of this comes
down to sounds great, if someone did the work I'm all for it!
Also, no need to post
and a new SOLR war
file to individual servers.
Cheers,
Jason
parsing.
Not sure what you mean here...
See also https://issues.apache.org/jira/browse/LUCENE-494 for related
stuff.
Thanks for the pointer.
Jason
are easy to make via the solrj client we use. Though, for one of our
indexes, we perform all of the updates offline and run an optimize before
putting the index into production. Hope this helps.
Cheers,
Jason
--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http
, is there anything we could do to easily trim-down
computation time (besides removing common words from the query)?
Jason
--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog pictures: http://samanthalyrarennie.blogspot.com/
On Thu, Sep 11, 2008 at 11:54 AM, Mark Miller [EMAIL PROTECTED] wrote:
What kind of traffic are you getting when it takes seconds? 1 request? 12?
I'd estimate concurrency around 3, though the speed doesn't change much when
we run the same query on a server with zero traffic.
Jason
?
Nope.
Are you using a multi-valued field for the filter ...
No, it does not have the multiValue attribute turned on. The qf field is
just an integer.
Any thoughts/comments are appreciated.
Thanks,
Jason
minimizes
maintenance. #1 or #2 seems like a better choice if it is likely you will
eventually need to physically separate the indexes.
Jason
On Mon, Sep 8, 2008 at 6:15 PM, Chris Hostetter [EMAIL PROTECTED]wrote:
: I found this thread in the archive...
:
: I'm responsible for a number of ruby
Have you tried performing an optimize? Solr doesn't seem to fully
integrate all updates into a single index until an optimize is performed.
Jason
On Wed, Sep 10, 2008 at 1:05 PM, sundar shankar [EMAIL PROTECTED]wrote:
Hi All,
We have a cluster of 4 servers for the application
interested in realtime
search to get involved as it may be something that is difficult for
one company to have enough resources to implement to a production
level. I think this is where open source collaboration is
particularly useful.
Cheers,
Jason Rutherglen
[EMAIL PROTECTED]
On Wed, Sep 3, 2008 at 4
to
convert a double into a long first and I am not an expert at bit level math.
Cheers,
Jason
Kevin, Guillaume,
Many thanks for the pointers. It sounds like one of these two solutions
will fit our needs.
Cheers,
Jason
On Thu, Aug 21, 2008 at 5:33 PM, Guillaume Smet [EMAIL PROTECTED]wrote:
On Thu, Aug 21, 2008 at 11:23 PM, Jason Rennie [EMAIL PROTECTED] wrote:
Is there an option
Count me as interested. Our documents are product descriptions, many
fields of which are very short. Not sure if it would make large enough of
an impact to warrant us rolling our own solr build, but I'm definitely
interested to see the custom Similarity class.
Thanks,
Jason
On Thu, Aug 21
basic, possibly as simple as removing plural endings. Our index is over
product descriptions, so it's important that we stem normal variations in
nouns, but adverbs, verbs and possibly adjective variations are not so
important and sometimes cause problems for us.
Jason
. For anyone unfamiliar w/ daemontools, here's
DJB's explanation of why they rock compared to inittab, ttys, init.d, and
rc.local:
http://cr.yp.to/daemontools/faq/create.html#why
Jason
for a
production environment. A bit tricky to set up, but solid once you have it in
place.
http://cr.yp.to/daemontools.html
Jason
--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog pictures: http://samanthalyrarennie.blogspot.com/
cause
a large pile of threads to accumulate if we're not careful...
Jason
=bags^-1.0 ?
Thanks,
Jason
Thanks for the pointers. Looks interesting, at least as a starting point
for something more sophisticated.
Cheers,
Jason
On Mon, Aug 4, 2008 at 4:38 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
See https://issues.apache.org/jira/browse/SOLR-236 and
http://wiki.apache.org/solr
lucene not
easily return term counts for a document with the standard indexing b/c it's
term-based (i.e. inverted). Does TermVectors=true cause solr/lucene to
store an additional doc-based index?
Thanks,
Jason
On Mon, Aug 4, 2008 at 5:06 PM, Brian Whitman [EMAIL PROTECTED]wrote:
not out of the box
Just tried adding a pf field to my request handler. When I did this, solr
returned all document fields for each doc (no score) instead of returning
the fields specified in fl. Bug? Feature? Anyone know what the reason for
this behavior is? I'm using solr 1.2.
Thanks,
Jason
returns all document fields, no score field.
Jason
On Tue, Jul 22, 2008 at 2:55 PM, Mike Klaas [EMAIL PROTECTED] wrote:
On 22-Jul-08, at 11:53 AM, Jason Rennie wrote:
Just tried adding a pf field to my request handler. When I did this, solr
returned all document fields for each doc
Doh! I mistakenly changed the request handler from dismax to standard.
Ignore me...
Jason
On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie [EMAIL PROTECTED] wrote:
I'm using solrj and all I did was add a pf entry to solrconfig.xml. I
don't think it could be an ampersand issue...
Here's
thread, so this option
would not affect operations.
In case you're curious, we use solr as the search engine for
www.stylefeeder.com. It has served us very well so far, handling over 3000
queries/day.
Thanks,
Jason
--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http
is
separate from thread(s) making queries. Is waitSearcher=false designed to
allow queries to be processed while a commit/optimize is being run? Are
there any negative side effects to this setting (other than a query being
slightly out-of-date :)?
Thanks,
Jason
I had some trouble getting the current production build (1.2.0) working
on 10gR3 (10.1.3.0.0).
I had to remove 3 bad characters from the front of the web.xml file
and re-jar the WAR file. It worked perfectly after that minor
modification.
Jason
-Original Message-
From: Chris