Hey guys,
I'm wondering how people are managing regression testing, in particular with
things like text-based search,
i.e. if you change how fields are indexed or change boosts in dismax,
ensuring that such a change doesn't mean critical queries start returning bad results.
The obvious answer to me was using
Why does Solr copy my complete index to somewhere when I start a delta-import?
I copy one core, start a full-import of 35 million docs and then start a
delta-import for the last hour (~2,000 docs).
DIH/Solr then starts to copy the whole index... why? I think it copies the
index because my
Hello again ;-)
After a full-import of 36M docs my delta-import doesn't work well.
If I start my delta (which runs very fast on another core), the commit takes
very long.
I think that Solr copies the whole index, commits the new documents into the
index and then reduces the index size after
I have the same problem. Any resolutions?
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
1 Core with 31 Million Documents, other Cores 100,000
- Solr1 for Search-Requests - commit every Minute - 5GB Xmx
Hi Mark,
What we're doing is using a bunch of acceptance tests with JBehave to
drive our testing. We run this in a clean room environment, clearing
out the indexes before a test run and inserting the data we're
interested in. As well as tests to ensure things just work we have a
bunch of tests
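
A minimal sketch of the kind of critical-query check described above (plain
JUnit + SolrJ here rather than JBehave; the Solr URL, fields, boosts and
expected document ID are made-up examples):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocumentList;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CriticalQueryRegressionTest {

    // Hypothetical URL and expectations -- adjust to your own index and schema.
    private static final String SOLR_URL = "http://localhost:8983/solr";

    @Test
    public void mothersDayQueryStillReturnsThePromotedDoc() throws Exception {
        SolrServer solr = new CommonsHttpSolrServer(SOLR_URL);
        SolrQuery q = new SolrQuery("mothers day");
        q.set("defType", "dismax");
        q.set("qf", "title^2 body");   // same boosts as the production config
        SolrDocumentList docs = solr.query(q).getResults();
        // The document we consider "critical" for this query must stay at position 1.
        assertEquals("DOC-123", docs.get(0).getFieldValue("id"));
    }
}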
Hello List,
Please see my question at
http://stackoverflow.com/questions/5552919/how-does-lucene-solr-achieve-high-performance-in-multi-field-faceted-search,
I would be interested to know some details.
Thank you,
Robin
Mark,
In one project, with Lucene not Solr, I also use a smallish unit-test sample
and apply some queries to it.
It is very limited but it is automatable.
I find a better approach is to track precision and recall measures over real
user queries, release after release.
I could never fully apply this yet on
As far as I am aware, licensing issues make that impossible for us ...
On 04/05/2011 07:29 PM, Kaufman Ng wrote:
Looks like you are using OpenJDK. Can you try using the Sun JDK?
On Mon, Apr 4, 2011 at 6:53 AM, Upayavirau...@odoko.co.uk wrote:
This is not Solr crashing, per se, it is your
I am using Solr 1.4.1 (Windows OS) and below are the settings in my solrconfig
file:
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1</commitLockTimeout>
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxMergeDocs>1</maxMergeDocs>
<lockType>native</lockType>
While writing the index, I am
Yes my mistake, you're right about #1.
On Wednesday 06 April 2011 05:25:50 William Bell wrote:
Thank you for pointing out #2. The commitsToKeep is interesting, but I
thought each commit would create a segment (before being optimized) and be
self-contained in the index.* directory?
I would only
Hi Marcus,
Your curl cmds don't work in that format on my unix. I convert them as
follows, and they still don't work:
$ curl --fail $solrIndex/update?commit=true -d '*:*'
$ curl --fail $solrIndex/update -d ''
From the browser:
Hi,
At Cominvent we've often had the need to visualize the internal architecture of
Apache Solr in order to explain both the relationships of the components as
well as the flow of data and queries. The result is a conceptual architecture
diagram, clearly showing how Solr relates to the
Solved. The correct translation of Marcus's cmd:
$ curl http://localhost:8080/solr/update?commit=true -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>*:*</query></delete>'
http://stackoverflow.com/questions/2358476/solr-delete-not-working-for-some-reason
NB: the response is still not what I'd
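
For reference, the same delete-all can also be done from SolrJ, which avoids
the shell-quoting issues entirely (untested sketch; the URL is assumed to
match the curl command above):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteAll {
    public static void main(String[] args) throws Exception {
        // Same operation as the curl command above, via the SolrJ client.
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr");
        solr.deleteByQuery("*:*");
        solr.commit();
    }
}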
Nice, thank you!
Wish there was something similar, or an addition to this one, depicting where
SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in.
Regards,
Stevo.
On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl jan@cominvent.com wrote:
Hi,
At Cominvent we've often had the need to
I try to use sort-by-function in the new Solr 3.1 release, but I have some
problems, for example:
http://localhost:8983/new_search/select?q=mothers day&indent=true&fl=templateSetId,score,templateSetPopularity&sort=product(templateSetPopularity,query(mothers day)) desc
templateSetPopularity - my
Hi all,
I'd love to share the diagram, just not sure how to do that on the list
(it's a Word document I tried to send as an attachment).
Jens, to answer your questions:
1. Correct, in our setup the source of the data is a DB from which we
pull the data using DIH (search the list for my previous post
Hi,
Please tell me which Solr version the patch file for
SOLR-2351 (https://issues.apache.org/jira/secure/attachment/12470560/mlt.patch)
is intended for.
Regards!
Isha
Hi Greg,
I need the servlet API in my app for it to work, despite being command
line.
So adding this to the maven POM fixed everything:
<dependency>
  <groupId>javax.servlet</groupId>
  <artifactId>servlet-api</artifactId>
  <version>2.5</version>
The only way I know of (and it's a little, well, a lot arcane)
is to ping the admin/system handler. As it happens, I just
had to do something like this. This uses apache commons
http client 3X, NOT the most recent FWIW...
The URL can be admin/... (see solrconfig.xml).
I'd really like to find out that
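
A rough sketch of the kind of ping described above, using commons-httpclient
3.x (the host, port and handler path are assumptions; check your
solrconfig.xml for the actual admin handler URL):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class SolrVersionPing {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        GetMethod get = new GetMethod("http://localhost:8983/solr/admin/system?wt=xml");
        int status = client.executeMethod(get);
        if (status == 200) {
            // The system handler response should include version information
            // (e.g. a solr-spec-version element) that you can parse out.
            System.out.println(get.getResponseBodyAsString());
        }
        get.releaseConnection();
    }
}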
They're lost, never to be seen again. You'll have to reindex them.
Best
Erick
On Tue, Apr 5, 2011 at 4:25 PM, Robert Petersen rober...@buy.com wrote:
Hello fellow enthusiastic solr users,
I tried to find the answer to this simple question online, but failed.
I was wondering about this,
Hmmm, this should work just fine. Here are my questions.
1 are you absolutely sure that the new synonym file
is available when reindexing?
2 does the sunspot program do anything wonky with
the ids? The documents
will only be replaced if the IDs are identical.
3 are you sure that a
Please re-post the question here so others can see
the discussion without going to another list.
Best
Erick
On Wed, Apr 6, 2011 at 4:09 AM, Robin Palotai m.palotai.ro...@gmail.comwrote:
Hello List,
Please see my question at
The problem is query(mothers day)
See http://wiki.apache.org/solr/FunctionQuery#query
You can't directly include query syntax because the function parser
wouldn't know how to get to the end of that syntax.
You could either do
query($qq) and then add a qq=mothers day to the request
Or if you
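
Put together with the earlier request, that would look roughly like this from
SolrJ (untested sketch; the core name and field names are taken from the
original post):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FunctionSortExample {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/new_search");
        SolrQuery q = new SolrQuery("mothers day");
        q.setFields("templateSetId", "score", "templateSetPopularity");
        // Reference the sub-query via a parameter instead of embedding it
        // directly in the function, as suggested above.
        q.set("qq", "mothers day");
        q.set("sort", "product(templateSetPopularity,query($qq)) desc");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}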
Hi all,
Solr 3.1.0 uses a different javabin format from 1.4.1.
So if I use the SolrJ 1.4.1 jar, I get a javabin error while saving to
3.1.0,
and if I use the SolrJ 3.1.0 jar, I get a javabin error while reading
documents from Solr 1.4.1.
How should I go about reindexing in this situation?
--
Thanks
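
One workaround that is often suggested for this kind of version mismatch
(hedged, untested sketch) is to switch the SolrJ client to XML transport so
the two sides don't have to agree on the javabin version:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class CrossVersionClient {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Ask for XML responses instead of javabin, so a 1.4.1-era client can
        // read from a 3.1.0 server (and vice versa) while reindexing.
        solr.setParser(new XMLResponseParser());
        // ... then read from the old server / write to the new one as usual ...
    }
}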
Hello everyone, I need to know if someone has used Solr for indexing and
storing images (up to 16MB) or binary docs.
How does Solr behave with this type of doc? How does it affect performance?
Thanks everyone
--
__
Ezequiel.
Http://www.ironicnet.com
Hello,
I have a problem with the DataImportHandler.
I want to index many products directly from the DB with this component.
I want to index the products little by little, and every time I finish a
piece
I want to be sure that the indexes are committed before going on with the next
piece.
I see that I can
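
One way to drive that from a client (untested sketch; this assumes the DIH
handler is registered at /dataimport as in the example solrconfig.xml) is to
start each piece with commit=true and poll the status until DIH is idle again
before starting the next piece:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DihBatchRunner {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Start one piece: don't clean the index, commit when the piece is done.
        ModifiableSolrParams start = new ModifiableSolrParams();
        start.set("qt", "/dataimport");
        start.set("command", "full-import");
        start.set("clean", "false");
        start.set("commit", "true");
        solr.query(start);

        // Poll until DIH reports it is idle again before starting the next piece.
        ModifiableSolrParams status = new ModifiableSolrParams();
        status.set("qt", "/dataimport");
        status.set("command", "status");
        while ("busy".equals(solr.query(status).getResponse().get("status"))) {
            Thread.sleep(5000);
        }
    }
}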
Ok thanks, that's an idea :-)
Maybe we should suggest adding a method to CommonsHttpSolrServer that
returns Solr's version...
Marc.
On Wed, Apr 6, 2011 at 2:58 PM, Erick Erickson erickerick...@gmail.comwrote:
The only way I know of (and it's a little, well, a lot arcane)
is to ping the
Carbon copied:
*Context*
This is a question mainly about Lucene (or possibly Solr) internals. The
main topic is *faceted search*, in which search can happen along multiple
independent dimensions (facets) of objects (for example size, speed, price
of a car).
When implemented with relational
On 4/5/2011 1:17 PM, Chris Hostetter wrote:
the boost param of edismax is probably a lot better choice than either
bq/bf -- but it really depends on whether you want an additive boost or a
multiplicative one (of course with the function query syntax add(),
product() and query() can be combined
On Wed, Apr 6, 2011 at 12:00 PM, Shawn Heisey s...@elyograg.org wrote:
We aren't yet using dismax in production, but I've had it in my config for a
while now. I've changed it to edismax in the 3.1 setup I'm putting together
now. It has the following in the bf parameter:
Yes, I had already checked the code for it and used it to build a C# method that
returns the same signature.
But I have a strange issue:
For instance, using MinTokenLenght=2 and the default QUANT_RATE, passing the text
frederico (simple text, no big deal here):
1. using my C# app returns
There's not much to go on here, can you provide details
on how you check that you've committed? How are you
configuring DIH? etc.
It might be helpful to review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Wed, Apr 6, 2011 at 10:11 AM, Gastone Penzo gastone.pe...@gmail.comwrote:
On Apr 5, 2011, at 3:17 PM, Chris Hostetter wrote:
one of the original use cases for bq was for artificial keyword boosting,
in which case it still comes in handy...
bq=meta:promote^100 text:new^10 category:featured^100 (*:*
-category:accessories)^10
Yeah I thought of this specific
Another question that is maybe easier to answer: how can I store binary
data? Any example schema?
2011/4/6 Ezequiel Calderara ezech...@gmail.com
Hello everyone, I need to know if someone has used Solr for indexing and
storing images (up to 16MB) or binary docs.
How does Solr behave with this
On 4/6/2011 10:55 AM, Robin Palotai wrote:
Therefore, Lucene supposedly has some advanced technique for multi-field
queries other than just taking the intersection of matching documents based
on the inverted index.
I don't think so, necessarily. It's just that Lucene's algorithms for
doing
PS: If you want to see how Solr actually computes faceting (the
faceting code lives in the Solr codebase, not in the lower-level
Lucene codebase), here's the file to look at; this web snapshot is from
1.4.1, don't know if it's been changed more recently, but I don't think
majorly:
You can store binary data using a binary field type -- then you need
to send the data base64 encoded.
I would strongly recommend against storing large binary files in solr
-- unless you really don't care about performance -- the file system
is a good option that springs to mind.
ryan
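
For example, assuming a hypothetical stored field of type "binary" called
"payload" in schema.xml, sending it base64-encoded from SolrJ could look
roughly like this (sketch):

import java.util.Base64;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexBinaryDoc {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        byte[] imageBytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "img-1");
        // "payload" is a hypothetical binary field; the value goes over the
        // wire base64-encoded, as described above.
        doc.addField("payload", Base64.getEncoder().encodeToString(imageBytes));
        solr.add(doc);
        solr.commit();
    }
}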
I put binary data in an ordinary Solr stored field, don't need any
special schema.
I have run into trouble making sure the data is not corrupted on the way
in during indexing, depending on exactly what form of communication is
being used to index (SolrJ, SolrJ with EmbeddedSolr, DIH, etc.),
Ha, there's a binary field type?!
I've stored binary data in an ordinary String field type, and it's
worked. But there were some headaches to get it to work, might have
been smoother if I had realized there was actually a binary field type.
But wait I'm talking about Solr 'stored field',
Hi, your answers were really helpful.
I was thinking of putting the base64-encoded file into a string field, but
was a little worried about Solr trying to stem it or vectorize it or that sort
of thing.
Seen in the example schema.xml:
<!-- Binary data type. The data should be sent/retrieved in as
Ha, there's a binary field type?!
I've stored binary data in an ordinary String field type, and it's
worked. But there were some headaches to get it to work, might have
been smoother if I had realized there was actually a binary field type.
How? You can't just embed control characters in
Hi, your answers were really helpful.
I was thinking of putting the base64-encoded file into a string field, but
was a little worried about Solr trying to stem it or vectorize it or that sort
of thing.
String field types are not analyzed. So it doesn't brutalize your data. Better
use BinaryField.
On 4/6/2011 2:39 PM, Markus Jelsma wrote:
Ha, there's a binary field type?!
I've stored binary data in an ordinary String field type, and it's
worked. But there were some headaches to get it to work, might have
been smoother if I had realized there was actually a binary field type.
How, you
Well...by default there is a pretty decent schema that you can use as a
template in the example project that builds with Solr. Tika is the library
that does the actual content extraction so it would be a good idea to try
the example project out first.
Adam
2011/4/6 Ezequiel Calderara
Hi Everyone,
I am having an identical problem with concatenating authors' first and last
names stored in an XML blob.
Because this field is multivalued, copyField does not work.
Does anyone have a solution?
Regards,
Alexei
Ezequiel,
On 06.04.2011 20:38, Ezequiel Calderara wrote:
Does anyone know of any storage for images that performs well, other than the FS?
you may have a look at http://www.danga.com/mogilefs/ ? :)
Regards
Stefan
Sorry about bringing an old thread back, I thought my solution could be
useful.
I also had to deal with multiple data sources. If the data source number
could be queried for in one of your parent entities then you could get it
using a variable as follows:
<entity name="ChildEntity"
On Wed, Apr 6, 2011 at 15:31 PM, Adam Estrada estrada.adam.gro...@gmail.com
wrote:
Well...by default there is a pretty decent schema that you can use as a
template in the example project that builds with Solr. Tika is the library
that does the actual content extraction so it would be a good
Once in a while, my post.jar seems to fail on commit. During the commit
process, I have gotten a few errors. One is that an EOF character was found,
another is that a semicolon was expected after the..., and I have also come
across a "was expected".
So my question is: what characters do I need to strip out of
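
If the failures come from characters that are not legal in XML 1.0 (plus
unescaped & and <, which also break parsing), a sketch of a pre-posting
cleanup could look like this; the allowed ranges below are straight from the
XML 1.0 spec:

public class XmlCleaner {
    // Removes characters that are not legal in XML 1.0 documents.
    // Legal: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            boolean legal =
                    cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (legal) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}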
On Wed, Apr 6, 2011 at 15:31 PM, Adam Estrada
estrada.adam.gro...@gmail.com
I wanted to know how a large field size affects performance.
If you use replication then it's a huge impact on performance as the data gets
sent over the network. It's also a memory hog so there's less memory and
Hi All,
I'm hoping someone can give me some pointers. I've got Solr 1.4.1 and am
using DIH to import a table from an Ingres database. The table contains
a column which is a CLOB type. I've tried to use a CLOB transformer to
transform the CLOB to a String but the index only contains something
Hi Stefan,
Thanks, my Eclipse is now perfectly configured.
It makes it very easy for amateurs like me!
For other amateurs the steps are:
1. checkout the sources:
svn checkout
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
2. the root folder (lucene_solr_3_1 in this
Reply Inline:
On Apr 6, 2011, at 8:12 AM, Erick Erickson wrote:
Hmmm, this should work just fine. Here are my questions.
1 are you absolutely sure that the new synonym file
is available when reindexing?
Not sure what you mean here, solr is running as root, and the file is never
moved
Oh woe is me... lol NP good to know. I'll get them on the next go
'round. :)
Thanks for the answer!
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, April 06, 2011 6:05 AM
To: solr-user@lucene.apache.org
Subject: Re: what happens to
Thanks All, I figured it out.
http://lucene.472066.n3.nabble.com/general-debugging-techniques-td868300.html
See the last line on this page.
-Original Message-
From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
Sent: Wednesday, April 06, 2011 6:15 PM
To:
Sounds good. Please go ahead and make this change yourself.
Done.
Ta,
Greg
On 6 April 2011 22:52, Steven A Rowe sar...@syr.edu wrote:
Hi Greg,
I need the servlet API in my app for it to work, despite being command
line.
So adding this to the maven POM fixed everything:
(11/04/06 5:25), Robert Petersen wrote:
I tried to find the answer to this simple question online, but failed.
I was wondering about this, what happens to uncommitted docsPending if I
stop solr and then restart solr? Are they lost? Are they still there
but still uncommitted? Do they get
Really? Great! I was wondering if there was some cleanup cycle like
that which would occur upon shutdown. That sounds like much more
logical behavior!
-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Wednesday, April 06, 2011 4:03 PM
To:
Is there a configuration value I can specify for multiple cores to use
the same conf directory?
Thanks
I understand Solr can do pretty powerful geospatial search
http://www.ibm.com/developerworks/java/library/j-spatial/
But I also understand lots of DB researchers have done lots of geospatial-related
work; can someone give an overview of the
I would not use replication. LinkedIn consumer search is a flat system
where one process indexes new entries and does queries simultaneously.
It's a custom Lucene app called Zoie. Their stuff is on GitHub.
I would get documents to indexers via a multicast IP-based queueing
system. This scales
A fuzzy signature system will not work here. You are right, you want
to try MLT instead.
Lance
On Wed, Apr 6, 2011 at 9:47 AM, Frederico Azeiteiro
frederico.azeite...@cision.com wrote:
Yes, I had already checked the code for it and used it to build a C# method
that returns the same signature.
Tomcat has to be configured to use UTF-8.
http://wiki.apache.org/solr/SolrTomcat?highlight=%28tomcat%29#URI_Charset_Config
On Fri, Mar 25, 2011 at 6:58 PM, kushti sandyl...@gmail.com wrote:
Grijesh wrote:
Try to send HTML data using the CDATA format.
Doesn't work with
$content = ;
And
The bigger answer is that you cannot get to this size by just configuring Solr.
You may have to invent a lot of stuff. Like all of Google.
Where did you get these numbers? The proposed query rate is twice as big as
Google (Feb 2010 estimate, 34K qps).
I work at MarkLogic, and we scale to 100's
Sean,
Geospatial search in Lucene/Solr is of course implemented based on
Lucene's underlying index technology. That technology was originally just
for text but it's been adapted very successfully for numerics and querying
ranges too. The only mature geospatial field type in Solr 3.1 is
Hi,
I have documents with a field that contains alphanumeric values like 1A2B3C. I can
query for * and sort results based on this field; however, I'd like to unique
these results (remove duplicates) so that I can get the 5 largest unique
values. I can't use the StatsComponent because my values have
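
One approach that might work for this (untested sketch; "code" stands in for
the real field name) is to facet on the field and pick the largest distinct
values on the client side:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;

public class TopUniqueValues {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        q.addFacetField("code");     // "code" = the field with the 1A2B3C-style values
        q.setFacetMinCount(1);
        q.setFacetLimit(-1);         // all distinct values; fine if cardinality is modest
        FacetField ff = solr.query(q).getFacetField("code");

        List<String> values = new ArrayList<String>();
        for (FacetField.Count c : ff.getValues()) {
            values.add(c.getName());
        }
        Collections.sort(values, Collections.reverseOrder());
        // The 5 "largest" unique values by the field's string order.
        System.out.println(values.subList(0, Math.min(5, values.size())));
    }
}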
Hi,
I think you are saying dupes are the main problem? If so,
http://wiki.apache.org/solr/Deduplication ?
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Peter Spam ps...@mac.com
To:
Hello Ephraim, hello Lance, hello Walter,
thanks for your replies:
Ephraim, thanks very much for the further detailed explanation. I will try
to setup a demo system in the next few days and use your advice.
LoadBalancers are an important aspect of your design. Can you recommend one
LB
Just a quick comment re LinkedIn's stuff. You can look at Zoie (also covered
in
Lucene in Action 2), but you may be more interested in Sensei.
And yes, big systems like that need sharding and replication, multiple master
and lots of slaves.
Otis
Sematext :: http://sematext.com/ :: Solr