On 7/30/2013 12:23 AM, Santanu8939967892 wrote:
Yes, your assumption is correct. The index size is around 250 GB; we
index 20-30 metadata fields and store around 50.
We plan a SolrCloud architecture with two nodes: one master and the
other a replica of the master.
I have a case where I want to index documents and metadata content from a
database. The metadata is not a problem, but it does not appear that I
can handle the document content (held as BLOBs in the database) with
out-of-the-box Solr 4.4 functionality.
I was hoping to be able to solve this by
Hi,
Do you want 5 replicas? 1 or 2 is enough.
If you already have 100 million records, you don't need to do batch
indexing. Push them once; Solr has the capability to soft commit every N docs.
Use round robin and send documents to different cores. When you search,
search across all the cores.
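The round-robin dispatch suggested above can be sketched roughly like this (core names and the client-side dispatch are assumptions for illustration; a real client would post each batch to its core's update handler):

```python
from itertools import cycle

def distribute(docs, cores):
    """Assign each document to the next core in round-robin order."""
    assignments = {core: [] for core in cores}
    for doc, core in zip(docs, cycle(cores)):
        assignments[core].append(doc)
    return assignments

# Hypothetical example: 10 docs spread over 3 cores.
docs = [{"id": i} for i in range(10)]
cores = ["core1", "core2", "core3"]
result = distribute(docs, cores)
for core in cores:
    print(core, len(result[core]))  # core1 gets 4 docs; core2 and core3 get 3
```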
Thanks for the reply. I think this approach will work only for new
collections. Is there a way to move some existing cores to a new
machine or node?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Machine-memory-full-tp4080511p4081235.html
Sent from the Solr - User
Thank you for the quick response.
I checked the documentation on spellcheck.collate. It looks like it
returns the suggestion to the client, and the client needs to make one more
request to the server with the suggestion.
Is there any way to auto correct at the server end?
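For reference, a collation-enabled request of the kind being discussed might look like this (the handler and query terms are made up; only the spellcheck.* parameters are the point):

```
/select?q=delll+ultrasharp&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollations=1
```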
Currently, while using ExtractingRequestHandler to index rich documents like
PDFs, DOCs, etc., Solr automatically indexes the created/modified times in a
human-readable format (Wed May 29 20:38:30 IST 2013).
How can I make Solr index the time in Unix-time format?
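One way to post-process such a timestamp into epoch seconds outside Solr (a sketch; the +05:30 offset for "IST" is an assumption, since zone abbreviations are ambiguous and not portably parseable by strptime):

```python
import calendar
import time

# The extracted timestamp, e.g. from Tika metadata. "IST" cannot be parsed
# portably, so we strip the zone name and apply its UTC offset by hand.
raw = "Wed May 29 20:38:30 IST 2013"
IST_OFFSET_SECONDS = 5 * 3600 + 30 * 60  # assumption: IST = UTC+05:30

parts = raw.split()
without_zone = " ".join(parts[:4] + parts[5:])  # "Wed May 29 20:38:30 2013"
parsed = time.strptime(without_zone, "%a %b %d %H:%M:%S %Y")
epoch = calendar.timegm(parsed) - IST_OFFSET_SECONDS  # timegm treats input as UTC
print(epoch)  # 1369840110
```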
Hi Raymond Wiker,
When we search like this
1) tag:"test" works
2) tag:"TEST" works
3) tag:"test" tag:"other" works to find items with both tags
4) tag:"TEST" tag:"other" *doesn't work.*
Either 2) should fail under true case sensitivity, or 4) should work (as the
combination of two valid
Sorry, but Solr synonym processing does not know about wildcards, so it is
bypassed when a wildcard is present.
Technically, it could probably be enhanced to support them, at least for
some common special cases such as yours, but that prospect won't help you
right now.
Your best bet is to
#3 and #4 are different queries - the term "other" is used in different
fields. What is your default search field, which will be used for "other" in
#3?
Is your tag field a string field type? If so, then it is case sensitive.
If you really need it to be case insensitive, make it a text field
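A minimal sketch of such a case-insensitive tag type (the field and type names here are made up; the tokenizer and filter classes are standard Solr):

```xml
<!-- The whole tag value stays one token, but it is lowercased
     at both index and query time. -->
<fieldType name="tag_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="tag" type="tag_ci" indexed="true" stored="true" multiValued="true"/>
```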
After some investigation I found that the problem is not with Jetty's
version but with usage of the --exec flag.
Namely, when --exec is used (to specify JVM args), shutdown is not
graceful; it seems that the Java process is simply killed.
Not sure how to handle this...
Regards,
Artem Karpenko.
Uh, sorry for spamming, but if anyone is interested, there is a way to
properly shut down Jetty when it's launched with the --exec flag.
You can use JMX to invoke the stop() method on Jetty's Server MBean.
This triggers a proper shutdown with all of Solr's close() callbacks executed.
I wonder why it's not
Thanks for letting us know. See if you can add it to the documentation
somewhere.
Solr is not using Tomcat 9, but I believe that was primarily because Tomcat
9 requires Java 7 and Solr 4.x is staying with Java 6 as minimum
requirement.
Regards,
Alex.
Personal website:
Of course, I meant Jetty (not Tomcat). So apologies for spam and confusion
of my own. The rest of the statement stands.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
I don't seem to be seeing a significant slowdown over time when I use the old
defaults for merge threads and max merges.
- Mark
On Jul 25, 2013, at 10:17 AM, Mark Miller markrmil...@gmail.com wrote:
I'm looking into some possible slow down after long indexing issues when I
get back from
Just use the UAX29URLEmailTokenizerFactory, which recognizes email
addresses.
Any particular reason that you're trying to reinvent the wheel?
-- Jack Krupansky
-Original Message-
From: Luis Cappa Banda
Sent: Tuesday, July 30, 2013 10:53 AM
To: solr-user@lucene.apache.org
Subject:
I'm noticing some very odd behavior using dataimport from the Admin UI.
Whenever I limit the number of rows to 75 or below, the aliases field never
gets populated. As soon as I increase the limit to 76 or more, the aliases
field gets populated!
What am I not understanding here?
On Tue, Jul 30,
We would like to use HP SiteScope to monitor the availability of
the individual Solr shards. Any ideas on how we can do that? Is there a
shard based URL that is a sure shot of knowing that the shard is feeling
healthy?
Thanks! :)
: coming as part of search results. Here, I am applying boosting on the no of
: reviews and the has_image(This will be Y Or N) and I am expecting the
: product which has no of reviews count is more and the has_image=Y should
: come first. But, in some of the cases , I am not getting what I am
:
Try attaching debug=query and see what the parsed query looks like; that can
often give you clues as to what's really going on. Of course, if tag is a string
type then Jack's comment is spot on: it's case sensitive.
The admin/analysis page will also help you understand the analysis chains.
But also,
: bq: I am also trying to figure out if I can place
: extra dimensions to the solr score which takes other attributes into
: consideration
To re-iterate Erick's point, you should definitely look at using things
like the {!boost} qparser combined with function queries that take into
account
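For example, a multiplicative boost on review count could look something like this (the field and term names are hypothetical; sum(...,1) avoids log(0) for zero-review products):

```
q={!boost b=log(sum(num_reviews,1))}category:shoes
```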
Until I get the data re-fed: there was another field (a date field) that
was present, or not, exactly when the geo field was/was not... I tried that
field:* and query times come down to 2.5s. Also, just removing that filter
brings the query down to 30ms, so I'm very hopeful that with just a boolean
I'll be
I am curious why the field:* query walks the entire terms list... could this
be discovered from a field cache / docvalues?
steve
On Tue, Jul 30, 2013 at 2:00 PM, Steven Bower sbo...@alcyon.net wrote:
Until I get the data refed I there was another field (a date field) that
was there and not when the
Going over the comments in SOLR-1316, I seem to have lost the
forest for the trees. What is the benefit of using the spellcheck-based
suggester over something like the terms component to get
suggestions as the user types?
Maybe it is faster because it builds the in-memory data structure on
Does adding facet.mincount=2 help?
On Tue, Jul 30, 2013 at 11:46 PM, Dotan Cohen dotanco...@gmail.com wrote:
To search for duplicate IDs, I am running the following query:
select?q=*:*&facet=true&facet.field=id&rows=0
However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
On 7/30/2013 12:16 PM, Dotan Cohen wrote:
To search for duplicate IDs, I am running the following query:
select?q=*:*&facet=true&facet.field=id&rows=0
However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
OutOfMemoryError errors instead of the desired facet:
snip
Might there be a
Are you talking about the document's ID field?
If so, you can't have duplicates... the latter document would overwrite the
earlier.
If not, sorry for asking irrelevant questions. :)
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
Does adding facet.mincount=2 help?
In fact, when adding facet.mincount=20 (I know that some dupes are in
the hundreds) I got the OutOfMemoryError in seconds instead of
minutes.
--
Dotan Cohen
http://gibberish.co.il
Thanks guys! Will play around with the function query.
Thanks,
-Utkarsh
On Tue, Jul 30, 2013 at 10:50 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:
: bq: I am also trying to figure out if I can place
: extra dimensions to the solr score which takes other attributes into
: consideration
On Tue, Jul 30, 2013 at 9:23 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
Are you talking about the document's ID field?
If so, you can't have duplicates... the latter document would overwrite the
earlier.
If not, sorry for asking irrelevant questions. :)
In Solr 4.1 we
Since this is a one-time problem, have you thought of just dumping all the
IDs and looking for dupes using sort and awk or something similar to that?
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The Science of Influence Marketing”
18
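The dump-and-scan idea above, sketched here in Python rather than sort/awk (this assumes the IDs have already been exported from the index into a list or a one-per-line file):

```python
from collections import Counter

def find_duplicate_ids(ids):
    """Return each id that appears more than once, with its count."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}

# Hypothetical exported IDs; in practice this would be millions of lines.
ids = ["a", "b", "c", "a", "d", "b", "a"]
print(find_duplicate_ids(ids))  # {'a': 3, 'b': 2}
```

The shell equivalent the email alludes to would be along the lines of sorting the ID file and printing only repeated lines.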
On Tue, Jul 30, 2013 at 9:43 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
Since this is a one-time problem, Have you thought of just dumping all the
IDs and looking for dupes using sort and awk or something similar to that?
All 100,000,000 of them :) That would take even
On 7/30/2013 12:49 PM, Dotan Cohen wrote:
Thanks, the query ran for almost 2 full minutes but it returned
results! I'll google for how to increase the disk cache for queries
like this. Other than the Qtime, is there no way to judge the amount
of memory required for a particular query to run?
A little bit of history:
We built a Solr-like solution on Lucene.NET and C# about 5 years ago, which
included faceted search. In order to get really good facet performance, what
we did was pre-cache all the facet fields in RAM as efficient compressed data
structures (either a variable byte
Hello, Jack, Steve,
Thank you for your answers. I've never used UAX29URLEmailTokenizerFactory,
but I read about it before trying regexp queries. As far as I know,
UAX29URLEmailTokenizerFactory tokenizes an input text value into tokens that
match URLs, e-mails, etc. Reading the
Dotan,
Could you please provide more lines of the stack trace?
I have no idea why it got worse in 4.3. I know that 4.3 can use facets
backed by DocValues, which are more modest on the heap. But from what I saw
(I can be wrong), it's disabled for numeric facets. Hence, I can suggest
reindexing id as
Hello guys,
Hey, I think I've found how to do this by just adding a filter. Just for
anyone's curiosity:
<fieldType name="emails" class="solr.TextField" sortMissingLast="true"
omitNorms="true">
<analyzer>
<tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
<filter
On Tue, Jul 30, 2013 at 11:48 PM, Robert Stewart robert_stew...@epam.comwrote:
Also we need to issue frequent commits since we are constantly streaming
new content into the system.
I'd like to say "show me a profiler snapshot", but after that note: Solr's
filter/field caches are top level
I've tried this kind of query in the past, but I found that they perform
poorly and are incredibly slow. That's just my experience, though; maybe
someone can share another opinion with us.
2013/7/30 Raymond Wiker rwi...@gmail.com
On Jul 30, 2013, at 22:05 , Luis Cappa Banda
Steve,
The FieldCache and DocValues are irrelevant to this problem. Solr's
FilterCache is, and Lucene has no counterpart. Perhaps it would be cool
if Solr could look for expensive field:* usages when parsing its queries
and re-write them to use the FilterCache. That's quite doable, I think.
I
Hey, David,
I've been reading the thread, and I think it is one of the most educational
mail threads I've read on the Solr mailing list. Just out of curiosity:
internally for Solr, is a query like field:* the same as field:[* TO *]? I
think it's expected to receive the same number of numFound
Very good read... Already using MMap... verified using pmap and vsz from
top...
Not sure what you mean by a good hit ratio?
Here are the stacks...
Name Time (ms) Own Time (ms)
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext,
Bits) 300879 203478
@David I will certainly update when we get the data re-fed... and if you
have things you'd like to investigate or try out, please let me know. I'm
happy to eval things at scale here... we will be taking this index from its
current 45M records to 600-700M over the next few months as well.
steve
On
Luis,
field:* and field:[* TO *] are semantically equivalent -- they have the
same effect. But they internally work differently depending on the field
type. The field type has the chance to intercept the range query to do
something smart (FieldType.getRangeQuery(...)). Numeric/Date (trie)
Thank you very much, David. That was a great explanation!
Regards,
- Luis Cappa
2013/7/30 Smiley, David W. dsmi...@mitre.org
Luis,
field:* and field:[* TO *] are semantically equivalent -- they have the
same effect. But they internally work differently depending on the field
type. The
You could also try the terms component which provides a very efficient
facet-like feature - counting the terms. And you can set a minimum term
frequency of 2, so only the dups would come back:
curl "http://localhost:8983/solr/terms?terms.fl=id&terms.mincount=2"
-- Jack Krupansky
-Original
1) Depends on your document routing strategy. It sounds like you could
be using the compositeId strategy and if so, there's still a hash
range assigned to each shard, so you can split the big shards into
smaller shards.
2) Since you're replicating in 2 places, when one of your servers
crashes,
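For reference, splitting a shard under compositeId routing is done with the Collections API, roughly like this (the host, collection, and shard names are placeholders):

```
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1
```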
Hi,
We are using Solr 4.4 to ingest geo data and it's really slow. When we don't
index the geo data it takes seconds to ingest 100,000 records, but as soon as
we add it, ingestion takes 2 hours.
We also found that when changing distErrPct from 0.025 to 0.1, 1000 rows
are ingested in 20 sec vs 2 min. But
Hello all,
Is anyone experiencing issues with the numFound when using group=true in
SolrCloud 4.4?
Sometimes the results are off for us.
I will post more details shortly.
Thanks.
Hi,
I'm using Apache Solr to index RSS feeds.
I'm successfully getting data (the URL and whether the feed is active) from a
database, and using that as the source of an entity to index the RSS data.
I'm trying to achieve a result but can't get it. I will try to explain it
with an example.
The
Hello,
I have been wanting some tools for measuring the performance of Solr, similar
to Mike McCandless's Lucene benchmarks.
So yet another monitor was born; it is described here:
http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
I tested it on the problem of garbage collectors (see
This seems like a fairly large issue. Can you create a Jira issue?
Bill Bell
Sent from mobile
On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
Does adding facet.mincount=2 help?
In fact,
Hello.
I wanted to do a follow-up after the contest has been running for a week.
It has been going relatively well. There were a lot of visitors last week,
then a bit of quiet, and then - after some of you re-announced the contest -
a second wave of activity. Thanks to everybody contributing and
On 7/30/2013 6:59 PM, Roman Chyla wrote:
I have been wanting some tools for measuring the performance of Solr, similar
to Mike McCandless's Lucene benchmarks.
So yet another monitor was born; it is described here:
http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
I tested it
Hi,
The Solr 4.0 log always shows "Waiting for client to connect to
ZooKeeper" and "Client is connected to ZooKeeper",
but looking at the code, that should only happen when "state ==
KeeperState.Expired".
We can see the value of state is SyncConnected; how did this happen?
Can anyone
Hi Marta,
Presumably you are indexing polygons -- I suspect complex ones. There isn't
too much that you can do about this right now other than index them in
parallel. I see you are doing this in 2 threads; try 4, or maybe even 6.
Also, ensure that maxDistErr is reflective of the smallest
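For reference, distErrPct and maxDistErr are attributes of the spatial field type in schema.xml; a sketch (the field name and attribute values are illustrative, not recommendations - the JTS context factory is only needed for polygons):

```xml
<fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           distErrPct="0.1"
           maxDistErr="0.000009"
           units="degrees"/>
```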
On Tue, Jul 30, 2013 at 9:56 PM, Shawn Heisey s...@elyograg.org wrote:
On 7/30/2013 12:49 PM, Dotan Cohen wrote:
Thanks, the query ran for almost 2 full minutes but it returned
results! I'll google for how to increase the disk cache for queries
like this. Other than the Qtime, is there no
On Tue, Jul 30, 2013 at 11:00 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
Dotan,
Could you please provide more lines of the stack trace?
Sure, thanks:
<response><lst name="error"><str
name="msg">java.lang.OutOfMemoryError: Java heap space</str><str
name="trace">java.lang.RuntimeException:
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
j...@basetechnology.com wrote:
The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
any particular reason you did not use it?
See:
http://wiki.apache.org/solr/Deduplication
and
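For reference, the dedupe setup on that wiki page is an update processor chain in solrconfig.xml, roughly like this (the fields listed in the signature are whatever identifies a duplicate in your schema):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```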