Hoss,
What about the case where there's only a small number of fields (a dozen or
two) but each field has hundreds of thousands or millions of values? Would Solr
be able to handle that?
From: Chris Hostetter
To: solr-user@lucene.apache.org
Sent: Tuesday, Ma
Okay, thanks for clarifying.
On Wed, Mar 20, 2013 at 12:11 AM, Justin L. wrote:
> Shalin,
>
> Thanks for your questions- the mystery is solved this morning. My "unique"
> key was only unique within an entity and not between them. There was only
> one instance of overlap- the no-longer mysteriou
No, is this something to do with commit time on the master?
Currently I am doing an explicit commit in the Java (SolrJ) program every 5
minutes after loading data into the master, and the load on the master is also huge.
Thanks
Sandeep A
-----Original Message-----
From: Mark Miller [mailto:markrmil
On Tue, Mar 19, 2013 at 8:52 PM, Michael Ryan wrote:
> I was wondering if anyone is aware of an existing Jira for this bug...
>
> _query_:"\"a b\"~2"
> ...is parsed as...
> PhraseQuery(someField:"a b")
> ...instead of the expected...
> PhraseQuery(someField:"a b"~2)
>
> _query_:"\"a b\""~2
> ...is
I was wondering if anyone is aware of an existing Jira for this bug...
_query_:"\"a b\"~2"
...is parsed as...
PhraseQuery(someField:"a b")
...instead of the expected...
PhraseQuery(someField:"a b"~2)
_query_:"\"a b\""~2
...is parsed as...
PhraseQuery(someField:"a b"~2)
_query_:"\"a b\"~2"~3
...i
I don't think SolrCloud works with the transient stuff.
- Mark
On Mar 19, 2013, at 8:04 PM, didier deshommes wrote:
> Hi,
> I cannot get SolrCloud to respect transientCacheSize when creating multiple
> cores via the web API. I'm running Solr 4.2 like this:
>
> java -Dbootstrap_confdir=./solr/co
Hi,
I cannot get SolrCloud to respect transientCacheSize when creating multiple
cores via the web API. I'm running Solr 4.2 like this:
java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=conf1 -DzkRun -DnumShards=1 -jar start.jar
I'm creating multiple cores via the core admin
Did you try QueryParsing.toString? As in:
logger.info("db retrieve time=" + (System.currentTimeMillis() - start) + ",
query=" +
QueryParsing.toString(rb.getQuery(), rb.req.getSchema()) + ",
indexIds=" + getIndexIds(rb));
-- Jack Krupansky
-----Original Message-----
From: Andrew Lund
Well, it wouldn't have because I forgot about it. But now that you have
reminded me and we have a day or two for 4.2.1 because hossman has asked for a
reprieve, I think we can fix this.
- Mark
On Mar 19, 2013, at 6:25 PM, yriveiro wrote:
> Solr 4.2.1 will solve this issue?
>
>
>
> -
>
Hi All,
I want to validate my approach with the experts, just to make sure I am not
doing anything wrong.
#Docs in Solr: 25M
Solr Version: 4.2
Our requirement is to list the top downloaded documents based on user country.
So we have a dynamic field "*numdownload.**" which is evaluated as
*numdownloads.
Hi Sarita,
I've not dug into your code in detail, but my first impression is that
you are not storing term positions?
> FieldType fieldType = new FieldType();
> IndexOptions indexOptions =
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
> fieldType.setIndexOptions(indexOptions);
> fieldTyp
What are the likely ramifications of having a stored field with millions of
"words"?
For example, if I had an article and wanted to store the user id of every
user who has read it, and stuck them into a simple whitespace-delimited field.
What would go wrong and when?
My tests lead me to believe thi
Solr 4.2.1 will solve this issue?
-
Best regards
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-4-2-mechanism-proxy-request-error-tp4047433p4049127.html
Sent from the Solr - User mailing list archive at Nabble.com.
On Mar 15, 2013, at 6:43 AM, Rafał Radecki wrote:
> I use http and get /solr/replication?command=indexversion urls to get
> index versions on master and slave. The replication works fine but
> index versions from /solr/replication?command=indexversion differ.
I think that's normal - it's a littl
Deja-vu?
http://mail-archives.apache.org/mod_mbox/lucene-general/201303.mbox/%3CCAHd9_iR-HtNDu-3a9A5ekTFdb+5mo1eWVcu4Shp8AD=qtpq...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/lucene-general/201303.mbox/%3Calpine.DEB.2.02.1303081644040.5502@frisbee%3E
https://issues.apache.org/ji
(13/03/20 6:14), Van Tassell, Kristian wrote:
...but I'm finding some examples where the stored text is so big (14,000 words)
that Solr fails to highlight anything. But the data is definitely in the text
field and is returning due to that hit.
Does anyone have any ideas why this happens?
Pr
: In order to support faceting, Solr maintains a cache of the faceted
: field. You need one cache for each field you are faceting on, meaning
: your memory requirements will be substantial, unless, I guess, your
1) you can consider trading ram for time by using "facet.method=enum" (and
disabling
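For illustration, that request-level switch might look like this in a query string (the field name here is assumed, not from the thread):

```
...&facet=true&facet.field=category&facet.method=enum
```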
Toke Eskildsen [t...@statsbiblioteket.dk] wrote:
[Solr, 11M documents, 5000 facet fields, 12GB RAM, OOM]
> 5000 fields @ 9 MByte is about 45GB for faceting.
> If you are feeling really adventurous, take a look at
> https://issues.apache.org/jira/browse/SOLR-2412
I tried building a test-index wi
hello,
I am trying to debug the following query in the analyzer:
*+itemModelNoExactMatchStr:JVM1640CJ01 +plsBrandId:0432 +plsBrandDesc:ge*
The query is going against a field (plsBrandDesc) that is being indexed with
solr.EdgeNGramFilterFactory and a minGramSize of 3. I have included the
compl
Don't use CloudSolrServer for writes. Instead, use
ConcurrentUpdateSolrServer, something like:
SolrServer solrServer = new ConcurrentUpdateSolrServer(solrUrl, 100, 4);
The 100 corresponds to how many docs to send in a batch. The higher
this is, the better performance is (to a point, don't set tha
We're running 3 c1.mediums, but mostly because we had spare
reservations for them. They barely break a sweat with our small
clusters (7 nodes total at the moment).
Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
It's a bit off-topic, but .. to mention it, because Brian said Graph Database
-- Neo4J uses / can use Lucene .. so, depending on the use case, this is worth
a look?
On Tuesday, March 19, 2013 at 10:11 PM, Shawn Heisey wrote:
> On 3/19/2013 2:31 PM, Brian Hurt wrote:
> > Which is the problem- yo
...but I'm finding some examples where the stored text is so big (14,000 words)
that Solr fails to highlight anything. But the data is definitely in the text
field and is returning due to that hit.
Does anyone have any ideas why this happens?
On 3/19/2013 2:31 PM, Brian Hurt wrote:
Which is the problem- you might think that 60ms unique key accesses
(what I'm seeing) is more than good enough- and for most use cases,
you'd be right. But it's not unusual for a single web-page hit to
generate many dozens, if not low hundreds, of calls to
The simple answer is no. But the real question is what you are trying to
accomplish. Lucene and Solr are built and optimized around the concept of a
Boolean Query with AND, OR, and NOT terms/clauses - that should be
sufficient to implement whatever it is that you are trying to implement. For
ex
: Which is the problem- you might think that 60ms unique key accesses
: (what I'm seeing) is more than good enough- and for most use cases,
: you'd be right. But it's not unusual for a single web-page hit to
: generate many dozens, if not low hundreds, of calls to get document by
: id. At which
60ms does seem excessive for the simplest possible access - lookup by the
unique key field value. SOMETHING is clearly unacceptable at that level. Is
this on decent hardware?
Try a query with &debugQuery=true and look at the "timing" section and see
what component(s) are eating up the lion's s
On Mon, Mar 18, 2013 at 7:08 PM, Jack Krupansky wrote:
> Hmmm... if query by your unique key field is killing your performance, maybe
> you have some larger problem to address.
This is almost certainly true. I'm well outside the use cases
targeted by Solr/Lucene, and it's a testament to the qual
Not to my knowledge. I guess the nearest might be regular expressions
but that would involve one character, rather than one bit per element,
so not nearly as efficient.
How many bits? Can you break them down into separate fields?
Upayavira
On Tue, Mar 19, 2013, at 02:30 PM, Christopher ARZUR wro
Anyone can help me? Each response may save a little kitten from a horrible
and dramatic death somewhere in the world :-P
On 15/03/2013 at 21:06, "Jack Park" wrote:
> Is there a document that tells how to create multiple threads? Search
> returns many hits which orbit this idea, but I haven't spo
Hi,
I am using solr to index data from binary files using BinURLDataSource. I
was wondering if anyone knows how to extract an excerpt of the indexed data
during search. For example if someone made a search it would return 200
characters as a preview of the whole text content. I read online that
Jegan
By DIH Scheduler you mean
http://wiki.apache.org/solr/DataImportHandler#Scheduling ? If so, then it's not
yet in. More details in the Ticket (which is as well linked from the
Wiki-Page): https://issues.apache.org/jira/browse/SOLR-2305
Regarding your Question on the UI: the "Auto-Refresh"
In a steady state, a SolrCloud cluster puts no load on ZooKeeper other than
maintaining heartbeats, for the most part.
The heaviest load might be when you start up a few hundred nodes and they all
keep loading and receiving state rapidly at the same time. ZooKeeper handles
that pretty easily th
I understand this may be a better question for the zookeeper list, but I'm
asking here because I'm not completely clear how much load zookeeper takes
on in a solr cloud setup.
I'm trying to determine what specs my zookeeper boxes should be. I'm on EC2,
so what I'm curious about is whether zookeepe
Is the DIH Scheduler available in SOLR 4.1 or in 4.2? I would like to
know if we can schedule delta-import in SOLR 4.1 or 4.2.
In SOLR 4.1 DIH console, I see a "Refresh Status" button and
"Auto-Refresh Status" check box. Is this related to the delta-import
scheduling? I couldn't find any docum
Christopher,
Would you mind if I ask you for a sample?
On 19.03.2013 at 19:31, "Christopher ARZUR" <
christopher.ar...@cognix-systems.com> wrote:
> Hi,
>
> Does solr (4.1.0) supports /bitwise/ AND or /bitwise/ OR operator so that
> we can specify a field to be compared against an index
Shalin,
Thanks for your questions- the mystery is solved this morning. My "unique"
key was only unique within an entity and not between them. There was only
one instance of overlap- the no-longer mysterious record and its
doppelganger.
All the other symptoms were side effects from how I was troub
-- distributed environment. But to nail it down, we probably need to see both
-- the applicable
Not sure what this is?
I have
spell
direct
spell
solr.DirectSolrSpellChecker
internal
0.5
2
1
On Mar 19, 2013, at 1:30 PM, "Dyer, James" wrote:
> Mark,
>
> I wasn't sure if Alex is actually testing /select, or if the problem is just
> coming up in /testhandler. Just wanted to verify that before we get into bug
> reports.
Distributed search will use /select if you don't use shards.qt
You may likely be hitting on a bug with WordBreakSolrSpellChecker in a
distributed environment. But to nail it down, we probably need to see both the
applicable section of your config and also this section:
. Also
need an example of a query that succeeds non-distributed (with the exact query
Can you send us what you're trying? It definitely should not be slow. Do
you have a lot of large stored fields that you're trying to retrieve?
Unless you're doing the transaction log / near-real-time stuff, here's how I'd
get a document by id:
/select?q={!term f=id}
The reason the {!t
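Spelled out with a hypothetical document id (the value "doc-123" is illustrative, not from the thread), the request would look like:

```
/select?q={!term f=id}doc-123
```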
As Erick was suggesting, add &debug=query (or &debugQuery=true) to your
Solr query request, and Solr will display more detail about the parsed
query.
For example, I see this on a query of an integer field:
curl
"http://localhost:8983/solr/select/?q=++(i_i:123)+&debugQuery=true&indent=tru
Hello,
I was testing my custom testhandler. The direct spellchecker also was not
working in cloud. After I added
spellcheck
to the /select requestHandler it worked, but the wordbreak spellchecker did
not. I have added shards.qt=testhandler to the curl request but it did not
solve the issue.
Thanks.
A
Mark,
I wasn't sure if Alex is actually testing /select, or if the problem is just
coming up in /testhandler. Just wanted to verify that before we get into bug
reports.
DistributedSpellCheckComponentTest does have 1 little Word Break test scenario
in it, so we know WordBreakSolrSpellChecker a
Thanks Steve,
I appreciate the work and really fast response (sorry, if it wasn't clear).
The issue was that I used that particular feature as a 'demo' of copyField
in an upcoming training material. So, it is not the (temporary) workaround
that is an issue, but that I had to rethink the explanati
On Mar 19, 2013, at 12:04 PM, Spadez wrote:
> This is the datetime format SOLR requires as I understand it:
>
> 1995-12-31T23:59:59Z
>
> When I try to store this as a datetime field in MySQL it says it isn't
> valid. My question is, ideally I would want to keep a datetime in my
> database so I
Hi Alex,
The copyField fix (SOLR-4567) will be part of 4.2.1 - I backported it to the
4.2.1 branch.
By the way, did you see the workaround I posted in the SOLR-4567 description?:
-
UPDATE: Workaround: instead of using a single copyField directive matching
multiple explicit source fields […
This is the datetime format SOLR requires as I understand it:
1995-12-31T23:59:59Z
When I try to store this as a datetime field in MySQL it says it isn't
valid. My question is, ideally I would want to keep a datetime in my
database so I can sort by date rather than just making it a varchar, so I
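One way out of the poster's bind is to keep a real DATETIME column in MySQL and convert on the way into Solr. A minimal, self-contained sketch of that conversion, assuming the value comes back in MySQL's default "yyyy-MM-dd HH:mm:ss" form and is already in UTC (class and method names here are illustrative):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class MysqlToSolrDate {
    // Convert a MySQL DATETIME string to the ISO-8601 UTC form Solr expects
    static String toSolrDate(String mysqlDatetime) {
        LocalDateTime ldt = LocalDateTime.parse(
                mysqlDatetime,
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        // Treat the stored value as UTC and render it with the trailing 'Z'
        return ldt.atOffset(ZoneOffset.UTC)
                  .format(DateTimeFormatter.ISO_INSTANT);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("1995-12-31 23:59:59")); // 1995-12-31T23:59:59Z
    }
}
```

This keeps the column sortable in MySQL while emitting the `1995-12-31T23:59:59Z` shape Solr's date fields require.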
4.2.1 might be out within a week or so.
- Mark
On Mar 19, 2013, at 12:48 PM, Alexandre Rafalovitch wrote:
> I am still trying to figure out Solr release cadence. They seem to be
> pretty frequent :-)
>
> However, 4.2 broke one of my config (copyField one) and I am now curious
> when the rel
Alan - Thank you, with that information I can manage to get the correct data
from SOLR and then aggregate it in JSONiq.
Eric / Walter - I totally agree that SOLR is more than likely not the solution
we should be using however all I can do is hope that upper management comes to
their senses and
I'm afraid I won't have time to dig into this for a while, anyone else want
to chime in?
Erick
On Tue, Mar 19, 2013 at 9:08 AM, Andrew Lundgren
wrote:
> This is perhaps more clear:
>
> Assuming you have a schema where:
>
>required="true" omitTermFreqAndPositions="true"/>
>
> Then:
>
> voi
My first thought too, but then I saw that he had the spell component in both
his custom testhandler and the /select handler, so I'd expect that to work as
well.
- Mark
On Mar 19, 2013, at 12:18 PM, "Dyer, James"
wrote:
> Can you try including in your request the "shards.qt" parameter? In you
I am still trying to figure out Solr release cadence. They seem to be
pretty frequent :-)
However, 4.2 broke one of my configs (the copyField one) and I am now curious
when the release that fixes it will be out. I know it is fixed in the
source already.
Regards,
Alex.
P.s. I also have a suggested
Can you try including in your request the "shards.qt" parameter? In your case,
I think you should set it to "testhandler". See
http://wiki.apache.org/solr/SpellCheckComponent?highlight=%28shards\.qt%29#Distributed_Search_Support
for a brief discussion.
James Dyer
Ingram Content Group
(615) 21
This is perhaps more clear:
Assuming you have a schema where:
Then:
void testSamplePrint() throws IOException, SAXException,
        ParserConfigurationException {
    SolrConfig config = new SolrConfig("solrconfig.xml");
    IndexSchema schema = new IndexSchema(config, "schema.xml", null);
Thank you for clarifying.
The logging line is this:
logger.info("db retrieve time=" + (System.currentTimeMillis() - start) + ",
query=" +
rb.getQuery().toString().replaceAll("\\p{Cntrl}", "_") + ", indexIds="
+ getIndexIds(rb));
(The replaceAll call is used to clean out the binary.)
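For reference, a self-contained sketch of what that replaceAll call does to a query string carrying embedded binary (the sample input is invented for illustration):

```java
public class CleanQueryForLog {
    public static void main(String[] args) {
        // A query string polluted with control characters (tab and BEL here)
        String raw = "id:42\t\u0007 AND type:doc";
        // \p{Cntrl} matches ASCII control characters (0x00-0x1F and 0x7F),
        // so each one becomes an underscore and the log line stays printable
        String cleaned = raw.replaceAll("\\p{Cntrl}", "_");
        System.out.println(cleaned); // id:42__ AND type:doc
    }
}
```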
Hi All,
I am in the process of upgrading from Solr 3.6.2 to Solr 4.1 and have been
running into problems with retrieving term vector information.
Below is the test and source code. The test fails with a
NullPointerException, because DocsAndPositionsEnum is always null, despite the
fact that I
Hi,
Does solr (4.1.0) supports /bitwise/ AND or /bitwise/ OR operator so
that we can specify a field to be compared against an index using
/bitwise/ AND or OR ?
Thanks,
--
Christopher
To share configs in SolrCloud you just upload a single config set and then link
it to multiple collections. You don't actually use solr.xml to do it.
- Mark
On Mar 19, 2013, at 10:43 AM, "Li, Qiang" wrote:
> We have multiple cores with the same configurations, before using SolrCloud,
> we can
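A sketch of that upload-then-link flow with the zkcli tool that ships in Solr 4.x's example/cloud-scripts directory (the host:port, paths, and config name are assumptions, not from the thread):

```shell
# Upload one config set to ZooKeeper under the name "shared"
./example/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
    -cmd upconfig -confdir ./solr/collection1/conf -confname shared

# Point additional collections at the same config set
./example/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
    -cmd linkconfig -collection collection2 -confname shared
```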
Any exceptions on the master?
- Mark
On Mar 19, 2013, at 2:21 AM, Sandeep Kumar Anumalla
wrote:
> Hi Mark,
>
> I have upgraded Solr 4.2 still I am getting this exception.
>
>
> INFO: removing temporary index download directory files
> NRTCachingDirectory(org.apache.lucene.store.MMapDirecto
We have multiple cores with the same configurations. Before using SolrCloud,
we could use relative paths in solr.xml. But with Solr 4, it seems relative
paths for the schema and config in solr.xml are not allowed.
Regards,
Ivan
Hi,
Does solr (4.1.0) supports /bitwise/ AND or /bitwise/ OR operator so
that we can specify a field to be compared against an index using
/bitwise/ AND or OR ?
Thanks,
Christopher
Hi,
I don't understand why the scorer is making a sum of the weight of the
OR clauses. It seems to me that it is unbalancing the query scoring
toward the term that has more alternatives. To me it would make more
sense to have the max of the weight of query term alternatives.
Here is an examp
Basically, you're defining an application-specific feature, so either you
implement the feature in your application code, or as a custom search
component. In either case, you would need to examine the "explain" details.
The debug.explain.structured query parameter can be used to retrieve the
e
Yeah, one ambiguity in typography is whether a hyphen is internal to a
compound term (e.g., "CD-ROM") or a phrase separator as in your case. Some
people are careful to put spaces around the hyphen for a phrase delimiter,
but plenty of people still just drop it in directly adjacent to two words.
What do you mean "log information into Solr from a custom analyzer"? Have
info go from your custom analyzer into the Solr log? In which case, just do
something
like:
private static final Logger log =
LoggerFactory.getLogger(YourPrivateClass.class.getName());
and then in your code something like
Please review: http://wiki.apache.org/solr/UsingMailingLists. You're
forcing us to guess at what's going on. You haven't posted the
results of adding debug=query (or debug=all). You haven't shown
us the field definitions for the fields in question. You haven't
given us much info to help us help yo
First, the bootstrap_conf and numShards should only be specified the
_first_ time you start up your leader. bootstrap_conf's purpose is to push
the configuration files to Zookeeper. numShards is a one-time-only
parameter that you shouldn't specify more than once, it is ignored
afterwards I think.
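In concrete terms, with the startup line quoted earlier in this thread (paths are that poster's; adjust to your layout), the distinction might look like:

```shell
# First start only: push the config to ZooKeeper and fix the shard count
java -Dbootstrap_confdir=./solr/collection1/conf \
     -Dcollection.configName=conf1 -DzkRun -DnumShards=1 -jar start.jar

# Every later start: the config already lives in ZooKeeper, so omit both flags
java -DzkRun -jar start.jar
```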
There's no way you can do this within the scope of text analysis I would
say.
You would be better off preparing a separate field within your indexing
code, or, if you don't have this option, you could do it in an
UpdateProcessor, perhaps the ScriptUpdateProcessor might be of use -
basically, prepa
Thanks for your reply. Sorry, I forgot to mention that it is using the text
field and it is stored = true, and I am getting back highlighted snippets as
well. It is just that I need a longer fragment. (What I need is something
like when I copy the fields to text to make sure it does not add
them
What field are you doing your hit highlighting on? You need to look at
the configuration for the highlighting component in solrconfig.xml. Also
note that you can only highlight on *stored* fields. The 'text' field is
by default not stored, so you'd need to change that and re-index.
Upayavira
On T
I am using SOLRCloud 4.1
I have a document as follows
car
this is a body of the document that talks about vehicles
When this gets indexed I copy the content to the "text" default
field.
results in:text field for this doc
car
this is a body of the document that talks about vehicles
wh
You need to create the core directory on disk, containing a conf
directory, yourself, before you use this API.
If you are using SolrCloud, then I believe this isn't needed because the
config is in Zookeeper.
Upayavira
On Tue, Mar 19, 2013, at 06:01 AM, Ravi_Mandala wrote:
> Hi,
>
> I am trying
On Tue, Mar 19, 2013, at 05:10 AM, Abhishek tiwari wrote:
> What are the best ways to handle solr master failover in solr 3.6?
Simple answer? Upgrade to 4.x/SolrCloud.
Master failover in 3.6 is really not an easy thing to handle. I never
worked out a way to automate it.
If a master goes down,
Hi to everyone,
What is the best way to log information into Solr from a custom analyzer?
Is there any way to integrate log4j or is it better to use some solr logging
method?
Thanks again for your invaluable help
Gian Maria
Thanks to everyone, now I have a clearer understanding of where to put my
jar dependencies.
Gian Maria.
-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Monday, March 18, 2013 11:57 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deploy customization in sol
My default is title only. I have used debug as well; it shows that Solr
divides the query into "dual" and "core" and then searches both separately.
Now, while calculating the scores, it puts the document in which both the
terms appear, and in my case the document containing this title:
Wipro 7710U Laptop-D
Yes, that I know, but I want to know if there is a way I can separate them in
the search results... the exact-match one?
On Fri, Mar 15, 2013 at 10:18 PM, Jack Krupansky wrote:
> The "explain" section that is returned if you specify the &debugQuery=true
> parameter will provide the details of what term
On 19 March 2013 11:59, kobe.free.wo...@gmail.com
wrote:
> Thanks for your reply.
>
> Please let me know, if it is also possible to define the complete set/list
> of the fields dynamically in the DIH config file. In our scenario, we will
> be required to change the set/list of the fields based on