I'm not sure I understand your question ...
if you know that you are only ever going to have the 'year' then why not
just index the year as an int?
A TrieDateField isn't really of any use to you, because normal date-type
usage (date math, date ranges) is useless when you don't have any
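A minimal schema.xml sketch of that suggestion (the field and type names here are illustrative, not from this thread):

```xml
<!-- Trie-based int type; precisionStep speeds up numeric range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<!-- index just the year as a plain integer -->
<field name="just_the_year" type="tint" indexed="true" stored="true"/>
```

Range queries such as just_the_year:[2000 TO 2010] still work fine on a plain int field.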
Hello,
Example csv doc has column 'just_the_year' and value '2010':
With the Schema API I can tell the indexing process to treat 'just_the_year'
as a date field.
I know that I can update the solrconfig.xml to correctly parse formats such
as MM/dd/ (which is awesome) but has anyone tried
The CollapsingQParserPlugin does not provide facet counts that are the
same as the group.facet feature in Grouping. It provides facet counts that
behave like group.truncate.
The CollapsingQParserPlugin only collapses the result set. The facet
counts are then generated for the collapsed result
If you see the last comment on:
https://issues.apache.org/jira/browse/SOLR-6143
You'll see there is a discussion starting about adding this feature.
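For reference, a collapsed request like the one being discussed typically has this shape (field names are hypothetical); the facet counts returned are computed on the collapsed set, i.e. group.truncate-style:

```text
q=*:*
&fq={!collapse field=supplier_id}
&facet=true
&facet.field=category
```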
Joel Bernstein
http://joelsolr.blogspot.com/
On Fri, Jun 19, 2015 at 4:14 PM, Joel Bernstein joels...@gmail.com wrote:
The
Thanks Joel,
I don't know why I was unable to find the understanding collapsing email
thread via the search I did on the site but I found it in my own email search
now.
We'll look into our specific scenario and see if we can find a workaround.
Thanks!
CARLOS MAROTO
M +1 626 354 7750
Hi Chris,
Thank you for taking the time to write the detailed response. Very helpful.
Dealing with interesting formats in the source data and trying to evaluate
various options for our business needs. The second scenario you described
(where some values in the date field are just the year) will
Hmm, I can see some things you couldn't do with just using
a tint field for the year. Or rather, some things that wouldn't
be as convenient
But this might help:
http://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
or you can
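A sketch of the update-chain configuration the linked ParseDateFieldUpdateProcessorFactory docs describe (the chain name and the exact format list are illustrative):

```xml
<updateRequestProcessorChain name="parse-date" default="true">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <str name="defaultTimeZone">UTC</str>
    <arr name="format">
      <str>MM/dd/yyyy</str>
      <str>yyyy</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Incoming string values that match one of the listed formats get converted to real date values before indexing.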
As stated previously, using Field Collapsing (group parameters) tends to
significantly slow down queries. In my experience, search response gets even
worse when:
- Requesting facets, which more often than not I do in my query formulation
- Asking for the facet counts to be on the groups via the
Hi Upayavira
Thank you for your explanation on the difference between traditional
grouping and collapsingQParser. I understand more now.
On 6/19/2015 7:11 PM, Upayavira wrote:
On Fri, Jun 19, 2015, at 06:20 AM, Derek Poh wrote:
Hi
I read about collapsingQParser returns the facet count the
Ok sure.
ngrams: the max number of tokens out of which singles will be made into the
dictionary. The default value is 2. Increasing this means you want
more than the previous 2 tokens to be taken into consideration when making
the suggestions.
I got confused by this, as I could not get the
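The ngrams option above belongs to the FreeText lookup; a config sketch (the component name and source field are assumptions):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">freetext</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- consider the previous 2 tokens when suggesting (the default) -->
    <str name="ngrams">2</str>
    <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```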
Hi!
I'm facing a problem.
I'm using SolrCloud 4.10.3, with 2 shards; each shard has 2 replicas.
After indexing data to the collection and running the same query,
http://localhost:8983/solr/catalog/select?q=a&wt=json&indent=true
sometimes it returns the right response,
{
responseHeader:{
status:0,
Hi Joel
By group heads, is it referring to the document that is used to represent
each group in the main result section?
E.g. using the below 3 documents, and we collapse on field supplier_id:
supplier_id:S1
product_id:P1
supplier_id:S2
product_id:P2
supplier_id:S2
product_id:P3
With collapse on
Steve,
Thank you so much. You guys are awesome.
Steve, how can I learn about the Lucene indexing process in more
detail? E.g., after we send documents for indexing, which functions are called
until the doc is actually stored in the index files?
I will be thankful if you can guide me here.
Hi
I read that collapsingQParser returns the facet counts the same as
group.truncate=true, and has this issue where the facet count and the
after-filter facet count are not the same.
Using group.facet does not have this issue, but its performance is very
bad compared to collapsingQParser.
I am trying to
Hello, I have a few questions about indexing data.
Are there any hardware or software limits for indexing data?
And is there a maximum number of indexed documents?
Thanks for your answers.
Hello,
I'm trying to parse Solr responses with SolrJ, but the responses contain mixed
types: for example 'song' documents and 'movie' documents with different
fields.
The getBeans method takes a single class type as its input parameter, which
does not allow for mixed-document-type responses.
What would
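Since getBeans only maps to one class, a common workaround is to branch on a discriminator field and convert each document to the matching class yourself. A stdlib-only sketch of that pattern (Song, Movie, and the 'type' field are assumptions; with SolrJ you would iterate a SolrDocumentList and read SolrDocument values instead of a Map):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MixedDocs {
    // Stand-ins for the bean classes getBeans would normally target
    record Song(String title, String artist) {}
    record Movie(String title, String director) {}

    // Dispatch each raw document on its discriminator field
    static Object toBean(Map<String, Object> doc) {
        return switch ((String) doc.get("type")) {
            case "song"  -> new Song((String) doc.get("title"), (String) doc.get("artist"));
            case "movie" -> new Movie((String) doc.get("title"), (String) doc.get("director"));
            default      -> throw new IllegalArgumentException("unknown type: " + doc.get("type"));
        };
    }

    public static void main(String[] args) {
        // Fake "response" standing in for response.getResults()
        List<Map<String, Object>> results = List.of(
            Map.of("type", "song", "title", "Help!", "artist", "The Beatles"),
            Map.of("type", "movie", "title", "Jaws", "director", "Spielberg"));
        List<Object> beans = new ArrayList<>();
        for (Map<String, Object> d : results) beans.add(toBean(d));
        System.out.println(beans.get(0).getClass().getSimpleName()); // prints Song
        System.out.println(beans.get(1).getClass().getSimpleName()); // prints Movie
    }
}
```

Another option is simply to issue one query per document type (fq=type:song, fq=type:movie) and call getBeans separately for each.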
Yeah I'm just gonna say hands down this was a totally bad question. My fault,
mea culpa. I'm pretty new to working in an IDE environment and using a stack
trace (I just finished my first year of CS at University and now I'm
interning). I'm actually kind of embarrassed by how long it took me to
tomas.kalas kala...@email.cz wrote:
Are there any hardware or software limits for indexing data?
The only really hard Solr limit is 2 billion X per shard, where X is document
count, unique values in a DocValues String field and other things like that.
There are some softer limits, after which
Hi Wenbin,
To me, your instance appears well provisioned. Likewise, your analysis of test
vs. production performance makes a lot of sense. Perhaps your time would be
well spent tuning the query performance for your app before resorting to
sharding?
To that end, what do you see when you
Silly thing … maybe the immense token was generated because you tried to set
string as the field type for your text? Could that be?
Can you wipe out the index, set a proper type for your text, and index
again?
No worries about the incomplete stack trace.
We learn and do things wrong every day :)
Errare humanum
Grouping does tend to be expensive. Our regular queries typically return in
10-15ms while the grouping queries take 60-80ms in a test environment ( 1M
docs).
This is ok for us, since we wrote our app to take the grouping queries out of
the critical path (async query in parallel with two
On Fri, Jun 19, 2015, at 06:20 AM, Derek Poh wrote:
Hi
I read that collapsingQParser returns the facet counts the same as
group.truncate=true, and has this issue where the facet count and the
after-filter facet count are not the same.
Using group.facet does not have this issue, but its performance
Actually the documentation is not clear enough.
Let's try to understand this suggester.
*Building*
This suggester builds an FST that it will use to provide the autocomplete
feature, running prefix searches on it.
The terms it uses to generate the FST are the tokens produced by the
The CollapsingQParserPlugin currently doesn't calculate facets at all. It
simply collapses the document set. The facets are then calculated only on
the group heads.
Grouping has special faceting code built into it that supports the
group.facet functionality.
Joel Bernstein
Unfortunately this won't give you group.facet results:
q=whatever
fq={!collapse tag=collapse}blah
facet.field={!ex=collapse}my_facet_field
This will give you the expanded facet counts as it removes the collapse
filter.
A good explanation of group.facets is here:
We are running PaperThin's CommonSpot CMS in a Cold Fusion 10 and MS SQL Server
2008 R2 environment. We're using Apache Solr 4.10.4 vice Cold Fusion's Solr. We
can create (and delete) collections through the CS CMS; they appear in (and
disappear from) both the physical file structure as well as
I definitely agree with Erick; the stack trace you posted is again not
complete.
This is an example of the same problem you got with a complete, meaningful
stack trace :
Stacktrace you provided :
org.apache.solr.common.SolrException: Exception writing document id 12345
to the index; possible
The AnalyticsQuery can be used to implement custom faceting modules. This
would allow you to calculate facets counts in an algorithm similar to
group.facets before the result set is collapsed. If you are in distributed
mode you will also need to implement a merge strategy:
I have enough RAM (30G) and hard disk (1000G). It is not I/O-bound or
disk-bound. In addition, Solr was started with a maximum of 4G for the
JVM, and the index size is 2G. In a typical test, I made sure enough free RAM
of 10G was available. I have not tuned any parameter in the configuration;
it
Hi.
I have an old index running on standalone Solr 4.7.1 and I have to
migrate it to my new SolrCloud 5.1 installation.
I'm looking for some way to do this but I'm a little confused.
Could you help me please?
Thank you very much!
Bye
2015-06-17 16:11 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com:
Is ZK healthy? Can you try the following from the server on which Solr
is running:
echo ruok | nc zk1 2181
Thank you very much Shalin for your answer!
My ZK cluster was not ready because two nodes were dead and only one
Framework way?
Maybe try delving into the log4j framework and modify the log4j.properties
file. You can generate different log files based upon what class generated the
message. Here's an example that I experimented with previously, it generates
an update log, and 2 different query logs with
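A sketch of that kind of log4j.properties split (the logger and file names here are illustrative, not the poster's actual config):

```properties
# send update-handler messages to their own rolling file
log4j.logger.org.apache.solr.update.processor.LogUpdateProcessor=INFO, updatelog
log4j.additivity.org.apache.solr.update.processor.LogUpdateProcessor=false
log4j.appender.updatelog=org.apache.log4j.RollingFileAppender
log4j.appender.updatelog.File=logs/solr_update.log
log4j.appender.updatelog.MaxFileSize=10MB
log4j.appender.updatelog.layout=org.apache.log4j.PatternLayout
log4j.appender.updatelog.layout.ConversionPattern=%d %p %c: %m%n
```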
It does. Absolutely. But it depends on what you put in it. Start from
http://wiki.apache.org/solr/UpdateXmlMessages#add.2Freplace_documents
On Fri, Jun 19, 2015 at 7:54 AM, 步青云 mailliup...@qq.com wrote:
Hello,
I'm a solr user with some question. I want to append new data to the
existing
Please open a JIRA with details of what the issues are, we should try to
support this..
On 18 Jun 2015 15:07, Bence Vass bence.v...@inso.tuwien.ac.at wrote:
Hello,
Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris
10)? The script (solr start) doesn't work out of the
Hi all,
I have the following search components that I don't have a solution at the
moment to get them working in distributed mode on solr 4.10.4.
[standard query component]
[search component-1] (StageID - 2500):
handleResponses: get few values from docs and populate parameters for
stats
So, the first thing I can say is: if it is true that 280 files almost killed
Solr, you are doing something wrong for sure.
At least if you are not trying to index 4K full movies xD
Joking apart :
1) You should carefully design your analyser.
2) You should store your fields initially to verify you
Yeah, changing the field to text_en or text_en_splitting actually
made it so my indexer indexed all my files. The only problem is, I
don't think it's doing it well.
I have two Cores that I'm working with. Both of them have indexed the same
set of files. The first core, which I will
On 6/19/2015 5:40 AM, Paul Revere wrote:
Our log files show entries for each member indexed:
Error: Could not create instance of 'SolrInputDocument'.
~~
Exception: org.apache.solr.common.SolrInputDocument
There will be a *lot* more detail available on this exception. We will
need all of
You really have to ask more specific questions here. What
are you confused _about_? Have
you gone through the tutorial? Read the Solr In Action book?
Tried _anything_?
Best,
Erick
On Fri, Jun 19, 2015 at 5:02 AM, shacky shack...@gmail.com wrote:
Hi.
I have an old index running on a standalone
First and most obvious thing to try:
bq: the Solr was started with maximal 4G for JVM, and index size is 2G
Bump your JVM to 8G, perhaps 12G. The size of the index on disk is very
loosely coupled to JVM requirements. It's quite possible that you're spending
all your time in GC cycles. Consider
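With the Solr 5 scripts the heap can be raised at start time; a sketch (assuming the stock bin/solr script, path relative to the install directory):

```shell
# restart with a larger heap; -m sets both -Xms and -Xmx
bin/solr stop -all
bin/solr start -m 8g
```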
You really, really, really want to get friendly with the
admin/analysis page for questions like:
bq: You're probably right though. I probably have to create a better analyzer
really ;).
It shows you exactly what each link in your analysis chain does to the
input. Perhaps 75% or
the questions
2015-06-19 18:00 GMT+02:00 Erick Erickson erickerick...@gmail.com:
You really have to ask more specific questions here. What
are you confused _about_? Have
I read that I could migrate using the backup script, so I looked for
the backup script in the Solr 4.7.1 source code, but I haven't found
Yes the number of indexed documents is correct. But the queries I perform
fall short of what they should be. You're probably right though. I probably
have to create a better analyzer.
And I'm not really worried about the other fields. I've already checked to see
if it's storing them correctly and
This may be another forehead-slapper (man, you don't know how often
I've injured myself that way).
Did you commit at the end of the SolrJ indexing to Testcore2? DIH automatically
commits at the end of the run, and depending on how your SolrJ program
is written
it may not have. Or just set
On 6/19/2015 11:15 AM, Jim.Musil wrote:
I noticed that when I issue the CREATE collection command to the api, it does
not automatically put a replica on every live node connected to zookeeper.
So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and
create a collection
Jim:
This is by design. There's no way to tell Solr to find all the cores
available and put one replica on each. In fact, you're explicitly
telling it to create one and only one replica, one and only one shard.
That is, your collection will have exactly one low-level core. But you
realized
As for now, the index size is 6.5 M records, and the performance is good
enough. I will re-build the index for all the records (14 M) and test it
again with debug turned on.
Thanks
On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson erickerick...@gmail.com
wrote:
First and most obvious thing to
I noticed that when I issue the CREATE collection command to the api, it does
not automatically put a replica on every live node connected to zookeeper.
So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and
create a collection like this:
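(The command itself was truncated above; a typical Collections API CREATE call has this shape, with host and names assumed:)

```text
http://localhost:8983/solr/admin/collections?action=CREATE
    &name=mycollection&numShards=1&replicationFactor=3
    &maxShardsPerNode=1
```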
Dirk,
There are 3 open JIRAs related to this behavior:
https://issues.apache.org/jira/browse/SOLR-3739
https://issues.apache.org/jira/browse/SOLR-3740
https://issues.apache.org/jira/browse/SOLR-3741
We worked around it by adding the explicit + signs if the query matched the
problematic
Thanks as always for the great answers!
Jim
On 6/19/15, 11:57 AM, Erick Erickson erickerick...@gmail.com wrote:
Jim:
This is by design. There's no way to tell Solr to find all the cores
available and put one replica on each. In fact, you're explicitly
telling it to create one and only one
Hi,
We are comparing results between Field Collapsing (group* parameters) and
CollapseQParserPlugin. We noticed that some facets are returning incorrect
counts.
Here are the relevant parameters of one of our test queries:
Field Collapsing:
---
Also, since you are tuning for relative times, you can tune on the smaller
index. Surely, you will want to test at scale. But tuning query, analyzer
or schema options is usually easier to do on a smaller index. If you get a 3x
improvement at small scale, it may only be 2.5x at full scale.
Do be aware that turning on debug=query adds a load. I've seen the
debug component
take 90% of the query time. (to be fair it usually takes a much
smaller percentage).
But you'll see a section at the end of the response if you set
debug=all with the time each
component took so you'll have a sense
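Concretely, a request of this shape (core name assumed) returns a debug section whose timing entry breaks response time down per component:

```text
http://localhost:8983/solr/collection1/select?q=test&debug=all
```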