Re: Parsing dating during indexing - Year Only

2015-06-19 Thread Chris Hostetter
I'm not sure I understand your question ... if you know that you are only ever going to have the 'year', then why not just index the year as an int? A TrieDateField isn't really of any use to you, because normal date type usage (date math, date ranges) is useless because you don't have any
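The suggestion above (index the year as a plain int rather than as a date) can be sketched on the client side as follows. This is only an illustration: the helper names `year_as_int` and `year_as_solr_date` are hypothetical, and the second helper shows the alternative of expanding a bare year to a full UTC instant in the ISO 8601 form that Solr date fields accept.

```python
def year_as_int(value: str) -> int:
    """Parse a year-only column value such as '2010' into an int."""
    year = int(value.strip())
    if not 1 <= year <= 9999:
        raise ValueError(f"year out of range: {year}")
    return year


def year_as_solr_date(value: str) -> str:
    """Expand a bare year to midnight UTC on 1 January of that year.

    Only useful if the field must remain a date field; the day and
    month are invented, so date math on them is not meaningful.
    """
    return f"{year_as_int(value):04d}-01-01T00:00:00Z"
```

With an int field, range queries such as "all documents from 2005 to 2010" still work, which covers most of what a year-only value can support.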

Parsing dating during indexing - Year Only

2015-06-19 Thread levanDev
Hello, Example csv doc has column 'just_the_year' and value '2010': With the Schema API I can tell the indexing process to treat 'just_the_year' as a date field. I know that I can update the solrconfig.xml to correctly parse formats such as MM/dd/ (which is awesome) but has anyone tried

Re: CollapseQParserPluging Incorrect Facet Counts

2015-06-19 Thread Joel Bernstein
The CollapsingQParserPlugin does not provide facet counts that are the same as the group.facet feature in Grouping. It provides facet counts that behave like group.truncate. The CollapsingQParserPlugin only collapses the result set. The facet counts are then generated for the collapsed result

Re: CollapseQParserPluging Incorrect Facet Counts

2015-06-19 Thread Joel Bernstein
If you see the last comment on: https://issues.apache.org/jira/browse/SOLR-6143 You'll see there is a discussion starting about adding this feature. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Jun 19, 2015 at 4:14 PM, Joel Bernstein joels...@gmail.com wrote: The

RE: CollapseQParserPluging Incorrect Facet Counts

2015-06-19 Thread Carlos Maroto
Thanks Joel, I don't know why I was unable to find the "understanding collapsing" email thread via the search I did on the site, but I found it in my own email search now. We'll look into our specific scenario and see if we can find a workaround. Thanks!

Re: Parsing dating during indexing - Year Only

2015-06-19 Thread levanDev
Hi Chris, Thank you for taking the time to write the detailed response. Very helpful. Dealing with interesting formats in the source data and trying to evaluate various options for our business needs. The second scenario you described (where some values in the date field are just the year) will

Re: Parsing dating during indexing - Year Only

2015-06-19 Thread Erick Erickson
Hmm, I can see some things you couldn't do with just using a tint field for the year. Or rather, some things that wouldn't be as convenient. But this might help: http://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html or you can

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Carlos Maroto
As stated previously, using Field Collapsing (group parameters) tends to significantly slow down queries. In my experience, search response gets even worse when: - Requesting facets, which more often than not I do in my query formulation - Asking for the facet counts to be on the groups via the

Re: understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Derek Poh
Hi Upayavira, Thank you for your explanation on the difference between traditional grouping and collapsingQParser. I understand more now. On 6/19/2015 7:11 PM, Upayavira wrote: On Fri, Jun 19, 2015, at 06:20 AM, Derek Poh wrote: Hi I read about collapsingQParser returns the facet count the

Re: Auto-suggest in Solr

2015-06-19 Thread Zheng Lin Edwin Yeo
Ok sure. ngrams: The max number of tokens out of which singles will be make the dictionary. The default value is 2. Increasing this would mean you want more than the previous 2 tokens to be taken into consideration when making the suggestions. I got confused by this, as I could not get the

Same query, inconsistent result in SolrCloud

2015-06-19 Thread Jerome Yang
Hi! I'm facing a problem. I'm using SolrCloud 4.10.3 with 2 shards; each shard has 2 replicas. After indexing data to the collection and running the same query, http://localhost:8983/solr/catalog/select?q=a&wt=json&indent=true sometimes it returns the right result, { "responseHeader":{ "status":0,

Re: understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Derek Poh
Hi Joel, By group heads, is it referring to the document that is used to represent each group in the main result section? E.g. using the below 3 documents and we collapse on field supplier_id: supplier_id:S1 product_id:P1, supplier_id:S2 product_id:P2, supplier_id:S2 product_id:P3. With collapse on

Re: Help: Problem in customized token filter

2015-06-19 Thread Aman Tandon
Steve, thank you, thank you so much. You guys are awesome. Steve, how can I learn more about the Lucene indexing process in detail? E.g. after we send documents for indexing, which functions are called until the doc is actually stored in the index files? I will be thankful to you if you guide me here.

understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Derek Poh
Hi, I read that collapsingQParser returns the same facet counts as group.truncate=true and has this issue where the facet count and the after-filter facet count are not the same. Using group.facet does not have this issue, but its performance is very bad compared to collapsingQParser. I am trying to

Limit indexed documents.

2015-06-19 Thread tomas.kalas
Hello, I have a few questions about indexing data. Are there any hardware or software limits for indexing data? And is there a maximum number of indexed documents? Thanks for your answers.

SolrJ: getBeans with multiple document types in response

2015-06-19 Thread Catala, Francois
Hello, I'm trying to parse Solr responses with SolrJ, but the responses contain mixed types: for example, 'song' documents and 'movie' documents with different fields. The getBeans method takes one class type as its input parameter, which does not allow for responses with mixed document types. What would

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Paden
Yeah I'm just gonna say hands down this was a totally bad question. My fault, mea culpa. I'm pretty new to working in an IDE environment and using a stack trace (I just finished my first year of CS at University and now I'm interning). I'm actually kind of embarrassed by how long it took me to

Re: Limit indexed documents.

2015-06-19 Thread Toke Eskildsen
tomas.kalas kala...@email.cz wrote: Are there any hardware or software limits for indexing data? The only really hard Solr limit is 2 billion X per shard, where X is document count, unique values in a DocValues String field and other things like that. There are some softer limits, after which
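As a rough worked example of the per-shard limit mentioned above, the arithmetic below estimates the minimum number of shards for a given document count. It is a sketch only: the constant uses the rounded "2 billion" figure from the message rather than the exact internal limit, the function name `min_shards` is hypothetical, and in practice the softer limits are likely to force more shards long before this bound is reached.

```python
PER_SHARD_LIMIT = 2_000_000_000  # rounded figure quoted in the message


def min_shards(doc_count: int, per_shard_limit: int = PER_SHARD_LIMIT) -> int:
    """Smallest shard count that keeps every shard under the hard limit,
    assuming documents are spread evenly across shards."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    # Ceiling division without floating point; at least one shard.
    return max(1, -(-doc_count // per_shard_limit))
```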

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Hi Wenbin, To me, your instance appears well provisioned. Likewise, your analysis of test vs. production performance makes a lot of sense. Perhaps your time would be well spent tuning the query performance for your app before resorting to sharding? To that end, what do you see when you

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Alessandro Benedetti
Silly thing... Maybe the immense token was generated because you were trying to use string as the field type for your text? Could that be it? Can you wipe out the index, set a proper type for your text, and index again? No worries about the incomplete stack trace, we learn and do wrong things every day :) Errare humanum

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Grouping does tend to be expensive. Our regular queries typically return in 10-15ms while the grouping queries take 60-80ms in a test environment ( 1M docs). This is ok for us, since we wrote our app to take the grouping queries out of the critical path (async query in parallel with two

Re: understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Upayavira
On Fri, Jun 19, 2015, at 06:20 AM, Derek Poh wrote: Hi, I read that collapsingQParser returns the same facet counts as group.truncate=true and has this issue where the facet count and the after-filter facet count are not the same. Using group.facet does not have this issue, but its performance

Re: Auto-suggest in Solr

2015-06-19 Thread Alessandro Benedetti
Actually the documentation is not clear enough. Let's try to understand this suggester. *Building* This suggester builds an FST that it will use to provide the autocomplete feature by running prefix searches on it. The terms it uses to generate the FST are the tokens produced by the

Re: understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Joel Bernstein
The CollapsingQParserPlugin currently doesn't calculate facets at all. It simply collapses the document set. The facets are then calculated only on the group heads. Grouping has special faceting code built into it that supports the group.facet functionality. Joel Bernstein
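The difference described above (facets computed only on the group heads versus group.facet counting groups) can be illustrated with a small in-memory simulation. This is not Solr code: the documents, the `color` facet field, and the rule that the first document seen becomes the group head are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical documents: supplier S2 has three products.
DOCS = [
    {"supplier_id": "S1", "product_id": "P1", "color": "red"},
    {"supplier_id": "S2", "product_id": "P2", "color": "red"},
    {"supplier_id": "S2", "product_id": "P3", "color": "red"},
    {"supplier_id": "S2", "product_id": "P4", "color": "blue"},
]


def collapse(docs, field):
    """Keep one document (the group head) per distinct value of `field`.
    Here the head is simply the first document seen for each value."""
    heads = {}
    for doc in docs:
        heads.setdefault(doc[field], doc)
    return list(heads.values())


def facet_counts(docs, field):
    """Plain facet counts: one count per document."""
    return Counter(doc[field] for doc in docs)


def group_facet_counts(docs, group_field, facet_field):
    """group.facet-style counts: one count per group that contains
    at least one document with the facet value."""
    pairs = {(doc[group_field], doc[facet_field]) for doc in docs}
    return Counter(value for _, value in pairs)
```

Faceting on the collapsed set sees only the heads P1 and P2, so `blue` disappears entirely, whereas the group.facet-style count still reports one group for `blue` because supplier S2 has a blue product.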

Re: understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Joel Bernstein
Unfortunately this won't give you group.facet results: q=whatever fq={!collapse tag=collapse}blah facet.field={!ex=collapse}my_facet_field This will give you the expanded facet counts as it removes the collapse filter. A good explanation of group.facets is here:
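For reference, the three parameters quoted above can be assembled into a properly escaped request URL as sketched below. The base URL and collection name are hypothetical, and `facet=true` is added on the assumption that faceting is not already enabled in the request handler defaults. As the message says, this request returns the expanded (uncollapsed) facet counts, not group.facet counts.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and collection to your setup.
BASE_URL = "http://localhost:8983/solr/mycollection/select"


def collapse_with_excluded_facet_url(base_url: str = BASE_URL) -> str:
    """Build the query from the message: the collapse filter is tagged,
    and the facet field excludes that tag, so facet counts are computed
    as if the collapse filter were not applied."""
    params = [
        ("q", "whatever"),
        ("fq", "{!collapse tag=collapse}blah"),
        ("facet", "true"),  # assumption: faceting must be switched on
        ("facet.field", "{!ex=collapse}my_facet_field"),
    ]
    return base_url + "?" + urlencode(params)
```

Using `urlencode` avoids hand-escaping the braces, the `!` and the space inside the local-params syntax.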

Error: Could not create instance of 'SolrInputDocument'

2015-06-19 Thread Paul Revere
We are running PaperThin's CommonSpot CMS in a Cold Fusion 10 and MS SQL Server 2008 R2 environment. We're using Apache Solr 4.10.4 vice Cold Fusion's Solr. We can create (and delete) collections through the CS CMS; they appear in (and disappear from) both the physical file structure as well as

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Alessandro Benedetti
I definitely agree with Erick, the stack trace you posted is not complete again. This is an example of the same problem you got with a complete, meaningful stack trace : Stacktrace you provided : org.apache.solr.common.SolrException: Exception writing document id 12345 to the index; possible

Re: understanding collapsingQParser with facet vs group.facet

2015-06-19 Thread Joel Bernstein
The AnalyticsQuery can be used to implement custom faceting modules. This would allow you to calculate facet counts in an algorithm similar to group.facets before the result set is collapsed. If you are in distributed mode you will also need to implement a merge strategy:

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Wenbin Wang
I have enough RAM (30G) and Hard disk (1000G). It is not I/O bound or computer disk bound. In addition, the Solr was started with maximal 4G for JVM, and index size is 2G. In a typical test, I made sure enough free RAM of 10G was available. I have not tuned any parameter in the configuration, it

Migration from Solr 4.7.1 to SolrCloud 5.1

2015-06-19 Thread shacky
Hi. I have an old index running on a standalone Solr 4.7.1 and I have to migrate its index to my new SolrCloud 5.1 installation. I'm looking for some way to do this but I'm a little confused. Could you help me please? Thank you very much! Bye

Re: ZooKeeper connection refused

2015-06-19 Thread shacky
2015-06-17 16:11 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com: Is ZK healthy? Can you try the following from the server on which Solr is running: echo ruok | nc zk1 2181 Thank you very much Shalin for your answer! My ZK cluster was not ready because two nodes were dead and only one
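The `echo ruok | nc zk1 2181` check quoted above can also be scripted. The sketch below sends ZooKeeper's `ruok` four-letter command over a plain TCP socket and expects the reply `imok`; the function name `zk_is_ok` and the timeout value are assumptions for illustration. Note that `imok` only shows that the server process is running, not that the ensemble has a quorum.

```python
import socket


def zk_is_ok(host: str, port: int = 2181, timeout: float = 2.0) -> bool:
    """Return True if the ZooKeeper server answers 'imok' to 'ruok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    except OSError:
        # Connection refused, timeout, DNS failure, and so on.
        return False
    return b"".join(chunks) == b"imok"
```

Running it against each node in turn (for example `zk1`, as in the quoted command) gives a quick view of which servers are reachable at all.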

RE: Solr Logging

2015-06-19 Thread Garth Grimm
Framework way? Maybe try delving into the log4j framework and modify the log4j.properties file. You can generate different log files based upon what class generated the message. Here's an example that I experimented with previously, it generates an update log, and 2 different query logs with

Re: How to append new data to index i solr?

2015-06-19 Thread Mikhail Khludnev
It does. Absolutely. But it depends on what you in it. Start from http://wiki.apache.org/solr/UpdateXmlMessages#add.2Freplace_documents On Fri, Jun 19, 2015 at 7:54 AM, 步青云 mailliup...@qq.com wrote: Hello, I'm a solr user with some question. I want to append new data to the existing

Re: Solr 5.2.1 on Solaris

2015-06-19 Thread Ramkumar R. Aiyengar
Please open a JIRA with details of what the issues are, we should try to support this.. On 18 Jun 2015 15:07, Bence Vass bence.v...@inso.tuwien.ac.at wrote: Hello, Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris 10)? The script (solr start) doesn't work out of the

Distributed Search component question

2015-06-19 Thread Mihran Shahinian
Hi all, I have the following search components that I don't have a solution at the moment to get them working in distributed mode on solr 4.10.4. [standard query component] [search component-1] (StageID - 2500): handleResponses: get few values from docs and populate parameters for stats

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Alessandro Benedetti
So, the first thing I can say is that if this is true: "it almost killed Solr with 280 files", you are doing something wrong for sure. At least if you are not trying to index 4k full movies xD Joking aside: 1) You should carefully design your analyser. 2) You should store your fields initially to verify you

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Paden
Yeah, actually changing the field to text_en or text_en_splitting actually made it so my indexer indexed all my files. The only problem is, I don't think it's doing it well. I have two Cores that I'm working with. Both of them have indexed the same set of files. The first core, which I will

Re: Error: Could not create instance of 'SolrInputDocument'

2015-06-19 Thread Shawn Heisey
On 6/19/2015 5:40 AM, Paul Revere wrote: Our log files show entries for each member indexed: Error: Could not create instance of 'SolrInputDocument'. ~~ Exception: org.apache.solr.common.SolrInputDocument There will be a *lot* more detail available on this exception. We will need all of

Re: Migration from Solr 4.7.1 to SolrCloud 5.1

2015-06-19 Thread Erick Erickson
You really have to ask more specific questions here. What are you confused _about_? Have you gone through the tutorial? Read the Solr In Action book? Tried _anything_? Best, Erick On Fri, Jun 19, 2015 at 5:02 AM, shacky shack...@gmail.com wrote: Hi. I have an old index running on a standalone

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Erick Erickson
First and most obvious thing to try: bq: the Solr was started with maximal 4G for JVM, and index size is 2G Bump your JVM to 8G, perhaps 12G. The size of the index on disk is very loosely coupled to JVM requirements. It's quite possible that you're spending all your time in GC cycles. Consider

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Erick Erickson
You really, really, really want to get friendly with the admin/analysis page for questions like: bq: You're probably right though. I probably have to create a better analyzer really ;). It shows you exactly what each link in your analysis chain does to the input. Perhaps 75% of the questions

Re: Migration from Solr 4.7.1 to SolrCloud 5.1

2015-06-19 Thread shacky
2015-06-19 18:00 GMT+02:00 Erick Erickson erickerick...@gmail.com: You really have to ask more specific questions here. What are you confused _about_? Have I read that I could migrate using the backup script, so I looked for the backup script in the Solr 4.7.1 source code but I haven't found

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Paden
Yes, the number of indexed documents is correct. But the queries I perform fall short of what they should be. You're probably right though. I probably have to create a better analyzer. And I'm not really worried about the other fields. I've already checked to see if it's storing them correctly and

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Erick Erickson
This may be another forehead-slapper (man, you don't know how often I've injured myself that way). Did you commit at the end of the SolrJ indexing to Testcore2? DIH automatically commits at the end of the run, and depending on how your SolrJ program is written it may not have. Or just set

Re: CREATE collection bug or feature?

2015-06-19 Thread Shawn Heisey
On 6/19/2015 11:15 AM, Jim.Musil wrote: I noticed that when I issue the CREATE collection command to the api, it does not automatically put a replica on every live node connected to zookeeper. So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and create a collection

Re: CREATE collection bug or feature?

2015-06-19 Thread Erick Erickson
Jim: This is by design. There's no way to tell Solr to find all the cores available and put one replica on each. In fact, you're explicitly telling it to create one and only one replica, one and only one shard. That is, your collection will have exactly one low-level core. But you realized
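Since CREATE places exactly the number of shards and replicas it is asked for, getting one replica on each of three nodes means asking for it explicitly. A minimal sketch of building such a Collections API request is shown below; the Solr base URL and the collection name are hypothetical, while `action`, `name`, `numShards` and `replicationFactor` are standard CREATE parameters.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE request URL.

    The total number of cores created is
    num_shards * replication_factor.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"
```

For the three-node case discussed here, `create_collection_url("http://localhost:8983/solr", "mycollection", 1, 3)` requests one shard with three replicas, which Solr can then spread across the three live nodes.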

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Wenbin Wang
As for now, the index size is 6.5 M records, and the performance is good enough. I will re-build the index for all the records (14 M) and test it again with debug turned on. Thanks On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson erickerick...@gmail.com wrote: First and most obvious thing to

CREATE collection bug or feature?

2015-06-19 Thread Jim . Musil
I noticed that when I issue the CREATE collection command to the api, it does not automatically put a replica on every live node connected to zookeeper. So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and create a collection like this:

RE: Extended Dismax Query Parser with AND as default operator

2015-06-19 Thread Cario, Elaine
Dirk, There are 3 open JIRAs related to this behavior: https://issues.apache.org/jira/browse/SOLR-3739 https://issues.apache.org/jira/browse/SOLR-3740 https://issues.apache.org/jira/browse/SOLR-3741 We worked around it by adding the explicit + signs if the query matched the problematic

Re: CREATE collection bug or feature?

2015-06-19 Thread Jim . Musil
Thanks as always for the great answers! Jim On 6/19/15, 11:57 AM, Erick Erickson erickerick...@gmail.com wrote: Jim: This is by design. There's no way to tell Solr to find all the cores available and put one replica on each. In fact, you're explicitly telling it to create one and only one

CollapseQParserPluging Incorrect Facet Counts

2015-06-19 Thread Carlos Maroto
Hi, We are comparing results between Field Collapsing (group* parameters) and CollapseQParserPlugin. We noticed that some facets are returning incorrect counts. Here are the relevant parameters of one of our test queries: Field Collapsing: ---

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Also, since you are tuning for relative times, you can tune on the smaller index. Surely, you will want to test at scale. But tuning query, analyzer or schema options is usually easier to do on a smaller index. If you get a 3x improvement at small scale, it may only be 2.5x at full scale.

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Erick Erickson
Do be aware that turning on debug=query adds a load. I've seen the debug component take 90% of the query time. (to be fair it usually takes a much smaller percentage). But you'll see a section at the end of the response if you set debug=all with the time each component took so you'll have a sense
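The per-component timing section mentioned above can be pulled out of a JSON response with a few lines of code. The sketch below assumes the response has been parsed into a dict and that the timings live under `debug` → `timing` → `process`, with one `{"time": ...}` entry per component; the sample data is invented for illustration and is not real Solr output.

```python
def component_times(response: dict, phase: str = "process"):
    """Return (component, milliseconds) pairs for one phase,
    slowest component first."""
    timing = response.get("debug", {}).get("timing", {})
    phase_block = timing.get(phase, {})
    times = {
        name: block["time"]
        for name, block in phase_block.items()
        if isinstance(block, dict) and "time" in block
    }
    return sorted(times.items(), key=lambda item: item[1], reverse=True)


# Invented sample shaped like a debug=all timing section.
SAMPLE = {
    "debug": {
        "timing": {
            "time": 12.0,
            "prepare": {"time": 1.0, "query": {"time": 1.0}},
            "process": {
                "time": 11.0,
                "query": {"time": 2.0},
                "facet": {"time": 8.0},
                "debug": {"time": 1.0},
            },
        }
    }
}
```

Sorting by time makes it obvious when the debug component itself, rather than the query or facet component, accounts for most of the elapsed time.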