Re: Solr Suggester not working.

2015-06-30 Thread Vincenzo D'Amore
Hi, can you post your final configuration? On Tue, Jun 30, 2015 at 9:57 AM, ssharma7...@gmail.com wrote: davidphilip cherian & Alessandro Benedetti, Thanks for your feedback & links, I was able to get the suggestions from the suggester component. Thanks & Regards, Sachin

Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-30 Thread Thomas Michael Engelke
God damn. Thank you. *ashamed* On 30.06.2015 at 00:21, Erick Erickson wrote: Try not putting it in double quotes? Best, Erick On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke thomas.enge...@posteo.de wrote: A friend and I are trying to develop some software using Solr in the

Solr DIH from MySQL with unique ID

2015-06-30 Thread kurt
Hello. I have a question about the Solr Data Import Handler. I'm using Solr 5.2.1 on a Linux server with 32G ram. I have five different collections, and for each collection, I'm trying to import data from a MySQL database. All of the MySQL queries work properly in MySQL, and previously I was

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Thanks Erick and Upayavira for your inputs. Is there a way I can associate this with the unique id of a document, using either the schema browser or the TermsComponent? Best Regards, Dinesh Naik On Tue, Jun 30, 2015 at 2:55 AM, Upayavira u...@odoko.co.uk wrote: Use the schema browser on the admin UI, and

Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Toke Eskildsen
On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote: We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same time increased the physical RAM from 24Gb to 96Gb. We run multiple cores on this one server, approximately 20 in total, but primarily we have one that is huge in

Re: Solr Suggester not working.

2015-06-30 Thread ssharma7...@gmail.com
davidphilip cherian & Alessandro Benedetti, Thanks for your feedback & links, I was able to get the suggestions from the suggester component. Thanks & Regards, Sachin Vyas. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-tp4214086p4214873.html Sent from

Re: fq versus q

2015-06-30 Thread Esther Goldbraich
Thank you Erick. This solution fits part of our queries, will adopt it for those. Yet we have use-cases in which the results can not be cached. Everyone, What do you think about our assumptions and conclusions? As a general rule of thumb, at least in our case, would you please comment on the

Re: Correcting text at index time

2015-06-30 Thread Jack Krupansky
You would have to have a separate instance of the update processor, each with one of the words. Or, you could code a JavaScript script with the stateless script update processor that has the long list or words and replacements as two arrays or an array of objects, and then iterate through the
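The "array of objects, then iterate" idea in the snippet above can be sketched roughly as follows. This is plain Python for illustration only, not the JavaScript that a stateless script update processor would run, and the word list is hypothetical since the thread does not give the actual words.

```python
import re

# Hypothetical replacement table; the thread does not list the real words.
REPLACEMENTS = [
    {"word": "dept", "replacement": "department"},
    {"word": "govt", "replacement": "government"},
]

def correct_text(text, replacements=REPLACEMENTS):
    """Replace each listed word with its correction, matching whole words only."""
    for entry in replacements:
        # \b keeps "dept" from matching inside a longer word such as "adept".
        pattern = r"\b" + re.escape(entry["word"]) + r"\b"
        # A function is used as the replacement so backslashes in it stay literal.
        text = re.sub(pattern, lambda m, e=entry: e["replacement"], text)
    return text

print(correct_text("the govt dept"))
```

Keeping the pairs in one table means a long list needs no single large regular expression; each entry is applied in turn.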

AUTO: Nicholas M. Wertzberger is out of the office (returning 07/06/2015)

2015-06-30 Thread Nicholas M. Wertzberger
I am out of the office until 07/06/2015. I'll be out of the office through July 4th. Please contact Jason Brown for any pressing JAS Team related items. Note: This is an automated response to your message Re: Correcting text at index time sent on 6/30/2015 8:55:16 PM. This is the only

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Hi Erick, This is mainly for debugging purposes. If I have 20M records and a few fields in some of the documents are not indexed as expected, or something went wrong during indexing, how do we pinpoint the exact issue and fix the problem? Best Regards, Dinesh Naik On Tue, Jun 30, 2015 at 5:56

Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Erick Erickson
bq: The type of queries that are run can return anything from 1 million to 9.5 million documents, and typically run for anything from 20 to 45 minutes. Uhhh, are you literally setting the rows parameter to over 9.5M and getting that many docs all at once? Or is that just numFound and you're

Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-30 Thread Alessandro Benedetti
I would like to add some considerations, if I may. The field type looks heavily analysed; are you sure this fits your suggestion requirements? It is usually better to keep the suggestion field as lightly analysed as possible and then experiment with the different types of suggesters. If you

Re: SolrCloud 5.2.1 upgrade

2015-06-30 Thread Vincenzo D'Amore
Update: regarding the solrj changelog I found this: - https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5 and this: - https://issues.apache.org/jira/browse/SOLR/component/12324331?selectedTab=com.atlassian.jira.jira-projects-plugin:component-changelog-panel On

Re: Solr Suggester not working.

2015-06-30 Thread ssharma7...@gmail.com
Vincenzo D'Amore, The following is my (CURRENT) Working Final Configuration: *schema.xml* <fields> . . <field name="text" type="c_text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" /> <field name="document_name" type="c_document_name" indexed="true" stored="true"

Re: Solr DIH from MySQL with unique ID

2015-06-30 Thread Erick Erickson
Two very quick questions: 1 how big is your transaction log? Well, do you even have one? If Solr is abnormally terminated, it'll replay the tlog on startup. The scenario here would be something like you were running DIH without any kind of hard commit specified and killed Solr for some reason.

Re: optimize status

2015-06-30 Thread Erick Erickson
I've actually seen this happen right in front of my eyes in the field. However, that was a very high-performance environment. My assumption was that fragmented index files were causing more disk seeks especially for the first-pass query response in distributed mode. So, if the problem is similar,

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Erick Erickson
In short, not unless you want to get into low-level Lucene coding. Inverted indexes are, well, inverted so their very structure makes this difficult. It looks like this: But I'm not convinced yet that this isn't an XY problem. What is the high-level problem you're trying to solve here? Maybe

SolrCloud 5.2.1 upgrade

2015-06-30 Thread Vincenzo D'Amore
Hi All, I have a bunch of Java clients connecting to a SolrCloud 4.8.1 cluster with SolrJ 4.8.0. The question is: do I have to switch the clients and the cluster to the new version at the same time? Could I upgrade the cluster now and upgrade the clients over the following months? BTW, looking at SolrJ 5.2.1 I have seen

Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-30 Thread Erick Erickson
Pesky computers, they keep doing exactly what I tell 'em to do, not what I mean ;) I'll open a JIRA for making Solr DWIM-compliant, Do What I Mean ;) ;) On Tue, Jun 30, 2015 at 4:17 AM, Thomas Michael Engelke thomas.enge...@posteo.de wrote: God damn. Thank you. *ashamed* On 30.06.2015

Restricting fields returned by Suggester result.

2015-06-30 Thread ssharma7...@gmail.com
Hi, Is it possible to restrict the result returned by the Suggester to selected fields only? i.e. Currently, the Suggester returns data in the following structure (XML). Can I restrict the Solr (5.1) Suggester to return ONLY the term and EXCLUDE <long name="weight"> and <str name="payload"/>, as per the Suggester result XML below?

Suggester configuration queries.

2015-06-30 Thread ssharma7...@gmail.com
Hi, I have the following Solr 5.1 configuration: *schema.xml* <fields> . . <field name="text" type="c_text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" /> <field name="document_name" type="c_document_name" indexed="true" stored="true" required="true" multiValued="false" />

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Erick Erickson
Dinesh: This is what the admin/analysis page is for. It shows you exactly what tokens are produced by what steps in the analysis chain. That would be far better than trying to analyze the indexed terms. Best, Erick On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik dineshkumarn...@gmail.com wrote:

Re: Solr Suggester not working.

2015-06-30 Thread Vincenzo D'Amore
Thanks Sachin Vyas. Maybe I have found a typo: there is a comment-closing marker (-->) alone at the end of the tag <str name="spellcheck.collate">false</str> --> On Tue, Jun 30, 2015 at 2:09 PM, ssharma7...@gmail.com wrote: Vincenzo D'Amore, The following is my (CURRENT) Working Final

Re: Solr Suggester not working.

2015-06-30 Thread ssharma7...@gmail.com
Vincenzo D'Amore, Yes, you are right, it's a typo; I missed it while cleaning the XML to put on the Solr-User list. But *REMOVE* the following line, as it was not used in my Solr 5.1 configuration: *<str name="spellcheck.collate">false</str>-->* Regards, Sachin Vyas. -- View this message in

Re: Solr DIH from MySQL with unique ID

2015-06-30 Thread kurt
Erick, Many thanks for your reply. 1. The file solr.log does not show any errors, however, there is a file solr.log.8 which is 5MB and has a ton of text that was trying to index, but there was an invalid date error. I fixed that. Is it possible that Solr keeps trying to use that log? Can I

Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Alessandro Benedetti
Am I wrong, or has NRTCachingDirectoryFactory been the default directory factory since Solr 4.x? If I remember correctly, this factory creates a Directory implementation built on top of MMapDirectory. In this case we should rely on the operating system's memory-mapping feature to properly

Suggester Result Exception in specific scenario

2015-06-30 Thread ssharma7...@gmail.com
Hi, I have the following Solr 5.1 configuration: *schema.xml* <fields> . . <field name="text" type="c_text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" /> <field name="document_name" type="c_document_name" indexed="true" stored="true" required="true" multiValued="false" />

Re: Restricting fields returned by Suggester result.

2015-06-30 Thread Alessandro Benedetti
Actually, what you are asking does not make any sense. The Solr response returns that data structure because it must return as much as possible. It is the responsibility of the client to take what it needs from the response. Talking about the Java client, I contributed the SolrJ code to parse the

Re: Solr DIH from MySQL with unique ID

2015-06-30 Thread Erick Erickson
1 Not solr log. The transaction log. If it is present it'll be a child directory of your data directory called tlog, a sibling to your index directory. And big here is gigabytes. And yes, you can just nuke it if you want. You get one automatically if you are using SolrCloud. 2 OK, it was a long

Re: Suggester configuration queries.

2015-06-30 Thread Erick Erickson
This will be pretty much unworkable for any large corpus. The DocumentDictionaryFactory builds its index by reading the stored value from every document in your index to put into a sidecar Solr index (for free text suggester). This can take many minutes so doing this on every commit is an

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Hi Erick, I agree with you. But I was checking whether we could get hold of the whole document (to see all analyzed field values). A field value may be common to multiple documents, and in such cases it will be difficult to trace back which document has the issue. Because

Re: Restricting fields returned by Suggester result.

2015-06-30 Thread ssharma7...@gmail.com
Alessandro Benedetti, Thanks for the update. Actually, what I meant by "Is it possible to restrict the result returned by the Suggester to selected fields only?" was something like the fl option available for querying (/select) in Solr, wherein there could be some fields as defined in schema.xml, but we can

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Thanks to all for the help - it's now storing text and I can search and get results just as before in 4.6, but I cannot get snippets to appear when I ask for highlighting. When I add documents, here is the URL my script generates:

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi, I am currently investigating queries against a much smaller index (1M) to see how grouping and faceting affect performance. This will allow me to do a lot of tests in a short period of time. However, it looks like the query is executed much faster the second time. This is tested

Re: optimize status

2015-06-30 Thread Shawn Heisey
On 6/29/2015 2:48 PM, Reitzel, Charles wrote: I take your point about shards and segments being different things. I understand that the hash ranges per segment are not kept in ZK. I guess I wish they were. In this regard, I liked Mongodb, uses a 2-level sharding scheme. Each shard

Re: SolrCloud 5.2.1 upgrade

2015-06-30 Thread Shawn Heisey
On 6/30/2015 6:40 AM, Vincenzo D'Amore wrote: I have a bunch of java clients connecting to a solrcloud cluster 4.8.1 with Solrj 4.8.0. The question is, I have to switch clients and cluster to the new version at same time? Could I upgrade the cluster and in the following months upgrade

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Alessandro Benedetti
Instead of your immense schema, can you give us the details of the highlighter you are trying to use? And how are you trying to use it? Which client? Direct API calls? Let us know! Cheers 2015-06-30 15:10 GMT+01:00 Mark Ehle marke...@gmail.com: Thanks to all for the help - it's now storing

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Hi Alessandro, I am able to check the field-wise analyzed results. I was interested in getting the complete document. As Erick mentioned - Reconstructing the doc from the postings lists is actually quite tedious. The Luke program (not request handler) has a function that does this, it's not fast

Re: DIH deletes cause opening of searchers

2015-06-30 Thread Shawn Heisey
On 6/25/2015 2:20 AM, Mikhail Khludnev wrote: On Tue, Jun 23, 2015 at 9:23 AM, Rudolf Grigeľ grige...@gmail.com wrote: How can I prevent a new searcher from being opened after every delete statement? Comment out the updateLog tag in solrconfig.xml (it always helps). The presence or absence of the updateLog

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Alessandro Benedetti
Do you have the original document available ? Or stored in the field of interest ? Should be quite an easy test to reproduce the Analysis simply using the analysis tool Upaya and Erick suggested. Just use your real document content and you will see how it is exactly analysed. Cheers 2015-06-30

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Alessandro - Someone asked to see the schema, I posted it. Should I have just attached it? Does this mailing list support that? I am by no means a SOLR expert. I am a PHP coder who wrote a (very-much-loved by our library staff and patrons) newspaper indexing tool that I am trying to update. I

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Alessandro Benedetti
No worries, it is not a big deal that you shared the schema.xml. I said that only because it made the mail a little hard to read. Anyway, in my opinion the query is correct, so the problem should reside elsewhere. Can you share the solrconfig.xml piece for your select request handler? Probably it

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Alessandro Benedetti
But what do you mean with the complete document ? Is it not available anymore ? So you have lost your original document and you want to try to reconstruct from the index ? 2015-06-30 16:05 GMT+01:00 dinesh naik dineshkumarn...@gmail.com: Hi Alessandro, I am able to check the field wise

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
I'd set filterCache and queryResultCache to zero (size and autowarm count). Leave documentCache alone IMO, as it's used to hold documents read from disk as they pass through various query components and doesn't autowarm anyway. I'd think taking it out would skew your results because of multiple

Re: DIH deletes cause opening of searchers

2015-06-30 Thread Erick Erickson
From the log fragment it's at least worth further investigation. You've had 4 searchers open in less than 1/2 second. That's horribly fast, but you already know that... Let's see the DIH configs, perhaps there's something innocent-seeming there that's causing this. Or, there's a bug somewhere.

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Test_results_round_2.doc http://lucene.472066.n3.nabble.com/file/n4215016/Test_results_round_2.doc -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215016.html Sent from the Solr - User mailing list archive

RE: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Dinesh Naik
Hi Alessandro, Let's say I have 20M documents with 50 fields in each. I have applied text analysis like compression, ngram, and synonym expansion on these fields. Checking field-level analysis individually can easily be done via admin/analysis. But I need to do the analysis check 50 times for these

Re: optimize status

2015-06-30 Thread Upayavira
On Tue, Jun 30, 2015, at 04:42 PM, Shawn Heisey wrote: On 6/29/2015 2:48 PM, Reitzel, Charles wrote: I take your point about shards and segments being different things. I understand that the hash ranges per segment are not kept in ZK. I guess I wish they were. In this regard, I

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Do you mean this?: <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int> <!-- <str name="df">text</str> --> </lst> </requestHandler> On Tue, Jun 30, 2015 at 12:11 PM, Alessandro Benedetti

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Erick Erickson
Something's not right here. Your query does not specify any field, you have q="JOHN GRAP". Which should parse as q=default_search_field:"JOHN GRAP". BUT, you've commented the default field out of the select request handler. I don't _think_ that there's a default in the code, but I've been surprised
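One way to rule out a surprising default is to pass df explicitly on the request. The sketch below only builds the query string; the parameter names (q, df, hl, hl.fl, wt) are the ones echoed elsewhere in this thread, while the helper name and the choice of "text" as the default field are assumptions for illustration.

```python
from urllib.parse import urlencode

def select_query(phrase, default_field, highlight_field):
    """Build a /select query string that names the default field explicitly."""
    params = {
        "q": '"' + phrase + '"',   # phrase query, no field prefix
        "df": default_field,       # field used when q names none
        "hl": "true",
        "hl.fl": highlight_field,  # field to build snippets from
        "wt": "json",
    }
    return urlencode(params)

print(select_query("JOHN GRAP", "text", "text"))
```

Searching and highlighting against the same field avoids the mismatch where matches are found in one field and snippets are requested from another.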

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Here's what I get: { "responseHeader":{ "status":0, "QTime":27, "params":{ "echoParams":"all", "fl":"year", "df":"_text_", "indent":"true", "q":"\"JOHN GRAP\"", "hl.simple.pre":"<em>", "debug":"true", "hl.simple.post":"</em>", "hl.fl":"text", "wt":"json", "hl":"true",

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Upayavira
you can call the same API as the admin UI does. Pass it strings, it returns tokens in json/xml/whatever. Upayavira On Tue, Jun 30, 2015, at 06:55 PM, Dinesh Naik wrote: Hi Alessandro, Lets say I have 20M documents with 50 fields in each. I have applied text analysis like
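As a rough sketch of what calling that API could look like, the snippet below builds a request URL for Solr's field analysis handler. It assumes the handler is reachable at /analysis/field on the core and takes the analysis.fieldname and analysis.fieldvalue parameters; the host, core name, and field name are hypothetical placeholders.

```python
from urllib.parse import urlencode

def analysis_url(base_url, core, field_name, field_value):
    """Build a field-analysis request URL for one field and one input string."""
    params = {
        "analysis.fieldname": field_name,    # field whose analysis chain to run
        "analysis.fieldvalue": field_value,  # text to push through the chain
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"

# Hypothetical host and core; substitute your own.
print(analysis_url("http://localhost:8983/solr", "mycore", "text", "John Grap"))
```

Looping such a call over each field of a suspect document would automate the per-field check instead of repeating it by hand in the admin UI.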

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi All, I did many tests with very consistent test results. Each query was executed after re-indexing, and only one request was sent to query the index. I disabled filterCache and queryResultCache for this test based on Erick's recommendation. The test document was posted to this email list

RE: Using the DataImportHandler to get filepath from MySQL DataBase BackSlash Character Problem

2015-06-30 Thread Keswani, Nitin - BLS CTR
Hi Paden, I believe you could use a PatternReplaceFilterFactory ( http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html ) configured in your fieldtype that could replace '' with '\\' at index time. Thanks. Regards, Nitin

Using the DataImportHandler to get filepath from MySQL DataBase BackSlash Character Problem

2015-06-30 Thread Paden
Hello, I'm having a slight Catch-22 scenario going on with my Solr indexing process. I'm using the DataImportHandler to pull a filepath from a database. The problem is that Windows filepaths have the backslash character inside their paths. \\some\filepath So when I insert this data into MySQL
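The usual cause of this kind of problem is that a backslash acts as an escape character in a string literal, so each one has to be doubled to survive one round of escape processing. A minimal sketch in Python, assuming that is what happens to the path here (the snippet is truncated, so the exact failure is not known):

```python
def escape_backslashes(path):
    """Double every backslash so the path survives one round of escape processing."""
    return path.replace("\\", "\\\\")

windows_path = r"\\some\filepath"  # raw string: two backslashes, then one
print(escape_backslashes(windows_path))
```

Where the database driver supports parameterized queries, passing the path as a parameter avoids manual escaping altogether.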

Re: cursorMark and timeAllowed are mutually exclusive?

2015-06-30 Thread Bernd Fehling
Thanks for your explanation. Off the top of your head, are there any other options which prevent getting a cursorMark? Yes, that was also my idea: to set up a separate request handler for harvesting without timeAllowed. As Shawn suggested, a short note about this should go into the documentation.

Re: optimize status

2015-06-30 Thread Upayavira
We need to work out why your performance is bad without optimise. What version of Solr are you using? Can you confirm that your config is using the TieredMergePolicy? Upayavira On Tue, Jun 30, 2015, at 04:48 AM, Summer Shire wrote: Hi Upayavira and Erick, There are two things we are talking

Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Caroline Hind
Hi, I am very new to SOLR, and would appreciate some guidance if anyone has the time to offer it. We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same time increased the physical RAM from 24Gb to 96Gb. We run multiple cores on this one server, approximately 20 in total,

Re: Correcting text at index time

2015-06-30 Thread hossmaa
Hi all Thanks for the replies. So there's no getting away from doing it on my own then... @Jack: I need to replace a whole list of shortened words... It would make a crazy regex (which I incidentally wouldn't even know how to formulate). Cheers A. -- View this message in context:

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
bq: The index size is only 1 M records. A 10 times of the record size ( 10M) will likely bring the total response time to 1 second This is an extrapolation you simply cannot make. Plus you cannot really tell anything from just a few queries about system performance. In fact you must disregard

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Erick Erickson
It _looks_ like you're searching against _text_ and trying to highlight on text. On a very brief grep of all the Java code I don't see _text_ defined anywhere (of course I could be missing something here). So none of this makes sense. You have no df field defined, yet you're getting a default