Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Jai
Hi, while indexing documents with unknown fields, Solr adds the unknown fields to the schema but always guesses the string type. Is it possible to specify a default field type for unknown fields, such as text, so that they get tokenized? Also, can we specify other properties by default

Re: SolrCloud - Path must not end with / character

2013-09-03 Thread Prasi S
The issue is resolved. I have given all the paths inside Tomcat as relative paths (solr home, solr war). That was creating the problem. On Mon, Sep 2, 2013 at 2:19 PM, Prasi S prasi1...@gmail.com wrote: Does this have anything to do with Tomcat? I cannot go back as we already fixed with

Problem with Synonyms

2013-09-03 Thread Christian Loock
Hello, this is my first time writing to this mailing list, so hello everyone. I am having issues with synonyms. I added the synonym to one of my field types: <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer

Update field properties via Schema Rest API ?

2013-09-03 Thread bengates
Hello, I'm pretty new to Solr, as a PHP developer. I'm still reading the tutorials for getting started with Solr, adding and indexing data. I'm still using the example/start.jar, as I still haven't managed to configure a true (production-ready) Solr instance. But that doesn't matter. As I can't deal

solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread jerome . dupont
Hello again, I am still trying to index with SolrCloud and DIH. I can index, but it seems that indexing is done on only 1 shard (my goal was to parallelize it to go faster). This is my conf: I have 2 Tomcat instances, one with ZooKeeper embedded in Solr 4.4.0 started and 1 shard (port 8080) The

Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Shalin Shekhar Mangar
You can use the dynamic fields feature of Solr to map unknown field names to types. For example, a dynamic field named *_s, i.e. any field name ending with _s, can be mapped to string, and so on. In your case, if your field names do not follow a set pattern, then you can even specify a dynamic
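
A minimal sketch of the pattern-to-type idea described in the reply above, written in Python only as an illustration. It is not Solr's own field resolution: the pattern table, the first-match-wins rule, and the catch-all `*` entry are assumptions made for the example, and `text_general` is borrowed from the field type named elsewhere in this listing.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table, checked top to bottom (first match wins).
# This only illustrates the idea; it is not how Solr resolves dynamic fields.
DYNAMIC_FIELDS = [
    ("*_s", "string"),        # names ending in _s stay untokenized
    ("*_t", "text_general"),  # names ending in _t get a tokenized text type
    ("*", "text_general"),    # assumed catch-all for names with no set pattern
]


def guess_field_type(field_name):
    """Return the type of the first pattern that matches field_name, or None."""
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


if __name__ == "__main__":
    for name in ("author_s", "body_t", "title"):
        print(name, "->", guess_field_type(name))
```

In a schema the same mapping would be declared as dynamic field entries rather than computed in client code; the sketch only shows why a name with no recognizable suffix can still end up with a tokenized type.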

Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread Shalin Shekhar Mangar
DataImportHandler does not parallelize indexing at all. It is a single threaded indexer which runs on a single node. However, the documents themselves are routed to the correct shard by SolrCloud. Therefore, what you are observing on your servers is normal. If you want to parallelize indexing

Re: Problem with Synonyms

2013-09-03 Thread pravesh
Solr has a nice analysis page. You can use it to get insight into what is happening after each filter is applied at index/search time. Regards, Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-Synonyms-tp4087905p4087915.html Sent from the Solr - User mailing

Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread YouPeng Yang
Hi jerome.dupont, please check what the updateHandler is in your solrconfig.xml: <updateRequestProcessorChain name="sample"> <processor class="solr.LogUpdateProcessorFactory" /> <processor class="solr.NoOpDistributingUpdateProcessorFactory"/> -- by default, it is

Re: Update field properties via Schema Rest API ?

2013-09-03 Thread Shalin Shekhar Mangar
The Schema REST API is a new feature and supports only adding fields (and that too since Solr 4.4). It doesn't support modifying fields yet. On Tue, Sep 3, 2013 at 2:39 PM, bengates benga...@aliceadsl.fr wrote: Hello, I'm pretty new to Solr, as a PHP developer. I'm still reading the tutorials

Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread maephisto
Hi, I've setup a ZK instance and also deployed Solr in Tomcat7 on a different instance in Amazon EC2. Afterwards I tried starting tomcat specifying the ZK host IP, like so: sudo service tomcat7 start -DzkHost=zk ip:2181 -DnumShards=3 -Dcollection.configName=myconf

Re: Update field properties via Schema Rest API ?

2013-09-03 Thread bengates
Hello, Thanks for your quick reply. This is what I feared. Do you know if this is planned for Solr 4.5 or Solr 5.0 ? I didn't see anything about it in the roadmap. Thank you, Ben -- View this message in context:

Re: Problem with Synonyms

2013-09-03 Thread Christian Loock
On 03.09.2013 12:11, pravesh wrote: SOLR has a nice analysis page. You can use it to get insight what is happening after each filter is applied at index/search time Regards Pravesh -- View this message in context:

Re: db-data-config.xml ?

2013-09-03 Thread Shalin Shekhar Mangar
Did you find any other exceptions in the logs? When I pasted the script section of your data config into my test setup, I got an error saying that there is an unclosed string literal in line 6 On Tue, Sep 3, 2013 at 12:23 AM, Kunzman, Doug dkunz...@usgs.gov wrote: Hi - I'm new to Solr and am

Re: Apostrophes in fields

2013-09-03 Thread devendra W
In my case, fields with an apostrophe are not returned in results. When I search for "dev", it shows me the following results: dev, dev's, devendra. But when I search for "dev'" (dev with the apostrophe only), nothing comes back. What could be the workaround? Thanks, Devendra -- View this

Re: phonetic search

2013-09-03 Thread Erick Erickson
Hmmm, seems like it should work. First thing I'd try is using the admin interface and look at the analysis page to see how the input is tokenized both at index and search time, that's sometimes surprising. Second, again using the browser, attach debug=query to the URL. That will echo back what

Re: Update field properties via Schema Rest API ?

2013-09-03 Thread Erick Erickson
Is editing a text file really all that onerous? You can edit the schema.xml file with any editor you're comfortable with and issue the core RELOAD command in the interim. Best Erick On Tue, Sep 3, 2013 at 6:20 AM, bengates benga...@aliceadsl.fr wrote: Hello, Thanks for your quick reply.
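
As a rough illustration of the "edit schema.xml, then RELOAD" workflow suggested above, the Python sketch below only builds the CoreAdmin reload URL. The base URL and the core name `collection1` are placeholders for the example, not details taken from this thread.

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core_name):
    """Build the CoreAdmin URL that asks Solr to reload one core."""
    query = urlencode({"action": "RELOAD", "core": core_name})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


if __name__ == "__main__":
    # Placeholder host and core name; adjust to the actual installation.
    print(core_reload_url("http://localhost:8983/solr", "collection1"))
```

Requesting that URL after saving the edited schema.xml is what the reply means by issuing the core RELOAD command. Documents indexed before a field definition changed generally need to be reindexed for the change to apply to them.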

Re: Problem with Synonyms

2013-09-03 Thread Erick Erickson
Please explain exactly what "but nothing really happens" means. Do you mean that you see the SF in the analysis page but there are no substitutions? Or you don't get search results? Or??? You have to reload the core after making changes at minimum, you can restart the Solr instance if you're

Re: Problem with Synonyms

2013-09-03 Thread Christian Loock
The SF part is in the analysis page but nothing is substituted. I reloaded, removed and re-added the core, reindexed... nothing worked :( I wonder if the SF actually uses the correct file for synonyms. I have it lying in the conf folder of the core. Is that correct? On 03.09.2013 13:32,

Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Jackson, Andrew
Hi, We have a large, sharded SolrCloud index of 300 million documents which we use to explore our web archives. We want to facet on fields that have very large numbers of distinct values, e.g. host names and domain names of pages and links. Thus, overall, we expect to have millions of distinct

Re: Measuring SOLR performance

2013-09-03 Thread Dmitry Kan
Hi Roman, Thanks, the --additionalSolrParams was just what I wanted and works fine. BTW, if you have some special bug tracking forum for the tool, I'm happy to submit questions / bug reports there. Otherwise, this email list is ok (for me at least). One other thing I have noticed in the err

Re: Update field properties via Schema Rest API ?

2013-09-03 Thread bengates
Hello Erick, Thank you for your reply. Unfortunately, yes it is. I work with a company that has a catalog with many new attributes every day, and sometimes the existing ones change. For instance, one attribute may live with the unit for months (e.g. screen_size =32 cm) and one day my provider

RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Michael Ryan
However, the Solr instance we direct our client query to is consuming significantly more RAM (10GB) and is still failing after a few queries when it runs out of heap space. This is presumably due to the role it plays, aggregating the results from each shard. That seems quite odd... What

SolrCloud - shard containing an invalid host:port

2013-09-03 Thread Marc des Garets
Hi, I have set up SolrCloud with Tomcat. I use Solr 4.1. I have ZooKeeper running on 192.168.1.10. A Tomcat running solr_myidx on 192.168.1.10 on port 8080. A Tomcat running solr_myidx on 192.168.1.11 on port 8080. My solr.xml is like this: <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"

Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread maephisto
When i try to deploy using jetty, everything works fine, and the solr instance gets in the cloud sudo java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=zk ip:2181 -DnumShards=3 -jar start.jar -- View this message in context:

RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Jackson, Andrew
The default facet.limit is 10, but it's set to 50 for most of the facets. I've included the query parameters below. In case it makes any difference, there are quite a lot of facet fields with large numbers of terms, and the queries are being generated by the Sarnia Drupal module. Thanks, Andy

Re: Measuring SOLR performance

2013-09-03 Thread Roman Chyla
Hi Dmitry, Thanks for the feedback. Yes, it is indeed a JMeter issue (or rather, an issue with the plugin we use to generate charts). You may want to use GitHub for whatever comes next: https://github.com/romanchyla/solrjmeter/issues Cheers, roman On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan

Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field

2013-09-03 Thread Dennis Schafroth
We are harvesting and indexing bibliographic data, thus having many distinct author names in our index. While testing Solr 4 I believe I had pushed a single core to 100 million records (91GB of data) and everything was working fine and fast. After adding a little more to the index, then

Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread jerome . dupont
It works! I've done what you said: (1) In my request to get the list of documents, I added a where clause filtering the select that gets the documents to index: where noticebib.numnoticebib LIKE '%${dataimporter.request.suffixeNotice}' (2) And I called my DIH on each shard with the parameter

Re: dataimporter tika doesn't extract certain div

2013-09-03 Thread Shalin Shekhar Mangar
I don't know much about Tika but in the example data-config.xml that you posted, the xpath attribute on the field text won't work because the xpath attribute is used only by an XPathEntityProcessor. On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote: I want tika to only index the

Re: Can we used CloudSolrServer for searching data

2013-09-03 Thread Shalin Shekhar Mangar
CloudSolrServer can only be used if you are actually using SolrCloud (i.e. a ZooKeeper aware setup). If you only have a multi-core setup, then you can use LBHttpSolrServer. See http://wiki.apache.org/solr/LBHttpSolrServer On Tue, Aug 27, 2013 at 2:11 PM, Dharmendra Jaiswal

Dynamic Query Analyzer

2013-09-03 Thread Daniel Rosher
Hi, We have a need to specify a different query analyzer depending on input parameters dynamically. We need this so that we can use different stopword lists at query time. Would any one know how I might be able to achieve this in solr? I'm aware of the solution to specify different field

Re: SolrCloud Set up

2013-09-03 Thread Jared Griffith
I think I have it all sorted out. There are some weird network issues here where my test setup is, so that may have been part of the overall issue. Timeouts wouldn't have fixed this issue, that's for sure. On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson erickerick...@gmail.com wrote: bq:

DIH + Solr Cloud

2013-09-03 Thread Alejandro Calbazana
Hi, Quick question about data import handlers in Solr cloud. Does anyone use more than one instance to support the DIH process? Or is the typical setup to have one box setup as only the DIH and keep this responsibility outside of the Solr cloud environment? I'm just trying to get picture of

Re: SolrCloud - Path must not end with / character

2013-09-03 Thread Jared Griffith
Interesting because I was getting the issue when I was passing the full path (without the trailing / ) to Tomcat. On Mon, Sep 2, 2013 at 11:34 PM, Prasi S prasi1...@gmail.com wrote: The issue is resolved. I have given all the path inside tomcat as relative paths( solr home, solr war). That

Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Chris Hostetter
Your email is vague in terms of what you are actually *doing* and what behavior you are seeing. Providing specific details like "This is my schema.xml and this is my solrconfig.xml; when I POST this file to this URL I get this result and I would instead like to get this result" is useful for

Re: SolrCloud Set up

2013-09-03 Thread Erick Erickson
Ah, thanks for the closure, it's always nice to know. I used to work with a guy who had a list of network fallacies that amounted to "you can't trust them fully". Erick On Tue, Sep 3, 2013 at 12:12 PM, Jared Griffith jgriff...@picsauditing.com wrote: I think I have it all sorted out. There

Re: distributed query result order tie break question

2013-09-03 Thread Chris Hostetter
: like to understand how the ordering is defined so that I can compute an : integer that is sorted in the same way. For example (shard id 24) | : docid or something like that. If you want to ensure a consistent ordering, you have to index a (unique) value that you use as a secondary sort --
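
The secondary-sort advice above can be sketched in a few lines of Python. This is an illustration of the tie-break idea only, not Solr's distributed merge code; the `id` field name and the hit dictionaries are assumptions for the example. In a Solr request the same idea is usually expressed with a sort such as `score desc, id asc`.

```python
def merge_hits(shard_results, rows=10):
    """Merge per-shard hit lists, ordering by score descending and
    breaking ties on a unique id so the order never depends on the
    order in which shards respond."""
    all_hits = [hit for hits in shard_results for hit in hits]
    all_hits.sort(key=lambda hit: (-hit["score"], hit["id"]))
    return all_hits[:rows]


if __name__ == "__main__":
    # Made-up hits: two documents share the same score.
    shard_a = [{"id": "b", "score": 1.0}]
    shard_b = [{"id": "a", "score": 1.0}, {"id": "c", "score": 2.0}]
    print([hit["id"] for hit in merge_hits([shard_a, shard_b])])
    print([hit["id"] for hit in merge_hits([shard_b, shard_a])])
```

Both calls produce the same order because the tie between the two equally scored documents is settled by the unique value rather than by arrival order.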

Re: SolrCloud Set up

2013-09-03 Thread Walter Underwood
Those are the Fallacies of Distributed Computing from L. Peter Deutsch. The first fallacy is The network is reliable. http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing wunder On Sep 3, 2013, at 10:26 AM, Erick Erickson wrote: Ah, thanks for the closure, it's always nice to

Re: Dynamic Query Analyzer

2013-09-03 Thread Roman Chyla
You don't need to index fields several times, you can index it just into one field, and use the different query analyzers just to build the query. We're doing this for authors, for example - if query language says =author:einstein, the query parser knows this field should be analyzed differently

Re: Problem parsing suggest response

2013-09-03 Thread Chris Hostetter
: 2. The items at and l are not preceded by name. you're getting back a list of items, the odd items (at, l) are strings, and the even items are more complex objects associated with those strings : Can I interfere with the structure? You can choose how the JSON Writer represents the internal
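
A small client-side sketch of the structure described above: a flat list in which strings alternate with the objects that belong to them. The Python helper below pairs them up; the sample values are invented for the example. On the server side, Solr's JSON response writer has a `json.nl` parameter (for example `json.nl=map`) that changes how such lists are written, which may be what the truncated reply goes on to describe.

```python
def pairs_to_dict(flat_items):
    """Turn [name1, value1, name2, value2, ...] into {name1: value1, ...}."""
    if len(flat_items) % 2 != 0:
        raise ValueError("expected an even number of items (name/value pairs)")
    return dict(zip(flat_items[0::2], flat_items[1::2]))


if __name__ == "__main__":
    # Invented sample: the names "at" and "l" come from the thread,
    # the associated objects are placeholders.
    suggestions = [
        "at", {"numFound": 2, "suggestion": ["atlas", "atom"]},
        "l", {"numFound": 1, "suggestion": ["lamp"]},
    ]
    by_term = pairs_to_dict(suggestions)
    print(by_term["at"]["suggestion"])
```

A dictionary loses the original ordering guarantees of the list in some languages and cannot hold repeated names, so pairing on the client is only appropriate when each name occurs once.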

Re: Dynamic Query Analyzer

2013-09-03 Thread Jack Krupansky
Sounds like it would be better for you to preprocess the query in your application layer. Your requirements seem too open-ended to wire into Solr. But, to be sure, please elaborate exactly what sort of variations you need in query analysis. -- Jack Krupansky -Original Message-

Re: SolrCloud Set up

2013-09-03 Thread Jared Griffith
Thankfully it's none of those but more than likely a bad DHCP server (Windows) or client (or a combination thereof) that is causing the network to freak out. I'll try adjusting the timeouts up to see if it will alleviate this. I am seeing that when I try to restart the solr instances sometimes they

Re: SolrCloud - shard containing an invalid host:port

2013-09-03 Thread Daniel Collins
Was it a test instance that you created? 8983 is the default port, so possibly you started an instance before you had the ports set up properly, and it registered in ZooKeeper as a valid instance. You can use the Core API to UNLOAD it (if it is still running); if it isn't running anymore, I have

Re: SolrCloud Set up

2013-09-03 Thread Erick Erickson
Yep, that's the one, thanks... On Tue, Sep 3, 2013 at 1:38 PM, Walter Underwood wun...@wunderwood.org wrote: Those are the Fallacies of Distributed Computing from L. Peter Deutsch. The first fallacy is The network is reliable. http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread Shawn Heisey
On 9/3/2013 4:13 AM, maephisto wrote: I've setup a ZK instance and also deployed Solr in Tomcat7 on a different instance in Amazon EC2. Afterwards I tried starting tomcat specifying the ZK host IP, like so: sudo service tomcat7 start -DzkHost=zk ip:2181 -DnumShards=3

Re: Apostrophes in fields

2013-09-03 Thread Shawn Heisey
On 9/3/2013 3:59 AM, devendra W wrote: in my case - the fields with apostrophe not returned in results Don't use special characters in field names. If it wouldn't work as a variable name, function name (or other identifier) in a typical programming language (Java, C, Perl), then it will

Solr Cloud hangs when replicating updates

2013-09-03 Thread Kevin Osborn
I was having problems updating SolrCloud with a large batch of records. The records are coming in bursts with lulls between updates. At first, I just tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. When hung, I can still query Solr. But I cannot do any

Re: Apostrophes in fields

2013-09-03 Thread Jack Krupansky
Show us your full field type with analyzer. I suspect that the problem is that one of the index-time filters is turning dev's into devs (WDF does that), but at query-time there is no filter that removes a trailing apostrophe. Use the Solr Admin UI Analysis page to see how dev's gets indexed

Re: Change the score of a document based on the *value* of a multifield using dismax

2013-09-03 Thread David Smiley (@MITRE.org)
If you want to alter the score in a customized way based on indexed text data on a per-value basis then index Lucene payloads, and use PayloadTermQuery. See the javadocs for PayloadTermQuery in particular and follow the references. This is a bit dated but read this:

Re: distributed query result order tie break question

2013-09-03 Thread Michael Sokolov
On 09/03/2013 12:50 PM, Chris Hostetter wrote: : like to understand how the ordering is defined so that I can compute an : integer that is sorted in the same way. For example (shard id 24) | : docid or something like that. If you want to ensure a consistent ordering, you have to index a

Re: Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field

2013-09-03 Thread Greg Preston
Our index is too large to uninvert on the fly, so we've been looking into using DocValues to keep a particular field uninverted at index time. See http://wiki.apache.org/solr/DocValues I don't know if this will solve your problem, but it might be worth trying it out. -Greg On Tue, Sep 3, 2013

SolrCloud 4.x hangs under high update volume

2013-09-03 Thread Tim Vaillancourt
Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates

mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

2013-09-03 Thread Naomi Dushay
When I have a field using CJKBigramFilter, parsed CJK chars have a different parsedQuery than non-CJK queries. (旧小说 is 3 chars, so 2 bigrams) args sent in: q={!qf=bi_fld}旧小说&pf=&pf2=&pf3= debugQuery: <str name="rawquerystring">{!qf=bi_fld}旧小说</str> <str

Re: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

2013-09-03 Thread Naomi Dushay
Re the relevancy changes I note below for edismax, there are already some issues filed: pertaining to the difference in how the phrase queries are merged into the main query: See Michael Dodsworth's comment of 25/Sep/12 on this issue: https://issues.apache.org/jira/browse/SOLR-2058 --

Re: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

2013-09-03 Thread Jack Krupansky
The query parser sees q=foo bar as two separate source query terms and analyzes each separately, but q=旧小说 is seen by the query parser as a single source query term and then that one source query term gets tokenized by the query term analyzer as two CJK bigrams. Try q=foo-bar and you should
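
The two-stage behaviour described above (the query parser splits on whitespace first, then the field analyzer tokenizes each source term) can be imitated with a toy Python model. This is not Lucene's CJKBigramFilter or the edismax parser; `is_cjk`, `analyze`, and `parse` are made-up names, and the CJK test covers only the basic Unified Ideographs block.

```python
def is_cjk(ch):
    """Rough check: is ch in the basic CJK Unified Ideographs block?"""
    return "\u4e00" <= ch <= "\u9fff"


def analyze(term):
    """Toy analyzer: an all-CJK term becomes overlapping bigrams,
    any other term passes through unchanged."""
    if len(term) > 1 and all(is_cjk(ch) for ch in term):
        return [term[i:i + 2] for i in range(len(term) - 1)]
    return [term]


def parse(query):
    """Toy parser: split on whitespace into source terms, then
    analyze each source term separately."""
    return [analyze(term) for term in query.split()]


if __name__ == "__main__":
    print(parse("foo bar"))   # two source terms, one token each
    print(parse("旧小说"))     # one source term, two bigram tokens
```

The difference in shape, two source terms with one token each versus one source term with two tokens, is the distinction the reply draws between q=foo bar and q=旧小说.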