Re: having trouble storing large text blob fields - returns binary address in search results

2013-05-17 Thread Gora Mohanty
On 18 May 2013 02:24, geeky2 wrote: > Hello Gora, > > > thank you for the reply - > > i did finally get this to work. i had to cast the column in the DIH to a > clob - like this. > > cast(att.attr_val AS clob) as attr_val, > cast(rsr.rsr_val AS clob) as rsr_val, > > once this was done, th

Re: protect solr pages

2013-05-17 Thread Tim Vaillancourt
A lot of people (including me) are asking for this type of support in this JIRA: https://issues.apache.org/jira/browse/SOLR-4470 Although brought up frequently on the list, the effort doesn't seem to be moving too much. I can confirm that the most recent patch on this JIRA will work with the spec

Re: HttpClient version

2013-05-17 Thread Shawn Heisey
On 5/17/2013 5:06 PM, Jamie Johnson wrote: > I am trying to use Solr inside of another framework (Storm) that provides a > version of HttpClient (4.1.x) that is incompatible with the latest version > that SolrJ requires (4.3.x). Is there a way to use the older version of > HttpClient with SolrJ?

HttpClient version

2013-05-17 Thread Jamie Johnson
I am trying to use Solr inside of another framework (Storm) that provides a version of HttpClient (4.1.x) that is incompatible with the latest version that SolrJ requires (4.3.x). Is there a way to use the older version of HttpClient with SolrJ? Are there any issues with using an earlier SolrJ (4

Re: Question about attributes

2013-05-17 Thread Jack Krupansky
Maybe you want to set the payload for each term, based on your animal attribute. Then there is minimal support in Solr for payloads. There is no immediate filter for capturing an arbitrary attribute. Take a look at TypeAsPayloadTokenFilterFactory . You could do something similar, like AnimalAs

RE: Speed up import of Hierarchical Data

2013-05-17 Thread O. Olson
Thank you James. I think I got this to work using CachedSqlEntityProcessor – and it seems extremely fast. I will try SortedMapBackedCache on Monday :-). Thank you, O. O. Dyer, James-2 wrote > Using SqlEntityProcessor with cacheImpl="SortedMapBackedCache" is the same > as specifying "CachedSqlEn

Question about attributes

2013-05-17 Thread Thomas Portegys
First time on forum. We are planning to use Solr to house some data mining information, and we are thinking of using attributes to add some semantic information to indexed content. As a test, I wrote a filter that adds an "animal" attribute to tokens like "dog", "cat", etc. After adding a document

RE: Is payload the right solution for my problem?

2013-05-17 Thread Petersen, Robert
Hi It will not be double the disk space at all. You will not need to store the field you search, only the field being returned needs to be stored. Furthermore if you are not searching the XML field you will not need to index that field, only store it. Hope that helps, Robi -Original Mes

Re: having trouble storing large text blob fields - returns binary address in search results

2013-05-17 Thread geeky2
Hello Gora, thank you for the reply - i did finally get this to work. i had to cast the column in the DIH to a clob - like this. cast(att.attr_val AS clob) as attr_val, cast(rsr.rsr_val AS clob) as rsr_val, once this was done, the ClobTransformer worked. to my knowledge - this parti

Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-17 Thread vsilgalis
As an example, I have 9 SOLR nodes (3 clusters of 3) using different versions of SOLR (4.1, 4.1, and 4.2.1), utilizing the same zookeeper ensemble (3 servers), using chroot for the different configs across clusters. My zookeeper servers are just VMs, dual-core with 1GB of RAM and are only used for

Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-17 Thread Furkan KAMACI
Hi Mark; Thanks for the answer. Do Solr nodes hold the current state of the cluster (which the Zookeeper ensemble knows) in their cache/RAM? 2013/5/17 Mark Miller > The way Solr uses ZK, unless you are also using ZK with something else, I > wouldn't worry about it at all. In a steady state, the

Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-17 Thread Lance Norskog
This is great; data like this is rare. Can you tell us any hardware or throughput numbers? On 05/17/2013 12:29 PM, Rishi Easwaran wrote: Hi All, Its Friday 3:00pm, warm & sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team and we use SOLR for our

Re: Solr 4 memory usage increase

2013-05-17 Thread Wei Zhao
We have a master/slave setup. We disabled autocommits/autosoftcommits, so the slave only replicates from the master and serves queries. The master does all the indexing and commits every 5 minutes. The slave polls the master every 2.5 minutes and replicates. Both tests, with Solr 3.5 and 4.2, were run with the same

protect solr pages

2013-05-17 Thread gpssolr2020
Hi, I want to implement security through a Jetty realm in Solr 4. I configured the related settings in realm.properties, jetty.xml, and webdefault.xml under /solrhome/example/etc, but it is still not working. Please advise. Thanks.

Re: Solr 4 memory usage increase

2013-05-17 Thread Wei Zhao
Here is the JVM info: $ java -version java version "1.6.0_26" Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-17 Thread Rishi Easwaran
Hi All, It's Friday 3:00pm, warm & sunny outside and it was a good week. Figured I'd share some good news. I work for the AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR community. We deal with millions of indexes and b

Re: Facet pivot 50.000.000 different values

2013-05-17 Thread Mikhail Khludnev
On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla wrote: > We > only need to calculate how many different "B" values have more than 1 > document but it takes ages > Carlos, It's not clear whether you need to take results of a query into account or just gather statistics from index. if later you ca

Re: having trouble storing large text blob fields - returns binary address in search results

2013-05-17 Thread Gora Mohanty
On 17 May 2013 00:02, geeky2 wrote: [...] > i have tried setting them up as clob fields - but this is not working (see > details below) > > i have also tried treating them as plain string fields (removing the > references to clob in the DIH) - but this does not work either. > > > DIH configuration

Re: Solr 4 memory usage increase

2013-05-17 Thread William Bell
Yeah how to turn off index writer ? On Friday, May 17, 2013, Andre Bois-Crettez wrote: > Can you explain your setup more ? > ie. is it master/slave, indexing in parallel, etc ? > > We had to commit more often to reduce JVM memory usage due to > transaction logs in SolrCloud mode, compared with pr

Re: Java heap space exception in 4.2.1

2013-05-17 Thread J Mohamed Zahoor
Memory increases a lot with queries that have facets… ./Zahoor On 17-May-2013, at 10:00 PM, Shawn Heisey wrote: > On 5/17/2013 1:17 AM, J Mohamed Zahoor wrote: >> I moved to 4.2.1 from 4.1 recently.. everything was working fine until i >> added few more stats query.. >> Now i am getting thi

Re: Is payload the right solution for my problem?

2013-05-17 Thread jasimop
I think I just found the solution. Would the right strategy be to store the original XML content and then use a solr.HTMLStripCharFilterFactory when querying? I just made a quick test and it works; the only problem now is that it also finds the data contained in the XML attribute fields. I think I

Bugs with edismax parser

2013-05-17 Thread anonymous user
Hi Solr Users, I've been migrating our existing app from Solr 3.6.2 to Solr 4.3 and I've come across some strange behaviour that I think demonstrate one or more bugs in the edismax parser. -- Setup -- with a clean copy of solr-4

Re: StandardTokenizer vs. hyphens

2013-05-17 Thread Shawn Heisey
On 5/17/2013 10:26 AM, Kai Gülzau wrote: > Is there some StandardTokenizer Implementation which does not break words on > hyphens? > > I think it would be more flexible to retain hyphens and use a > WordDelimiterFactory to split these tokens. You can use the whitespace tokenizer with WDF. This

Re: Using the Collections API

2013-05-17 Thread Mark Miller
What version of Solr? I think there was a bug a couple versions back (perhaps introduced in 4.1 if I remember right) that made it so creates were not spread correctly. - Mark

Re: Java heap space exception in 4.2.1

2013-05-17 Thread Shawn Heisey
On 5/17/2013 1:17 AM, J Mohamed Zahoor wrote: > I moved to 4.2.1 from 4.1 recently.. everything was working fine until i > added few more stats query.. > Now i am getting this error frequently that solr does not run even for 2 > minutes continuously. > All 5GB is getting used instantaneously in f

StandardTokenizer vs. hyphens

2013-05-17 Thread Kai Gülzau
Is there some StandardTokenizer Implementation which does not break words on hyphens? I think it would be more flexible to retain hyphens and use a WordDelimiterFactory to split these tokens. StandardTokenizer today: doc1: email -> email doc2: e-mail -> e|mail doc3: e mail -> e|mail query1: e

Re: Using the Collections API

2013-05-17 Thread Shawn Heisey
On 5/17/2013 4:03 AM, Jared Rodriguez wrote: > So it sounds like you want the collection created with a master and a > replica and you want one to be on each node? If so, I believe that you can > get that effect by specifying maxShardsPerNode=1 as part of your url line. > This will tell solr to c

Re: Solr 4 memory usage increase

2013-05-17 Thread Andre Bois-Crettez
Can you explain your setup more ? ie. is it master/slave, indexing in parallel, etc ? We had to commit more often to reduce JVM memory usage due to transaction logs in SolrCloud mode, compared with previous setups without tlogs. update?commit=true&openSearcher=false André On 05/17/2013 09:56 A

Re: Facet pivot 50.000.000 different values

2013-05-17 Thread Shawn Heisey
On 5/17/2013 2:47 AM, Carlos Bonilla wrote: > To calculate some stats we are using a field "B" with 50.000. different > values as facet pivot in a schema that contains 200.000.000 documents. We > only need to calculate how many different "B" values have more than 1 > document but it takes ages.

Re: error while switching from log4j back to slf4j with solr 4.3

2013-05-17 Thread Shawn Heisey
On 5/17/2013 12:25 AM, Bernd Fehling wrote: > Actually there is no "real" container in my eclipse debugging env :-( > /opt/indigo/eclipse/configuration/org.eclipse.osgi/bundles/884/1/.cp/lib/jetty-webapp-8.1.2.v20120308.jar > > Then it should be copied to lib/ext of eclipse/run-jetty-run > "java.

Re: Question about Edismax - Solr 4.0

2013-05-17 Thread Sandeep Mestry
Hello Jack, Thanks for pointing the issues out and for your valuable suggestion. My preliminary tests were okay on search but I will be doing more testing to see if this has impacted any other searches. Thanks once again and have a nice sunny weekend, Sandeep On 17 May 2013 05:35, Jack Krupansk

Re: Java heap space exception in 4.2.1

2013-05-17 Thread J Mohamed Zahoor
Hprof introspection shows that huge double arrays are using up 75% of heap space... and they belong to Lucene's FieldCache.. ./zahoor On 17-May-2013, at 12:47 PM, J Mohamed Zahoor wrote: > Hi > > I moved to 4.2.1 from 4.1 recently.. everything was working fine until i > added few more stats qu

Keyword aware Tokenizer?

2013-05-17 Thread Kai Gülzau
Does anybody know of a tokenizer which can be configured with (multiple) regular expressions to mark some of the input text as keyword and behave like StandardTokenizer (or UAX29URLEmailTokenizer) otherwise? Input: Does my order 4711.0815!-somecode_and.other(stuff) arrive on friday? Tokens: does

RE: Speed up import of Hierarchical Data

2013-05-17 Thread Dyer, James
Using SqlEntityProcessor with cacheImpl="SortedMapBackedCache" is the same as specifying "CachedSqlEntityProcessor". Because the pluggable caches are only partially committed, I never added details to the wiki, so it still refers to CachedSEP. But it's the same thing. What is new here, though,

Re: Adding filed in Schema.xml

2013-05-17 Thread Alexandre Rafalovitch
Do you have the types corresponding to those fields present? Specifically, "long". You don't get any special type names out of the box, they all need to be present in types area. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovit

Bloom Filters

2013-05-17 Thread Isaac Hebsh
Hi everyone.. I'm indexing docs into Solr using the update request handler, by POSTing data to the REST endpoint (not SolrJ, not DIH). My indexer should return an indication of whether the document already existed in the collection, based on its ID. The obvious solution is to perform a query

Re: SurroundQParser does not analyze the query text

2013-05-17 Thread Isaac Hebsh
Thank you Erik and Jack. I opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-4834 I hope I will have time to submit a patch file soon. On Fri, May 17, 2013 at 7:38 AM, Jack Krupansky wrote: > (Erik: Or he can get the LucidWorks Search product and then use "near" and > "before" ope

Re: indexing unrelated tables in single core

2013-05-17 Thread Gora Mohanty
On 16 May 2013 19:11, Rohan Thakur wrote: > hi Mohanty > > I tried what you suggested of using id as common field and changing the SQL > query to point to id > and using id as uniqueKey > it is working but now what it is doing is just keeping the id's that are > not same in both the tables and dis

Re: Solr cloud Some basic questions

2013-05-17 Thread Jack Krupansky
Start by simply experimenting with 2 shards and 2 replicas - 4 nodes. And just run zk on the nodes themselves for simple experiments. It's better to deploy zk separate from the Solr nodes, but for simple testing it shouldn't matter. Get experience with SolrCloud using a simple configuration befo

Adding filed in Schema.xml

2013-05-17 Thread Kamal Palei
Hi All I am trying to add few fields in schema.xml file as below. * * Only the "last_updated_date" (the one in bold letters) getting added. Is there any syntax issue with other 4 entries. Kindly let me know. Thanks kamal

Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-17 Thread Mark Miller
The way Solr uses ZK, unless you are also using ZK with something else, I wouldn't worry about it at all. In a steady state, the cluster won't even really talk to ZK in any intensive manner at all. - Mark On May 16, 2013, at 5:07 PM, Furkan KAMACI wrote: > Hi Shawn; > > I will have totally 1

Insane FieldCache usage when using group.facet=true

2013-05-17 Thread Elisabeth Adler
Dear all, I am running a grouped query including facets in my Junit Test cases against a Solr 4.2.1 Embedded Server. When faceting the groups, I want the counts to reflect the number of groups, not the number of documents. But when I enable "&group.facet=true" on the query, the test fails with the

Re: Searching for terms having embedded white spaces like "word1 word2"

2013-05-17 Thread Jack Krupansky
Is this really a text field where you want to search for tokenized keywords? Or is it a string field where you wish strictly to deal with equality of the entire string or explicit wildcards for substring matches, as you've shown? You haven't told us your full requirements for this field. The st

Re: Using the Collections API

2013-05-17 Thread A.Eibner
Hi Jared, yes that is what I want to achieve: Creating a master and a replica and I want them to be separate nodes. I just realized that I posted the wrong URL, I was already using the parameter maxShardsPerNode=1. But just to be sure, I also tried it with your URL and I get the same result

Searching for terms having embedded white spaces like "word1 word2"

2013-05-17 Thread kobe.free.wo...@gmail.com
Hi Guys, I have a field defined with the following custom data type, This field has values like "SAN MIGUEL","SAN JUAN","SAN DIEGO" etc. I wish to perform a "Starts With" and "Contains" search on

Re: Facet pivot 50.000.000 different values

2013-05-17 Thread Carlos Bonilla
Sorry, 16 GB RAM (not 8). 2013/5/17 Carlos Bonilla > Hi, > To calculate some stats we are using a field "B" with 50.000. > different values as facet pivot in a schema that contains 200.000.000 > documents. We only need to calculate how many different "B" values have > more than 1 document b

Re: Using the Collections API

2013-05-17 Thread Jared Rodriguez
Hi Alexander, So it sounds like you want the collection created with a master and a replica and you want one to be on each node? If so, I believe that you can get that effect by specifying maxShardsPerNode=1 as part of your url line. This will tell solr to create the master and replica that you
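Editorial note: the request discussed in this thread can be sketched as follows. This is only an illustration of how the query string is assembled from the parameters quoted in the thread (action, name, numShards, replicationFactor, maxShardsPerNode) and the host app02:9985; the helper name create_collection_url is hypothetical, collection.configName is left out because its value is cut off in the archive, and nothing is sent to a server.

```python
from urllib.parse import urlencode


def create_collection_url(host, name, num_shards, replication_factor,
                          max_shards_per_node):
    """Build a Collections API CREATE URL (hypothetical helper, no I/O)."""
    # Parameter names are the ones quoted in the thread; dict order is kept,
    # so the query string comes out in the order listed here.
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"http://{host}/solr/admin/collections?{urlencode(params)}"


# One shard, two copies, at most one core of the collection per node.
url = create_collection_url("app02:9985", "storage", 1, 2, 1)
print(url)
```

Whether this actually places one core on each node depends on the Solr version; Mark Miller's reply in this thread mentions a bug in some 4.x releases that affected how creates were spread.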

Re: Issue with getting highlight with hl.maxAnalyzedChars = -1

2013-05-17 Thread Dmitry Kan
Can you solve by retaining hl.maxAnalyzedChars=maxLength+buffer, where maxLength is the max length of your text field plus some reasonable buffer on top? On Tue, May 14, 2013 at 1:03 PM, meghana wrote: > Hi, > > Query pasted in my post, is returning 1 record with 0 highlights, if i just > remov

Re: Using the Collections API

2013-05-17 Thread A.Eibner
Hi, sorry for the delay. I have two live nodes (also zookeeper knows these two [app02:9985_solr,app03:9985_solr]) But when I want to create a collection via: http://app02:9985/solr/admin/collections?action=CREATE&name=storage&numShards=1&replicationFactor=2&collection.configName=storage-conf

Facet pivot 50.000.000 different values

2013-05-17 Thread Carlos Bonilla
Hi, To calculate some stats we are using a field "B" with 50.000.000 different values as facet pivot in a schema that contains 200.000.000 documents. We only need to calculate how many different "B" values have more than 1 document but it takes ages. Is there any other better way/configuration

Re: Solr 4 memory usage increase

2013-05-17 Thread J Mohamed Zahoor
I get the same issue in 1.7.0_09-b05 also. ./zahoor On 17-May-2013, at 12:07 PM, Walter Underwood wrote: > It is past time to get off of Java 6. That is dead. End of life. No more > updates, not even for security bugs. > > What version of Java 6? Some earlier versions had bad bugs that Solr

Re: Explicite update or delete of a dataset

2013-05-17 Thread Peter Schütt
Hallo, > To delete: > > curl http://localhost:8983/solr/update?commit=true \ > -H 'Content-type:application/json' \ > -d '{"delete": {"id":"doc-0001"}}' I try it in this way: http://localhost:9180/solr/mycore/update?commit=true&stream.body= oid:"A6DBFADE63A75054E043AC1C02205054" and i
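Editorial note: a minimal sketch of the JSON delete request quoted above, under stated assumptions. The delete-by-id body is the one from the quoted curl command; the delete-by-query body is an assumption about what the poster intended with the oid value, since the original stream.body content did not survive in the archive. The helper build_delete_request is hypothetical and only builds the URL and body; it does not contact Solr.

```python
import json
from urllib.parse import urlencode


def build_delete_request(base_url, core, payload):
    """Return (url, body) for a JSON delete against /update (no I/O)."""
    query = urlencode({"commit": "true"})
    url = f"{base_url}/solr/{core}/update?{query}"
    # The body must be sent with Content-type: application/json.
    return url, json.dumps(payload)


# Delete by id, as in the quoted curl command.
by_id = build_delete_request(
    "http://localhost:9180", "mycore", {"delete": {"id": "doc-0001"}})

# Delete by query (assumed intent of the poster's oid example).
by_query = build_delete_request(
    "http://localhost:9180", "mycore",
    {"delete": {"query": 'oid:"A6DBFADE63A75054E043AC1C02205054"'}})

print(by_id[0])
print(by_id[1])
print(by_query[1])
```

Building the body with json.dumps avoids the quoting problems that come with pasting a query containing double quotes into a URL by hand.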

Java heap space exception in 4.2.1

2013-05-17 Thread J Mohamed Zahoor
Hi I moved to 4.2.1 from 4.1 recently.. everything was working fine until i added few more stats query.. Now i am getting this error frequently that solr does not run even for 2 minutes continuously. All 5GB is getting used instantaneously in few queries... SEVERE: null:java.lang.RuntimeExcep