Solr XML Messages

2014-04-07 Thread Александр Вандышев
Can you tell me whether it is possible to use Solr XML Messages for indexing via the update extract handler?
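For reference, the Solr XML update format wraps documents in an <add> element, with one <doc> per document and one <field name="..."> element per value. Such messages are normally posted to the /update handler, whereas /update/extract runs Tika over a binary file. Below is a minimal Python sketch that builds such a message with the standard library. The helper build_add_message and the field names id and title are illustrative assumptions, not part of the thread.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> update message from a list of dicts (hypothetical helper)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multi-valued field is written as repeated <field> elements with the same name.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    # encoding="unicode" returns a str without an XML declaration; ET escapes &, < and >.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message([{"id": "doc1", "title": "Hello Solr"}]))
```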

Re: Anyone going to ApacheCon in Denver next week?

2014-04-07 Thread William Bell
Tuesday AM, around 8am, might be a good time to meet up. How about the Starbucks on the 1st floor of the Westin? I'll be there. Any takers? Bill Bell. On Sun, Apr 6, 2014 at 4:04 PM, Jack Krupansky j...@basetechnology.com wrote: I'm here as well, representing DataStax for Apache Cassandra and Solr.

RE: Anyone going to ApacheCon in Denver next week?

2014-04-07 Thread Doug Turnbull
Tuesday 8 AM sounds great. Sent from my Windows Phone From: William Bell Sent: 4/7/2014 12:29 AM To: solr-user@lucene.apache.org Subject: Re: Anyone going to ApacheCon in Denver next week? Tuesday AM like 8am might be a good time to meet up. How about at Westin at Starbucks on the 1st floor?

Re: TikaEntityProcessor Exception Handling

2014-04-07 Thread akash2489
Any updates on this? -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcesor-Exception-Handling-tp3502495p4129580.html Sent from the Solr - User mailing list archive at Nabble.com.

ArrayIndexOutOfBoundsException while reindexing via DIH

2014-04-07 Thread Ralf Matulat
Hi, we are currently facing a new problem while reindexing one of our SOLR 4.4 instances: We are using SOLR 4.4 getting data via DIH out of a MySQL Server. The data is constantly growing. We have reindexed our data a lot of times without any trouble. The problem can be reproduced. There is

Routing distance with Solr?

2014-04-07 Thread Matteo Tarantino
Hi all, this is my first message on this mailing list, so I hope I'm doing everything correctly. My problem is: I have to create a search engine for dealers that are within a well-defined routing distance from the address entered by the user. I have already used Solr for some previous projects, but I never

RE: Query and field name with wildcard

2014-04-07 Thread Croci Francesco Luigi (ID SWS)
Hello Alex, I saw your example and took it as a template for my needs. I tried the aliasing but, maybe because I did it wrong, it does not work... error: {"msg": "undefined field all", "code": 400} Here is a snippet of my solrconfig.xml: ... <requestHandler name="/select"

Re: Block Join Parent Query across children docs

2014-04-07 Thread mertens
Thanks Hoss, with the filter queries it works. I was trying to use a normal query from Mikhail's blog that looked like this: q={!parent which=type_s:parent}+search_t:item1 +search_t:item2 -search_t:item3 That query doesn't work for me but the filter query does just what I want. ps last years

RE: Query and field name with wildcard

2014-04-07 Thread Croci Francesco Luigi (ID SWS)
Sorry, I found the problem myself... I was using /select, where <str name="defType">edismax</str> was not defined. The other two, /selectEN and /selectDE, worked. Adding edismax to /select made it work too. Ciao, Francesco -----Original Message----- From: Croci Francesco Luigi (ID SWS)

Bad request on update.distrib=FROMLEADER

2014-04-07 Thread Gastone Penzo
Hello, I have a problem with a bad request while indexing data. I have four nodes with SolrCloud. The architecture is this: [ASCII diagram garbled in the archive; recoverable labels: 10.0.0.86, 10.0.0.87, NODE1, NODE 2, NODE 3]

Re: Using Sentence Information For Snippet Generation

2014-04-07 Thread Dmitry Kan
Furkan, I haven't worked with the boundary scanner before, but one thing I had to tweak with position increments was the highlighter component itself, because it started to throw exceptions. The solution is described in this thread (a conversation with myself :) )

Re: Block Join Parent Query across children docs

2014-04-07 Thread Mikhail Khludnev
For the sake of completeness, here is the same query without fq: q=+{!parent which=type_s:parent}search_t:item1 +{!parent which=type_s:parent}search_t:item2 -{!parent which=type_s:parent}search_t:item3 Here is more detail about the first symbol magic
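Here is a Python sketch that generates this form of the query and URL-encodes it, assuming the field and parent filter from the query above. The helper block_join_q is hypothetical. Each required term gets its own {!parent} clause, so the terms may match different children of the same parent. The leading + has to be sent as %2B, because a raw + in a URL decodes to a space.

```python
from urllib.parse import urlencode, parse_qs

PARENT_FILTER = "{!parent which=type_s:parent}"


def block_join_q(field, required, excluded):
    """Build a q string with one {!parent} clause per child term (hypothetical helper).

    Each clause joins child matches up to parents independently, so every
    term may match a different child of the same parent document.
    """
    clauses = [f"+{PARENT_FILTER}{field}:{t}" for t in required]
    clauses += [f"-{PARENT_FILTER}{field}:{t}" for t in excluded]
    return " ".join(clauses)


q = block_join_q("search_t", ["item1", "item2"], ["item3"])
# urlencode escapes the leading '+' as %2B; a raw '+' in a URL would be decoded as a space.
query_string = urlencode({"q": q})
print(q)
print(query_string)
```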

what is geodist default value

2014-04-07 Thread Aman Tandon
Hello, in my index I am using LatLonType so that I can use geodist to calculate distances, and I am calling it like geodist(lat, lon, location). Can anybody tell me what value geodist will return if I pass geodist(0, 0, location)? Thanks, Aman Tandon

Re: Block Join Parent Query across children docs

2014-04-07 Thread mertens
Yeah, that works also for me. Thanks Mikhail. On Mon, Apr 7, 2014 at 12:42 PM, Mikhail Khludnev [via Lucene] ml-node+s472066n4129604...@n3.nabble.com wrote: for sake of completeness, here is the same query w/o fq q=+{!parent which=type_s:parent}search_t:item1 +{!parent

RE: Solr interface

2014-04-07 Thread Jonathan Varsanik
Do you mean to tell me that the people on this list that are indexing 100s of millions of documents are doing this over http? I have been using custom Lucene code to index files, as I thought this would be faster for many documents and I wanted some non-standard OCR and index fields. Is there

converting 4.7 index to 4.3.1

2014-04-07 Thread Dmitry Kan
Dear list, We have been generating Solr indices with the solr-hadoop contrib module (SOLR-1301). The Solr version we currently use is 4.3.1. Is there any tool that could do the backward conversion, i.e. 4.7 -> 4.3.1? Or is upgrading the only way to go? -- Dmitry Blog:

Re: ngramfilter minGramSize problem

2014-04-07 Thread Andreas Owen
It works well. Now, why does the search only find something when the field name is added to a query containing stopwords? "cug" - 9 hits, "mit cug" - 0 hits, "plain_text:mit cug" - 9 hits. Why is this so? Could it be a problem that stopwords aren't used in the query because not all fields that are search

Exactly Matching for Elevator

2014-04-07 Thread Furkan KAMACI
I've defined an elevator like this: <elevate> <query text="rüna telecom"> <doc id="id1" /> </query> <query text="rünatelecom"> <doc id="id1" /> </query> <query text="runa telecom"> <doc id="id1" /> </query> <query text="runatelecom"> <doc id="id1" /> </query> </elevate> When I send a query, it gives this error:

Re: Commit Within and /update/extract handler

2014-04-07 Thread Erick Erickson
You say you see the commit happen in the log; is openSearcher specified? This sounds like you're somehow getting a commit with openSearcher=false... Best, Erick On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote: I'm running solr 4.6.0 and am noticing that commitWithin

Re: Solr XML Messages

2014-04-07 Thread Erick Erickson
See: https://tika.apache.org/1.4/formats.html Short answer: yes. Longer answer: it would be a lot easier to reply meaningfully if you told us what you are trying to do. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Sun, Apr 6, 2014 at 11:20 PM,

Regex for hl.bs.chars

2014-04-07 Thread Furkan KAMACI
Can I define a pattern for hl.bs.chars? I mean, *$* marks the start or end of a string in my documents, and I want to define it as a regex for hl.bs.chars. On the other hand, I do not currently use termVectors=on, termPositions=on and termOffsets=on on my fields. Does it cause a performance issue or

RE: Solr interface

2014-04-07 Thread Toke Eskildsen
On Mon, 2014-04-07 at 13:52 +0200, Jonathan Varsanik wrote: Do you mean to tell me that the people on this list that are indexing 100s of millions of documents are doing this over http? Some of us do. Our net archive indexer runs a lot of Tika processes that send their analysed documents

Re: Routing distance with Solr?

2014-04-07 Thread david.w.smi...@gmail.com
Hi, This is definitely not possible with Solr. Use GraphHopper. ~ David On Mon, Apr 7, 2014 at 5:09 AM, Matteo Tarantino matteo.tarant...@gmail.com wrote: Hi all, this is my first message on this mailing list, so I hope I'm doing all correctly. My problem is: I have to create a search

Re: what is geodist default value

2014-04-07 Thread david.w.smi...@gmail.com
Hi, I'm not sure why you are asking or maybe I'm not getting what you *really* want to know. You'll get the geodesic distance (i.e. the great circle distance, the distance on the surface of a sphere) from 0,0 (off the coast of Africa), to each point indexed in your location field. ~ David On
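The great-circle distance David describes can be sketched with the haversine formula. This Python version assumes a spherical Earth with a mean radius of about 6371 km. geodist reports kilometers, but Solr's exact radius constant may differ in the last digits, so treat the numbers as approximate.

```python
import math

# Assumed mean Earth radius in km; Solr's own constant may differ slightly.
EARTH_RADIUS_KM = 6371.0087714


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Distance from 0,0 (off the coast of Africa) to a point indexed at 45.0,9.0
print(round(great_circle_km(0, 0, 45.0, 9.0), 1))
```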

Re: Solr interface

2014-04-07 Thread Andre Bois-Crettez
You can use Solrj : https://wiki.apache.org/solr/Solrj Anyway, even using http the performance is good. André On 2014-04-07 13:52, Jonathan Varsanik wrote: Do you mean to tell me that the people on this list that are indexing 100s of millions of documents are doing this over http? I have

Re: Solr Search on Fields name

2014-04-07 Thread anuragwalia
Thanks Ahmat and Jack for replying. I found another way to solve the problem by using a filter query: fq=RuleA:*+OR+RuleC:* but, due to our development platform, query parsing gets stuck somewhere else. Hopefully it will work for me once the platform is fixed. I will get back to you if any other issue occurs.
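In the filter query above, each + is the URL encoding of a space, so Solr receives the filter RuleA:* OR RuleC:*. When a platform assembles the URL itself, letting a standard encoder produce it avoids double or missing encoding. Here is a small Python check using only urllib.parse.

```python
from urllib.parse import urlencode, parse_qs

fq = "RuleA:* OR RuleC:*"
encoded = urlencode({"fq": fq})
print(encoded)  # fq=RuleA%3A%2A+OR+RuleC%3A%2A

# Decoding the hand-written form gives back the same filter, because '+' means a space.
decoded = parse_qs("fq=RuleA:*+OR+RuleC:*")["fq"][0]
print(decoded == fq)  # True
```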

Re: ArrayIndexOutOfBoundsException while reindexing via DIH

2014-04-07 Thread Shawn Heisey
On 4/7/2014 3:00 AM, Ralf Matulat wrote: we are currently facing a new problem while reindexing one of our SOLR 4.4 instances: We are using SOLR 4.4 getting data via DIH out of a MySQL Server. The data is constantly growing. We have reindexed our data a lot of times without any trouble. The

Re: Commit Within and /update/extract handler

2014-04-07 Thread Erick Erickson
What does the call look like? Are you opening a new searcher or not? That should be in the log line where the commit is recorded... FWIW, Erick On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote: I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to

Duplicate Unique Key

2014-04-07 Thread Simon
Hi all, I know someone has posted a similar question before, but my case is a little different: I don't have the schema setup issue mentioned in those posts, yet I still get duplicate records. My unique key in the schema is <field name="id" type="string" indexed="true" stored="true"

Re: Solr interface

2014-04-07 Thread Shawn Heisey
On 4/7/2014 5:52 AM, Jonathan Varsanik wrote: Do you mean to tell me that the people on this list that are indexing 100s of millions of documents are doing this over http? I have been using custom Lucene code to index files, as I thought this would be faster for many documents and I wanted

Regex For *|* at hl.regex.pattern

2014-04-07 Thread Furkan KAMACI
Hi; I tried this but it does not work. Am I missing anything? q=portu&hl.regex.pattern=.*\*\|\*.*&hl.fragsize=120&hl.regex.slop=0.2 My aim is to check whether the text includes *|* or not (that's why I've put .* at the beginning and end of the regex, to match whatever surrounds it). How can I fix it? Thanks; Furkan
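Two things can be checked outside Solr. First, that the pattern matches text containing a literal *|*. Python's re and java.util.regex treat \*, \| and .* the same way here. Second, that the parameters are joined with &. The sketch below also sets hl=true and hl.fragmenter=regex, since the hl.regex.* parameters only apply when the regex fragmenter is selected. Whether that is the cause of the failure here is an assumption.

```python
import re
from urllib.parse import urlencode

# The pattern from the question: anything, a literal "*|*", then anything.
pattern = r".*\*\|\*.*"

print(bool(re.search(pattern, "foo *|* bar")))  # True
print(bool(re.search(pattern, "foo | bar")))    # False

params = urlencode({
    "q": "portu",
    "hl": "true",
    "hl.fragmenter": "regex",  # hl.regex.* parameters are used by the regex fragmenter
    "hl.regex.pattern": pattern,
    "hl.fragsize": 120,
    "hl.regex.slop": 0.2,
})
print(params)
```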

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Gregg Donovan
That was my first attempt, but it's much trickier than I anticipated. A filter that calls HttpServletRequest#getParameter() before SolrDispatchFilter will trigger an exception -- see getParameterIncompatibilityException [1] -- if the request is a POST. It seems that Solr depends on the

Re: Duplicate Unique Key

2014-04-07 Thread Erick Erickson
Hmmm, that's odd. I just tried it (admittedly with post.jar rather than SolrJ) and it works just fine. What server are you using (e.g. CloudSolrServer)? And can you create a self-contained program that illustrates the problem? Best, Erick On Mon, Apr 7, 2014 at 8:50 AM, Simon

Re: Regex For *|* at hl.regex.pattern

2014-04-07 Thread Furkan KAMACI
One more question: does that regex work on the analyzed field or on the raw data? 2014-04-07 19:21 GMT+03:00 Furkan KAMACI furkankam...@gmail.com: Hi; I try that but it does not work do I miss anything: q=portu&hl.regex.pattern=.*\*\|\*.*&hl.fragsize=120&hl.regex.slop=0.2 My aim is to check whether

Ranking code

2014-04-07 Thread azhar2007
Hi, does anybody know where the ranking code is held? Which file in Solr stores it: schema.xml or solrconfig.xml?

Re: Reading Solr index

2014-04-07 Thread François Schiettecatte
Maybe you should try a more recent release of Luke: https://github.com/DmitryKey/luke/releases François On Apr 7, 2014, at 12:27 PM, azhar2007 azhar2...@outlook.com wrote: Hi All, I have a solr index which is indexed in Solr 4.7.0. I've attempted to open the index with Luke 4.0.0

Reading Solr index

2014-04-07 Thread azhar2007
Hi All, I have a Solr index that was built with Solr 4.7.0. I've attempted to open the index with Luke 4.0.0 and also other versions, with no luck; it gives me an error message. Is there a way of reading the data? I would like to convert the file to a readable format where I can see the terms it

Re: Solr interface

2014-04-07 Thread Daniel Collins
I have to agree with Shawn. We have a SolrCloud setup with 256 shards, ~400M documents in total, with 4-way replication (so it's quite a big setup!) I had thought that HTTP would slow things down, so we recently trialed a JNI approach (clients are C++) so we could call SolrJ and get the benefits

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Alexandre Rafalovitch
So to rephrase: Solr will barf at unknown parameters, so we cannot currently send them in-band. And out-of-band does not work due to the complexity of POST body handling. You are effectively proposing a dynamic set of parameters with a common prefix to stop the complaints, plus the code to propagate those params.

Re: How do I add another unrelated query results to solr index

2014-04-07 Thread sanjay92
I think it was not just rootEntity=true. We need to add transformer="TemplateTransformer" and make sure that each entity has some kind of unique column across all entities, e.g. in this case <field column="doc_id" template="salg_${salgrade.GRADE}" /> is a made-up column and this doc_id
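The reason the template prefix matters can be simulated without DIH. If two entities produce the same raw key, the second document overwrites the first, because Solr replaces a document that has the same unique key. In this Python sketch the emp entity and its EMPNO column are hypothetical; only salgrade and GRADE come from the message above.

```python
def build_docs(entities, use_prefix=True):
    """Simulate indexing rows from several DIH entities into one index keyed on doc_id.

    As in Solr, a later document with the same unique key replaces the earlier one.
    """
    index = {}
    for prefix, key_column, rows in entities:
        for row in rows:
            key = str(row[key_column])
            doc_id = f"{prefix}_{key}" if use_prefix else key
            index[doc_id] = {"doc_id": doc_id, **row}
    return index


salgrade_rows = [{"GRADE": 1}, {"GRADE": 2}]
emp_rows = [{"EMPNO": 1}, {"EMPNO": 2}]  # hypothetical second entity
entities = [("salg", "GRADE", salgrade_rows), ("emp", "EMPNO", emp_rows)]

print(len(build_docs(entities)))                    # 4: salg_1, salg_2, emp_1, emp_2
print(len(build_docs(entities, use_prefix=False)))  # 2: ids "1" and "2" collide
```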

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Michael Sokolov
I had to grapple with something like this problem when I wrote Lux's app-server. I extended SolrDispatchFilter and handle parameter swizzling to keep everything nicey-nicey for Solr while being able to play games with parameters of my own. Perhaps this will give you some ideas:

Re: Ranking code

2014-04-07 Thread Shawn Heisey
On 4/7/2014 10:29 AM, azhar2007 wrote: Hi does anybody know where the ranking code is held. Which file in Solr stores it the solr schema.xml or solrconfig.xml file? Your question is very generic. It needs to be more specific -- what are you actually trying to do? The generic answer is both

Re: Duplicate Unique Key

2014-04-07 Thread Simon
Erick, it's indeed quite odd. After I triggered re-indexing of all documents (via the normal process of the existing program), the duplication is gone. It cannot be reproduced easily, but it did occur occasionally, and that makes it frustrating to troubleshoot. Thanks, Simon

Re: Fetching uniqueKey and other int quickly from documentCache?

2014-04-07 Thread Gregg Donovan
Yonik, requesting fl=unique_key:field(unique_key),secondary_key:field(secondary_key),score instead of fl=unique_key,secondary_key,score was a nice performance win, as unique_key and secondary_key were both already in the fieldCache. In fact, we removed our documentCache, as it got very little use.

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Gregg Donovan
Michael, Thanks! Unfortunately, as we use POSTs, that approach would trigger the getParameterIncompatibilityException call due to the Enumeration of getParameterNames before SolrDispatchFilter has a chance to access the InputStream. I opened https://issues.apache.org/jira/browse/SOLR-5969 to

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-07 Thread Candygram For Mongo
I wanted to take a moment and say thank you for your help. We haven't solved the problem yet but it seems like we may be on the path. Responses to your questions below: 1) We are using settings of 6GBs for -Xmx and -Xms on a production server where this process is failing on about 30 million

Re: Duplicate Unique Key

2014-04-07 Thread Erick Erickson
Oh my yes! I feel a great sense of relief every time an intermittent problem becomes reproducible... The problem is not solved, but at least I have a good feeling that once I don't see it any more it's _really_ gone! One possibility is index merging, see:

Re: Analysis of Japanese characters

2014-04-07 Thread T. Kuro Kurosaka
Tom, You should be using JapaneseAnalyzer (kuromoji). Neither CJK nor ICU tokenize at word boundaries. On 04/02/2014 10:33 AM, Tom Burton-West wrote: Hi Shawn, I'm not sure I understand the problem and why you need to solve it at the ICUTokenizer level rather than the CJKBigramFilter Can you

Re: Analysis of Japanese characters

2014-04-07 Thread Shawn Heisey
On 4/7/2014 2:07 PM, T. Kuro Kurosaka wrote: Tom, You should be using JapaneseAnalyzer (kuromoji). Neither CJK nor ICU tokenize at word boundaries. Is JapaneseAnalyzer configurable with regard to what it does with non-japanese text? If it's not, it won't work for me. We use a combination

Re: Solr interface

2014-04-07 Thread Michael Della Bitta
The speed of ingest via HTTP improves greatly once you do two things: 1. Batch multiple documents into a single request. 2. Index with multiple threads at once. Michael Della Bitta, Applications Developer, appinions inc.
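The two tips can be sketched in Python: split documents into batches and send several batches concurrently. send_batch is a hypothetical stand-in for the HTTP POST, and no network call is made. The batch size and thread count are illustrative defaults, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(docs, size):
    """Yield lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def send_batch(batch):
    """Hypothetical stand-in for one HTTP POST of the whole batch to /update.

    A real client would serialize `batch` (e.g. as an XML <add> message) and
    send it in a single request; here we only report how many docs it carried.
    """
    return len(batch)


def index_all(docs, batch_size=1000, threads=4):
    """Send documents in batches, with several requests in flight at once."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send_batch, batched(docs, batch_size)))


if __name__ == "__main__":
    docs = [{"id": str(i)} for i in range(2500)]
    print(index_all(docs))  # prints 2500 (sent as 3 requests of 1000, 1000 and 500 docs)
```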

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-07 Thread Ahmet Arslan
Hi, I had similar problems before. We were trying to do the same thing as you, fetching too many small documents from Oracle with DIH. We were getting Caused by: java.sql.SQLException: ORA-01652: unable to extend temp segment by 128 in tablespace TS_TEMP ORA-06512: at IZCI.GET_FEED_KEYWORDS, line

Re: Searching multivalue fields.

2014-04-07 Thread Vijay Kokatnur
Yes I did restart solr, but did not re-index. Is that necessary? We've got 80G of indexed data, is there a preferred way of doing it without impacting performance? On Sat, Apr 5, 2014 at 9:44 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Did restart solr and you re-index after schema

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Michael Sokolov
Yes, I see. SolrDispatchFilter is - not really written with extensibility in mind. -Mike On 4/7/14 3:50 PM, Gregg Donovan wrote: Michael, Thanks! Unfortunately, as we use POSTs, that approach would trigger the getParameterIncompatibilityException call due to the Enumeration of

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Steve Davids
I have had this exact same use case and we ended up just setting a header value, then in a Servlet Filter we read the header value and set the MDC property within the filter. By reading the header value it didn’t complain about reading the request before making it to the SolrDispatchFilter. We
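On the client side, this approach only needs an extra HTTP header on each request. Below is a Python sketch that builds (but does not send) such a request. The header name X-Trace-Id and the URL are hypothetical. The servlet filter that reads the header and sets the MDC value is not shown.

```python
import urllib.request
import uuid


def traced_update_request(url, body, trace_id=None):
    """Build (but do not send) a POST that carries a trace ID in an HTTP header.

    The header name X-Trace-Id is a hypothetical choice; a servlet filter placed
    before SolrDispatchFilter would read it without touching the POST body.
    """
    trace_id = trace_id or uuid.uuid4().hex
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Trace-Id": trace_id, "Content-Type": "application/xml"},
    )
    return req, trace_id


# Hypothetical URL and placeholder body; nothing is sent over the network.
req, tid = traced_update_request("http://localhost:8983/solr/collection1/update", b"<add/>")
# urllib normalizes header names with str.capitalize(), hence "X-trace-id".
print(req.get_header("X-trace-id") == tid)  # True
```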

Re: Solr interface

2014-04-07 Thread Jason Hellman
This. And so much this. As much this as you can muster. On Apr 7, 2014, at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: The speed of ingest via HTTP improves greatly once you do two things: 1. Batch multiple documents into a single request. 2. Index with multiple

Re: Commit Within and /update/extract handler

2014-04-07 Thread Jamie Johnson
Below is the log showing what I believe to be the commit 07-Apr-2014 23:40:55.846 INFO [catalina-exec-5] org.apache.solr.update.processor.LogUpdateProcessor.finish [forums] webapp=/solr path=/update/extract

Re: Regex For *|* at hl.regex.pattern

2014-04-07 Thread Jack Krupansky
The regex pattern should match the text of the fragment. IOW, exclude whatever delimiters are not allowed in the fragment. The default is: [-\w ,\n']{20,200} -- Jack Krupansky -----Original Message----- From: Furkan KAMACI Sent: Monday, April 7, 2014 10:21 AM To:
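Jack's point can be illustrated in Python. A fragment pattern built from allowed characters stops at the *|* delimiter, while .*\*\|\*.* swallows the delimiter along with everything around it. This is only an approximation of Solr's RegexFragmenter, which also applies hl.fragsize and hl.regex.slop. Note too that Python's \w matches Unicode letters by default, while Java's does not.

```python
import re

# Fragment pattern as quoted in the reply above (Solr's exact default may differ slightly).
FRAGMENT_PATTERN = r"[-\w ,\n']{20,200}"

text = "first part of the text here*|*second part of the text here"

fragments = re.findall(FRAGMENT_PATTERN, text)
print(fragments)
# ['first part of the text here', 'second part of the text here']

# The pattern from the question matches the delimiter itself, so the whole text is one match.
print(re.findall(r".*\*\|\*.*", text) == [text])  # True
```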