Re: Dynamic boosting with functions

2014-12-04 Thread Mikhail Khludnev
Hello, did you check the available functions at https://cwiki.apache.org/confluence/display/solr/Function+Queries ? On Wed, Dec 3, 2014 at 7:11 PM, eakarsu eaka...@gmail.com wrote: I need to boost some documents with a function. I can boost on field myField with bq=myField^10
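
For illustration, here is a minimal Python sketch of the parameters such a request might carry. It assumes the edismax parser, where `boost` multiplies the score by a function value and `bf` adds the function value to it. The field name `popularity` and the function expression are hypothetical placeholders, not from the thread.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query, boost_func, additive=False):
    """Build edismax parameters that boost documents by a function query.

    additive=False -> 'boost' (score is multiplied by the function value)
    additive=True  -> 'bf'    (function value is added to the score)
    """
    params = {"q": user_query, "defType": "edismax"}
    params["bf" if additive else "boost"] = boost_func
    return params


# Hypothetical numeric field 'popularity'; the outer sum(1, ...) keeps the
# multiplier at 1 or above when popularity is 0 (log(1) == 0).
params = boosted_query_params("laptop", "sum(1,log(sum(popularity,1)))")
print("/select?" + urlencode(params))
```

A multiplicative `boost` is often preferred over additive `bf`, because its effect does not depend on the absolute scale of the relevance score.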

Re: Too much Lucene code to refactor but I like SolrCloud

2014-12-04 Thread Gili Nachum
Hi Bill. I'm migrating a Lucene-based app to SolrCloud as well. My main motivation is horizontal scalability. My backend is complex, so the migration is not a one-time cutover but a long process; currently I have both Lucene and SolrCloud, indexing to both and querying from either of them. The

Re: Large fields storage

2014-12-04 Thread Avishai Ish-Shalom
The use case is not PDFs or documents with images, but very large text documents. My question is: does storing the documents degrade performance more than just indexing without storing? I will only return highlighted text of limited length and will probably never download the entire document. On Tue,

Re: SegmentInfos exposed to /admin/luke

2014-12-04 Thread Shawn Heisey
On 12/3/2014 4:35 AM, Alexey Kozhemiakin wrote: We have a high percentage of deleted docs which do not go away because there are several huge ancient segments that do not merge with anything else naturally. Our use case is constant reindexing of the same data - ~100 GB, 12 000 000 real records,

Custom SOLR plugin that returns a field + where you can sort on that field

2014-12-04 Thread Mathijs Corten
Hello, *Use case:* Currently we have the following situation: we have customers that want to rent houses to visitors of our website. Customers can vary the prices according to specific dates, so renting a house during the holidays will cost more. *The problems:* - prices may vary according to

Re: Too much Lucene code to refactor but I like SolrCloud

2014-12-04 Thread Shawn Heisey
On 12/3/2014 6:10 AM, Bill Drake wrote: I have an existing application that includes Lucene code. I want to add high availability. From what I have read, SolrCloud looks like an effective approach. My problem is that there is a lot of Lucene code; out of 100+ Java files in the application, more

Keeping capitalization in suggestions?

2014-12-04 Thread Clemens Wyss DEV
When I index a text such as Chamäleon and look for suggestions for chamä and/or Chamä, I'd expect to get Chamäleon (capitalized). But what happens is: if the lowercasefilter (see (1) below) is set, chamä returns chamäleon and Chamä does not match; if the lowercasefilter (1) is not set, Chamä returns Chamäleon and chamä

Re: SegmentInfos exposed to /admin/luke

2014-12-04 Thread Erick Erickson
Not sure how it plays with segment merging and optimizing, but have you considered DocValues for your price fields? On the horizon there's work being done to allow them to be independently updated (although that won't help you now of course). It's not clear at this point how that will play when

Solr MLT with result grouping

2014-12-04 Thread Min L
Hi all: Has anyone gotten Solr MoreLikeThis working with result grouping? Thanks in advance. M

Re: Alternative searching

2014-12-04 Thread Erick Erickson
I'm guessing, because your examples are not clear to me, but assuming what you're really saying is that these are all in the same doc and, for some reason, you are unable to just concatenate them all together before you send them to Solr, you can use a multiValued field with positionIncrementGap set
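
Erick's message is cut off, but the effect of positionIncrementGap can be shown with a simplified Python model. This is a toy, not Lucene's analysis chain: it whitespace-splits each value, and its slop check only handles two terms in order. The sample values are made up.

```python
def token_positions(values, gap):
    """Assign token positions across the values of a multiValued field.

    Mimics positionIncrementGap: the first token of each value after the
    first is placed `gap` positions further than it would otherwise be.
    """
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions


def in_order_within_slop(positions, first, second, slop):
    """Simplified sloppy-phrase check for two terms in order: at most `slop`
    positions may separate them (real Lucene slop also allows reordering)."""
    return any(0 < b - a <= slop + 1
               for a in positions.get(first, [])
               for b in positions.get(second, []))


values = ["red car", "blue bike"]
print(in_order_within_slop(token_positions(values, 0), "car", "blue", 5))    # True
print(in_order_within_slop(token_positions(values, 100), "car", "blue", 5))  # False
```

With a gap larger than any slop you query with, a phrase can no longer match across two separate values, while matches inside a single value are unaffected.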

SolrException: Could't parse shape error for location_rpt field

2014-12-04 Thread Eskil Høyen Solvang
Hi, Inspired by the blog post http://lucidworks.com/blog/poor-mans-entity-extraction-with-solr/ , I'm using an update request processor chain that calls an update-script.js JavaScript file. This script extracts latitude/longitude coordinate pairs from the input document content using regular
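
The original script is JavaScript and its regex is not shown, so what follows is only a hedged Python sketch of the same idea. For a geo RPT field, Solr accepts a point as a `"lat,lon"` string. Emitting only well-formed, in-range pairs therefore rules out one plausible cause of a parse-shape error, although the thread does not confirm that this is the cause. The pattern is hypothetical and only handles plain decimal degrees.

```python
import re

# Hypothetical pattern: decimal latitude, comma, decimal longitude.
# The lookbehind stops a match from starting in the middle of a number.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)")


def extract_points(text):
    """Return 'lat,lon' strings for coordinate pairs within valid ranges."""
    points = []
    for lat_s, lon_s in COORD_RE.findall(text):
        lat, lon = float(lat_s), float(lon_s)
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append(f"{lat},{lon}")
    return points


print(extract_points("Oslo 59.91, 10.75; bad 95.0, 10.0"))  # ['59.91,10.75']
```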

Re: Group by on multiple fields and Stats Solr

2014-12-04 Thread Erick Erickson
And what does this have to do with Search ;) Seriously, databases were built through a lot of very hard work by some very smart people to accomplish their jobs. Solr is a search engine. Trying to make one do the job of the other is usually a frustrating experience. If you must do this, take a

HierarchicalFacetField support

2014-12-04 Thread rashmy1
Hello, A few questions regarding hierarchical faceting discussed here: https://issues.apache.org/jira/browse/SOLR-64 https://issues.apache.org/jira/browse/SOLR-2412 1. SOLR-64 - Is this available in 4.9? Unable to find the class 'org.apache.solr.schema.HierarchicalFacetField' in Solr 4.9 version.

Re: Get list of collection

2014-12-04 Thread Erick Erickson
Gaaah, Shawn is exactly correct, sorry for the confusion... Erick On Wed, Dec 3, 2014 at 8:10 AM, Shawn Heisey apa...@elyograg.org wrote: On 12/3/2014 2:51 AM, Ankit Jain wrote: Hi Erick, We are using the 4.7.2 version of solr and no getCollectionList() method is present in CloudSolrServer

Re: Keeping capitalization in suggestions?

2014-12-04 Thread Michael Sokolov
Have a look at AnalyzingInfixSuggester - it does what you want. -Mike On 12/4/14 3:05 AM, Clemens Wyss DEV wrote: When I index a text such as Chamäleon and look for suggestions for chamä and/or Chamä, I'd expect to get Chamäleon (uppercased). But what happens is If lowercasefilter (see below
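
Below is a toy Python model of the idea behind an analyzing suggester, not Lucene's implementation. It matches the user's prefix against lowercased tokens but returns the original stored text, so `chamä` and `Chamä` both yield `Chamäleon` with its capitalization intact. The entries are made up.

```python
def infix_suggest(entries, prefix):
    """Case-insensitive prefix match on any token; returns the original text."""
    needle = prefix.lower()
    return [surface for surface in entries
            if any(tok.startswith(needle) for tok in surface.lower().split())]


entries = ["Chamäleon", "Grünes Chamäleon", "Tiger"]
print(infix_suggest(entries, "chamä"))  # ['Chamäleon', 'Grünes Chamäleon']
print(infix_suggest(entries, "Chamä"))  # ['Chamäleon', 'Grünes Chamäleon']
```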

Re: Large fields storage

2014-12-04 Thread Michael Sokolov
There's no appreciable RAM cost during querying, faceting, sorting of search results and so on. Stored fields are separate from the inverted index. There is some cost in additional disk space required and I/O during merging, but I think you'll find these are not significant. The main cost

Re: Custom SOLR plugin that returns a field + where you can sort on that field

2014-12-04 Thread Alexandre Rafalovitch
Have you thought about storing 'availability' as a document instead of 'house'? So the house, date range and price are a single document. Then you group them, sum them and sort them in a post-filter? Some ideas may come from:
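
Here is a plain-Python sketch of the group/sum/sort logic. It assumes one hypothetical availability document per house per night, which simplifies Alex's date-range documents. In Solr this work would happen through grouping or a post-filter; it runs client-side here purely for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def stay_nights(check_in, check_out):
    """All nights from check_in up to, but not including, check_out."""
    nights = set()
    d = check_in
    while d < check_out:
        nights.add(d)
        d += timedelta(days=1)
    return nights


def rank_houses(availability_docs, check_in, check_out):
    """Sum nightly prices per house; keep houses available every night; cheapest first."""
    nights = stay_nights(check_in, check_out)
    totals = defaultdict(float)
    covered = defaultdict(set)
    for doc in availability_docs:
        if doc["date"] in nights:
            totals[doc["house_id"]] += doc["price"]
            covered[doc["house_id"]].add(doc["date"])
    complete = [h for h in totals if covered[h] == nights]
    return sorted(((h, totals[h]) for h in complete), key=lambda item: item[1])


docs = [
    {"house_id": "h1", "date": date(2014, 12, 24), "price": 100},
    {"house_id": "h1", "date": date(2014, 12, 25), "price": 150},
    {"house_id": "h2", "date": date(2014, 12, 24), "price": 80},
    {"house_id": "h2", "date": date(2014, 12, 25), "price": 90},
    {"house_id": "h3", "date": date(2014, 12, 24), "price": 50},
]
print(rank_houses(docs, date(2014, 12, 24), date(2014, 12, 26)))
# [('h2', 170.0), ('h1', 250.0)]
```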

Re: Custom SOLR plugin that returns a field + where you can sort on that field

2014-12-04 Thread Mathijs Corten
Well, we thought of that, but there are some problems with a second core for availability: we already have a core containing a lot of house information (id, name, latitude, longitude, city, country etc.), which means we would have to do 2 Solr queries just to build 1 search result, or add a lot of

Re: Problem with additional Servlet Filter (SolrRequestParsers Exception)

2014-12-04 Thread Stefan Moises
At least I found a good explanation here: https://issues.apache.org/jira/browse/STANBOL-437 This is because of the Filter introduced for STANBOL-401. I have seen this as well, looked into it in more detail and came to the conclusion that it is safe. Quote from the first resource linked

Re: Get list of collection

2014-12-04 Thread Ankit Jain
Thanks Shawn ... Thanks, Ankit On Wed, Dec 3, 2014 at 9:40 PM, Shawn Heisey apa...@elyograg.org wrote: On 12/3/2014 2:51 AM, Ankit Jain wrote: Hi Erick, We are using the 4.7.2 version of solr and no getCollectionList() method is present in CloudSolrServer class. That method is

Re: Problem with additional Servlet Filter (SolrRequestParsers Exception)

2014-12-04 Thread Stefan Moises
Thanks for your reply! I've tried to extend Solr's SolrDispatchFilter class, but that doesn't work either... as soon as I do anything with the POST data in doFilter(), I get that error again ... works fine with GET, though (that's what you are using in your class, too...) So I'm kinda stuck

Re: Keeping capitalization in suggestions?

2014-12-04 Thread Clemens Wyss DEV
Thx. Where (in which jar) do I find org.apache.solr.spelling.suggest.AnalyzingInfixSuggester? Or: how do I declare the suggest searchComponent in solrconfig.xml to make use of (Lucene's?) AnalyzingInfixSuggester? -Original Message- From: Michael Sokolov

Re: Problem with additional Servlet Filter (SolrRequestParsers Exception)

2014-12-04 Thread Chris Hostetter
: Solr is not really designed to be extended in this way. In fact I believe : they are moving towards an architecture where this is even less possible - Correct. Starting with 5.0, the fact that servlets and a servlet container are used by Solr becomes a pure implementation detail - subject to

RE: HierarchicalFacetField support

2014-12-04 Thread Toke Eskildsen
rashmy1 [rashm...@gmail.com] wrote: 1. SOLR-64 - Is this available in 4.9? Unable to find the class 'org.apache.solr.schema.HierarchicalFacetField' in Solr 4.9 version. SOLR-64 seems abandoned. The latest patch is from early 2011 for Solr 3.10. 2. In which Solr version is SOLR-2412 available?

Re: Custom SOLR plugin that returns a field + where you can sort on that field

2014-12-04 Thread Alexandre Rafalovitch
What about a parent/child record, with availability in the child record? You'd have to reindex the whole block together, but you should be able to get away with one request. Regards, Alex. No idea on the primary question, as you can probably guess.
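
Here is a hedged sketch of what the parent/child layout could look like. Solr's JSON update format accepts nested children under `_childDocuments_`, and the `{!parent which=...}` block-join parser returns parents whose children match. The field names `doc_type`, `date` and `price` are hypothetical.

```python
import json


def house_block(house_id, name, nights):
    """One parent 'house' document with one child document per available night."""
    return {
        "id": house_id,
        "doc_type": "house",
        "name": name,
        "_childDocuments_": [
            {
                "id": f"{house_id}-{night['date']}",
                "doc_type": "availability",
                "date": night["date"],  # hypothetical string field
                "price": night["price"],
            }
            for night in nights
        ],
    }


def parent_query(parent_filter, child_query):
    """Block-join query: return parents whose child documents match child_query."""
    return '{!parent which="%s"}%s' % (parent_filter, child_query)


doc = house_block("h1", "Villa", [{"date": "2014-12-24", "price": 120}])
print(json.dumps([doc], indent=2))
print(parent_query("doc_type:house", "price:[* TO 200]"))
```

Children must be indexed in the same block as their parent. Changing one night's price therefore means resending the whole house block, which is the caveat Alex mentions.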

Re: Large fields storage

2014-12-04 Thread Shawn Heisey
On 12/1/2014 3:10 PM, Avishai Ish-Shalom wrote: I have very large documents (as big as 1GB) which i'm indexing and planning to store in Solr in order to use highlighting snippets. I am concerned about possible performance issues with such large fields - does storing the fields require
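
Since only highlighted snippets are needed, a request can ask for snippets of the large field without returning the field itself. Below is a minimal Python sketch of such parameters, using a hypothetical field name `body`. The standard highlighter only analyzes the first `hl.maxAnalyzedChars` characters of a field (51200 by default in Solr 4.x), which matters for very large documents; check the highlighting docs for your version.

```python
from urllib.parse import urlencode


def highlight_params(query, field, snippets=3, fragsize=200, max_analyzed=None):
    """Request parameters that return highlight snippets instead of the stored field."""
    params = {
        "q": query,
        "fl": "id",  # do not ship the huge stored field back
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": str(snippets),
        "hl.fragsize": str(fragsize),
    }
    if max_analyzed is not None:
        # Raise this to find matches deeper inside very large field values.
        params["hl.maxAnalyzedChars"] = str(max_analyzed)
    return params


print("/select?" + urlencode(highlight_params("body:solr", "body")))
```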

Re: Keeping capitalization in suggestions?

2014-12-04 Thread Clemens Wyss DEV
Enter the factory! ;) <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str> -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Thursday, December 4, 2014 14:46 To: solr-user@lucene.apache.org Subject: RE: Keeping

RE: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Teague James
The database stores the URL as a CLOB. Querying Solr shows that the field value is http://www.someaddress.com/documents/document1.docx; The URL works if I copy and paste it to the browser, but Tika gets a 400 error. Any ideas? Thanks! -Teague -Original Message- From: Alexandre

REST API Alternative to admin/luke

2014-12-04 Thread Constantin Wolber
Hi, we use dynamic fields in our schema. If I use the admin/luke URL, all those dynamic fields are listed with their actual names. If I use the REST endpoint /schema/fields, only the hard-coded fields are returned. And dynamicFields only returns the definitions of the dynamic fields. I was

Anti-Pattern in lucene-join jar?

2014-12-04 Thread Darin Amos
Hello All, I have been doing a lot of research in building some custom queries and I have been looking at the Lucene Join library as a reference. I noticed something that I believe could actually have a negative side effect. Specifically I was looking at the JoinUtil.createJoinQuery(…) method

RE: Data Import Handler Status

2014-12-04 Thread dhwani2388
Hi, In Solr I am fetching the DIH status of the core using /dataimport?command=status. The data import is still running, but the status URL gives me idle status. Sometimes it gives me idle status at the right time, once the data import is completed, but sometimes it gives idle status 1 or 2 seconds

Re: REST API Alternative to admin/luke

2014-12-04 Thread Shawn Heisey
On 12/4/2014 8:20 AM, Constantin Wolber wrote: we use dynamic fields in our schema. If I use the admin/luke URL, all those dynamic fields are listed with their actual names. If I use the REST endpoint /schema/fields, only the hard-coded fields are returned. And dynamicFields only

Re: Data Import Handler Status

2014-12-04 Thread Shawn Heisey
On 12/4/2014 9:18 AM, dhwani2388 wrote: In Solr I am fetching the DIH status of the core using /dataimport?command=status. The data import is still running, but the status URL gives me idle status. Sometimes it gives me idle status at the right time, once the data import is completed, but sometimes it

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Alexandre Rafalovitch
A 400 error means something is wrong on the server (resource not found). So it would be useful to see what URL is actually being requested. Can you run some sort of network tracer to see the actual network request (dtrace, Wireshark, etc.)? That will cut the problem in half for you. Regards,

Question on Solr Caching

2014-12-04 Thread Manohar Sripada
Hi, I am working on implementing Solr in my product. I have a few questions on caching. 1. Do the posting lists and term lists of the index reside in memory? If not, how do I load them into memory? I don't want to load the entire data, like using DocumentCache. Either I want to use RAMDirectoryFactory

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Walter Underwood
No, 400 should mean that the request was bad. When the server fails, that is a 500. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: 400 error means something wrong on the server
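
Walter's point follows the HTTP status classes. 4xx codes report a problem with the request, 5xx codes report a failure on the server, and 404 (not 400) is the specific "Not Found" code. A small Python illustration using the standard library's `http.HTTPStatus`:

```python
from http import HTTPStatus


def describe_status(code):
    """Name an HTTP status code and say which side it blames."""
    phrase = HTTPStatus(code).phrase
    if 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "not an error"
    return f"{code} {phrase}: {kind}"


for code in (400, 404, 500):
    print(describe_status(code))
# 400 Bad Request: client error
# 404 Not Found: client error
# 500 Internal Server Error: server error
```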

Re: Keeping capitalization in suggestions?

2014-12-04 Thread Gopal Patwa
More detail can be found in the Solr docs: https://cwiki.apache.org/confluence/display/solr/Suggester On Thu, Dec 4, 2014 at 6:33 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Enter the factory! ;) <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
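
Once a suggester is configured, a request might be built as shown below. This sketch assumes a request handler registered at `/suggest` that uses the suggest component, plus a dictionary named `mySuggester`. Both names are hypothetical and must match your solrconfig.xml. The `suggest`, `suggest.dictionary`, `suggest.q` and `suggest.build` parameters are those described in the Suggester documentation.

```python
from urllib.parse import urlencode


def suggest_url(base, dictionary, prefix, build=False):
    """Build a suggester request URL; build=True asks Solr to (re)build the dictionary."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "suggest.q": prefix,
    }
    if build:
        params["suggest.build"] = "true"
    return f"{base}/suggest?{urlencode(params)}"


# Non-ASCII input such as 'ä' is percent-encoded as UTF-8.
print(suggest_url("http://localhost:8983/solr/collection1", "mySuggester", "Chamä"))
```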

Re: REST API Alternative to admin/luke

2014-12-04 Thread Chris Hostetter
: Subject: REST API Alternative to admin/luke this smells like an XY problem ... if /admin/luke gives you the data you want, why not use /admin/luke ? ... what is it about /admin/luke that prevents you from solving your problem? what is your ultimate goal? : If I use the admin/luke URL all

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Alexandre Rafalovitch
Right. Resource not found (on server). The end result is the same. If it works in the browser but not from the application, then either the same URL is not being requested or - somehow - not even the same server. The solution (watching network traffic) is still the same, right? Regards, Alex.

Re: Question on Solr Caching

2014-12-04 Thread Michael Della Bitta
Hi, Manohar, 1. Do the posting lists and term lists of the index reside in memory? If not, how do I load them into memory? I don't want to load the entire data, like using DocumentCache. Either I want to use RAMDirectoryFactory as the data will be lost if you restart If you use MMapDirectory, Lucene
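
This sketch illustrates the OS mechanism Michael describes, using Python's `mmap` module rather than Lucene. Mapping a file lets the process read it like memory. The OS loads pages on demand and keeps them in its page cache rather than on the application heap.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Map a file read-only and return a slice of it.

    Pages are faulted in lazily by the OS and stay in its page cache,
    so repeated reads of hot regions do not hit the disk again.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello lucene index")
    path = tmp.name
try:
    print(read_via_mmap(path, 6, 6))  # b'lucene'
finally:
    os.remove(path)
```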

Re: Anti-Pattern in lucene-join jar?

2014-12-04 Thread Mikhail Khludnev
Hello, I wonder if you have seen https://issues.apache.org/jira/browse/SOLR-6234 which solves this problem. The query result cache is useless for joins, because it carries cropped results. Potentially you can hit the filter cache by wrapping fromQuery into this monster bridge new FilteredQuery(new

Re: Schemaless configuration using 4.10.2/API returning 404

2014-12-04 Thread Stefan Moises
Oh no, now *I* have that same problem again... :( I have copied my (running) schemaless core to another server, the core runs schemaless (managed-schema is created etc.), solrconfig.xml and web.xml are identical besides the paths on the server ... And yet on one Tomcat (7.0.28) the URL

Re: Schemaless configuration using 4.10.2/API returning 404

2014-12-04 Thread Alexandre Rafalovitch
Does the Admin UI work? That API endpoint is called by the Admin UI (I forget which screen, though). Regards, Alex.

Re: Schemaless configuration using 4.10.2/API returning 404

2014-12-04 Thread Alexandre Rafalovitch
The other option is that you are not running your expected Solr on that port, but a different instance. I found that when I use the new background scripts, I keep forgetting I have another Solr running. Regards, Alex.

Re: Schemaless configuration using 4.10.2/API returning 404

2014-12-04 Thread Stefan Moises
Hi, yeah, that's the strange thing: the admin UI, /select URLs etc. are working fine... just the REST-related URLs give me 404 errors... :( I'll double-check if it's the correct Solr instance, but I'm pretty sure it is, since the requested core is only running on this instance. Regards, Stefan

Re: REST API Alternative to admin/luke

2014-12-04 Thread Constantin Wolber
Hi, and thanks for the answers. So my understanding is correct that I did not overlook a feature of the REST endpoints. So we will probably stick with the admin/luke endpoint to achieve our goal. Since you have been telling me a lot about the XY problem, I will of course give you

Re: Custom SOLR plugin that returns a field + where you can sort on that field

2014-12-04 Thread Erick Erickson
Denormalizing data is a very common practice in Solr; don't be afraid to try it if it makes your problem easier. Also, take a look at the distributed analytics plugin; it allows you to fairly painlessly add custom code to do whatever you want, see: https://issues.apache.org/jira/browse/SOLR-6150

Re: Schemaless configuration using 4.10.2/API returning 404

2014-12-04 Thread Stefan Moises
Don't ask, but I deleted the webapp and redeployed it in Tomcat and everything is working now... Thanks for the input! Regards, Stefan On 04.12.2014 at 19:53, Stefan Moises wrote: Hi, yeah, that's the strange thing admin UI, /select-URLs etc. are working fine... just the REST

Re: REST API Alternative to admin/luke

2014-12-04 Thread Chris Hostetter
: I did not oversee a feature of the rest endpoints. So probably we will : stick with the admin/luke endpoint to achieve our goal. Ok ... i mean ... yeah -- the /admin/luke endpoint exists to tell you what fields are *actually* in your index, regardless of who/how they are in your index. the
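
If /admin/luke remains the endpoint of choice, a client can separate concrete dynamic-field names from static ones. This Python sketch assumes that the JSON response (for example from `/admin/luke?numTerms=0&wt=json`) lists each field under `fields` and marks fields created from a dynamic rule with a `dynamicBase` entry. The sample response below is hand-written, not real output.

```python
def dynamic_fields(luke_response):
    """Map concrete field names to the dynamic-field pattern they came from."""
    fields = luke_response.get("fields", {})
    return {name: info["dynamicBase"]
            for name, info in fields.items()
            if "dynamicBase" in info}


# Hand-written sample shaped like a luke response (values abbreviated).
sample = {
    "fields": {
        "id": {"type": "string"},
        "price_d": {"type": "double", "dynamicBase": "*_d"},
        "color_s": {"type": "string", "dynamicBase": "*_s"},
    }
}
print(dynamic_fields(sample))  # {'price_d': '*_d', 'color_s': '*_s'}
```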

Re: Schemaless configuration using 4.10.2/API returning 404

2014-12-04 Thread Alexandre Rafalovitch
Perhaps the expanded version of the war was not correctly updated and it was picking up the old definitions. Weird things happen. Glad to hear it is not Solr itself. Regards, Alex.

Re: REST API Alternative to admin/luke

2014-12-04 Thread Constantin Wolber
Hi, Basically, having to use an endpoint in the admin section makes me wonder whether there is an alternative. And it would have been nice to have a straightforward, resource-oriented approach, which Luke certainly is not. Regards Constantin On 04.12.2014 at 20:46, Chris

Preferred Schema/Config for Chinese Language Cores?

2014-12-04 Thread Tom Zimmermann
Hi, We are setting up our first Chinese language index and our team has found multiple conflicting bits of information regarding the proper configuration for tokenizing, filtering etc. Does anyone out there have a good functioning example we could work from, or some links with guidance? Thanks,

Re: Preferred Schema/Config for Chinese Language Cores?

2014-12-04 Thread Alexandre Rafalovitch
I have a couple of links that may be useful, though I have not tried Chinese indexing myself: http://discovery-grindstone.blogspot.ca/ (12 articles on CJK!) http://java.dzone.com/articles/indexing-chinese-solr Also, may be worth checking out the commercial offering from http://www.basistech.com/

Re: Anti-Pattern in lucene-join jar?

2014-12-04 Thread Roman Chyla
+1, additionally (as follows from your observation) the query can get out of sync with the index, if, e.g., it was saved for later use and run against a newly opened searcher Roman On 4 Dec 2014 10:51, Darin Amos dari...@gmail.com wrote: Hello All, I have been doing a lot of research in building

Re: Question on Solr Caching

2014-12-04 Thread Manohar Sripada
Thanks Michael for the response. If you use MMapDirectory, Lucene will map the files into memory off heap and the OS's disk cache will cache the files in memory for you. Don't use RAMDirectory, it's not better than MMapDirectory for any use I'm aware of. Will that mean it will cache the

Re: Question on Solr Caching

2014-12-04 Thread Shawn Heisey
On 12/4/2014 10:06 PM, Manohar Sripada wrote: If you use MMapDirectory, Lucene will map the files into memory off heap and the OS's disk cache will cache the files in memory for you. Don't use RAMDirectory, it's not better than MMapDirectory for any use I'm aware of. Will that mean it will

Using Solr for finding Flight Routes

2014-12-04 Thread Robin Woods
Hello, Has anyone implemented Solr for searching flights between two destinations, sorted by shortest trip and best price? Is geospatial search the right module to use? Thanks!

Proximity Search with Grouping

2014-12-04 Thread Emre ERKEK
Hi All, Can I use proximity search with grouping like this A B (C D)~19 ? Thanks, Emre