Re: Send many files to update/extract

2014-03-18 Thread Alexandre Rafalovitch
HttpSolrServer allows you to send multiple documents at once, but they need to be extracted/converted on the client. However, if you know you will be sending a lot of documents to Solr, you are better off running Tika locally on the client (or as a standalone network server). A lot more performant. I
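
A minimal sketch of what Alexandre describes, assuming SolrJ 4.x and the tika-app jar on the client classpath; the field names (id, filename_s, content_txt) are made up for illustration:

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class ClientSideExtract {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            Tika tika = new Tika();                      // extraction runs locally, not inside Solr
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

            for (String path : args) {
                File f = new File(path);
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", f.getName());         // hypothetical unique key
                doc.addField("filename_s", f.getName());
                FileInputStream in = new FileInputStream(f);
                try {
                    doc.addField("content_txt", tika.parseToString(in)); // text extracted on the client
                } finally {
                    in.close();
                }
                batch.add(doc);
            }
            solr.add(batch);                             // one request carrying many documents
            solr.commit();
            solr.shutdown();
        }
    }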

Re: Doing spatial search on multiple location points

2014-03-18 Thread Varun Gupta
Hi David, thanks for the quick reply. As I haven't migrated to 4.7 (I am still using 4.6), I tested using an OR clause with multiple geofilt-based query clauses and it seems to be working great. But I have one more question: how do I boost the score of the matching documents based on geodist? How
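
For reference, something along the lines of what Varun describes can be expressed from SolrJ like this; a sketch only, where the field name (store), the points and the distance are placeholders, and the _query_ nested-query syntax is one way to OR geofilt clauses together:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class MultiPointGeofilt {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            // OR together several geofilt clauses so a document matches if it is
            // within 10 km of any of the listed points.
            q.addFilterQuery("_query_:\"{!geofilt sfield=store pt=45.15,-93.85 d=10}\""
                    + " OR _query_:\"{!geofilt sfield=store pt=50.2,22.3 d=10}\"");
            System.out.println(solr.query(q).getResults().getNumFound());
            solr.shutdown();
        }
    }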

About enableLazyFieldLoading and memory

2014-03-18 Thread david . davila
Hello, we have a Solr Cloud 4.7, but this question also relates to other versions, because we have tested this in several installations. We have a very big index (more than 400K docs) with big documents, but in our queries we don't fetch the large fields in the fl parameter. But we have

Re: Nested documents, block join - re-indexing a single document upon update

2014-03-18 Thread danny teichthal
Thanks Jack, I understand that updating a single document in a block is currently not supported. But an atomic update to a single document does not have to be in conflict with block joins. If I got it right from the documentation: currently, if a document is atomically updated, Solr finds the
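
For context, this is roughly how a block is indexed with SolrJ (4.5+); because child documents are nested inside the parent, any change currently means re-sending the whole block (the field names here are illustrative):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BlockIndexing {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrInputDocument parent = new SolrInputDocument();
            parent.addField("id", "order-1");
            parent.addField("type_s", "parent");

            SolrInputDocument child = new SolrInputDocument();
            child.addField("id", "order-1-line-1");
            child.addField("type_s", "child");
            parent.addChildDocument(child);   // children are indexed as one block with the parent

            // Updating any document in the block currently means re-indexing the whole block.
            solr.add(parent);
            solr.commit();
            solr.shutdown();
        }
    }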

Need help importing OOXML custom properties into Solr

2014-03-18 Thread Anders Gustafsson
solr-spec 4.6.1, lucene-spec 4.6.0, lux-appserver 1.1.0, tika 1.4, poi 3.9. Hi! I set it up, pretty much following the instructions at http://www.codewrecks.com/blog/index.php/2013/05/25/import-folder-of-documents-with-apache-solr-4-0-and-tika/ Problem is that I cannot seem to import custom

Re: Need help importing OOXML custom properties into Solr

2014-03-18 Thread Alexandre Rafalovitch
Have you tried just using Tika directly and seeing what gets output? Maybe it is all prefixed somehow. Or sending one file as a sample directly to the extract handler and temporarily storing the ignored_* dynamicField to see what actually happens? Basically, check what is there before trying to

Sv: Re: Need help importing OOXML custom properties into Solr

2014-03-18 Thread Anders Gustafsson
Thanks for the quick reply. I am a bit of a newb when it comes to Solr, Lux and Tika, so I would appreciate it if you could give me some quick pointers on how to use/call Tika directly and/or how to send one file directly and store the dynamic field. -- Anders Gustafsson Engineer, CNI, CNE6,

Re: Re: Need help importing OOXML custom properties into Solr

2014-03-18 Thread Alexandre Rafalovitch
You can just download Tika from the Apache site; it's a separate product and has a command-line interface. Or, to use the Solr extract handler, go through the Solr tutorial, which explains it. https://lucene.apache.org/solr/4_7_0/tutorial.html Specifically, http://wiki.apache.org/solr/ExtractingRequestHandler and

Sv: Re: Re: Need help importing OOXML custom properties into Solr

2014-03-18 Thread Anders Gustafsson
Thanks again. I already had the Tika jars, but not the command-line one, so I downloaded 1.5 and ran it against the docx and found: meta name=custom:Testmeta content=Innehåll/ So the name is prefixed. Does that mean that I should add it prefixed in the conf files as well? I.e.: field
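
A sketch of sending one file straight to the extract handler from SolrJ and mapping a custom Tika metadata key onto a Solr field; the metadata key (custom:Test) and target field (test_s) are assumptions here, so adjust them to whatever the command-line Tika output actually reports:

    import java.io.File;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractOneFile {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(new File("sample.docx"),
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
            req.setParam("literal.id", "sample.docx");      // unique key supplied explicitly
            req.setParam("fmap.custom:Test", "test_s");     // map the prefixed metadata key to a Solr field
            req.setParam("uprefix", "ignored_");            // park everything unmapped so you can inspect it
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            solr.request(req);
            solr.shutdown();
        }
    }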

Re: Re: Re: Need help importing OOXML custom properties into Solr

2014-03-18 Thread Alexandre Rafalovitch
The metadata field names can be all sorts of strange, including spaces and other unusual characters, so often there is some issue with the mapping. But yes, please add the howto to the Wiki. You will need to get your account whitelisted first (due to spammers), so send a separate email with your Apache wiki

Wiki edit rights

2014-03-18 Thread Anders Gustafsson
Yes, please. My Wiki ID is Anders Gustafsson. But yes, please, add the howto to Wiki. You will need to get your account whitelisted first (due to spammers), so send a separate email with your Apache wiki id and somebody will unlock you for editing. -- Anders Gustafsson Engineer, CNI, CNE6,

Solr memory usage off-heap

2014-03-18 Thread Avishai Ish-Shalom
Hi, my Solr instances are configured with a 10GB heap (Xmx) but Linux shows a resident size of 16-20GB. Even with thread stacks and permgen taken into account I'm still far off from these numbers. Could it be that JVM IO buffers take so much space? Does Lucene use JNI/JNA memory allocations?

RE: Solr memory usage off-heap

2014-03-18 Thread Doug Turnbull
How large is your index on disk? Solr memory-maps the index, so the virtual memory used will often be quite large. Your numbers don't sound inconceivable. A good reference point is Grant Ingersoll's blog post on searchhub:

Re: Nested documents, block join - re-indexing a single document upon update

2014-03-18 Thread Jack Krupansky
That's a reasonable request and worth a Jira, but different from what you have specified in your subject line: re-indexing a single document - the entire block needs to be re-indexed. I suppose people might want a block atomic update - where multiple child documents as well as the parent

Re: Solr memory usage off-heap

2014-03-18 Thread Shawn Heisey
On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote: My solr instances are configured with 10GB heap (Xmx) but linux shows resident size of 16-20GB. even with thread stack and permgen taken into account i'm still far off from these numbers. Could it be that jvm IO buffers take so much space? does

Re: About enableLazyFieldLoading and memory

2014-03-18 Thread Miguel
Hi David, if you use lazy field loading (enableLazyFieldLoading=true), documentCache functionality is somewhat limited. This means that the document stored in the documentCache will contain only those fields that were passed to the fl parameter. documentCache requires memory, the
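
In other words, with lazy loading on, a query that only asks for small fields keeps the cached documents small; a trivial sketch (field names are examples):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class SmallFlQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            // With enableLazyFieldLoading=true, only these fields are loaded (and cached);
            // the large stored fields stay on disk until some query actually asks for them.
            q.setFields("id", "title_s");
            System.out.println(solr.query(q).getResults().size());
            solr.shutdown();
        }
    }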

Best SSD block size for large SOLR indexes

2014-03-18 Thread Salman Akram
All, Is there a rule of thumb for ideal block size for SSDs for large indexes (in hundreds of GBs)? Read performance is of top importance for us and we can sacrifice the space a little... This is the one we just got and wanted to see if there are any test results out there

Re: About enableLazyFieldLoading and memory

2014-03-18 Thread david . davila
Hi Miguel, yes, but if I use enableLazyFieldLoading=true and my queries only request very small fields like ID, the documentCache shouldn't grow, even though my stored fields are very big. Am I wrong? Best regards, David Dávila Atienza AEAT - Departamento de Informática Tributaria Subdirección

Re: Best SSD block size for large SOLR indexes

2014-03-18 Thread Shawn Heisey
On 3/18/2014 7:12 AM, Salman Akram wrote: Is there a rule of thumb for ideal block size for SSDs for large indexes (in hundreds of GBs)? Read performance is of top importance for us and we can sacrifice the space a little... This is the one we just got and wanted to see if there are any test

Re: Best SSD block size for large SOLR indexes

2014-03-18 Thread Salman Akram
This SSD default size seems to be 4K not 16K (as can be seen below).
Bytes Per Sector             : 512
Bytes Per Physical Sector    : 4096
Bytes Per Cluster            : 4096
Bytes Per FileRecord Segment : 1024
I will go through the articles you sent. Thanks On Tue, Mar 18, 2014

Hierarchical facet on one field

2014-03-18 Thread Alex
Hi all, I have a field that contains dates (it has date type) and I would like to make a hierarchical (pivot) facet based on that field. So I would like to have something like this:
date_of_creation:
|__2014
| |__January
| | |_01
| | |_02
| | |_14
|
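
One common way to get that tree is to index the year, month and day into separate fields at index time and pivot on them; a sketch assuming derived fields named created_year_s, created_month_s and created_day_s (not part of the original schema):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class DatePivotFacet {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            q.setFacet(true);
            // Pivot over the derived date-part fields to get the year -> month -> day tree.
            q.set("facet.pivot", "created_year_s,created_month_s,created_day_s");
            System.out.println(solr.query(q).getResponse().get("facet_counts"));
            solr.shutdown();
        }
    }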

Re: About enableLazyFieldLoading and memory

2014-03-18 Thread Shawn Heisey
On 3/18/2014 7:18 AM, david.dav...@correo.aeat.es wrote: yes, but if I use enableLazyFieldLoading=true and my queries only request very small fields like ID, the documentCache shouldn't grow, although my stored fields are very big. Am I wrong? Since Solr 4.1, stored fields are compressed.

Re: Best SSD block size for large SOLR indexes

2014-03-18 Thread Shawn Heisey
On 3/18/2014 7:39 AM, Salman Akram wrote: This SSD default size seems to be 4K not 16K (as can be seen below).
Bytes Per Sector             : 512
Bytes Per Physical Sector    : 4096
Bytes Per Cluster            : 4096
Bytes Per FileRecord Segment : 1024
The *sector* size on a

Re: Wiki edit rights

2014-03-18 Thread Erick Erickson
Done, thanks! On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson anders.gustafs...@pedago.fi wrote: Yes, please. My Wiki ID is Anders Gustafsson But yes, please, add the howto to Wiki. You will need to get your account whitelisted first (due to spammers), so send a separate email with your

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-18 Thread Martin de Vries
Martin, I’ve committed the SOLR-5875 fix, including to the lucene_solr_4_7 branch. Any chance you could test the fix? Hi Steve, I'm very happy you found the bug. We are running the version from SVN on one server and it's already been running fine for 5 hours. If it's still stable tomorrow then

Re: Solr memory usage off-heap

2014-03-18 Thread Erick Erickson
Avishai: It sounds like you already understand mmap. Even so you might be interested in this excellent writeup of MMapDirectory and Lucene by Uwe: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Best, Erick On Tue, Mar 18, 2014 at 7:23 AM, Avishai Ish-Shalom

Re: Solr memory usage off-heap

2014-03-18 Thread Shawn Heisey
On 3/18/2014 8:37 AM, Erick Erickson wrote: It sounds like you already understand mmap. Even so you might be interested in this excellent writeup of MMapDirectory and Lucene by Uwe: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html There is some actual bad memory

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-18 Thread Till Kinstler
Am 18.03.2014 15:26, schrieb Martin de Vries: Martin, I’ve committed the SOLR-5875 fix, including to the lucene_solr_4_7 branch. Any chance you could test the fix? Hi Steve, I'm very happy you found the bug. We are running the version from SVN on one server and it's already running fine for

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-18 Thread adfel70
I disabled softCommit and tried to run another indexing process. Now I see no jetty EofException and no latency peaks. I also noticed that when I had softCommit every 10 minutes, I also saw spikes in the major GC (I use CMS) to around 9-10k. Any idea? Shawn Heisey-4 wrote On 3/17/2014 7:07

Re: Nested documents, block join - re-indexing a single document upon update

2014-03-18 Thread danny teichthal
Thanks, indeed the subject line was misleading. Then I will file a new improvement request for block atomic update support. On Tue, Mar 18, 2014 at 2:08 PM, Jack Krupansky j...@basetechnology.com wrote: That's a reasonable request and worth a Jira, but different from what you have specified

Re: CollapsingQParserPlugin facet results: fq={!collapse field=fld} vs. group=true&group.field=fld

2014-03-18 Thread tchaffee
Thanks Joel - I decided upon another route - I was almost always grouping, so I am trying another model where we will store the data with fewer rows and a few multivalued fields.

solr cloud distributed optimize() becomes serialized

2014-03-18 Thread Chris Lu
I wonder whether this is a known bug. In previous SolrCloud versions, 4.4 or maybe 4.5, an explicit optimize() without any parameters usually took 2 minutes for a 32-core cluster. However, in 4.6.1, the same call took about 1 hour. Checking the index modification time for each core shows 2

Re: Doing spatial search on multiple location points

2014-03-18 Thread Smiley, David W.
Varun, You could use a function query involving “min” with a comma-separated list of geodist clauses. See https://cwiki.apache.org/confluence/display/solr/Spatial+Search “Boost Nearest Results”. You’d replace the geodist() in there with min(geodist(45.15,-93.85),geodist(50.2,22.3),…) (etc.)
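
A sketch of that recipe from SolrJ, with the multi-point min() substituted in as David suggests; the field name (store), the coordinates and the recip() constants are example values only:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class BoostByNearestPoint {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("pizza");
            q.set("defType", "edismax");
            // Multiplicative boost: the closer a document is to the nearest listed point, the higher it scores.
            q.set("boost", "recip(min(geodist(store,45.15,-93.85),geodist(store,50.2,22.3)),2,200,20)");
            System.out.println(solr.query(q).getResults());
            solr.shutdown();
        }
    }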

Re: CollapsingQParserPlugin returning different result set

2014-03-18 Thread Joel Bernstein
Hi Shamik, I see that you are using distributed search. With the CollapsingQParserPlugin you need to have all the documents that are in the same group on the same shard. Is that the way you have the documents indexed? Joel Joel Bernstein Search Engineer at Heliosearch On Mon, Mar 17, 2014 at
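
A sketch of the composite-id routing Joel alludes to, so that all documents sharing a collapse/group value hash to the same shard; the group field and id scheme are illustrative, not taken from the original thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeIdCollapse {
        public static void main(String[] args) throws Exception {
            CloudSolrServer solr = new CloudSolrServer("zkhost1:2181,zkhost2:2181/solr");
            solr.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            // "groupA!" is the routing prefix: every doc with the same prefix lands on the same shard.
            doc.addField("id", "groupA!doc-42");
            doc.addField("group_s", "groupA");
            solr.add(doc);
            solr.commit();

            // Collapsing then works per shard without a group being split across shards.
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("{!collapse field=group_s}");
            System.out.println(solr.query(q).getResults().getNumFound());
            solr.shutdown();
        }
    }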

Edit config files

2014-03-18 Thread Francois Perron
Hi, I installed the latest version of Solr (4.7.0) and I want to try the new functionality to edit config files in the AdminUI. But when I click on a file, no edit box appears! This is the info on my version: Versions * solr-spec 4.7.0 * solr-impl 4.7.0 1570806 - simon - 2014-02-22 08:36:23 *

RE: Best SSD block size for large SOLR indexes

2014-03-18 Thread Toke Eskildsen
Salman Akram [salman.ak...@northbaysolutions.net] wrote: [Hundreds of GB index] http://www.storagereview.com/micron_p420m_enterprise_pcie_ssd_review May I ask why you have chosen a drive with such a high speed and matching cost? We have some years of experience with using SSDs for search at

Re: Edit config files

2014-03-18 Thread Steve Rowe
Hi Francois, The config file editing functionality was pulled out of Solr before the 4.7 release; what remains is a read-only config directory browser/file viewer. May I ask why you thought the config file editing functionality was in 4.7? Steve On Mar 18, 2014, at 4:39 PM, Francois Perron

String Cast Error

2014-03-18 Thread AJ Lemke
Hello all! I have a strange issue with my local Solr install. I have a search that sorts on a boolean field. This search produces the following error: java.lang.String cannot be cast to org.apache.lucene.util.BytesRef. The search is over the dummy data that is included in the exampledocs. I

Re: CollapsingQParserPlugin returning different result set

2014-03-18 Thread shamik
Joel, I had a discussion with you earlier related to inconsistent ngroups numbers, when you suggested using the composite id to make sure that documents with identical (ADSKDedup) field values land on the same shard. Here's the thread --

Re: String Cast Error

2014-03-18 Thread Shawn Heisey
On 3/18/2014 3:51 PM, AJ Lemke wrote: I have a strange issue with my local SOLR install. I have a search that sorts on a boolean field. This search is pulling the following error: java.lang.String cannot be cast to org.apache.lucene.util.BytesRef. The search is over the dummy data that is

RE : Edit config files

2014-03-18 Thread Francois Perron
Hi Steve, this feature makes sense for us because we don't have write access in production. Anyway, I'll write a script to push config file updates directly to ZooKeeper and reload the collection. But it's always simpler when it's already integrated in an admin tool. Thank you for your time.
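
For the reload step of such a script, a plain HTTP call to the Collections API is enough; a sketch using only the JDK, where the host and collection name are placeholders:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ReloadCollection {
        public static void main(String[] args) throws Exception {
            // After pushing updated config files to ZooKeeper, ask Solr to reload the collection.
            URL url = new URL("http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            System.out.println("RELOAD returned HTTP " + conn.getResponseCode());
            InputStream in = conn.getInputStream();
            in.close();
            conn.disconnect();
        }
    }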

RE: String Cast Error

2014-03-18 Thread AJ Lemke
Did you change the schema at all? No.
Did you upgrade Solr from a previous version with the same index? No. This was a fresh install from the website.
Ran ant run-example
Killed that instance
Copied Example to Node1
Copied Example to Node2
Switched into Node1
java

does shards.tolerant deal with this scenario?

2014-03-18 Thread solr-user
Hi all, I have some questions re shards.tolerant=true and timeAllowed=xxx. I have seen situations where shards.tolerant=true works: if one of the shards specified in a query is dead, shards.tolerant seems to work and I get results from the non-dead shards. However, if one of the shards goes down
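
For reference, the two parameters in question can be set like this from SolrJ; the timeAllowed value is just an example:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class TolerantQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.set("shards.tolerant", "true");   // return partial results if a shard is unreachable
            q.setTimeAllowed(5000);             // give up after 5 seconds and return what has been found
            System.out.println(solr.query(q).getResults().getNumFound());
            solr.shutdown();
        }
    }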

Zookeeper exceptions - SEVERE

2014-03-18 Thread Chris W
I am running a 3-node ZooKeeper 3.4.5 quorum. I am running into issues with ZooKeeper transaction logs: [myid:2] - ERROR [main:QuorumPeer@453] - Unable to load database on disk java.io.IOException: Unreasonable length = 1048587 at

Re: Zookeeper exceptions - SEVERE

2014-03-18 Thread Shawn Heisey
On 3/18/2014 5:46 PM, Chris W wrote: I am running a 3 node zookeeper 3.4.5 Quorum. I am running into issues with Zookeeper transaction logs [myid:2] - ERROR [main:QuorumPeer@453] - Unable to load database on disk java.io.IOException: Unreasonable length = 1048587 at

Re: /suggest

2014-03-18 Thread Areek Zillur
Hi Lajos, can you elaborate on the "get the overflow when using a text field" part? The new SuggestComponent should work just as well for DocumentDictionary. Thanks, Areek

Re: StackOverflow ... the errors, not the site

2014-03-18 Thread Areek Zillur
Hi Lajos, could this be due to the heavy query-time processing chain associated with the TextField? You can also check out AnalyzingInfixLookupFactory if the suggestion entries are a bit long (this suggester will give matches even if the query matches a term in the middle of a suggestion entry.

Indexing large documents

2014-03-18 Thread Stephen Kottmann
Hi Solr users, I'm looking for advice on best practices when indexing large documents (hundreds of MB or even 1 to 2 GB text files). I've been hunting around on Google and the mailing list, and have found some suggestions of splitting the logical document up into multiple Solr documents. However, I

Re: Indexing large documents

2014-03-18 Thread Otis Gospodnetic
Hi, I think you probably want to split giant documents because you / your users probably want to be able to find smaller sections of those big docs that are best matches to their queries. Imagine querying War and Peace. Almost any regular word you query for will produce a match. Yes, you may
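
A rough sketch of the splitting approach Otis describes: break the big text into fixed-size sections and index each section as its own Solr document that points back at the logical parent. All field names and the chunk size below are illustrative, not from the original thread:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ChunkedIndexer {
        private static final int CHUNK_CHARS = 100_000;   // arbitrary section size

        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            String docId = "war-and-peace";
            String text = new String(Files.readAllBytes(Paths.get("war-and-peace.txt")), StandardCharsets.UTF_8);

            for (int i = 0, part = 0; i < text.length(); i += CHUNK_CHARS, part++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", docId + "-" + part);        // one Solr document per section
                doc.addField("parent_id_s", docId);            // lets you regroup hits by logical document
                doc.addField("section_i", part);
                doc.addField("content_txt", text.substring(i, Math.min(i + CHUNK_CHARS, text.length())));
                solr.add(doc);
            }
            solr.commit();
            solr.shutdown();
        }
    }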

Re: Zookeeper exceptions - SEVERE

2014-03-18 Thread Shalin Shekhar Mangar
SolrCloud will update ZooKeeper on state changes (a node goes into recovery, comes back up, etc.) or for leader election and during Collection API commands. It doesn't correlate directly with indexing but is correlated with how frequently you call commit. On Wed, Mar 19, 2014 at 5:46 AM, Shawn Heisey

Re: Zookeeper exceptions - SEVERE

2014-03-18 Thread Chris W
Thanks, Shawn and Shalin How does the frequency of commit affect zookeeper? Thanks On Tue, Mar 18, 2014 at 9:12 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: SolrCloud will update Zookeeper on state changes (node goes to recovery, comes back up etc) or for leader election and

Re: Zookeeper exceptions - SEVERE

2014-03-18 Thread Gopal Patwa
Shalin, "correlated with how frequently you call commit" -- is it soft commit or hard commit? I guess it should be the latter. Just curious what data it updates in ZooKeeper during a commit. On Tue, Mar 18, 2014 at 9:12 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: SolrCloud will update

Re: Zookeeper exceptions - SEVERE

2014-03-18 Thread Shalin Shekhar Mangar
Sorry guys, I spoke too fast. I looked at the code again: no, it doesn't correlate with commits at all. I was mistaken. On Wed, Mar 19, 2014 at 10:06 AM, Chris W chris1980@gmail.com wrote: Thanks, Shawn and Shalin How does the frequency of commit affect zookeeper? Thanks On Tue, Mar