HttpSolrServer allows you to send multiple documents at once, but they
need to be extracted/converted on the client. However, if you know you
will be sending a lot of documents to Solr, you are better off running
Tika locally on the client (or as a standalone network server). It's a
lot more performant.
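For what it's worth, a minimal sketch of that client-side approach (the
core URL and the id/text field names are assumptions, not from this thread):

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.InputStream;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.sax.BodyContentHandler;

  public class LocalTikaIndexer {
      public static void main(String[] args) throws Exception {
          HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
          AutoDetectParser parser = new AutoDetectParser();
          List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (File f : new File(args[0]).listFiles()) {
              BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
              Metadata metadata = new Metadata();
              InputStream in = new FileInputStream(f);
              try {
                  parser.parse(in, handler, metadata); // extraction happens here, client-side
              } finally {
                  in.close();
              }
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", f.getName());
              doc.addField("text", handler.toString());
              batch.add(doc);
          }
          server.add(batch);   // one request carries the whole batch
          server.commit();
      }
  }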
Hi David,
Thanks for the quick reply.
As I haven't migrated to 4.7 (I am still using 4.6), I tested using an OR
clause with multiple geofilt-based query clauses and it seems to be working
great. But I have one more question: how do I boost the score of the
matching documents based on geodist? How
Hello,
we have Solr Cloud 4.7, but this question also relates to other
versions, because we have tested this in several installations.
We have a very big index (more than 400K docs) with big documents, but
in our queries we don't fetch the large fields in the fl parameter. But we
have
Thanks Jack,
I understand that atomically updating a single document in a block is
currently not supported.
But atomic updates to a single document do not have to be in conflict
with block joins.
If I got it right from the documentation:
Currently, if a document is atomically updated, Solr finds the
solr-spec 4.6.1
lucene-spec 4.6.0
lux-appserver 1.1.0
tika 1.4
poi 3.9
Hi!
I set it up, pretty much following the instructions at
http://www.codewrecks.com/blog/index.php/2013/05/25/import-folder-of-documents-with-apache-solr-4-0-and-tika/
The problem is that I cannot seem to import custom
Have you tried just using Tika directly and seeing what gets output?
Maybe it is all prefixed somehow. Or try sending one file as a sample
directly to the extract handler and temporarily making the ignored_*
dynamicField stored, to see what actually happens?
Basically, check what is there before trying to
Thanks for the quick reply. I am a bit of a newb when it comes to Solr, Lux and
Tika, so I would appreciate it if you could give me some quick pointers on how
to use/call Tika directly and/or how to send one file directly and store the
dynamic field?
--
Anders Gustafsson
Engineer, CNI, CNE6,
You can just download Tika from the Apache site; it's a separate product
and has a command-line interface.
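For example, something like this (the jar and file names are just
placeholders; -m/--metadata prints the metadata Tika extracted, -t/--text
the body text):

  java -jar tika-app-1.5.jar --metadata mydocument.docx
  java -jar tika-app-1.5.jar --text mydocument.docx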
Or, to use the Solr extract handler, go through the Solr tutorial; it
explains it: https://lucene.apache.org/solr/4_7_0/tutorial.html
Specifically, http://wiki.apache.org/solr/ExtractingRequestHandler and
Thanks again. I already had the Tika jars, but not the command-line one,
so I downloaded 1.5 and ran it against the docx and found:
<meta name="custom:Testmeta" content="Innehåll"/>
So the name is prefixed. Does that mean that I should add it prefixed
in the conf files as well? I.e.:
field
The metadata field names can be all sorts of strange, including spaces and
other odd characters. So, often, there is some issue with mapping.
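As an untested sketch of that mapping via SolrJ (the target field, file
name, and exact source-field spelling are assumptions; with lowernames=true
the Tika name may get lowercased, so check what Tika actually reports first):

  import java.io.File;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
  ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
  req.addFile(new File("sample.docx"),
      "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
  req.setParam("literal.id", "sample-1");
  // Map the prefixed Tika metadata name onto a field the schema knows about:
  req.setParam("fmap.custom:testmeta", "testmeta_s");
  // Anything unmapped can be swallowed by the ignored_* dynamicField:
  req.setParam("uprefix", "ignored_");
  server.request(req);
  server.commit();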
But yes, please add the howto to the Wiki. You will need to get your
account whitelisted first (due to spammers), so send a separate email
with your Apache wiki
Yes, please. My Wiki ID is Anders Gustafsson
But yes, please add the howto to the Wiki. You will need to get your
account whitelisted first (due to spammers), so send a separate email
with your Apache wiki id and somebody will unlock you for editing.
--
Anders Gustafsson
Engineer, CNI, CNE6,
Hi,
My Solr instances are configured with a 10GB heap (Xmx) but Linux shows
a resident size of 16-20GB. Even with thread stacks and permgen taken into
account I'm still far off from these numbers. Could it be that JVM IO
buffers take up so much space? Does Lucene use JNI/JNA memory allocations?
How large is your index on disk? Solr memory-maps the index files, so
the virtual memory used will often be quite large. Your numbers don't
sound inconceivable.
A good reference point is Grant Ingersoll's blog post on searchhub:
That's a reasonable request and worth a Jira, but different from what you
have specified in your subject line: to re-index a single document, the
entire block needs to be re-indexed.
I suppose people might want a block atomic update - where multiple child
documents as well as the parent
On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote:
My Solr instances are configured with a 10GB heap (Xmx) but Linux shows
a resident size of 16-20GB. Even with thread stacks and permgen taken into
account I'm still far off from these numbers. Could it be that JVM IO
buffers take up so much space? Does
Hi David
If you use lazy field loading (enableLazyFieldLoading=true), the
documentCache functionality is somewhat limited. This means that a
document stored in the documentCache will contain only those fields
that were passed in the fl parameter.
The documentCache requires memory, the
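A tiny illustration of that interplay (a sketch; the core URL and field
are assumptions, and it presumes enableLazyFieldLoading=true in
solrconfig.xml):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
  SolrQuery q = new SolrQuery("*:*");
  // With lazy loading, only "id" is materialized in the cached document;
  // the large stored fields are left on disk until someone asks for them.
  q.setFields("id");
  QueryResponse rsp = server.query(q);
  System.out.println(rsp.getResults().getNumFound());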
All,
Is there a rule of thumb for ideal block size for SSDs for large indexes
(in hundreds of GBs)? Read performance is of top importance for us and we
can sacrifice the space a little...
This is the one we just got and wanted to see if there are any test results
out there
Hi Miguel,
yes, but if I use enableLazyFieldLoading=true and my queries only request
very small fields like ID, the documentCache shouldn't grow, although my
stored fields are very big. Am I wrong?
Best regards,
David Dávila Atienza
AEAT - Departamento de Informática Tributaria
Subdirección
On 3/18/2014 7:12 AM, Salman Akram wrote:
Is there a rule of thumb for ideal block size for SSDs for large indexes
(in hundreds of GBs)? Read performance is of top importance for us and we
can sacrifice the space a little...
This is the one we just got and wanted to see if there are any test
This SSD's default block size seems to be 4K, not 16K (as can be seen below).
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment: 1024
I will go through the articles you sent. Thanks
On Tue, Mar 18, 2014
Hi all,
I have a field that contains dates (it is of date type) and I would like
to make a hierarchical (pivot) facet based on that field.
So I would like to have something like this:
date_of_creation:
|__2014
|  |__January
|  |  |_01
|  |  |_02
|  |  |_14
|
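One common workaround, as a sketch (facet.pivot works on discrete fields,
so the year_i/month_i/day_i field names below assume you index the date
parts into separate fields at index time):

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery q = new SolrQuery("*:*");
  q.setFacet(true);
  // Pivot over the pre-split date parts to get the year -> month -> day tree:
  q.set("facet.pivot", "year_i,month_i,day_i");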
On 3/18/2014 7:18 AM, david.dav...@correo.aeat.es wrote:
yes, but if I use enableLazyFieldLoading=true and my queries only request
very small fields like ID, the documentCache shouldn't grow, although my
stored fields are very big. Am I wrong?
Since Solr 4.1, stored fields are compressed.
On 3/18/2014 7:39 AM, Salman Akram wrote:
This SSD's default block size seems to be 4K, not 16K (as can be seen below).
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment: 1024
The *sector* size on a
Done, thanks!
On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson
anders.gustafs...@pedago.fi wrote:
Yes, please. My Wiki ID is Anders Gustafsson
But yes, please add the howto to the Wiki. You will need to get your
account whitelisted first (due to spammers), so send a separate email
with your
Martin, I’ve committed the SOLR-5875 fix, including to the
lucene_solr_4_7 branch.
Any chance you could test the fix?
Hi Steve,
I'm very happy you found the bug. We are running the version from SVN
on one server and it's already been running fine for 5 hours. If it's still
stable tomorrow then
Avishai:
It sounds like you already understand mmap. Even so you might be
interested in this excellent writeup of MMapDirectory and Lucene by
Uwe: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
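For the curious, this is roughly what is going on under the hood (a
sketch; the path is a placeholder):

  import java.io.File;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  // On a 64-bit JVM this typically returns an MMapDirectory: the index files
  // are memory-mapped, so they show up in the process's virtual/resident
  // numbers without consuming Java heap (-Xmx) space.
  Directory dir = FSDirectory.open(new File("/path/to/index"));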
Best,
Erick
On Tue, Mar 18, 2014 at 7:23 AM, Avishai Ish-Shalom
On 3/18/2014 8:37 AM, Erick Erickson wrote:
It sounds like you already understand mmap. Even so you might be
interested in this excellent writeup of MMapDirectory and Lucene by
Uwe: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
There is some actual bad memory
On 18.03.2014 15:26, Martin de Vries wrote:
Martin, I’ve committed the SOLR-5875 fix, including to the
lucene_solr_4_7 branch.
Any chance you could test the fix?
Hi Steve,
I'm very happy you found the bug. We are running the version from SVN on
one server and it's already been running fine for
I disabled softCommit and tried to run another indexing process.
Now I see no Jetty EofException and no latency peaks.
I also noticed that when I had softCommit every 10 minutes, I also saw
spikes in the major GC (I use CMS) to around 9-10k.
Any idea?
Shawn Heisey-4 wrote
On 3/17/2014 7:07
Thanks,
Indeed, the subject line was misleading.
Then I will file a new improvement request for block atomic update
support.
On Tue, Mar 18, 2014 at 2:08 PM, Jack Krupansky j...@basetechnology.comwrote:
That's a reasonable request and worth a Jira, but different from what you
have specified
Thanks Joel - I decided on another route - I was almost always grouping, so
I am trying another model where we will store the data with fewer rows and a
few multivalued fields.
I wonder whether this is a known bug. In previous Solr Cloud versions, 4.4
or maybe 4.5, an explicit optimize() without any parameters usually
took 2 minutes on a 32-core cluster.
However, in 4.6.1, the same call took about 1 hour. Checking the index
modification time for each core shows 2
Varun,
You could use a function query involving “min” with a comma-separated list
of geodist clauses.
See https://cwiki.apache.org/confluence/display/solr/Spatial+Search
“Boost Nearest Results”. You’d replace the geodist() in there with
min(geodist(45.15,-93.85),geodist(50.2,22.3),…) (etc.)
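Untested, but in SolrJ terms it would look something like this (the points,
sfield, and recip constants are placeholders to tune):

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery q = new SolrQuery("*:*");
  q.set("defType", "edismax");
  q.set("sfield", "store");   // your location field
  // The nearer of the listed points gives a smaller distance, hence a larger boost:
  q.set("boost", "recip(min(geodist(45.15,-93.85),geodist(50.2,22.3)),1,1000,1000)");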
Hi Shamik,
I see that you are using distributed search. With the
CollapsingQParserPlugin you need to have all the documents that are in the
same group on the same shard.
Is that the way you have the documents indexed?
Joel
Joel Bernstein
Search Engineer at Heliosearch
On Mon, Mar 17, 2014 at
Hi,
I installed the latest version of Solr (4.7.0) and I want to try the new
functionality to edit config files in the Admin UI. But when I click on a
file, no edit box appears!
This is the info on my version:
Versions
* solr-spec: 4.7.0
* solr-impl: 4.7.0 1570806 - simon - 2014-02-22 08:36:23
*
Salman Akram [salman.ak...@northbaysolutions.net] wrote:
[Hundreds of GB index]
http://www.storagereview.com/micron_p420m_enterprise_pcie_ssd_review
May I ask why you have chosen a drive with such a high speed and matching cost?
We have some years of experience with using SSDs for search at
Hi Francois,
The config file editing functionality was pulled out of Solr before the 4.7
release; what remains is a read-only config directory browser/file viewer.
May I ask why you thought the config file editing functionality was in 4.7?
Steve
On Mar 18, 2014, at 4:39 PM, Francois Perron
Hello all!
I have a strange issue with my local Solr install.
I have a search that sorts on a boolean field. This search is producing the
following error: java.lang.String cannot be cast to
org.apache.lucene.util.BytesRef.
The search is over the dummy data that is included in the exampledocs. I
Joel,
I had a discussion with you earlier about inconsistent ngroups numbers,
where you suggested using the composite id to make sure that documents with
identical (ADSKDedup) field values are available in the same shard.
Here's the thread --
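For reference, a sketch of what that composite-id indexing looks like (the
key values here are invented for illustration):

  import org.apache.solr.common.SolrInputDocument;

  SolrInputDocument doc = new SolrInputDocument();
  String dedupKey = "a1b2c3";   // hypothetical ADSKDedup value
  String docId = "doc-42";      // hypothetical unique suffix
  // With the compositeId router, every document sharing the prefix before "!"
  // is routed to the same shard, which CollapsingQParserPlugin relies on.
  doc.addField("id", dedupKey + "!" + docId);
  doc.addField("ADSKDedup", dedupKey);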
On 3/18/2014 3:51 PM, AJ Lemke wrote:
I have a strange issue with my local Solr install.
I have a search that sorts on a boolean field. This search is producing the following
error: java.lang.String cannot be cast to org.apache.lucene.util.BytesRef.
The search is over the dummy data that is
Hi Steve,
This feature makes sense for us because we don't have write access in production.
Anyway, I'll write a script to push config file updates directly to ZooKeeper and
reload the collection. But it's always simpler when it's already integrated
into an admin tool.
Thank you for your time.
Did you change the schema at all?
No
Did you upgrade Solr from a previous version with the same index?
No
This was a fresh install from the website.
Ran ant run-example
Killed that instance
Copied Example to Node1
Copied Example to Node2
Switched into Node1
java
Hi all,
I have some questions re shards.tolerant=true and timeAllowed=xxx.
I have seen situations where shards.tolerant=true works: if one of the
shards specified in a query is dead, shards.tolerant seems to work and I get
results from the non-dead shards.
However, if one of the shards goes down
I am running a 3-node ZooKeeper 3.4.5 quorum. I am running into issues
with ZooKeeper transaction logs:
[myid:2] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Unreasonable length = 1048587
at
On 3/18/2014 5:46 PM, Chris W wrote:
I am running a 3-node ZooKeeper 3.4.5 quorum. I am running into issues
with ZooKeeper transaction logs:
[myid:2] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Unreasonable length = 1048587
at
Hi Lajos,
Can you elaborate on the "get the overflow when using a text field" part?
The new SuggestComponent should work just as well for DocumentDictionary.
Thanks
Areek
On Mon, Mar 17, 2014 at 6:05 PM, Lajos la...@protulae.com wrote:
Hi Steve,
I've posted previously about a nice
Hi Lajos,
This can be due to the heavy query-time processing chain associated with
the TextField. You can also check out AnalyzingInfixLookupFactory if the
suggestion entries are a bit long (this suggester will give matches even
if the query matches a term in the middle of a suggestion entry).
Hi Solr Users,
I'm looking for advice on best practices when indexing large documents
(hundreds of MB or even 1-2 GB text files). I've been hunting around on
Google and the mailing list, and have found some suggestions of splitting
the logical document up into multiple Solr documents. However, I
Hi,
I think you probably want to split giant documents because you / your users
probably want to be able to find the smaller sections of those big docs that
best match their queries. Imagine querying War and Peace: almost any regular
word you query for will produce a match. Yes, you may
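A bare-bones sketch of that splitting idea (chunk size, field names, and
the id scheme are all invented for illustration):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.common.SolrInputDocument;

  public class DocSplitter {
      // Split one huge logical document into chunk-sized Solr documents that
      // share a parent_id, so a hit points at a section instead of the whole file.
      public static List<SolrInputDocument> split(String parentId, String text) {
          final int CHUNK = 50000; // characters per chunk; tune for your data
          List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
          for (int i = 0, part = 0; i < text.length(); i += CHUNK, part++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", parentId + "_part" + part);
              doc.addField("parent_id", parentId);
              doc.addField("text", text.substring(i, Math.min(i + CHUNK, text.length())));
              docs.add(doc);
          }
          return docs;
      }
  }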
SolrCloud will update ZooKeeper on state changes (a node goes into
recovery, comes back up, etc.), for leader election, and during
collection API commands. It doesn't correlate directly with indexing
but is correlated with how frequently you call commit.
On Wed, Mar 19, 2014 at 5:46 AM, Shawn Heisey
Thanks, Shawn and Shalin.
How does the frequency of commit affect ZooKeeper?
Thanks
On Tue, Mar 18, 2014 at 9:12 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
SolrCloud will update ZooKeeper on state changes (a node goes into
recovery, comes back up, etc.), for leader election, and
Shalin, regarding "correlated with how frequently you call commit": is that
soft commit or hard commit? I guess it should be the latter.
Just curious what data it updates in ZooKeeper during a commit.
On Tue, Mar 18, 2014 at 9:12 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
SolrCloud will update
Sorry guys, I spoke too fast. I looked at the code again: no, it doesn't
correlate with commits at all. I was mistaken.
On Wed, Mar 19, 2014 at 10:06 AM, Chris W chris1980@gmail.com wrote:
Thanks, Shawn and Shalin.
How does the frequency of commit affect ZooKeeper?
Thanks
On Tue, Mar