Okay. I would do that too!
Thanks for sharing the link; I am already doing that. Basically, I am parsing
the documents to handle the content that I need to store in the fields, and
then indexing them to my core on the Solr server.
With the code below, I am able to deal with the Word & Excel file formats.
Your question is more suitable for the Tika mailing list - it is better if you
ask there. You should share more of the code you are currently using.
Here is the documentation on how to get a different output format:
https://tika.apache.org/1.8/examples.html#Parsing_using_the_Auto-Detect_Parser
I had already spent a lot of time reading about this on the internet; only
after finishing all the trials and solutions did I post my query here.
I know the time zones are different and you people are busy; I totally
understand that & highly appreciate your efforts!
Regarding my file for
People here are in different time zones and have their normal jobs, for which
they are actually paid; they answer questions like the ones below in their
spare time. There are also a wide number of resources out on the Internet.
It also cannot hurt to read more about the formats that you are processing,
and there are known performance issues in computing very large clusters.
Give it a try with the following rules:
"FOO_CUSTOMER": [
  {
    "replica": "0",
    "sysprop.HELM_CHART": "!FOO_CUSTOMER",
    "strict": "true"
  },
  {
    "replica": "<2",
    "node": "#ANY",
    "strict": "false"
  }
]
Hook up a profiler to the overseer and see what it's doing, file a JIRA and
note the hotspots or what methods appear to be hanging out.
On Tue, Sep 3, 2019 at 1:15 PM Andrew Kettmann
wrote:
>
> > You’re going to want to start by having more than 3gb for memory in my
> opinion but the rest of you
Guys, could I get any help? Or is it useless to post queries over here?
On Sep 3, 2019 4:00 PM, "Khare, Kushal (MIND)"
wrote:
Hello, mates!
I am extracting content from my documents using Apache Tika.
I need to exclude the headers & footers of the documents. I have already done
this for Word
On 9/3/2019 4:46 PM, Russell Bahr wrote:
Hi Shawn,
Here is a screenshot of one of the master nodes
solr4
Screen Shot 2019-09-03 at 3.37.08 PM.png
solr8
Screen Shot 2019-09-03 at 3.45.46 PM.png
Email attachments do not make it to the list. I cannot see those
pictures. You will need to use a
This really sounds like an XY problem. What do you need the SolrClient _for_? I
suspect there’s an easier way to do this…
Best,
Erick
> On Sep 3, 2019, at 6:17 PM, Arnold Bronley wrote:
>
> Hi,
>
> Is there a way to create SolrClient from inside processAdd function for
> custom update proce
Hi Shawn,
Here is a screenshot of one of the master nodes
solr4
[image: Screen Shot 2019-09-03 at 3.37.08 PM.png]
solr8
[image: Screen Shot 2019-09-03 at 3.45.46 PM.png]
*Manzama* a MODERN GOVERNANCE company
Russell Bahr
Lead Infrastructure Engineer
USA & CAN Office: +1 (541) 306 3271
USA & CAN
Hi,
Is there a way to create SolrClient from inside processAdd function for
custom update processor for the same Solr on which it is executing?
On 9/3/2019 1:22 PM, Russell Bahr wrote:
Yes, some of our queries are quite complex due to a lot of very specific
positive as well as negative boosts, however, the query that I ran as the
base test after we found our queries were taking so long is just "
http://solr.obscured.com:8990/solr/content
What about combining:
1) KeywordRepeatFilterFactory
2) An existing folding filter (need to check it ignores Keyword marked word)
3) RemoveDuplicatesTokenFilterFactory
That may give what you are after without custom coding.
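As a starting point, here is a minimal sketch of such an analyzer chain, assuming ASCIIFoldingFilterFactory as the existing folding filter; as noted above, whether the folding filter skips Keyword-marked tokens still needs checking, and the fieldType name is illustrative:

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 1) emit each token twice: one copy marked as Keyword, one for folding -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- 2) the folding filter; verify it ignores Keyword-marked tokens -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- 3) drop the duplicate when folding changed nothing -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

This is the same pattern commonly used with KeywordRepeatFilterFactory ahead of a stemmer, just with a folding filter in the middle.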
Regards,
Alex.
On Tue, 3 Sep 2019 at 16:14, Audrey Lorberfeld -
audrey
Toke,
Thank you! That makes a lot of sense.
In other news -- we just had a meeting where we decided to try out a hybrid
strategy. I'd love to know what you & everyone else thinks...
- Since we are concerned with the overhead created by "double-fielding" all
tokens per language (because I'm not
Hi Toke,
Also, if it helps, the content on each server is between around 6.2GB and
7.8GB.
Thanks,
Russ
*Manzama* a MODERN GOVERNANCE company
Russell Bahr
Lead Infrastructure Engineer
USA & CAN Office: +1 (541) 306 3271
USA & CAN Support: +1 (541) 706 9393
UK Office & Support: +44 (0)203 282 16
Hi Toke,
Yes, some of our queries are quite complex due to a lot of very specific
positive as well as negative boosts, however, the query that I ran as the
base test after we found our queries were taking so long is just "
http://solr.obscured.com:8990/solr/content/select?q=*%3A*&wt=json&indent=tr
On 9/3/2019 11:47 AM, dev beautiful wrote:
I want to subscribe solr mailing list.
When I sent a request, I got the following message.
Can you add this email address to the mailing list please?
Thank you.
Louis Choi
---
This is the mail system at host n3.nabble.com.
Nabble is a website th
Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> Do you find that searching over both the original title field and the
> normalized title
> field increases the time it takes for your search engine to retrieve results?
It is not something we have measured as that index is fast enough (which
Hello,
I want to subscribe solr mailing list.
When I sent a request, I got the following message.
Can you add this email address to the mailing list please?
Thank you.
Louis Choi
---
This is the mail system at host n3.nabble.com.
I'm sorry to have to inform you that your message could not
ResourceLoader worked brilliantly - my brain, on the other hand, not so much
--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> You’re going to want to start by having more than 3gb for memory in my
> opinion but the rest of your set up is more complex than I’ve dealt with.
right now the overseer is set to a max heap of 3GB, but is only using ~260MB of
heap, so memory doesn't seem to be the issue unless there is a par
Russell Bahr wrote:
> approximately 18 million documents
> *:* query across 10 times returning
> [13234, 18714, 13384, 12966, 12192, 18420, 16592, 15691, 13373, 12458]
>vs
> [93359, 94263, 86949, 90747, 91171, 91588, 87921, 88632, 88035, 89137]
Even the 12-18 seconds for Solr 4 is a long time, so
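For reference, averaging the ten QTimes quoted above (values in milliseconds, copied from the post) quantifies the gap:

```python
# QTimes in ms for the same *:* query, copied from the numbers above
solr4 = [13234, 18714, 13384, 12966, 12192, 18420, 16592, 15691, 13373, 12458]
solr8 = [93359, 94263, 86949, 90747, 91171, 91588, 87921, 88632, 88035, 89137]

avg4 = sum(solr4) / len(solr4) / 1000  # average in seconds
avg8 = sum(solr8) / len(solr8) / 1000

print(f"solr4 avg: {avg4:.1f}s, solr8 avg: {avg8:.1f}s, ratio: {avg8 / avg4:.1f}x")
# → solr4 avg: 14.7s, solr8 avg: 90.2s, ratio: 6.1x
```

So Solr 8 is averaging roughly six times slower on the identical query, on top of the Solr 4 baseline already being seconds long.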
Hi,
I am trying to replace our Solr 4 cluster with a Solr 8.1.1 cluster and am
running into a problem where searches are taking way too long to respond.
The clusters are set up with the same number of servers, same number of
shards, and same number of replicas. They are indexing the same documents,
Hello Solr Community!
*Problem*: I wish to know if the result document matched all the terms in
the query. The ranking used in Solr works most of the time. For some cases
where one of the terms is rare and occurs in a couple of fields, such
documents trump a document which matches all the terms. Idea
Thanks Erick,
The ulimit in all three nodes is more than 65k, including the max process
list. If you look at the timestamps, the core-down error happened ahead of the
unable-to-create-thread error; also, the core-down error took place on node1
and the unable-to-create-thread error took place on node3.
BTW we are r
You’re going to want to start by having more than 3gb for memory in my opinion
but the rest of your set up is more complex than I’ve dealt with.
On Sep 3, 2019, at 1:10 PM, Andrew Kettmann
wrote:
>> How many zookeepers do you have? How many collections? What is their size?
>> How much CPU / m
> How many zookeepers do you have? How many collections? What is their size?
> How much CPU / memory do you give per container? How much heap in comparison
> to total memory of the container?
3 Zookeepers.
733 containers/nodes
735 total cores. Each core ranges from ~4-10GB of index. (Autoscaling
I'm working on a custom tokenizer (Solr 7.3.0) whose Factory needs to read a
configuration file.
I have been able to run it successfully locally, reading from a local
directory.
I would like to be able to have the configuration read from zookeeper
(similarly to how SynonymGraphFilterFactory re
How many zookeepers do you have? How many collections? What is their size?
How much CPU / memory do you give per container? How much heap in comparison
to total memory of the container?
> On 03.09.2019 at 17:49, Andrew Kettmann wrote:
>
> Currently our 7.7.2 cluster has ~600 hosts and each co
Currently our 7.7.2 cluster has ~600 hosts and each collection is using an
autoscaling policy based on system property. Our goal is a single core per host
(container, running on K8S). However as we have rolled more
containers/collections into the cluster any creation/move actions are taking a
h
If you have a properly secured cluster, e.g. with Kerberos, then you should not
update files in ZK directly. Use the corresponding Solr REST interfaces; then
you are also less likely to mess something up.
If you want to have HA you should have at least 3 Solr nodes and replicate the
collection to all t
Shankar:
Two things:
1> please do not hijack threads
2> Follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You must use
the _exact_ same e-mail as you used to subscribe.
If the initial try doesn't work and following the suggestions at the "problems"
The “unable to create new thread” is where I’d focus first. It means you’re
running out of some system resources and it’s quite possible that your other
problems are arising from that root cause.
What are your “ulimit” settings? The number of file handles and processes
should be set to 65k at le
Having custom core.properties files is “fraught”. First of all, that file can
be re-written. Second, the collections ADDREPLICA command will create a new
core.properties file. Third, any mistakes you make when hand-editing the file
can have grave consequences.
What change exactly do you want to
On 9/3/2019 7:22 AM, Porritt, Ian wrote:
We have a schema which I have managed to upload to Zookeeper along with
the Solrconfig, how do I get the system to recognise both a lib/.jar
extension and a custom core.properties file? I bypassed the issue of the
core.properties by amending the update.a
Hi,
We are using a 3-node Solr (7.0.1) cloud setup with a 1-node ZooKeeper
ensemble. Each system has 16 CPUs, 90GB RAM (14GB heap), and 130 cores (3 NRT
replicas) with index sizes ranging from 700MB to 20GB.
autoCommit - 10 minutes once
softCommit - 30 Sec Once
We are facing the following problems in recent ti
Yeah, it beats me. If you've made sure that the security.json in
ZooKeeper is exactly the same as the one I posted but you're still
getting different results, then I'm stumped. Maybe someone else here
has an idea.
Out of curiosity, are you setting your security.json via the
authentication/author
Toke,
Do you find that searching over both the original title field and the
normalized title field increases the time it takes for your search engine to
retrieve results?
--
Audrey Lorberfeld
Data Scientist, w3 Search
Digital Workplace Engineering
CIO, Finance and Operations
IBM
audrey.lorberf
Languages are the best. Thank you all so much!
--
Audrey Lorberfeld
Data Scientist, w3 Search
Digital Workplace Engineering
CIO, Finance and Operations
IBM
audrey.lorberf...@ibm.com
On 8/30/19, 4:09 PM, "Walter Underwood" wrote:
The right transliteration for accents is language-dependen
Thank you, Erick!
--
Audrey Lorberfeld
Data Scientist, w3 Search
Digital Workplace Engineering
CIO, Finance and Operations
IBM
audrey.lorberf...@ibm.com
On 8/30/19, 3:49 PM, "Erick Erickson" wrote:
It Depends (tm). In this case on how sophisticated/precise your users are.
If your users
Hi,
I am relatively new to Solr, especially Solr Cloud, and have been using it for
a few days now. I think I have set up Solr Cloud correctly; however, I would
like some guidance to ensure I am doing it correctly. I ideally want to be able
to process 40 million documents on production via Solr Cloud.
Tracked in https://issues.apache.org/jira/browse/SOLR-13735; patches are
welcome.
On Mon, Sep 2, 2019 at 12:39 PM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:
> Timeout causes DIH to finish with an error message. So, if I check the DIH
> response to be sure
> that the DIH session has finished wit
Hi all,
If you're in town for Activate next week, we're running another free
Lucene Hackday on Tuesday:
https://www.meetup.com/Apache-Lucene-Solr-London-User-Group/events/263993681/
- do come along if you can! It's only a block and a half from the
Activate venue.
Cheers
Charlie
--
Charlie
Hi Jason,
Apologies for the late reply. My laptop was broken and I got it back from the
service centre today.
I am still having issues with the solr user being able to view the Collections
list, as follows.
Testing permissions for user [solr]
Request [/admin/collections?action=LIST] returned status [200]
Req
Hi Jörn,
I am not supplying the name in the update chain. I am not sure pysolr
supports it yet:
    def __init__(
        self,
        url,
        decoder=None,
        timeout=60,
        results_cls=Results,
        search_handler="select",
        use_qt_param=False,
        always_commit=False,
        auth=None,
        verify=True,
    ):
How can I define it as default?
Ch
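If pysolr cannot pass extra request parameters, one option is to make the chain the default in solrconfig.xml so no parameter is needed. A sketch under assumptions: the chain name "parse-date" and the date format are illustrative, taken from the pdate example in this thread:

```xml
<updateRequestProcessorChain name="parse-date" default="true">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With default="true", every update request that does not name a chain explicitly runs through this one.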
How do you send the request? You need to specify the update.chain parameter
with the name of the update chain, or define it as the default.
> On 03.09.2019 at 12:14, Arturas Mazeika wrote:
>
> Hi Solr Fans,
>
> I am trying to figure out how to use the parse-date processor for pdates.
>
> I am abl
Hello, mates!
I am extracting content from my documents using Apache Tika.
I need to exclude the headers & footers of the documents. I have already done
this for the Word & Excel formats using OfficeParserConfig, but need to
implement the same for PPT & PDF.
How can I achieve that?
Hi Solr Fans,
I am trying to figure out how to use the parse-date processor for pdates.
I am able to insert data into a Solr collection/core with this Python code:

    solr = pysolr.Solr('http://localhost:/solr/core1', timeout=10)
    solr.add([
        {
            "t": '2017-08-19T21:00:42.043Z',
        }
    ])
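Solr pdate fields expect ISO-8601 UTC instants like the value above; a small sketch of producing that exact string from a Python datetime (millisecond precision, literal 'Z' suffix):

```python
from datetime import datetime, timezone

# Build the same instant as in the add() call above
dt = datetime(2017, 8, 19, 21, 0, 42, 43000, tzinfo=timezone.utc)

# %f gives microseconds; trim to milliseconds and append 'Z' for UTC
solr_date = dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
print(solr_date)  # → 2017-08-19T21:00:42.043Z
```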
Please remove my email id from this list.
On Tue, 3 Sep, 2019, 11:06 AM Akreeti Agarwal, wrote:
> Hello,
>
> Please help me with the solution for the error below.
>
> Memory details of the slave server:
>              total       used       free     shared    buffers     cached
> Mem:         15947      15460