Re: Import from S3

2016-11-24 Thread vrindavda
Thanks for the quick response, Aniket. Do I need to make any specific configuration to get data from Amazon S3 storage?

Re: Import from S3

2016-11-24 Thread Aniket Khare
You can use the Solr DataImportHandler (DIH) to index CSV data into Solr. https://wiki.apache.org/solr/DataImportHandler On Fri, Nov 25, 2016 at 12:47 PM, vrindavda wrote: > Hello, > > I have some data in S3, say in text/CSV format, Please provide pointers how > can i ingest this data into
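As a rough illustration of the CSV-to-Solr step, here is a minimal client-side sketch. It is not DIH and does not talk to S3: it assumes the CSV object has already been downloaded as text by some other means, and the sample columns (id, title) are hypothetical. It only turns CSV rows into a JSON list of documents, which is the kind of body Solr's /update handler accepts.

```python
import csv
import io
import json


def csv_to_solr_docs(csv_text):
    """Parse CSV text (header row first) into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample standing in for a CSV object already fetched from S3.
sample = "id,title\n1,First doc\n2,Second doc\n"
docs = csv_to_solr_docs(sample)

# JSON body that could be POSTed to a collection's /update handler.
payload = json.dumps(docs)
print(payload)
```

All values come out as strings here; any type conversion would depend on the schema, which the thread does not describe.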

Import from S3

2016-11-24 Thread vrindavda
Hello, I have some data in S3, say in text/CSV format. Please provide pointers on how I can ingest this data into Solr. Thank you, Vrinda Davda

Re: SOl6.3 Alchemy Annotator Not Working

2016-11-24 Thread soumitra80
I have managed to get it working. Now I am getting an exception for a different annotator. org.apache.solr.common.SolrException: processing error java.lang.NullPointerException. id=C:\IBMASSIGNMENTS\WatsonCognitive\data\WatsonConferenceRooms.pdf, text="null

Re: Search opening hours

2016-11-24 Thread David Smiley
I just saw this conversation now. I didn't read every word but I have to ask immediately: does DateRangeField address your needs? https://cwiki.apache.org/confluence/display/solr/Working+with+Dates It was introduced in 5.0. On Wed, Nov 16, 2016 at 4:59 AM O. Klein wrote: >

Re: Reload schema or configs failed then drop index, can not recreate that index.

2016-11-24 Thread Jerome Yang
Thanks Erick! On Fri, Nov 25, 2016 at 1:38 AM, Erick Erickson wrote: > This is arguably a bug. I raised a JIRA, see: > > https://issues.apache.org/jira/browse/SOLR-9799 > > Managed schema is not necessary to show this problem, generically if > you upload a bad config

Re: Best python 3 client for solrcloud

2016-11-24 Thread Dorian Hoxha
Hi Nick, What I care about most is that the low-level stuff works well (like cloud, retries, zookeeper (I don't think that's needed for normal requests), maybe even routing to the right core/replica?). And your client looked best on an overview. On Thu, Nov 24, 2016 at 10:07 PM, Nick Vasilyev

Re: Best python 3 client for solrcloud

2016-11-24 Thread Nick Vasilyev
I am a committer for https://github.com/moonlitesolutions/SolrClient. I think it's pretty good; my aim with it is to provide several reusable modules for working with Solr in Python. Not just querying, but working with collections, indexing, reindexing, etc. Check it out and let me know what you

Best python 3 client for solrcloud

2016-11-24 Thread Dorian Hoxha
Hi searchers, I see multiple clients for Solr in Python, but each one looks like it is missing many features. What I need is for at least the low-level API to work with cloud (like retries on different nodes and nice exceptions). Which one do you currently use and consider the best? Thank You!

Re: Metadata and Newline Characters at Content

2016-11-24 Thread Erick Erickson
Not sure. What have you tried? For production situations or when you want to take total control of the indexing process, I strongly recommend that you put the Tika parsing on the _client_. Here's a writeup on this topic: https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ Best, Erick

Re: Metadata and Newline Characters at Content

2016-11-24 Thread Furkan KAMACI
Hi Erick, When I check the *Solr* documentation I see that [1]: *In addition to Tika's metadata, Solr adds the following metadata (defined in ExtractingMetadataConstants):* *"stream_name" - The name of the ContentStream as uploaded to Solr. Depending on how the file is uploaded, this may or may

Re: Comparing a Date value in solr

2016-11-24 Thread Erick Erickson
bq: The requirement doesn't really let me use the query like that. Why not? Why can't you index a start date and end date? At ingestion time if your data is a start date and number of days the event (let's call it an event) will run, why not index a second field that contains the end date along
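The suggestion above (derive an end date at ingestion time and index it next to the start date) could be sketched as follows. This is only an illustration under assumptions: the field names start_date, duration_days and end_date are hypothetical, and the dates are assumed to be in Solr's UTC format (YYYY-MM-DDThh:mm:ssZ).

```python
from datetime import datetime, timedelta, timezone

SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def with_end_date(doc, start_field="start_date", days_field="duration_days",
                  end_field="end_date"):
    """Return a copy of doc with an end date computed from start date + days."""
    start = datetime.strptime(doc[start_field], SOLR_DATE_FORMAT)
    start = start.replace(tzinfo=timezone.utc)
    end = start + timedelta(days=doc[days_field])
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[end_field] = end.strftime(SOLR_DATE_FORMAT)
    return enriched


# Hypothetical event: starts on 24 Nov 2016 and runs for 10 days.
event = {"id": "evt-1", "start_date": "2016-11-24T00:00:00Z", "duration_days": 10}
indexed = with_end_date(event)
print(indexed["end_date"])
```

With both fields indexed, a query for events running right now could then be written as two range clauses, for example start_date:[* TO NOW] AND end_date:[NOW TO *], again assuming those field names.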

Re: Wildcard searches with space in TextField/StrField

2016-11-24 Thread Sandeep Khanzode
Hi All, Erick, Please suggest. Would like to use the ComplexPhraseQueryParser for searching text (with wildcard) that may contain special characters. For example ... John* should match John V. Doe; John* should match Johnson Smith; Bruce-Willis* should match Bruce-Willis; V.* should match John V. F.

Re: Reload schema or configs failed then drop index, can not recreate that index.

2016-11-24 Thread Erick Erickson
This is arguably a bug. I raised a JIRA, see: https://issues.apache.org/jira/browse/SOLR-9799 Managed schema is not necessary to show this problem, generically if you upload a bad config by whatever means, then RELOAD/DELETE/correct/CREATE it fails. The steps I outlined in the JIRA force the

Re: Zookeeper version

2016-11-24 Thread Erick Erickson
Well, 3.4.6 gets the most testing, so if you want to upgrade it's at your own risk. See: https://issues.apache.org/jira/browse/SOLR-8724, there are problems with 3.4.8 in the Solr context for instance. There's currently an open Zookeeper JIRA for 3.4.9 that, when fixed, Solr will try to upgrade

Re: Zookeeper version

2016-11-24 Thread Shawn Heisey
On 11/24/2016 3:12 AM, Novin Novin wrote: > I found in solr docs that "Solr currently uses Apache ZooKeeper > v3.4.6". Can I use higher version or I have to use 3.4.6 zookeeper. Solr should be fine working with zookeeper servers running any 3.4.x version. I believe 3.4.9 is the highest stable

Re: Need help to update multiple documents

2016-11-24 Thread Erick Erickson
_What_ issue? You haven't told us what the results are, what if anything the Solr logs show when you try this, in short anything that could help us diagnose the problem. Solr has "atomic updates" that work to update partial documents, but that requires that all fields be stored. Are you trying
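For reference, an atomic update for several documents is typically sent as a JSON list in which each entry carries the document id and a "set" modifier for the field to change. The sketch below only builds such a body; the ids, the field name status and the value are hypothetical, and nothing is sent to Solr.

```python
import json


def atomic_set_updates(ids, field, value):
    """Build one atomic 'set' update per document id."""
    return [{"id": doc_id, field: {"set": value}} for doc_id in ids]


# Hypothetical ids and field; the original question does not show its data.
updates = atomic_set_updates(["doc-1", "doc-2"], "status", "archived")
body = json.dumps(updates)
print(body)
```

As noted above, this only works when the fields are stored, because Solr has to rebuild the whole document from its stored values.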

Re: Metadata and Newline Characters at Content

2016-11-24 Thread Erick Erickson
about PatternCaptureGroupFilterFactory. This isn't going to help. The data you see when you return stored data is _before_ any analysis so the PatternFactory won't be applied. You could do this in a ScriptUpdateProcessorFactory. Or, just don't worry about it and have the real app deal with it.

Re: AW: AW: Resync after restart

2016-11-24 Thread Erick Erickson
Hold on. Are you using SolrCloud or not? There is a lot of talk here about masters and slaves, then you say "I always add slaves with the collection API", collections are a SolrCloud construct. It sounds like you're mixing the two. You should _not_ configure master/slave replication parameters

Re: Metadata and Newline Characters at Content

2016-11-24 Thread Furkan KAMACI
Hi Erick, 1) I am looking stored data via Solr Admin UI. I send the query and check what is in content field. 2) I can debug the Tika settings if you think that this is not the desired behaviour to have such metadata fields combined into content field. *PS: *Is there any solution to get rid of

Re: SolrCloud -Distribued Indexing

2016-11-24 Thread Shawn Heisey
On 11/23/2016 3:43 AM, Udit Tyagi wrote: > I am a solr user, I am using solr-6.3.0 version, I have some doubts for > Distributed indexing and sharding in SolrCloud, please clarify, > > 1. How can I index documents to a specific shard (I heard about > document routing not documentation is not

Re: SOLR vs mongdb

2016-11-24 Thread Shawn Heisey
On 11/23/2016 11:27 AM, Prateek Jain J wrote: > 1. Solr is indexing engine but it stores both data and indexes in same > directory. Although we can select fields to store/persist in solr via > schema.xml. But in nutshell, it's not possible to distinguish between data > and indexes like, I

Re: Metadata and Newline Characters at Content

2016-11-24 Thread Erick Erickson
1> I'm assuming when you "see" this data you're looking at the stored data, right? It's a verbatim copy of whatever you sent to the field. I'm guessing it's a character-encoding mismatch between the source and what you use to display. 2> How are you extracting this data? There are Tika options I

Re: SOLR vs mongdb

2016-11-24 Thread Walter Underwood
Solr is not designed to be a repository, so don’t use it as a repository. If you want to keep the original copy of your data, put it in something designed to do that. It could be a database, it could be files in Amazon S3. wunder Walter Underwood wun...@wunderwood.org

Metadata and Newline Characters at Content

2016-11-24 Thread Furkan KAMACI
Hi, I'm testing Solr 4.9.1 and have indexed documents with it. The content field in the schema has the text_general field type, which is not modified from the original. I do not copy any fields to content. When I check the data I see content values like: " \n \nstream_source_info MARLON BRANDO.rtf

Re: Query parser behavior with AND and negative clause

2016-11-24 Thread Alessandro Benedetti
Hey Sandeep, can you debug the query ( debugQuery=on) and show how the query is parsed ? Cheers On Thu, Nov 24, 2016 at 12:38 PM, Sandeep Khanzode < sandeep_khanz...@yahoo.com.invalid> wrote: > Hi Erick, > The example record contains ...dateRange1 = [2016-11-22T18:00:00Z TO >

RE: Again : Query formulation help

2016-11-24 Thread Prasanna S. Dhakephalkar
:( Thanks Michael. Regards, Prasanna. -Original Message- From: Michael Kuhlmann [mailto:k...@solr.info] Sent: Thursday, November 24, 2016 4:29 PM To: solr-user@lucene.apache.org Subject: Re: Again : Query formulation help Hi Prasanna, there's no such filter out-of-the-box. It's

Re: Query parser behavior with AND and negative clause

2016-11-24 Thread Sandeep Khanzode
Hi Erick, The example record contains ... dateRange1 = [2016-11-22T18:00:00Z TO 2016-11-22T20:00:00Z], [2016-11-22T06:00:00Z TO 2016-11-22T14:00:00Z]; dateRange2 = [2016-11-22T12:00:00Z TO 2016-11-22T14:00:00Z]" The first query works ... which means that it is able to EXCLUDE this record from the

Re: Wildcard searches with space in TextField/StrField

2016-11-24 Thread Sandeep Khanzode
Hi, This is the typical TextField with ...                         SRK On Thursday, November 24, 2016 1:38 AM, Reth RM wrote: what is the fieldType of those records?   On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode

RE: SOLR vs mongdb

2016-11-24 Thread Prateek Jain J
Hi Walter, With the solr support to sharding, is the storage capability still in question? Or we are only talking about features like transaction logs, which can be used to re-build database. Regards, Prateek Jain -Original Message- From: Walter Underwood

Re: AW: AW: Resync after restart

2016-11-24 Thread Arkadi Colson
This is the code from the master node. All configs are the same on all nodes. I always add slaves with the collection API. Is there another place to look for this part of the config? On 24-11-16 12:02, Michael Aleythe, Sternwald wrote: You need to change this on the master node. The part of

Re: Need help to update multiple documents

2016-11-24 Thread GW
I've not looked at your file. If you are really thinking update, there is no such thing. You can only replace the entire document or delete it. On 23 November 2016 at 23:47, Reddy Sankar wrote: > Hi Team , > > > > Facing issue to update multiple document in SOLAR at

Re: Again : Query formulation help

2016-11-24 Thread Michael Kuhlmann
Hi Prasanna, there's no such filter out-of-the-box. It's similar to the mm parameter in (e)dismax parser, but this only works for full text searches on the same fields. So you have to build the query on your own using all possible permutations: fq=(code1: AND code2:) OR (code1: AND
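Building the filter by hand, as described above, could be automated along these lines. This is a sketch under assumptions: the field values are elided in the message, so the value 1001 used for every field is purely a placeholder, and the helper name is made up. Because AND is commutative, unordered pairs (combinations) are enough, which gives 6 clauses for 4 fields.

```python
from itertools import combinations


def min_two_match_fq(field_values):
    """OR together every pair of field:value clauses (at least two must match)."""
    clauses = [
        f"({a}:{field_values[a]} AND {b}:{field_values[b]})"
        for a, b in combinations(field_values, 2)
    ]
    return " OR ".join(clauses)


# Placeholder values; the real codes are not shown in the thread.
wanted = {"code1": "1001", "code2": "1001", "code3": "1001", "code4": "1001"}
fq = min_two_match_fq(wanted)
print(fq)
```

The resulting string can be passed as the fq parameter. The number of clauses grows quickly with the number of fields, so this stays practical only for small field sets like the four here.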

AW: AW: Resync after restart

2016-11-24 Thread Michael Aleythe, Sternwald
You need to change this on the master node. The part of the config you pasted here, looks like it is from the slave node. -Ursprüngliche Nachricht- Von: Arkadi Colson [mailto:ark...@smartbit.be] Gesendet: Donnerstag, 24. November 2016 11:56 An: solr-user@lucene.apache.org Betreff: Re:

Re: AW: Resync after restart

2016-11-24 Thread Arkadi Colson
Hi Michael, thanks for the quick response! The line does not exist in my config. So can I assume that the default configuration is to not replicate at startup? 18.75 05:00:00 15 30 Any other ideas? On 24-11-16 11:49, Michael Aleythe,

AW: Resync after restart

2016-11-24 Thread Michael Aleythe, Sternwald
Hi Arkadi, you need to remove the line "startup" from your ReplicationHandler-config in solrconfig.xml -> https://wiki.apache.org/solr/SolrReplication. Greetings Michael -Ursprüngliche Nachricht- Von: Arkadi Colson [mailto:ark...@smartbit.be] Gesendet: Donnerstag, 24. November 2016

Again : Query formulation help

2016-11-24 Thread Prasanna S. Dhakephalkar
Hi, Need to formulate a distinctive field values query on 4 fields with minimum match on 2 fields I have 4 fields in my core Code 1 : Values between 1001 to Code 2 : Values between 1001 to Code 3 : Values between 1001 to Code 4 : Values between 1001 to I want to

RE: SOLR vs mongdb

2016-11-24 Thread Prateek Jain J
I have used Marklogic for around 6 months but its majorly used for custom ontologies and had serious issues once you start asking for more search results (other than default) in one go. Regards, Prateek Jain -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org]

Zookeeper version

2016-11-24 Thread Novin Novin
Hi Guys, I found in the Solr docs that "Solr currently uses Apache ZooKeeper v3.4.6". Can I use a higher version, or do I have to use ZooKeeper 3.4.6? Thanks in advance, Novin

Resync after restart

2016-11-24 Thread Arkadi Colson
Hi, Almost every time a Solr instance is restarted, the index is replicated completely. Is there a way to avoid this somehow? The index currently has a size of about 17GB. Some advice here would be great. 99% of the config is default: ${solr.ulog.dir:}