Re: Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Michael Della Bitta
Are you sure the requests are getting queued because the LB is detecting that Solr won't handle them? I'm asking because I know that ELB doesn't handle bursts well. The load balancer needs to warm up, which essentially means it might be underpowered at the beginning of a burst. It

Re: Applying Tokenizers and Filters to CopyFields

2015-03-26 Thread Michael Della Bitta
Glad you are sorted out! Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b

Re: Applying Tokenizers and Filters to CopyFields

2015-03-25 Thread Michael Della Bitta
happens at query time. Not sure if that's significant for you.

Re: Applying Tokenizers and Filters to CopyFields

2015-03-25 Thread Michael Della Bitta
is the query supposed to retrieve the lower-case version? (Sorry if this sounds like a naive question, but I have a feeling that I am missing something really basic here.)

Re: Solr and HDFS configuration

2015-03-24 Thread Michael Della Bitta
the most. Solr on HDFS currently doesn't have any sort of rack locality like there is with, say, HBase colocated on the HDFS nodes. So you can expect that even with Solr installed on the same nodes as your HDFS datanodes, there will be remote IO.

Re: 8 Shards of Cloud with 4.10.3.

2015-02-24 Thread Michael Della Bitta
time, but on the other hand, you don't have to maintain a Zookeeper ensemble or devote brain cells to understanding collections/shards/etc.

Re: 8 Shards of Cloud with 4.10.3.

2015-02-24 Thread Michael Della Bitta
Benson: Are you trying to run independent invocations of Solr for every node? Otherwise, you'd just want to create an 8-shard collection with maxShardsPerNode set to 8 (or more, I guess).

Re: incorrect Java version reported in solr dashboard

2015-02-23 Thread Michael Della Bitta
You're probably launching Solr using the older version of Java somehow. You should make sure your PATH and JAVA_HOME variables point at your Java 8 install from the point of view of the script or configuration that launches Solr. Hope that helps.

Re: ignoring bad documents during index

2015-02-20 Thread Michael Della Bitta
At the layer right before you send that XML out, have it fall back to sending each document one at a time if the batch fails.
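A minimal sketch of that batch-then-fallback idea, assuming a hypothetical DocumentSink interface that stands in for whatever actually posts to Solr (a SolrJ client or a hand-rolled HTTP call); DocumentSink and BatchWithFallback are illustrative names, not Solr or SolrJ APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the code that actually sends documents to Solr.
interface DocumentSink<D> {
    void send(List<D> batch) throws Exception;
}

final class BatchWithFallback {
    private BatchWithFallback() {}

    /**
     * Sends the whole batch in one request; if that fails, retries each document
     * on its own so one bad document does not keep the rest out of the index.
     * Returns the documents that still failed when sent individually.
     */
    static <D> List<D> sendWithFallback(DocumentSink<D> sink, List<D> batch) {
        List<D> rejected = new ArrayList<>();
        try {
            sink.send(batch);
            return rejected; // fast path: the whole batch was accepted
        } catch (Exception batchFailure) {
            for (D doc : batch) {
                try {
                    sink.send(List.of(doc));
                } catch (Exception docFailure) {
                    rejected.add(doc); // log or dead-letter these in a real pipeline
                }
            }
        }
        return rejected;
    }
}
```

The happy path still costs a single request; only a failed batch pays for the per-document retries.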

Re: Solr Logging files get high

2015-02-03 Thread Michael Della Bitta
If you're trying to do a bulk ingest of data, I recommend committing less frequently. Don't soft commit at all until the end of the batch, and hard commit every 60 seconds.

Re: Solr Logging files get high

2015-02-02 Thread Michael Della Bitta
If you'd like to reduce the number of lines Solr logs, you need to edit the file example/resources/log4j.properties in Solr's home directory. Change lines that say INFO to WARN.

Re: Solr Logging files get high

2015-02-02 Thread Michael Della Bitta
Good call, it could easily be the tlog Nitin is talking about. As for which definition of high, I was making assumptions as well. :)

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-14 Thread Michael Della Bitta
Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly

Re: Solr limiting number of rows to indexed to 21500 every time.

2015-01-13 Thread Michael Della Bitta
if there are any errors on the Oracle side?

Re: solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Michael Della Bitta
Another way of doing it is by setting the -Dhost=$hostname parameter when you start Solr.

Re: Running Multiple Solr Instances

2015-01-06 Thread Michael Della Bitta
I would do one of the following: 1. Set a different Solr home for each instance. I'd use the -Dsolr.solr.home=/d/2 command line switch when launching Solr to do so. 2. RAID 10 the drives. If you expect the Solr instances to get uneven traffic, pooling the drives will allow a given Solr instance to

Re: solrcloud without faceting, i.e. for failover only

2015-01-06 Thread Michael Della Bitta
The downsides that come to mind: 1. Every write gets amplified by the number of nodes in the cloud. 1000 write requests end up creating 1000*N HTTP calls as the leader forwards those writes individually to all of the followers in the cloud. Contrast that with classical replication where only

Re: .htaccess / password

2015-01-06 Thread Michael Della Bitta
The Jetty servlet container that Solr uses doesn't understand those files. It would not use them to determine access, and would likely make them accessible to web requests in plain text. On 1/6/15 16:01, Craig Hoffman wrote: Thanks Otis. Do you think a .htaccess / .passwd file in the Solr admin

Re: Endless 100% CPU usage on searcherExecutor thread

2014-12-18 Thread Michael Della Bitta
I've been experiencing this problem. Running VisualVM on my instances shows that they spend a lot of time creating WeakReferences (org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference that is). I think what's happening here is the heap's not big enough for Lucene's caches and it ends up

Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

2014-12-12 Thread Michael Della Bitta
, Michael Della Bitta wrote: Only thing you have to worry about (in both the CUSS and the home grown case) is a single bad document in a batch fails the whole batch. It's up to you to fall back to writing them individually so the rest of the batch makes it in. With CUSS, your program will never

Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

2014-12-11 Thread Michael Della Bitta
Tom: ConcurrentUpdateSolrServer isn't magic or anything. You could pretty trivially write something that takes batches of your XML documents and combines them into a single document (multiple doc tags in the add section) and sends them up to Solr and achieve some of the same speed benefits.
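A rough sketch of the "combine many documents into one add" step, assuming Solr's XML update format of <add><doc><field name="...">value</field></doc></add>; XmlBatchBuilder is a hypothetical helper, and actually posting the resulting string to the /update handler is left out.

```java
import java.util.List;
import java.util.Map;

final class XmlBatchBuilder {
    private XmlBatchBuilder() {}

    // Escapes the five XML special characters for text content and attribute values.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Combines many documents (field name -> value) into a single <add> message,
    // one <doc> element per document, so they travel in one HTTP request.
    static String toAddMessage(List<Map<String, String>> docs) {
        StringBuilder xml = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            xml.append("<doc>");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
                   .append(escape(field.getValue())).append("</field>");
            }
            xml.append("</doc>");
        }
        return xml.append("</add>").toString();
    }
}
```

Multi-valued fields would simply repeat the <field> element; this sketch keeps one value per field for brevity.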

Re: Question on Solr Caching

2014-12-04 Thread Michael Della Bitta
Hi, Manohar, 1. Does posting-list and term-list of the index reside in the memory? If not, how to load this to memory. I don't want to load entire data, like using DocumentCache. Either I want to use RAMDirectoryFactory as the data will be lost if you restart If you use MMapDirectory, Lucene

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread Michael Della Bitta
Good discussion topic. I'm wondering if Solr doesn't need some sort of shoot the other node in the head functionality. We ran into one of those failure modes that only AWS can dream up recently, where for an extended amount of time, two nodes in the same placement group couldn't talk to one

Re: Handling growth

2014-11-20 Thread Michael Della Bitta
? On Nov 18, 2014 11:49 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: We're achieving some success by treating aliases as collections and collections as shards. More specifically, there's a read alias that spans all the collections, and a write alias that points at the 'latest

Re: Handling growth

2014-11-18 Thread Michael Della Bitta
We're achieving some success by treating aliases as collections and collections as shards. More specifically, there's a read alias that spans all the collections, and a write alias that points at the 'latest' collection. Every week, I create a new collection, add it to the read alias, and
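The message is cut off after "add it to the read alias, and", so the following is only a sketch of one rotation step under stated assumptions: it builds (but does not send) Collections API calls that create the new collection, re-point the read alias at every collection, and re-point the write alias at the newest one. Collection and alias names are hypothetical, a real CREATE may also need collection.configName, and re-issuing CREATEALIAS to reassign an existing alias is our understanding of how aliases are updated.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AliasRotation {
    private AliasRotation() {}

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Returns the three Collections API URLs for one (e.g. weekly) rotation. */
    static List<String> rotationRequests(String solrBase, List<String> existing,
                                         String newCollection, int numShards,
                                         String readAlias, String writeAlias) {
        String api = solrBase + "/admin/collections?action=";
        List<String> all = new ArrayList<>(existing);
        all.add(newCollection);

        List<String> calls = new ArrayList<>();
        // 1. Create this period's collection.
        calls.add(api + "CREATE&name=" + enc(newCollection) + "&numShards=" + numShards);
        // 2. The read alias spans every collection, including the new one.
        calls.add(api + "CREATEALIAS&name=" + enc(readAlias)
                + "&collections=" + enc(String.join(",", all)));
        // 3. The write alias points only at the latest collection.
        calls.add(api + "CREATEALIAS&name=" + enc(writeAlias)
                + "&collections=" + enc(newCollection));
        return calls;
    }
}
```

Ordering matters: the read alias is widened before writes are redirected, so nothing indexed into the new collection is ever invisible to readers.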

Re: Can we query on _version_field ?

2014-11-13 Thread Michael Della Bitta
You could also find a natural key that doesn't look like an ID and create a name-based (Type 3) UUID out of it, with something like Java's nameUUIDFromBytes: https://docs.oracle.com/javase/7/docs/api/java/util/UUID.html#nameUUIDFromBytes%28byte%5B%5D%29 Implementations of this exist in other
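A minimal sketch of deriving a uniqueKey from a natural key with java.util.UUID.nameUUIDFromBytes; the key format "twitter:12345" is a made-up example. Note that nameUUIDFromBytes hashes only the bytes you pass (no namespace), so an implementation in another language must hash exactly the same bytes to produce the same ID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class NaturalKeyIds {
    private NaturalKeyIds() {}

    // Deterministic, name-based (version 3, MD5) UUID: the same natural key
    // always maps to the same Solr uniqueKey, so re-indexing overwrites in place.
    static String idFor(String naturalKey) {
        return UUID.nameUUIDFromBytes(naturalKey.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

Fixing the byte encoding (UTF-8 here) matters as much as the key itself, since different encodings yield different UUIDs.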

Re: Lucene to Solrcloud migration

2014-11-11 Thread Michael Della Bitta
On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi Michal, Is there a particular reason to shard your collections like that? If it was mainly for ease of operations, I'd consider just using CompositeId to prevent specific types of queries

Re: Lucene to Solrcloud migration

2014-11-10 Thread Michael Della Bitta
Hi Michal, Is there a particular reason to shard your collections like that? If it was mainly for ease of operations, I'd consider just using CompositeId to prevent specific types of queries hotspotting particular nodes. If your ingest rate is fast, you might also consider making each

Re: how do I stop queries from being logged in two different log files in Tomcat

2014-11-10 Thread Michael Della Bitta
I generally turn off the console logging when I install Tomcat. It flushes after every line, unlike the other handlers, and that's sort of a performance problem (although if you need that, you need that). Basically, find logging.properties in Tomcat's conf directory, and change these two

Re: Migrating shards

2014-11-07 Thread Michael Della Bitta
1. The new replica will not begin serving data until it's all there and caught up. You can watch the replica status on the Cloud screen to see it catch up; when it's green, you're done. If you're trying to automate this, you're going to look for the replica that says recovering in

Re: Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Michael Della Bitta
Pretty sure what you need is called KeywordMarkerFilterFactory. <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" /> On 11/5/14 17:24, Tang, Rebecca wrote: Hi there, For some hyphenated terms, I want them to stay as is instead of being tokenized. For example:

Re: Solr Cloud Management Tools

2014-11-04 Thread Michael Della Bitta
http://sematext.com/spm/

Re: Automating Solr

2014-10-30 Thread Michael Della Bitta
You probably just need to put double quotes around the URL. On 10/30/14 15:27, Craig Hoffman wrote: Thanks! One more question. WGET seems to be choking on my URL, in particular the # and the character . What’s the best method of escaping? http://My Host

Re: solr-map-reduce API

2014-10-29 Thread Michael Della Bitta
Check this out: http://www.slideshare.net/cloudera/solrhadoopbigdatasearch On 10/29/14 16:31, Pritesh Patel wrote: What exactly does this API do? --Pritesh

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28 Thread Michael Della Bitta
We index directly from mappers using SolrJ. It does work, but you pay the price of having to instantiate all those sockets vs. the way MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer directly in the Reduce task. You don't *need* to use MapReduceIndexerTool, but it's

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28 Thread Michael Della Bitta
toward becoming proficient with one, I would recommend against it. On 10/28/14 15:27, S.L wrote: I'm using Apache Hadoop and Solr; do I need to switch to Cloudera? On Tue, Oct 28, 2014 at 1:27 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: We index directly from mappers

Re: Solr + HDFS settings

2014-10-27 Thread Michael Della Bitta
This doesn't answer your question, but unless something has changed, you're going to want to set this to false. It causes index corruption at the moment. On 10/25/14 03:42, Norgorn wrote: <bool name="solr.hdfs.blockcache.write.enabled">true</bool>

Re: SolrCloud config question and zookeeper

2014-10-27 Thread Michael Della Bitta
You want external zookeepers, partly because you don't want your Solr garbage collections holding up zookeeper availability, but also because you don't want your zookeepers going offline if you have to restart Solr for some reason. Also, you want 3 or 5 zookeepers, not 4 or 8. On
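The "3 or 5, not 4 or 8" advice comes down to majority arithmetic: an ensemble of n servers needs floor(n/2) + 1 of them up, so adding a server to reach an even count raises the quorum without raising the number of failures you can survive. A small sketch of that arithmetic (QuorumMath is an illustrative name):

```java
final class QuorumMath {
    private QuorumMath() {}

    // Smallest strict majority of an ensemble of n servers.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // How many servers can be down while a majority is still reachable.
    static int toleratedFailures(int n) {
        return n - quorum(n);
    }
}
```

For example, toleratedFailures(3) and toleratedFailures(4) are both 1, and toleratedFailures(7) and toleratedFailures(8) are both 3; the even-sized ensemble only adds a machine that must also be kept healthy.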

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread Michael Della Bitta
I'm curious, could you elaborate on the issue and the partial fix? Thanks! On 10/27/14 11:31, Markus Jelsma wrote: It is an ancient issue. One of the major contributors to the issue was resolved some versions ago but we are still seeing it sometimes too, there is nothing to see in the logs.

Re: Solr replicas - stop replication and start again

2014-10-20 Thread Michael Della Bitta
Andrei, I'm wondering if you've considered using Classic replication for this use case. It seems better suited for it.

Re: Solr replicas - stop replication and start again

2014-10-20 Thread Michael Della Bitta
Yes, that's what I'm suggesting. It seems a perfect fit for a single-shard collection with an offsite remote that you don't always want to write to.

Re: Inconsistent response time

2014-10-03 Thread Michael Della Bitta
Hi Scott, Any chance this could be an IPv6 thing? What if you start both server and client with this flag: -Djava.net.preferIPv4Stack=true

Re: Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Michael Della Bitta
take advantage of some of the file format improvements. However, it is somewhat of a design smell that you can't reindex. In my experience, it is extremely valuable to be able to reindex your data at will.

Re: Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Michael Della Bitta
Yes, you can just do something like curl "http://mysolrserver:mysolrport/solr/mycollectionname/update?optimize=true". You should expect heavy disk activity while this completes. I wouldn't do more than one collection at a time.

Re: Does Solr handle an sshfs mounted index

2014-10-02 Thread Michael Della Bitta
Grainne, I would recommend that you do not do this. In fact, I would recommend you not use NFS as well, although that’s more likely to work, just not ideally. Solr’s going to do best when it’s working with fast, local storage that the OS can cache natively.

Re: Solr and hadoop

2014-09-25 Thread Michael Della Bitta
Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the Morphline stuff (check out https://github.com/markrmiller/solr-map-reduce-example).

Re: SolrCloud Slow to boot up

2014-09-25 Thread Michael Della Bitta
1. What version of Solr are you running? 2. Have you made substantial changes to solrconfig.xml?

Re: Performance of Unsorted Queries

2014-09-16 Thread Michael Della Bitta
/CommonQueryParameters#Deep_paging_with_cursorMark That's the magic knock that will get you what you want.
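A sketch of the cursorMark loop: send cursorMark=* first (with a sort that includes the uniqueKey field), then keep passing back the nextCursorMark from each response until Solr returns the same value you sent. CursorPage and CursorPageFetcher are hypothetical placeholders for the real HTTP or SolrJ request.

```java
import java.util.ArrayList;
import java.util.List;

// One page of results: the documents plus the nextCursorMark Solr returned.
final class CursorPage {
    final List<String> docs;
    final String nextCursorMark;

    CursorPage(List<String> docs, String nextCursorMark) {
        this.docs = docs;
        this.nextCursorMark = nextCursorMark;
    }
}

// Hypothetical stand-in for one Solr request carrying cursorMark=<mark>,
// a fixed rows value, and a sort that includes the uniqueKey field.
interface CursorPageFetcher {
    CursorPage fetch(String cursorMark);
}

final class CursorWalk {
    private CursorWalk() {}

    static List<String> fetchAll(CursorPageFetcher fetcher) {
        List<String> all = new ArrayList<>();
        String mark = "*"; // the cursorMark value for the first request
        while (true) {
            CursorPage page = fetcher.fetch(mark);
            all.addAll(page.docs);
            // Getting back the same cursor you sent means there is nothing left.
            if (page.nextCursorMark.equals(mark)) {
                return all;
            }
            mark = page.nextCursorMark;
        }
    }
}
```

Unlike growing the start parameter, each request only has to find the rows after the cursor, which is why deep pages stay cheap.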

Re: Solr Exceptions -- immense terms

2014-09-15 Thread Michael Della Bitta
, but my hunch is that you'll be happier in general with the behavior of a field type that does tokenizing and stemming for plain text search anyway.

Re: Solr Exceptions -- immense terms

2014-09-15 Thread Michael Della Bitta
into a different field that has the keyword tokenizer?

Re: Moving to HDFS, How to merge indices from 8 servers?

2014-09-15 Thread Michael Della Bitta
not really aimed at preserving uptime as far as I know.

Re: Moving to HDFS, How to merge indices from 8 servers?

2014-09-15 Thread Michael Della Bitta
If all you need is better availability, I would start by trying out an additional replica of each shard on a different box, so each box would be serving the data for 2 shards and each shard would be available on 2 boxes.
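One way to picture that layout, assuming one shard per box to start with (8 of each, per the thread title): put each shard's extra replica on the next box over, wrapping around. This is only an illustration of the placement arithmetic with hypothetical box and shard indices; the replicas themselves would still be created through Solr's own tooling, such as the Collections API's ADDREPLICA action.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplicaLayout {
    private ReplicaLayout() {}

    // Shard i stays on box i and gets its second replica on box (i + 1) mod n,
    // so every shard lives on two boxes and every box serves two shards.
    static List<List<Integer>> twoReplicasPerShard(int boxes) {
        if (boxes < 2) {
            throw new IllegalArgumentException("need at least 2 boxes for 2 replicas");
        }
        List<List<Integer>> layout = new ArrayList<>();
        for (int shard = 0; shard < boxes; shard++) {
            layout.add(List.of(shard, (shard + 1) % boxes));
        }
        return layout;
    }
}
```

With 8 boxes, shard 7 ends up on boxes 7 and 0, and losing any single box still leaves a live copy of every shard.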

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
. I could be wrong. It's probably best if you post your field definition from your schema. Also, is this a free-text field, or something that's more like a short string? Thanks,

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
If that's your problem, I bet all you have to do is twiddle on one of the catenate options, either catenateWords or catenateAll.

Re: Solr and HDFS

2014-08-29 Thread Michael Della Bitta
with 4.8.1 pointing at a CDH 5 HDFS, and a production cluster with 4.9 as well.

Re: Questions about caching and HDFSDirectory

2014-08-25 Thread Michael Della Bitta
/csug_tuning_solr.html If anyone has anything to add or correct about these two resources, please let me know!

Re: Copying a collection from one version of SOLR to another

2014-08-25 Thread Michael Della Bitta
Hi Philippe, You can indeed copy an index like that. The problem probably arises because 4.9.0 is using core discovery by default. This wiki page will shed some light: https://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29

Questions about caching and HDFSDirectory

2014-08-22 Thread Michael Della Bitta
just to be sure? Thanks,

Re: Auto Complete

2014-08-06 Thread Michael Della Bitta
You'd still need to modify that schema to use the ASCII folding filter. Alternatively, if you want something off the shelf, you might check out Sematext's autocomplete product: http://www.sematext.com/products/autocomplete/index.html

Re: Auto Complete

2014-08-05 Thread Michael Della Bitta
contain the same term?

Re: Auto Complete

2014-08-05 Thread Michael Della Bitta
version of your field for display, so your accented characters would not get stripped.

Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Michael Della Bitta
propagate over to Solr. http://www.ngdata.com/on-lily-hbase-hadoop-and-solr/

Re: Auto Complete

2014-08-04 Thread Michael Della Bitta
You need to use this filter in your analysis chain: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
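To see roughly what accent folding does to an autocomplete term, here is a JDK-only approximation using java.text.Normalizer: decompose to NFD, then drop the combining marks. This is a swapped-in technique, not Lucene's ASCIIFoldingFilter, which also maps characters with no Unicode decomposition (such as U+00F8, o with stroke) that this sketch leaves untouched.

```java
import java.text.Normalizer;

final class AccentFolding {
    private AccentFolding() {}

    // NFD splits a precomposed letter (e.g. U+00E9) into a base letter plus a
    // combining accent; removing every combining mark (\p{M}) leaves the base letter.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

In the index the folding belongs in the analysis chain on both the index and query side, so "Cafe" and "Café" match each other, while the stored value keeps its accents for display.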

Re: Auto Complete

2014-08-04 Thread Michael Della Bitta
How are you implementing autosuggest? I'm assuming you're querying an indexed field and getting a stored value back. But there are a wide variety of ways of doing it.

Re: Stand alone Solr - no zookeeper?

2014-08-04 Thread Michael Della Bitta
at this: http://www.lucenerevolution.org/sites/default/files/Living%20with%20Garbage.pdf

Re: solr boosting any particular URL

2014-07-17 Thread Michael Della Bitta
Rahul, Check out the relevancy FAQ. You probably want to boost that field value at index time, or use the query elevation component. http://wiki.apache.org/solr/SolrRelevancyFAQ

Re: java.net.SocketException: Connection reset

2014-07-07 Thread Michael Della Bitta
to commits and optimizes?

Re: java.net.SocketException: Connection reset

2014-07-03 Thread Michael Della Bitta
What's the %system load on your nodes? What servlet container are you using? Are you writing a single document per update, or in batches? How many clients are attached to your cloud?

Re: OCR - Saving multi-term position

2014-07-02 Thread Michael Della Bitta
for findability reasons and I heard it works out OK.

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Michael Della Bitta
Alex, maybe you're thinking of constraints put on shard keys?

Am I being dense? Or are real-time gets not exposed in SolrJ?

2014-06-25 Thread Michael Della Bitta
. Is that a bad idea? It seems like there might be some overhead to having several going in the same process that could be avoided, but maybe I'm overcomplicating things. Thanks,

Re: Does one need to perform an optimize soon after doing a batch indexing using SolrJ ?

2014-06-24 Thread Michael Della Bitta
. If you're just worried about the segment count, you can tune that in solrconfig.xml and Solr will merge down your index on the fly as it indexes.

Re: SolrCloud copy the index to another cluster.

2014-06-24 Thread Michael Della Bitta
I'm currently playing around with Solr Cloud migration strategies, too. I'm wondering... when you say zero downtime, do you mean zero *read* downtime, or zero downtime altogether?

Re: SolrCloud copy the index to another cluster.

2014-06-24 Thread Michael Della Bitta
tool to bring yourself up to the new version. Part of this for me is a migration to HDFSDirectory so there's an added level of complication there. I would assume that since you only need to preserve reads, you could cut over once your collections were created on the new cloud?

Re: Restricting access to reading full text document field

2014-06-23 Thread Michael Della Bitta
Unfortunately, it's not really advisable to expose Solr directly to the open web. There are many ways to DoS an exposed Solr install, and, depending on how it's configured, some more intrusive vulnerabilities.

Re: Restricting access to reading full text document field

2014-06-23 Thread Michael Della Bitta
.

Looking for migration stories to an HDFS-backed Solr Cloud

2014-06-18 Thread Michael Della Bitta
or experiences you might be able to share would be helpful. In the meantime, I'm going to start experimenting with some of these approaches. Thanks!

Re: solrj error

2014-06-17 Thread Michael Della Bitta
/currency.data file either. Is it possible that you have somehow used a mismatched JAVA_HOME and tools.jar?

Re: Tomcat restart removes the Core.

2014-06-05 Thread Michael Della Bitta
Did you put that attribute on the root element, or somewhere else? The beginning of solr.xml should look like this: <?xml version="1.0" encoding="UTF-8" ?> <solr sharedLib="lib" persistent="true">

Re: Tomcat restart removes the Core.

2014-06-04 Thread Michael Della Bitta
Any chance you don't have a persistent="true" attribute in your solr.xml?

Re: search using Ngram.

2014-05-29 Thread Michael Della Bitta
rank higher. Or, you could use the Suggester component or one of the other bolt-on autocomplete components instead. Maybe you should post your current field definition and let us know specifically what you're trying to achieve?

Re: Percolator feature

2014-05-29 Thread Michael Della Bitta
We've definitely looked at Luwak before... nice to hear it might be being brought closer into the Solr ecosystem!

Re: index a repository of documents(.doc) without using post.jar

2014-05-23 Thread Michael Della Bitta
There's an example of using curl to make a REST call to update a core on this page: https://wiki.apache.org/solr/UpdateXmlMessages If that doesn't help, please let us know what error you're receiving.

Re: How to Disable Commit Option and Just Manage it via SolrConfig?

2014-05-22 Thread Michael Della Bitta
Just a thought: If your users can send updates and you can't trust them, how can you keep them from deleting all your data? I would consider using a servlet filter to inspect the request. That would probably be non-trivial if you plan to accept javabin requests as well.

Re: solr problem after indexing, shutdown and startup

2014-05-21 Thread Michael Della Bitta
with a maxTime somewhat larger than your soft commit setting, somewhere in the low minutes range.

Re: Storing tweets For WC2014

2014-05-16 Thread Michael Della Bitta
no mass loading. Additionally, we generally do bulk data collection across only 3 days of data, so if you're looking to do a mess of reporting against your full set, take that into consideration.

Cloudera Manager install

2014-05-16 Thread Michael Della Bitta
if anybody has experience with installing a fairly new version of Solr, say 4.7 or 4.8, through Cloudera Manager.

Re: Solrj Default Data Format

2014-05-13 Thread Michael Della Bitta
Hi Furkan, If I were to guess, the XML format is more cross-compatible with different versions of SolrJ. But it might not be intentional. In any case, feeding your SolrServer a BinaryResponseParser will switch it over to javabin.
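At the HTTP level, the response format SolrJ asks for travels in the `wt` request parameter: `BinaryResponseParser` requests `wt=javabin`, while `XMLResponseParser` requests `wt=xml`. The stdlib-only sketch below (hypothetical helper, example core URL assumed) just shows the two requests side by side. In SolrJ itself you would not build these URLs; you would set the parser on the server object (on 4.x `HttpSolrServer`, via `setParser`).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public final class ResponseFormatUrls {

    // Response formats and the wt value each one requests.
    enum Format {
        XML("xml"), JAVABIN("javabin");

        final String wt;

        Format(String wt) {
            this.wt = wt;
        }
    }

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build a /select URL for the given query and response format.
    static String selectUrl(String coreUrl, String query, Format format) {
        return coreUrl + "/select?q=" + encode(query) + "&wt=" + format.wt;
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/collection1";
        System.out.println(selectUrl(core, "*:*", Format.XML));
        System.out.println(selectUrl(core, "*:*", Format.JAVABIN));
    }
}
```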

Re: Solr interface

2014-04-07 Thread Michael Della Bitta
The speed of ingest via HTTP improves greatly once you do two things: 1. Batch multiple documents into a single request. 2. Index with multiple threads at once.
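Both tips can be combined: split the documents into batches and hand each batch to a fixed thread pool. The sketch below is generic over the document type; `sendBatch` is a hypothetical stand-in for whatever actually POSTs one batch to Solr, and the batch size and thread count are assumptions you would tune against your own cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public final class BatchedIndexer {

    // Split docs into consecutive batches of at most batchSize documents.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    // Send each batch as one request, spread over a fixed pool of threads.
    // Returns the number of requests (batches) sent.
    static <T> int indexAll(List<T> docs, int batchSize, int threads, Consumer<List<T>> sendBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                futures.add(pool.submit(() -> sendBatch.accept(batch)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface any worker failure
            }
            return futures.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With 1050 documents and a batch size of 100, this issues 11 requests instead of 1050, and up to `threads` of them are in flight at once.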

Re: zookeeper reconnect failure

2014-03-28 Thread Michael Della Bitta
you get an immediate notification of the problem, and to install some sort of caching server like nscd if you expect to have DNS resolution failures regularly.

Re: Replication (Solr Cloud)

2014-03-25 Thread Michael Della Bitta
collection with the collections API and the required bits are in schema.xml and solrconfig.xml, you should be good to go. See https://wiki.apache.org/solr/SolrCloud#Required_Config
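For reference, creating a collection is a single HTTP call to the Collections API with `action=CREATE`. The sketch below only builds that URL; the Solr base URL, collection name, shard and replica counts, and config set name are illustrative assumptions, and the named config set is assumed to already be uploaded to ZooKeeper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public final class CollectionsApi {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build a Collections API CREATE request URL.
    static String createUrl(String solrBase, String name, int numShards,
                            int replicationFactor, String configName) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + encode(name)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&collection.configName=" + encode(configName);
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8983/solr", "tweets", 2, 2, "tweets_conf"));
    }
}
```

Once the collection exists, replication between its replicas is handled by SolrCloud, provided the required handlers and fields are in the config.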

Re: Solr Cloud collection keep going down?

2014-03-25 Thread Michael Della Bitta
-XX:CMSInitiatingOccupancyFraction ? Just a shot in the dark, since I'm not familiar with Jetty's logging statements, but that looks like plain old dropped HTTP sockets to me.

Re: Solr4 performance

2014-02-27 Thread Michael Della Bitta
), but I don't know if there's any definitive information about how to set them appropriately for Solr.

Re: Solr4 performance

2014-02-24 Thread Michael Della Bitta
I'm not sure how you're measuring free RAM. Maybe this will help: http://www.linuxatemyram.com/play.html
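The point of that page is that Linux uses otherwise idle memory for the disk cache, which Solr benefits from when reading index files, so the "free" number alone looks alarmingly low. A sketch of a more meaningful figure, assuming `/proc/meminfo` as input: newer kernels report `MemAvailable` directly; otherwise it is approximated as `MemFree + Buffers + Cached`. Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public final class MemInfo {

    // Parse lines like "MemFree:         123456 kB" into a map of name -> kB.
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> values = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts[0].isEmpty()) {
                continue;
            }
            try {
                values.put(key, Long.parseLong(parts[0]));
            } catch (NumberFormatException ignored) {
                // skip lines whose first token is not a number
            }
        }
        return values;
    }

    // Memory available to applications, in kB: MemAvailable if reported,
    // otherwise the classic approximation MemFree + Buffers + Cached.
    static long availableKb(String meminfo) {
        Map<String, Long> v = parse(meminfo);
        Long available = v.get("MemAvailable");
        if (available != null) {
            return available;
        }
        return v.getOrDefault("MemFree", 0L) + v.getOrDefault("Buffers", 0L) + v.getOrDefault("Cached", 0L);
    }
}
```

On a Linux host you would pass the contents of `/proc/meminfo` (for example read with `java.nio.file.Files`) to `availableKb`.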

Re: Best way to get results ordered

2014-02-21 Thread Michael Della Bitta
Hi Metin, How many IDs are you supplying in a single query? You could probably accomplish this easily with boosts if it were few.
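One way to read the boost suggestion: give each supplied ID a descending boost so that the default score-descending sort returns them in the order given. The sketch below builds such a query string; the field name `id` and the helper names are assumptions. It relies on single-term ID matches scoring similarly enough that the boosts dominate (worth confirming with `debugQuery`), and, as noted above, it only makes sense for a few IDs, since every ID adds a boolean clause (the default `maxBooleanClauses` limit is 1024).

```java
import java.util.List;

public final class IdOrderQuery {

    // Backslash-escape Lucene query syntax characters (and whitespace) in an ID.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "field:id1^n OR field:id2^(n-1) ... OR field:idn^1" so earlier IDs
    // get higher boosts. Returns an empty string for an empty list.
    static String build(String field, List<String> ids) {
        StringBuilder sb = new StringBuilder();
        int boost = ids.size();
        for (String id : ids) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(escape(id)).append('^').append(boost--);
        }
        return sb.toString();
    }
}
```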

RE: Solr4 performance

2014-02-20 Thread Michael Della Bitta
Hi, As for your first question, setting openSearcher to true means you will see the new docs after every hard commit. Soft and hard commits only become isolated from one another with that set to false. Your second problem might be explained by your large heap and garbage collection. Walking a

Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Michael Della Bitta
to protect the guilty) The admin handler for replication doesn't seem to be there, but the actual API seems to work normally.
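As an illustration of calling the replication API directly, a standalone core can be told to pull an index from a SolrCloud replica with `command=fetchindex` and a `masterUrl` pointing at the source core's replication handler. This sketch only builds the URL; the host names and core names are assumptions, both cores are assumed to define the `/replication` handler in solrconfig.xml, and it assumes a single-shard collection, since each replica only holds its own shard's index.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public final class ReplicationFetch {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // URL that asks targetCoreUrl to fetch the index from sourceCoreUrl's replication handler.
    static String fetchIndexUrl(String targetCoreUrl, String sourceCoreUrl) {
        return targetCoreUrl + "/replication?command=fetchindex&masterUrl="
                + encode(sourceCoreUrl + "/replication");
    }

    public static void main(String[] args) {
        System.out.println(fetchIndexUrl("http://standalone:8983/solr/core1",
                "http://cloudnode:8983/solr/collection1_shard1_replica1"));
    }
}
```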

Re: Boost Query Example

2014-02-17 Thread Michael Della Bitta
Hi, Filter queries don't affect score, so boosting won't have an effect there. If you want those query terms to get boosted, move them into the q parameter. http://wiki.apache.org/solr/CommonQueryParameters#fq Hope that helps!
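To make the split concrete: boosted clauses belong in `q`, where they contribute to the score, while pure restrictions can stay in `fq`, which only narrows the result set. The sketch below builds such a parameter string; the field names `title`, `body`, and `type` and the helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public final class BoostVsFilter {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Boosted clauses go in q (scored); restrictions go in fq (not scored).
    static String params(String boostedQuery, String filterQuery) {
        return "q=" + encode(boostedQuery) + "&fq=" + encode(filterQuery);
    }

    public static void main(String[] args) {
        System.out.println(params("title:solr^2 OR body:solr", "type:article"));
    }
}
```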

RE: JVM heap constraints and garbage collection

2014-02-03 Thread Michael Della Bitta
should be using. On Feb 1, 2014 1:51 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Michael Della Bitta [michael.della.bi...@appinions.com] wrote: Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look pretty tasty primarily because of the SSD, and I'll probably push

Re: JVM heap constraints and garbage collection

2014-01-31 Thread Michael Della Bitta
Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look pretty tasty primarily because of the SSD, and I'll probably push for a switch to those when our reservations run out. http://www.ec2instances.info/
