Re: Smart Table creation for 2D range query

2017-05-09 Thread Jim Ancona
er. > > Jon > > On May 9, 2017, at 10:11 AM, Jim Ancona <j...@anconafamily.com> wrote: > > There are clever ways to encode coordinates into a single scalar value > where points that are close on a surface are also close in value, making > queries efficient. Examples are Ge

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jim Ancona
There are clever ways to encode coordinates into a single scalar value where points that are close on a surface are also close in value, making queries efficient. Examples are Geohash and Google's S2
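As a rough illustration of the encoding idea in the message above, the sketch below interleaves the bits of a quantized latitude and longitude into a single integer (a Z-order or Morton code). This is the general principle behind Geohash, not Geohash's base-32 format and not S2's cell scheme; the function names and the 16-bit default are assumptions made for the example.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (value - lo) / (hi - lo)
    return min(int(scale * (1 << bits)), (1 << bits) - 1)


def z_order(lat, lon, bits=16):
    """Interleave the bits of quantized lon and lat into one integer."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)  # longitude bits at odd positions
        code |= ((y >> i) & 1) << (2 * i)      # latitude bits at even positions
    return code
```

Points that fall in the same coarse grid cell share the high-order bits of the code, so a cell maps to one contiguous range of scalar values. Nearby points that straddle a cell boundary can still land far apart in value, which is why real libraries typically query several neighboring cells.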

Re: Effective partition key for time series data, which allows range queries?

2017-04-05 Thread Jim Ancona
finitely don't want to > paint yourself into a corner where you need a smaller bucket size but your > data model didn't leave room for it. > > On Tue, Apr 4, 2017 at 2:59 PM Jim Ancona <j...@anconafamily.com> wrote: > >> The typical recommendation for maximum partit

Re: Effective partition key for time series data, which allows range queries?

2017-04-04 Thread Jim Ancona
The typical recommendation for maximum partition size is on the order of 100 MB and/or 100,000 rows. That's not a hard limit, but you may be setting yourself up for issues as you approach or exceed those numbers. If you need to reduce partition size, the typical way to do this is by "bucketing,"
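The snippet is cut off before it describes bucketing, so the following is only a sketch of one common reading: add a time-window component to the partition key so that each partition holds a bounded slice of the series. The one-day window and the helper names are assumptions for illustration, not something stated in the thread.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_for(ts, bucket=timedelta(days=1)):
    """Return the start of the time bucket that contains ts (UTC)."""
    n = (ts - EPOCH) // bucket  # whole buckets elapsed since the epoch
    return EPOCH + n * bucket


def buckets_in_range(start, end, bucket=timedelta(days=1)):
    """List every bucket start that a query over [start, end] must read."""
    current = bucket_for(start, bucket)
    result = []
    while current <= end:
        result.append(current)
        current += bucket
    return result
```

A write would use something like `(series_id, bucket_for(ts))` as its partition key, and a range query would issue one read per entry returned by `buckets_in_range`. Choosing a smaller window shrinks partitions at the cost of more reads per range query.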

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Jim Ancona
To answer DuyHai's question without introducing new syntax, I'd suggest: LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' So the first two %'s are translated to a literal, non-wildcard % and the third % is a wildcard because it's not doubled. Jim On Thu, Sep 22, 2016 at 11:40 AM,
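The doubled-% convention suggested above can be sketched as a small matcher. This models the proposal only, scanning left to right so that `%%` becomes a literal `%` and a lone `%` is a wildcard; it is not a description of how any Cassandra release actually parses LIKE patterns, and the function names are made up for the example.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern where '%%' is a literal '%' and '%' is a wildcard."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("%%", i):
            parts.append(re.escape("%"))  # doubled: literal percent sign
            i += 2
        elif pattern[i] == "%":
            parts.append(".*")            # single: match anything
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

Under this reading, `like("%foo escape", "%%%escape")` is true and `like("foo escape", "%%%escape")` is false, matching the "starts with '%' and ends with 'escape'" interpretation.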

Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-12 Thread Jim Ancona
Mark, Is there some official Apache policy on which sites it's appropriate to link to on an Apache mailing list? If so, could you please post a link to it so we can all understand the rules. Or is this your personal opinion on what you'd like to see here? Thanks! On Mon, Sep 12, 2016 at 7:34

Re: Support/Consulting companies

2016-08-19 Thread Jim Ancona
There's also a list of companies that provide Cassandra-related services on the wiki: https://wiki.apache.org/cassandra/ThirdPartySupport Jim On Fri, Aug 19, 2016 at 3:37 PM, Chris Tozer wrote: > Instaclustr ( Instaclustr.com ) also offers Cassandra consulting > >

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jim Ancona
as Row Scans are >> concerned: >> https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis >> >> The flexibility of Cql comes at heavy cost until 3.x. >> >> >> >> Thanks >> Anuj >> Sent from Yahoo Mail on

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jim Ancona
Jack, the Datastax link he posted ( http://www.datastax.com/dev/blog/thrift-to-cql3) says that for column families with mixed dynamic and static columns: "The only solution to be able to access the column family fully is to remove the declared columns from the thrift schema altogether..." I think

Re: Is it possible to achieve "sticky" request routing?

2016-04-05 Thread Jim Ancona
Jon and Steve: I don't understand your point. The TokenAwareLoadBalancer identifies the nodes in the cluster that own the data for a particular token and routes requests to one of them. As I understand it, the OP wants to send requests for a particular token to the same node every time (assuming

Re: best ORM for cassandra

2016-02-10 Thread Jim Ancona
Recent versions of the Datastax Java Driver include an object mapping API that might work for you: http://docs.datastax.com/en/latest-java-driver/java-driver/reference/objectMappingApi.html Jim On Wed, Feb 10, 2016 at 4:29 AM, Nirmallya Mukherjee wrote: > I have heard of

Re: Writing a large blob returns WriteTimeoutException

2016-02-08 Thread Jim Ancona
The "if not exists" in your INSERT means that you are incurring a performance hit by using Paxos. Do you need that? Have you tried your test without it? Jim

Re: Cassandra Connection Pooling

2016-01-28 Thread Jim Ancona
It's typically handled by your client (e.g. https://docs.datastax.com/en/latest-java-driver/index.html) along with retries, timeouts and all the other things you would put in your datasource config for a SQL database in JBoss. On Thu, Jan 28, 2016 at 5:31 PM, KAMM, BILL wrote:

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-06 Thread Jim Ancona
ltiple concurrent writers. What happens when you change the number of buckets? Does existing data have to be re-written into new buckets? If so, how do you make sure that's only done once for each bucket size increase? Or perhaps I'm misunderstanding your suggestion? Jim > On Tue, Jan 5, 2

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
ch partition one at a time. > > Unfortunately due to the artificial partition key segment you cannot > iterate or page in any particular order...(at least across partitions) > Unless your hash function can also provide you some ordering guarantees. > > It all just depends on your requirem

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
ional. Hence my reference to a "nasty distributed consensus problem" and Clint's reference to an "anti-pattern". I'd like to avoid it if I can. Jim > > -- Jack Krupansky > > On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona <j...@anconafamily.com> wrote: > >> Tha

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will
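The small/big split described in this message might look roughly like the sketch below. The figure of 50 buckets comes from the message; the hash choice, the `is_big` flag, and the function names are assumptions for illustration, and the sketch does not address the open problem of a small customer growing into a big one.

```python
import hashlib

BIG_CUSTOMER_BUCKETS = 50  # figure mentioned in the message; tune as needed


def bucket_count(is_big):
    """Small customers get a single partition, big ones get many."""
    return BIG_CUSTOMER_BUCKETS if is_big else 1


def partition_key(customer_id, row_id, is_big):
    """Pick the partition for a row using a stable hash of its id."""
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % bucket_count(is_big)
    return (customer_id, bucket)


def partitions_to_read(customer_id, is_big):
    """Every partition a multi-row query for this customer has to visit."""
    return [(customer_id, b) for b in range(bucket_count(is_big))]
```

A multi-row query for a small customer touches one partition, while a big customer's query fans out across all 50. A stable hash (rather than Python's per-process `hash`) keeps the row-to-bucket mapping the same across writers.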

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
; > Clint > On Jan 5, 2016 2:28 PM, "Jim Ancona" <j...@anconafamily.com> wrote: > >> Hi Nate, >> >> Yes, I've been thinking about treating customers as either small or big, >> where "small" ones have a single partition and big ones h

Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Jim Ancona
A problem that I have run into repeatedly when doing schema design is how to control partition size while still allowing for efficient multi-row queries. We want to limit partition size to some number between 10 and 100 megabytes to avoid operational issues. The standard way to do that is to

Re: Replicating Data Between Separate Data Centres

2015-12-14 Thread Jim Ancona
Could you define what you mean by Causal Consistency and explain why you think you won't have that when using LOCAL_QUORUM? I ask because LOCAL_QUORUM and multiple data centers are the way many of us handle DR, so I'd like to understand why it doesn't work for you. I'm afraid I don't understand

Re: Cassandra users survey

2015-10-01 Thread Jim Ancona
Hi Jonathan, The survey asks about "your application." We have multiple applications using Cassandra. Are you looking for information about each application separately, or the sum of all of them? Jim On Wed, Sep 30, 2015 at 2:18 PM, Jonathan Ellis wrote: > With 3.0

Re: How to store unique visitors in cassandra

2015-04-01 Thread Jim Ancona
Very interesting. I had saved your email from three years ago in hopes of an elegant answer. Thanks for sharing! Jim On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: People keep asking me if we finally found a solution (even if this is 3+ years old) so I will just

Re: What % of cassandra developers are employed by Datastax?

2014-05-23 Thread Jim Ancona
I took a look at the Ohloh stats here: https://www.ohloh.net/p/cassandra/contributors/summary Note that committers are not the same as contributors. Dozens of people contribute patches that are committed to the codebase without being committers. Over the last year, the top four contributors

Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-09-06 Thread Jim Ancona
Unfortunately, Netflix doesn't seem to have released Aegisthus as open source. Jim On Fri, Aug 30, 2013 at 1:44 PM, Jeremiah D Jordan jeremiah.jor...@gmail.com wrote: FYI: http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html -Jeremiah On Aug 30, 2013, at 9:21 AM,

Re: vnodes ready for production ?

2013-06-19 Thread Jim Ancona
On Tue, Jun 18, 2013 at 4:04 AM, aaron morton aa...@thelastpickle.com wrote: Even more if we could automate some up-scale thanks to AWS alarms, It would be awesome. I saw a demo for Priam (https://github.com/Netflix/Priam) doing that at netflix in March, not sure if it's public yet. Are the

nodetool cfstats and compression

2012-09-14 Thread Jim Ancona
Do the row size stats reported by 'nodetool cfstats' include the effect of compression? Thanks, Jim

Re: What determines the memory that used by key cache??

2012-06-18 Thread Jim Ancona
On Mon, Jun 18, 2012 at 8:53 AM, mich.hph mich@gmail.com wrote: Dear all! In my cluster, I found every key needs 192 bytes in the key cache. So I want to know what determines the memory that used by key cache. How to calculate the value. According to

Re: Cassandra error while processing message

2012-06-15 Thread Jim Ancona
It's hard to tell exactly what happened--are there other messages in your client log before the "All host pools marked down"? Also, how many nodes are there in your cluster? I suspect that the Thrift protocol error was (incorrectly) retried by Hector, leading to the "All host pools marked down", but

Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-07 Thread Jim Ancona
On 7/06/2012, at 7:54 AM, Jim Ancona wrote: On Tue, Jun 5, 2012 at 4:30 PM, Jim Ancona j...@anconafamily.com wrote: It might be a good idea for the documentation to reflect the tradeoffs more clearly. Here's a proposed addition to the Secondary Index FAQ at http://wiki.apache.org

Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-06 Thread Jim Ancona
On Tue, Jun 5, 2012 at 4:30 PM, Jim Ancona j...@anconafamily.com wrote: It might be a good idea for the documentation to reflect the tradeoffs more clearly. Here's a proposed addition to the Secondary Index FAQ at http://wiki.apache.org/cassandra/SecondaryIndexes Q: How does choice

Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-05 Thread Jim Ancona
by the partitioner. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/06/2012, at 5:15 AM, Jim Ancona wrote: Hi, We have an application with two code paths, one of which uses a secondary index query and the other, which doesn't

Re: Cassandra 1.1.1 release?

2012-06-02 Thread Jim Ancona
The release vote is going on now on the dev list. So probably in the next day or two, assuming no problems pop up. Jim On Wed, May 30, 2012 at 1:29 PM, Roland Mechler rmech...@sencha.com wrote: Anyone have a rough idea of when Cassandra 1.1.1 is likely to be released? -Roland

Secondary Indexes, Quorum and Cluster Availability

2012-06-01 Thread Jim Ancona
Hi, We have an application with two code paths, one of which uses a secondary index query and the other, which doesn't. While testing node down scenarios in our cluster we got a result which surprised (and concerned) me, and I wanted to find out if the behavior we observed is expected.

Re: single row key continues to grow, should I be concerned?

2012-03-23 Thread Jim Ancona
I'm dealing with a similar issue, with an additional complication. We are collecting time series data, and the amount of data per time period varies greatly. We will collect and query event data by account, but the biggest account will accumulate about 10,000 times as much data per time period as

Re: yet a couple more questions on composite columns

2012-02-06 Thread Jim Ancona
must conform to that. Jim On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona j...@anconafamily.com wrote: I've used special values which still comply with the Composite schema for the metadata columns, e.g. a column of 1970-01-01:{accountId} for a metadata column where the Composite

Re: yet a couple more questions on composite columns

2012-02-04 Thread Jim Ancona
I've used special values which still comply with the Composite schema for the metadata columns, e.g. a column of 1970-01-01:{accountId} for a metadata column where the Composite is DateType:UTF8Type. Jim On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun yiming@gmail.com wrote: Thanks Andrey and

Re: TransportException when storing large values

2011-09-20 Thread Jim Ancona
Pete, See this thread http://groups.google.com/group/hector-users/browse_thread/thread/cb3e72c85dbdd398/82b18ffca0e3940a?#82b18ffca0e3940a for a bit more info. Jim On Tue, Sep 20, 2011 at 9:02 PM, Tyler Hobbs ty...@datastax.com wrote: From cassandra.yaml: # Frame size for thrift (maximum

Re: what's the difference between repair CF separately and repair the entire node?

2011-09-12 Thread Jim Ancona
On Mon, Sep 12, 2011 at 1:44 PM, Peter Schuller peter.schul...@infidyne.com wrote: I am using 0.7.4.  so it is always okay to do the routine repair on Column Family basis? thanks! It's okay but won't do what you want; due to a bug you'll see streaming of data for other column families than

Re: Professional Support

2011-09-06 Thread Jim Ancona
We use Datastax (http://www.datastax.com) and we have been very happy with the support we've received. We haven't tried any of the other providers on that page, so I can't comment on them. Jim (Disclaimer: no connection with Datastax other than as a satisfied customer.) On Tue, Sep 6, 2011 at

Re: Cassandra client loses connectivity to cluster

2011-09-06 Thread Jim Ancona
about averages, but harder to do the same for extremes. Jim On Wed, Jun 29, 2011 at 5:42 PM, Jim Ancona j...@anconafamily.com wrote: In reviewing client logs as part of our Cassandra testing, I noticed several Hector All host pools marked down exceptions in the logs. Further investigation

Re: Updates lost

2011-08-31 Thread Jim Ancona
You could also look at Hector's approach in: https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/cassandra/service/clock/MicrosecondsSyncClockResolution.java It works well and I believe there was some performance testing done on it as well. Jim On Tue, Aug 30, 2011 at
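For readers without the Hector source at hand, the general idea of a synchronized microsecond clock can be sketched as follows. This is not a port of MicrosecondsSyncClockResolution; it is a minimal illustration, under the assumption that the goal is for one client process never to hand out the same timestamp twice, and the class name is hypothetical.

```python
import threading
import time


class MonotonicMicrosClock:
    """Hands out strictly increasing microsecond timestamps within one process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def now(self):
        with self._lock:
            micros = time.time_ns() // 1000
            if micros <= self._last:
                # Same microsecond as the previous call, or the wall clock
                # stepped backwards: bump past the last value issued.
                micros = self._last + 1
            self._last = micros
            return micros
```

Two writes issued back to back from the same client then carry distinct, ordered timestamps. The sketch does nothing about clock skew between different client machines.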

Re: Damaged commit log disk causes Cassandra client to get stuck

2011-08-02 Thread Jim Ancona
Ideally, I would hope that a bad disk wouldn't hang a node but would instead just cause writes to fail, but if that is not the case, perhaps the bad disk somehow wedged that server node completely so that requests were not being processed at all (maybe not even being timed out). At that point

Re: Damaged commit log disk causes Cassandra client to get stuck

2011-08-02 Thread Jim Ancona
PM, Jim Ancona j...@anconafamily.com wrote: Ideally, I would hope that a bad disk wouldn't hang a node but would instead just cause writes to fail, but if that is not the case, perhaps the bad disk somehow wedged that server node completely so that requests were not being processed at all (maybe

Re: cassandra server disk full

2011-08-02 Thread Jim Ancona
On Mon, Aug 1, 2011 at 6:12 PM, Ryan King r...@twitter.com wrote: On Fri, Jul 29, 2011 at 12:02 PM, Chris Burroughs chris.burrou...@gmail.com wrote: On 07/25/2011 01:53 PM, Ryan King wrote: Actually I was wrong – our patch will disable gossip and thrift but leave the process running:

Re: Trying to find the problem with a broken pipe

2011-08-02 Thread Jim Ancona
On Tue, Aug 2, 2011 at 4:36 PM, Anthony Ikeda anthony.ikeda@gmail.com wrote: I'm not sure if this is a problem with Hector or with Cassandra. We seem to be seeing broken pipe issues with our connections on the client side (Exception below). A bit of googling finds possibly a problem with

Re: Trying to find the problem with a broken pipe

2011-08-02 Thread Jim Ancona
. We plan to deploy Hector 0.7-31 this week and to turn on useSocketKeepalive. Are you using that? We're also using tcpdump to capture packets when failures occur to see if there are anomalies in the network traffic. Jim On Tue, Aug 2, 2011 at 10:37 AM, Jim Ancona j...@anconafamily.com wrote

Re: do I need to add more nodes? minor compaction eat all IO

2011-07-26 Thread Jim Ancona
On Mon, Jul 25, 2011 at 6:41 PM, aaron morton aa...@thelastpickle.com wrote: There are no hard and fast rules to add new nodes, but here are two guidelines: 1) Single node load is getting too high, rule of thumb is 300GB is probably too high. What is that rule of thumb based on? I would

Cassandra client loses connectivity to cluster

2011-06-29 Thread Jim Ancona
In reviewing client logs as part of our Cassandra testing, I noticed several Hector "All host pools marked down" exceptions in the logs. Further investigation showed a consistent pattern of java.net.SocketException: Broken pipe and java.net.SocketException: Connection reset messages. These errors

UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-23 Thread Jim Ancona
Since upgrading to 0.7.6-2 I'm seeing the following exception in our server logs: ERROR [MutationStage:1184874] 2011-06-22 23:59:43,867 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[MutationStage:1184874,5,main] java.lang.UnsupportedOperationException: Index manager

Re: UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-23 Thread Jim Ancona
Is there any reason this fix can't be back-ported to 0.7? Jim On Thu, Jun 23, 2011 at 3:00 PM, Jonathan Ellis jbel...@gmail.com wrote: Sorry, 0.8.2 is correct. On Thu, Jun 23, 2011 at 1:36 PM, Les Hazlewood l...@katasoft.com wrote: The issue has the fix version as 0.8.2, not 0.7.7.  Is that

Re: UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-23 Thread Jim Ancona
, not in production, but this is not mostly a non-problem here. Jim On Thu, Jun 23, 2011 at 3:25 PM, Jonathan Ellis jbel...@gmail.com wrote: The patch probably applies as-is but I don't want to take any risks with 0.7 to solve what is mostly a non-problem. On Thu, Jun 23, 2011 at 2:16 PM, Jim Ancona j

Re: Could Not connect to cassandra-cli on windows

2010-11-09 Thread Jim Ancona
On Mon, Nov 8, 2010 at 8:31 PM, Alaa Zubaidi alaa.zuba...@pdf.com wrote: Hi, Failing to connect to cassandra client: on windows [defa...@unknown] connect localhost/9160 Exception connecting to localhost/9160. Reason: Connection refused: connect. [defa...@unknown] connect xxx.xxx.x.xx/9160

Any plans to support key metadata?

2010-10-29 Thread Jim Ancona
In 0.7, Cassandra now supports column metadata CfDef.default_validation_class and ColumnDef.validation_class. Is there any plan to provide similar metadata for keys, at the key space or column family level? Jim

Re: Any plans to support key metadata?

2010-10-29 Thread Jim Ancona
On Fri, Oct 29, 2010 at 10:07 AM, Jim Ancona j...@anconafamily.com wrote: In 0.7, Cassandra now supports column metadata CfDef.default_validation_class and ColumnDef.validation_class. Is there any plan to provide similar metadata for keys, at the key space or column family level? Sorry