cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-27 Thread Shenghua(Daniel) Wan
and I also tried CqlPagingInputFormat, which has the same behavior. Thank you. -- Regards, Shenghua (Daniel) Wan

Re: full-tabe scan - extracting all data from C*

2015-01-27 Thread Shenghua(Daniel) Wan
datasets > (terabytes) from C*. Also, if you have used some other techniques, it would > be great if you could share your approach with the group. > > > > Mohammed > > > > -- Regards, Shenghua (Daniel) Wan

Re: full-tabe scan - extracting all data from C*

2015-01-27 Thread Shenghua(Daniel) Wan
or Spark to speed up the > extraction process. > > > > I wanted to do a quick survey and find out how many people on this mailing > list have successfully used approach #1 or #2 for extracting large datasets > (terabytes) from C*. Also, if you have used some other techniques, it would > be great if you could share your approach with the group. > > > > Mohammed > > > > -- Regards, Shenghua (Daniel) Wan

Re: Re: full-tabe scan - extracting all data from C*

2015-01-27 Thread Shenghua(Daniel) Wan
t; } > > For Spark, the code skeleton is: > > val rdd = sc.cassandraTable("ks", "table") > > then call various standard Spark APIs to process the table in parallel. > > I have not used CqlInputFormat. > > At 2015-01-28 13:38:20, "Shenghua(Daniel

Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-27 Thread Shenghua(Daniel) Wan
7, 2015 at 9:34 PM, Shenghua(Daniel) Wan < > wansheng...@gmail.com> wrote: > >> By default, each C* node is set with 256 tokens. On a local 1-node C* >> server, my hadoop drop creates 256 connections to the server. Is there any >> way to control this behavior? e.g. reduce

Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-27 Thread Shenghua(Daniel) Wan
there is any incorrect reasoning here. Thanks. On Tue, Jan 27, 2015 at 11:21 PM, Huiliang Zhang wrote: > In that case, each node will have 256/3 connections at most. Still 256 > mappers. Someone please correct me if I am wrong. > > On Tue, Jan 27, 2015 at 11:04 PM, Shenghua

Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-27 Thread Shenghua(Daniel) Wan
On Tue, Jan 27, 2015 at 11:04 PM, Shenghua(Daniel) Wan < > wansheng...@gmail.com> wrote: > >> Hi, Huiliang, >> Great to hear from you, again! >> Imagine you have 3 nodes, replication factor=1, and are using the default number of >> tokens. You will have 3*256 mappers... In

Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-28 Thread Shenghua(Daniel) Wan
a more reasonable number of connections. > We do this, using code similar to this patch > https://github.com/michaelsembwever/cassandra/pull/2/files > > ~mck > > ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358 > -- Regards, Shenghua (Daniel) Wan

Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-28 Thread Shenghua(Daniel) Wan
> virtual nodes. But in your experiment, you saw 3*257 mappers. Is that > because of the setting cassandra.input.split.size=3? It has nothing to do with > node number=3. Otherwise, I am confused why there are 256 virtual nodes on > every Cassandra node. > > On Wed, Jan 28, 2015 at 12:2

when a node is dead in Cassandra cluster

2015-09-21 Thread Shenghua(Daniel) Wan
lost nodes from the clients? Thanks a lot! -- Regards, Shenghua (Daniel) Wan