RE: Accessing Cassandra data from Spark Shell

2016-05-18 Thread Mohammed Guller
en.sla...@instaclustr.com] Sent: Tuesday, May 17, 2016 11:00 PM To: user@cassandra.apache.org; Mohammed Guller Cc: user Subject: Re: Accessing Cassandra data from Spark Shell It definitely should be possible for 1.5.2 (I have used it with spark-shell and cassandra connector with 1.4.x). The main

RE: Accessing Cassandra data from Spark Shell

2016-05-10 Thread Mohammed Guller
Yes, it is very simple to access Cassandra data using Spark shell. Step 1: Launch the spark-shell with the spark-cassandra-connector package $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0 Step 2: Create a DataFrame pointing to your Cassandra table
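For reference, a minimal sketch of what those two steps look like inside the spark-shell (keyspace, table, and column names below are placeholders, and the connection host is assumed to be passed via spark.cassandra.connection.host):

    // Step 1 (at the OS shell): launch spark-shell with the connector package, e.g.
    //   $SPARK_HOME/bin/spark-shell \
    //     --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0 \
    //     --conf spark.cassandra.connection.host=127.0.0.1
    //
    // Step 2 (inside the shell): point a DataFrame at the Cassandra table.
    val df = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()

    df.printSchema()          // schema is inferred from the Cassandra table
    df.select("id").show(10)  // filters/projections are pushed down to Cassandra where possible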

RE: reducing disk space consumption

2016-02-10 Thread Mohammed Guller
If I remember it correctly, C* creates a snapshot when you drop a keyspace. Run the following command to get rid of the snapshot: nodetool clearsnapshot Mohammed Author: Big Data Analytics with Spark From: Ted Yu

RE: Cassandra Summit 2015 Roll Call!

2015-09-22 Thread Mohammed Guller
Hey everyone, I will be at the summit too on Wed and Thu. I am giving a talk on Thursday at 2.40pm. Would love to meet everyone on this list in person. Here is an old picture of mine: https://events.mfactormeetings.com/accounts/register123/mfactor/datastax/events/dstaxsummit2015/guller.jpg

RE: Code review - Spark SQL command-line client for Cassandra

2015-06-19 Thread Mohammed Guller
Hi Matthew, It looks fine to me. I have built a similar service that allows a user to submit a query from a browser and returns the result in JSON format. Another alternative is to leave a Spark shell or one of the notebooks (Spark Notebook, Zeppelin, etc.) session open and run queries from

RE: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Mohammed Guller
The plugin looks cool. Thank you for open sourcing it. Does it support faceting and other Solr functionality? Mohammed From: Andres de la Peña [mailto:adelap...@stratio.com] Sent: Friday, June 12, 2015 3:43 AM To: user@cassandra.apache.org Subject: Re: Lucene index plugin for Apache Cassandra

RE: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Mohammed Guller
Considering that 2.1.6 was just released and it is the first “stable” release ready for production in the 2.1 series, won’t it be too soon to EOL 2.1.x when 3.0 comes out in September? Mohammed From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, June 11, 2015 10:14 AM To: user

RE: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Mohammed Guller
in 2.2).. so 2.2.x and 2.1.x are somewhat synonymous. On Jun 11, 2015, at 8:14 PM, Mohammed Guller moham...@glassbeam.com wrote: Considering that 2.1.6 was just released and it is the first “stable” release ready for production in the 2.1 series, won’t it be too soon

RE: Spark SQL JDBC Server + DSE

2015-06-01 Thread Mohammed Guller
the intended recipient is strictly prohibited. From: Mohammed Guller moham...@glassbeam.com Reply-To: user@cassandra.apache.org Date: Friday, May 29, 2015 at 2:15 PM To: user@cassandra.apache.org user

RE: Spark SQL JDBC Server + DSE

2015-05-29 Thread Mohammed Guller
by persons or entities other than the intended recipient is strictly prohibited. From: Mohammed Guller moham...@glassbeam.com Reply-To: user@cassandra.apache.org Date: Thursday, May 28, 2015 at 8:26 PM To: user

RE: Spark SQL JDBC Server + DSE

2015-05-28 Thread Mohammed Guller
Anybody out there using DSE + Spark SQL JDBC server? Mohammed From: Mohammed Guller [mailto:moham...@glassbeam.com] Sent: Tuesday, May 26, 2015 6:17 PM To: user@cassandra.apache.org Subject: Spark SQL JDBC Server + DSE Hi - As I understand, the Spark SQL Thrift/JDBC server cannot be used

Spark SQL JDBC Server + DSE

2015-05-26 Thread Mohammed Guller
Hi - As I understand, the Spark SQL Thrift/JDBC server cannot be used with the open source C*. Only DSE supports the Spark SQL JDBC server. We would like to find out how many organizations are using this combination. If you do use DSE + Spark SQL JDBC server, it would be great if you

Spark SQL Thrift JDBC/ODBC server + Cassandra

2015-04-07 Thread Mohammed Guller
Hi - Is anybody using Cassandra with the Spark SQL Thrift JDBC/ODBC server? I can programmatically (within our app) use Spark SQL with C* using the Spark-Cassandra-Connector, but can't find any documentation on how to query C* through the Spark SQL Thrift JDBC/ODBC server. Would appreciate if

RE: Data tiered compaction and data model question

2015-02-19 Thread Mohammed Guller
on event size. On Thu, Feb 19, 2015 at 12:00 AM, cass savy casss...@gmail.com wrote: 10-20 per minute is the average. Worstcase can be 10x of avg. On Wed, Feb 18, 2015 at 4:49 PM, Mohammed Guller moham...@glassbeam.com wrote: What

RE: Data tiered compaction and data model question

2015-02-18 Thread Mohammed Guller
What is the maximum number of events that you expect in a day? What is the worst-case scenario? Mohammed From: cass savy [mailto:casss...@gmail.com] Sent: Wednesday, February 18, 2015 4:21 PM To: user@cassandra.apache.org Subject: Data tiered compaction and data model question We want to track

RE: Smart column searching for a particular rowKey

2015-02-03 Thread Mohammed Guller
Astyanax allows you to execute CQL statements. I don’t remember the details, but it is there. One tip – when you create the column family, use WITH CLUSTERING ORDER BY (timestamp DESC). Then your query becomes straightforward and C* will do all the heavy lifting for you. Mohammed From: Ravi
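As a rough illustration of that tip (shown here with the DataStax Java driver from Scala rather than Astyanax; contact point, keyspace, table, and column names are made up):

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")

    // Newest-first clustering order, so "latest N events for a key" needs no
    // client-side sorting or reversed slice.
    session.execute(
      """CREATE TABLE IF NOT EXISTS events (
        |  row_key    text,
        |  event_time timestamp,
        |  payload    text,
        |  PRIMARY KEY (row_key, event_time)
        |) WITH CLUSTERING ORDER BY (event_time DESC)""".stripMargin)

    // The query is then straightforward; Cassandra returns rows already ordered.
    val rs = session.execute("SELECT * FROM events WHERE row_key = 'abc' LIMIT 10")
    println(rs.one())

    cluster.close()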

RE: Tombstone gc after gc grace seconds

2015-01-29 Thread Mohammed Guller
Ravi – It may help. What version are you running? Do you know if minor compaction is getting triggered at all? One way to check would be to see how many sstables the data directory has. Mohammed From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com] Sent: Thursday, January 29, 2015 1:29 PM To:

RE: Controlling the MAX SIZE of sstables after compaction

2015-01-27 Thread Mohammed Guller
I believe Aegisthus is open sourced. Mohammed From: Jan [mailto:cne...@yahoo.com] Sent: Monday, January 26, 2015 11:20 AM To: user@cassandra.apache.org Subject: Re: Controlling the MAX SIZE of sstables after compaction Parth et al; the folks at Netflix seem to have built a solution for your

full-table scan - extracting all data from C*

2015-01-27 Thread Mohammed Guller
Hi - Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that they can import that data into other analytical tools that provide much richer analytics functionality than C*. Extracting all data from C* is a full-table

RE: Re: full-table scan - extracting all data from C*

2015-01-27 Thread Mohammed Guller
sc.cassandraTable() work well. I use both of them frequently. At 2015-01-28 04:06:20, Mohammed Guller moham...@glassbeam.com wrote: Hi - Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that they can
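A bare-bones sketch of the sc.cassandraTable() approach mentioned above, using the Spark Cassandra connector (keyspace, table, host, and output path are placeholders):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("FullTableScan")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // The connector splits the scan by token range, so the full-table read is
    // spread across executors and nodes instead of going through one coordinator.
    val rows = sc.cassandraTable("my_keyspace", "my_table")
    println(s"total rows: ${rows.count()}")

    // Hand the data off to another tool, e.g. dump it as text for a later import.
    rows.map(_.toString).saveAsTextFile("/tmp/my_table_dump")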

RE: Retrieving all row keys of a CF

2015-01-23 Thread Mohammed Guller
row keys of a CF In each partition cql rows on average is 200K. Max is 3M. 800K is number of cassandra partitions. From: Mohammed Guller [mailto:moham...@glassbeam.com] Sent: Thursday, January 22, 2015 7:43 PM To: user@cassandra.apache.org Subject: RE: Retrieving

RE: Retrieving all row keys of a CF

2015-01-22 Thread Mohammed Guller
checkpoints, but some other data store (maybe just a flatfile). Again, this is just to give you a sense of what's involved. On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller moham...@glassbeam.com wrote: Both total system memory and heap size can’t be 8GB? The timeout

RE: sharding vs what cassandra does

2015-01-19 Thread Mohammed Guller
Partitioning is similar to sharding. Mohammed From: Adaryl Bob Wakefield, MBA [mailto:adaryl.wakefi...@hotmail.com] Sent: Monday, January 19, 2015 8:28 PM To: user@cassandra.apache.org Subject: sharding vs what cassandra does It’s my understanding that the way Cassandra replicates data across

RE: Retrieving all row keys of a CF

2015-01-16 Thread Mohammed Guller
Ruchir, I am curious if you had better luck with the AllRowsReader recipe. Mohammed From: Eric Stevens [mailto:migh...@gmail.com] Sent: Friday, January 16, 2015 12:33 PM To: user@cassandra.apache.org Subject: Re: Retrieving all row keys of a CF Note that getAllRows() is deprecated in Astyanax

RE: Retrieving all row keys of a CF

2015-01-16 Thread Mohammed Guller
A few questions: 1) What is the heap size and total memory on each node? 2) How big is the cluster? 3) What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 4) What are the timeouts for the Astyanax client? 5) Do you see GC pressure on the C*

RE: Retrieving all row keys of a CF

2015-01-16 Thread Mohammed Guller
gen and old gen take? occurs every 5 secs don't see huge gc pressure, 50ms 6) Does any node crash with OOM error when you try AllRowsReader? No From: Mohammed Guller [mailto:moham...@glassbeam.com] Sent: Friday, January 16, 2015 7:30 PM To: user@cassandra.apache.org

RE: C* throws OOM error despite use of automatic paging

2015-01-12 Thread Mohammed Guller
it as 'not happening'. What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the cassandra logs before this crashes? On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller moham

Re: C* throws OOM error despite use of automatic paging

2015-01-12 Thread Mohammed Guller
generation on crash in cassandra-env.sh just uncomment JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError" and then run your query again. The heapdump will have the answer. On Tue, Jan 13, 2015 at 10:54 AM, Mohammed Guller moham...@glassbeam.com wrote: The heap

RE: C* throws OOM error despite use of automatic paging

2015-01-10 Thread Mohammed Guller
: Friday, January 9, 2015 4:02 AM To: user@cassandra.apache.org Subject: Re: C* throws OOM error despite use of automatic paging Hi Mohammed, Quoting Mohammed Guller moham...@glassbeam.com: Hi - We have an ETL application that reads all rows from Cassandra (2.1.2), filters them and stores

RE: C* throws OOM error despite use of automatic paging

2015-01-10 Thread Mohammed Guller
is the data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values? On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller moham...@glassbeam.com wrote: Hi – We have an ETL application that reads all rows from Cassandra

C* throws OOM error despite use of automatic paging

2015-01-08 Thread Mohammed Guller
Hi - We have an ETL application that reads all rows from Cassandra (2.1.2), filters them and stores a small subset in an RDBMS. Our application is using Datastax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression
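For context, a minimal sketch of what automatic paging looks like with the DataStax Java driver (2.1.x) when called from Scala; the contact point, keyspace, and table names are placeholders:

    import com.datastax.driver.core.{Cluster, SimpleStatement}
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")

    // With a fetch size set, the driver transparently pulls the result set in
    // pages of ~500 rows rather than asking for everything in one response.
    val stmt = new SimpleStatement("SELECT * FROM my_table").setFetchSize(500)
    val rs   = session.execute(stmt)

    var count = 0L
    rs.iterator().asScala.foreach { _ => count += 1 }  // iterating drives the page fetches
    println(s"rows read: $count")

    cluster.close()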

batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
Hi - The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb. The default size is 5kb and according to the comments in the yaml file, it is used to log WARN on any batch size exceeding this value in kilobytes. It says caution should be taken on increasing the size of this

RE: batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
as a performance optimization, this helps flag those cases of misuse. On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi – The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb. The default size is 5kb
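A hedged sketch of the pattern that warning is meant to discourage and its usual replacement: rather than packing many unrelated (multi-partition) inserts into one large batch, issue them as individual asynchronous statements (table and column names are invented for illustration):

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")
    val insert  = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

    // Concurrent single-partition writes let each statement go straight to its
    // replicas; a big multi-partition batch funnels everything through one
    // coordinator, which is the misuse the 5kb warning flags.
    val futures = (1 to 1000).map { i =>
      session.executeAsync(insert.bind(Int.box(i), s"payload-$i"))
    }
    futures.foreach(_.getUninterruptibly())  // wait for all writes to finish

    cluster.close()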

querying data from Cassandra through the Spark SQL Thrift JDBC server

2014-11-19 Thread Mohammed Guller
Hi - I was curious if anyone is using the Spark SQL Thrift JDBC server with Cassandra. It would be great if you could share how you got it working. For example, what config changes have to be done in hive-site.xml, what additional jars are required, etc.? I have a Spark app that can

RE: What will be system configuration for retrieving few GB of data

2014-10-17 Thread Mohammed Guller
With 8GB RAM, the default heap size is 2GB, so you will quickly start running out of heap space if you do large reads. What is a large read? It depends on the number of columns in each row and the data in each column. It could be 100,000 rows for some and 300,000 for others. In addition, remember that
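For the arithmetic behind that 2GB figure: cassandra-env.sh in the 2.x line sizes the default heap roughly as max(min(1/2 RAM, 1GB), min(1/4 RAM, 8GB)). A tiny Scala sketch of that rule:

    // Approximation of the default heap-sizing rule in cassandra-env.sh (C* 2.x).
    def defaultHeapMb(systemMemoryMb: Long): Long =
      math.max(math.min(systemMemoryMb / 2, 1024), math.min(systemMemoryMb / 4, 8192))

    println(defaultHeapMb(8 * 1024))   // 8GB of RAM -> 2048 MB (2GB) heap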

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-18 Thread Mohammed Guller
partition max size from output of nodetool cfstats), may be worth including g to break it up more - but I don't know enough about your data model. --- Chris Lohfink On Sep 17, 2014, at 4:53 PM, Mohammed Guller moham...@glassbeam.com wrote: Thank you all for your

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-18 Thread Mohammed Guller
more queries at once than there are cores (in general Cassandra is not designed to serve workloads consisting of single large queries, at least not yet) On Thu, Sep 18, 2014 at 7:29 AM, Mohammed Guller moham...@glassbeam.com wrote: Chris, I agree that reading 250k

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-17 Thread Mohammed Guller
of instance based SSD storage. If you're using EBS SSD drives then network will still be the slowest thing so switching won't likely make much of a difference. On Wed, Sep 17, 2014 at 6:00 AM, Mohammed Guller moham...@glassbeam.com wrote: Rob, The 10 seconds latency that I

no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
Hi - We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances were using EBS for storage (I know it is not recommended). We replaced the EBS storage with SSDs. However, we didn't see any change in read latency. A query that took 10 seconds when data was stored on EBS still

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
...@eventbrite.com] Sent: Tuesday, September 16, 2014 5:42 PM To: user@cassandra.apache.org Subject: Re: no change observed in read latency after switching from EBS to SSD storage On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com wrote: Does anyone have

RE: Number of columns per row for composite columns?

2014-08-13 Thread Mohammed Guller
4 Mohammed From: hlqv [mailto:hlqvu...@gmail.com] Sent: Tuesday, August 12, 2014 11:44 PM To: user@cassandra.apache.org Subject: Re: Number of columns per row for composite columns? For more specifically, I declared a column family create column family Column_Family with

RE: select many rows one time or select many times?

2014-08-01 Thread Mohammed Guller
Did you benchmark these two options: 1) Select with IN 2) Select all words and filter in application Mohammed From: Philo Yang [mailto:ud1...@gmail.com] Sent: Thursday, July 31, 2014 10:45 AM To: user@cassandra.apache.org Subject: select many rows one time or select many times? Hi
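A rough sketch of the two options as they might be benchmarked with the DataStax Java driver from Scala (table, column, and key names are invented for illustration):

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")
    val wanted  = Set("alpha", "beta", "gamma")

    // Option 1: a single SELECT with IN -- one round trip, but the coordinator
    // has to gather rows from every replica that owns one of the listed keys.
    val inStmt = session.prepare("SELECT word, freq FROM words WHERE word IN ?")
    val byIn   = session.execute(inStmt.bind(wanted.toList.asJava)).all().asScala

    // Option 2: read everything and filter client-side -- simple, but it ships
    // the whole table over the network before discarding most of it.
    val all      = session.execute("SELECT word, freq FROM words").all().asScala
    val byFilter = all.filter(row => wanted.contains(row.getString("word")))

    cluster.close()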

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-30 Thread Mohammed Guller
- Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 26/06/2013, at 3:57 PM, Mohammed Guller moham...@glassbeam.com wrote: Replication is 3 and read consistency level is one. One of the non-coordinator node

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-25 Thread Mohammed Guller
is the replication in your keyspace and what consistency you are reading with. Also 55MB on disk will not mean 55MB in memory. The data is compressed on disk and also there are other overheads. On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller moham...@glassbeam.com wrote

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-24 Thread Mohammed Guller
advice from other members of the mailing list. Thanks Jabbar Azam On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote: We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any

Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread Mohammed Guller
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings. So each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data