
Nobody has mentioned it yet, but you can also use the Spark Cassandra
Connector, preferably if your data set is so big that a simple COPY to CSV
cannot handle it.


Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Fri, Jan 17, 2020 at 8:11 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> sstablekeys (in the tools directory?) can extract the actual keys from
> your sstables. You have to run it on each node and then combine and de-dupe
> the final results, but I have used this technique with a query generator to
> extract data more efficiently.
> Sean Durity
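The "combine and de-dupe" step described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: it supposes the output collected from each node is plain text with one key per line, and the helper name `merge_keys` is hypothetical. Turning the keys into CQL values for a query generator is left out.

```python
from typing import Iterable, List


def merge_keys(per_node_outputs: Iterable[str]) -> List[str]:
    """Combine key listings from several nodes and drop duplicates.

    Each element is the text captured from one node, one key per line
    (an assumed layout). The order of first appearance is preserved.
    """
    seen = set()
    merged = []
    for output in per_node_outputs:
        for line in output.splitlines():
            key = line.strip()
            # Skip blank lines and keys already seen on another node.
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

Because replicas hold copies of the same partitions, the same key is expected to show up on several nodes, which is why the de-dupe step matters before generating queries.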
> *From:* Chris Splinter <chris.splinter...@gmail.com>
> *Sent:* Friday, January 17, 2020 1:47 PM
> *To:* adrien ruffie <adriennolar...@hotmail.fr>
> *Cc:* user@cassandra.apache.org; Erick Ramirez <flightc...@gmail.com>
> *Subject:* [EXTERNAL] Re: COPY command with where condition
> Do you know your partition keys?
> One option could be to enumerate that list of partition keys in separate
> cmds to make the individual operations less expensive for the cluster.
> For example:
> Say your partition key column is called id and the ids in your database
> are [1,2,3]
> You could do
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = 1 AND localisation_id = 208812" -url /home/dump
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = 2 AND localisation_id = 208812" -url /home/dump
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = 3 AND localisation_id = 208812" -url /home/dump
> Does that option work for you?
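For a longer list of partition keys, the per-key commands above could be generated rather than typed by hand. The following Python sketch is one possible way to do that, not part of DSBulk itself: the function name `build_unload_commands` is hypothetical, the keyspace, table, and column names are taken from the example in this thread, and writing each unload to its own `id_<n>` subdirectory is a choice made here to keep the outputs apart.

```python
import shlex
from typing import Iterable, List


def build_unload_commands(ids: Iterable[int], localisation_id: int,
                          keyspace: str = "dev_keyspace",
                          table: str = "probe_sensors",
                          out_dir: str = "/home/dump") -> List[str]:
    """Build one dsbulk unload command line per partition key."""
    commands = []
    for pk in ids:
        # int() guards against anything other than a number ending up in the CQL.
        query = (f"SELECT * FROM {table} "
                 f"WHERE id = {int(pk)} AND localisation_id = {int(localisation_id)}")
        # Hypothetical layout: one output subdirectory per partition key.
        target = f"{out_dir}/id_{int(pk)}"
        commands.append(
            "./dsbulk unload "
            f"--dsbulk.schema.keyspace {shlex.quote(keyspace)} "
            f"-query {shlex.quote(query)} "
            f"-url {shlex.quote(target)}"
        )
    return commands


if __name__ == "__main__":
    for command in build_unload_commands([1, 2, 3], 208812):
        print(command)
```

The printed lines can be reviewed first and then run one at a time, so each query touches a single partition instead of scanning the table.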
> On Fri, Jan 17, 2020 at 12:17 PM adrien ruffie <adriennolar...@hotmail.fr>
> wrote:
> I don't really know yet for the production environment, but in the
> development environment the table contains more than 10,000,000 rows.
> We only need a subset of this table, not all of it.
> ------------------------------
> *From:* Chris Splinter <chris.splinter...@gmail.com>
> *Sent:* Friday, January 17, 2020 17:40
> *To:* adrien ruffie <adriennolar...@hotmail.fr>
> *Cc:* user@cassandra.apache.org <user@cassandra.apache.org>; Erick
> Ramirez <flightc...@gmail.com>
> *Subject:* Re: COPY command with where condition
> What you are seeing there is a standard read timeout. How many rows do you
> expect back from that query?
> On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie <adriennolar...@hotmail.fr>
> wrote:
> Thank you very much.
> So I ran this request, for example:
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump
> But I get the following error:
> com.datastax.dsbulk.executor.api.exception.BulkExecutionException:
> Statement execution failed: SELECT * FROM crt_sensors WHERE site_id =
> 208812 ALLOW FILTERING (Cassandra timeout during read query at consistency
> LOCAL_ONE (1 responses were required but only 0 replica responded))
> I configured my driver with the following driver.conf, but nothing works
> correctly. Do you know what the problem is?
> datastax-java-driver {
>     basic {
>         contact-points = ["data1com:9042","data2.com:9042"]
>         request {
>             timeout = "2000000"
>             consistency = "LOCAL_ONE"
>         }
>     }
>     advanced {
>         auth-provider {
>             class = PlainTextAuthProvider
>             username = "superuser"
>             password = "mypass"
>         }
>     }
> }
> ------------------------------
> *From:* Chris Splinter <chris.splinter...@gmail.com>
> *Sent:* Friday, January 17, 2020 16:17
> *To:* user@cassandra.apache.org <user@cassandra.apache.org>
> *Cc:* Erick Ramirez <flightc...@gmail.com>
> *Subject:* Re: COPY command with where condition
> DSBulk has an option that lets you specify the query (including a WHERE
> clause).
> See Example 19 in this blog post for details:
> https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
> On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
> Did you think about using a Materialised View to generate what you want to
> keep, and then use DSBulk to extract the data?
> On 17 Jan 2020, at 14:30 , adrien ruffie <adriennolar...@hotmail.fr>
> wrote:
> Sorry, I am coming back with a quick question about the bulk loader:
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
> I read this: "Operations such as converting strings to lowercase,
> arithmetic on input columns, or filtering out rows based on some criteria,
> are not supported."
> Consequently, it's still not possible to use a WHERE clause with DSBulk,
> right?
> I don't really know how to do this so that we don't export the whole of
> the business data already stored, including what doesn't need to be exported...
> ------------------------------
> *From:* adrien ruffie <adriennolar...@hotmail.fr>
> *Sent:* Friday, January 17, 2020 11:39
> *To:* Erick Ramirez <flightc...@gmail.com>; user@cassandra.apache.org <
> user@cassandra.apache.org>
> *Subject:* RE: COPY command with where condition
> Thanks a lot!
> That's good news about DSBulk! I will take a look at this solution.
> Best regards,
> Adrian
> ------------------------------
> *From:* Erick Ramirez <flightc...@gmail.com>
> *Sent:* Friday, January 17, 2020 10:02
> *To:* user@cassandra.apache.org <user@cassandra.apache.org>
> *Subject:* Re: COPY command with where condition
> The COPY command doesn't support filtering and it doesn't perform well for
> large tables.
> Have you considered the DSBulk tool from DataStax? Previously, it only
> worked with DataStax Enterprise but a few weeks ago, it was made free and
> works with open-source Apache Cassandra. For details, see this blog post:
> https://www.datastax.com/blog/2019/12/tools-for-apache-cassandra
> Cheers!
> On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie <adriennolar...@hotmail.fr>
> wrote:
> Hello all,
> At my company we want to export a large dataset from our Cassandra ring.
> We looked at using the COPY command, but I can't find whether, or how, a
> WHERE condition can be used with it.
> We need to export only the data returned by a WHERE clause, and
> unfortunately that means ALLOW FILTERING, because several old tables
> were poorly designed...
> Do you know a way to do that, please?
> Thanks all, and best regards,
> Adrian
