Hi,

here are some snippets of Scala code which should get you started.
Jirka H.

    // Page through one token range: the first query starts at lowerLimit,
    // each following query continues after the last row of the previous page.
    // (loop is a paging helper that is not shown here.)
    loop { lastRow =>
      val query = lastRow match {
        case Some(row) => nextPageQuery(row, upperLimit)
        case None      => initialQuery(lowerLimit)
      }
      session.execute(query).all
    }

    private def nextPageQuery(row: Row, upperLimit: String): String = {
      val tokenPart = "token(%s) > token(0x%s) and token(%s) < %s".format(
        rowKeyName, hex(row.getBytes(rowKeyName)), rowKeyName, upperLimit)
      basicQuery.format(tokenPart)
    }

    private def initialQuery(lowerLimit: String): String = {
      val tokenPart = "token(%s) >= %s".format(rowKeyName, lowerLimit)
      basicQuery.format(tokenPart)
    }

    // Returns (token space size, size of one range, ranges), one range per
    // concurrent worker.
    private def calculateRanges: (BigDecimal, BigDecimal, IndexedSeq[(BigDecimal, BigDecimal)]) = {
      tokenRange match {
        case Some((start, end)) =>
          Logger.info("Token range given: {}",
            "<" + start.underlying.toPlainString + ", " + end.underlying.toPlainString + ">")
          val tokenSpaceSize = end - start
          val rangeSize = tokenSpaceSize / concurrency
          val ranges = for (i <- 0 until concurrency)
            yield (start + (i * rangeSize), start + ((i + 1) * rangeSize))
          (tokenSpaceSize, rangeSize, ranges)
        case None =>
          // No explicit range given: split the whole token space of the partitioner.
          val tokenSpaceSize = partitioner.max - partitioner.min
          val rangeSize = tokenSpaceSize / concurrency
          val ranges = for (i <- 0 until concurrency)
            yield (partitioner.min + (i * rangeSize), partitioner.min + ((i + 1) * rangeSize))
          (tokenSpaceSize, rangeSize, ranges)
      }
    }

    private val basicQuery = {
      "select %s, %s, %s, writetime(%s) from %s where %s%s limit %d%s".format(
        rowKeyName,
        columnKeyName,
        columnValueName,
        columnValueName,
        columnFamily,
        "%s", // template, filled in later with the token condition
        whereCondition,
        pageSize,
        if (cqlAllowFiltering) " allow filtering" else "")
    }

    case object Murmur3 extends Partitioner {
      override val min = BigDecimal(-2).pow(63)
      override val max = BigDecimal(2).pow(63) - 1
    }

    case object Random extends Partitioner {
      override val min = BigDecimal(0)
      override val max = BigDecimal(2).pow(127) - 1
    }

On 02/11/2015 02:21 PM, Ja Sam wrote:
> Your answer looks very promising
>
> How do you calculate start and stop?
>
> On Wed, Feb 11, 2015 at 12:09 PM, Jiri Horky <ho...@avast.com> wrote:
>
> > The fastest way I am aware of is to do the queries in parallel to
> > multiple Cassandra nodes and make sure that you only ask them for keys
> > they are responsible for. Otherwise, the node needs to resend your
> > query, which is much slower and creates unnecessary objects (and thus
> > GC pressure).
> >
> > You can manually take advantage of the token range information, if the
> > driver does not take this into account for you. Then, you can play with
> > concurrency and batch size of a single query against one node.
> > Basically, what you/driver should do is to transform the query to a
> > series of "SELECT * FROM TABLE WHERE TOKEN IN (start, stop)".
> >
> > I will need to look up the actual code, but the idea should be
> > clear :)
> >
> > Jirka H.
> >
> >
> > On 02/11/2015 11:26 AM, Ja Sam wrote:
> > > Is there a simple way (or even a complicated one) to speed up a
> > > SELECT * FROM [table] query?
> > > I need to get all rows from one table every day. I split the tables
> > > and create one for each day, but the query is still quite slow
> > > (200 million records).
> > >
> > > I was thinking about running this query in parallel, but I don't
> > > know if it is possible.
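
P.S. On calculating start and stop: below is a minimal, self-contained sketch of the range splitting, in the spirit of calculateRanges above but not the same code. The names TokenRanges, split and rangeQuery are made up for illustration. As an assumption of mine, it uses BigInt instead of BigDecimal so the bounds stay whole numbers, and it lets the last range end exactly at max so the division remainder is not lost.

```scala
object TokenRanges {
  // Murmur3Partitioner token bounds: -2^63 and 2^63 - 1.
  val Murmur3Min: BigInt = -(BigInt(2).pow(63))
  val Murmur3Max: BigInt = BigInt(2).pow(63) - 1

  /** Splits the token space from min to max into `parts` contiguous
    * (start, end) pairs; each pair starts where the previous one ends.
    */
  def split(min: BigInt, max: BigInt, parts: Int): IndexedSeq[(BigInt, BigInt)] = {
    require(parts > 0, "parts must be positive")
    val size = (max - min) / BigInt(parts) // integer division, rounds toward zero
    (0 until parts).map { i =>
      val start = min + size * BigInt(i)
      // The last range absorbs the rounding remainder and ends exactly at max.
      val end = if (i == parts - 1) max else min + size * BigInt(i + 1)
      (start, end)
    }
  }

  /** Builds the CQL text for one range (hypothetical helper).
    * The start bound is exclusive unless inclusiveStart is set, so that
    * neighbouring ranges do not return the same token twice.
    */
  def rangeQuery(table: String, key: String, start: BigInt, end: BigInt,
                 inclusiveStart: Boolean): String = {
    val lower = if (inclusiveStart) ">=" else ">"
    s"SELECT * FROM $table WHERE token($key) $lower $start AND token($key) <= $end"
  }
}
```

You would call TokenRanges.split(TokenRanges.Murmur3Min, TokenRanges.Murmur3Max, concurrency), pass inclusiveStart = true only for the first range, and run one query per range in parallel, paging inside each range as in the loop above.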