Re: 2d or multi dimension range query in cassandra CQL

Mehak Mehta Mon, 23 Mar 2015 10:08:23 -0700

Hi,

I checked Startio Cassandra but couldn't get any good documentation for the
same.
Can you give me some pointers on how to use it.
Do I have to build it from the source or I can use it directly with jar
files as in case of Stargate.
Since I was looking for solution which I don't need a full build and can be
used with existing tar of cassandra because I have some restrictions on
installing stuff on my server.


Thanks,
Mehak

On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK <asitkaushikno...@gmail.com>
wrote:

> i am using Startio Cassandra it way better than stargate as it works on
> the latest release of Cassandra and is better on my performance.
>
> we are using it for fulltext search use case
>
> Regards
> Asit
>
> On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
> wrote:
>
>> Hi,
>>
>> On the basis of some suggestions, I tried using tuplejump for 
>> multidimensional
>> queries. Since other mostly needed root permissions (for building ) which I
>> don't have on my cluster account.
>>
>> I found a major problem in tuplejump (stargate-core). When I am using it
>> with a list type field in my table. It stops working.
>> For e.g.
>>
>> create table person (
>>     id int primary key,
>>     isActive boolean,
>>     age int,
>>     eyeColor varchar,
>>     name text,
>>     gender varchar,
>>     company varchar,
>>     email varchar,
>>     phone varchar,
>>     address text,
>>     points list<double>,
>>     stargate text
>> );
>>
>> with indexing as:
>> CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
>> 'com.tuplejump.stargate.RowIndex' WITH options =
>> {
>>         'sg_options':'{
>>                 "fields":{
>>                 "eyeColor":{},
>>                         "age":{},
>>                         "phone":{}
>>                         }
>>         }'
>> };
>>
>> If I insert data in the table along with points list. The following query
>> won't give any results (0 rows):
>>
>> SELECT * FROM RESULTS1 WHERE stargate ='{
>>     filter: {
>>         type: "range",
>>         field: "x",
>>         lower: 0
>>     }
>> }';
>>
>> I tried removing points list<double> from the table and it works i.e.
>> same query will return results.
>> Can somebody help me with this problem as I couldn't find much support
>> from Stargate.
>>
>> Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
>> as given in link (
>> http://stargate-core.readthedocs.org/en/latest/quickstart.html).
>>
>> Thanks,
>> Mehak
>>
>>
>> On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña <adelap...@stratio.com
>> > wrote:
>>
>>> Hi,
>>>
>>> With Stratio Cassandra you can create Lucene based indexes for
>>> multidimensional queries this way:
>>>
>>> ALTER TABLE images.results1 ADD lucene text ;
>>>
>>> CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
>>> USING 'com.stratio.cassandra.index.RowIndex'
>>> WITH OPTIONS = {
>>>  'refresh_seconds':'1',
>>>  'schema':'{
>>>   fields:{
>>>   image_caseid:{type:"string"},
>>> x:{type:"double"},
>>> y:{type:"double"} } } '};
>>>
>>> Then you can perform the query using the dummy column:
>>>
>>> SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean",
>>> must:[
>>> {field:"image_caseid", type:"match", value:"mehak"},
>>> {field:"x", type:"range", lower:100},
>>> {field:"y", type:"range", lower:100}
>>> ]}}';
>>>
>>> However, you can take advantage of partition key to route the query only
>>> to the nodes owning the data:
>>>
>>> SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
>>> filter:{type:"boolean", must:[
>>> {field:"x", type:"range", lower:100},
>>> {field:"y", type:"range", lower:100}
>>> ]}}';
>>>
>>> Or, even better:
>>>
>>> SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
>>> lucene='{ filter:{field:"y", type:"range", lower:100}}';
>>>
>>> Additionally, if your data are geospatial (latitude and longitude), soon
>>> you will can use the incoming spatial features.
>>>
>>>
>>>
>>> 2015-03-17 23:01 GMT+01:00 Mehak Mehta <meme...@cs.stonybrook.edu>:
>>>
>>>> Sorry I gave you wrong table definition for query. Here a composite key
>>>> of image_caseid, x and uuid which is unique. I have used x in clustering
>>>> columns to query it. And used secondary index on y column.
>>>>
>>>> 1. Example
>>>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>>>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>>>> ((image_caseid),x,uuid));*
>>>> *cqlsh:images> create index results1_y on results1(y);*
>>>>
>>>> In the below query you can see I have image_caseid as primary key which
>>>> is filtered. Even then it is giving error that "*No indexed columns
>>>> present*"
>>>>
>>>> *cqlsh:images> select * from results1 where image_caseid='mehak' and x
>>>> > 100 and y<100 order by image_caseid asc;*
>>>> *code=2200 [Invalid query] message="No indexed columns present in
>>>> by-columns clause with Equal operator"*
>>>>
>>>> 2. Example
>>>> I also tried including both x and y columns as composite key even then
>>>> query gives following error:
>>>>
>>>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>>>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>>>> ((image_caseid),x,y,uuid));*
>>>>
>>>> *cqlsh:images> select * from results1 where image_caseid='mehak' and x
>>>> > 100 and y<100 order by image_caseid asc;*
>>>> *code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
>>>> restricted (preceding column "ColumnDefinition{name=x,
>>>> type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
>>>> componentIndex=0, indexName=null, indexType=null}" is either not restricted
>>>> or by a non-EQ relation)"*
>>>>
>>>> Thanks,
>>>> Mehak
>>>>
>>>>
>>>> On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <
>>>> jack.krupan...@gmail.com> wrote:
>>>>
>>>>> Yeah, you may have to add a dummy column populated with a constant, or
>>>>> just pick some other additional column to index that you can apply = to.
>>>>> It's just a requirement to assure that a filtered select isn't too-too 
>>>>> slow.
>>>>>
>>>>> Uh.... if you are applying = to your primary key then there is no need
>>>>> for any filtering. What did you really mean to do? It makes no sense the
>>>>> way you have it!
>>>>>
>>>>> Either go with DSE Search/Solr, or google "Tuplejump Stargate" or
>>>>> "Stratio".
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <
>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In my case I am just trying to do range queries on 2 dimensions i.e.
>>>>>> x and y.
>>>>>>
>>>>>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>>>>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>>>>>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>>>>>> (image_caseid));*
>>>>>>
>>>>>> With secondary indexes on x and y:
>>>>>> *cqlsh:images> create index results1_y on results1(y);*
>>>>>> *cqlsh:images> create index results1_x on results1(x);*
>>>>>>
>>>>>> But when I try the following the query it is not allowed with an
>>>>>> error.
>>>>>>
>>>>>> *cqlsh:images> select * from results1 where image_caseid='mehak' and
>>>>>> x > 100 and y <100 allow filtering;*
>>>>>> *code=2200 [Invalid query] message="No indexed columns present in
>>>>>> by-columns clause with Equal operator"*
>>>>>>
>>>>>> Can you tell me why this is not working. I can't understand why
>>>>>> Cassandra is not allowing me to use Non EQ range query on secondary 
>>>>>> indexes
>>>>>> even when I have restricted my primary key by EQ as specified in error
>>>>>> message.
>>>>>>
>>>>>> Also please elaborate on how to use Lucene/Solr in this case with
>>>>>> Cassandra.
>>>>>>
>>>>>> Thanks,
>>>>>> Mehak
>>>>>>
>>>>>> On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <g...@beecloud.cn> wrote:
>>>>>>
>>>>>>> Lucene or Solr is better. I’m using lucene
>>>>>>>
>>>>>>> 在 2015年3月18日，上午2:11，Jack Krupansky <jack.krupan...@gmail.com> 写道：
>>>>>>>
>>>>>>> 1. Create multiple secondary indexes, one for each non-key column
>>>>>>> you need to index on. Not recommended. Considered an anti-pattern for
>>>>>>> Cassandra.
>>>>>>> 2. Use DSE Search/Solr.
>>>>>>> 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
>>>>>>>
>>>>>>> -- Jack Krupansky
>>>>>>>
>>>>>>> On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <
>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I want to perform range queries (as in x and y ranges) on a large
>>>>>>>> data billions of rows.
>>>>>>>> CQL allows me to put Non EQ restrictions on only one of the
>>>>>>>> clustering columns.
>>>>>>>> Its not allowing me to filter the data using any other column even
>>>>>>>> with use of Allow Filtering option.
>>>>>>>>
>>>>>>>> cqlsh:images> *select * from results1 where image_caseid='mehak'
>>>>>>>> and x > 100 and y <100 order by x allow filtering;*
>>>>>>>> *code=2200 [Invalid query] message="No indexed columns present in
>>>>>>>> by-columns clause with Equal operator"*
>>>>>>>>
>>>>>>>> My table definition is :
>>>>>>>>
>>>>>>>> *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
>>>>>>>> varchar, analysis_execution_uuid uuid, x  double, y double, loc 
>>>>>>>> varchar, w
>>>>>>>> double, h double, normalized varchar, type varchar, filehost varchar,
>>>>>>>> filename varchar, image_uuid uuid, image_uri varchar, image_caseid 
>>>>>>>> varchar,
>>>>>>>> image_mpp_x double, image_mpp_y double, image_width double, 
>>>>>>>> image_height
>>>>>>>> double, objective double, cancer_type varchar,  Area float, submit_date
>>>>>>>> timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));*
>>>>>>>>
>>>>>>>> Kindly suggest me how to model is data and perform 2d or in general
>>>>>>>> multi dimension range queries in Cassandra efficiently.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mehak
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Andrés de la Peña
>>>
>>>
>>> <http://www.stratio.com/>
>>> Avenida de Europa, 26. Ática 5. 3ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>>>
>>
>>
>

Re: 2d or multi dimension range query in cassandra CQL

Reply via email to