Re: Finding the intersection results of column sets of two rows

Aklin_81 Tue, 08 Feb 2011 12:48:10 -0800

Thank you so much, Aaron!


On Wed, Feb 9, 2011 at 2:11 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Makes sense. Use a get_slice() against the second row and pass in the column
> names. Should be fine.
>
> If you run into performance issues, look at slice_buffer_size and
> column_index_size in the config.
>
> Aaron
>
>
> On 9/02/2011, at 5:16 AM, Aklin_81 <asdk...@gmail.com> wrote:
>
>> I have two rows and need to find the columns common to both. The 1st
>> row will not have more than 200 columns (in 99% of cases), but the 2nd
>> row, in which I need to find these columns, may have around a million
>> valueless columns.
>>
>> A point to note: these calculations are all done while **writing the
>> data collected from the presentation layer to the database**, not
>> while presenting the data.
>>
>> I am using the results of this intersection to find the rows (pointed
>> to by the names of the common columns) that I should write to. The
>> calculations are done after a user submits a post in a discussion
>> forum. This is used to find the mutual connections in a group and
>> write to the rows pointed to by the common columns.
>> The 1st row represents the connection list of a user, which is not
>> going to be more than 100-250 columns in my case, and the 2nd row
>> represents the members of a group, which may contain a million columns
>> as I said.
>> I find the mutual connections in a group (by finding the common columns
>> in the above two rows) and then write to the rows of those users.
>>
>> Can't I run a batch query that asks the 2nd row for all the columns I
>> picked up from the 1st row?
>>
>> Is there any better way?
>>
>> Asil
>>
>>
>>>
>>> On Feb 7, 2011, at 12:30 AM, Aklin_81 wrote:
>>>
>>>> Thanks Aaron & Shaun,
>>>>
>>>> ******************************
>>>> I think my question might have been unclear to some of you, so I will
>>>> explain my problem (and the solution I thought of) again for the sake
>>>> of clarity:
>>>>
>>>> Consider that I have 2 rows. The 1st row contains 60-70 columns and
>>>> the 2nd row contains hundreds of thousands of columns. Both column
>>>> sets are valueless. I just need to find the **common column names**
>>>> in the two rows. **These two rows are known to me**. So what I plan to
>>>> do is pick up all **column names** of the 1st row (60-70 columns) and
>>>> ask for them in the 2nd row; whatever column names I get back are my
>>>> result.
>>>> Would there be any problem with this solution? This is how I am
>>>> expecting to get the common column names.
>>>>
>>>> Please do not consider this a JOIN case, as that leads to unnecessary
>>>> confusion. I just need the common column names from valueless columns
>>>> in the two rows.
>>>>
>>>> ********************************
>>>>
>>>> Aaron, the intersection data is very much context based. So if, say,
>>>> there are 10 million rows in CF A and 1 million in CF B, then the
>>>> intersection data would contain 10 million * 1 million rows. This
>>>> would involve a huge and unaffordable amount of denormalization.
>>>> And finding the columns in the client would require pulling
>>>> unnecessary columns, such as pulling 100,000 columns from a row of
>>>> which only 60-70 are required.
>>>>
>>>> Shaun, I hope the explanation above has clarified things a bit. Yes,
>>>> the rows whose common columns I need to find are known to me.
>>>>
>>>>
>>>> Thank you all,
>>>> Asil
>>>>
>>>>
>>>> On Mon, Feb 7, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>>> In theory, you should be able to do joins by creating an extra column in 
>>>>> one column family, holding the "foreign key" of the matching row in the 
>>>>> other family.
>>>>>
>>>>> This assumes that the info you are joining on is available in both CFs 
>>>>> (is not some sort of functional transformation).
>>>>>
>>>>> I have just found that the implementation of secondary indexes is not
>>>>> yet very close to optimal for more complex "joins" involving multiple
>>>>> indexes. I'm not sure whether that affects you, as you didn't say what
>>>>> you are joining on.
>>>>>
>>>>> -- Shaun
>>>>>
>>>>>
>>>>> On Feb 6, 2011, at 4:22 PM, Aaron Morton wrote:
>>>>>
>>>>>> Is it possible for you to denormalise and write all the intersection
>>>>>> values? It will depend on how many there are, I guess.
>>>>>>
>>>>>> The other alternative is to pull back more data than you need and do
>>>>>> the intersection in code in the client.
>>>>>>
>>>>>>
>>>>>> Hope that helps.
>>>>>> Aaron
>>>>>> On 7/02/2011, at 7:11 AM, Aklin_81 <asdk...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> @buddhasystem: yes, that's a well-known solution. But obviously I am
>>>>>>> here because MySQL couldn't satisfy my needs. My question is in the
>>>>>>> context of Cassandra: is it possible to get the intersection of the
>>>>>>> column sets of two rows in the way I described?
>>>>>>>
>>>>>>> @Edward: yes, I know that, but how does it fit here for obtaining the
>>>>>>> common columns of two rows?
>>>>>>>
>>>>>>> Thanks for your comments..
>>>>>>>
>>>>>>> -Asil
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>> On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem <potek...@bnl.gov> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> If the amount of data is _that_ small, you'll have a much easier life
>>>>>>>>> with MySQL, which supports the "join" procedure -- because that's
>>>>>>>>> exactly what you want to achieve.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> asil klin wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I want to obtain the intersection of the column sets of two rows
>>>>>>>>>> (from 2 different column families).
>>>>>>>>>>
>>>>>>>>>> To get the intersection, can I first retrieve all columns (around
>>>>>>>>>> 300) from the first row and then query for just those column names
>>>>>>>>>> in the second row (which contains at most 100,000 columns)?
>>>>>>>>>>
>>>>>>>>>> I am using the results at write time and not before presentation
>>>>>>>>>> to the user, so latency won't be much of a concern.
>>>>>>>>>>
>>>>>>>>>> Is this the proper way to get the intersection of two rows?
>>>>>>>>>>
>>>>>>>>>> Would love to hear your comments..
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Asil
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context: 
>>>>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
>>>>>>>>> Sent from the cassandra-u...@incubator.apache.org mailing list 
>>>>>>>>> archive at Nabble.com.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You can use multi-get when fetching lists of already known keys to
>>>>>>>> optimize your round trip time.
>>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
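
For anyone reading this thread later, here is a rough sketch of the
approach discussed above: take the column names of the small row and ask
the large row for just those names, in batches. It is only an
illustration under stated assumptions. The names common_columns,
fetch_by_names and fetch_from_group_row are hypothetical, and the large
row is simulated with an in-memory set. In a real client, fetch_by_names
would presumably wrap a get_slice() call whose predicate lists the
column names, as Aaron suggested.

```python
from typing import Callable, Iterable, Iterator, List, Set


def chunked(names: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def common_columns(
    small_row_names: Iterable[str],
    fetch_by_names: Callable[[List[str]], Iterable[str]],
    batch_size: int = 100,
) -> Set[str]:
    """Return the column names of the small row that also exist in the
    large row.

    `fetch_by_names` is a hypothetical callback: given a batch of column
    names, it returns those that are present in the large row.
    """
    common: Set[str] = set()
    for batch in chunked(list(small_row_names), batch_size):
        common.update(fetch_by_names(batch))
    return common


# In-memory stand-in for the large "group members" row (assumption:
# this only simulates a by-name lookup against the second row).
group_members = {f"user{i}" for i in range(100_000)}


def fetch_from_group_row(names: List[str]) -> List[str]:
    return [name for name in names if name in group_members]


# The small "connections" row of one user.
connections = ["user5", "user42", "stranger1", "user99999", "stranger2"]
result = common_columns(connections, fetch_from_group_row, batch_size=2)
print(sorted(result))
```

Batching keeps each request small. With only a few hundred names in the
first row, a single request may well be enough.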
