Thank you so much, Aaron!
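
A minimal sketch of the approach discussed in this thread: take the column names of the small row and ask for exactly those names in the large row. Plain Python dicts stand in for the two rows here, so this is not a real Cassandra client. `get_named_columns` is a hypothetical stand-in for a by-name get_slice() call, and the batching (`batch_size`) is an assumption of the sketch, not something the thread prescribes.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a row: column name -> value (empty for valueless columns).
Row = Dict[str, bytes]


def get_named_columns(row: Row, names: Iterable[str]) -> List[str]:
    """Stand-in for a by-name slice: return the requested names that exist in the row."""
    return [name for name in names if name in row]


def common_columns(small_row: Row, big_row: Row, batch_size: int = 100) -> List[str]:
    """Find column names present in both rows by querying the big row by name."""
    names = sorted(small_row)  # all column names of the small row
    found: List[str] = []
    # Ask for the names in batches so a single request stays small (an assumption).
    for start in range(0, len(names), batch_size):
        found.extend(get_named_columns(big_row, names[start:start + batch_size]))
    return found


# 1st row: a user's connection list; 2nd row: the members of a group.
connections: Row = {"alice": b"", "bob": b"", "carol": b""}
members: Row = {f"user{i}": b"" for i in range(1000)}
members.update({"bob": b"", "carol": b""})

mutual = common_columns(connections, members)
print(mutual)
```

Only the names from the small row ever cross the wire, so the cost depends on the 100-250 connection columns rather than on the size of the group row.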
On Wed, Feb 9, 2011 at 2:11 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Makes sense, use a get_slice() against the second row and pass in the column
> names. Should be fine.
>
> If you run into performance issues look at slice_buffer_size and
> column_index_size in the config.
>
> Aaron
>
>
> On 9/02/2011, at 5:16 AM, Aklin_81 <asdk...@gmail.com> wrote:
>
>> Amongst two rows, where I need to find the common columns. I will not
>> have more than 200 columns (in 99% of cases) for the 1st row. But the 2nd
>> row where I need to find these columns may have even around a million
>> valueless columns.
>>
>> A point to note is: these calculations are all done for **writing the
>> data to the database that has been collected from presentation layer**
>> & not during presentation of data.
>>
>> I am using the results of such intersection to find the rows (that are
>> pointed to by the names of the common columns) that I should write to. The
>> calculations are done after a Post is submitted by a user, in a
>> discussions forum. Actually this is used to find out the mutual
>> connections in a group & write to the rows pointed to by common columns.
>> 1st row represents the connection list of a user, which is not going
>> to be more than 100-250 columns for my case & 2nd row represents the
>> members of a group which may contain a million columns as I said.
>> I find the mutual connections in a group (by finding the common columns
>> in the above two rows) and then write to the rows of those users.
>>
>> Can't I run a batch query to ask for all columns that I picked up from
>> 1st row and want to ask in the 2nd row ??
>>
>> Is there any better way ?
>>
>> Asil
>>
>>
>>>
>>> On Feb 7, 2011, at 12:30 AM, Aklin_81 wrote:
>>>
>>>> Thanks Aaron & Shaun,
>>>>
>>>> ******************************
>>>> I think my question might have been unclear to some of you. So I would
>>>> again explain my problem (& the solution which I thought of) for the sake
>>>> of clarity:
>>>>
>>>> Consider I have 2 rows.
>>>> 1st row contains 60-70 columns and 2nd row
>>>> contains hundreds of thousands of columns. Both column sets
>>>> are all valueless. I just need to find out the **common column names**
>>>> in the two rows. **These two rows are known to me**. So what I plan to
>>>> do is, I just pick up all **columns (names)** of 1st row (60-70
>>>> columns) and just ask for them in 2nd row; whatever column names I get
>>>> back is my result.
>>>> Would there be any problem with this solution ? This is how I am
>>>> expecting to get common column names.
>>>>
>>>> Please do not consider it as a JOIN case as it leads to unnecessary
>>>> confusion, I just need common column names from valueless columns in
>>>> the two rows.
>>>>
>>>> ********************************
>>>>
>>>> Aaron, actually the intersection data is very much context based. So
>>>> say if there are 10 million rows in CF A & 1 million in CF B, then
>>>> intersection data would contain 10 million * 1 million rows. This
>>>> would involve very huge & unaffordable amounts of denormalization.
>>>> And finding columns in the client would require pulling unnecessary
>>>> columns, like pulling 100,000 columns from a row of which only 60-70
>>>> are required.
>>>>
>>>> Shaun, I hope my above clarification has clarified things a bit. Yes,
>>>> the rows of which I need to find common columns are known to me.
>>>>
>>>>
>>>> Thank you all,
>>>> Asil
>>>>
>>>>
>>>> On Mon, Feb 7, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>>> In theory, you should be able to do joins by creating an extra column in
>>>>> one column family, holding the "foreign key" of the matching row in the
>>>>> other family.
>>>>>
>>>>> This assumes that the info you are joining on is available in both CFs
>>>>> (is not some sort of functional transformation).
>>>>>
>>>>> I have just found that the implementation for secondary indexes is not
>>>>> yet very close to optimal for more complex "joins" involving multiple
>>>>> indexes. I'm not sure if that affects you as you didn't say what you are
>>>>> joining on.
>>>>>
>>>>> -- Shaun
>>>>>
>>>>>
>>>>> On Feb 6, 2011, at 4:22 PM, Aaron Morton wrote:
>>>>>
>>>>>> Is it possible for you to denormalise and write all the intersection
>>>>>> values? Will depend on how many I guess.
>>>>>>
>>>>>> The other alternative is to pull back more data than you need and do the
>>>>>> intersection in code in the client.
>>>>>>
>>>>>>
>>>>>> Hope that helps.
>>>>>> Aaron
>>>>>> On 7/02/2011, at 7:11 AM, Aklin_81 <asdk...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> @buddhasystem : yes that's a well known solution. But obviously when
>>>>>>> mysql couldn't satisfy my needs, I am here. My question is in the context
>>>>>>> of Cassandra, whether it is possible to achieve the intersection result set of
>>>>>>> columns in two rows, by the way I spoke about.
>>>>>>>
>>>>>>> @Edward: yes that I know but how does that fit here for obtaining the
>>>>>>> common columns among two rows.
>>>>>>>
>>>>>>> Thanks for your comments..
>>>>>>>
>>>>>>> -Asil
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <edlinuxg...@gmail.com>
>>>>>>> wrote:
>>>>>>>> On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem <potek...@bnl.gov> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> If the amount of data is _that_ small, you'll have a much easier life
>>>>>>>>> with MySQL, which supports the "join" procedure -- because that's
>>>>>>>>> exactly what you want to achieve.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> asil klin wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I want to procure the intersection of columns set of two rows (from 2
>>>>>>>>>> different column families).
>>>>>>>>>>
>>>>>>>>>> To achieve the intersection results, can I first retrieve all
>>>>>>>>>> columns (around 300) from the first row and just query by those column
>>>>>>>>>> names in the second row (which contains maximum 100 000 columns) ?
>>>>>>>>>>
>>>>>>>>>> I am using the results during the write time & not before
>>>>>>>>>> presentation to the user, so latency won't be much concern while
>>>>>>>>>> writing.
>>>>>>>>>>
>>>>>>>>>> Is it the proper way to procure intersection results of two rows ?
>>>>>>>>>>
>>>>>>>>>> Would love to hear your comments..
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Asil
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
>>>>>>>>> Sent from the cassandra-u...@incubator.apache.org mailing list
>>>>>>>>> archive at Nabble.com.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You can use multi-get when fetching lists of already known keys to
>>>>>>>> optimize your round trip time.