Thank you for the reply. @Ted: if we are talking about millions of users,
there will be millions of clusters; will this clustering still be
feasible?


On Fri, Jun 14, 2013 at 1:27 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Thanks Grant.  Exactly correct.
>
> Some Pig or Hive action is indicated here.  Or write a map-reduce job where
> the reducer does the vector generation.
>
>
>
> On Thu, Jun 13, 2013 at 7:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> > I think Ted was implying just write a script to aggregate the Movielens
> > data by user id.  Should be pretty straightforward.
> >
> > On Jun 13, 2013, at 10:05 AM, Neetha <netasu...@gmail.com> wrote:
> >
> > > Thank you for the reply. How can we group the users?
> > >
> > >
> > > On Thu, Jun 13, 2013 at 3:41 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >
> > >> You need to group by user before converting to vector to get sensible
> > >> clustering.
> > >>
> > >>
> > >> On Wed, Jun 12, 2013 at 1:06 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> > >>
> > >>> The CSVVectorIterator in the Integration package will take in a CSV file
> > >>> and produce vectors.  It assumes that each row is the equivalent of a
> > >>> DenseVector (does MovieLens fit that?)  If you need otherwise, I'd suggest
> > >>> starting with the code and modifying it to fit your needs.
> > >>>
> > >>>
> > >>> -Grant
> > >>>
> > >>> On Jun 12, 2013, at 6:11 AM, Neetha <netasu...@gmail.com> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>>
> > >>>> I am using the MovieLens 1M dataset.
> > >>>>
> > >>>> I need to run K-means clustering using Mahout and Hadoop. As I
> > >>>> understand it, the first step is to convert the data into a sequence
> > >>>> file, then into vector format, and then apply the clustering algorithm.
> > >>>> My question is: is there any need to convert the MovieLens rating.csv
> > >>>> file into a sequence file? If so, what are the commands for applying
> > >>>> the clustering technique using Mahout and Hadoop?
> > >>>>
> > >>>> Thanking you,
> > >>>> Neetha Suan Thampi
> > >>>
> > >>> --------------------------------------------
> > >>> Grant Ingersoll | @gsingers
> > >>> http://www.lucidworks.com
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> >
> > --------------------------------------------
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com
> >
> >
> >
> >
> >
> >
>
>
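
The grouping step suggested in the thread (aggregate the MovieLens ratings by
user id before converting to vectors) can be sketched in plain Python. This is
only an illustration, not Mahout code: the function names
group_ratings_by_user and to_dense are hypothetical, and the input is assumed
to be lines of the form userId,movieId,rating,timestamp. The MovieLens 1M
ratings.dat file uses "::" as its separator, so the delimiter is a parameter.

```python
from collections import defaultdict


def group_ratings_by_user(lines, delimiter=","):
    """Aggregate rating rows into one sparse vector (a dict) per user.

    Assumed line layout: userId, movieId, rating[, timestamp].
    Returns {user_id: {movie_id: rating}}.
    """
    vectors = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        user_id = int(fields[0])
        movie_id = int(fields[1])
        rating = float(fields[2])
        # A later rating for the same (user, movie) pair overwrites an earlier one.
        vectors[user_id][movie_id] = rating
    return dict(vectors)


def to_dense(sparse_vector, num_movies):
    """Expand a sparse {movie_id: rating} dict into a dense list.

    Movie ids are assumed to be 1-based; unrated movies become 0.0.
    """
    dense = [0.0] * num_movies
    for movie_id, rating in sparse_vector.items():
        dense[movie_id - 1] = rating
    return dense


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        "1,1,5,978300760",
        "1,3,4,978302109",
        "2,2,3,978298413",
    ]
    users = group_ratings_by_user(sample)
    for user_id, vec in sorted(users.items()):
        print(user_id, to_dense(vec, num_movies=3))
```

The result holds one vector per user, which is the unit that gets clustered.
In a map-reduce version of the same idea, the mapper would emit the user id as
the key and the reducer would build the vector from all ratings for that key.
Writing the vectors out as a Mahout sequence file is not shown here.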
