Re: [Senseclusters-users] SenseCluster question (Ubuntu install)

Ted Pedersen Fri, 18 Jan 2008 09:06:30 -0800

Hi Sai,

As you know, Ubuntu is a little different in that it encourages the use
of sudo rather than a true root account that you access with su.
However, I think that can cause problems like the one you describe
below: because you are running sudo from your regular id, CPAN is
looking for its working directory under your own home directory
(/home/sai/.cpan) rather than under the root account, and that is
causing the trouble.
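
If you want to check that guess before changing anything, here is a
minimal Python sketch (my own illustration, not part of CPAN or
SenseClusters) that reports who owns the directory CPAN complained
about and whether your current id can write to it. The helper name
describe_cpan_home is made up, and ~/.cpan is simply the path from your
error message.

```python
import os
import pwd


def describe_cpan_home(path):
    """Report who owns `path` and whether the current user can write to it."""
    if not os.path.exists(path):
        return {"exists": False, "owner": None, "writable": False}
    st = os.stat(path)
    try:
        # Translate the numeric owner id into a login name when possible.
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    return {
        "exists": True,
        "owner": owner,
        # Creating a lock file needs write and search permission on the directory.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    cpan_home = os.path.expanduser("~/.cpan")
    print(cpan_home, describe_cpan_home(cpan_home))
```

If the owner comes back as root when you run this as yourself, I suspect
an earlier sudo run created the directory, which would fit with the lock
file failing without sudo as well.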


So, I'd suggest that you create a true root account and try the
perl -MCPAN -e shell command again...

I think you can find some info on how to do that here, along with an
explanation of why Ubuntu prefers sudo and why that can also be a
problem:

https://help.ubuntu.com/community/RootSudo

I hope this helps, let us know how things go!

Cordially,
Ted


On Jan 18, 2008 6:19 AM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I'm trying to install PDL using the command "perl -MCPAN -e -shell" but with
> a "sudo" before the command. With "sudo" or without it, the terminal tells
> me:
>
> Your configuration suggests that CPAN.pm should use a working directory of
> /home/sai/.cpan
> Unfortunately we could not create the lock file /home/sai/.cpan/.lock due to
> permission problems.
>
> So what do I do? I am the only person using my new ubuntu!
>
> It also suggests that I change the Config.pm file of CPAN to set the
> "cpan_home" to point to a directory where I can write a .lock file.
> Which directory would satisfy this?
>
> I'm trying to install PDL in the easiest way. Your dependencies file (for
> SenseClusters) tells me that I would not want to fiddle with a local
> installation of PDL.
>
> regards,
>
> Sai
>
> --------------------------------------------------
> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> Sent: Thursday, January 17, 2008 1:23 PM
> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> Subject: Re: SenseCluster question
>
> > Hi Sai,
> >
> > Generally speaking we do this quite a bit with Perl - Perl is very
> > good at manipulating text and doing this sort of reformatting, and I
> > think you'll find it much easier than Java (for this task at least).
> > How you actually do this depends on the data you have; you might want
> > to look at our preprocess.pl code in SenseClusters to get some idea of
> > what we've done. Books like Learning Perl and Programming Perl are
> > quite helpful in figuring out things like this too.
> >
> > I wish I could be more specific, but this tends to vary case by case.
> >
> > Good luck,
> > Ted
> >
> > On Jan 16, 2008 7:10 PM, Sai Tang Huang <[EMAIL PROTECTED]>
> > wrote:
> >> Hiya again Ted,
> >>
> >> I'm just about to start using SenseClusters; however, I'm having a big
> >> problem formatting the source of my corpus. I was given many small text
> >> files containing article introductions and summaries. I have to put them
> >> all into one file and give each sentence its own line, right? There are a
> >> lot of issues with this. I tried making a parser in Java to clean the
> >> text. For instance, every time there's a full stop and a space followed
> >> by a capital letter, I consider it to be a sentence boundary, so I put a
> >> "\n" in between. It works well for the sentences, but each time there is
> >> a "Dr. Something" or "Mr. Something" it also puts the "\n"! Apart from
> >> this, what is the general convention for cleaning up text to make
> >> corpora? Do I get rid of all the brackets and quotes as well? What about
> >> numeric data like "19,000,000" and dates?
> >>
> >> Have you encountered such a problem before? Do you know if there's any
> >> software out there I could use to simply clean up all this mess?
> >>
> >> Thanks again for the help,
> >>
> >> Sai
> >>
> >> --------------------------------------------------
> >> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> >> Sent: Monday, January 14, 2008 10:14 PM
> >> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> >> Subject: Re: SenseCluster question
> >>
> >> > Hi Sai,
> >> >
> >> > Yes, you are correct, Cluto requires that the input be in vector or
> >> > matrix form, and that is exactly what SenseClusters lets you create.
> >> > And in fact, SenseClusters will allow you to cluster sentences, as
> >> > long as each sentence is represented as a "context". You would need to
> >> > get your data into the Senseval-2 format, but that is pretty simple. If
> >> > you have each sentence on a single line in a file, then you can use
> >> > the
> >> >
> >> > plain2sval2.pl
> >> >
> >> > utility to convert your "plain" text to Senseval-2 format. Then it can
> >> > be input to SenseClusters, where each sentence will be treated as a
> >> > "headless" context (meaning there is no target word embedded in the
> >> > sentence that you are trying to discriminate, which you won't have).
> >> > You should be able to do a few preliminary experiments very easily on
> >> > the Web interface to SenseClusters (once you have the data converted
> >> > to Senseval-2 format). It's not that hard to use the command line
> >> > either, but sometimes the Web interface is helpful when you first get
> >> > started and aren't familiar with all the options.
> >> >
> >> > You can find the web interface here:
> >> >
> >> > http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi
> >> >
> >> > Do let me know if you have additional questions. This does sound like
> >> > something you should be able to do with SenseClusters without too much
> >> > difficulty.
> >> >
> >> > Good luck!
> >> > Ted
> >> >
> >> > On Jan 14, 2008 12:38 PM, Sai Tang Huang <[EMAIL PROTECTED]>
> >> > wrote:
> >> >> Hi again Ted,
> >> >>
> >> >> Thanks a lot for your quick and detailed reply.
> >> >>
> >> >> I should have outlined my project more clearly in my last email.
> >> >>
> >> >> Basically I have a corpus from which a Language Model will be created.
> >> >> I'd like to reduce the size of the corpus before computing the LM. My
> >> >> intention is to cluster all sentences of the corpus so that the most
> >> >> similar sentences will be in the same cluster. Then I will select the
> >> >> most representative sentences and use those for the final corpus.
> >> >> Sounds crazy, I know! This project is all about how I cluster the
> >> >> sentences, and how I select the clustered sentences for the final
> >> >> corpus.
> >> >>
> >> >> The problem with CLUTO is that it needs either a similarity matrix or a
> >> >> vector value table. And so far I have only got a corpus! So I guess I
> >> >> could make use of SenseClusters to find semantic features and use those
> >> >> as high-dimensional vectors to then use the vcluster program in CLUTO.
> >> >> When the clustering is done, I will use my own methods for selecting
> >> >> the sentences. Does this sound like it's gonna work?
> >> >>
> >> >> Regards,
> >> >>
> >> >> Sai
> >> >>
> >> >>
> >> >> --------------------------------------------------
> >> >> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> >> >> Sent: Sunday, January 13, 2008 6:41 PM
> >> >> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> >> >> Cc: <[email protected]>
> >> >> Subject: Re: SenseCluster question
> >> >>
> >> >>
> >> >> > Hi Sai,
> >> >> >
> >> >> > The clustering in SenseClusters is performed by Cluto, which includes
> >> >> > quite a few clustering algorithms, and has even more similarity
> >> >> > measures. I'd actually sort of discourage you from trying to invent
> >> >> > your own similarity measure since there are so many and it's not clear
> >> >> > that you'd be able to do anything substantially different or better
> >> >> > than what already exists. In terms of manipulating cluster size, I'm
> >> >> > not sure about that. I think that's something you probably want to do
> >> >> > indirectly through your choice of clustering algorithm, criterion
> >> >> > function, and similarity measure (rather than saying I want clusters
> >> >> > of 100 items each, which seems like it might be cheating a little :)
> >> >> >
> >> >> > Speaking of cheating, one of the things Cluto requires is that you
> >> >> > specify the number of clusters ahead of time. For many problems that
> >> >> > is cheating, since you don't always know that a priori. SenseClusters
> >> >> > actually adds cluster stopping measures to Cluto, and predicts the
> >> >> > number of clusters automatically for you, which we think is a
> >> >> > substantial improvement. In general that's the idea of SenseClusters,
> >> >> > to provide support for all the things that must occur before and after
> >> >> > the actual clustering operation.
> >> >> >
> >> >> > So, I'd suggest you check out Cluto a bit more. If that looks like it
> >> >> > provides the functionality you need, I think SenseClusters can add
> >> >> > quite a bit to that which will help in many sorts of text clustering
> >> >> > applications.
> >> >> >
> >> >> > More about Cluto here:
> >> >> > http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
> >> >> >
> >> >> > And of course SenseClusters is here:
> >> >> > http://senseclusters.sourceforge.net
> >> >> >
> >> >> > Cordially,
> >> >> > Ted
> >> >> >
> >> >> > On Jan 13, 2008 11:17 AM, Sai Tang Huang
> >> >> > <[EMAIL PROTECTED]>
> >> >> > wrote:
> >> >> >>
> >> >> >>
> >> >> >> Hi there,
> >> >> >>
> >> >> >> My name is Sai Tang Huang and I'm a student at the University of
> >> >> >> Brighton (UK). I am doing a project on text corpus clustering (to
> >> >> >> eventually achieve corpus reduction in the context of SMT Language
> >> >> >> Models). I need to find a clustering package that:
> >> >> >>
> >> >> >> - includes several algorithms
> >> >> >> - allows parameters such as cluster size to be manipulated
> >> >> >> - allows me to plug in my own similarity measures
> >> >> >>
> >> >> >> Having read your slides and seen your video on "Language independent
> >> >> >> methods of clustering similar contexts" (long and interesting), I was
> >> >> >> wondering if SenseClusters would be able to do this. If not, are you
> >> >> >> aware of any other open source software that would accomplish this?
> >> >> >>
> >> >> >> Thanks a million for your time.
> >> >> >>
> >> >> >> Sai
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Ted Pedersen
> >> >> > http://www.d.umn.edu/~tpederse
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Ted Pedersen
> >> > http://www.d.umn.edu/~tpederse
> >> >
> >>
> >
> >
> >
> > --
> > Ted Pedersen
> > http://www.d.umn.edu/~tpederse
> >
>
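
On the sentence-splitting problem from earlier in the thread (the
"Dr. Something" case), here is a minimal sketch of one common approach:
keep your rule of "full stop, space, capital letter", but skip the split
when the word before the full stop is a known abbreviation. It is
written in Python purely for illustration; the abbreviation list and the
function name split_sentences are assumptions of mine, not anything
taken from preprocess.pl, and the same idea should carry over to Perl
regular expressions.

```python
import re

# Hypothetical, deliberately short list of abbreviations that end in a
# period but do not end a sentence. A real corpus would need a longer list.
ABBREVIATIONS = {"Dr", "Mr", "Mrs", "Ms", "Prof"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and a capital letter.
BOUNDARY = re.compile(r"([.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Return the sentences in `text`, not splitting after known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        words = text[start:match.start()].split()
        last_word = words[-1] if words else ""
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue  # e.g. "Dr. Smith" is not a sentence boundary
        # Keep the punctuation mark with the sentence it ends.
        sentences.append(text[start:match.end(1)].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith arrived. He met Mr. Jones! They paid 19,000,000 dollars."
    print("\n".join(split_sentences(sample)))
```

Joining the result with "\n" gives one sentence per line, which is the
layout plain2sval2.pl is described as expecting. Numbers like
"19,000,000" pass through untouched here; whether to keep them,
brackets, or quotes still depends on your data.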



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
