Hi Sai,

As you know, Ubuntu is a little different in that it encourages the use of sudo rather than having a true root account that you access with su. However, I think that can cause problems like the one you describe below: because you are running sudo from your regular user id, I think CPAN is looking for a directory under your own home directory rather than root's, and that is causing some trouble.

So, I'd suggest that you create a true root account and try the perl -MCPAN command again. I think you can find some info on how to do that here (as well as some explanation of why they prefer sudo, and why that can also be a problem):

https://help.ubuntu.com/community/RootSudo

I hope this helps, let us know how things go!

Cordially,
Ted

On Jan 18, 2008 6:19 AM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I'm trying to install PDL using the command "perl -MCPAN -e shell" but with
> a "sudo" before the command. With "sudo" or without it, the terminal tells
> me:
>
> Your configuration suggests that CPAN.pm should use a working directory of
> /home/sai/.cpan
> Unfortunately we could not create the lock file /home/sai/.cpan/.lock due to
> permission problems.
>
> So what do I do? I am the only person using my new Ubuntu!
>
> It also suggests that I change the Config.pm file of CPAN to set the
> "cpan_home" to point to a directory where I can write a .lock file.
> Which directory would satisfy this?
>
> I'm trying to install PDL in the easiest way. Your dependencies file (for
> SenseClusters) tells me that I would not want to fiddle with a local
> installation of PDL.
>
> regards,
>
> Sai
>
> --------------------------------------------------
> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> Sent: Thursday, January 17, 2008 1:23 PM
> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> Subject: Re: SenseCluster question
>
> > Hi Sai,
> >
> > Generally speaking we do this quite a bit with Perl - Perl is very
> > good at manipulating text and doing this sort of reformatting, and I
> > think you'll find it much easier than Java (for this task at least).
> > How you actually do this depends on the data you have, you might want
> > to look at our preprocess.pl code in SenseClusters to get some idea of
> > what we've done. Books like Learning Perl and Programming Perl are
> > quite helpful in figuring out things like this too.
> >
> > I wish I could be more specific, but this tends to vary case by case.
> >
> > Good luck,
> > Ted
> >
> > On Jan 16, 2008 7:10 PM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> >> Hiya again Ted,
> >>
> >> I'm just about to start using SenseClusters; however, I'm having a big
> >> problem formatting the source of my corpus. I was given many small text
> >> files containing article introductions, summaries. I have to put them all
> >> into one file and give each sentence a line, right? So there are a lot of
> >> issues with this. I tried making a parser in Java to clean the text. For
> >> instance, every time there's a full stop and a space followed by a capital
> >> letter, I consider it to be a sentence boundary so I put a "\n" in between.
> >> It works well for the sentences but each time there is a "Dr. Something" or
> >> "Mr. Something" it also puts the "\n"! Apart from this, what is the general
> >> convention for cleaning up text to make corpora? Do I get rid of all the
> >> brackets and quotes as well? What about numeric data like "19,000,000" and
> >> dates?
> >>
> >> Have you encountered such a problem before? Do you know if there's any
> >> software out there I could use to simply clean up all this mess?
> >>
> >> Thanks again for the help,
> >>
> >> Sai
> >>
> >> --------------------------------------------------
> >> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> >> Sent: Monday, January 14, 2008 10:14 PM
> >> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> >> Subject: Re: SenseCluster question
> >>
> >> > Hi Sai,
> >> >
> >> > Yes, you are correct, Cluto requires that the input be in vector or
> >> > matrix form, and that is exactly what SenseClusters lets you create.
> >> > And in fact, SenseClusters will allow you to cluster sentences, as
> >> > long as each sentence is represented as a "context". You would need to
> >> > get your data into the Senseval2 format, but that is pretty simple.
> >> > If you have each sentence on a single line in a file, then you can use
> >> > the
> >> >
> >> > plain2sval2.pl
> >> >
> >> > utility to convert your "plain" text to Senseval-2 format. Then it can
> >> > be input to SenseClusters, where each sentence will be treated as a
> >> > "headless" context (meaning there is no target word embedded in the
> >> > sentence that you are trying to discriminate, which you won't have).
> >> > You should be able to do a few preliminary experiments very easily on
> >> > the Web interface to SenseClusters (once you have the data converted
> >> > to Senseval 2 format). It's not that hard to use the command line
> >> > either, but sometimes the Web interface is helpful when you first get
> >> > started and aren't familiar with all the options.
> >> >
> >> > You can find the web interface here:
> >> >
> >> > http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi
> >> >
> >> > Do let me know if you have additional questions. This does sound like
> >> > something you should be able to do with SenseClusters without too much
> >> > difficulty.
> >> >
> >> > Good luck!
> >> > Ted
> >> >
> >> > On Jan 14, 2008 12:38 PM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> >> >> Hi again Ted,
> >> >>
> >> >> Thanks a lot for your quick and detailed reply.
> >> >>
> >> >> I should have outlined my project more clearly in my last email.
> >> >>
> >> >> Basically I have a corpus from which a Language Model will be created.
> >> >> I'd like to reduce the size of the corpus before computing the LM. My
> >> >> intention is to cluster all sentences of the corpus so that the most
> >> >> similar sentences will be in the same cluster. Then I will select the
> >> >> most representative sentences and use those for the final cluster.
> >> >> Sounds like crazy I know! This project is all about how I cluster the
> >> >> sentences, and how I select the clustered sentences for the final
> >> >> corpus.
> >> >>
> >> >> The problem with CLUTO is that it needs either a similarity matrix or
> >> >> a vector value table. And so far I have only got a corpus! So I guess I
> >> >> could make use of SenseCluster to find semantic features and use those
> >> >> as high-dimensional vectors to then use the vcluster program in CLUTO.
> >> >> When the clustering is done, I will use my own methods for selecting
> >> >> the sentences. Does this sound like it's gonna work?
> >> >>
> >> >> Regards,
> >> >>
> >> >> Sai
> >> >>
> >> >> --------------------------------------------------
> >> >> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> >> >> Sent: Sunday, January 13, 2008 6:41 PM
> >> >> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> >> >> Cc: <[email protected]>
> >> >> Subject: Re: SenseCluster question
> >> >>
> >> >> > Hi Sai,
> >> >> >
> >> >> > The clustering in SenseClusters is performed by Cluto, which
> >> >> > includes quite a few clustering algorithms, and has even more
> >> >> > similarity measures. I'd actually sort of discourage you from trying
> >> >> > to invent your own similarity measure since there are so many and
> >> >> > it's not clear that you'd be able to do anything substantially
> >> >> > different or better than what already exists. In terms of
> >> >> > manipulating cluster size, I'm not sure about that. I think that's
> >> >> > something you probably want to do indirectly through your choice of
> >> >> > clustering algorithm, criterion function, and similarity measure
> >> >> > (rather than saying I want clusters of 100 items each, which seems
> >> >> > like it might be cheating a little :)
> >> >> >
> >> >> > Speaking of cheating, one of the things Cluto requires is that you
> >> >> > specify the number of clusters ahead of time. For many problems that
> >> >> > is cheating, since you don't always know that a priori.
> >> >> > SenseClusters actually adds cluster stopping measures to Cluto, and
> >> >> > predicts the number of clusters automatically for you, which we
> >> >> > think is a substantial improvement. In general that's the idea of
> >> >> > SenseClusters, to provide support for all the things that must occur
> >> >> > before and after the actual clustering operation.
> >> >> >
> >> >> > So, I'd suggest you check out Cluto a bit more. If that looks like
> >> >> > it provides the functionality you need, I think SenseClusters can
> >> >> > add quite a bit to that which will help in many sorts of text
> >> >> > clustering applications.
> >> >> >
> >> >> > More about Cluto here:
> >> >> > http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
> >> >> >
> >> >> > And of course SenseClusters is here:
> >> >> > http://senseclusters.sourceforge.net
> >> >> >
> >> >> > Cordially,
> >> >> > Ted
> >> >> >
> >> >> > On Jan 13, 2008 11:17 AM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> >> >> >> Hi there,
> >> >> >>
> >> >> >> My name is Sai Tang Huang and I'm a student at the University of
> >> >> >> Brighton (UK). I am doing a project that has to do with text corpus
> >> >> >> clustering (to eventually achieve corpus reduction in the context
> >> >> >> of SMT Language Models). I need to find a clustering package that:
> >> >> >>
> >> >> >> - includes several algorithms
> >> >> >> - allows parameters such as cluster size to be manipulated
> >> >> >> - allows me to plug in my own similarity measures
> >> >> >>
> >> >> >> Having read your slides and seen your video on "Language
> >> >> >> independent methods of clustering similar contexts" (long and
> >> >> >> interesting), I was wondering if SenseClusters would be able to do
> >> >> >> this.
> >> >> >> If not, are you aware of any other open source software that would
> >> >> >> accomplish this?
> >> >> >>
> >> >> >> Thanks a million for your time.
> >> >> >>
> >> >> >> Sai

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
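
On the sentence-boundary question raised earlier in the thread (treating a full stop, a space, and a capital letter as a boundary also splits "Dr. Something" and "Mr. Something"), one common workaround is to check the token before each candidate boundary against a list of known abbreviations. The following is only a minimal sketch in Python, not the approach taken by preprocess.pl in SenseClusters. The function name split_sentences and the short ABBREVIATIONS list are assumptions made up for illustration, and real corpora will need a longer list and probably extra rules.

```python
import re

# Hypothetical, deliberately short list of abbreviations that end in a
# full stop but do not end a sentence; extend it for your own corpus.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}

# Candidate boundary: whitespace preceded by a full stop and followed by
# a capital letter (the same rule described in the thread).
BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(text):
    """Return a list of sentences, skipping boundaries after abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        # Last whitespace-separated token before the candidate boundary.
        last_token = candidate.split()[-1]
        if last_token in ABBREVIATIONS:
            continue  # e.g. "Dr. Smith": not a real sentence boundary
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith met Mr. Jones. They sold 19,000,000 units. It went well."
    # One sentence per line, which is the layout plain2sval2.pl expects.
    print("\n".join(split_sentences(sample)))
```

Numbers such as "19,000,000" pass through untouched here because the rule only looks at full stops followed by whitespace and a capital letter. Whether to keep brackets, quotes, and numeric data is a separate decision that depends on the corpus, as noted in the thread.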
