My apologies for the very delayed response. You caught us right before the start of the Thanksgiving holidays here in the USA. We are back to work now, so I'll take a look at this today.
On Wed, Nov 26, 2014 at 3:19 PM, Stefano Silvestri <[email protected]> wrote:
> Hi Ted,
> as described in the previous email, I've launched my experiment. As I said,
> the final step of my pipeline is cluster labeling using SenseClusters.
> As a reminder, the system performs unsupervised relation extraction over
> the entities found in 988 clinical records (the entities were extracted
> using the UMLS databases, and we cluster pairs of entities).
>
> To integrate the SenseClusters cluster labeling into our system, I've
> produced a cluto-style output for the clustering results (around 160000
> elements) and an rlabel file (same number of entries) listing all the
> clustered elements.
> At this point, I have problems running format_clusters.
>
> To perform the labeling, I need the output of format_clusters generated
> with the --context option. So I've created a Senseval-2 file with
> text2sval.pl. The input file for text2sval.pl is plain text with one whole
> clinical record on each line.
> Naturally, each context contains more than one cluster member.
> I haven't used any optional arguments with text2sval.pl.
>
> This output has 988 instance ids. Now, when I try to launch format_clusters,
> I get the following error while it parses the Senseval-2 file (when it
> reaches the last line of the file):
>
> Use of uninitialized value $sentence in pattern match (m//) at
> ../.cpan/build/Text-SenseClusters-1.03-FMoSjn/Toolkit/evaluate/format_clusters.pl
> line 309, <SCON> line 5938.
>
> I suspect that the contexts I used are wrong, so my questions are:
> 1) Do I have to put only the extracted entities in the contexts, or the
> relations?
> 2) Must the number of contexts equal the number of clustered elements?
> 3) If nothing is (theoretically) wrong, what could be causing the error
> in the Senseval-2 file?
>
> I look forward to your response.
> Thank you for your attention; I hope you can help us complete our
> research.
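One way to sanity-check the files described above before running format_clusters is sketched below. This is only a rough diagnostic and not part of SenseClusters. It assumes that the Senseval-2 file marks instances with <instance id="..."> and wraps the text in <context>...</context>, and the two file names given on the command line are placeholders. It reports the number of instances, the positions of any empty contexts, and the number of non-blank rlabel lines, so that a mismatch such as 988 versus roughly 160000, or an empty final context, is visible at a glance. Whether either of these actually causes the reported message is not established here.

```python
import re
import sys

# Assumed Senseval-2 markup; adjust if the text2sval.pl output differs.
INSTANCE_RE = re.compile(r'<instance\s+id="([^"]*)"', re.IGNORECASE)
CONTEXT_RE = re.compile(r'<context>(.*?)</context>', re.IGNORECASE | re.DOTALL)


def summarize_sval(text):
    """Return (instance ids, number of contexts, indexes of empty contexts)."""
    ids = INSTANCE_RE.findall(text)
    contexts = CONTEXT_RE.findall(text)
    empty = [i for i, body in enumerate(contexts) if not body.strip()]
    return ids, len(contexts), empty


def count_nonblank_lines(text):
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


def read_text(path):
    # errors="replace" keeps the check running on non-UTF-8 (e.g. Italian) text.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


if __name__ == "__main__" and len(sys.argv) == 3:
    sval_path, rlabel_path = sys.argv[1], sys.argv[2]  # hypothetical file names
    ids, n_contexts, empty = summarize_sval(read_text(sval_path))
    n_rlabel = count_nonblank_lines(read_text(rlabel_path))
    print(f"instances: {len(ids)}  contexts: {n_contexts}  rlabel lines: {n_rlabel}")
    if empty:
        print(f"empty contexts at positions: {empty}")
    if len(ids) != n_rlabel:
        print("note: instance count and rlabel line count differ")
```

It could be run as `python check_sval.py contexts.sval clusters.rlabel`, where the script name and both file names are hypothetical.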
>
> 2014-10-23 16:02 GMT+02:00 Stefano Silvestri <[email protected]>:
>>
>> Hi Ted, and thanks.
>>
>> The PoS tagging, entity recognition, feature extraction and clustering
>> tasks are performed by our own system (not SenseClusters), which is still
>> in development.
>> Now I'm trying to use the cluster labeling module of SenseClusters to show
>> that we have found, with an unsupervised approach, the relations between
>> medical entities in the clinical records (e.g. diabetes mellitus <> glycemia)
>> and, in this way, to obtain some labels for the clusters.
>>
>> I'm now writing the code to create the context files, and then I'll run the
>> experiments on cluster labeling. I'll let you know in a few days whether
>> everything worked well and, in case of a new publication, I'll cite your
>> great work.
>>
>> I'm sure I will have more questions in the coming days, so I thank you
>> in advance.
>> Stefano Silvestri
>>
>>
>> 2014-10-23 15:07 GMT+02:00 Ted Pedersen <[email protected]>:
>>>
>>> Hi Stefano,
>>>
>>> This sounds like an interesting project, and it's good to know
>>> SenseClusters is proving to be useful. See my responses inline...
>>>
>>> On Wed, Oct 22, 2014 at 5:58 AM, Stefano Silvestri
>>> <[email protected]> wrote:
>>> > I've used clustering techniques to discover, in an unsupervised way,
>>> > relations between medical entities contained in a large collection of
>>> > anonymized medical records, in a research project at the University of
>>> > Naples.
>>> > The data set is composed of a large set of features; all the results
>>> > will shortly be published in a journal.
>>> >
>>> > The next step in the development of our system is unsupervised cluster
>>> > (relation) labeling. To do that, I'd like to try the clusterlabeling
>>> > module from SenseClusters. To create the input for clusterlabeling, I
>>> > have to use the format_clusters module with the --context option, and
>>> > now I have some problems.
>>> >
>>> > I have already produced a cluto-style cluster solution file (no problem
>>> > with that) from my system.
>>> >
>>> > The rlabel file, if I'm right, is a file containing the explicit
>>> > name of each entity in the cluster (in my case, the relation).
>>> > Is that right?
>>>
>>> Yes, rlabel shows the cluster to which each instance has been assigned.
>>>
>>> >
>>> > And now the problems with the context file...
>>> > It should be in Senseval-2 format. My experimental data consists of
>>> > plain text files, so I should use the plain-text-to-headless-Senseval-2
>>> > utility.
>>> >
>>> > I have some questions.
>>> >
>>> > 1) Does the context file have to put together all my input files (the
>>> > medical records) in one large file (with each context corresponding to
>>> > one medical record)?
>>>
>>> Yes, the input for each run of SenseClusters should be a single file
>>> with all your contexts included.
>>>
>>> >
>>> > 2) Can the contexts be headless, or do I have to tag (<head></head>) all
>>> > the entities (medical names) in the input?
>>>
>>> Your contexts can be headless, and so there is no need to include
>>> <head> tags in your contexts.
>>>
>>> >
>>> > 3) Are there other constraints on the context files (formatting, tags,
>>> > or other)?
>>> >
>>>
>>> There shouldn't be. The output from text2sval.pl should be acceptable
>>> for input "as is".
>>>
>>> > If the experiments succeed, of course, I'll credit and cite the
>>> > SenseClusters project.
>>> >
>>> > PS - my system works on the Italian language.
>>>
>>> That's great! We'd be happy to answer further questions as they arise,
>>> and will be curious to know how things work out!
>>>
>>> Good luck,
>>> Ted
>>>
>>> >
>>> > Thanks for your response,
>>> > Stefano Silvestri,
>>> > NLP researcher at University of Naples "Federico II"
>>> >
>>> > _______________________________________________
>>> > senseclusters-users mailing list
>>> > [email protected]
>>> > https://lists.sourceforge.net/lists/listinfo/senseclusters-users
>>>
>>> --
>>> Ted Pedersen
>>> http://www.d.umn.edu/~tpederse
