Hi Stefano,
Thanks for your patience. I decided to construct my own little example
here, and maybe through that we could see what might be different in
your case.
Here is my input file:
hi ted i am here
what is your name
And here is the file produced by running text2sval.pl on it:
<corpus lang="english">
<lexelt item="LEXELT">
<instance id="0">
<answer instance="0" senseid="NOTAG"/>
<context>
hi ted i am here
</context>
</instance>
<instance id="1">
<answer instance="1" senseid="NOTAG"/>
<context>
what is your name
</context>
</instance>
</lexelt>
</corpus>
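If it helps to check the file on your side, here is a rough Python sketch (not part of SenseClusters) that reads a file shaped like the one above and lists each instance id with its context. The name read_instances is made up for illustration, and the sketch assumes every tag sits on its own line, as in this example.

```python
import re

# Matches an opening tag such as <instance id="0">
INSTANCE_RE = re.compile(r'<instance id="([^"]*)">')


def read_instances(lines):
    """Return a list of (instance_id, context_text) pairs.

    Assumes the layout shown above: one tag per line, with the
    context text on the lines between <context> and </context>.
    """
    instances = []
    current_id = None
    in_context = False
    buffer = []
    for raw in lines:
        line = raw.strip()
        match = INSTANCE_RE.match(line)
        if match:
            current_id = match.group(1)
        elif line == "<context>":
            in_context = True
            buffer = []
        elif line == "</context>":
            in_context = False
            instances.append((current_id, " ".join(buffer)))
        elif in_context:
            buffer.append(line)
    return instances


sample = """<corpus lang="english">
<lexelt item="LEXELT">
<instance id="0">
<answer instance="0" senseid="NOTAG"/>
<context>
hi ted i am here
</context>
</instance>
<instance id="1">
<answer instance="1" senseid="NOTAG"/>
<context>
what is your name
</context>
</instance>
</lexelt>
</corpus>""".splitlines()

for instance_id, context in read_instances(sample):
    print(instance_id, context)
```

Counting the pairs this returns is a quick way to see how many instance ids the file really contains.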
Now, I created an rlabel file:
0
1
And a cluster_solution file:
1
1
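The two files are read side by side: line N of the rlabel file names an instance, and line N of the cluster solution file gives its cluster. A small Python sketch of that pairing follows. The name pair_labels is made up here, not something from SenseClusters, and the line-by-line correspondence is an assumption taken from this example.

```python
def pair_labels(rlabel_lines, solution_lines):
    """Pair each instance id with its cluster id, line by line."""
    labels = [line.strip() for line in rlabel_lines if line.strip()]
    clusters = [line.strip() for line in solution_lines if line.strip()]
    # The pairing only makes sense if both files have the same length.
    if len(labels) != len(clusters):
        raise ValueError(
            f"rlabel has {len(labels)} lines but the cluster solution "
            f"has {len(clusters)}"
        )
    return list(zip(labels, clusters))


pairs = pair_labels(["0", "1"], ["1", "1"])
print(pairs)  # [('0', '1'), ('1', '1')]
```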
When I run format_clusters.pl on all of the above, I get something like this:
format_clusters.pl testfile.cluster_solution testfile.rlabel --context testfile.sval2
<cluster id="1">
<instance id="0">
<context>
hi ted i am here
</context>
</instance>
<instance id="1">
<context>
what is your name
</context>
</instance>
</cluster>
This shows us that instances 0 and 1 are both found in cluster 1, which
is what I intended.
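To compare with other files, a quick check along these lines might help. This is only a sketch of the grouping idea in Python, not how format_clusters.pl is actually written; group_by_cluster and its arguments are hypothetical names. It collects the contexts for each cluster and reports any rlabel id that has no matching instance id among the contexts.

```python
from collections import defaultdict


def group_by_cluster(pairs, contexts):
    """Group contexts by cluster id.

    pairs    -- list of (instance_id, cluster_id) tuples
    contexts -- dict mapping instance_id to its context text
    Returns (clusters, missing), where missing lists the ids that
    appear in pairs but have no context.
    """
    clusters = defaultdict(list)
    missing = []
    for instance_id, cluster_id in pairs:
        if instance_id in contexts:
            clusters[cluster_id].append((instance_id, contexts[instance_id]))
        else:
            missing.append(instance_id)
    return dict(clusters), missing


contexts = {"0": "hi ted i am here", "1": "what is your name"}
pairs = [("0", "1"), ("1", "1")]

clusters, missing = group_by_cluster(pairs, contexts)
for cluster_id, members in clusters.items():
    print(f"cluster {cluster_id}:")
    for instance_id, text in members:
        print(f"  instance {instance_id}: {text}")
print("ids without a context:", missing)
```

If the list of ids without a context is not empty, that would be one difference from this small example worth looking at.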
Does that point to any differences from what you did? I will keep
trying to re-create your error; if I can, then I'm confident we can
figure this out.
More soon, and let me know if you see anything here that seems
relevant, or if I'm totally off track!
Thanks,
Ted
On Wed, Nov 26, 2014 at 3:19 PM, Stefano Silvestri
<[email protected]> wrote:
> Hi Ted,
> as described in my previous email, I've launched my experiment. As I said,
> the final step of my pipeline is cluster labeling, using SenseClusters.
> I want to remind you that the system performs unsupervised relation
> extraction from the entities found in 988 clinical records (the entities
> have been extracted through UMLS databases and we cluster the pairs of
> entities).
>
> To integrate SenseClusters cluster_label in our system, I've produced a
> cluto-style output for the clustering results (around 160000 elements) and
> an rlabel file (same number), with the list of all the clustered elements.
> At this point, I have problems running format_clusters.
>
> To perform the labeling, I need the format_clusters output, generated
> with the --context option. So, I've created a senseval-2 file with
> text2sval.pl. The input file for text2sval is a plain text file with one
> whole clinical record per line.
> Naturally, each context contains more than one cluster member.
> I haven't used any optional arguments in text2sval.
>
> This output has 988 instance ids. Now, when I try to launch format_clusters,
> I get the following error, which occurs while parsing the senseval file:
> Use of uninitialized value $sentence in pattern match (m//) at
> ../.cpan/build/Text-SenseClusters-1.03-FMoSjn/Toolkit/evaluate/format_clusters.pl
> line 309, <SCON> line 5938. (when it reaches the last line of the senseval2
> file).
>
> I suspect that the contexts I used are wrong, so my questions are:
> 1) Do I have to put in the context only the extracted entities or the
> relations?
> 2) Must the number of contexts equal the number of clustered elements?
> 3) If nothing is (theoretically) wrong, what could be the error in the
> senseval file?
>
> I look forward to your response.
> Thank you for your attention, and I hope that you can help us complete our
> research.
>
>
> 2014-10-23 16:02 GMT+02:00 Stefano Silvestri <[email protected]>:
>>
>> Hi Ted and thanks.
>>
>> The PoS tagging, entity recognition, feature extraction and clustering
>> tasks have been created with our system (not SenseClusters), which is still
>> in development.
>> Now I'm trying to use the cluster_labeling module of SenseClusters to show
>> that we have found, with an unsupervised approach, the relations between
>> medical entities in the clinical records (i.e. diabetes mellitus <> glycemia)
>> and to obtain, in this way, some labels for the clusters.
>>
>> I'm now writing the code to create the context files and then I'll run the
>> experiments on cluster labeling. I'll let you know in a few days if
>> everything worked well and, in case of a new publication, I'll cite your
>> great work.
>>
>> I'm sure that I will have more questions in the coming days, so I thank you
>> in advance.
>> Stefano Silvestri
>>
>>
>> 2014-10-23 15:07 GMT+02:00 Ted Pedersen <[email protected]>:
>>>
>>> Hi Stefano,
>>>
>>> This sounds like an interesting project, and it's good to know
>>> SenseClusters is proving to be useful. See my responses inline...
>>>
>>> On Wed, Oct 22, 2014 at 5:58 AM, Stefano Silvestri
>>> <[email protected]> wrote:
>>> > I've used a clustering technique to discover, in an unsupervised way,
>>> > relations between medical entities contained in a large collection of
>>> > anonymized medical records, in a research project at the University of
>>> > Naples.
>>> > The data set is composed of a large set of features; all the results
>>> > will shortly be published in a journal.
>>> >
>>> > The next step in the development of our system is performing
>>> > unsupervised cluster (relation) labeling. To do that, I plan to try the
>>> > clusterlabeling module from SenseClusters. To create the input to
>>> > clusterlabeling I have to use the format_clusters module with the
>>> > --context option, and now I have some problems.
>>> >
>>> > I have already produced a cluto-style cluster solution file (no problem
>>> > for that) from my system.
>>> >
>>> > The rlabel file, if I'm right, is a file containing the explicit
>>> > corresponding name of each entity in the cluster (in my case the
>>> > relation).
>>> > Is that right?
>>>
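>>> Yes, the rlabel file lists the label (instance id) of each clustered item,
>>> one per line; the cluster solution file shows the cluster to which each
>>> instance has been assigned.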
>>>
>>> >
>>> > And now the problems with the context file...
>>> > It should be in senseval2 format. My experimental assessment is made up
>>> > of plain text files, so I should use the plain text to headless
>>> > senseval2 utility.
>>> >
>>> > I have some questions.
>>> >
>>> > 1) Does the context file have to combine all my input files (the
>>> > medical records) into one large file, with each context corresponding
>>> > to a medical record?
>>>
>>> Yes, the input for each run of SenseClusters should be a single file
>>> with all your contexts included.
>>>
>>> >
>>> > 2) Can the contexts be headless, or do I have to tag (<head></head>)
>>> > all the entities (medical names) in the input?
>>>
>>> Your contexts can be headless, and so there is no need to include
>>> <head> tags in your contexts.
>>>
>>> >
>>> > 3) Are there other constraints on the context files (formatting, tags,
>>> > or other)?
>>> >
>>>
>>> There shouldn't be. The output from text2sval.pl should be acceptable
>>> for input "as is".
>>>
>>> > If the experiments succeed, of course, I'll credit and cite the
>>> > SenseClusters project.
>>> >
>>> > PS - my system works on the Italian language.
>>>
>>> That's great! We'd be happy to answer further questions as they arise,
>>> and will be curious to know how things work out!
>>>
>>> Good luck,
>>> Ted
>>>
>>> >
>>> > Thanks for your response,
>>> > Stefano Silvestri,
>>> > NLP researcher at University of Naples "Federico II"
>>> >
>>> >
>>> > _______________________________________________
>>> > senseclusters-users mailing list
>>> > [email protected]
>>> > https://lists.sourceforge.net/lists/listinfo/senseclusters-users
>>> >
>>>
>>>
>>>
>>> --
>>> Ted Pedersen
>>> http://www.d.umn.edu/~tpederse
>>>
>>>
>>
>>
>
>
>