[Corpora-List]Call for Participation: Shared Task on "Survey Variable Identification in Social Science Publications"

2022-06-20 Thread Simone Paolo Ponzetto
Dear colleagues,

we hope that this is interesting for some of you working at the intersection of 
NLP, CSS and Scholarly Data Mining — we just released the training data and 
look forward to your participation, thanks!

Best - Simone

_

Call for participation: Shared task on "Survey Variable Identification in 
Social Science Publications" to be held as part of the evaluation campaign of 
the third workshop on Scholarly Document Processing (SDP) at COLING 2022.

Shared Task URL: https://vadis-project.github.io/sv-ident-sdp2022
Workshop URL: https://sdproc.org/2022

Important dates:
- Trial data release: March 16, 2022
- Training data release: June 6, 2022
- Deadline for registration: July 4, 2022
- Test data release: July 18, 2022
- System runs due: July 25, 2022
- Workshop papers due: August 15, 2022
- Camera-ready papers due: September 5, 2022
- SDP workshop at COLING 2022: October 16/17, 2022

Contact:
For any questions on the shared task please contact us on: 
https://groups.google.com/g/svident2022

_

--
Simone Paolo Ponzetto
Data and Web Science Group
University of Mannheim, Germany

http://dws.informatik.uni-mannheim.de/ponzetto
Tel: +49 621 181 2647

___
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list -- corpora@list.elra.info
To unsubscribe send an email to corpora-le...@list.elra.info


[Corpora-List]Re: [EXTERNAL] Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
On Tue, Jun 21, 2022 at 12:14 AM Flor, Michael  wrote:

>  The notion of 'word' has difficulties in linguistics.
> But not enough for abandoning it.
>
Except we don't need it at all --- for either human or machine processing.


> The argument from the paper "Fairness in Representation for Multilingual
> NLP"
> is not convincing at all.
> Even if the early findings are correct for transformers,
> applicability to human language faculty is not yet supported.
>
Right, this paper version has not yet addressed the whole story, which I
have yet to continue with. But one can get the gist from conditional
probability, context, and finer granularity.


> On the other hand, it is not even needed.
> Developmental linguists have noted long ago that babies acquire all
> natural languages at approximately the same rate (under some 'standard
> conditions'), despite vast morphological and other differences between
> languages.
> Thus, in some sense, all natural human languages are already deemed
> 'equal' vis-a-vis acquisition complexity.
>
Well, talk to the NLP crowd, or to those who expect LM/MT results for
different languages to differ in performance even if/when all else is
equal. (I remember how hard, and for how many rounds, I had to work on
my rebuttals.)


> For language learning later in life,
> if one's native language is morphologically rich, learning (some types of)
> morphologically rich languages (as an adult) is a bit easier than learning
> a language that is very different, etc.
>
That's the thing about this paper --- my personal take with L_n learning
is that, no, it's actually also just a length and vocabulary thing wrt
whatever one is used to (e.g. with L1), the environment/support available,
and +/- personal propensity towards new lang.


> Complexity of words in a language for non-native speakers/learners is
> actually a big issue and a field of research in EFL (and now in NLP as
> well).
>
See above.


> Finally,
> word complexity is often defined within the same language  (e.g.
> able-ability, function-dysfunctional),
> and so a notion of cross-linguistic hegemony or malice is not even
> applicable here.
>
What would it take for me to convince you that such "complexity" really
boils down to just length and vocab (think the examples you gave, viewed
from, say, a character perspective)?
E.g. is 'Xjfewijpiweoheymqaweopaf'h' more or less complex than
'multiple-dysfunction-prone' to you?


[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
Hi Linas,

As a native EN speaker myself, I know "w*rds" is a very colloquial
term that gets used often. It got "cemented" via computational
implementation* and hence reinforced people's idea of what grammar is or
ought to be. I am not "undoing" EN writing (that'd be nonsense --- writing
is a thing and that, which is written already, is history, what is more
negotiable is our action/reaction/attitude towards it), but rather, trying
to get people to be more open-minded and more flexibly-minded about the
usage --- in terms of "w*rds" and "grammar", in the context of computing
and beyond (as all these are connected with each other), and of input
representation when modelling/processing.
*And a few decades ago, computing was more EN-dominant. And most working
on text representation then were primarily EN monolinguals, hence there
could have been a bit of X-centrism (where X can be any language, but in
this case, for the relevant historical context, it was EN), with less
sensitivity towards other languages and how w segmentation (with a totally
unintentional "I am just doing language processing, where language involves
words"-mindset) can effect linguistic hegemony in ways that have not been
considered by some. But now our systems and processing power are there to
give us that fine qualitative difference. So hopefully, instead of "so
what?", we can get the community to be more sensitive towards the values of
other communities --- in which the notions of w*rds may be different from
that in EN, or in which there is no native concept of a "w*rd" (and we
don't have to impose one on anyone).

I understand speech processing (both recognition and generation in terms of
TTS) is more fine-grained than text processing. Any step towards finer
granularity is better than none. I don't know if you are aware of the
vocabulary hacking practice in text processing... that's primarily what I
was getting at (also how this vocab hacking relates to structural
linguistics in some ways).

Statistical methods have always been around. They are not "new" methods. In
the tradition of lang sci/tech/eng, they've been somewhat "suppressed" bc
ppl kept arguing about grammar and what surface text representations ought
to look like.
The concept of w*rd defined with whitespace tokenisation is also not
satisfactory for EN, think contractions, abbreviations, tons of stuff from
intro NLP textbooks. :)
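
To make the whitespace point concrete, here is a minimal sketch (the example sentence is made up for illustration, and Python's built-in str.split() stands in for "whitespace tokenisation"; real pipelines differ):

```python
# Illustrative only: naive whitespace tokenisation of a made-up English sentence.
text = "Dr. Smith can't attend; she'll e-mail the U.S. office."

# str.split() with no arguments splits on runs of whitespace.
tokens = text.split()

for tok in tokens:
    print(repr(tok))

# Things to notice in the printed tokens:
# - contractions ("can't", "she'll") stay as single units,
# - punctuation stays glued on ("attend;", "office."),
# - abbreviations ("Dr.", "U.S.") look just like sentence-final tokens.
```

Whether "can't" is one unit or two, and whether "office." should keep its full stop, are decisions that whitespace alone does not settle.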

Re "What, exactly, is being proposed here?": in case you have read the paper
already, then more empathy, more awareness and sensitivity with
inter-cultural/personal values. Our downstream results are good enough. We
can switch our concern to more qualitative matters.

Best,
Ada


On Tue, Jun 21, 2022 at 12:13 AM Linas Vepstas 
wrote:

> Hi Ada,
>
> In the English language, "words" are a thing. Children are taught to place
> spaces between "words". You're not going to undo a millennium-worth of
> English writing by discouraging the use of words.
>
> Much of Latin was written without blank spaces to denote word boundaries.
> In Chinese writing, there are no blank spaces to denote word-boundaries.
> There's assorted NLP software that attempts to guess where those blanks may
> be, so that Chinese could be segmented and passed into other NLP pipeline
> stages.
>
> When we speak, verbally, we don't put in "blanks" between words, although
> there are sometimes pauses. Realistic text-to-speech software NEVER
> vocalizes words individually, and instead ALWAYS vocalizes the transition
> between words, and places the break within a single phoneme (I hope it's
> clear what I am saying here). Thus, from the point of view of text-to-speech
> software, words don't exist, because that is a fundamental requirement for
> normal-sounding speech. (For English.)
>
> Now that we live in the world of statistics and deep learning and whatnot,
> it's become clear that an audio stream of human speech has some parts that
> are "highly conserved" (require certain sounds to follow) and other regions
> which are flexible (just about any other sound can follow). And plenty of
> stuff in the middle between these two extremes.   Surprisingly (or not
> surprisingly, depending on who you are) the highly variable regions are not
> word boundaries. Except when ... there are ... well, exceptions.
>
> However, right now, I am not communicating verbally, and so I am faced
> with the task of converting thoughts into sequences of (discrete) symbols.
> As I learned in first grade, I do this by placing typed spaces between
> words.
>
> Sure, the concept of "word" may be quite inappropriate for some obscure
> languages.  This is entirely plausible, as any "synthetic" language defies
> the concept of "word" (Finnish, Lithuanian consist of "words" many of which
> are like "antidisestablishmentarianism" and it's a children's playground
> game of creating the longest such possible expression. Creating new words
> in these languages is like creating new 

[Corpora-List]Re: [EXTERNAL] Re: Complex Word Identification in French

2022-06-20 Thread Flor, Michael
 The notion of 'word' has difficulties in linguistics.
But not enough for abandoning it.

The argument from the paper "Fairness in Representation for Multilingual NLP"
is not convincing at all.
Even if the early findings are correct for transformers,
applicability to human language faculty is not yet supported.

On the other hand, it is not even needed.
Developmental linguists have noted long ago that babies acquire all natural 
languages at approximately the same rate (under some 'standard conditions'), 
despite vast morphological and other differences between languages.
Thus, in some sense, all natural human languages are already deemed 'equal' 
vis-a-vis acquisition complexity.

For language learning later in life,
if one's native language is morphologically rich, learning (some types of) 
morphologically rich languages (as an adult) is a bit easier than learning a 
language that is very different, etc.

Complexity of words in a language for non-native speakers/learners is actually 
a big issue and a field of research in EFL (and now in NLP as well).

Finally,
word complexity is often defined within the same language  (e.g. able-ability, 
function-dysfunctional),
and so a notion of cross-linguistic hegemony or malice is not even applicable 
here.

MF







From: Daniel HENKEL 
Sent: Monday, June 20, 2022 5:36 PM
To: Ada Wan 
Cc: Christopher Collins ; 
corpora@list.elra.info 
Subject: [EXTERNAL] [Corpora-List]Re: Complex Word Identification in French



WAR IS PEACE
FREEDOM IS SLAVERY
IGNORANCE IS STRENGTH

You are a flaw in the pattern, Winston. You are a stain that must be wiped out. 
Did I not tell you just now that we are different from the persecutors of the 
past? We are not content with negative obedience, nor even with the most abject 
submission. When finally you surrender to us, it must be of your own free will. 
We do not destroy the heretic because he resists us: so long as he resists us 
we never destroy him. We convert him, we capture his inner mind, we reshape 
him. We burn all evil and all illusion out of him; we bring him over to our 
side, not in appearance, but genuinely, heart and soul. We make him one of 
ourselves before we kill him. It is intolerable to us that an erroneous thought 
should exist anywhere in the world, however secret and powerless it may be.

(G. Orwell, 1984)


On 20/06/2022 23:27, Ada Wan wrote:
Yeah... I really don't know what to do with "the resistance", "the ignorance"
--
Daniel HENKEL
Maître de Conférences (Linguistique et Traduction)
UFR5 LLCE-LEA • EA1569 TransCrit
Université Paris 8 Vincennes-St-Denis

“non si può stendere una tipologia delle traduzioni, ma al massimo una 
tipologia di diversi modi di tradurre, volta per volta negoziando il fine che 
ci si propone
– e volta per volta scoprendo che i modi di tradurre sono più di quelli che 
sospettiamo.” U. Eco


[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Linas Vepstas
Hi Ada,

In the English language, "words" are a thing. Children are taught to place
spaces between "words". You're not going to undo a millennium-worth of
English writing by discouraging the use of words.

Much of Latin was written without blank spaces to denote word boundaries.
In Chinese writing, there are no blank spaces to denote word-boundaries.
There's assorted NLP software that attempts to guess where those blanks may
be, so that Chinese could be segmented and passed into other NLP pipeline
stages.

When we speak, verbally, we don't put in "blanks" between words, although
there are sometimes pauses. Realistic text-to-speech software NEVER
vocalizes words individually, and instead ALWAYS vocalizes the transition
between words, and places the break within a single phoneme (I hope it's
clear what I am saying here). Thus, from the point of view of text-to-speech
software, words don't exist, because that is a fundamental requirement for
normal-sounding speech. (For English.)

Now that we live in the world of statistics and deep learning and whatnot,
it's become clear that an audio stream of human speech has some parts that
are "highly conserved" (require certain sounds to follow) and other regions
which are flexible (just about any other sound can follow). And plenty of
stuff in the middle between these two extremes.   Surprisingly (or not
surprisingly, depending on who you are) the highly variable regions are not
word boundaries. Except when ... there are ... well, exceptions.

However, right now, I am not communicating verbally, and so I am faced with
the task of converting thoughts into sequences of (discrete) symbols. As I
learned in first grade, I do this by placing typed spaces between words.

Sure, the concept of "word" may be quite inappropriate for some obscure
languages.  This is entirely plausible, as any "synthetic" language defies
the concept of "word" (Finnish, Lithuanian consist of "words" many of which
are like "antidisestablishmentarianism" and it's a children's playground
game of creating the longest such possible expression. Creating new words
in these languages is like creating new sentences in English. It's just
something you do, and there are no "word boundaries" involved.)

Great. So now what?  I assume everything I wrote is 100% mainstream, known
to any and every linguist, half of whom could amplify and correct all the
mistakes I've made in the above.  Sure, but so what? You can't get rid of
the concept of "word". It's a thing.  What, exactly, is being proposed here?

-- linas


On Mon, Jun 20, 2022 at 10:33 AM Ada Wan  wrote:

> Hi Christopher,
>
> It is in the best interest of the community to discontinue the usage of
> "word". The term is not only very shaky in its foundation (if any), but it
> can also effect disparity in performance in computational processing and
> robustness when human evaluation is involved.
> Although the term has been casually adopted by many in the past, like many
> un-PC terms that may have an inappropriate undertone, it needs to be
> discouraged and abandoned.
> Last but not least, I noticed that you are located in Canada. In the event
> that you were to work with any indigenous communities, one MUST be advised
> to be careful with the usage of such a term --- you could be imposing your
> own (EN- / FR- / dominant language-centric) view onto another
> individual/community. There is an element of cultural and
> linguistic hegemony with the usage of such a term (including but not limited
> to making applications with it).
> Please also consult recent work in this area:
> https://openreview.net/forum?id=-llS6TiOew.
>
> Feel free to get in touch if you should have any questions.
>
> Best,
> Ada
>
>
> On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins <
> christopher.coll...@ontariotechu.ca> wrote:
>
>> Hello,
>>
>>
>>
>> I’m looking for any open source or cloud-hosted solution for complex word
>> identification or word difficulty rating in French for a reading
>> application.
>>
>>
>>
>> As a backup plan we can use measures like corpus frequency, length,
>> number of senses, but we’re hoping someone has already made a tool
>> available.
>>
>>
>>
>> We found this but that’s it: https://github.com/sheffieldnlp/cwi
>>
>>
>>
>> Would appreciate any tips!
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Chris
>>
>>
>>
>> *Christopher Collins* [he/him]
>> Associate Professor - Faculty of Science
>> Canada Research Chair in Linguistic Information Visualization
>> Ontario Tech University
>> vialab.ca
> 
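
For reference, the backup plan in Christopher's original request (corpus frequency, length, number of senses) could be sketched roughly as below. Everything here is an assumption made for illustration: the toy corpus, the hand-written sense counts, and the weights are all hypothetical, and a real system would need a large French corpus and a proper lexical resource.

```python
import math
from collections import Counter

# Toy stand-ins (hypothetical): a tiny pre-segmented French corpus and
# hand-written sense counts. Neither comes from a real resource.
corpus = ["le", "chat", "dort", "sur", "le", "canapé", "et",
          "le", "chien", "dort", "aussi"]
freq = Counter(corpus)
senses = {"chat": 2, "canapé": 2, "dort": 1}

def difficulty(item: str) -> float:
    """Higher score = presumably harder. The weights are arbitrary."""
    # Rarity: negative log of add-one smoothed relative frequency,
    # so unseen items still get a finite score.
    rarity = -math.log((freq[item] + 1) / (len(corpus) + 1))
    length = len(item)                # length in characters
    n_senses = senses.get(item, 1)    # default to 1 when unknown
    return rarity + 0.1 * length + 0.1 * n_senses

for item in ["le", "chat", "canapé", "anticonstitutionnellement"]:
    print(f"{item}: {difficulty(item):.2f}")
```

The three features are simply added with made-up weights; with annotated difficulty ratings one would fit them instead.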

[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
I used "term" bc it makes room for a little bit of (mental) shifting for
some ppl...  Everyone (non-specialists included) uses "w*rd". Nothing is
100% --- when it comes to "language" or abstract concepts (or everything in
the empirical world?), but 99% is better than 98 or 60%. (E.g. we may have
99% of known lgs in character encoding down vs. a very shaky
never-ending-story with w-segmentation, not even for one language.)


On Mon, Jun 20, 2022 at 11:18 PM Daniel HENKEL 
wrote:

> Just to clarify my position, I don't actually think that the En. lexeme
> “w*rd” is easy to define, precise or theoretically well-founded (I prefer
> “lexeme” here, as Ada's previous use of “term” is improper from a wusterian
> point of view, given that “w*rd” lacks distinctive traits due to its
> notorious ambiguity).
>
> The situation is similar in mathematics where “number” is used to denote a
> variety of concepts such as natural numbers, integers, fractions, real
> numbers, irrational numbers, imaginary numbers … which may be inclusive or
> exclusive of each other.  There are thus numerous contexts in which
> colloquial use of the w*rd “number” would be imprecise, inappropriate and
> might even lead to confusion.  Nonetheless, I'm not aware of any
> mathematicians who advocate censorship of the w*rd “number”.
>
> If “w*rd” lacks a clear definition and a clear theoretical foundation
> (which I actually agree with), then it can't really be used as a “term”
> until the concept has been given an adequate definition in relation to
> other terms within the relevant domain or theoretical framework.
>
> On the other hand, though precise terminology is always preferable
> whenever and wherever precision is necessary, there's nothing ever to be
> gained scientifically through censorship (sorry to use an ungood w*rd, but,
> in all earnestness, when I see a spade I call it a “spade”).
>
> DH
>
>
> On 20/06/2022 22:13, Daniel HENKEL wrote:
>
> Not to mention all these shamefully unscientific posts on Corporalist:
>
> *12th International Global W*rdnet Conference Donostia / San Sebastian,
> Basque Country 23-27, 2023 Global W*rdnet Association:
> www.globalw*rdnet.org *
> *Conference website: https://hitz.eus/gwc2023 *
>
> *18th Workshop on Multiw*rd Expressions (MWE 2022) Organized and sponsored
> by SIGLEX, the Special Interest Group on the Lexicon of the ACL*
>
> *The 5th Workshop on Multi-w*rd Units in Machine Translation and
> Translation Technology (MUMTTT 2022) Malaga, 30th September 2022*
>
> ...
>
> Definitely time for some lexical/terminological restrictions/updates, for
> the sake of goodthink/processing, and science!
>
>
> (actually "science" is heretical/redundant, "goodthink/processing" will do
> the job:
>
> *"As we have already seen in the case of the word FREE, w*rds which had
> once borne a heretical meaning were sometimes retained for the sake of
> convenience, but only with the undesirable meanings purged out of them.
> Countless other w*rds such as HONOUR, JUSTICE, MORALITY, INTERNATIONALISM,
> DEMOCRACY, SCIENCE, and RELIGION had simply ceased to exist."*)
>
> DH
>
>
>
> On 20/06/2022 21:47, Daniel HENKEL wrote:
>
> Looks as if Linguistlist is in need of some scientific enlightenment as
> well :
>
> http://linguistlist.org/issues/33/33-2063.html
>
> *In the new, thoroughly revised second edition of W*rds of Wonder:
> Endangered Languages and What They Tell Us, Second Edition (formerly called
> Dying W*rds: Endangered Languages and What They Have to Tell Us), renowned
> scholar Nicholas Evans delivers an accessible and incisive text covering
> the impact of mass language endangerment. The distinguished author explores
> issues surrounding the preservation of indigenous languages, ...*
>
> (ungood w*rds unw*rded to protect the faint of mind against ungood
> thinking/processing).
>
> Best,
>
> DH
>
>
> On 20/06/2022 20:27, Ada Wan wrote:
>
> (I just expounded on a point as a twitter reply today re the granularity
> of one's thinking/processing. Pls feel free to read that also.)
>
> One can think of it in a less binary manner --- not "good" vs "bad", not
> "words" then "sentences", but to think of an utterance/sequence with all
> the finer connections in between... That is the beauty of language --- from
> a "philological" point of view.
>
> I am not sure, though, if you were speaking from a scientific perspective,
> because I have a paper to back my argument in that regard.
>
>
> On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane  wrote:
>
>> “We’re destroying words–scores of them, hundreds of them, every day.
>> We’re cutting the language down to the bone.” […]
>>
>> “It’s a beautiful thing, the destruction of words. Of course the great
>> advantage is in the verbs and adjectives, but there are hundreds of nouns
>> that can be got rid of as well. It isn’t only the synonyms; there are also
>> the antonyms. After all, what justification is there for a word which is
>> simply the 

[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Daniel HENKEL

WAR IS PEACE
FREEDOM IS SLAVERY
IGNORANCE IS STRENGTH

You are a flaw in the pattern, Winston. You are a stain that must be 
wiped out. Did I not tell you just now that we are different from the 
persecutors of the past? We are not content with negative obedience, nor 
even with the most abject submission. When finally you surrender to us, 
it must be of your own free will. We do not destroy the heretic because 
he resists us: so long as he resists us we never destroy him. We convert 
him, we capture his inner mind, we reshape him. We burn all evil and all 
illusion out of him; we bring him over to our side, not in appearance, 
but genuinely, heart and soul. We make him one of ourselves before we 
kill him. It is intolerable to us that an erroneous thought should exist 
anywhere in the world, however secret and powerless it may be.


(G. Orwell, 1984)


On 20/06/2022 23:27, Ada Wan wrote:
Yeah... I really don't know what to do with "the resistance", "the 
ignorance"

--
Daniel HENKEL 
/Maître de Conférences (Linguistique et Traduction)
UFR5 LLCE-LEA • EA1569 TransCrit/
Université Paris 8 Vincennes-St-Denis

/“non si può stendere una tipologia delle traduzioni, ma al massimo una 
tipologia di diversi modi di tradurre, volta per volta negoziando il 
fine che ci si propone
– e volta per volta scoprendo che i modi di tradurre sono più di quelli 
che sospettiamo.”/ U. Eco


[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
Yeah... I really don't know what to do with "the resistance", "the
ignorance" (as in, both the practice of intentionally ignoring my results,
and otherwise)... etc. Many of us are so used to both naming and
processing at such granularity... it'd take the whole world for change to
happen and I hope we could work together instead of against each other.

If we sustain the w hackery, we are hurting ourselves / our students in the
end. From the tech point of view, things are getting ever finer and
our hardware can take on more, I don't know what ppl have in mind if/when
they want our next gen to be quibbling about things like "is ... a w*rd?"
or whether the w boundary should be placed exactly here or there.

I suggested in "Fairness in Representation" that we should use clearer
nomenclature:
"one can simply describe languages and their statistical profiles with
respect to their representational granularity in characters or bytes (which
are and/or can be exhaustively standardized in computing), or refer to
sequences as longer/shorter or having a higher/lower vocabulary size when
comparing them with each other, rather than “richer”/“poorer” based on
concepts (e.g. “words”, “sentences”) that can be ambiguous, contested, and
inaccessible to many."
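
As a rough illustration of describing sequences by length and vocabulary size at character or byte granularity, here is a minimal sketch. The two sample strings and the profile() helper are made up for this example and are not taken from the paper:

```python
def profile(text: str) -> dict:
    """Length and vocabulary size of a sequence, in characters and UTF-8 bytes."""
    data = text.encode("utf-8")
    return {
        "chars": len(text),            # sequence length in characters
        "bytes": len(data),            # sequence length in UTF-8 bytes
        "char_vocab": len(set(text)),  # number of distinct characters
        "byte_vocab": len(set(data)),  # number of distinct byte values
    }

# Hypothetical sample strings; "\u00e4" is the precomposed character 'ä'.
samples = {
    "en": "the cat sleeps",
    "de": "die Katze schl\u00e4ft",
}

for lang, text in samples.items():
    print(lang, profile(text))
```

Both units are fixed by the encoding standard, so the comparison needs no segmentation decision beyond the encoding itself.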

And in the event that our downstream task results are so good already,
there will always be room for content creation. That's also what I tried to
advocate in the paper: quality multiway datasets for science, eval, and
documentation --- I wish the world would give more worth to data. There
is a lot of room to translate our engineering back to science, to
re-educate students in the lang sci/tech to become more stats savvy, and to
re-educate the world about fairness in multilinguality...


On Mon, Jun 20, 2022 at 10:13 PM Daniel HENKEL 
wrote:

> Not to mention all these shamefully unscientific posts on Corporalist:
>
> *12th International Global W*rdnet Conference Donostia / San Sebastian,
> Basque Country 23-27, 2023 Global W*rdnet Association:
> www.globalw*rdnet.org *
> *Conference website: https://hitz.eus/gwc2023 *
>
> *18th Workshop on Multiw*rd Expressions (MWE 2022) Organized and sponsored
> by SIGLEX, the Special Interest Group on the Lexicon of the ACL*
>
> *The 5th Workshop on Multi-w*rd Units in Machine Translation and
> Translation Technology (MUMTTT 2022) Malaga, 30th September 2022*
>
> ...
>
> Definitely time for some lexical/terminological restrictions/updates, for
> the sake of goodthink/processing, and science!
>
>
> (actually "science" is heretical/redundant, "goodthink/processing" will do
> the job:
>
> *"As we have already seen in the case of the word FREE, w*rds which had
> once borne a heretical meaning were sometimes retained for the sake of
> convenience, but only with the undesirable meanings purged out of them.
> Countless other w*rds such as HONOUR, JUSTICE, MORALITY, INTERNATIONALISM,
> DEMOCRACY, SCIENCE, and RELIGION had simply ceased to exist."*)
>
> DH
>
>
>
> On 20/06/2022 21:47, Daniel HENKEL wrote:
>
> Looks as if Linguistlist is in need of some scientific enlightenment as
> well :
>
> http://linguistlist.org/issues/33/33-2063.html
>
> *In the new, thoroughly revised second edition of W*rds of Wonder:
> Endangered Languages and What They Tell Us, Second Edition (formerly called
> Dying W*rds: Endangered Languages and What They Have to Tell Us), renowned
> scholar Nicholas Evans delivers an accessible and incisive text covering
> the impact of mass language endangerment. The distinguished author explores
> issues surrounding the preservation of indigenous languages, ...*
>
> (ungood w*rds unw*rded to protect the faint of mind against ungood
> thinking/processing).
>
> Best,
>
> DH
>
>
> On 20/06/2022 20:27, Ada Wan wrote:
>
> (I just expounded on a point as a twitter reply today re the granularity
> of one's thinking/processing. Pls feel free to read that also.)
>
> One can think of it in a less binary manner --- not "good" vs "bad", not
> "words" then "sentences", but to think of an utterance/sequence with all
> the finer connections in between... That is the beauty of language --- from
> a "philological" point of view.
>
> I am not sure, though, if you were speaking from a scientific perspective,
> because I have a paper to back my argument in that regard.
>
>
> On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane  wrote:
>
>> “We’re destroying words–scores of them, hundreds of them, every day.
>> We’re cutting the language down to the bone.” […]
>>
>> “It’s a beautiful thing, the destruction of words. Of course the great
>> advantage is in the verbs and adjectives, but there are hundreds of nouns
>> that can be got rid of as well. It isn’t only the synonyms; there are also
>> the antonyms. After all, what justification is there for a word which is
>> simply the opposite of some other words? A word contains its opposite in
>> itself. Take ‘good,’ 

[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Daniel HENKEL
Just to clarify my position, I don't actually think that the En. lexeme 
“w*rd” is easy to define, precise or theoretically well-founded (I 
prefer “lexeme” here, as Ada's previous use of “term” is improper from a 
wusterian point of view, given that “w*rd” lacks distinctive traits due 
to its notorious ambiguity).


The situation is similar in mathematics where “number” is used to denote 
a variety of concepts such as natural numbers, integers, fractions, real 
numbers, irrational numbers, imaginary numbers … which may be inclusive 
or exclusive of each other.  There are thus numerous contexts in which 
colloquial use of the w*rd “number” would be imprecise, inappropriate 
and might even lead to confusion.  Nonetheless, I'm not aware of any 
mathematicians who advocate censorship of the w*rd “number”.


If “w*rd” lacks a clear definition and a clear theoretical foundation 
(which I actually agree with), then it can't really be used as a “term” 
until the concept has been given an adequate definition in relation to 
other terms within the relevant domain or theoretical framework.


On the other hand, though precise terminology is always preferable 
whenever and wherever precision is necessary, there's nothing ever to be 
gained scientifically through censorship (sorry to use an ungood w*rd, 
but, in all earnestness, when I see a spade I call it a “spade”).


DH


On 20/06/2022 22:13, Daniel HENKEL wrote:


Not to mention all these shamefully unscientific posts on Corporalist:

/12th International Global W*rdnet Conference Donostia / San 
Sebastian, Basque Country 23-27, 2023 Global W*rdnet Association: 
www.globalw*rdnet.org/

/Conference website: https://hitz.eus/gwc2023/

/18th Workshop on Multiw*rd Expressions (MWE 2022) Organized and 
sponsored by SIGLEX, the Special Interest Group on the Lexicon of the ACL/


/The 5th Workshop on Multi-w*rd Units in Machine Translation and 
Translation Technology (MUMTTT 2022) Malaga, 30th September 2022/


...

Definitely time for some lexical/terminological restrictions/updates, 
for the sake of goodthink/processing, and science!



(actually "science" is heretical/redundant, "goodthink/processing" 
will do the job:


/"As we have already seen in the case of the word FREE, w*rds which 
had once borne a heretical meaning were sometimes retained for the 
sake of convenience, but only with the undesirable meanings purged out 
of them. Countless other w*rds such as HONOUR, JUSTICE, MORALITY, 
INTERNATIONALISM, DEMOCRACY, SCIENCE, and RELIGION had simply ceased 
to exist."/)


DH



On 20/06/2022 21:47, Daniel HENKEL wrote:

Looks as if Linguistlist is in need of some scientific enlightenment 
as well :


http://linguistlist.org/issues/33/33-2063.html

/In the new, thoroughly revised second edition of W*rds of Wonder: 
Endangered Languages and What They Tell Us, Second Edition (formerly 
called Dying W*rds: Endangered Languages and What They Have to Tell 
Us), renowned scholar Nicholas Evans delivers an accessible and 
incisive text covering the impact of mass language endangerment. The 
distinguished author explores issues surrounding the preservation of 
indigenous languages, .../


(ungood w*rds unw*rded to protect the faint of mind against ungood 
thinking/processing).


Best,

DH


On 20/06/2022 20:27, Ada Wan wrote:
(I just expounded on a point as a twitter reply today re the 
granularity of one's thinking/processing. Pls feel free to read that 
also.)


One can think of it in a less binary manner --- not "good" vs "bad", 
not "words" then "sentences", but to think of an utterance/sequence 
with all the finer connections in between... That is the beauty of 
language --- from a "philological" point of view.


I am not sure, though, if you were speaking from a scientific 
perspective, because I have a paper to back my argument in that regard.



On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane wrote:


“We’re destroying words–scores of them, hundreds of them, every
day. We’re cutting the language down to the bone.” […]

“It’s a beautiful thing, the destruction of words. Of course the
great advantage is in the verbs and adjectives, but there are
hundreds of nouns that can be got rid of as well. It isn’t only
the synonyms; there are also the antonyms. After all, what
justification is there for a word which is simply the opposite
of some other words? A word contains its opposite in itself.
Take ‘good,’ for instance. If you have a word like ‘good,’ what
need is there for a word like ‘bad’? ‘Ungood’ will do just as
well–better, because it’s an exact opposite, which the other is
not. Or again, if you want a stronger version of ‘good,’ what
sense is there in having a whole string of vague useless words
like ‘excellent’ and ‘splendid’ and all the rest of them?
‘Plusgood’ covers the meaning, or ‘doubleplusgood’ if you want
something stronger still. Of course we use those forms already,
but in the final 

[Corpora-List]The New Directions in Analyzing Text as Data (TADA) Conference

2022-06-20 Thread heather froehlich
*The New Directions in Analyzing Text as Data (TADA) Conference*
October 6-7, 2022 at Cornell Tech, Roosevelt Island, New York City

Deadline for abstract submission: July 18, 2022

Form for abstract submission: https://forms.gle/XSarfYQGAXArACEn9

The New Directions in Analyzing Text as Data (TADA) meeting is a leading
forum for research on the study of politics, society, and culture through
computational analysis of documents. Recent advances in NLP have the
potential to revolutionize how we study human society. But using these
tools effectively, reliably, and equitably requires continuous dialog
between experts across computational methods, social science, and the
humanities.

TADA 2022 invites applications for research presentations on new work
related to text-as-data methods and applications. TADA is an
interdisciplinary conference, drawing scholars from across the social
sciences, computer and information science, and related fields. Our
programs from past meetings (e.g. TADA 2021) show
the wide range of work presented at our conference.

Key Dates:

Monday July 18, abstract submission
Friday Aug 5, notification of selection
Friday Aug 12, registration for participation
Thursday Sep 22, full papers for discussants
Thursday Oct 6 – Friday Oct 7, conference

This year’s conference will be held at Cornell Tech on Roosevelt Island and
is sponsored by the Cornell Center for Social Science and the Cornell
Center for Data Science for Enterprise and Society.

Proposals are due July 18, and consist of a brief, 300-word abstract in
text format rather than a full paper. TADA 2022 is a non-archival
conference; there are no formal proceedings, and papers presented at the
conference will not be distributed publicly by the conference. Presenters
are expected to provide a paper to their discussant two weeks before the
conference. We welcome any work, so long as it hasn’t been previously
presented at a TADA conference. We also welcome individuals to volunteer to
serve as discussants.

We are planning to hold TADA 2022 in a hybrid format. Participants who are
willing to travel to NYC will attend the conference in person, while a
remote option will be provided for the other participants. The conference
will consist of oral presentations and a poster session with Ph.D.
students. The conference will take place on October 6 and 7 during Eastern
Daylight Time business hours.

In addition to oral presentations and posters, TADA 2022 will have a
doctoral consortium. PhD students will be matched with experienced mentors
from complementary fields who will offer critiques of specific work and
provide guidance on how to do effective interdisciplinary work.

Because in-person space is limited, we ask all interested participants to
apply to attend the conference. There will be a separate registration for
participants after the abstract review process.

Diversity leads to stronger science. We actively seek, welcome, and
encourage people with diverse backgrounds, experiences, and identities to
apply and attend. While many participants have attended TADA for years, we
also eagerly welcome new researchers!

We anticipate having lodging options available for conference participants
to reserve directly. Limited travel funds will be made available for the
contact author of accepted papers and posters. Priority for this funding
will be given to scholars who lack their own travel funds. We will provide
guidance on booking rooms and accessing travel funds with your conference
acceptance.

For questions write to i...@tada2022.org


-- 
Dr Heather Froehlich

w // http://hfroehli.ch
t  // @heatherfro


[Corpora-List] [CFP] 2022 AAAI Fall Symposium on “Artificial Intelligence for Human-Robot Interaction” (AI-HRI)

2022-06-20 Thread Ross Mead
[CFP] 2022 AAAI Fall Symposium on “Artificial Intelligence for Human-Robot
Interaction” (AI-HRI)

The ninth annual AAAI Fall Symposium on “Artificial Intelligence for
Human-Robot Interaction” (AI-HRI) will take place on November 17-19, 2022,
at the Westin Arlington Gateway in Arlington, VA, USA. We are planning a
hybrid in-person and online format to reach more people who would not be
able to attend in person.

Paper submissions will be due on July 31, 2022 and can be made through
https://easychair.org/my/conference?conf=fss22. For more information and
updates, please see below or visit the symposium website at
https://ai-hri.github.io.

Overview

The Artificial Intelligence (AI) for Human-Robot Interaction (HRI)
symposium has been a successful venue for discussion and collaboration on AI
aimed at HRI since 2014. Last year, in 2021, we reviewed the achievements of
the AI-HRI community over the last decade; this year we are focusing on a
visionary theme: exploring the future of AI-HRI. Accordingly, we added a
Blue Sky Ideas track to foster a forward-thinking discussion on future
research at the intersection of AI and HRI. As always, we appreciate all
contributions related to any topic on AI/HRI and welcome new researchers
who wish to take part in this growing community.

With the success of past symposia, AI-HRI impacts a variety of communities
and problems, and has pioneered discussions of recent trends and
interests. This year's AI-HRI Fall Symposium aims to bring together
researchers and practitioners from around the globe, representing a number
of university, government, and industry laboratories. In doing so, we hope
to accelerate research in the field, support technology transition and user
adoption, and determine future directions for our group and our research.

Topics

   - Future of AI-HRI and “Blue Sky” ideas
   - Ubiquitous HRI, including AR and VR
   - Ethics in HRI
   - Trust and explainability in HRI
   - Robot planning and decision-making for HRI
   - Architectures and systems supporting autonomous HRI
   - Interactive task learning and planning
   - Interactive dialog systems and natural language
   - Field studies, experimental, and empirical HRI
   - Safety and human comfort in HRI
   - Software tools for autonomous HRI
   - AI for social robots
   - AI for physical HRI
   - Knowledge representation and reasoning to support HRI
   - HRI in teams and groups
   - Replication studies and reproducibility
   - Test methods and metrics for AI-HRI
   - ...and many other topics relevant to the application of Artificial
     Intelligence to Human-Robot Interaction!

Format

This year's symposium will focus on identifying the future of the field of
AI-HRI, ranging from research directions to establishing a year-long
collaborating community. However, symposium participants will still be
invited to present their own work as it contributes to understanding what
matters towards these goals. Symposium participants presenting their work
will be encouraged to include a perspective on reproducibility and
ethics in HRI, though all research on AI-HRI will be considered.

We will also continue to include community-building efforts in the
schedule: position talks to incite discussion on new and controversial
views; informal, focused discussions during poster sessions; breakout
discussion sessions in smaller groups; and a demo session, which will
emphasize live robot and AR/VR demonstrations. Such discussions and demos
are ideal within the symposium community, which is small enough that we can
all learn each other's names and faces, but large enough to draw an
audience of people who have a real impact in our field. These discussions
give perfect opportunities for new researchers in the field to meet senior
members in a more informal setting, and become more involved in future
collaboration within the community.

This year, we have the opportunity to gather in person after two years of
being virtual. For inclusiveness, we are planning in advance to implement a
hybrid in-person and online format to reach more people who would not be
able to attend in person. The diversity chair will lead the efforts for
creating an inclusive community and will work with AAAI to manage the
logistics for hybrid participation.

Submissions

   - Full papers (6-8 pages) highlighting state-of-the-art HRI-oriented AI
     research, HRI research focusing on the Future of AI-HRI, the use of
     autonomous AI systems, or the implementation of AI systems in commercial
     HRI products.
   - Short papers (2-4 pages) outlining new or controversial views on AI-HRI
     research or describing ongoing AI-oriented HRI research.
   - Tool papers (2-4 pages) describing novel software, hardware, or datasets
     of interest to the AI-HRI community.
   - Blue Sky papers (2-4 pages) fostering a forward-thinking discussion on
     the future at the intersection of AI and HRI.


[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Daniel HENKEL

Not to mention all these shamefully unscientific posts on Corporalist:

/12th International Global W*rdnet Conference Donostia / San Sebastian, 
Basque Country 23-27, 2023 Global W*rdnet Association: 
www.globalw*rdnet.org//

//Conference website: https://hitz.eus/gwc2023/

/18th Workshop on Multiw*rd Expressions (MWE 2022) Organized and 
sponsored by SIGLEX, the Special Interest Group on the Lexicon of the ACL/


/The 5th Workshop on Multi-w*rd Units in Machine Translation and 
Translation Technology (MUMTTT 2022) Malaga, 30th September 2022/


...

Definitely time for some lexical/terminological restrictions/updates, 
for the sake of goodthink/processing, and science!



(actually "science" is heretical/redundant, "goodthink/processing" will 
do the job:


/"As we have already seen in the case of the word FREE, w*rds which had 
once borne a heretical meaning were sometimes retained for the sake of 
convenience, but only with the undesirable meanings purged out of them. 
Countless other w*rds such as HONOUR, JUSTICE, MORALITY, 
INTERNATIONALISM, DEMOCRACY, SCIENCE, and RELIGION had simply ceased to 
exist."/)


DH



On 20/06/2022 21:47, Daniel HENKEL wrote:

Looks as if Linguistlist is in need of some scientific enlightenment 
as well :


http://linguistlist.org/issues/33/33-2063.html

/In the new, thoroughly revised second edition of W*rds of Wonder: 
Endangered Languages and What They Tell Us, Second Edition (formerly 
called Dying W*rds: Endangered Languages and What They Have to Tell 
Us), renowned scholar Nicholas Evans delivers an accessible and 
incisive text covering the impact of mass language endangerment. The 
distinguished author explores issues surrounding the preservation of 
indigenous languages, .../


(ungood w*rds unw*rded to protect the faint of mind against ungood 
thinking/processing).


Best,

DH


On 20/06/2022 20:27, Ada Wan wrote:
(I just expounded on a point as a twitter reply today re the 
granularity of one's thinking/processing. Pls feel free to read that 
also.)


One can think of it in a less binary manner --- not "good" vs "bad", 
not "words" then "sentences", but to think of an utterance/sequence 
with all the finer connections in between... That is the beauty of 
language --- from a "philological" point of view.


I am not sure, though, if you were speaking from a scientific 
perspective, because I have a paper to back my argument in that regard.



On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane wrote:

“We’re destroying words–scores of them, hundreds of them, every
day. We’re cutting the language down to the bone.” […]

“It’s a beautiful thing, the destruction of words. Of course the
great advantage is in the verbs and adjectives, but there are
hundreds of nouns that can be got rid of as well. It isn’t only
the synonyms; there are also the antonyms. After all, what
justification is there for a word which is simply the opposite of
some other words? A word contains its opposite in itself. Take
‘good,’ for instance. If you have a word like ‘good,’ what need
is there for a word like ‘bad’? ‘Ungood’ will do just as
well–better, because it’s an exact opposite, which the other is
not. Or again, if you want a stronger version of ‘good,’ what
sense is there in having a whole string of vague useless words
like ‘excellent’ and ‘splendid’ and all the rest of them?
‘Plusgood’ covers the meaning, or ‘doubleplusgood’ if you want
something stronger still. Of course we use those forms already,
but in the final version of Newspeak there’ll be nothing else. In
the end the whole notion of goodness and badness will be covered
by only six words–in reality, only one word. Don’t you see the
beauty of that, Ada?…”

George Orwell, 1984


> On 20 June 2022 at 17:33, Ada Wan wrote:
>
> Hi Christopher,
>
> It is of the best interest of the community to discontinue the
usage of "word". The term is not only very shaky in its
foundation (if any), but it can also effect disparity in
performance in computational processing and robustness when human
evaluation is involved.
> Despite the term has been casually adopted by many in the past,
like many un-PC terms that may have an inappropriate undertone,
it needs to be discouraged and abandoned.
> Last but not least, I noticed that you are located in Canada,
in the event that you were to work with any indigenous
communities, one MUST be advised to be careful with the usage of
such term --- you could be imposing your own (EN- / FR- /
dominant language-centric) view onto another
individual/community. There is an element of cultural and
linguistic hegemony with the usage of such term (including and
not limited to making applications with it).
> Please also consult recent work in this area:
https://openreview.net/forum?id=-llS6TiOew.
>
> Feel free to get in touch if you should 

[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
@Daniel: Yeah, I think our whole field could benefit from a curriculum
update...

On Mon, Jun 20, 2022 at 9:47 PM Daniel HENKEL wrote:

> Looks as if Linguistlist is in need of some scientific enlightenment as
> well :
>
> http://linguistlist.org/issues/33/33-2063.html
>
> *In the new, thoroughly revised second edition of W*rds of Wonder:
> Endangered Languages and What They Tell Us, Second Edition (formerly called
> Dying W*rds: Endangered Languages and What They Have to Tell Us), renowned
> scholar Nicholas Evans delivers an accessible and incisive text covering
> the impact of mass language endangerment. The distinguished author explores
> issues surrounding the preservation of indigenous languages, ...*
>
> (ungood w*rds unw*rded to protect the faint of mind against ungood
> thinking/processing).
>
> Best,
>
> DH
>
>
> On 20/06/2022 20:27, Ada Wan wrote:
>
> (I just expounded on a point as a twitter reply today re the granularity
> of one's thinking/processing. Pls feel free to read that also.)
>
> One can think of it in a less binary manner --- not "good" vs "bad", not
> "words" then "sentences", but to think of an utterance/sequence with all
> the finer connections in between... That is the beauty of language --- from
> a "philological" point of view.
>
> I am not sure, though, if you were speaking from a scientific perspective,
> because I have a paper to back my argument in that regard.
>
>
> On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane wrote:
>
>> “We’re destroying words–scores of them, hundreds of them, every day.
>> We’re cutting the language down to the bone.” […]
>>
>> “It’s a beautiful thing, the destruction of words. Of course the great
>> advantage is in the verbs and adjectives, but there are hundreds of nouns
>> that can be got rid of as well. It isn’t only the synonyms; there are also
>> the antonyms. After all, what justification is there for a word which is
>> simply the opposite of some other words? A word contains its opposite in
>> itself. Take ‘good,’ for instance. If you have a word like ‘good,’ what
>> need is there for a word like ‘bad’? ‘Ungood’ will do just as well–better,
>> because it’s an exact opposite, which the other is not. Or again, if you
>> want a stronger version of ‘good,’ what sense is there in having a whole
>> string of vague useless words like ‘excellent’ and ‘splendid’ and all the
>> rest of them? ‘Plusgood’ covers the meaning, or ‘doubleplusgood’ if you
>> want something stronger still. Of course we use those forms already, but in
>> the final version of Newspeak there’ll be nothing else. In the end the
>> whole notion of goodness and badness will be covered by only six words–in
>> reality, only one word. Don’t you see the beauty of that, Ada?…”
>>
>> George Orwell, 1984
>>
>>
>> > On 20 June 2022 at 17:33, Ada Wan wrote:
>> >
>> > Hi Christopher,
>> >
>> > It is of the best interest of the community to discontinue the usage of
>> "word". The term is not only very shaky in its foundation (if any), but it
>> can also effect disparity in performance in computational processing and
>> robustness when human evaluation is involved.
>> > Despite the term has been casually adopted by many in the past, like
>> many un-PC terms that may have an inappropriate undertone, it needs to be
>> discouraged and abandoned.
>> > Last but not least, I noticed that you are located in Canada, in the
>> event that you were to work with any indigenous communities, one MUST be
>> advised to be careful with the usage of such term --- you could be imposing
>> your own (EN- / FR- / dominant language-centric) view onto another
>> individual/community. There is an element of cultural and linguistic
>> hegemony with the usage of such term (including and not limited to making
>> applications with it).
>> > Please also consult recent work in this area:
>> https://openreview.net/forum?id=-llS6TiOew.
>> >
>> > Feel free to get in touch if you should have any questions.
>> >
>> > Best,
>> > Ada
>> >
>> >
>> > On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins <
>> christopher.coll...@ontariotechu.ca> wrote:
>> > Hello,
>> >
>> >
>> >
>> > I’m looking for any open source or cloud-hosted solution for complex
>> word identification or word difficulty rating in French for a reading
>> application.
>> >
>> >
>> >
>> > As a backup plan we can use measures like corpus frequency, length,
>> number of senses, but we’re hoping someone has already made a tool
>> available.
>> >
>> >
>> >
>> > We found this but that’s it: https://github.com/sheffieldnlp/cwi
>> >
>> >
>> >
>> > Would appreciate any tips!
>> >
>> >
>> >
>> > Thanks,
>> >
>> >
>> >
>> > Chris
>> >
>> >
>> >
>> > Christopher Collins [he/him]
>> > Associate Professor - Faculty of Science
>> > Canada Research Chair in Linguistic Information Visualization
>> > Ontario Tech University
>> > vialab.ca
>> >

[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Daniel HENKEL
Looks as if Linguistlist is in need of some scientific enlightenment as 
well :


http://linguistlist.org/issues/33/33-2063.html

/In the new, thoroughly revised second edition of W*rds of Wonder: 
Endangered Languages and What They Tell Us, Second Edition (formerly 
called Dying W*rds: Endangered Languages and What They Have to Tell Us), 
renowned scholar Nicholas Evans delivers an accessible and incisive text 
covering the impact of mass language endangerment. The distinguished 
author explores issues surrounding the preservation of indigenous 
languages, .../


(ungood w*rds unw*rded to protect the faint of mind against ungood 
thinking/processing).


Best,

DH


On 20/06/2022 20:27, Ada Wan wrote:
(I just expounded on a point as a twitter reply today re the 
granularity of one's thinking/processing. Pls feel free to read that 
also.)


One can think of it in a less binary manner --- not "good" vs "bad", 
not "words" then "sentences", but to think of an utterance/sequence 
with all the finer connections in between... That is the beauty of 
language --- from a "philological" point of view.


I am not sure, though, if you were speaking from a scientific 
perspective, because I have a paper to back my argument in that regard.



On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane wrote:

“We’re destroying words–scores of them, hundreds of them, every
day. We’re cutting the language down to the bone.” […]

“It’s a beautiful thing, the destruction of words. Of course the
great advantage is in the verbs and adjectives, but there are
hundreds of nouns that can be got rid of as well. It isn’t only
the synonyms; there are also the antonyms. After all, what
justification is there for a word which is simply the opposite of
some other words? A word contains its opposite in itself. Take
‘good,’ for instance. If you have a word like ‘good,’ what need is
there for a word like ‘bad’? ‘Ungood’ will do just as well–better,
because it’s an exact opposite, which the other is not. Or again,
if you want a stronger version of ‘good,’ what sense is there in
having a whole string of vague useless words like ‘excellent’ and
‘splendid’ and all the rest of them? ‘Plusgood’ covers the
meaning, or ‘doubleplusgood’ if you want something stronger still.
Of course we use those forms already, but in the final version of
Newspeak there’ll be nothing else. In the end the whole notion of
goodness and badness will be covered by only six words–in reality,
only one word. Don’t you see the beauty of that, Ada?…”

George Orwell, 1984


> On 20 June 2022 at 17:33, Ada Wan wrote:
>
> Hi Christopher,
>
> It is of the best interest of the community to discontinue the
usage of "word". The term is not only very shaky in its foundation
(if any), but it can also effect disparity in performance in
computational processing and robustness when human evaluation is
involved.
> Despite the term has been casually adopted by many in the past,
like many un-PC terms that may have an inappropriate undertone, it
needs to be discouraged and abandoned.
> Last but not least, I noticed that you are located in Canada, in
the event that you were to work with any indigenous communities,
one MUST be advised to be careful with the usage of such term ---
you could be imposing your own (EN- / FR- / dominant
language-centric) view onto another individual/community. There is
an element of cultural and linguistic hegemony with the usage of
such term (including and not limited to making applications with it).
> Please also consult recent work in this area:
https://openreview.net/forum?id=-llS6TiOew.
>
> Feel free to get in touch if you should have any questions.
>
> Best,
> Ada
>
>
> On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins wrote:
> Hello,
>
>
>
> I’m looking for any open source or cloud-hosted solution for
complex word identification or word difficulty rating in French
for a reading application.
>
>
>
> As a backup plan we can use measures like corpus frequency,
length, number of senses, but we’re hoping someone has already
made a tool available.
>
>
>
> We found this but that’s it: https://github.com/sheffieldnlp/cwi
>
>
>
> Would appreciate any tips!
>
>
>
> Thanks,
>
>
>
> Chris
>
>
>
> Christopher Collins [he/him]
> Associate Professor - Faculty of Science
> Canada Research Chair in Linguistic Information Visualization
> Ontario Tech University
> vialab.ca 
>

[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
(I just expounded on a point as a twitter reply today re the granularity of
one's thinking/processing. Pls feel free to read that also.)

One can think of it in a less binary manner --- not "good" vs "bad", not
"words" then "sentences", but to think of an utterance/sequence with all
the finer connections in between... That is the beauty of language --- from
a "philological" point of view.

I am not sure, though, if you were speaking from a scientific perspective,
because I have a paper to back my argument in that regard.


On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane wrote:

> “We’re destroying words–scores of them, hundreds of them, every day. We’re
> cutting the language down to the bone.” […]
>
> “It’s a beautiful thing, the destruction of words. Of course the great
> advantage is in the verbs and adjectives, but there are hundreds of nouns
> that can be got rid of as well. It isn’t only the synonyms; there are also
> the antonyms. After all, what justification is there for a word which is
> simply the opposite of some other words? A word contains its opposite in
> itself. Take ‘good,’ for instance. If you have a word like ‘good,’ what
> need is there for a word like ‘bad’? ‘Ungood’ will do just as well–better,
> because it’s an exact opposite, which the other is not. Or again, if you
> want a stronger version of ‘good,’ what sense is there in having a whole
> string of vague useless words like ‘excellent’ and ‘splendid’ and all the
> rest of them? ‘Plusgood’ covers the meaning, or ‘doubleplusgood’ if you
> want something stronger still. Of course we use those forms already, but in
> the final version of Newspeak there’ll be nothing else. In the end the
> whole notion of goodness and badness will be covered by only six words–in
> reality, only one word. Don’t you see the beauty of that, Ada?…”
>
> George Orwell, 1984
>
>
> > On 20 June 2022 at 17:33, Ada Wan wrote:
> >
> > Hi Christopher,
> >
> > It is of the best interest of the community to discontinue the usage of
> "word". The term is not only very shaky in its foundation (if any), but it
> can also effect disparity in performance in computational processing and
> robustness when human evaluation is involved.
> > Despite the term has been casually adopted by many in the past, like
> many un-PC terms that may have an inappropriate undertone, it needs to be
> discouraged and abandoned.
> > Last but not least, I noticed that you are located in Canada, in the
> event that you were to work with any indigenous communities, one MUST be
> advised to be careful with the usage of such term --- you could be imposing
> your own (EN- / FR- / dominant language-centric) view onto another
> individual/community. There is an element of cultural and linguistic
> hegemony with the usage of such term (including and not limited to making
> applications with it).
> > Please also consult recent work in this area:
> https://openreview.net/forum?id=-llS6TiOew.
> >
> > Feel free to get in touch if you should have any questions.
> >
> > Best,
> > Ada
> >
> >
> > On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins <
> christopher.coll...@ontariotechu.ca> wrote:
> > Hello,
> >
> >
> >
> > I’m looking for any open source or cloud-hosted solution for complex
> word identification or word difficulty rating in French for a reading
> application.
> >
> >
> >
> > As a backup plan we can use measures like corpus frequency, length,
> number of senses, but we’re hoping someone has already made a tool
> available.
> >
> >
> >
> > We found this but that’s it: https://github.com/sheffieldnlp/cwi
> >
> >
> >
> > Would appreciate any tips!
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Chris
> >
> >
> >
> > Christopher Collins [he/him]
> > Associate Professor - Faculty of Science
> > Canada Research Chair in Linguistic Information Visualization
> > Ontario Tech University
> > vialab.ca
> >
>
>


[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Sylvain Kahane
“We’re destroying words–scores of them, hundreds of them, every day. We’re 
cutting the language down to the bone.” […]

“It’s a beautiful thing, the destruction of words. Of course the great 
advantage is in the verbs and adjectives, but there are hundreds of nouns that 
can be got rid of as well. It isn’t only the synonyms; there are also the 
antonyms. After all, what justification is there for a word which is simply the 
opposite of some other words? A word contains its opposite in itself. Take 
‘good,’ for instance. If you have a word like ‘good,’ what need is there for a 
word like ‘bad’? ‘Ungood’ will do just as well–better, because it’s an exact 
opposite, which the other is not. Or again, if you want a stronger version of 
‘good,’ what sense is there in having a whole string of vague useless words 
like ‘excellent’ and ‘splendid’ and all the rest of them? ‘Plusgood’ covers the 
meaning, or ‘doubleplusgood’ if you want something stronger still. Of course we 
use those forms already, but in the final version of Newspeak there’ll be 
nothing else. In the end the whole notion of goodness and badness will be 
covered by only six words–in reality, only one word. Don’t you see the beauty 
of that, Ada?…”

George Orwell, 1984


> On 20 June 2022 at 17:33, Ada Wan wrote:
> 
> Hi Christopher, 
> 
> It is in the best interest of the community to discontinue the usage of
> "word". The term is not only very shaky in its foundation (if any), but it
> can also effect disparity in performance in computational processing and
> robustness when human evaluation is involved.
> Although the term has been casually adopted by many in the past, like many
> un-PC terms that may have an inappropriate undertone, it needs to be
> discouraged and abandoned.
> Last but not least, I noticed that you are located in Canada. In the event
> that you were to work with any indigenous communities, you MUST be advised to
> be careful with the usage of such a term --- you could be imposing your own
> (EN- / FR- / dominant language-centric) view onto another
> individual/community. There is an element of cultural and linguistic hegemony
> in the usage of such a term (including but not limited to making applications
> with it).
> Please also consult recent work in this area: 
> https://openreview.net/forum?id=-llS6TiOew. 
> 
> Feel free to get in touch if you should have any questions. 
> 
> Best, 
> Ada
> 
> 
> On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins wrote:
> Hello,
>
> I’m looking for any open source or cloud-hosted solution for complex word
> identification or word difficulty rating in French for a reading application.
>
> As a backup plan we can use measures like corpus frequency, length, number of
> senses, but we’re hoping someone has already made a tool available.
>
> We found this but that’s it: https://github.com/sheffieldnlp/cwi
>
> Would appreciate any tips!
>
> Thanks,
>
> Chris
>
> Christopher Collins [he/him]
> Associate Professor - Faculty of Science
> Canada Research Chair in Linguistic Information Visualization
> Ontario Tech University
> vialab.ca



[Corpora-List]unsubscribe me please from this list

2022-06-20 Thread I G



[Corpora-List]Re: Complex Word Identification in French

2022-06-20 Thread Ada Wan
Hi Christopher,

It is in the best interest of the community to discontinue the usage of
"word". The term is not only very shaky in its foundation (if any), but it
can also effect disparity in performance in computational processing and
robustness when human evaluation is involved.
Although the term has been casually adopted by many in the past, like many
un-PC terms that may have an inappropriate undertone, it needs to be
discouraged and abandoned.
Last but not least, I noticed that you are located in Canada. In the event
that you were to work with any indigenous communities, you MUST be advised
to be careful with the usage of such a term --- you could be imposing your
own (EN- / FR- / dominant language-centric) view onto another
individual/community. There is an element of cultural and
linguistic hegemony in the usage of such a term (including but not limited
to making applications with it).
Please also consult recent work in this area:
https://openreview.net/forum?id=-llS6TiOew.

Feel free to get in touch if you should have any questions.

Best,
Ada


On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins <
christopher.coll...@ontariotechu.ca> wrote:

> Hello,
>
>
>
> I’m looking for any open source or cloud-hosted solution for complex word
> identification or word difficulty rating in French for a reading
> application.
>
>
>
> As a backup plan we can use measures like corpus frequency, length, number
> of senses, but we’re hoping someone has already made a tool available.
>
>
>
> We found this but that’s it: https://github.com/sheffieldnlp/cwi
>
>
>
> Would appreciate any tips!
>
>
>
> Thanks,
>
>
>
> Chris
>
>
>
> *Christopher Collins* [he/him]
> Associate Professor - Faculty of Science
> Canada Research Chair in Linguistic Information Visualization
> Ontario Tech University
> vialab.ca
>


[Corpora-List]Complex Word Identification in French

2022-06-20 Thread Christopher Collins
Hello,

I'm looking for any open source or cloud-hosted solution for complex word 
identification or word difficulty rating in French for a reading application.

As a backup plan we can use measures like corpus frequency, length, number of 
senses, but we're hoping someone has already made a tool available.
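
A minimal sketch of what such a backup score might look like, assuming a French word-frequency table is available. The toy frequencies, the weights and the function names below are hypothetical and not taken from any existing tool; the number of senses is left out because it would need a lexical resource.

```python
import math

# Hypothetical frequencies per million tokens (illustrative values only);
# a real system would load these from a French frequency list.
TOY_FREQ_PER_MILLION = {
    "maison": 570.0,
    "chat": 120.0,
    "anticonstitutionnellement": 0.01,
}


def difficulty_score(word, freq_per_million, length_weight=0.1, floor=0.001):
    """Return a heuristic difficulty score: higher means likely harder.

    The score adds a rarity term (negative log10 of the frequency) to a
    length term. Words missing from the table get the floor frequency.
    """
    freq = max(freq_per_million.get(word.lower(), 0.0), floor)
    rarity = -math.log10(freq)  # rarer words yield a larger value
    return rarity + length_weight * len(word)


def rank_by_difficulty(words, freq_per_million):
    """Sort words from hardest to easiest according to difficulty_score."""
    return sorted(
        words,
        key=lambda w: difficulty_score(w, freq_per_million),
        reverse=True,
    )


if __name__ == "__main__":
    for w in rank_by_difficulty(list(TOY_FREQ_PER_MILLION), TOY_FREQ_PER_MILLION):
        print(w, round(difficulty_score(w, TOY_FREQ_PER_MILLION), 3))
```

The weighting between rarity and length is arbitrary here and would need tuning against human difficulty judgments.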

We found this but that's it: https://github.com/sheffieldnlp/cwi

Would appreciate any tips!

Thanks,

Chris

Christopher Collins [he/him]
Associate Professor - Faculty of Science
Canada Research Chair in Linguistic Information Visualization
Ontario Tech University
vialab.ca


[Corpora-List]Post doc position: Semantics and Pragmatics of Emojis in Digital Communication

2022-06-20 Thread Tatjana Scheffler via Corpora
The project EmDiCom "Semantics and Pragmatics of Emojis in Digital 
Communication" will develop a formal semantics for emojis as a prime example of 
visual communication within the newly established DFG priority program ViCom 
(“Visual Communication. Theoretical, Empirical, and Applied Perspectives”). The 
project is carried out in cooperation between Jun.-Prof. Dr. Tatjana Scheffler 
(Ruhr-Universität Bochum, Germany) and Prof. Dr. Patrick Grosz (University of 
Oslo, Norway). At the Chair of Digital Forensic Linguistics (Scheffler), 
Department for German Language and Literature of the Faculty of Philology of 
the Ruhr-Universität Bochum, EmDiCom is looking for a:

Post-doc (m/f/d) with 39.83 hours per week for a period of 3 years (TV-L E13)

to start on 1st September 2022 or as soon as possible thereafter.

Your tasks:
- Independent research on the semantics and pragmatics of emojis in the 
research project "Semantics and Pragmatics of Emojis in Digital Communication" 
(EmDiCom).
- Participation in the interdisciplinary DFG priority program "Visual 
Communication" (e.g. participation in network meetings)
- Planning and conducting linguistic online experiments (acceptability studies, 
reading time measurements)
- Collaboration on publications and presentations at international conferences
- Supervision of research assistants
- Organization of scientific events (workshops)

Your profile:
- An above-average linguistics PhD is required
- Focus on formal semantics/pragmatics and experience in experimental 
linguistics
- Willingness to travel to the project partner in Oslo for research visits is 
expected (up to 6 months in total)
- Knowledge of tools for creating and conducting online experiments is an 
advantage
- Prior experience working with digital corpora is desirable
- Very good English skills are required, knowledge of German or Norwegian is an 
advantage

We offer:
- Challenging and varied tasks with a high degree of personal responsibility
- Exciting research on a current topic
- International cooperation with the "Super Linguistics" group at the 
University of Oslo and within the "Visual Communication" priority program
- A friendly and enthusiastic team at the interface of formal, digital, and 
computational linguistics
- Employment at one of the largest universities in Germany, part of the 
University Alliance Ruhr
- Flexibility for working from home
- Extensive opportunities for further education and training

Further information:
Since this position is part of a third-party funded research project, there is 
no teaching obligation.
Official job announcement: 
https://jobs.ruhr-uni-bochum.de/jobposting/f1ec75a1aff56228230b431eedf4678c9182aff40

Contact persons for further information:
Jun.-Prof. Dr. Tatjana Scheffler (tatjana.scheff...@rub.de) and Prof. Dr. 
Patrick Grosz (p.g.gr...@iln.uio.no)

Interviews are expected to take place via Zoom on July 28 and 29. Travel 
expenses, accommodation costs and loss of earnings or other application costs 
for interviews cannot be reimbursed according to the guidelines of the state of 
NRW.

The application deadline is July 5, 2022. Applicants should submit a short 
cover letter including their motivation for the position, a full CV, two sample 
publications and the names of two potential referees, as a single PDF document. 
We look forward to receiving your application, quoting “EmDiCom”, by e-mail at 
the following address: malvina.wit...@rub.de


---
Jun.-Prof. Dr. Tatjana Scheffler (she/her)
GB 5/157
Ruhr-Universität Bochum
Fakultät für Philologie, Germanistik
Universitätsstraße 150 
44780 Bochum
Germany

Mail: tatjana.scheff...@rub.de
Web: http://staff.germanistik.rub.de/digitale-forensische-linguistik/
Tel.: +49 234 32-21471




[Corpora-List]First Call for Papers Global WordNet Conference 2023

2022-06-20 Thread Itziar Gonzalez Dios
 [Apologies for cross posting]

1st Call for Papers

12th International Global Wordnet Conference

Donostia / San Sebastian, Basque Country

January 23-27, 2023

Global Wordnet Association: www.globalwordnet.org
Conference website: https://hitz.eus/gwc2023

The Global Wordnet Association is pleased to announce the 12th
International Global Wordnet Conference (GWC2023) in Donostia / San
Sebastian (Spain) hosted by HiTZ, Basque Center for Language Technology at
the University of the Basque Country.

NOTE: COVID-19 allowing, the conference will be in person only.

Organisers: Begoña Altuna, Itziar Aldabe, Xabier Arregi, Itziar
Gonzalez-Dios, Aritz Farwell and Esther Miranda.

Details about the Association and the full announcement for the conference
can be found on the conference website: https://hitz.eus/gwc2023

We invite submissions with original contributions addressing, but not
limited to, the topics listed below. Proposals for tutorials are welcome as
well.

*Conference Topics*

Lexical semantics and meaning representation
* Critical analysis and applications of lexical and semantic relations
* Proposed new relations
* Definitions, semantic components, co-occurrence and frequency statistics
* Word, Sense and Context Embeddings
* Necessity and completeness issues
* Ontology and wordnet
* Other lexicographical and lexicological questions pertaining to
wordnet-style meaning representation
* Wordnets and other modalities

Architecture of lexical databases
* Language independent and language dependent components
* Integration of multi-wordnets in research infrastructures (like CLARIN,
ELG, etc.)
* Wordnets and Linked Open Data (LOD)

Tools and methods for wordnet development
* User and Data entry interfaces
* Methods for constructing, extending and enriching wordnets
* Methods for linking wordnets to other lexical and semantic resources
* Methods for leveraging existing wordnets and semantic networks with large
language models

Applications of wordnet
* Word sense disambiguation
* Text generation
* Commonsense reasoning
* Machine translation
* Information extraction and retrieval
* Document structuring and categorisation
* Automatic hyperlinking
* Language pedagogy
* Psycholinguistic applications
* Embeddings and pretrained language models
* Probing large neural language models

Standardization, distribution and availability of wordnets and wordnet
tools.

*Submissions*

Submissions will fall into one of the following categories (page limits
exclude references):
* long papers: 8 pages max., 30-minute presentation
* short papers: 5 pages max., 15-minute presentation
* project reports: 5 pages max., 10-minute presentation
* demonstrations: 5 pages max., with an additional 3 pages of screen dumps or
images; 20-minute presentation

Submissions should be anonymous and any identifying information must be
removed. Authors must state the preferred category, though acceptance may
be subject to change in the category of the presentation, e.g. a long paper
submission may be accepted as a short paper.

Final papers should be submitted in electronic form (PDF only).

The paper submission site will be online soon.

Paper submissions must use the official ACL style templates, which are
available from here (Latex and Word). Please follow the paper formatting
guidelines general to “*ACL” conferences available here. Authors may not
modify these style files or use templates designed for other conferences.

*Important Dates*

September 30, 2022: Deadline for paper submission
November 18, 2022: Notification of acceptance
December 1, 2022: Registration opens
December 23, 2022: Deadline for author registration and final paper version
January 23-27, 2023: Conference

*Proceedings*
Conference proceedings will be open access and downloadable from the GWA
website. The proceedings will have an ISBN and be published in the ACL
Anthology.
Inclusion of accepted submissions in the final program and the
proceedings is contingent upon at least one author’s registration. Late
registration and on-site registration are possible for participants, but
without inclusion of a paper and without a presentation.

*Conference Chairs*
German Rigau - german.ri...@ehu.eus
Francis Bond - b...@ieee.org

*Local Organizing Chairs*
Begoña Altuna - begona.alt...@ehu.eus
Itziar Aldabe - itziar.ald...@ehu.eus
Xabier Arregi - xabier.arr...@ehu.eus
Itziar Gonzalez-Dios - itziar.gonzal...@ehu.eus
Aritz Farwell - asfarw...@ehu.eus
Esther Miranda - esther.mira...@ehu.eus

*Program Committee (to be confirmed and extended)*
Adam Pease, Articulate Software
Ales Horak, Masaryk University
Alexandre Rademaker, IBM Research Brazil and EMAp/FGV
Bolette Pedersen, University of Copenhagen
Christiane Fellbaum, Princeton University
Darja Fiser, University of Ljubljana
David Lindemann, IWiSt, University of Hildesheim
Diptesh Kanojia, IIT Bombay
Eneko Agirre, University of the Basque Country
Ewa Rudnicka, Wrocław 

[Corpora-List]Two postdoc positions in Transfer Learning for Demographic Factors from September 2022

2022-06-20 Thread mail
2-year Postdoc position in Natural Language Processing on Incorporating 
Demographic Factors into Natural Language Processing Models
Funded by ERC Starting grant INTEGRATOR 

Start: from September 2022
Dirk Hovy, Bocconi University and MilanLP group

Posting: https://bit.ly/3tk5UR6 
Application Form: https://bit.ly/3Q5j7qv 

Project:
The goal of the INTEGRATOR project is to develop novel data sets, theories, and 
algorithms to incorporate demographic factors into language technology. This 
will improve performance of existing tools for all users, reduce demographic 
bias, and enable completely new applications. 
Language reflects demographic factors like our age, gender, etc. People 
actively use this information to make inferences, but current language 
technology (NLP) fails to account for demographics, both in language 
understanding (e.g., sentiment analysis) and generation (e.g., chatbots). This 
failure prevents us from reaching human-like performance, limits possible 
future applications, and introduces systematic bias against underrepresented 
demographic groups.
Solving demographic bias is one of the greatest challenges for current language 
technology. Failing to do so will limit the field and harm public trust in it. 
Bias in AI systems recently emerged as a severe problem for privacy, fairness, 
and ethics of AI. It is especially prevalent in language technology, due to 
language's rich demographic information. Since NLP is ubiquitous (translation, 
search, personal assistants, etc.), demographically biased models create 
uneven access to vital technology.
Despite increased interest in demographics in NLP, there are no concerted 
efforts to integrate it: no theory, data sets, or algorithmic solutions. 
INTEGRATOR will address these by identifying which demographic factors affect 
NLP systems, devising a bias taxonomy and metrics, and creating new data. These 
will enable us to use transfer and reinforcement learning methods to build 
demographically aware input representations and systems that incorporate 
demographics to improve performance and reduce bias.
Demographically aware NLP will lead to high-performing, fair systems for text 
analysis and generation.
This ground-breaking research advances our understanding of NLP, algorithmic 
fairness, and bias in AI, and creates new research resources and avenues.

Successful candidates will work actively on novel directions in NLP, machine 
learning, and neural networks for representation learning, and transfer 
learning in various languages, and collaborate closely with Prof. Hovy as well 
as the lab. The candidates will innovate in both NLP and social sciences. 

Successful candidates will have to demonstrate
* excellent programming skills in Python (additional languages like C++, R, 
Julia are a plus),
* knowledge of current neural network models for transfer and few-shot learning,
* knowledge of implementation tools for neural networks (e.g. PyTorch, 
TensorFlow, etc.),
* a strong track record in top-tier venues in the field of NLP/Machine 
Learning,
* fluency in spoken and written English. Knowledge of Italian is NOT a 
requirement.


INFORMATION

* Application deadline: July 7, 2022

* Skype interviews will take place during July 2022

* Starting date: from September 2022, or any time thereafter

* Duration: 2 years, 1 year extension possible

* Salary: 42k EUR gross per annum (median salary in Milan is 37k EUR). 
Applicants from outside Italy may qualify for a researcher taxation scheme with 
reduced tax load.


HOW TO APPLY

The official application must be sent via https://bit.ly/3Q5j7qv 


Informal enquiries can be sent by email to Dirk Hovy (dirk.h...@unibocconi.it).

You can find more information about the call here: https://bit.ly/3tk5UR6 




[Corpora-List] Fully-funded PhD position “Computer Science: Computational Linguistics & Corpus Annotation” (m/f/d), TIB – Leibniz Information Centre for Science and Technology, Germany

2022-06-20 Thread Jennifer D'Souza
**Deadline fast-approaching**

Dear colleagues and friends,

The Data Science and Digital Libraries at the TIB – Leibniz Information
Centre for Science and Technology and University Library invites
applications for a Research Associate/PhD Candidate with the following
specializations: Natural Language Processing; Computational Linguistics;
Corpus Annotation.

*Description:*

The PhD topics will be in the context of the Open Research Knowledge Graph
(https://www.orkg.org) and the project “SCINEXT - Neural-Symbolic Scholarly
Innovation Extraction”, funded by the Federal Ministry of Education and
Research (BMBF). The aim of these projects is to research and develop
techniques for crowdsourcing, representing and managing semantically
structured, rich representations of scholarly contributions and research
data in knowledge graphs and thus develop a novel model for scholarly
communication. In the context of the PhD thesis you will be responsible for
conducting independent and original scientific research involving corpus
development and annotation in organizing research contributions in the ORKG
in a structured, semantic way, so other researchers can get a quick
overview on the state-of-the-art in the field. You will participate in
local, national and international collaboration activities. Given the
multidisciplinary nature of the programme, we encourage applicants with a
strong curiosity and interest in Science to apply.

The tasks will focus on

   - Collaborating with researchers from different disciplines to gain
   familiarity with research problems and their contribution descriptions
   expressed in scholarly literature.
   - Conceptually designing, modelling and implementing ontology-based
   knowledge representations for crowdsourcing of the Open Research Knowledge
   Graph.
   - Annotation and curation of multidisciplinary scholarly contribution
   descriptions.

*Application Deadline: June 28th, 2022*

*Please apply online here:*
https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/stellenausschreibung-nr-32-2022
*Contact Information:* Dr. Jennifer D'Souza
*Email:* jennifer.dso...@tib.eu

Best regards,
Jennifer


[Corpora-List]Call for Participation: Shared Task on Indian Language Summarization (ILSUM 2022)

2022-06-20 Thread Parth Mehta
Apologies for the multiple postings.


*Indian Language Summarization (ILSUM 2022)*
Website: https://ilsum.github.io/

To be organized in conjunction with FIRE 2022 (fire.irsi.res.in)
9th-13th December 2022 (Hybrid Event, hosted in Kolkata)

*Registration Deadline: 22nd July 2022*
---

The first shared task on Indian Language Summarization (ILSUM) aims at
creating an evaluation benchmark dataset for Indian languages. While
large-scale datasets exist for a number of languages like English, Chinese,
French, German, Spanish, etc., no such datasets exist for any Indian
language. Through this shared task, we aim to bridge this gap.

In the first edition, we cover two major Indian languages, Hindi and
Gujarati, alongside Indian English, a widely recognized dialect of the
English language. It is a classic summarization task, where we will provide
~10,000 article-summary pairs for each language and the participants are
expected to generate a fixed-length summary.


*Timeline*
-
8th June - Task announced and Registrations open
22nd June - Training Data Release
1st August - Test Data Release
8th August - Run Submission Deadline
15th August - Results Declared
15th September - Working notes due
9th-13th December - FIRE 2022 (Hybrid Event hosted at Kolkata)

*Organisers*

Bhavan Modha, University of Texas at Dallas, USA
Shrey Satapara, Indian Institute of Technology, Hyderabad, India
Sandip Modha, LDRP-ITR, Gandhinagar, India
Parth Mehta, Parmonic, USA

*For regular updates, subscribe to our mailing list: il...@googlegroups.com*

Regards,
Parth Mehta
Co-organiser, ILSUM 2022