On 12/2/14, Ankita Shukla <ankitashukla...@gmail.com> wrote:
> Hello!
>
> I am an OPW Intern for round#09 and will be working on a spelling
> dictionary project, the proposal of which is available here
> <https://www.mediawiki.org/wiki/User:Ankitashukla/Proposal>.
> Also, we'd be using this
> <https://github.com/ankitashukla/spelling-dictionary-opw> github repo for
> version controlling.
>
> Before we start off with the coding part, my mentors Kartik and Amir, and I
> thought it would be a great idea to have suggestions from everyone that
> might turnout to be very useful for us during the development of the
> project.
> We welcome all ideas of what your expectations are from the project, any
> specific design advice, any particular implementation or any advice, big or
> small, that might be useful to us.
>
>
> Thanks and regards,
> Ankita Shukla
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

The most immediate thing that comes to mind is: why create a new
interface where users can "add" words, instead of just scraping
Wiktionary? (I take it from your proposal that you plan to create a new
project where users can submit words for consideration for inclusion
in the dictionary.)

Additionally, as for experts rejecting or accepting words:
* Is that actually needed?
* Do experts actually exist who would be willing to do that sort of
thing? (This varies depending on your definition of "expert". For
example, if you mean people with PhDs in said language who will
verify the word is proper, the answer would be no. If you mean people
who are XX-3 or XX-N in the language, then maybe, but I'm not really
sure how much of a benefit the review would provide relative to the
costs.)

I recognize scraping is difficult for a whole host of reasons (mostly
the fact that it's semi-unstructured, which turns it into an NLP
project, and that standards aren't consistent across languages).
However, in this case it seems like the information needed would not
be that hard [famous last words] to extract simply by looking at
categories. It seems like making users add data to a new project is
duplicating effort going on in Wiktionary.
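
To make the category idea concrete, here is a rough sketch of pulling
page titles out of a Wiktionary category through the MediaWiki API
(list=categorymembers). The wiki URL, the category name in the usage
note, and the User-Agent string are assumptions for illustration only;
which categories are actually suitable will differ per language edition.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wiktionary; other editions use a different host.
API_URL = "https://en.wiktionary.org/w/api.php"


def build_query(category, cont=None):
    """Build a categorymembers request URL for one page of results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",   # main namespace only, i.e. dictionary entries
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Return (titles, continuation token or None) from a decoded response."""
    members = response.get("query", {}).get("categorymembers", [])
    titles = [m["title"] for m in members]
    cont = response.get("continue", {}).get("cmcontinue")
    return titles, cont


def fetch_words(category):
    """Yield every page title in the category, following continuation."""
    cont = None
    while True:
        req = urllib.request.Request(
            build_query(category, cont),
            # Hypothetical client name; use something identifying in practice.
            headers={"User-Agent": "spelling-dictionary-sketch/0.1"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        titles, cont = extract_titles(data)
        yield from titles
        if not cont:
            break
```

Something like fetch_words("Category:English lemmas") would then walk
the whole category. Whether page titles alone are clean enough for a
spelling dictionary is exactly the part that would need checking.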

Even if this project can't use Wiktionary for some reason, it seems
to overlap somewhat with either Wikidata or OmegaWiki, and could
perhaps reuse some work from those projects in terms of storing data.

Last of all, in your proposal you give some potential db schemas. I
imagine the schema should have a language column for what language the
word is for (not to mention things get more complicated with related
languages, e.g. EN vs EN-US vs EN-CA vs EN-GB). Also, words can have
multiple meanings, so perhaps you might want to split the meaning out
from the word. It's not really needed if the meaning is "immutable",
but if meanings can be modified, you may want some way to identify
which individual meaning was edited. (And then there are issues with
history, etc., which again leads back to seeing whether an existing
project that has already solved those issues could be where the data
comes from, instead of making a new one.)
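
As a minimal sketch of the schema point, assuming SQLite and made-up
table and column names (word, meaning, lang, spelling, definition are
all hypothetical and not taken from the proposal): the word row carries
a language tag, and each meaning gets its own row and id so that a
single meaning can be edited or tracked on its own.

```python
import sqlite3

# Hypothetical schema: one row per (language, spelling), and one row per
# meaning so each meaning has its own id.
SCHEMA = """
CREATE TABLE word (
    word_id  INTEGER PRIMARY KEY,
    lang     TEXT NOT NULL,       -- language tag, e.g. 'en-GB' or 'en-US'
    spelling TEXT NOT NULL,
    UNIQUE (lang, spelling)
);
CREATE TABLE meaning (
    meaning_id INTEGER PRIMARY KEY,
    word_id    INTEGER NOT NULL REFERENCES word(word_id),
    definition TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_word(conn, lang, spelling, definitions=()):
    """Insert a word (if new) plus any meanings; return its word_id."""
    conn.execute(
        "INSERT OR IGNORE INTO word (lang, spelling) VALUES (?, ?)",
        (lang, spelling),
    )
    (word_id,) = conn.execute(
        "SELECT word_id FROM word WHERE lang = ? AND spelling = ?",
        (lang, spelling),
    ).fetchone()
    conn.executemany(
        "INSERT INTO meaning (word_id, definition) VALUES (?, ?)",
        [(word_id, d) for d in definitions],
    )
    return word_id


def is_known(conn, lang, spelling):
    """Spell-check lookup: is this spelling valid in this exact variant?"""
    row = conn.execute(
        "SELECT 1 FROM word WHERE lang = ? AND spelling = ?",
        (lang, spelling),
    ).fetchone()
    return row is not None
```

With this layout, "colour" can be stored under en-GB and "color" under
en-US without either leaking into the other variant. It deliberately
leaves out edit history, which is the harder problem mentioned above.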

--bawolff
