On 12/2/14, Ankita Shukla <ankitashukla...@gmail.com> wrote:
> Hello!
>
> I am an OPW Intern for round#09 and will be working on a spelling
> dictionary project, the proposal of which is available here
> <https://www.mediawiki.org/wiki/User:Ankitashukla/Proposal>.
> Also, we'd be using this
> <https://github.com/ankitashukla/spelling-dictionary-opw> github repo for
> version controlling.
>
> Before we start off with the coding part, my mentors Kartik and Amir, and I
> thought it would be a great idea to have suggestions from everyone that
> might turnout to be very useful for us during the development of the
> project.
> We welcome all ideas of what your expectations are from the project, any
> specific design advice, any particular implementation or any advice, big or
> small, that might be useful to us.
>
>
> Thanks and regards,
> Ankita Shukla
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

The most immediate thing that comes to mind is: why create a new interface where users can "add" words, instead of just scraping Wiktionary? (I take it from your proposal that you plan to create a new project where users can submit words for consideration for inclusion in the dictionary.)

Additionally, as for experts rejecting or accepting words:
* Is that actually needed?
* Do experts actually exist who would be willing to do that sort of thing? (This varies depending on your definition of "expert". For example, if you mean people with PhDs in said language who will verify that the word is proper, the answer would be no. If you mean people who are XX-3 or XX-N in the language, then maybe, but I'm not really sure how much benefit the review would provide relative to the costs.)

I recognize scraping is difficult for a whole host of reasons (mostly that the content being semi-unstructured turns it into an NLP project, and that standards aren't consistent across languages). However, in this case it seems like the information needed would not be that hard [famous last words] to extract simply by looking at categories.

It seems like making users add data to a new project duplicates effort already going on in Wiktionary. Even if this project can't use Wiktionary for some reason, it seems to overlap somewhat with either Wikidata or OmegaWiki, and could perhaps re-use some of those projects' work on storing data.

Last of all, in your proposal you give some potential db schemas. I imagine the schema should have a language column for which language the word belongs to (not to mention that things get more complicated with related languages, e.g. EN vs EN-US vs EN-CA vs EN-GB). Also, words can have multiple meanings, so you might want to split the meaning out from the word.
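
To make the schema point concrete, here is a rough sketch using Python's built-in sqlite3 module. All table, column, and function names are made up for illustration and are not taken from the proposal; it only shows one possible way to carry a language column and keep meanings in their own table.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not from the proposal.
SCHEMA = """
CREATE TABLE word (
    id       INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL,
    language TEXT NOT NULL,          -- e.g. 'en', 'en-US', 'en-GB'
    UNIQUE (spelling, language)      -- same spelling may exist in several languages
);
CREATE TABLE meaning (
    id         INTEGER PRIMARY KEY,  -- lets an edit point at one specific meaning
    word_id    INTEGER NOT NULL REFERENCES word(id),
    definition TEXT NOT NULL
);
"""


def build_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_word(conn, spelling, language, definitions):
    """Insert one word and its (possibly several) meanings; return the word id."""
    cur = conn.execute(
        "INSERT INTO word (spelling, language) VALUES (?, ?)",
        (spelling, language),
    )
    word_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO meaning (word_id, definition) VALUES (?, ?)",
        [(word_id, d) for d in definitions],
    )
    return word_id


def spellings_for(conn, language):
    """Return the sorted list of spellings stored for one language code."""
    rows = conn.execute(
        "SELECT spelling FROM word WHERE language = ? ORDER BY spelling",
        (language,),
    )
    return [r[0] for r in rows]
```

With something like this, a spell checker for one variant only needs spellings_for(conn, "en-US"), while the meaning rows can be edited individually by id.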
Splitting the meaning out isn't really needed if the meaning is "immutable", but if meanings can be modified, you may want some way to identify which individual meaning was edited. (And then there are issues with history, etc., which again leads back to checking whether an existing project has already solved those issues for wherever the data comes from, instead of making a new one.)

--bawolff