Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-27 Thread Priyank Modi
Hi all, I've completed the preliminary draft of my proposal and would really appreciate your comments and suggestions on it: http://wiki.apertium.org/wiki/Pmodi/GSOC_2020_proposal:_Hindi-Punjabi Francis (first, sorry for cc'ing you personally), since you have been managing the repo, could you

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-21 Thread Hèctor Alòs i Font
Hi Priyank, Yes, I now see that the Hindi गलत__adj paradigm is like this, and the Punjabi ਗਲਤ__adj seems to be a copy of it. I can only say that we do it differently in the Romance languages I work with. I can't say that the "Hindi method" is bad. It works for Hindi-Urdu, doesn't it? This makes

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-21 Thread Priyank Modi
> > By the way, it seems strange that you have 9 analyses for this adjective. > Usually in these cases we put only the first analysis in the dictionary. > The others, if really needed, can be added as . Regarding this, I found a number of such anomalies in the Hindi monodix, and tried to resolve

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-21 Thread Priyank Modi
Hi Hector, Thank you so much for taking the time to look at my challenge in detail and for providing feedback. I already understand this error and will work on removing all '#' symbols in the final submission of my coding challenge. To start with, the number of '#'s was at least 3-4 times what I

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-20 Thread Hèctor Alòs i Font
Hi Priyank, I've been looking at your coding challenge. I can't understand anything, but I see the symbol # relatively often. That is annoying. See: http://wiki.apertium.org/wiki/Apertium_stream_format#Special This happens, for instance, when in the bidix the target word has a given gender and/or
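
A quick way to track how often these marks show up in translated output is to count them per run. The sketch below is only an illustration: it assumes the usual Apertium output convention (`*` for a word unknown to the analyser, `@` for a lemma missing from the bidix, `#` for a form the generator could not produce), and the function name `count_marks` and the sample sentence are made up for this example.

```python
# Prefix marks Apertium leaves on problem tokens in translated output.
MARKS = {"*": "unknown", "@": "bidix_miss", "#": "generation_fail"}


def count_marks(translated: str) -> dict:
    """Count whitespace-separated tokens that start with *, @ or #."""
    counts = {name: 0 for name in MARKS.values()}
    for token in translated.split():
        # A bare mark with nothing after it is not treated as a marked word.
        if len(token) > 1 and token[0] in MARKS:
            counts[MARKS[token[0]]] += 1
    return counts


if __name__ == "__main__":
    sample = "ਇਹ #ਚੰਗਾ *foo @bar ਹੈ"  # hypothetical output line
    print(count_marks(sample))
```

Marks that follow punctuation (for example an opening bracket) are not caught by this simple prefix check, so the counts are a lower bound.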

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-19 Thread Hèctor Alòs i Font
Hi Priyank, I'll try to look at your coding challenge later, although I'm not sure I'll be able to read anything :) With regard to the use of mass matching techniques of words in the two languages, I would strongly advise against it in the first phase of the project and would use it very

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-18 Thread Priyank Modi
Hi Hector, Francis; I've made progress on the coding challenge and wanted your feedback on it: https://github.com/priyankmodiPM/apertium-hin-pan_pmodi (The bin files remained after a `make clean`, so I didn't remove them from the repo; let me know if this is incorrect) > I've attempted to

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-11 Thread Hèctor Alòs i Font
Hi Priyank, I calculated the coverage on the Wikipedia dumps I got, and which I used for getting the frequency lists. I think this is fair, since these corpora are enormous. But I calculated WER on the basis of other texts. I calculated it only a few times, at fixed project benchmarks, since I
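
For reference, both measures mentioned here can be sketched in a few lines. This is a minimal illustration and not necessarily how the figures in this thread were computed: naive coverage is taken as the share of analysed tokens that are not marked unknown (`^word/*word$` in the Apertium stream format), and WER as the word-level edit distance between a reference and a hypothesis, divided by the reference length. The function names `coverage` and `wer` are made up for this example.

```python
import re


def coverage(analysed: str) -> float:
    """Share of lexical units (^...$) whose analysis is not '*unknown'."""
    units = re.findall(r"\^(.*?)\$", analysed)
    if not units:
        raise ValueError("no lexical units found")
    unknown = sum(1 for unit in units if "/*" in unit)
    return (len(units) - unknown) / len(units)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference is empty")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(coverage("^घर/घर<n><m><sg>$ ^foo/*foo$"))
    print(wer("a b c d", "a x c"))
```

Escaped `^`, `$` or `/` characters inside a lexical unit are not handled by the regular expression, so real analyser output may need a proper stream parser.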

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-11 Thread Priyank Modi
Hi Hector, Thank you so much for the reply. The proposals were really helpful. I've completed the coding challenge for a small set of 10 sentences (for now), which I believe Francis has added to the repo as a test set. I'll include the same in the proposal. For now, I'm working on building the

Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-06 Thread Hèctor Alòs i Font
Hi Priyank, Hindi-Punjabi seems to me a very nice pair for Apertium. Closely related pairs often give unsatisfactory results with Google, because most of the time there is an intermediate translation into English. In any case, if you can give some data about the quality of

[Apertium-stuff] Guidance for hin-pan language pair development

2020-03-06 Thread Priyank Modi
Hi, I am working towards developing the Hindi-Punjabi pair and need some guidance on how to go about it. I ran the test files and noticed that the dictionary file for Punjabi needs work (even a lot of function words could not be found by the translator). Should I start with that? Are