COMPARING COMPUTATIONAL LINGUISTICS APPROACHES ACROSS LANGUAGES

Sanjar Norqobilov

doi:10.47390/SPR1342V3I12.2Y2023N28

Keywords:

Machine translation, computational linguistics, natural language processing, cross-linguistic analysis.

Abstract

This paper presents a comparative study of how core computational linguistics techniques perform across typologically diverse languages. Focusing on machine translation (MT), it analyzes the difficulties that linguistic variability poses for computational approaches and argues that MT development requires language-specific adaptation rather than a one-size-fits-all model. Through a literature review and cross-linguistic case studies, it examines challenges including word order differences, morphological complexity, lexical ambiguity and inadequate resources across analytic, synthetic, tonal and morphologically rich languages. The results identify specific sources of MT difficulty for languages such as Arabic, Chinese, Hindi and Swahili. The discussion centers on how rule-based, statistical and neural MT are affected by language-specific features and therefore require adjustments such as morphological analyzers and tailored training data. These findings underline the importance of an inclusive computational linguistics that moves beyond reliance on English data. The study concludes that flexibility and language-specific customization are needed for algorithms to model the structures of the world’s roughly 7,000 languages effectively.
