COMPARING COMPUTATIONAL LINGUISTICS APPROACHES ACROSS LANGUAGES

Authors

  • Sanjar Norqobilov

DOI:

https://doi.org/10.47390/SPR1342V3I12.2Y2023N28

Keywords:

Machine translation, computational linguistics, natural language processing, cross-linguistic analysis.

Abstract

This paper presents a comparative study of how core computational linguistics techniques perform across typologically diverse languages. Focusing on machine translation (MT), it analyzes the difficulties that linguistic variability poses for computational approaches and argues that MT development requires language-specific adaptations rather than a one-size-fits-all model. Through a literature review and cross-linguistic case studies, it explores challenges including word order differences, morphological complexity, lexical ambiguity and inadequate resources across analytic, synthetic, tonal and morphologically rich languages. The results identify specific sources of MT difficulty for languages such as Arabic, Chinese, Hindi and Swahili. The discussion centers on how rule-based, statistical and neural MT are affected by language-specific features, which call for adjustments such as morphological analyzers and tailored training data. These findings underscore the importance of an inclusive computational linguistics that moves beyond reliance on English data. The study concludes that flexibility and language-specific customization are needed for algorithms to model the structures of the world’s roughly 7,000 languages effectively.

Published

2024-01-06

How to Cite

Norqobilov, S. (2024). COMPARING COMPUTATIONAL LINGUISTICS APPROACHES ACROSS LANGUAGES. Ижтимоий-гуманитар фанларнинг долзарб муаммолари / Актуальные проблемы социально-гуманитарных наук / Actual Problems of Humanities and Social Sciences, 3(12/2). https://doi.org/10.47390/SPR1342V3I12.2Y2023N28