Journal of English Literature and Cultural Studies

Dependency Parsing in Persian: A Rule-based Hybrid Model

Document Type: Original Article

Authors
1 MSc from the Department of Computational Linguistics, Islamic World Science and Technology Monitoring and Citation Institute, Shiraz, Iran
2 Faculty Member of the Department of Computational Linguistics, Islamic World Science and Technology Monitoring and Citation Institute (ISC), Shiraz, Iran
3 Faculty Member of the Department of System Operations, Islamic World Science and Technology Monitoring and Citation Institute, Shiraz, Iran
Abstract
Dependency parsing in Persian is mostly data-driven. In this paper, we present a hybrid grammar-based parsing model that combines Hyperedge Replacement Grammar (HRG) and Constraint Dependency Grammar (CDG). The model parses POS-tagged Persian sentences in a rule-based manner, based on the valency structure of their main verb. The sample data, selected through purposive sampling, comprised 81 Persian sentences covering all types of basic Persian sentence structures. Each sentence's gold analysis was compared with the corresponding model output using both the unlabeled attachment score (UAS) and the labeled attachment score (LAS); the results are presented in Table 10. These scores pertain to the entire research corpus and were calculated as the average of the per-sentence scores. HRG is fully capable of representing all basic sentence structures as hypergraphs and, for a given sentence, can generate all possibilities for any desired sequence of POS tags. In our hybrid model, HRG works together with CDG on dependency parsing, and with the assistance of a heuristic function the model achieved an accuracy of 100% on the research corpus. Moreover, the model performs dependency parsing without any training dataset. We also present a machine-readable verb valency dictionary based on our proposed HRG model and the corpus data. Although the model's time complexity is quasi-polynomial, its accuracy on our sample data suggests that the approach may be useful.
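The evaluation described in the abstract, with UAS and LAS computed per sentence and then averaged over the corpus, can be illustrated with a minimal sketch. The function names, the (head, label) token representation, and the dependency labels below are assumptions made for illustration; they are not taken from the paper, and the toy sentences are not drawn from its corpus.

```python
from typing import List, Tuple

# One token's analysis: (head index, dependency label); head 0 denotes the root.
# This representation is an assumption for illustration only.
Arc = Tuple[int, str]


def sentence_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must be non-empty and aligned")
    n = len(gold)
    # UAS: share of tokens whose predicted head matches the gold head.
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
    # LAS: share of tokens whose predicted head and label both match.
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return uas_hits / n, las_hits / n


def corpus_scores(gold_sents: List[List[Arc]],
                  pred_sents: List[List[Arc]]) -> Tuple[float, float]:
    """Average the per-sentence scores over the corpus (a macro-average)."""
    if not gold_sents or len(gold_sents) != len(pred_sents):
        raise ValueError("corpora must be non-empty and aligned")
    per_sentence = [sentence_scores(g, p) for g, p in zip(gold_sents, pred_sents)]
    n = len(per_sentence)
    uas = sum(u for u, _ in per_sentence) / n
    las = sum(l for _, l in per_sentence) / n
    return uas, las


# Toy data with hypothetical labels.
gold = [
    [(2, "nsubj"), (0, "root")],
    [(3, "nsubj"), (3, "obj"), (0, "root"), (3, "punct")],
]
pred = [
    [(2, "nsubj"), (0, "root")],
    [(3, "nsubj"), (3, "nsubj"), (0, "root"), (2, "punct")],
]
uas, las = corpus_scores(gold, pred)
print(uas, las)
```

Because each sentence contributes equally to a per-sentence average, short and long sentences carry the same weight; a token-level average over the whole corpus would generally give a different figure unless every sentence is parsed perfectly.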
Keywords