Asia Pacific Journal of Corpus Research Vol. 2, No. 2, pp. 31-41 |
Abbreviation: APJCR |
e-ISSN: 2733-8096 |
Publication date: 31 December 2021 |
Received: 30 September 2021 / Received in Revised Form: 19 November 2021 / Accepted: 8 December 2021 |
DOI: https://doi.org/10.22925/apjcr.2021.2.2.31 |
A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus |
Yiguo Yuan (Nanjing Normal University), Bin Li (Nanjing Normal University) |
Copyright 2021 APJCR
This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Abstract |
This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing a large-scale rough annotated corpus and computing statistics over it. The texts from Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the unique words with high frequency in each dynasty are mainly official titles with real power. This study breaks away from the qualitative methods used in traditional research on Chinese language history and instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus. |
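As a rough illustration of the three statistics named in the abstract, the sketch below computes word frequency, a standardized type/token ratio, and the shares of monosyllabic and dissyllabic words over an already segmented token list. It is not the authors' implementation. The function names, the chunk size, and the sample tokens are all hypothetical, and the sketch assumes that one Chinese character corresponds to one syllable and that proportions are counted over tokens rather than types.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent words with their counts."""
    return Counter(tokens).most_common(n)


def sttr(tokens, chunk_size=1000):
    """Standardized type/token ratio: the mean TTR over consecutive,
    non-overlapping chunks of chunk_size tokens. A trailing partial
    chunk is ignored. The chunk size is an assumed parameter."""
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("need at least chunk_size tokens")
    return sum(ratios) / len(ratios)


def syllable_proportions(tokens):
    """Return (monosyllabic share, dissyllabic share) over tokens,
    assuming one character equals one syllable."""
    total = len(tokens)
    if total == 0:
        raise ValueError("token list is empty")
    mono = sum(1 for t in tokens if len(t) == 1)
    di = sum(1 for t in tokens if len(t) == 2)
    return mono / total, di / total


# Hypothetical segmented tokens, for illustration only.
sample = ["天", "下", "天下", "之", "君子", "之"]
print(top_words(sample, 1))
print(sttr(sample, chunk_size=3))
print(syllable_proportions(sample))
```

Averaging the ratio over fixed-size chunks keeps the measure comparable across periods whose surviving texts differ greatly in length, which is why a standardized ratio is preferred over a raw type/token ratio when comparing dynasties.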
Keywords |
Ancient Chinese, Lexical Evolution, Quantitative Study, Corpus-based Analysis, Computational Linguistics |
The Authors |
Yiguo Yuan is an MA student at the School of Chinese Language and Literature at Nanjing Normal University, China, with an emphasis on computational linguistics.
Bin Li is an Associate Professor at the School of Chinese Language and Literature at Nanjing Normal University, China, with an emphasis on computational linguistics. |
The Authors’ Addresses |
First and Corresponding Author: Yiguo Yuan, MA student, School of Chinese Language and Literature, Nanjing Normal University, 122 Ninghai Rd, Gulou District, Nanjing, CHINA. E-mail: lexcliff1023@gmail.com
Co-author |