Asia Pacific Journal of Corpus Research Vol. 2, No. 2, pp. 31-41
Abbreviation: APJCR
e-ISSN: 2733-8096
Publication date: 31 December 2021
Received: 30 September 2021 / Received in Revised Form: 19 November 2021 / Accepted: 8 December 2021
DOI: https://doi.org/10.22925/apjcr.2021.2.2.31

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

Yiguo Yuan (Nanjing Normal University), Bin Li (Nanjing Normal University)
Copyright 2021 APJCR

This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing a large-scale, roughly annotated corpus and computing statistics over it. The texts from Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the unique words with high frequency in each dynasty are mainly official titles with real power. This study breaks away from the qualitative methods used in traditional research on the history of the Chinese language and instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
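
The three statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy token list, and the chunk size are hypothetical, and the sketch assumes that the text has already been segmented into words and that one Chinese character corresponds to one syllable. The standardized type/token ratio is computed here as the mean type/token ratio over consecutive fixed-size chunks and is returned as a ratio rather than a percentage.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each segmented word occurs."""
    return Counter(tokens)


def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of `chunk_size` tokens.

    A trailing partial chunk is discarded. If the text is shorter than one
    chunk, the plain type/token ratio of the whole text is returned instead.
    The default chunk size is an assumption, not a value from the paper.
    """
    if not tokens:
        return 0.0
    n_chunks = len(tokens) // chunk_size
    if n_chunks == 0:
        return len(set(tokens)) / len(tokens)
    ratios = []
    for i in range(n_chunks):
        chunk = tokens[i * chunk_size:(i + 1) * chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    return sum(ratios) / n_chunks


def syllable_proportions(tokens):
    """Share of one-character and two-character tokens among all tokens.

    Assumes one character equals one syllable.
    """
    total = len(tokens)
    if total == 0:
        return {"monosyllabic": 0.0, "dissyllabic": 0.0}
    mono = sum(1 for t in tokens if len(t) == 1)
    di = sum(1 for t in tokens if len(t) == 2)
    return {"monosyllabic": mono / total, "dissyllabic": di / total}


# Hypothetical toy input: an already segmented token list for one period.
tokens = ["天", "下", "之", "人", "天下", "之", "道", "之"]
freq = word_frequencies(tokens)
sttr = standardized_ttr(tokens, chunk_size=4)
props = syllable_proportions(tokens)
```

Averaging over equal-sized chunks keeps the ratio comparable across periods whose corpora differ greatly in size, which a plain type/token ratio would not.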

Keywords

Ancient Chinese, Lexical Evolution, Quantitative Study, Corpus-based Analysis, Computational Linguistics

The Authors

Yiguo Yuan is an MA student at the School of Chinese Language and Literature at Nanjing Normal University, China, with an emphasis in computational linguistics.

Bin Li is an Associate Professor at the School of Chinese Language and Literature at Nanjing Normal University, China, with an emphasis in computational linguistics.

The Authors’ Addresses

First and Corresponding Author
Yiguo Yuan
MA student
School of Chinese Language and Literature
Nanjing Normal University
122 Ninghai Rd, Gulou District, Nanjing, CHINA
E-mail: lexcliff1023@gmail.com

Co-author
Bin Li
Associate Professor
School of Chinese Language and Literature
Nanjing Normal University
122 Ninghai Rd, Gulou District, Nanjing, CHINA
E-mail: libin.njnu@gmail.com
