Asia Pacific Journal of Corpus Research Vol. 6, No. 2, pp. 1-14
Abbreviation: APJCR
e-ISSN: 2733-8096
Publication date: 31 December 2025
Received: 18 September 2025 / Received in Revised Form: 9 December 2025 / Accepted: 23 August 2025
DOI: https://doi.org/10.22925/apjcr.2025.6.2.1

VACSR Version 4: Enhancing a CEFR-J-Based Vocabulary Self-Reflection Tool for Multilingual and Foreign Language Education

Yukiko Ohashi (Yamazaki University of Animal Health Technology), JAPAN
Copyright 2025 APJCR. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This study introduces VACSR v.4.0, an updated version of the Vocabulary Analyzer for Self-Reflection. The tool automatically analyzes vocabulary occurrences and CEFR-J levels in transcribed texts, reporting usage frequency, unused items, and tokens not classified on the CEFR-J scale. VACSR v.4.0 incorporates syntactic parsing with the Stanza library, which enables the system to distinguish parts of speech and to categorize identical lexical items according to their grammatical functions. The updated version also automatically displays top headwords, a feature not included in previous versions, and extends processing to tokens containing diacritical marks, such as ê and û, which occur in languages other than English. The study explores how VACSR v.4.0 functions with languages other than English through a pilot analysis of two French texts containing vocabulary with diacritical marks. Although the system does not yet provide linguistically accurate POS tagging or CEFR-based level assignments for French, it can still extract high-frequency vocabulary and generate meaningful frequency-based lists. These findings show that, even without full multilingual NLP integration, VACSR v.4.0 helps compile core vocabulary lists across different languages and offers preliminary insights for pedagogical decision-making in foreign-language learning contexts.
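
The frequency-based part of this workflow can be illustrated with a minimal Python sketch. It is not the VACSR implementation, which the abstract does not describe at code level. The function names (tokenize, top_words, count_levels) and the tiny level map are hypothetical, the level labels are placeholders rather than real CEFR-J entries, and the Stanza-based POS step is omitted. The sketch only shows one plausible way to keep tokens with diacritical marks intact while counting frequencies and flagging tokens that a wordlist does not cover.

```python
import re
import unicodedata
from collections import Counter

# Runs of Unicode letters only; digits and underscores are excluded.
# In Python 3, \W is Unicode-aware for str patterns, so "ê" and "û" count as letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens.

    NFC normalization merges a base letter and a combining accent into one
    precomposed character, so "e" + U+0302 is treated the same as "ê".
    """
    normalized = unicodedata.normalize("NFC", text)
    return [match.group(0).lower() for match in WORD_RE.finditer(normalized)]


def top_words(text, n=5):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


def count_levels(tokens, level_map):
    """Count tokens per level; tokens absent from level_map are 'unclassified'.

    level_map is a hypothetical {word: level} dictionary standing in for a
    real wordlist; it is an assumption of this sketch.
    """
    return Counter(level_map.get(token, "unclassified") for token in tokens)


if __name__ == "__main__":
    sample = "La fête est sûre. La forêt est belle, la fête aussi."
    print(top_words(sample, n=3))

    # Placeholder levels for illustration only, not actual CEFR-J assignments.
    toy_levels = {"la": "level-1", "est": "level-1"}
    print(count_levels(tokenize(sample), toy_levels))
```

Because no French wordlist is assumed here, most tokens in the sample fall into the unclassified group, which mirrors the abstract's point that frequency lists remain usable even when level assignment is unavailable.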

Keywords

Corpus Linguistics, French Vocabulary, CEFR-J, Vocabulary, Language Education

The Authors

Yukiko Ohashi is a professor at Yamazaki University of Animal Health Technology. She earned her PhD in literature in 2014. Her principal research lies in corpus linguistics. She has published several articles on aspects of language learning, in particular corpus compilation.

The Authors’ Address

First and Corresponding Author
Yukiko Ohashi
Professor
Yamazaki University of Animal Health Technology
4-7-2 Minami-Osawa, Hachioji, Tokyo 192-0364, JAPAN
Email: y_watanabe@yamazaki.ac.jp 
