The Relationship between N-gram Measures and L2 Writing Proficiency (Graduation Thesis)
2020-04-18 19:43:33
Chapter 1. Introduction
1.1 Research background
1.2 Need of the study
1.3 Research purposes
Chapter 2. Literature Review
2.1 Definitions of N-grams
2.2 Acquisition of N-grams
2.3 Aspects of L2 N-gram production
2.4 Research gap
Chapter 3. Research Methodology
3.1 Research questions
3.2 Corpus for the study
3.3 Indices of N-gram
3.3.1 N-gram frequency and proportion indices
3.3.2 N-gram range indices
3.3.3 N-gram association measures
3.4 Statistical analysis
Chapter 4. Results
4.1 Indices of N-gram measures
4.2 Regression analysis
Chapter 5. Discussion
5.1 Indices of N-gram measures predicting writing proficiency
5.2 Features of the indices of N-gram measures in the regression model
Chapter 6. Conclusion
6.1 Major findings
6.2 Implications
6.3 Limitations of this study and suggestions for future research
References
Acknowledgments
I sincerely thank everyone who helped and supported me while I was writing this thesis.
First, I would like to express my heartfelt gratitude to my supervisor, Prof. Hu Yuanjiang, who offered me constant guidance from topic selection through data collection. His professional knowledge, rigorous academic attitude, pursuit of excellence in his work, strict self-discipline, and easy-going personality have had a far-reaching impact on me. Prof. Hu gave me valuable guidance in my studies and was a positive influence on my thinking and my life.
Next, my gratitude goes to the professors in the School of Foreign Languages and Literature at Nanjing Tech University. They shared their academic knowledge, broad horizons, and profound thinking with us, creating an academic atmosphere that helped make my paper more rigorous.
I would also like to thank the friends who stood by me when I faced obstacles in both my thesis and my life. I enjoyed talking with them, and without their help I could not have overcome those difficulties.
Last but not least, I would like to express my profound love to my family, whose unwavering care and strong support contributed greatly to the successful completion of this thesis.
Abstract
N-grams have recently attracted great attention in second language acquisition and corpus linguistics research. Although a substantial body of research has explored the importance of N-gram use in L2 and the relationship between N-gram use and L2 writing proficiency, it remains unclear how the production of N-grams predicts judgments of L2 writing proficiency. In addition, learner corpus research has usually measured L2 N-gram use through frequency and association strength, whereas indices such as dispersion and directional association strength are used much less often. Studies that operationalize N-gram use as a multidimensional phenomenon are especially rare. Hence, the present study investigates two questions: 1. Which indices of bigram and trigram use predict judgments of second language writing proficiency? 2. What are the features of the indices of bigram and trigram use that predict those judgments?
The corpus for the study is self-built. It contains 1,600 compositions written by Chinese freshmen of different majors at Nanjing Tech University and totals about three hundred thousand words. Students wrote an argumentative essay online on a specific topic assigned by their teacher. After completion, each composition was graded by the website's system, with a full score of 100. TAALES (Tool for the Automatic Analysis of Lexical Sophistication) was used to compute bigram and trigram indices. Correlation and regression analyses were then conducted between the N-gram indices and the L2 writers' scores.
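The correlation and regression step described above can be sketched in plain Python. This is a minimal illustration, not the study's actual pipeline: the function names (`pearson_r`, `solve`, `ols`) and the toy index values and scores are invented for the example, and the source does not say which statistical software was used.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, ys):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R squared)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    preds = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - pred) ** 2 for y, pred in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta, 1 - ss_res / ss_tot


# Invented toy values: two hypothetical N-gram indices per essay and a score out of 100.
indices = [(0.05, 1.2), (0.08, 1.0), (0.07, 1.6), (0.10, 1.4), (0.12, 1.9), (0.09, 1.7)]
scores = [68, 72, 74, 78, 85, 80]

r = pearson_r([row[0] for row in indices], scores)
beta, r_squared = ols(indices, scores)
print(f"r (first index vs. score) = {r:.3f}")
print(f"R squared of the two-index model = {r_squared:.3f}")
```

R squared here is the share of score variance that the fitted model accounts for, which is the quantity the abstract reports as explained variance.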
The major findings of the study are as follows:
- The multiple regression showed that judgments of L2 writing ability are most strongly predicted by four indices: academic bigram delta P, spoken trigram 2 MI, academic trigram 2 delta P, and academic trigram 2 MI2. Academic bigram delta P indicates that students who earn higher scores use academic bigrams with stronger directional associations. Spoken trigram 2 MI suggests that more proficient L2 writers use more spoken trigrams that consist of low-frequency words and show a stronger association between the first word and the last two words. Academic trigram 2 delta P shows that, in higher-scoring compositions, the directional association between the first word and the last two words of academic trigrams is stronger. Academic trigram 2 MI2 reveals that the more L2 writers use low-frequency academic trigrams with a strong association between the first word and the last two words, the higher their scores are. Together, the four indices explained 18.6% of the variance in the judgments of the compositions.
- Five features characterize the bigram and trigram indices that predict judgments of writing proficiency. First, high-scoring compositions contain more trigrams than bigrams. Second, more skilled L2 writers use more academic N-grams than spoken ones. Third, the directional association between the words in academic N-grams has a significant impact on the level of L2 writing proficiency. Fourth, learners with more developed L2 writing abilities produce more academic N-grams composed of strongly associated words that occur less frequently in native speech. Fifth, more proficient L2 writers are better able to produce N-grams that normally occur in native speech and writing.
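The association measures named in these findings can be illustrated with a small sketch. It assumes the commonly used corpus-linguistic definitions: MI as log2 of observed over expected co-occurrence frequency, MI2 as the same ratio with the observed frequency squared, and delta P as P(outcome | cue) minus P(outcome | cue absent). The exact computation in TAALES may differ. The function names and the counts are hypothetical. For a bigram, the cue is the first word and the outcome the second; for a "trigram 2" index, the cue is the first word and the outcome is the final two-word sequence.

```python
import math


def mi(f_pair, f_cue, f_outcome, n):
    """Mutual information: log2(observed / expected) co-occurrence frequency."""
    expected = f_cue * f_outcome / n
    return math.log2(f_pair / expected)


def mi2(f_pair, f_cue, f_outcome, n):
    """MI2: like MI, but with the observed frequency squared."""
    expected = f_cue * f_outcome / n
    return math.log2(f_pair ** 2 / expected)


def delta_p(f_pair, f_cue, f_outcome, n):
    """Delta P of the outcome given the cue: P(outcome | cue) - P(outcome | no cue)."""
    a = f_pair                # cue followed by outcome
    b = f_cue - f_pair        # cue followed by something else
    c = f_outcome - f_pair    # outcome preceded by something else
    d = n - a - b - c         # neither cue nor outcome
    return a / (a + b) - c / (c + d)


# Hypothetical reference-corpus counts (not taken from the study):
# the pair occurs 32 times, the cue 100 times, the outcome 200 times,
# in a corpus of 10,000 pair slots.
print(mi(32, 100, 200, 10000))                 # 4.0
print(mi2(32, 100, 200, 10000))                # 9.0
print(round(delta_p(32, 100, 200, 10000), 3))  # 0.303 (cue predicts outcome)
print(round(delta_p(32, 200, 100, 10000), 3))  # 0.153 (roles swapped)
```

The last two lines show why delta P is called directional: swapping cue and outcome gives a different value, whereas MI and MI2 are symmetric in the two parts.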
Theoretically, this study finds that delta P and MI are the two most significant indices of N-gram use for predicting judgments of L2 writing proficiency. Learners with more developed L2 writing proficiency produced academic and spoken bigrams and trigrams composed of words with strong directional associations that occur less frequently in native speech. Pedagogically, the acquisition of N-grams has a positive influence on composition scores, which provides strong support for teaching N-grams in L2 writing classes. The study can therefore serve as a reference for including N-gram instruction in the L2 writing classroom.
Keywords: N-gram; L2 writing proficiency; Automatic text analysis
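Chinese Abstract
In recent years, N-grams have attracted wide attention in both second language acquisition and corpus linguistics research. Although many studies have examined the importance of N-grams in a second language and their relationship with L2 writing ability, it remains unclear how the production of N-grams predicts judgments of L2 writing proficiency. Furthermore, frequency and association strength indices are commonly used to measure N-gram use in learner corpus research, but indices such as dispersion and directional association strength are rarely used. Learner corpus studies that treat N-gram use as a multidimensional phenomenon are especially rare. This study therefore aims to answer two questions: 1. Which bigram and trigram indices can predict judgments of L2 writing proficiency? 2. What are the features of the bigram and trigram indices that predict L2 writing proficiency?
This study uses a self-built corpus of 1,600 compositions written by first-year students of different majors at Nanjing Tech University, totaling about 300,000 words. Students were required to write an argumentative essay online on a given topic. After completion, the website's system scored each composition out of 100. The study used the TAALES software to analyze bigram and trigram indices, and then ran correlation and regression analyses on the N-gram indices and the L2 writers' composition scores.
The main findings of this study are as follows:
- The multiple regression results show that four indices are the most predictive of judgments of L2 writing proficiency: academic bigram ∆P, spoken trigram 2 MI, academic trigram 2 ∆P, and academic trigram 2 MI2. Academic bigram ∆P means that the higher students score on their compositions, the stronger the directional association of the academic bigrams they use. Spoken trigram 2 MI indicates that more proficient L2 writers use more spoken trigrams made up of low-frequency words, with a stronger association between the first word and the last two words. Academic trigram 2 ∆P shows a relatively strong directional association between the first word and the last two words in academic trigrams. Academic trigram 2 MI2 means that L2 writers who use more academic trigrams made up of low-frequency words, with a stronger association between the first word and the last two words, score higher. The model shows that these four indices together predict composition scores with a coefficient of determination of 18.6%.
- The bigram and trigram indices that predict writing proficiency have five features. First, high-scoring compositions contain more trigrams than bigrams. Second, more proficient L2 writers use academic N-grams more frequently than spoken N-grams. Third, the directional association between the words in academic N-grams has a significant effect on L2 writing proficiency. Fourth, more capable L2 learners produce more academic N-grams that are composed of strongly associated words and occur less frequently among native speakers. Fifth, the more proficient L2 writers are, the more N-grams they produce that commonly occur in native speakers' speech and writing.
The theoretical significance of this study lies in the finding that ∆P and MI are the two most important indices for predicting judgments of L2 writing proficiency. Compositions by more proficient L2 learners show stronger association and lower word frequency in their academic and spoken bigrams and trigrams. The pedagogical significance lies in the finding that the acquisition of N-grams has a positive effect on composition scores, which provides strong support for teaching N-grams in L2 writing classes. This study can therefore serve as a reference for N-gram instruction in the second language writing classroom.
Keywords: N-gram; L2 writing proficiency; automatic text analysis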
Chapter 1. Introduction
1.1 Research background
Acquiring a rich stock of English phraseology has gradually become an indispensable part of gaining proficiency in second language (L2) writing. Corpus research has shown that many utterances in English are composed of fixed or semi-fixed multi-word sequences (MWSs), including collocations, idioms, N-grams, and lexical bundles (Romer, 2009; Sinclair, 1991). Much attention has been paid to the accurate use of highly frequent N-grams, specifically bigrams (2-word sequences) and trigrams (3-word sequences).
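To make the notions of bigram and trigram concrete, the following minimal sketch extracts contiguous 2-word and 3-word sequences from a tokenized sentence and counts them. The helper name `ngrams`, the whitespace tokenization, and the example sentence are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Whitespace tokenization is a simplification; real corpus tools also
# handle punctuation and letter case.
tokens = "on the other hand it is on the other side".split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(bigram_counts[("on", "the")])            # 2
print(trigram_counts[("on", "the", "other")])  # 2
```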