Evaluating the Quality of Multiple-Choice Items for Reading Comprehension in the 2018 First Mock Test before the Senior High School Entrance Exam

2019-04-12 Chen Wenliang (陈文亮)

Campus English (校园英语·中旬), 2019, Issue 2

1. Test description

1.1 General description of the first mock test

The Senior High School Entrance Exam is held in the middle of June every year in Guangzhou， China. It aims to examine whether junior high school graduates have achieved the English level set by the English Curriculum Standards for Compulsory Education （2001）， and to what degree. This exam is a high-stakes test: it has serious consequences for schools， teachers， and students （Callet， 2005）. A candidate's study plan depends mainly on the results of this exam， since it is used to select suitable students for senior high school education. The stakeholders are those who will be affected by the outcomes of the test （Davies， 1999）， including parents， students， teachers， the educational bureau and so on， and they pay close attention to this exam. Therefore， nearly all the districts and some prestigious junior high schools organize mock tests before the Senior High School Entrance Exam every year so that the stakeholders can predict how well the candidates will perform in the entrance exam.

The first mock test is regarded as the most important mock test， as it is organized by the district educational bureau. All the schools in the district participate in it， and both the schools and the students are ranked according to their results. The results reflect how well a school has so far helped its students to review the required knowledge and skills. In addition， before the entrance exam， candidates refer to their results in the first mock test when filling in the senior high school application form. Thus， the first mock test plays an important role in shaping a school's reputation and a candidate's chance of entering a senior high school.

There were 10,695 candidates taking part in the 2018 first mock test of Baiyun District， where I work. In my own school， 460 candidates took the test. The test is designed by the district educational researcher. The reading section contains two parts: reading comprehension， and reading and gap-filling. The total score of the test is 150， and reading comprehension accounts for 40 points， or approximately 26.67% of the whole test. The reading comprehension part has four passages of around 300 words each， with five multiple-choice items under each passage. Except for the Vocabulary Spelling， Sentence Completion and Short Essay Writing sections， which are scored by selected teachers from each school of the district， all sections of the test are scored by computer.

1.2 Motivation of the research and my research question

I am teaching Junior 3 students during this academic year. My students are busy preparing for the entrance exam by doing piles of test papers. I， too， have to work through many test papers in order to sum up rules and strategies for tackling the test for my students. However， some papers are of low quality. Sometimes my students complain that the items set ridiculous traps for them. Both my students and I wonder whether the items really test what they are intended to measure. Therefore， it is important for me to explore the quality of the reading comprehension items in the first mock test.

My research question is: to what extent is the first mock test valid in terms of test quality？

1.3 The structure of my assignment

My essay contains six parts. In the first part， I will introduce the general description of the reading section of the test， the motivation， the research question and the structure of my essay. The next part is the literature review which includes the understanding of validity， aspects of reading， multiple choice test items and guidelines used in reading assessment. The third part is the methodology and limitation. I will use the guidelines in reading assessment with multiple choice questions to inspect whether the test items are valid. Also， the limitation of the research will be mentioned. In the fourth part， I will present the findings and analyze the problems of the items. Suggestions for improving the items will be shown in the fifth part. The last part is the conclusion， where the main points of the essay will be summarized.

2. Literature review

2.1 Validity

The meaning of validity has changed over the past decades. In the past， validity was seen as a property of a test: if a test or one of its sections measures what it is intended to measure， then the test or that section is valid （Henning， 1987）. Cronbach and Meehl proposed several types of validity: construct validity， concurrent validity， predictive validity and content validity （Cronbach & Meehl， 1955， as cited in Carr， 2011）. At that time， construct validity was regarded as one of these types but not as the central one. Establishing validity was the concern only of testing researchers， since they were responsible for developing large-scale high-stakes tests （Chapelle， 1999）.

At present， however， validity is thought of as an argument concerning the interpretation and use of a test. It refers to the extent to which the interpretations and uses of a test can be shown to be reasonable. Construct validity has moved to the center of validity， while content validity and criterion-related validity serve as evidence for construct validity. Today， every test user is responsible for showing that the use of a test is reasonable （Chapelle， 1999）.

The present study investigates the quality of the reading comprehension items in the 2018 first mock test before the Senior High School Entrance Exam. It would seem that， in my context， if the test items are of good quality， the reading comprehension test must be valid， because the items test what they are intended to measure.

2.2 Aspects of reading

Reading is an activity of reorganizing the information in the text （Carroll， 1964， as cited in Moore et al.， 2007）. This understanding of the nature of reading is the most significant construct in many reading comprehension exams， especially those with multiple-choice questions. Reading can be seen as a process or as a product. As a process， reading is selective: a reader uses his or her background knowledge and linguistic knowledge to interact with the information in the passage. The process is private and individual. While a reader is reading， it is hard for others to tell what he or she is reading， whether he or she has understood， or how much he or she has understood （Liu， Han & Hu， 2000）. As a product， reading means that the reader understands what he or she has read （McKay， 2006）. We can tell that the reader has understood the text from the score he or she obtains in a reading test.

Although reading can be regarded as either a process or a product， the two views are inseparable. By going through the reading process， a reader attains the reading product. Conversely， if we want to know how the reader succeeds or fails in understanding the text， namely the reading product， we should go back to his or her reading process.

In a reading process， there are three levels of understanding （Gray， 1960， as cited in Alderson， 2000）. The entry level is reading ‘the lines’， which assesses the candidate's understanding of the words in a passage. The intermediate level is reading ‘between the lines’， which assesses his or her ability to infer meanings. The advanced level is reading ‘beyond the lines’， which requires the candidate to evaluate the passage.

Levels of understanding are usually associated with reading ability. According to Wu （2002）， English reading assessment has to define reading abilities in detail. Heaton （1988， pp.105-106， as cited in Wu， 2002） puts forward some reading abilities which are aligned with those listed on page 78 of the Instructions for Guangzhou Junior High School Graduates Achievement Exam--English. First， “deduce the meaning of words by （a） understanding word formation （roots， affixation， derivation and compounding）（b） contextual clues”. Second， “understand explicitly stated information”. Third， “anticipate and predict what will come next in the text”. Fourth， “understand information not explicitly stated by （a） making inferences （b） understanding figurative language”.

For most language tests， reading ability is thought to be composed of many different types and levels of subskills， and reading comprehension assessment in fact tests these subskills （Liu et al.， 2000）. To determine whether a reader possesses reading abilities， Davis （1968， as cited in Alderson， 2000） offers definitions of eight reading subskills. Below are those which are consistent with the reading subskills mentioned in the Instructions for Guangzhou Junior High School Graduates Achievement Exam--English. First， infer the meaning of a word within a given context. Second， infer meaning from the text. Third， locate detailed information which corresponds to the question; this information may be either stated explicitly or paraphrased. Fourth， understand the writer's ideas， attitudes and purpose in writing the passage. Last， understand and predict the plot and the ending of the story. I will examine whether the items in the first mock test indeed test the candidates' reading ability according to the subskills mentioned above.

The reading comprehension section of the Senior High School Entrance Exam examines the candidates' ability to obtain and process information while reading. The reading product of the reading comprehension section is presented through the test scores. It is supposed that if a candidate gains high marks in reading comprehension， he or she has a good mastery of the required reading subskills. However， the validity of the test items is a prerequisite for judging whether a candidate has mastered the reading subskills. If the items have loopholes that allow candidates to choose the correct answer without reading the passage， then their scores cannot represent their mastery of reading subskills.

2.3 Multiple choice test items

Multiple-choice items prevail in many tests. The number of options and the way candidates think when choosing among them can be controlled. Weir （1993， as cited in Wu， 2002） puts forward some advantages and disadvantages of multiple-choice items. As for advantages， first， scoring reliability is high. Second， item difficulty can be adjusted according to feedback from the pre-test. Third， the tested points are clear. Fourth， the difficulty of writing out answers to open questions is avoided.

It is significant to choose appropriate passages for a test and to control the relationships among the items for a tested passage. Otherwise， reading comprehension items whose key can be identified by test-wise examinees will be produced （Pyczak， 1972）.

As for disadvantages， first， when a candidate chooses the wrong answer， it is hard to tell whether this is because he or she did not understand the passage or did not understand the question. Second， some degree of guessing may be involved. If a candidate is test-wise， he or she can eliminate the distractors or choose the correct answer after analyzing the item logically （Alderson， 2000）. Third， the cost of designing the items is high. Fourth， it takes candidates a long time to read through the question and the options. Fifth， some strange thinking patterns in the distractors may hinder the thinking development of immature teenagers （Weir， 1993， as cited in Wu， 2002）; they may confuse them and increase the difficulty of the test. Therefore， “poorly constructed MC questions can lead to an inaccurate assessment of a student's true ability” （Hansen & Dexter， 1997， p.94）. Testers should design plausible distractors so that candidates have to analyze them carefully in order to choose the correct answer. If a candidate chooses a distractor， the reason for the choice can then be interpreted， since the distractors are carefully designed to represent candidates' weaknesses in reading comprehension （Alderson， 2000）.

In my context， the first mock test is a large-scale test within a district， not to mention the entrance exam， which covers the whole city. For the sake of efficiency， multiple-choice items are a good choice for such tests， despite the drawbacks mentioned above.

2.4 Guidelines used in reading assessment with multiple choice items

No matter what kind of test it is， the fundamental unit of observation in a test is the test item （Haladyna， 2004）. It is “the smallest unit that produces distinctive and meaningful information or feedback on a test when it is scored or rated” （Brown， 2005， p.41）. The procedure of examining test items with great care is called item analysis （Brown， 2005）.

To conduct an item analysis， we have to follow some guidelines. To ensure that a reading test is valid， it would seem that the following guidelines should be followed in my context. First， the length of the options should not become a cue for the key. In most situations， if an option is obviously longer or more explicitly stated than the others， it is usually the key （Henning， 1987）. Second， extra and irrelevant information should not be included in the options （Brown， 2005）. Third， three options should not form a group that is distinctly different from the remaining one. Fourth， the same words should not be repeated in every option. Fifth， “the language should be correct， local， appropriate and concise. This is the most basic requirement for designing items” （Liu et al.， 2000， p.59）. Sixth， the words in the stem or the option should not be the same as the words in the passage （Henning， 1987）; otherwise， candidates only need to match the words from the stem or the option with the words in the article. Seventh， the stem should not lack a focus （Liu et al.， 2000）. Eighth， two kinds of words should be excluded: words with extreme meanings， like always and never， and words that appear in both the stem and one of the options （Haladyna， 2004）.

3. Methodology and limitation

The specification of the first mock test is the same as that of the Senior High School Entrance Exam. That is， the reading comprehension section is tested with multiple-choice items. The validity of these items lies in whether they really reflect the candidates' reading ability. To investigate whether the reading comprehension items are valid， a theoretical analysis is carried out in this study. I will use the guidelines applied in reading assessment with multiple-choice items to inspect the test items in the reading comprehension section of the first mock test.

I chose this method mainly because I wanted to help my students recognize the characteristics of the items and avoid the underlying traps in the reading comprehension test so that they can attain high marks.

However， due to limited time， no questionnaire or interview was carried out for this research. I alone examined the test items to see whether they had problems and， if so， what the problems were. In future research， questionnaires should be sent to candidates and interviews should be conducted to ask for their opinions of the test items.

4. Findings and analysis

After examining the items in the reading comprehension section， the following problems of the items have been found.

4.1 Incorrect language of the items

“The language should be correct， local， appropriate and concise. This is the most basic requirement for designing items” （Liu et al.， 2000， p.59）. Test items should be well written and follow editorial criteria （Osterlind， 1998）. However， Option B in Question 33 adds an unnecessary “ed”， which makes the option ungrammatical. Candidates who spot this mistake may wonder whether the option should be eliminated because of its extra “ed” or whether it should be kept because the “ed” is just a typing mistake.

4.2 Obviously different length of the options

According to Liu et al. （2000）， the length and the structure of the items should remain similar. However， the options of the following items from the first mock test violate this guideline. Options A and D in Questions 30， 37 and 42 are obviously much longer than Options B and C.
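A length cue of this kind can be screened for mechanically. The following is a minimal Python sketch, not part of the original analysis: it compares the longest and the shortest option by character count. The function name `length_spread`, the cut-off of 1.5 and the choice of character counts as the measure are illustrative assumptions; the option texts are those of Item 30 as quoted in Section 5.2.

```python
def length_spread(options):
    """Return the ratio of the longest option's length to the shortest one's."""
    lengths = [len(text) for text in options]
    return max(lengths) / min(lengths)


# Hypothetical cut-off: a ratio above this is treated as a possible length cue.
THRESHOLD = 1.5

# Options of Item 30 as quoted in Section 5.2 (original and revised versions).
item_30_original = [
    "attending her grandmother's birthday party",
    "helping her neighbor",
    "having a big family",
    "knowing her elderly neighbor's name",
]
item_30_revised = [
    "attending her grandmother's birthday party",
    "giving a helping hand to her neighbor",
    "having many family members around",
    "knowing her elderly neighbor's name",
]

for name, opts in (("original", item_30_original), ("revised", item_30_revised)):
    ratio = length_spread(opts)
    verdict = "possible length cue" if ratio > THRESHOLD else "similar lengths"
    print(f"{name}: ratio {ratio:.2f} ({verdict})")
```

Under these assumptions the original options exceed the cut-off and the revised options do not. A real screening tool would need a threshold calibrated on actual items.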

4.3 Extra and irrelevant information in the options

Options should not contain additional and unrelated information to the test （Brown， 2005）. Nevertheless， below are the examples from the first mock test which show extra and irrelevant information in the options.

If we read through Passage A （see Appendix）， no relevant information can be found to match the information mentioned in Options A and B under Question 29. Similarly， we cannot find the corresponding information mentioned in Option A of Question 42.

4.4 Redundant words in the options

According to Henning （1987） and Haladyna （2004）， redundancy must be avoided in the options because verbosity lengthens the time needed to understand the question. Repeating words is not efficient （Hansen & Dexter， 1997）. But here is an example from the first mock test that is inconsistent with this guideline. In the four options， the first word is the same: “they”.

4.5 Stems that lack a focus

The stem should focus on specific information from the passage （Liu et al.， 2000）. Liu et al. （2000） list in their book some questions that lack a focus. Below are examples from the first mock test that show stems without a focus.

According to Liu et al. （2000）， it is unclear what exactly such questions are asking or what their focus is. “The unfocused stem fails to provide adequate information to address the options” （Haladyna， 2004， p.108）. Below is another example from the first mock test， whose stem consists only of an adverbial and a blank line. Apparently， it is an unfocused stem （Liu et al.， 2000）.
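Repetition at the start of every option is easy to detect automatically. The sketch below is an illustration only, with a hypothetical function name `shared_leading_words`. It returns the words that all options share at their beginning, using the options of Item 33 as quoted in Section 5.4.

```python
def shared_leading_words(options):
    """Return the run of words that every option starts with (case-insensitive)."""
    tokenized = [option.split() for option in options]
    shared = []
    # zip(*tokenized) walks through the options word position by word position.
    for words_at_position in zip(*tokenized):
        if len({word.lower() for word in words_at_position}) != 1:
            break
        shared.append(words_at_position[0])
    return shared


# Options of Item 33 as quoted in Section 5.4 (original and revised versions).
item_33_original = [
    "They chose to make more man-made things.",
    "They didn't allowed their children to touch plants.",
    "They chose to have little contact with plants.",
    "They tried very hard to escape from the past.",
]
item_33_revised = [
    "chose to make more man-made things.",
    "didn't allow their children to touch plants.",
    "chose to have little contact with plants.",
    "tried very hard to escape from the past.",
]

print(shared_leading_words(item_33_original))
print(shared_leading_words(item_33_revised))
```

A non-empty result suggests that the shared words could be moved into the stem, which is what the gap-filling revision in Section 5.4 does.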

4.6 Matching the words in the option and the passage

A reading comprehension test should not be reduced to simply matching words in the options with the same words in the passage （Henning， 1987）. Otherwise， the test examines only whether readers can match words， not whether they understand the meaning of the passage. Below are two examples of this situation from the first mock test：

Extract of Paragraph 4 in Passage B：

The key to Question 33 is Option C. Both Option C and the extract of Paragraph 4 contain the word “contact”. If readers cannot find sources in the passage for the other options， the word “contact” will serve as a hint confirming that Option C is the answer.

The key to Item 37 is Option D. The words “recover from” appear in exactly the same form in both the passage and Option D.
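Word matching between an option and the passage can be made visible with a small script. The sketch below is an illustration under stated assumptions, not a method used in the article: it finds the longest run of consecutive words that an option shares with a passage extract. The helper names `tokens` and `longest_shared_run` are hypothetical, and the texts are the Passage C extract and the Option D wordings quoted in Section 5.6.

```python
import re


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def longest_shared_run(option, passage):
    """Return the longest sequence of consecutive words found in both texts."""
    a, b = tokens(option), tokens(passage)
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word of both texts.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


passage_extract = (
    "In 1873, Spafford planned a boat trip to Europe in order to give his wife "
    "and daughters a much-needed vacation and time to recover from the loss of "
    "their young son"
)
option_d_original = "He wanted them to recover from his family's tragedy."
option_d_revised = "He wanted them to raise their spirits."

print(longest_shared_run(option_d_original, passage_extract))
print(longest_shared_run(option_d_revised, passage_extract))
```

For the original Option D the longest shared run is "to recover from", while the revised wording shares only a single function word with the extract. How long a run should count as a matching cue is a judgment left to the item writer.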

4.7 Distractors that can be grouped together

According to Henning （1987）， test designers should not write options of which all but one fall into the same group， because candidates can then pick out the odd one as the key. Below is such an example from the first mock test.

From the options we can see that Options A， B and D can be classified as a negative-meaning group， while Option C is the positive one. This hints to candidates that Option C is different from the rest and is therefore highly likely to be the answer. Indeed， after reading the text， we find that Option C really is the key to this question. There is no need to read the passage if candidates can spot this feature of the options.

4.8 Specific determiners that give hints to the correct response

Words like always， absolutely， completely， totally and never are specific determiners. Since they have extreme meanings， the chance that an option containing them is the right answer is remote （Haladyna， 2004）. Option B in Item 29 of the first mock test is an example：

The key to Item 29 is D rather than B. If the candidate knows the rule of not choosing options with specific determiners， he or she can eliminate Option B quickly and focus on the other options.
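Checking options for specific determiners amounts to a simple word lookup. The sketch below is illustrative only: the word list contains just the five determiners named above, the function name `find_specific_determiners` is hypothetical, and the options are those of Item 29 as quoted in Section 5.3.

```python
import re

# The five specific determiners named in the text; a fuller list is left open.
SPECIFIC_DETERMINERS = {"always", "absolutely", "completely", "totally", "never"}


def find_specific_determiners(options):
    """Map each option label to the specific determiners found in that option."""
    hits = {}
    for label, text in options.items():
        words = re.findall(r"[a-z']+", text.lower())
        found = [word for word in words if word in SPECIFIC_DETERMINERS]
        if found:
            hits[label] = found
    return hits


# Options of Item 29 as quoted in Section 5.3.
item_29 = {
    "A": "She had been ill for many years.",
    "B": "She always helped others when she was young.",
    "C": "She had no daughters and sons.",
    "D": "She was grateful for the author's kindness.",
}

print(find_specific_determiners(item_29))
```

Only Option B is flagged, for the word "always".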

4.9 Clang associations in both the stem and one of the options

A clang association means that the same expression appears in both the stem and one of the options. If clang associations appear， the option that shares an expression with the stem is likely to be the key to the question （Haladyna， 2004）. “Similar wording can make the correct response more attractive to students who do not know the answer” （Hansen & Dexter， 1997， p.95）. Below is an example from the first mock test：

We can see the word “family” in both the stem and Option D， and Option D is indeed the correct answer. A sufficiently test-wise candidate does not need to read the passage: by reading the stem and the options carefully， he or she can choose the key to the item.
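Clang associations can be shortlisted by intersecting the content words of the stem with those of each option. The sketch below is a rough illustration: the stop-word list, the stripping of the possessive "'s" and the function names are all assumptions made for this example. The stem and options are those of Item 37 as quoted in Section 5.6.

```python
import re

# Hypothetical, deliberately small stop-word list for this example only.
STOP_WORDS = {
    "a", "an", "the", "to", "of", "in", "from", "he", "she", "his", "her",
    "them", "did", "do", "have", "why", "what", "some",
}


def content_words(text):
    """Return the set of lower-cased words in the text, minus stop words."""
    text = text.lower().replace("'s", "")  # treat "family's" as "family"
    return {word for word in re.findall(r"[a-z]+", text) if word not in STOP_WORDS}


def clang_associations(stem, options):
    """Map each option label to the content words it shares with the stem."""
    stem_words = content_words(stem)
    hits = {}
    for label, text in options.items():
        shared = sorted(stem_words & content_words(text))
        if shared:
            hits[label] = shared
    return hits


# Item 37 as quoted in Section 5.6.
stem_37 = "Why did Spafford send his family to Europe?"
options_37 = {
    "A": "He wanted to do some businesses alone in Chicago.",
    "B": "He wanted them to move to Europe.",
    "C": "He sent them to have a summer holiday.",
    "D": "He wanted them to recover from his family's tragedy.",
}

print(clang_associations(stem_37, options_37))
```

Run on Item 37, the check flags "family" in Option D, which is the overlap discussed above. It also flags "Europe" in Option B, so a mechanical check like this can only shortlist candidates for a human reviewer to judge.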

5. Suggestions for the test items

Based on the problems mentioned in the previous part， some suggestions are given below for revising the first mock test.

5.1 The language of the item should be correct

As we can see from Question 33， there is an unnecessary “ed” in Option B. The past-tense ending “ed” on the verb “allow” should not appear in the negative statement， since “didn't” already marks the past tense. After the “ed” is removed， Item 33 can examine candidates' ability to understand the words in a passage （Gray， 1960， as cited in Alderson， 2000）.

5.2 The length of the options should be similar and the order of the options should be logical

Originally， Options B and C in Item 30 are obviously shorter than the others. After I revise them as follows， the lengths of all the options become similar （see the boldface）. With options of similar length， the possibility of guessing the correct answer from the length of the options is minimized. Item 30 can then examine candidates' ability to read ‘the lines’ （Gray， 1960， as cited in Alderson， 2000）. It also examines the candidates' subskill of “understanding the specific information of the passage” （Guangzhou Education and Research Institute， 2018， p.78）.

The original version：

30. The author got a deep sense of satisfaction and peace from _________.

A. attending her grandmother's birthday party

B. helping her neighbor

C. having a big family

D. knowing her elderly neighbor's name

The revised version：

30. The author got a deep sense of satisfaction and peace from _________.

A. attending her grandmother's birthday party

B. giving a helping hand to her neighbor

C. having many family members around

D. knowing her elderly neighbor's name

If the lengths of the options cannot be kept similar， we can instead arrange the options in alphabetical order and/or by length （see the boldface）（Haladyna， 2004）.

The original version：

30. The author got a deep sense of satisfaction and peace from _________.

A. attending her grandmother's birthday party

B. helping her neighbor

C. having a big family

D. knowing her elderly neighbor's name

The revised version：

30. The author got a deep sense of satisfaction and peace from _________.

A. having a big family

B. helping her neighbor

C. knowing her elderly neighbor's name

D. attending her grandmother's birthday party
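The rearrangement by length shown above can be reproduced with a short sort. This is only a sketch: the function name `order_by_length` is hypothetical, and sorting by character count from shortest to longest is one possible reading of ordering "by length".

```python
from string import ascii_uppercase


def order_by_length(options):
    """Return (label, text) pairs with the options sorted from shortest to longest."""
    ordered = sorted(options, key=len)
    return list(zip(ascii_uppercase, ordered))


# Options of Item 30 in their original order.
item_30 = [
    "attending her grandmother's birthday party",
    "helping her neighbor",
    "having a big family",
    "knowing her elderly neighbor's name",
]

for label, text in order_by_length(item_30):
    print(f"{label}. {text}")
```

With these four options the sorted order matches the revised version given above. After such a reordering the key moves to a new letter, so the answer sheet has to be updated accordingly.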

5.3 Extra and irrelevant information from the options should be removed

If we read Passage A （see Appendix）， we cannot find information relevant to Options A and B in the passage. By contrast， the information in Option C can be found in the passage， although the option states it wrongly. So， to improve Options A and B， I replace them with relevant but wrong information.

The original version of Item 29：

29. What can we learn about the old woman？

A. She had been ill for many years.

B. She always helped others when she was young.

C. She had no daughters and sons.

D. She was grateful for the author's kindness.

The revised version of Item 29：

29. What can we learn about the old woman？

A. She wanted an ambulance to help her.

B. She had communication with me.

C. She had no daughters and sons.

D. She was grateful for the author's kindness.

The information in the revised Option A can be found in Paragraph 2: “She insisted she didn't need an ambulance”. That in the revised Option B can be found in Paragraph 3: “As I left I felt sad when I realized that we hadn't communicated with her， except a few smiles， for the past 18 months.” After the revision， candidates can no longer eliminate options simply because they are irrelevant. Item 29 can then examine candidates' ability to read ‘between the lines’ （Gray， 1960， as cited in Alderson， 2000）， which corresponds to the subskill of “making simple judgment or inference according to the passage” （Guangzhou Education and Research Institute， 2018， p.78）.

5.4 The language in the item should be concise

Multiple-choice items should be concise and “get to the point” （Haladyna， 2004， p.107）. All the options of Item 33 begin with “they”. If we change the question into a gap-filling format， the options no longer repeat the word “they” four times. Some expressions can also be shortened or replaced with other words. After the revision， the question is shorter and the options are less wordy. The item can examine the candidates' ability to read ‘the lines’ （Gray， 1960， as cited in Alderson， 2000）， as well as the subskill of “understanding the specific information according to the passage” （Guangzhou Education and Research Institute， 2018， p.78）.

The original version of Item 33：

33. What did early humans probably do in order to keep away from illness or injury？

A. They chose to make more man-made things.

B. They didn't allowed their children to touch plants.

C. They chose to have little contact with plants.

D. They tried very hard to escape from the past.

The revised version：

33. To prevent illness or injury， early humans probably _______.

A. chose to make more man-made things.

B. didn't allow their children to touch plants.

C. chose to have little contact with plants.

D. tried very hard to escape from the past.

5.5 The stem should have a focus

A stem without a focus leaves the reader unsure what the question is asking （Liu et al.， 2000）. To avoid this problem， the gist should be stated in the stem instead of only in the options （Haladyna， 2004）.

The original version of Item 39：

39. Which of the following statements is NOT true？

A. At the height of his success， Horatio lost his son.

B. Spafford went to Europe with his family at the same time.

C. Philip Bliss was a composer of many songs.

D. Spafford's tragic story encouraged a lot of people.

Since the options of Item 39 are about Horatio Spafford and Philip Bliss， we can add these two names to the stem as follows （see in boldface）：

The revised version of Item 39：

39. Which of the following statements about Horatio Spafford and Philip Bliss is NOT true？

A. At the height of his success， Horatio lost his son.

B. Spafford went to Europe with his family at the same time.

C. Philip Bliss was a composer of many songs.

D. Spafford's tragic story encouraged a lot of people.

The original version of Item 42：

42. In the Ice Hotel __________.

A. you can enjoy a rainbow inside your room

B. you can enjoy free snacks

C. there is no restaurant

D. there are many local hand-crafted artworks

The revised version of Item 42：

42. What can you enjoy in the Ice Hotel？

A. A rainbow inside your room.

B. Free snacks.

C. The food and drinks in the hotel restaurant.

D. A local hand-crafted artwork show.

In this way， the focus of the question becomes much clearer than in the previous version， and the candidates' ability to read ‘the lines’ （Gray， 1960， as cited in Alderson， 2000）， which is aligned with the subskill of “understanding the specific information according to the passage” （Guangzhou Education and Research Institute， 2018， p.78）， can be examined.

5.6 The same words both in the passage and the options should be replaced with their synonyms

Since identical words in the option and the passage only require candidates to match them， not to understand them （Henning， 1987）， test designers should replace these words with others that have similar meanings. The item can then examine the candidates' ability to understand the words in a passage （Gray， 1960， as cited in Alderson， 2000）， that is， the subskill of “understanding the specific information according to the passage” （Guangzhou Education and Research Institute， 2018， p.78）.

Extract of Paragraph 3 in Passage C：

In 1873， Spafford planned a boat trip to Europe in order to give his wife and daughters a much-needed vacation and time to recover from the loss of their young son…

37. Why did Spafford send his family to Europe？

A. He wanted to do some businesses alone in Chicago.

B. He wanted them to move to Europe.

C. He sent them to have a summer holiday.

D. He wanted them to recover from his family's tragedy.

The revised version of Question 37：

37. Why did Spafford send his family to Europe？

A. He wanted to do some businesses alone in Chicago.

B. He wanted them to move to Europe.

C. He sent them to have a summer holiday.

D. He wanted them to raise their spirits.

5.7 Each distractor should be distinct from each other

In Item 38， Options A， B and D can all be classified as negative， while Option C is the only one with a positive meaning. Moreover， Option C happens to be the key. I therefore revise one of Options A， B and D so that these three options can no longer be grouped together as negatives.

The original version：

38. What did Phillip Bliss think of Spafford's words？

A. Terrible. B. Uninteresting.

C. Wonderful. D. Hard to understand.

The revised version：

38. What did Phillip Bliss think of Spafford's words？

A. Terrible. B. Common.

C. Wonderful. D. Hard to understand.

After the revision， only Options A and D can be classified as negative， while Option B is neutral and Option C is positive. The candidates have to go back to the passage to search for the corresponding information in order to choose the correct answer. In this way， Item 38 gains validity: it measures the candidates' ability to infer meaning （Gray， 1960， as cited in Alderson， 2000）， which is consistent with the subskill of “inferring and understanding the meaning of a new word based on the context and word formation” （Guangzhou Education and Research Institute， 2018， p.78）.

5.8 Specific determiners and clang associations should be avoided when writing options

Since options containing specific determiners are generally not the correct answer， testers should not include these words in the options， where they signal to candidates which options not to choose. So for Option B in Item 29， the word “always” should be removed. In this way， the item can examine the candidates' ability to read ‘between the lines’ （Gray， 1960， as cited in Alderson， 2000）. It also examines the candidates' subskill of “inferring and understanding the meaning of a new word based on the context and word formation” （Guangzhou Education and Research Institute， 2018， p.78）.

Conversely， as clang associations usually suggest the correct answers， they should be excluded from the item. Therefore， Question 37 and its Option D can be revised in the following way：

The original version：

37. Why did Spafford send his family to Europe？

D. He wanted them to recover from his family's tragedy.

The revised version：

37. Why did Spafford send his family to Europe？

D. He wanted them to recover from their son's death.

After the revision， Item 37 can test the candidates' ability to read ‘the lines’ （Gray， 1960， as cited in Alderson， 2000）. It examines the candidates' reading subskill of “understanding the specific information according to the passage” （Guangzhou Education and Research Institute， 2018， p.78）.

6. Conclusion

In order to check whether the items are of good quality， this essay examines the multiple-choice items of the reading comprehension section in the 2018 first mock test of Baiyun District before the Senior High School Entrance Exam. A theoretical analysis has been conducted， using guidelines for multiple-choice reading comprehension items to examine the items. The analysis shows that some items violate the guidelines mentioned in the literature review: first， the length of the options should be similar； second， extra and unrelated information should be removed； third， three options should not be classified in a group； fourth， description should be brief and clear； fifth， the language should be correct； sixth， the skill of matching the same words from the option to the passage should be avoided； seventh， the stem should be focused； last， neither specific determiners nor clang associations should appear in the item. Therefore， it would seem that some of the items in the reading comprehension section of the first mock test fail to be valid in terms of test quality.

For future research， researchers should send questionnaires to the candidates and conduct interviews to find out whether they think the items are of good quality and how they choose the keyed options in reading comprehension tests.

References：

[1]Alderson， J. C. Assessing reading[M]. Cambridge： Cambridge University Press，2000.

[2]Brown， J. D. Testing in Language Programs： A Comprehensive Guide to English Language Assessment[M]. Asia. The McGraw-Hill Companies， Inc，2005.

[3]Carroll， J. Language and thought[M]. Englewood Cliffs， NJ： Prentice-Hall， 1964. construct validity（2016）. （1st ed.） Oxford University Press.

[4]Cronbach， L. J.， & Meehl， P. E. Construct validity in psychological tests[J]. Psychological Bulletin，1955，52（4）：281-302.doi：10.1037/h0040957.

[5]Haladyna， T. M. Developing and validating multiple-choice test items（3rd ed.）[M]. London； Mahwah， N.J.： Lawrence Erlbaum Associates，2004.

[6]Hansen， J. D.， & Dexter， L. Quality multiple-choice test questions： Item-writing guidelines and an analysis of auditing testbanks[J]. Journal of Education for Business，1997，73（2）：94-97. doi：10.1080/08832329709601623.

[7]Heaton， J. B. Writing English language tests （New ed.）[M]. London： Longman，1988.

[8]Henning， G. A guide to language testing： Development， evaluation， research[M]. Boston， Mass： Heinle & Heinle，1987.

About the author: Chen Wenliang， Guangwai Affiliated Foreign Language School， Guangzhou (广州市广外附设外语学校).