【Abstract】This paper examines the highest-frequency words in People's Daily reports on Xinjiang from 2008, 2009 and 2010 (all in Chinese), the years surrounding the July 5th riots of 2009, from two angles: what the main features of the People's Daily reports are, and in which specific respects the government changed its development strategy in Xinjiang after the riots. The aim is to offer a new method for studying and analyzing news reports and to draw information and ideas from linguistic research. I analyzed the high-frequency words of the three yearly corpora and compared them to obtain the results.
【Key words】People's Daily of China; frequencies; LL test
Research questions
In this study, the 20 most frequent words of each year's data (with the focus mainly on content words) were taken as the objects of analysis in order to trace how usage changed. The study raises two questions: 1. What are the main features of People's Daily reports? 2. In which specific respects did the government change its development strategy in Xinjiang after the July 5th riots?
Methods: I used three years of People's Daily news reports on Xinjiang, from 2008, 2009 and 2010 (all in Chinese). All corpora are segmented but untagged. The data were collected from the newspaper database of the Zhejiang University online library. The total file length of the three corpora is 362,682 (in Chinese), with 67,939, 191,596 and 103,147 for 2008, 2009 and 2010 respectively. The texts contain 20,929, 56,633 and 31,412 tokens in 2008, 2009 and 2010 respectively. The corpus analysis tools I used are WordSmith 5 and the log-likelihood (LL) ratio calculator created by Prof. Jiajin Xu.
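The word lists in this paper were produced with WordSmith 5. As a rough illustration only, and not the tool's actual procedure, the following Python sketch shows how a frequency list could be derived from a segmented corpus, assuming that tokens are separated by whitespace. The function name `top_words` and the sample sentence are hypothetical.

```python
from collections import Counter


def top_words(segmented_text, n=20):
    """Return the n most frequent tokens of a whitespace-segmented text."""
    tokens = segmented_text.split()
    return Counter(tokens).most_common(n)


# Hypothetical segmented sample, not taken from the corpus.
sample = "新疆 的 发展 和 新疆 的 稳定"
for word, freq in top_words(sample, 3):
    print(freq, word)
```

Applied to each yearly file in turn, a function of this kind would yield lists comparable in form to the tables in the Results section.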
Results: Table 1
year 2008 2009 2010
Rank Freq word Freq word Freq word
1 1080 的 2947 的 1509 的
2 324 了 1005 新疆 770 新疆
3 308 一 839 和 505 和
4 247 在 625 了 339 了
5 204 是 624 在 321 发展
6 168 新疆 551 民族 313 在
7 165 和 516 是 280 工作
8 145 年 500 发展 218 一
9 124 个 433 年 213 是
10 111 多 357 一 198 年
11 109 到 342 群众 196 要
12 106 不 330 稳定 171 建设
13 96 上 316 社会 153 对
14 96 为 301 各族 150 中央
15 95 地 283 团结 149 文化
16 95 就 272 人民 149 民族
17 92 有 264 要 130 群众
18 91 她 260 为 123 为
19 88 工作 253 等 119 等
20 87 人 236 对 116 大
Table 1 above lists the 20 most frequent words of each corpus, taken from the word lists for 2008, 2009 and 2010 respectively.
The 20 most frequent words of the three corpora combined are listed in Table 2 below: Table 2 (3 years)
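Because the three corpora differ in size, the raw counts in Table 1 are not directly comparable across years. A minimal sketch, assuming the token counts given in the Methods section (with the 2009 figure read as 56,633) and a hypothetical helper `per_thousand`, normalizes the frequency of “新疆” to occurrences per 1,000 tokens:

```python
# Token counts per year from the Methods section
# (assumption: the 2009 figure is read as 56,633).
TOKENS = {2008: 20929, 2009: 56633, 2010: 31412}

# Raw frequencies of 新疆 taken from Table 1.
XINJIANG = {2008: 168, 2009: 1005, 2010: 770}


def per_thousand(freq, tokens):
    """Relative frequency expressed per 1,000 tokens."""
    return 1000 * freq / tokens


rates = {year: per_thousand(XINJIANG[year], TOKENS[year]) for year in TOKENS}
for year, rate in rates.items():
    print(year, round(rate, 2))
```

Under these assumptions the relative frequency of “新疆” rises from roughly 8 per 1,000 tokens in 2008 to about 18 in 2009 and about 25 in 2010.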
Rank Freq word Rank Freq word
1 5536 的 11 594 工作
2 1943 新疆 12 536 群众
3 1509 和 13 520 要
4 1288 了 14 479 为
5 1184 在 15 456 社会
6 933 是 16 445 对
7 886 发展 17 436 多
8 883 一 18 435 各族
9 776 年 19 435 稳定
10 719 民族 20 429 个
The frequency of words in the newspaper mirrors the trends and emphases of development in Xinjiang, and this shows mainly in the content words that the sub-corpora share. Table 3 lists the common content words that occur in the three yearly corpora: Table 3
Freq word
1943 新疆
886 发展
719 民族
594 工作
536 群众
456 社会
435 各族
435 稳定
1. These high-frequency content words reflect the main character of Xinjiang. That “新疆” (Xinjiang) tops the list of high-frequency content words indicates the topic and thematic focus of the news, as well as the main strategy and trends of development in the area. Because Xinjiang is a multi-ethnic and multicultural region, the words “民族”, “群众” and “稳定”, which can be translated as “ethnic”, “the masses” and “stability” respectively, are strongly emphasized, reflecting the characteristics and uniqueness of local Xinjiang culture. The government's promotion of development in the area shows in the words “发展”, “工作” and “社会”, meaning “development”, “work” and “society”, which express the substantive and contemporary nature of the news.
2. The high-frequency content words thus reflect the main character of Xinjiang and of the news about it. Did the focus of reports on Xinjiang differ around the July 5th riots? After the event, the central government and the Xinjiang Uyghur Autonomous Region worked to adjust policies and development strategy in the area. I therefore predicted that reports on Xinjiang became more diverse and more in-depth around the July 5th riots. To test this, the high-frequency words of the three yearly corpora are compared to identify the differences and the turning points in development.
The corpora are divided into two parts: the first runs from 2008 to the first half of 2009, and the second runs from the second half of 2009 (when the July 5th riots broke out) to 2010. Table 4
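The selection behind Table 3 can be sketched as a filter over the Table 2 list. The stoplist `FUNCTION_WORDS` below is a hypothetical one, assembled by hand so that it matches the items left out of Table 3; it is an assumption for illustration, not a list used in the study.

```python
# Top-20 list of the combined corpus, copied from Table 2 (word, frequency).
TABLE2 = [
    ("的", 5536), ("新疆", 1943), ("和", 1509), ("了", 1288), ("在", 1184),
    ("是", 933), ("发展", 886), ("一", 883), ("年", 776), ("民族", 719),
    ("工作", 594), ("群众", 536), ("要", 520), ("为", 479), ("社会", 456),
    ("对", 445), ("多", 436), ("各族", 435), ("稳定", 435), ("个", 429),
]

# Hypothetical stoplist: items treated here as function or general words.
FUNCTION_WORDS = {"的", "和", "了", "在", "是", "一", "年", "要", "为", "对", "多", "个"}

# Keep only the entries that are not on the stoplist, preserving rank order.
content_words = [(w, f) for w, f in TABLE2 if w not in FUNCTION_WORDS]

for word, freq in content_words:
    print(freq, word)
```

With this stoplist the filter leaves the eight content words listed in Table 3, in the same order.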
2008-2009 2009-2010
Rank Freq word Freq word
1 1393 的 4143 的
2 383 了 1693 新疆
3 380 一 1253 和
4 313 在 905 了
5 263 是 871 在
6 256 和 775 发展
7 250 新疆 670 是
8 179 年 661 民族
9 158 个 597 年
10 147 多 503 一
11 139 到 483 工作
12 138 不 453 群众
13 129 有 440 要
14 120 上 415 社会
15 117 为 394 各族
16 111 人 385 稳定
17 111 发展 368 对
18 111 工作 362 为
19 110 就 342 等
20 109 月 340 团结
As Table 4 shows, the top 20 for 2008 and the first half of 2009 contains fewer content words than the top 20 for the second half of 2009 and 2010, where the content words are more diverse and more informative. Content words such as “新疆”, “发展” and “工作” occur in both lists, whereas the list for the second half of 2009 and 2010 contains further content words that do not appear in the top 20 for 2008 and the first half of 2009, such as “民族”, “社会”, “各族”, “稳定” and “团结”. These content words are the key words that reflect the focus of the work, the core ideas and the ideological trend.
The content word “团结” (“unity”) occurred 261 times in the second half of 2009 and 79 times in 2010 (340 times in total). By comparison, it occurred only 22 times in the first half of 2009 and 12 times in 2008 (34 times in total), and so did not rank among the top 20 high-frequency words. Likewise, the content word “稳定” (“stability”; the concordance plot shows that its regular collocation in the corpora is “维护稳定”, “maintaining stability”) occurred 19 times in the first half of 2009 and 31 times in 2008 (50 times in total), and did not rank among the top 20 either. In the second half of 2009, however, it occurred 311 times, and 74 times in 2010 (385 times in all), ranking 16th in the high-frequency word list.
Hence, we can say that after the riots the government put great effort into ethnic unity, the stability of the area and the steady economic development of Xinjiang in 2009-2010. The prediction that reports on Xinjiang became more diverse and more in-depth around the July 5th riots is therefore supported. Judging from the number of content words in the two corpora, Xinjiang, and especially the economy and people's livelihood in the area, became the primary concern of the local and central governments around the time of the riots. Moreover, the word concordances suggest that the government put more effort into rebuilding the unity and stability of Xinjiang, while placing greater emphasis on economic, political and social construction.
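The period totals quoted above can be checked with a few lines of Python. The dictionary keys and the helper `split_totals` are hypothetical names; the counts are those given in the text. Note that these are raw counts and that the two periods differ in size, so the totals are not normalized.

```python
# Occurrences per period, as reported in the text.
COUNTS = {
    "团结": {"2008": 12, "2009_h1": 22, "2009_h2": 261, "2010": 79},
    "稳定": {"2008": 31, "2009_h1": 19, "2009_h2": 311, "2010": 74},
}


def split_totals(counts):
    """Sum the counts before and after the mid-2009 dividing point."""
    before = counts["2008"] + counts["2009_h1"]
    after = counts["2009_h2"] + counts["2010"]
    return before, after


for word, counts in COUNTS.items():
    before, after = split_totals(counts)
    print(word, before, after)
```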
LL ratio tests
Because the three yearly corpora differ in size, the results were submitted to LL ratio tests to remove the influence of corpus size and to reveal the statistical significance of the differences between the years. Table 5
Year 2008 Year 2009 Year 2010
words LL value Sig. LL value Sig. LL value Sig.
稳定 38.11 0.000***- 26.24 0.000***+ 19.87 0.000***-
新疆 123.45 0.000***- 0.01 0.903- 53.23 0.000***+
发展 74.38 0.000***- 2.15 0.142+ 11.87 0.001***+
工作 5.5 0.018*- 16.63 0.000***- 43.09 0.000***+
民族 145.43 0.000***- 45.98 0.000***+ 14.43 0.000***-
社会 50.6 0.000***- 15.14 0.000***+ 2.13 0.144-
群众 14.62 0.000***- 8.65 0.003**+ 3.23 0.072-
各族 49.89 0.000***- 14.28 0.000***+ 1.76 0.185-
As Table 5 shows for the high-frequency words of the three yearly corpora, most LL values of the main content words are above 3.84, with p < 0.05, so the differences cannot be put down to the different corpus sizes of the three years. The exceptions are “新疆” and “发展” in 2009 and “社会”, “群众” and “各族” in 2010, whose differences are not significant. Above all, the significant differences in the content words across the three years are striking. That is to say, the high-frequency content words differ significantly from year to year, which indicates that social concerns shifted between 2008, 2009 and 2010.
This paper has examined the highest-frequency words in People's Daily reports on Xinjiang from 2008, 2009 and 2010 (all in Chinese), the years around the July 5th riots, from two angles: 1. the main features of People's Daily reports; 2. the specific respects in which the government changed its development strategy in Xinjiang after the July 5th riots. The statistical analysis and comparison of the high-frequency words support the prediction that reports on Xinjiang became more diverse and more in-depth around the July 5th riots. They also indicate that the government shifted its development strategy, mainly in terms of ethnicity, society, unity, stability and the economy, to improve Xinjiang's situation and people's livelihood.
It should also be noted that the geography and multi-ethnic makeup of Xinjiang require that social stability and unity, and not only economic development, be emphasized throughout Xinjiang's development strategy. If more attention is paid to ethnic areas within systematic development, Xinjiang may become more prosperous and united. The development of multiculturalism should be emphasized so that people know that they are recognized and have the government's attention. Ethnic groups' identity is shaped by and reflected in their culture, since minority groups and their culture cannot be separated; on this point, there is still a long way for the people and the government to go.
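The LL values in Table 5 came from an LL ratio calculator whose internals are not described here. The following is a minimal sketch, assuming the widely used two-corpus log-likelihood formula and assuming that each year is compared against the pooled three-year corpus of 108,974 tokens (20,929 + 56,633 + 31,412, with the 2009 token count read as 56,633). Neither assumption is stated in the text, and the function names are hypothetical. Under these assumptions the sketch gives approximately 38.11 for “稳定” in 2008 (31 occurrences against 435 in the pooled corpus), which agrees with Table 5.

```python
import math


def log_likelihood(a, c, b, d):
    """Two-corpus log-likelihood.

    a: frequency in the study corpus, c: size of the study corpus,
    b: frequency in the reference corpus, d: size of the reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def p_value(ll):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(ll, 0.0) / 2))


def direction(a, c, b, d):
    """'+' for relative overuse in the study corpus, '-' for underuse."""
    return "+" if a / c > b / d else "-"


# 稳定 in 2008 (Table 1 and Methods) against the pooled corpus (Table 2).
ll = log_likelihood(31, 20929, 435, 108974)
print(round(ll, 2), direction(31, 20929, 435, 108974), p_value(ll) < 0.05)
```

The threshold of 3.84 used in the discussion corresponds to p = 0.05 for one degree of freedom, which is what `p_value` evaluates.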
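【Funding】This paper is an interim result of the 2013 project “A Cultural Discourse Study of Multicultural Communication in Xinjiang” (XJEDU010713B01), funded by the Research Base for Comparative Studies of Chinese and Foreign Cultures and Intercultural Communication, a Key Research Base for Humanities and Social Sciences of Regular Institutions of Higher Education in the Xinjiang Uygur Autonomous Region.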