Model of Product Evaluation Based on Text Comments and Star Ratings and Corresponding Prediction
2019-09-10
Tang Yi (唐熠), Tang Yong (唐勇), He Ying (贺颖)
Abstract: To handle the large volume of data generated by users of e-commerce platforms, we used NLP algorithms to analyze the sentiment of text comments and assign each comment a numeric score, and we used statistical methods to measure the correlation between comment sentiment and star-rating level. We proposed a Frequency-Weighted sentiment algorithm and a Label System to combine star ratings with text comments. Correlation analysis showed a moderate correlation between star ratings and reviews. Finally, we used the Prophet model to predict the future performance of a product.
Introduction
With the development of the Internet, online shopping has been favored by more and more people. Many factors affect product sales. To promote the development of the online market, e-commerce platforms let customers rate and review their purchases by combining a star rating from 1 to 5 with a text comment. This data reflects the preferences of the market in which customers participate and the potential success of the products the platforms sell. A method is therefore needed to process this large amount of data and reveal the market's preferences.
Model Construction
Evaluation Model Based on Text Reviews and Star Ratings
Quantify the Emotion of Text Reviews
To convert human language into numeric quantities, there are generally two kinds of solutions. The first is to count the frequency of positive and negative words. The other is to use artificial-intelligence models such as CNNs and RNNs. In our study we initially adopted the latter approach.
To do so, we used Google's word2vec tool, which converts words to vectors. We built and trained a CNN model, but the results were poor. We therefore turned to a Python tool named TextBlob, which scores a sentence in the range [-1, 1], where -1 means completely negative and 1 means completely positive.
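As a minimal sketch of this scoring step, the snippet below wraps TextBlob's `sentiment.polarity` attribute, which returns a float in [-1, 1]. The function names `textblob_polarity` and `score_reviews` and the pluggable `scorer` parameter are illustrative assumptions, not part of the original study.

```python
from typing import Callable, Iterable, List


def textblob_polarity(text: str) -> float:
    """Return TextBlob's polarity for `text`, a float in [-1.0, 1.0]."""
    # Third-party dependency (pip install textblob), imported lazily so the
    # rest of the sketch can be exercised with any other scorer.
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def score_reviews(
    reviews: Iterable[str],
    scorer: Callable[[str], float] = textblob_polarity,
) -> List[float]:
    """Score every review; `scorer` must map text to a value in [-1, 1]."""
    scores = []
    for review in reviews:
        score = scorer(review)
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score!r} is outside [-1, 1]")
        scores.append(score)
    return scores
```

Calling `score_reviews(["Great blender, works perfectly."])` would use TextBlob by default; passing a different `scorer` makes it easy to compare alternative sentiment tools on the same reviews.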
The Correlation Between Star Rating Level and Text Review Attitude
Using 32022 samples, each containing a star rating and the sentiment score of the corresponding review, we sorted the data set and divided it into five groups by star rating (1 to 5). We then fitted the distribution of sentiment scores (from -1 to 1) within each star-rating group with a normal distribution.
We applied the following steps to the reviews of each star-rating level.
Reviews scoring exactly 0 were treated as errors and discarded. We then fitted the remaining scores to a normal distribution; the result is shown in Figure 1.
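The two steps above can be sketched as follows, assuming the samples are available as `(star, score)` pairs. The function name `fit_normal_by_star` is hypothetical, and the fit uses the maximum-likelihood estimates of a normal distribution (sample mean and population standard deviation), which may differ from the fitting procedure actually used.

```python
from collections import defaultdict
from statistics import fmean, pstdev
from typing import Dict, Iterable, Tuple


def fit_normal_by_star(
    samples: Iterable[Tuple[int, float]],
) -> Dict[int, Tuple[float, float]]:
    """Map each star level to the (mean, std) of its non-zero sentiment scores."""
    groups = defaultdict(list)
    for star, score in samples:
        if score == 0:  # zero scores are treated as errors and discarded
            continue
        groups[star].append(score)
    # For a normal distribution, the maximum-likelihood estimates are the
    # sample mean and the population standard deviation.
    return {star: (fmean(v), pstdev(v)) for star, v in sorted(groups.items())}
```

The returned means are the quantities that would then be compared across star levels.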
It is clear that the mean of the distribution moves to the right from 1 star to 5 stars. We therefore used curve fitting to find the relationship between the star rating and the mean sentiment, and we got the equation between star x and sentiment score y:
Unsurprisingly, the higher the star rating, the more willing people are to post positive reviews about the products.
The overall average sentiment score is 0.2718, meaning that only five-star ratings indicate clear customer satisfaction with the products. The means for 1 and 2 stars are below 0.1, which is very low and may reflect reviewers' complaints. The 3- and 4-star groups fall in between: reviewers neither particularly like nor dislike the product.
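To illustrate how a relationship between star level and mean sentiment can be fitted, here is a sketch using ordinary least squares on a straight line. The linear form is an assumption, since the functional form of the fitted curve is not stated in this text, and the values in `toy_means` are placeholders for illustration only, not results from the study.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder means for illustration only; they are NOT the paper's values.
stars = [1, 2, 3, 4, 5]
toy_means = [-0.2, 0.0, 0.2, 0.4, 0.6]
slope, intercept = fit_line(stars, toy_means)
```

A positive `slope` corresponds to the rightward shift of the mean described above; a higher-order polynomial could be substituted if a straight line fits poorly.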
The Evaluation Combining Star Ratings and Text Comments
To evaluate a product more precisely, we need to combine star ratings and text comments. Star ratings are easy to use for evaluation, but text is not. We therefore designed the Frequency-Weighted sentiment algorithm to calculate a product's score from its reviews. The algorithm is presented below:
If the score calculated by the algorithm is above 0, it can be inferred that the product is doing well. A negative score indicates that the product is not performing well.
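The sketch below shows one possible reading of a frequency-weighted score, and it should not be taken as the authors' algorithm. It assumes that sentiment scores are rounded into buckets and that each bucket value is weighted by its relative frequency; the function name and the `ndigits` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable


def frequency_weighted_score(scores: Iterable[float], ndigits: int = 1) -> float:
    """Weight each rounded sentiment value by its relative frequency."""
    # Assumption: scores are bucketed by rounding to `ndigits` decimals.
    buckets = Counter(round(s, ndigits) for s in scores)
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("no sentiment scores given")
    return sum(value * count / total for value, count in buckets.items())
```

Under this reading, a product whose reviews are mostly positive yields a score above 0, and one dominated by negative reviews yields a score below 0, matching the interpretation given above.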
However, a high star rating is not the only criterion for a successful product. By using the model above to take reviews into account, we can evaluate a product in a more rigorous way.
We decide to attach a
For example, an item with a 5-star rating and a [label image: image10.pdf] label is more welcome than a 5-star item with the [label image: image11.pdf] label. In this way, products are divided into finer classes even when they share the same star rating.
The Effect of Existing Reviews to Later Ones
We now discuss whether a specific pattern of star ratings attracts reviews with a similar pattern (for example, whether existing bad reviews lead to more bad reviews). This led us to analyze the correlation between star ratings and reviews.
Correlation analysis measures the degree of correlation between two variables, where ρ usually denotes the population correlation coefficient and r denotes the sample correlation coefficient.
Definition of correlation coefficient:
r > 0 means the two variables are positively correlated, and r < 0 means they are negatively correlated. A larger |r| indicates a stronger correlation. When [formula image: image18.pdf], the two variables can be considered highly correlated; when [formula image: image19.pdf], the correlation is weak and the variables are essentially unrelated.
Based on the previous model, we selected a period of time and analyzed the correlation between stars and reviews. The SPSS result is shown in Table 1.
The result is [formula image: image21.pdf], so there is a moderate correlation between stars and reviews.
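For readers who want to reproduce this kind of analysis outside SPSS, the sketch below computes a sample correlation coefficient r. It assumes the Pearson coefficient is meant, which the text does not state explicitly; the function name `pearson_r` is illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

Here `xs` could hold the star ratings and `ys` the sentiment scores of the same reviews; the result always lies in [-1, 1], and its sign gives the direction of the relationship.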
The Prediction of Ratings and Sales
To predict the future performance of a product, we introduced a time-series prediction model known as Prophet. Given historical data, the model predicts the future values of a chosen index.
The input to Prophet is always a data frame with two columns: ds and y, where ds is the datestamp and y is the numeric value to forecast. After consideration, we chose the date as ds and the star rating as y. The result is shown in Figure 3 and Table 2.
Table 2: Results for the next 15 days
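A minimal sketch of such a forecast is given below. It assumes the reviews are available as `(date, star)` pairs and that ratings are averaged per day before fitting, which is an assumption and not something the text specifies. The helper names `daily_mean_ratings` and `forecast_ratings` are hypothetical; the `Prophet`, `fit`, `make_future_dataframe`, and `predict` calls belong to the `prophet` package.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def daily_mean_ratings(
    reviews: Iterable[Tuple[date, int]],
) -> List[Dict[str, object]]:
    """Aggregate (review_date, star) pairs into Prophet-style ds/y rows."""
    by_day = defaultdict(list)
    for day, star in reviews:
        by_day[day].append(star)
    # ds is the datestamp (YYYY-MM-DD); y is the value to forecast.
    return [
        {"ds": day.isoformat(), "y": fmean(stars)}
        for day, stars in sorted(by_day.items())
    ]


def forecast_ratings(rows: List[Dict[str, object]], days: int = 15):
    """Fit Prophet on ds/y rows and forecast `days` days ahead."""
    # Third-party dependencies, imported lazily: pandas and prophet.
    import pandas as pd
    from prophet import Prophet

    df = pd.DataFrame(rows)
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=days)
    forecast = model.predict(future)
    # yhat is the point forecast; yhat_lower/yhat_upper bound its interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(days)
```

The default of 15 days mirrors the horizon named in the Table 2 caption; the last `days` rows of the returned frame are the out-of-sample predictions.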
Besides the star rating, we also need to study a product's sales pattern. To describe the current and potential situation of a product, we established the model below:
Suppose there is a set S containing the monthly sales, represented by the number of reviewers, where S_i is the monthly sales and t_i is the month corresponding to S_i. That would be
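As a sketch of how the pairs (t_i, S_i) could be built from raw data, the snippet below counts reviews per calendar month as a proxy for monthly sales, following the description above. The function name `monthly_sales` and the "YYYY-MM" month label are illustrative assumptions.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def monthly_sales(review_dates: Iterable[date]) -> List[Tuple[str, int]]:
    """Return (t_i, S_i) pairs: month label and number of reviews in it."""
    counts = Counter(d.strftime("%Y-%m") for d in review_dates)
    # "YYYY-MM" labels sort chronologically when sorted as strings.
    return sorted(counts.items())
```

Months with no reviews are simply absent from the result, so a caller that needs a gap-free series would have to fill them with zeros.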
Conclusions
We combined modern computing techniques such as NLP and CNNs with conventional probability and statistics. We found a way to classify text into positive and negative polarity with Python, and we used the Prophet model to forecast the star rating of a given product. We fitted the distribution of sentiment scores within each star-rating class and used the class means to fit the relationship between star ratings and review sentiment. Moreover, we designed a Frequency-Weighted sentiment algorithm to describe the general attitude towards a product.
References
[1] Kim Y. Convolutional Neural Networks for Sentence Classification[J]. Eprint Arxiv, 2014.
[2] Taylor S J, Letham B. Forecasting at Scale[J]. The American Statistician, 2017: 0-0.
[3] Wang Xiaofei, Wang Bo, Lu Yuyu. Prediction of PM2.5 Concentration Based on Prophet-LSTM Model[J/OL]. Software Guide: 1-4 [2020-03-10].
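About the Author
Tang Yi (born March 2000), male, Han ethnicity, from Yangzhou, Jiangsu Province; undergraduate student (class of 2018) at the College of Science, Nanjing Forestry University.
1. Funding source: National Natural Science Foundation of China, General Program; project title: Research on the Mathematical Theory of the Navier-Stokes-Allen-Cahn System; project number: 11971234.
2. Funding source: College of International Education, Nanjing Forestry University; project title: High-Level Demonstration of Sino-Foreign Cooperative Education; project number: 164101005.
3. Funding source: Nanjing Forestry University; project title: 2017 Teaching Quality Improvement Project; project number: 163101812.
4. Funding source: Institute of Higher Education, Nanjing Forestry University; project title: Research on Improving the Cultivation of Innovative Talent through Mathematical Modeling Competitions; project number: 163101147.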