Selection for high quality pepper seeds by machine vision and classifiers

2018-08-06

Journal of Integrative Agriculture, 2018, Issue 9

TU Ke-ling, LI Lin-juan, YANG Li-ming, WANG Jian-hua, SUN Qun

1 Key Laboratory of Crop Genetic Improvement, Department of Plant Genetics and Breeding, College of Agriculture and Biotechnology, China Agricultural University, Beijing 100193, P.R. China

2 College of Science, China Agricultural University, Beijing 100083, P.R. China

Abstract This research aimed to improve the selection of pepper seeds by separating high-quality seeds from low-quality seeds. Past research has shown that seed vigor is significantly related to seed color and size; thus, several physical features were identified as candidate predictors of high seed quality. Image recognition software was used to automatically extract seed features from 400 kernels of pepper cultivar 101. In addition, binary logistic regression and a neural network were applied to determine models with high predictive value for seed germination. Single-kernel germination tests were conducted to validate the predictive value of the identified features. The best predictors of seed vigor were determined by the highest correlation observed between the physical features and the subsequent fresh weight of seedlings that germinated from the 400 seeds. Correlation analysis showed that fresh weight was significantly positively correlated with eight physical features: three color features (R, a*, brightness), width, length, projected area, single-kernel density, and single-kernel weight. In contrast, fresh weight was significantly negatively correlated with hue. In analyses of the two most highly correlated single features, germination percentage increased from 59.3 to 71.8% when a*≥3, and the selection rate peaked at 57.8%. Germination percentage increased from 59.3 to 79.4%, and the selection rate reached 76.8%, when single-kernel weight≥0.0064 g. The most effective model was based on a multilayer perceptron (MLP) neural network with 15 physical traits as variables and a stability of 99.4%. Germination percentage in a calibration set of seeds was 79.1% and the selection rate was 90.0%. These results indicated that the model was effective in predicting seed germination based on physical features and could be used as a guide for quality control in seed selection. Automated systems based on machine vision and model classifiers can contribute to reducing the costs and labor required in the selection of pepper seeds.

Keywords: pepper seed, image processing, machine vision, seed vigor, binary logistic regression, multilayer perceptron neural network

1. Introduction

Pepper (Capsicum spp.), one of the most important vegetable crops in China, has been cultivated in more than 20 provinces, and its production value ranks first among all other kinds of vegetable crops. Thus, the pepper industry has developed rapidly, and there is a growing demand by seedling factories for higher quality pepper seeds in production. Pepper seeds have to meet specific criteria, typically in seed size, color, and texture, in order to be considered of high value. Traditional seed sorting methods, including the use of winnowing and gravity separating machines, do not meet the more stringent requirements of refined processing. Selection criteria may also vary with different levels of worker experience. In addition, separating high-quality seeds from low-quality seeds is a production cost that can be minimized by technological advances in machinery. Furthermore, conducting the germination test for seed quality control is a time- and labor-intensive process. Therefore, industrial automated machines can help to significantly reduce processing time and labor during the quality determination process of pepper seeds.

Image processing is a powerful method that has increasingly been applied in research and, subsequently, has been widely used in the detection of agricultural products (Chupawa and Kanjanawanishkul 2014; Szczypinski et al. 2015). Wiwart et al. (2012) analyzed shape features, such as Feret's diameter and roundness, and color descriptors, such as hue, saturation, and intensity, to identify wheat varieties using image processing and principal component analysis. Furthermore, many studies showed that seed vigor was significantly related to seed color and size. A deeper seed coat color indicated greater vigor and maturity of a seed (Saeidi and Rowland 1999; Zareian et al. 2013; Chaugule and Mali 2016). More recently, Huang and Cheng (2017) developed an automated machine that can efficiently sort cabbage seeds, using machine vision to extract data on the seed characteristics of shape, color, and texture. These features are then used as input variables in neural networks to classify the quality of seeds.

Model classifiers have been applied in agriculture (Chen et al. 2010; Torkashvand et al. 2017). For example, hyperspectral images and neural networks were applied to detect chilling injury in apples in the study of ElMasry et al. (2009), with the use of RGB images and L*, a*, b* images. Machine vision and neural networks have also appeared in the study of Chen et al. (2007) to classify the quality of moldy peanuts. Lurstwut and Pornpanomchai (2016) examined germination in rice seeds with the application of a machine vision and image processor. Kurtulmus et al. (2016) determined that a 30-neuron hidden layer in a multilayer perceptron model, using a training algorithm of resilient back propagation, was most successful in classifying seeds among eight different pepper varieties with 84.94% accuracy.

However, few studies have applied image processing and classifiers to the selection of pepper seed. The goal of this study was to improve the selection of pepper seeds by using a simple image processing machine together with binary logistic regression and multilayer perceptron neural network model classifiers. Several physical features, such as color (e.g., R, G, B, L*, a*, b*, and gray) and geometric features (e.g., width, length, and projected area), were identified using image recognition software. The probability of pepper seed germination was then evaluated and, subsequently, a single-kernel germination test was conducted to validate the efficacy of the selection features and models applied in this study. Finally, we determined an optimal classifier to improve selection for higher quality pepper seeds.

2. Materials and methods

2.1. Machine vision system and experimental samples

The machine vision system implemented in this study included a scanner (Uniscan D6810, Unisplendour Corporation Limited (UNIS), China), which captured portable network graphics (PNG) images measuring 2478×3510 pixels in bitmap format. Adobe Photoshop CS6 was used to remove the background of the PNG image. The color features, R, G, B, gray scale, hue (H), saturation (S), brightness (B), L*, a*, and b* (the latter six color features were calculated based on average and modal values of R, G and B), together with the width, length and projected area of a pepper seed were then extracted automatically using software developed by our lab called Seed Identification (Fig. 1). This software processes images and records physical information of seeds, and then outputs all the information in an Excel file. The entire image processing method can be completed in seconds, with a low error rate of less than 2%. Two additional features, seed weight and density, were included in this study because the plumpness of seeds reflects the maturity of seeds, and thus can be indicative of seed vigor or quality. According to previous observations (data not shown), differences in seed weight and density were evident between plump and withered seeds from our pepper seed lot. The weight and density of each pepper seed were obtained by an electronic density balance (FA1104J, Shanghai Sunny Hengping Scientific Instrument Co., Ltd., Shanghai, China). Then, a single-kernel germination test was conducted. The germination experiment was performed to obtain seedling fresh weight, which was a surrogate measure of seed vigor in this study.
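The derivation of the color features from R, G and B can be illustrated with a short Python sketch. It is not the Seed Identification software: it assumes that the features are computed from the mean R, G and B of a seed's foreground pixels only (the modal values mentioned above are not used), that hue, saturation and brightness follow the HSV model, that gray is the common luma-weighted sum, and that L*, a*, b* follow the standard sRGB to CIE L*a*b* conversion with a D65 white point. The function names `srgb_to_lab` and `color_features` are hypothetical.

```python
import colorsys


def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIE L*a*b* (D65 white point, assumed)."""
    def to_linear(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def color_features(pixels):
    """pixels: list of (R, G, B) tuples for one seed's foreground pixels."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    l_star, a_star, b_star = srgb_to_lab(r, g, b)
    return {
        "R": r, "G": g, "B": b,
        "gray": 0.299 * r + 0.587 * g + 0.114 * b,  # assumed luma weights
        "H": h * 360, "S": s, "brightness": v,
        "L*": l_star, "a*": a_star, "b*": b_star,
    }


# Illustrative brownish pixels, not measured data
features = color_features([(150, 100, 50), (150, 100, 50)])
```

A positive a* indicates a shift toward red, which is why a darker, redder seed coat would score higher on this feature.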

Four hundred seed kernels were randomly selected from a seed lot of cultivar 101. These pepper seeds were provided by Beijing Biosow Seed Co., Ltd., China. A standard germination test (ISTA 1996) was conducted on 400 kernels of pepper seeds to determine a baseline comparison after the physical determination of seed features. The original germination percentage was 59.3%.

Fig. 1 Feature extraction of pepper seeds. A, scanned image of pepper seeds. B, pre-processed image of pepper seeds. C, extracting features by Seed Identification software developed by our lab.

2.2. Statistical analysis

All data, including those extracted by Seed Identification and those of weight and density, were analyzed by the statistical analysis software IBM SPSS Statistics 21 and Microsoft Excel 2013. Ten color features (R, G, B, L*, a*, b*, H, S, B, and gray), three geometric features (width, length, and projected area), and seed weight and density were employed to predict the viability of each pepper seed. Correlation analysis between each predictor (each of the 15 features) and fresh weight of seedlings was conducted. First, the two predictors with the highest coefficient of variation and correlation with the fresh weight of seedlings were selected and used in single-predictor models to classify and select pepper seeds. For each single-feature predictor, the pepper seeds were graded according to the principle of removing as many un-germinated seeds as possible. Then, classification was performed by using an artificial neural network, namely a multilayer perceptron. Binary logistic regression (BLR) was also used to build a model predicting the probability of seed germination to improve the selection process for high quality pepper seeds.
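The two screening statistics named above, the correlation with seedling fresh weight and the coefficient of variation, can be sketched in a few lines of Python. The study used SPSS and Excel, so this is only an illustration; it assumes Pearson's correlation and a sample standard deviation, and the numbers are made-up placeholders rather than measured values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean, in percent."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 100 * sd / mean


# Hypothetical single-kernel weights (g) and seedling fresh weights (g)
kernel_weight = [0.0052, 0.0061, 0.0066, 0.0070, 0.0078]
fresh_weight = [0.010, 0.018, 0.021, 0.024, 0.030]

r = pearson_r(kernel_weight, fresh_weight)
cv = coefficient_of_variation(kernel_weight)
```

A feature with both a strong correlation and a large coefficient of variation spreads the seeds over a wide range, which makes it easier to set a grading threshold.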

Feature selection. Color and geometrical analyses have been widely adopted in the classification process of seed quality and were applied in this research. Two categories of features were used as independent variables to select for potential predictors of seed quality and thus, high likelihood of germination: category A, all of the 15 features, and category B, the nine vigor-related features (R, a*, brightness, width, length, projected area, density, single-kernel weight and hue).

Multilayer perceptron network classifier. One of the most common neural network topologies is the multilayer perceptron. Its efficacy has been tested in many classification tasks (Boniecki et al. 2015; Kujawa et al. 2014). The multilayer perceptron (MLP) is a feed-forward neural network whose general architecture includes an input layer, a hidden layer, and an output layer. The MLP network is a function of one or more predictors that minimizes the prediction error of outputs. Fig. 2 shows the MLP topology, similar to that in the research of Kurtulmus et al. (2016), that was used in this study. Previous studies have confirmed that a single hidden layer containing a sufficient number of neurons can satisfactorily predict the most complex problems (Nazghelichi et al. 2011; Omid et al. 2009). In this study, the MLP network with one hidden layer was chosen and implemented in SPSS. Units in the hidden layer used the hyperbolic tangent activation function of SPSS. Units in the output layer used the identity activation function of SPSS.
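The forward pass of such a network can be sketched as follows, as a rough illustration rather than the SPSS implementation. The sketch assumes one hidden layer with hyperbolic tangent units and an identity output layer with one node per class, and it assigns the class with the larger output. The weights are hypothetical placeholders; in practice they are learned during training.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tanh hidden layer followed by an identity output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


def predict_class(outputs):
    """Index of the largest output: 0 = un-germinated, 1 = germinated."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Two standardized inputs and two hidden units, purely for illustration
x = [0.8, -0.3]
w_hidden = [[0.5, -0.2], [0.1, 0.4]]   # hypothetical weights
b_hidden = [0.0, 0.1]
w_out = [[-1.0, 0.3], [1.0, -0.3]]     # one row per output node
b_out = [0.0, 0.0]

label = predict_class(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```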

To avoid overfitting, we chose 50% of the data as the training set and 25% of the data as the test set. A holdout set (25% of the data) was completely excluded from the training process and was used for an independent assessment of the final network.
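A 50/25/25 partition of this kind can be sketched as below. SPSS assigns cases to partitions internally, so the shuffling, the fixed random seed and the function name `three_way_split` are assumptions made for illustration.

```python
import random


def three_way_split(n_seeds, seed=0):
    """Shuffle seed indices and split them 50/25/25 into train/test/holdout."""
    indices = list(range(n_seeds))
    random.Random(seed).shuffle(indices)
    n_train = n_seeds // 2
    n_test = n_seeds // 4
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    holdout = indices[n_train + n_test:]   # never seen during training
    return train, test, holdout


train, test, holdout = three_way_split(400)
```

With 400 kernels this yields 200 training, 100 test and 100 holdout seeds.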

Fig. 2 Multilayer perceptron network topology used in this research. 1, germinated seed; 0, un-germinated seed.

The input layer had 15 nodes, corresponding to the physical features: ten color features, three geometric dimensions, weight and density. Z-score normalization was used to standardize the input features. The output layer consisted of two nodes, one for each category: germinated seed (1) and un-germinated seed (0).
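Z-score normalization rescales each feature to zero mean and unit standard deviation so that features measured in different units (for example, pixel intensities and grams) contribute on a comparable scale. A minimal sketch, assuming the sample standard deviation is used:

```python
import statistics


def z_score(values):
    """Standardize one feature column to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (assumed)
    if sd == 0:
        raise ValueError("feature is constant and cannot be standardized")
    return [(v - mean) / sd for v in values]


standardized = z_score([2.0, 4.0, 6.0])
```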

Predictive model based on binary logistic regression. Binary logistic regression is used to predict the probability of occurrence of an event by fitting data to a logistic curve. When there is a correlation between the independent variables and the dependent variable, binary logistic regression can be used to generate a predictive model using several variables (Boz 2016). Although this approach has rarely been reported for pepper seed selection, we applied this method to generate a model for predicting whether or not the pepper seeds can germinate. In this research, the two categories A and B were respectively used as independent variables, and germinated seed (1) and un-germinated seed (0) were used as the dependent variable. All data were analyzed by the statistical analysis software SPSS. Approximately 50% of the total pepper seeds, including germinated and un-germinated seeds, were used to create the model. The rest were used to validate the model's functionality. Seeds used for the model development and model validation sets were selected at random by Bernoulli variates with a probability parameter of 0.5.
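Selection by Bernoulli variates means that each seed is assigned to the development set independently with probability 0.5, so the two sets are only approximately equal in size. A minimal sketch, with the random seed and function name chosen for illustration:

```python
import random


def bernoulli_split(n_seeds, p=0.5, seed=0):
    """Assign each seed to the development set independently with probability p."""
    rng = random.Random(seed)
    development, validation = [], []
    for i in range(n_seeds):
        if rng.random() < p:
            development.append(i)   # "selected cases": used to fit the model
        else:
            validation.append(i)    # "unselected cases": used to validate it
    return development, validation


development, validation = bernoulli_split(400)
```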

Binary logistic regression was used to explore the relationship between the features of pepper seeds and the probability of these seeds to germinate. In the standard logit form, the probability of germination is related to the variables as follows:

$$\ln\left(\frac{\pi_i}{1-\pi_i}\right) = b_0 + \sum_{j=1}^{k} b_j x_{ij}$$

where $\pi_i$ is the probability that the $i$th pepper seed germinates; $x_{ij}$ is the $j$th variable for the $i$th case; $b_j$ is the coefficient of the $j$th variable; $b_0$ is the intercept; and $k$ is the number of variables.

In this study, the outcome was coded "0" for a seed that did not germinate and "1" for a seed that germinated. This binary outcome was set as the dependent variable in the model, and the two categories of 15 or nine features were selected as independent variables. Z-score normalization was used to standardize the input features. The regression coefficients were estimated through an iterative maximum likelihood method.
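Once the coefficients have been estimated, a fitted logistic model turns a seed's standardized features into a germination probability and then into a 0/1 prediction. The sketch below assumes the standard logistic form with an intercept and a classification cutoff of 0.5; the coefficient values are hypothetical and are not the fitted values from this study.

```python
import math


def germination_probability(features, coefficients, intercept):
    """Logistic model: probability that a seed germinates."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_germination(features, coefficients, intercept, cutoff=0.5):
    """Return 1 (germinated) if the probability reaches the cutoff, else 0."""
    p = germination_probability(features, coefficients, intercept)
    return 1 if p >= cutoff else 0


# Hypothetical coefficients for two standardized features (a*, kernel weight)
coefficients = [0.6, 1.1]
intercept = 0.4
prediction = predict_germination([0.5, 1.2], coefficients, intercept)
```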

Classification tables were generated to evaluate the applicability of the model for estimating whether the pepper seeds germinated or not.

3. Results

3.1. Correlation analysis between pepper seed features and seed vigor

As shown in Table 1, the fresh weight of seedlings was significantly positively correlated with eight physical features, R, a*, B, width, length, projected area, density and single-kernel weight (P<0.01), and was significantly negatively correlated with hue (P<0.01). None of the remaining features had significant correlations with seedling weight. Seed vigor was represented by the fresh weight of seedlings; therefore, features that correlated most strongly with seedling fresh weight were the better predictors of vigor. The coefficients of variation indicated that the selection of pepper seeds based on the single features of a* and single-kernel weight was easier.

3.2. Pepper seed selection by the single seed feature

The coefficients of correlation and the coefficients of variation indicated that pepper seed quality was best classified by a* and single-kernel weight (Table 1), where the germination percentages increased with the increase in grade levels of a* and weight (Table 2). Germination percentage increased from 59.3 to 71.8% with the increasing grades of a* and reached a selection rate of 57.8% when the grade of a*≥3. Germination percentage increased from 59.3 to 79.4% with the increasing grades of single-kernel weight, and the highest selection rate was 76.8% at a grade of ≥0.0064 g. In the verification experiment, 200 pepper seeds from the same seed lot were selected at a*≥3 and weight≥0.0064 g. Results showed that the germination percentages of these seeds were 71.3 and 72.4%, respectively, after selection.
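Selection by a single feature amounts to keeping every seed whose value meets a threshold and then checking how many of the kept seeds germinate. The sketch below assumes that germination percentage is the share of kept seeds that germinated and that selection rate is the share of all germinated seeds that were kept, which is the reading consistent with the arithmetic reported in Section 3.4. The data are made-up placeholders.

```python
def select_by_threshold(values, germinated, threshold):
    """Keep seeds with value >= threshold; return (germination %, selection rate %)."""
    kept = [g for v, g in zip(values, germinated) if v >= threshold]
    if not kept or sum(germinated) == 0:
        raise ValueError("no seeds kept or no germinated seeds in the sample")
    germination_pct = 100 * sum(kept) / len(kept)
    selection_rate = 100 * sum(kept) / sum(germinated)
    return germination_pct, selection_rate


# Hypothetical single-kernel weights (g) and germination outcomes (1 = germinated)
weights = [0.0050, 0.0060, 0.0065, 0.0070, 0.0080]
outcomes = [0, 1, 1, 0, 1]
germination_pct, selection_rate = select_by_threshold(weights, outcomes, 0.0064)
```

Raising the threshold tends to raise the germination percentage while lowering the selection rate, which is the trade-off a single-feature rule has to balance.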

3.3. Multilayer perceptron network classifier

As shown in Table 3, in the test set of category A, 67 seeds were predicted to germinate; however, only 53 of these 67 seeds were classified correctly and 14 were classified incorrectly by the multilayer perceptron network classifier. Therefore, the predicted germination percentage of category A was 79.1% with a predicted selection rate of 90.0%. Similarly for category B, the predicted germination percentage and selection rate of the test set were 79.1 and 89.7%, respectively. The stability rates of these two models were 99.4 and 99.9%, respectively. Stability rate is the ratio of the accuracy rate of the test set to that of the training set, where more similar rates indicate more stable models. The holdout set was used to independently assess the MLP network and the efficacy of these models.

Table 1 Correlation analysis between features and seed vigor

Table 2 Pepper seeds selection based on single seed feature

Table 3 The classification table of multilayer perceptron

3.4. Binary logistic regression

As shown in Table 4, in the selected cases of category A, 101 of the 119 germinated seeds and 57 of the 81 un-germinated seeds were classified correctly. Overall, the germination percentage of category A was 80.8%, obtained by dividing the number of correctly predicted germinated seeds (101) by the total number of seeds predicted to germinate (101+24). The selection rate was 84.9%. Similarly, the germination percentage and selection rate of category B were 80.3 and 85.7%, respectively. Because the model appeared to be "over-fitting", those pepper seeds that were not used to develop the model were used as validation sets for the two categories, and the results are shown under the unselected cases columns in Table 4. The germination percentages of the validation sets for categories A and B were 80.2 and 81.7%, and the selection rates were 82.2 and 83.1%, respectively. The stability rates of the model for categories A and B were 98.1 and 100%, respectively.
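The category A figures can be reproduced from the counts given above. In the sketch below, germination percentage is the share of seeds predicted to germinate that actually germinated, and selection rate is taken to be the share of germinated seeds that the model retained; the latter definition is inferred from the reported numbers (101/119 = 84.9%) rather than stated explicitly. The overall accuracy is computed from the same counts for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Metrics from a 2x2 classification table (positive class = germinated)."""
    return {
        "germination_pct": 100 * tp / (tp + fp),   # precision
        "selection_rate": 100 * tp / (tp + fn),    # recall (inferred definition)
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
    }


# Category A, selected cases: 101 of 119 germinated and 57 of 81 un-germinated
# seeds were classified correctly, so fn = 119 - 101 and fp = 81 - 57.
metrics = classification_metrics(tp=101, fn=18, tn=57, fp=24)
print({name: round(value, 1) for name, value in metrics.items()})
# {'germination_pct': 80.8, 'selection_rate': 84.9, 'accuracy': 79.0}
```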

4. Discussion

4.1. Comparison of several results

As shown in Table 5, for seed selection based on a single predictor, germination percentage increased from 59.3 to 71.8% when a*≥3, with a selection rate of 57.8%. Germination percentage also increased from 59.3 to 79.4%, and the selection rate reached 76.8%, when single-kernel weight≥0.0064 g. Compared with selection based on a single-feature predictor model, selection based on the multilayer perceptron network and binary logistic regression models obtained higher germination percentages while maintaining a high selection rate. Moreover, the stabilities of these models remained high. Thus, it was feasible to select kernels that would likely germinate using these models. A comparison of all the models showed that the multilayer perceptron neural network, with 15 features (ten color features: R, G, B, L*, a*, b*, hue, saturation, brightness, and gray; three geometric features: width, length, and projected area; and seed weight and density) employed as variables to determine seed quality and thus the likelihood of germination of pepper seeds, was the best model because the selection rate was the highest at 90%, germination percentage increased to 79.1%, and the model's stability rate was 99.4%. Similar results were obtained in the validation experiment.

Table 4 The classification table of binary logistic regression

Table 5 Comparison of model results

4.2. Analysis method

In this study, the Seed Identification software was used to extract the physical features of pepper seed. In addition, another two important features, the weight and density of a single kernel, were introduced. Using these features as predictors in a series of selection models (single predictors, multilayer perceptron and binary logistic regression models), we determined that all models had high germination percentages, selection rates, and stability rates. Moreover, the best model was developed by using a multilayer perceptron neural network. Satisfactorily high selection rates for sorting out high quality pepper seeds were achieved. However, classification accuracy was not high enough (usually about 85%), although the results obtained were similar to those from a previous study (Kurtulmus et al. 2016). Thus, our results are promising and encourage more research that focuses on applying seed selection methods based on image processing and classifiers to improve the laborious task of selecting for high quality pepper seeds. Application of these technological advances in the agricultural industry could significantly improve the processing time and selection accuracy to meet the demand of pepper seed production and therefore increase economic profits.

We are optimistic that the multilayer perceptron neural network model, with 15 features chosen as covariates, can be implemented within the agricultural industry. An auto-sorting device for pepper seeds is expected to be constructed, similar to that described in the research of Huang and Cheng (2017). The device will collect images of the physical attributes of seeds, extract those features and then use the proposed algorithms to classify the quality of seeds accurately and efficiently. However, one limitation of the presented work was the slow process of collecting data on the weight and density of each and every seed. Weight and density are two very important features, not only because they were highly correlated with seed vigor (seedling fresh weight), but also because they were important covariates in the classifier and thus cannot be omitted. At present, there is no rapid method for measuring the weight or density of single kernels. Further improvements can be made to the feature extraction methods, and classification accuracy can be increased by using various machine learning classifiers and a variety of features, for example, by including an advanced segmentation algorithm.

5. Conclusion

The selection of pepper seeds based on a single predictor, using each of two features (a* and single-kernel weight), had its advantages and disadvantages: it improved the germination rates, but the selection rates were low. Neither single-feature model could satisfy two important conditions simultaneously: increasing germination percentage while achieving a high selection rate. In order to determine the optimal selection method, this study applied the multilayer perceptron network classifier and established a model based on binary logistic regression to predict whether a seed would germinate. Models developed by multilayer perceptron and binary logistic regression were better predictors of germination of pepper seeds than the single-feature models. Comparisons of all models showed that the multilayer perceptron neural network, with 15 features chosen as covariates, was the best model. Germination percentage rose from the original 59.3 to 79.1%, the selection rate was as high as 90%, and the model stability was 99.4%.

Acknowledgements

This study was supported by the Beijing Municipal Science and Technology Project, China (Z151100001015004).