Efficient model building in active appearance model for rotated face

2013-12-20

Journal of Measurement Science and Instrumentation, 2013, Issue 4

Jaehyun So1, Sanghun Han1, Youngtak Kim1, Hwanik Chung2, Youngjoon Han1

(1. Department of Electronic Engineering, Soongsil University, Seoul 156-743, Korea; 2. Department of Computer Science, Kyungbok University, Pocheon 487-717, Korea)

This paper proposes efficient model building for the active appearance model (AAM) applied to a rotated face. Finding the exact region of a face is generally difficult because faces differ in shape and viewpoint. Unlike the many papers that address the fitting method of AAM, this paper examines how training images should be chosen in the modelling process so that a rotated face can be fitted. To this end, databases of facial rotation and facial expression are selected, and models are built using the Procrustes method and principal component analysis (PCA). These models are applied to fitting methods such as basic AAM fitting, inverse compositional alignment (ICA), project-out ICA, normalization ICA, the robust normalization inverse compositional algorithm (RNIC) and the efficient robust normalization algorithm (ERN). RNIC and ERN can fit the rotated face in images efficiently. The efficiency of model building is checked using image sequences made by ourselves.

active appearance model (AAM); Procrustes alignment; principal component analysis (PCA); inverse compositional alignment (ICA); project-out ICA; normalization ICA; robust normalization inverse compositional algorithm (RNIC); efficient robust normalization algorithm (ERN)

CLC number: TP391.41 Document code: A

Many people want to know what other people think. With the development of computer technology, this problem has been approached through various signals such as images, sound and biometric signals. However, no perfect method exists yet, and the problem has challenged researchers for a long time. It is now mainly addressed with computer vision, because facial expressions can reveal a person's emotions.

In the early days, researchers focused on finding a rectangular face region, as in Viola and Jones's method [1]. However, such a region contains unwanted content such as background information, and if too much of it is present, the facial expression cannot be classified. To remove the background, the active appearance model (AAM) [2] fits a model built from many training images to the face. However, basic AAM fitting depends heavily on the texture error between the current warped image and the mean model, and it is slow because of its complex computation. To overcome these shortcomings, many fitting algorithms have been proposed. Inverse compositional alignment (ICA) [3] minimizes computation by pre-computing the Hessian. Project-out ICA [4] takes appearance variation into account, and normalization fitting [4] is similar to project-out ICA. These methods perform well, but they cannot fit an occluded face efficiently. The robust normalization inverse compositional algorithm (RNIC) [5] was proposed to handle occlusion with a robust error function [6], but it is very slow because the Hessian cannot be pre-computed. The efficient robust normalization algorithm (ERN) [7] solves this problem by pre-computing the Hessian separately for each triangular region produced by Delaunay triangulation. In addition, a nonlinear update model has been proposed [8]. However, if the model is built from poorly chosen images, fitting will fail. Therefore, the selection of training images is very important.

AAM consists of a shape model and an appearance model. Each model has a mean model and eigenvectors. These factors will affect fitting performance of various faces. Thus, we need to learn how the selection of training images affects results.

The rest of the paper is organized as follows. Section 1 describes AAM, including model building and the fitting methods (ICA, project-out ICA, normalization ICA, RNIC and ERN). Section 2 presents the experimental environment and results. Section 3 gives the conclusion.

1 AAM

1.1 Model building

Each training image has facial landmarks annotated manually. Based on these landmarks, the training images are aligned by the Procrustes method [8] as

Eq.(1) gives the landmark points. The Procrustes method brings the shapes to the same position, the same scale and the same rotation. Finally, a mean model and eigenvectors are calculated from the aligned images by principal component analysis (PCA) [1]. The resulting model consists of a shape model and an appearance model, and it can express various faces using the mean model and the eigenvectors.
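As an illustration of the alignment step, the following is a minimal Python sketch, assuming each shape is stored as an (n, 2) NumPy array of landmark coordinates. The function names `align_shape` and `generalized_procrustes` are hypothetical, and the SVD-based similarity fit is one common way to implement Procrustes alignment, not necessarily the implementation used in this paper.

```python
import numpy as np

def align_shape(shape, reference):
    """Fit `shape` to `reference` with translation, scale and rotation.

    Both inputs are (n, 2) arrays of landmark coordinates (assumed layout).
    """
    # Remove translation by centring both shapes on their centroids.
    x = shape - shape.mean(axis=0)
    y = reference - reference.mean(axis=0)
    # The optimal rotation comes from the SVD of the 2x2 cross-covariance.
    u, s, vt = np.linalg.svd(x.T @ y)
    d = np.sign(np.linalg.det(u @ vt))      # -1 would mean a reflection
    flip = np.array([1.0, d])
    rotation = u @ np.diag(flip) @ vt
    # Least-squares scale for the chosen rotation.
    scale = (s * flip).sum() / (x ** 2).sum()
    return scale * x @ rotation + reference.mean(axis=0)

def generalized_procrustes(shapes, iterations=5):
    """Align all shapes to an iteratively re-estimated mean shape."""
    mean = shapes[0] - shapes[0].mean(axis=0)
    mean = mean / np.linalg.norm(mean)
    for _ in range(iterations):
        aligned = np.array([align_shape(s, mean) for s in shapes])
        mean = aligned.mean(axis=0)
        mean = mean - mean.mean(axis=0)
        mean = mean / np.linalg.norm(mean)  # fix the scale of the mean
    return aligned, mean
```

The mean shape is re-normalized after every pass so that the alignment cannot shrink all shapes toward zero.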

Eqs.(2) and (3) are the shape model and the appearance model, respectively. The shape model is expressed as a mean shape S0 plus a linear combination of n shape vectors Si. The appearance model is similar to the shape model. Each vector is calculated by PCA. We assume that the retained eigenvectors capture 95% of the information in all the images. Fig.1 shows the model building process based on the Japanese female facial expression (JAFFE) database.

Fig.1 Flow of model building using JAFFE database
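The PCA step with the 95% criterion can be sketched as follows. This is a minimal example under stated assumptions: each training sample (an aligned shape or a shape-normalized texture) is flattened into one row of a NumPy array, "95% information" is read as 95% of the total variance, and the names `build_linear_model` and `synthesize` are hypothetical.

```python
import numpy as np

def build_linear_model(samples, variance_kept=0.95):
    """Return the mean and the eigenvectors covering `variance_kept` of the variance.

    samples: (num_images, dim) array, one flattened sample per row.
    """
    mean = samples.mean(axis=0)
    centred = samples - mean
    # The rows of vt are the PCA eigenvectors, sorted by decreasing variance.
    _, singular, vt = np.linalg.svd(centred, full_matrices=False)
    variance = singular ** 2
    ratio = np.cumsum(variance) / variance.sum()
    # Smallest number of eigenvectors whose cumulative ratio reaches the target.
    n = int(np.searchsorted(ratio, variance_kept)) + 1
    n = min(n, len(vt))
    return mean, vt[:n]

def synthesize(mean, vectors, params):
    """Mean model plus a linear combination of the eigenvectors."""
    return mean + params @ vectors
```

With a shape model built this way, `synthesize(mean, vectors, params)` corresponds to the mean shape S0 plus a weighted sum of the shape vectors Si.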

1.2 Model fitting

The goal of fitting is to minimize errors between current warped image and mean model, and it can be expressed by

where I(W(X;P)) is the current image warped by the piecewise affine transformation [9].
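For reference, the objective described above is conventionally written as a sum of squared differences over the pixels inside the mean shape. The form below follows the standard inverse compositional AAM literature [3][4]; the symbol A_0 for the mean appearance is an assumed name, and the expression is a sketch of the usual formulation rather than a reproduction of this paper's own equation.

```latex
\min_{\mathbf{P}} \sum_{\mathbf{X} \in S_0}
  \left[ A_0(\mathbf{X}) - I\big(W(\mathbf{X};\mathbf{P})\big) \right]^2
```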

1) ICA

Forward fitting warps the current image to the model, but this approach requires costly computation of the Jacobian and the steepest descent (SD) images at every iteration. ICA solves this problem by reversing the roles of the model and the image.

Eq.(5) is the Taylor series expansion of Eq.(4), and Eq.(6) is the parameter update estimated by the Gauss-Newton method. H is the Hessian computed from the SD images.
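The benefit of pre-computation can be seen in a minimal sketch of the Gauss-Newton update. It assumes the SD images are stacked as the columns of a (num_pixels, num_params) array and that images are flattened vectors; the function names are hypothetical, and the composition of the warp with the inverted incremental warp, which completes an ICA iteration, is not shown.

```python
import numpy as np

def precompute(sd_images):
    """Invert the Gauss-Newton Hessian once, before the iterations start.

    sd_images: (num_pixels, num_params) steepest descent images of the model.
    """
    hessian = sd_images.T @ sd_images
    return np.linalg.inv(hessian)

def parameter_update(sd_images, inv_hessian, warped_image, mean_appearance):
    """One Gauss-Newton step: inv(H) * SD^T * (I(W(X;P)) - A0)."""
    error = warped_image - mean_appearance
    return inv_hessian @ (sd_images.T @ error)
```

Because the SD images and the Hessian depend only on the model, each iteration reduces to one image warp, one subtraction and two small matrix products.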

2) Project-out ICA and normalization ICA

Project-out ICA and normalization fitting take appearance variation into account, which can be described as

The above-mentioned methods use orthonormalized appearance vectors. The Gram-Schmidt process is useful for the orthonormalization.
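A minimal sketch of the orthonormalization and of the project-out operation is given below. It assumes the appearance vectors are the rows of a NumPy array and the SD images are the columns of a (num_pixels, num_params) array; `gram_schmidt` and `project_out` are hypothetical names, and the modified Gram-Schmidt variant is used here for numerical stability.

```python
import numpy as np

def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize the rows of `vectors` (modified Gram-Schmidt)."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        w = v.copy()
        for b in basis:
            w -= (w @ b) * b          # remove the component along b
        norm = np.linalg.norm(w)
        if norm > eps:                # skip linearly dependent vectors
            basis.append(w / norm)
    return np.array(basis)

def project_out(sd_images, appearance_basis):
    """Remove the span of the appearance vectors from every SD image.

    sd_images: (num_pixels, num_params)
    appearance_basis: (num_vectors, num_pixels), orthonormal rows
    """
    return sd_images - appearance_basis.T @ (appearance_basis @ sd_images)
```

After `project_out`, the SD images are orthogonal to every appearance vector, so the shape update is unaffected by changes along the appearance directions.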

3) RNIC and ERN with robust error function

RNIC and ERN apply a robust error function for immunity to outliers.

where σ is the scale parameter, which determines which pixels are treated as outliers.

Fig.2 presents the calculation of the robust error function between the model and the image.

Fig.2 Calculation of robust error function between model and image
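The effect of a robust error function on the update can be sketched with the Talwar function, whose weight is 1 for residuals within the scale parameter and 0 beyond it. The example below is a minimal sketch under stated assumptions: residuals are assumed to be already normalized so that the scale parameter applies directly, the function names are hypothetical, and the appearance normalization step of RNIC and ERN is omitted.

```python
import numpy as np

def talwar_rho(residual, sigma=2.795):
    """Talwar error: quadratic inside the threshold, constant outside."""
    r = np.abs(residual)
    return np.where(r <= sigma, 0.5 * r ** 2, 0.5 * sigma ** 2)

def talwar_weight(residual, sigma=2.795):
    """Per-pixel weight: 1 for inliers, 0 for outliers."""
    return (np.abs(residual) <= sigma).astype(float)

def robust_update(sd_images, error, sigma=2.795):
    """Weighted Gauss-Newton step that ignores outlier pixels.

    sd_images: (num_pixels, num_params); error: (num_pixels,) residual image.
    """
    w = talwar_weight(error, sigma)
    # The weights depend on the current error, so the Hessian has to be
    # rebuilt at every iteration instead of being pre-computed once.
    hessian = sd_images.T @ (w[:, None] * sd_images)
    return np.linalg.solve(hessian, sd_images.T @ (w * error))
```

Rebuilding the weighted Hessian in every iteration is the cost that makes the robust update slow, as noted in the introduction for RNIC.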

2 Experiment for efficient model

The goal of this paper is to learn which images should be used to build an efficient model. We use images from the JAFFE database and the National Cheng Kung University (NCKU) database, as shown in Fig.3. The JAFFE database includes various facial expressions and the NCKU database includes various facial rotations. The test images were made by ourselves.

Fig.3 JAFFE(upper) and NCKU(lower) databases

The role of each model is different. The shape model controls the variation of the facial shape, and the appearance model controls the variation of the texture. If the model is built from too many images, fitting will fail. Therefore, the model should be built from a moderate number of training images. We conduct this experiment to learn how the model affects fitting. Table 1 shows the experimental environment.

Table 1 Experimental environment

We manually annotate 68 facial landmark points and apply project-out ICA, RNIC and ERN as fitting algorithms. The texture is single-channel grayscale. The test images show a rotated face, as shown in Fig.4. The robust error function is the Talwar function with a scale parameter of 2.795 [10].

Fig.4 Tested images about rotated face

We record the point in the test sequence at which fitting fails; a higher number is better. Table 2 shows the results of the failed sequence.

Table 2 Results of failed rotation

If the model is built from more training images, fitting performance improves. However, the results of RNIC differ only slightly, which suggests that a good result can be expected when the training images do not include unusual shapes, whereas fitting fails when there are not enough training images.

3 Conclusion

We conducted an experiment on the rotated face using AAM models. The more training images are used in model building, the better the rotated face is fitted. Since RNIC and ERN give good results with the JAFFE model (108 images, no rotation), better results can be expected in the case of rotation. Furthermore, project-out ICA can be expected to work if the model is built from training images that include rotation. RNIC and ERN are best suited to the occluded face; in the case of occlusion, ICA does not fit.

In the future, we will study the effect of using many appearance vectors and the recognition of facial expressions under rotation and occlusion.

[1] Viola P, Jones M. Robust real-time face detection. International Journal of Computer Vision, 2004, 57(2): 137-154.

[2] Cootes T F, Edwards G J, Taylor C J. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 681-685.

[3] Baker S, Matthews I. Equivalence and efficiency of image alignment algorithms. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2001, 1: 1090-1097.

[4] Baker S, Matthews I. Lucas-Kanade 20 years on: a unifying framework: part 1: the quantity approximated, the warp update rule, and the gradient descent approximation. International Journal of Computer Vision, 2004, 56(3): 221-255.

[5] Baker S, Gross R, Matthews I. Lucas-Kanade 20 years on: a unifying framework: Part 3. Technical Report CMU-RI-TR-03-35, Carnegie Mellon University Robotics Institute, 2003.

[6] Huber P. Robust statistics. John Wiley & Sons, USA, 1981.

[7] Gross R, Matthews I, Baker S. Constructing and fitting active appearance models with occlusion. In: Proceedings of the 1st IEEE Workshop on Face Processing in Video (FPiV), 2004: 1-8.

[8] Saragih J, Goecke R. A nonlinear discriminative approach to AAM fitting. In: Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV 2007), Rio de Janeiro, Brazil, 2007: 1-8.

[9] Cootes T F. Statistical models of appearance for computer vision. [2013-03-21]. http://www.isbe.man.ac.uk/bim/refs.html.

[10] Theobald B, Matthews I, Baker S. Evaluating error functions for robust active appearance models. In: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, 2006: 149-154.

Received date: 2013-09-12

Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2012M3C4A7032182); the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-2006) supervised by the NIPA (National IT Industry Promotion Agency)

Youngjoon Han (young@ssu.ac.kr)

1674-8042(2013)04-0346-03

10.3969/j.issn.1674-8042.2013.04.010
