
Distortion Function for Emoji Image Steganography

2019-06-12 Lina Shi, Zichi Wang, Zhenxing Qian, Nannan Huang, Pauline Puteaux and Xinpeng Zhang

Computers, Materials & Continua, 2019, Issue 6

Lina Shi, Zichi Wang, Zhenxing Qian, Nannan Huang, Pauline Puteaux and Xinpeng Zhang

Abstract: Emoji images are now widely used in social networks. To enable covert communication in emoji images, this paper proposes a distortion function for emoji image steganography. The proposed distortion function takes into account the profile of the image content and the intra- and inter-frame correlation, so as to fit the unique properties of emoji images. The three parts are combined to measure the risk of detection caused by modifying the cover data. With the popular syndrome trellis coding (STC), the distortion of the stego emoji image is minimized under the proposed distortion function. As a result, fewer detectable artifacts are left in the stego images. Experimental results show that the proposed distortion function achieves much higher undetectability than HILL, the current state-of-the-art distortion function, which is designed for natural images.

Keywords: Steganography, emoji image, distortion function.

1 Introduction

Data hiding embeds additional data into digital media without causing serious distortion, so that the usability of the cover object is preserved [Dong, Zhang and Liu (2018); Qian, Xu, Luo et al. (2018); Qian and Zhang (2016)]. Data hiding is usually used for covert communication and copyright protection [Qian, Zhang and Wang (2014); Qian, Zhou, Zhang et al. (2016); Wang, Qian, Zhang et al. (2018)]. To achieve covert communication, steganography embeds secret data into digital media by slightly modifying the cover data without drawing suspicion [Li and Zhang (2019); Wang, Zhang and Yin (2018); Wang, Yin and Zhang (2019)]. Early works [Fridrich and Soukal (2006); Zhang and Wang (2006); Zhang, Zhang and Wang (2008)] increased the undetectability of steganography by decreasing the number of modifications made to the cover data. However, the resulting undetectability is not satisfactory, since the security of steganography also depends on where the modifications are made. The currently popular approach by Fridrich et al. [Fridrich and Filler (2007)] is to minimize the additive distortion between the cover and the stego object, which is achieved by syndrome trellis coding (STC) [Filler, Judas and Fridrich (2011)]. In this framework, a user-defined distortion function assigns an embedding cost to every cover element to quantify the effect of modifying it. Many distortion functions have been designed for spatial images [Holub and Fridrich (2012); Holub, Fridrich and Denemark (2014); Li, Wang, Huang et al. (2014); Sedighi, Cogranne and Fridrich (2016)] and for JPEG images [Guo, Ni and Shi (2014); Guo, Ni, Su et al. (2015); Wang, Zhang and Yin (2016); Wei, Yin, Wang et al. (2018); Du, Yin and Zhang (2018)].

For other kinds of images, new distortion functions should be proposed to fit their unique properties. In the current age of big data, many kinds of digital images [Guan, Zhang, Wu et al. (2019); Wu, Dong, Ota et al. (2018)] have emerged. In particular, emoji images are widely used in social networks (e.g., Twitter and Facebook) and in instant messaging systems (e.g., Skype and WeChat) to express emotion vividly. Unlike a natural image, an emoji image consists of several curves with a legible profile, as shown in Fig. 1. The correlation between pixels therefore differs from that of natural images, which can be modeled as a Markov chain. To save storage space, emoji images are usually stored in a palette format (a typical example is the graphics interchange format, GIF). For vivid expression, most emoji images are animated; that is, each emoji image contains more than one frame. In this case, the correlation between frames should also be considered for steganography. Furthermore, this inter-frame correlation differs from the correlation within natural images, which are motionless.

Figure 1: Several emoji images

Existing distortion functions [Holub and Fridrich (2012); Holub, Fridrich and Denemark (2014); Li, Wang, Huang et al. (2014); Sedighi, Cogranne and Fridrich (2016); Guo, Ni and Shi (2014); Guo, Ni, Su et al. (2015); Wang, Zhang and Yin (2016); Wei, Yin, Wang et al. (2018)] are designed for natural images and aim to confine embedding changes to textured and complex regions in order to conceal the modification traces [Wang, Yin and Zhang (2018)]. Although these distortion functions perform well on natural images, they are not suitable for emoji images, since they do not make sufficient use of the profile. Therefore, it is necessary to develop a customized distortion function for emoji images. To the best of our knowledge, no distortion function has yet been designed for emoji images.

To fill this gap, we propose a distortion function for steganography in emoji images. Unlike existing distortion functions designed for natural images, the proposed distortion function combines the profile of the image content with the intra- and inter-frame correlation to measure the risk of detection caused by modifying the cover data. In this way, the unique properties of emoji images are taken into account. When secret data is embedded with syndrome trellis coding, the resulting stego emoji image exposes fewer detectable artifacts.

2 Structure

The structure of the proposed method is shown in Fig. 2. To fit the properties of emoji images, the profile of the image content, the intra-frame correlation, and the inter-frame correlation are used to form the profile cost, the texture cost, and the variation cost, respectively. The three parts are then combined to measure the risk of detection caused by modifying the cover data.

Figure 2: Structure of the proposed distortion function

2.1 Emoji image

Emoji images are usually stored in a palette format. As shown in Fig. 3, each palette image is composed of a color palette and a color index matrix. The color palette is a list of entries of representative colors in the image, and the elements of the color index matrix are pointers to the palette entries that specify the red-green-blue (RGB) colors [Tzeng, Yang and Tsai (2004)].

Figure 3: Demonstration of the palette format

Since each emoji image contains more than one frame (color index matrix), each emoji image is composed of a color palette and several color index matrices corresponding to the frames. In other words, an emoji image is composed of several palette images that share a single color palette. The color palette is shared by all color index matrices.
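The palette structure described above can be illustrated with a minimal sketch. The three-color palette, the two 3×3 frames, and the helper name `render` are hypothetical toy values chosen for illustration; they are not taken from the experimental data set.

```python
import numpy as np

# Hypothetical toy emoji: one shared palette and two frames (index matrices).
palette = np.array([[255, 255, 255],   # c_0: white background
                    [255, 204, 0],     # c_1: yellow face
                    [0, 0, 0]],        # c_2: black outline
                   dtype=np.uint8)

frames = np.array([[[0, 2, 0],
                    [2, 1, 2],
                    [0, 2, 0]],
                   [[0, 2, 0],
                    [2, 1, 1],
                    [0, 2, 0]]], dtype=np.uint8)   # shape (l, M, N)

def render(frames, palette):
    """Look up every index in the shared palette -> RGB frames (l, M, N, 3)."""
    return palette[frames]

rgb = render(frames, palette)
```

Every frame is rendered through the same palette, so changing an index value changes which palette entry (and hence which RGB color) a pixel displays.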

2.2 Distortion function design

According to the properties of emoji images, a new distortion function is designed. For an emoji image with $k$ colors and $l$ color index matrices, denote the $i$-th index in the color palette as $c_i$, $i\in\{0,\dots,k\}$, and the $j$-th color index matrix of size $M\times N$ as $\mathbf{X}_j=\{x_j(u,v)\}\in\{c_i\}^{M\times N}$, $j\in\{0,\dots,l\}$. The proposed distortion function assigns an embedding cost to each $x_j(u,v)$. The details are as follows.

Denote the RGB color values corresponding to $c_i$ as $R_i$, $G_i$, and $B_i$, respectively. To minimize the color distortion caused by modifying $x_j(u,v)$ during steganography, indices $x_j(u,v)$ with similar values should correspond to similar $(R_i,G_i,B_i)$ values. To achieve this, every $\mathbf{X}_j$ is adjusted using Algorithm 1.

Algorithm 1 Color index matrix adjustment
Input: The $j$-th color index matrix $\mathbf{X}_j$, color palette $c_i$.
Output: Adjusted $\mathbf{X}_j$.
(1) Find all the minimal $x_j(a_0,b_0)$ in $\mathbf{X}_j$, $a_0\in\{1,\dots,M\}$, $b_0\in\{1,\dots,N\}$, then set all $x_j(a_0,b_0)$ as 0;
(2) Calculate $\theta_i=R_i^2+G_i^2+B_i^2$ for all the $M\times N$ $x_j(u,v)$;
(3) Set $w=1$;
(4) Find all $x_j(a_w,b_w)$ with the most similar $\theta_i$ to $x_j(a_{w-1},b_{w-1})$, then set all $x_j(a_w,b_w)$ as $w$;
(5) Repeat step (4) for all the cases of $w>1$ until all the $x_j(u,v)$ in $\mathbf{X}_j$ are adjusted.
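One possible reading of Algorithm 1 is sketched below as a greedy relabelling. This is an illustrative sketch, not the authors' implementation: it assumes that θ is the sum of squared RGB components, that ties are broken by the smaller original index, and that the function name `adjust_indices` and the toy palette are hypothetical.

```python
def adjust_indices(index_matrix, palette):
    """Greedy relabelling in the spirit of Algorithm 1 (illustrative sketch).

    index_matrix: list of rows of palette indices; palette: list of (R, G, B).
    Returns (adjusted_matrix, mapping from old index to new index).
    """
    # ASSUMPTION: theta_i = R_i^2 + G_i^2 + B_i^2.
    theta = [r * r + g * g + b * b for r, g, b in palette]

    remaining = {x for row in index_matrix for x in row}
    current = min(remaining)          # step (1): the minimal index becomes 0
    mapping = {current: 0}
    remaining.remove(current)

    w = 1                             # step (3)
    while remaining:                  # steps (4)-(5)
        # Unused index whose theta is closest to that of the previous one;
        # ties are broken by the smaller original index (an assumption).
        nxt = min(remaining, key=lambda i: (abs(theta[i] - theta[current]), i))
        mapping[nxt] = w
        remaining.remove(nxt)
        current = nxt
        w += 1

    adjusted = [[mapping[x] for x in row] for row in index_matrix]
    return adjusted, mapping

# Hypothetical palette in which neighbouring indices have very different colors.
palette = [(0, 0, 0), (250, 250, 250), (10, 10, 10), (240, 240, 240)]
X = [[0, 1, 2],
     [3, 0, 1]]
adjusted, mapping = adjust_indices(X, palette)
```

After relabelling, indices that differ by 1 point to colors with similar θ, so a ±1 change of an index causes a small color change. In a real palette file the palette entries would presumably have to be permuted with the same mapping so that the displayed image stays unchanged.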

The first part of the proposed distortion function exploits the intra-frame correlation. This purpose is similar to that of distortion functions for natural images. The current state-of-the-art distortion function is HILL, which consists of a high-pass filter and two low-pass filters that make the modifications clustered. We employ the approach of HILL to assign the texture cost $\rho_j^T(u,v)$ to each $x_j(u,v)$. The details are as follows.

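Let $\mathbf{F}_h$ be a high-pass filter. The residuals $\mathbf{R}_j$ of $\mathbf{X}_j$ are first calculated using Eq. (1),

$$\mathbf{R}_j=\mathbf{X}_j\otimes\mathbf{F}_h \tag{1}$$

where

$$\mathbf{F}_h=\begin{bmatrix}-1 & 2 & -1\\ 2 & -4 & 2\\ -1 & 2 & -1\end{bmatrix} \tag{2}$$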

Pixels outside the image boundary are obtained by symmetric padding. For example, $x_j(u+1,v)$ is obtained by copying $x_j(u-1,v)$ when it lies outside the image boundary, and vice versa. Then two low-pass filters $\mathbf{F}_{l1}$ and $\mathbf{F}_{l2}$ are employed to obtain the HILL cost matrix $\boldsymbol{\rho}_j^H=\{\rho_j^H(u,v)\}^{M\times N}$ [Li, Wang, Huang et al. (2014)] using Eq. (3),

$$\boldsymbol{\rho}_j^H=\frac{1}{|\mathbf{R}_j|\otimes\mathbf{F}_{l1}}\otimes\mathbf{F}_{l2} \tag{3}$$

where $|\cdot|$ denotes the absolute value operation, and $\mathbf{F}_{l1}$ and $\mathbf{F}_{l2}$ are average filters of size 3×3 and 15×15, respectively.
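The HILL-style cost described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the 3×3 high-pass kernel is the KB kernel used in the HILL paper, the small constant `eps` is an implementation guard against division by zero that is not part of the text, and the names `filter2d` and `hill_cost` as well as the synthetic test frame are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter2d(img, kernel):
    """Same-size 2-D filtering with mirror padding (all kernels here are symmetric)."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="reflect")   # x(u+1) mirrors x(u-1) at the border
    windows = sliding_window_view(padded, kernel.shape)
    return (windows * kernel).sum(axis=(-2, -1))

def hill_cost(frame, eps=1e-10):
    """HILL-style cost: high-pass residual, then 3x3 and 15x15 averaging."""
    f_h = np.array([[-1, 2, -1],
                    [2, -4, 2],
                    [-1, 2, -1]], dtype=float)   # KB high-pass kernel (assumed)
    f_l1 = np.full((3, 3), 1 / 9)                # 3x3 average filter
    f_l2 = np.full((15, 15), 1 / 225)            # 15x15 average filter
    residual = filter2d(frame.astype(float), f_h)
    smoothed = filter2d(np.abs(residual), f_l1)
    # eps avoids division by zero in perfectly flat regions (assumption).
    return filter2d(1.0 / (smoothed + eps), f_l2)

rng = np.random.default_rng(0)
frame = np.zeros((64, 64))
frame[:, 32:] = rng.integers(0, 256, size=(64, 32))   # flat left half, busy right half
cost = hill_cost(frame)
```

In this synthetic frame the flat left half receives a very large cost and the busy right half a small one, which is the clustering behaviour that the two low-pass filters are meant to produce.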

Denote the color distortion caused by modifying $x_j(u,v)$ into $x_j(u,v)+1$ as $\delta_j(u,v)$, as shown in Eq. (4), where $R_j(u,v)$, $G_j(u,v)$, $B_j(u,v)$ and $R_j^+(u,v)$, $G_j^+(u,v)$, $B_j^+(u,v)$ are the RGB color values corresponding to $x_j(u,v)$ and $x_j(u,v)+1$, respectively.

Thus, the texture cost $\rho_j^T(u,v)$ for each $x_j(u,v)$ is defined in Eq. (5), where the cost $\rho_j^H(u,v)$ is calculated using Eq. (3) and is the same as the cost in HILL.

To extract the content profile of frame $\mathbf{X}_j$, the frame is first transformed into a grayscale image by replacing each $x_j(u,v)$ with the corresponding luminance value. The grayscale image is then transformed into a binary image $\mathbf{Y}_j=\{y_j(u,v)\}^{M\times N}$, so that each element of $\mathbf{Y}_j$ is 0 or 1. The content profile of $\mathbf{X}_j$ consists of the locations corresponding to “0” in $\mathbf{Y}_j$.
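The profile extraction step can be sketched as follows. The text does not state which luminance formula or binarization threshold is used, so this sketch assumes the ITU-R BT.601 luma weights and a fixed threshold of 128; the function name `content_profile` and the toy palette are likewise hypothetical.

```python
import numpy as np

def content_profile(index_matrix, palette, threshold=128):
    """Return binary Y_j: 0 marks the content profile, 1 marks the background."""
    rgb = palette[index_matrix].astype(float)            # (M, N, 3)
    # BT.601 luma weights (an assumed reading of "luminance value").
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # The fixed threshold is an assumption; bright pixels become 1.
    return (gray >= threshold).astype(np.uint8)

palette = np.array([[255, 255, 255],   # white
                    [255, 204, 0],     # yellow
                    [0, 0, 0]],        # black outline
                   dtype=np.uint8)
frame = np.array([[0, 2, 0],
                  [2, 1, 2],
                  [0, 2, 0]])
Y = content_profile(frame, palette)
```

With these toy values only the dark outline pixels are marked “0”, that is, only they belong to the profile. A different threshold would change which colors count as profile, so the threshold would need to be tuned on real emoji images.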

The locations corresponding to “1” in $\mathbf{Y}_j$ belong to the background of $\mathbf{X}_j$, which is so smooth that any modification would be discovered. Therefore, these locations are not suitable for steganography. Accordingly, the profile cost $\boldsymbol{\rho}_j^P=\{\rho_j^P(u,v)\}^{M\times N}$ for each $x_j(u,v)$ is obtained using Eq. (6).

To utilize the inter-frame correlation among different $\mathbf{X}_j$, we consider the color difference between $\mathbf{X}_j$ and $\mathbf{X}_{j-1}$. Both $\mathbf{X}_j$ and $\mathbf{X}_{j-1}$ are transformed into grayscale images to calculate the color difference. To keep the subscript $j-1$ within range, no data is embedded in the first frame $\mathbf{X}_1$; that is, the first frame is kept unchanged during data embedding. Denote the $(u,v)$-th pixels of the grayscale $\mathbf{X}_j$ and $\mathbf{X}_{j-1}$ as $p_j(u,v)$ and $p_{j-1}(u,v)$, respectively. The color difference $\mathbf{D}_j=\{d_j(u,v)\}$ between $\mathbf{X}_j$ and $\mathbf{X}_{j-1}$ is given by Eq. (7).
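A minimal sketch of the inter-frame difference is given below. It assumes that the color difference is the absolute difference of grayscale values between consecutive frames; the exact form of Eq. (7) is not reproduced here, and the function name `frame_differences` and the toy frames are hypothetical.

```python
import numpy as np

def frame_differences(gray_frames):
    """Difference between each frame and its predecessor.

    ASSUMPTION: d_j(u, v) = |p_j(u, v) - p_{j-1}(u, v)|.
    The first frame has no predecessor, so l frames yield l - 1 difference maps.
    """
    g = np.asarray(gray_frames, dtype=float)
    return np.abs(g[1:] - g[:-1])

gray = [[[10, 10], [200, 200]],
        [[10, 12], [200, 180]],
        [[10, 12], [100, 180]]]
D = frame_differences(gray)
```

Pixels that do not move between frames give a difference of zero, which is where the variation cost is intended to discourage modifications.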

Then the variation cost $\boldsymbol{\rho}_j^V=\{\rho_j^V(u,v)\}^{M\times N}$ for each $x_j(u,v)$ is defined in Eq. (8) to reduce the modifications on a frame that is similar to the previous frame.

where the values “1.3” and “15” are empirically determined by experiments.

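Finally, the three parts (texture cost $\rho_j^T(u,v)$, profile cost $\rho_j^P(u,v)$, and variation cost $\rho_j^V(u,v)$) are combined by multiplication. The final embedding cost $\rho_j(u,v)$ assigned to $x_j(u,v)$ is defined in Eq. (9).

$$\rho_j(u,v)=\rho_j^T(u,v)\cdot\rho_j^P(u,v)\cdot\rho_j^V(u,v) \tag{9}$$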
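The multiplicative combination can be illustrated with a short sketch. The three 2×2 cost maps and the large background cost `WET` are hypothetical values chosen for illustration only; they do not come from Eqs. (5), (6), or (8).

```python
import numpy as np

def combine_costs(rho_t, rho_p, rho_v):
    """Element-wise product of texture, profile, and variation costs."""
    return rho_t * rho_p * rho_v

WET = 1e10   # hypothetical very large cost for a location that must not change

rho_t = np.array([[1.0, 2.0], [0.5, 4.0]])   # texture cost (toy values)
rho_p = np.array([[1.0, WET], [1.0, 1.0]])   # profile cost (toy values)
rho_v = np.array([[1.0, 1.0], [2.0, 0.5]])   # variation cost (toy values)

rho = combine_costs(rho_t, rho_p, rho_v)
```

Because the parts are multiplied, a location that is expensive in any one part stays expensive overall, so a modification is favoured only where all three parts agree that it is cheap.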

3 Experimental results

Several experiments are conducted to verify the effectiveness of the proposed distortion function. First, we describe the experimental setup. Next, we analyze the quality of the stego images. Finally, we report the undetectability results in comparison with HILL.

3.1 Experimental setup

To build the cover image set, we collected 560 emoji images that are widely used in social networks. These images are in palette format, and each image contains 256 colors and several frames. The 560 images contain 2557 frames in total. We have uploaded all 560 images to https://pan.baidu.com/s/1nOsn_eoI8vLpgqo8ue8nOQ.

We compare the proposed distortion function with the popular distortion function HILL, which achieves state-of-the-art undetectability. Since HILL is designed for spatial images, each frame is first transformed into a grayscale image when embedding with HILL. In other words, for HILL embedding, 2557 grayscale images are used as covers. The capacity of secret data embedded in each frame is set to 600, 700, 800, 900, 1000, and 1100 bits, respectively. All embedding tasks are done with the embedding simulator [Pevný, Filler and Bas (2010)], since it is widely used to simulate optimal embedding. For steganalysis, the feature sets SPAM proposed by Pevný et al. [Pevný, Bas and Fridrich (2010)] and SRMQ1 proposed by Fridrich et al. [Fridrich and Kodovsky (2012)] are employed in our experiments. The ensemble classifier proposed by Kodovsky et al. [Kodovsky, Fridrich and Holub (2012)] is used to measure the performance of the feature sets. In detail, half of the cover and stego feature sets are used as the training set, while the remaining half are used as the testing set. The criterion for evaluating the performance of the feature sets is the minimal total error $P_E$ under equal priors achieved on the testing set, as in Kodovsky et al. [Kodovsky, Fridrich and Holub (2012)]:

$$P_E=\min_{P_{FA}}\frac{1}{2}\left(P_{FA}+P_{MD}\right)$$

where $P_{FA}$ is the false alarm rate and $P_{MD}$ is the missed detection rate. The performance is evaluated using the average value of $P_E$ over ten random tests.
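The error criterion can be sketched in a few lines. This is a simplified illustration that sweeps a decision threshold over scalar detector scores; it is not the ensemble classifier itself, and the function name `minimal_total_error` and the example scores are hypothetical.

```python
def minimal_total_error(cover_scores, stego_scores):
    """P_E = min over thresholds of (P_FA + P_MD) / 2, assuming equal priors.

    A sample is called 'stego' when its score exceeds the threshold.
    """
    thresholds = sorted(set(cover_scores) | set(stego_scores))
    # Also allow a threshold below every score ("everything is stego").
    thresholds = [thresholds[0] - 1.0] + thresholds
    best = 1.0
    for t in thresholds:
        p_fa = sum(s > t for s in cover_scores) / len(cover_scores)    # false alarms
        p_md = sum(s <= t for s in stego_scores) / len(stego_scores)   # missed detections
        best = min(best, (p_fa + p_md) / 2)
    return best

# Hypothetical detector scores for four cover and four stego frames.
cover = [-1.2, -0.4, 0.1, -0.9]
stego = [0.3, -0.5, 0.8, 1.1]
p_e = minimal_total_error(cover, stego)
```

A value of $P_E$ close to 0.5 means that the detector does no better than random guessing, so a larger $P_E$ indicates higher undetectability.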

3.2 Image quality demonstrations

The demonstrations of the proposed method are shown in Fig. 4, where Fig. 4(a) is a cover emoji image composed of several frames. After 600, 800, and 1000 bits are embedded in each frame, the resulting stego images are shown in Figs. 4(b), 4(c), and 4(d), respectively.

The stego images are visually close to the cover image, which means that the visual quality of the stego images is satisfactory regardless of the capacity. Thus, the usability of emoji images is preserved after adequate secret data is embedded using the proposed method.

Figure 4: Demonstrations of (a) cover and corresponding stego emoji images using the proposed method with capacity (b) 600 bits, (c) 800 bits, (d) 1000 bits

3.3 Undetectability comparison

Fig. 5 shows the undetectability comparison of HILL and the proposed method against SPAM and SRMQ1, tested on all 2557 frames, and Tab. 1 lists the numerical values.

Figure 5: Comparisons of the proposed method with HILL against (a) SPAM, (b) SRMQ1

Table 1: Testing errors of the proposed method and HILL against SPAM and SRMQ1

The security performance of the proposed method is much better than that of HILL in all cases, regardless of the steganalytic tool and the capacity. Specifically, the $P_E$ values of the proposed method are more than twice those of HILL. For large capacities, e.g., 900, 1000, and 1100 bits, the $P_E$ values of the proposed method are nearly three times those of HILL. This large improvement in undetectability arises because the proposed distortion function is designed around the unique properties of emoji images, whereas HILL is not. In addition, inter-frame correlation is the most distinctive property of motion images, as mentioned in Usui et al. [Usui, Takano and Yamamoto (2017)]. Steganographic modifications should avoid destroying this correlation as far as possible. The correlation is reflected in the difference between image frames. For this reason, we also give the undetectability comparison on difference images in Fig. 6 to further demonstrate the superiority of the proposed method. Tab. 2 lists the corresponding numerical values.

Figure 6: Comparisons on difference image against (a) SPAM, (b) SRMQ1

Table 2: Testing errors tested on difference image against SPAM and SRMQ1

Since the first frame is kept unchanged during data embedding, as mentioned in Subsection 2.2, the difference images are obtained by calculating the differences between the first frame and each of the other frames in every cover and stego emoji image. As shown in Fig. 6, the undetectability of the proposed method tested on the difference images is still better than that of HILL.

4 Conclusion

A distortion function for emoji image steganography is proposed in this paper. To fit the properties of emoji images, the profile of the image content and the intra- and inter-frame correlation are considered in the proposed distortion function. The three parts are combined by multiplication to resist steganalysis. Experimental results demonstrate the effectiveness of the proposed distortion function. For further study, it would be valuable to develop theoretically optimal embedding for emoji images by incorporating steganographic methods for palette images.

Acknowledgement: This work was supported by the Natural Science Foundation of China (U1736213, 61572308), the Natural Science Foundation of Shanghai (18ZR1427500), the Shanghai Dawn Scholar Plan (14SG36), and the Shanghai Excellent Academic Leader Plan (16XD1401200).