A Behavioral Authentication Method for Mobile Based on Browsing Behaviors

2020-11-05 Dongxiang Chen, Zhijun Ding, Chungang Yan, and Mimi Wang

IEEE/CAA Journal of Automatica Sinica, 2020, No. 6


Abstract—The passwords used to unlock mobile devices are relatively simple and easy to steal, which causes serious potential security problems. An important research direction in identity authentication is to establish user behavior models to authenticate users. In this paper, a browsing-behavior authentication system architecture for mobile terminal APPs that synthesizes multiple factors is designed. This architecture is suitable for users who use mobile terminal APPs in daily life. It includes data acquisition, data processing, feature extraction, and sub model training. We can use this architecture for continuous authentication while the user uses an APP on a mobile terminal.

I. Introduction

iiMedia Research statistics predict that in 2018, China's mobile payment volume will reach 5.737 trillion yuan, 74 percent of the total electronic payment volume, becoming an important force promoting the development of the Internet economy [1], [2]. However, there are potential threats. As mobile devices become the most important platform of the Internet economy, ensuring the safety of users' accounts on mobile devices becomes an urgent need. To provide a better user experience, mobile APPs remember account passwords. Once the device is stolen, an illegal user can use all the accounts in all APPs, which results in great security risks. There are two main kinds of authentication methods: one-time authentication and continuous authentication. One-time authentication adds an unlocking step every time the device is awakened. In the unlocking step, there are three main types of authentication techniques. The first is password authentication, such as a password or an unlock pattern. The drawback is that complex passwords are inconvenient for users, while simple passwords are easily stolen. The second is biometric authentication, such as fingerprint recognition [3]–[7], face recognition [8]–[11], and iris recognition [12]. The drawback is that it is easily affected by the physical environment; for example, face recognition cannot work in the dark. More seriously, biometric features are easily stolen: a complete fingerprint may be obtained directly from the screen of a mobile phone, and complete facial features can be obtained directly from social network sites. Some researchers [13]–[15] propose unlocking authentication methods based on users' behavior data. However, because different ways of unlocking mobile devices coexist, illegal users can bypass behavioral authentication by cracking the biometric authentication. Most seriously, many applications, such as e-book readers and video players, keep the mobile phone unlocked while they are running. If the mobile phone is stolen during such a period, one-time authentication will not provide any security protection. Therefore, we need a continuous authentication method that keeps accounts secure during the whole process of using the mobile device. An important kind of continuous authentication is based on the user's behavior data. Because a user's behavior is hard to imitate, it can effectively keep the mobile device and the accounts of all APPs secure.

The authentication method proposed in this paper is a continuous authentication method, which keeps the device secure throughout the whole process of using the mobile device. By analyzing the user's behavior data, we construct a behavior model of the legal user and use it to keep the mobile device secure. In the model construction stage, the device sends the original data to the server; feature extraction, model construction, and model saving are all completed on the server. In the authentication stage, the device sends the original data to the server, where the user model calculates the feature vectors and authenticates the identity of the user continuously and in real time. Illegal user processing is also done by the server. This paper proposes a new continuous behavioral authentication method for mobile devices. The method is based on the data of users' browsing behavior in an APP and synthesizes three factors, namely the external environment, screen-sliding behavior, and browsing habits. We consider not only the habits of screen-sliding operations when users use mobile devices, but also the environments in which users use mobile devices, as well as users' habits of browsing certain kinds of content in each environment. The method uses three different sub models to characterize user behavior, aiming to build a more detailed user behavior model and to obtain more flexible adjustment space in practical applications, so as to achieve better results. The contributions of this paper are as follows:

1) We propose an authentication method that can be embedded in the background of any mobile device APP and does not affect the user's normal use of the APP.

2) The external environment model is constructed from the external environment data (such as light intensity, sound decibels, tilt angle, and acceleration) collected while users use the APP.

3) We propose a comprehensive model that synthesizes all the sub models, together with a complete authentication algorithm that effectively protects user account safety and device security.

II. Related Work

Several researchers have proposed continuous behavioral authentication methods for mobile devices. Feng et al. [16] propose a continuous authentication method based on users' gestures. This method achieves good results, but it needs extra custom-built devices, which is a great obstacle to its large-scale adoption. Feng et al. [17] propose a continuous authentication method based on users' typing on the virtual keyboard. However, when users use mobile devices, they mostly use gestures such as sliding and clicking rather than the virtual keyboard, so the method cannot guarantee the security of the device when the user has no text to input. Frank et al. [18] propose a continuous authentication method based on the user's interaction with the screen of a smartphone, taking pressure, contact area, velocity, and other factors into consideration. The model can be constructed with either the KNN [19] or the SVM [20] algorithm, and good authentication results are achieved. However, the experimental data are collected with a dedicated collector rather than in the user's ordinary daily life. In addition, the method does not take the external environment into consideration, although in daily life a user's behavior is easily affected by the external environment. Shen et al. [21] also propose a continuous authentication method based on the user's touch behavior on the screen of a smartphone. The model is constructed with a one-class SVM [22], so it can be built with only positive samples and no negative samples. They do not take the content the user browses or the external environment into consideration, both of which can affect users' behaviors. Sagave and Chaugule [23] propose a continuous authentication method using several gestures. Besides not considering the external environment, most of the gestures they use do not happen frequently. Feng et al. [24] propose a continuous authentication method using several gestures that also takes the application context into consideration: they authenticate gestures with the gesture data and the application in which the gestures happen. However, in new Android systems, an application's gesture data cannot be obtained by other applications, so it is impossible for a listening program to simultaneously obtain the name of the application the user is currently using and the gesture data used to operate it. Xu et al. [25] propose a continuous authentication method using gestures, pinches, and handwriting. The problem is that the method pays too much attention to the user's muscular behavior characteristics, which are easily affected by the external environment and incidental events. Roy et al. [26] propose a continuous authentication method based on hidden Markov models using tap and slide operation data. They take the pressure of taps and slides into consideration, but on current mobile phones equipped with capacitive touch screens this value cannot be obtained. They also do not take the external environment's effect on users' behaviors into consideration.

Compared with the existing methods [16], [17], our method does not require extra devices and is based on high-frequency behavior. At the same time, the data we need can be obtained without barriers on current popular devices. Users carry their mobile phones with them, so the external environments can be of different kinds, which affects user behavior at the data level. Compared with the existing methods [18], [21], [23], our method effectively combines the external environment with users' behavior data. Compared with the existing method [24], our method can be used without being restricted by the operating system. Compared with the existing methods [25], [26], our method abandons more uncertain factors and adopts a model and strategy that maintain a low false positive rate. In this paper, the authentication method is designed to synthesize multiple factors effectively, retain the personalization of users, and reduce incidental impact.

III. Data Acquisition and Feature Extraction

To obtain the required experimental data and verify our authentication method, we developed a news APP using the WebView library and the news API of Extreme Speed Data, with data acquisition program fragments embedded in its background. Users can browse all kinds of real-time news through it, and their behavior data while browsing the news APP are recorded in daily life without any constraints or restrictions. The display interfaces for all kinds of news content are the same. The interface and news content of the whole application are in Chinese. The interface of the APP is shown in Fig. 1.

Fig. 1. The interface of the news APP we developed for data acquisition.

TABLE I Sensor Data Vector

The external environment model, the screen-sliding behavior model, and the browsing behavior model need the following two kinds of original data: sensor data and screen-sliding data. The sensor data are collected every 5 seconds, and the sensor data vector is shown in Table I.

The screen-sliding data are collected every time the user performs a screen-sliding operation. Within a screen-sliding trajectory, the data vector of each contact point is shown in Table II.

TABLE II Screen-Sliding Data Vector

The feature vector of the browsing behavior model is first initialized to a 0 vector. Whenever the environment class or the user's browsing section changes, it is reset to a 0 vector. While the environment class of the device and the section the user is browsing remain unchanged, every new upward slide on the screen updates the browsing behavior model feature vector iteratively with the data of that slide. The browsing behavior model feature vector is shown in Table V.

IV. Model Construction and Authentication Method

A. The Construction of the Comprehensive Model (CM)

The CM proposed in this paper synthesizes three sub models, namely the external environment model (EEM), the screen-sliding behavior model (SSBM), and the browsing behavior model (BBM). The EEM describes the kinds of environment in which the user uses the APP. The SSBM is based on the EEM: the data used to train the SSBM are classified according to the EEM center vectors, and the SSBM describes the areas on which the trajectories of sliding operations concentrate in a given kind of environment. The BBM is based on the EEM and the sections: the data used to train the BBM are classified according to the EEM center vectors and the sections, and the BBM describes how the user browses the content of each section in a given kind of external environment. The construction of the CM is shown in Fig. 2.

The sensor data of a session are sorted by time stamp into a sequence of sensor feature vectors, and the screen-sliding data are sorted by time stamp into a screen-sliding vector sequence. In the construction stage of the EEM, only the sensor feature vector sequence is scanned, and the EEM feature vector sequence is generated without processing the screen-sliding data. The EEM feature vectors of all the sessions are merged into the training data set. Once constructed, the EEM and the EEM feature vector sequences of all the sessions form the basis for constructing the SSBM and the BBM. The formalized representation of the EEM construction process is shown in Fig. 3.

After the EEM is constructed, the EEM feature vector sequences of all the sessions are generated. In the construction stage of the SSBM, the EEM feature vector sequence and the screen-sliding vector sequence are scanned synchronously. Every time a screen-sliding vector is scanned, an SSBM feature vector is generated, the current EEM feature vector is obtained from the EEM feature vector sequence, and from it the current EEM center vector is obtained. Each SSBM feature vector is added to the training data set of its current EEM center vector. After all the EEM feature vector sequences have been scanned, the SSBM is obtained by training on the data set of each EEM center vector. The formalized representation of the SSBM construction process is shown in Fig. 4.
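The synchronous scan can be pictured as a two-pointer merge over the two time-sorted sequences. This is a minimal sketch, assuming each sequence is a list of (timestamp, vector) pairs and that the "current" EEM feature vector for a slide is the most recent one emitted at or before the slide's time stamp; the function name `pair_with_current_eem` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]

def pair_with_current_eem(
    eem_seq: Sequence[Tuple[float, Vector]],
    slide_seq: Sequence[Tuple[float, Vector]],
) -> List[Tuple[Vector, Optional[Vector]]]:
    """Scan both time-sorted sequences together (a two-pointer merge).

    Each element is a (timestamp, vector) pair. For every screen-sliding vector,
    the "current" EEM feature vector is the latest one whose timestamp is not
    after the slide; None means no EEM vector had been emitted yet.
    """
    pairs: List[Tuple[Vector, Optional[Vector]]] = []
    j = -1  # index of the latest EEM vector at or before the current slide
    for ts, slide_vec in slide_seq:
        while j + 1 < len(eem_seq) and eem_seq[j + 1][0] <= ts:
            j += 1
        current = eem_seq[j][1] if j >= 0 else None
        pairs.append((slide_vec, current))
    return pairs
```

Slides paired with None fall inside the EEM warm-up and would simply be skipped; for every other slide, the paired EEM vector is normalized and mapped to its closest EEM center to choose the training set.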

TABLE III External Environment Model Feature Vector

TABLE IV Screen-Sliding Behavior Model Feature Vector

TABLE V Browsing Behavior Model Feature Vector

Fig. 2. The construction of the CM.

Analogously, in the construction stage of the BBM, the EEM feature vector sequence and the screen-sliding vector sequence are scanned synchronously. Every time a screen-sliding vector is scanned, the section where the operation occurred is extracted and the current EEM center vector is obtained. The BBM feature vector updated with the screen-sliding vector is added to the data set of the current section and the current EEM center vector. The BBM is obtained by training on the data set of each pair of section and EEM center vector. The formalized representation of the BBM construction process is shown in Fig. 5.

Fig. 3. The formalized representation of the EEM construction process.

Fig. 4. The formalized representation of the SSBM construction process.

Fig. 5. The formalized representation of the BBM construction process.

In the EEM, the vector of maximums and the vector of minimums are stored and used to normalize the EEM feature vectors. The EEM objects are keyed by EEM feature vector centers and are used to authenticate the EEM feature vectors. In the SSBM, the SSBM objects are keyed by an EEM feature vector center and an SSBM feature center. The EEM feature vector center represents the kind of environment the device is in when the screen-sliding operation happens, and the SSBM feature center represents the cluster center of the sliding trajectories. An SSBM object authenticates an SSBM feature vector using the kind of environment the device is in, the center closest to the sliding trajectory, and the corresponding classifier. In the BBM, the BBM objects are keyed by an EEM feature vector center and a section. The EEM feature vector center represents the kind of environment the device is in while the browsing behavior is in progress, and the section represents the section in which the browsing behavior takes place. A BBM object authenticates a BBM feature vector using the kind of environment the device is in and the corresponding classifier.

In this paper, the whole process from the user opening the APP until the application is closed is called an APP session. The model is built from the behavior data in each session, and the training data set is divided by session. In the model building stage, the EEM is first established from the training set, and then the SSBM and the BBM are established.

B. The Construction Method of EEM

For each session, we first initialize the EEM feature vector to a 0 vector, and obtain each new EEM feature vector by updating the current one with the latest sensor data vector.

Since we do not know in advance how many kinds of environments the user will use the APP in, we first cluster the EEM feature vectors. Popular clustering algorithms include DBSCAN [27], [28], PAM [29], SPEA [30], CLARA [31], and K-means [32], [33]. K-means is intuitive and efficient, and we can set a parameter range and run the clustering multiple times to select an optimal result. We therefore use the K-means algorithm [32], [33] to cluster all the EEM feature vectors, then calculate the mean center vector of each cluster and use the data of each cluster to train the classifier of that cluster. Because the components of the EEM feature vectors have different value ranges and degrees of dispersion, all components need to be normalized so that each component has the same weight in the clustering results. The method of normalization is as follows.
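The normalization formula itself did not survive extraction. Because Algorithm 1 normalizes with the component-wise minimum vector minEnvR and maximum vector maxEnvR, per-component min-max scaling, x' = (x − min)/(max − min), is the natural reading; the sketch below assumes that, and the function names are hypothetical.

```python
import numpy as np

def fit_min_max(vectors):
    """Component-wise minimum and maximum vectors (minEnvR and maxEnvR)."""
    vectors = np.asarray(vectors, dtype=float)
    return vectors.min(axis=0), vectors.max(axis=0)

def min_max_normalize(vectors, min_vec, max_vec) -> np.ndarray:
    """Map each component's training range [min, max] onto [0, 1].

    Components that were constant in the training data (max == min) are
    mapped to 0 instead of dividing by zero. Vectors seen later, at
    authentication time, may fall slightly outside [0, 1].
    """
    span = np.where(max_vec - min_vec == 0, 1.0, max_vec - min_vec)
    return (np.asarray(vectors, dtype=float) - min_vec) / span
```

The same stored min/max vectors are reused at authentication time, so a new vector is scaled exactly as the training vectors were.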

In addition, the K-means algorithm requires the value of K to be specified, but we do not know the value of K in
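Algorithms 1 and 2 choose K by running K-means for several candidate values and keeping the result with the highest mean silhouette coefficient. A minimal sketch with scikit-learn follows; the candidate range [2, 6] is taken from Algorithm 2 (the range used in Algorithm 1 is not recoverable from the extracted text), and `n_init=10`, `random_state`, and the name `best_kmeans` are our own choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(X, k_values=range(2, 7), seed=0):
    """Run K-means for each candidate k and keep the clustering whose mean
    silhouette coefficient is highest. range(2, 7) covers k in [2, 6]."""
    X = np.asarray(X, dtype=float)
    best_k, best_coef, best_labels = None, -np.inf, None
    for k in k_values:
        if not 2 <= k < len(X):  # the silhouette is defined only for 2 <= k <= n - 1
            continue
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        coef = silhouette_score(X, labels)  # mean silhouette over all vectors
        if coef > best_coef:
            best_k, best_coef, best_labels = k, coef, labels
    return best_k, best_labels
```

The silhouette coefficient rewards clusters that are tight and well separated, so it penalizes both merging distinct environments (k too small) and splitting one environment (k too large).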

For each cluster, we need to train a classifier. There are many popular machine learning classification algorithms, such as LDA [35], Logistic Regression [36], SVM [20], Decision Tree [37]–[39], Naive Bayes [40], BP Neural Network [41], [42], Random Forest [43], AdaBoost [44], Bagging [45], and One-Class SVM [22]. In this paper, the One-Class SVM [22] algorithm is used to train the classifier for the feature vectors of each cluster. Because of the correlation between the three models used in this paper, it is difficult to find representative negative samples for training classifiers in practical applications, whereas a One-Class SVM classifier can be trained with only positive samples. More importantly, because the feature vectors of each cluster have a physical meaning in Euclidean space, One-Class SVM is well suited to training a classifier for each cluster. The idea of One-Class SVM is to find a small hypersphere around the training data that nevertheless contains as many data points as possible. We define R as the radius of the sphere, together with a slack variable for each feature vector, the total number of feature vectors in the cluster, the index i of a feature vector, a feature vector, and the center feature vector. The goal is then to solve the following optimization problem:
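The optimization problem itself was lost in extraction. The description above (a small hypersphere of radius R, one slack variable per vector, a center vector) matches the standard support vector data description (SVDD) formulation, reproduced below as an assumption rather than the authors' exact equation. The notation ξ_i (slack variable), n (number of feature vectors in the cluster), x_i (feature vector), a (center feature vector), and the penalty constant C are ours; C trades the size of the sphere against the number of training vectors left outside it.

```latex
\min_{R,\;\mathbf{a},\;\boldsymbol{\xi}} \quad R^{2} + C\sum_{i=1}^{n} \xi_{i}
\qquad \text{s.t.} \quad \lVert \mathbf{x}_{i} - \mathbf{a} \rVert^{2} \le R^{2} + \xi_{i},
\quad \xi_{i} \ge 0, \quad i = 1, \dots, n
```

A test vector x is accepted when its squared distance to a is at most R². Common library implementations, such as scikit-learn's `OneClassSVM`, use Schölkopf's separating-hyperplane formulation instead; for kernels with constant k(x, x), such as the RBF kernel, the two formulations are known to be equivalent.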

Algorithm 1 The construction algorithm of EEM
Input: The data of all the sessions for training, SESSIONS;
Output: EEM, EnvModel;
Initialize the set of EEM feature vectors, envSet = an empty set;
for session ∈ SESSIONS do
    Initialize the iteration count variable, count = 0;
    sensorRecords = the sensor data vectors in session (sorted by time);
    Initialize the EEM feature vector envR to a 0 vector;
    for sensorR ∈ sensorRecords do
        Update envR iteratively with sensorR;
        count = count + 1;
        if count > 6 then
            Add envR to envSet;
        end if
    end for
end for
minEnvR = the feature vector consisting of the minimum value of each component over all the feature vectors in envSet;
maxEnvR = the feature vector consisting of the maximum value of each component over all the feature vectors in envSet;
for envR ∈ envSet do
    Normalize envR using minEnvR and maxEnvR;
end for
Initialize the best silhouette coefficient, bestCoef = −∞;
Initialize the best clustering result, bestCluResult = NULL;
for each candidate value k of K do
    Use K-means with k clusters to cluster envSet and get the result, cluResult;
    meanCoef = the mean silhouette coefficient of the feature vectors in cluResult;
    if meanCoef > bestCoef then
        bestCluResult = cluResult;
        bestCoef = meanCoef;
    end if
end for
Initialize the EEM, EnvModel = an empty dictionary;
for clu ∈ bestCluResult do
    Use One-Class SVM to train on clu and get the classifier, svmObj;
    cent = the mean center vector of clu;
    Add <cent, svmObj> to EnvModel with cent as the key;
end for
Add <"minEnvR", minEnvR> to EnvModel with "minEnvR" as the key;
Add <"maxEnvR", maxEnvR> to EnvModel with "maxEnvR" as the key;
return EnvModel;

Because the EEM feature vector is generated by iterative calculation with the sensor data vectors, if we output EEM feature vectors from the beginning of the iterative computation, some less representative feature vectors, computed from only a few sensor data vectors, are very likely to be mixed into the representative ones. Therefore, in this paper, the EEM feature vectors start being generated and output only after the number of samples used for iteration reaches 6. That is, after monitoring the external environment for 30 s, the method starts to output EEM feature vectors iterated with the sensor data of the session. After generating the EEM feature vectors, we normalize them with the feature vector consisting of the maximum value of each component over all the feature vectors and the feature vector consisting of the minimum value of each component over all the feature vectors. Then we cluster the normalized feature vectors, calculate the feature vector center of each cluster, and train a One-Class SVM classifier for each cluster. The minimum-value feature vector, the maximum-value feature vector, and each classifier are stored in the EEM with corresponding keys. The construction algorithm of the EEM is shown in Algorithm 1.
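The per-cluster classifiers, and the closest-center lookup used later during authentication, can be sketched with scikit-learn's `OneClassSVM`. This is an illustration under assumptions: the vectors are already min-max normalized, `nu=0.1` and `gamma="scale"` are placeholder hyperparameters because the paper's values are not given, and `train_cluster_models` and `authenticate` are hypothetical names. The dictionary keyed by center tuples mirrors EnvModel in Algorithm 1 (the min/max vectors would be stored alongside it).

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_cluster_models(X, labels, nu=0.1):
    """Train one One-Class SVM per cluster, keyed by the cluster's mean center.

    nu=0.1 and gamma="scale" are illustrative choices, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    models = {}
    for lab in np.unique(labels):
        members = X[labels == lab]
        center = tuple(members.mean(axis=0).tolist())  # mean center vector as key
        models[center] = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(members)
    return models

def authenticate(models, x) -> bool:
    """Route a normalized vector to the classifier of its closest center.

    Returns True when that classifier labels x an inlier (legal behavior).
    """
    keys = list(models.keys())
    x = np.asarray(x, dtype=float)
    dists = np.linalg.norm(np.asarray(keys) - x, axis=1)
    closest = keys[int(np.argmin(dists))]
    return bool(models[closest].predict(x.reshape(1, -1))[0] == 1)
```

Keeping the center as the key means authentication needs no separate cluster index: the nearest key both identifies the environment class and selects its classifier.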

C. The Construction Method of SSBM

As the user browses, the vast majority of screen-sliding operations are upward slides that bring the content below into the visible area. Sliding in the opposite direction is very casual, and its few samples are not representative. Click operations are constrained by the layout of the interface and hardly reflect personalization. Therefore, this paper uses only upward sliding data to construct the model. Even within one kind of external environment, the contact trajectories of screen-sliding operations may concentrate on several areas. Therefore, we first cluster the SSBM feature vectors and then use each cluster's SSBM feature vectors to train a classifier. K-means is again used for clustering, and the classifiers are again trained with the One-Class SVM algorithm. We read the two kinds of data, namely sliding data and sensor data, synchronously and generate an SSBM feature vector from the data of each sliding operation. In addition, we calculate the EEM feature vector at the moment the sliding operation happens and put the SSBM feature vector into the data set of the corresponding EEM feature vector center. Then we process the data set of each EEM feature vector center: we cluster the data set, obtain the clusters, and train a One-Class SVM classifier for each cluster. Finally, each classifier is stored in the SSBM with the corresponding key. The construction algorithm of the SSBM is shown in Algorithm 2.

Algorithm 2 The construction algorithm of SSBM
Input: EEM, EnvModel; the set of all the upward sliding operation data in the training sessions, UpGests;
Output: SSBM, GestModel;
Initialize the training set of SSBM, GestTrainSet = an empty dictionary;
for gest ∈ UpGests do
    Get the EEM feature vector envR at the time gest occurs;
    Use EnvModel to normalize envR;
    closestCent = the center vector in EnvModel closest to envR;
    if closestCent ∉ GestTrainSet.keys() then
        GestTrainSet[closestCent] = [];
    end if
    Add gest to GestTrainSet[closestCent];
end for
Initialize GestModel to an empty dictionary;
for cent ∈ GestTrainSet.keys() do
    gests = GestTrainSet[cent];
    Initialize GestModel[cent] to an empty dictionary;
    Initialize the best silhouette coefficient, bestCoef = −∞;
    Initialize the best clustering result, bestCluResult = NULL;
    for k ∈ [2, 6] do
        Use K-means with k clusters to cluster gests and get the result, cluResult;
        meanCoef = the mean silhouette coefficient of the feature vectors in cluResult;
        if meanCoef > bestCoef then
            bestCluResult = cluResult;
            bestCoef = meanCoef;
        end if
    end for
    for clu ∈ bestCluResult do
        Use One-Class SVM to train on clu and get the classifier, svmObj;
        gestCent = the mean center vector of clu;
        Add <gestCent, svmObj> to GestModel[cent] with gestCent as the key;
    end for
end for
return GestModel;
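The first loop of Algorithm 2, which builds GestTrainSet, amounts to bucketing each upward-slide feature vector by the EEM center closest to the environment vector at the time of the slide. A minimal sketch, assuming Euclidean distance for "closest" and already-normalized environment vectors; `group_by_closest_center` is a hypothetical helper.

```python
from collections import defaultdict
import numpy as np

def group_by_closest_center(env_centers, samples):
    """Bucket feature vectors by the EEM center closest to their environment vector.

    env_centers: list of center tuples (the EEM dictionary keys).
    samples: iterable of (env_vec, feature_vec); env_vec is already normalized.
    Returns {center: [feature_vec, ...]}, which plays the role of GestTrainSet.
    """
    centers = np.asarray(env_centers, dtype=float)
    buckets = defaultdict(list)
    for env_vec, feature_vec in samples:
        dists = np.linalg.norm(centers - np.asarray(env_vec, dtype=float), axis=1)
        buckets[env_centers[int(np.argmin(dists))]].append(feature_vec)
    return dict(buckets)
```

Each bucket is then clustered with K-means (K chosen by the silhouette search) and one One-Class SVM is trained per cluster, giving the nested GestModel dictionary.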

D. The Construction Method of BBM

In the process of using the APP to browse news, personalization is reflected in browsing behavior, which is affected by the external environment and by the section of the content. While browsing, users keep sliding the screen to move invisible content into the visible area, and the proportion of reverse sliding operations is very small. Therefore, this paper builds the browsing behavior model on the environment and the content category, using the upward screen-sliding data recorded while browsing news in the APP. Because the BBM feature vectors are generated by iterative calculation with the SSBM feature vectors, if we output browsing behavior feature vectors from the beginning of the iterative computation, some less representative feature vectors, computed from only a few screen-sliding records, are very likely to be mixed into the representative ones. Therefore, the BBM feature vectors start being generated and output only after the number of samples used for iteration reaches 3. The BBM feature vectors are put into the data set keyed by the EEM feature vector center and the section where the browsing behavior happens. Then we train a One-Class SVM classifier for each data set, and each classifier is stored in the BBM with the corresponding key. The construction algorithm of the BBM is shown in Algorithm 3.

Algorithm 3 The construction algorithm of BBM
Input: EEM, EnvModel; the set of sessions for training, SESSIONS;
Output: BBM, BrowseModel;
Initialize the BBM, BrowseModel = an empty dictionary;
for cent ∈ EnvModel.keys() do
    BrowseModel[cent] = an empty dictionary;
end for
for session ∈ SESSIONS do
    Gests = all the screen-sliding behavior data of session (sorted by time);
    Initialize the current EEM center feature vector, NowCent = NULL;
    Initialize the current section, NowSection = NULL;
    Initialize the BBM feature vector, BrowseModelVec = 0 vector;
    Initialize the iteration count variable, count = 0;
    for gest ∈ Gests do
        envR = the current EEM feature vector;
        section = the section where gest occurs;
        Use EnvModel to normalize envR;
        closestCent = the center feature vector in EnvModel closest to envR;
        if closestCent == NowCent and section == NowSection then
            Update BrowseModelVec iteratively with gest;
            count = count + 1;
            if count > 3 then
                if section ∉ BrowseModel[closestCent].keys() then
                    Initialize BrowseModel[closestCent][section] = an empty queue;
                end if
                Add BrowseModelVec to BrowseModel[closestCent][section];
            end if
        else
            count = 1;
            NowCent = closestCent;
            NowSection = section;
            BrowseModelVec = 0 vector;
        end if
    end for
end for
for envCent ∈ BrowseModel.keys() do
    for section ∈ BrowseModel[envCent].keys() do
        Use One-Class SVM to train on BrowseModel[envCent][section] and get the classifier, svmObj;
        BrowseModel[envCent][section] = svmObj;
    end for
end for
return BrowseModel;
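The grouping loop of Algorithm 3 can be sketched as a single pass over one session's time-ordered slides. Two assumptions are ours: the iterative BBM update is taken to be a running mean of the slide feature vectors (the actual update rule and the features of Table V are not recoverable here), and a new window starts with the slide that caused the change, whereas Algorithm 3 resets to a 0 vector with count = 1. The warm-up threshold mirrors its `count > 3` test; `bbm_training_sets` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Key = Tuple[Hashable, Hashable]

def bbm_training_sets(
    stream: Iterable[Tuple[Hashable, Hashable, Sequence[float]]],
    warm_up: int = 3,
) -> Dict[Key, List[Tuple[float, ...]]]:
    """Build BBM training sets keyed by (EEM center, section) for one session.

    stream yields (env_center, section, slide_feature_vector) in time order.
    The BBM vector is a running mean of slide vectors (an assumption); it
    restarts whenever the environment class or section changes and is only
    recorded once more than `warm_up` slides have been folded in.
    """
    sets: Dict[Key, List[Tuple[float, ...]]] = defaultdict(list)
    key, mean, count = None, [], 0
    for env_center, section, slide in stream:
        if (env_center, section) != key:  # environment or section changed: restart
            key, mean, count = (env_center, section), [0.0] * len(slide), 0
        count += 1
        mean = [m + (x - m) / count for m, x in zip(mean, slide)]  # incremental mean
        if count > warm_up:
            sets[key].append(tuple(mean))
    return dict(sets)
```

The per-session outputs would be merged key by key before one One-Class SVM is trained for each (EEM center, section) pair.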

E. The Authentication Method of CM

In the authentication stage of the CM proposed in this paper, the EEM performs an authentication automatically every 5 s, while the SSBM and the BBM each perform an authentication every time an upward screen-sliding operation occurs on a content display interface. If any of the three sub models outputs an illegal result and the number of consecutive illegal results exceeds the limit of the corresponding counter module, the illegal user processing module is triggered. The way the three sub models work cooperatively is shown in Fig. 6.
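Each continuous negative results counter can be sketched as a small state machine: a negative result increments the count, a positive result resets it, and a warning fires once the count exceeds the window (the `contNegCount > ws` test in Algorithm 6). The class name `ContinuousNegativeCounter` is hypothetical.

```python
class ContinuousNegativeCounter:
    """Counts consecutive negative (illegal) results from one sub model.

    A positive result resets the count; a warning is raised once the count
    exceeds the continuous negative window.
    """

    def __init__(self, window: int):
        self.window = window
        self.count = 0

    def update(self, is_legal: bool) -> bool:
        """Feed one authentication result; return True if a warning fires."""
        if is_legal:
            self.count = 0
            return False
        self.count += 1
        return self.count > self.window
```

One counter per sub model (for example, a dictionary keyed by "EEM", "SSBM", and "BBM", each with its own window) is enough; the illegal user processing module is triggered as soon as any counter returns True.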

Fig. 6. The way all sub models work cooperatively.

The APP sends the sensor data vectors to the EEM, the sliding data to the SSBM, and the sliding data together with the current section to the BBM. The EEM generates and updates the EEM feature vector using the sensor data vectors. Every time a new EEM feature vector is obtained, it is sent to the current EEM feature vector holder. It is also authenticated by the EEM, and the result is sent to the EEM continuous negative results counter. Every time sliding data are sent to the SSBM, the SSBM generates a new SSBM feature vector and authenticates it, and the result is sent to the SSBM continuous negative results counter. Every time sliding data and the section where the sliding operation happens are sent to the BBM, the BBM generates a BBM feature vector and authenticates it, and the result is sent to the BBM continuous negative results counter. The continuous negative results counters of all models work in the same way: a counter sends an illegal warning to the illegal user processing module when the number of consecutive negative results exceeds its continuous negative window. Continuous negative windows are needed for the following reason. Assume that a sub model has a given false negative rate and performs a detection every t time units. When the user is a legal user, the expected interval between triggerings of the illegal user processing module by this model is
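The expression that completes this argument did not survive extraction. As our own illustration, not the authors' formula, assume each check of a legal user is independently negative with probability p (the false negative rate), checks happen every t seconds, and a warning needs w + 1 consecutive negatives (`contNegCount > w`). The expected number of checks until the first run of r = w + 1 negatives is the standard result Σ_{k=1}^{r} p^(−k), so the expected time to a false alarm grows roughly like t/p^(w+1). The sketch computes this and checks it by simulation; both function names are hypothetical.

```python
import random

def expected_time_to_false_alarm(p: float, window: int, t: float) -> float:
    """Expected time until a legal user first triggers a warning.

    Assumes independent checks, each negative with probability p (0 < p < 1),
    one check every t seconds, and a trigger after window + 1 consecutive
    negatives. Uses E[checks] = sum_{k=1}^{r} p**(-k) with r = window + 1.
    """
    r = window + 1
    return t * sum(p ** (-k) for k in range(1, r + 1))

def simulate_time_to_false_alarm(p: float, window: int, t: float,
                                 trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for checking the formula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        checks = run = 0
        while run <= window:  # stop once run reaches window + 1
            checks += 1
            run = run + 1 if rng.random() < p else 0
        total += checks * t
    return total / trials
```

For example, with p = 0.1 and t = 5 s, a window of 0 yields a false alarm about every 50 s on average, while a window of 2 stretches this to about 5550 s (roughly 1.5 hours), which illustrates why the window matters.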

F. The Authentication Method of EEM

The EEM receives the sensor data, continuously generates the latest EEM feature vector by iterative updating, and sets the current EEM feature vector for the whole authentication system. The authentication method normalizes the latest EEM feature vector using the EEM, searches the model object for the center vector closest to the feature vector, and authenticates the EEM feature vector with the classifier corresponding to that closest center vector. The consecutive authentication results of the EEM are counted by the EEM continuous negative results counter. If the count exceeds the continuous negative window, the method returns the result that the current user is illegal. The authentication algorithm of the EEM is shown in Algorithm 4.

Algorithm 4 The authentication algorithm of EEM
Input: EEM, EnvModel; continuous negative window, we;
Output: VOID, or the authentication result that the current user is illegal;
Initialize the current EEM feature vector, envR = 0 vector;
Initialize the iteration count variable, count = 0;
Initialize the count of consecutive negative authentications, contNegCount = 0;
while the APP is running do
    Get the sensor data vector, sensorR;
    count += 1;
    Update envR iteratively with sensorR;
    if count > 5 then
        Normalize envR with EnvModel;
        closestCent = the center vector in EnvModel closest to envR;
        Set the current EEM feature vector of the whole authentication system, NowEnvR = envR;
        Get the classifier svmObj corresponding to closestCent;
        Authenticate envR with svmObj and get the result, result;

G. The Authentication Method of SSBM

The SSBM receives the upward screen-sliding data on the content display interface and generates an SSBM feature vector for every upward screen-sliding operation. The authentication method normalizes the current external environment feature vector, searches for the closest environment feature center vector, and uses the classifier corresponding to that center vector to authenticate the SSBM feature vector. The consecutive authentication results are counted by the SSBM continuous negative results counter. If the count exceeds the continuous negative window, the method returns the result that the current user is illegal. The authentication algorithm of the SSBM is shown in Algorithm 5.
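Combined with the earlier description of SSBM objects (keyed by an EEM center and an SSBM feature center), SSBM authentication is a two-level nearest-center lookup. A minimal sketch, assuming Euclidean distance, a nested dictionary {env_center: {gesture_center: classifier}} like GestModel in Algorithm 2, and classifiers with a scikit-learn style `predict` that returns +1 for inliers; `closest_key` and `ssbm_authenticate` are hypothetical names.

```python
import numpy as np

def closest_key(keys, x):
    """Return the key (a center tuple) with the smallest Euclidean distance to x."""
    centers = np.asarray(keys, dtype=float)
    dists = np.linalg.norm(centers - np.asarray(x, dtype=float), axis=1)
    return keys[int(np.argmin(dists))]

def ssbm_authenticate(gest_model, env_vec, slide_vec) -> bool:
    """Two-level lookup: closest EEM center, then closest gesture center.

    gest_model: {env_center: {gesture_center: classifier}}; env_vec must
    already be min-max normalized with the EEM's stored min/max vectors.
    Returns True when the selected classifier labels slide_vec an inlier.
    """
    env_center = closest_key(list(gest_model), env_vec)
    by_gesture = gest_model[env_center]
    gesture_center = closest_key(list(by_gesture), slide_vec)
    clf = by_gesture[gesture_center]
    return bool(clf.predict(np.asarray(slide_vec, dtype=float).reshape(1, -1))[0] == 1)
```

Selecting the gesture cluster before classifying lets each One-Class SVM describe only one trajectory area, which keeps its hypersphere small.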

H. The Authentication Method of BBM

The BBM obtains the current external environment feature vector and the section the user is currently browsing, and uses the screen-sliding behavior data to update the BBM feature vector iteratively. The authentication method normalizes the current EEM feature vector, searches for the closest EEM feature center vector, and uses the classifier corresponding to that center vector and the current section to authenticate the BBM feature vector. The consecutive authentication results are counted by the BBM continuous negative results counter. If the count exceeds the continuous negative window, the method returns the result that the current user is illegal. The authentication algorithm of the BBM is shown in Algorithm 6.

Algorithm 6 The authentication algorithm of BBM
Input: BBM, BrowseModel; continuous negative window, ws
Output: VOID, or the authentication result that the current user is illegal
1: Initialize the counter of continuous negative authentications, contNegCount = 0;
2: Initialize the BBM feature vector, BrowseModelVec = 0 vector;
3: Initialize the iteration counter, count = 0;
4: Initialize the section variable, KeepSection = NULL;
5: Initialize the EEM feature vector variable, KeepEnvCent = NULL;
6: while the APP is running do
7:   Get the screen-sliding data, gest;
8:   Get the current EEM feature vector, NowEnvR;
9:   closestEnvCent = the EEM center vector closest to the normalized NowEnvR;
10:  NowSection = the current section the user is browsing;
11:  if closestEnvCent == KeepEnvCent and NowSection == KeepSection then
12:    count = count + 1;
13:    Update BrowseModelVec iteratively with gest;
14:    if count > 3 then
15:      Authenticate BrowseModelVec using BrowseModel[KeepEnvCent][KeepSection] and get the result, result;
16:      if result == negative then
17:        contNegCount += 1;
18:        if contNegCount > ws then
19:          return illegal;
20:        end if
21:      else
22:        contNegCount = 0;
23:      end if
24:    end if
25:  else
26:    count = 0;
27:    BrowseModelVec = 0 vector;
28:    KeepEnvCent = closestEnvCent;
29:    KeepSection = NowSection;
30:  end if
31: end while
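Algorithm 6 can be transcribed into Python roughly as follows. This is a sketch, not the authors' code: the iterative update in step 13 is assumed to be a running mean, the closest EEM center (step 9) and the current section (step 10) are assumed to be computed upstream and passed in with each swipe, and BrowseModel is represented by a hypothetical dictionary of callables keyed by (center, section) that return True for a positive result.

```python
import numpy as np


def bbm_authenticate(events, browse_model, ws, min_count=3):
    """Return "illegal" if consecutive negatives exceed ws; else None (VOID).

    events: iterable of (gest, env_cent, section) tuples, one per swipe (assumed input).
    browse_model: hypothetical dict {(env_cent, section): callable -> bool}.
    """
    cont_neg = 0                          # step 1
    vec = None                            # step 2: None stands for the zero vector
    count = 0                             # step 3
    keep_section = None                   # step 4
    keep_env = None                       # step 5
    for gest, env_cent, section in events:            # steps 6-10
        gest = np.asarray(gest, dtype=float)
        if env_cent == keep_env and section == keep_section:  # step 11
            count += 1                                 # step 12
            # Step 13, assumed form: running mean (equals gest when vec is zero).
            vec = gest if vec is None else vec + (gest - vec) / count
            if count > min_count:                      # step 14
                if not browse_model[(keep_env, keep_section)](vec):  # steps 15-16
                    cont_neg += 1                      # step 17
                    if cont_neg > ws:                  # steps 18-19
                        return "illegal"
                else:
                    cont_neg = 0                       # step 22
        else:
            # Steps 26-29: context changed, restart accumulation for the new context.
            count = 0
            vec = None
            keep_env, keep_section = env_cent, section
    return None
```

As in the pseudocode, the swipe that reveals a context change only resets the state and is not itself added to the feature vector, so at least four further swipes in the same context are needed before the first authentication.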

V. Experimental Result

Our experimental data were collected from ten users. To avoid the influence of screen size on users' behaviors, we ensured that their phone screens were very close in size, ranging from 5.5 to 5.7 inches with 2K resolution. The Android versions on their phones ranged from 4.2.2 to 7, all fully compatible with the news APP used in this paper. The users installed the APP on their own phones, and we imposed no restrictions on time or environment: they used the APP freely at their customary times and in their customary environments. For some users, data collection lasted up to 3 weeks.

We divide the data of each user into two parts. One part is used to build the comprehensive behavior model proposed in this paper; the other part is used as the positive test set for that user's own model. In addition, all of each user's data are used as part of the negative test set for the other users' models. In the model construction stage, each user's modeling data are divided at the session level into training and test data at a ratio of 7:1, with training sessions selected randomly. For each user, the positive samples are his own data, and the negative samples come from all other users.

In the training stage, we first read the sensor data sequentially and generate the EEM feature vectors of each session, from which we construct the EEM (Algorithm 1). Based on the EEM, we read the two kinds of data (sensor data and screen-sliding data) in an order that simulates the real synchronized sequence of user operations, and obtain the SSBM and BBM feature vectors of each session. We construct the SSBM from the SSBM feature vectors (Algorithm 2) and the BBM from the BBM feature vectors (Algorithm 3). In the test stage, we read the sensor data and screen-sliding operation data in the time order in which they were recorded in each session, again simulating the real synchronized sequence of user operations. Each sensor reading triggers the EEM (Algorithm 4), and each screen-sliding event triggers the SSBM (Algorithm 5) and the BBM (Algorithm 6). The working relationships among the three sub models are shown in Fig. 6.

To show that, on real-time data from users' daily lives, the proposed method, which considers the factors above, achieves better results than a method that ignores them, we use the method in [21] as a reference experiment, also with a continuous negative window whose value is chosen by the same strategy.
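The session-level data division described above can be sketched as follows. The function names, the fixed random seed, and rounding the 7:1 ratio to a whole number of sessions are assumptions made for illustration; the paper does not specify these details.

```python
import random


def split_sessions(sessions, train_ratio=7, test_ratio=1, seed=0):
    """Randomly split one user's modeling sessions into training and test sets (7:1)."""
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)  # training sessions are chosen at random
    n_train = round(len(shuffled) * train_ratio / (train_ratio + test_ratio))
    return shuffled[:n_train], shuffled[n_train:]


def negative_samples(data_by_user, user):
    """Negative samples for `user`'s model: all sessions of every other user."""
    return [s for other, sessions in data_by_user.items() if other != user
            for s in sessions]
```

With 16 modeling sessions, for instance, 14 go to training and 2 to testing.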

For the experiments, we use the following two metrics:

1) Detection Rate (DR): the ratio of the number of negative samples authenticated as negative to the number of all negative samples, namely, 1 - FPR [46]: DR = (number of negative samples authenticated as negative)/(number of all negative samples). The higher the DR, the better the result.

2) Wrong Alarm Rate (WAR): the ratio of the number of positive samples authenticated as negative to the number of all positive samples, namely, 1 - TPR [46]: WAR = (number of positive samples authenticated as negative)/(number of all positive samples). The lower the WAR, the better the result.
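Both metrics can be computed directly from per-sample authentication outcomes. In this small sketch, each outcome is a hypothetical boolean that is True when the sample was authenticated as positive (legitimate).

```python
def detection_rate(negative_outcomes):
    """DR: share of negative samples authenticated as negative."""
    negatives = list(negative_outcomes)
    return sum(1 for accepted in negatives if not accepted) / len(negatives)


def wrong_alarm_rate(positive_outcomes):
    """WAR: share of positive samples authenticated as negative."""
    positives = list(positive_outcomes)
    return sum(1 for accepted in positives if not accepted) / len(positives)
```

For example, `detection_rate([False, False, True, False])` returns 0.75 and `wrong_alarm_rate([True, True, False, True])` returns 0.25. Both count rejections; they differ only in which class of samples they are computed over.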

We repeated the experiments several times and averaged the results.

Fig. 7 compares the DR of each of the three sub models proposed in this paper, measured on the negative-sample feature vectors that the sub model handles, with the DR of the method in [21].

Fig. 7. The DR of each model.

Fig. 8 compares the WAR of each of the three sub models proposed in this paper, measured on the positive-sample feature vectors that the sub model handles, with the WAR of the method in [21].

Fig. 8. The WAR of each model.

In the proposed method, if any sub model returns an illegal authentication result, the current user is regarded as illegal, which triggers the illegal-user processing module. Therefore, even if the DR of each individual sub model is relatively low, a high overall DR can still be achieved. Fig. 9 compares the DR on APP sessions of the CM presented in this paper with that of the method in [21].
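The effect of this "any sub model rejects" rule can be illustrated with a short calculation. Under the idealized assumption, which the paper does not make, that the sub models err independently, the fused rate is one minus the product of the per-model complements. The same formula applies to both DR and WAR, which shows why combining sub models raises DR but can also raise WAR slightly. The rates used below are illustrative, not the paper's measurements.

```python
import math


def session_is_illegal(sub_model_flags):
    """A session is flagged as soon as any sub model returns illegal."""
    return any(sub_model_flags)


def fused_rate(sub_rates):
    """P(at least one sub model flags a sample), assuming independent sub models."""
    return 1.0 - math.prod(1.0 - p for p in sub_rates)
```

For instance, three sub models that each detect an impostor with probability 0.5 give `fused_rate([0.5, 0.5, 0.5]) == 0.875`, while three sub models that each raise a false alarm 5% of the time give a fused WAR of about 14.3%. `math.prod` requires Python 3.8 or later.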

The authentication method relies on several sub models working cooperatively. Fig. 10 shows the WAR on APP sessions of the CM proposed in this paper and of the model proposed in [21].

Fig. 9. The DR of negative sessions.

Fig. 10. The WAR of positive sessions.

The average DR of the proposed method is 86.0%, versus 67.1% for the method in [21]; the average WAR of the proposed method is 10.3%, versus 8.2% for the method in [21]. These results show that when continuous authentication on mobile devices does not take into account the environment the device is in and the section of content the user is browsing, the DR of the model is limited in most cases, and the WAR may be high. With a single model, it can be difficult to balance DR against WAR. The comprehensive behavior model used in this paper is built from multiple sub models that consider the external environment, screen-sliding operations, and browsing. It depicts the user's behavior better and achieves a DR that is 18.9 percentage points higher at the cost of a small increase in WAR (2.1 percentage points).

Our experimental server environment is: Intel(R) Core(TM) i5-5200U CPU (2.2 GHz); 8 GB RAM; Debian 8 operating system; Python 2.7. Table VI gives the average time cost of constructing each user model, broken down by each part of the construction process. Table VII gives the average authentication time cost of each user model, i.e., the mean time each model takes to authenticate one corresponding feature vector. These results show that the time cost is within an acceptable range. In practical application scenarios, the time cost would be reduced further by using higher-performance servers and parallel computing.
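Mean per-call time costs of the kind reported in Table VII could be measured along the following lines. This is an assumed measurement procedure written for Python 3 (the paper's implementation used Python 2.7, and its exact timing method is not described); `authenticate` stands for any hypothetical function that authenticates one feature vector.

```python
import time


def mean_time_cost(authenticate, feature_vectors, repeats=1):
    """Mean wall-clock seconds per call of `authenticate` over all feature vectors."""
    vectors = list(feature_vectors)
    calls = 0
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        for vec in vectors:
            authenticate(vec)
            calls += 1
    return (time.perf_counter() - start) / calls
```

Repeating the loop several times smooths out scheduler noise for very fast classifiers, at the cost of a longer measurement.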

TABLE VI Mean Time Cost of Each Part in Construction Process

TABLE VII Mean Time Cost of Each Part in Authentication Process

VI. Conclusion

In this paper, a mobile terminal browsing behavioral authentication method is proposed that builds a comprehensive model for each user. The model is made up of three sub models, each modeling a different factor in the behavior of users reading content in an APP. Each sub model achieves a certain detection effect on its own; when all of the sub models work cooperatively, the method achieves an average DR of 86.0%, which effectively guarantees the security of the account while keeping the WAR at a relatively low level.

The strategy of running SSBM and BBM on top of EEM can serve as a framework for user behavior authentication methods. With this strategy, more factors can be introduced into user behavior models while the data dimension of each model is reduced, and the relationships between the different models are still maintained. The experimental results show that if the environment, an objective factor, is not considered, the detection effect is poor for users who are easily affected by the external environment. In addition, from exchanges with companies that maintain partnerships with our laboratory, we know that computing power is no longer a major problem for large companies today. We therefore argue that it is worthwhile to obtain a more detailed user behavior model, and better results, at the cost of a moderate increase in computation. Moreover, according to the recorded time costs, the time cost of this method remains within an acceptable range.

Because the features of some of the sub models presented in this paper are mainly time-based, they can be vulnerable to certain external disturbances while users are using mobile devices. As a result, the DR of such a sub model is relatively low and its WAR relatively high, so there is much room for improvement in the sub model of user browsing behaviors. In addition, each sub model on its own still has a lower detection probability than traditional single-model authentication methods, so the accuracy of each sub model needs further improvement. In future work, we will consider using more accurate models to depict the various factors of users' behaviors, so as to further improve DR, reduce WAR, and achieve better results.