
Application of Question and Answering on Virtual Human Dialogue: a Review and Prediction

LIU Li(刘 里)1,2*

1 Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300384, China

2 Key Laboratory of Computer Vision and System (Tianjin University of Technology), Ministry of Education, Tianjin 300384, China

Nowadays, virtual human (VH) is becoming a hot research topic in virtualization. VH dialogue can be categorized as an application of natural language processing (NLP) technology, since it is closely related to question and answering (QA) technologies. In order to integrate these technologies, this paper reviews some important work on VH dialogue and predicts some research points from the viewpoint of QA technologies.

virtual human (VH) dialogue; natural language processing (NLP); question and answering (QA); interaction

Introduction

Virtual human (VH) is a hot research field of virtualization. VH dialogue is the application of natural language processing (NLP) technology in this field, since it is closely related to question and answering (QA) technologies. This paper reviews the important work on VH dialogue and proposes some research points from the viewpoint of QA technologies.

1 Review

Recently, some research institutions have focused on VH dialogue, such as the Institute for Creative Technologies (ICT)[1] and the Virtual Experiences Research Group (VERG)[2]. The purpose of their work is to improve the intelligence of VHs. Their work is summarized as follows.

Rossen et al. used VHs to bootstrap the creation of other VHs[3]. They developed a system called Roleplay Trainer Creator, which created a virtual medical student based on hundreds of interactions between real medical students and a virtual patient. This virtual medical student was then used to train standardized patients, the human actors who role-play patients in practice doctor-patient encounters. Generating new VHs from human-VH interaction logs showed significant potential for interpersonal training applications with VHs. It also showed that the Roleplay Trainer Creator was beneficial for increasing the standardization of roleplay partners.

Rossen et al. presented a new approach to creating robust conversational models, called human-centered distributed conversational modeling (HDCM, shown in Fig. 1)[4-5], which was a distributed system. In HDCM, domain experts and novices could collaborate asynchronously through a graphical user interface (GUI). Virtual People Factory (VPF, shown in Fig. 2) was the realization of HDCM and was used to evaluate it. The experiment showed that VPF obviously reduced the expert time needed to create the speech-understanding portion of a conversational model, and it also increased the possibility of building a larger corpus than previous methods. Finally, they released VPF to the public and obtained many language resources from various domains.

Sun developed a semi-automated analytic model called Articulate (shown in Fig. 3)[6]. The model was implemented as follows: (1) parsing user queries with NLP technologies, tagging the words in queries with part-of-speech labels and reducing them to their root forms; (2) based on the parsing results, mapping the queries into a smaller feature space and applying a supervised learning method in that space to predict the class of task; (3) proposing a simplified visualization language (SimVL) to pass the classification results and the specified attributes precisely to the graph reasoner; (4) finally, generating the graph. Depending on the SimVL commands, several types of graphs were generated.
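To make this pipeline concrete, the following is a minimal Python sketch of steps (1)-(3): part-of-speech tagging a query, mapping it into a small feature space, and predicting a task class with a supervised learner. The feature design, task labels, training queries, and classifier choice are illustrative assumptions, not the implementation of Ref. [6]; lemmatization, SimVL, and the graph reasoner are omitted.

```python
# Minimal sketch of steps (1)-(3) of an Articulate-style pipeline.
# Features, task labels, and classifier are illustrative assumptions,
# not the implementation described in Ref. [6].
import nltk
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction import DictVectorizer

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

def extract_features(query: str) -> dict:
    """Steps (1)-(2): POS-tag the query and map it to a small feature space."""
    tokens = nltk.word_tokenize(query.lower())
    tags = nltk.pos_tag(tokens)
    feats = {}
    for word, tag in tags:
        feats[f"pos={tag}"] = feats.get(f"pos={tag}", 0) + 1
        if tag.startswith(("VB", "JJ", "WRB")):           # task-indicative words
            feats[f"word={word}"] = 1
    return feats

# Step (3): supervised prediction of the task class from labeled example queries.
train_queries = [("compare sales of product a and b", "comparison"),
                 ("how does temperature relate to pressure", "relationship"),
                 ("show the distribution of ages", "distribution"),
                 ("plot revenue over time", "trend")]
vec = DictVectorizer()
X = vec.fit_transform([extract_features(q) for q, _ in train_queries])
y = [label for _, label in train_queries]
clf = MultinomialNB().fit(X, y)

def predict_task(query: str) -> str:
    return clf.predict(vec.transform([extract_features(query)]))[0]

print(predict_task("compare profit of region x and y"))   # -> "comparison" (expected)
```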

Articulate was guided by a conversational user interface that allowed users to verbally describe and then manipulate what they wanted to see. Compared with many traditional visualization tools, Articulate needed less specialized knowledge to generate graphs, so it was convenient.

Artstein et al. studied the limits of simple dialogue acts for tactical questioning dialogues[7]. Tactical questioning used a simple scheme of dialogue acts, which were generated automatically from a representation of facts as〈object, attribute, value〉triples and actions as〈character, action〉pairs. They found that the simple dialogue acts, combined with some dialogue management techniques, could cover over 75% of unseen utterances and could generate coherent interaction. They also identified the kinds of utterances that were not covered, and pointed out that the corpus coming from a single source was the cause of this limitation.
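As an illustration of this style of representation, the following minimal sketch generates simple dialogue acts from〈object, attribute, value〉facts and〈character, action〉pairs. The act names and surface templates are assumptions for demonstration, not the scheme of Ref. [7].

```python
# Minimal sketch: generating simple dialogue acts from a fact/action
# representation, in the spirit of Ref. [7]. The act types and surface
# templates here are illustrative assumptions, not the paper's scheme.
from typing import NamedTuple

class Fact(NamedTuple):
    obj: str
    attribute: str
    value: str

class Action(NamedTuple):
    character: str
    action: str

def acts_from_fact(fact: Fact):
    """Each fact yields a question act and the matching assertion act."""
    yield ("whq",    f"What is the {fact.attribute} of {fact.obj}?")
    yield ("assert", f"The {fact.attribute} of {fact.obj} is {fact.value}.")

def acts_from_action(action: Action):
    """Each (character, action) pair yields request/offer acts."""
    yield ("request", f"Can you {action.action}, {action.character}?")
    yield ("offer",   f"{action.character} offers to {action.action}.")

facts = [Fact("the safe house", "location", "the market district")]
actions = [Action("informant", "reveal the location")]

for f in facts:
    for act in acts_from_fact(f):
        print(act)
for a in actions:
    for act in acts_from_action(a):
        print(act)
```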

Nouri et al. analyzed the influence of adding new knowledge to a conversational virtual character[8]. They presented an experiment that took a conversational character trained on a set of hand-authored, linked question-answer pairs and let the character import new sets of question-answer pairs generated automatically from texts on different topics. The experiment showed that adding such knowledge affected the character's performance and increased the error rate on questions that the original character was trained to answer. In return, the experiment showed that the augmented character could also answer questions on the new topics.

Raij et al. proposed virtual social perspective-taking (VSP), a new class of virtual experience that immersed users in the experience lived by another person[9]. Their exploration of VSP was driven by the medical interview, and they presented three principles for immersing the users: (1) providing input to the user's senses from the logs of the target's senses; (2) instructing users to act and interact like the target; (3) reminding users that they were playing the role of the target.

VSP elicited perspective-taking, and their study suggested it would allow users to live and learn from the diverse experiences of others. It would help participants understand others and the world more deeply, so that they could improve their behavior.

The traditional method in VH dialogue systems was to use professional human recordings or domain-specific speech synthesis. Georgila et al. performed a systematic evaluation to determine the best trade-off between performance and cost for these methods[10]. The evaluation covered naturalness, conversational quality, and likability. They tested different types (in-domain vs. out-of-domain), lengths, and contents of utterances, and took into account the age and native language of raters as well as their familiarity with speech synthesis. They performed two experiments, a pilot one and one running on Amazon's Mechanical Turk. The experiments showed that: (1) a professional human voice worked better than an amateur human voice and synthesized voices; (2) a high-quality general-purpose voice or a good limited-domain voice could perform better than amateur human recordings; (3) when both were trained with speech recorded by actors, a high-quality general-purpose voice and a limited-domain voice had almost the same performance; (4) for out-of-domain sentences, the high-quality general-purpose voice's rating was higher than the domain-specific voice's rating, but for in-domain sentences it was lower; (5) long or negative-content utterances did not receive lower ratings.

Yao et al. proposed a new question generation tool for extracting question-answer pairs from text articles[11]. They performed three experiments to examine whether the new tool was suitable for giving domain-specific knowledge to conversational characters. The experiments showed that the new tool was convenient and effective, but with some degradation of the ability to answer questions about topics that the original character was trained to answer. Overall, question generation was promising for creating or augmenting a question-answering conversational character from an existing text.

Georgila et al. presented a new annotation scheme for cross-cultural argumentation and persuasion dialogues[12]. The goal was two-fold: (1) to fill the gap in the literature on cross-cultural argumentation and persuasion; (2) to use this coding scheme to annotate negotiation dialogues in order to automatically learn argumentation and persuasion dialogue policies for different cultures.

The scheme was based on a review of the literature on cross-cultural argumentation and persuasion and on the adaptation of existing coding schemes for negotiation. They tested the scheme in three domains: the florist-grocer domain, the Saudi Arabian Standards Organization (SASO) domain (shown in Fig. 4), and the toy-naming domain. This proved that the scheme was general enough to be applicable to different domains with minor or no modifications at all. The scheme was used to efficiently learn culture-specific dialogue policies for argumentation and persuasion.

Morbini and Sagae proposed a method to segment a given utterance into non-overlapping portions, each associated with a dialogue act[13]. Compared with traditional methods, this method only needed labeled utterances (or utterance segments) corresponding to a single dialogue act as training data. Experiments showed that the method had the benefit of significantly improved understanding of user intent, but had the drawback of the complexity of segment optimization.
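The underlying task can be illustrated with a small dynamic-programming sketch that splits an utterance into non-overlapping segments so that the summed dialogue-act scores are maximal. The scoring function below is only a placeholder for a classifier trained on single-act utterances; this is an illustration of the segmentation problem, not the method of Ref. [13].

```python
# Illustrative sketch of the segmentation task behind Ref. [13]:
# split an utterance into non-overlapping segments so that the total
# dialogue-act score is maximal. score_segment() is a placeholder for a
# classifier trained on single-act utterances; this is NOT the paper's method.
from functools import lru_cache

def score_segment(tokens):
    """Placeholder: return (best_act, score) for a candidate segment."""
    text = " ".join(tokens)
    if text.endswith("?"):
        return ("question", 2.0)
    if "hello" in tokens or "hi" in tokens:
        return ("greeting", 2.0)
    return ("statement", 1.0 if len(tokens) > 2 else 0.2)

def segment(tokens):
    """Dynamic programming over split points; O(n^2) candidate segments."""
    n = len(tokens)

    @lru_cache(maxsize=None)
    def best(i):
        if i == n:
            return (0.0, [])
        best_score, best_segs = float("-inf"), []
        for j in range(i + 1, n + 1):
            act, s = score_segment(tuple(tokens[i:j]))
            tail_score, tail_segs = best(j)
            if s + tail_score > best_score:
                best_score = s + tail_score
                best_segs = [(act, " ".join(tokens[i:j]))] + tail_segs
        return (best_score, best_segs)

    return best(0)[1]

print(segment("hi there can you tell me the time ?".split()))
```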

Brusk et al. studied people's intuitive notion of gossip and its computational implementation[14]. They conducted two experiments. One was to identify what type of conversation could be recognized as gossip, and the other was to identify whether these conversations fulfilled three proposed elements: third-person focus, pejorative evaluation, and substantiating behavior. The results showed that: (1) conversations were very likely to be considered gossip if all elements were present, no intimate relationships existed between the participants, and the person in focus was unambiguous; (2) conversations that had at most one gossip element were not considered gossip; (3) conversations that lacked one or two elements or had an ambiguous element led to inconsistent judgments.

Abu-Jbara et al. presented Attitude Miner (shown in Fig. 5), a system for mining attitude from online discussions[15]. Attitude Miner analyzed online discussions at four levels: the word level, the sentence level, the post level, and the thread level. A discussion thread was represented as a signed network in which each discussant was represented by a node and a message between two discussants was represented as an edge. The polarity of the text associated with an edge determined the sign of the edge. The system predicted attitudes and identified subgroups (with a homogeneous and common focus among the discussants) with high accuracy.
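A minimal sketch of this signed-network representation is given below, with nodes for discussants and signed edges derived from message polarity. The polarity scoring and the subgroup step are simplified stand-ins for illustration, not the actual Attitude Miner system.

```python
# Minimal sketch of a discussion thread as a signed network, after the
# representation described for Attitude Miner [15]. The polarity scoring
# and the subgroup step are simplified stand-ins, not the system itself.
import networkx as nx

NEGATIVE = {"wrong", "nonsense", "disagree"}
POSITIVE = {"agree", "exactly", "great"}

def polarity(text: str) -> int:
    """Stub word-level polarity: +1, -1, or 0 for an exchanged message."""
    words = set(text.lower().split())
    return (1 if words & POSITIVE else 0) - (1 if words & NEGATIVE else 0)

G = nx.Graph()
messages = [("alice", "bob",   "I agree, exactly my point"),
            ("alice", "carol", "That is just wrong"),
            ("bob",   "carol", "Nonsense, I disagree")]

for src, dst, text in messages:
    G.add_edge(src, dst, sign=polarity(text))   # edge sign from message polarity

# Toy "subgroup" step: discussants joined by positive edges fall together.
positive_part = nx.Graph((u, v) for u, v, d in G.edges(data=True) if d["sign"] > 0)
print(list(nx.connected_components(positive_part)))   # e.g. [{'alice', 'bob'}]
```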

Traum and Morency were concerned with situations in which there were at least three parties[16]. They tracked head behaviors and examined how these behaviors influenced some aspects of a multi-layer dialogue model. They implemented the model and tested it in the Saudi Arabian Standards Organization English (SASO-EN) negotiation domain. The model was perhaps the most comprehensive implemented system involving visual recognition to support multi-party dialogue, because it supported multiple virtual agents and involved head gestures with multifunctional meanings. In the model, head gestures could help participants understand each other's utterances.

Morency et al. investigated how dialogue context from an embodied conversational agent (ECA) could improve visual recognition of user gestures[17-18]. They presented a framework to extract information from spoken language to predict head gestures. They found a set of lexical, punctuation, and timing features that could be used to learn how to predict user feedback. By using this feature set they were able to improve the recognition rate of the vision-only head gesture recognizer.
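For illustration only, the sketch below trains a tiny classifier on assumed lexical, punctuation, and timing features of an agent utterance to predict listener feedback (a head nod). The features, training data, and classifier are assumptions for demonstration, not the models of Refs. [17-18].

```python
# Illustrative sketch: predicting listener head nods from lexical,
# punctuation, and timing features of the agent's utterance, in the
# spirit of Refs. [17-18]. Features, data, and classifier are assumed
# for illustration, not taken from the papers.
from sklearn.linear_model import LogisticRegression

def utterance_features(text: str, pause_after_s: float):
    words = text.lower().rstrip("?.!").split()
    return [
        1.0 if words and words[-1] in {"right", "okay", "yes"} else 0.0,  # lexical cue
        1.0 if text.strip().endswith("?") else 0.0,                       # punctuation cue
        pause_after_s,                                                    # timing cue
    ]

train = [("You should take this road, okay?", 0.8, 1),
         ("The bridge is two miles ahead.",    0.1, 0),
         ("Turn left at the tower, right?",    0.6, 1),
         ("I was stationed there last year.",  0.2, 0)]
X = [utterance_features(t, p) for t, p, _ in train]
y = [label for _, _, label in train]
clf = LogisticRegression().fit(X, y)

print(clf.predict([utterance_features("Head north from here, okay?", 0.7)]))  # likely nod (1)
```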

2 Prediction

Nowadays, research on VH applications of NLP is in demand. If we focus on studying interaction in QA technology and then apply the results to VH dialogue, we may achieve surprising results. Based on the above review of VH dialogue, the following aspects would be further research points.

2.1 Knowledge model adapted to VH dialogue

The knowledge model is the foundation of representing interactive sentences and storing data. The knowledge model for VH dialogue has higher complexity, so that it can represent more knowledge points than the traditional knowledge model of QA. The existing models (such as the relational database, the XML database, and RDF) have their own characteristics, but they cannot represent VH dialogue or store dialogue-related data efficiently. The exploration of an adaptive knowledge model would be the starting point of VH dialogue research.
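As one possible direction, the sketch below stores a dialogue-oriented knowledge point as RDF-style triples with rdflib and attaches dialogue-specific context (an answer template and an asker role) that a plain fact store would not carry. The namespace and predicates are hypothetical, chosen only to illustrate the idea.

```python
# Minimal sketch: storing dialogue-oriented knowledge points as RDF triples
# with rdflib, and attaching dialogue context to each point. The namespace
# and predicates are hypothetical; the section argues that existing models
# (relational, XML, RDF) each fall short for VH dialogue on their own.
from rdflib import Graph, Namespace, Literal

VH = Namespace("http://example.org/vh-dialogue#")   # hypothetical namespace
g = Graph()

fact = VH.fact_001
g.add((fact, VH.object,    Literal("clinic")))
g.add((fact, VH.attribute, Literal("opening_hours")))
g.add((fact, VH.value,     Literal("9am-5pm")))
# Dialogue-specific context that a plain fact store would not carry:
g.add((fact, VH.answerTemplate, Literal("The clinic is open from 9am to 5pm.")))
g.add((fact, VH.askableBy,      Literal("patient")))

# Retrieve every knowledge point askable by the "patient" role.
for subject, _, _ in g.triples((None, VH.askableBy, Literal("patient"))):
    print(subject, g.value(subject, VH.answerTemplate))
```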

2.2 Research on VH's dialogue strategy

Virtual dialogue should not only focus on NLP; the dialogue strategy is also worth exploring. However, the problem is that many questions cannot be clearly described by just one question. For example, in medicine, a doctor cannot make a diagnosis from just one symptom described by a patient. VH dialogue can take the interactive QA system, which is based on an interaction strategy, as a reference. How to define the interaction strategy, and how to retrieve from the knowledge base based on this strategy, is a key problem that needs exploring.
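A minimal sketch of such an interaction strategy is shown below: the VH keeps asking follow-up questions until the confirmed knowledge points narrow a toy knowledge base to a single candidate. The knowledge base, follow-up questions, and stopping rule are illustrative assumptions only, echoing the medical-diagnosis example above.

```python
# Minimal sketch of an interaction strategy for VH dialogue: keep asking
# follow-up questions until the gathered knowledge points narrow the
# knowledge base to one candidate. The toy knowledge base, the questions,
# and the stopping rule are illustrative assumptions only.
KNOWLEDGE_BASE = {                      # candidate -> required knowledge points
    "common cold": {"cough", "runny nose"},
    "flu":         {"cough", "fever", "aches"},
    "allergy":     {"runny nose", "itchy eyes"},
}
FOLLOW_UPS = ["cough", "fever", "runny nose", "aches", "itchy eyes"]

def interact(answer_fn):
    """answer_fn(symptom) -> bool simulates the user's replies."""
    confirmed = set()
    candidates = set(KNOWLEDGE_BASE)
    for symptom in FOLLOW_UPS:
        if len(candidates) <= 1:
            break                        # enough knowledge points gathered
        if answer_fn(symptom):
            confirmed.add(symptom)
        # Keep only candidates consistent with every confirmed knowledge point.
        candidates = {c for c in candidates if confirmed <= KNOWLEDGE_BASE[c]}
    return candidates

# Simulated patient who has a cough and a fever but nothing else.
print(interact(lambda s: s in {"cough", "fever"}))   # -> {'flu'}
```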

2.3 Interaction optimization and knowledge base reduction

Interaction can be mapped to the problem of matching the “knowledge point set” extracted from the question against the knowledge base. This matching problem has been proved to be NP-complete, which means that relying on a simple interaction strategy alone will lead to an excessively high interaction frequency. To overcome this weakness, it is essential to study the knowledge storage strategy, data model, search requirements, and reduction methods for large data. When a question contains many knowledge points, how to reduce the complexity of the question and improve the effectiveness of question analysis techniques are also important issues.
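Since this matching is essentially a set-cover-style problem, one practical reduction is a greedy approximation, sketched below under the assumption that each knowledge-base entry indexes the knowledge points it covers; the entries and data are illustrative only.

```python
# Minimal sketch: since matching a question's knowledge-point set against
# the knowledge base is NP-complete in general (a set-cover-style problem),
# a greedy approximation is one practical reduction. The entries and data
# below are illustrative assumptions.
def greedy_cover(question_points, kb_entries):
    """Pick knowledge-base entries until the question's points are covered."""
    remaining = set(question_points)
    chosen = []
    while remaining:
        # The entry covering the most still-uncovered knowledge points wins.
        best = max(kb_entries, key=lambda e: len(remaining & kb_entries[e]))
        gained = remaining & kb_entries[best]
        if not gained:
            break                         # no entry covers what is left
        chosen.append(best)
        remaining -= gained
    return chosen, remaining              # remaining is the uncovered residue

kb = {"entry_a": {"fever", "cough"},
      "entry_b": {"fever", "aches", "headache"},
      "entry_c": {"rash"}}
print(greedy_cover({"fever", "cough", "headache"}, kb))
# -> (['entry_a', 'entry_b'], set()); greedy order may vary on ties
```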

Above all, the knowledge model, dialogue strategy, interaction optimization, and knowledge base reduction are proposed as research points from QA, and these technologies would be used in VH dialogue.

3 Conclusions

This paper reviews the important work on VH dialogue and proposes some research points from the viewpoint of QA technologies. Future work is to find more suitable technologies and to study interactive QA in depth, in order to further improve VH dialogue.

References

[1] Institute for Creative Technologies. Official Web Site of Institute for Creative Technologies [EB/OL]. [2013-10-23]. http://ict.usc.edu/.

[2] Virtual Experiences Research Group. Official Web Site of Virtual Experiences Research Group [EB/OL]. [2013-10-23]. http://verg.cise.ufl.edu/.

[3] Rossen B, Cendan J, Lok B, et al. Using Virtual Humans to Bootstrap the Creation of Other Virtual Humans [C]. Intelligent Virtual Agents 2010, Philadelphia, Pennsylvania, USA, 2010: 392-398.

[4] Rossen B, Lind S, Lok B, et al. Human-Centered Distributed Conversational Modeling: Efficient Modeling of Robust Virtual Human Conversations [C]. Intelligent Virtual Agents 2009, the Netherlands, 2009: 474-481.

[5] Rossen B, Lok B. A Crowdsourcing Method to Develop Virtual Human Conversational Agents [J]. International Journal of Human-Computer Studies, 2012, 70(4): 301-319.

[6] Sun Y. Articulate: a Semi-automated Model for Translating Natural Language Queries into Meaningful Visualizations [C]. The 10th International Symposium on Smart Graphics, Banff, Canada, 2010: 184-195.

[7] Artstein R, Rushforth M, Gandhe S, et al. Limits of Simple Dialogue Acts for Tactical Questioning Dialogues [C]. The 7th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Hyderabad, India, 2011: 1-7.

[8] Nouri E, Artstein R, Leuski A, et al. Augmenting Conversational Characters with Generated Question-Answer Pairs [C]. AAAI Fall Symposium: Question Generation, Arlington, Virginia, USA, 2011: 49-52.

[9] Raij A, Kotranza A, Lind D S, et al. Virtual Experiences for Social Perspective-Taking [C]. Virtual Reality Conference 2009, Lafayette, Louisiana, USA, 2009: 99-102.

[10] Georgila K, Black A W, Sagae K, et al. Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems [C]. The 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012: 3519-3526.

[11] Yao X C, Tosch E, Chen G, et al. Creating Conversational Characters Using Question Generation Tools [J]. Dialogue & Discourse, 2012, 3(2): 125-146.

[12] Georgila K, Artstein R, Nazarian A, et al. An Annotation Scheme for Cross-cultural Argumentation and Persuasion Dialogues [C]. The 12th Annual SIGdial Meeting on Discourse and Dialogue, Portland, Oregon, USA, 2011: 272-278.

[13] Morbini F, Sagae K. Joint Identification and Segmentation of Domain-Specific Dialogue Acts for Conversational Dialogue Systems [C]. The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, 2011: 95-100.

[14] Brusk J, Artstein R, Traum D. Don't Tell Anyone! Two Experiments on Gossip Conversations [C]. The 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Tokyo, Japan, 2010: 193-200.

[15] Abu-Jbara A, Hassan A, Radev D. Attitude Miner: Mining Attitude from Online Discussions [C]. 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session, Montréal, Canada, 2012: 33-36.

[16] Traum D, Morency L P. Integration of Visual Perception in Dialogue Understanding for Virtual Humans in Multi-party Interaction [C]. International Workshop on Interacting with ECAs as Virtual Characters, Toronto, Canada, 2010: 70.

[17] Morency L P, Sidner C, Lee C, et al. The Role of Context in Head Gesture Recognition [C]. The 21st National Conference on Artificial Intelligence, Boston, Massachusetts, USA, 2006: 1650.

[18] Morency L P, Sidner C, Lee C, et al. Head Gestures for Perceptual Interfaces: The Role of Context in Improving Recognition [J]. Artificial Intelligence, 2007, 171(8/9): 568-585.

CLC number: TP181   Document code: A

Article ID: 1672-5220(2015)02-0341-04

Received date: 2014-10-10

Foundation items: National Natural Science Foundation of China (Nos. 61170027, 61202169, and 61301140); Tianjin “131” Creative Talents Training Project, China (the 3rd level)

* Correspondence should be addressed to LIU Li, E-mail: niceliuli@sina.com