
Application of reliability technologies in civil aviation: Lessons learnt and perspectives

2019-01-16

CHINESE JOURNAL OF AERONAUTICS, 2019, No. 1

Enrico ZIO, Mengfei FAN, Zhiguo ZENG, Rui KANG

a Chair System Science and the Energy Challenge, Fondation Electricité de France (EDF), CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette 91190, France

b Energy Department, Politecnico di Milano, Milan 20156, Italy

c Sino-French Risk Science and Engineering Lab, Beihang University, Beijing 100083, China

d School of Reliability and Systems Engineering, Beihang University, Beijing 100083, China

KEYWORDS Civil aviation; Commercial aircraft; Fault diagnosis and prognosis; Human reliability analysis; Maintenance; Quality control; Reliability design; Reliability engineering

Abstract We consider reliability engineering in the modern civil aviation industry, together with the related engineering activities and methods. We consider reliability in a broad sense, encompassing related system characteristics such as availability, maintainability, safety and durability. We cover the entire life cycle of the equipment, including reliability requirement identification, reliability analysis and design, and verification and validation of reliability requirements (typically performed in the equipment design and development phase), quality assurance (which typically comes into play in the manufacturing phase), and fault diagnosis, prognosis and maintenance (which are connected to the operation phase). Lessons learnt from reliability engineering practices in the civil aviation industry are given, which might serve as a reference for reliability managers and engineers, including those in other industries with high reliability requirements.

1. Introduction

Over 115 years ago, when Wilbur and Orville Wright struggled to complete their historic 59 s flight with the first powered aircraft in human history, they could not have imagined how complex and widely used civil airplanes would become. A recent estimate [1] shows that, as of 2017, there were around 23,600 civil airplanes in the world, and this number keeps increasing at an annual rate of 5.1% [2]. Each year, 3.3 billion people travel by air [2]. Reliability and safety are, therefore, obvious concerns for civil aviation: if high reliability and safety cannot be guaranteed, severe, sometimes unbearable, losses might be suffered, e.g., human fatalities and financial losses. For this reason, the civil aviation industry has imposed strict reliability requirements on itself. For example, before entering the market, any commercial aircraft must be certificated for airworthiness by the government aviation authorities, e.g., under the Federal Aviation Regulations (FAR) in the United States [3] and the China Civil Aviation Regulations (CCAR) in China [4]. In this certification, reliability is an indispensable requirement that must be guaranteed to a high degree and with confidence.

Meeting the high reliability requirements imposed by the authorities, however, is not easy for the civil aviation industry, especially considering the scale and complexity of modern airplanes. A modern airplane such as the Boeing 777, for example, comprises 4.5 million parts designed and manufactured in over ten different countries. Over 6500 employees are involved in the design and manufacturing of the airplane, and a total of 10 million labor hours are consumed. Despite the difficulty of achieving high reliability, the civil aviation industry manages to obtain a very satisfactory result: the worldwide accident rate of commercial airplanes in 2016 was 2.1 accidents per million departures, far lower than the accident rate of road transport [5]. How does the civil aviation industry achieve such success in reliability? Are there good practices and experiences that can be shared for reference, and even transferred to other industries with high reliability requirements? In this paper, we address these questions by providing a thorough status report of the common practices for reliability assurance in the civil aviation industry. This paper is not a review, nor do the authors intend to be exhaustive in terms of the topics covered and information provided. Rather, it is a synthesis of common industrial practices and a discussion of future perspectives.

The rest of the paper is organized as follows. In Section 2, we present an overall picture of how reliability activities are performed and methods are applied in the different phases of the lifecycle of commercial airplanes. The individual reliability activities and techniques are discussed in detail in Sections 3-8. In Section 9, we present lessons learnt and perspectives for reliability practice in the civil aviation industry. Finally, the paper is concluded in Section 10.

2. Reliability-related activities in the lifecycle of civil airplanes

The lifecycle of civil airplanes can be simplified as comprising three phases: design and development, manufacturing, and operation (see Fig. 1). In the design and development phase, design solutions for components and systems are determined to satisfy design requirements (from different aspects). At the end of this phase, verification and validation are performed to check whether the design solutions indeed meet the requirements. Then, the manufacturing phase begins, in which the civil airplanes are produced at large scale. Finally, the airplanes are handed over to the airlines for field operation, and the operation phase begins.

Various reliability-related engineering activities, referred to as reliability activities in this paper, are performed in the different phases of the lifecycle to ensure the final reliability level of commercial airplanes (see Fig. 1). In Fig. 1, each reliability activity block represents a collection of reliability techniques. The design and development phase starts with identifying the reliability requirements. Then, reliability analysis and design techniques are used to determine design solutions that satisfy the reliability requirements. Once a design solution is chosen, verification and validation are performed, where tests and design reviews are used to verify whether the design solution actually meets the reliability requirements. Once the reliability of the design solution is verified, the solution can move on to the manufacturing phase. In this phase, quality assurance techniques are used to make sure that no defects are introduced in the manufacturing process, so that the inherent reliability of the design solution is maintained. Finally, in the operation phase, fault diagnosis, prognosis and maintenance are needed to ensure the operational reliability of civil airplanes. A detailed discussion of the reliability activities in Fig. 1 is provided in Sections 3-8.

3. Identification of reliability requirements

The identification of the reliability requirements is the first and most important task in the reliability engineering process of civil airplanes. In reliability engineering, reliability requirements are often expressed in terms of quantitative reliability indexes. Different reliability indexes can be used to measure the effect of reliability on various system attributes, including availability, reliability (in a narrow sense), maintainability, safety and durability. Table 1 summarizes the reliability-related indexes commonly used in the civil aviation industry [6]. To identify the reliability requirements, it is necessary to decide which reliability indexes are to be used and to set their target values.

Fig. 1 Reliability activities applied in the lifecycle of commercial airplanes.

Table 1 Reliability indexes for commercial airplanes [6].

In Table 1, the reliability indexes are categorized into Contractual Indexes (CIs) and Operational Indexes (OIs). Contractual reliability indexes measure the inherent reliability of the airplane, which is determined by the processes of airplane design, development and manufacturing. Operational reliability indexes, by contrast, are also influenced by the actual operational, environmental and maintenance conditions of the airplane. Consequently, CIs are specified in the contract or assignment book and can be controlled in the development and manufacturing processes; OIs, on the other hand, may not be required in the contract, but are used to measure the field reliability of commercial airplanes.

In practice, the reliability indexes affect higher-level requirements for civil airplanes, i.e., safety, punctuality and economy; the correspondence between the reliability indexes and these requirements is also given in Table 1. Among the reliability indexes in Table 1, the most important and widely applied one, which greatly influences the economic benefits and customer satisfaction of civil airplanes, is dispatch reliability, defined as “the percentage of scheduled flights which depart without making a mechanical delay of more than 15 min or cancellation” [7]. According to the data on Boeing's website, the dispatch reliabilities of the Boeing 737NG, Boeing 767, Boeing 777 and Boeing 787 are 99.7%, 99.4%, 99.2% and 99.0%, respectively. It is reported that the ARJ21 from the Commercial Aircraft Corporation of China (COMAC) is designed for a dispatch reliability of 99.5% [8].
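The dispatch reliability definition reduces to a simple ratio over scheduled departures. The Python sketch below computes it from a list of flights; the record layout (`ScheduledFlight` and its fields) is a hypothetical illustration rather than an airline or Boeing data format, and it assumes that delays have already been attributed to mechanical or non-mechanical causes.

```python
from dataclasses import dataclass


@dataclass
class ScheduledFlight:
    """One scheduled departure (hypothetical record layout)."""
    mechanical_delay_min: float     # departure delay attributed to technical causes
    mechanical_cancellation: bool   # flight cancelled for a technical reason


def dispatch_reliability(flights: list[ScheduledFlight], threshold_min: float = 15.0) -> float:
    """Percentage of scheduled flights that depart without a mechanical delay
    longer than threshold_min and without a mechanical cancellation."""
    if not flights:
        raise ValueError("no scheduled flights")
    interruptions = sum(
        1 for f in flights
        if f.mechanical_cancellation or f.mechanical_delay_min > threshold_min
    )
    return 100.0 * (len(flights) - interruptions) / len(flights)


# Hypothetical month: 1000 departures, 3 long technical delays, 1 technical cancellation.
month = ([ScheduledFlight(0.0, False)] * 996
         + [ScheduledFlight(40.0, False)] * 3
         + [ScheduledFlight(0.0, True)])
print(f"{dispatch_reliability(month):.1f}%")  # 99.6%
```

Read the other way, the 99.7% quoted for the Boeing 737NG corresponds to roughly 3 technical interruptions per 1000 scheduled departures.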

4. Reliability analysis and design

Once the reliability requirements are identified, reliability analysis and design techniques are applied to conceive solutions through which the reliability requirements can be achieved. A general flowchart of reliability analysis and design in the civil aviation industry is given in Fig. 2.

As shown in Fig. 2, the system-level reliability requirements (expressed in terms of the reliability indexes) first need to be allocated down to the component level, i.e., translated into component-level requirements with corresponding indexes. The commonly used techniques for defining and allocating reliability indexes in the civil aviation industry are summarized in Table 2. Then, reliability design techniques are used to develop design solutions that could fulfill the reliability requirements. In Sections 4.1 and 4.2, we present in detail the hardware and software reliability design techniques commonly used in the civil aviation industry. Once a design solution is determined, corresponding reliability models are developed and the related analyses are conducted to estimate the reliability of the design solution. If the estimated reliability does not meet the reliability requirements, the reliability design and analysis procedures are repeated until a design solution that meets the requirements is found. Due to page limits, we do not go into detail on reliability modelling and analysis, but only summarize the most commonly used methods in Table 2; interested readers may consult the references provided herein. Finally, tests and analyses are conducted to check for design defects that were not fully considered in the original reliability design and modelling. Commonly used methods for exposing such design defects are also listed in Table 2.
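To make the allocation step concrete, here is a minimal sketch of one classical apportionment rule for a series system with constant failure rates: the system failure-rate budget implied by the target reliability is shared in proportion to each component's predicted failure rate (often called ARINC-style apportionment). It is only one possible rule, not necessarily one of the techniques in Table 2, and the component names and rates are hypothetical.

```python
import math


def allocate_failure_rates(target_reliability: float, mission_time_h: float,
                           predicted_rates: dict[str, float]) -> dict[str, float]:
    """Apportion a system failure-rate budget to the components of a series system.

    Assumes constant failure rates, so R_s(t) = exp(-sum(lambda_i) * t).
    Each component receives a share of the budget proportional to its
    predicted failure rate (ARINC-style weighting).
    """
    system_budget = -math.log(target_reliability) / mission_time_h  # allowed total lambda
    total_predicted = sum(predicted_rates.values())
    return {name: system_budget * lam / total_predicted
            for name, lam in predicted_rates.items()}


# Hypothetical subsystem with predicted failure rates per flight hour.
allocation = allocate_failure_rates(
    target_reliability=0.99, mission_time_h=10.0,
    predicted_rates={"pump": 2e-4, "valve": 1e-4, "controller": 1e-4},
)
for name, lam in allocation.items():
    print(f"{name}: {lam:.3e} per h")
```

By construction, the allocated rates multiply back to exactly the target: exp(-sum(lambda_i) x 10 h) = 0.99.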

Fig. 2 General flowchart of reliability analysis and design.

Table 2 Techniques used in the reliability design process.

It should be noted that, unlike functional design, which focuses on the realization of the system functions, reliability design concerns how to maintain the system's functions without failures throughout its lifecycle. To avoid failures, reliability analysis and design is an iterative process with two basic procedures: (A) perform modelling, tests and analyses to discover system design flaws and potential failure modes under stated operating conditions; (B) change the system design to eliminate the discovered flaws and analyzed failures. In civil aviation practice, the techniques in Table 2 are integrated for reliability design improvements; the interactions of these tools in the whole process are presented in Fig. 3.

Such an iterative reliability analysis and design procedure has been widely applied in the civil aviation industry. For example, leading civil aircraft manufacturers such as Boeing and Airbus have adopted such a procedure and, in particular, use the aircraft safety assessment tools suggested in Refs. [18,19] to assist reliability design and analysis, including Fault Tree Analysis (FTA), Failure Modes, Effects and Criticality Analysis (FMECA), Dependence Diagram (DD), Markov Analysis (MA), Failure Modes and Effects Summary (FMES) and Common-Cause Analysis (CCA). According to Ref. [20], a safety analysis platform, xSAP, which integrates tools including FTA, FMECA, Failure Propagation Analysis (FPA) and CCA, is used in a joint R&D project involving the Boeing Company. The aircraft design handbook [6], published by the Aviation Industry Press of China, introduces reliability analysis tools for aircraft, including reliability prediction and allocation techniques as well as FMECA, FTA and CCA.
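As an illustration of the quantitative side of FTA, the sketch below evaluates the top-event probability of a small AND/OR fault tree. It assumes statistically independent basic events that appear only once in the tree; trees with repeated events need minimal cut sets or binary decision diagrams, and dependent failures are the subject of CCA. The tree and its probabilities are hypothetical.

```python
from math import prod

# A gate is a tuple ("AND" | "OR", [children]); a leaf is a basic-event name.


def top_event_probability(node, p: dict[str, float]) -> float:
    """Top-event probability of a fault tree whose basic events are
    statistically independent and appear only once in the tree."""
    if isinstance(node, str):
        return p[node]
    gate, children = node
    child_probs = [top_event_probability(child, p) for child in children]
    if gate == "AND":
        return prod(child_probs)                         # all inputs must fail
    if gate == "OR":
        return 1.0 - prod(1.0 - q for q in child_probs)  # at least one input fails
    raise ValueError(f"unknown gate type: {gate!r}")


# Hypothetical tree: loss of hydraulic supply if both pumps fail or the reservoir leaks.
tree = ("OR", [("AND", ["pump_A", "pump_B"]), "reservoir_leak"])
probs = {"pump_A": 1e-3, "pump_B": 1e-3, "reservoir_leak": 1e-5}
print(top_event_probability(tree, probs))
```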

In the following two subsections, we present some typical reliability design techniques for hardware and software in civil aviation, respectively. In Section 4.3, we introduce human reliability analysis techniques, which are extremely important for civil aviation, since human error is one of the largest contributors to civil aviation accidents [5,21].

4.1. Hardware reliability design techniques

In this section, we introduce two typical families of hardware reliability design techniques in civil aviation, i.e., fault-avoidance technologies and fault-tolerant technologies. Fault avoidance is discussed in Section 4.1.1, and fault tolerance in Section 4.1.2.

4.1.1. Hardware fault-avoidance technologies

Fault-avoidance technologies improve hardware reliability by reducing the probability that a failure occurs. Common fault-avoidance technologies include derating design [22], sneak circuit analysis [23], analysis of environmental conditions (thermal, altitude, vibration, Electro-Magnetic Compatibility (EMC) [24], etc.) and various fault protections.

Derating design, i.e., making devices and equipment operate at stress levels lower than their rated values [25], is a useful technology for improving component operational reliability, and is widely applied to both the electronic and the mechanical subsystems of aircraft. The European Cooperation for Space Standardization (ECSS) standard ECSS-Q-ST-30-11C [26] and the Chinese national military standard GJB/Z 35-93 [27] provide specifications for derating Electrical, Electronic and Electromechanical (EEE) devices. For aircraft structure design, factors of safety are considered as an alternative form of derating. The required factors of safety for materials commonly used in civil aviation can be found in CFR Title 14 Part 25 [3] by the Federal Aviation Administration (FAA) and CCAR-25 [4] by the Civil Aviation Administration of China (CAAC).
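A derating review essentially compares each part's worst-case applied stress with its rated value and an allowed stress ratio. The sketch below shows that bookkeeping; the parts and the allowed ratios are hypothetical placeholders, not values taken from ECSS-Q-ST-30-11C or GJB/Z 35-93. For the structural analogue it uses the general factor of safety of 1.5 that 14 CFR 25.303 applies to limit loads unless otherwise specified.

```python
from dataclasses import dataclass


@dataclass
class DeratingCheck:
    part: str
    parameter: str      # stressed parameter, e.g. "power" or "voltage"
    applied: float      # worst-case applied stress
    rated: float        # manufacturer's rated value
    max_ratio: float    # allowed applied/rated ratio (hypothetical, not from any standard)

    @property
    def stress_ratio(self) -> float:
        return self.applied / self.rated

    @property
    def compliant(self) -> bool:
        return self.stress_ratio <= self.max_ratio


def ultimate_load(limit_load: float, factor_of_safety: float = 1.5) -> float:
    """Structural counterpart of derating: design for limit load x factor of safety.
    1.5 is the general factor of safety of 14 CFR 25.303."""
    return limit_load * factor_of_safety


checks = [
    DeratingCheck("R12", "power", applied=0.10, rated=0.25, max_ratio=0.5),
    DeratingCheck("C3", "voltage", applied=40.0, rated=50.0, max_ratio=0.6),
]
for c in checks:
    status = "OK" if c.compliant else "VIOLATION"
    print(f"{c.part} {c.parameter}: {c.stress_ratio:.2f} of rated -> {status}")
print(ultimate_load(10_000.0))  # 15000.0
```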

Fig. 3 Reliability design process [17].

For avionics subsystems, sneak circuit analysis and various environmental tests are conducted to eliminate potential design flaws. The standard RTCA/DO-160G [28] provides standard procedures and test criteria for the environmental conditions of airborne equipment, covering EMC, temperature, altitude, vibration, sand/dust, power input, radio frequency susceptibility, lightning and electrostatic discharge; it has been applied, e.g., to the avionic systems of the Boeing 747-8 [29].

Various fault protection designs are also useful to prevent devices and equipment from failing. According to Ref. [30], protection designs are implemented in the electrical power system of the Boeing 777 to protect the system from dangerous temperature rises and potential failures. For example, differential current and unbalanced current sensors are used to protect the generator feeder conductors and the main bus, and a thermal disconnect mechanism is employed to protect the integrated drive generator from overheat-induced failures.

4.1.2. Hardware fault-tolerant technologies

Fault-tolerant technologies aim to maintain the system's normal operation even when failures or errors of one or more components within the system occur. Redundancy design [31] is a fundamental means of fault tolerance and has been applied in various critical devices and equipment of commercial airplanes. The Fly-By-Wire (FBW) system of the Boeing 777 provides triple redundancy for all hardware resources, including computing systems, airplane electrical power sources, hydraulic power and communication paths [32]. Redundancy design increases the mission reliability of commercial airplanes. However, Common-Cause Failures (CCFs) [33] are a severe threat to redundant systems, since a single cause can disable all the redundant channels at the same time.
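The threat that CCFs pose to redundancy can be quantified with the beta-factor model, a common CCF model that is not otherwise discussed in this paper: a fraction beta of each channel's failure probability is assumed to fail all channels simultaneously. The sketch below, with hypothetical numbers, applies it to a triple-redundant (1-out-of-3) function over a fixed exposure period.

```python
def one_out_of_three_failure_prob(q: float, beta: float) -> float:
    """Probability that a triple-redundant (1-out-of-3) function is lost.

    Beta-factor model: each channel's failure probability q is split into an
    independent part (1 - beta) * q and a common-cause part beta * q that
    fails all three channels at once.
    """
    q_ind = (1.0 - beta) * q
    q_ccf = beta * q
    # Lost if the common cause occurs OR all three channels fail independently
    # (inclusion-exclusion of the two independent events).
    return q_ccf + q_ind**3 - q_ccf * q_ind**3


for beta in (0.0, 0.01, 0.05):
    print(f"beta={beta}: {one_out_of_three_failure_prob(1e-3, beta):.2e}")
```

With a per-channel failure probability of 1e-3, the loss probability rises from about 1e-9 without CCF to about 5e-5 with beta = 0.05, so the common-cause term, rather than the level of redundancy, dominates.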

Other fault-tolerant technologies allow systems to maintain their functions through a procedure of failure detection, identification and accommodation [34]. In the aviation industry, such technologies have been widely applied in Fault-Tolerant Control Systems (FTCSs). Generally, FTCSs can be classified into Passive FTCSs (PFTCSs) and Active FTCSs (AFTCSs): the former are designed to maintain their function after a fault occurs without any modification of their structure or parameters, while the latter change the parameters or the structure of the control system (known as reconfigurable and restructurable control systems, respectively) [35]. Unlike a PFTCS, an AFTCS needs fault information obtained by Fault Detection and Diagnosis (FDD) to guide the reconfiguration; this will be discussed in detail in Section 8.1.

Applications of fault-tolerance techniques can be widely found in flight control systems. For example, fault-tolerant strategies have been applied to the primary flight computers of the Boeing 777 with respect to lane failures [36]. A reconfigurable linear parameter varying controller was implemented on the Boeing 747-100/200, which can remain operational in the presence of an elevator fault [34]. An autonomous architecture was implemented on the JPL/Boeing gyroscope, which is able to maintain system functionality in the presence of single hard errors [37].

4.2. Software reliability design technologies

Modern civil aviation depends on software to achieve most of its functions. The scale and complexity of software keep increasing as modern civil airplanes become more and more complex. For example, there are 14 million lines of code in the Boeing 787 [38]. Hence, software reliability has a strong influence on aircraft reliability. According to Ref. [39], software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. Because software, unlike hardware, is intangible, engineers usually have limited prior knowledge of the occurrence of software failures [40]. In the past decades, considerable efforts have been made to increase software reliability on commercial airplanes [41]. In this section, we briefly introduce some of the most typical techniques.
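Under the strong simplifying assumption of a constant failure intensity, the definition of software reliability above becomes R(t) = exp(-lambda t). The sketch below estimates lambda from hypothetical test data; it ignores reliability growth during testing and gives no confidence bound, which matters because zero observed failures would yield R = 1.

```python
import math


def software_reliability(failures: int, exposure_h: float, mission_h: float) -> float:
    """Point estimate of R(t) = exp(-lambda * t) for a mission of mission_h hours.

    Assumes a constant failure intensity lambda, estimated as failures observed
    per hour of testing under the operational profile (no reliability growth).
    """
    if exposure_h <= 0:
        raise ValueError("exposure time must be positive")
    lam = failures / exposure_h
    return math.exp(-lam * mission_h)


# Hypothetical data: 2 failures in 10,000 h of operational-profile testing, 5 h flight.
print(f"{software_reliability(2, 10_000.0, 5.0):.6f}")  # 0.999000
```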

4.2.1. Software fault-avoidance technologies

For software systems, fault avoidance is achieved by strictly following formal development guidelines, testing and validation procedures [25]. Formal methods, which are based on formal languages with precise rules, are widely recognized fault-avoidance technologies in software engineering [42]. By applying mathematically precise models and analysis procedures to the specification, design and analysis of software systems, formal methods can reduce the ambiguity and uncertainty that natural-language specifications introduce, and can prove whether the system design meets the users' requirements.
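The flavor of formal verification can be conveyed with a toy explicit-state model check: enumerate every state reachable from the initial state and test a safety invariant in each one. The controller below (landing-gear door logic, its state names and the invariant) is entirely hypothetical, and industrial tools rely on symbolic model checking or theorem proving rather than this brute-force search.

```python
from collections import deque
from typing import Callable, Optional

State = tuple[str, str]  # (door, gear)


def successors(s: State) -> list[State]:
    """Transition relation of a hypothetical landing-gear door/gear controller."""
    door, gear = s
    nxt: list[State] = []
    if door == "closed" and gear != "moving":
        nxt.append(("open", gear))                   # open the door
    if door == "open" and gear != "moving":
        nxt.append(("open", "moving"))               # start gear motion
        nxt.append(("closed", gear))                 # close the door
    if gear == "moving":
        nxt.extend([(door, "up"), (door, "down")])   # motion completes
    return nxt


def find_violation(init: State, invariant: Callable[[State], bool]) -> Optional[State]:
    """Explore every reachable state breadth-first; return one that violates
    the invariant, or None if the invariant holds in all reachable states."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None


def gear_moves_only_with_door_open(s: State) -> bool:
    door, gear = s
    return gear != "moving" or door == "open"


print(find_violation(("closed", "up"), gear_moves_only_with_door_open))  # None
print(find_violation(("closed", "up"), lambda s: s[1] != "down"))        # ('open', 'down')
```

Because the search is exhaustive, a `None` result is a proof for this finite model rather than a passing test, while a violated requirement comes back as a concrete reachable counterexample.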

4.2.2. Software fault-tolerant technologies

Like other man-made products, software contains errors; software fault-tolerant technologies are therefore important means of improving software reliability. Generally, software fault-tolerant technologies can be classified into two groups, i.e., single-version and multi-version [43]. Single-version fault-tolerant technologies add mechanisms to detect and recover from faults when designing the software. Multi-version fault-tolerant technologies, by contrast, use multiple versions of the software, developed by different designers, with different algorithms or with different design tools, to ensure that faults in one version do not cause system failures. A good tutorial on these techniques is given in Ref. [44].

N-version programming is a common multi-version fault-tolerant technology used in Flight Control Computer (FCC) design. In the Airbus A340, each primary (or secondary) flight control computer is partitioned into two different and independent channels. To avoid common-mode failures, different programming languages are used for the software design and development of different channels, i.e., assembly language for the control channels, PL/M for the monitoring channels of the primary computers and Pascal for the monitoring channels of the secondary computers [45]. The Boeing 777 employs a different approach to FCC redundancy design, in which the same programming language is used for the FCC software of all channels [45].
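The core of N-version programming is a voter that compares the outputs of independently developed versions. The sketch below shows a minimal majority voter with a numerical tolerance; the three "versions" are hypothetical stand-ins, and real flight control computers add signal synchronization, monitoring channels and channel disconnection logic that are not modelled here.

```python
from typing import Callable, Optional, Sequence


def majority_vote(versions: Sequence[Callable[[float], float]], x: float,
                  tolerance: float = 1e-6) -> Optional[float]:
    """Run independently developed versions on the same input. If some output
    agrees (within `tolerance`) with a strict majority of all outputs, return
    the mean of that majority; otherwise return None (an unmasked fault)."""
    outputs = [version(x) for version in versions]
    for candidate in outputs:
        agreeing = [o for o in outputs if abs(o - candidate) <= tolerance]
        if len(agreeing) > len(outputs) // 2:
            return sum(agreeing) / len(agreeing)
    return None


# Three hypothetical "versions" of the same computation; the third one is faulty.
def version_a(x: float) -> float:
    return 2.0 * x


def version_b(x: float) -> float:
    return x + x


def version_c(x: float) -> float:
    return 2.0 * x + 0.5  # injected fault


print(majority_vote([version_a, version_b, version_c], 3.0))  # 6.0
```

The faulty version is simply outvoted; if all three versions disagreed, the voter would return `None` and the fault would have to be handled at system level.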

4.3. Human reliability analysis techniques

In the history of civil aviation, accidents caused by human errors, including those of pilots, maintenance personnel and air traffic controllers, account for a large percentage of the total number of accidents [5,21]. Therefore, human reliability is an important aspect of civil aviation. Human Reliability Analysis (HRA) techniques widely used in civil aviation can be classified into first-, second- and third-generation methods and expert judgment methods [46], as shown in Table 3.

First Generation Methods (FGMs) quantify the likelihood of human errors by breaking tasks into parts and then considering the potential influence of Performance Shaping Factors (PSFs) such as training, experience, procedures and individual psychological and physiological stressors. Mitomo et al. [48] performed an HRA of an actual aircraft accident that occurred in Japan using a representative FGM, i.e., the THERP method. FGMs are widely used in quantitative risk assessments, but are often criticized for not considering factors such as the impact of context, organizational factors and errors of commission.
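The decomposition idea behind FGMs such as THERP can be sketched as follows: each task step receives a nominal human error probability adjusted by PSF multipliers, and the step probabilities are combined into a task failure probability. The numbers are invented rather than taken from the THERP tables, and the steps are assumed independent with no recovery, whereas THERP itself models dependence between steps and recovery factors.

```python
import math


def step_hep(nominal_hep: float, psf_multipliers: list[float]) -> float:
    """Nominal human error probability of one step adjusted by PSF multipliers, capped at 1."""
    return min(1.0, nominal_hep * math.prod(psf_multipliers))


def task_failure_probability(step_heps: list[float]) -> float:
    """Probability that at least one step fails, assuming independent steps
    with no recovery (a simplification of a THERP event tree)."""
    return 1.0 - math.prod(1.0 - h for h in step_heps)


# Hypothetical three-step task; all numbers are illustrative, not THERP table values.
steps = [
    step_hep(0.003, [2.0]),   # read checklist item under high workload
    step_hep(0.001, [1.0]),   # set selector
    step_hep(0.010, [0.5]),   # cross-check indication, well-trained crew
]
print(f"{task_failure_probability(steps):.4f}")  # 0.0120
```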

Second Generation Methods (SGMs) emerged in the 1990s in an attempt to consider operational factors in human error prediction. Alvarenga et al. [47] stated that the purpose of SGMs is to establish a mapping between PSFs and the cognitive error mechanisms influenced or triggered in a given operational context. In SGMs such as ATHEANA [49] and CREAM [50], tables are established to show the relationships between PSFs, cognitive error mechanisms and specific human error types associated with operational contexts in each stage of human information processing, i.e., detection, diagnosis, decision making and action. Lin et al. [51] applied CREAM to the HRA of a carrier-based aircraft recovery procedure. Alvarenga et al. [47] argued that both FGMs and SGMs fail to model organizational factors (especially political, economic and normative ones) and nonlinear interactions among PSFs, error mechanisms and human errors at the individual and group levels. In this regard, they recommended two modern HRA approaches based on nonlinear models, i.e., the Functional Resonance Accident Model (FRAM) [52] and the Systems-Theoretic Accident Model and Processes (STAMP) [53], which use system-theoretic concepts to model nonlinear interactions.

New methods that emerged from FGMs, such as HEART [54], are known as Third Generation Methods (TGMs). Maguire [55] conducted an HRA of aircraft landing tasks using HEART. The procedure is as follows: (A) a task hierarchy is constructed and possible error sources are identified through Hierarchical Task Analysis (HTA), test-pilot interviews, procedural analysis and goal analysis; (B) a fault tree is developed for each task segment in the task hierarchy; (C) HEART analysis is performed to obtain the likelihood of error for each aircrew task in the fault tree, determined by the nominal human unreliability and the associated error multiplier effects.
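Step (C) relies on the HEART quantification rule, in which the nominal human unreliability of the generic task type is multiplied, for each applicable Error-Producing Condition (EPC), by (EPC - 1) x APOA + 1, where APOA is the assessed proportion of affect. The sketch below implements that rule; the nominal value and the EPC/APOA pairs are hypothetical choices for illustration, not Maguire's figures.

```python
import math


def heart_hep(nominal_unreliability: float, epcs: list[tuple[float, float]]) -> float:
    """HEART quantification: HEP = NHU * prod((EPC_i - 1) * APOA_i + 1), capped at 1.

    epcs holds pairs (EPC_i, APOA_i): the maximum multiplier of an
    error-producing condition and the assessed proportion of affect in [0, 1].
    """
    multiplier = math.prod((epc - 1.0) * apoa + 1.0 for epc, apoa in epcs)
    return min(1.0, nominal_unreliability * multiplier)


# Hypothetical landing sub-task; values chosen for illustration only.
hep = heart_hep(0.003, [(11.0, 0.4),    # e.g. shortage of time
                        (3.0, 0.5)])    # e.g. limited experience
print(f"{hep:.3f}")  # 0.030
```

The resulting HEP would then serve as a basic-event probability in the fault tree of step (B).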

Table 3 HRA methods by category [47].

Methods that provide a structured way for experts to evaluate the likelihood of human errors in a specific operational context are classified as expert judgment-based methods. Chen and Huang [56] developed a Bayesian Network (BN) model for the HRA of a visual inspection task in aviation maintenance using expert opinions and data from accident reports, where the Human Factors Analysis and Classification System-Maintenance Extension (HFACS-ME) [57] is used to identify the critical influencing factors of visual inspection. Cacciabue et al. [58] applied the system response generator concept to the HRA of pilot-airplane interaction during the approach phase of a Boeing 747 landing, and compared the results of the proposed method with those of a classical THERP analysis.
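To show how a BN turns expert-elicited conditional probabilities into both predictions and diagnoses, here is a two-parent toy network evaluated by exact enumeration. The structure (fatigue and poor lighting influencing a missed defect) and all probabilities are invented for illustration and are not taken from Chen and Huang's model.

```python
from itertools import product

# Hypothetical two-parent network for a visual-inspection task (numbers are invented).
P_FATIGUE = {True: 0.2, False: 0.8}
P_POOR_LIGHTING = {True: 0.3, False: 0.7}
# Conditional probability table: P(defect missed | fatigue, poor lighting)
P_MISS = {(True, True): 0.30, (True, False): 0.15,
          (False, True): 0.10, (False, False): 0.02}


def joint(fatigue: bool, poor_lighting: bool, missed: bool) -> float:
    """Joint probability given by the chain rule of the network."""
    p_miss = P_MISS[(fatigue, poor_lighting)]
    return (P_FATIGUE[fatigue] * P_POOR_LIGHTING[poor_lighting]
            * (p_miss if missed else 1.0 - p_miss))


# Prediction: marginal probability that a defect is missed (sum over both parents).
p_missed = sum(joint(f, light, True) for f, light in product((True, False), repeat=2))
# Diagnosis: posterior probability of fatigue given that a defect was missed.
p_fatigue_given_missed = sum(joint(True, light, True) for light in (True, False)) / p_missed
print(f"P(missed) = {p_missed:.4f}, P(fatigue | missed) = {p_fatigue_given_missed:.3f}")
```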

5. Verification and validation of reliability

After the iterative process of reliability analysis and design is completed, prototypes of the airborne equipment are tested to verify that they meet the reliability requirements. Handbooks and standards for reliability testing have been developed by national and international bodies. MIL-HDBK-781A [59] in the US provides typical test plans, test methods and environmental profiles for the design and implementation of reliability test programs for system development, qualification and production. GJB 899A-2009 [60] in China provides guidance on environmental test conditions, statistical test plans, parameter estimation methods and procedures for reliability qualification tests and reliability acceptance tests. The International Electrotechnical Commission (IEC) standards on reliability testing, i.e., IEC 61123 [61] and IEC 61124 [62], are also references for conducting reliability demonstration tests.
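The simplest reliability demonstration plan is a fixed-duration test in which no failures are allowed. Assuming exponentially distributed times to failure, the sketch below computes the accumulated test time required; the plans in MIL-HDBK-781A and IEC 61124 also allow failures and explicitly balance producer's and consumer's risks, which this sketch does not.

```python
import math


def zero_failure_test_time(required_mtbf_h: float, confidence: float) -> float:
    """Total test time needed to demonstrate MTBF >= required_mtbf_h at the given
    confidence when no failures are allowed, assuming exponential lifetimes.

    If the true MTBF equalled the requirement, P(no failure in T) = exp(-T / MTBF);
    choosing T so that this probability is at most 1 - confidence gives
    T = -MTBF * ln(1 - confidence).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return -required_mtbf_h * math.log(1.0 - confidence)


print(f"{zero_failure_test_time(1000.0, 0.90):.0f} h")  # 2303 h
```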

For highly reliable products, tests in normal environmental and operational conditions can hardly expose failures within limited product development times, which makes statistical inference on system reliability difficult. For such products, accelerated testing [9], which imposes a test environment more severe than that experienced in normal operation, is conducted to obtain more information on system reliability. The system life is then predicted using acceleration models [63] with accelerated life data obtained from Accelerated Life Tests (ALTs) [64,65] or accelerated degradation data obtained from Accelerated Degradation Tests (ADTs) [66,67]. As general guidance, IEC 62506:2013 [68] provides typical methods for accelerated tests.
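For temperature-driven failure mechanisms, one widely used acceleration model is the Arrhenius relation. The sketch below computes the resulting acceleration factor; the activation energy and temperatures are hypothetical, and whether the Arrhenius model applies at all depends on the failure mechanism under test.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Ratio of life at the use temperature to life at the stress temperature
    under the Arrhenius model: AF = exp(Ea/k * (1/T_use - 1/T_stress)), T in kelvin."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))


# Hypothetical activation energy of 0.7 eV; use at 55 degC, test at 125 degC.
af = arrhenius_acceleration_factor(0.7, 55.0, 125.0)
print(f"AF = {af:.1f}; 1000 test hours ~ {1000 * af:.0f} use hours")
```

With these assumptions the factor is roughly 78, so each hour at 125 °C stands in for about 78 hours at 55 °C, provided the elevated stress does not activate failure mechanisms absent in normal use.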

6. Quality assurance

A prerequisite for high field reliability is that quality assurance is well implemented in the manufacturing phase, so that the produced structures, components and systems of the airplane maintain the reliability levels achieved in the design and development phase. Various techniques have been applied to assure quality in the manufacturing phase of civil airplanes, e.g., Quality Function Deployment (QFD), the Taguchi method, Statistical Process Control (SPC) and Design of Experiments (DOE) [69]. These quality control techniques have been organized into different quality management systems, such as Total Quality Management (TQM) [70], ISO 9000 [71], lean manufacturing [72] and Six Sigma [73], in order to achieve continuous improvement of quality in the manufacturing phase. Basically, continuous quality improvement is based on Deming's Plan-Do-Check-Act (PDCA) cycle. In the Aerospace Standard AS9100D [74] of SAE International (formerly the Society of Automotive Engineers), PDCA is adopted together with risk-based thinking for the quality management of processes and systems.
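As an example of SPC, the sketch below builds a Shewhart X-bar chart from a baseline of subgroups using the average-range estimate, and then flags new subgroups whose means fall outside the control limits. The measurements are hypothetical; the A2 constants are the standard tabulated values for subgroup sizes 2 to 5.

```python
from statistics import mean

# Shewhart X-bar chart constant A2 for subgroup sizes 2-5 (standard SPC tables).
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}


def xbar_limits(subgroups: list[list[float]]) -> tuple[float, float, float]:
    """(LCL, centre line, UCL) of an X-bar chart estimated from the average range."""
    n = len(subgroups[0])
    if n not in A2 or any(len(g) != n for g in subgroups):
        raise ValueError("subgroups must all have the same size, between 2 and 5")
    centre = mean(mean(g) for g in subgroups)
    r_bar = mean(max(g) - min(g) for g in subgroups)
    return centre - A2[n] * r_bar, centre, centre + A2[n] * r_bar


def out_of_control(subgroups: list[list[float]],
                   limits: tuple[float, float, float]) -> list[int]:
    """Indices of subgroups whose mean falls outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, g in enumerate(subgroups) if not lcl <= mean(g) <= ucl]


# Hypothetical drilled-hole diameters (mm), four parts per subgroup.
baseline = [[10.01, 9.99, 10.00, 10.02], [9.98, 10.00, 10.01, 9.99],
            [10.00, 10.02, 9.99, 10.01], [10.00, 9.99, 10.01, 10.00]]
limits = xbar_limits(baseline)
new_production = [[10.00, 10.01, 9.99, 10.00], [10.04, 10.05, 10.03, 10.04]]
print(out_of_control(new_production, limits))  # [1]
```

Subgroup 1 drifts above the upper control limit, which in a PDCA loop would trigger the "Check" and "Act" steps on the drilling process.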

A typical example of a quality management system in the civil aviation industry is that of Boeing, which has a mature quality system for quality control and continuous improvement. In the 1990s, Boeing established an Advanced Quality System (AQS) for its suppliers and published the standard series D1-9000, which was one of the references for the SAE AS9000 series. The current Boeing quality management system requirements for suppliers [75] specify Boeing's requirements on its suppliers' quality management systems. The document is supported by the SAE standards AS9100D [74], AS9110C [76], AS9120B [77], AS9103A [78], etc.

Another example of a quality management system in civil aviation is that of COMAC. COMAC has established its quality management system based on CAAC standards, including CCAR-21 [79] and AP-21-04 [80], and on the IAQG standards AS9100-9120 [81]. Currently, the design and manufacture of the airborne systems of the ARJ21 and C919 airplanes are subcontracted to domestic and international system suppliers, the latter accounting for more than 90% [82]. Therefore, a major task of COMAC's quality management for the ARJ21 and C919 is the management and auditing of its suppliers. In addition to third-party certification of the suppliers, COMAC focuses on auditing product implementation processes, including customer requirement management, product planning, design, development, procurement, production and service processes, according to AS9100C [82].

7. Maintenance techniques

After a commercial airplane is launched and enters the operation phase, maintenance activities are needed to make sure that its performance remains as expected by design and that the airplane achieves high operational reliability and availability. In this section, we first examine some typical maintenance policies for civil aviation (Section 7.1); then, we introduce in detail the Reliability-Centered Maintenance (RCM) concept (Section 7.2), which is widely applied in the maintenance of modern commercial airplanes; finally (Section 7.3), we introduce Virtual Maintenance (VM) technology and its application to the maintainability design of civil airplanes.

7.1. Maintenance policies

Over the past decades, maintenance policies and methodologies have evolved significantly along with the growth of technology. In the literature, various maintenance policies have been developed and classified from different perspectives [83]. Ding and Kamaruddin [84] classified maintenance policies into five types:

(1) Corrective Maintenance (CM) policy, in which maintenance is conducted only after a failure occurs, with the sole purpose of restoring (repairing or replacing) the failed components/systems [85];

(2) Preventive Maintenance (PM) policy, also known as Time-Based Maintenance (TBM), which aims to reduce the probability of failure and keep the system in proper operating condition by conducting maintenance at predetermined intervals or according to prescribed criteria [86];

(3) Condition-Based Maintenance (CBM) policy, also known as Predictive Maintenance (PdM), which is carried out according to monitoring data on the actual condition of the system and also aims to prevent the occurrence of failures;

(4) Autonomous Maintenance (AM) policy, in which maintenance and production departments cooperate to accomplish the maintenance tasks, so that maintenance becomes a partnership involving every person in the manufacturing organization;

(5) Design Out Maintenance (DOM) policy, which not only concerns system maintenance but also aims to improve the system design for easier and more ergonomic maintenance and operation.

Each of the above maintenance policies has its advantages and limitations; in other words, no single policy suits all types of systems in the civil aviation industry. As the earliest implemented maintenance policy, CM is easy to conduct in practice. However, CM often results in long equipment downtime, large economic losses and, sometimes, disastrous consequences due to sudden failures of critical components [87]. Through periodic preventive maintenance, PM can effectively extend the Remaining Useful Life (RUL) of the system, but determining the optimal maintenance intervals remains a challenging problem. In practice, it often requires a large amount of historical data and abundant experience from maintenance planners [88,89]. CBM can keep the system in healthy states efficiently. However, it relies on accurate condition monitoring, fault diagnosis and prognosis, which are not always easily achievable in practice [90]. AM and DOM provide solutions for the continuous improvement of system operation and maintenance, but require high-level knowledge and skills from operation and maintenance staff. For a specific component or system in the civil aviation industry, trade-offs need to be made between anticipated operational reliability and constraints on costs and resources in order to determine an appropriate maintenance strategy.
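The difficulty of choosing a PM interval can be illustrated with the classical age-replacement model, which is not discussed in the paper: an item is replaced preventively at age T (cost c_p) or correctively at failure (cost c_f), and T is chosen to minimize the long-run cost per operating hour. The sketch assumes a Weibull life distribution with hypothetical parameters; in practice, those parameters must be estimated from exactly the kind of historical data mentioned above.

```python
import math
from collections.abc import Iterable


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    return math.exp(-((t / scale) ** shape))


def cost_rate(T: float, shape: float, scale: float, c_p: float, c_f: float,
              steps: int = 2000) -> float:
    """Long-run cost per operating hour of age replacement at interval T:
    [c_p * R(T) + c_f * (1 - R(T))] / integral_0^T R(t) dt (trapezoidal rule)."""
    h = T / steps
    r = [weibull_reliability(i * h, shape, scale) for i in range(steps + 1)]
    expected_cycle_length = h * (sum(r) - 0.5 * (r[0] + r[-1]))
    r_T = r[-1]
    return (c_p * r_T + c_f * (1.0 - r_T)) / expected_cycle_length


def best_interval(shape: float, scale: float, c_p: float, c_f: float,
                  candidates: Iterable[float]) -> float:
    """Grid search for the replacement interval with the lowest cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, shape, scale, c_p, c_f))


# Hypothetical wear-out item: Weibull shape 2.5, scale 5000 h; a failure costs 10x a planned replacement.
T_opt = best_interval(2.5, 5000.0, c_p=1.0, c_f=10.0, candidates=range(250, 10_001, 250))
print(T_opt, cost_rate(T_opt, 2.5, 5000.0, 1.0, 10.0))
```

A finite optimum exists here because the Weibull shape parameter exceeds 1 (wear-out). For items with random failures (shape 1), the cost rate decreases monotonically in T and no preventive replacement pays off, which echoes the RCM findings discussed in Section 7.2.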

7.2. Reliability-centered maintenance

Another aspect that influences a company’s maintenance strategy decision is its needs and its Maintenance Concept (MC), which reflects how the company recognizes the role of maintenance as an operational function [91]. Waeyenbergh and Pintelon [91] compared some of the most important MCs in the literature, including RCM, Business-Centered Maintenance (BCM), Total-Productive Maintenance (TPM) and several “lifecycle cost” approaches. The fundamental difference between these MCs lies in their optimization objectives. BCM, TPM and the “lifecycle cost” approaches aim, respectively, at maximizing profitability, maximizing equipment effectiveness and optimizing total maintenance cost. RCM, on the other hand, focuses on preserving the required system reliability at the lowest possible cost, which makes it more suitable for the aircraft industry and other high-risk industries, such as oil and gas fields and nuclear power plants.

The maintenance concept of RCM was first proposed by Nowlan and Heap in the 1960s [92], after the scheduled overhaul strategy was found not to be cost-effective for the then “new” Boeing 747. Their report found that only 11% of the components showed a degrading failure characteristic that would justify a scheduled maintenance program, while the rest exhibited random failure characteristics, which cannot be prevented by scheduled overhaul or replacement. On this basis, RCM was proposed as a systematic approach to creating a cost-effective maintenance strategy that preserves critical system functions. An important aspect of the RCM philosophy is to prioritize components and systems according to the criticality of the consequences of their failures. According to these priority levels, maintenance policies are selected for the dominant causes of the preventable failures [93].
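Consequence-based prioritization can be sketched as follows. The sketch assumes a hypothetical four-level severity scale and hypothetical per-flight-hour failure rates. Failure modes are ranked first by the severity of their consequences and only then by how often they occur, so that a frequent but minor failure never outranks a rare catastrophic one. Real RCM analyses use the severity categories and consequence classes of the applicable standard.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    item: str
    mode: str
    severity: int        # hypothetical scale: 1 = minor ... 4 = catastrophic
    rate_per_fh: float   # hypothetical failure rate per flight hour


def prioritize(modes):
    """Rank failure modes by consequence severity first, then by failure rate."""
    return sorted(modes, key=lambda m: (m.severity, m.rate_per_fh), reverse=True)


modes = [
    FailureMode("hydraulic pump", "internal leakage", 3, 2e-5),
    FailureMode("cabin reading light", "lamp failure", 1, 1e-3),
    FailureMode("fuel quantity sensor", "drift", 3, 5e-5),
    FailureMode("landing gear actuator", "seal failure", 4, 1e-6),
]
ranked = prioritize(modes)
for rank, m in enumerate(ranked, start=1):
    print(rank, m.item, m.mode, f"severity={m.severity}", f"rate={m.rate_per_fh:g}/FH")
```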

In 1999, SAE International issued the standard JA1011 [94], which provides a formal definition of the RCM process for civil aviation and comprises the following steps: (A) define the functions of each asset in its operating context and the associated desired performance; (B) identify possible failures that could impair the critical functions; (C) identify the causes of the failures; (D) identify the consequences of the failures; (E) select effective and applicable maintenance tasks to prevent, detect or respond to the onset of failures. The implementation of RCM is a systematic process that requires a set of techniques to fulfill each of these steps. Siddiqui and Ben-Daya [93] presented a detailed introduction to the RCM methodology, covering system selection and information collection, system boundary definition, system description and functional block diagrams, system functions and functional failures, FMECA, logic decision tree analysis and task selection [95]. Optimization methods are often used for maintenance decision making. Ding and Kamaruddin [84] reviewed the maintenance policy optimization models in the literature and classified them according to the degree of uncertainty they consider. Piasson et al. [96] proposed a multi-objective model to optimize the RCM plan of an electric power distribution system, where an optimized Pareto frontier was derived using a nondominated sorting genetic algorithm. RCM is widely applied in maintenance planning in civil aviation. The maintenance planning handbooks MSG-1, MSG-2 and MSG-3, issued through the Air Transport Association’s Maintenance Steering Group with Boeing’s participation, were implemented in the development of the Boeing 747, 757 and 767. These handbooks have become paradigms of RCM for the development of commercial aircraft and other industrial systems [93].
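The Pareto frontier mentioned above is the set of nondominated plans. The sketch below shows only the dominance test at the heart of nondominated sorting, not the genetic operators of the algorithm used by Piasson et al. [96]. Each hypothetical plan is scored by annual cost (to minimize) and system reliability (to maximize).

```python
def dominates(a, b):
    """Plan a dominates plan b if it is no worse in both objectives
    (cost lower or equal, reliability higher or equal) and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(plans):
    """Keep only the plans that no other plan dominates."""
    return {
        name: score
        for name, score in plans.items()
        if not any(dominates(other, score)
                   for other_name, other in plans.items() if other_name != name)
    }


# Hypothetical maintenance plans: name -> (annual cost in k$, system reliability)
plans = {
    "A": (100, 0.990),
    "B": (120, 0.995),
    "C": (130, 0.993),   # costs more than B and is less reliable
    "D": (90, 0.985),
    "E": (150, 0.999),
}
front = pareto_front(plans)
print(sorted(front))
```

Plan C is discarded because B is both cheaper and more reliable. The remaining plans are genuine trade-offs, among which the decision maker must choose.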

7.3. Virtual maintenance technology

Virtual Maintenance (VM) refers to carrying out maintenance and maintainability activities in computer-generated virtual environments using Virtual Reality (VR) technology, and is widely applied in the civil aviation industry to support maintenance design and planning [97]. Using VM, the entire maintenance process of a product over its lifecycle can be simulated in the design and development phase [98], so that maintainability design, analysis, evaluation and optimization can be performed at an early design stage. Thus, design flaws that might affect maintainability characteristics, e.g., the accessibility of components, can be discovered and corrected in time, which can significantly reduce product lifecycle costs and shorten the design and development cycle [99].

VM technologies have already been widely used in the civil aviation industry to support maintainability design. Zhou et al. [97] proposed an object Petri net model to describe the VM process, and applied the model to the disassembly of an aircraft parameter recording device. Liu et al. [100] presented a path planning algorithm for the VM of aircraft components to reduce contact and collision problems in the disassembly process. Amundarain et al. [101] developed a haptic system, REVIMA (Virtual Reality for Maintainability), for maintainability simulation of aircraft engines, replacing a costly physical mock-up. Bowling et al. [102] developed a VM simulation model for aircraft cargo bay inspection processes and compared the performance of different inspection methods. The use of a virtual maintenance system, FlyThru, greatly reduced assembly and systems problems in the Boeing 777 compared with its predecessors [103].
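Disassembly path planning can be illustrated on a toy problem: finding a collision-free route for a part through a 2D occupancy grid that stands in for a cross-section of an equipment bay. The sketch uses plain breadth-first search, a much simpler technique than the algorithm of Liu et al. [100]. Real VM tools plan in 3D, covering both position and orientation, against full CAD geometry. The grid and coordinates are hypothetical.

```python
from collections import deque


def extraction_path(grid, start, goal):
    """Shortest 4-connected path through free cells ('.') of an occupancy grid,
    never entering occupied cells ('#'); returns None if the part is trapped."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}                 # visited cells -> predecessor on the path
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                 # walk the predecessors back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical bay cross-section: '#' = structure or other equipment, '.' = free space
bay = [
    "#########",
    "#...#...#",
    "#.#.#.#.#",
    "#.#...#.#",
    "#########",
]
path = extraction_path(bay, start=(1, 1), goal=(3, 7))
print(path)
```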

Another application of VM technology is the training of maintenance personnel. Aircraft maintenance is hazardous work, and improper maintenance can lead to catastrophic consequences. In the early days, maintenance training was conducted on mock-ups or real airplanes, which was costly and could expose trainees to hazardous situations. Maintenance training in a virtual environment, by contrast, is more economical and much safer [104]. Christian et al. [105] pointed out that VM technology and Augmented Reality (AR) technology [106], which combines the virtual and real worlds, have great potential for the technical training of operation and maintenance personnel. Zou et al. [107] designed a virtual maintenance training system for airborne electronic equipment, which supports training tasks for avionics engineers, radar engineers and avionics repairers.

8. Fault diagnosis and prognosis techniques

Fault diagnosis and prognosis techniques are applied in the operation phase to gain data and knowledge on system states, faults and failures, in order to support maintenance activities and on-board emergency handling. Since the 1980s, fault diagnosis techniques have been applied in the civil aviation industry, mainly through the design and implementation of Built-In Tests (BITs) [108].

As the complexity of airborne systems has increased, advanced fault diagnosis technologies based on a more in-depth understanding of component failure mechanisms and system failure propagation processes have been developed and applied. In the 1990s, NASA first introduced the concept of Integrated Vehicle Health Management (IVHM) [109], a comprehensive system that integrates software, sensors, intelligent diagnosis, digital communication and system integration to support aircraft-level fault diagnosis, prediction and health management. IVHM has already been successfully applied in commercial airplanes and military aircraft, e.g., the Crew Information System and Maintenance System (CIS/MS) of the Boeing 787 [110] and the Prognostics and Health Management (PHM) system of the Joint Strike Fighter (JSF) F-35 [111]. In the following subsections, we first briefly introduce the architecture of IVHM and the key methods it uses (Sections 8.1 and 8.2), and then survey the application of IVHM on civil airplanes (Section 8.3).

8.1. IVHM architecture

The standard Open System Architecture for Condition Based Maintenance (OSACBM) [112] is a typical architecture for IVHM systems, as shown in Fig. 4. An OSACBM consists of seven functional blocks: data acquisition, data manipulation, state detection, health assessment, prognosis assessment, decision support and presentation. The seven blocks work together to support the three main tasks of Airplane Health Management (AHM) systems: (A) system health monitoring, (B) fault diagnosis and prognosis and (C) control and management, also shown in Fig. 4. System health monitoring uses sensor networks distributed in critical subsystems of the aircraft, such as the engines, electromechanical systems, structures and hydraulic systems, to acquire data on aircraft state and performance. Then, various fault diagnosis and prognosis tools are used to transform the data into useful information, conduct analysis and provide knowledge of the system health state. Fault diagnosis provides information on the location and modes of failures, whereas prognosis provides information on the predicted RUL of a component or system. Finally, based on the results of fault diagnosis and prognosis, decisions are made regarding aircraft emergency handling, maintenance and, sometimes, reconfiguration or reconstruction in the AFTCSs of the aircraft control systems, as described in Section 4.1.
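The division of labour among the seven blocks can be sketched as a chain of functions, one per block. This is a hedged illustration only. The OSACBM standard specifies the blocks and their interfaces rather than particular algorithms, and the feature (RMS), the linear-trend prognosis, the limits and the data below are hypothetical choices.

```python
import math


def data_acquisition(raw_counts, scale):
    """Block 1: convert raw sensor counts into engineering units."""
    return [count * scale for count in raw_counts]


def data_manipulation(samples):
    """Block 2: extract a health feature, here the root-mean-square (RMS) value."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def state_detection(feature, alert_limit):
    """Block 3: compare a feature with its operational limit."""
    return feature > alert_limit


def health_assessment(feature_history, alert_limit):
    """Block 4: declare the component degraded if the latest feature exceeds the limit."""
    return "degraded" if state_detection(feature_history[-1], alert_limit) else "healthy"


def prognosis_assessment(feature_history, failure_limit):
    """Block 5: RUL (in flights) by extrapolating a least-squares linear trend
    of the feature to the failure limit."""
    n = len(feature_history)
    x_mean = (n - 1) / 2
    y_mean = sum(feature_history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(feature_history)) / sxx
    if slope <= 0:
        return math.inf                  # no degradation trend detected
    intercept = y_mean - slope * x_mean
    return max(0.0, (failure_limit - intercept) / slope - (n - 1))


def decision_support(health, rul, planning_horizon):
    """Block 6: recommend maintenance if degraded or failure is expected within the horizon."""
    return health == "degraded" or rul <= planning_horizon


def presentation(health, rul, act):
    """Block 7: human-readable summary for crew or maintenance staff."""
    return f"health={health}, RUL~{rul:.1f} flights, schedule maintenance: {act}"


# Hypothetical per-flight vibration records (raw ADC counts), one list per flight
flights = [[10, -10, 10, -10], [11, -11, 11, -11], [12, -12, 12, -12],
           [13, -13, 13, -13], [14, -14, 14, -14]]
history = [data_manipulation(data_acquisition(f, scale=0.1)) for f in flights]
health = health_assessment(history, alert_limit=1.35)
rul = prognosis_assessment(history, failure_limit=2.0)
act = decision_support(health, rul, planning_horizon=10)
print(presentation(health, rul, act))
```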

8.2. Fault diagnosis and prognosis methods

Fig. 4 OSACBM architecture and AHM process.

The most important task in an AHM system is Fault Diagnosis and Prognosis (FDP). In this subsection, we present some typical FDP methods widely used in the civil aviation industry. Generally, FDP methods can be classified into data-driven methods and model-based methods, the latter also known as physics-based methods [113]. Model-based methods use mathematical models of the degradation processes leading to failure to predict the evolution of the system state, whereas data-driven methods rely empirically on observed process data related to system degradation and failure states, without resorting to any explicit model [114]. Esperon-Miguez et al. [108] reviewed the state of the art of FDP methods for IVHM and summarized the challenges of applying them in civil airplanes from both technical and organizational perspectives. The advantages and limitations of typical model-based and data-driven methods, including traditional statistical methods (also known as experience-based methods), data mining techniques [115] and Artificial Intelligence (AI) techniques [116], are also discussed in that review.

As shown in Fig. 4, fault diagnosis and prognosis are related to the functional blocks of Data Manipulation (DM), State Detection (SD), Health Assessment (HA) and Prognostics Assessment (PA). The primary function of DM is to perform single- and/or multi-channel signal transformations: signals acquired from the sensors are processed with feature extraction algorithms [117,118] to extract or accentuate signal features representing system or component health. The SD block outputs indicators related to the system or component state by processing data from DM or other SD blocks, so that system health features can be estimated and compared with their expected values or operational limits.
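The DM and SD blocks can be illustrated with a hedged sketch. It computes three time-domain features commonly used for vibration signals (RMS, crest factor and kurtosis), then flags any feature lying more than k standard deviations from a healthy baseline. The signals are synthetic: a pure sine, and the same sine with periodic impulses as a crude stand-in for a localized bearing defect. The baseline statistics are hypothetical and are not taken from Refs. [117,118].

```python
import math
import statistics


def extract_features(signal):
    """Data manipulation: time-domain health features of a vibration signal."""
    mean = statistics.fmean(signal)
    centred = [x - mean for x in signal]
    m2 = statistics.fmean([x ** 2 for x in centred])   # variance
    m4 = statistics.fmean([x ** 4 for x in centred])   # fourth central moment
    rms = math.sqrt(m2)
    peak = max(abs(x) for x in centred)
    return {"rms": rms, "crest_factor": peak / rms, "kurtosis": m4 / m2 ** 2}


def detect_state(features, baseline_mean, baseline_std, k=3.0):
    """State detection: flag features more than k standard deviations away
    from their healthy baseline (a simple z-score rule)."""
    return {name: abs(value - baseline_mean[name]) / baseline_std[name] > k
            for name, value in features.items()}


healthy = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
faulty = [x + (5.0 if i % 50 == 0 else 0.0) for i, x in enumerate(healthy)]

baseline_mean = {"rms": 0.707, "crest_factor": 1.41, "kurtosis": 1.5}   # hypothetical
baseline_std = {"rms": 0.05, "crest_factor": 0.1, "kurtosis": 0.1}      # hypothetical

healthy_flags = detect_state(extract_features(healthy), baseline_mean, baseline_std)
faulty_flags = detect_state(extract_features(faulty), baseline_mean, baseline_std)
print("healthy:", healthy_flags)
print("faulty: ", faulty_flags)
```

Kurtosis is a common choice for impulsive faults because periodic impacts inflate the fourth moment much more than the variance, while a clean sinusoid has a kurtosis of 1.5.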

Diagnosis is performed in the HA block. Traditional fault diagnosis is conducted by BIT systems, which can provide health degradation information. Nowadays, advanced diagnosis tools are developed using data mining techniques, AI techniques, failure propagation models, expert systems, etc. Skormin et al. [119] developed a data mining model for avionics fault prognostics using historical data related to environmental and operational conditions. Rojas and Nandi [120] presented a practical and efficient fault classification scheme for rolling-element bearings based on Support Vector Machines (SVMs). Joly et al. [121] proposed an Artificial Neural Network (ANN)-based diagnostic tool for Rolls-Royce engines with a three-level diagnosis structure, where the top level distinguishes between single-component and double-component faults, the middle level identifies the faulty components and the bottom level estimates the parameters of the engine components. Ofsthun and Abdelwahed [122] introduced a Timed Failure Propagation Graph (TFPG)-based reasoner for real-time failure diagnosis and applied it to fuel systems in single- and multiple-failure scenarios. Long and Wang [123], Lopez and Sarigul-Klijn [124], and Chen and Chen [125] investigated fault diagnosis expert systems for aircraft fuel systems, structural damage and embedded airborne electronic equipment, respectively.
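The idea behind failure-propagation-graph reasoning can be sketched as a coverage check: a root cause is a candidate if every observed alarm is reachable from it in the propagation graph. This deliberately drops the timing windows and multiple-failure handling of the TFPG reasoner of Ofsthun and Abdelwahed [122]. The fuel-system nodes are hypothetical.

```python
def reachable(graph, source):
    """All nodes reachable from `source` along propagation edges (depth-first)."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def diagnose(graph, root_causes, alarms):
    """Root causes whose downstream effects include every observed alarm."""
    return [rc for rc in root_causes if alarms <= reachable(graph, rc)]


# Hypothetical fuel-system propagation graph: failure mode -> effects it causes
fpg = {
    "pump_failure": ["low_pressure"],
    "filter_clog": ["low_pressure", "high_filter_dp"],
    "low_pressure": ["low_flow_alarm"],
    "valve_stuck": ["low_flow_alarm"],
    "high_filter_dp": ["filter_bypass_alarm"],
}
roots = ["pump_failure", "filter_clog", "valve_stuck"]
print(diagnose(fpg, roots, {"low_flow_alarm"}))
print(diagnose(fpg, roots, {"low_flow_alarm", "filter_bypass_alarm"}))
```

A single low-flow alarm leaves three candidates, while the additional filter-bypass alarm isolates the clogged filter. Adding observations is exactly how such reasoners narrow down the ambiguity group.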

Prognostics is conducted in the PA block, whose output is the health state at future times or the RUL of a component or system. In the literature, various prognostics tools have been developed based on AI techniques such as ANNs and Dynamic Bayesian Networks (DBNs). Brotherton et al. [126] developed a fault prognosis technique combining ANNs and automated rule extractors for gas turbine engines, and applied it to a dataset from operating engines. Byington et al. [127] developed a data-driven approach for the PHM of aircraft actuators and applied it to electro-hydraulic servo-valves, where an ANN is used to predict the control valve position and a Kalman filter [128] is applied to track the health state history and predict the RUL of the system. Medjaher et al. [116] proposed a DBN-based failure prognosis procedure that can handle the uncertainty in the RUL estimate. Dong and He [129] developed a segmental Hidden Semi-Markov Model (HSMM)-based statistical modelling framework for failure diagnosis and prognosis, and applied it to predict the RUL of hydraulic pumps, where the RUL is obtained from the estimated state duration probability distributions.
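The Kalman-filter part of such an approach can be sketched with a two-state filter. It tracks a scalar health index and its degradation rate under a constant-rate (linear) degradation model, then extrapolates to a failure threshold to obtain the RUL. The model, the noise variances q and r, the threshold and the synthetic data are all assumptions made for illustration. Byington et al. [127] combine the filter with an ANN, which is omitted here.

```python
import random


def kalman_rul(measurements, threshold, q=1e-8, r=4e-4):
    """Track [health index, rate per step] with F = [[1, 1], [0, 1]], H = [1, 0],
    process noise q*I and measurement noise variance r; return (h, rate, RUL)."""
    x = [measurements[0], 0.0]          # initial state estimate
    P = [[1.0, 0.0], [0.0, 1.0]]        # initial state covariance (weak prior)
    for z in measurements[1:]:
        # Predict: x = F x, P = F P F^T + Q
        x = [x[0] + x[1], x[1]]
        P = [[P[0][0] + P[0][1] + P[1][0] + P[1][1] + q, P[0][1] + P[1][1]],
             [P[1][0] + P[1][1], P[1][1] + q]]
        # Update with the scalar measurement z of the health index
        s = P[0][0] + r                       # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s     # Kalman gain
        innovation = z - x[0]
        x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    h, rate = x
    rul = (threshold - h) / rate if rate > 0 else float("inf")
    return h, rate, rul


# Synthetic health index: degrades by 0.02 per flight, with measurement noise
rng = random.Random(1)
true_index = [0.02 * k for k in range(50)]
noisy = [v + rng.gauss(0.0, 0.02) for v in true_index]
h, rate, rul = kalman_rul(noisy, threshold=1.5)
print(f"health index {h:.3f}, rate {rate:.4f}/flight, RUL ~ {rul:.1f} flights")
```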

8.3. IVHM applications in civil aviation

A typical application area of IVHM on civil airplanes is the engines. As the “heart” of an aircraft, aero-engines need to be extremely reliable. To further reduce the number of in-flight engine shutdowns, aborted take-offs and flight delays, engine health monitoring technologies are used to support the sensing, acquisition, analysis, detection and data handling functions of Engine Health Management (EHM). In 2016, SAE International published the standard ARP5120 [130], which recommends procedures and technologies for the physical and functional design, development, integration, and Verification and Validation (V&V) of highly reliable EHM systems. In the “Total Care” engine service support program of Rolls-Royce, a basic strategy is to install data acquisition and Intelligent Engine Health Monitoring (IEHM) systems to support comprehensive maintenance [131]. Rolls-Royce installed an artificial-intelligence-based EHM system on its RB211-535E4 engine, which is designed for the Boeing 757; the system aims to increase dispatch reliability from 99.91% to 99.95%. Goodrich Corporation has cooperated with Rolls-Royce to provide nacelle maintenance services for British Airways’ Boeing 787 Dreamliners, using Goodrich’s Maintenance, Repair and Overhaul (MRO) facility in conjunction with Rolls-Royce’s “Total Care” support program [132].
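The quoted dispatch reliability figures can be put in perspective as follows. Assume that dispatch reliability (DR) is the fraction of scheduled departures not delayed or cancelled for technical reasons; the exact delay threshold varies between operators and is not given here. The target then corresponds to this reduction in technical disruptions per 10,000 departures:

$$
(1 - 0.9991)\times 10^{4} = 9,\qquad (1 - 0.9995)\times 10^{4} = 5,\qquad \frac{9-5}{9}\approx 44\%.
$$

In other words, the target means roughly four fewer technical delays or cancellations per 10,000 departures, a reduction of almost half.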

Another typical application of FDP techniques is Structural Health Monitoring (SHM) of airframe structures. In 2013, SAE International published the standard ARP6461 [133], which defines SHM as “the process of acquiring and analyzing data from on-board sensors to evaluate the health of a structure”. ARP6461 provides guidance on the definition, development, V&V and certification of SHM. According to Ref. [134], an SHM system called Comparative Vacuum Monitoring (CVM), which monitors the integrity of a structure and detects cracks before they propagate to the critical length, has been approved by Boeing. Real-time monitoring and forecasting software was used in the static test of the ARJ21. It monitors strain and displacement data through distributed sensors and calculates the structural stress and safety margin in real time with high accuracy and reliability [135].
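The safety-margin calculation mentioned for the ARJ21 static test can be sketched under strong simplifying assumptions. Each strain gauge is read as uniaxial, the material is linear elastic (σ = Eε), and the margin of safety is MS = σ_allow/(FS·|σ|) − 1. The modulus, allowable stress, gauge names, readings and the 0.1 alert threshold are hypothetical. The actual ARJ21 software [135] is not reproduced here.

```python
def margin_of_safety(microstrain, e_mpa, allowable_mpa, factor_of_safety=1.0):
    """MS = allowable / (FS * |stress|) - 1, with stress = E * strain
    (uniaxial, linear elastic; real analyses use separate tension and
    compression allowables and full strain rosettes)."""
    stress_mpa = abs(e_mpa * microstrain * 1e-6)
    if stress_mpa == 0:
        return float("inf")
    return allowable_mpa / (factor_of_safety * stress_mpa) - 1.0


E_MPA = 70_000.0        # hypothetical aluminium-alloy modulus
ALLOWABLE_MPA = 400.0   # hypothetical allowable stress
ALERT_MS = 0.1          # hypothetical alert threshold on the margin

readings = {"G1": 2000.0, "G2": 5200.0, "G3": -6000.0}   # hypothetical microstrain
margins = {g: margin_of_safety(eps, E_MPA, ALLOWABLE_MPA) for g, eps in readings.items()}
alerts = [g for g, ms in margins.items() if ms < ALERT_MS]
for g, ms in margins.items():
    print(f"{g}: MS = {ms:+.3f}")
print("alerts:", alerts)
```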

Boeing has also successfully implemented IVHM on its airplanes. For example, on the Boeing 737, 757 and 767, Boeing uses the Aircraft Condition Monitoring System (ACMS) to collect the data needed to support critical health management programs, such as Flight Operational Quality Assurance (FOQA) and ECM [136]. The FOQA program uses the data to identify and address operational risks before they can lead to accidents. Failure reports generated by the ACMS are sent to the ground over the Aircraft Communications Addressing and Reporting System (ACARS) before the aircraft lands. This supports maintenance actions such as spare parts planning and, therefore, helps to prevent flight delays and increase the operational reliability of the airplanes.
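FOQA analyses commonly look for exceedance events, i.e., periods in which a recorded parameter stays beyond a limit for at least a minimum duration. The minimal sketch below assumes a hypothetical 1 Hz exhaust gas temperature (EGT) trace, limit and minimum duration. Operational FOQA programs define their own event sets per parameter and flight phase.

```python
def exceedances(samples, limit, min_duration):
    """Return (start, end) index pairs of runs where samples stay above `limit`
    for at least `min_duration` consecutive samples."""
    events, start = [], None
    for i, value in enumerate(samples):
        if value > limit:
            if start is None:
                start = i                 # a new run above the limit begins
        elif start is not None:
            if i - start >= min_duration:
                events.append((start, i - 1))
            start = None                  # the run has ended
    if start is not None and len(samples) - start >= min_duration:
        events.append((start, len(samples) - 1))   # run still open at end of record
    return events


egt_degc = [600, 610, 655, 660, 662, 640, 651, 630, 670, 671, 672, 673]  # hypothetical
events = exceedances(egt_degc, limit=650, min_duration=3)
print(events)
```

The single-sample spike at index 6 is ignored, so that brief sensor glitches do not generate events. The two sustained runs are reported.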

9. Lessons learnt and perspectives

9.1. Lessons learnt

In the previous sections, we have introduced key reliability techniques involved in the different lifecycle phases of a commercial airplane. As discussed in the Introduction, reliability engineering practice in the civil aviation industry has been quite successful: modern commercial airplanes exhibit very high reliability despite their complexity and large scale. In this section, we summarize some lessons learnt from this practice, which might benefit reliability managers and engineers, including those in other fields. We focus on five lessons on how the civil aviation industry ensures high reliability effectively and efficiently:

(1) Carefully implement reliability systems engineering. Reliability systems engineering refers to the technical and management activities related to planning, organizing and implementing reliability-centered systems engineering activities over the lifecycle of a system [137-139]. In most successful commercial airplane companies, reliability systems engineering is implemented thoroughly throughout the product lifecycle. A combined top-down and bottom-up approach is applied at all levels of the airplane: the top-level reliability requirements are broken down to lower levels (top-down), and formal verification and validation processes are used to verify, at each level, that the reliability requirements are met (bottom-up). Through reliability systems engineering, the reliability techniques introduced in Sections 2-8 are organized into an effective and efficient reliability program, whose successful implementation ensures that the high reliability requirements are achieved.
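The top-down/bottom-up logic can be sketched for the simplest case: a series system of subsystems with exponential lifetimes, so that failure rates add. The top-down step uses ARINC-style apportionment, which allocates the system failure-rate target in proportion to the predicted subsystem rates; it is one textbook method among several. The bottom-up step checks demonstrated rates against the allocations and the system target. All names and rates are hypothetical.

```python
def allocate(system_rate, predicted_rates):
    """Top-down: share the system failure-rate target among subsystems in
    proportion to their predicted failure rates (ARINC-style apportionment)."""
    total = sum(predicted_rates.values())
    return {name: system_rate * rate / total for name, rate in predicted_rates.items()}


def verify(allocated, demonstrated, system_rate):
    """Bottom-up: check each subsystem against its allocation and the series
    system (whose failure rate is the sum of subsystem rates) against the target."""
    per_subsystem = {name: demonstrated[name] <= allocated[name] for name in allocated}
    system_ok = sum(demonstrated.values()) <= system_rate
    return per_subsystem, system_ok


TARGET = 1e-4   # hypothetical system failure-rate target per flight hour
predicted = {"avionics": 6e-5, "hydraulics": 4e-5, "electrical": 1e-4}
allocated = allocate(TARGET, predicted)
demonstrated = {"avionics": 2.5e-5, "hydraulics": 2.1e-5, "electrical": 4.8e-5}
per_subsystem, system_ok = verify(allocated, demonstrated, TARGET)
print(allocated)
print(per_subsystem, system_ok)
```

Here the system-level check passes even though the hydraulics subsystem misses its allocation. This illustrates why verification is performed at every level rather than only at the top.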

(2) Select highly reliable components. A common practice in the civil aviation industry is to ensure that components have high reliability. Often, this is done by selecting mature components from trusted suppliers whose reliability has been well demonstrated. Leading aircraft companies such as Boeing, Airbus and COMAC have strict controls over component suppliers: designers can only choose suppliers that are qualified in terms of reliability, so that the reliability of the components can be guaranteed. In addition, screening techniques, e.g., environmental stress screening and highly accelerated stress screening, are widely applied to screen out weak components before they are used to assemble the system. Through these strict measures, the reliability of the components that make up the system is assured, and with it the inherent reliability of the airplane.

(3) Widely apply redundancy designs. Redundancy is widely applied in the reliability design of commercial airplanes and of their systems, subsystems and components. For example, the Boeing 737 has three independent hydraulic systems: systems A and B and a standby (emergency) system. Systems A and B are redundant to each other, and the standby system can provide minimal control forces to the most critical controls, such as the rudder, in case both systems A and B fail. By widely applying redundancy in system and component designs, modern commercial airplanes achieve high reliability and tolerance to failures.
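A back-of-the-envelope sketch shows why redundancy is effective and why the independence of the channels matters. Let q be the per-flight failure probability of one channel. Two independent channels then both fail with probability q². Suppose instead that a fraction β of each channel's failures comes from a common cause that disables both channels, as in the textbook beta-factor model. The probability then becomes approximately ((1 − β)q)² + βq. The values of q and β are hypothetical and are not Boeing 737 data.

```python
def p_all_fail_independent(q, n=2):
    """Probability that all n identical, independent channels fail."""
    return q ** n


def p_all_fail_beta_factor(q, beta, n=2):
    """Rare-event approximation of the beta-factor model: a common-cause event
    (probability beta * q) fails all channels; the rest fail independently."""
    return ((1 - beta) * q) ** n + beta * q


q = 1e-4      # hypothetical per-flight failure probability of one channel
beta = 0.05   # hypothetical common-cause fraction
print(f"single channel:        {q:.1e}")
print(f"two independent:       {p_all_fail_independent(q):.1e}")
print(f"two with common cause: {p_all_fail_beta_factor(q, beta):.1e}")
```

Even a small common-cause fraction dominates the result. This is why redundant channels must be kept as independent as possible.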

(4) Place a high premium on reliability tests. A common feature of leading companies in the civil aviation industry is that they pay extremely close attention to reliability tests in the design and development phase of new airplanes. Various types of tests are needed to verify and validate different aspects of reliability. For example, the major structural components of a commercial airplane must pass static and dynamic stress tests to verify that their static strength and fatigue resistance meet the reliability requirements. Also, reliability demonstration tests are needed to verify that the reliability requirements on components or systems are satisfied. In the product development phase, development tests such as reliability growth tests and HALT are performed to help designers find weak points in the design and improve it. These tests help to improve and verify the inherent reliability of the components and systems of civil airplanes.
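Reliability demonstration tests are often planned with the classical zero-failure formula for exponential lifetimes. To claim MTBF ≥ θ at confidence C with no failures observed, the accumulated test time must satisfy T ≥ θ ln(1/(1 − C)). The requirement (10,000 h at 90%) and the number of test units below are hypothetical.

```python
import math


def zero_failure_test_time(mtbf_required, confidence):
    """Total unit-hours needed to show MTBF >= mtbf_required at the given confidence
    with zero failures allowed, assuming exponential lifetimes:
    P(no failure in T | MTBF) = exp(-T / MTBF) must not exceed 1 - confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return mtbf_required * math.log(1.0 / (1.0 - confidence))


total_hours = zero_failure_test_time(10_000, 0.90)   # hypothetical requirement
units = 10                                           # hypothetical number of test units
print(f"total: {total_hours:,.0f} h, i.e. {total_hours / units:,.0f} h per unit on {units} units")
```

This gives about 23,026 unit-hours, roughly 2,303 h on each of 10 units. Allowing failures increases the required time and brings in the chi-square distribution, which is omitted here.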

(5) Carefully implement maintenance plans. In modern airplanes, a combination of different maintenance strategies is carefully implemented at different levels of the airplane to ensure its operational reliability. For example, scheduled maintenance is performed after each flight on the most critical systems, subsystems and components, to make sure that they are functioning normally and will not affect the safety and reliability of the next flight. Condition-based maintenance is also performed by collecting condition-monitoring signals and determining the optimal maintenance time based on the health state estimated from these signals. The maintenance activities are undertaken by well-trained personnel according to well-defined maintenance handbooks and guidelines. These carefully implemented maintenance activities can discover potential failures promptly and, therefore, greatly improve the operational reliability of the airplane.

9.2. Perspectives

Although the application of reliability techniques in civil aviation has, in general, been quite successful, there are still some open problems that deserve further investigation and the development of improved solutions. In the following, we reflect on some perspectives for future research and development in each phase of the lifecycle.

9.2.1. Design and development phase

Quantification of system reliability indexes is a fundamental task underpinning various reliability-related activities, e.g., system health state evaluation, maintenance planning and system design improvement. However, as argued by Maguire [55], for modern civil aviation systems with extremely high reliability, accurately quantifying these indexes through traditional statistics-based methods is, in general, difficult, since failures are so rare that little failure data is available. Two approaches may be explored as alternatives. One is to integrate all the Knowledge, Information and Data (KID) on the failure processes and on similar systems [140] to quantify the reliability indexes. PoF and PHM technologies are promising examples of this approach, where in-depth knowledge of failure mechanisms and various system performance data collected by sensors are used to predict failures and quantify the reliability indexes. The other is to develop new reliability indexes that are able to handle large uncertainties with limited, sometimes subjective, KID on the system. Research along this line is still ongoing, e.g., the evidence-theory-based reliability metric [141], the interval-analysis-based reliability metric [142], the fuzzy-interval-analysis-based reliability metric [143], posbist reliability [144] and belief reliability [145,146]. A critical review of these new reliability metrics is given in Kang et al. [147].
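The interval-analysis idea can be sketched under two assumptions. First, component reliabilities are known only as intervals. Second, the system is coherent: its reliability never decreases when a component's reliability increases. Under these assumptions, the system bounds are obtained by evaluating the structure function at the interval endpoints. This is a generic illustration, not the specific metric of Ref. [142]. The architecture and numbers are hypothetical.

```python
from math import prod


def series(bounds):
    """Interval reliability of components in series: product of lower bounds,
    product of upper bounds."""
    return prod(r_lo for r_lo, _ in bounds), prod(r_hi for _, r_hi in bounds)


def parallel(bounds):
    """Interval reliability of redundant components in parallel."""
    return (1 - prod(1 - r_lo for r_lo, _ in bounds),
            1 - prod(1 - r_hi for _, r_hi in bounds))


# Hypothetical: two redundant pumps, each known only to lie in [0.95, 0.99],
# in series with a controller known to lie in [0.999, 0.9999]
pumps = parallel([(0.95, 0.99), (0.95, 0.99)])
system = series([pumps, (0.999, 0.9999)])
print(f"pump pair: [{pumps[0]:.4f}, {pumps[1]:.4f}]")
print(f"system:    [{system[0]:.6f}, {system[1]:.6f}]")
```

The width of the resulting interval makes the lack of knowledge explicit instead of hiding it in a single point estimate.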

Another open issue is the integration of organizational factors into reliability assessment. Assurance and improvement of commercial aircraft reliability rely on the systematic and coherent management of reliability-related activities, including aircraft design, manufacturing, maintenance and logistics. Complex dependencies exist among these activities. Some fundamental activities, such as FMECA and FRACAS, provide KID that is shared by reliability-related activities throughout the lifecycle of the aircraft, and these activities also provide KID to each other. For example, monitoring data processed by the AHM system are transferred through ACARS to support maintenance and logistics deployment in advance. Therefore, advanced reliability technologies alone cannot guarantee high aircraft reliability; rather, an informed, systematic organization of the reliability-related activities is needed. In fact, reliability is determined, or at least influenced, by the organization and effectiveness of the work conducted in the design and development phase of the aircraft. How to extract information from these organizational factors and integrate it into a comprehensive quantification is an open question that deserves further investigation.

9.2.2. Manufacturing phase

A significant trend in the manufacturing phase of modern commercial airplanes is that companies increasingly rely on outsourcing to reduce development costs and cycles. For example, the degree of outsourcing for the Boeing 787 is more than 70%, and over 50 subcontractors are involved in its supply chain [148]. Moreover, Boeing allowed subcontractors to further outsource their components, which resulted in a complex, tiered supply chain. Although it was believed at the beginning of development that outsourcing would reduce the 787’s development time from six to four years and its development cost from $10 billion to $6 billion, mismanagement of such a complex supply chain produced the opposite result: the Boeing 787 project went billions of dollars over budget, and its delivery schedule was pushed back at least seven times, ending up three years behind schedule [148]. How to manage the increasingly complex supply chains of the modern civil aviation industry is, therefore, an important open issue that deserves further investigation.

9.2.3. Operation phase

An essential task for improving reliability and availability in the operation phase is to implement efficient and effective PHM systems and to use their information to make informed maintenance decisions. Current PHM models are mainly based on a single data source and usually on limited data. In recent years, however, as the digital, physical and human worlds continue to integrate, the fourth industrial revolution, the Internet of Things, big data and the industrial internet are changing the way we collect data and information for PHM: more and more knowledge, data and information from throughout the lifecycle of aviation products, which were unavailable in the past, have become available for PHM and maintenance decision making. Therefore, it is attractive to develop an integrated PHM framework capable of exploiting all the available KID related to component and system degradation and failure processes, in order to predict the RUL accurately and better inform maintenance decision making.
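One very simple way to combine RUL estimates from several sources is inverse-variance weighting. It is optimal only if the estimates are unbiased, independent and approximately Gaussian; an integrated PHM framework as envisaged above would have to relax these assumptions. The sources and numbers below are hypothetical.

```python
def fuse(estimates):
    """Combine independent, unbiased RUL estimates given as (mean, variance)
    pairs by inverse-variance weighting; returns (fused mean, fused variance)."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    return mean, 1.0 / total


# Hypothetical RUL estimates in flight hours: (mean, variance)
sources = {
    "physics-of-failure model": (120.0, 400.0),
    "data-driven model": (100.0, 100.0),
    "fleet maintenance records": (110.0, 400.0),
}
mean, var = fuse(list(sources.values()))
print(f"fused RUL ~ {mean:.1f} h (std {var ** 0.5:.1f} h)")
```

The most precise source dominates the result, and the fused variance is smaller than that of any single source. This is the basic motivation for exploiting multiple sources of KID.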

10. Conclusions

In this paper, we have described how reliability engineering is implemented in the modern civil aviation industry and reviewed the major reliability techniques used in the different phases of the lifecycle of commercial aircraft. The reliability techniques covered include reliability requirement identification, reliability analysis and design, verification and validation of reliability requirements, quality assurance, maintenance, and fault diagnosis and prognosis, among others. Lessons learnt from successful reliability engineering practices in the civil aviation industry are also discussed: carefully implementing reliability systems engineering, selecting highly reliable components, widely applying redundancy designs, placing a high premium on reliability tests and carefully implementing maintenance plans. These five lessons are key enablers of high reliability in modern commercial airplanes, and can serve as a reference when planning reliability assurance programs in other industries with high reliability requirements.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61573043, 71671009 and 71601010). The authors would like to express their deepest gratitude to their numerous industrial partners from the civil aviation industry who collaborated with them through consulting services or collaborative research. The experience of collaborating with these partners greatly helped the authors to shape this paper.