A critique of reliability prediction techniques for avionics applications

2018-02-02

CHINESE JOURNAL OF AERONAUTICS, 2018, Issue 1

Guru Prasad PANDIAN, Diganta DAS, Chun LI, Enrico ZIO, Michael PECHT

a Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742, USA

b National Research Base of Intelligent Manufacturing Service, Chongqing Technology and Business University, Chongqing 400067, China

c Chaire System Science and the Energy Challenge, Fondation Electricité de France, CentraleSupélec, Paris 92290, France

d Department of Energy, Politecnico di Milano, Milan 20133, Italy

1. Introduction

The reliability of a device (e.g., a component or system) relates to its ability to provide its required function for the period of time needed. Various definitions of reliability exist, both in academic journals and in English dictionaries. For our practical purpose, reliability is defined as the ability of a device to perform as intended (i.e., without failure and within specified performance limits) for a specified time, in its life cycle conditions [1]. Quantitatively, reliability is typically evaluated as the probability that a device performs its function for a required period under specified environmental and operational conditions. Reliability estimations are used to evaluate a design, compare design alternatives, trade off system design factors, support test planning, track reliability improvements (reliability growth), and organize maintenance and sustainment logistics.

Military Handbook 217 (MIL-HDBK-217) was developed to estimate the reliability of military electronic equipment and systems using a statistical approach. It uses point-estimate models whose parameters are determined from field failure data. Since its introduction, the handbook has been cited constantly in reliability requirement contracts, and it was updated roughly once every seven years to address deficiencies and inaccuracies. Its limitations for designing electronic assemblies have been studied, and it has been shown to require complete information about the board design, which may not be available in practical situations [2]. The last update to MIL-HDBK-217 was issued in 1995, in reaction to a contract in which a supplier found the models to be without any scientific foundation and the results to be highly inaccurate [3,4]. The updated version carries the same deficiencies as its predecessors, yet it is still used by the military and aerospace industries in their reliability and contractual documents. Despite the updates, about 50% of the 52 major defense systems reported between 2006 and 2011 by the Department of Defense (DoD) Office of the Director, Operational Test and Evaluation (DOT&E) failed to meet their required reliability levels [5]. Since the last update, different industrial groups have created other handbooks, such as GJB/Z 299 (the Chinese version of the MIL handbook), Telcordia SR-332, PRISM, RDF-2000, 217Plus, FIDES, Siemens SN 29500, the NTT Procedure, SAE PREL, and British Telecom HRD-5. These are advertised as addressing the limitations of MIL-HDBK-217, but, as discussed in this article, they are essentially progeny of the MIL handbook and carry forward the same shortcomings in their predictions. We also discuss the effectiveness of these handbooks with respect to the criteria set by IEEE Standard 1413 for handbook-based predictions.

In 2014, the National Research Council's Panel on Reliability Growth Methods for Defense Systems published a report titled “Reliability Growth: Enhancing Defense System Reliability” [5] to examine how current U.S. DoD practices could be improved so that defense systems would operate more reliably. A key recommendation of the report states that “...military system developers should use modern Design-For-Reliability (DFR) techniques, particularly Physics-of-Failure (PoF) based methods, to support system design and reliability estimation. MIL-HDBK-217 and its progeny have grave deficiencies...”

Despite the wide consensus on the invalidity of the handbook's modeling approach and the resulting inaccuracy of its evaluations, the avionics industry continues to use it as a reference for reliability assessments. This paper raises concerns about the use of MIL-HDBK-217 and its progeny, with specific focus on avionics systems. It then discusses alternative methodologies for reliability estimation, including similarity analysis, testing, physics of failure, and data analytics for prognostics and systems health management.

2. MIL-HDBK-217 and its progeny

MIL-HDBK-217 was developed in 1961 for reliability prediction of electronic equipment. The handbook provides failure rate models and values for various electronic components, including integrated circuits, transistors, diodes, resistors, capacitors, relays, switches, and connectors. It was originally developed for military and aerospace applications but was also adopted by other industries. However, because of the inaccuracies of the document, various organizations, such as Telcordia Technologies, the Alion System Reliability Center, and the French telecommunications industry XXX, went on to develop their own handbook formulas based on the 217 approach, hoping to obtain better results from these tailored versions.

In MIL-HDBK-217, the failure rate of a product p takes the form

$$\lambda_p = \lambda_b\,\pi_T\,\pi_R\,\pi_S\,\pi_Q\,\pi_E \qquad (1)$$

where $\lambda_p$ is the product failure rate, $\lambda_b$ is the base failure rate given in the handbook, $\pi_T$ is the temperature stress factor, $\pi_R$ is the power rating factor, $\pi_S$ is the voltage factor, $\pi_Q$ is the quality factor, and $\pi_E$ is the environmental factor. As can be seen from Eq. (1), the failure rate for a given set of conditions is obtained by scaling the base failure rate (tabulated in the handbook) with constant multiplicative factors representing different stresses, such as temperature, power, part quality, and environmental conditions. Because the failure rate is assumed to be constant, the underlying time-to-failure distribution is exponential, and the Mean Time Between Failures (MTBF) is simply the reciprocal of the failure rate, $\text{MTBF} = 1/\lambda_p$.
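A minimal Python sketch of Eq. (1) and the MTBF relation follows. The function names and every numeric factor value are hypothetical, chosen only to show the arithmetic; they are not taken from MIL-HDBK-217 tables. Failure rates are expressed in failures per million hours (fpmh), so the MTBF in hours is $10^6/\lambda_p$.

```python
def part_failure_rate(lambda_b: float, pi_t: float, pi_r: float,
                      pi_s: float, pi_q: float, pi_e: float) -> float:
    """Eq. (1): scale the base failure rate by constant multiplicative factors.

    lambda_b is in failures per million hours (fpmh); the pi factors are
    dimensionless and each one is applied independently of the others.
    """
    return lambda_b * pi_t * pi_r * pi_s * pi_q * pi_e


def mtbf_hours(failure_rate_fpmh: float) -> float:
    """A constant failure rate implies exponential life, so MTBF = 1 / lambda."""
    return 1e6 / failure_rate_fpmh


if __name__ == "__main__":
    # Hypothetical inputs for a single part; not handbook values.
    lam = part_failure_rate(lambda_b=0.002, pi_t=1.5, pi_r=1.0,
                            pi_s=1.2, pi_q=3.0, pi_e=4.0)
    print(f"lambda_p = {lam:.4f} fpmh, MTBF = {mtbf_hours(lam):,.0f} h")
```

Because each factor enters as a plain multiplier, doubling any one of them doubles $\lambda_p$ regardless of the values of the others.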

All handbook-based reliability estimation methodologies can be traced back to the modeling approach underlying MIL-HDBK-217 and hence can be treated as its progeny. Table 1 lists some of the standards and methodologies that are considered to be progeny of MIL-HDBK-217 [5].

GJB/Z 299, released in 1987, was based on the then-current MIL-HDBK-217D and is used for electronics reliability estimation by the Chinese military and aerospace communities [6,7]. It provides failure rates for electronic components based on single-number factors for quality, environmental, electrical, and thermal stresses. The standard was created in 1987 as GJB/Z 299-1987, and its latest (fourth) version was published in 2006 as GJB/Z 299C-2006 [8]. It was adapted with emphasis on China's national conditions and can be considered consistent in essence with MIL-HDBK-217 [9]. Telcordia SR-332 is a hardware reliability estimation methodology developed by Bell Communications Research in collaboration with many other industrial companies [10]. PRISM is a 217 look-alike reliability assessment method developed by the Reliability Information Analysis Center (RIAC), one of the 13 information analysis centers chartered by the DoD to collect and analyze data and to provide reliability, maintenance, and support services to industry [11]. The method is available only as software, and its most recent version (1.5) was released in May 2003 [12]. RDF-93/2000 is a French reliability estimation method for commercial applications. 217Plus, developed by the RIAC and updated in 2015 by Quanterion Solutions Incorporated, is a handbook of reliability prediction models based on MIL-HDBK-217 [13]; the 2015 update added failure rate models for new components and kept the same format as MIL-HDBK-217. The FIDES methodology was developed specifically for the French Ministry of Defense under the direction of the Délégation Générale pour l'Armement [14]. The Siemens SN 29500 standard is used by Siemens AG and the Siemens companies as the basis for reliability predictions. It follows the IEC 61709 concept that “the failure rate of the system is calculated by summing up the failure rates of each component in each category (based on probability theory)”, which assumes that the failure of any component leads to a system failure [15]. Nippon Telegraph and Telephone Corporation (NTT), a Tokyo-based telecommunications firm, developed a system architecture that can simulate and predict reliability under communication-signal congestion and for large-scale data analysis; its approach to estimating reliability is similar to that of MIL-HDBK-217 [16]. The SAE model is referred to as a “fudge factor” model: it estimates a base failure rate for a generic component, which is then scaled to a specific component according to that component's physical characteristics. The British Telecom HRD-5 standard was developed from field failure data and laboratory-derived data collected by British Telecom and France Telecom [17]. It provides failure rates for telecommunications components as well as electronic circuitry components.
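The summation rule quoted above can be sketched as follows, under the series-system and constant-failure-rate assumptions it implies. The bill of materials, part names, and FIT values (failures per $10^9$ h) are hypothetical. With exponential lifetimes, the system reliability $R(t) = \exp(-t\sum_i \lambda_i)$ is the product of the part reliabilities.

```python
import math
from typing import Iterable, Tuple

# (part type, quantity, failure rate per part in FIT = failures per 1e9 h)
Part = Tuple[str, int, float]


def system_failure_rate(parts: Iterable[Part]) -> float:
    """Series system: any part failure fails the system, so with constant
    failure rates the system rate is the quantity-weighted sum of part rates."""
    return sum(qty * rate for _, qty, rate in parts)


def system_reliability(lambda_fit: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant-rate (exponential) system."""
    return math.exp(-lambda_fit * 1e-9 * hours)


# Hypothetical bill of materials, for illustration only.
bom = [
    ("ceramic capacitor", 40, 0.5),
    ("film resistor", 60, 0.2),
    ("microcontroller", 1, 20.0),
    ("connector", 4, 5.0),
]
lam_sys_fit = system_failure_rate(bom)
mtbf_h = 1e9 / lam_sys_fit
print(f"system: {lam_sys_fit:.1f} FIT, MTBF = {mtbf_h:,.0f} h, "
      f"R(10 years) = {system_reliability(lam_sys_fit, 87_600):.4f}")
```

The sum treats every part as a single point of failure and ignores redundancy, shared stresses, and interconnect failures, which is exactly the assumption the quoted concept makes.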

Table 1 MIL-HDBK-217 and its progeny [5].

The handbook methods have been computerized and commercialized by numerous companies. Aerospace component manufacturers use many reliability modeling software packages, including those from Item Software, ReliaSoft (Lambda Predict), T-Cubed, ALD Reliability Software, Alion System Reliability Center, Isograph, PTC, SoHAR, Probabilistic Software Inc., and Quanterion. These tools estimate reliability parameters, such as failure rate and MTBF, based on reliability handbooks such as MIL-HDBK-217, Bellcore, FIDES, and Siemens SN 29500-1. For example, European Aeronautic Defense and Space-Aérospatiale Protection Systèmes (EADS-APSYS) uses Item Software, which includes MIL-HDBK-217 analysis, for design collaboration across the Airbus group of companies [18]. Table 2 lists these software tools.

3. Reasons not to use handbook-based reliability predictions

This section discusses the limitations of MIL-HDBK-217 and its progeny. The failure models used in handbooks such as Telcordia SR-332, CNET, PRISM, RIAC 217Plus, and FIDES can be traced back to MIL-HDBK-217. They all calculate component failure rates on the same principle: failure models are obtained by statistical curve fitting of field failure data. MIL-HDBK-217 is titled “Reliability Prediction of Electronic Equipment”; however, it does not actually “predict” reliability but provides deterministic numbers for field failure rates. The above-mentioned handbooks neither consider nor provide information on the design of the components being assessed, nor do they account for inaccuracies in the collected data, which should have been reported as uncertainty percentages alongside the prediction. Hence, taking numbers from a handbook and applying them to all types of capacitors, resistors, or plastic components, without considering uncertainty or confidence bounds based on the design, can penalize reliable components by underestimating their MTBFs.

Many case studies and experimental data have shown the estimates provided in MIL-HDBK-217 to be inaccurate, with discrepancies of several orders of magnitude observed between the MTBFs obtained from these studies and those from the handbook. Discrepancies have been reported since as early as the 1960s. Studies have also been conducted on commercial electronics such as computer parts and memory [25]. Testing conducted in 1994 observed lower Dynamic Random-Access Memory (DRAM) failure rates than predicted. This was noted to be a common industrial experience: the latest release of the Bellcore failure rate handbook had reduced DRAM failure rates by a factor of 2–4, and the C1 factors for DRAM in MIL-HDBK-217F were 20 times lower than those in MIL-HDBK-217E. Tandem Computers, a Cupertino-based company that makes fault-tolerant computers for the on-line transaction processing market (e.g., banks, stock markets, and telephone companies), compared the MTBFs of its fielded products with the MTBF estimates provided by the handbook. Over the years, the company observed that the handbook predictions were conservative compared with actual field failure data: the measured MTBFs of fielded products were always higher than the handbook-predicted MTBFs, by a factor of 1.5–5.

Table 2 Reliability prediction software.

A survey conducted by the US Army showed that discrepancies existed not only in commercial electronics but also in products manufactured by DoD contractors [26]. The survey covered electronics from diverse platforms, such as communication devices, network command and control, ground systems, and aviation training systems, and was limited to estimates from MIL-HDBK-217 and its progeny. The handbook-predicted MTBFs were found to be 1.2–218 times the demonstrated MTBFs. As can be seen, discrepancies are not restricted to a single type of component or system: inaccuracies are inherent to handbook-based reliability estimation methodologies, for the reasons discussed below.

3.1. Failure model

To estimate reliability, the handbook considers stress factors due to various design and environmental parameters and computes a failure rate assuming that each factor is independent of the others. Eq. (1) can be written in functional form as

$$F(t) = f(T)\,f(P)\,f(V)\,f(Q)\,f(E) \qquad (2)$$

where $F(t)$ is the calculated failure rate, and $T$, $P$, $V$, $Q$, and $E$ represent the temperature, power, voltage, quality, and environmental conditions, respectively. The failure rate is assumed to change linearly with changes in each stress. Under this model, the power dissipated by a device has no bearing on the temperature it experiences. Similarly, the operating environment has no bearing on the temperature effect, and thermal stresses have no influence on the electrical characteristics of the components. In addition, all these stresses are point values: the variability of each factor (e.g., temperature cycling and power fluctuations driven by load requirements) is not taken into account. No scientific or empirical reasoning is provided to support this model formulation, the values assigned to the factors, or the classification of stresses.
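The sketch below illustrates two of these objections using the Arrhenius term that underlies $\pi_T$ (Section 3.2). It assumes, purely for illustration, that junction temperature is set by ambient temperature plus $\theta_{JA} P$, so that power and temperature are coupled, and that the device follows a hypothetical power profile alternating between two levels. Because $\exp(-E_A/(kT))$ is convex in $T$ over this range, averaging the factor over the actual profile gives a larger value than evaluating it at one point temperature. $\theta_{JA}$, the profile, and $E_A = 0.7$ eV are illustrative assumptions, not handbook values.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius(tj_kelvin: float, ea_ev: float) -> float:
    """Raw Arrhenius term exp(-Ea / (k * Tj))."""
    return math.exp(-ea_ev / (BOLTZMANN_EV * tj_kelvin))


# Hypothetical operating profile: equal time slices alternating between
# 0.5 W and 3.0 W, 40 C ambient, 30 K/W junction-to-ambient resistance.
AMBIENT_K = 40.0 + 273.15
THETA_JA = 30.0  # K/W
EA = 0.7         # eV
power_w = [0.5, 3.0] * 12

# Power drives temperature: Tj = Ta + theta_JA * P.
tj_profile = [AMBIENT_K + THETA_JA * p for p in power_w]

# Point-value treatment: one representative (mean) temperature.
tj_point = sum(tj_profile) / len(tj_profile)
point_factor = arrhenius(tj_point, EA)

# Time-averaged factor over the actual profile.
cycled_factor = sum(arrhenius(t, EA) for t in tj_profile) / len(tj_profile)

print(f"point value {point_factor:.3e}, profile average {cycled_factor:.3e}, "
      f"ratio {cycled_factor / point_factor:.2f}")
```

With these assumed numbers, the profile-averaged factor is roughly four times the point-value factor, even before any fatigue damage from the temperature cycling itself is considered.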

Exponential models that provide constant failure rates were used in the 1950s to model mortality in actuarial studies [26]. Because constant failure rates simplify MTBF calculations, they were adopted by the reliability engineering community. McLinn [27] contended that once this approach attained widespread usage, most practitioners considered it the “reliability paradigm” without questioning its accuracy. Studies conducted on semiconductors showed that their failure rates kept decreasing over many thousands of hours of operation. To account for decreasing failure rates, the Telcordia and SAE models arbitrarily extended their infant mortality regions to 10,000 h and 100,000 h, respectively [5]. These models were developed from field data for specific part types (e.g., capacitors or resistors) but were then generalized to apply the same failure rate to all parts produced by different manufacturers with different materials. Field reliability may or may not match the prediction models, because the models are simple approximations that omit inputs such as mechanical shock and vibration [25]. Manufacturers should collect and use reliability data from their delivered products to provide feedback and to adjust the constant-failure-rate assumption in their estimation models.
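To make the constant-failure-rate assumption concrete, the following sketch compares the exponential hazard with the Weibull hazard $h(t) = (\beta/\eta)(t/\eta)^{\beta-1}$, a standard life distribution whose shape parameter $\beta$ can represent decreasing ($\beta < 1$), constant ($\beta = 1$), or increasing ($\beta > 1$) failure rates. The scale $\eta$ and the shape values used here are hypothetical.

```python
def exponential_hazard(t: float, lam: float) -> float:
    """Constant hazard: the same failure rate at every age t."""
    return lam


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1).

    beta < 1: decreasing rate (infant mortality); beta == 1: exponential;
    beta > 1: increasing rate (wear-out).
    """
    return (beta / eta) * (t / eta) ** (beta - 1)


ETA = 1e6  # hypothetical scale parameter, hours
for t in (100, 1_000, 10_000, 100_000):
    print(f"t = {t:>7} h  constant: {exponential_hazard(t, 1 / ETA):.2e}  "
          f"beta=0.5: {weibull_hazard(t, 0.5, ETA):.2e}  "
          f"beta=2: {weibull_hazard(t, 2.0, ETA):.2e}")
```

A distribution with a shape parameter can express a failure rate that keeps decreasing over many thousands of hours directly, instead of bolting an arbitrarily long infant mortality region onto a constant rate.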

3.2. Temperature factors are sensitive to activation energy

The temperature factors in the handbooks are calculated from the Arrhenius equation; the forms used are listed below. Because of the exponential form of the Arrhenius equation, the temperature factors are sensitive to the activation energy $E_A$: a change of only 0.05 eV in $E_A$ at 70 °C can change a temperature factor by a factor of about 5. The activation energy in turn depends on failure mechanisms, which the handbooks do not consider in the first place. In addition, the activation energy values estimated for different failure mechanisms have been shown to vary over a wide range, as shown in Table 3. The original sources for these values can be found in the cited reference.

Table 3 Activation energy based on failure mechanism [28].

where $\pi_T$ is the temperature stress factor; $E_A$, $E_{a,1}$, $E_{a,2}$ are activation energies; $A$, $A_1$, $A_2$ are constants; $T_j$, $T_{j,1}$, $T_{j,2}$ are junction temperatures; and $R$ is the gas constant.
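The sensitivity quoted above can be checked with a short calculation. The sketch assumes the temperature factor is the raw Arrhenius term $\exp(-E_A/(kT_j))$ with $E_A$ in eV, so Boltzmann's constant $k$ in eV/K plays the role of $R$, and it compares two activation energies 0.05 eV apart at $T_j$ = 70 °C. The reference value of 0.40 eV is arbitrary, since the ratio depends only on the difference. If a factor is instead normalized to a reference temperature, the ratio is smaller, so the figure applies to the raw exponential term.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_term(ea_ev: float, tj_kelvin: float) -> float:
    """Raw Arrhenius term exp(-Ea / (k * Tj))."""
    return math.exp(-ea_ev / (BOLTZMANN_EV * tj_kelvin))


tj = 70.0 + 273.15  # 70 C junction temperature, in kelvin
ea = 0.40           # arbitrary reference activation energy, eV
delta_ea = 0.05     # the 0.05 eV change discussed in the text

# Lowering EA by delta_ea raises the term by exp(delta_ea / (k * Tj)).
ratio = arrhenius_term(ea, tj) / arrhenius_term(ea + delta_ea, tj)
print(f"a 0.05 eV change in EA at 70 C scales the term by x{ratio:.2f}")
```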

3.3. Environmental and operational loading conditions

MIL-HDBK-217 does not account for different loading conditions, such as mechanical, thermal, and electrical loads. The handbook implies that components fail at the same rate under all loading conditions, making it unsuitable for estimating the reliability of electronic components in specific applications. Referring to the second edition of the handbook, MIL-HDBK-217B, released in 1969, Codier [29] pointed out that the handbook's estimates were “faulty” and lacked an understanding of the realities of hardware development. He also noted that there was no feasible way to evaluate the merit of updates that introduced new constants. As shown in Eq. (1), failure rates are scaled linearly from base failure rates using multiplicative factors with constant values. Per Codier, the accuracy of such estimates therefore depends on the estimation of these constants rather than on the inherent reliability of the design and its components.

The environmental factor $\pi_E$ does not specifically account for temperature, vibration, or humidity. Moreover, lumping thermal cycling, bending, and vibration of varying frequency and amplitude into a single number is unjustified, because many studies have found these non-constant loading conditions to be failure drivers [30]. The handbook ignores all these factors and provides constant values for generic conditions defined by how controlled the environment is, such as ground benign, ground fixed, and ground mobile.

3.4. Evolving technologies

In 1965, Moore predicted that the number of transistors on Integrated Circuits (ICs) would keep doubling at a regular interval, a rate later stated as roughly every 2 years [31]. This implies that the number of transistors should have increased by a factor of about $2^{15}$ between the first edition of the handbook and its last revision. The handbook revisions after the first edition barely captured even a few of the newer generations of ICs. Since the 1990s, the types of packages and the number of I/Os in a single package have increased drastically. However, MIL-HDBK-217 provides estimates only for ceramic and plastic packages based on dual inline packages and pin grid arrays, which have become rare in practical applications since 2003. The handbook does not differentiate these packages from advanced packages such as Ball Grid Arrays (BGAs), Quad-Flat No-lead packages (QFNs), package-on-package, and stacked-die packages. It is futile to characterize the failure rates of all these packages with a single constant failure rate, because each package exhibits different failure modes and mechanisms under different loading conditions.

MIL-HDBK-217 does not account for many of the technologies recently adopted in the aerospace industry. For instance, Lithium-Ion Batteries (LIBs) have become the next generation of energy storage systems in the aerospace industry, especially in commercial aircraft. Boeing used LIBs in its 787 Dreamliner aircraft to start the auxiliary power unit and to supply power to electrical and electronic devices [32]. This battery equipment is not covered by the handbook's reliability estimates, which makes the handbook inapplicable for evaluating the reliability of these energy storage systems. Beyond energy storage devices, technologies that have become common in power electronics, such as Insulated Gate Bipolar Transistors (IGBTs), Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs), and other semiconductor packaging technologies, are not part of the handbook's estimation models, further reducing its practical relevance. As a result, manufacturers adhering to handbook-based methodologies are forced to estimate the reliability of new parts using the “closest” match available in those methods. This may well underestimate the reliability of newer, more reliable components, because the estimates are extrapolated from older, less reliable ones.

3.5. Not replicating real application conditions

Significant discrepancies have been reported between the failure rates estimated by the above-mentioned handbook-based methods and those observed in the field. Jones and Hayes [33] compared the failure rates, estimated in FITs (Failures In Time, i.e., failures per $10^9$ device-hours), of the components on a board used in a telecommunication application with the actual failure rates observed during field use (data obtained from the International Electronics Reliability Institute (IERI) at Loughborough University, United Kingdom). The board contained 149 components of eighteen different types, such as transformers, coil-activated relays, capacitors, diodes, and LEDs.

Fig. 1 shows the extent of deviation between the handbook estimates and the observed values, in failures per million hours (fpmh). All the handbook-based estimates penalized the board's reliability: they overestimated the failure rate of the board, whereas the components on the board actually lasted longer than estimated.

Fig. 1 Deviation of handbook-estimated reliabilities from values observed in the field [33].
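Because these comparisons mix FIT and fpmh, a small conversion helper may be useful. The sketch relies only on the unit definitions (1 FIT = 1 failure per $10^9$ h and 1 fpmh = 1 failure per $10^6$ h, so 1 fpmh = 1000 FIT). The predicted and observed values in the usage lines are hypothetical and are not the data of Fig. 1.

```python
FIT_PER_FPMH = 1_000  # 1 fpmh (per 1e6 h) equals 1000 FIT (per 1e9 h)


def fit_to_fpmh(fit: float) -> float:
    return fit / FIT_PER_FPMH


def fpmh_to_fit(fpmh: float) -> float:
    return fpmh * FIT_PER_FPMH


def mtbf_from_fpmh(fpmh: float) -> float:
    """MTBF in hours under the constant-failure-rate assumption."""
    return 1e6 / fpmh


def overestimate_factor(predicted_fpmh: float, observed_fpmh: float) -> float:
    """Values above 1 mean the handbook predicted more failures than observed."""
    return predicted_fpmh / observed_fpmh


# Hypothetical board-level numbers, for illustration only.
predicted, observed = 12.0, 3.0
print(f"predicted {fpmh_to_fit(predicted):.0f} FIT vs observed "
      f"{fpmh_to_fit(observed):.0f} FIT; "
      f"overestimate x{overestimate_factor(predicted, observed):.1f}; "
      f"MTBF {mtbf_from_fpmh(predicted):,.0f} h vs {mtbf_from_fpmh(observed):,.0f} h")
```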

Brown [34] of Northrop Grumman compared the failure rates of plastic encapsulated components estimated with MIL-HDBK-217 models to the failure rates reported in the field between 1993 and 1999 by the Modular Airborne Radar (MODAR) program, military designation AN/APN-241. Failure data accumulated over 300,000 operating hours were used for the comparison. Fig. 2 shows the failure rates (in fpmh) estimated with quality factors ranging from commercial grade ($\pi_Q = 10$) to highest quality ($\pi_Q = 0.25$), together with the failure rates from field data [34].

The handbook model can thus provide estimates ranging from conservative (underestimating reliability) to optimistic (overestimating reliability), depending on the quality factor value, and there is no standard, unambiguous procedure for choosing a quality factor for a product. The selection is as arbitrary as the quality factor values themselves, which leaves an estimate open to misinterpretation and manipulation based on the company's “assumption” about the quality of its product. Maintenance is an important support function in aviation for improving the availability, reliability, and safety of aircraft. Studies have concluded that traditional reliability prediction methods and maintenance plans cannot prevent most failures, thereby making preventive maintenance ineffective [35].
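The spread introduced by the quality factor alone follows directly from Eq. (1): with all other inputs fixed, moving $\pi_Q$ from 0.25 to 10 scales $\lambda_p$ by 10/0.25 = 40. The sketch below shows this; $\lambda_b$ and the lumped product of the remaining factors are hypothetical.

```python
def failure_rate_for_quality(pi_q: float, lambda_b: float = 0.01,
                             other_factors: float = 2.0) -> float:
    """Eq. (1) with every factor except pi_Q lumped into one hypothetical
    constant; returns failures per million hours (fpmh)."""
    return lambda_b * other_factors * pi_q


PI_Q_HIGHEST = 0.25     # highest quality, as cited in the text
PI_Q_COMMERCIAL = 10.0  # commercial grade, as cited in the text

lam_best = failure_rate_for_quality(PI_Q_HIGHEST)
lam_worst = failure_rate_for_quality(PI_Q_COMMERCIAL)
spread = lam_worst / lam_best

print(f"lambda_p ranges {lam_best:.3f}-{lam_worst:.3f} fpmh (x{spread:.0f}); "
      f"MTBF ranges {1e6 / lam_worst:,.0f}-{1e6 / lam_best:,.0f} h")
```

The same part, in the same design and environment, can therefore be assigned MTBFs a factor of 40 apart purely through the choice of $\pi_Q$.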

3.6. Not meeting IEEE 1413 requirements

IEEE developed a standard and a guidebook through a working group of over 50 experts, with leaders representing each of the major reliability prediction methods. Over 150 experts reviewed the draft standard, and both the standard and the guidebook were approved by unanimous vote. The standard identifies the attributes required for an understandable and credible reliability prediction. A prediction made per IEEE Standard 1413 includes sufficient information, including inputs, assumptions, uncertainties, and outputs, for the risks associated with using the prediction to be understood. The guide reviews reliability prediction methods based on field data (including similarity analysis), test data, stress and damage modeling, and handbooks (MIL-HDBK-217F, SAE PREL, Telcordia SR-332, PRISM, and CNET).

According to IEEE 1413, field data are used to find the failure distribution that best fits the measured field performance and to extrapolate it to estimate future reliability. Field data should include the actual number of units in operation; the age of each product at the time of failure; all life cycle load conditions; the procedures used to assess the failures and to determine the failure modes, mechanisms, and sites (root cause); and all products in which the failure could not be duplicated (no fault found). The failure data are then organized, one or more failure distributions are obtained, and reliability metrics are determined. Test data are likewise used to find failure probability distributions, and extrapolation may be necessary to predict reliability under the intended field conditions. Reliability prediction tests are essentially of two types: non-accelerated and accelerated. To obtain a good result from an accelerated test, it is generally required that the failure mechanism active during part operation is also dominant during accelerated testing, and that the acceleration (from test to useful life) of this mechanism can be expressed as an acceleration transform. Load (stress) and damage simulation analysis is used to determine when a specific failure mechanism will occur for a product in a given environment. The load and damage modeling approach typically involves identifying potential failure modes, mechanisms, and sites (FMEA); identifying appropriate failure models for the specific failure mechanisms and sites, including inputs associated with material characteristics, damage properties, relevant geometries at failure sites, manufacturing flaws and defects, and life cycle environmental and operational loads; and determining the variability of each design parameter to compute the effective reliability function.
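As one concrete instance of fitting a failure distribution to field data, the sketch below computes maximum-likelihood estimates of a two-parameter Weibull distribution from failure ages plus the current ages of units still in operation (right-censored suspensions), which the field-data list above calls for. IEEE 1413 does not prescribe this particular distribution or estimator; the Weibull choice, the function name `weibull_mle`, and the synthetic data are assumptions for illustration. The shape estimate is found by bisection on the standard profile-likelihood equation, whose left-hand side increases monotonically in $\beta$.

```python
import math
import random
from typing import Sequence, Tuple


def weibull_mle(failures: Sequence[float],
                suspensions: Sequence[float] = ()) -> Tuple[float, float]:
    """Right-censored Weibull MLE; returns (shape beta, scale eta).

    failures: ages at failure; suspensions: current ages of surviving units.
    """
    times = list(failures) + list(suspensions)
    r = len(failures)
    if r == 0 or any(t <= 0 for t in times):
        raise ValueError("need at least one failure and positive times")

    # Normalize by the largest time to avoid overflow in t**beta;
    # the shape equation is invariant to this rescaling.
    c = max(times)
    y = [t / c for t in times]
    log_y = [math.log(v) for v in y]
    mean_log_fail = sum(math.log(t / c) for t in failures) / r

    def shape_equation(beta: float) -> float:
        w = [v ** beta for v in y]
        return (sum(wi * li for wi, li in zip(w, log_y)) / sum(w)
                - 1.0 / beta - mean_log_fail)

    lo, hi = 1e-3, 50.0
    for _ in range(200):  # bisection; the equation increases with beta
        mid = 0.5 * (lo + hi)
        if shape_equation(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = c * (sum(v ** beta for v in y) / r) ** (1.0 / beta)
    return beta, eta


if __name__ == "__main__":
    # Synthetic field data (hypothetical): true beta = 1.5, eta = 1000 h;
    # units that have not failed by 1500 h are recorded as suspensions.
    random.seed(42)
    fails, running = [], []
    for _ in range(2000):
        t = random.weibullvariate(1000.0, 1.5)  # (scale, shape)
        if t < 1500.0:
            fails.append(t)
        else:
            running.append(1500.0)
    beta_hat, eta_hat = weibull_mle(fails, running)
    print(f"beta = {beta_hat:.2f}, eta = {eta_hat:.0f} h")
```

A fitted $\beta$ close to 1 would support a constant failure rate for that population; values clearly below or above 1 indicate decreasing or increasing failure rates that a single handbook number cannot represent.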

The results of an assessment of the handbooks against the IEEE 1413 criteria are displayed in Table 4. An updated version of this standard was released in 2010 [36]; it retains all the criteria listed in the table and adds a few more that a valid reliability prediction handbook must meet.

Fig. 2 MIL-HDBK-217 failure rate estimates vs field failure data [34].

4. Alternatives to MIL-HDBK-217

Since the 1980s, researchers and engineers have questioned the usability and accuracy of MIL-HDBK-217. The two main factors driving these concerns are improvements in electronic component design (complexity and size factor) and changes in manufacturing technology (quality factor). To account for these, European industries have shown interest in developing a standard similar to the handbook but capable of distinguishing each supplier's reliability practices, which could provide vendor-specific reliability estimates. The Japanese reliability community has been focusing on the physics-of-failure (PoF) approach because of the limitations of the constant failure rate assumption and the averaging effect of applying the Arrhenius relation to all failure mechanisms. Moreover, the last update of Military Acquisition Handbook 179 [37], made in 1993, identifies reliability-physics-based design, suitability analysis, and verification as essential to the success of its application to specific integrated circuits.

4.1. Physics of failure model based prediction

As modern electronics become more complex and component manufacturing practices mature, failures caused by component mismatch, rather than by a component itself, are becoming more common. Especially in complex avionics modules, it is difficult to locate the failure cause using on-board diagnostics alone. The Commercial-Off-The-Shelf (COTS) electronics industry is moving to sub-23-nm semiconductor feature sizes, which will introduce new reliability issues and require appropriate technological solutions. If this scaling proceeds without reliability analysis, it is expected to reduce the mean service life of microcircuits to below 10 years [38]. These sub-100-nm technologies will have more distinct failure mechanisms than earlier designs because of modified atomic-level interactions. The handbook-based methodology is becoming irrelevant because new technologies are introduced faster than field data can accumulate: predictions based on field failures fall out of scope and are unrepresentative of current and future technologies.

PoF is a reliability design approach that examines a component's failure mechanisms in an appropriate environment throughout the component's life cycle. These failure mechanisms are generally due to overstress or wear-out, with contributing factors from material selection, mechanical stress, electrical stress, thermal effects, and chemical interactions. Material degradation caused by wear-out can be modeled using PoF, whereas handbook-based methodologies do not consider wear-out mechanisms at all. The Aerospace Vehicle Systems Institute (AVSI) has funded many studies to develop models of small-scale effects such as electromigration, time-dependent dielectric breakdown, hot carrier injection, and negative bias temperature instability. Similarly, the VMEbus International Trade Association (VITA), an industrial working group, developed a PoF-based approach to reliability estimation [39].
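As an example of a PoF model for one of the mechanisms listed above, Black's equation for electromigration gives the time to failure as $\text{MTTF} = A\,J^{-n}\exp(E_A/(kT))$, where $J$ is the current density and $T$ the temperature. The sketch computes the ratio of use-condition to test-condition lifetimes, in which the prefactor $A$ cancels. The exponent $n = 2$ and $E_A = 0.7$ eV are placeholder values, not values from the cited studies; in practice they depend on the interconnect materials and are calibrated by testing, which is exactly the feedback loop described in the next paragraph.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def black_mttf_ratio(j_use: float, j_test: float,
                     t_use_k: float, t_test_k: float,
                     n: float = 2.0, ea_ev: float = 0.7) -> float:
    """MTTF_use / MTTF_test from Black's equation MTTF = A * J**-n * exp(Ea/(k*T)).

    The prefactor A cancels; current densities may use any consistent unit.
    n and ea_ev are placeholder values for illustration.
    """
    current_term = (j_test / j_use) ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use_k - 1.0 / t_test_k))
    return current_term * thermal_term


# Hypothetical conditions: test at 2x current density and 150 C, use at 85 C.
af = black_mttf_ratio(j_use=1.0, j_test=2.0,
                      t_use_k=85.0 + 273.15, t_test_k=150.0 + 273.15)
print(f"each test hour represents about {af:.0f} h at use conditions")
```

Unlike a handbook multiplier, each term here is tied to a named mechanism and can be checked against test results for that mechanism.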

PoF models and techniques can be applied to initial designs of electronic components to determine potential failure mechanisms and the locations of these failures. Failure mechanisms are modeled explicitly and used to estimate product reliability, improve product design, and conduct life cycle testing. PoF can direct accelerated testing towards the identified failure areas to verify a failure mechanism and validate a model; if the model is not accurate, the test results can be fed back to refine it. After the component has matured and been sold, any reliability data from the field can be used to update the failure models and reliability assessments.

In 2004, the U.S. Defense Advanced Research Projects Agency (DARPA) and the Naval Air Systems Command [40] funded a project to develop a methodology combining prognostics with PoF modeling to improve the accuracy of reliability prediction for military equipment. The methodology was tested on an H-60 helicopter gear with a seeded fault (crack). Using system-level observable features as inputs to the PoF models to track the component's material condition improved the prediction of the system's state of health. The fusion also improved failure prediction over a range of operational and environmental loads. The study showed that calibrated prognostic tools combined with PoF models lead to intelligent asset management and improved availability and safety.

Table 4 Results of the assessment of handbooks against IEEE 1413 criteria, as presented in IEEE 1413.

4.2. Data-driven prognostics

There has been increased interest in monitoring the health of components and systems, where health refers to the extent of degradation or deviation from normal conditions [41]. Prognostics and health management (PHM) is an alternative to handbook-based reliability estimation in which the health of a product is monitored and its Remaining Useful Life (RUL) is estimated by evaluating the extent of deviation or degradation from the expected ideal state of the product under specified usage conditions [42]. Prognosis is based on analysis of failure modes, of the effects of operating conditions and loads on the system, and of the correlation of aging symptoms with expected damage trends. PHM offers many advantages, such as (A) early warning of failures; (B) minimized unscheduled maintenance; (C) longer intervals between maintenance cycles; (D) improved system availability; (E) reduced life cycle costs through less downtime, lower inspection costs, and better buffer management; and (F) improved qualification and support for the design and logistics of fielded and future systems [43]. Different prognostics techniques can be adopted to predict the failure of a component or subsystem and thereby plan maintenance activities. Prognostics involves real-time monitoring of health indicators, such as the resistance, capacitance, and voltage of electrical and electronic devices, from which the state of health of those devices can be estimated and, in turn, their RULs calculated.
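A minimal sketch of the last step, estimating RUL from a monitored health indicator, is shown below. It assumes, for illustration only, that the indicator (here a capacitance) degrades roughly linearly; it fits a least-squares line and extrapolates it to a failure threshold. The readings, the 80 µF threshold, and the linear degradation model are hypothetical; the prognostic methods cited in this section use richer models and report uncertainty.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_rul(times: Sequence[float], health: Sequence[float],
                 threshold: float) -> float:
    """Hours from the last reading until the fitted trend crosses the threshold.

    Assumes the indicator decreases toward the threshold; returns inf if the
    fitted trend is not decreasing.
    """
    slope, intercept = linear_fit(times, health)
    if slope >= 0:
        return math.inf
    t_cross = (threshold - intercept) / slope
    return max(0.0, t_cross - times[-1])


# Hypothetical capacitance readings (uF) at monitored operating hours.
hours = [0, 500, 1000, 1500, 2000]
capacitance_uf = [100.0, 99.0, 98.0, 97.0, 96.0]
print(f"estimated RUL: {estimate_rul(hours, capacitance_uf, 80.0):.0f} h")
```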

Prognostics have been used in electronics in different forms: (A) fuses and canaries, which sense damage in advance and thereby prevent catastrophic failures; (B) monitoring and reasoning of failure precursors, in which a measurable parameter is monitored and its deviation is taken as an indication of system degradation; and (C) monitoring of environmental and usage profiles for damage modeling, in which loads measured in situ provide inputs to damage models that assess degradation.41 Over the last decade, efforts have been made to employ prognostics in aerospace to control the maintenance and availability of aerospace systems more cost-effectively. Recent advances in sensor technologies, computational power, and intelligent algorithms have made prognostics more effective in aerospace applications. The ability of prognostics to estimate downtime and remaining useful life can be used to schedule maintenance activities, as shown in a simulated example in Ref. 44. Prognostics can characterize component- and system-level performance in real time, helping control systems maintain system reliability as aircraft become larger, faster, and more complex. NASA has set up a dedicated Prognostics Center of Excellence (PCoE) at its Ames Research Center. The center investigates damage propagation mechanisms at both the system level and the electrical and electronic component level, with a focus on a holistic approach that combines prognostics with PoF methods to model the probability of failure and estimate the remaining useful life.
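Format (B), monitoring of failure precursors, can be sketched as a simple deviation test against statistics from a healthy baseline. The 3-sigma limit, the requirement of three consecutive out-of-limit readings (to suppress isolated noise spikes), and the function name are illustrative assumptions rather than values from the cited work; in practice, limits are set from characterization data for the specific precursor parameter.

```python
import statistics
from typing import Optional, Sequence


def precursor_alarm(baseline: Sequence[float], stream: Sequence[float],
                    k: float = 3.0, consecutive: int = 3) -> Optional[int]:
    """Index of the sample at which `consecutive` successive readings have
    deviated more than k standard deviations from the healthy-baseline mean,
    or None if the stream never triggers an alarm."""
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)   # needs at least two baseline samples
    run = 0
    for i, x in enumerate(stream):
        if abs(x - mu) > k * sigma:
            run += 1
            if run >= consecutive:
                return i
        else:
            run = 0                      # isolated excursions are treated as noise
    return None
```

A single spike resets the counter, so only a sustained shift of the precursor parameter (for example, a supply voltage drifting away from its healthy level) raises the alarm.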

The aerospace industry has been researching improvements in non-renewable fuel efficiency and effective alternative energy storage systems. Lithium-ion batteries (LIBs) have been found to be the best option for a third-generation energy storage system because of their load efficiency and better energy-to-weight ratio.45 Despite their high energy density, long cycle life, low self-discharge rate, and high output voltage, the reliability, degradation, and prognostics of LIBs remain a challenge. As the case of the Boeing 787 shows,46 the reliability of LIBs cannot be taken for granted even when they serve only as an alternative power supply. Collateral damage is a critical concern for real-time monitoring and health management: a glitch in battery operation can lead not only to a power supply cutoff but also to thermal runaway.47 He et al. developed a prognostics methodology to predict failures in lithium-ion batteries for on-board applications such as the Boeing 787.48 Similarly, Guo et al. developed a Bayesian approach based on covariate identification, model selection, and a prognostics data selection strategy.49 Numerous data-driven approaches, such as the auto-regressive (AR) model,50 particle filter,51 Gaussian Process Regression (GPR),52 Support Vector Machine (SVM),53 and Relevance Vector Machine (RVM),54 have been developed to support prognostics-based RUL estimation. Liu et al.55 developed a hybrid prognostics method to predict the RUL of batteries used in aerospace applications; they combined RVM and AR models with an uncertainty-fusion algorithm to improve RUL prediction accuracy. The method was validated using results of low-Earth-orbit simulation tests run on LIBs.
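The AR-based approach to battery RUL can be illustrated with a deliberately simplified sketch: a linear first-order AR model with an intercept, x[k+1] = phi0 + phi1 * x[k], fitted by least squares to a capacity history and iterated forward until the predicted capacity reaches an end-of-life threshold. This is not the nonlinear AR model or the RVM-AR fusion of Refs. 50 and 55; the model order, function names, and threshold are assumptions (a threshold such as 80% of rated capacity is a common convention), and no uncertainty estimate is produced.

```python
from typing import Optional, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the AR(1) model x[k+1] = phi0 + phi1 * x[k]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series has no variation to fit")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    phi1 = sxy / sxx
    return y_mean - phi1 * x_mean, phi1


def predict_rul_cycles(capacity: Sequence[float], eol_threshold: float,
                       max_cycles: int = 10_000) -> Optional[int]:
    """Cycles remaining until the AR(1) forecast of capacity first falls to or
    below eol_threshold; None if that does not happen within max_cycles."""
    if capacity[-1] <= eol_threshold:
        return 0
    phi0, phi1 = fit_ar1(capacity)
    c = capacity[-1]
    for step in range(1, max_cycles + 1):
        c = phi0 + phi1 * c          # one-step-ahead forecast, fed back as input
        if c <= eol_threshold:
            return step
    return None
```

With a normalized capacity history that fades by 0.002 per cycle over 50 cycles, the sketch predicts about 51 further cycles before capacity reaches a threshold of 0.801.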

4.3.Similarity analysis-based prediction

The reliability of a system can be predicted from similarity measures between the performance of the system under different conditions. Differences in performance can arise from different operating conditions or from degradation. Similarity analysis recognizes patterns in data and uses those patterns to estimate system health. Patterns can be identified using one of the following techniques: classification, nearest neighbor, clustering, or neural networks. Classification-based methods separate data into a single class (unsupervised or semi-supervised mode) or multiple classes (supervised or semi-supervised mode) to identify degradation or anomalies in system performance. Multiple-class classifiers such as Linear Discriminant Analysis (LDA) work on the principle of maximizing the between-class scatter while minimizing the within-class scatter. A single-class technique such as Principal Component Analysis (PCA) treats the entire dataset as a single class and finds projection vectors that maximize the variance of the whole (global) dataset. Anomalies are assumed to lie farthest from the mean of this global dataset.
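The single-class PCA approach can be sketched in plain Python: principal components of healthy data are found by power iteration with deflation, and a test point is scored by its standardized distance from the global mean along the retained components (Hotelling's T²), together with its squared residual outside them (SPE). The T²/SPE pairing is a common convention assumed here, not something the text prescribes; the function names are hypothetical, and alarm thresholds on either score would have to be set from healthy data.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _power_iteration(m: Matrix, iters: int = 1000) -> Tuple[float, Vector]:
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(m)
    v = [1.0 / (i + 1) for i in range(d)]        # arbitrary non-uniform start
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                           # matrix fully deflated
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    lam = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v


def fit_pca(rows: Sequence[Sequence[float]]):
    """Mean vector and eigenpairs (largest variance first) of healthy data."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    pairs = []
    for _ in range(d):
        lam, v = _power_iteration(cov)
        pairs.append((lam, v))
        # Deflation: remove the found component before looking for the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return mean, pairs


def pca_scores(model, x: Sequence[float], n_components: int) -> Tuple[float, float]:
    """(T2, SPE): standardized distance from the global mean inside the retained
    principal subspace, and squared residual outside it."""
    mean, pairs = model
    c = [xi - mi for xi, mi in zip(x, mean)]
    t2 = explained = 0.0
    for lam, v in pairs[:n_components]:
        if lam <= 1e-12:                          # skip degenerate directions
            continue
        t = sum(ci * vi for ci, vi in zip(c, v))
        t2 += t * t / lam
        explained += t * t
    spe = max(sum(ci * ci for ci in c) - explained, 0.0)
    return t2, spe
```

Retaining fewer components than features is what makes the score PCA-based rather than a plain Mahalanobis distance; a point that moves off the retained subspace then shows up in SPE rather than in T².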

Nearest-neighbor-based techniques assume that healthy instances occur in dense groups, whereas faulty instances lie far from their nearest neighbors. If the distance of a test data point from its k-th nearest neighbor exceeds a preset threshold, the test point is considered an anomaly. The distance measure is generally the Euclidean distance,56 the Mahalanobis distance, or a Bayesian distance.57 Clustering algorithms group data into clusters of multiple healthy classes corresponding to multiple operating conditions;58 they differ from nearest-neighbor approaches in that they consider the distance of a test data point from a whole cluster rather than from only its k nearest neighbors. A test data point is considered healthy if it is close to the centroid of a cluster and faulty if it is far from it. These techniques assume that anomalies do not form clusters of their own.
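Both distance rules can be sketched in a few lines of Python using Euclidean distance. The clusters are assumed to have been formed beforehand (for example, by k-means applied to healthy data from each operating condition), and k, the thresholds, and the function names are illustrative.

```python
import math
from typing import Sequence

Point = Sequence[float]


def knn_is_anomaly(healthy: Sequence[Point], x: Point, k: int, threshold: float) -> bool:
    """Nearest-neighbor rule: anomaly if the Euclidean distance from x to its
    k-th nearest healthy training point exceeds the preset threshold."""
    if not 1 <= k <= len(healthy):
        raise ValueError("k must be between 1 and the number of training points")
    distances = sorted(math.dist(x, p) for p in healthy)
    return distances[k - 1] > threshold


def centroid_is_anomaly(clusters: Sequence[Sequence[Point]], x: Point,
                        threshold: float) -> bool:
    """Cluster rule: each cluster holds healthy data from one operating condition.
    Anomaly if x is farther than threshold from every cluster centroid."""
    centroids = [[sum(coord) / len(cluster) for coord in zip(*cluster)]
                 for cluster in clusters]
    return min(math.dist(x, c) for c in centroids) > threshold
```

The centroid rule compares a test point with whole clusters, so one healthy cluster per operating condition is enough to accept normal operation in any of those conditions.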

Neural networks, or Artificial Neural Networks (ANNs), are models for recognizing patterns in data that are complex and may not follow any known distribution.59 Each input to an ANN is assigned a weight, either arbitrarily or based on past experience. The weighted inputs are sent through "neurons" (processing units) that sum them and produce an output. During learning, the ANN iteratively tunes the weights until its outputs approach the target outputs of the training data. The accuracy of the resulting classification depends on the quality of the learning algorithm used to train the ANN. The advantage of these pattern-recognition methods is that, as machine learning algorithms, they enable active monitoring of a system and in-situ detection of degradation. In addition, these techniques can be used to estimate system-level reliability regardless of the complexity of the system.
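The learning process described above can be illustrated with the smallest possible network: a single neuron with a logistic activation, trained by stochastic gradient descent on a cross-entropy loss to separate healthy (label 0) from degraded (label 1) feature vectors. This is a minimal sketch under those assumptions; practical ANNs for health monitoring have hidden layers and are trained with established libraries, and the learning rate, epoch count, and names here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(samples: Sequence[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Train one logistic neuron by stochastic gradient descent on cross-entropy loss."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in samples[0]]   # arbitrary initial weights
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = out - y    # d(loss)/d(pre-activation) for sigmoid + cross-entropy
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that x belongs to the 'degraded' class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Each pass nudges the weights in the direction that reduces the gap between the neuron's output and the training label, which is the iterative tuning the paragraph describes.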

4.4.Field data-based prediction

Field data represents the actual performance of an item in its actual operational environment. Thus, a reliability estimate based on field data is appropriate for an item already in service (e.g., for logistics planning, warranty reserves, repair department sizing, or future corrective actions). The ideal data to use for an item's reliability estimation is the field reliability data for that type of item in the same operating environment. Reliability estimates based on field data require knowledge of the operating time before failure for failed items and the accumulated operating time for all items that have not failed. This means that three things must be known for each unit: (A) the initial operation time, (B) the life-cycle history and operating profile (along with the operating environment), and (C) the failure time (or the current time, if the item has not failed).
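One standard way to use exactly these data (operating time to failure for failed units and accumulated operating time for unfailed, or right-censored, units) is the Kaplan-Meier product-limit estimator of the reliability function. The text does not prescribe an estimator; Kaplan-Meier is chosen here because it does not assume a constant failure rate. The function name and example times are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 failed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the reliability (survival) function.

    times[i]: operating time of unit i; the time to failure if failed[i] is True,
    otherwise the accumulated operating time so far (a censored observation).
    Returns (t, R(t)) at each distinct failure time."""
    if len(times) != len(failed):
        raise ValueError("times and failed must have the same length")
    data = sorted(zip(times, failed))
    at_risk = len(data)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all units sharing this time; censored units at t count as at risk at t
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            removed += 1
            i += 1
        if failures:
            reliability *= 1.0 - failures / at_risk
            curve.append((t, reliability))
        at_risk -= removed
    return curve
```

For six units with operating times of 100, 200, 200, 300, 400, and 500 h, where the units at 100 h, one of the two at 200 h, and the unit at 300 h failed and the rest are still running, the estimated reliability falls to 5/6, 2/3, and 4/9 at 100, 200, and 300 h.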

Field data is rarely as complete as an analyst would like. It takes time, effort, and planning to create a field dataset. Some products have built-in sensors and tracking mechanisms (e.g., operating-hour meters). A system to record the initial start time, use duration, and time of failure or removal from service is necessary, as are mechanisms to collect the information. Automated collection processes, which may be built into a product, are typically more accurate but more expensive than manual processes. The value of accurate data and of timely, accurate field data analyses must be weighed against the cost of collecting the information: will adding 20% to a product's cost, weight, or complexity be justified by the benefits to customers through improved product performance, timely maintenance, or verification that contract obligations are met? As the prices of sensors and associated components continue to decline and communication between a product and its manufacturer becomes easier, collecting accurate field data will become more cost-effective and more prevalent.

Regardless of the type or use of field data, a field failure tracking and reporting system, along with a field failure database, is essential for providing field data statistics. In addition to failure reports, records of the initial operating time, operating profile, operating environment, and failure time of each unit should be stored in the database. Data on maintenance actions, replacements, and returns should also be kept in the failure reporting database to assist in predictions and corrective actions. Replacements include functional restoration (e.g., switching to a backup assembly in a satellite). Returns include detailed failure event data used for diagnostics in lieu of having the failed item to examine. Failure causes in the failure reporting database should be recorded in as much detail as possible to support future design analysis, corrective actions, and reliability estimations. The failure reporting database is often part of a Failure Reporting And Corrective Action System (FRACAS). It may also contain inspection and test failure data for analysis or estimations.
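A per-unit record holding the fields listed above might look like the following sketch. The class and field names are hypothetical, not a FRACAS schema from any standard; note that the elapsed-time helper returns calendar hours, so an hour meter or a duty-cycle factor would be needed to convert it to operating hours.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class MaintenanceAction:
    when: datetime
    kind: str                                  # hypothetical labels, e.g. "replacement", "return"
    notes: str = ""


@dataclass
class FieldUnitRecord:
    unit_id: str
    initial_operation: datetime                # initial operation time
    operating_profile: str                     # life-cycle history / operating profile
    operating_environment: str                 # operating environment
    failure_time: Optional[datetime] = None    # None while the unit is still running
    failure_cause: str = ""                    # as detailed as possible, for corrective action
    maintenance: List[MaintenanceAction] = field(default_factory=list)

    @property
    def failed(self) -> bool:
        return self.failure_time is not None

    def elapsed_hours(self, now: datetime) -> float:
        """Calendar hours from initial operation to failure, or to `now` if the
        unit has not failed (a censored observation)."""
        end = self.failure_time if self.failed else now
        return (end - self.initial_operation).total_seconds() / 3600.0
```

Keeping unfailed units in the same table as failed ones matters: their accumulated time is exactly the censored information that field-data reliability estimates require.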

4.5.Test data-based prediction

Reliability predictions based on test data use failure data and failure information, albeit obtained in a test environment. Failures can often be accelerated so that more data become available in a shorter time than in field use. Test data-based prediction can be used in combination with, or as a validation of, other methods.

One critical aspect of all reliability tests is careful planning. Tests can be designed either to demonstrate reliability at a specific confidence level or to generate valid test hours for general data accumulation. Tests are often conducted to determine or demonstrate reliability at the component, assembly, subsystem, or system level. Reliability test data at lower levels may be combined to infer reliability at the next higher level of the system hierarchy if failures resulting from interactions are negligible. The value of test data depends on how well the test environment can be related to the actual use environment. Tests should reflect the effects of a typical operating environment, include failures resulting from stresses such as the thermal environment, electromagnetic disturbances, and humidity, and avoid failures that are not typical of the operating environment.
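For a demonstration test that allows zero failures, one common planning rule (the success-run formula, based on the binomial distribution) gives the number of units n that must each survive the required test duration to demonstrate reliability R at confidence C: n = ln(1 - C) / ln(R). This formula is not taken from the text; it assumes independent units, a test duration equal to the required mission time, and test stresses representative of use. The function names are hypothetical.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units that must each complete the test with zero failures to demonstrate
    `reliability` at `confidence` (success-run formula)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def demonstrated_reliability(n_units: int, confidence: float) -> float:
    """Lower confidence bound on reliability after n_units pass with zero failures."""
    return (1.0 - confidence) ** (1.0 / n_units)
```

Demonstrating R = 0.90 at 90% confidence requires 22 units with no failures; R = 0.95 at the same confidence requires 45.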

Some failures may be excluded from the results when analyzing test data. However, an exclusion should be approved only after rigorous analysis of the failed unit under test is completed and the failure cause can truly be ascribed to the test fixture (hardware), test software, or environmental conditions that will not be present in the actual use environment. Multiple failures due to the same single cause, or exhibiting the same single mode or mechanism, must all be counted as separate individual failures, not as a single failure. The (erroneous) rationale for consolidating failures is that, because there is only one underlying cause, they should count as only one failure. For example, in the testing of Winchester disk drives, thermal asperities are a significant failure mode. If 10 thermal asperities occurred in a reliability demonstration test, they should all be counted separately, as 10 instances of data loss, and not consolidated into a single data loss (failure).
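The counting rule can be made explicit in code. In the sketch below, an event is dropped only when a flag records that failure analysis has ascribed it to the test setup; every other event counts once, however many events share the same mode. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RecordedFailure:
    unit: str
    mode: str
    # Set True only after failure analysis ascribes the cause to the test fixture,
    # test software, or an environment that will not occur in actual use.
    excluded_after_analysis: bool = False


def count_failures(events: Iterable[RecordedFailure]) -> int:
    """Count every relevant failure event individually, even when several
    events share the same cause, mode, or mechanism."""
    return sum(1 for e in events if not e.excluded_after_analysis)


def consolidated_count(events: Iterable[RecordedFailure]) -> int:
    """The erroneous practice: one failure per distinct mode. Shown for contrast only."""
    return len({e.mode for e in events if not e.excluded_after_analysis})
```

For the disk-drive example, ten thermal-asperity events give a count of 10 with `count_failures` and 1 with the erroneous `consolidated_count`.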

5.Conclusions and recommendations

Reliability estimates based on MIL-HDBK-217 and its progeny are known to be inaccurate and misleading. A National Academy of Sciences (NAS) report states, "The use of Military Handbook (MIL-HDBK) 217 and its progeny has been discredited as being invalid and inaccurate: they should be replaced with physics-of-failure methods and with estimates based on validated models." Handbook estimates fail to capture the cause-and-effect relationships needed for design for reliability, reliability testing, and reliability assurance. Furthermore, the use of handbook methods has produced reliability estimates that have led to severe costs and, in some cases, to safety issues, including loss of life.

In 1996, the U.S. Army announced that MIL-HDBK-217 should no longer be used for predicting reliability because it "has been shown to be unreliable, and its use can lead to erroneous and misleading reliability predictions".60 However, some military contractors and avionics companies have not stopped using MIL-HDBK-217 or some of its progeny, as can be seen from the NAS report.5 Employing MIL-HDBK-217 and its progeny, such as GJB/Z-299, to predict the reliability of electronic products, especially in aerospace applications, might seem easy and cheap; however, their use is known to be costly over the product life cycle and in terms of safety. Recognizing the limitations of other handbook methodologies as well, GM stated that "...GM concurs and will comply with the findings and policy revisions of Feb. 15, 1996 by the Assistant Secretary of the U.S. Army for Research, Development and Acquisition.... Therefore: Mil-Hdbk 217, or a similar thermal stress impact on a generic historical component reliability assessment method such as SAE PREL, SHALL NOT BE USED. The supplier may request a waiver from reliability engineering, during a phase out transition period that shall end Jan 1, 1996".61 Even the Air Force Rome Laboratory, the agency that helped prepare the MIL-HDBK, has stated that MIL-HDBK-217 "is not intended to predict field reliability and, in general, does not do a very good job at it in an absolute sense".62

Handbook methodologies are based on the Arrhenius equation,63 which places the entire failure-rate burden on a part's temperature-dependent reliability and excludes external factors such as the type of stress and the corresponding degradation in the part's strength. Alternative methodologies, such as physics of failure and prognostics, can take these factors into account when modeling the reliability of a product, and they should be exploited. In addition, employing prognostic techniques to predict the remaining useful life enables continuous monitoring of systems, thereby improving product availability by aiding maintenance and service scheduling. Similarity-based pattern recognition techniques help correlate operating conditions (functional stresses and environmental factors) with the state of health of components and systems by grouping data into different clusters or patterns for each condition. Reliability estimates based on field failure data reveal the behavior of components and devices under actual operating conditions, while estimates based on test data can provide large amounts of data, in a very short time, on the different failure mechanisms that a component or device can undergo under varied stress conditions.
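To make the criticism concrete, the generic Arrhenius temperature factor used, in various forms, in handbook models can be written as follows. This is a generic form, not the exact expression of any particular handbook: the activation energy E_a, the reference temperature, and the example values (E_a = 0.7 eV, 25 °C reference, 85 °C operation) are illustrative assumptions.

```latex
\lambda(T) = \lambda_0 \exp\!\left(-\frac{E_a}{k_B T}\right),
\qquad
\pi_T = \frac{\lambda(T)}{\lambda(T_{\mathrm{ref}})}
      = \exp\!\left[\frac{E_a}{k_B}\left(\frac{1}{T_{\mathrm{ref}}} - \frac{1}{T}\right)\right]

% Example: E_a = 0.7 eV, k_B = 8.617 \times 10^{-5} eV/K,
% T_ref = 298.15 K (25 C), T = 358.15 K (85 C)
\pi_T = \exp\!\left[8123\,\mathrm{K} \times \left(0.0033540 - 0.0027921\right)\mathrm{K}^{-1}\right]
      = \exp(4.565) \approx 96
```

The factor depends only on temperature: it says nothing about which stress (thermal cycling, vibration, humidity) drives the actual failure mechanism or how the part's strength degrades, which is the gap that PoF and prognostics address.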

1. Kapur KC, Pecht M. Reliability engineering. New Jersey: Wiley; 2014. p. 4.

2. Pecht M, Kang WC. A critique of MIL-HDBK-217E reliability prediction methods. IEEE Trans Reliab 1988;37(5):453–7.

3. Charpenel P, Cavernes P, Casanovas V, Borowski J, Chopin JM. Comparison between field reliability and new prediction methodology on avionics embedded electronics. Microelectr Reliab 1998;38(6):1171–5.

4. Nilsson M, Hallberg Ö. A new reliability prediction model for telecommunication hardware. Microelectr Reliab 1997;37(10):1429–32.

5. National Research Council. Reliability growth: enhancing defense system reliability. Pittsburgh: The National Academies Press; 2014.

6. Fan J. Model-based failure diagnostics and reliability prognostics for high power white light-emitting diodes lighting [dissertation]. Hong Kong: The Hong Kong Polytechnic University; 2014.

7. Andonova AV, Yordanov RS. Reliability prediction of HIC and MCM. Annu J Electr 2009;3(2):268–71.

8. Isograph. GJB/Z 299B & 299C [Internet]. Available from: https://www.isograph.com/software/reliability-workbench/reliability-prediction/gjbz-299b-299c/.

9. Mou H, Hu W, Sun Y, Zhao G. A comparison and case studies of electronic product reliability prediction methods based on handbooks. International conference on quality, reliability, risk, maintenance, and safety engineering (QR2MSE); 2013.

10. Telcordia Technologies. Special report SR-332: reliability prediction procedure for electronic equipment, Issue 1. Piscataway, NJ: Telcordia Customer Service; 2016.

11. Reliability Analysis Centre. Surface mount technology industry directory [Internet]. Available from: https://www.smtnet.com/company/index.cfm?fuseaction=view_company&company_id=48090.

12. Alion System Reliability Center [Internet]. Available from: https://src.alionscience.com/prism/prism_demo.html.

13. Quanterion Solutions Incorporated. HDBK-217Plus™: 2015, Notice 1 [Internet]. Available from: https://www.quanterion.com/product/publications/hdbk-217plus-2015/.

14. FIDES Group. FIDES guide. Paris, France: Union Technique de l'Electricité; 2009.

15. Siemens AG Standard Document. Failure rates of components, expected values. Berlin: Siemens; 2004. SN 29500-1.

16. Shiono N, Arai E, Mutoh S. Historical overview of semiconductor device reliability for telecommunication networks: field data, prediction model of device failure rate, and wear-out failure analyses at NTT. NTT Tech Rev 2015;11(5):1–12.

17. Item Software. HRD-5 electronic reliability prediction [Internet]. Available from: http://www.itemsoft.com/iqt_hrd_5.html.

18. APSYS. Strategic risk management support and expertise [Internet]. Available from: http://www.apsys-airbus.com/en/26/Aeronautics.

19. Item Software. MIL-HDBK-217F notice 2 electronic reliability prediction [Internet]. Available from: http://www.itemsoft.com/iqt_mil-217.html.

20. T-Cubed Systems Inc. ReCalc for Windows features [Internet]. Available from: http://www.t-cubed.com/features.htm.

21. ReliaSoft. Lambda Predict: standards based reliability prediction software tool [Internet]. Available from: http://www.reliasoft.com/predict/.

22. ALD. Basic reliability prediction software [Internet]. Available from: http://aldservice.com/Basic-Reliability-Prediction-Software.html.

23. Quanterion Solutions Incorporated. History [Internet]. Available from: https://www.quanterion.com/about/history/.

24. Crimson Quality. Windchill Prediction: perform reliability analyses using globally accepted standards [Internet]. Available from: http://www.crimsonquality.com/products/reliability-prediction/.

25. Wood AP, Elerath JG. A comparison of predicted MTBFs to field and test data. Proceedings of the 1994 annual reliability and maintainability symposium; 1994. p. 153–6.

26. Jais C, Werner B, Das D. Reliability predictions: continued reliance on a misleading approach. Proceedings of the 2013 reliability and maintainability symposium; 2013. p. 1–6.

27. McLinn JA. Constant failure rate: a paradigm in transition. Qual Reliab Eng Int 1990;6(4):237–41.

28. Cushing MJ, Mortin DE, Stadterman TJ, Malhotra A. Comparison of electronics-reliability assessment approaches. IEEE Trans Reliab 1993;42(4):542–6.

29. Codier EO. Reliability prediction: help or hoax? Proceedings of the 1969 annual symposium on reliability; 1969. p. 383–90.

30. Ganesan S, Pecht M. Lead-free electronics. Hoboken, NJ: John Wiley; 2006.

31. Years of Moore's law, Intel [Internet]. Available from: http://www.intel.com/content/www/us/en/silicon-innovations/mooreslaw-technology.html.

32. Boeing. 787 lithium-ion battery events: a guide for fire fighters [Internet]. 2013. Available from: http://www.boeing.com/assets/pdf/commercial/airports/faqs/787batteryprocedures.pdf.

33. Jones J, Hayes J. A comparison of electronic-reliability prediction models. IEEE Trans Reliab 1999;48(2):127–34.

34. Brown LM. Comparing reliability predictions to field data for plastic parts in a military airborne environment. Proceedings annual reliability and maintainability symposium; 2003. p. 207–13.

35. Guo J, Li Z, Wolf J. Reliability centered preventive maintenance optimization for aircraft indicators. Annual reliability and maintainability symposium (RAMS); 2016. p. 1–6.

36. IEEE Reliability Society. IEEE standard framework for reliability prediction of hardware: revision of IEEE Std 1413–1998. Piscataway (NJ): IEEE Reliability Society; 2010.

37. Department of Defense, USA. Military handbook 179(ER), microcircuit application handbook. Washington, D.C.: Department of Defense; 1993.

38. Bechtold LE. Industry consensus approach to physics of failure in reliability prediction. IEEE reliability and maintainability symposium; 2010. p. 1–4.

39. VITA. VITA 51.2. Physics of failure reliability predictions: Revision 0.24. Oklahoma: VITA; 2011.

40. Kacprzynski GJ, Sarlashkar A, Roemer MJ, Hess A, Hardman W. Predicting remaining life by fusing the physics of failure modeling with diagnostics. J Miner Met Mater Soc 2004;56(3):29–35.

41. Pecht M. Prognostics and health management of electronics. New York: John Wiley & Sons Ltd; 2008.

42. Vichare N, Pecht M. Prognostics and health management of electronics. IEEE Trans Compon Pack Technol 2006;29(1):222–9.

43. Gu J, Pecht M. Prognostics and health assessment implementation for electronic products. J IEST 2010;53(1):44–58.

44. Li Z, Guo J, Zhou R. Maintenance scheduling optimization based on reliability and prognostics information. IEEE annual reliability and maintainability symposium; 2016. p. 1–5.

45. Liu D, Wang H, Peng Y, Xie W, Liao H. Satellite lithium-ion battery remaining cycle life prediction with novel indirect health indicator extraction. Energies 2013;6(8):3654–68.

46. Patterson T. Dreamliner battery probe ends: 8 questions and answers [Internet]. CNN; 2014. Available from: http://www.cnn.com/2014/12/11/travel/boeing-787-dreamliner-investigation-report/.

47. Cohan P. Thermal runaway in 787 Dreamliner batteries must be stopped [Internet]. 2013. Available from: http://www.forbes.com/sites/petercohan/2013/02/06/thermal-runaway-in-787-dreamlinerbatteries-must-be-stopped/#34c15ddc3864.

48. He W, Williard N, Osterman M, Pecht M. Prognostics of lithium-ion batteries based on Dempster-Shafer theory and the Bayesian Monte Carlo method. J Power Sourc 2011;196(23):10314–21.

49. Guo J, Li Z, Pecht M. A Bayesian approach for Li-ion battery capacity fade modeling and cycles to failure prognostics. J Power Sourc 2015;281:173–84.

50. Liu D, Luo Y, Peng Y, Peng X, Pecht M. Lithium-ion battery remaining useful life estimation based on nonlinear AR model combined with degradation feature. Annual conference of the prognostics and health management society; 2012. p. 1803–36.

51. Saha B, Goebel K, Poll S, Christophersen J. Prognostics methods for battery health monitoring using a Bayesian framework. IEEE Trans Instrum Meas 2009;58(2):291–6.

52. Liu D, Pang J, Zhou J, Peng Y, Pecht M. Prognostics for state of health estimation of lithium-ion batteries based on combination Gaussian process functional regression. Microelectr Reliab 2013;53(6):832–9.

53. Pattipati B, Pattipati K, Christopherson JP, Namburu SM, Prokhorov DV, Qiao L. Automotive battery management systems. IEEE AUTOTESTCON; 2008. p. 581–6.

54. Widodo A, Shim MC, Caesarendra W, Yang BS. Intelligent prognostics for battery health monitoring based on sample entropy. Expert Syst Appl 2011;38(9):11763–9.

55. Liu D, Xie X, Lu S, Peng Y. Battery prognostics with uncertainty fusion for aerospace applications. Annual reliability and maintainability symposium; 2015. p. 1–6.

56. Puterman ML. Markov decision processes: discrete stochastic dynamic programming. New York: John Wiley; 1994.

57. Kontkanen P, Lahtinen J, Myllymaki P, Tirri H. An unsupervised Bayesian distance measure. Advances in case-based reasoning: proceedings of the 5th European workshop on case-based reasoning; 2000. p. 148–60.

58. Coates A, Ng AY. Learning feature representations with k-means. Neural networks: tricks of the trade. Berlin: Springer; 2012. p. 561–80.

59. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

60. Cushing M, Krolewski J, Stadterman T, Hum B. U.S. Army reliability standardization improvement policy and its impact. IEEE Trans Compon Pack Manuf Technol Part A 1996;19(2):277–8.

61. GM North America. Detroit, State of Michigan: GM; 1996. Technical specification number: 10288874; 1996.

62. Pecht M, Boullie J, Hakim E, Jain A, Jackson M, Knowles I, et al. The realism of FAA reliability-safety requirements and alternatives. IEEE AES Syst Mag 1998;13(2):16–20.

63. Pecht M, Lall P, Hakim E. Temperature as a reliability factor. Therm Manage Electron Syst II 1997:27–41.
