
Application of artificial intelligence in gastroenterology

2019-05-08

World Journal of Gastroenterology, 2019, Issue 14

Young Joo Yang, Chang Seok Bang

Abstract Artificial intelligence (AI) using deep-learning (DL) has emerged as a breakthrough computer technology. In the era of big data, the accumulation of an enormous number of digital images and medical records drove the need for AI to efficiently deal with these data, which have become fundamental resources for a machine to learn by itself. Among several DL models, the convolutional neural network has shown outstanding performance in image analysis. In the field of gastroenterology, physicians handle large amounts of clinical data and various kinds of imaging devices such as endoscopy and ultrasound. AI has been applied in gastroenterology for diagnosis, prognosis, and image analysis. However, potential inherent selection bias cannot be excluded in retrospective studies. Because overfitting and spectrum bias (class imbalance) can lead to overestimation of accuracy, external validation using datasets unused for model development, collected in a way that minimizes spectrum bias, is mandatory. For robust verification, prospective studies with adequate inclusion/exclusion criteria, which represent the target populations, are needed. DL also lacks interpretability. Because interpretability is important in that it can provide safety measures, help to detect bias, and create social acceptance, further investigations should be performed.

Key words: Artificial intelligence; Convolutional neural network; Deep-learning; Computer-assisted; Gastroenterology; Endoscopy

INTRODUCTION

Recently, artificial intelligence (AI) using deep-learning (DL) has emerged as a breakthrough computer technology, and numerous research studies using AI applications to identify or differentiate images in various medical fields, including radiology, neurology, orthopedics, pathology, ophthalmology, and gastroenterology, have been published[1]. However, AI, the display of intelligent behavior indistinguishable from that of a human being, was already mentioned in the 1950s[2]. Although AI has waxed and waned over the past six decades with seemingly little improvement, it has been constantly applied to the medical field using various models of machine learning (ML), including Bayesian inference, decision trees, linear discriminants, support vector machines (SVMs), logistic regression, and artificial neural networks (ANNs).

In the era of big data, the accumulation of enormous numbers of digital images and medical records drove the need for AI to efficiently deal with these data, which have also become fundamental resources for a machine to learn by itself. Furthermore, the evolution of computing power with graphics processing units can overcome the limitations of traditional ML, particularly overtraining on input data (overfitting). This led to a revival of AI, especially using DL technology, a new form of ML. Among several DL methods, the convolutional neural network (CNN), which consists of multiple layers of ANN with step-by-step minimal processing, showed outstanding performance in image analysis and has received attention in AI (Figure 1 and Table 1).

In the field of gastroenterology, physicians handle large amounts of clinical data and various kinds of imaging devices such as esophagogastroduodenoscopy (EGD), colonoscopy, capsule endoscopy (CE), and ultrasound equipment. AI has been applied in gastroenterology for making a diagnosis, predicting a prognosis, and analyzing images, and previous studies have reported remarkable results. The rapid progression of AI demands that gastroenterologists learn the utility, strengths, and pitfalls of AI. In addition, physicians should prepare for the changes and effects of AI on real clinical practice in the near future. Hence, in this review, we aim to: (1) briefly introduce ML technology; (2) summarize AI applications in the field of gastroenterology, divided into two categories (statistical analysis for recognition of diagnosis or prediction of prognosis, and image analysis for patient applications, excluding animal studies); and (3) discuss the challenges and future directions for the application of AI.

ML TECHNOLOGY

Generally, AI is considered a machine intelligence that has cognitive functions similar to those of humans, including “learning” and “problem solving”[3]. Currently, ML is the most common approach to AI. It automatically builds mathematical algorithms from given data (known as input training data) and predicts or makes decisions under uncertain conditions without human instructions (Figure 1A)[4]. In the medical field, ML methods such as Bayesian networks, linear discriminants, SVMs, and ANNs have been used[5]. A naïve Bayes classifier, which represents the probabilistic relationship between input and output data, is a typical classification model[6]. The SVM, invented by Vladimir N Vapnik and Alexey Ya Chervonenkis in 1963[7], is a discriminative model that uses a dividing hyperplane. Before the development of DL, SVMs showed the best performance for classification and regression, achieved by optimizing a hyperplane with the largest functional margin (the distance from the hyperplane, in a high- or infinite-dimensional space, to the nearest training data point of any class)[8].
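The margin idea behind the SVM can be sketched in a few lines of Python (an illustrative example, not from the article; the weights `w`, `b` and the test point are hypothetical):

```python
# Illustrative sketch: scoring a point against a fixed separating hyperplane
# w·x + b = 0, as an SVM decision function does. The weights are hypothetical,
# standing in for what SVM training would learn.
import math

def decision(w, b, x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, x):
    """Distance from point x to the hyperplane; SVM training maximizes the
    smallest such distance over all training points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision(w, b, x)) / norm

w, b = [3.0, 4.0], -5.0                       # hypothetical learned parameters
print(decision(w, b, [2.0, 1.0]))             # 5.0 -> positive class
print(geometric_margin(w, b, [2.0, 1.0]))     # 1.0, since ||w|| = 5
```

Training itself (choosing `w`, `b` to maximize the smallest margin) is an optimization problem omitted here; the sketch only shows how the learned hyperplane classifies and how far a point sits from it.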

Table 1 Artificial intelligence terminology

An ANN is a multilayered interconnected network inspired by the neuronal connections of the human brain. Although the ANN was introduced by McCulloch and Pitts in 1943[9], it was further studied in 1957 by Frank Rosenblatt using the concept of the perceptron[10]. The ANN is a hierarchical structure consisting of an input layer, hidden connections (between the input and output layers), and an output layer. Each connection in the hidden layers has a strength (known as a weight) that is used for the learning process of the network (Figure 1B). Through an appropriate training process (learning process), the network can adjust the values of the connection weights to produce the best result (Figure 1C).
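The weight-adjustment idea can be sketched with a minimal Rosenblatt-style perceptron (an illustrative toy, not the model of any cited study; the learning rate, epoch count, and the logical-AND task are arbitrary choices for demonstration):

```python
# Minimal perceptron: whenever the prediction disagrees with the target label,
# each weight is nudged in the direction that reduces the error -- the
# "learning process adjusts connection weights" idea in miniature.

def train_perceptron(samples, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0
            err = target - pred                       # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn logical AND, a linearly separable toy task
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)
predict = lambda x: 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0
print([predict(x) for x, _ in data])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (hyperplane); stacking layers of such units, as the ANN does, is what allows nonlinear decision boundaries.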

In the 1980s, an ANN with several hidden layers between the input and output layers was introduced. This was known as DL (or a deep neural network). Although the ANN showed remarkable performance in managing nonlinear datasets for diagnosis and prognostic prediction in the medical field, it also revealed several weaknesses: the vanishing gradient, overfitting, insufficient computing capacity, and a lack of training data. These weaknesses hampered the advancement of the ANN. Finally, the recent availability of big data provided sufficient input data for training, and the rapid progression of computing power allowed researchers to overcome prior limitations. Among several AI methods, DL received public attention and has shown excellent performance in the computer vision area using CNNs.
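The vanishing-gradient weakness can be demonstrated with a toy calculation (an illustrative sketch, not from the article): backpropagation multiplies one activation derivative per layer, and because the sigmoid derivative never exceeds 0.25, the gradient shrinks geometrically with depth.

```python
# Toy illustration of the vanishing gradient: even in the best case
# (pre-activations at 0, where sigmoid'(x) peaks at 0.25), the product of
# derivatives across 30 layers is vanishingly small.
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # maximum value 0.25, attained at x = 0

grad = 1.0
for layer in range(30):           # one derivative factor per hidden layer
    grad *= sigmoid_grad(0.0)

print(grad)                       # 0.25**30, about 8.7e-19 -- almost no learning signal
```

This is one reason modern deep networks favor activations such as ReLU, whose derivative is 1 over its active region, so the product does not collapse with depth.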

A CNN consists of (1) convolutional and pooling layers, the two main components for extracting distinct features; and (2) fully connected layers that make an overall classification. The input images are filtered with numerous specific filters to extract specialized features and create multiple feature maps. This preprocessing operation for filtering is called convolution. A learning process that tunes the convolution filters to make the best feature maps is essential for success in a CNN. These feature maps are compressed to smaller sizes by pooling the pixels to capture a larger field of the image, and these convolutional and pooling layers are iterated many times. Finally, fully connected layers combine all features and produce the final outcomes (Figure 1B).
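The convolution-then-pooling pipeline described above can be sketched in plain Python (an illustrative toy, not a model from any cited study; the image and the edge-detecting filter are made up for demonstration):

```python
# A single convolution producing a feature map, followed by 2x2 max pooling
# that compresses it. Real CNNs stack many such layers with learned filters.

def convolve2d(image, kernel):
    """Valid convolution (strictly, cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def max_pool2(fmap):
    """Non-overlapping 2x2 max pooling: keeps the strongest response per patch."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# 6x6 toy image: bright left half, dark right half (a vertical edge)
image = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
vertical_edge = [[1, 0, -1]] * 3          # responds to left-to-right intensity drops
fmap = convolve2d(image, vertical_edge)   # 4x4 feature map, strong where the edge is
print(max_pool2(fmap))                    # [[3, 3], [3, 3]]
```

The pooled map is a quarter of the feature map's size yet still records that a strong edge response was present, which is exactly the "compress while keeping the salient features" role pooling plays in a CNN.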

The rapid growth of the CNN was demonstrated at the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2012 by Geoffrey Hinton's group, and several CNNs, such as Inception from Google and ResNet from Microsoft, have since shown excellent performance. A graphical summary of AI, ML, and DL development is shown in Figure 1.

APPLICATION OF AI IN GASTROENTEROLOGY

Recognition of diagnosis and prediction of prognosis

Although AI in the field of gastroenterology has recently focused on image analysis, several ML models have shown promising results in the recognition of diagnosis and prediction of prognosis. The ANN is appropriate for dealing with complex datasets and overcomes the drawbacks of traditional linear statistics. In addition, the ANN can represent the sophisticated interactions between demographic, environmental, and clinical characteristics.

Figure 1 Schematic graphical summary for artificial intelligence, machine learning and deep learning development. A: Definition of artificial intelligence,machine learning (ML) and deep learning (DL). B: Comparison of process between classic ML and DL. C: Modes of learning and examples of ML.

In terms of diagnosis, Pace et al[11] demonstrated an ANN model in 2005 that made a diagnosis of gastroesophageal reflux disease using only 45 clinical variables in 159 cases, with an accuracy of 100%. Lahner et al[12] performed a similar pilot study to recognize atrophic gastritis solely by using clinical and biochemical variables from 350 outpatients, applying ANNs and linear discriminant analysis; this study also showed high accuracy.

Regarding the prediction of prognosis, in 1998, Pofahl et al[13] compared an ANN model to the Ranson criteria and the Acute Physiologic and Chronic Health Evaluation (APACHE II) scoring system for predicting the length of stay of patients with acute pancreatitis. The authors used a backpropagation neural network trained on 156 patients. Although the highest specificity (94%) was observed with the Ranson criteria, the ANN model showed the highest sensitivity (75%) for predicting a length of stay of more than 7 d. Similar accuracy was observed for the Ranson criteria and the APACHE II scoring system[13]. In 2003, Das et al[14] used an ANN to predict the outcomes of acute lower gastrointestinal bleeding in 190 patients. The authors compared the performance of the ANN to a previously validated scoring system (BLEED), which revealed a significantly better predictive accuracy of mortality (87% vs 21%), recurrent bleeding (89% vs 41%), and the need for therapeutic intervention (96% vs 46%) in the ANN model.

Sato et al[15] presented an ANN model in 2005 to predict 1-year and 5-year survival in 418 esophageal cancer patients. This ANN model showed improved accuracy compared to the conventional linear discriminant analysis model.

Recently, the number of input training data items for ANNs has increased from hundreds to thousands of patients. Rotondano et al[16] compared the Rockall score to a supervised ANN model for predicting the mortality of nonvariceal upper gastrointestinal bleeding in 2380 patients. The ANN model showed superior sensitivity (83.8% vs 71.4%), specificity (97.5% vs 52.0%), accuracy (96.8% vs 52.9%), and area under the receiver operating characteristic curve (AUROC) of the predictive performance (0.95 vs 0.67) compared with the complete Rockall score.

Takayama et al[17] established an ANN model for the prediction of prognosis in patients with ulcerative colitis after cytoapheresis therapy and achieved a sensitivity and specificity for the need for an operation of 96% and 97%, respectively. Hardalaç et al[18] established an ANN model to predict mucosal healing with azathioprine therapy in patients with inflammatory bowel disease (IBD) and achieved 79.1% correct classifications. Peng et al[19] used an ANN model to predict the frequency of onset, relapse, and severity of IBD. The researchers achieved only average accuracy in predicting the frequency of onset and the severity of IBD but high accuracy in predicting the frequency of relapse of IBD (mean square error = 0.009, mean absolute percentage error = 17.1%).

SVMs have been used to analyze data and recognize patterns in classification analyses. Recently, Ichimasa et al[20] analyzed 45 clinicopathological factors in 690 endoscopically resected T1 colorectal cancer patients to predict lymph node metastasis using an SVM. This approach showed superior performance (sensitivity 100%, specificity 66%, accuracy 69%) compared with the American (sensitivity 100%, specificity 44%, accuracy 49%), European (sensitivity 100%, specificity 0%, accuracy 9%), and Japanese (sensitivity 100%, specificity 0%, accuracy 9%) guidelines. The SVM-based prediction model also resulted in a lower rate of unnecessary additional surgery after misdiagnosis of lymph node metastasis (77%) than prediction using the American (85%), European (91%), and Japanese (91%) guidelines. Yang et al[21] constructed an SVM-based model using clinicopathological features and 23 immunologic markers from 483 patients who underwent curative surgery for esophageal squamous cell carcinoma. This study revealed reasonable performance in identifying high-risk patients with postoperative distant metastasis [sensitivity 56.6%, specificity 97.7%, positive predictive value (PPV) 95.6%, negative predictive value (NPV) 72.3%, and overall accuracy 78.7%] (Table 2).
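The performance figures quoted throughout this review (sensitivity, specificity, PPV, NPV, accuracy) all derive from a 2 × 2 confusion matrix; a minimal sketch with hypothetical counts:

```python
# Deriving the commonly reported metrics from a 2x2 confusion matrix.
# The counts below are hypothetical, chosen only for illustration.

def confusion_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),   # fraction of diseased patients detected
        "specificity": tn / (tn + fp),   # fraction of healthy patients cleared
        "ppv":         tp / (tp + fp),   # how trustworthy a positive call is
        "npv":         tn / (tn + fn),   # how trustworthy a negative call is
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
    }

m = confusion_metrics(tp=45, fp=5, fn=15, tn=135)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.75, 'specificity': 0.964, 'ppv': 0.9, 'npv': 0.9, 'accuracy': 0.9}
```

Note that PPV and NPV, unlike sensitivity and specificity, shift with disease prevalence, which is why values reported from case-control datasets may not transfer to screening populations.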

Analysis of images

Although endoscopic screening programs have reduced mortality from gastrointestinal malignancies, these cancers remain a leading cause of death worldwide and a global economic burden. To enhance the detection rate of gastrointestinal neoplasms and optimize treatment strategies, high-quality endoscopic examination for the recognition of gastrointestinal neoplasms and the classification of benign versus malignant lesions is essential for the gastroenterologist. Thus, gastroenterologists are interested in the applications of AI, especially CNNs and SVMs, for image analysis. Furthermore, AI has been increasingly adopted for non-neoplastic gastrointestinal diseases, including infection, inflammation, and hemorrhage.

Upper gastrointestinal field: Takiyama et al[22] constructed a CNN model that could recognize the anatomical location of EGD images with AUROCs of 1.00 for the larynx and esophagus, and 0.99 for the stomach and duodenum. This CNN model could also recognize specific anatomical locations within the stomach, with AUROCs of 0.99 for the upper, middle, and lower stomach.

To assist in the discrimination of early neoplastic lesions in Barrett's esophagus, van der Sommen et al[23] developed an automated algorithm incorporating specific textures, color filters, and ML from 100 endoscopic images. This algorithm reasonably detected early neoplastic lesions in a per-image analysis, with a sensitivity and specificity of 83%. In 2017, the same group investigated a model to improve the detection rate of early neoplastic lesions in Barrett's esophagus by using 60 ex vivo volumetric laser endomicroscopy images. This novel computer model showed optimal performance compared with a clinical volumetric laser endomicroscopy prediction score, with a sensitivity of 90% and specificity of 93%[24].

Table 2 Summary of clinical studies using artificial intelligence for recognition of diagnosis and prediction of prognosis

Several studies evaluated ML models using specialized endoscopy to differentiate neoplastic/dysplastic from non-neoplastic lesions. Kodashima et al[25] showed that computer-based analysis can easily identify malignant tissue at the cellular level using endocytoscopic images, which enable microscopic visualization of the mucosal surface. In 2015, Shin et al[26] reported on an image analysis model to detect esophageal squamous dysplasia using high-resolution microendoscopy (HRME). The sensitivity and specificity of this model were 87% and 97%, respectively. The following year, Quang et al[27] from the same study group evolved this model, which was incorporated into tablet-interfaced HRME with full automation for real-time analysis. As a result, the model reduced costs compared to the previous laptop-interfaced HRME and showed good diagnostic yields for esophageal squamous cell carcinoma, with a sensitivity and specificity of 95% and 91%, respectively. However, the application of this model was limited by the unavailability of specialized endoscopy.

Finally, Horie et al[28] demonstrated the utility of AI using CNNs to make a diagnosis of esophageal cancer. The model was trained with 8428 conventional endoscopic images, including white-light images (WLIs) and narrow-band images (NBIs). This CNN model detected esophageal cancer with a sensitivity of 95% and could identify all small cancers of < 10 mm. The model also distinguished superficial esophageal cancer from advanced cancer with an accuracy of 98%.

Helicobacter pylori (H. pylori) infection is the most important risk factor for peptic ulcers and gastric cancer. Several researchers have applied AI to aid the endoscopic diagnosis of H. pylori infection. In 2004, Huang et al[29] investigated the predictability of H. pylori infection by refined feature selection with a neural network using related gastric histologic features in endoscopic images. This model was trained and analyzed with 84 image parameters from 30 patients. The sensitivity and specificity for the detection of H. pylori infection were 85.4% and 90.9%, respectively. In addition, the accuracy of this model for identifying gastric atrophy and intestinal metaplasia, and for predicting the severity of H. pylori-related gastric inflammation, was higher than 80%.

Recently, two Japanese research groups reported on the application of a CNN to make a diagnosis of H. pylori infection[30,31]. Itoh et al[31] developed a CNN model to recognize H. pylori infection by using 596 endoscopic images after data augmentation of a prior set of 149 images. This CNN model showed promising results, with a sensitivity and specificity of 86.7% and 86.7%, respectively. Shichijo et al[30] compared the performance of a CNN to that of 23 endoscopists for the diagnosis of H. pylori infection by using endoscopic images. The CNN model showed superior sensitivity (88.9% vs 79.0%), specificity (87.4% vs 83.2%), accuracy (87.7% vs 82.4%), and diagnostic time (194 s vs 230 s).

In 2018, a prospective pilot study was conducted for the automated diagnosis of H. pylori infection using image-enhanced endoscopy, such as blue laser imaging-bright and linked color imaging. The performance of the developed AI model was significantly higher with blue laser imaging-bright and linked color imaging training (AUROCs of 0.96 and 0.95) than with WLI training (0.66)[32].

The utility of AI in the diagnosis of gastrointestinal neoplasms falls into two main categories: detection and characterization. In 2012, Kubota et al[33] first evaluated a computer-aided pattern recognition system to identify the depth of wall invasion of gastric cancer using endoscopic images. They used 902 endoscopic images and created a backpropagation model after 10-fold cross-validation. As a result, the diagnostic accuracy was 77.2%, 49.1%, 51.0%, and 55.3% for T1-4 staging, respectively. In particular, the accuracy of T1a (mucosal invasion) and T1b (submucosal invasion) staging was 68.9% and 63.6%, respectively. Hirasawa et al[34] reported on the good performance of a CNN-based diagnostic system to detect gastric cancers in endoscopic images. The authors trained the CNN model using 13584 endoscopic images and tested it with 2296 images. The overall sensitivity was 92.2%. In addition, the detection rate for lesions with a diameter of 6 mm or more was 98.6%, and all invasive cancers were identified[34]. All missed lesions were superficially depressed and differentiated-type intramucosal cancers that were difficult to distinguish from gastritis even for experienced endoscopists. However, 69.4% of the lesions that the CNN diagnosed as gastric cancer were benign, and the most common reasons for misdiagnosis were gastritis with redness, atrophy, and intestinal metaplasia[34].

Zhu et al[35] further applied a CNN system to discriminate the invasion depth of gastric cancer (M/SM1 vs deeper than SM1) using conventional endoscopic images. They trained a CNN model with 790 images and tested it with another 203 images. The CNN model showed high accuracy (89.2%) and specificity (95.6%) in determining the invasion depth of gastric cancer. This result was significantly superior to that of experienced endoscopists. Kanesaka et al[36] studied a computer-aided diagnosis system using an SVM to facilitate the use of magnifying NBI to distinguish early gastric cancer. The study reported remarkable potential in terms of diagnostic performance (accuracy 96.3%, PPV 98.3%, sensitivity 96.7%, and specificity 95%) and area concordance performance (accuracy 73.8%, PPV 75.3%, sensitivity 65.5%, and specificity 80.8%).

In terms of hepatology, ultrasound imaging has been explored for the application of AI. Gatos et al[37] established an SVM diagnostic model of chronic liver disease using ultrasound shear wave elastography (70 patients with chronic liver disease and 56 healthy controls). The performance was promising, with an accuracy of 87.3%, sensitivity of 93.5%, and specificity of 81.2%, although prospective validation was not conducted. Kuppili et al[38] established a fatty liver detection and characterization model using a single-layer feed-forward neural network and validated this model with a higher accuracy than the previous SVM-based model. These researchers used ultrasound images of 63 patients, and the gold standard for labeling each patient was the pathologic result of a liver biopsy.

The determination of liver cirrhosis has also been attempted with ML technology. Liu et al[39] developed a CNN model with ultrasound liver capsule images (44 images from controls and 47 images from patients with cirrhosis) and classified these images using an SVM. The AUROC for the classification was 0.951, although prospective validation was not conducted.

Lower gastrointestinal field: Among various gastrointestinal fields, the development of an AI model using colonoscopy has been the most promising area because polyp detection during colonoscopies is frequent. This provides sufficient sources for AI training, and a missed colorectal polyp is directly associated with interval colorectal cancer development.

In terms of polyp detection, Fernandez-Esparrach et al[40] established an automated computer-vision method using an energy map to detect colonic polyps in 2016. They used 24 videos containing 31 polyps and showed acceptable performance, with a sensitivity of 70.4% and a specificity of 72.4% for polyp detection (Table 3). Recently, this performance was improved with DL applications for polyp detection[41,42]. Misawa et al[41] designed a CNN model using 546 short videos from 73 full-length videos, which were divided into training data (105 polyp-positive videos and 306 polyp-negative videos) and test data (50 polyp-positive videos and 85 polyp-negative videos). The researchers showed the possibility of automated detection of colonic polyps in real time, with a sensitivity and specificity of 90.0% and 63.3%, respectively. Urban et al[42] also used a CNN system to identify colonic polyps. They used 8641 hand-labeled images and 20 colonoscopy videos in various combinations as training and test data. The CNN model detected polyps in real time with an AUROC of 0.991 and an accuracy of 96.4%. Moreover, it assisted in the identification of an additional nine polyps compared with expert endoscopists in the application of test colonoscopy videos.

Although many automated polyp detection models have shown promising performance, prospective validation was not conducted[43-45]. However, Klare et al[46] performed a prototype software validation under real-time conditions (55 routine colonoscopies), and the results were comparable between the endoscopists and the established software: the endoscopists' polyp detection and adenoma detection rates were 56.4% and 30.9%, respectively, versus 50.9% and 29.1% for the software. Wang et al[47] established a DL algorithm using data from 1290 patients and validated this model with 27113 newly collected colonoscopy images from 1138 patients. This model showed remarkable performance, with a sensitivity of 94.38%, specificity of 95.2%, and AUROC of 0.984 for the detection of at least one polyp[47].

For AI applications in polyp characterization, magnifying endoscopic images, which are useful for discriminating pit or vascular patterns, were first adopted to enhance the performance of AI. Tischendorf et al[48] developed an automated classification model of colorectal polyps by magnifying NBI images to evaluate vascular patterns in 2010. They reported overall accurate classification rates of 91.9% for a consensus decision between the human observers and 90.9% for a safe decision (classifying polyps as neoplastic in cases of interobserver discrepancy)[48]. In 2011, Gross et al[49] compared the performance of a computer-based model for the differentiation of small colonic polyps of < 10 mm using NBI images. The expert endoscopists and the computer-based model showed comparable diagnostic performance in sensitivity (93.4% vs 95.0%), specificity (91.8% vs 90.3%), and accuracy (92.7% vs 93.1%)[49].

Table 3 Summary of clinical studies using artificial intelligence in the upper gastrointestinal field

AI: Artificial intelligence; EGD: Esophagogastroduodenoscopy; CNN: Convolutional neural network; AUROC: Area under receiver operating characteristic; SVM: Support vector machine; HRME: High-resolution microendoscopy; NBI: Narrow band image; H. pylori: Helicobacter pylori; ANN: Artificial neural network; PPV: Positive predictive value.

Takemura et al[50] retrospectively compared the identification of pit patterns by a computer-based model using shape descriptors such as area, perimeter, fitted ellipse, or circularity, in reference to endoscopic diagnosis, by using magnified endoscopic images with crystal violet staining in 2010. The accuracies for type I, II, IIIL, and IV pit patterns of colorectal lesions were 100%, 100%, 96.6%, and 96.7%, respectively. In 2012, the authors applied an upgraded version of the computer system via an SVM to distinguish neoplastic from non-neoplastic lesions by using endoscopic NBI images, which showed a detection accuracy of 97.8%[51]. They further demonstrated the availability of a real-time image recognition system in 2016, and the concordance between the pathologic results of diminutive polyps and the diagnosis by the real-time image recognition model was 93.2%[52].

Byrne et al[53] developed a CNN model for the real-time differentiation of diminutive colorectal polyps by using only NBI video frames in 2017. This model discriminated adenomas from hyperplastic polyps with an accuracy of 94% and identified adenomas with a sensitivity of 98% and a specificity of 83%[53]. Likewise, Chen et al[54] made a CNN model trained with 2157 images to identify neoplastic or hyperplastic polyps of < 5 mm, with a PPV and NPV of 89.6% and 91.5%, respectively. In 2017, Komeda et al[55] reported preliminary data for a CNN model to distinguish adenomas from non-adenomatous polyps. The CNN model was trained with 1800 conventional endoscopic images with WLI, NBI, and chromoendoscopy, and the accuracy of a 10-fold cross-validation was 75.1%.

To enhance the differentiation of polyps, a Japanese study group reported several articles on AI application with endocytoscopy images, which enable the observation of nuclei in vivo, and showed diagnostic results comparable to those of pathologic examinations. In 2015, these researchers first developed a computer-aided diagnosis system using endocytoscopy for the discrimination of neoplastic changes in small polyps. This approach showed a sensitivity (92.0%) and accuracy (89.2%) comparable with those of expert endoscopists[56]. In 2016, the research team developed a second-generation model that could: (1) evaluate both nuclei and ductal lumens; (2) use an SVM instead of multivariate analysis; (3) provide the confidence levels of the decisions; and (4) discriminate neoplastic changes more rapidly, reducing the processing time from 0.3 s to 0.2 s. The endocytoscopic microvascular patterns could be effectively evaluated by staining with dye[57]. These researchers also developed endocytoscopy with NBI, without staining, to evaluate microvascular findings; this approach showed an overall accuracy of 90%[57]. The same group performed a prospective validation of a real-time computer-aided diagnosis system using endocytoscopy with NBI or stained images to identify neoplastic diminutive polyps. The researchers reported a pathologic prediction rate of 98.1%, and the time required to assess one diminutive polyp was about 35 to 47 s[59].

The application of a computer-aided ultrahigh (approximately 400 ×) magnification endocytoscopy system for the diagnosis of invasive colorectal cancers was investigated by Takeda et al[60]. This system was trained with 5543 endocytoscopic images from 238 lesions and reported a sensitivity of 89.4%, specificity of 98.9%, and accuracy of 94.1% on 200 test images[60].

For the application of AI in IBD, Maeda et al[61] developed a diagnosis system using an SVM after refining previous computer-aided endocytoscopy systems[56-58]. They evaluated the diagnostic performance of this model for the prediction of persistent histologic inflammation in ulcerative colitis patients. The model showed good performance, with a sensitivity of 74%, specificity of 97%, and accuracy of 91%[61].

Currently, the resolution of capsule endoscopy images is relatively low compared with other digestive endoscopies. Moreover, the interpretation and diagnosis of capsule endoscopy images depend highly on the reviewer's ability and effort, and reviewing is a time-consuming process. Therefore, automated diagnosis of capsule endoscopy images has been attempted for several conditions, including angioectasia, celiac disease, and intestinal hookworms, and for small intestinal motility characterization[62-65].

Leenhardt et al[62] developed a gastrointestinal angiectasia detection model using semantic segmentation images with a CNN. They used 600 control images and 600 typical angiectasia images drawn from 4166 small bowel capsule endoscopy videos, which were divided equally into training and test data sets. The CNN-based model revealed a high diagnostic performance, with a sensitivity of 100%, specificity of 96%, PPV of 96%, and NPV of 100%[62] (Table 4). Zhou et al[63] established a CNN model for the classification of celiac disease versus controls using capsule endoscopy clips from six celiac disease patients and five controls. The researchers achieved 100% sensitivity and specificity on the test data set. Moreover, the evaluation confidence was related to the severity level of small bowel mucosal lesions, reflecting the potential for quantitative measurement of the existence and degree of pathology throughout the small intestine[63]. Intestinal hookworms are difficult to find by direct visualization because they are small tubular structures with a whitish color and semitransparent features similar to the background intestinal mucosa; the presence of intestinal secretory materials makes them even more difficult to detect. He et al[64] established a CNN model for the detection of hookworms in capsule endoscopy images. The CNN-based model showed reasonable performance, with a sensitivity of 84.6% and specificity of 88.6%; only 15% of hookworm images and 11% of non-hookworm images were falsely classified.

The interpretation of wireless motility capsule endoscopy is a complex task. Seguí et al[65] established a CNN model for small-intestine motility characterization and achieved a mean classification accuracy of 96% across six intestinal motility events (“turbid”, “bubbles”, “clear blob”, “wrinkles”, “wall”, and “undefined”). This outperformed the other classifiers by a large margin (a 14% relative performance increase).

CHALLENGES AND FUTURE DIRECTIONS FOR APPLICATION OF AI

Although many researchers have investigated the utility of AI and shown promising results, most studies were designed in a retrospective manner, as case-control studies from a single center, or used endoscopic images obtained from specific endoscopic modalities unavailable at many institutions. Potential inherent bias, such as selection bias, cannot be excluded in this situation. Therefore, it is crucial to meticulously validate the performance of AI before applying it in real clinical practice. To properly verify the accuracy of AI, physicians should understand the effects of overfitting and spectrum bias (class imbalance) on the performance of AI and should evaluate performance in a way that avoids these biases.

Table 4 Summary of clinical studies using artificial intelligence in the lower gastrointestinal field

AI: Artificial intelligence; CNN: Convolutional neural network; NBI: Narrow band image; AUROC: Area under receiver operating characteristic; SVM: Support vector machine; PPV: Positive predictive value; NPV: Negative predictive value.

Overfitting occurs when a learning model tailors itself too closely to the training dataset and its predictions do not generalize well to new datasets[66] (Table 5). Although several methods have been used to reduce overfitting in the development of DL models, they do not guarantee resolution of this problem. In addition, datasets collected with a case-control design are particularly vulnerable to spectrum bias. Spectrum bias occurs when the dataset used for model development does not adequately represent the range of patients to whom the model will be applied in clinical practice (the target population)[67].
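One concrete consequence of spectrum bias can be sketched numerically: a classifier that looks excellent on a balanced case-control dataset can yield a poor PPV at realistic disease prevalence. The sensitivity, specificity, and prevalence values below are illustrative assumptions, not figures from any cited study.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive prediction."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same hypothetical classifier (90% sensitive, 90% specific)
# applied to two populations with different disease prevalence:
balanced = ppv_at_prevalence(0.90, 0.90, 0.50)   # balanced case-control set
realistic = ppv_at_prevalence(0.90, 0.90, 0.01)  # screening-like population
print(f"PPV at 50% prevalence: {balanced:.2f}")
print(f"PPV at  1% prevalence: {realistic:.2f}")
```

At 50% prevalence the PPV is 0.90, but at 1% prevalence it falls below 0.10, illustrating why evaluation on case-control data can overestimate real-world performance.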

Because overfitting and spectrum bias may lead to overestimation of accuracy and generalizability, external validation using datasets unused for model development, collected in a way that minimizes spectrum bias, is mandatory. For more robust clinical verification, well-designed multicenter prospective studies with adequate inclusion/exclusion criteria that represent the target population are needed. Furthermore, DL technology has its own “black box” nature (lack of interpretability or explainability), meaning that the decision mechanism of AI is not clearly demonstrated (Figure 2). Because interpretability is important in that it can provide safety measures, help to detect bias, and establish social acceptance, further investigation to solve this issue should be performed. However, some methods, such as attention maps and saliency regions, have been developed to complement the “black box” characteristics[68].

It is obvious that the efficiency and accuracy of ML increase as the amount of data increases; however, it is challenging to develop an efficient ML model owing to the paucity of human-labeled data, given the privacy issues surrounding private medical records. To overcome this issue, data augmentation strategies (with synthetically modified data) have been proposed[69]. Spiking neural networks, which more closely mimic the real mechanisms of neurons, could potentially replace current ANN models with more powerful computing ability, although no effective supervised learning method for them currently exists[70].
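The augmentation idea above can be sketched minimally: each labeled image is expanded into several synthetically modified variants. The transforms below (flips, rotation, brightness jitter) are generic examples chosen for illustration, not the specific strategy of the cited work, and the random frame stands in for a real capsule endoscopy image.

```python
import numpy as np

def augment(image, rng):
    """Produce simple synthetic variants of one grayscale image:
    flips, a 90-degree rotation, and mild brightness jitter.
    A toy stand-in for the augmentation strategies discussed above."""
    variants = [
        image,
        np.fliplr(image),            # horizontal flip
        np.flipud(image),            # vertical flip
        np.rot90(image),             # 90-degree rotation
    ]
    jitter = image * rng.uniform(0.9, 1.1)   # brightness perturbation
    variants.append(np.clip(jitter, 0.0, 1.0))
    return variants

rng = np.random.default_rng(0)
frame = rng.random((64, 64))        # hypothetical grayscale capsule frame
augmented = augment(frame, rng)
print(len(augmented))               # five training samples from one image
```

In practice, label-preserving transforms must be chosen with clinical care; for example, aggressive color shifts could erase the mucosal cues a lesion detector relies on.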

Table 5 Summary of clinical studies using artificial intelligence in capsule endoscopy

The precision of diagnosis or classification using AI does not always translate into efficacy in real clinical practice. The actual benefit to clinical outcomes, the satisfaction of physicians, and the cost-effectiveness beyond academic performance must be proven by sophisticated investigation. Finally, reasonable regulations from responsible authorities and a reimbursement policy are essential for integrating AI technology into the current healthcare environment. Moreover, AI is not perfect. This is why the term “augmented intelligence” emerged, emphasizing that AI is designed to improve or enhance human intelligence rather than replace it. Although the aim of applying AI in medical practice is to improve workflow with enhanced precision and to reduce the number of unintentional errors, established models with inaccurate or exaggerated performance are likely to cause ethical issues owing to misdiagnosis or misclassification. Moreover, we do not know the impact of AI application on the doctor-patient relationship, which is an essential part of healthcare utilization and the practice of medicine. Therefore, ethical principles relevant to AI model development should be established now, while AI research is beginning to increase.

CONCLUSION

Since AI was introduced in the 1950s, it has been persistently challenged in terms of statistical and image analyses in the field of gastroenterology. Recent advances in big data and computer science have enabled the dramatic development of AI technology, particularly DL, which has shown promising potential. Now, there is no doubt that the implementation of AI in the gastroenterology field will progress across various healthcare services. To utilize AI wisely, physicians should make great efforts to understand its feasibility and to ameliorate its drawbacks through further investigation.

Figure 2 Interpretability-accuracy tradeoff in classification algorithms of machine learning.