Accuracy of an administrative database for pancreatic cancer by international classification of disease 10th codes: A retrospective large-cohort study
2019-10-19
Young-Jae Hwang, Seon Mee Park, Soomin Ahn, Jong-Chan Lee, Young Soo Park, Nayoung Kim
Abstract
BACKGROUND The Korean National Health Insurance (NHI) claims database provides large cohorts. However, studies regarding the accuracy of administrative databases for pancreatic cancer (PC) have not been reported. We aimed to identify the accuracy of the NHI database regarding PC classified by international classification of disease (ICD)-10 codes.
AIM To identify the accuracy and usefulness of an administrative database in PC and the accuracy of ICD codes for PC with location.
METHODS Study and control groups were collected from 2003 to 2016 at Seoul National University Bundang Hospital. Cases of PC were identified in the NHI database by international classification of diseases, 10th revision edition (ICD-10) codes supported with V codes. The V code is issued by medical doctors so that 95% of the medical cost is covered by the Korean government. According to pathologic reports, definite or possible diagnoses were defined using medical records, images, and pathology.
RESULTS A total of 1846 cases with PC and controls were collected. Among PC, only 410 (22.2%) cases were identified with specific cancer sites, including head in 234 (12.7%) cases, tail in 104 (5.6%) cases and body in 72 (3.9%) cases. Among PC, 910 (49.3%) cases were diagnosed by definite criteria. Most of these were adenocarcinoma (98.0%). The rates of definite diagnosis of PC were highest in the head (70.1%), followed by the body (47.2%) and tail (43.3%). False-positive cases were pancreatic cystic neoplasm and metastasis to the pancreas. In terms of the overall diagnosis of PC, sensitivity, specificity, positive predictive value, and negative predictive value were 99.95%, 98.72%, 98.70%, and 99.95%, respectively. Diagnostic accuracy was similar both in terms of diagnostic criteria and tumor locations.
CONCLUSION The Korean NHI claims database collected according to ICD-10 code with V code for PC showed good accuracy.
Key words: Korean national health insurance; Accuracy; Pancreatic cancer; International classification of disease; Sensitivity; Specificity
INTRODUCTION
Pancreatic cancer (PC) has a very poor prognosis because most cases are diagnosed at advanced stages, are inoperable due to invasion of adjacent arteries, or are intractable to chemotherapy[1-3]. Accurate diagnosis of PC remains challenging despite the widespread use of endoscopic ultrasound-guided fine needle aspiration (EUS-FNA) and biopsy. Therefore, pathological diagnosis of PC is not always possible, and most cases are diagnosed based on clinical features including image findings, clinical course, and laboratory data.
Location of the primary PC is important for prognosis[4]. Patients with PC in the head showed a 5% increased survival benefit as compared with those with PC in the body or tail[4]. This may be associated with earlier symptoms of PC in the head caused by obstruction of the bile duct or pancreatic duct. Further research is needed on the epidemiology and risk factors of PC in the body or tail for screening and early diagnosis. If the primary location of PC is well described in a database, research on PC becomes easier.
Recently, administrative databases have been widely used for medical research[5-8]. An administrative database includes personal medical information of a large number of people with long-term follow-up. In addition, an administrative database can provide easy access to information on PC location. For proper interpretation of the results derived from such a database, its reliability is critical. Nevertheless, the accuracy of claims databases in identifying cancer patients remains in doubt because they collect data for the purposes of reimbursement[9]. Furthermore, there have been limited studies regarding the accuracy and usefulness of administrative databases[9-11].
The Korea National Health Insurance System (NHIS) contains a complete set of health information pertaining to 50 million members[12]. The source of the NHIS is the Health Insurance Review and Assessment (HIRA) database, which includes all insurance claims information of approximately 97% of the Korean population. In this database, the name of the disease is usually coded according to the international classification of diseases, 10th revision edition (ICD-10 code) published by the World Health Organization[13,14]. Direct validation of the accuracy between the administrative dataset and NHIS data is impossible because of the Personal Information Protection Act in Korea. Therefore, validation of the accuracy and usefulness of diagnostic codes could only be performed at individual hospitals where the diagnosis of each disease was performed and reported to HIRA for insurance claims. Furthermore, even though the ICD-10 code includes information on the location of PC, it is sometimes difficult to define the location of PC. In this situation, an ICD-10 code without the location of PC is used by medical doctors. Analyzing the current situation might provide useful information for approaching patients with PC.
Against this background, we aimed to evaluate the accuracy and usefulness of an administrative database in PC. To verify the accuracy of diagnosis, we calculated the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of PC by ICD-10 codes compared to controls[15]. In addition, we aimed to identify the location of PC in detail using ICD-10 codes and electronic medical records (EMR) to determine how often doctors enter accurate ICD codes for PC with location.
MATERIALS AND METHODS
Data source
From May 2003 to December 2016, cases of PC were retrospectively collected using the Seoul National University Bundang Hospital (SNUBH) Clinical Data Warehouse (CDW)[16], the hospital's own database analysis program. The EMR system contains information on the hospital departments visited, the principal diagnoses, and the surgical and diagnostic procedures for each patient[17]. In addition, it includes pathologic results of specimens and imaging modalities, including computerized tomography (CT), endoscopic retrograde cholangiopancreatography (ERCP), EUS, magnetic resonance imaging (MRI), and positron emission tomography (PET)[3,18-21].
Study population
Information regarding patients, including hospital visit dates, subject characteristics, diagnostic procedures, pathologic results, and surgeries, was collected. This information was easily obtained from the administrative database. Medical data from other hospitals were identified through the database uploaded to the SNUBH EMR. After approval of the study protocol by the Ethics Committee at SNUBH (IRB number B-1701/378-105), a list of patients with PC according to the ICD-10 code as primary diagnosis was acquired: (1) C25.0-25.3 (malignant neoplasm of pancreas at head, body, tail, and duct, respectively); (2) C25.4 (malignant neoplasm of endocrine pancreas); and (3) C25.7-9 (malignant neoplasm of pancreas at other parts, overlapping lesion, and unspecified, respectively)[22]. Then, the retrieved cases were checked for registration with V codes in the NHIS to confirm the diagnostic codes[23]. The V code is a special code for patients with any ICD-10 cancer code in South Korea, established by the Korean Ministry of Health and Welfare in 2008. Cancer patients who are registered in the NHIS are issued a V code and are reimbursed for 95% of the medical cost by the Korean government for 5 years. Control cases were defined as individuals without ICD-10 codes for PC (C25.0-25.9) during the study period who underwent work-up pathways similar to those of PC, including images (CT, MRI, ERCP, or EUS) and surgery.
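The case-selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the group labels, and the input format are hypothetical and do not come from the SNUBH CDW or the NHIS; only the code-to-site mapping is taken from the text.

```python
# Mapping of ICD-10 PC codes to sites, as listed in the Study population text.
ICD10_PC_SITES = {
    "C25.0": "head",
    "C25.1": "body",
    "C25.2": "tail",
    "C25.3": "pancreatic duct",
    "C25.4": "endocrine pancreas",
    "C25.7": "other parts",
    "C25.8": "overlapping lesion",
    "C25.9": "unspecified",
}


def classify_case(primary_icd10, has_v_code):
    """Assign a record to a group (hypothetical labels, for illustration only)."""
    code = primary_icd10.strip().upper()
    site = ICD10_PC_SITES.get(code)
    if site is None:
        # No PC code: eligible as a control only if the work-up was similar.
        return {"group": "control_candidate", "site": None}
    if not has_v_code:
        # PC code present but not confirmed by V code registration.
        return {"group": "unconfirmed", "site": site}
    return {"group": "study", "site": site}


print(classify_case("C25.0", True))
print(classify_case("c25.9 ", True))
print(classify_case("D13.6", False))
```

Keeping the mapping in one dictionary makes it easy to see which codes carry site information (C25.0 to C25.2) and which do not (C25.9).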
Analyzing accuracy of PC diagnosis from administrative database
Medical records of the study and control groups were analyzed to identify definite and possible diagnostic criteria. Definite diagnoses were made according to pathologic reports compatible with PC[24-26]. Possible diagnoses were made according to image findings, clinical courses, or increased CA 19-9 > 100 U/mL compatible with PC[3,21,27-29]. Typical image findings of PC were defined as focal hypo-attenuated lesions, pancreatic ductal dilation, distal pancreatic parenchymal atrophy, and involvement of the surrounding vascular structures or other organs on radiologic examinations (Table 1 and Figure 1)[30-32].
In the definite diagnosis group, cancer cell types (adenocarcinoma, adenosquamous carcinoma, or neuroendocrine tumor) and methods of pathologic diagnosis through surgery, endoscopic biopsy or FNA were analyzed (Tables 1 and 2)[25,33]. In the possible diagnosis group, we examined reports of images (CT, MRI, ERCP, EUS, endoscopy, and PET) by a radiologist or medical records of a physician's reading of the images. We used serum levels of CA 19-9 to differentiate PC from other cancers[21,29]. To enhance the study reliability, three reviewers carefully examined medical records and compared the final decisions for each case. For discordant cases, they discussed the cases and reached consensus. After reviewing medical records and classifying each case, the sensitivity, specificity, PPV, and NPV with 95% confidence intervals (CI) were calculated. We also compared diagnostic power according to cancer sites at the head (C25.0), body (C25.1) and tail (C25.2). In addition, we analyzed patients with ICD-10 codes of PC with primary location (C25.0, C25.1, and C25.2).
Table 1 Diagnostic criteria of pancreatic cancer
RESULTS
Characteristics of cases diagnosed as PC by the international classification of diseases, 10th revision edition
A total of 1846 subjects were identified as registered with ICD-10 codes for PC at the SNUBH during the study period (Table 3). Among PC, 1428 (77.4%) cases were registered as unspecified PC, and only 410 (22.2%) cases could be identified with specific cancer sites. PC in the head [234 (12.7%)] was the most common, followed by the tail [104 (5.6%)] and body [72 (3.9%)]. PC cases coded as pancreatic duct, neuroendocrine tumor, or overlapping lesion were very rare, at only 0.3%, 0.1%, and 0.1%, respectively. Primary cancer location could not be identified from the code in patients with C25.9 [1428 (77.4%)]. In these cases with C25.9, we carefully examined all medical records one by one to identify the primary cancer location.
Among PC, 910 (49.3%) cases had pancreatic pathologic results associated with the definite diagnostic criteria and were classified as the definite diagnosis group. The other 936 (50.7%) cases were classified as the possible diagnosis group. Pathologic diagnosis was accomplished by surgery in 717 (78.8%) cases, by ERCP or endoscopy in 163 (17.9%) cases, and by EUS-FNA or percutaneous biopsy in 30 (3.5%) cases. Among 1198 cases with serum levels of CA 19-9, 684 (57.1%) cases had elevated levels (> 100 U/mL).
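As a quick arithmetic check, the site proportions reported above can be recomputed from the stated counts. The short Python sketch below uses only the numbers given in this paragraph; the variable and function names are illustrative.

```python
# Counts per ICD-10 group as reported in the paragraph above.
TOTAL_CASES = 1846
site_counts = {
    "head (C25.0)": 234,
    "body (C25.1)": 72,
    "tail (C25.2)": 104,
    "unspecified (C25.9)": 1428,
}


def percent(count, total=TOTAL_CASES):
    """Percentage of all coded PC cases, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Cases whose code carries a specific site (head, body, or tail).
site_specific = sum(n for label, n in site_counts.items() if "C25.9" not in label)

for label, n in site_counts.items():
    print(f"{label}: {n} ({percent(n)}%)")
print(f"head + body + tail: {site_specific} ({percent(site_specific)}%)")
```

The head, body, and tail counts sum to 410, which together with the 1428 unspecified cases leaves 8 cases for the remaining codes.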
Diagnostic accuracy of PC by the international classification of diseases, 10th revision edition in the administrative database
We analyzed the accuracy of ICD-10 codes of PC by definite or possible diagnostic criteria (Table 4). Among 910 cases with pathologic diagnosis, 904 cases satisfied the definite diagnostic criteria of PC. Pathologic diagnoses were adenocarcinoma in 886 (98.0%) cases, adenosquamous carcinoma in 3 (0.3%) cases, and neuroendocrine tumor in 15 (1.7%) cases. Six cases identified as false-positives were pancreatic cystic neoplasms, including serous cystic neoplasms, mucinous cystic neoplasms and intraductal papillary mucinous neoplasms (Table 3). Among 938 cases with possible diagnoses, 924 subjects satisfied the possible diagnostic criteria for PC. Fourteen cases identified as false-positives were pancreatic metastasis from other primary cancers in 6 cases, pancreatic cystic neoplasms in 5 cases, pancreatitis in 2 cases, and accessory spleen in 1 case.
Among 1846 cases of control, only one case of PC was identified (Table 4). This patient underwent distal pancreatectomy because of a pancreatic tail mass and pancreatic ductal dilatation on CT scan. Pathologic diagnosis was invasive carcinoma originating from an intraductal papillary mucinous neoplasm of the pancreas. This case should be coded as PC; however, it was registered as a benign neoplasm of the pancreas (D13.6)[34,35].
Figure 1 Proposed study algorithm for the inclusion and classification of subjects.
The diagnostic accuracy of PC differed according to tumor sites (Table 5). The rate of definite diagnosis in the pancreas head was 70.0%, while those in the pancreas tail and body were 46.2% and 43.1%, respectively. Incorrect diagnoses, including false-positives and false-negatives, were 1.4% for pancreatic body cancer, 1.0% for pancreatic tail cancer, and 0% for pancreatic head cancer.
Accuracy of the international classification of diseases, 10th revision edition of PC in the administrative database
Calculated statistical values are summarized in Table 6. For overall diagnostic criteria of PC, the sensitivity and specificity of ICD-10 codes for PC were 99.95% (95%CI: 99.94-99.95) and 98.72% (95%CI: 98.70-98.73), respectively. The PPV and NPV were 98.70% (95%CI: 98.68-98.72) and 99.95% (95%CI: 99.94-99.95), respectively. For definite diagnostic criteria of PC, the sensitivity and specificity of ICD-10 codes for PC were 99.89% (95%CI: 99.88-99.90) and 99.68% (95%CI: 99.67-99.68), respectively. The PPV and NPV were 99.34% (95%CI: 99.32-99.36) and 99.95% (95%CI: 99.94-99.95), respectively. For possible diagnostic criteria for PC, the sensitivity and specificity were 99.89% (95%CI: 99.88-99.90) and 99.03% (95%CI: 99.02-99.05), respectively. The PPV and NPV were 98.08% (95%CI: 98.05-98.11) and 99.95% (95%CI: 99.94-99.95), respectively.
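The point estimates for the definite-diagnosis group can be recomputed from counts stated earlier in the text: 910 coded cases of which 904 were confirmed, and 1846 controls of which one was a missed PC. The Python sketch below does this. The paper does not state how its confidence intervals were computed, so the Wilson score interval used here is an assumption, and its output need not match the reported intervals. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a proportion (assumed method, not the paper's)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def diagnostic_metrics(tp, fp, fn, tn):
    """Return (numerator, denominator) pairs for the four accuracy measures."""
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }


# Definite-diagnosis group, counts taken from the text:
# 904 true positives, 6 false positives, 1 false negative, 1845 true negatives.
metrics = diagnostic_metrics(tp=904, fp=6, fn=1, tn=1845)
for name, (num, den) in metrics.items():
    low, high = wilson_interval(num, den)
    print(f"{name}: {100 * num / den:.2f}% "
          f"(Wilson 95% CI {100 * low:.2f}-{100 * high:.2f})")
```

With these counts the point estimates come out as 99.89%, 99.68%, 99.34%, and 99.95%, matching the values reported for the definite criteria.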
DISCUSSION
This study demonstrated that ICD-10 codes for PC in the administrative database are valid for use in population-based large-cohort studies. Although half of the cases were diagnosed by clinical and radiological features, they showed high diagnostic accuracy. Our results suggest the reliability of previous large-cohort studies using the administrative database in South Korea.
Large administrative databases from various disease registries have been used for population-based studies. However, the quality of a database is often supported only by citing previous studies[36,37] or by demonstrating trends similar to national estimates[38], rather than by validating the database itself. Jon et al[39] studied cancer trends in the liver, gallbladder, bile duct, and pancreas in an elderly population in Denmark. They identified cases by ICD-10 codes using the NORDCAN database, widely used in a previous study[40], without validation.
Previous studies on the accuracy of ICD-9 codes revealed that interpretation of administrative databases relying only on ICD-9 codes requires caution. Arous et al[8] identified a total of 1107 PC patients by ICD-9 codes from institutional health care information system (HIS)-linked data sets and surgical databases. They reviewed all patients manually to validate the diagnoses. Analysis of pancreatic pathology revealed that 80.3% of patients had true pancreatic neoplasms and 19.7% had other pancreatic pathologies. When they used only the HIS-linked dataset, only 36.3% of patients were consistent with pancreatic neoplasms. Friedlin et al[9] compared the diagnostic accuracy of ICD-9 codes and natural language processing (NLP) technology to identify PC in a cohort of pancreatic cysts. They reported that ICD-9 codes achieved lower specificity than did the NLP method (46% and 94%, respectively) in spite of the high sensitivity for identifying PC by both ICD-9 codes and NLP (95% and 84%, respectively).
Table 2 Cancer cell type of pancreatic cancer
Our study identified a study group of PC by ICD-10 codes with the addition of the V code, using two disease registries, the SNUBH database and the NHIS. Previous population-based large-cohort studies identified cancer populations by both V code and ICD-10 codes[41,42]. They reported the usefulness of the NHIS database collected by V code in South Korea[41,42]. Seo et al[42] compared the cancer incidence rates found in the NHIS against those in the National Cancer Registry of Korea. The results showed similar overall cancer incidences as well as age-, sex-, and disease-specific rates in both databases.
We set out to determine the accuracy of the ICD-10 code for PC registered in the NHIS because PC is a disease entity that is difficult to diagnose. We used two disease registries, the SNUBH database and the NHIS, to identify PC cases and controls. We analyzed the diagnostic accuracy according to definite diagnostic criteria in the presence of pathologic reports. Although the rate of pathologic diagnosis was only 49.3%, these cases achieved a high sensitivity of 99.89%, specificity of 99.68%, PPV of 99.34%, and NPV of 99.95%. These results support the findings of previous studies that used the administrative database. The rates of definite diagnosis and identification of specific cancer sites were higher for pancreatic head cancer (n = 163) than for pancreatic body (n = 33) or tail (n = 45) cancers. These results suggest that pancreatic head cancer is detected earlier and that specimens are obtained more easily than for other sites[28]. In addition, we suggest that it is rather difficult to diagnose pancreatic body or tail cancer based on pathologic findings.
Half of the cases registered as PC by ICD-10 codes were validated by possible diagnostic criteria, because obtaining pancreatic specimens by non-surgical methods is difficult and most patients are not candidates for surgery. Only 15-20 percent of patients could be candidates for surgery[4]. In our study, 717 (38.8%) patients underwent pancreatectomy. In patients who were not candidates for surgery or procedures because of advanced stages, PC was diagnosed only by clinical, radiologic or serologic features. For the diagnostic accuracy of PC, we did not depend solely on the level of CA 19-9. Instead, we used the tumor markers CA 19-9 and αFP to differentiate PC from other cancers such as hepatocellular carcinoma when image findings and clinical symptoms were insufficient to diagnose PC. Cases registered as PC by ICD-10 codes without pathologic confirmation achieved a high sensitivity of 99.89%, specificity of 99.03%, PPV of 98.08%, and NPV of 99.95%.
We analyzed false-positive and false-negative cases. Cases with pathologic results that were incorrectly coded as PC were pancreatic cystic neoplasms. Malignant transformation can occur in premalignant pancreatic cystic neoplasms, and the differential diagnosis between them is very difficult[36]. Among cases with possible diagnoses, the wrong diagnosis was caused by pancreatic metastases, pancreatic cystic neoplasm, pancreatitis, or ectopic adjacent organs. PC was difficult to differentiate from invasion or metastasis from adjacent organs or from benign cystic lesions.
We found that diagnosis according to cancer sites was not accurate in spite of the high overall diagnostic accuracy for PC. Unspecified PC (C25.9) comprised 77.4% of all PC, and most of the false-positive cases (23 out of 24) were recorded as C25.9.
Table 3 Characteristics of patients with pancreatic cancer according to the international classification of diseases, 10th revision edition
Therefore, PC identified by ICD-10 code with the addition of a V code in the NHIS data was not sufficient to study cancer sites. For an accurate study of primary PC location, patients with C25.9 must either be excluded or examined one by one. If patients with C25.9 are excluded, the advantages of a large administrative database disappear. If the primary cancer location of patients with C25.9 has to be checked individually, the advantage of easy access to medical information is eliminated. Both methods reduce the usefulness of the administrative database. We should therefore try to enter ICD-10 codes that include the primary location of PC. Another weak point of PC coded by ICD-10 in the NHIS data was that it was not adequate for evaluation of neuroendocrine tumors. All neuroendocrine tumors were coded as C25.0 or C25.9, whereas they should be coded as C25.4. Furthermore, two adenocarcinoma cases were coded as C24.4 and should have been coded as C25.4. To study the accuracy of diagnostic codes in an administrative database, an institution must meet two conditions: A high burden of cancer patients and a well-established CDW system. SNUBH might be an adequate hospital to perform this study because of its comprehensive EMR system[8]. SNUBH developed an in-house comprehensive EMR in 2003. The warehouse system provides easy access to diagnostic information for research[8,16]. In addition, SNUBH is a tertiary referral hospital to which regional hospitals refer patients; therefore, sufficient numbers of PC cases could be enrolled in this study to enhance the power of the study results. To satisfy statistical requirements (α = 0.05, 1-β = 0.95, and effect size 0.1), more than one thousand cases are needed. The size of our study group was sufficient to fulfill the statistical criteria. We provided a new study model for evaluating the accuracy and usefulness of large administrative databases.
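The sample-size statement above (α = 0.05, 1-β = 0.95, effect size 0.1) can be illustrated with a standard normal approximation. The paper does not name the test or software behind its figure, so the two-sided formula n ≈ ((z_{1-α/2} + z_{1-β}) / effect size)² used below is an assumption, and the function name is hypothetical. Dedicated power-analysis software may give a slightly different number.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(alpha=0.05, power=0.95, effect_size=0.1):
    """Normal-approximation sample size for a two-sided test (assumed design)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)


n = required_sample_size()
print(f"required cases under these assumptions: {n}")
```

Under these assumptions the requirement is about 1300 cases, which is consistent with the statement that more than one thousand cases are needed and is below the 1846 cases in the study group.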
Many studies using large administrative databases of PC have been done, and our study could support the reliability of these studies[10,37-40,43]. To enhance the reliability of studies with large administrative databases, our study could be cited as a reference.
Our study has several limitations. One-half of the cases were diagnosed by possible diagnostic criteria without pathologic confirmation. Pathologic diagnosis of PC is sometimes impossible because of poor patient condition and technical difficulty. Therefore, if we adopted only the definite diagnostic criteria of PC for accuracy of diagnosis, selection bias could occur. Another limitation was that the study was done only in a single hospital, SNUBH. The diagnostic accuracy might be higher in a tertiary referral hospital than in a multicenter study. Because most PC cases are treated in referral hospitals in South Korea, we believe that our data may represent the entire PC data of the NHIS in South Korea. In spite of this limitation, our study demonstrated the excellent diagnostic accuracy of the PC data of the NHIS.
In conclusion, ICD-10 codes of PC in an administrative database are acceptable for use in population-based large-cohort studies. To assess the reliability of the administrative database, we examined subjects divided into two groups, definite and possible diagnosis. In addition, we analyzed both disease registries, SNUBH and NHIS. This study also used a control group for calculating sensitivity, specificity, PPV and NPV.
Table 4 Diagnostic accuracy of pancreatic cancer diagnosed by the international classification of diseases, 10th revision edition in the administrative database
To assess the usefulness of the database, we examined cancer location. If researchers could obtain information on the PC site through the ICD-10 code alone, they could perform such studies more easily.
To enhance the diagnostic accuracy, we recommend patient identification by the ICD-10 code with tumor location information together with the V-code system. With this approach, the large administrative database can be used without excluding cases. More research involving multiple institutions and various diseases is needed to support research with administrative databases.
Table 5 Diagnostic accuracy of pancreatic cancer according to tumor sites by the international classification of diseases, 10th revision edition
Table 6 Diagnostic power of international classification of diseases, 10th revision edition for pancreatic cancer
ARTICLE HIGHLIGHTS
Research conclusions
We showed the accuracy of the administrative database for PC at Seoul National University Bundang Hospital. In addition, we identified the location of PC to assess the usefulness of the database. Administrative databases are useful and important for research. However, validation of the database is necessary. Based on this result, studies based on the administrative database might be reliable. Future studies using the administrative database for PC could gain credibility from this result. In addition, this study presented a research method for validating an administrative database.
Research perspectives
We think that future studies involving multiple institutions should be planned. In addition, it is important to gather data in a unified way. We think there is a need for research on the accuracy of administrative databases for other diseases. Such research is necessary for studies based on administrative databases.