
Applying latent tree analysis to classify Traditional Chinese Medicine syndromes (Zheng) in patients with psoriasis vulgaris

2022-11-17

XU Wenjie, ZHANG Nevin L., LI Ping, WANG Tianfang, CHEN Weiwen, LIU April H., MOHLER-KUO Meichun

XU Wenjie, CHEN Weiwen, Beijing Hospital of Traditional Chinese Medicine, affiliated to Capital Medical University, Beijing 100010, China

XU Wenjie, Institute of Epidemiology, Biostatistics and Prevention, University of Zurich, Zurich 8001, Switzerland

ZHANG Nevin L., LIU April H., Department of Computer Science and Engineering, the Hong Kong University of Science and Technology, Kowloon, Hong Kong, China

LI Ping, Beijing Research Institute of Chinese Medicine, Beijing 100010, China

WANG Tianfang, Department of TCM Diagnostics, Preclinical School, Beijing University of Traditional Chinese Medicine, Beijing 100029, China

LIU April H., Department of Computer Science, Shanghai University of Finance and Economics, Shanghai 200433, China

MOHLER-KUO Meichun, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Zurich 8032, Switzerland; La Source, School of Nursing Sciences, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1004 Lausanne, Switzerland

Abstract OBJECTIVE: To treat patients with psoriasis vulgaris using Traditional Chinese Medicine (TCM), one must stratify patients into subtypes (known as TCM syndromes or Zheng) and apply appropriate TCM treatments to different subtypes. However, no unified symptom-based classification scheme of subtypes (Zheng) exists for psoriasis vulgaris. The present paper aims to classify patients with psoriasis vulgaris into different subtypes via the analysis of clinical TCM symptom and sign data. METHODS: A cross-sectional survey was carried out in Beijing from 2005 to 2008, collecting clinical TCM symptom and sign data from 2764 patients with psoriasis vulgaris. A total of 108 symptoms and signs were initially analyzed using latent tree analysis, with a selection of the resulting latent variables then used as features to cluster patients into subtypes. RESULTS: The initial latent tree analysis yielded a model with 43 latent variables. The second phase of the analysis divided patients into three subtype groups with clear TCM Zheng connotations: ‘blood deficiency and wind dryness’; ‘blood heat’; and ‘blood stasis’. CONCLUSIONS: Via two-phase analysis of clinical symptom and sign data, three different Zheng subtypes were identified for psoriasis vulgaris. Statistical characteristics of the three subtypes are presented. This constitutes an evidence-based solution to the syndrome differentiation problem that exists with psoriasis vulgaris.

Keywords: psoriasis vulgaris; syndrome differentiation; cluster analysis; latent tree analysis

1.INTRODUCTION

Psoriasis vulgaris is the most common form of psoriasis, an immune-mediated inflammatory skin disease characterized by well-demarcated, scaly, red plaques. In a systematic review of the global epidemiology of psoriasis [1], the prevalence of psoriasis was found to range from 0% (Taiwan, China) to 2.1% (Italy) among children (age < 18 years old), and from 0.91% (United States) to 8.5% (Norway) in adults. Psoriasis vulgaris accounts for 90% of all psoriasis cases [2].

Promising treatments for psoriasis have recently been developed in Western Medicine [3]. However, there is still no drug that can completely cure this condition, and many of the drugs used have potentially serious side effects. Traditional Chinese Medicine (TCM) has been used to treat psoriasis since ancient times [4]. With TCM treatment, patients with psoriasis vulgaris are first classified into different Zheng subtypes based on the symptoms and signs associated with their skin lesions and general body symptoms, as well as on tongue characteristics and pulse, with different prescriptions used for different subtypes [5].

The patient classification step in TCM is known as syndrome differentiation. This step is critical to effective treatment. Several syndrome differentiation standards are currently available for psoriasis. One example is the national diagnosis standard [6], which was set up in 1994. The latest standard is the Guidelines for Diagnosis and Treatment of Common Diseases of Dermatology in Traditional Chinese Medicine [7], which was published and put into use in 2012. However, all these standards are based upon the experiences of experts and/or a review of the literature, with none based on data derived from large-sample investigations. As such, no widely accepted standards currently exist. In both clinical practice and research, different individuals tend to use different classification schemes [4].

An effort to place syndrome differentiation for psoriasis vulgaris on an evidence-based foundation was recently made by Yang et al [4], who analyzed symptom and sign data from 507 patients using latent class analysis (LCA) [8], and divided psoriasis into three Zheng subtypes: dampness-heat, blood heat, and Yin deficiency and blood dryness. In the study presented here, we attempted to conduct an analysis similar to that of Yang et al [4], albeit with a few notable differences.

First, Yang et al [4] excluded symptoms and signs related to the skin lesions themselves from their study. However, psoriasis vulgaris is a skin disease. In fact, dermatologists generally consider the skin lesions to be the most important clinical characteristic, even within the context of TCM. In published standards [6, 7], skin lesion symptoms and signs play a primary role, while other symptoms and signs play a secondary role. In the current work, we focused primarily on the signs and symptoms of skin lesions, and combined them with other symptoms and signs to consider all these clinical indicators of disease as a whole.

Second, when classifying patients into clusters, Yang et al [4] used symptom and sign variables as features for LCA. The implicit assumption was that, within each patient cluster, the symptom and sign variables are mutually independent. Known as local independence, this assumption is often violated when the number of variables is large, which is a widely recognized weakness of LCA [9, 10]. In our work, we avoided making this same restrictive assumption by following the method proposed by Zhang et al [11]. Accordingly, we first analyzed symptom and sign data using Latent Tree Models (LTMs) to obtain a list of latent variables that model correlations among subsets of variables, and then used the latent variables as features for the LCA. This resulted in patient clusters that fit the data better than LCA alone.
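The local independence assumption can be written compactly. The notation here is introduced only for illustration and does not come from the cited works: $Z$ is the latent class variable with states $c$, and $X_1,\dots,X_n$ are the binary symptom and sign variables.

```latex
P(X_1,\dots,X_n) \;=\; \sum_{c} P(Z=c)\,\prod_{i=1}^{n} P(X_i \mid Z=c)
```

Under this factorization, every correlation among the $X_i$ has to be explained by the single variable $Z$. When groups of symptoms remain correlated within a cluster, a plain LCA model tends to compensate by adding clusters, which is the behavior the two-phase approach is meant to avoid.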

2.METHODS

2.1.Participants

The present study used data from a cross-sectional survey carried out in Beijing. The participants were recruited between November 2005 and November 2008 from three Grade Ⅲ-A (top grade) TCM hospitals in Beijing: Beijing TCM Hospital, Guang’anmen Hospital, and Dongzhimen Hospital. Patients with psoriasis vulgaris, both inpatients and outpatients, were recruited if they had been diagnosed by one of the treating physicians at one of those hospitals using the diagnosis standard described in Guidelines for Clinical Research on New Chinese Medicine Drugs [12]. Patients who would not answer the questions were excluded from the study. Over the three-year period, a total of 2764 patients aged between 5 and 91 years participated in the survey.

2.2.Procedure

Ethics approval was obtained from the Ethics Commission at Beijing Traditional Chinese Medicine Hospital, which is affiliated with Capital Medical University. The survey consisted of face-to-face interviews conducted by a dermatologist. The interview was voluntary; patients gave oral consent and agreed to participate by answering the questions. Children answered the questions together with their parents. Besides sociodemographic information, the questionnaire included questions about symptoms and signs that had been selected in consultation with Chinese Terms in Traditional Chinese Medicine and Pharmacy [13], as well as certain textbooks on TCM diagnosis [14]. The questionnaire, which included 108 symptoms and signs, was finally approved by 15 experts in dermatology after rounds of discussion. These 108 symptoms and signs included skin lesion symptoms and signs, general body symptoms, and information regarding the patient’s tongue and pulse. TCM skin lesion signs are more detailed than their counterparts in Western medicine. For example, plaque color (pale red, bright red, or dark red) and shape (ring, drop, coin, map) are considered in TCM. Each symptom and sign was coded as either present or absent according to the information provided by the patient. A complete list of the 108 symptoms and signs used can be found both in Figure 1 and in Additional file 1. On average, the interview took 20 min per patient. After the clinical information had been collected, a data set was established in EPIINFO 6.0, with dual data entry by different persons to ensure data quality.

2.3.Statistical analysis

Latent tree analysis

Data were analyzed in two phases using the Lantern software, which was developed by Prof. Zhang NL. In the first phase, data were analyzed using latent tree models. The objective was to divide all symptom and sign variables into a number of groups, such that the correlations between variables in each group could be modeled properly using a single latent variable. This was done using the Extension Adjustment Simplification until Termination (EAST) algorithm [15]. In particular, we used the Java implementation of EAST that is available in the web version of Lantern [16].

The structure of the resulting latent tree model is depicted in Figure 1. The nodes labeled with English phrases represent symptom and sign variables. Each such node has two possible values, indicating the presence or absence of the listed symptom or sign. The symptom and sign variables come from the data set. The nodes labeled with the capital letter ‘Y’ followed by an integer are latent variables, which are not drawn from the data set but rather derived during data analysis to explain data patterns.

The widths of the lines represent the strength of correlations between variables, in terms of mutual information. For example, Y4 is strongly correlated with Y5, but weakly correlated with Y3. Similarly, Y7 is strongly correlated with Coin_like_lesion, but weakly correlated with Shell_like_lesion.
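As a minimal sketch of the quantity behind the line widths, the following computes the mutual information of two discrete variables from their joint probability table. The function name and the example tables are illustrative assumptions; they are not taken from Lantern or from the study data.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of two discrete variables.

    joint[i][j] is the joint probability P(A=i, B=j); entries must sum to 1.
    """
    row = [sum(r) for r in joint]        # marginal P(A=i)
    col = [sum(c) for c in zip(*joint)]  # marginal P(B=j)
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:  # zero cells contribute nothing
                mi += p * math.log(p / (row[i] * col[j]))
    return mi

# Made-up joint tables for two binary variables (illustration only).
independent = [[0.25, 0.25], [0.25, 0.25]]  # no association
dependent = [[0.45, 0.05], [0.05, 0.45]]    # strong association

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

A value of zero corresponds to independence, and larger values correspond to thicker lines in a diagram such as Figure 1.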

In the model, the symptom and sign variables are divided into groups, and there is a latent variable for each group of symptom and sign variables, which models the correlations between variables within that group. For example, the three signs Pale_red_plaque, Bright_red_plaque and Dark_red_plaque are grouped together. This is because they are mutually exclusive. The latent variable Y4 captures this mutual exclusion. It has three possible states. Hence, it partitions patients into three clusters. As shown in Table 1, the first cluster Y4=s0 incorporates 31% of the patients, Y4=s1 accounts for 33% of the patients, and the third cluster Y4=s2 captures the remaining 36%. The sign Bright_red_plaque occurs with a probability of 1.00 in cluster Y4=s0, but never in the other two clusters; Dark_red_plaque occurs with a probability of 1.00 in cluster Y4=s1, but never in the other two clusters; and Pale_red_plaque occurs with a probability of 0.97 in cluster Y4=s2, but never in the other two clusters. As such, Y4 properly models the mutual exclusion of these three signs.
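To make the reading of such a partition concrete, the sketch below takes the cluster sizes and occurrence probabilities quoted in the paragraph above and derives the overall frequency of each sign and the cluster membership implied by observing it. The variable and function names are illustrative, and "never" is encoded as a probability of 0.

```python
# Cluster sizes P(Y4 = s) and occurrence probabilities P(sign | Y4 = s),
# as quoted in the text for Table 1 ("never" is encoded as 0.0).
prior = {"s0": 0.31, "s1": 0.33, "s2": 0.36}
cond = {
    "Bright_red_plaque": {"s0": 1.00, "s1": 0.0, "s2": 0.0},
    "Dark_red_plaque": {"s0": 0.0, "s1": 1.00, "s2": 0.0},
    "Pale_red_plaque": {"s0": 0.0, "s1": 0.0, "s2": 0.97},
}

def marginal(sign):
    """Overall probability of the sign: sum over states of P(s) * P(sign | s)."""
    return sum(prior[s] * cond[sign][s] for s in prior)

def posterior(sign):
    """P(Y4 = s | sign present), by Bayes' rule."""
    m = marginal(sign)
    return {s: prior[s] * cond[sign][s] / m for s in prior}

for sign in cond:
    print(sign, round(marginal(sign), 4), posterior(sign))
```

Because each sign occurs in only one state of Y4, observing any of the three signs identifies the cluster with certainty, which is what mutual exclusion means in this model.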

Table 1 Partition given by Y4

As another example, consider the symptom and sign variables grouped under Y26. The latent variable Y26 has two possible states. As shown in Table 2, Y26 partitions patients into two clusters: the first, Y26=s0, accounts for 84% of the patients, while Y26=s1 captures the remaining 16%. The symptoms and signs Reddened_lips, Red_complexion, and Sonorous_voice rarely occur in cluster Y26=s0, while they occur with some probability in cluster Y26=s1. As such, Y26 captures the probabilistic co-occurrence of the three symptoms and signs.

Table 2 Partition given by Y26

In general, a latent variable either models (a) probabilistic co-occurrence, or (b) mutually exclusive relationships between symptom and sign variables. The reader is referred to reference [17] for a detailed analysis of all the latent variables.

2.4.Joint clustering

In the second phase of analysis, some of the latent variables were selected and LCA was performed using them as features to partition patients into several subtypes. We chose the following ten latent variables (Y4, Y5, Y7, Y8, Y9, Y10, Y11, Y24, Y25 and Y30) for two reasons. First, since psoriasis vulgaris is a skin disease and skin symptoms and signs are the first indicators of disease that dermatologists consider, we selected these ten variables because they are directly connected to skin lesion signs and symptoms. Second, the results of the first phase indicated that those latent variables were strongly correlated among themselves, but weakly correlated with other latent variables. In fact, the ten latent variables form two blocks in the model structure presented in Figure 1. They are connected to other latent variables through three links: Y4-Y3, Y11-Y16, and Y25-Y22. All three links are weak.

To perform LCA using the ten selected latent variables as features, we generated the model shown in Figure 2. The latent variables and the symptom and sign variables directly connected to them are copied from the global model. There are a total of 33 symptom and sign variables. A new latent variable Z1 is introduced to represent the patient subtypes to be identified. In the analysis, we determined the number of Z1 states and the model parameters. This was performed using the web version of the Lantern software [16]. While each of the ten Y-latent variables represents a partitioning of patients primarily based on some of the 33 symptom and sign variables, Z1 represents partitioning based on all ten latent variables and, hence, all 33 symptom and sign variables. As such, the second phase of the analysis is called joint clustering [11].

Figure 1 Structure of the model obtained by latent tree analysis on a psoriasis vulgaris data set

Figure 2 Structure of the Z1 model obtained by joint cluster analysis using latent variables Y4, Y5, Y7, Y8, Y9, Y10, Y11, Y24, Y25 and Y30

3.RESULTS

In total, 2764 patients participated in the survey. Among them, 1561 (56.5%) were male and 1203 (43.5%) female. Most (96.5%) were Han Chinese. The mean age was 37 ± 15 years and the mean duration of disease was 10 ± 10 years. Since psoriasis vulgaris is a chronic disease with a typical course of repeated disease flares between periods of either complete remission or disease stability, almost three quarters of the participants (72.3%) had recurrent, as opposed to newly diagnosed or first-episode, disease. The patients could also be classified into three phases of psoriasis vulgaris according to Western Medicine diagnostic standards. More than half were considered to have stable disease (53.7%), about one third (34.2%) progressive disease, and 8.8% to be in remission. More details on socio-demographic characteristics can be found in Table 3.

The results of two-phase latent tree analysis are shown in Table 4, in which the patients are divided into three clusters. The first cluster (Z1=s0) encompasses 26% of patients, the second cluster (Z1=s1) 32%, and the third cluster (Z1=s2) 42%.

Symptom occurrence probabilities within each of the clusters are also given in Table 4. Instead of listing all 33 symptom and sign variables, we have only included those that are strongly correlated with Z1. The variables not shown in the table are weakly correlated with Z1. Selection was determined using the concept of cumulative information coverage [15], with the inclusion cut-off set at 99%. Each symptom or sign has a specific mutual information value with the corresponding latent variable and contributes to explaining the latent variable to some degree; cumulative information coverage describes what percentage of the information about the latent variable is accounted for by the selected symptoms and signs.
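The selection rule can be sketched as follows. This is a simplified illustration rather than the Lantern implementation: it assumes that the observed variables are binary and conditionally independent given the latent variable, ranks them by pairwise mutual information, and adds them one at a time until the joint mutual information of the selected set reaches the cut-off fraction of the total. All names and the example probabilities are hypothetical.

```python
import itertools
import math

def joint_mi(prior, cond, subset):
    """I(Y; X_subset) in nats.

    prior[y] = P(Y=y); cond[i][y] = P(X_i=1 | Y=y).
    Assumes the X_i are conditionally independent given Y.
    Enumerates all 2^k configurations, so only feasible for small subsets.
    """
    mi = 0.0
    for x in itertools.product([0, 1], repeat=len(subset)):
        px_given_y = []
        for y in range(len(prior)):
            p = 1.0
            for xi, i in zip(x, subset):
                p *= cond[i][y] if xi else 1.0 - cond[i][y]
            px_given_y.append(p)
        px = sum(py * p for py, p in zip(prior, px_given_y))
        for py, p in zip(prior, px_given_y):
            if py * p > 0:
                mi += py * p * math.log(p / px)
    return mi

def select_by_coverage(prior, cond, cutoff=0.99):
    """Indices of variables needed to reach the given information coverage."""
    order = sorted(range(len(cond)),
                   key=lambda i: joint_mi(prior, cond, [i]), reverse=True)
    total = joint_mi(prior, cond, order)
    selected = []
    for i in order:
        selected.append(i)
        if total == 0 or joint_mi(prior, cond, selected) / total >= cutoff:
            break
    return selected

# Hypothetical two-state latent variable and three binary signs.
prior = [0.5, 0.5]
cond = [[0.9, 0.1],  # strongly informative
        [0.5, 0.5],  # uninformative
        [0.8, 0.3]]  # moderately informative
selected = select_by_coverage(prior, cond)
print(selected)
```

In this toy example the uninformative variable is left out, which mirrors how weakly correlated symptoms and signs are omitted from a table under a 99% cut-off.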

Table 3 Socio-demographic characteristics of 2764 psoriasis vulgaris patients

Table 4 Partitioning of 2764 Psoriasis Vulgaris patients generated by joint clustering

The three patient classifications clearly correspond to TCM Zheng concepts. In the cluster Z1=s2, the signs/symptoms dark red plaques, severe infiltration, thick scales, and flaky scales occur with much higher probabilities than within either of the other two clusters. In TCM theory, the co-occurrence of those four features is characteristic of the Zheng type blood stasis. In cluster Z1=s1, drop-like lesions and abundant fresh lesions occurred with much higher probabilities than in the other two clusters; these two features characterize the Zheng type blood heat. Finally, in cluster Z1=s0, coin-like lesions and pale red plaques, which are characteristic of blood deficiency and wind dryness, occurred with a high probability.

These results lead to the following conclusion: there are three Zheng subtypes among patients with psoriasis vulgaris: blood deficiency and wind dryness, blood heat, and blood stasis.

(a) The ‘blood stasis’ Zheng type (Z1=s2) is the most common classification, with roughly 42% of patients assigned to this subtype. These patients are more likely than others to report or demonstrate dark red plaques, severe infiltration, mild infiltration, thick scales, and flaky scales. They also have a high probability of coin-like lesions.

(b) ‘Blood heat’ (Z1=s1) is the second most common Zheng type, accounting for 32% of patients. These patients are more likely than others to have drop-like lesions and abundant fresh lesions. They also might have fine scales, pale red plaques, flaky scales, mild infiltration, and bright red plaques with either a high or moderately high probability.

(c) Finally, 26% of the patients were allocated to the ‘blood deficiency and wind dryness’ Zheng type (Z1=s0). These patients are more likely than others to have coin-like lesions and pale red plaques. They also might have fine scales, flaky scales, mild infiltration, and dark red plaques with either a high or moderately high probability.

The three-cluster partition described in Table 4 was obtained based on skin lesion signs and symptoms, although all other clinical features were taken into consideration in the latent tree analysis. It is therefore natural to investigate how the partition correlates with the other symptom and sign variables. To do so, we first calculated the probability of each patient belonging to each of the clusters and assigned that patient to the cluster of maximum probability. This is called hard assignment. After that, we tallied the frequencies of the symptoms and signs in each of the clusters.
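The two steps described above (hard assignment, then tallying) can be sketched as follows. The function names, the posterior probabilities and the patient records are made up for illustration; in the study, the posteriors would come from the fitted Z1 model.

```python
def hard_assign(posteriors):
    """Assign each patient to the cluster with the highest posterior probability."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in posteriors]

def symptom_frequencies(assignments, records, n_clusters):
    """Fraction of patients in each cluster who show each symptom or sign.

    records[i] maps a symptom name to 1 (present) or 0 (absent) for patient i.
    """
    counts = [dict() for _ in range(n_clusters)]
    sizes = [0] * n_clusters
    for cluster, record in zip(assignments, records):
        sizes[cluster] += 1
        for name, present in record.items():
            counts[cluster][name] = counts[cluster].get(name, 0) + present
    return [
        {name: c / sizes[k] for name, c in counts[k].items()} if sizes[k] else {}
        for k in range(n_clusters)
    ]

# Hypothetical posteriors P(Z1 = s0, s1, s2 | patient) for four patients.
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5], [0.8, 0.1, 0.1]]
records = [
    {"red_tongue": 0, "pale_lips": 1},
    {"red_tongue": 1, "pale_lips": 0},
    {"red_tongue": 0, "pale_lips": 0},
    {"red_tongue": 0, "pale_lips": 0},
]
assignments = hard_assign(posteriors)
freqs = symptom_frequencies(assignments, records, n_clusters=3)
print(assignments, freqs)
```

Hard assignment discards the uncertainty in the posteriors, which is a reasonable simplification when most patients belong to one cluster with high probability.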

These results are shown in Table 5. Excluded from that table are those signs and symptoms whose frequencies do not vary significantly across the clusters according to chi-square analysis, with the significance level set at 0.05. We see that, in the blood deficiency and wind dryness cluster, the following six features occurred with higher probabilities than in other clusters: pink tongue, little tongue coating, deep pulse, pale complexion, hectic cheek and pale lips. In the blood heat cluster, the following five features occurred with higher probabilities than in other clusters: red tongue, slippery pulse, rapid pulse, pharyngeal swelling, and reddened lips. And in the blood stasis cluster, the following three features occurred with higher probabilities than in other clusters: dark purple tongue, deep red tongue, and stringy pulse. These findings are all consistent with relevant TCM theories, indicating that these characteristics are useful for differentiating between clusters or the corresponding Zheng subtypes.
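A chi-square screen of this kind can be sketched as below for a single symptom, using a 3 × 2 table of cluster by presence/absence. The counts are invented for illustration, and the statistic is compared with the standard critical value for 2 degrees of freedom at the 0.05 level; the original analysis may have used a different software routine.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Critical value of the chi-square distribution, 2 degrees of freedom, alpha = 0.05.
CRITICAL_DF2_ALPHA_05 = 5.991

# Rows: three clusters; columns: symptom present, symptom absent (made-up counts).
table = [[30, 70], [60, 40], [45, 55]]
stat = chi_square_statistic(table)
varies = stat > CRITICAL_DF2_ALPHA_05
print(round(stat, 3), varies)
```

A symptom whose statistic falls below the critical value would be treated as not varying significantly across clusters and would be left out of a table such as Table 5.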

Table 5 Occurrence probabilities for tongue,pulse and general body symptoms within the three clusters

4.DISCUSSION

The present study, using a large sample of patients with psoriasis vulgaris residing in a metropolitan city, showed that through two-phase latent tree analysis of clinical sign and symptom data, three main syndromes could be identified for psoriasis vulgaris: ‘blood heat’, ‘blood deficiency and wind dryness’, and ‘blood stasis’.

The results of our analysis match the national standard for psoriasis Zheng classification [6] well. To show this, we summarize the information for each of the three patient clusters identified in list form. For each cluster, we list the characteristics that occur with the highest probabilities in that cluster. The clinical features with the second highest level of probability are also included, if these probabilities are not too distant from those that are highest.

(a) Blood heat (32%): drop-like lesions, fine scales, abundant fresh lesions, bright red plaques, ring-like lesions, Auspitz’s sign, isomorphism, pharyngeal swelling, reddened lips, red tongue, slippery pulse, rapid pulse.

(b) Blood deficiency and wind dryness (26%): coin-like lesions, fine scales, pale red plaques, pale complexion, pale lips, hectic cheek, pink tongue, little tongue coating, deep pulse.

(c) Blood stasis (42%): coin-like lesions, mild/severe infiltration, thick scales, flaky scales, dark red plaques, bright red plaques, dark purple tongue, deep red tongue, stringy pulse.

The national standard (Sections 33.2.1, 33.2.2, and 33.2.3 of reference [6]) also divides psoriasis vulgaris into three Zheng subtypes:

(a) Wind heat and blood dryness: fresh red lesions, increased rashes and red plaques, Auspitz’s sign, petechial bleeding, isomorphism, vexation, thirst, dry stool, yellow urine, red tongue, yellow or greasy tongue coating, and a stringy, slippery or rapid pulse.

(b) Blood deficiency and wind dryness: pale lesions, partially fading lesions, relatively abundant scales, excessive thirst, dry stools, pale pink tongue, thin and white tongue coating, slow and fine pulse.

(c) Blood stasis on skin: thick lesions with infiltration, dark in color and long lasting, dark purple tongue, ecchymosis, astringent or slow pulse.

There is a clear match between our results and the national standard. The value of our work lies in providing evidence-based justification for the classification of psoriasis vulgaris into three Zheng subtypes. In addition, we provide quantitative information about the prevalence and symptom characteristics of these subtypes.

As mentioned above, Yang et al [4] analyzed the symptom and sign data of 507 psoriatic patients using LCA and concluded that psoriasis should be divided into three Zheng subtypes: dampness-heat (35.1%), blood heat (34.7%), and Yin deficiency and blood dryness (30.2%). Yang et al [4] excluded skin symptoms and signs from their study. Consequently, their results do not match the national standard as well as ours. The present study, involving two-phase latent tree analysis, demonstrated a better way to classify psoriasis vulgaris, with the following improvements.

First, we used two-phase latent tree analysis to avoid the local independence assumption required by standard LCA. The study by Yang et al [4] included a large number (n=96) of symptoms and signs. The use of LCA in such a situation is questionable because local independence is likely to be violated, a well-known weakness of LCA. To illustrate the impact of local independence, we performed LCA on our data using the 33 skin lesion characteristics directly as features. This analysis divided patients into six clusters. The Bayesian information criterion (BIC) score for the plain LCA model was 36 022. In contrast, the BIC score of the model obtained by the two-phase approach (Figure 2) was 31 049, which is much lower, indicating that the three-cluster result yielded by the two-phase method fits the data much better than the six-cluster result obtained using standard LCA.
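For readers unfamiliar with BIC, the sketch below shows one common form of the score, assuming the convention BIC = -2 ln L + d ln N, under which lower values are better (consistent with how the two scores are compared above). The software used in the study may report the score with a different sign or scaling, and the log-likelihood values here are hypothetical. The parameter count shown is the standard one for a plain LCA model with binary features.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """BIC under the convention -2 ln L + d ln N (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def lca_param_count(n_classes, n_binary_features):
    """Free parameters of a plain LCA model with binary features:
    (K - 1) class probabilities plus K * m conditional probabilities."""
    return (n_classes - 1) + n_classes * n_binary_features

# A six-class LCA over 33 binary features, as in the comparison described above.
d_plain = lca_param_count(6, 33)

# Hypothetical log-likelihoods, used only to show how the penalty works.
n_patients = 2764
score_a = bic(-15000.0, d_plain, n_patients)
score_b = bic(-15000.0, d_plain + 50, n_patients)
print(d_plain, score_a, score_b)
```

With equal fit, the model with more free parameters receives the higher (worse) score, which is why a simpler model can win even when it has fewer clusters.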

Second, in our analysis, we classified patients according to their skin symptoms and signs first, since psoriasis vulgaris is, after all, primarily a skin disease in which skin-related issues are among the most problematic. However, with two-phase latent tree analysis, all other clinical characteristics were also considered: the hard assignment technique was used to combine general body signs and symptoms, and information about the patient’s tongue and pulse, with skin symptoms and signs, since TCM theory always considers diseases to be systemic.

Based on this, it is reasonable to expect that the results reported by Yang et al [4] would differ significantly from ours, in terms of what the subtypes are, their prevalence, and their characterization in terms of sign/symptom occurrence probabilities. A key reason for these differences is that Yang et al [4] excluded skin lesion features from their analysis, while we primarily focused on such features in our own. Other potentially confounding reasons include the different geographical locations of the two studies, which may result in small differences in patient characteristics, and the composition of the sign and symptom checklist.

Several other studies have been published on the Zheng classification of psoriasis [10, 18-20], but these studies were based on labeled data: each patient was assigned to a Zheng subtype by an attending doctor, and the distribution statistics of subtypes were calculated and reported. In contrast, both our work and the study by Yang et al [4] were conducted using unlabeled data, containing information about symptoms and signs, but no predetermined judgments regarding Zheng subtype. As such, there was less subjectivity in these two studies. Patients were divided into several clusters based upon the distributional characteristics of their signs and symptoms.

In conclusion, via two-phase latent tree analysis, 2764 psoriasis vulgaris patients were classified into three Zheng subtypes: blood heat, blood deficiency and wind dryness, and blood stasis. The prevalence of these subtypes and the probabilities of signs and symptoms occurring within each subtype have been presented. Our results match those of the national standard for Zheng classification of psoriasis. We thereby provide justification for this standard and supplement it with quantitative information.

Our investigation was conducted on data collected from several hospitals in Beijing, a large metropolitan city in North China. Therefore, geographic or climate differences that may have affected the classification of patients with psoriasis vulgaris could not be assessed. Further research across different regions and climates should be considered to generalize our findings.

Furthermore, to date, we have only identified the subtypes. Rules for classifying patients into these subtypes have not yet been developed, and should be one major goal of future research.

5.ACKNOWLEDGMENTS

We are grateful to the doctors in the Departments of Dermatology at Beijing Traditional Chinese Medicine Hospital, Guang’anmen Hospital, and Dongzhimen Hospital who provided the data used in this paper.