Importance of different electronic medical record components for chronic disease identification in a Swiss primary care database: a cross-sectional study

BACKGROUND: Primary care databases collect electronic medical records with routine data from primary care patients. The identification of chronic diseases in primary care databases often integrates information from various electronic medical record components (EMR-Cs) used by primary care providers. This study aimed to estimate the prevalence of selected chronic conditions using a large Swiss primary care database and to examine the importance of different EMR-Cs for case identification. METHODS: Cross-sectional study with 120,608 patients of 128 general practitioners in the Swiss FIRE (“Family Medicine Research using Electronic Medical Records”) primary care database in 2019. Sufficient criteria on three individual EMR-Cs, namely medication, clinical or laboratory parameters and reasons for encounters, were combined by logical disjunction into definitions of 49 chronic conditions; then prevalence estimates and measures of importance of the individual EMR-Cs for case identification were calculated. RESULTS: A total of 185,535 cases (i


Introduction
Electronic medical records (EMRs) are increasingly used in primary care [1][2]. They are typically organised into different, often structured, components (EMR-Cs) such as medication data, laboratory values, clinical parameters or coded diagnoses. EMR data from multiple primary healthcare providers can be merged into primary care databases, which then offer significant potential for research [3][4][5][6]. Large primary care databases have been developed in healthcare systems internationally, including in the UK [7][8], Canada [9], Spain [10] and Italy [11].
In Switzerland, primary care is predominantly provided by general practitioners who work in private practices and bill according to a nationwide, uniform fee-for-service tariff system. Costs exceeding a patient's deductible are covered by compulsory general health insurance. About 70% of Swiss general practitioners stored their patients' medical records in electronic form in 2019 (82% in 2023) [12], using over 20 different practice information systems [13]. The first primary care database in Switzerland, FIRE ("Family Medicine Research using Electronic Medical Records"), was established in 2009 [14]. FIRE, which to our knowledge is still the only relevant Swiss primary care database, integrates data from seven different EMR systems used by primary care providers in German-speaking Switzerland. For more information on FIRE, visit www.fireproject.ch/en.
Coded diagnoses are often unavailable in primary care databases. In the Swiss healthcare system in particular, there are no financial incentives for coding diagnoses, and other incentives, such as facilitating access to decision aids by linking them directly to coded diagnoses, are difficult to implement on a larger scale because of the multitude of practice software systems in use. Therefore, coded diagnoses often need to be generated secondarily to realise the full potential of a primary care database for research, clinical practice and public health. This can be achieved, for example, using natural language processing, statistical and machine learning methods, or rule-based algorithms that process operationalised diagnostic criteria [15][16]. Rule-based algorithms can yield valid results and have certain advantages over other chronic disease identification techniques (among others, faster implementation and easier interpretability) that make them attractive for many applications and researchers [15,17]. Previous research on identifying chronic diseases in primary care databases using rule-based algorithms has often focused on medication data, but the relevance of exploiting multiple complementary EMR-Cs has been pointed out [18]. Yet there is little quantitative evidence on the importance of different EMR-Cs for chronic disease identification.
The aims of the present study were to analyse period prevalence estimates of 49 chronic conditions in the FIRE primary care database, identified by customised rule-based case definitions using three commonly used EMR-Cs, and to examine the importance of the different EMR-Cs for case identification using tailored importance metrics.

Study design, setting and participants
We performed a cross-sectional study using data from the Swiss FIRE primary care database [14]. At the end of 2019, the FIRE primary care database held almost nine million consultation records from over 500 general practitioners, including medication prescription data with Anatomical Therapeutic Chemical (ATC) codes [19] and Global Trade Item Numbers (GTIN), clinical parameters and laboratory test results, as well as reasons for encounters coded according to the International Classification of Primary Care, 2nd edition (ICPC-2). ICPC-2 is a classification developed by the World Organization of Family Doctors (WONCA) that allows an episode-oriented classification of reasons for encounters, health problems and interventions in primary care encounters [20]. In addition, the FIRE primary care database contains administrative and demographic data on general practitioners and patients; for patients, only year of birth and gender are known. Patients are identified in FIRE via identification numbers that are hashed in the practices before data transmission. A detailed specification of the variables contained in the FIRE database has been published [21]. Not all practice software systems export medication data to the FIRE primary care database with date stamps. For this study, we therefore only considered practices in which each medication prescribed in, or related to, a consultation in 2019 was exported with both a start date and an end date (or explicitly as a prescription until further notice). From these practices, we included all patients aged 18-99 years with at least one consultation in 2019 (N = 120,608) and used all their data available up to their last recorded consultation in 2019 (i.e. the index consultation). For the study flowchart, please see supplementary figure 1.
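The patient inclusion rule (age 18-99 years, at least one consultation in 2019, data used up to the last consultation in 2019) can be sketched as follows. This is a minimal illustration under stated assumptions, not the FIRE extraction code: the `Consultation` record and the `select_index_consultations` function are hypothetical, age is approximated from the year of birth only (the sole age information available), and the practice-level filter on medication date stamps is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Consultation:
    patient_id: str  # hashed patient identifier (hypothetical record layout)
    day: date


def select_index_consultations(consultations, birth_years, year=2019,
                               min_age=18, max_age=99):
    """Map each eligible patient to the date of their last consultation in `year`.

    Assumption: age is approximated as `year - year_of_birth`, since only the
    year of birth is available for patients.
    """
    index = {}
    for c in consultations:
        if c.day.year != year:
            continue  # only consultations in the study year qualify
        yob = birth_years.get(c.patient_id)
        if yob is None or not (min_age <= year - yob <= max_age):
            continue  # unknown or out-of-range age
        # Keep the latest consultation seen so far as the index consultation.
        if c.patient_id not in index or c.day > index[c.patient_id]:
            index[c.patient_id] = c.day
    return index


consultations = [
    Consultation("a", date(2019, 3, 1)),
    Consultation("a", date(2019, 11, 5)),
    Consultation("b", date(2018, 6, 1)),  # no consultation in 2019
    Consultation("c", date(2019, 2, 2)),  # aged 14 in 2019
]
print(select_index_consultations(consultations, {"a": 1950, "b": 1960, "c": 2005}))
# {'a': datetime.date(2019, 11, 5)}
```

All data up to the returned date would then form the patient's analysable history.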
Studies within the fully anonymised FIRE project do not fall within the scope of the Human Research Act and are exempt from ethics review. Ethical approval was waived by the Ethics Committee of the Canton of Zurich; BASEC No. Req-2017-00797.

Chronic conditions
The selection of chronic diseases to be identified was based on an appraisal of the literature and on their relevance to general practice. Specifically, we combined the list of 75 chronic diseases by N'Goran et al. [22] (considered most relevant in the context of multimorbidity by experts in family medicine) and the list of 40 chronic diseases by Barnett et al. [23] (considered by clinicians as the most relevant chronic diseases constituting multimorbidity). The resulting list was reviewed by three authors (RM, TG, SM) for identifiability in the FIRE primary care database, and the selection of chronic diseases for this study was reached by consensus. Selected chronic diseases were further combined into overarching disease complexes where they were presumed indistinguishable in the FIRE primary care database (e.g. "asthma" and "chronic obstructive pulmonary disease" were combined into "obstructive lung disease"). This resulted in a selection of 49 chronic diseases or overarching disease complexes (henceforth "chronic conditions") potentially identifiable in the FIRE primary care database.

Case definition and diagnostic criteria
For these 49 chronic conditions, we specified individual criteria on three commonly used EMR-Cs available in the FIRE primary care database, namely medication (MED), clinical or laboratory parameters (CLP) and ICPC-2 coded reasons for encounters (RFE), each intended to characterise the chronic condition in question sufficiently. We then defined a case of a specific chronic condition as a patient fulfilling at least one EMR-C criterion for this chronic condition. For most high-prevalence chronic conditions, such as diabetes or hypertension, we relied on criteria already validated in other primary care database research. In some cases, however, validated criteria had to be adapted or combined, or new criteria developed, as is often the case in primary care database research [24][25]:

1. MED: Medication criteria used ATC codes or the more specific GTINs and required implementation of the Swiss pharmaceutical cost groups [26] and of all medications' indications as approved by the Swiss Agency for Therapeutic Products (swissmedic) [27]. To reduce false positives due to episodic rather than chronic prescription, we required a minimum treatment duration of six months to identify a chronic condition based on medication data.

RFE: The reasons for encounters used were those sufficiently specific to allow identification of the selected chronic conditions.
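The six-month minimum treatment duration for medication-based identification can be operationalised in several ways. The sketch below shows one plausible reading, which is our assumption rather than the published operationalisation: the start and end dates of all prescriptions qualifying for a condition are merged into non-overlapping intervals, a prescription "until further notice" is assumed to run until the index consultation, and six months are approximated as 183 days. The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
MIN_DURATION = timedelta(days=183)


def treatment_duration(prescriptions, index_day):
    """Total time covered by the union of prescription intervals up to index_day.

    prescriptions: iterable of (start, end) dates; end=None stands for a
    prescription "until further notice", assumed here to run to index_day.
    Overlapping or touching intervals are merged so no day is counted twice.
    """
    intervals = []
    for start, end in prescriptions:
        stop = index_day if end is None else min(end, index_day)
        if start <= stop:  # ignore prescriptions starting after the index date
            intervals.append((start, stop))
    intervals.sort()

    total = timedelta(0)
    cur_start = cur_end = None
    for start, stop in intervals:
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, stop)  # overlap: extend the current block
        else:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous block
            cur_start, cur_end = start, stop
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def med_criterion_met(prescriptions, index_day):
    """Hypothetical MED check: qualifying drugs covered for at least six months."""
    return treatment_duration(prescriptions, index_day) >= MIN_DURATION


rx = [
    (date(2019, 1, 1), date(2019, 3, 31)),
    (date(2019, 3, 15), date(2019, 5, 31)),  # overlaps the first prescription
    (date(2019, 9, 1), None),                 # until further notice
]
print(treatment_duration(rx, date(2019, 12, 1)).days)  # 241
print(med_criterion_met(rx, date(2019, 12, 1)))        # True
```

Merging intervals avoids counting overlapping refills twice; a stricter variant could instead require six months of uninterrupted coverage.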
We distinguished reversible chronic conditions (e.g. chronic pain or mental disorders) from permanent ones (e.g. hypertension or dyslipidaemia). For permanent chronic conditions, EMR-C criteria could be met at any time in the patient history, whereas for reversible chronic conditions, they had to be met no more than six months before the index consultation.
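The distinction between permanent and reversible chronic conditions amounts to a condition-specific look-back window combined with the logical disjunction over EMR-Cs. A minimal sketch, assuming that the dates on which each EMR-C criterion was fulfilled are already available per patient; the `criterion_dates` structure and the `is_case` function are hypothetical, and six months are again approximated as 183 days.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
REVERSIBLE_WINDOW = timedelta(days=183)


def is_case(criterion_dates, index_day, reversible):
    """Disjunctive case definition over EMR-Cs with a condition-type time window.

    criterion_dates: hypothetical mapping such as
        {"MED": [...], "CLP": [...], "RFE": [...]}
    listing the dates on which each EMR-C criterion was fulfilled.
    Permanent conditions accept any date up to the index consultation;
    reversible conditions only accept dates within the preceding window.
    """
    earliest = index_day - REVERSIBLE_WINDOW if reversible else date.min
    # any() over all EMR-Cs implements the logical disjunction
    return any(
        earliest <= d <= index_day
        for dates in criterion_dates.values()
        for d in dates
    )


index_day = date(2019, 12, 1)
old_lab = {"MED": [], "CLP": [date(2018, 1, 10)], "RFE": []}
print(is_case(old_lab, index_day, reversible=False))  # True
print(is_case(old_lab, index_day, reversible=True))   # False
```

The same laboratory finding thus counts for a permanent condition but not for a reversible one once it lies outside the window.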
The EMR-C criteria for all 49 selected chronic conditions were compiled by RM and independently reviewed by TG and SM. Differences were resolved by consensus. The final criteria for all chronic conditions can be found in supplementary table 1.

Prevalence estimates and EMR-C importance metrics
We estimated the period prevalence p of each chronic condition in the FIRE general practice patient population for the year 2019 as the number of cases n divided by the total number of patients included N, i.e. (using the notation of figure 1):

$$p = \frac{n}{N} = \frac{m + l + r + ml + mr + lr + mlr}{N}$$
For the analogous prevalence estimate of having any chronic condition, multiple cases (of different chronic conditions) in the same patient were counted only once, i.e. as a single case of the collective condition.
In order to determine the relevance of each EMR-C for case identification, we defined two importance metrics (again using the notation of figure 1). The relative contribution c of an EMR-C to case identification of a specific chronic condition was defined as the number of cases identified by this EMR-C divided by the total number of cases n identified by any EMR-C. The relative exclusive contribution c_e of an EMR-C to case identification of a specific chronic condition was defined as the number of cases identified by this EMR-C but not by any other EMR-C, again divided by n. Using the EMR-C MED as an example, this is:

$$c_{\mathrm{MED}} = \frac{m + ml + mr + mlr}{n}$$

and

$$c_{e,\mathrm{MED}} = \frac{m}{n}$$

Because a case can be identified by more than one EMR-C, the relative contributions c of the three EMR-Cs can sum to more than 100%.
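Computed from patient-level data, these quantities reduce to set operations: the cases are the union of the patients fulfilling the MED, CLP or RFE criterion, c counts all cases found by one EMR-C, and c_e counts only those not found by either of the other two (region m for MED in figure 1). The sketch below is illustrative; the function names and the convention of returning 0 when there are no cases are our assumptions. It also shows the "any chronic condition" estimate, in which each patient is counted once.

```python
def prevalence_and_importance(med, clp, rfe, n_patients):
    """Prevalence p and importance metrics c, c_e for one chronic condition.

    med, clp, rfe: sets of patient IDs fulfilling the MED, CLP or RFE criterion.
    n_patients: total number of patients included (N).
    """
    components = {"MED": med, "CLP": clp, "RFE": rfe}
    cases = med | clp | rfe  # logical disjunction over EMR-Cs
    n = len(cases)
    p = n / n_patients
    c, c_e = {}, {}
    for name, ids in components.items():
        others = set().union(*(v for k, v in components.items() if k != name))
        c[name] = len(ids) / n if n else 0.0             # found by this EMR-C
        c_e[name] = len(ids - others) / n if n else 0.0  # found only by it
    return p, c, c_e


def any_condition_prevalence(cases_by_condition, n_patients):
    """Patients with several chronic conditions are counted only once."""
    return len(set().union(*cases_by_condition.values())) / n_patients


p, c, c_e = prevalence_and_importance({1, 2, 3, 4}, {3, 5}, {4, 6}, n_patients=10)
print(p, c["MED"], c_e["MED"])  # 0.6 0.6666666666666666 0.3333333333333333
```

In this toy example the c values sum to 8/6, illustrating the overlap between EMR-Cs, whereas the c_e values always sum to at most 1.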

Additional analyses
Although redundant in principle, we reported 95% Wilson confidence intervals (CIs) for all p (supplementary table 2) and for selected c and c_e as a courtesy to the reader (simultaneous CIs for multinomial proportions where appropriate).
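For reference, the Wilson score interval for a single proportion k/n can be computed as in the sketch below. It uses the standard Wilson formula; the function name `wilson_ci` is ours, and it covers single proportions such as p only, not the simultaneous multinomial intervals mentioned above.

```python
import math
from statistics import NormalDist


def wilson_ci(k, n, level=0.95):
    """Wilson score interval for a binomial proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


print(wilson_ci(5, 10))  # approximately (0.237, 0.763)
```

Unlike the simple Wald interval, the Wilson interval stays within [0, 1] and behaves well for the low-prevalence conditions in this study.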
For fine-grained comparability with prevalence estimates reported for Swiss general practice, and thus for a more thorough validation of our estimates, we also calculated period prevalence estimates stratified by sex × age group. Furthermore, to describe the distributions of the (unstratified) prevalence estimates across general practitioners, we used medians and quartiles as well as the quartile coefficient of dispersion, defined as

$$\mathrm{qcod} = \frac{q_{0.75} - q_{0.25}}{q_{0.75} + q_{0.25}},$$

where the subscripts denote the 0.25 and 0.75 quantiles (first and third quartiles). The qcod is a robust relative measure of dispersion, useful for comparing the spread of non-normally distributed sets of data that may differ in their medians or units of measurement. Its range and interpretation are similar to those of the better-known coefficient of variation.
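A small sketch of this dispersion summary, taking one prevalence estimate per general practitioner for a given chronic condition as input. The quartile definition used in the study is not stated; here we assume linear interpolation (`statistics.quantiles` with `method="inclusive"`), and the function name is hypothetical.

```python
from statistics import median, quantiles


def dispersion_summary(prevalences):
    """Median, quartiles and quartile coefficient of dispersion (qcod).

    prevalences: per-general-practitioner prevalence estimates (at least two).
    Assumption: quartiles use linear interpolation (method="inclusive");
    other quartile definitions give slightly different values.
    """
    q25, _, q75 = quantiles(prevalences, n=4, method="inclusive")
    total = q75 + q25
    qcod = (q75 - q25) / total if total > 0 else float("nan")
    return median(prevalences), q25, q75, qcod


print(dispersion_summary([1, 2, 3, 4, 5]))  # (3, 2.0, 4.0, 0.3333333333333333)
```

Because the qcod is a ratio, it is unchanged whether prevalences are entered as fractions or as percentages.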

Ethics approval and consent to participate
Studies within the FIRE project do not need ethics approval as they do not fall within the scope of the Human Research Act (waiver granted by the Ethics Committee of the Canton of Zurich; BASEC No. Req-2017-00797).

Sample characteristics
We analysed data of N = 120,608 patients (

Prevalence estimates and EMR-C importance metrics
Prevalence estimates are shown in figure 2 for any chronic condition and for the 24 most frequently identified chronic conditions (encompassing 97.0% of all identified cases), together with importance metrics of the three EMR-Cs. Full numerical data for all 49 chronic conditions (including CIs for prevalence estimates) can be found in supplementary table 2.

Stratified prevalence estimates and prevalence distributions across general practitioners
Prevalence estimates stratified by sex × age group are presented in figure 3 for the 24 most frequently identified chronic conditions; full numerical data for all 49 chronic conditions can be found in supplementary table 3, which also shows how prevalence estimates of chronic conditions varied across general practitioners. In both sexes, the prevalence estimates of most chronic conditions increased with age, a notable exception being migraine, whose prevalence started to decrease in patients in their sixth decade. Distinctly higher prevalence estimates in female than in male patients were found for chronic pain, mental disorders, osteoarthritis, thyroid disease, irritable bowel syndrome, migraine, dementia and osteoporosis. Higher prevalence estimates in male patients were observed for hypertension, dyslipidaemia, obstructive atherosclerotic disease, diabetes mellitus, benign prostatic hyperplasia, gout and heart disease. Prevalence estimates of most chronic conditions varied little across general practitioners, apart from random fluctuations in low-prevalence chronic conditions and with the exception of chronic kidney disease (median 2.0% [IQR 0.5-3.8%], qcod = 0.76), cancer (0.6% [0.3-1.6%], qcod = 0.70) and obesity (3.6% [1.9-7.6%], qcod = 0.59).

Discussion
Case definitions for chronic diseases are often used in research with primary care databases, but little is known about the importance of different EMR-Cs for chronic disease identification. In this study, we implemented case definitions for 49 chronic conditions in a large Swiss primary care database and analysed prevalence estimates and the importance of three commonly used EMR-Cs for case identification. We found that MED was the EMR-C in the FIRE primary care database that contributed most to chronic disease identification. CLP and RFE complemented the identification of frequent chronic conditions such as chronic kidney disease, cancer, heart disease and obesity. For most chronic conditions, our case definitions yielded lower prevalence estimates than expected, but the observed sex- and age-specific epidemiological patterns were concordant with previous research.
Many chronic diseases can be reliably identified using medication data; consequently, exclusively medication-based identification methods have been developed [36][37]. Our results were in line with this: nearly 90% of chronic condition cases could be identified via MED. MED was most important for the identification of chronic pain, acidity-related stomach problems and chronic constipation, but the identification of most other chronic conditions also depended strongly on MED data.
Medication-based chronic disease identification, however, may introduce false positives through preventive use, overtreatment, off-label use or as-needed prescribing in specific cases. For example, primary preventive use of low-dose aspirin is common and may cause overestimation of obstructive arteriosclerotic disease [38]. Overtreatment with proton pump inhibitors is also common in Switzerland and has most likely inflated the prevalence estimate of acidity-related stomach problems in our study [39]. Likewise, off-label prescribing can cause overestimation of prevalence even if it is medically justified. An obvious example in our study was the misclassification of 0.1% of women as having benign prostatic hyperplasia; the most plausible explanation here would be off-label use of selective α1 receptor antagonists for kidney stones [40]. Finally, medication prescribed as needed for conditions with paroxysmal, non-chronic patterns may also lead to overestimation of chronic conditions. This effect may have contributed to the high prevalence estimates of chronic pain and acidity-related stomach problems, as pain medication and proton pump inhibitors are more likely than other drugs to be prescribed on an as-needed basis.
In addition to such false positives, false-negative identification of chronic diseases based on the EMR-C MED may also occur, especially when the drugs used in specific case definitions are typically prescribed in specialised settings only. This is the case for chemotherapeutics or monoclonal antibodies, which are typically prescribed in secondary or tertiary care, and explains why this EMR-C is prone to underestimating cancer and autoimmune diseases in primary care. Furthermore, identification of chronic diseases based on medication alone may fail to detect mild cases or early stages, when non-pharmacological management is still an option (e.g. obesity or early stages of type 2 diabetes mellitus), thus leading to underestimation of their prevalence. Lastly, patients without medication prescriptions may differ in socioeconomic status from those receiving medication, and case identification depending on the EMR-C MED may be biased accordingly [41,42].
The value of the EMR-Cs CLP and RFE was limited both in terms of their overall contributions to case identification (c = 22.1% and 19.3% for CLP and RFE, respectively) and in terms of their exclusive contributions complementing other EMR-Cs (c_e = 6.9% and 4.7%). Among all chronic conditions, case identification of chronic kidney disease depended most strongly on laboratory test results (c = 100% for CLP), as no specific medication could be operationalised, nor is there a sufficiently specific ICPC-2 code for identifying chronic kidney disease. While considering CLP and RFE may reduce some of the above-mentioned risks of bias linked to medication prescribing, laboratory testing in particular may still be related to disease severity and socioeconomic status, depending on the healthcare setting [42]. Identification based on RFE is likely to be less susceptible to such selection bias. However, our data showed highly incomplete RFE coding by general practitioners in the FIRE network, making this EMR-C on its own very unreliable for chronic disease identification. Low coding performance by general practitioners has been documented before; in primary care database research using case definitions for chronic disease identification, it is most problematic when other EMR-Cs, especially MED, are not available to fill this gap [43][44][45]. In particular, incomplete coding of cancer and heart disease is well known and is confirmed by our low prevalence estimates (p = 1.2% for cancer, 1.3% for heart disease), while the high relative exclusive contributions of RFE to identifying such cases (c_e = 59.9% and 42.5%) highlight the problem. In the context of RFE coding, the ICPC-3 system should be mentioned, which, in addition to function-related information, also includes conditions missing in ICPC-2, such as chronic kidney disease [46]. It remains to be seen whether its higher level of detail, weighed against the expected higher coding effort, will improve case identification of chronic conditions in primary care databases.

Original article
Swiss Med Wkly. 2023;153:40107
Incomplete documentation is also problematic when vital parameters are used for chronic disease identification: as in other primary care databases, we assume that the body mass index (BMI) may be incompletely documented by general practitioners in the FIRE network, leading to underestimation of both anorexia and obesity, whose definitions strongly depend on this measure [47][48].
While high specificity of our definitions was achieved by stipulating sufficient criteria for each individual EMR-C, sensitivity was increased by logical disjunction over multiple EMR-Cs. However, sensitivity likely remained the most important limitation [17]. To assess its severity, external studies on chronic disease prevalence in Swiss general practice are a useful standard of comparison. A recent study by Excoffier et al. surveyed a sample of 118 Swiss general practitioners about the chronic diseases of 25 consecutive patients per general practitioner and produced prevalence estimates for multiple chronic diseases in Swiss general practice [49].
Our results differed to varying degrees from the prevalences found in that study. Prevalence estimates were similar for hypertension, dyslipidaemia and diabetes mellitus (p = 27.5% vs 32%, 13.5% vs 12% and 6.6% vs 10%, respectively). Case identification of hypertension and diabetes mellitus relied on definitions that had been validated in other primary care databases and required only minor adaptations for the FIRE primary care database [25,50]. Substantial underestimation appeared for obesity, where case definitions are known to lack sensitivity (p = 5.2% vs 15%) [51]. On the other hand, we also found several higher prevalence estimates than Excoffier et al., most notably for chronic pain (20.2% vs 9%). Interestingly, similar primary care database studies have also estimated the prevalence of chronic pain at about 20%, which may indicate a systematic error inherent in case definitions for chronic pain [52][53][54].
Lastly, compared with another study in Swiss general practice by Tomonaga et al., the prevalence of chronic kidney disease in our study was much lower (p = 2.5% vs 18%) [55]. Given this discrepancy, our case definition (which depends exclusively on laboratory test results) appears to suffer from poor sensitivity.
While the accuracy of prevalence estimates is questionable for most chronic conditions, our results mirrored known demographic patterns with remarkable precision. First, most prevalence estimates increased with age, with the expected exception of migraine, which is known to remit in the sixth decade, exactly as seen in our results [56]. Second, all sex differences in our study were consistent with prior knowledge (i.e. higher prevalence among females of chronic pain [57], mental disorders [foremost depression] [58], osteoarthritis [59], thyroid disorders [60], irritable bowel syndrome [61], migraine [62] and osteoporosis [63]; and higher prevalence among males of hypertension [64], dyslipidaemia [65], obstructive arteriosclerotic disease [66], diabetes mellitus [67] and gout [68]).
Assuming that the prevalence of frequent chronic diseases is distributed similarly among general practitioners in real life, the between-practitioner dispersion of prevalences estimated in our study is another means of assessing the plausibility of results and of understanding methodological implications and risks of bias. In this respect, it is worth noting that the identification of the frequent chronic conditions with the highest prevalence variability among general practitioners in our study (chronic kidney disease [qcod = 0.76], cancer [0.70] and obesity [0.59]) depended on case definitions that did not exploit the EMR-C MED, or did not do so predominantly. This observation suggests that well-known variability between general practitioners in laboratory testing and documentation practice may translate into biased prevalence estimates in primary care database research [69][70]. In this context, the EMR-C MED may act as a moderator, provided that there is less dispersion among general practitioners in drug prescribing than in other activities reflected in EMR-Cs. This might not be the case for the prescription of pain medication or proton pump inhibitors, which are themselves subject to significant between-practitioner variation in Switzerland [71][72]. Therefore, prevalence estimates of chronic pain and acidity-related stomach problems as measured in our study may have suffered from this source of bias.

Strengths and limitations
Our study advances understanding of the use of multiple EMR-Cs for case identification of chronic diseases in primary care databases. Our detailed analysis of the contributions of different EMR-Cs to case identification and of their relative importance for prevalence estimation may serve researchers as a reference for planning, implementing and evaluating similar approaches in other primary care databases, and for critically appraising prevalence estimates obtained from primary care databases using similar methods.
The main limitation of this study was the lack of validation of the case definitions: since anonymisation and ethical restrictions precluded manual review of the electronic medical records, the accuracy of our case definitions remains speculative. This applies even to case definitions validated in other primary care databases, because documentation, data transfer and storage in the FIRE primary care database may differ from those of other primary care databases, and validation studies are not necessarily transferable. In the context of increasingly widespread research with big datasets, methods to explore and unravel specific areas of concern, as demonstrated in this article, are of growing importance. Newly introduced methodology intended to increase accuracy (such as the concept of permanent and reversible chronic conditions) has yet to be validated and can probably be improved. Our case definitions, however, are easy to replicate and implement in other primary care databases, which should facilitate future validation and optimisation.

Conclusions
Our analysis of the relative importance of commonly used EMR-Cs for case identification and of the resulting prevalence estimates has provided new insights into the strengths and weaknesses of case definitions applied to the FIRE primary care database. By far the most cases of chronic conditions were identified via medication data, but clinical or laboratory parameters as well as ICPC-2 coded reasons for encounters were important for the identification of a subset of chronic conditions. The combination of all EMR-Cs produced a spectrum of prevalence estimates of varying concordance with external data, often underestimating presumed chronic condition prevalences in general practice. While sensitivity was the principal limitation to accuracy, our results matched known sex- and age-specific prevalence patterns, which points to the potential of the FIRE primary care database for studying epidemiological trends in the more common chronic conditions considered in this study.

Figure 1: Notation used for defining prevalence estimates and importance metrics. The lowercase labels within the regions of the Euler diagram denote the number of cases identified exclusively by the respective electronic medical record components.

Figure 2: Prevalence estimates (based on N = 120,608 patients) and importance metrics, shown in (approximately) area-proportional Euler diagrams, with the areas representing the number of cases identified by the respective electronic medical record component(s). p: prevalence estimate; c: relative contribution; c_e: relative exclusive contribution; MED: medication; CLP: clinical or laboratory parameters; RFE: reasons for encounters. a including depression, psychotic disorders and anxiety disorders; b including coronary, cerebral and peripheral arteries; c including asthma, chronic obstructive pulmonary disease and chronic bronchitis; d including arrhythmias and congestive heart disease.
Swiss Med Wkly. 2023;153:40107 • Swiss Medical Weekly • www.smw.ch • published under the copyright license Attribution 4.0 International (CC BY 4.0)

Figure 3: Sex × age group-specific prevalence estimates. Panels represent age group-specific prevalence estimates of female patients (lighter, left) and male patients (darker, right) for the 24 most prevalent chronic conditions in decreasing order of overall prevalence (N = 38,954 (in age group 18-40) + 47,573 (41-64) + 24,175 (65-80) + 9,906 (81-99) = 120,608 patients). a including depression, psychotic disorders and anxiety disorders; b including coronary, cerebral and peripheral arteries; c including asthma, chronic obstructive pulmonary disease and chronic bronchitis; d including arrhythmias and congestive heart disease.

On average, patients had 1.54 chronic conditions, and patients with at least one chronic condition had 2.78 chronic conditions.