Estimating the incidence of traumatic spinal cord injuries in Switzerland: use of administrative data to identify potential coverage error in a cohort study

INTRODUCTION: Inferences from population-based cohort studies may be inaccurate as a result of biased coverage of the target population. We investigated the presence of absolute coverage error and selection bias in the Swiss Spinal Cord Injury (SwiSCI) cohort study, using a secondary, nationally representative data source. The proposed methodology is applicable to future Swiss cohort studies aiming to assess their coverage error.


Introduction
Population-based cohort studies are vulnerable to incomplete and biased coverage of the target population. This so-called coverage error can seriously challenge study representativeness and generalisability, and thereby its validity as an evidence base [1,2]. Quantification of the uncertainty of epidemiological indicators when using population-based cohorts is necessary in order to arrive at sound epidemiological conclusions. Ideally, an operational, full-coverage disease registry would be used to assess coverage error, given its superiority in case identification. However, registries are not always available and are also potentially subject to bias, notably nonparticipation bias [3]. Administrative data are an alternative and valuable resource for quantifying coverage error, given their routine and comprehensive collection [4]. Although they can be somewhat limited in terms of the capacity for disease classification [4], the use of sensitivity analyses, supported by rational classification scenarios, could provide an adequate and efficient method to address issues of uncertainty.
One example of a population-based cohort study is the Swiss Spinal Cord Injury (SwiSCI) cohort study, which includes a medical records study covering all spinal cord injury (SCI) specialised rehabilitation centres in Switzerland [5]. Albeit rare, spinal cord injuries are a life-threatening condition, with long-lasting neurological implications. In comparison with other neurological conditions, the associated economic costs of SCI are 2 to 20 times higher [6]; the socioeconomic burden on an individual and their community reinforces the importance of primary prevention. In order to help inform national prevention policy, a recent effort was made by Chamberlain et al. to estimate the first reliable and contemporary incidence rates (IRs) of traumatic SCI (TSCI) admitted to specialised rehabilitation in Switzerland, using data collected from the SwiSCI cohort study [7]. Unfortunately, previous studies that have used nationally collected hospital data in countries similar to Switzerland have found a significant proportion of persons with TSCI to be discharged to institutions other than specialised rehabilitation centres [8]. This suggests the potential for coverage error in the SwiSCI cohort study; IRs estimated in this study, given its rehabilitation-based nature, are likely to underestimate the true population risk of TSCI [7]. In Switzerland, electronically collected administrative data on 99% of hospitalisations are available [9]. These data provide an opportunity to estimate the coverage error in Swiss-based cohort studies, namely the absolute coverage error in terms of study representativeness and systematic error caused by selection bias. Therefore, the purpose of our study was to investigate the presence of absolute coverage error and selection bias in the SwiSCI cohort study using administrative data collected by the Swiss Federal Statistical Office (SFSO). The study can further serve as a case in point for a proposed methodology that future Swiss cohort studies could employ to assess their own coverage error.

Materials and methods
This study focused on traumatic spinal cord injuries within Switzerland, using data collected in 2012 and 2013. Switzerland recently implemented diagnosis-related group (DRG)-based reimbursement; this has been previously shown to improve coding accuracy. Therefore, this study uses only the most recently available data from the Swiss Federal Statistical Office (SFSO), from after the DRG-based reimbursement implementation.

Observational cohort data
The Swiss Spinal Cord Injury (SwiSCI) cohort study is an observational, open cohort study that collects data from four rehabilitation centres geographically distributed across Switzerland. The study investigates conditions for SCI patients and comprises three data collection pathways, described elsewhere [5]. Both retrospectively and prospectively collected data are included within the SwiSCI study, which is ongoing and includes historical data pre-1960 [5,7]. The transition to prospective data collection occurred during the course of 2013 at all specialised rehabilitation centres in Switzerland. Persons eligible for SwiSCI are at least 16 years old, reside in Switzerland, have a traumatic or non-traumatic SCI aetiology, and received first rehabilitation in one of four specialised rehabilitation centres in Switzerland (REHAB Basel; Swiss Paraplegic Centre; Balgrist University Hospital; Clinique Romande de Réadaptation) [5]. The current study included only traumatic cases of SCI collected in 2012 and 2013 in the first phase of SwiSCI, the Medical Records study [5]. The SwiSCI study defines TSCI as SCI caused by one of the following: transport activity, sports or leisure activity, fall, other accident cause, or assault [10].

Administrative data
Data on inpatient hospitalisations were obtained from the SFSO. The SFSO hospital discharge data used in this study cover all Swiss health facilities except for birth clinics and psychological institutions. Data are collected annually, and concern patients who have received medical treatment from healthcare professionals for at least 24 hours or who required an overnight stay [11]. The SFSO data include individual, anonymised, patient identifiers (limiting inclusion of repeat admissions). The international standard International Classification of Diseases (ICD-10-GM: German modification) codes were used to identify new cases of TSCIs. The SFSO dataset contains around 700 variables; this study used the variables age, sex, Swiss national (yes or no), diagnosis, reason for discharge (e.g., hospital/patient decision, death, transfer to other hospital), hospital type and discharge destination. Three (out of the four) SwiSCI-covered specialised rehabilitation centres allowed identification by the SFSO, thereby facilitating comparisons between the observational cohort data and administrative data, as well as facilitating regression analyses on the likelihood of visiting a specialised rehabilitation centre after TSCI. Data were obtained through the SFSO and approved for use in this study (Reference number: 150399).

General population
Population-based data of permanent residents, stratified by age, sex and year were used as the denominator to calculate incidence rates per one million population (PMP). These data are available from the SFSO and were downloaded from their website. The potential to discriminate between Swiss nationals and non-Swiss within the SFSO dataset was used to include only Swiss nationals in a sensitivity analysis, in order to mimic, to the extent possible, the SwiSCI inclusion criterion (i.e., having a permanent Swiss residence). For calculation of IRs excluding non-Swiss patients (i.e., those without a Swiss passport), alternative data from the SFSO that similarly exclude non-Swiss nationals were used.

Data quality and preparation
For comparison between the data sets, sociodemographic factors (sex and age) and SCI characteristics, including lesion level (paraplegia or tetraplegia), degree (complete or incomplete) of TSCI and segmented levels of lesion (e.g., C1-C4, C5-C8) were used. Paraplegia refers to low lesions; in other words, injury to the thoracic, lumbar or sacral segments of the spinal cord (T1-S5) [12]. Tetraplegia refers to high lesions, or injury to the cervical segment of the spinal cord (C1-C8). The degree of SCI refers to the completeness of lesion. Complete injuries are characterised by having no sensory and motor function in the lowest sacral segments (S4-S5) of the spinal cord, whereas incomplete lesions have some sensory or motor function remaining below the level of the lesion [12]. Both the level and completeness of injury have implications for biological functioning below the level of the lesion, with complete tetraplegia having the most severe effects. Patient characteristics were classified according to the recommended International Spinal Cord Society (ISCoS) categories [10]. TSCI cases were identified by means of ICD-10 codes for damage to the spinal cord at the level of the neck (S.140; S.141), chest (S.240; S.241), and abdominal, lower back and pelvic regions (S.340; S.341; S.343) and non-classified (T.060; T.061; T.093; T.913) (see appendix 1 for a detailed list of codes). Segmented levels of lesion location were obtained from the main and additional diagnoses, and correspond to the 22 separate levels (appendix 1). Lesion severity was inferred from ICD-10 coding as either complete (S14.11, S24.11 and S34.10) or incomplete (S14.12, S14.13, S24.12 and S34.11).
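The mapping from ICD-10 codes to lesion characteristics described above can be illustrated with a minimal Python sketch. The function names, the dotted code format (e.g., "S14.11") and the rule that a complete code takes precedence when a patient carries both kinds of code are assumptions made for illustration; they are not taken from the study's Stata code.

```python
# Completeness codes as listed in the text; the dotted format is an assumption.
COMPLETE_CODES = {"S14.11", "S24.11", "S34.10"}
INCOMPLETE_CODES = {"S14.12", "S14.13", "S24.12", "S34.11"}


def lesion_level(code: str) -> str:
    """Map an ICD-10 code to a broad lesion level via its three-character category."""
    category = code[:3].upper()
    if category == "S14":            # spinal cord injury at the level of the neck
        return "tetraplegia"
    if category in ("S24", "S34"):   # chest, or abdominal/lower back/pelvic level
        return "paraplegia"
    return "unclassified"            # e.g., T-codes, which imply no level


def completeness(codes) -> str:
    """Return 'complete', 'incomplete' or 'missing' for one patient's codes."""
    normalised = {c.upper() for c in codes}
    if normalised & COMPLETE_CODES:      # assumption: complete takes precedence
        return "complete"
    if normalised & INCOMPLETE_CODES:
        return "incomplete"
    return "missing"
```

Returning "missing" rather than guessing mirrors the decision, described under Regression analysis, not to impute completeness of lesion.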

Descriptive statistics
Descriptive analyses and statistical tests were used to evaluate coverage error of the SwiSCI data. We quantified differences in distributions of sociodemographic and SCI-specific characteristics using a two-sample Kolmogorov-Smirnov test (test 1) [13]. In a second Kolmogorov-Smirnov test, we additionally excluded patients from the SFSO data known to have visited a SwiSCI-covered centre (test 2). Third, we further excluded the known SwiSCI-covered clinic that did not allow for identification from the SwiSCI data (test 3). These additional Kolmogorov-Smirnov tests were done in order to ensure independent samples. For Kolmogorov-Smirnov testing, age was used as a continuous variable.
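For readers unfamiliar with the two-sample Kolmogorov-Smirnov test, the following Python sketch computes its test statistic, the largest absolute difference between the two empirical cumulative distribution functions. It is an illustration only: the function name is hypothetical, no p-value is computed, and the study itself used Stata.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so the maximum gap is attained at one of the data points.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Applied to age, the two samples would be the ages at TSCI in the SFSO and SwiSCI data; a statistic of 0 indicates identical empirical distributions and a statistic of 1 indicates no overlap.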

Sensitivity analysis
Sensitivity analyses are needed when using administrative data, given that previous research investigating the use of administrative data to identify cases of chronic or acute diseases found that administrative data tend to overestimate cases because of coding inaccuracies [14,15]. Therefore, various restriction criteria were employed in analyses to account for potential inaccuracies of coding and case identification of TSCIs within SFSO data. These criteria were selected based on evidence from previous research to optimise identification of true cases of TSCI, and also to improve comparability with SwiSCI data [14,16,17]. The criteria and their reasoning are as follows (see also table 1):
- Criterion A applied to all TSCI-related ICD-10 codes without further restrictions (see appendix 1).
- Criterion B was based on a study by Hagen et al. [14], who identified a selection of seven ICD-10 codes that jointly showed a relatively high level of sensitivity (0.83), specificity (0.97) and positive predictive value (0.88) to identify new cases of TSCI.
- Criterion C, in addition to the exclusions in criterion B, excluded bruises and oedema in the spinal cord, given their potentially transient nature [8].
- Criterion D added to the specifications for criterion C by further excluding cases within the SFSO data without a Swiss passport. This was done in an effort to approximate SwiSCI inclusion criteria (i.e., those persons with a permanent residence in Switzerland); the SFSO data do not facilitate direct discrimination between permanent Swiss residents and non-residents.
- Criterion E includes only those cases identified with a TSCI-related ICD-10 code as the main diagnosis; a previous study has found inclusion of only cases with a main diagnosis code of interest to reduce overestimation when using administrative data [17].
- Criterion F includes only those cases identified with a TSCI-related main diagnosis and only those cases identified using the seven-code selection criteria described for criterion B.
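The restriction criteria above can be read as combinations of a small number of filters: an allowed code list, an optional list of excluded (transient) codes, a nationality restriction and a main-diagnosis restriction. The Python sketch below shows one way to express this. The `Criterion` class, the record field names (`main_code`, `additional_codes`, `swiss_national`) and the placeholder code strings are hypothetical; the actual code lists are those in appendix 1 and in Hagen et al. [14], and the handling of patients who carry both an excluded and an allowed code is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    allowed_codes: frozenset                  # codes accepted as TSCI-related
    excluded_codes: frozenset = frozenset()   # e.g., bruises and oedema
    swiss_only: bool = False                  # keep Swiss nationals only
    main_diagnosis_only: bool = False         # ignore additional diagnoses


def is_case(record: dict, criterion: Criterion) -> bool:
    """Return True if a hospitalisation record qualifies under the criterion."""
    codes = [record["main_code"]]
    if not criterion.main_diagnosis_only:
        codes += record.get("additional_codes", [])
    # Assumption: one allowed, non-excluded code is enough to qualify.
    qualifying = [
        c for c in codes
        if c in criterion.allowed_codes and c not in criterion.excluded_codes
    ]
    if not qualifying:
        return False
    if criterion.swiss_only and not record["swiss_national"]:
        return False
    return True


# Placeholder codes only; these are not real ICD-10 codes.
ALL_CODES = frozenset({"X1", "X2", "X3"})
criterion_a_like = Criterion(allowed_codes=ALL_CODES)
criterion_e_like = Criterion(allowed_codes=ALL_CODES, main_diagnosis_only=True)
```

Expressing each criterion as data rather than as separate code paths makes it straightforward to rerun the same incidence calculation under every scenario.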

Regression analysis
We performed three logistic regression analyses to evaluate predictors of attendance to specialised rehabilitation (yes or no). The first model included only those sociodemographic variables and variables with limited risk of non-differential misclassification: age, sex, and year of TSCI. The second model additionally included lesion level (paraplegia or tetraplegia), given that this variable is considered to be at risk for non-differential misclassification due to coding inconsistencies within patient records (see appendix 2). In the third model, a variable further specifying the segmented levels of lesion (e.g., C1-C4, C5-C8) was included as a further specification of the broader groups of lesion level (i.e., paraplegia or tetraplegia). Post-hoc analysis using Bonferroni testing was used to detect differences between multi-level groupings. Completeness of lesion was not included as an independent variable given the large amount of missing information and associated concerns regarding information bias and unmeasured confounding (see appendix 3). As more than 20% of information regarding completeness of lesion and level of TSCI in SFSO data was missing, no form of imputation was performed to estimate these missing values. All regressions used patients identified using criterion B to improve inclusion of only those with a true TSCI.

Incidence rates
Age- and sex-specific incidence rates (IRs) per one million population were calculated with use of the Swiss population data for the years 2012 and 2013 stratified by age and sex [10]. Given that TSCI is a relatively rare event and presumably independent of other new cases, Poisson regression was used to estimate annual incidence rates per million population including an interaction term between age and sex [18]. Incidence rates reported by lesion level reflect stratified rates adjusted for the underlying age and sex population structure. Inverse probability weighting was used to account for missing data of lesion level in incidence rate calculations. Incidence rates were calculated for the SFSO data using various restriction criteria for case identification in order to provide a possible range within which the "true" incidence is included. All data management and analyses were performed using STATA Version 14.2 for Windows.
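As a simplified illustration of the quantity being estimated, the Python sketch below computes a crude incidence rate per million person-years with an approximate 95% confidence interval under a Poisson assumption. It is not the Poisson regression with an age-by-sex interaction used in this study, and it omits inverse probability weighting; the function name and the example numbers are hypothetical.

```python
import math


def incidence_rate_pmp(cases: int, person_years: float):
    """Crude rate per million person-years and an approximate 95% CI.

    Under a Poisson assumption the standard error of log(rate) is roughly
    1 / sqrt(cases), so the interval is built on the log scale.
    """
    if cases <= 0 or person_years <= 0:
        raise ValueError("cases and person_years must be positive")
    rate = cases / person_years * 1_000_000
    half_width = 1.96 / math.sqrt(cases)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical stratum: 100 incident cases over 4 million person-years.
rate, lower, upper = incidence_rate_pmp(100, 4_000_000)
```

In this hypothetical stratum the rate is about 25 per million; the interval widens as the number of cases falls, which is why sparse age and sex strata give imprecise estimates.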

Comparison of SFSO data with SwiSCI data
Comparisons between SwiSCI data and the SFSO data revealed no differences when including all cases identified using criterion A and known SwiSCI-covered centres (table 2, test 1). This pattern remained when only cases identified with criterion B were used. When known SwiSCI-covered specialised rehabilitation centres were excluded, the proportion of paraplegia, level of lesion and completeness of lesion remained similar across SFSO and SwiSCI datasets (fig. 2; table 2). However, notable differences between the two datasets were observed for gender and age at TSCI, with the SwiSCI study including a smaller proportion of women and individuals with a younger age at time of TSCI (fig. 2; table 2). Furthermore, exclusion of known SwiSCI-covered centres caused the average age of the SFSO population to increase slightly from 52.5 years of age to 55.6 years (p = 0.003) (table 2, test 2). There was further a tendency for SwiSCI data to include a greater proportion of complete injuries (fig. 2; table 2).

Regression analysis
Regardless of the model used, the youngest group (age 16-30 years), and persons with the highest lesion levels (C1-C4) were more likely to have visited a specialised rehabilitation centre. For example, with selection criterion B, model three, persons aged 76 years or older were nearly six times less likely to have visited a specialised rehabilitation centre as compared with those between 16 and 30 years of age (odds ratio [OR] 0.13, 95% CI 0.05-0.33) (table 4). Post-hoc testing found that the groups including individuals aged between 61 and 75 years and individuals older than 75 years were significantly different from the youngest age group (table 4). Similarly, persons with the lowest lesion level (L1-S5) were more than four times less likely to visit a specialised rehabilitation centre as compared with those with a high cervical lesion (C1-C4) (OR 0.21, 95% CI 0.09-0.50) (table 4). In the base model and model two, men were more likely to visit specialised rehabilitation, but this relationship became non-significant when segmented level of lesion was additionally included in the model (table 4). Similarly, persons with an incident TSCI occurring in 2013 were less likely to visit specialised rehabilitation (as compared with 2012) in the base model and model two, but this relationship also became non-significant upon inclusion of segmented level of lesion (table 4). When cases identified with selection criterion A were included, these relationships did not change, but were slightly weaker.
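Statements of the form "times less likely" are obtained by inverting an odds ratio below 1, and strictly refer to odds rather than probabilities. The Python sketch below illustrates this with the reported odds ratio for the lowest lesion level (OR 0.21, 95% CI 0.09-0.50) and also recovers the approximate standard error on the log-odds scale from the reported interval. The function names are hypothetical, and the calculation assumes a symmetric Wald interval on the log scale.

```python
import math


def inverse_odds_ratio(odds_ratio: float) -> float:
    """Factor by which the odds are lower when the odds ratio is below 1."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return 1 / odds_ratio


def log_scale_standard_error(ci_lower: float, ci_upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a Wald-type confidence interval."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)


factor = inverse_odds_ratio(0.21)             # roughly 4.8
se = log_scale_standard_error(0.09, 0.50)     # roughly 0.44
```

The factor of roughly 4.8 corresponds to the "more than four times less likely" wording used for the lowest lesion level.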

Discussion
Using routinely collected administrative data, this study found a coverage error in the SwiSCI cohort study both in absolute terms and in relation to selection bias. Comparisons between the distributions of study characteristics of the administrative and cohort data demonstrated notable differences with respect to age, gender and, tentatively, completeness of lesion. The overall estimated incidence rate of TSCI ranged between 19.9 and 49.7 pmp. Higher IRs were observed for males, the elderly and paraplegia. However, regression analyses found people of male sex, younger age, and higher lesion level to be more likely to visit a specialised rehabilitation centre. Together, these results suggest a likely coverage error in the SwiSCI Medical Records study.

Coverage error: absolute
This study quantified the absolute coverage error in the SwiSCI cohort study, which affects the estimation of the overall IR of TSCI in Switzerland when using only data from the SwiSCI Medical Records study. For Switzerland between 2005 and 2009, the rehabilitation-based IR was 18 pmp [7]. This study estimated an IR of nearly 50 pmp with use of the least restrictive criterion, and 20 pmp with the most restrictive criterion. Regardless of the criterion used, the previous SwiSCI rehabilitation-based IR point estimate is not included within the range of IRs estimated from SFSO data; this suggests the presence of coverage error in the SwiSCI study in absolute terms.
Comparisons with similar studies that identified TSCI cases using ICD-coded administrative data show that the IR estimates of this study are within the range of reported estimates [8,19,20]. A recent study in the Netherlands that assessed ICD-10 coding accuracy for case identification found that roughly 50% of patients identified using ICD-10 coding corresponded to a true case of TSCI [8]. Assuming that the present study includes coding inaccuracies similar to those observed in the Netherlands, this would suggest an IR of roughly 25 pmp in Switzerland.
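The arithmetic behind this scenario can be made explicit with a short Python sketch: the rate based on coded cases is multiplied by the proportion of coded cases assumed to be true cases. The function name is hypothetical, and the sketch ignores true cases that are missed by the coding, so it illustrates only the direction and rough size of the adjustment.

```python
def ppv_adjusted_rate(coded_rate_pmp: float, positive_predictive_value: float) -> float:
    """Scale a coded-case incidence rate by the assumed share of true cases."""
    if not 0 <= positive_predictive_value <= 1:
        raise ValueError("positive predictive value must lie between 0 and 1")
    return coded_rate_pmp * positive_predictive_value


# Least restrictive estimate (49.7 pmp) with roughly 50% of coded cases true.
adjusted = ppv_adjusted_rate(49.7, 0.5)
```

With these inputs the adjusted rate is 24.85 pmp, in line with the figure of roughly 25 pmp.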
Results from previous studies with comparable methodology (criterion E) gave similar estimates [21]. However, in order to substantiate this estimate scenario a follow-up study is needed in which the medical charts of all potential TSCI cases identified by ICD-10 coding are reviewed by medical professionals experienced in SCI diagnosis and care [8,14].

Coverage error: selection bias
As well as an absolute coverage error, this study observed evidence of selection bias within SwiSCI as key groups including women, the elderly and those with very low lesions (L1-S5) appear to be underrepresented. Such discharge patterns have been reflected in previous studies [8,22]. The observed discharge patterns could be partially due to rehabilitation policy to preferentially provide specialised rehabilitation to individuals with a high capacity to regain functioning, particularly to return to work. Both age and severity of injury have been found to influence return to work [23]. Specialised rehabilitation uses an interdisciplinary approach, notably including occupational therapists who aid in work reintegration or reeducation to promote labour market participation. Therefore, individuals with a higher perceived likelihood to return to work may be more likely to attend specialised rehabilitation [24].
The selection bias identified in SwiSCI could affect future cohort-based estimates of functioning and other health outcomes that vary according to age, gender and severity of lesion [25]. This consequence has been demonstrated in previous studies comparing population-based cohorts with hospital-based cohorts in order to determine the effect of selection bias on risk of mortality and life expectancy, such as in the case of strokes [2,26]. Given the potential repercussions of selection bias in cohort studies, it is imperative to identify and understand the cause of such biases to prevent erroneous conclusions.
It is also important to evaluate the effect of selection bias on epidemiological indicators in the light of potential inequity in access to optimal health. An understanding of the capacity of specialised rehabilitation to serve as a secondary form of prevention (i.e., to further reduce complications, prevent premature mortality, and so on) is key to informing future health interventions. The interdisciplinary approach in specialised rehabilitation centres allows for improved management of spinal cord injuries, aimed at reducing complications and facilitating rehabilitation and community integration [27]. Furthermore, extant literature indicates the capacity of specialised rehabilitation to improve mortality outcomes, reduce length of stay, improve neurological recovery and reduce morbidities (e.g., pressure ulcers, respiratory complications) [28][29][30]. To understand the extent to which selection bias of discharge to a specialised rehabilitation centre can affect health outcomes, contemporary studies in SCI that firmly delineate the benefits of specialised rehabilitation over nonspecialised care or general rehabilitation are needed. Furthermore, to understand interactions for secondary prevention, future studies need to take into account the potential influence of sociodemographic and SCI-specific characteristics that could influence admission to specialised rehabilitation.

Strengths and limitations
In Switzerland, administrative data have previously been used to identify spinal cord injuries and other trauma-related events [31,32], as well as ambulatory care-sensitive conditions (e.g., influenza, asthma, diabetes) [33]. However, this is the first study, to our knowledge, that compared a Swiss cohort study with administration-based population statistics in order to understand and quantify the true representativeness of a study. The SwiSCI study covered all SCI specialised rehabilitation clinics in Switzerland, and thus provided accurate and reliable specialised rehabilitation-based epidemiological indicators (e.g., IR) for comparisons. One of the strengths of the use of the hospital discharge data collected by the SFSO is that it is nationally representative, covering nearly 98% of admitted cases in 99% of Swiss hospitals [9]. Within this dataset each patient has a unique, non-identifiable identification number that allows for tracking across 2-year periods and thus removal of duplicate IDs within each 2-year subset. From this dataset, the present study also used data coded with the ICD-10, which previous research has found to be superior to older ICD versions [34]. In addition, this study used data collected after the introduction of DRG-based reimbursement in Switzerland, which probably improved coding accuracy [34], admission to rehabilitation and, potentially, care received [35]. Finally, a major strength of this study is the use of selection criteria informed by previous studies and literature that serve as sensitivity analyses and that, therefore, provide a range to help specify the level of uncertainty in the data used.
Although the use of nationally representative administrative data is a strength of this study, it is also a limitation, given potential coding inaccuracies of this data source. The impact of these inaccuracies is difficult to predict, but could potentially lead to inaccurate estimates of the overall incidence or number of complete lesions. A previous Norwegian study found that, out of 1080 patients identified as having a potential TSCI (defined using ICD coding), only 24% really had a TSCI; however, this study included a mixture of ICD-8, ICD-9 and ICD-10 coding [14]. In a Canadian study [15], although the positive predictive value of using ICD-10 coding was found to be superior to that reported by the Norwegian study [14], it was found that incomplete lumbar and thoracic spinal cord injuries were often miscoded as being complete, and that 10.9% of true TSCIs were missed using only administrative data. Another limitation of the present study is the identification of lesion level of TSCI using ICD-10 coding. Accurate assessments of the level and severity of a spinal cord injury require use of the American Spinal Injury Association (ASIA) Impairment Scale (AIS), which is co-dependent on completeness of lesion. Determination of an AIS score involves a detailed assessment of motor and sensory impairment and is therefore a time-consuming, labour-intensive and costly process requiring specialist training [12,36]. Evidence from a previous SwiSCI study shows that a substantial portion of the reported lesion levels were not assessed with the AIS, as nearly 60% of persons admitted to first rehabilitation did not have a neurological examination during acute care [5]. Finally, given the limited data, it was not possible to investigate the interplay between lesion level and age within the present study; such interplay was observed in previous studies [7].

Conclusion
Using hospital-based administrative discharge data, this study found absolute coverage error and selection bias in a Swiss-based cohort study including SCI-specialised rehabilitation centres in Switzerland. Administrative data are routinely collected in many high-resource countries and offer a wealth of information related to health. Therefore, regardless of limitations, administrative data remain a valuable resource for future epidemiological studies. In order to address some limitations associated with using administrative-based data sources, a follow-up study that assesses the accuracy of ICD-10 coding within Switzerland to identify cases using ICD-coded data, similar to those performed in Norway and the Netherlands, could provide a concrete understanding of coding discrepancies [8,14]. Results from this study can help inform future SwiSCI-based studies aiming to reliably estimate nationally representative epidemiological indicators while accounting for coverage bias of the target population as part of sensitivity analyses.

Figure 1: Selection criteria of ICD-10 diagnosis codes. Flow chart of case identification using hospital administrative data, including case numbers based on selection criteria used.

A flow chart showing the number of patients retained in each step is shown in figure 1. The SFSO data initially included 2 323 474 hospitalisations (1 353 521 in 2012; 969 953 in 2013), including multiple hospitalisations for individual patients; excluding non-SCI related hospitalisations, 8530 observations were left (fig. 1). After exclusion of duplicate patient IDs, chronic SCIs and deaths, 621 cases remained (criterion A). Of these 621 cases, 6.8% (n = 42) were coded at one time as having a transient lesion and at another time as having a nontransient lesion; about 40% of nontransient lesions were first coded as transient. The number of included incident cases of TSCI varied according to the criterion applied, with criterion B including 564 incident TSCI cases, criterion C 419 cases, criterion D 305 cases, criterion E 297 cases and criterion F 251 cases (fig. 1).
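To show how strongly each restriction criterion narrows the case definition, the reported case counts can be expressed as a share of the criterion A cases. The Python sketch below uses only the counts given above; the function and variable names are hypothetical.

```python
# Incident TSCI cases retained under each criterion, as reported in the text.
CASES_BY_CRITERION = {"A": 621, "B": 564, "C": 419, "D": 305, "E": 297, "F": 251}

# Hospitalisations per year, as reported in the text.
HOSPITALISATIONS = {2012: 1_353_521, 2013: 969_953}


def share_of_baseline(counts: dict, baseline: str = "A") -> dict:
    """Percentage of baseline cases retained under each criterion."""
    base = counts[baseline]
    return {name: round(100 * n / base, 1) for name, n in counts.items()}


shares = share_of_baseline(CASES_BY_CRITERION)
total_hospitalisations = sum(HOSPITALISATIONS.values())
transient_share = round(100 * 42 / CASES_BY_CRITERION["A"], 1)
```

Criterion F, the most restrictive, retains about 40% of the criterion A cases, which is consistent with the roughly 2.5-fold spread between the lowest and highest incidence rate estimates.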

Figure 2: Population-average estimates with 95% confidence intervals of study characteristics. The axis on the left, age in years, corresponds only to the first category: average age. The axis on the right corresponds to the categories: male, paraplegia and incomplete lesion. Circles filled in completely indicate cases identified using criterion A. Half-filled circles indicate cases identified using criterion B. Open circles indicate cases identified in the SwiSCI cohort.

Figure 3: Overall annual incidence rate estimates including 95% confidence intervals. The incidence rate point estimate is indicated by a circle; the filled-in circle is the estimated incidence rate in the paper by Chamberlain et al. [7], which used the rehabilitation-based SwiSCI data to estimate the incidence rate. Error bars represent the 95% confidence interval.

Table 1: Code selection criteria for sensitivity analyses.

Table 2: Characteristics of the SFSO data and SwiSCI data.

Table 2 (continued). SFSO = Swiss Federal Statistical Office; SwiSCI = Swiss Spinal Cord Injury cohort study. Tests are unadjusted for other variables. Missing values not included within calculations of percentages (%).
* Test 1: including SFSO data identified using criterion A
† Test 2: including SFSO data identified using criterion A; excluding known SwiSCI clinics from the SFSO data
‡ Test 3: including SFSO data identified using criterion A; excluding known SwiSCI clinics from the SFSO data; and excluding from the SwiSCI dataset the one SwiSCI clinic that did not allow for identification within the SFSO data

Table 3: Annually estimated incidence rates per million population using administrative data, according to case identification criteria. All estimates are stratified according to age, sex and year of TSCI; letters relate to overall incidence rates displayed in figure 2.
* IRs stratified by lesion characteristics and adjusted for underlying population distributions of age and sex.

Table 4: Logistic regression of characteristics associated with discharge to a specialised rehabilitation facility.