Statistical analysis and reporting: common errors found during peer review and how to avoid them

When performing statistical peer review for Swiss Medical Weekly papers, we often encounter common errors and recurring themes in the reporting of study designs, statistical analysis methods, results and their interpretation. To help authors choose and describe the most appropriate analysis methods and report their results, we have created a guide to the most common issues and how to avoid them. The guide follows the recommended structure for original papers given in the guidelines for authors (http://blog.smw.ch/what-smw-has-to-offer/guidelines-for-authors/) and provides advice for each section. It is intended as an overview of statistical methods and tips for writing your paper, not as a comprehensive review of all statistical methods. Guidance is provided on the choice of statistical methods for different situations and types of data, how to report the methods, how to present figures and tables, and how to present and interpret the results correctly.


Introduction
When performing statistical peer review for Swiss Medical Weekly papers, we often encounter common errors and recurring themes in the reporting of study designs, statistical analysis methods, results and their interpretation. To help authors choose and describe the most appropriate analysis methods and report their results, we have created a guide to the most common issues and how to avoid them. It is not intended to provide advice on study design: once a study has been completed and the paper submitted for peer review, the design cannot be altered. Good statistical analysis cannot rescue a poorly designed study, and we recommend seeking a statistician's help when designing the study. Machin and Campbell [1] provide an excellent textbook on study design, covering the design of, and sample size calculations for, different study designs, including randomised controlled trials, cross-sectional, cohort and case-control studies, and surveys. Not all studies require a sample size calculation; for example, pilot or small-scale feasibility studies are the first assessment of a treatment in a particular setting and are used to collect data to inform the design of a larger study. However, a sample size calculation should be undertaken for a randomised controlled trial to ensure that it has sufficient statistical power to detect an effect on the primary outcome of interest. An introduction to sample size calculations is provided by Noordzij et al. [2].

This guide follows the organisation for original papers given in the guidelines for authors (http://blog.smw.ch/what-smw-has-to-offer/guidelines-for-authors/) and provides advice for each section. Authors should provide a clear statement of the study design and ensure that their reporting follows the recommended reporting guidelines for that design, as provided by the EQUATOR network (http://www.equator-network.org/). Other papers and textbooks providing guidance on statistical analysis and reporting are available [3][4][5][6][7], including previous guides published in Swiss Medical Weekly [8][9][10].

Introduction
Please provide a clear aim. A common problem is that the aim of the study is unclear, or appears to differ from the aim addressed by the results and discussion. Use the PICOS framework as a guide, which covers: P, the population under evaluation; I, the intervention(s) being assessed; C, the comparators; O, the outcomes; S, the study design. Also state why your study is needed; there may be a lack of research in a particular area, or a clear need for additional evidence. Make sure your research is original and does not repeat previous work.

Material and methods
- If possible, report the hypotheses under evaluation in the analysis. If there were no pre-specified hypotheses and the analysis is exploratory, make this clear; data dredging should be avoided.
- Outcomes: provide a separate section detailing all the study outcomes, how they were measured, and when and by whom (as appropriate). Split them into primary and secondary outcomes if relevant, especially for a clinical trial. All outcomes need to be listed to prevent outcome reporting bias (reporting only those outcomes that show statistically significant or favourable results).
- Details of the patients, such as the number included in the study, age and gender, are results, not methods, and should be part of the description of the data in the first part of the results section.

Statistical methods
The statistical methods section is often poorly reported. All statistical tests and models should be described in sufficient detail to enable the reader to understand what has been done: every analysis method, the outcomes being analysed and the comparisons being made. Details of how the results are reported should also be given; for example, quality of life data are summarised using means and standard deviations, and results from logistic regression models are reported as odds ratios with 95% confidence intervals (CI). All analyses listed in the methods should have a corresponding set of results and vice versa; it is quite common to find results that have not been mentioned in the methods section. The number of statistical tests or analyses should be kept to a minimum and ideally pre-specified, because every additional test increases the chance of a false-positive (type I error) finding. I once reviewed a paper that had more statistical tests than participants! This section is split into tips on choosing an analysis method and on reporting it.
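To show why the number of tests matters, the sketch below computes the probability of at least one false-positive result among k independent tests when every null hypothesis is true, and applies the Holm step-down adjustment, one common way of controlling the family-wise error rate when several tests cannot be avoided. Holm is our illustrative choice, not a method prescribed above; the function names and p-values are hypothetical.

```python
def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among k independent
    tests at level alpha when all null hypotheses are true."""
    return 1 - (1 - alpha) ** k


def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p-value is multiplied by (m - rank) and capped
        # at 1; the running maximum keeps adjusted values in the same order.
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(familywise_error(k), 3))
    print(holm_adjust([0.01, 0.04, 0.03, 0.20]))  # hypothetical p-values
```

With 20 independent tests at alpha = 0.05, the chance of at least one spurious "significant" result is about 0.64.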

Choosing an appropriate statistical analysis method
A summary of the statistical analysis methods applicable to continuous and categorical data and different numbers of groups is presented in table 1 (adapted from Petrie [11]).
Other issues are discussed below. This is not intended to be a complete list, but it covers the main points arising from the statistical review of recent submissions. Before performing any statistical analysis, it is important to summarise the data and to assess any underlying assumptions required by the statistical tests.

Descriptive statistics
Descriptive statistics should be used to summarise the data, especially the characteristics of the study population.Continuous data should be summarised using means and standard deviations (SD) for normally distributed variables, or medians and ranges (or inter-quartile ranges) if the variable is skewed.Categorical data should be summarised using numbers and percentages.
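A minimal sketch of these summaries using only Python's standard library. The function names and example values are hypothetical. Quartiles here use `statistics.quantiles(..., method="inclusive")`; other software may use a slightly different quartile definition, so it is worth stating which was used when it matters.

```python
import statistics
from collections import Counter


def summarise_continuous(values):
    """Return n, mean and SD (for roughly normal data) together with the
    median, quartiles and range (for skewed data)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


def summarise_categorical(labels):
    """Return {category: (count, percentage)} for a categorical variable."""
    counts = Counter(labels)
    total = len(labels)
    return {level: (count, 100 * count / total) for level, count in counts.items()}


if __name__ == "__main__":
    ages = [23, 35, 41, 47, 52, 58, 90]  # hypothetical, right-skewed
    print(summarise_continuous(ages))
    print(summarise_categorical(["F", "M", "F", "F"]))
```

In the hypothetical ages, the single large value (90) pulls the mean above the median, the situation in which the median and inter-quartile range are the more faithful summary.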

Parametric versus non-parametric tests
It is the test that is parametric or non-parametric, NOT the data. Statements such as 'Non-parametric data are presented as median and range' are incorrect. Analysis methods such as the t test require that the data follow a normal distribution. If this assumption is doubtful, transforming the data (e.g., by taking logarithms) can often help. If data transformation does not improve the distribution or is not appropriate, use the relevant non-parametric test (see table 1), noting that these tests have less statistical power (are less likely to detect a true effect) when the assumptions of the parametric test hold.
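To illustrate how a log transformation can remove skewness, the sketch below computes the adjusted Fisher-Pearson sample skewness of a variable before and after taking logarithms. The data are hypothetical and deliberately geometric, so the transformed values are exactly symmetric; real data rarely behave this neatly, and a log transformation requires all values to be positive.

```python
import math
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness: 0 for a symmetric sample,
    positive when there is a long right tail."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    third = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * third


if __name__ == "__main__":
    raw = [1, 2, 4, 8, 16, 32, 64]  # hypothetical right-skewed measurements
    logged = [math.log(v) for v in raw]
    print("raw skewness:", round(sample_skewness(raw), 2))
    print("log skewness:", round(sample_skewness(logged), 2))
```

After analysis on the log scale, back-transformed means are geometric means, which should be stated when reporting.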

Correlation and regression
Correlation measures the degree of linear association between two numerical variables, not agreement or 'cause and effect'. Correlation and regression are often confused: regression, not correlation, is needed to assess whether one or more variables can predict another. Correlation analyses should be accompanied by scatterplots so that the reader can see the pattern of the data and whether there are any outlying values. There are different methods for calculating the correlation coefficient; the two most common are Pearson's (which assumes that at least one of the two variables is normally distributed) and Spearman's (the non-parametric equivalent, which can be used for smaller samples, when one or both variables are ordinal, or when the relationship is non-linear but monotonic).
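The difference between the two coefficients shows up with a relationship that is monotonic but not linear. This standard-library sketch (hypothetical function names) computes Pearson's r directly and Spearman's rho as Pearson's r applied to the ranks, with tied values given their average rank. It returns the coefficients only, not p-values or confidence intervals.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = list(range(1, 11))
    y = [v ** 3 for v in x]  # monotonic but curved relationship
    print("Pearson:", round(pearson(x, y), 3))
    print("Spearman:", round(spearman(x, y), 3))
```

Spearman's rho is exactly 1 here because the ranks agree perfectly, whereas Pearson's r is below 1 because the relationship is not a straight line.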

Categorising continuous variables
Categorising continuous variables is common but should be avoided, as it discards information and reduces statistical power. The choice of cut-off points can also influence the results, especially if they were chosen after data analysis had started. Unless an accepted clinical categorisation (such as cholesterol-lowering thresholds) is being used, continuous variables should be kept continuous in regression modelling.

Paired or clustered data
If two measurements are made on each participant, such as before and after treatment, it is incorrect to treat them as independent measurements, because the within-patient correlation needs to be accounted for. Paired data need to be analysed with paired tests (see table 1). Clustered data, including repeated measurements over time (such as quality of life), also need to be analysed with methods that account for the multiple measurements on the same participant. Options include using a simple summary measure (overall mean, change from baseline to a specified time, the maximum value, or the area under the curve over the whole time period), repeated measures regression, or more complex regression models (multilevel models, generalised estimating equations).
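A minimal sketch of why pairing matters, using hypothetical systolic blood pressure values measured before and after treatment in the same five participants. The paired t statistic works on the within-participant differences; the two-sample (Welch) statistic wrongly treats the two columns as independent groups. The standard library has no t distribution, so only t statistics are computed; p-values would come from a statistics package (for example SciPy's `ttest_rel` for paired data). The trapezoidal area under the curve illustrates the single-summary-measure approach for repeated measurements. All function names are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic for the mean within-participant difference (n - 1 df)."""
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def welch_t(group1, group2):
    """Two-sample Welch t statistic; incorrect for paired measurements."""
    se = math.sqrt(statistics.variance(group1) / len(group1)
                   + statistics.variance(group2) / len(group2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se


def area_under_curve(times, values):
    """Trapezoidal area under one participant's curve of repeated
    measurements, giving one summary value per participant."""
    return sum((t2 - t1) * (v1 + v2) / 2
               for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:]))


if __name__ == "__main__":
    before = [140, 150, 160, 170, 180]  # hypothetical
    after = [135, 146, 154, 166, 175]
    print("paired t:", round(paired_t(before, after), 2))
    print("unpaired (wrong) t:", round(welch_t(before, after), 2))
    print("AUC:", area_under_curve([0, 1, 2], [2, 4, 6]))
```

Every participant's value falls by 4 to 6 units, so the paired statistic is large (about 12.8), while ignoring the pairing buries that consistent change in the between-participant variation (t about 0.5).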

Multivariable regression
Multiple or multivariable regression seems to be little used in submitted papers, and peer reviewers often suggest including it. Multivariable regression should be used to adjust for any variables that differ between groups in an observational study, to adjust treatment estimates in a randomised controlled trial for known prognostic factors, or to examine the effect of one variable while accounting for the effects of others (e.g., age and gender). In particular, analyses of mean change or percentage change from baseline (for example, reduction in wound area) need to adjust for each participant's baseline value. However, the size of the study needs to be considered, as a multivariable regression requires more data than a simple univariable regression (which contains only one explanatory variable). Approximately 10 people with the outcome need to be included for each variable in the model, so an analysis of blood pressure adjusting for age, gender and baseline blood pressure would need to include at least 30 people.
A continuous outcome should be analysed with linear regression, counts or rates with Poisson regression, binary or categorical outcomes with logistic regression, and time-to-event outcomes with Cox proportional hazards regression or a parametric survival model (see below). Katz [12] gives a helpful guide to the methods and interpretation of multivariable analyses.
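A minimal sketch of adjusting a treatment comparison for the baseline value with linear regression (an ANCOVA-style model: outcome = b0 + b1 × group + b2 × baseline). It solves the least-squares normal equations with the standard library only; the function name and data are hypothetical, and in practice a statistics package would be used because it also provides standard errors, confidence intervals and model diagnostics.

```python
import statistics


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


if __name__ == "__main__":
    group = [0, 0, 0, 1, 1, 1]            # 0 = control, 1 = treatment
    baseline = [10, 12, 14, 11, 13, 15]   # hypothetical baseline values
    outcome = [6.0, 7.0, 8.0, 8.5, 9.5, 10.5]
    X = [[1, g, b] for g, b in zip(group, baseline)]  # intercept, group, baseline
    intercept, treatment, slope = ols(X, outcome)
    unadjusted = (statistics.mean(outcome[3:]) - statistics.mean(outcome[:3]))
    print("unadjusted difference:", unadjusted)
    print("baseline-adjusted treatment effect:", round(treatment, 3))
```

In these hypothetical data the treated group started slightly higher at baseline, so the unadjusted difference in means (2.5) overstates the baseline-adjusted treatment effect (2.0).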

Survival analysis
Time-to-event data, such as time to healing or progression-free survival, should be analysed using appropriate survival analysis methods. Using the mean time to event among those who experienced the event is incorrect, as this discards information about participants who were lost to follow-up or did not experience the event (censored observations). Survival curves should be plotted, and survival can be compared between groups using a log-rank or Wilcoxon test. Regression models such as the Cox proportional hazards model (whose underlying proportional hazards assumption should be checked) or parametric models (such as the Weibull model) can be used to adjust for other variables.
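A minimal Kaplan-Meier sketch shows how censored participants still contribute: they count as at risk until they leave follow-up, instead of being dropped. The function name and data are hypothetical; a real analysis would use a survival package that also provides confidence intervals, the log-rank test and plots.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times:  follow-up time for each participant.
    events: 1 if the event was observed, 0 if the participant was censored.
    Returns (time, survival probability) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Participants censored at time t are still counted as at risk at t.
        n_at_risk -= deaths + censored
    return curve


if __name__ == "__main__":
    times = [1, 2, 2, 3, 4, 5]    # hypothetical months of follow-up
    events = [1, 1, 0, 1, 0, 1]   # 0 = censored
    for t, s in kaplan_meier(times, events):
        print(t, round(s, 3))
```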

Diagnostic tests
The performance of a diagnostic test or measurement should be compared with a reference (gold standard) test or measurement. Ideally, all participants should undergo both tests. For a binary outcome (diseased or not diseased), a 2 by 2 table should be presented, from which sensitivity, specificity, and positive and negative predictive values with 95% CI can be calculated. For a continuous test score, a receiver operating characteristic (ROC) curve can be used and the area under the curve with 95% CI calculated. If one or more cut-off thresholds have been used to calculate sensitivity or specificity, these should be clearly reported along with the reasons for their choice.
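A minimal sketch of the 2 by 2 table calculations, with 95% CIs from the Wilson score interval (our illustrative choice; exact Clopper-Pearson intervals are also common). The ROC area is computed as the probability that a randomly chosen diseased participant scores higher than a randomly chosen non-diseased one, counting ties as one half, which equals the Mann-Whitney U statistic divided by the product of the group sizes. Function names and counts are hypothetical, and predictive values depend on the prevalence of disease in the sample studied.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp, fp, fn, tn):
    """(estimate, lower, upper) for each measure from a 2 by 2 table of
    index test result against the reference standard."""
    measures = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, *wilson_ci(k, n)) for name, (k, n) in measures.items()}


def roc_auc(diseased_scores, healthy_scores):
    """ROC area as the concordance probability (ties count one half)."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            wins += 1.0 if d > h else 0.5 if d == h else 0.0
    return wins / (len(diseased_scores) * len(healthy_scores))


if __name__ == "__main__":
    for name, (est, lo, hi) in diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80).items():
        print(f"{name}: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    print("AUC:", round(roc_auc([3, 4, 5], [1, 2, 3]), 3))
```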

Reporting analysis methods
It should be clear from the description which variables were analysed with each analysis method. Vague statements such as 'data were analysed with the chi-squared test, t-test and regression' are unhelpful, as it is unclear which data were analysed with which method.
- If there was a sample size calculation, report it in sufficient detail to enable a statistician to replicate it. This requires the type I error (alpha, usually 0.05), the type II error (beta; the power, 1 − beta, is often 80% to 90%), the minimum clinically relevant difference (the smallest difference between the groups that would be clinically relevant), and the outcome for the control group based on previous research (the event rate for a dichotomous outcome, or the mean and SD for a continuous outcome).
- If there was no sample size calculation but there was some information about how the study size was determined, report it ('no formal sample size calculation was performed but all available patients in two centres were included in the study', or 'this was a pilot study and a sample size calculation was not relevant').
- Report full details of how the underlying analysis assumptions were checked (e.g., normal distribution, constant variance between groups, and a linear relationship between two variables for correlation or regression) and how any transformations were performed.
- Analyses should, where possible, be accompanied by relevant plots: scatterplots for correlation, survival curves for time-to-event analyses, boxplots or means with 95% CI for summaries of continuous variables,
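The sample size ingredients listed above (alpha, power, minimum clinically relevant difference, and the control group's SD or event rate) are exactly what the standard normal-approximation formulas need. A minimal sketch, assuming a two-sided test comparing two independent groups of equal size; the function names and example inputs are hypothetical. These textbook approximations can slightly underestimate the size needed for small studies, so a statistician or dedicated software should confirm the final figure, and the result should be inflated for expected loss to follow-up.

```python
import math
from statistics import NormalDist


def _z(p):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(p)


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Participants per group to detect a true difference `delta` between
    two means with common standard deviation `sd`."""
    z_total = _z(1 - alpha / 2) + _z(power)
    return math.ceil(2 * (z_total * sd / delta) ** 2)


def n_per_group_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Participants per group to detect a difference between two event
    rates (simple unpooled-variance formula)."""
    z_total = _z(1 - alpha / 2) + _z(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil(z_total ** 2 * variance / (p_control - p_treatment) ** 2)


if __name__ == "__main__":
    print(n_per_group_means(delta=5, sd=10))               # 80% power
    print(n_per_group_means(delta=5, sd=10, power=0.90))   # 90% power
    print(n_per_group_proportions(0.30, 0.15))
```

For example, detecting a minimum clinically relevant difference of 5 units with an SD of 10, alpha 0.05 and 80% power gives 63 participants per group; raising the power to 90% gives 85 per group.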

Table 1: Choosing the correct statistical test.
Kruskal-Wallis (non-parametric*)
Chi-squared test for trend (for ordered categories, e.g., mild, moderate, severe pain)
Linear or multiple linear regression (for assessing the effect of one or more explanatory variables)
Logistic regression (for assessing the effect of one or more explanatory variables)
* Non-parametric indicates the equivalent non-parametric test, which does not assume that the data follow a particular distribution. T-tests and ANOVA assume that the data being analysed follow a normal distribution with similar variance in each group. This is not intended to be an exhaustive list; for details of other statistical methods, consult a suitable textbook or seek advice from a statistician.

- Report p-values in full (to 2 or 3 decimal places). Very small values can be reported as p <0.001, but avoid the use of asterisks (*, **). Do not use 'NS' or '>0.05' for results that are not statistically significant.
- For regression models, report a measure of the 'goodness of fit' of the model to the data, e.g., R² or a Hosmer-Lemeshow test.
- Make sure it is clear which statistics are being reported, either through labels in the table or in a footnote; for example, state that 34 (2.8) is the mean and standard error.
- Report the number of participants in each group in tables of descriptive data. Also provide the numbers included in each analysis in all tables and/or figures that contain results. Check that percentages are correct.
- Report the results of regression models in full in the tables (i.e., regression coefficients and SE, or effect sizes with 95% CI or SE, and p-values, for all the terms in each model).
- All figures should have clear titles.
- All figures should have clearly labelled axes with units, and any symbols should be labelled. It is quite common to see symbols on figures without any indication of what they represent.
- Do not make your figures too complicated by including too much information or too many groups.

Discussion
- Only discuss results that have been presented in the results section. It is a common error to find extra results in the discussion that have not previously been reported.
- Do not repeat effect sizes and confidence intervals from the results.
- Check that all results have been interpreted correctly in terms of statistical and clinical significance and the direction of effects.
Swiss Medical Weekly • PDF of the online version • www.smw.ch -