Skip to contents

Epidemiologic studies can suffer from more than one bias. Bias functions in episensr can be applied sequentially to quantify bias resulting from multiple biases.

Following the example in Lash et al., we can use the study by Chien et al.. It is a case-control study looking at the association between antidepressant use and the occurrence of breast cancer. The observed OR was 1.2 [0.9–1.6].

chien <- matrix(c(118, 832, 103, 884),
                dimnames = list(c("BC+", "BC-"), c("AD+", "AD-")),
                nrow = 2, byrow = TRUE)
AD+ AD-
BC+ 118 832
BC- 103 884

Records on medication use differed between participants, from pharmacy records and self-reported use, leading to misclassification:

## Loading required package: ggplot2
## Thank you for using episensr!
## This is version 2.0.0 of episensr
## Type 'citation("episensr")' for citing this R package in publications.
seq_bias1 <- chien %>%
    misclass(., type = "exposure",
             bias_parms = c(24/(24+19), 18/(18+13), 144/(144+2), 130/(130+4)))
seq_bias1
## 
## ── Observed data ───────────────────────────────────────────────────────────────
## • Outcome: BC+
## • Comparing: AD+ vs. AD-
## 
##     AD+ AD-
## BC+ 118 832
## BC- 103 884
##                                        2.5%     97.5%
## Observed Relative Risk: 1.1012443 0.9646019 1.2572431
##    Observed Odds Ratio: 1.2172330 0.9192874 1.6117443
## ── Bias-adjusted measures ──
##                                                              2.5%    97.5%
## Misclassification Bias Corrected Relative Risk: 1.256946                  
##    Misclassification Bias Corrected Odds Ratio: 1.628058 1.113458 2.380487

Controls and cases also enrolled into the study at different rates. By storing the result of misclass() in an object named here seq_bias1, we have access to the various elements returned by the function (see the help() for a given function to know what is returned). The bias-adjusted cell frequencies can be accessed as corr_data.

seq_bias1$corr_data
##          AD+      AD-
## BC+ 192.8332 757.1668
## BC- 133.5114 853.4886

This 2-by-2 table will be used as starting values for the next sequential bias adjustment, selection().

seq_bias2 <- seq_bias1$corr_data %>%
    selection(., bias_parms = c(.734, .605, .816, .756))
seq_bias2
## ── Observed data ───────────────────────────────────────────────────────────────
## • Outcome: BC+
## • Comparing: AD+ vs. AD-
## 
##          AD+      AD-
## BC+ 192.8332 757.1668
## BC- 133.5114 853.4886
##                                      2.5%    97.5%
## Observed Relative Risk: 1.256946 1.132669 1.394858
##    Observed Odds Ratio: 1.628058 1.278900 2.072541
## ── Bias-adjusted measures ──
##                                                 
## Selection Bias Corrected Relative Risk: 1.172097
##    Selection Bias Corrected Odds Ratio: 1.448430

The adjusted OR is now 1.45 Again, the bias-adjusted cell frequencies can be used in the next bias analysis, confounders().

seq_bias2$corr_data
##          AD+      AD-
## BC+ 262.7156 1251.515
## BC- 163.6169 1128.953

The association between antidepressant use and breast cancer was adjusted for various confounders (race/ethnicity, income, etc.). None of these confounders were found to change the association by more than 10%. However, for illustration, we can add the effect of a potential confounder (e.g. physical activity):

seq_bias3 <- seq_bias2$corr_data %>%
    confounders(., type = "OR", bias_parms = c(.8, .299, .436))
seq_bias3
## ── Observed data ───────────────────────────────────────────────────────────────
## • Outcome: BC+
## • Comparing: AD+ vs. AD-
## 
##           AD+       AD-
## BC+  262.7156 1251.5153
## BC-  163.6169 1128.9532
##                                        2.5%    97.5%
##         Crude Odds Ratio: 1.448430 1.172758 1.788903
## Odds Ratio, Confounder +: 1.406219                  
## Odds Ratio, Confounder -: 1.406219
## ── Bias-adjusted measures ──
##                                        OR due to confounding
## Standardized Morbidity Ratio: 1.406219              1.030018
##              Mantel-Haenszel: 1.406219              1.030018

The serially bias-adjusted OR is 1.406. And the adjusted cells by confounders, for regular exercise:

seq_bias3$cfder_data
##          AD+      AD-
## BC+ 66.83851 478.2302
## BC- 48.92144 492.2236

And for no regular exercise:

seq_bias3$nocfder_data
##          AD+      AD-
## BC+ 195.8771 773.2851
## BC- 114.6954 636.7296

The same process can be realized in a probabilistic framework, as each probsens(), probsens.sel() and probsens_conf() provides the adjusted cell frequencies as A1, B1, C1, D1 in sim_df.

Exposed Not exposed
Cases A1 B1
Controls C1 D1
mod1 <- chien %>%
    probsens(., type = "exposure",
             seca = list("trapezoidal", c(.45, .5, .6, .65)),
             seexp = list("trapezoidal", c(.4, .48, .58, .63)),
             spca = list("trapezoidal", c(.95, .97, .99, 1)),
             spexp = list("trapezoidal", c(.96, .98, .99, 1)),
             corr_se = .8, corr_sp = .8)
##  Calculating observed measures
## ⠙ Assign probability distributions
##  Assign probability distributions [15ms]
## 
## ⠙ Simple bias analysis
##  Simple bias analysis [21ms]
## 
## ⠙ Incorporating random error
##  Incorporating random error [42ms]
## 
str(mod1)
## List of 7
##  $ obs_data    : num [1:2, 1:2] 118 103 832 884
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "BC+" "BC-"
##   .. ..$ : chr [1:2] "AD+" "AD-"
##  $ obs_measures: num [1:2, 1:3] 1.101 1.217 0.965 0.919 1.257 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] " Observed Relative Risk:" "    Observed Odds Ratio:"
##   .. ..$ : chr [1:3] " " "2.5%" "97.5%"
##  $ adj_measures: num [1:4, 1:3] 1.071 1.076 1.147 1.155 0.963 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:4] "Relative Risk -- systematic error:" "                      total error:" "   Odds Ratio -- systematic error:" "                      total error:"
##   .. ..$ : chr [1:3] "Median" "p2.5" "p97.5"
##  $ sim_df      :'data.frame':    1000 obs. of  27 variables:
##   ..$ seca   : num [1:1000] 0.491 0.568 0.46 0.573 0.601 ...
##   ..$ seexp  : num [1:1000] 0.445 0.534 0.421 0.496 0.534 ...
##   ..$ spca   : num [1:1000] 0.971 0.977 0.973 0.973 0.97 ...
##   ..$ spexp  : num [1:1000] 0.984 0.985 0.973 0.976 0.983 ...
##   ..$ A1     : num [1:1000] 195 177 214 169 156 ...
##   ..$ B1     : num [1:1000] 755 773 736 781 794 ...
##   ..$ C1     : num [1:1000] 204 170 194 168 166 ...
##   ..$ D1     : num [1:1000] 783 817 793 819 821 ...
##   ..$ prevca : num [1:1000] 0.194 0.177 0.233 0.186 0.152 ...
##   ..$ prevexp: num [1:1000] 0.208 0.176 0.187 0.157 0.154 ...
##   ..$ ppvca  : num [1:1000] 0.801 0.844 0.839 0.828 0.78 ...
##   ..$ ppvexp : num [1:1000] 0.882 0.883 0.783 0.794 0.85 ...
##   ..$ npvca  : num [1:1000] 0.888 0.913 0.856 0.909 0.931 ...
##   ..$ npvexp : num [1:1000] 0.871 0.908 0.88 0.912 0.92 ...
##   ..$ ab     : num [1:1000] 175 159 254 172 149 176 212 182 199 202 ...
##   ..$ bb     : num [1:1000] 775 791 696 778 801 774 738 768 751 748 ...
##   ..$ cb     : num [1:1000] 207 177 199 172 166 162 223 195 181 216 ...
##   ..$ db     : num [1:1000] 780 810 788 815 821 825 764 792 806 771 ...
##   ..$ corr_RR: num [1:1000] 0.919 0.958 1.196 1.024 0.958 ...
##   ..$ corr_OR: num [1:1000] 0.851 0.92 1.445 1.048 0.92 ...
##   ..$ rr_se_b: num [1:1000] 0.0612 0.0629 0.0499 0.0597 0.0646 ...
##   ..$ or_se_b: num [1:1000] 0.115 0.12 0.108 0.119 0.123 ...
##   ..$ z      : num [1:1000] 0.397 -0.718 -2.71 2.796 0.379 ...
##   ..$ tot_RR : num [1:1000] 0.897 1.002 1.369 0.866 0.935 ...
##   ..$ tot_OR : num [1:1000] 0.813 1.003 1.937 0.751 0.878 ...
##   ..$ syst_RR: num [1:1000] 0.996 1.049 1.089 1.027 0.985 ...
##   ..$ syst_OR: num [1:1000] 0.992 1.101 1.187 1.054 0.97 ...
##  $ reps        : num 1000
##  $ fun         : chr "probsens"
##  $ warnings    : NULL
##  - attr(*, "class")= chr [1:3] "episensr" "episensr.probsens" "list"

Each of the lines from mod1$sim_df[, 5:8] can then be tabulated and fed to the next bias function. Be careful, as the number of observations can quickly become unmanageable.

head(mod1$sim_df[, 5:8])
##         A1       B1       C1       D1
## 1 195.2980 754.7020 204.2431 782.7569
## 2 176.7932 773.2068 169.7718 817.2282
## 3 213.7715 736.2285 194.0093 792.9907
## 4 168.7198 781.2802 167.8590 819.1410
## 5 156.1432 793.8568 166.3320 820.6680
## 6 143.5092 806.4908 154.8496 832.1504