This is the code and configuration for `cox-ipw`, which is an R reusable action for the OpenSAFELY framework.
The action:
- Samples data and applies inverse probability weights (IPW)
- Performs survival data setup
- Checks covariate variation
- Fits the specified Cox model
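To make the first step more concrete, here is a minimal Python sketch of sampling with inverse probability weights. It assumes that cases are individuals with the outcome and are all retained, that non-cases are sampled at random up to `controls_per_case` per case, and that each sampled non-case is weighted by the inverse of its sampling probability. Exposed non-cases are kept in full unless `sample_exposed` is TRUE. The `Person` class, its fields and `sample_with_ipw` are hypothetical names, and the action's R implementation may differ in detail.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    patient_id: int
    is_case: bool     # hypothetical field: had the outcome during follow-up
    is_exposed: bool  # hypothetical field: had the exposure
    weight: float = 1.0


def sample_with_ipw(people, controls_per_case=20, sample_exposed=False, seed=137):
    """Keep all cases, sample non-cases, and weight sampled non-cases by 1/p.

    Sketch only: the real action's sampling rules may differ.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    cases = [p for p in people if p.is_case]
    non_cases = [p for p in people if not p.is_case]

    # Exposed non-cases are retained in full unless sample_exposed is True.
    kept = [p for p in non_cases if p.is_exposed and not sample_exposed]
    eligible = [p for p in non_cases if sample_exposed or not p.is_exposed]

    n_sample = min(len(eligible), controls_per_case * len(cases))
    sampled = rng.sample(eligible, n_sample)

    # Inverse of the sampling probability n_sample / len(eligible).
    weight = len(eligible) / n_sample if n_sample else 1.0

    unweighted = [replace(p, weight=1.0) for p in cases + kept]
    weighted = [replace(p, weight=weight) for p in sampled]
    return unweighted + weighted


if __name__ == "__main__":
    cohort = [Person(i, is_case=(i < 2), is_exposed=False) for i in range(102)]
    analysis = sample_with_ipw(cohort, controls_per_case=20)
    controls = [p for p in analysis if not p.is_case]
    print(len(analysis), len(controls), controls[0].weight)  # 42 40 2.5
```

With 2 cases and 100 eligible non-cases, 40 non-cases are retained, each with weight 100/40 = 2.5, so the weighted non-cases still represent all 100 in the Cox model.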
The arguments/options to the action are specified using the flags style (i.e., `--argname=argvalue`). The arguments are as follows:
```
Usage: cox-ipw:[version] [options]

Options:
    --df_input=FILENAME.CSV
        Input dataset csv filename (this is assumed to be within the output
        directory) [default input.csv]

    --ipw=TRUE/FALSE
        Logical, indicating whether sampling and IPW are to be applied
        [default TRUE]

    --sample_exposed=TRUE/FALSE
        Logical, indicating whether exposed individuals should be sampled
        [default FALSE]

    --exposure=EXPOSURE_VARNAME
        Exposure variable name [default exp_date_covid19_confirmed]

    --outcome=OUTCOME_VARNAME
        Outcome variable name [default out_date_vte]

    --strata=VARNAME_1;VARNAME_2;...
        Semi-colon separated list of variable names to be included as strata in
        the regression model [default cov_cat_region]

    --covariate_sex=SEX_VARNAME
        Variable name for the sex covariate; specify argument as NULL to model
        without a sex covariate [default cov_cat_sex]

    --covariate_age=AGE_VARNAME
        Variable name for the age covariate; specify argument as NULL to model
        without an age covariate [default cov_num_age]

    --covariate_other=VARNAME_1;VARNAME_2;...
        Semi-colon separated list of other covariates to be included in the
        regression model; specify argument as NULL to run an age-, age squared-
        and sex-adjusted model only [default
        cov_cat_ethnicity;cov_num_consulation_rate;cov_bin_healthcare_worker;cov_bin_carehome_status]

    --cox_start=VARNAME_1;VARNAME_2;...
        Semi-colon separated list of variable names used to define the start of
        patient follow-up, or a single variable if already defined
        [default pat_index_date]

    --cox_stop=VARNAME_1;VARNAME_2;...
        Semi-colon separated list of variable names used to define the end of
        patient follow-up, or a single variable if already defined [default
        death_date;out_date_vte;vax_date_covid_1]

    --study_start=YYYY-MM-DD
        Study start date; this is used to remove events outside study dates
        [default 2021-06-01]

    --study_stop=YYYY-MM-DD
        Study end date; this is used to remove events outside study dates
        [default 2021-12-14]

    --cut_points=CUTPOINT_1;CUTPOINT_2
        Semi-colon separated list of cut points to be used to define time post
        exposure [default 28;197]

    --controls_per_case=INTEGER
        Number of controls to retain per case in the analysis [default 20]

    --total_event_threshold=INTEGER
        Number of events that must be present for any model to run [default 50]

    --episode_event_threshold=INTEGER
        Number of events that must be present in a time period; if the
        threshold is not met, time periods are collapsed [default 5]

    --covariate_threshold=INTEGER
        Minimum number of individuals per covariate level for the covariate to
        be retained [default 5]

    --age_spline=TRUE/FALSE
        Logical, indicating whether age should be included in the model as a
        spline with knots at 0.1, 0.5, 0.9 [default TRUE]

    --df_output=FILENAME.CSV
        Output data csv filename (this is assumed to be within the output
        directory) [default results.csv]

    --seed=INTEGER
        Random number generator seed passed to IPW sampling [default 137]

    --save_analysis_ready=TRUE/FALSE
        Logical, indicating whether the analysis-ready dataset for Stata should
        be saved [default FALSE]

    --run_analysis=TRUE/FALSE
        Logical, indicating whether the analysis should be run [default TRUE]

    -h, --help
        Show this help message and exit
```
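The flag-style options above map naturally onto a standard argument parser. The following Python sketch is illustrative only (the action itself is written in R): it shows how semi-colon separated lists, the `NULL` keyword and TRUE/FALSE values might be interpreted, and how `--cut_points` could define post-exposure periods that are collapsed when a period has fewer than `--episode_event_threshold` events. It assumes that periods start at day 0 and that a sparse period is merged into a neighbouring one; the helper names are hypothetical, only a subset of options is shown, and the action's actual collapsing rule may differ.

```python
import argparse


def parse_list(value):
    """Split a semi-colon separated value; the literal NULL means 'none'."""
    if value.strip().upper() == "NULL":
        return []
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_int_list(value):
    """Parse e.g. '28;197' into [28, 197]."""
    return [int(item) for item in parse_list(value)]


def parse_bool(value):
    """Accept TRUE/FALSE (case-insensitive), as in the help text."""
    text = value.strip().upper()
    if text not in ("TRUE", "FALSE"):
        raise argparse.ArgumentTypeError(f"expected TRUE or FALSE, got {value!r}")
    return text == "TRUE"


def build_parser():
    # Only a subset of the documented options. String defaults are passed
    # through `type`, so they are converted exactly like command-line values.
    parser = argparse.ArgumentParser(prog="cox-ipw")
    parser.add_argument("--df_input", default="input.csv")
    parser.add_argument("--ipw", type=parse_bool, default="TRUE")
    parser.add_argument("--strata", type=parse_list, default="cov_cat_region")
    parser.add_argument("--cut_points", type=parse_int_list, default="28;197")
    parser.add_argument("--episode_event_threshold", type=int, default=5)
    return parser


def episodes_from_cut_points(cut_points):
    """Turn cut points (days since exposure) into [start, stop) periods."""
    bounds = [0] + list(cut_points)
    return list(zip(bounds[:-1], bounds[1:]))


def collapse_episodes(episodes, events, threshold):
    """Merge adjacent periods until each has at least `threshold` events.

    Assumed rule: a sparse period is merged into the next one, and a sparse
    final period is merged back into the previous one.
    """
    merged = []     # completed [start, stop, events] entries
    pending = None  # period still accumulating events
    for (start, stop), n in zip(episodes, events):
        if pending is None:
            pending = [start, stop, n]
        else:
            pending[1], pending[2] = stop, pending[2] + n
        if pending[2] >= threshold:
            merged.append(pending)
            pending = None
    if pending is not None:
        if merged:
            merged[-1][1] = pending[1]
            merged[-1][2] += pending[2]
        else:
            merged.append(pending)
    return [((start, stop), n) for start, stop, n in merged]


if __name__ == "__main__":
    args = build_parser().parse_args(["--strata=NULL", "--cut_points=28;197"])
    print(args.ipw, args.strata, args.cut_points)  # True [] [28, 197]
    periods = episodes_from_cut_points(args.cut_points)
    print(periods)  # [(0, 28), (28, 197)]
    print(collapse_episodes(periods, [3, 10], args.episode_event_threshold))
    # [((0, 197), 13)]
```

Passing string defaults through the same `type` functions keeps the defaults and user-supplied values on a single code path, which mirrors how the help text states defaults in the same `a;b;c` form users type.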
This action can be specified in the `project.yaml` with its options at their default values as follows, where you should replace `[version]` with the latest tag from here, e.g., `v0.0.1`. Note that no space is allowed between `cox-ipw:` and `[version]`.
```yaml
generate_study_population:
  run: cohortextractor:latest generate_cohort --study-definition study_definition
  outputs:
    highly_sensitive:
      cohort: output/input.csv

cox_ipw:
  run: cox-ipw:[version]
  needs:
  - generate_study_population
  outputs:
    highly_sensitive:
      analysis_ready: output/ready-*.dta
    moderately_sensitive:
      arguments: output/args-results.csv
      estimates: output/results.csv
```
Note that the csv file of argument values is automatically named with `args-` prepended to the name of the output data csv file. Hence, both the output data file and the file of argument values should be listed as `moderately_sensitive` outputs, as shown above.
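Because the argument file name is derived from `--df_output`, the two `moderately_sensitive` entries can be worked out mechanically. Here is a small Python sketch, assuming outputs are written to the `output` directory as in the `project.yaml` example; `expected_outputs` is a hypothetical helper for checking your own configuration, not part of the action.

```python
def expected_outputs(df_output="results.csv", output_dir="output"):
    """Paths to list under moderately_sensitive for a given --df_output value."""
    return {
        # The argument file is the output file name with "args-" prepended.
        "arguments": f"{output_dir}/args-{df_output}",
        "estimates": f"{output_dir}/{df_output}",
    }


if __name__ == "__main__":
    print(expected_outputs())
    # {'arguments': 'output/args-results.csv', 'estimates': 'output/results.csv'}
    print(expected_outputs("results_2.csv")["arguments"])  # output/args-results_2.csv
```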
This action can be run with non-default arguments as follows (in YAML, `>` indicates that the subsequent nested lines should be folded into a single line).
```yaml
generate_study_population:
  run: cohortextractor:latest generate_cohort --study-definition study_definition
  outputs:
    highly_sensitive:
      cohort: output/input.csv

cox_ipw_2:
  run: >
    cox-ipw:[version]
    --df_output=results_2.csv
  needs:
  - generate_study_population
  outputs:
    highly_sensitive:
      analysis_ready: output/ready-*.dta
    moderately_sensitive:
      arguments: output/args-results_2.csv
      estimates: output/results_2.csv
```
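To see what command the folded `run:` value amounts to, the following Python sketch approximates YAML's `>` style for the simple case used here: lines at the same indentation are joined with single spaces. It is a deliberate simplification (blank lines are rejected, line breaks around more-indented lines are not preserved, and the trailing newline that YAML keeps by default is dropped), so a real YAML parser should be used for actual files. `fold_simple_block` is a hypothetical name.

```python
def fold_simple_block(block: str) -> str:
    """Join the lines of a simple, uniformly indented folded block with spaces."""
    lines = [line.strip() for line in block.strip("\n").splitlines()]
    if any(not line for line in lines):
        raise ValueError("blank lines are not supported by this sketch")
    return " ".join(lines)


if __name__ == "__main__":
    block = """
    cox-ipw:v0.0.1
    --df_output=results_2.csv
"""
    print(fold_simple_block(block))  # cox-ipw:v0.0.1 --df_output=results_2.csv
```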
Please see DEVELOPERS.md.
For more information about reusable actions, see here.
The OpenSAFELY framework is a Trusted Research Environment (TRE) for electronic health records research in the NHS, with a focus on public accountability and research quality.
Read more at OpenSAFELY.org.
As standard, research projects have an MIT license.