Cohort-report generates a report for each variable in an input file.
Consider the following extract from a study's project.yaml:
    actions:
      generate_study_population:
        run: cohortextractor:latest generate_cohort
        outputs:
          highly_sensitive:
            cohort: output/input.csv
      generate_report:
        run: cohort-report:v3.0.0 output/input.csv
        needs: [generate_study_population]
        config:
          variable_types:
            age: int
            sex: categorical
            ethnicity: categorical
            bmi: float
            diabetes: binary
            chronic_liver_disease: binary
            imd: categorical
            region: categorical
            stp: categorical
            rural_urban: categorical
            prior_covid_date: date
          output_path: output/cohort_reports_outputs
        outputs:
          moderately_sensitive:
            reports: output/cohort_reports_outputs/descriptives_input.html
The generate_report action generates a report that contains a table and a chart for each variable in the input file.
The table contains unsafe statistics and should be checked thoroughly before it is released.
The chart contains statistics that have been made safe.
Cells in the underlying frequency table are redacted if:
- They contain fewer than 10 units
- They contain more than 90% of the total number of units
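The two redaction rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not cohort-report's actual implementation; the function name `redact_frequencies` and the use of `None` as the redaction marker are assumptions.

```python
from typing import Dict, Optional

MIN_UNITS = 10    # cells with fewer units than this are redacted
MAX_PERCENT = 90  # cells holding more than this percentage of the total are redacted


def redact_frequencies(counts: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with unsafe cells replaced by None (hypothetical helper)."""
    total = sum(counts.values())
    redacted = {}
    for category, n in counts.items():
        too_small = n < MIN_UNITS
        # Integer arithmetic avoids floating-point rounding at the 90% boundary.
        too_dominant = n * 100 > MAX_PERCENT * total
        redacted[category] = None if (too_small or too_dominant) else n
    return redacted


example = redact_frequencies({"F": 480, "M": 515, "I": 5})
print(example)  # {'F': 480, 'M': 515, 'I': None}
```

Here only the `I` cell is redacted, because it holds fewer than 10 units; neither of the other cells exceeds 90% of the 1000 units.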
The run property passes an input file to a named version of cohort-report.
In this case, it passes output/input.csv to v3.0.0 of cohort-report.
The config property passes configuration to cohort-report; for more information, see Configuration.
Notice that the report is called descriptives_[the name of the input file, without the extension].html.
It is saved to the output_path; for more information, see Configuration.
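As a rough illustration of the naming rule, the following sketch derives the report path from an input file and an output path. The helper `report_path` is hypothetical. It also assumes a single extension such as `.csv`, because `Path.stem` removes only the last suffix, and this document does not say how a double extension such as `.csv.gz` is handled.

```python
from pathlib import Path


def report_path(input_file: str, output_path: str) -> Path:
    """Build the report path for one input file (hypothetical helper)."""
    stem = Path(input_file).stem  # "output/input.csv" -> "input"
    return Path(output_path) / f"descriptives_{stem}.html"


print(report_path("output/input.csv", "output/cohort_reports_outputs").as_posix())
# output/cohort_reports_outputs/descriptives_input.html
```

The result matches the path listed under outputs in the project.yaml extract.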
Configuration
output_path, which defaults to
Save the outputs to the given path.
If the given path does not exist, then it is created.
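The "created if it does not exist" behaviour could be reproduced along these lines. This is a minimal sketch using the standard library, with `ensure_output_path` as a hypothetical name; it is not taken from cohort-report's source.

```python
import tempfile
from pathlib import Path


def ensure_output_path(output_path: str) -> Path:
    """Create the output directory, including parents, if it is missing."""
    path = Path(output_path)
    # exist_ok=True makes the call a no-op when the directory already exists.
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    created = ensure_output_path(f"{tmp}/output/cohort_reports_outputs")
    print(created.is_dir())  # True
```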
variable_types, which is required for .csv.gz input files.
Cast the given variables to the given types.
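To make the idea of casting concrete, here is a small standard-library sketch that applies a variable_types mapping to CSV rows. The `CASTERS` table is an assumption about what each type name might mean in Python terms; cohort-report's real conversions may differ, and missing values are not handled here.

```python
import csv
import io
from datetime import date

# Assumed mapping from the type names used in project.yaml to Python converters.
CASTERS = {
    "int": int,
    "float": float,
    "categorical": str,
    "binary": lambda value: bool(int(value)),
    "date": date.fromisoformat,
}


def cast_rows(csv_text, variable_types):
    """Yield each CSV row as a dict with the configured variables cast."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, type_name in variable_types.items():
            row[name] = CASTERS[type_name](row[name])
        yield row


sample = "age,sex,bmi,diabetes,prior_covid_date\n54,F,27.5,1,2020-11-03\n"
types = {"age": "int", "sex": "categorical", "bmi": "float",
         "diabetes": "binary", "prior_covid_date": "date"}
rows = list(cast_rows(sample, types))
print(rows[0]["age"], rows[0]["diabetes"], rows[0]["prior_covid_date"].year)
# 54 True 2020
```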
Multiple input files
The run property can pass multiple input files to a named version of cohort-report.
    actions:
      # 3.0.0.
      generate_report:
        run: cohort-report:v3.0.0 output/input_2021-01-01.csv output/input_2021-02-01.csv
        # 3.0.0.
However, if one or more input files are .csv.gz files, then variable_types is required; the given variables are cast to the given types in all input files.
The action fails if an input file does not have the given variables.
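The failure condition can be illustrated with a check that every input file contains every configured variable. This is a sketch under stated assumptions: `check_variables` is a hypothetical helper, the files are passed as in-memory CSV text rather than read from disk, and the exception type is chosen for illustration only.

```python
import csv
import io


def check_variables(files, variable_types):
    """Raise ValueError if any input file lacks a configured variable.

    `files` maps a file name to its CSV text (a stand-in for reading from disk).
    """
    for name, text in files.items():
        header = next(csv.reader(io.StringIO(text)), [])
        missing = [v for v in variable_types if v not in header]
        if missing:
            raise ValueError(f"{name} is missing variables: {missing}")


files = {
    "output/input_2021-01-01.csv": "patient_id,age,sex\n1,54,F\n",
    "output/input_2021-02-01.csv": "patient_id,age\n2,61\n",
}
try:
    check_variables(files, {"age": "int", "sex": "categorical"})
except ValueError as error:
    print(error)
# output/input_2021-02-01.csv is missing variables: ['sex']
```

Because the same variable_types mapping applies to all input files, one file without a `sex` column is enough to stop the whole run in this sketch.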
Please see DEVELOPERS.md.