edstr_config(): pipeline configuration

library(edstr)

edstr_config() sets global options that all downstream functions rely on. It must be called before any other edstr_* function.

Required arguments

Two arguments are mandatory: the output directory and the file name prefix.

edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study"
)

The directory is created automatically if it does not exist. Output files are named {edstr_filename}_{step}.parquet for the import and clean steps (e.g. my_study_import.parquet, my_study_clean.parquet) and {edstr_filename}_{step}.rds for the extract step.
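As a rough illustration of the naming scheme, the expected output paths can be assembled with base R. The helper expected_path() below is hypothetical and not part of edstr. It only mirrors the {edstr_filename}_{step} pattern described above, assuming the step names are import, clean, and extract.

```r
# Hypothetical helper (not an edstr function): builds the path an output
# file would have under the naming scheme described above.
expected_path <- function(dirname, filename, step) {
  # Assumption: import and clean are written as parquet, extract as rds.
  ext <- if (step %in% c("import", "clean")) "parquet" else "rds"
  file.path(dirname, paste0(filename, "_", step, ".", ext))
}

steps <- c("import", "clean", "extract")
paths <- vapply(
  steps,
  function(s) expected_path("output/my_study", "my_study", s),
  character(1)
)
print(unname(paths))
```

A helper like this can be handy for checking with file.exists() which steps have already produced output.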

Glue syntax

Both edstr_dirname and edstr_filename support glue expressions, which are evaluated at call time.

edstr_config(
  edstr_dirname = "output/{Sys.Date()}",
  edstr_filename = "study_{format(Sys.Date(), '%Y%m%d')}"
)
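To see what such templates resolve to, the same strings can be passed to glue::glue() directly. This sketch assumes edstr evaluates the templates the way glue::glue() does. It uses a fixed run_date (a name introduced here for illustration) in place of Sys.Date() so that the result is reproducible.

```r
library(glue)

# Fixed date standing in for Sys.Date(), so the output is reproducible.
run_date <- as.Date("2024-03-15")

# Each {...} is evaluated as R code and its result is spliced into the string.
dirname_resolved  <- glue("output/{run_date}")
filename_resolved <- glue("study_{format(run_date, '%Y%m%d')}")

print(dirname_resolved)   # output/2024-03-15
print(filename_resolved)  # study_20240315
```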

Text column

Setting edstr_text once avoids repeating the text column name in every call to edstr_clean(), edstr_extract(), and edstr_view().

edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study",
  edstr_text = "note_text"
)

If edstr_text is omitted, the text column must be passed explicitly to each of these functions.

Caching behaviour

The edstr_overwrite option controls what happens when an output file already exists.

Value	Behaviour
`TRUE`	Overwrite without prompting
`FALSE`	Load the cached file without prompting
`NULL`	Show an interactive menu (load / overwrite / cancel)

NULL is the default, which is useful during interactive development. Set it to TRUE when re-running a full pipeline and to FALSE when resuming work on a cached dataset.

edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study",
  edstr_overwrite = TRUE
)
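The three edstr_overwrite values can be read as a small decision rule. The function cache_action() below is a hypothetical sketch, not edstr's implementation. It returns a label for the action instead of performing it, and it assumes that a step is simply computed when no cached file exists yet.

```r
# Hypothetical decision rule (not edstr's actual code): maps the state of the
# cache and the edstr_overwrite value to the action described in the table.
cache_action <- function(file_exists, overwrite = NULL) {
  if (!file_exists) return("compute")         # assumption: nothing cached yet
  if (is.null(overwrite)) return("prompt")    # NULL: ask the user
  if (isTRUE(overwrite)) return("overwrite")  # TRUE: recompute and replace
  "load"                                      # FALSE: reuse the cached file
}

cache_action(file_exists = TRUE, overwrite = TRUE)    # "overwrite"
cache_action(file_exists = TRUE, overwrite = FALSE)   # "load"
cache_action(file_exists = TRUE, overwrite = NULL)    # "prompt"
cache_action(file_exists = FALSE, overwrite = FALSE)  # "compute"
```

In a non-interactive run such as Rscript or a scheduled job, a menu cannot be answered, so setting TRUE or FALSE explicitly is the safer choice there.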

Additional options

Any extra named arguments are passed directly to options(), which is useful for setting project-level options in the same call as the edstr configuration.

edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study",
  dplyr.summarise.inform = FALSE
)
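The forwarding described above can be pictured with a few lines of base R. forward_options() is a hypothetical stand-in, not edstr's code. It only shows that an option set this way can be read back with getOption() like any other global option.

```r
# Hypothetical stand-in for the forwarding step (not edstr's actual code):
# every extra named argument is handed to options() unchanged.
forward_options <- function(...) {
  options(...)
}

# options() returns the previous values, which are kept for restoring later.
old <- forward_options(dplyr.summarise.inform = FALSE)

# The option can now be read back like any other global option.
inform <- getOption("dplyr.summarise.inform")
print(inform)  # FALSE

# Restore the previous state so the sketch leaves no trace in the session.
options(old)
```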