
edstr_import() executes a SQL query against an Oracle database and saves the result as a Parquet file. It requires edstr_config() to be called first.

Prerequisites

edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study",
  edstr_text = "note_text",
  edstr_overwrite = FALSE
)

Running a query

The query argument accepts either a SQL string or a path to a .sql file. Connection parameters (user, password, tns) configure access to the Oracle database.

df_import <- edstr_import(
  query = "SELECT * FROM clinical_notes WHERE rownum <= 1000",
  user = "my_user"
)

Using a .sql file keeps queries out of R scripts and makes them easier to version:

df_import <- edstr_import(
  query = "sql/clinical_notes.sql",
  user = "my_user"
)
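One way to accept either inline SQL or a .sql path in a single argument is to check whether the value looks like an existing .sql file and, if so, read it into one string. The sketch below is a minimal illustration under that assumption. resolve_query() is a hypothetical helper written for this example and is not part of edstr; the package may resolve the argument differently.

```r
# Hypothetical helper, not part of edstr: resolve a `query` argument that may be
# either inline SQL or a path to a .sql file.
resolve_query <- function(query) {
  # Treat the value as a file path only if it ends in .sql and exists on disk
  if (grepl("\\.sql$", query, ignore.case = TRUE) && file.exists(query)) {
    # Read the file and join its lines into a single SQL string
    paste(readLines(query, warn = FALSE), collapse = "\n")
  } else {
    # Otherwise assume the value is already SQL and return it unchanged
    query
  }
}

# Inline SQL is returned as-is
resolve_query("SELECT * FROM clinical_notes WHERE rownum <= 1000")

# A .sql file is read into one string
tmp <- tempfile(fileext = ".sql")
writeLines(c("SELECT *", "FROM clinical_notes"), tmp)
resolve_query(tmp)
```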

Limiting rows

The head argument appends a FETCH FIRST n ROWS ONLY clause to the query, where n is the value of head. This is useful during development and testing because it limits the number of rows returned without modifying the SQL itself.

df_import <- edstr_import(
  query = "sql/clinical_notes.sql",
  head = 500,
  user = "my_user"
)
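Appending a row-limiting clause is plain string manipulation. The sketch below assumes the query does not already end in a row-limiting clause or a trailing comment, and strips a trailing semicolon so the clause can follow the final statement. add_fetch_first() is a hypothetical helper for illustration only, not edstr's implementation.

```r
# Hypothetical helper, not part of edstr: append Oracle's row-limiting clause,
# mimicking what the `head` argument is described as doing.
add_fetch_first <- function(sql, head = NULL) {
  # No limit requested: leave the query untouched
  if (is.null(head)) {
    return(sql)
  }
  stopifnot(is.numeric(head), length(head) == 1, head >= 1)
  # Drop trailing whitespace and any trailing semicolon before appending
  sql <- sub("[[:space:];]+$", "", sql)
  sprintf("%s FETCH FIRST %d ROWS ONLY", sql, as.integer(head))
}

add_fetch_first("SELECT * FROM clinical_notes;\n", head = 500)
# → "SELECT * FROM clinical_notes FETCH FIRST 500 ROWS ONLY"
```

Because the clause is added at the end, a query that ends with ORDER BY still produces valid Oracle syntax, since FETCH FIRST follows ORDER BY.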

Loading cached data

When the output Parquet file already exists, edstr_import() can skip the database query and load the saved file instead. Whether it does so depends on the edstr_overwrite option set in edstr_config(). Once the data is cached, edstr_import() can be called without arguments:

df_import <- edstr_import()
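The cache decision can be pictured as a check on the output file and the overwrite flag. This is a hedged sketch, not edstr's code, and it rests on an assumption the text above does not spell out: that the query runs when the output file is missing or when overwriting is enabled, and that the cached file is loaded otherwise. needs_query() is a hypothetical name.

```r
# Hypothetical sketch, not part of edstr. ASSUMPTION: query the database if the
# output file is missing or overwrite is TRUE; otherwise load the cached file.
needs_query <- function(output_path, overwrite = FALSE) {
  !file.exists(output_path) || isTRUE(overwrite)
}

out <- tempfile(fileext = ".parquet")
needs_query(out)                    # file missing: query the database (TRUE)
invisible(file.create(out))         # simulate an existing cached file
needs_query(out)                    # file present, overwrite = FALSE: use cache (FALSE)
needs_query(out, overwrite = TRUE)  # file present, overwrite = TRUE: re-query (TRUE)
```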

Column names

By default, lower = TRUE converts all column names to lowercase after import. Set lower = FALSE to preserve the original casing from the database.
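Oracle stores unquoted identifiers in uppercase, so a column such as NOTE_TEXT usually arrives in uppercase. The base R snippet below shows the kind of renaming that lower = TRUE describes. It illustrates the effect only and is not edstr's own code.

```r
# Illustration of lowercasing column names in base R (not edstr's implementation)
df <- data.frame(NOTE_ID = 1:2, Note_Text = c("a", "b"))
names(df) <- tolower(names(df))
names(df)
# → c("note_id", "note_text")
```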