edstr_view(): interactive exploration

library(edstr)

edstr_view() searches a text column for regex patterns and displays the matching tokens, their counts, and the highlighted text. Unlike other pipeline functions, it does not save any files: it is designed for fast, iterative pattern development before running edstr_extract().

Prerequisites

edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study",
  edstr_text = "note_text",
  edstr_overwrite = FALSE
)

df_clean <- edstr_clean()

Basic usage

Pass a regex pattern to search for in the text column.

result <- edstr_view(
  data = df_clean,
  pattern = "fractur"
)

The function prints a frequency table of distinct matches and a summary of match counts, then invisibly returns a list with three elements:

match: a tibble of all matches (one row per match, with the document ID)
count: a tibble of distinct matches sorted by frequency
text: highlighted output from stringr::str_view()
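To make the shape of the first two elements concrete, here is a minimal sketch that builds comparable objects with stringr and base R. It is an illustration only, not edstr's implementation: the toy data frame `notes`, its column names, and the names `matches` and `counts` are all assumptions made for this example.

```r
library(stringr)

# Hypothetical toy data: one row per document
notes <- data.frame(
  id = 1:3,
  note_text = c("fracture du col", "fractures multiples", "pas de lesion")
)

# One list element per document, holding every token that contains the pattern
hits <- str_extract_all(notes$note_text, "\\w*fractur\\w*")

# Analogue of the match element: one row per match, with the document ID
matches <- data.frame(
  id = rep(notes$id, lengths(hits)),
  match = unlist(hits)
)

# Analogue of the count element: distinct matches sorted by frequency
counts <- as.data.frame(sort(table(matches$match), decreasing = TRUE))

print(matches)
print(counts)
```

Documents without a match contribute no rows to `matches`, which is why the third note is absent from both tables.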

Capturing multi-word expressions

The ngrams argument sets the maximum number of tokens in each captured expression, counting the matched token itself. This is useful for discovering common multi-word patterns.

result <- edstr_view(
  data = df_clean,
  pattern = "fractur",
  ngrams = 3
)

With ngrams = 3, a match on "fractur" captures up to two additional tokens, e.g. "fracture col femoral" or "fracture extremite superieure".
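One way to picture this behaviour is a regex that matches the token containing the pattern and then up to ngrams - 1 following tokens. The sketch below uses stringr directly and assumes whitespace-separated tokens. How edstr_view() builds its expression internally is not shown here, so the variable `rx` and the sample notes are hypothetical.

```r
library(stringr)

# Hypothetical sample notes
notes <- c("fracture col femoral gauche", "pas de fracture")

pattern <- "fractur"
ngrams <- 3

# Token containing the pattern, then up to (ngrams - 1) further tokens
rx <- paste0("\\b\\w*", pattern, "\\w*(?:\\s+\\w+){0,", ngrams - 1, "}")

captured <- str_extract_all(notes, rx)
print(captured)
```

In the first note the capture stops after three tokens, and in the second it stops at the end of the text because no further tokens follow the match.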

Applying cleaning rules inline

The replace argument accepts the same format as edstr_clean() and applies replacements before matching. This lets you test cleaning rules without modifying the data.

result <- edstr_view(
  data = df_clean,
  pattern = "diabete",
  replace = c("diabète" = "diabete"),
  ngrams = 2
)
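The effect of a replacement rule on matching can be checked in isolation with stringr::str_replace_all(), which accepts a named character vector of pattern and replacement pairs. This is a sketch of the idea only. It assumes the rule is applied to the text before the pattern is searched, as described above, and the sample notes are made up.

```r
library(stringr)

# Hypothetical sample notes, one with an accent and one without
notes <- c("diabète type 2", "diabete gestationnel")

# Same named-vector format as the replace argument in the call above
rules <- c("diabète" = "diabete")

# Without the rule, the unaccented pattern misses the accented spelling
before <- str_detect(notes, "diabete")

# With the rule applied first, both spellings match
cleaned <- str_replace_all(notes, rules)
after <- str_detect(cleaned, "diabete")

print(before)
print(after)
```

The original `notes` vector is left unchanged, which mirrors the point that inline rules do not modify the data.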

Typical workflow

A common pattern is to iterate with edstr_view() until the regex captures the right matches, then use the finalised pattern in edstr_extract():

Start broad: edstr_view(data, pattern = "diabet", ngrams = 3)
Review the count table to identify common expressions and false positives
Refine the pattern or add exclusions
Once satisfied, pass the pattern as a concept to edstr_extract()
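The refinement step can be sketched with stringr alone. The example below starts from a broad pattern, tabulates what it captures, then adds an exclusion for a negated mention using a negative lookbehind. The sample notes and the exclusion phrase "pas de " are assumptions for illustration, and whether edstr_extract() handles lookarounds the same way should be checked against its documentation.

```r
library(stringr)

# Hypothetical sample notes
notes <- c(
  "diabete type 2",
  "diabetique connu",
  "pas de diabete",
  "diabete type 2"
)

# Steps 1 and 2: start broad and review the most frequent expressions
broad <- unlist(str_extract_all(notes, "\\w*diabet\\w*(?:\\s+\\w+){0,2}"))
print(sort(table(broad), decreasing = TRUE))

# Step 3: exclude mentions directly preceded by the negation "pas de "
refined <- "(?<!pas de )\\bdiabet\\w*"
keep <- str_detect(notes, refined)
print(notes[keep])
```

A fixed-width lookbehind like this only covers one exact phrasing, so the count table is worth reviewing again after each refinement.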