edstr_view() searches for regex patterns in a text column and displays matching tokens, counts, and highlighted text. Unlike other pipeline functions, it does not save any files — it is designed for fast, iterative pattern development before running edstr_extract().
Prerequisites
edstr_config(
  edstr_dirname = "output/my_study",
  edstr_filename = "my_study",
  edstr_text = "note_text",
  edstr_overwrite = FALSE
)
df_clean <- edstr_clean()
Basic usage
Pass a regex pattern to search for in the text column.
result <- edstr_view(
  data = df_clean,
  pattern = "fractur"
)
The function prints a frequency table of distinct matches and a summary of match counts, then invisibly returns a list with three elements:
- match: a tibble of all matches (one row per match, with the document ID)
- count: a tibble of distinct matches sorted by frequency
- text: highlighted output from stringr::str_view()
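To make the shape of the match and count elements concrete, here is a rough sketch that builds comparable objects with stringr and base R. It is not the package's implementation: the sample notes, the column names doc_id and token, and the use of plain data frames instead of tibbles are all assumptions made for illustration.

```r
library(stringr)

# Hypothetical sample notes, not taken from the package
notes <- c(
  "fracture du col femoral",
  "pas de fracture, fractures anciennes",
  "diabete de type 2"
)

pattern <- "fractur\\w*"

# Extract every match per document (a list with one character vector per note)
hits <- str_extract_all(notes, pattern)

# One row per match, keeping the index of the source document
match <- data.frame(
  doc_id = rep(seq_along(notes), lengths(hits)),
  token  = unlist(hits)
)

# Distinct matches with their frequency, most frequent first
count <- as.data.frame(sort(table(token = match$token), decreasing = TRUE))

# stringr::str_view(notes, pattern) would give the highlighted view of the text
```

Printing match and count side by side shows how a single document can contribute several rows to match while count collapses them into distinct tokens.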
Capturing multi-word expressions
The ngrams argument controls how many tokens are captured in total, starting from the initial match. This is useful for discovering common multi-word patterns.
result <- edstr_view(
  data = df_clean,
  pattern = "fractur",
  ngrams = 3
)
With ngrams = 3, a match on "fractur" captures up to two additional tokens, e.g. "fracture col femoral" or "fracture extremite superieure".
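The capture behaviour can be approximated with a plain regular expression. The sketch below assumes tokens are whitespace-separated words, which may differ from how the package tokenises text; the helper ngram_pattern() and the sample notes are hypothetical and not part of the package.

```r
library(stringr)

# Hypothetical sample notes, not taken from the package
notes <- c(
  "patient admis pour fracture col femoral droit",
  "fracture extremite superieure humerus",
  "pas de fracture"
)

# Hypothetical helper: extend a pattern to the end of its word, then allow
# up to (ngrams - 1) further whitespace-separated words
ngram_pattern <- function(pattern, ngrams = 1) {
  paste0(pattern, "\\w*(?:\\s+\\w+){0,", ngrams - 1, "}")
}

# Greedy matching takes as many following words as are available, up to the limit
matches <- unlist(str_extract_all(notes, ngram_pattern("fractur", 3)))
matches
```

A note that ends right after the matched word still yields a match, just a shorter one, which is why the text above says "up to" two additional tokens.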
Applying cleaning rules inline
The replace argument accepts the same format as edstr_clean() and applies replacements before matching. This lets you test cleaning rules without modifying the data.
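As a stand-alone illustration of replacing before matching, the sketch below uses stringr directly. It assumes the rules are a named character vector of the form c(pattern = replacement), as in the package example that follows; the sample notes are hypothetical, and this is not the package's own code.

```r
library(stringr)

# Hypothetical sample notes, not taken from the package
notes <- c("diabète de type 2", "diabete gestationnel")

# Named vector: names are the patterns, values are the replacements
rules <- c("diabète" = "diabete")

# Apply the rules to a copy; the original notes vector is left untouched
cleaned <- str_replace_all(notes, rules)

# Match against the cleaned copy so that both spellings are found
found <- str_detect(cleaned, "diabete")
```

Because the rules are applied to a copy, a rule that turns out to be too aggressive can be discarded without re-running the cleaning step.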
result <- edstr_view(
  data = df_clean,
  pattern = "diabete",
  replace = c("diabète" = "diabete"),
  ngrams = 2
)
Typical workflow
A common pattern is to iterate with edstr_view() until the regex captures the right matches, then use the finalised pattern in edstr_extract():
- Start broad: edstr_view(data, pattern = "diabet", ngrams = 3)
- Review the count table to identify common expressions and false positives
- Refine the pattern or add exclusions
- Once satisfied, pass the pattern as a concept to edstr_extract()
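The refinement step can be sketched with stringr alone. In this example the false positive ("diabetologue") and the sample notes are invented for illustration, and the exclusion is written as a regex negative lookahead; the package may offer other ways to express exclusions.

```r
library(stringr)

# Hypothetical sample notes, not taken from the package
notes <- c(
  "diabete de type 2",
  "diabetique sous insuline",
  "vu par le diabetologue",
  "pas de diabete"
)

# Step 1: start broad and tabulate what the pattern catches
broad <- unlist(str_extract_all(notes, "diabet\\w*"))
broad_counts <- sort(table(broad), decreasing = TRUE)

# Step 2: exclude the unwanted term with a negative lookahead,
# so "diabet" followed by "olog" no longer matches
refined <- unlist(str_extract_all(notes, "diabet(?!olog)\\w*"))
```

Comparing broad and refined after each change makes it easy to confirm that an exclusion removed only the intended matches.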