Explore text matches interactively — edstr

Search for regex patterns in a text column and display the matching tokens, their counts, and the highlighted text. Unlike other pipeline functions, edstr_view() does not save results; it is meant for iterating on patterns before extraction.

Usage

edstr_view(
  data,
  text_input = getOption("edstr_text"),
  id = NULL,
  replace = NULL,
  pattern,
  ngrams = 1,
  ...
)

Arguments

data: <data.frame> The data to search.
text_input: <character(1)> Name of the text column. Defaults to the edstr_text option set by edstr_config().
id: <character(1)> Name of the unique identifier column. Detected automatically if not provided.
replace: A named character vector or list of named character vectors. Optional regex replacements applied to the text before matching (see edstr_clean() for details).
pattern: <character(1)> Regex pattern to search for.
ngrams: <integer(1)> Total n-gram window size including the matched token (default 1). For example, ngrams = 3 with pattern = "diabete" matches "diabete type 2".
...: Additional arguments passed to stringr::str_view().
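The base-R sketch below shows one way `replace`, `pattern`, and `ngrams` could combine. It is not the package's implementation, and the helper name `preview_ngrams_sketch()` is hypothetical. It assumes that `replace` follows the `stringr::str_replace_all()` convention (names are regex patterns, values are replacements, applied in order), that tokens are separated by whitespace, and that the n-gram window extends only to the right of the matched token. That last assumption is consistent with the `"diabete type 2"` example above, but it is not confirmed by it.

```r
# Hypothetical sketch, not edstr's implementation.
# Assumptions: `replace` uses names as patterns and values as replacements
# (stringr::str_replace_all() convention); tokens are whitespace-separated;
# the n-gram window extends to the right of the matched token only.
preview_ngrams_sketch <- function(text, pattern, ngrams = 1, replace = NULL) {
  stopifnot(ngrams >= 1)
  text[is.na(text)] <- ""

  # Flatten a list of named character vectors into one named vector,
  # keeping the inner names (the regex patterns).
  if (is.list(replace)) replace <- unlist(unname(replace))

  # Apply each replacement in order before matching.
  for (i in seq_along(replace)) {
    text <- gsub(names(replace)[i], replace[[i]], text, perl = TRUE)
  }

  # Extend the match to the end of its token, then allow up to
  # (ngrams - 1) following whitespace-separated tokens.
  window <- if (ngrams > 1) paste0("(?:\\s+\\S+){0,", ngrams - 1, "}") else ""
  regex <- paste0("(?:", pattern, ")\\S*", window)

  # One character vector of matches per document (character(0) if none).
  regmatches(text, gregexpr(regex, text, perl = TRUE))
}
```

For example, with `ngrams = 3` and `replace = c("\\btype II\\b" = "type 2")`, the text `"diabete type II"` yields `"diabete type 2"`. The text `"diabete gestationnel"` yields a two-token match because only one token follows the matched word.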

Value

Invisibly returns a list with three elements:

match: A tibble of all matches with the id and match columns.
count: A tibble of distinct matches with their frequency.
text: Output of stringr::str_view() for visual inspection.
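The base-R sketch below illustrates how the `match` and `count` elements relate by building both tables from an already-assembled regex. The function name `summarise_matches_sketch()` is hypothetical. The sketch returns plain data frames rather than tibbles and omits the `text` element produced by `stringr::str_view()`. The ordering of `count` (decreasing frequency, then alphabetical) is also an assumption: the example below is consistent with it but does not confirm it.

```r
# Hypothetical sketch, not edstr's implementation: returns base data frames
# instead of tibbles and skips the stringr::str_view() `text` element.
summarise_matches_sketch <- function(data, text_input, id, regex) {
  text <- data[[text_input]]
  text[is.na(text)] <- ""

  # All regex matches per document (character(0) when a document has none).
  hits <- regmatches(text, gregexpr(regex, text, perl = TRUE))

  # `match`: one row per match, repeating each id once per match it contains.
  match <- data.frame(
    id = rep(data[[id]], lengths(hits)),
    match = as.character(unlist(hits)),
    stringsAsFactors = FALSE
  )
  names(match)[1] <- id  # keep the caller's identifier column name

  # `count`: distinct matches with their frequency, most frequent first,
  # ties broken alphabetically (assumed ordering).
  tab <- table(match$match)
  count <- data.frame(match = names(tab), n = as.integer(tab),
                      stringsAsFactors = FALSE)
  count <- count[order(-count$n, count$match), , drop = FALSE]
  rownames(count) <- NULL

  list(match = match, count = count)
}
```

Called on the example data below with the regex `"diabete\\S*(?:\\s+\\S+){0,2}"`, this sketch yields the same two distinct matches, each with `n = 1`, that `edstr_view()` reports.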

Examples

df <- data.frame(
  id = 1:3,
  note = c("diabete type 2", "bilan normal", "diabete gestationnel")
)
edstr_view(data = df, text_input = "note", pattern = "diabete", ngrams = 3)
#> 
#> ── edstr_view ──────────────────────────────────────────────────────────────────
#> 
#> # A tibble: 2 × 2
#>   match                    n
#>   <chr>                <int>
#> 1 diabete gestationnel     1
#> 2 diabete type 2           1
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> Full steps: 0.038 sec elapsed
#> 
#> ℹ Documents: 3 id
#> 
#> ℹ Matches
#>   • Total: 2 across 2 id (66.7% id)
#>   • Distinct: 2
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#>