Search for regex patterns in a text column and display the matching tokens, their counts, and the highlighted text. Unlike other pipeline functions, edstr_view() does not save results; it is meant for iterating on patterns before extraction.

Usage

edstr_view(
  data,
  text_input = getOption("edstr_text"),
  id = NULL,
  replace = NULL,
  pattern,
  ngrams = 1,
  ...
)

Arguments

data

<data.frame> The data to search.

text_input

<character(1)> Name of the text column. Defaults to the edstr_text option set by edstr_config().

id

<character(1)> Name of the unique identifier column. Auto-detected if not provided.

replace

A named character vector or list of named character vectors. Optional regex replacements applied to the text before matching (see edstr_clean() for details).
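
To illustrate the shape of a named replacement vector, the sketch below calls stringr::str_replace_all() directly. It is only an assumption that edstr_view() applies replace in a comparable way; edstr_clean() documents the actual behaviour. The notes and patterns are made up for illustration.

```r
library(stringr)

# Names are regex patterns, values are their replacements (illustrative only).
replace <- c("\\s+" = " ", "diabetes" = "diabete")

notes <- c("diabetes   type 2", "diabete  gestationnel")

# str_replace_all() applies each named pattern in turn to every string.
cleaned <- str_replace_all(notes, replace)
print(cleaned)
```

Normalising whitespace and spelling variants before matching keeps the pattern itself short, which is convenient while iterating.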

pattern

<character(1)> Regex pattern to search for.

ngrams

<integer(1)> Total n-gram window size including the matched token (default 1). For example, ngrams = 3 with pattern = "diabete" matches "diabete type 2".
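
The windowing idea can be approximated with plain stringr. This is a minimal sketch, not the package's implementation: it assumes the window extends to the right of the matched token and that tokens are separated by whitespace. The helper name ngram_regex() is hypothetical.

```r
library(stringr)

# Hypothetical helper: build a regex that captures the matched token plus
# up to (ngrams - 1) following whitespace-separated tokens.
ngram_regex <- function(pattern, ngrams = 1) {
  paste0(pattern, "\\S*(?:\\s+\\S+){0,", ngrams - 1, "}")
}

notes <- c("diabete type 2", "bilan normal", "diabete gestationnel")

# Extract every windowed match, then flatten the per-document list.
matches <- unlist(str_extract_all(notes, ngram_regex("diabete", ngrams = 3)))
print(matches)
```

Because the trailing tokens are optional in this sketch, a note with fewer than three tokens after the start of the match still yields a shorter match.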

...

Additional arguments passed to stringr::str_view().

Value

Invisibly returns a list with three elements:

match

A tibble of all matches with the id and match columns.

count

A tibble of distinct matches with their frequency.

text

Output of stringr::str_view() for visual inspection.
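
Because the list is returned invisibly, it has to be assigned to be inspected afterwards. The sketch below assumes the package is installed under the name edstr, which this page does not state; the guard simply skips the call if it is not available.

```r
df <- data.frame(
  id = 1:3,
  note = c("diabete type 2", "bilan normal", "diabete gestationnel")
)

# Assumption: the package namespace is called "edstr".
if (requireNamespace("edstr", quietly = TRUE)) {
  res <- edstr::edstr_view(
    data = df, text_input = "note", pattern = "diabete", ngrams = 3
  )
  print(res$match)  # one row per match, with the id and match columns
  print(res$count)  # distinct matches with their frequency
}
```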

Examples

df <- data.frame(
  id = 1:3,
  note = c("diabete type 2", "bilan normal", "diabete gestationnel")
)
edstr_view(data = df, text_input = "note", pattern = "diabete", ngrams = 3)
#> 
#> ── edstr_view ──────────────────────────────────────────────────────────────────
#> 
#> # A tibble: 2 × 2
#>   match                    n
#>   <chr>                <int>
#> 1 diabete gestationnel     1
#> 2 diabete type 2           1
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> Full steps: 0.041 sec elapsed
#> 
#>  Documents: 3 id
#> 
#>  Matches
#>   • Total: 2 across 2 id (66.7% id)
#>   • Distinct: 2
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#>