Indrajeet Patil

Source code for these slides can be found on GitHub.
The goal of a unit test is to capture the expected output of a function in code and to make sure that the actual output still matches it after any changes.
{testthat} is a popular framework for writing unit tests in R.
Benefits of unit testing
Test output
Tests pass only when the actual function behaviour matches the expected behaviour.
| actual | expected | tests |
|---|---|---|
{testthat}: A recap
Test organization
The testing infrastructure for an R package has the following hierarchy:
| Component | Role |
|---|---|
| Test file | Tests for R/foo.R will typically be in tests/testthat/test-foo.R. |
| Tests | A single file can contain multiple tests. |
| Expectations | A single test can have multiple expectations. |
Example test file
Every test is a call to the testthat::test_that() function.
Every expectation is represented by a testthat::expect_*() function.
You can generate a test file using the usethis::use_test() function.
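As a minimal sketch of what such a test file can look like (the function `add_one()` and the file name are hypothetical and only serve as illustration):

```r
# File: tests/testthat/test-add-one.R (hypothetical name)
library(testthat)

# Hypothetical function under test; in a package it would live in R/add-one.R.
add_one <- function(x) x + 1

# One test containing two expectations
test_that("add_one() increments its input", {
  expect_equal(add_one(1), 2)
  expect_type(add_one(1L), "double")
})
```

The test passes only if every expectation inside it holds.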
A unit test records the code to describe expected output.
(actual) (expected)
A snapshot test records expected output in a separate, human-readable file.
(actual) (expected)
If you develop R packages and have struggled to
then you should be excited to know more about snapshot tests (aka golden tests)! 🤩
Familiarity with writing unit tests using {testthat}.
If not, have a look at this chapter from the R Packages book.
Package versions assumed
Examples in these slides assume testthat ≥ 3.3.0, vdiffr ≥ 1.0.8, and shinytest2 ≥ 0.5.0.
Snapshot tests can be used to test that text output prints as expected.
Important for testing functions that pretty-print R objects to the console, create elegant and informative exceptions, etc.
Let’s say you want to write a unit test for the following function:
Source code
Note that you want to test that the printed output looks as expected.
Therefore, you need to check for all the little bells and whistles in the printed output.
Even testing this simple function is a bit painful because you need to keep track of every escape character, every space, etc.
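The function's source is not reproduced here, so the following is only a sketch: the body of `print_movies()` is an assumption, chosen to be consistent with the printed output shown in these slides. It illustrates how the expected string has to spell out every newline and space.

```r
library(testthat)

# Assumed implementation: returns the text to print as a single string
print_movies <- function(keys, values) {
  paste0(
    "Movie: \n",
    paste0("  ", keys, ": ", values, collapse = "\n")
  )
}

test_that("`print_movies()` prints as expected", {
  expect_equal(
    print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")),
    # Every escape character and space must be typed out by hand
    "Movie: \n  Title: Salaam Bombay!\n  Director: Mira Nair"
  )
})
```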
Test passed with 1 success.
With more complex code, it would be impossible for a human to reason about what the output is supposed to look like.
Important
If this is a utility function used by many other functions, changing its behaviour would entail manually changing expected outputs for many tests.
This is not maintainable!
Instead, you can use expect_snapshot(), which, when run for the first time, generates a Markdown file with expected/reference output.
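A sketch of the snapshot version of the test follows. The body of `print_movies()` is again an assumption, and `local_edition(3)` is included only so that the snippet also runs outside a package that already declares the third edition.

```r
library(testthat)

# Assumed implementation, consistent with the printed output
print_movies <- function(keys, values) {
  paste0("Movie: \n", paste0("  ", keys, ": ", values, collapse = "\n"))
}

test_that("`print_movies()` prints as expected", {
  local_edition(3) # snapshot expectations require the 3rd edition of testthat
  expect_snapshot(
    cat(print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")))
  )
})
```

The Markdown snapshot is written only when the test file is run by testthat (e.g. via devtools::test()); when the code is run interactively, the snapshot is printed to the console instead.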
── Warning: `print_movies()` prints as expected ────────────────────────────────
Adding new snapshot:
Code
  cat(print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")))
Output
  Movie: 
    Title: Salaam Bombay!
    Director: Mira Nair
Test passed with 1 success.
Warning
The first time a snapshot is created, it becomes the truth against which future function behaviour will be compared.
Thus, it is crucial that you carefully check that the output is indeed as expected.
Compared to your unit test code representing the expected output
notice how much more human-friendly the Markdown output is!
It is easy to see what the printed text output is supposed to look like. In other words, snapshot tests are useful when the intent of the code can only be verified by a human.
More about snapshot Markdown files
If the test file is called test-foo.R, the snapshot will be saved to tests/testthat/_snaps/foo.md.
If there are multiple snapshot tests in a single file, corresponding snapshots will also share the same .md file.
By default, expect_snapshot() will capture the code, the object values, and any side-effects.
If you run the test again, it’ll succeed:
Test passed with 1 success 🥳.
Why does my test fail on a re-run?
If testing a snapshot you just generated fails on re-running the test, this is most likely because your test is not deterministic, for example because your function relies on random number generation.
In such cases, setting a seed (e.g. set.seed(42)) should help.
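As a sketch of this fix (the function `sample_scores()` is hypothetical), the seed is set inside the test, right before the snapshot is taken:

```r
library(testthat)

# Hypothetical function whose output depends on random numbers
sample_scores <- function(n) round(rnorm(n), 2)

test_that("sample_scores() output is reproducible", {
  local_edition(3) # only needed outside a package that declares edition 3
  set.seed(42)     # fix the RNG state so every run draws the same numbers
  expect_snapshot(sample_scores(3))
})
```

withr::local_seed(42) is an alternative that also restores the previous RNG state when the test ends.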
When the function changes, the snapshot doesn’t match the reference, and the test fails:
Changes to function
Failure message provides expected (-) vs observed (+) diff.
Test failure
── Failure: `print_movies()` prints as expected ────────────────────────────────
Snapshot of code has changed:
old[2:6] vs new[2:6]
cat(print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")))
Output
Movie:
- Title: Salaam Bombay!
+ Title- Salaam Bombay!
- Director: Mira Nair
+ Director- Mira Nair
* Run `testthat::snapshot_accept("slides.qmd")` to accept the change.
* Run `testthat::snapshot_review("slides.qmd")` to review the change.
Error:
! Test failed with 1 failure and 0 successes.
Messages accompanying failed tests make it explicit how to fix them.
.new snapshot files.
Fixing multiple snapshot tests
If this is a utility function used by many other functions, changing its behaviour would lead to failure of many tests.
You can update all new snapshots with snapshot_accept(), or discard them all with snapshot_reject(). And, of course, check the diffs to make sure that the changes are expected.
Sometimes the expected output differs legitimately across environments, e.g. on Windows vs. macOS, or across R versions. Rather than maintaining separate test files, you can use snapshot variants.
Pass a variant string directly to expect_snapshot():
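A minimal sketch, assuming a test whose output is platform-dependent (the snapshotted expression is only a placeholder):

```r
library(testthat)

test_that("platform-specific output is tracked per variant", {
  local_edition(3) # only needed outside a package that declares edition 3
  # `variant` is a single string; it names the subfolder under _snaps/
  expect_snapshot(.Platform$file.sep, variant = "windows")
})
```

In practice the string is usually computed rather than hard-coded, for example from Sys.info()[["sysname"]], so that each platform writes to its own folder.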
Snapshots are then saved to _snaps/windows/ instead of _snaps/, so each variant gets its own reference file.
Note
Variants are most useful when you know the output will differ and you want to track each variant explicitly. For output that should be identical everywhere, avoid variants: they multiply the maintenance burden.
So far you have tested text output printed to the console, but you can also use snapshots to capture messages, warnings, and errors.
message
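A sketch of a test that would produce a snapshot like this one; the definition of `f()` is an assumption inferred from the recorded message:

```r
library(testthat)

# Assumed definition: emits an informational message
f <- function() message("Some info for you.")

test_that("f() messages", {
  local_edition(3) # only needed outside a package that declares edition 3
  expect_snapshot(f())
})
```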
── Warning: f() messages ───────────────────────────────────────────────────────
Adding new snapshot:
Code
  f()
Message
  Some info for you.
Test passed with 1 success.
warning
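The corresponding sketch for warnings; the definition of `g()` is an assumption inferred from the recorded condition:

```r
library(testthat)

# Assumed definition: signals a warning
g <- function() warning("Managed to recover.")

test_that("g() warns", {
  local_edition(3) # only needed outside a package that declares edition 3
  expect_snapshot(g())
})
```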
── Warning: g() warns ──────────────────────────────────────────────────────────
Adding new snapshot:
Code
  g()
Condition
  Warning in `g()`:
  Managed to recover.
Test passed with 1 success 🥳.
Tip
The snapshot records both the condition and the corresponding message.
You can now rest assured that the users are getting informed the way you want!
In case of an error, the function expect_snapshot() itself will produce an error.
You can capture it:
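A minimal sketch: passing `error = TRUE` tells expect_snapshot() that the code is expected to fail, so the error is recorded in the snapshot instead of aborting the test.

```r
library(testthat)

test_that("`log()` errors", {
  local_edition(3) # only needed outside a package that declares edition 3
  expect_snapshot(log("x"), error = TRUE)
})
```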
── Warning: `log()` errors ─────────────────────────────────────────────────────
Adding new snapshot:
Code
  log("x")
Condition
  Error in `log()`:
  ! non-numeric argument to mathematical function
Test passed with 1 success.
testthat article on snapshot testing
Introduction to golden testing
Docs for Jest library in JavaScript, which inspired snapshot testing implementation in testthat
To create graphical expectations, you will use the testthat extension package: {vdiffr}.
How does {vdiffr} work?
vdiffr introduces expect_doppelganger() to generate testthat expectations for graphics. It does this by writing SVG snapshot files for outputs!
The figure to test can be:
a ggplot object (from ggplot2::ggplot())
a recordedplot object (from grDevices::recordPlot())
any object with a print() method
Note
If the test file is called test-foo.R, the snapshots will be saved to the tests/testthat/_snaps/foo folder.
In this folder, there will be one .svg file for every expect_doppelganger() expectation in test-foo.R.
The name of the .svg file will be a sanitized version of the title argument to expect_doppelganger().
Let’s say you want to write a unit test for the following function:
Source code
Note that you want to test that the graphical output looks as expected, and this expectation is difficult to capture with a unit test.
You can use expect_doppelganger() from vdiffr to test this!
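The source of `create_scatter()` is not reproduced here, so the definition below is a hypothetical stand-in; the test itself mirrors the one shown in the failure output later in this section.

```r
library(testthat)
library(ggplot2)
library(vdiffr)

# Hypothetical plotting function (stand-in for the original)
create_scatter <- function() {
  ggplot(mtcars, aes(wt, mpg)) +
    geom_point(size = 3, alpha = 0.75)
}

test_that("`create_scatter()` plots as expected", {
  expect_doppelganger(
    title = "create scatter", # sanitized to the file name create-scatter.svg
    fig = create_scatter()
  )
})
```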
The first time you run the test, it generates an .svg file with the expected output.
── Warning: `create_scatter()` plots as expected ───────────────────────────────
Adding new file snapshot: 'tests/testthat/_snaps/slides.qmd/create-scatter.svg'
Test passed with 1 success.
Warning
The first time a snapshot is created, it becomes the truth against which future function behaviour will be compared.
Thus, it is crucial that you carefully check that the output is indeed as expected.
You can open .svg snapshot files in a web browser for closer inspection.
If you run the test again, it’ll succeed:
When the function changes, the snapshot doesn’t match the reference, and the test fails:
Changes to function
Test failure
test_that("`create_scatter()` plots as expected", {
  expect_doppelganger(
    title = "create scatter",
    fig = create_scatter()
  )
})
── Failure ('<text>:3'): `create_scatter()` plots as expected ──────────────────
Snapshot of `testcase` to 'slides.qmd/create-scatter.svg' has changed
Run `testthat::snapshot_review('slides.qmd/')` to review changes
Backtrace:
1. vdiffr::expect_doppelganger(...)
3. testthat::expect_snapshot_file(...)
Error in `reporter$stop_if_needed()`:
! Test failed
Behaviour on GitHub Actions vs CRAN
expect_doppelganger() has a cran argument (default FALSE) that controls where failures are enforced:
GitHub Actions sets the CI environment variable automatically, so no extra configuration is needed.
If you deliberately want CRAN to enforce graphical snapshots too, pass cran = TRUE:
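A sketch of opting in to enforcement on CRAN; the plot and its title are placeholders chosen for illustration:

```r
library(testthat)
library(ggplot2)
library(vdiffr)

test_that("scatter plot is also enforced on CRAN", {
  expect_doppelganger(
    title = "mtcars scatter",
    fig = ggplot(mtcars, aes(wt, mpg)) + geom_point(),
    cran = TRUE # opt in: a mismatched snapshot now fails on CRAN as well
  )
})
```

Given how fragile graphical snapshots can be, this is probably worth enabling only for plots whose rendering is known to be stable.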
Running snapshot_review() launches a Shiny app which can be used to either accept or reject the new output(s).
Why are my snapshots for plots failing?!
If tests fail even if the function didn’t change, it can be due to any of the following reasons:
For these reasons, snapshot tests for plots tend to be fragile and are not run on CRAN machines by default.
Whole file snapshot testing makes sure that media, data frames, text files, etc. are as expected.
Let’s say you want to test JSON files generated by jsonlite::write_json().
Test
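A sketch of such a test follows. The helper name `r_to_json()`, the snapshot name demo.json, and the test title are taken from the output in this section; the helper body and the value of `x` are assumptions consistent with the recorded JSON.

```r
library(testthat)

# Helper that writes `x` to a temporary JSON file and returns its path
r_to_json <- function(x) {
  path <- tempfile(fileext = ".json")
  jsonlite::write_json(x, path)
  path
}

test_that("json writer works", {
  local_edition(3) # only needed outside a package that declares edition 3
  x <- list(1, list("x" = "a"))
  # Second argument is the file name used inside the _snaps folder
  expect_snapshot_file(r_to_json(x), "demo.json")
})
```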
── Warning: json writer works ──────────────────────────────────────────────────
Adding new file snapshot: 'tests/testthat/_snaps/slides.qmd/demo.json'
Test passed with 1 success 🥳.
Snapshot
Note
To snapshot a file, you need to write a helper function that provides its path.
If a test file is called test-foo.R, the snapshot will be saved to the tests/testthat/_snaps/foo folder.
In this folder, there will be one file (e.g. .json) for every expect_snapshot_file() expectation in test-foo.R.
The name of the snapshot file is taken from the name argument to expect_snapshot_file().
If you run the test again, it’ll succeed:
If the new output doesn’t match the expected one, the test will fail:
── Failure: json writer works ──────────────────────────────────────────────────
Snapshot of `r_to_json(x)` has changed.
Differences:
old | new
[1] [[1],{"x":["a"]}] - [[1],{"x":["b"]}] [1]
* Run `testthat::snapshot_accept("slides.qmd/")` to accept the change.
* Run `testthat::snapshot_review("slides.qmd/")` to review the change.
Error:
! Test failed with 1 failure and 0 successes.
Running snapshot_review() launches a Shiny app which can be used to either accept or reject the new output(s).
Documentation for expect_snapshot_file()
To write formal tests for Shiny applications, you will use the testthat extension package: {shinytest2}.
How does {shinytest2} work?
shinytest2 uses a Shiny app (how meta!) to record user interactions with the app and generate snapshots of the application’s state. Future behaviour of the app will be compared against these snapshots to check for any changes.
Exactly how tests for Shiny apps in an R package are written depends on how the app is stored. There are two possibilities, and both are discussed separately.
Stored in /inst folder
├── DESCRIPTION
├── R
├── inst
│   └── sample_app
│       └── app.R
Returned by a function
├── DESCRIPTION
├── R
│   └── app-function.R
├── DESCRIPTION
├── R
├── inst
│   └── sample_app
│       └── app.R
Let’s say this app resides in the inst/unitConverter/app.R file.
To create a snapshot test, go to the app directory and run record_test().
Test
Snapshot

Note
record_test() saves the test file directly to the package’s tests/testthat/ directory (e.g. tests/testthat/test-shinytest2.R). No separate driver script is needed.
The snapshots are saved in tests/testthat/_snaps/{test-name}/. The {.variant} subdirectory (e.g. _snaps/windows-4.1/) is used when tests produce OS- or R-version-specific output.
Let’s say, while updating the app, you make a mistake, which leads to a failed test.
Changed code with mistake
Test failure JSON diff
Fixing this test will be similar to fixing any other snapshot test you’ve seen thus far.
shinytest2 provides a Shiny app for comparing the old and new snapshots.
├── DESCRIPTION
├── R
│   └── app-function.R
The only difference in the testing workflow when Shiny app objects are created by functions is that you write the test yourself, instead of shinytest2 auto-generating it.
Source code
# File: R/unit-converter.R
unitConverter <- function() {
  ui <- fluidPage(
    titlePanel("Convert kilograms to grams"),
    numericInput("kg", "Weight (in kg)", value = 0),
    textOutput("g")
  )
  server <- function(input, output, session) {
    output$g <- renderText(
      paste0("Weight (in g): ", input$kg * 1000)
    )
  }
  shinyApp(ui, server)
}
You call record_test() directly on the Shiny app object, copy-paste the recorded commands into the test script, and run devtools::test_active_file() to generate snapshots.
This testing workflow is also relevant for app frameworks (e.g. {golem}, {rhino}, etc.).
Function in run_app.R returns app.
├── DESCRIPTION
├── NAMESPACE
├── R
│   ├── app_config.R
│   ├── app_server.R
│   ├── app_ui.R
│   └── run_app.R
Function in app.R returns app.
├── app
│   ├── js
│   │   └── index.js
│   ├── logic
│   │   └── __init__.R
│   ├── static
│   │   └── favicon.ico
│   ├── styles
│   │   └── main.scss
│   ├── view
│   │   └── __init__.R
│   └── main.R
├── tests
│   ├── ...
├── app.R
├── RhinoApplication.Rproj
├── dependencies.R
├── renv.lock
└── rhino.yml
The final location of the tests and snapshots should look like the following for the two possible ways Shiny apps are included in R packages.
Stored in /inst folder
├── DESCRIPTION
├── R
├── inst
│   └── sample_app
│       ├── app.R
│       └── tests
│           ├── testthat
│           │   ├── _snaps
│           │   │   └── shinytest2
│           │   │       └── 001.json
│           │   └── test-shinytest2.R
│           └── testthat.R
└── tests
    ├── testthat
    │   └── test-inst-apps.R
    └── testthat.R
Returned by a function
├── DESCRIPTION
├── R
│   └── app-function.R
└── tests
    ├── testthat
    │   ├── _snaps
    │   │   └── app-function
    │   │       └── 001.json
    │   └── test-app-function.R
    └── testthat.R
For the sake of completeness, here is what the test directory structure would look like when there are multiple apps in a single package.
Stored in /inst folder
├── DESCRIPTION
├── R
├── inst
│   ├── sample_app1
│   │   ├── app.R
│   │   └── tests
│   │       ├── testthat
│   │       │   ├── _snaps
│   │       │   │   └── shinytest2
│   │       │   │       └── 001.json
│   │       │   └── test-shinytest2.R
│   │       └── testthat.R
│   └── sample_app2
│       ├── app.R
│       └── tests
│           ├── testthat
│           │   ├── _snaps
│           │   │   └── shinytest2
│           │   │       └── 001.json
│           │   └── test-shinytest2.R
│           └── testthat.R
└── tests
    ├── testthat
    │   └── test-inst-apps.R
    └── testthat.R
Returned by a function
├── DESCRIPTION
├── R
│   ├── app-function1.R
│   └── app-function2.R
└── tests
    ├── testthat
    │   ├── _snaps
    │   │   ├── app-function1
    │   │   │   └── 001.json
    │   │   └── app-function2
    │   │       └── 001.json
    │   ├── test-app-function1.R
    │   └── test-app-function2.R
    └── testthat.R
The following are some advanced topics that are beyond the scope of the current presentation, but you may wish to know more about.
Extra
If you want to test Shiny apps with continuous integration using shinytest2, read this article.
shinytest2 is the successor to the shinytest package. If you want to migrate from the latter to the former, have a look at this.
Testing chapter from Mastering Shiny book
shinytest2 article introducing its workflow
shinytest2 article on how to test apps in R packages
It’s not all kittens and roses when it comes to snapshot testing.
Let’s see some issues you might run into while using them.
Let’s say you write a graphical snapshot test for a function that produces a ggplot object. If ggplot2 authors make some modifications to this object, your tests will fail, even though your function works as expected!
In other words, your tests are now at the mercy of other package authors because snapshots are capturing things beyond your package’s control.
Caution
Tests that fail for reasons other than what they are testing for are problematic. Thus, be careful about what you snapshot and keep in mind the maintenance burden that comes with dependencies with volatile APIs.
Note
A way to reduce the burden of keeping snapshots up-to-date is to automate this process. But there is no free lunch in this universe, and now you need to maintain this automation! 🤷
If snapshots fail locally, you can just run snapshot_review(), but what if they fail in non-interactive environments (on CI/CD platforms, during R CMD check, etc.)?
The easiest solution is to download the new snapshots to your local folder and run snapshot_review().
Tip
If expected snapshot is called (e.g.) foo.svg, there will be a new snapshot file foo.new.svg in the same folder when the test fails.
snapshot_review() compares these files to reveal how the outputs have changed.
New snapshots must be created locally first
expect_snapshot() intentionally fails on CI when it encounters a new snapshot that has never been committed; this signals that you forgot to run the tests locally before pushing.
Always run your tests locally first and commit the generated snapshot files before opening a pull request.
But where can you find the new snapshots?
In a local R CMD check, you can find new snapshots in the .Rcheck folder:
On GitHub Actions, upload snapshots as artifacts so you can retrieve them:
Then download them directly into your local package with snapshot_download_gh():
This replaces the manual step of downloading the artifact zip and copying files by hand.
Despite snapshot tests making the expected outputs more human-readable, given a big enough change and complex enough output, sometimes it can be challenging to review changes to snapshots.
How do you review pull requests with complex snapshots changes?
A few naming-related mistakes that silently cause trouble:
Caution
Orphaned .md / .svg snapshot files accumulate silently because they don’t cause test failures. Running your full test suite with devtools::test() automatically removes orphaned snapshots at the end of the run.
Given their fragile nature, snapshot tests are skipped on CRAN by default.
Although this makes sense, it means that you miss out on anything but a breaking change from an upstream dependency. E.g., if ggplot2 (hypothetically) changes how the points look, you won’t know about this change until you happen to run your snapshot tests again locally or on CI/CD.
Unit tests run on CRAN, on the other hand, will fail and you will be immediately informed about it.
Tip
A way to insure against such silent failures is to run tests daily on CI/CD platforms (e.g. nightly builds).
What not to do
It is tempting to use snapshot tests everywhere out of laziness. But they are sometimes inappropriate (e.g. when testing requires external benchmarking).
Let’s say you write a function to extract estimates from a regression model.
Its test should compare results against an external benchmark, and not a snapshot.
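A sketch of such a benchmark-based test. The function `extract_estimates()` is hypothetical, and the reference values are hard-coded here as if they came from an independent source (e.g. published results or output from other software):

```r
library(testthat)

# Hypothetical extractor: returns the unnamed coefficient estimates
extract_estimates <- function(model) unname(coef(model))

test_that("extract_estimates() matches the external benchmark", {
  model <- lm(mpg ~ wt, data = mtcars)
  # Reference values for this model: intercept 37.2851, slope -5.3445
  expect_equal(
    extract_estimates(model),
    c(37.2851, -5.3445),
    tolerance = 1e-4
  )
})
```

A snapshot would only confirm that the numbers have not changed, not that they were correct in the first place.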
Snapshot testing is appropriate when the human needs to be in the loop to make sure that things are working as expected. Therefore, the snapshots should be human readable.
E.g. if you write a function that plots something:
To test it, you should snapshot the plot, and not the underlying data, which is hard to make sense of for a human.
Resist formation of such a habit.
testthat provides tools to make it very easy to review changes, so no excuses!
In this presentation, the examples and the tests were deliberately kept simple.
To see a more realistic usage of snapshot tests, you can study open-source test suites.
And Happy Snapshotting!
Check out my other slide decks on software development best practices
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.6.0 (2026-04-24)
os Ubuntu 24.04.4 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz UTC
date 2026-05-31
pandoc 3.9.0.2 @ /opt/hostedtoolcache/pandoc/3.9.0.2/x64/ (via rmarkdown)
quarto 1.10.6 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base * 4.6.0 2026-04-24 [3] local
brio 1.1.5 2024-04-24 [1] RSPM
cli 3.6.6 2026-04-09 [1] RSPM
compiler 4.6.0 2026-04-24 [3] local
crayon 1.5.3 2024-06-20 [1] RSPM
datasets * 4.6.0 2026-04-24 [3] local
desc 1.4.3 2023-12-10 [1] RSPM
diffobj 0.3.6 2025-04-21 [1] RSPM
digest 0.6.39 2025-11-19 [1] RSPM
evaluate 1.0.5 2025-08-27 [1] RSPM
farver 2.1.2 2024-05-13 [1] RSPM
fastmap 1.2.0 2024-05-15 [1] RSPM
ggplot2 * 4.0.3 2026-04-22 [1] RSPM
glue 1.8.1 2026-04-17 [1] RSPM
graphics * 4.6.0 2026-04-24 [3] local
grDevices * 4.6.0 2026-04-24 [3] local
grid 4.6.0 2026-04-24 [3] local
gtable 0.3.6 2024-10-25 [1] RSPM
htmltools 0.5.9 2025-12-04 [1] RSPM
jsonlite 2.0.0 2025-03-27 [1] RSPM
knitr 1.51 2025-12-20 [1] RSPM
labeling 0.4.3 2023-08-29 [1] RSPM
lattice 0.22-9 2026-02-09 [3] CRAN (R 4.6.0)
lifecycle 1.0.5 2026-01-08 [1] RSPM
magrittr 2.0.5 2026-04-04 [1] RSPM
Matrix 1.7-5 2026-03-21 [3] CRAN (R 4.6.0)
methods * 4.6.0 2026-04-24 [3] local
mgcv 1.9-4 2025-11-07 [3] CRAN (R 4.6.0)
nlme 3.1-169 2026-03-27 [3] CRAN (R 4.6.0)
otel 0.2.0 2025-08-29 [1] RSPM
pkgload 1.5.2 2026-04-22 [1] RSPM
png 0.1-9 2026-03-15 [1] RSPM
R6 2.6.1 2025-02-15 [1] RSPM
RColorBrewer 1.1-3 2022-04-03 [1] RSPM
rlang 1.2.0 2026-04-06 [1] RSPM
rmarkdown 2.31 2026-03-26 [1] RSPM
rprojroot 2.1.1 2025-08-26 [1] RSPM
S7 0.2.2 2026-04-22 [1] RSPM
scales 1.4.0 2025-04-24 [1] RSPM
sessioninfo 1.2.3 2025-02-05 [1] any (@1.2.3)
splines 4.6.0 2026-04-24 [3] local
stats * 4.6.0 2026-04-24 [3] local
testthat * 3.3.2 2026-01-11 [1] RSPM
tools 4.6.0 2026-04-24 [3] local
utils * 4.6.0 2026-04-24 [3] local
vctrs 0.7.3 2026-04-11 [1] RSPM
vdiffr * 1.0.9 2026-02-13 [1] RSPM
waldo 0.6.2 2025-07-11 [1] RSPM
withr 3.0.2 2024-10-28 [1] RSPM
xfun 0.57 2026-03-20 [1] RSPM
yaml 2.3.12 2025-12-10 [1] RSPM
[1] /home/runner/work/_temp/Library
[2] /opt/R/4.6.0/lib/R/site-library
[3] /opt/R/4.6.0/lib/R/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────