## `{ggstatsplot}`: `{ggplot2}` Based Plots with Statistical Details

| Status | Usage | Miscellaneous |
|----|----|----|
| [![R build status](https://github.com/IndrajeetPatil/ggstatsplot/workflows/R-CMD-check/badge.svg)](https://github.com/IndrajeetPatil/ggstatsplot) | [![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/ggstatsplot?color=blue)](https://CRAN.R-project.org/package=ggstatsplot) | [![codecov](https://codecov.io/gh/IndrajeetPatil/ggstatsplot/branch/main/graph/badge.svg?token=ddrxwt0bj8)](https://app.codecov.io/gh/IndrajeetPatil/ggstatsplot) |
| [![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html) | [![Daily downloads](https://cranlogs.r-pkg.org/badges/last-day/ggstatsplot?color=blue)](https://CRAN.R-project.org/package=ggstatsplot) | [![DOI](https://joss.theoj.org/papers/10.21105/joss.03167/status.svg)](https://doi.org/10.21105/joss.03167) |

## Raison d’être

![ggstatsplot package logo](reference/figures/logo.png)

> “What is to be sought in designs for the display of information is the
> clear portrayal of complexity. Not the complication of the simple;
> rather … the revelation of the complex.” - Edward R. Tufte

[`{ggstatsplot}`](https://www.indrapatil.com/ggstatsplot/) is an extension of the [`{ggplot2}`](https://github.com/tidyverse/ggplot2) package for creating graphics with details from statistical tests included in the information-rich plots themselves. In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling in turn can suggest a different visualization method, and so on. The central idea of [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster.
## Installation

| Type        | Command                                  |
|:------------|:-----------------------------------------|
| Release     | `install.packages("ggstatsplot")`        |
| Development | `pak::pak("IndrajeetPatil/ggstatsplot")` |

## Citation

If you want to cite this package in a scientific journal or in any other context, run the following code in your `R` console:

``` r
citation("ggstatsplot")
#> To cite package 'ggstatsplot' in publications use:
#>
#>   Patil, I. (2021). Visualizations with statistical details: The
#>   'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167,
#>   doi:10.21105/joss.03167
#>
#> A BibTeX entry for LaTeX users is
#>
#>   @Article{,
#>     doi = {10.21105/joss.03167},
#>     url = {https://doi.org/10.21105/joss.03167},
#>     year = {2021},
#>     publisher = {{The Open Journal}},
#>     volume = {6},
#>     number = {61},
#>     pages = {3167},
#>     author = {Indrajeet Patil},
#>     title = {{Visualizations with statistical details: The {'ggstatsplot'} approach}},
#>     journal = {{Journal of Open Source Software}},
#>   }
```

## Acknowledgments

I would like to thank all the contributors to [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) who pointed out bugs or requested features I hadn’t considered. I would particularly like to thank other package developers (notably Daniel Lüdecke, Dominique Makowski, Mattan S. Ben-Shachar, Brenton Wiernik, Patrick Mair, Salvatore Mangiafico, and others) who have patiently and diligently answered my relentless questions and supported feature requests in their projects. I also want to thank Chuck Powell for his initial contributions to the package.

The hex sticker was generously designed by Sarah Otterstetter (Max Planck Institute for Human Development, Berlin). This package has also benefited from the larger `#rstats` community on Twitter, LinkedIn, and `StackOverflow`.

Thanks are also due to my postdoc advisers (Mina Cikara and Fiery Cushman at Harvard University; Iyad Rahwan at Max Planck Institute for Human Development) who patiently supported my spending hundreds (?) of hours working on this package rather than what I was paid to do. 😁

## Documentation and Examples

For detailed documentation of each function in the stable **CRAN** version of the package, see:

- [Publication](https://joss.theoj.org/papers/10.21105/joss.03167)
- [Presentation](https://www.indrapatil.com/intro-to-ggstatsplot/#/ggstatsplot-informative-statistical-visualizations)
- [Vignettes](https://www.indrapatil.com/ggstatsplot/articles/)

## Summary of available plots

| Function | Plot | Description |
|:---|:---|:---|
| [`ggbetweenstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbetweenstats.md) | **violin plots** | for comparisons *between* groups/conditions |
| [`ggwithinstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggwithinstats.md) | **violin plots** | for comparisons *within* groups/conditions |
| [`gghistostats()`](https://www.indrapatil.com/ggstatsplot/reference/gghistostats.md) | **histograms** | for the distribution of a numeric variable |
| [`ggdotplotstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggdotplotstats.md) | **dot plots/charts** | for the distribution of a labeled numeric variable |
| [`ggscatterstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggscatterstats.md) | **scatterplots** | for correlation between two variables |
| [`ggcorrmat()`](https://www.indrapatil.com/ggstatsplot/reference/ggcorrmat.md) | **correlation matrices** | for correlations between multiple variables |
| [`ggpiestats()`](https://www.indrapatil.com/ggstatsplot/reference/ggpiestats.md) | **pie charts** | for categorical data |
| [`ggbarstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbarstats.md) | **bar charts** | for categorical data |
| [`ggcoefstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggcoefstats.md) | **dot-and-whisker plots** | for regression models and meta-analysis |

In addition to these basic plots, [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) also provides **`grouped_`** versions (see below) that make it easy to repeat the same analysis for any grouping variable.

## Summary of types of statistical analyses

The table below summarizes all the different types of analyses currently supported in this package:

| Functions | Description | Parametric | Non-parametric | Robust | Bayesian |
|:---|:---|:---|:---|:---|:---|
| [`ggbetweenstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbetweenstats.md) | Between group/condition comparisons | ✅ | ✅ | ✅ | ✅ |
| [`ggwithinstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggwithinstats.md) | Within group/condition comparisons | ✅ | ✅ | ✅ | ✅ |
| [`gghistostats()`](https://www.indrapatil.com/ggstatsplot/reference/gghistostats.md), [`ggdotplotstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggdotplotstats.md) | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
| [`ggcorrmat()`](https://www.indrapatil.com/ggstatsplot/reference/ggcorrmat.md) | Correlation matrix | ✅ | ✅ | ✅ | ✅ |
| [`ggscatterstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggscatterstats.md) | Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
| [`ggpiestats()`](https://www.indrapatil.com/ggstatsplot/reference/ggpiestats.md), [`ggbarstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbarstats.md) | Association between categorical variables | ✅ | ✅ | ❌ | ✅ |
| [`ggpiestats()`](https://www.indrapatil.com/ggstatsplot/reference/ggpiestats.md), [`ggbarstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbarstats.md) | Equal proportions for categorical variable levels | ✅ | ✅ | ❌ | ✅ |
| [`ggcoefstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggcoefstats.md) | Regression model coefficients | ✅ | ✅ | ✅ | ✅ |
| [`ggcoefstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggcoefstats.md) | Random-effects meta-analysis | ✅ | ❌ | ✅ | ✅ |

Summary of Bayesian analysis:

| Analysis                     | Hypothesis testing | Estimation |
|:-----------------------------|:-------------------|:-----------|
| (one/two-sample) *t*-test    | ✅                 | ✅         |
| one-way ANOVA                | ✅                 | ✅         |
| correlation                  | ✅                 | ✅         |
| (unpaired) contingency table | ✅                 | ✅         |
| (paired) contingency table   | ✅                 | ❌         |
| random-effects meta-analysis | ✅                 | ✅         |

## Statistical reporting

For **all** statistical tests reported in the plots, the default template abides by the gold standard for statistical reporting: the test statistic, degrees of freedom, *p*-value, effect size, and its confidence interval. For example, here are results from Yuen’s test for trimmed means (robust *t*-test):

![Example of statistical reporting format showing Yuen's test results with test statistic, degrees of freedom, p-value, effect size, and confidence interval](reference/figures/stats_reporting_format.png)

## Summary of statistical tests and effect sizes

Statistical analysis is carried out by the [statsExpressions](https://www.indrapatil.com/statsExpressions/) package, so a summary table of all the statistical tests currently supported across the various functions can be found in the article for that package:

## Primary functions

### `ggbetweenstats()`

This function creates either a violin plot, a box plot, or a mix of the two for **between**-group or **between**-condition comparisons, with results from statistical tests in the subtitle. The simplest function call looks like this:

``` r
library(ggstatsplot)

set.seed(123)

ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)
```

![Violin plot with boxplot overlay showing distribution of sepal length across three Iris species with statistical test results](reference/figures/README-ggbetweenstats1-1.png)

**Defaults** return\
✅ raw data + distributions\
✅ descriptive statistics\
✅ inferential statistics\
✅ effect size + CIs\
✅ pairwise comparisons\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

A number of other arguments can be specified to make this plot even more informative or to change some of the default options.
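As a rough cross-check of the default parametric subtitle, the base-R sketch below runs Welch's one-way ANOVA, which is what a comparison of three or more independent groups amounts to when equal variances are not assumed (assumed here to correspond to the `var.equal = FALSE` default). It is only an illustration built on `stats::oneway.test()`, not the package's own code path, and it does not reproduce the effect sizes, pairwise comparisons, or Bayesian statistics.

```r
# Welch's one-way ANOVA: compares group means without assuming equal variances.
welch_iris <- stats::oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# The same test for mpg across cylinder groups in mtcars
welch_mtcars <- stats::oneway.test(mpg ~ cyl, data = mtcars, var.equal = FALSE)

# F statistic, numerator/denominator degrees of freedom, and p-value
welch_mtcars$statistic
welch_mtcars$parameter
welch_mtcars$p.value
```

For `mtcars`, these numbers should agree with the subtitle extracted from `ggbetweenstats(mtcars, cyl, mpg)` later in this README: *F*(2, 18.03) = 31.62.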
There is also a `grouped_` variant of this function that makes it easy to repeat the same operation across a **single** grouping variable:

``` r
set.seed(123)

grouped_ggbetweenstats(
  data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  x = mpaa,
  y = length,
  grouping.var = genre,
  ggsignif.args = list(textsize = 4, tip_length = 0.01),
  p.adjust.method = "bonferroni",
  palette = "ggsci::default_jama",
  plotgrid.args = list(nrow = 1),
  annotation.args = list(title = "Differences in movie length by mpaa ratings for different genres")
)
```

![Grouped violin plots comparing movie length by MPAA rating for Action and Comedy genres with statistical annotations](reference/figures/README-ggbetweenstats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggwithinstats()`

The [`ggbetweenstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbetweenstats.md) function has an identical twin, [`ggwithinstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggwithinstats.md), for repeated-measures designs. It behaves in the same fashion, with a few minor tweaks to properly visualize repeated measures. As the example below shows, the only difference in plot structure is that the group means are now connected by paths to highlight that the data are paired. If your repeated-measures data include an explicit subject identifier, pass it via `subject.id`; rows with missing identifiers are ignored for paired grouping and repeated-measures statistics.
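To see why an explicit subject identifier matters, here is a base-R sketch (not the package's internals) that matches observations by subject before running a paired *t*-test, the kind of test one would expect for two repeated conditions under the default parametric type. It uses the built-in `sleep` data (`extra`, `group`, `ID`), an illustrative choice that is not part of the examples in this README.

```r
# One row per subject, with each subject's two measurements side by side
wide <- reshape(sleep, idvar = "ID", timevar = "group", direction = "wide")

# Keep only subjects with a known identifier and both measurements;
# an unmatched row cannot contribute to a within-subjects comparison.
wide <- wide[!is.na(wide$ID) & complete.cases(wide[, c("extra.1", "extra.2")]), ]

# Paired t-test on the matched measurements
paired <- t.test(wide$extra.1, wide$extra.2, paired = TRUE)
paired$statistic
paired$parameter
```

Pairing by `ID` rather than by row position is what keeps the comparison valid if the rows are ever reordered or contain gaps.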
``` r
set.seed(123)
library(WRS2) ## for data
library(afex) ## to run ANOVA

ggwithinstats(
  data = WineTasting,
  x = Wine,
  y = Taste,
  subject.id = Taster,
  title = "Wine tasting"
)
```

![Within-subjects violin plot showing wine taste ratings by wine type with paired data paths and statistical results](reference/figures/README-ggwithinstats1-1.png)

**Defaults** return\
✅ raw data + distributions\
✅ descriptive statistics\
✅ inferential statistics\
✅ effect size + CIs\
✅ pairwise comparisons\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

As with [`ggbetweenstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbetweenstats.md), this function also has a `grouped_` variant that makes repeating the same analysis across a single grouping variable quicker. Here is an example with only repeated measurements:

``` r
set.seed(123)

grouped_ggwithinstats(
  data = dplyr::filter(
    bugs_long,
    region %in% c("Europe", "North America"),
    condition %in% c("LDLF", "LDHF")
  ),
  x = condition,
  y = desire,
  subject.id = subject,
  type = "np",
  xlab = "Condition",
  ylab = "Desire to kill an arthropod",
  grouping.var = region
)
```

![Grouped within-subjects violin plots showing desire to kill arthropods by condition for Europe and North America](reference/figures/README-ggwithinstats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `gghistostats()`

To visualize the distribution of a single variable and check whether its mean differs significantly from a specified value with a one-sample test, use [`gghistostats()`](https://www.indrapatil.com/ggstatsplot/reference/gghistostats.md).
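Under the default parametric type, checking a mean against `test.value` amounts to a one-sample *t*-test. The base-R sketch below shows that test alongside a hand-computed standardized mean difference (Cohen's *d*). The variable (`mtcars$mpg`) and the reference value of 20 are arbitrary stand-ins chosen for illustration, and the effect size `{ggstatsplot}` reports may apply a small-sample correction that this sketch omits.

```r
x <- mtcars$mpg  # numeric variable to test (illustrative choice)
mu <- 20         # reference value, playing the role of `test.value`

# One-sample t-test of H0: the population mean of x equals mu
res <- t.test(x, mu = mu)

# Cohen's d for one sample: the mean difference expressed in standard deviations
d <- (mean(x) - mu) / sd(x)

c(t = unname(res$statistic), df = unname(res$parameter), p = res$p.value, d = d)
```

In a one-sample design, *t* = *d* × √*n*, so the test statistic and the effect size always share the same sign.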
``` r
set.seed(123)

gghistostats(
  data = ggplot2::msleep,
  x = awake,
  title = "Amount of time spent awake",
  test.value = 12,
  binwidth = 1
)
```

![Histogram showing distribution of time spent awake in mammals with one-sample test results](reference/figures/README-gghistostats1-1.png)

**Defaults** return\
✅ counts + proportion for bins\
✅ descriptive statistics\
✅ inferential statistics\
✅ effect size + CIs\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

There is also a `grouped_` variant of this function that makes it easy to repeat the same operation across a **single** grouping variable:

``` r
set.seed(123)

grouped_gghistostats(
  data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  x = budget,
  test.value = 50,
  type = "nonparametric",
  xlab = "Movies budget (in million US$)",
  grouping.var = genre,
  ggtheme = ggthemes::theme_tufte(), ## modify the defaults from `{ggstatsplot}` for each plot
  plotgrid.args = list(nrow = 1),
  annotation.args = list(title = "Movies budgets for different genres")
)
```

![Grouped histograms showing movie budget distributions for Action and Comedy genres with statistical tests](reference/figures/README-gghistostats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggdotplotstats()`

This function is similar to [`gghistostats()`](https://www.indrapatil.com/ggstatsplot/reference/gghistostats.md), but is intended for a numeric variable whose values also carry a label (e.g., one value per country).
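A labeled dot chart reads most naturally with one value per label. If your data contain several rows per label, one option is to collapse them first, for example to the label-wise mean, as in this base-R sketch using the built-in `chickwts` data (an illustrative choice). Whether and how `ggdotplotstats()` itself handles repeated labels is not assumed here; check the function documentation.

```r
# Several chicks were raised on each feed type, so `feed` repeats across rows
table(chickwts$feed)

# Collapse to one mean weight per feed: exactly one row per label
per_label <- aggregate(weight ~ feed, data = chickwts, FUN = mean)

# Order labels by their value, as a dot chart typically displays them
per_label <- per_label[order(per_label$weight), ]
per_label
```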
``` r
set.seed(123)

ggdotplotstats(
  data = dplyr::filter(gapminder::gapminder, continent == "Asia"),
  y = country,
  x = lifeExp,
  test.value = 55,
  type = "robust",
  title = "Distribution of life expectancy in Asian continent",
  xlab = "Life expectancy"
)
```

![Dot plot showing life expectancy distribution across Asian countries with robust one-sample test results](reference/figures/README-ggdotplotstats1-1.png)

**Defaults** return\
✅ descriptives (centrality measure + uncertainty + sample size)\
✅ inferential statistics\
✅ effect size + CIs\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

As with the rest of the functions in this package, there is also a `grouped_` variant of this function to facilitate looping the same operation over all levels of a single grouping variable.

``` r
set.seed(123)

grouped_ggdotplotstats(
  data = dplyr::filter(ggplot2::mpg, cyl %in% c("4", "6")),
  x = cty,
  y = manufacturer,
  type = "bayes",
  xlab = "city miles per gallon",
  ylab = "car manufacturer",
  grouping.var = cyl,
  test.value = 15.5,
  point.args = list(color = "red", size = 5, shape = 13),
  annotation.args = list(title = "Fuel economy data")
)
```

![Grouped dot plots showing city miles per gallon by car manufacturer for 4 and 6 cylinder vehicles](reference/figures/README-ggdotplotstats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggscatterstats()`

This function creates a scatterplot with marginal distributions overlaid on the axes and results from statistical tests in the subtitle:

``` r
ggscatterstats(
  data = ggplot2::msleep,
  x = sleep_rem,
  y = awake,
  xlab = "REM sleep (in hours)",
  ylab = "Amount of time spent awake (in hours)",
  title = "Understanding mammalian sleep"
)
```

![Scatterplot with marginal distributions showing relationship between REM sleep and time awake in mammals with correlation results](reference/figures/README-ggscatterstats1-1.png)

**Defaults** return\
✅ raw data + distributions\
✅ marginal distributions\
✅ inferential statistics\
✅ effect size + CIs\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

There is also a `grouped_` variant of this function that makes it easy to repeat the same operation across a **single** grouping variable.

``` r
set.seed(123)

grouped_ggscatterstats(
  data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  x = rating,
  y = length,
  grouping.var = genre,
  label.var = title,
  label.expression = length > 200,
  xlab = "IMDB rating",
  ggtheme = ggplot2::theme_grey(),
  ggplot.component = list(ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = c(2, 9))),
  plotgrid.args = list(nrow = 1),
  annotation.args = list(title = "Relationship between movie length and IMDB ratings")
)
```

![Grouped scatterplots showing IMDB rating vs movie length for Action and Comedy genres with correlation annotations](reference/figures/README-ggscatterstats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggcorrmat()`

`ggcorrmat()` makes a correlogram (a matrix of correlation coefficients) with a minimal amount of code. The defaults alone produce publication-ready correlation matrices. But, for the sake of exploring the available options, let’s change some of the defaults. For example, multiple aesthetics-related arguments can be modified to change the appearance of the correlation matrix.
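The numbers underlying a correlation matrix can be sketched in base R. Assuming pairwise deletion of missing values (which is what a legend reporting the number of pairs per test implies), each coefficient may rest on a different number of observations. The sketch below computes Pearson correlations and the minimum, median, and maximum pair counts for the built-in `airquality` data, chosen only because it contains `NA`s; it is not the package's implementation.

```r
aq <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Pearson correlations, each using every row where both variables are observed
r <- cor(aq, use = "pairwise.complete.obs", method = "pearson")

# Number of complete pairs behind each coefficient
observed <- 1 * (!is.na(aq))     # 1 = observed, 0 = missing
n_pairs <- crossprod(observed)   # equivalent to t(observed) %*% observed

# Summaries over the distinct variable pairs (upper triangle, no diagonal)
n_off <- n_pairs[upper.tri(n_pairs)]
c(min = min(n_off), median = median(n_off), max = max(n_off))
```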
``` r
set.seed(123)

## as a default this function outputs a correlation matrix plot
ggcorrmat(
  data = ggplot2::msleep,
  colors = c("#B2182B", "white", "#4D4D4D"),
  title = "Correlalogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)
```

![Correlation matrix heatmap for mammals sleep dataset showing pairwise correlations with significance indicators](reference/figures/README-ggcorrmat1-1.png)

**Defaults** return\
✅ effect size + significance\
✅ careful handling of `NA`s

If there are `NA`s present in the selected variables, the legend displays the minimum, median, and maximum number of pairs used for the correlation tests.

There is also a `grouped_` variant of this function that makes it easy to repeat the same operation across a **single** grouping variable:

``` r
set.seed(123)

grouped_ggcorrmat(
  data = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  type = "robust",
  colors = c("#cbac43", "white", "#550000"),
  grouping.var = genre,
  p.adjust.method = "fdr",
  matrix.type = "lower"
)
```

![Grouped correlation matrices for Action and Comedy movie genres showing robust correlations](reference/figures/README-ggcorrmat2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggpiestats()`

This function creates a pie chart for categorical or nominal variables, with results from a contingency table analysis (Pearson’s chi-squared test for between-subjects designs and McNemar’s chi-squared test for within-subjects designs) included in the subtitle of the plot. If only one categorical variable is entered, results from a one-sample proportion test (i.e., a chi-squared goodness-of-fit test) are displayed as the subtitle.
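For reference, the two between-subjects tests named above can be run directly in base R. This sketch uses the same `mtcars` variables as the examples that follow and shows Pearson's chi-squared test of association and a goodness-of-fit test against equal proportions (assumed here to be the default null hypothesis for a single variable). The McNemar case needs genuinely paired data and is omitted.

```r
# Contingency table of transmission (am) by number of cylinders (cyl)
tab <- table(am = mtcars$am, cyl = mtcars$cyl)

# Pearson's chi-squared test of association; with small expected counts R warns
# that the approximation may be inaccurate, so the warning is suppressed here
assoc <- suppressWarnings(chisq.test(tab))

# Goodness of fit: are the three cylinder counts consistent with equal proportions?
gof <- chisq.test(table(mtcars$cyl))

c(assoc_X2 = unname(assoc$statistic), assoc_df = unname(assoc$parameter),
  gof_X2 = unname(gof$statistic), gof_df = unname(gof$parameter))
```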
To study an association between two categorical variables:

``` r
set.seed(123)

ggpiestats(
  data = mtcars,
  x = am,
  y = cyl,
  palette = "wesanderson::Royal1",
  title = "Dataset: Motor Trend Car Road Tests",
  legend.title = "Transmission"
)
```

![Pie charts showing transmission type distribution across cylinder groups in mtcars data with contingency table analysis](reference/figures/README-ggpiestats1-1.png)

**Defaults** return\
✅ descriptives (frequency + %s)\
✅ inferential statistics\
✅ effect size + CIs\
✅ Goodness-of-fit tests\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

There is also a `grouped_` variant of this function that makes it easy to repeat the same operation across a **single** grouping variable. The following example is a case where the theoretical question is about proportions for different levels of a single nominal variable:

``` r
set.seed(123)

grouped_ggpiestats(
  data = mtcars,
  x = cyl,
  grouping.var = am,
  label.repel = TRUE,
  palette = "ggsci::default_ucscgb"
)
```

![Grouped pie charts showing cylinder distribution for automatic and manual transmission vehicles](reference/figures/README-ggpiestats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggbarstats()`

In case you are not a fan of pie charts (for very good reasons), you can alternatively use the [`ggbarstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbarstats.md) function, which has a similar syntax, including support for one-sample goodness-of-fit tests.
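In a stacked bar chart, each bar shows how the counts of `x` split up within one level of `y`. Assuming the percentage labels are computed within each `y` level (which is what bars stacking to 100% imply), they can be sketched in base R with `prop.table()` and `margin = 1`. The `mtcars` variables mirror the grouped example below and are only illustrative.

```r
# Rows: levels of y (cyl); columns: levels of x (am)
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

# Proportions within each row, so every bar sums to 100%
pct <- round(100 * prop.table(counts, margin = 1), 1)
pct
```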
To study an association between two categorical variables:

``` r
set.seed(123)
library(ggplot2)

ggbarstats(
  data = movies_long,
  x = mpaa,
  y = genre,
  title = "MPAA Ratings by Genre",
  xlab = "movie genre",
  legend.title = "MPAA rating",
  ggplot.component = list(ggplot2::scale_x_discrete(guide = ggplot2::guide_axis(n.dodge = 2))),
  palette = "RColorBrewer::Set2"
)
```

![Stacked bar chart showing MPAA ratings distribution by movie genre with chi-squared test results](reference/figures/README-ggbarstats1-1.png)

**Defaults** return\
✅ descriptives (frequency + %s)\
✅ inferential statistics\
✅ effect size + CIs\
✅ Goodness-of-fit tests\
✅ Bayesian hypothesis-testing\
✅ Bayesian estimation

There is also a `grouped_` variant of this function that makes it easy to repeat the same operation across a **single** grouping variable. The following example is a case where the theoretical question is about proportions for different levels of a single nominal variable:

``` r
set.seed(123)

grouped_ggbarstats(
  data = mtcars,
  x = cyl,
  grouping.var = am,
  label.repel = TRUE,
  palette = "ggsci::default_ucscgb"
)
```

![Grouped bar charts showing cylinder distribution for automatic and manual transmission vehicles](reference/figures/README-ggbarstats2-1.png)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### `ggcoefstats()`

The function [`ggcoefstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggcoefstats.md) generates **dot-and-whisker plots** for regression models. The tidy data frames are prepared using [`parameters::model_parameters()`](https://easystats.github.io/parameters/reference/model_parameters.html). Additionally, if available, model summary indices are extracted from [`performance::model_performance()`](https://easystats.github.io/performance/reference/model_performance.html).
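The tidy table behind a dot-and-whisker plot can be approximated with base R alone. The sketch below builds estimates, standard errors, 95% confidence intervals, and *p*-values for the same `lm()` model, plus AIC and BIC. It is a stand-in for, not a reproduction of, what `parameters::model_parameters()` and `performance::model_performance()` return, and the column names are chosen here for illustration.

```r
mod <- lm(mpg ~ am * cyl, data = mtcars)

# One row per term: the dot (estimate) and the whiskers (confidence interval)
ci <- confint(mod, level = 0.95)
coefs <- summary(mod)$coefficients
tidy <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  conf.low  = ci[, 1],
  conf.high = ci[, 2],
  p.value   = coefs[, "Pr(>|t|)"],
  row.names = NULL
)
tidy

# Model-level summary indices
c(AIC = AIC(mod), BIC = BIC(mod))
```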
``` r
set.seed(123)

## model
mod <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)

ggcoefstats(mod)
```

![Dot-and-whisker plot showing regression coefficients for mpg model with confidence intervals](reference/figures/README-ggcoefstats1-1.png)

**Defaults** return\
✅ inferential statistics\
✅ estimate + CIs\
✅ model summary (AIC and BIC)

Details about the underlying functions used to create the graphics and the statistical tests carried out can be found in the function documentation:

For more, also read the following vignette:

### Extracting expressions and data frames with statistical details

[ggstatsplot](https://www.indrapatil.com/ggstatsplot/) also offers convenience functions to extract the expressions displayed in [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) plots, as well as the data frames with statistical details used to create them.

``` r
set.seed(123)

p <- ggbetweenstats(mtcars, cyl, mpg)

# extracting expression present in the subtitle
extract_subtitle(p)
#> list(italic("F")["Welch"](2, 18.03) == "31.62", italic(p) ==
#>     "1.27e-06", widehat(omega["p"]^2) == "0.74", CI["95%"] ~
#>     "[" * "0.53", "1.00" * "]", italic("n")["obs"] == "32")

# extracting expression present in the caption
extract_caption(p)
#> list(log[e] * (BF["01"]) == "-14.92", widehat(italic(R^"2"))["Bayesian"]^"posterior" ==
#>     "0.71", CI["95%"]^HDI ~ "[" * "0.57", "0.79" * "]", italic("r")["Cauchy"]^"JZS" ==
#>     "0.71")

# a list of tibbles containing statistical analysis summaries
extract_stats(p)
#> $subtitle_data
#> # A tibble: 1 × 14
#> statistic df df.error p.value
#>
#> 1 31.6 2 18.0 0.00000127
#> method effectsize estimate
#>
#> 1 One-way analysis of means (not assuming equal variances) Omega2 0.744
#> conf.level conf.low conf.high conf.method conf.distribution n.obs expression
#>
#> 1 0.95 0.531 1 ncp F 32
#>
#> $caption_data
#> # A tibble: 6 × 17
#> term pd prior.distribution prior.location prior.scale bf10
#>
#> 1 mu 1 cauchy 0 0.707 3008850.
#> 2 cyl-4 1 cauchy 0 0.707 3008850.
#> 3 cyl-6 0.780 cauchy 0 0.707 3008850.
#> 4 cyl-8 1 cauchy 0 0.707 3008850.
#> 5 sig2 1 cauchy 0 0.707 3008850.
#> 6 g_cyl 1 cauchy 0 0.707 3008850.
#> method log_e_bf10 effectsize estimate std.dev
#>
#> 1 Bayes factors for linear models 14.9 Bayesian R-squared 0.714 0.0503
#> 2 Bayes factors for linear models 14.9 Bayesian R-squared 0.714 0.0503
#> 3 Bayes factors for linear models 14.9 Bayesian R-squared 0.714 0.0503
#> 4 Bayes factors for linear models 14.9 Bayesian R-squared 0.714 0.0503
#> 5 Bayes factors for linear models 14.9 Bayesian R-squared 0.714 0.0503
#> 6 Bayes factors for linear models 14.9 Bayesian R-squared 0.714 0.0503
#> conf.level conf.low conf.high conf.method n.obs expression
#>
#> 1 0.95 0.574 0.788 HDI 32
#> 2 0.95 0.574 0.788 HDI 32
#> 3 0.95 0.574 0.788 HDI 32
#> 4 0.95 0.574 0.788 HDI 32
#> 5 0.95 0.574 0.788 HDI 32
#> 6 0.95 0.574 0.788 HDI 32
#>
#> $pairwise_comparisons_data
#> # A tibble: 3 × 9
#> group1 group2 statistic p.value alternative distribution p.adjust.method
#>
#> 1 4 6 -6.67 0.00110 two.sided q Holm
#> 2 4 8 -10.7 0.0000140 two.sided q Holm
#> 3 6 8 -7.48 0.000257 two.sided q Holm
#> test expression
#>
#> 1 Games-Howell
#> 2 Games-Howell
#> 3 Games-Howell
#>
#> $descriptive_data
#> NULL
#>
#> $one_sample_data
#> NULL
#>
#> $tidy_data
#> NULL
#>
#> $glance_data
#> NULL
#>
#> attr(,"class")
#> [1] "ggstatsplot_stats" "list"
```

Note that all of this analysis is carried out by the [statsExpressions](https://www.indrapatil.com/statsExpressions/) package:

### Using `{ggstatsplot}` statistical details with custom plots

Sometimes you may not like the default plots produced by [ggstatsplot](https://www.indrapatil.com/ggstatsplot/). In such cases, you can use other **custom** plots (from [ggplot2](https://ggplot2.tidyverse.org) or other plotting packages) and still use [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) functions to display results from the relevant statistical test.
For example, in the following chunk, we create our own plot using the [ggplot2](https://ggplot2.tidyverse.org) package and use a [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) function to extract the expression:

``` r
## loading the needed libraries
library(ggplot2)
library(ggstatsplot)

set.seed(123)

## using `{ggstatsplot}` to get expression with statistical results
stats_results <- ggbetweenstats(morley, Expt, Speed) |> extract_subtitle()

## creating a custom plot of our choosing
ggplot(morley, aes(x = as.factor(Expt), y = Speed)) +
  geom_boxplot() +
  labs(
    title = "Michelson-Morley experiments",
    subtitle = stats_results,
    x = "Experiment number",
    y = "Speed of light"
  )
```

![Custom boxplot of Michelson-Morley experiment data with ggstatsplot-generated statistical subtitle](reference/figures/README-customplot-1.png)

## Summary of benefits of using `{ggstatsplot}`

- No need to use scores of packages for statistical analysis (e.g., one to get stats, one to get effect sizes, another to get Bayes Factors, and yet another to get pairwise comparisons).
- Minimal amount of code needed for all functions (typically only `data`, `x`, and `y`), which minimizes the chance of errors and makes for tidy scripts.
- Conveniently toggle between statistical approaches.
- Truly makes your figures worth a thousand words.
- No need to copy-paste results into a text editor (e.g., MS Word).
- Disembodied figures stand on their own and are easy for the reader to evaluate.
- More breathing room for theoretical discussion and other text.
- No need to worry about updating figures and statistical details separately.

## Misconceptions about `{ggstatsplot}`

This package is…

❌ an alternative to learning [ggplot2](https://ggplot2.tidyverse.org)\
✅ (The better you know [ggplot2](https://ggplot2.tidyverse.org), the more you can modify the defaults to your liking.)
❌ meant to be used in talks/presentations\
✅ (Default plots can be too complicated for effectively communicating results in time-constrained presentation settings, e.g. conference talks.)

❌ the only game in town\
✅ (GUI software alternatives: [JASP](https://jasp-stats.org/) and [jamovi](https://www.jamovi.org/)).

## Extensions

In case you use the GUI software [`jamovi`](https://www.jamovi.org/), you can install a module called [`jjstatsplot`](https://github.com/sbalci/jjstatsplot), which is a wrapper around [ggstatsplot](https://www.indrapatil.com/ggstatsplot/).

# Package index

## Hypothesis about group differences

- [`ggbetweenstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbetweenstats.md) : Box/Violin plots for between-subjects comparisons
- [`ggwithinstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggwithinstats.md) : Box/Violin plots for repeated measures comparisons
- [`gghistostats()`](https://www.indrapatil.com/ggstatsplot/reference/gghistostats.md) : Histogram for distribution of a numeric variable
- [`ggdotplotstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggdotplotstats.md) : Dot plot/chart for labeled numeric data.
## Hypothesis about correlation

- [`ggscatterstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggscatterstats.md) : Scatterplot with marginal distributions and statistical results
- [`ggcorrmat()`](https://www.indrapatil.com/ggstatsplot/reference/ggcorrmat.md) : Visualization of a correlation matrix

## Hypothesis about composition of categorical variables

- [`ggpiestats()`](https://www.indrapatil.com/ggstatsplot/reference/ggpiestats.md) : Pie charts with statistical tests
- [`ggbarstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggbarstats.md) : Stacked bar charts with statistical tests

## Hypothesis about regression coefficients

- [`ggcoefstats()`](https://www.indrapatil.com/ggstatsplot/reference/ggcoefstats.md) : Dot-and-whisker plots for regression analyses

## Grouped variants of primary functions

Convenience functions to repeat analysis across a single grouping variable

- [`grouped_ggbarstats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggbarstats.md) : Grouped bar charts with statistical tests
- [`grouped_ggbetweenstats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggbetweenstats.md) : Violin plots for group or condition comparisons in between-subjects designs repeated across all levels of a grouping variable.
- [`grouped_ggcorrmat()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggcorrmat.md) : Visualization of a correlalogram (or correlation matrix) for all levels of a grouping variable
- [`grouped_ggdotplotstats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggdotplotstats.md) : Grouped dot plots for distribution of a labeled numeric variable
- [`grouped_gghistostats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_gghistostats.md) : Grouped histograms for distribution of a numeric variable
- [`grouped_ggpiestats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggpiestats.md) : Grouped pie charts with statistical tests
- [`grouped_ggscatterstats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggscatterstats.md) : Scatterplot with marginal distributions for all levels of a grouping variable
- [`grouped_ggwithinstats()`](https://www.indrapatil.com/ggstatsplot/reference/grouped_ggwithinstats.md) : Violin plots for group or condition comparisons in within-subjects designs repeated across all levels of a grouping variable.

## Helper functions

Statistics and graphics-related helpers

- [`extract_stats()`](https://www.indrapatil.com/ggstatsplot/reference/extract_stats.md), [`extract_subtitle()`](https://www.indrapatil.com/ggstatsplot/reference/extract_stats.md), [`extract_caption()`](https://www.indrapatil.com/ggstatsplot/reference/extract_stats.md) : Extracting data frames or expressions from [ggstatsplot](https://www.indrapatil.com/ggstatsplot/) plots
- [`theme_ggstatsplot()`](https://www.indrapatil.com/ggstatsplot/reference/theme_ggstatsplot.md) : Default theme used in [ggstatsplot](https://www.indrapatil.com/ggstatsplot/)
- [`combine_plots()`](https://www.indrapatil.com/ggstatsplot/reference/combine_plots.md) : Combining and arranging multiple plots in a grid

## Data

Datasets included in the package.
- [`movies_long`](https://www.indrapatil.com/ggstatsplot/reference/movies_long.md) : Movie information and user ratings from IMDB.com (long format).
- [`Titanic_full`](https://www.indrapatil.com/ggstatsplot/reference/Titanic_full.md) : Titanic dataset.
- [`iris_long`](https://www.indrapatil.com/ggstatsplot/reference/iris_long.md) : Edgar Anderson's Iris Data in long format.
- [`bugs_long`](https://www.indrapatil.com/ggstatsplot/reference/bugs_long.md) : Tidy version of the "Bugs" dataset.

# Articles

### Primary functions

Details about primary functions and their `grouped_` variants

- [ggbetweenstats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggbetweenstats.md)
- [ggwithinstats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggwithinstats.md)
- [gghistostats](https://www.indrapatil.com/ggstatsplot/articles/web_only/gghistostats.md)
- [ggdotplotstats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggdotplotstats.md)
- [ggcorrmat](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggcorrmat.md)
- [ggscatterstats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggscatterstats.md)
- [ggbarstats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggbarstats.md)
- [ggpiestats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggpiestats.md)
- [ggcoefstats](https://www.indrapatil.com/ggstatsplot/articles/web_only/ggcoefstats.md)

### Miscellaneous

Salad of various things

- [Additional vignettes](https://www.indrapatil.com/ggstatsplot/articles/additional.md)
- [Pairwise comparisons with `{ggstatsplot}`](https://www.indrapatil.com/ggstatsplot/articles/web_only/pairwise.md)
- [Frequently Asked Questions (FAQ)](https://www.indrapatil.com/ggstatsplot/articles/web_only/faq.md)
- [Using 'ggstatsplot' with the 'purrr' package](https://www.indrapatil.com/ggstatsplot/articles/web_only/purrr_examples.md)
- [Graphic design and statistical reporting principles](https://www.indrapatil.com/ggstatsplot/articles/web_only/principles.md)
- [Interpretation of Bayes Factors](https://www.indrapatil.com/ggstatsplot/articles/web_only/interpretation.md)