ℹ️ This is the R package “tidypolars”. The Python one is here: markfairbanks/tidypolars
Overview
tidypolars provides a polars backend for the tidyverse. The aim of tidypolars is to enable users to keep their existing tidyverse code while using polars in the background to benefit from large performance gains. The only thing that needs to change is the way data is imported in the R session.
See the “Getting started” vignette for a gentle introduction to tidypolars.
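As a minimal sketch of what "changing only the import" can look like, the snippet below converts the built-in `iris` data.frame to a polars LazyFrame with `as_polars_lf()` (the same conversion used in the benchmark further down) and then applies ordinary dplyr verbs. The object names and the `ratio` column are illustrative only, and the final `as.data.frame()` step assumes you want a regular R data.frame back.

```r
library(dplyr, warn.conflicts = FALSE)
library(polars)
library(tidypolars)

# The only polars-specific step: turn the data.frame into a polars LazyFrame.
iris_pl <- as_polars_lf(iris)

# From here on the code is plain dplyr syntax; tidypolars translates it
# into polars expressions in the background.
result <- iris_pl |>
  filter(Sepal.Length > 5) |>
  mutate(ratio = Petal.Length / Petal.Width) |> # `ratio` is a made-up column
  compute() # run the lazy query, as in the benchmark

# Convert back to a standard data.frame if downstream code needs one.
result_df <- as.data.frame(result)
```

Replacing `iris_pl` with `iris` and dropping `compute()` should give the equivalent plain dplyr pipeline, which is the sense in which existing code can be kept.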
Since most of the work is rewriting tidyverse code into polars syntax, tidypolars and polars have very similar performance.
The main purpose of this small benchmark is to show that polars and tidypolars perform similarly and to give an idea of the performance. For more thorough, representative benchmarks about polars, take a look at DuckDB benchmarks instead.
library(collapse, warn.conflicts = FALSE)
#> collapse 2.1.1, see ?`collapse-package` or ?`collapse-documentation`
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(polars)
library(tidypolars)
large_iris <- data.table::rbindlist(rep(list(iris), 100000))
large_iris_pl <- as_polars_lf(large_iris)
large_iris_dt <- lazy_dt(large_iris)
format(nrow(large_iris), big.mark = ",")
#> [1] "15,000,000"
bench::mark(
  polars = {
    large_iris_pl$
      select(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))$
      with_columns(
        pl$when(
          (pl$col("Petal.Length") / pl$col("Petal.Width") > 3)
        )$then(pl$lit("long"))$
          otherwise(pl$lit("large"))$
          alias("petal_type")
      )$
      filter(pl$col("Sepal.Length")$is_between(4.5, 5.5))$
      collect()
  },
  tidypolars = {
    large_iris_pl |>
      select(starts_with(c("Sep", "Pet"))) |>
      mutate(
        petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
      ) |>
      filter(between(Sepal.Length, 4.5, 5.5)) |>
      compute()
  },
  dplyr = {
    large_iris |>
      select(starts_with(c("Sep", "Pet"))) |>
      mutate(
        petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
      ) |>
      filter(between(Sepal.Length, 4.5, 5.5))
  },
  dtplyr = {
    large_iris_dt |>
      select(starts_with(c("Sep", "Pet"))) |>
      mutate(
        petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
      ) |>
      filter(between(Sepal.Length, 4.5, 5.5)) |>
      as.data.frame()
  },
  collapse = {
    large_iris |>
      fselect(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")) |>
      fmutate(
        petal_type = data.table::fifelse((Petal.Length / Petal.Width) > 3, "long", "large")
      ) |>
      fsubset(Sepal.Length >= 4.5 & Sepal.Length <= 5.5)
  },
  check = FALSE,
  iterations = 40
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 5 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars     103.47ms 121.17ms     7.30     2.29MB    0.183
#> 2 tidypolars 109.71ms 140.55ms     6.03     2.19MB    0.754
#> 3 dplyr         3.04s    3.25s     0.306    1.79GB    0.942
#> 4 dtplyr     781.41ms 945.32ms     1.03     1.72GB    2.49
#> 5 collapse   323.83ms 469.67ms     2.02   745.96MB    1.36
# NOTE: do NOT take the "mem_alloc" results into account.
# `bench::mark()` doesn't report the accurate memory usage for packages calling
# Rust code.
If you want to run your own benchmarks, please read “How to benchmark tidypolars” first for some best practices.
Installation
tidypolars is built on polars, which is not available on CRAN. This means that tidypolars also can’t be on CRAN. However, you can install it from R-multiverse.
Sys.setenv(NOT_CRAN = "true")
install.packages("tidypolars", repos = c("https://community.r-multiverse.org", 'https://cloud.r-project.org'))
The development version contains the latest improvements and bug fixes:
# install.packages("remotes")
remotes::install_github("etiennebacher/tidypolars")
Contributing
Did you find some bugs or some errors in the documentation? Do you want tidypolars to support more functions?
Take a look at the contributing guide for instructions on bug reports and pull requests.
Acknowledgements
The website theme was heavily inspired by Matthew Kay’s ggblend package: https://mjskay.github.io/ggblend/.
The package hex logo was created by Hubert Hałun as part of the Appsilon Hex Contest.