tidypolars 0.15.0
Breaking changes
- For consistency with dplyr, distinct() now only keeps the selected columns. To keep all columns, use .keep_all = TRUE (#227, @ppanko).
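The distinct() change above can be illustrated with a minimal sketch. It assumes dplyr and tidypolars >= 0.15.0 are installed and that the data is created with polars::as_polars_df(); the object names (pl_df, only_x, all_cols) are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

# Illustrative data: x has a duplicated value, y differs on every row
pl_df <- polars::as_polars_df(
  data.frame(x = c(1, 1, 2), y = c("a", "b", "c"))
)

# Since 0.15.0, only the selected column is returned
only_x <- distinct(pl_df, x)

# .keep_all = TRUE keeps every column, with one row per distinct x
all_cols <- distinct(pl_df, x, .keep_all = TRUE)

names(as.data.frame(only_x))
names(as.data.frame(all_cols))
```

Code that relied on the old behavior (all columns kept) should add .keep_all = TRUE.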
New features
- New argument mkdir in all sink_*() functions to recursively create the folder(s) specified in the path(s) to files (#236).
- New functions partition_by_key() and partition_by_max_size() that can be used in the path argument of sink_*() functions. Those enable writing a LazyFrame to several files as partitioned output. See more details in ?sink_parquet() (#237).
- bind_cols_polars() now works with more than two LazyFrames (#244).
- Add partial support for stringr::str_equal() (#228).
- Add support for lubridate functions rollbackward(), rollback(), and rollforward() (#252).
- Support stringr::fixed() in more stringr functions (#250).
- Add support for argument .keep_all in distinct() (#227, @ppanko).
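The new mkdir argument can be sketched as follows. This assumes tidypolars >= 0.15.0 and that sink_parquet() takes the LazyFrame and the output path positionally, with mkdir passed by name (other sink_*() arguments must be named since 0.10.0). The nested folder under tempdir() is a hypothetical location that does not exist beforehand.

```r
library(tidypolars)

lf <- polars::as_polars_lf(mtcars)

# Hypothetical nested output folder that does not exist yet
out_file <- file.path(tempdir(), "nested", "folder", "mtcars.parquet")

# mkdir = TRUE asks sink_parquet() to create "nested/folder" before writing
sink_parquet(lf, out_file, mkdir = TRUE)

file.exists(out_file)
```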
Bug fixes
- Better error message in group_by() for unsupported argument .drop (#230).
- Better error message in group_by() when passing named expressions in the ... argument. dplyr supports those, but it is increasingly recommended to use the .by/by argument in individual functions rather than group_by() and ungroup() (#238).
- Better error message in count() when passing named expressions in the ... argument (#239).
- Fix bug in join_where() when all common column names between two DataFrames are used in the join conditions (#254).
- Using %in% with NA now retains the NA in the data. Using %in% NA will error (#256).
- Remove occasional deprecation message coming from Polars when using %in% (#259, @ppanko).
- Better handling of functions prefixed with <pkg>:: (#261).
- Fix wrong behavior of paste() and paste0() with collapse (#263).
tidypolars 0.14.1
- tidypolars requires polars >= 1.1.0 (#222).
tidypolars 0.14.0
- tidypolars requires polars >= 1.0.0. This release of polars contains many breaking changes. Those should be invisible to tidypolars users, with the exception of deprecation messages (see below). However, if your code contains user-defined functions that use polars syntax, you may need to revise those (#194).
Deprecations and breaking changes
- The following arguments are deprecated and will be removed in a future version. The recommended replacement is indicated on the right of the arrow (#194):
  - in compute() and collect(): streaming -> engine;
  - in read_csv_polars() and scan_csv_polars():
    - dtypes -> schema_overrides
    - reuse_downloaded -> no replacement
  - in read_ndjson_polars() and scan_ndjson_polars():
    - reuse_downloaded -> no replacement
  - in read_ipc_polars() and scan_ipc_polars():
    - memory_map -> no replacement
  - in write_csv_polars() and sink_csv():
    - null_values -> null_value
    - quote -> quote_char
  - in write_ndjson_polars():
    - pretty -> no replacement
    - row_oriented -> no replacement
  - in write_ipc_polars():
    - future -> compat_level
- fetch() is deprecated, use head() before collect() instead (#194).
- group_keys() now returns a tibble and not a data.frame anymore (#194).
- lubridate::make_date(), lubridate::make_datetime(), and ISOdatetime() now error if some components go over their expected range, e.g. month = 20 or hour = 25. Before, those functions were returning NA in this situation (#194).
- summary() returns an additional row for the 50% percentile (#194).
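The recommended replacement for fetch() can be sketched like this, assuming dplyr and tidypolars >= 0.14.0 are installed. The filter condition and the number of rows are arbitrary choices for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

lf <- polars::as_polars_lf(mtcars)

# Instead of the deprecated fetch(): limit the rows, then collect
first_rows <- lf |>
  filter(cyl == 4) |>
  head(n = 3) |>
  collect()

first_rows
```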
New features
- Added support for various lubridate functions:
  - force_tz() and with_tz() (@atsyplenkov, #170);
  - date() (@atsyplenkov, #181);
  - today() and now() (#183);
  - weeks(), days(), hours(), minutes(), seconds(), milliseconds(), microseconds(), nanoseconds() (#184).
- tidypolars can now use expressions that contain non-translated functions if those expressions do not use columns from the data. Example: agrep() is not a translated function, so using it in filter() used to error:

      Error in `filter()`:
      ! `tidypolars` doesn't know how to translate this function: `agrep()`.

  However, we see that agrep("a", a) doesn’t use any column but instead an object in the environment, so it can be evaluated without caring whether tidypolars knows this function or not:

      shape: (1, 1)
      ┌─────┐
      │ foo │
      │ --- │
      │ f64 │
      ╞═════╡
      │ 2.0 │
      └─────┘

  Note that this is evaluated before running polars in the background, so this expression can’t benefit from polars parallel evaluation, for instance. Thanks @mgacc0 for the suggestion. Error messages due to untranslated functions now suggest opening an issue to ask for their translation (#197).
- Add support for %>% in expressions (#200).
- Add support for dplyr::tally() (#203).
- count() and add_count() now warn or error when argument wt is used since it is not supported. The behavior depends on the global option tidypolars_unknown_args (#204).
- tidypolars has experimental support for fallback to R when a function is not internally translated to polars syntax. The default behavior is still to error, but the user can now set options(tidypolars_fallback_to_r = TRUE) to handle those unknown functions. See ?tidypolars_options for details on the drawbacks of this approach (#205).
- Large performance improvement when using selection helpers (such as contains()) on data with many columns (#211).
- tidypolars now exports rules to be used with flir for detecting deprecated functions describe_plan() and describe_optimized_plan(). Those can be used in your project by following this article. Note that this requires flir 0.5.0.9000 or higher (#214).
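The interaction between count(wt = ) and the tidypolars_unknown_args option can be sketched as below. It assumes tidypolars >= 0.14.0; the data and column names are illustrative. The option is set temporarily and restored afterwards.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

pl_df <- polars::as_polars_df(
  data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))
)

# Turn the default warning for unsupported arguments into an error
old <- options(tidypolars_unknown_args = "error")

# wt is not supported by count(), so this call is expected to error
res <- tryCatch(count(pl_df, g, wt = x), error = function(e) "error")
res

# Restore the previous value of the option
options(old)
```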
tidypolars 0.13.0
New features
- Added support for stringr::str_replace_na() (#153).
- Better checks for unknown and unsupported arguments in compute(), collect(), *_join(), pivot_*(), sink_*(), slice_sample() and uncount() (#158, thanks @fkohrt for the report). Now, when those functions receive:
  - an argument that exists in the tidyverse implementation but is not supported by tidypolars, they warn the user. This default behaviour can be changed to error instead with options(tidypolars_unknown_args = "error").
  - an argument that doesn’t exist at all, they error.
- Add support for argument explicit in tidyr::complete().
- Add option to keep track of filenames in scan_csv_polars() (#171, @ginolhac).
- Add partial support for seq() (argument length.out is not supported) and seq_len().
- complete() now accepts named elements, e.g. complete(df, group, value = 1:4) (#176).
- Add support for several lubridate functions:
  - am(), pm(), leap_year(), days_in_month() (#178).
Bug fixes
- Fix edge cases in the tidypolars implementation of stringr::str_sub() and substr() compared to their original implementation (#159).
- arrange() now places NA values last, like dplyr.
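The arrange() fix can be checked with a small sketch, assuming tidypolars >= 0.13.0 and a numeric column containing one missing value (illustrative data).

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

pl_df <- polars::as_polars_df(data.frame(x = c(2, NA, 1)))

# Missing values are placed last, as in dplyr::arrange()
sorted <- as.data.frame(arrange(pl_df, x))
sorted$x
```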
tidypolars 0.12.0
tidypolars requires polars >= 0.21.0.
Breaking changes
- summarize() now drops the last group of the output by default (for consistency with dplyr). Previously it kept the same groups as in the input data (#149).
New features
- Add support for argument .groups in summarize(). Value "rowwise" is not supported for now (#149).
- Added support for dplyr::lead(). In dplyr::lead() and dplyr::lag(), the arguments default and order_by are now supported (#151).
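The default argument of lead() and lag() can be sketched as follows, assuming tidypolars >= 0.12.0. The column names prev and nxt are illustrative, and 0 is an arbitrary fill value for the positions that have no neighbor.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

pl_df <- polars::as_polars_df(data.frame(x = c(1, 2, 3)))

out <- pl_df |>
  mutate(
    prev = lag(x, default = 0),  # first row has no previous value, so 0
    nxt = lead(x, default = 0)   # last row has no next value, so 0
  ) |>
  as.data.frame()

out
```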
tidypolars 0.11.0
tidypolars requires polars >= 0.20.0.
Breaking changes
- arrange() now errors with unknown variable names (like dplyr::arrange()). Previously, unknown variables were silently ignored. Using expressions (like a + b) is now accepted (#144).
- The parameter inherit_optimization is removed from all sink_*() functions.
New features
- The power operators ^ and ** now work.
- New function sink_ndjson() to write the results of a lazy query to an NDJSON file without collecting it in memory.
- inner_join() now accepts inequality joins in the by argument, including the following helpers: between(), overlaps(), within() (#148).
Bug fixes
- Using an external object in case_when(), ifelse() and if_else() now works.
- str_sub() doesn’t error anymore when start is positive and end is negative.
- read_*_polars() functions used to return a standard data.frame by mistake. They now return a Polars DataFrame.
- Using [ for subsetting in expressions now works. Thanks @ginolhac for the report (#141).
- bind_cols_polars() and bind_rows_polars() now error (as expected before) if elements are a mix of Polars DataFrames and LazyFrames.
tidypolars 0.10.1
Bug fixes
- Do not error when handling columns with datatype Null. Note that converting those columns to R with as.data.frame(), as_tibble(), or collect() is still an issue as of polars 0.19.1.
tidypolars 0.10.0
tidypolars requires polars >= 0.19.1.
Breaking changes and deprecations
- describe() is deprecated as of tidypolars 0.10.0 and will be removed in a future update. Use summary() with the same arguments instead (#127).
- describe_plan() and describe_optimized_plan() are deprecated as of tidypolars 0.10.0 and will be removed in a future update. Use explain() with optimized = TRUE/FALSE instead (#128).
- In sink_parquet() and sink_csv(), all arguments except for .data and path must be named (#136).
New features
- Add support for more functions:
  - from package base: substr().
- Better error message when a function can come from several packages but only one version is translated (#130).
- row_number() now works without argument (#131).
- New functions to import data as Polars DataFrames and LazyFrames (#136):
  - read_<format>_polars() to import data as a Polars DataFrame;
  - scan_<format>_polars() to import data as a Polars LazyFrame;
  - <format> can be “csv”, “ipc”, “json”, “parquet”.

  Those can replace functions from polars. For example, polars::pl$read_parquet(...) can be replaced by read_parquet_polars(...).
- New functions to write Polars DataFrames to external files: write_<format>_polars() where <format> can be “csv”, “ipc”, “json”, “ndjson”, “parquet” (#136).
- New function sink_ipc() that is similar to sink_parquet() and sink_csv() but for IPC files (#136).
- across() now throws a better error message when the user passes an external list to .fns. This works with dplyr but cannot work with tidypolars (#135).
- Added support for argument .add in group_by().
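A round trip through the new import and export functions might look like this. It is a sketch assuming tidypolars >= 0.10.0, that the output path is passed as the second positional argument of write_parquet_polars(), and that the file path is the first positional argument of read_parquet_polars() and scan_parquet_polars(). The temporary file is only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

pl_df <- polars::as_polars_df(mtcars)
path <- tempfile(fileext = ".parquet")

# Write a Polars DataFrame to a Parquet file
write_parquet_polars(pl_df, path)

# Read it back eagerly (DataFrame) or lazily (LazyFrame)
eager <- read_parquet_polars(path)
lazy <- scan_parquet_polars(path)

nrow(as.data.frame(eager))
nrow(collect(lazy))
```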
Bug fixes
- stringr::str_sub() now works when both start and end are negative.
- Fixed a bug in str_sub() when start was greater than 1.
- stringr::str_starts() and stringr::str_ends() now work with a regex.
- fill() doesn’t error anymore when ... is empty. Instead, it returns the input data.
- unite() now provides a proper error message when col is missing.
- unite() doesn’t error anymore when ... is empty. Instead, it uses all variables in the dataset.
- filter(), mutate() and summarize() now work when using a column from another data.frame.
- replace_na() no longer converts the column to the datatype of the replacement, e.g. data |> replace_na("a") will error if the input data is numeric.
- n_distinct() now correctly applies the na.rm argument when several columns are passed as input (#137).
tidypolars 0.9.0
tidypolars requires polars >= 0.18.0.
New features
- Add support for several functions:
  - from package base: %% and %/%.
  - from package dplyr: dense_rank(), row_number().
  - from package lubridate: wday().
- Better handling of missing values to match R behavior. In the following functions, if there is at least one missing value and na.rm = FALSE (the default), then the output will be NA: max(), mean(), median(), min(), sd(), sum(), var() (#120).
- New argument cluster_with_columns in collect(), compute(), and fetch().
- Add a global option tidypolars_unknown_args to control what happens when tidypolars doesn’t know how to handle an argument in a function. The default is to warn and the only other accepted value is "error".
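The missing-value handling described above can be sketched with mean(), assuming tidypolars >= 0.9.0 and an illustrative column with one NA.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

pl_df <- polars::as_polars_df(data.frame(x = c(1, 2, NA)))

res <- pl_df |>
  summarize(
    m_default = mean(x),            # na.rm = FALSE by default, so NA
    m_drop = mean(x, na.rm = TRUE)  # missing value dropped, so 1.5
  ) |>
  as.data.frame()

res
```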
Bug fixes
- count() and add_count() no longer overwrite a variable named n if the argument name is unspecified.
tidypolars 0.8.0
tidypolars requires polars >= 0.17.0.
Breaking changes
- As announced in tidypolars 0.7.0, the behavior of collect() has changed. It now returns a standard R data.frame and not a Polars DataFrame anymore. Replace collect() by compute() (with the same arguments) to keep the old behavior.
- In bind_rows_polars(), if .id is passed, the resulting column is now of type character instead of integer.
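The difference between compute() and collect() can be sketched as follows, assuming tidypolars >= 0.8.0. Both calls run the same lazy query; only the class of the result differs.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

lf <- polars::as_polars_lf(mtcars) |>
  filter(cyl == 4)

# compute() runs the query and keeps the result as a Polars DataFrame
pl_result <- compute(lf)

# collect() runs the query and converts the result to an R data frame
r_result <- collect(lf)

class(pl_result)
class(r_result)
```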
New features
- Add support for several functions:
  - from package base: all(), any(), diff(), ISOdatetime(), length(), rev(), unique().
  - from package dplyr: consecutive_id(), min_rank(), na_if(), n_distinct(), nth().
  - from package lubridate: make_datetime().
  - from package stringr: str_dup(), str_split(), str_split_i(), str_trunc().
  - from package tidyr: replace_na() (the data.frame method was already translated but not the vector one that can be used in mutate() for example).
- It is now possible to use explicit namespaces (such as dplyr::first() instead of first()) in mutate(), summarize() and filter() (#114).
- In bind_rows_polars(), if all elements are named and .id is specified, the .id column will use the names of the elements (#116).
- Add support for argument na_matches in all join functions (except cross_join() that doesn’t need it) (#109).
Bug fixes
- Local variables in custom functions could not be used in tidypolars functions (reported in a blog post of Art Steinmetz). This is now fixed.
- across() now works when .cols contains only one variable and .fns contains only one function.
- In across(), the .cols argument now takes into account variables created in the same mutate() or summarize() call before across().

      as_polars_df(mtcars) |>
        head(n = 3) |>
        mutate(
          foo = 1,
          across(.cols = contains("oo"), \(x) x - 1)
        )

      shape: (3, 12)
      ┌──────┬─────┬───────┬───────┬───┬─────┬──────┬──────┬─────┐
      │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ am  ┆ gear ┆ carb ┆ foo │
      │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ ---  ┆ ---  ┆ --- │
      │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64  ┆ f64  ┆ f64 │
      ╞══════╪═════╪═══════╪═══════╪═══╪═════╪══════╪══════╪═════╡
      │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
      │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
      │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 4.0  ┆ 1.0  ┆ 0.0 │
      └──────┴─────┴───────┴───────┴───┴─────┴──────┴──────┴─────┘

  Note that the where() function is not supported here. For example:

      as_polars_df(mtcars) |>
        mutate(
          foo = 1,
          across(.cols = where(is.numeric), \(x) x - 1)
        )

  will not return 0 for the variable foo. A warning is emitted about this behavior.
- Better handling of negative values in c() when called in mutate() and summarize().
tidypolars 0.7.0
tidypolars requires polars >= 0.16.0.
Breaking changes and deprecations
- as_polars() is now removed. It was deprecated in 0.6.0. Use as_polars_df() or as_polars_lf() instead.
- to_r() is now removed. It was deprecated in 0.6.0. Use as.data.frame() or as_tibble() instead.
- For consistency with dplyr, the behavior of collect() will change in 0.8.0 as it will perform the lazy query and convert the result to a standard data.frame. For now, collect() only throws a warning about this future change. It is recommended to use compute() to only perform the query and get a Polars DataFrame as output (#101).
New features
- Several improvements and changes for pivot_wider() (#95):
  - names_from can now take several variables;
  - add support for id_cols and names_glue;
  - the default value of names_sep is now _, for consistency with tidyr;
  - fix documentation as pivot_wider() doesn’t work on LazyFrame.
- Add support for stringr::regex(). Note that only the argument ignore_case is supported for now (#97).
- Add support for several lubridate functions: dweeks(), ddays(), dhours(), dminutes(), dseconds(), dmilliseconds(), make_date() (#107).
- When a polars function called internally fails, the original error message is now displayed.
- Add support for group_split() (for DataFrame only).
- Add support for argument relationship in left_join(), right_join(), full_join() and inner_join() (#106).
tidypolars 0.6.0
tidypolars requires polars >= 0.15.0.
Breaking changes and deprecations
- as_polars() is deprecated and will be removed in 0.7.0. Use as_polars_lf() or as_polars_df() instead.
- as_polars() doesn’t have an argument with_string_cache anymore. When set to TRUE, this enabled the string cache globally, which could lead to undesirable side effects.
- to_r() is deprecated and will be removed in 0.7.0. Use as.data.frame() or as_tibble() instead. This used to silently return a LazyFrame if the input was LazyFrame. It now automatically collects the LazyFrame (#88).
New features
- Add support for group_vars() and group_keys() (#81).
- Experimental support of rowwise(). For now, this is limited to a few functions: mean(), median(), min(), max(), sum(), all(), any(). rowwise() and group_by() cannot be used at the same time (#40).
- All functions that return a polars Data/LazyFrame now add the class "tidypolars" to the output (#86).
- Support which.min(), which.max(), dplyr::n().
- Support .data[[ and .env[[ in addition to .data$ and .env$. Better error messages when the objects specified in .data or .env don’t exist.
Bug fixes
- pull() now errors when var is of length > 1.
tidypolars 0.5.0
tidypolars requires polars >= 0.12.0.
Breaking changes
- across() now errors if the argument .cols is not provided (either named or unnamed). This behavior was deprecated in dplyr 1.1.0.
- It is no longer possible to use ! in arrange() to sort by decreasing order, for compatibility with dplyr::arrange(). Use - or desc() instead.
New features
- summarize() now works on ungrouped data and returns a 1-row output.
- It is now possible to use desc(x1) in arrange() to sort in decreasing order of x1 (this is equivalent to -x1).
- Add support for argument names_prefix in pivot_longer().
- Add support for arguments names_prefix and names_sep in pivot_wider().
- Add support for tidyr::uncount().
- All *_join() functions now work when by is a specification created by dplyr::join_by(). Notice that this is limited to equality joins for now.
- You can now use the “embrace” operator {{ }} to pass unquoted column names (among other things) as arguments of custom functions. See the “Programming with dplyr” vignette for some examples.
- bind_cols_polars() now works with two LazyFrames, but not more.
- Add support for argument .name_repair in bind_cols_polars() (#74).
- Support for .env$ and .data$ pronouns in expressions of filter(), mutate() and summarize().
- Support named vector in the argument pattern of str_replace_all(), where names are patterns and values are replacements.
- Using %in% for factor variables doesn’t require enabling the string cache anymore.
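The embrace operator and the .data/.env pronouns can be combined as in the sketch below, assuming tidypolars >= 0.5.0. The helper mean_of() and the threshold value are hypothetical and only serve to show the syntax.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

# Hypothetical helper: the column is passed unquoted and forwarded with {{ }}
mean_of <- function(data, col) {
  data |> summarize(m = mean({{ col }}))
}

threshold <- 20
pl_df <- polars::as_polars_df(mtcars)

res <- pl_df |>
  # .data$ refers to a column, .env$ to an object in the environment
  filter(.data$mpg > .env$threshold) |>
  mean_of(mpg) |>
  as.data.frame()

res$m
```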
Bug fixes
- summarize() no longer errors when across(everything(), ...) is used with .by.
- All *_join() functions no longer error when a named vector is provided in the argument by.
- Expressions with values only are not named “literal” anymore.
tidypolars 0.4.0
tidypolars requires polars >= 0.11.0.
Breaking changes
- It is no longer possible to pass a list in rename().
New features
- The argument with_string_cache in as_polars() now enables the string cache globally if set to TRUE (#54).
- Better error message in filter() when comparing factors to strings while the string cache is disabled.
- Basic support for strptime(). It is possible to use strptime(*, strict = FALSE) to not error when the parsing of some characters fails.
- New argument .by in filter(), mutate(), and summarize(), and new argument by in the slice_*() functions. This allows performing operations on groups without using group_by() and ungroup(). See the dplyr vignette for more information (#59).
- rename() now accepts unquoted names for both old and new names.
- Support fixed regexes in str_detect() (using fixed()) and in grepl() (using fixed = TRUE).
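Per-operation grouping with .by can be sketched like this, assuming tidypolars >= 0.4.0 and illustrative data. The output is sorted in R before inspection because the order of groups in the Polars result is not relied upon here.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

pl_df <- polars::as_polars_df(
  data.frame(g = c("a", "a", "b"), x = c(1, 3, 5))
)

# Group only for this summarize() call, no group_by()/ungroup() needed
res <- pl_df |>
  summarize(m = mean(x), .by = g) |>
  as.data.frame()

# Sort by group so that the result does not depend on group order
res <- res[order(res$g), ]
res
```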
Bug fixes
- Improve robustness of sequential expressions in mutate() and summarize() (i.e. expressions that should be run one after the other because they depend on variables created in the same call) (#58).
- relocate() now works correctly when .after = last_col().
- All functions that work on grouped data now correctly restore the groups structure (#62).
Misc
- Error messages coming from mutate(), summarize(), and filter() now give the right function call.
- Faster tidy selection (#61).
tidypolars 0.3.0
tidypolars requires polars >= 0.10.0.
Breaking changes
- All functions starting with pl_ have been removed in favor of the S3 methods. For example, pl_distinct() doesn’t exist anymore, so the only way to use it is to load dplyr and to use distinct() on a Polars DataFrame or LazyFrame. This is to avoid confusion about compatibility with dplyr and tidyr. See #49 for a more detailed explanation.
- pl_bind_rows() and pl_bind_cols() are renamed bind_rows_polars() and bind_cols_polars() respectively. This is because bind_rows() and bind_cols() are not S3 methods (this might change in future versions of dplyr).
New features
- New function duplicated_rows() that is the opposite of distinct() (#50).
- New argument .id in bind_rows_polars().
- bind_rows_polars() can now bind Data/LazyFrames that don’t have the same schema. Columns will be upcast to common types if necessary. Unknown columns will be filled with NA.
Bug fixes
- complete() now works correctly on grouped data.
tidypolars 0.2.0
tidypolars requires polars >= 0.9.0.
New features
- Rename pl_fetch() to fetch().
- New functions supported: describe(), sink_csv(), slice_sample().
- New argument fill in pl_complete().
- Support stringr::str_to_title() and tools::toTitleCase().
- Support stringr::fixed() to use literal strings.
- Support replacements with captured groups like \\1 in stringr::str_replace() and stringr::str_replace_all().
Bug fixes
- sink_parquet() didn’t use the user inputs (apart from the path).
tidypolars 0.1.0
New features
- Support as.numeric(), as.character(), as.logical(), grepl(), and paste() in expressions in pl_filter(), pl_mutate() and pl_summarize().
- Support sink_parquet() (#38).
- Support for additional stringr functions: str_detect(), str_extract_all(), str_pad(), str_squish(), str_trim(), word() (some arguments or corner cases are not supported yet).
- Add all optimization parameters in collect().
