Skip to content

arunsrinivasan/dplyr_benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This page was created in response to the benchmarks from a much earlier version of dplyr (v0.1), which (incorrectly) claimed to be faster than data.table in many aspects of data manipulation (due to oversights).

Because it was a response to that benchmark, we used the same dataset. And it's size is <10MB, which is very very small. It is hard to conclude the performance of tools designed for really large data with such small datasets.

Since then, there have been many releases and improvements from both data.table and dplyr. For the most recent version of benchmarks, on data sizes upto 100GB in RAM, visit the data.table wiki page. We compare data.table, dplyr and pandas, although only grouping for now.

We hope to benchmark other operations and add it to the wiki when we find more time.

About

Corrections to the "baseball" benchmark from dplyr vignette

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published