GitHub - arunsrinivasan/dplyr_benchmark: Corrections to the "baseball" benchmark from dplyr vignette

This page was created in response to the benchmarks from a much earlier version of dplyr (v0.1), which (incorrectly) claimed to be faster than data.table in many aspects of data manipulation (due to oversights).

Because it was a response to that benchmark, we used the same dataset. And it's size is <10MB, which is very very small. It is hard to conclude the performance of tools designed for really large data with such small datasets.

Since then, there have been many releases and improvements from both data.table and dplyr. For the most recent version of benchmarks, on data sizes upto 100GB in RAM, visit the data.table wiki page. We compare data.table, dplyr and pandas, although only grouping for now.

We hope to benchmark other operations and add it to the wiki when we find more time.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
css		css
fonts		fonts
js		js
README.md		README.md
index.html		index.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

arunsrinivasan/dplyr_benchmark

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages