distributr

Tidy distributed grid search in R and Sun/Open Grid Engine

devtools::install_github("patr1ckm/distributr")

The basic function is grid_apply. It applies a function over a grid of its arguments (every combination of the supplied values, as produced by expand.grid(...)) and returns the results in a list. Each function application can be replicated (.reps) and run in parallel (.mc.cores).
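The idea behind grid_apply can be illustrated with a short base-R sketch. This is not distributr's implementation. grid_apply_sketch is a hypothetical name, and the sketch assumes that each grid row is applied .reps times, with the replications of a row kept together.

```r
library(parallel)

# Hypothetical sketch of applying f over a grid of its arguments.
# Not distributr's code; it only illustrates the idea.
grid_apply_sketch <- function(f, ..., .reps = 1, .mc.cores = 1) {
  # Every combination of the supplied argument values, one per row
  grid <- expand.grid(..., KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  # Repeat each grid row .reps times, keeping replications of a row together
  row_id <- rep(seq_len(nrow(grid)), each = .reps)
  mclapply(row_id, function(i) {
    # A one-row data frame converts to a named list of scalar arguments
    do.call(f, as.list(grid[i, , drop = FALSE]))
  }, mc.cores = .mc.cores)
}

do.one <- function(n, mu, sd) { mean(rnorm(n, mu, sd)) }
sim <- grid_apply_sketch(do.one, n = c(50, 100, 500), mu = c(1, 5),
                         sd = c(1, 5, 10), .reps = 2)
length(sim)  # 3 * 2 * 3 = 18 grid rows, times 2 replications = 36
```

Setting .mc.cores above 1 relies on forking, which mclapply does not support on Windows.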

grid_apply

do.one <- function(n, mu, sd){ mean(rnorm(n, mu, sd)) }

sim <- grid_apply(do.one, n = c(50, 100, 500), mu = c(1,5), sd = c(1, 5, 10), 
              .reps=50, .mc.cores=5)

[[1]]
[1] 1.053669

[[2]]
[1] 1.244468

[[3]]
[1] 1.267939

[[4]]
[1] 1.157546

[[5]]
[1] 0.786027

The values of gridded arguments must be scalars: each row of the grid passes one value per argument. Other arguments, such as data, can be passed as a list via .args.
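A minimal sketch of how fixed arguments might be combined with gridded ones, assuming they are simply appended to each row's arguments before the call. apply_row and fit_mean are hypothetical helpers, not part of distributr, and the data are toy values.

```r
# Hypothetical helper: call f with one grid row plus fixed arguments.
apply_row <- function(f, row, .args = list()) {
  # Gridded scalars come from the row; non-scalar inputs come from .args
  do.call(f, c(as.list(row), .args))
}

# A function that takes a data set as a non-gridded argument
fit_mean <- function(data, trim) mean(data$x, trim = trim)

grid <- expand.grid(trim = c(0, 0.1, 0.2))
dat <- data.frame(x = c(1, 2, 3, 4, 100))
vals <- lapply(seq_len(nrow(grid)), function(i) {
  apply_row(fit_mean, grid[i, , drop = FALSE], .args = list(data = dat))
})
unlist(vals)
```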

tidy

A tidy method is provided that merges the list of results with the argument grid, putting the results in tidy form. This format is convenient for plotting and further data analysis. tidy works with lists of vectors, lists, and data frames.

The function gapply runs grid_apply followed by tidy.

res <- tidy(sim)
# or, equivalently, in one step:
res <- gapply(do.one, n = c(50, 100, 500), mu = c(1,5), sd = c(1, 5, 10),
              .reps=50, .mc.cores=5)

   n mu sd .rep     value
1 50  1  1    1 0.9476228
2 50  1  1    2 0.7545730
3 50  1  1    3 0.9154810
4 50  1  1    4 1.0704074
5 50  1  1    5 0.9840148
6 50  1  1    6 1.1933439
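The merge that tidy performs can be sketched in base R for the simplest case, where every result is a scalar: repeat each grid row once per replication, number the replications, and bind the results as a value column. tidy_sketch is hypothetical and handles only scalar results, whereas tidy also supports vectors, lists and data frames.

```r
# Hypothetical sketch: merge scalar results with the argument grid.
# Assumes results are ordered row by row, with `reps` replications per row.
tidy_sketch <- function(results, grid, reps) {
  long <- grid[rep(seq_len(nrow(grid)), each = reps), , drop = FALSE]
  long$.rep <- rep(seq_len(reps), times = nrow(grid))
  long$value <- unlist(results)
  rownames(long) <- NULL
  long
}

grid <- expand.grid(n = c(50, 100), mu = c(1, 5), KEEP.OUT.ATTRS = FALSE)
results <- as.list(seq_len(nrow(grid) * 3))  # stand-in for real results
res <- tidy_sketch(results, grid, reps = 3)
head(res)
```

Because unlist drops NULL entries, this sketch would fail on missing results; a real implementation has to handle them explicitly.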

If results vary in length, it can be helpful to stack them into key-value pairs with tidy(., stack=TRUE).
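As a rough illustration of a key-value layout for results of different lengths, the sketch below gives every element of every result its own row. It is not necessarily how tidy(., stack=TRUE) works; the column names .id, key and value and the toy results are chosen for the example.

```r
# Toy results of varying length, e.g. named summaries from different grid rows
results <- list(
  c(mean = 1.2, sd = 0.4),
  c(mean = 0.9, sd = 0.5, skew = 0.1),
  c(mean = 1.1)
)

# One row per element: which result it came from, its name (key), its value
stacked <- do.call(rbind, lapply(seq_along(results), function(i) {
  data.frame(.id = i, key = names(results[[i]]), value = unname(results[[i]]),
             stringsAsFactors = FALSE)
}))
stacked
```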

Warnings and Errors

gapply captures both warnings and errors, which can be retrieved with err and warn:

err(sim)
warn(sim)
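A common base-R pattern for capturing warnings and errors per call, so that one failing grid row does not abort the whole run, combines withCallingHandlers (record a warning and continue) with tryCatch (record an error). This sketches the general technique only; how gapply stores what it captures is not shown, and capture_call is a hypothetical name.

```r
# Hypothetical helper: evaluate f(...) and keep its value, warnings and error.
capture_call <- function(f, ...) {
  warns <- character(0)
  value <- tryCatch(
    withCallingHandlers(
      f(...),
      warning = function(w) {
        warns <<- c(warns, conditionMessage(w))
        invokeRestart("muffleWarning")  # record the warning, then continue
      }
    ),
    error = function(e) e  # return the condition object instead of stopping
  )
  if (inherits(value, "error")) {
    list(value = NULL, warnings = warns, error = conditionMessage(value))
  } else {
    list(value = value, warnings = warns, error = NULL)
  }
}

r_ok   <- capture_call(function(x) sqrt(x), 4)
r_warn <- capture_call(function(x) sqrt(x), -1)  # NaN plus a warning
r_err  <- capture_call(function(x) stop("bad input"), 1)
```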

Sun/Open Grid Engine

A compute plan can be set up and executed using the Sun/Open Grid Engine scheduler. Rows of the argument grid are submitted to nodes, and the replications for each row are carried out in parallel via mclapply.

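# define the plan without evaluating it
sim <- gapply(do.one, n = c(50, 100, 500), mu = c(1,5), sd = c(1, 5, 10), .eval=FALSE)
sim <- setup(sim, .reps=500, .mc.cores = 5)  # replications and cores per job
submit(sim)                                  # submit jobs to the scheduler
res <- tidy(collect())                       # gather and tidy the results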
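The "one grid row per node, replications in parallel via mclapply" pattern can be sketched as a node-side task. This assumes jobs are submitted as a Grid Engine array job, where the scheduler sets the SGE_TASK_ID environment variable to the task index; whether distributr uses array jobs, and the name run_task, are assumptions.

```r
library(parallel)

# Hypothetical node-side task: run all replications of one grid row.
run_task <- function(f, grid, reps, mc.cores = 1) {
  # Array jobs expose the task index in SGE_TASK_ID; fall back to row 1
  # when the variable is missing or not a number.
  task_id <- suppressWarnings(as.integer(Sys.getenv("SGE_TASK_ID", unset = "1")))
  if (is.na(task_id)) task_id <- 1L
  args <- as.list(grid[task_id, , drop = FALSE])
  # Replications of this row run in parallel on the node
  mclapply(seq_len(reps), function(r) do.call(f, args), mc.cores = mc.cores)
}

do.one <- function(n, mu, sd) { mean(rnorm(n, mu, sd)) }
grid <- expand.grid(n = c(50, 100, 500), mu = c(1, 5), sd = c(1, 5, 10))
Sys.setenv(SGE_TASK_ID = "4")  # simulate being the fourth task
out <- run_task(do.one, grid, reps = 10)
```

In a real job the results would be written to disk so that a later collection step can read them.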

The setup function asks for user confirmation if an existing argument grid would be overwritten.
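One way such a check could be written in base R, assuming the argument grid is stored as a file: ask only in interactive sessions, and keep the existing file otherwise. The helper safe_save_grid and the file name are hypothetical.

```r
# Hypothetical helper: save the argument grid, confirming before overwriting.
safe_save_grid <- function(grid, path) {
  if (file.exists(path)) {
    ok <- interactive() &&
      isTRUE(utils::askYesNo(paste("Overwrite existing grid at", path, "?")))
    if (!ok) {
      message("Keeping existing grid at ", path)
      return(invisible(FALSE))
    }
  }
  saveRDS(grid, path)
  invisible(TRUE)
}

grid <- expand.grid(n = c(50, 100, 500), mu = c(1, 5), sd = c(1, 5, 10))
path <- file.path(tempdir(), "arg_grid.rds")
unlink(path)                                  # start clean for this example
first  <- safe_save_grid(grid, path)          # writes the file
second <- safe_save_grid(grid[1:3, ], path)   # asks, or declines non-interactively
```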

Job Access, Adding jobs, Selecting a subset of jobs

Jobs can be added to the compute plan with add_jobs. A subset of jobs can be selected from the argument grid with filter_jobs, using the usual dplyr filter syntax.

jobs(sim)                               # access the jobs (argument) grid
add_jobs(sim, n=1000, mu=10, sd=50)     # add jobs to the plan
filter_jobs(sim, n < 100, .mc.cores=5)  # select jobs, as in dplyr::filter
collect(filter="n < 100")               # collect results from jobs matching the filter
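Viewed only as operations on the argument grid, adding and selecting jobs amount to appending and subsetting rows. The sketch below does this with rbind and subset; it does not touch the files or scheduler state that add_jobs and filter_jobs manage, and the .job column is an illustrative assumption.

```r
grid <- expand.grid(n = c(50, 100, 500), mu = c(1, 5), sd = c(1, 5, 10),
                    KEEP.OUT.ATTRS = FALSE)

# Adding a job: append one row with the new argument values
grid <- rbind(grid, data.frame(n = 1000, mu = 10, sd = 50))

# Selecting jobs: keep rows matching a condition, remembering their row ids
grid$.job <- seq_len(nrow(grid))
small <- subset(grid, n < 100)
small$.job

# A filter given as a string can be evaluated against the grid the same way
rows <- with(grid, eval(parse(text = "n < 100")))
```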

More Information

More information is available on the wiki, including examples of how chunking, random number control, and caching can all be handled transparently within do.one. These slides give a general overview.
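As one illustration of handling random number control and caching inside do.one, the sketch below derives a seed from a replication index so any replication can be rerun exactly, and caches each result in a file keyed by its arguments. The extra .rep and cache_dir arguments, the seed formula and the cache layout are assumptions for the example, not the approach documented on the wiki.

```r
# Hypothetical do.one with reproducible seeding and a simple on-disk cache.
do.one <- function(n, mu, sd, .rep, cache_dir = tempdir()) {
  key <- sprintf("n%s_mu%s_sd%s_rep%s", n, mu, sd, .rep)
  cache_file <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(cache_file)) return(readRDS(cache_file))  # reuse earlier result
  set.seed(1000 + .rep)  # same replication index, same random draws
  value <- mean(rnorm(n, mu, sd))
  saveRDS(value, cache_file)
  value
}

cache <- file.path(tempdir(), "do_one_cache")
dir.create(cache, showWarnings = FALSE)
first <- do.one(50, 1, 1, .rep = 3, cache_dir = cache)  # computed and cached
again <- do.one(50, 1, 1, .rep = 3, cache_dir = cache)  # read from the cache
```

Note that set.seed changes the global RNG state, and because the seed depends only on .rep, every grid row sees the same random stream for a given replication (common random numbers). Including the grid row in the seed would give independent streams instead.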
