Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend functionality of nafill to use 'fill' argument for all 'type's #3594

Closed
ben519 opened this issue May 24, 2019 · 10 comments · Fixed by #4491
Closed

Extend functionality of nafill to use 'fill' argument for all 'type's #3594

ben519 opened this issue May 24, 2019 · 10 comments · Fixed by #4491

Comments

@ben519
Copy link

ben519 commented May 24, 2019

Suppose I have this vector

x = c(NA,1,NA,NA,5,3,NA,NA)

and I want to fill NAs with the preceding non NA values. I can do this

nafill(x = c(NA,1,NA,NA,5,3,NA,NA), type = "locf")
[1] NA  1  1  1  5  3  3  3

Great, but sometimes I want to specify a fill value to catch the NA(s) at the front of the vector. I tried this which seemed obvious to me,

nafill(x = c(NA,1,NA,NA,5,3,NA,0), type = "locf", fill = -1)

but it didn't work and instead gave a warning, "argument 'fill' ignored, only make sense for type='const'".

My request is to extend the method so that 'fill' is applied to the front/back of the vector for types 'locf' and 'nocb' respectively. Thanks

Output of sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin18.5.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.3.6/lib/libopenblasp-r0.3.6.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.3

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0
@jangorecki jangorecki changed the title Request: Extend functionality of nafill to use 'fill' argument for all 'type's Extend functionality of nafill to use 'fill' argument for all 'type's May 25, 2019
@jangorecki
Copy link
Member

jangorecki commented May 25, 2019

Thanks. For now the you can use

nafill(nafill(x = c(NA,1,NA,NA,5,3,NA,0), type = "locf"), fill = -1)

@saraswatmks
Copy link
Contributor

@jangorecki if no one else is working on this, can I take it up ?

@MichaelChirico
Copy link
Member

@saraswatmks I just assigned you, go for it!

@jangorecki
Copy link
Member

jangorecki commented May 28, 2019

@saraswatmks note that nafill tests are in inst/tests/nafill.Rraw, so new tests should goes there. It can be useful to avoid merge conflicts and to easily test only this script test.data.table(script="inst/tests/nafill.Rraw")

@saraswatmks
Copy link
Contributor

saraswatmks commented May 28, 2019

@MichaelChirico thanks! I see this function is still in dev. How do I reproduce it on my local machine ? My local master is up to date with remote. If I do Clean and Rebuild, I get this error (I am kind of stuck on this):

==> R CMD INSTALL --preclean --no-multiarch --with-keep.source data.table

* installing to library ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library’
* installing *source* package ‘data.table’ ...
** libs
/usr/local/opt/llvm/bin/clang -fopenmp -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/opt/gettext/include -I/usr/local/opt/llvm/include  -fopenmp -fPIC  -g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe -c assign.c -o assign.o
In file included from assign.c:1:
In file included from ./data.table.h:1:
/Library/Frameworks/R.framework/Resources/include/R.h:55:11: fatal error: 'stdlib.h' file not found
# include <stdlib.h> /* Not used by R itself, but widely assumed in packages */
          ^~~~~~~~~~
1 error generated.
make: *** [assign.o] Error 1
ERROR: compilation failed for package ‘data.table’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/data.table’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/data.table’

Exited with status 1.

@jangorecki
Copy link
Member

jangorecki commented May 29, 2019

@saraswatmks I found that when working with RStudio features like Clean and Rebuild it was actually resulting into more time wasted into debugging issues than it helped. Although for package where no compile code was present it was much more reliable. Anyway, I suggest to use cc() which is much faster, and AFAIR never introduced issues that would waste my time for debugging.
more info in https://github.com/Rdatatable/data.table/tree/master/.dev

@MichaelChirico
Copy link
Member

@saraswatmks regarding your installation error:

/Library/Frameworks/R.framework/Resources/include/R.h:55:11: fatal error: 'stdlib.h' file not found
# include <stdlib.h> /* Not used by R itself, but widely assumed in packages */

I just came across the same issue on my new laptop. It appears something in the installation of Developer Tools was mixed up. Found this comment on another repo:

catboost/catboost#137 (comment)

And it worked on my machine. Hope it can help.

@jangorecki
Copy link
Member

related issue #3700

@MichaelChirico
Copy link
Member

Just happened on a use case for this playing around with COVID data 😃

library(data.table)
URL = file.path(
  'https://raw.githubusercontent.com',
  'nytimes/covid-19-data/master/us-counties.csv'
)
covid = fread(URL, colClasses = c(date = 'IDate'), key = 'state,county,date')
covid[state == 'Pennsylvania', dcast(.SD, date ~ county, value.var = 'cases')][ , 1:5]
#           date Adams Allegheny Armstrong Beaver
#  1: 2020-03-06    NA        NA        NA     NA
#  2: 2020-03-07    NA        NA        NA     NA
#  3: 2020-03-08    NA        NA        NA     NA
#  4: 2020-03-09    NA        NA        NA     NA
#  5: 2020-03-10    NA        NA        NA     NA
#  6: 2020-03-11    NA        NA        NA     NA
#  7: 2020-03-12    NA        NA        NA     NA
#  8: 2020-03-13    NA        NA        NA     NA
#  9: 2020-03-14    NA         1        NA     NA
# 10: 2020-03-15    NA         3        NA     NA
# 11: 2020-03-16    NA         5        NA     NA
# 12: 2020-03-17    NA        10        NA      1
# 13: 2020-03-18     1        12        NA      2
# 14: 2020-03-19     2        18        NA      2
# 15: 2020-03-20     5        28        NA      3
# 16: 2020-03-21     5        31        NA      3
# 17: 2020-03-22     5        40        NA      3
# 18: 2020-03-23     6        48        NA      3
# 19: 2020-03-24     6        58         1      3
# 20: 2020-03-25     6        88         1      7
# 21: 2020-03-26     7       133         1     13
#           date Adams Allegheny Armstrong Beaver

I want to fill the initial missing values with 0 (which is correct), but would use LOCF to carry-forward most recent data in the event it's missing (not observed here...)

lapply(.SD, nafill, type = 'locf', fill = 0) seems natural enough.

@jangorecki
Copy link
Member

AFAIR nafill is vectorized, so no need to lapply .SD it. nafill(.SD) should be enough, and also parallel.

@jangorecki jangorecki assigned jangorecki and unassigned saraswatmks May 25, 2020
@jangorecki jangorecki linked a pull request May 25, 2020 that will close this issue
8 tasks
@mattdowle mattdowle added this to the 1.14.1 milestone Mar 3, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
MichaelChirico added a commit that referenced this issue Mar 4, 2024
* nafill rework for more robust coerce

* initial change for #4101

* nafill simple fill coerce

* nafill const works for locf and nocb as well, closes #3594

* fix tests for #4101

* coverage

* placeholder for nafill #3992

* use coerceAs in froll

* enable disabled test

* nafill retain names, tests for fill list

* coerceAs gets copyArg so now can return its input

* better control of verbose, and better find class for coerceAs

* proper verbose to int

* verbose changes coverage

* rm unused anymore

* more precise verbose arg check

* memrecycle escape warnings and simple verbose for numcol==0

* coerceAs does not emit extra warning anymore

* coerceAs verbose, more tests

* use older nanotime api

* update error msg

* coverage

* codecov of coerceAs

* catch all verbose

* Revert "initial change for #4101"

This reverts commit 1fa2069.

* Revert "fix tests for #4101"

This reverts commit cc2cc0e.

* use coerceAs in fcast.c

* restore actual fix

* ws

* incomplete merge

* vestigial bad merge

* minimize diff

* coerceAs throws warning for string->double, and errors on list

* comment

* example without warning

* bad NEWS merge

* warning is still needed

---------

Co-authored-by: jangorecki <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants