fix: Add spark to lambda dockerfile #2480

Merged (13 commits) on Apr 4, 2022


achals
Member

@achals achals commented Apr 2, 2022

Signed-off-by: Achal Shah [email protected]

What this PR does / why we need it:

There were two issues with the AWS Lambda feature server tests:

  • The image would fail to start because of missing pyspark dependencies:
[ERROR] Runtime.ImportModuleError: Unable to import module 'app': No module named 'pyspark' Traceback (most recent call last):

As a temporary fix, pyspark was added to the image used for Lambda. It should be removed as soon as possible, since adding this dependency doubled the image size.

  • After fixing the previous issue, we'd see this kind of error:
Exception: Failed to get online features from remote feature server ***'detail': 'Failed to parse entities field: from_json_object() takes 3 positional arguments but 4 were given.'

This was traced to an unpinned protobuf dependency: protobuf 3.20.0 changed the API that our proto_json module patches in the feature server. We've temporarily capped protobuf below 3.20, but we should follow up by upgrading protobuf and updating the patch to match.
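
To illustrate the failure mode behind the second error, here is a minimal sketch. It is not the actual Feast or protobuf code, and every function name in it is a hypothetical stand-in. A hook written for a three-argument calling convention raises a `TypeError` once the caller starts passing one more positional argument, while a variant that accepts `*extra` tolerates the change.

```python
# Hypothetical stand-ins; none of these names are Feast or protobuf APIs.


def patched_from_json(parser, value, message):
    """A patch written against an older three-argument calling convention."""
    message["value"] = value


def tolerant_from_json(parser, value, message, *extra):
    """The same patch, but it ignores any extra positional arguments."""
    message["value"] = value


def call_like_newer_library(hook):
    """Simulates a library release that passes one more positional argument."""
    message = {}
    hook("parser", 42, message, "extra-argument")
    return message


error_text = ""
try:
    call_like_newer_library(patched_from_json)
except TypeError as exc:
    # The strict hook fails in the same way as the error quoted above.
    error_text = str(exc)

result = call_like_newer_library(tolerant_from_json)
print(error_text)
print(result)
```

Accepting `*extra` only hides the mismatch, so capping the dependency until the patch is updated may be the safer short-term choice.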

Which issue(s) this PR fixes:

Fixes #

@achals achals requested a review from a team as a code owner April 2, 2022 06:31
@achals achals requested review from mavysavydav and removed request for a team April 2, 2022 06:31
@codecov-commenter

codecov-commenter commented Apr 2, 2022

Codecov Report

Merging #2480 (dfe125e) into master (37971a4) will increase coverage by 25.93%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master    #2480       +/-   ##
===========================================
+ Coverage   58.62%   84.55%   +25.93%     
===========================================
  Files         126      127        +1     
  Lines       10850    10913       +63     
===========================================
+ Hits         6361     9228     +2867     
+ Misses       4489     1685     -2804     
| Flag | Coverage Δ |
|---|---|
| integrationtests | 74.78% <100.00%> (?) |
| unittests | 58.61% <0.00%> (-0.02%) ⬇️ |


| Impacted Files | Coverage Δ |
|---|---|
| sdk/python/feast/proto_json.py | 92.42% <ø> (ø) |
| sdk/python/feast/infra/offline_stores/bigquery.py | 87.08% <100.00%> (+55.19%) ⬆️ |
| ...ython/feast/embedded_go/online_features_service.py | 95.08% <0.00%> (ø) |
| sdk/python/feast/data_source.py | 67.87% <0.00%> (+1.35%) ⬆️ |
| sdk/python/feast/infra/online_stores/sqlite.py | 95.23% <0.00%> (+1.58%) ⬆️ |
| sdk/python/feast/errors.py | 70.05% <0.00%> (+2.99%) ⬆️ |
| sdk/python/feast/infra/provider.py | 89.68% <0.00%> (+4.76%) ⬆️ |
| sdk/python/feast/feature.py | 90.24% <0.00%> (+4.87%) ⬆️ |
| sdk/python/feast/infra/infra_object.py | 79.71% <0.00%> (+5.79%) ⬆️ |
| ...k/python/feast/infra/offline_stores/file_source.py | 89.52% <0.00%> (+6.66%) ⬆️ |
... and 65 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
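
The headline percentages in this report can be reproduced from the hit and line counts in the coverage diff. The sketch below assumes that coverage is hits divided by lines, truncated (not rounded) to two decimal places. That assumption is not documented here; it was chosen because it matches the figures shown in the report. The function names are illustrative.

```python
def coverage_basis_points(hits, lines):
    """Return coverage in hundredths of a percent, truncated toward zero."""
    return hits * 10000 // lines


def format_percent(basis_points):
    """Render hundredths of a percent as a string such as '58.62%'."""
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


master = coverage_basis_points(6361, 10850)  # Hits and Lines, master column
branch = coverage_basis_points(9228, 10913)  # Hits and Lines, #2480 column

print(format_percent(master))           # coverage on master
print(format_percent(branch))           # coverage with this PR
print(format_percent(branch - master))  # reported increase
```

The Misses rows follow from the same counts: 10850 - 6361 = 4489 on master and 10913 - 9228 = 1685 with this PR.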

achals and others added 2 commits April 2, 2022 00:25
Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Felix Wang <[email protected]>
@felixwang9817 felixwang9817 changed the title fix: add spark to lambda dockerfile fix: Add spark to lambda dockerfile Apr 2, 2022
Signed-off-by: Felix Wang <[email protected]>
felixwang9817 and others added 3 commits April 2, 2022 15:07
Signed-off-by: Felix Wang <[email protected]>
Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Achal Shah <[email protected]>
) -> None:
print("going to convert message")
Collaborator

nit: remove print

Member Author

good catch

@@ -52,7 +52,7 @@
 "mmh3",
 "pandas>=1.0.0",
 "pandavro==1.5.*",
-"protobuf>=3.10",
+"protobuf>=3.10,<3.20",
Collaborator

nit: do we want to also update py3.7-requirements.txt and other files?

Member Author

oh yeah good point

Member Author

just read through what the make lock-python-*dependencies command does and existing requirements file - I don't think any changes are needed here.
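
For readers unsure what the new bound in this thread admits, here is a rough sketch of the range `>=3.10,<3.20`. It is a simplification, not pip's resolver logic: it handles only plain dotted numeric versions (no pre-release or post-release suffixes), and both helper names are made up for illustration.

```python
def parse_version(text):
    """Parse a plain dotted numeric version such as '3.19.4' into a tuple."""
    return tuple(int(part) for part in text.split("."))


def within_protobuf_bound(version):
    """Check a version against the range >=3.10,<3.20 from this PR."""
    parsed = parse_version(version)
    # Tuple comparison is element-wise, so (3, 19, 4) < (3, 20) holds
    # while (3, 20, 0) < (3, 20) does not.
    return (3, 10) <= parsed < (3, 20)


for candidate in ("3.9.2", "3.10", "3.19.4", "3.20.0"):
    print(candidate, within_protobuf_bound(candidate))
```

Under these assumptions, 3.19.4 (the version pinned in an earlier commit of this PR) falls inside the range and 3.20.0 falls outside it.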

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Achal Shah <[email protected]>
achals added 3 commits April 4, 2022 10:54
Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Achal Shah <[email protected]>
Collaborator

@felixwang9817 felixwang9817 left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, felixwang9817

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [achals,felixwang9817]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 514666f into feast-dev:master Apr 4, 2022
@achals achals deleted the achal/spark-in-lambda branch April 4, 2022 20:04
felixwang9817 added a commit that referenced this pull request Apr 6, 2022
* add spark to lambda dockerfile

Signed-off-by: Achal Shah <[email protected]>

* add *args

Signed-off-by: Achal Shah <[email protected]>

* Add *args

Signed-off-by: Felix Wang <[email protected]>

* Pin protobuf==3.19.4

Signed-off-by: Felix Wang <[email protected]>

* Remove *args

Signed-off-by: Felix Wang <[email protected]>

* Add a range

Signed-off-by: Achal Shah <[email protected]>

* Add a todo

Signed-off-by: Achal Shah <[email protected]>

* cleanup prints

Signed-off-by: Achal Shah <[email protected]>

* lock deps

Signed-off-by: Achal Shah <[email protected]>

* lock deps correctly

Signed-off-by: Achal Shah <[email protected]>

* fix lint

Signed-off-by: Achal Shah <[email protected]>

* fix lint take 2

Signed-off-by: Achal Shah <[email protected]>

* Undo general updates

Signed-off-by: Achal Shah <[email protected]>

Co-authored-by: Felix Wang <[email protected]>
felixwang9817 pushed a commit that referenced this pull request Apr 6, 2022
## [0.19.4](v0.19.3...v0.19.4) (2022-04-06)

### Bug Fixes

* Add spark to lambda dockerfile ([#2480](#2480)) ([f7911b7](f7911b7))
* Don't prevent apply from running given duplicate empty names in data sources. Also fix repeated apply of Spark data source. ([#2415](#2415)) ([9baba23](9baba23))
* Fix DataSource constructor to unbreak custom data sources ([#2492](#2492)) ([597c543](597c543))
felixwang9817 pushed a commit that referenced this pull request Apr 6, 2022
## [0.19.4](v0.19.3...v0.19.4) (2022-04-06)

### Bug Fixes

* Add spark to lambda dockerfile ([#2480](#2480)) ([ba22c28](ba22c28))
* Don't prevent apply from running given duplicate empty names in data sources. Also fix repeated apply of Spark data source. ([#2415](#2415)) ([88e01a2](88e01a2))
* Fix DataSource constructor to unbreak custom data sources ([#2492](#2492)) ([2115bd0](2115bd0))
achals pushed a commit that referenced this pull request Apr 14, 2022
# [0.20.0](v0.19.0...v0.20.0) (2022-04-14)

### Bug Fixes

* Add inlined data sources to the top level registry ([#2456](#2456)) ([356788a](356788a))
* Add new value types to types.ts for web ui ([#2463](#2463)) ([ad5694e](ad5694e))
* Add PushSource proto and Python class ([#2428](#2428)) ([9a4bd63](9a4bd63))
* Add spark to lambda dockerfile ([#2480](#2480)) ([514666f](514666f))
* Added private_key auth for Snowflake ([#2508](#2508)) ([c42c9b0](c42c9b0))
* Added Redshift and Spark typecheck to data_source event_timestamp_col inference ([#2389](#2389)) ([04dea73](04dea73))
* Building of go extension fails ([#2448](#2448)) ([7d1efd5](7d1efd5))
* Bump the number of versions bumps expected to 27 ([#2549](#2549)) ([ecc9938](ecc9938))
* Create __init__ files for the proto-generated python dirs ([#2410](#2410)) ([e17028d](e17028d))
* Don't prevent apply from running given duplicate empty names in data sources. Also fix repeated apply of Spark data source. ([#2415](#2415)) ([b95f441](b95f441))
* Dynamodb deduplicate batch write request by partition keys ([#2515](#2515)) ([70d4a13](70d4a13))
* Ensure that __init__ files exist in proto dirs ([#2433](#2433)) ([9b94f7b](9b94f7b))
* Fix DataSource constructor to unbreak custom data sources ([#2492](#2492)) ([712653e](712653e))
* Fix default feast apply path without any extras ([#2373](#2373)) ([6ba7fc7](6ba7fc7))
* Fix definitions.py with new definition ([#2541](#2541)) ([eefc34a](eefc34a))
* Fix entity row to use join key instead of name ([#2521](#2521)) ([c22fa2c](c22fa2c))
* Fix Java Master ([#2499](#2499)) ([e083458](e083458))
* Fix registry proto ([#2435](#2435)) ([ea6a9b2](ea6a9b2))
* Fix some inconsistencies in the docs and comments in the code ([#2444](#2444)) ([ad008bf](ad008bf))
* Fix spark docs ([#2382](#2382)) ([d4a606a](d4a606a))
* Fix Spark template to work correctly on feast init -t spark ([#2393](#2393)) ([ae133fd](ae133fd))
* Fix the feature repo fixture used by java tests  ([#2469](#2469)) ([32e925e](32e925e))
* Fix unhashable Snowflake and Redshift sources ([cd8f1c9](cd8f1c9))
* Fixed bug in passing config file params to snowflake python connector ([#2503](#2503)) ([34f2b59](34f2b59))
* Fixing Spark template to include source name ([#2381](#2381)) ([a985f1d](a985f1d))
* Make name a keyword arg for the Entity class ([#2467](#2467)) ([43847de](43847de))
* Making a name for data sources not a breaking change ([#2379](#2379)) ([71d7ae2](71d7ae2))
* Minor link fix in `CONTRIBUTING.md` ([#2481](#2481)) ([2917e27](2917e27))
* Preserve ordering of features in _get_column_names ([#2457](#2457)) ([495b435](495b435))
* Relax click python requirement to >=7 ([#2450](#2450)) ([f202f92](f202f92))
* Remove date partition column field from datasources that don't s… ([#2478](#2478)) ([ce35835](ce35835))
* Remove docker step from unit test workflow ([#2535](#2535)) ([6f22f22](6f22f22))
* Remove spark from the AWS Lambda dockerfile ([#2498](#2498)) ([6abae16](6abae16))
* Request data api update ([#2488](#2488)) ([0c9e5b7](0c9e5b7))
* Schema update ([#2509](#2509)) ([cf7bbc2](cf7bbc2))
* Simplify DataSource.from_proto logic ([#2424](#2424)) ([6bda4d2](6bda4d2))
* Snowflake api update ([#2487](#2487)) ([1181a9e](1181a9e))
* Support passing batch source to streaming sources for backfills ([#2523](#2523)) ([90db1d1](90db1d1))
* Timestamp update ([#2486](#2486)) ([bf23111](bf23111))
* Typos in Feast UI error message ([#2432](#2432)) ([e14369d](e14369d))
* Update feature view APIs to prefer keyword args ([#2472](#2472)) ([7c19cf7](7c19cf7))
* Update file api ([#2470](#2470)) ([83a11c6](83a11c6))
* Update Makefile to cd into python dir before running commands ([#2437](#2437)) ([ca32155](ca32155))
* Update redshift api ([#2479](#2479)) ([4fa73a9](4fa73a9))
* Update some fields optional in UI parser ([#2380](#2380)) ([cff7ac3](cff7ac3))
* Use a single version of jackson libraries and upgrade to 2.12.6.1 ([#2473](#2473)) ([5be1cc6](5be1cc6))
* Use dateutil parser to parse materialization times ([#2464](#2464)) ([6c55e49](6c55e49))
* Use the correct dockerhub image tag when building feature servers ([#2372](#2372)) ([0d62c1d](0d62c1d))

### Features

* Add `/write-to-online-store` method to the python feature server ([#2423](#2423)) ([d2fb048](d2fb048))
* Add description, tags, owner fields to all feature view classes ([#2440](#2440)) ([ed5e928](ed5e928))
* Add DQM Logging on GRPC Server with FileLogStorage for Testing ([#2403](#2403)) ([57a97d8](57a97d8))
* Add Feast types in preparation for changing type system ([#2475](#2475)) ([4864252](4864252))
* Add Field class ([#2500](#2500)) ([1279612](1279612))
* Add support for DynamoDB online_read in batches ([#2371](#2371)) ([702ec49](702ec49))
* Add Support for DynamodbOnlineStoreConfig endpoint_url parameter ([#2485](#2485)) ([7b863d1](7b863d1))
* Add templating for dynamodb table name ([#2394](#2394)) ([f591088](f591088))
* Allow local feature server to use Go feature server if enabled ([#2538](#2538)) ([a2ef375](a2ef375))
* Allow using entity's join_key in get_online_features ([#2420](#2420)) ([068c765](068c765))
* Data Source Api Update ([#2468](#2468)) ([6b96b21](6b96b21))
* Go server ([#2339](#2339)) ([d12e7ef](d12e7ef)), closes [#2354](#2354) [#2361](#2361) [#2332](#2332) [#2356](#2356) [#2363](#2363) [#2349](#2349) [#2355](#2355) [#2336](#2336) [#2361](#2361) [#2363](#2363) [#2344](#2344) [#2354](#2354) [#2347](#2347) [#2350](#2350) [#2356](#2356) [#2355](#2355) [#2349](#2349) [#2352](#2352) [#2341](#2341) [#2336](#2336) [#2373](#2373) [#2315](#2315) [#2372](#2372) [#2332](#2332) [#2349](#2349) [#2336](#2336) [#2361](#2361) [#2363](#2363) [#2344](#2344) [#2354](#2354) [#2347](#2347) [#2350](#2350) [#2356](#2356) [#2355](#2355) [#2349](#2349) [#2352](#2352) [#2341](#2341) [#2336](#2336) [#2373](#2373) [#2379](#2379) [#2380](#2380) [#2382](#2382) [#2364](#2364) [#2366](#2366) [#2386](#2386)
* Graduate write_to_online_store out of experimental status ([#2426](#2426)) ([e7dd4b7](e7dd4b7))
* Make feast PEP 561 compliant ([#2405](#2405)) ([3c41f94](3c41f94)), closes [#2420](#2420) [#2418](#2418) [#2425](#2425) [#2426](#2426) [#2427](#2427) [#2431](#2431) [#2433](#2433) [#2420](#2420) [#2418](#2418) [#2425](#2425) [#2426](#2426) [#2427](#2427) [#2431](#2431) [#2433](#2433)
* Makefile for contrib for Issue [#2364](#2364) ([#2366](#2366)) ([a02325b](a02325b))
* Support on demand feature views in go feature server ([#2494](#2494)) ([6edd274](6edd274))
* Switch from Feature to Field ([#2514](#2514)) ([6a03bed](6a03bed))
* Use a daemon thread to monitor the go feature server exclusively ([#2391](#2391)) ([0bb5e8c](0bb5e8c))