docs: add aisample uplift modelling #1640
Hey @whiskyboy 👋!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@            Coverage Diff             @@
##           master    #1640      +/-   ##
==========================================
- Coverage   85.83%   85.76%   -0.07%
==========================================
  Files         272      272
  Lines       14230    14230
  Branches      739      739
==========================================
- Hits        12214    12205       -9
- Misses       2016     2025       +9
@mhamilton723 Hey Mark, this is another AI sample added, pls review it once you get a chance 😄
"if not IS_CUSTOMER_DATA:\n",
"    # Download demo data files into lakehouse if not exist\n",
"    remote_url = \"http://go.criteo.net/criteo-research-uplift-v2.1.csv.gz\"\n",
"    download_file = \"criteo-research-uplift-v2.1.csv.gz\"\n",
"\n",
"    # For this demo, we first check if the dataset files are already prepared in the default lakehouse. If not, we'll download the dataset.\n",
"    import os\n",
"    import requests\n",
"\n",
"    if not os.path.exists(\"/lakehouse/default\"):\n",
"        # ask user to add a lakehouse if no default lakehouse added to the notebook.\n",
"        # a new notebook will not link to any lakehouse by default.\n",
"        raise FileNotFoundError(\n",
"            \"Default lakehouse not found, please add a lakehouse for the notebook.\"\n",
"        )\n",
"    else:\n",
"        # check if the needed files are already in the lakehouse, try to download if not.\n",
"        # raise an error if downloading failed.\n",
"        os.makedirs(f\"/lakehouse/default/{DATA_FOLDER}/raw/\", exist_ok=True)\n",
"\n",
"        if not os.path.exists(f\"/lakehouse/default/{DATA_FOLDER}/raw/{DATA_FILE}\"):\n",
"            try:\n",
"                r = requests.get(f\"{remote_url}\", timeout=30)\n",
"                with open(\n",
"                    f\"/lakehouse/default/{DATA_FOLDER}/raw/{download_file}\", \"wb\"\n",
"                ) as f:\n",
"                    f.write(r.content)\n",
"                print(f\"Downloaded {download_file} into {DATA_FOLDER}/raw/.\")\n",
"\n",
"                with gzip.open(\n",
"                    f\"/lakehouse/default/{DATA_FOLDER}/raw/{download_file}\", \"rb\"\n",
"                ) as fin:\n",
"                    with open(\n",
"                        f\"/lakehouse/default/{DATA_FOLDER}/raw/{DATA_FILE}\", \"wb\"\n",
"                    ) as fout:\n",
"                        fout.write(fin.read())\n",
"                print(f\"Unzip {download_file} into {DATA_FOLDER}/raw/{DATA_FILE}.\")\n",
"            except Exception as e:\n",
"                print(f\"Failed on downloading {DATA_FILE}, error message: {e}\")\n",
"        else:\n",
"            print(f\"{DATA_FILE} already exists in {DATA_FOLDER}/raw/.\")"
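For reference, a minimal sketch of how the download cell quoted above could be tightened if it stays in the notebook. The cell's comment says it raises on a failed download, but the `except` branch only prints; it also reads the whole archive into memory, and the quoted lines do not show a `gzip` import. The function name `fetch_and_gunzip` and its parameters are hypothetical, and the standard library's `urllib.request` is swapped in for `requests` so the sketch has no third-party dependencies.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def fetch_and_gunzip(url, raw_dir, gz_name, out_name, timeout=30):
    """Download a .gz file and decompress it, streaming both steps.

    Hypothetical helper: errors propagate to the caller instead of
    being printed, and neither file is held fully in memory.
    """
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    gz_path = raw_dir / gz_name
    out_path = raw_dir / out_name

    if out_path.exists():
        return out_path  # already prepared, nothing to download

    # Stream the response body to disk in chunks.
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(gz_path, "wb") as f:
        shutil.copyfileobj(resp, f)

    # Decompress into a temporary name first, so an interrupted run
    # never leaves a truncated file under the final name.
    tmp_path = out_path.with_name(out_path.name + ".part")
    with gzip.open(gz_path, "rb") as fin, open(tmp_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    tmp_path.replace(out_path)

    return out_path
```

In the notebook this might be called as `fetch_and_gunzip(remote_url, f"/lakehouse/default/{DATA_FOLDER}/raw", download_file, DATA_FILE)`, using the variables from the quoted cell.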
We can make this a single line of Spark readers if we save this loaded dataset to our public blob storage. I let others slip by with this large code, but are you up to show them how it's done :) ?
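A rough sketch of what that single-line reader could look like, assuming the CSV (gzip-compressed or not) has already been uploaded somewhere Spark can reach. The helper name `load_uplift_data` is hypothetical, and the blob location is not given in this thread, so `path` is left as a parameter rather than guessed.

```python
def load_uplift_data(spark, path):
    """Read the CSV at `path` into a Spark DataFrame.

    `spark` is an existing SparkSession (notebooks usually provide one);
    `path` is a hypothetical location, e.g. a file on public blob storage.
    """
    # Spark decompresses .gz text files transparently based on the extension.
    return spark.read.csv(path, header=True, inferSchema=True)
```

One caveat with this design: gzip files are not splittable, so Spark reads a `.csv.gz` in a single task. Storing the data uncompressed or in a columnar format avoids that.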
Lovely work @whiskyboy, left a few comments. Regarding the blob one, if you do this we can promote this to an official demo and test it multiplatform. It will be great to show people how to do this analysis as it will help many people explore their datasets
Hi @mhamilton723, thanks for your comments! They are very helpful! I have resolved most of them except the data loader one. Please take another look!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
* docs: fix command to launch jupyter notebook (#1649) * docs: add aisample uplift modelling (#1640) * add uplift model sample * update uplift model notebook * Update notebooks/community/aisamples/AIsample - Uplift Modelling.ipynb Co-authored-by: weitian <[email protected]> Co-authored-by: Mark Hamilton <[email protected]> * chore: publish test jars * docs: improve example notebooks * docs: remove unused docs and fix links * docs: move magic command forward since it restarts interpreter * chore: added `synapse-internal` to platform detector function (#1651) * chore: added trident platform * chore: update naming * Update core/src/main/python/synapse/ml/core/platform/Platform.py * chore: update detection rules * chore: remove redundant check * docs: apply black formatter Co-authored-by: Li Jiang <[email protected]> Co-authored-by: Mark Hamilton <[email protected]> * fix: KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656) Signed-off-by: Jason Wang <[email protected]> Signed-off-by: Jason Wang <[email protected]> * chore: turn off failing synapse tests temporarily (#1658) * docs: simplify data downloading and add mlflow to uplift modelling (#1659) * chore: simplify data downloading section update save dataframe with delta format * docs: add mlflow logging and loading * Update notebooks/community/aisamples/AIsample - Uplift Modelling.ipynb Co-authored-by: Li Jiang <[email protected]> * fix: fix Uplift Modelling style * docs: improve error msg to make it clearer for users and fix typos (#1662) * docs: improve error msg to make it clearer for users. 
* fix: fix typos Co-authored-by: Li Jiang <[email protected]> * docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663) Co-authored-by: Serena Ruan <[email protected]> * chore: clean up github workflows and add issue label remover (#1674) * chore: re open github issues after a comment (#1676) * chore: fix typo in issue reopen yaml * chore: fix reopen on comment workflow * chore: fix reopen comment action * Update reopen-issue-on-comment.yml * build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680) Bumps [amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request) from 4 to 5.0.1. - [Release notes](https://github.com/amannn/action-semantic-pull-request/releases) - [Changelog](https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md) - [Commits](amannn/action-semantic-pull-request@v4...v5.0.1) --- updated-dependencies: - dependency-name: amannn/action-semantic-pull-request dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: fix linked service on cog service base (#1685) * fix: fix pyarrow failure in deeplearning test (#1689) * build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688) Bumps [amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request) from 5.0.1 to 5.0.2. - [Release notes](https://github.com/amannn/action-semantic-pull-request/releases) - [Changelog](https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md) - [Commits](amannn/action-semantic-pull-request@v5.0.1...v5.0.2) --- updated-dependencies: - dependency-name: amannn/action-semantic-pull-request dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mark Hamilton <[email protected]> * docs: update developer readme instruction on python env creation (#1693) * update developer readme instruction on python env creation * Update developer-readme.md * Update developer-readme.md * fix: don't throw on invalid columns in DropColumns (#1695) * remove unused imports * batch prompts * don't throw for invalid columns in DropColumn * remove now unused verify method from DropColumns * Remove trident.mlflow APIs. (#1687) Co-authored-by: Li Jiang <[email protected]> * using predictionCol for isolation forest (#1686) fixing issue #1060 Co-authored-by: Mark Hamilton <[email protected]> * fix: update isolation forest notebook (#1696) * fix: remove synapse E2E testing exclusion - cyber ml (#1699) * remove CyberML exclusion from Synapse test pipeline * black style Co-authored-by: Jessica Wang <[email protected]> * chore: remove notebooks (#1703) * chore: fix ado integration (#1704) * chore: pin az and python versions (#1705) * chore: remove synapse test exclusions (#1698) * chore: update scalatest and scalactic (#1706) * remove unused imports * batch prompts * don't throw for invalid columns in DropColumn * remove now unused verify method from DropColumns * update scalatest and scalactic * strip unicode out of transliteration result * clean transliterate test * override assertDFEq in TransliterateSuite to strip out zero-width chars * fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708) * remove Vowpal Wabbit exclusion, add Interpretability exclusion * exclude all Interpretability Notebooks from Synapse E2E Co-authored-by: Jessica Wang <[email protected]> Co-authored-by: Mark Hamilton <[email protected]> * feat: Remove CNTK functionality and replace with ONNX (#1593) * Switch usage of CNTK to use ONNX * style 
fix * small test fixes * resolved comments * some test fixes * test fixes and comment responses * more test fixes * fix dbx tests * fix notebooks * misc small fixes * Switch usage of CNTK to use ONNX * style fix * small test fixes * resolved comments * some test fixes * test fixes and comment responses * more test fixes * fix dbx tests * fix notebooks * misc small fixes * test fix * dotnet fix * add protobuf to sbt * notebook style fixes * add protobuf to databricks * revert dependency * dependency override * add shading * more shading * add maven shade plugin * add shade to subproject * switch to using separate onnx dependency * increase onnx version * add ignoreDecodingErrors * fix decoding errors param * fix style error * changed to sonatype package * add snapshots repo to tests - revert later * downgrade package to Java 8 * more snapshot repo refs * databricks onnx version fix * add error comment * centralize package constants * add debugging for image name * debugging image name * add conversion of black and white * style fix * fix to b&w usage * cherry pick python version fix * throw if only 1 channel * cli fix * cleanup of review 1 * fix typo * responded to comments * remove explicit dependency * responded to comments Co-authored-by: Mark Hamilton <[email protected]> * build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709) Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.2 to 2.0.3. - [Release notes](https://github.com/webpack/loader-utils/releases) - [Changelog](https://github.com/webpack/loader-utils/blob/v2.0.3/CHANGELOG.md) - [Commits](webpack/loader-utils@v2.0.2...v2.0.3) --- updated-dependencies: - dependency-name: loader-utils dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mark Hamilton <[email protected]> * chore: ScalaStyle fixes (#1716) * Style fixes * Style fixes * Style fixes in tests * test style fixes * make some warnings errors * make some warnings errors * make some warnings errors * make some warnings errors * fix vw * fixes from review pass * test: Improve ONNXtests reliability (#1713) * Improve ONNXtests reliability * fix constructor build * make new defaults * chore: Move new ImageFeaturizer to onnx namespace (#1711) * feat: Deprecate CNTK objects (#1712) * Deprecate CNTK objects * remove CNTK-based notebooks * ignore 1 failing test * deprecate more util classes * chore: add secret scanning infrastructure (#1724) * feat: Add SpeakerEmotionInference transformer for generating SSML t… (#1691) * feat: Add TextToSpeechSSMLGenerator transformer for generating SSML to augment TTS requestsfeat: Add TextToSpeechSSMLGenerator transformer for generating SSML to augment TTS requests. Adding support for SSML to the TTS endpoint. 
* Fixing style issues * More style issues * ok it finally passes scalastyle Co-authored-by: Brendan Walsh <[email protected]> * test: Additional E2E testing infrastructure (#1727) * Addition E2E testing infrastructure * Ignore cleanup test * Style fix * Update platform.py Co-authored-by: Mark Hamilton <[email protected]> * chore: Making secrets optional and cached (#1726) * Making secrets optional and cached * do not cache normal secrets * set publishing for dotnet * more dotnet pipeline edits * testing dotnet no env vars * enable publish only Co-authored-by: Mark Hamilton <[email protected]> * chore: autodelete old models (#1729) * autodelete old models * move cleanup calls to afterAll * fix style Co-authored-by: Mark Hamilton <[email protected]> * clarify date comparisons when deleting old models/groups (#1733) * chore: automate clean-acr with github action workflow (#1735) * hello workflow * make pre-commit executable * clean acr * gh-set-secret * pat->github * echo * env * Create gh-set-secret.yml * Update gh-set-secret.yml * Update gh-set-secret.yml * Create manual.yml * Update manual.yml * Update manual.yml * Update manual.yml * Update manual.yml * use azurecli for keyvault access * remove pip cache * remove columns * fix indent * fix env var name * split off script file * azurecli@v1 * shorten path * lengthen path * add query option to az cmd * re-indent * re-indent again * echo * print * test maniehtestkv * back to azure kv task * back to mmlspark-keys * quot arg * typo * use popen for pipeline-run * run through deletions in whatif mode * print result code * delete result code * format changes and check result of transfer * remove tqdn * remove tqdn * restore actual deletion * formatize prints * delete cruft * switch from manual to cron * sundays at 1am * chmod pre-commit Co-authored-by: Mark Hamilton <[email protected]> * feat: add simple deep learning text classifier (#1591) * refactor deep vision model params * add Text Classifier and tests * update text 
classifier * add default values for transformation edit fields and removed fields * add deep text classification notebook * update hovorod installation script * update environment * update env * add installing packages on dbx * fix python environment * add more tests * skip deep text model test * address comments * add learning_rate param * fix notebook style * fix _find_num_proc * update newtonsoft.json version in dotnet to resolve security issue * fix missing learning rate param * fix dataframe partition error and strange read output * fix failing test * fix param updates * update notebook to make it run faster * make train data size smaller * update models * remove minor version of python to avoid warnings in pipeline * ignore dl notebooks in Synapse tests * update mardown name * chore: fix style (#1736) * build: bump actions/checkout from 2 to 3 (#1737) Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v2...v3) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: bump version to 0.10.2 (#1738) * docs: removing beta tag from R * build: bump loader-utils from 2.0.3 to 2.0.4 in /website (#1719) Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.3 to 2.0.4. - [Release notes](https://github.com/webpack/loader-utils/releases) - [Changelog](https://github.com/webpack/loader-utils/blob/v2.0.4/CHANGELOG.md) - [Commits](webpack/loader-utils@v2.0.3...v2.0.4) --- updated-dependencies: - dependency-name: loader-utils dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mark Hamilton <[email protected]> * chore: bump docusaurus (#1740) * chore: bump spark to 3.2.3 (#1744) * downgrade spark 3.1 * remove exlusion of org.json4s * fix json4s-ast version * resolve dependency conflicts * remove notebook * fix environment * fix onnxhub load method and add error message in platform.py * update platform.py Signed-off-by: Jason Wang <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: elswork <[email protected]> Co-authored-by: Tian Wei <[email protected]> Co-authored-by: weitian <[email protected]> Co-authored-by: Mark Hamilton <[email protected]> Co-authored-by: lhrotk <[email protected]> Co-authored-by: Li Jiang <[email protected]> Co-authored-by: Li Jiang <[email protected]> Co-authored-by: Jason Wang <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: niehaus59 <[email protected]> Co-authored-by: Markus Cozowicz <[email protected]> Co-authored-by: JessicaXYWang <[email protected]> Co-authored-by: Jessica Wang <[email protected]> Co-authored-by: Scott Votaw <[email protected]> Co-authored-by: Brendan Walsh <[email protected]> Co-authored-by: Brendan Walsh <[email protected]> Co-authored-by: Kyle Rush <[email protected]>
Related Issues/PRs
None
What changes are proposed in this pull request?
Add an uplift modelling notebook for aisample under notebooks/community/aisamples.
How is this patch tested?
Does this PR change any dependencies?
Does this PR add a new feature? If so, have you added samples on website?
website/docs/documentation folder.
Make sure you choose the correct class estimators/transformers and namespace.
DocTable points to correct API link.
yarn run start to make sure the website renders correctly.
<!--pytest-codeblocks:cont--> before each python code blocks to enable auto-tests for python samples.
WebsiteSamplesTests job pass in the pipeline.
AB#1957564