docs: add aisample uplift modelling #1640

whiskyboy · 2022-09-01T05:28:03Z

Related Issues/PRs

None

What changes are proposed in this pull request?

Add an uplift modelling notebook for aisample under notebooks/community/aisamples.

How is this patch tested?

I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

No. You can skip this section.
Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

No. You can skip this section.
Yes. Make sure you have added samples following the steps below.

Find the corresponding markdown file for your new feature in website/docs/documentation folder.
Make sure you choose the correct class estimators/transformers and namespace.
Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
Make sure the DocTable points to correct API link.
Navigate to website folder, and run yarn run start to make sure the website renders correctly.
Don't forget to add  before each python code blocks to enable auto-tests for python samples.
Make sure the WebsiteSamplesTests job passes in the pipeline.

AB#1957564

github-actions · 2022-09-01T05:28:30Z

Hey @whiskyboy 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.
We appreciate your patience and contributions 💯!

thinkall · 2022-09-02T01:58:06Z

/azp run

azure-pipelines · 2022-09-02T01:58:22Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter · 2022-09-02T02:07:02Z

Codecov Report

Merging #1640 (1c98b48) into master (8d55274) will decrease coverage by 0.06%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1640      +/-   ##
==========================================
- Coverage   85.83%   85.76%   -0.07%     
==========================================
  Files         272      272              
  Lines       14230    14230              
  Branches      739      739              
==========================================
- Hits        12214    12205       -9     
- Misses       2016     2025       +9

Impacted Files	Coverage Δ
...crosoft/azure/synapse/ml/io/http/HTTPClients.scala	`75.00% <0.00%> (-7.36%)`	⬇️
...oft/azure/synapse/ml/lightgbm/NetworkManager.scala	`89.44% <0.00%> (-2.78%)`	⬇️
...rosoft/azure/synapse/ml/stages/EnsembleByKey.scala	`87.32% <0.00%> (+1.40%)`	⬆️


serena-ruan · 2022-09-02T05:44:51Z

@mhamilton723 Hey Mark, this is another AI sample added, pls review it once you get a chance 😄

mhamilton723 · 2022-09-02T17:31:19Z

notebooks/community/aisamples/AIsample - Uplift Modelling.ipynb

+    "if not IS_CUSTOMER_DATA:\n",
+    "    # Download demo data files into lakehouse if not exist\n",
+    "    remote_url = \"http://go.criteo.net/criteo-research-uplift-v2.1.csv.gz\"\n",
+    "    download_file = \"criteo-research-uplift-v2.1.csv.gz\"\n",
+    "\n",
+    "    # For this demo, we first check if the dataset files are already prepared in the default lakehouse. If not, we'll download the dataset.\n",
+    "    import os\n",
+    "    import requests\n",
+    "\n",
+    "    if not os.path.exists(\"/lakehouse/default\"):\n",
+    "        # ask user to add a lakehouse if no default lakehouse added to the notebook.\n",
+    "        # a new notebook will not link to any lakehouse by default.\n",
+    "        raise FileNotFoundError(\n",
+    "            \"Default lakehouse not found, please add a lakehouse for the notebook.\"\n",
+    "        )\n",
+    "    else:\n",
+    "        # check if the needed files are already in the lakehouse, try to download if not.\n",
+    "        # raise an error if downloading failed.\n",
+    "        os.makedirs(f\"/lakehouse/default/{DATA_FOLDER}/raw/\", exist_ok=True)\n",
+    "\n",
+    "        if not os.path.exists(f\"/lakehouse/default/{DATA_FOLDER}/raw/{DATA_FILE}\"):\n",
+    "            try:\n",
+    "                r = requests.get(f\"{remote_url}\", timeout=30)\n",
+    "                with open(\n",
+    "                    f\"/lakehouse/default/{DATA_FOLDER}/raw/{download_file}\", \"wb\"\n",
+    "                ) as f:\n",
+    "                    f.write(r.content)\n",
+    "                print(f\"Downloaded {download_file} into {DATA_FOLDER}/raw/.\")\n",
+    "\n",
+    "                with gzip.open(\n",
+    "                    f\"/lakehouse/default/{DATA_FOLDER}/raw/{download_file}\", \"rb\"\n",
+    "                ) as fin:\n",
+    "                    with open(\n",
+    "                        f\"/lakehouse/default/{DATA_FOLDER}/raw/{DATA_FILE}\", \"wb\"\n",
+    "                    ) as fout:\n",
+    "                        fout.write(fin.read())\n",
+    "                print(f\"Unzip {download_file} into {DATA_FOLDER}/raw/{DATA_FILE}.\")\n",
+    "            except Exception as e:\n",
+    "                print(f\"Failed on downloading {DATA_FILE}, error message: {e}\")\n",
+    "        else:\n",
+    "            print(f\"{DATA_FILE} already exists in {DATA_FOLDER}/raw/.\")"


We can replace this with a single line of Spark reader code if we save the loaded dataset to our public blob storage. I let others slip by with this much code, but are you up to showing them how it's done :) ?
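
A minimal sketch of what the single-line Spark reader suggested above might look like, assuming the uncompressed CSV has already been copied to blob storage. The `wasbs://` URL is a placeholder and not a real location, `load_uplift_data` is a hypothetical helper name, and `spark` is assumed to be the session the notebook already provides.

```python
# Hypothetical location: container, account, and path are placeholders,
# not the project's real public blob storage.
DATA_URL = (
    "wasbs://<container>@<account>.blob.core.windows.net/"
    "<path>/criteo-research-uplift-v2.1.csv"
)


def load_uplift_data(spark, url=DATA_URL):
    """Read the CSV directly with Spark instead of downloading and unzipping it."""
    # header=True uses the first row as column names;
    # inferSchema=True lets Spark scan the data to choose column types.
    return spark.read.csv(url, header=True, inferSchema=True)
```

In a notebook this would reduce to `raw_df = load_uplift_data(spark)`, with no local download or unzip step.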

notebooks/community/aisamples/AIsample - Uplift Modelling.ipynb

mhamilton723

Lovely work @whiskyboy, I left a few comments. Regarding the blob one, if you do this we can promote the notebook to an official demo and test it on multiple platforms. It will be great to show people how to do this analysis, as it will help many people explore their datasets.

whiskyboy · 2022-09-10T00:58:44Z

Hi @mhamilton723, thanks for your comments! They are very helpful! I have resolved most of them except the data loader one. Please take another look!

thinkall · 2022-09-13T04:28:12Z

/azp run

azure-pipelines · 2022-09-13T04:28:26Z

Azure Pipelines successfully started running 1 pipeline(s).


add uplift model sample

827b742

whiskyboy requested a review from mhamilton723 as a code owner September 1, 2022 05:28

microsoft-github-policy-service bot deleted a comment Sep 1, 2022

mhamilton723 reviewed Sep 2, 2022
