Releases: microsoft/SynapseML

v0.11.2-spark3.3

10 Jul 23:15
Pre-release
chore: bump to spark 3.3.1

v0.11.1-spark3.3

04 May 20:02
Pre-release
chore: make it so custom versions are possible

SynapseML v0.11.1

24 Apr 23:13
866261c

SynapseML v0.11.1

Bug Fixes 🐞

  • set default values for aadToken & url for internal Synapse (#1918)
  • ONNX model shape inference cannot handle batch with shape [-1] (#1906)
  • forgot to add getPValue to python side (#1909)
  • generate random dir for each test (#1908)
  • add back diagnosticsInfo for MVAD (#1892)
  • DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
  • fix date parsing in FaceSuite test (#1896)
  • fix Build pipeline (#1904)
  • Retry OnnxHub call to improve test reliability (#1889)
  • Normalize line-endings (#1883)
  • Remove case matching for erased generic type (#1880)
  • fix bug #1869, DML .setFitIntercept should be set to true (#1876)
  • Remove extraneous "Foo" type from Py codegen (#1867)
  • Allow variable size in ONNX inputs (#1851)
  • Abstain from CodeQL for markdown-only changes (#1865)
  • fix style
  • update OpenAIEmbedding internalServiceType

Build 🏭

  • bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
  • bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
  • bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
  • bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
  • bump webpack from 5.75.0 to 5.76.1 in /website (#1870)

Documentation 📘

  • Fix installation instruction in the webpage for the build.sbt file (#1921)
  • note discrete treatment data type (#1905)
  • add custom chatbot creation to form demo (#1888)
  • add overview page for simple DNN and fix some typos (#1879)
  • Fix a typo in installation docs
  • fix link issue in CONTRIBUTING.md (#1864)
  • fix a few issues in cognitive service demo (#1861)

Features 🌈

  • add streaming API for MVAD (#1893)
  • [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
  • Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
  • support new api version of form recognizer (#1882)
  • Add a new function to DMLModel, getPValue (#1863)
  • update default internal endpoint for cog services (#1859)

Maintenance 🔧

  • bump to v0.11.1 (#1933)
  • Adding telemetry for the dataset metadata. This one is specially for … (#1917)
  • fix r tests (#1927)
  • fix build issues (#1916)
  • disable test until Synapse is fixed (#1915)
  • add .bloop to .gitignore (#1897)
  • clean up old/missed search indexes in SearchWriterSuite (#1901)
  • Add utility to clean azure search indexes
  • update website docs to point to correct developer API docs (#1877)
  • Update pipeline.yaml for Azure Pipelines (#1866)
  • make sure nightly build has new commit

Changes:

  • 866261c chore: bump to v0.11.1 (#1933)
  • 3c09702 chore: Adding telemetry for the dataset metadata. This one is specially for … (#1917)
  • 0d0d10c feat: add streaming API for MVAD (#1893)
  • 1b71c1d chore: fix r tests (#1927)
  • 0df97ad chore: fix build issues (#1916)
  • 78695fb Update Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb (#1922)
  • 87d5bc5 docs: Fix installation instruction in the webpage for the build.sbt file (#1921)
  • 8320b2b fix: set default values for aadToken & url for internal Synapse (#1918)
  • 4912ae4 chore: disable test until Synapse is fixed (#1915)
  • 469445b fix: ONNX model shape inference cannot handle batch with shape [-1] (#1906)
  • 3fa001e build: bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
  • f51327e Update LightGBM version to 3.3.5 (#1910)
  • b1e584e fix: forgot to add getPValue to python side (#1909)
  • a09a6f7 docs: note discrete treatment data type (#1905)
  • 0fa3f2a fix: generate random dir for each test (#1908)
  • 736c317 fix: add back diagnosticsInfo for MVAD (#1892)
  • 13afff6 fix: DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
  • 7546e7f build: bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
  • f227f02 fix: fix date parsing in FaceSuite test (#1896)
  • 0f02626 fix: fix Build pipeline (#1904)
  • ce9fe41 chore: add .bloop to .gitignore (#1897)
  • 7ffa970 chore: clean up old/missed search indexes in SearchWriterSuite (#1901)
  • 9a6cf03 chore: Add utility to clean azure search indexes
  • 52919ce fix: Retry OnnxHub call to improve test reliability (#1889)
  • 979c629 feat: [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
  • 412620a docs: add custom chatbot creation to form demo (#1888)
  • 9f634a6 feat: Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
  • 7657089 fix: Normalize line-endings (#1883)
  • c156792 feat: support new api version of form recognizer (#1882)
  • ed842a5 docs: add overview page for simple DNN and fix some typos (#1879)
  • 87e1c78 fix: Remove case matching for erased generic type (#1880)
  • cd72bc9 build: bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
  • 564d047 fix: fix bug #1869, DML .setFitIntercept should be set to true (#1876)
  • 392dbbf chore: update website docs to point to correct developer API docs (#1877)
  • 129abde build: bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
  • 4d1c560 build: bump webpack from 5.75.0 to 5.76.1 in /website (#1870)
  • 62c79d8 docs: Fix a typo in installation docs
  • 1f63dab feat: Add a new function to DMLModel, getPValue (#1863)
  • 83f8260 fix: Remove extraneous "Foo" type from Py codegen (#1867)
  • a5bec45 fix: Allow variable size in ONNX inputs (#1851)
  • 23c9b0a chore: Update pipeline.yaml for Azure Pipelines (#1866)
  • dedcbda docs: fix link issue in CONTRIBUTING.md (#1864)
  • a7f31d5 fix: Abstain from CodeQL for markdown-only changes (#1865)
  • a5f38b1 Update DoubleMLEstimator test CI verification (#1862)
  • a44f917 fix: fix style
  • cc931af fix: update OpenAIEmbedding internalServiceType
  • 424d586 feat: update default internal endpoint for cog services (#1859)
  • e4a0e2c docs: fix ...

SynapseML v0.11.0

05 Mar 13:37
7b23764

SynapseML: Simple and distributed machine learning

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.11.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights

  • ChatGPT and GPT-4 at Scale: Intelligent chat and embeddings. Simplified Prompting APIs.
  • Simple Deep Learning: Train custom image and text classifiers with ease
  • LightGBM v2: Higher performance, >10x lower memory footprint, same API
  • ONNX Model Hub: Embed >150 state of the art deep networks into your pipelines
  • Causal Learning: Discover and measure causal treatment effects
  • Vowpal Wabbit v2: New second generation integration

New Features

General ✨

  • R Support is no longer Beta! (#1586)
  • Support for Spark 3.2.3

Open AI 🤖

  • Add OpenAI Prompt Template support (#1843)
  • Add Azure OpenAI embedding support (#1832)
  • Add Azure Active Directory authentication for OpenAI (#1829)
  • Add Null-value handling for OpenAI models (#1854)

Deep Learning 🕸

  • Remove CNTK functionality and replace with ONNX (#1593)
  • Add the DeepTextClassifier, a simple API for fine-tuning a wide array of Hugging Face 🤗 text transformers using PyTorch Lightning (#1591)
  • Add the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Azure Cognitive Services for Big Data 🧠

  • Add SpeakerEmotionInference transformer to generate emotion annotation tags for emotive reading in SpeechToText (#1691)
  • Add new AnalyzeText API (#1760)
  • Support Azure Active Directory (AAD) authentication for the cognitive services (#1778, #1797)
  • Move different cognitive services into sub packages (#1746)
  • Add audiobook generation example (#1852)
  • Add a notebook for advanced cognitive service usage (#1825)
  • Upgrade MVAD to v1.1 (#1788)
  • Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
  • Add word-level timing to SpeechToTextSDK and ConversationTranscription (#1801)
  • Add the descriptionExcludes parameter to AnalyzeImage (#1590)

Causal Learning 📈

  • Add the causal DoubleMLEstimator for learning causal treatment effects from data (#1715)
  • Add a DoubleMLEstimator document and sample notebook (#1730)
  • Fix DML regression bug, should remove both treatment and outcome columns as feature columns (#1820)
  • Add TreatmentCol type checking (#1816)
  • Update test to validate ATE value should be positive (#1821)
  • Fix issue with missing causal test coverage (#1799)
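
As a rough illustration of the idea behind estimating a treatment effect from data, here is a minimal pure-Python sketch of the residual-on-residual (partialling-out) regression that double machine learning builds on. It is not the DoubleMLEstimator API: the helper names are hypothetical, the nuisance models are plain one-variable linear fits, the toy treatment is continuous, and there is no cross-fitting.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Part of y that a linear fit on x does not explain."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def treatment_effect(confounder, treatment, outcome):
    """Regress outcome residuals on treatment residuals (no intercept)."""
    t_res = residuals(confounder, treatment)
    y_res = residuals(confounder, outcome)
    return sum(t * y for t, y in zip(t_res, y_res)) / sum(t * t for t in t_res)


# Toy data (made up for illustration): the true effect of treatment is 2.0.
confounder = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
treatment = [c + e for c, e in zip(confounder, noise)]
outcome = [2.0 * t + 3.0 * c for t, c in zip(treatment, confounder)]

effect = treatment_effect(confounder, treatment, outcome)
print(round(effect, 6))
```

Removing the confounder's influence from both the treatment and the outcome before relating them is what keeps the confounder from biasing the estimate. A full double ML procedure would typically swap the linear fits for flexible learners and fit them on held-out folds.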

LightGBM 🌳

  • Add LightGBM streaming execution mode for more reliable performance with orders of magnitude less memory. (#1580)
  • Add maxNumClasses param to LightGBMClassifier for multi-class (#1841)
  • Added the passThroughArgs feature which allows users to set low level LGBM parameters before they are wrapped in SparkML (#1749)

Vowpal Wabbit 🐇

Additional Updates

Bug Fixes 🐞

  • Support grayscale images in toNDArray (#1592)
  • Adjust learning rate in VW example notebook (#1853)
  • Correct copy/paste error in acr cleanup (#1838)
  • Fix synapse test config, and isolation forest notebook (#1833)
  • Add spark config to fix ArrayStoreException (#1757)
  • Fix breeze NoSuchMethodError (#1807)
  • Fix modelVersion param in TextAnalytics (#1756)
  • Make logging infrastructure consistent and add logging checks (#1755)
  • Fix website sidebars and vulnerabilities in packages (#1753)
  • Remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
  • Update isolation forest notebook (#1696)
  • Remove error on invalid columns in DropColumns (#1695)
  • Fix PyArrow failure in deeplearning test (#1689)
  • Fix linked service setters on cog service base class (#1685)
  • KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
  • Fix flaky translate tests (#1643)
  • Fix speechToTextSuite serialization Fuzzing failure (#1626)
  • Fix translator endpoint and update all endpoints for gov regions (#1623)
  • Fix Binder runtime issues (#1598)
  • Clean up cluster if Databricks tests pass ([#1599](https://github....

SynapseML v0.10.2

22 Nov 14:30
cd1d2ea

v0.10.2

Bug Fixes 🐞

  • remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
  • remove synapse E2E testing exclusion - cyber ml (#1699)
  • update isolation forest notebook (#1696)
  • don't throw on invalid columns in DropColumns (#1695)
  • fix pyarrow failure in deeplearning test (#1689)
  • fix linked service on cog service base (#1685)
  • fix Uplift Modelling style
  • KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
  • fix flaky translate tests (#1643)
  • update ubuntu to 20.04 in pipeline (#1624)

Build 🏭

  • bump actions/checkout from 2 to 3 (#1737)
  • bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
  • bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
  • bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)

Documentation 📘

  • update developer readme instruction on python env creation (#1693)
  • fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
  • improve error msg to make it clearer for users and fix typos (#1662)
  • simplify data downloading and add mlflow to uplift modelling (#1659)
  • move magic command forward since it restarts interpreter
  • remove unused docs and fix links
  • improve example notebooks
  • add aisample uplift modelling (#1640)
  • fix command to launch jupyter notebook (#1649)
  • add mlflow in ai samples time series forecasting (#1645)
  • add mlflow logging and loading (#1641)
  • update spark version in Readme
  • improve readme overview
  • add aisample on text classification (#1617)

Features 🌈

  • add simple deep learning text classifier (#1591)
  • Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
  • Deprecate CNTK objects (#1712)
  • Remove CNTK functionality and replace with ONNX (#1593)
  • R test generation (#1586)

Maintenance 🔧

  • bump version to 0.10.2 (#1738)
  • fix style (#1736)
  • automate clean-acr with github action workflow (#1735)
  • autodelete old models (#1729)
  • Making secrets optional and cached (#1726)
  • add secret scanning infrastructure (#1724)
  • Move new ImageFeaturizer to onnx namespace (#1711)
  • ScalaStyle fixes (#1716)
  • update scalatest and scalactic (#1706)
  • remove synapse test exclusions (#1698)
  • pin az and python versions (#1705)
  • fix ado integration (#1704)
  • remove notebooks (#1703)
  • fix reopen comment action
  • fix reopen on comment workflow
  • fix typo in issue reopen yaml
  • re open github issues after a comment (#1676)
  • clean up github workflows and add issue label remover (#1674)
  • turn off failing synapse tests temporarily (#1658)
  • added synapse-internal to platform detector function (#1651)
  • publish test jars
  • improve test coverage (#1631)
  • Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
  • clean up TextAnalytics cog service APIs (#1622)

Testing 💚

  • Additional E2E testing infrastructure (#1727)
  • Improve ONNXtests reliability (#1713)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • 0b96cc5 chore: add secret scanning infrastructure (#1724)
  • 2a7a67b feat: Deprecate CNTK objects (#1712)
  • e38e3ad chore: Move new ImageFeaturizer to onnx namespace (#1711)
  • 0ff6802 test: Improve ONNXtests reliability (#1713)
  • fe4c5d2 chore: ScalaStyle fixes (#1716)
  • 050b541 build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
  • f2e88fd feat: Remove CNTK functionality and replace with ONNX (#1593)
  • abdfe19 fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
  • 6a1f994 chore: update scalatest and scalactic (#1706)
  • 144674f chore: remove synapse test exclusions (#1698)
  • 32c654b chore: pin az and python versions (#1705)
  • c8fba28 chore: fix ado integration (#1704)
  • 92d4095 chore: remove notebooks (#1703)
  • a953780 fix: remove synapse E2E testing exclusion - cyber ml (#1699)
  • b257c70 fix: update isolation forest notebook (#1696)
  • 9120b05 using predictionCol for isolation forest (#1686) [ #1060 ]
  • 448f6b7 Remove trident.mlflow APIs. (#1687)
  • f4af33f fix: don't throw on invalid columns in DropColumns (#1695)
  • c531bbb docs: update developer readme instruction on python env creation (#1693)
  • 467e651 build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
  • 302831f fix: fix pyarrow failure in deeplearning test (#1689)
  • e857511 fix: fix linked service on cog service base (#1685)
  • f29318a build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)
  • 50ac0c8 Update reopen-issue-on-comment.yml
  • c9278b5 chore: fix reopen comment action
  • b3a9ba9 chore: fix reopen on comment workflow
  • 9fe273b chore: fix typo in issue reopen yaml
  • a7c50de chore: re open github issues after a comment (#1676)
  • 8914750 chore: clean up github workflows and add issue label remover (#1674)
  • 965231a docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
  • 4fa7249 docs: improve error msg to make it clearer for users and fix typos (#1...

v0.10.1

23 Aug 03:41
0f54bc6

SynapseML v0.10.1

Bug Fixes 🐞

  • fix speechToTextSuite serializationFuzzing failure (#1626)
  • fix translator endpoint and update all endpoints for gov regions (#1623)
  • binder runtime issues (#1598)
  • clean up cluster if databricks tests pass (#1599)
  • fix deep-learning test flakiness (#1600)
  • update dotnetTestBase assembly version (#1601)
  • fix flaky forms test (#1584)

Build 🏭

  • bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
  • bump actions/setup-node from 2 to 3 (#1610)
  • bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
  • bump actions/setup-java from 2 to 3 (#1612)
  • simplify e2e test pipeline with test matrix

Documentation 📘

  • add aisample notebooks into community folder (#1606)
  • add aisample time series forecasting (#1614)
  • fix .NET logo on website (#1604)
  • improve OpenAI notebook (#1596)
  • pin mybinder to v0.10.0 to avoid thrashing
  • add demo into videos on website (#1581)
  • update installation guidance of v0.10.0 (#1578)
  • add more .net samples (#1570)
  • add dotnet installation & example doc (#1567)
  • Update issue template

Features 🌈

  • add stale bot for issues (#1602)
  • Support grayscale images in toNDArray (#1592)
  • Add the descriptionExcludes parameter to AnalyzeImage (#1590)
  • Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Maintenance 🔧

  • bump to v0.10.1 (#1628)
  • deprecate old Text analytics APIs to prepare for refactoring (#1627)
  • remove deprecated lime APIs (#1620)
  • update openai service to the official deployment, and disable test due to outage (#1619)
  • Auto update GitHub actions with dependabot (#1608)
  • hotfix binder badge
  • pin binder version for users (#1607)
  • Bump spark to 3.2.2
  • bump spark version
  • Format welcome message with emojis (#1583)
  • Add welcome message to new PRs/Issues (#1573)
  • Add GH workflow to label new/reopened issues (#1571)
  • update website (#1566)

Testing 💚

  • stabilize unit tests (#1576)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • 0f54bc6 chore: bump to v0.10.1 (#1628)
  • 3d0f3f4 chore: deprecate old Text analytics APIs to prepare for refactor (#1627)
  • 2052e13 chore: remove deprecated lime APIs (#1620)
  • 09213b0 fix: fix speechToTextSuite serializationFuzzing failure (#1626)
  • 9f78bf0 fix: fix translator endpoint and update all endpoints for gov regions (#1623)
  • 7e90d19 docs: add aisample notebooks into community folder (#1606)
  • ac40e5a chore: update openai service to official, and disable test due to outage (#1619)
  • f54f7f6 docs: add aisample time series forecasting (#1614)
  • 7b4b0e1 build: bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
  • 43b0d17 build: bump actions/setup-node from 2 to 3 (#1610)
  • c48a07a build: bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
  • b1a331c build: bump actions/setup-java from 2 to 3 (#1612)
  • 78e40cb chore: Auto update github actions with dependabot (#1608)
  • 69d2d20 chore: hotfix binder badge
  • 93d7ccf chore: pin binder version for users (#1607)
  • c7a61ec fix: binder runtime issues (#1598)
  • c960c06 docs: fix .NET logo on website (#1604)
  • 28a35b4 fix: clean up cluster if databricks tests pass (#1599)
  • 5a28740 fix: fix deep-learning test flakiness (#1600)
  • adf1a61 fix: update dotnetTestBase assembly version (#1601)
  • c659b33 feat: add stale bot for issues (#1602)
  • 05a4202 docs: improve OpenAI notebook (#1596)
  • e019756 feat: Support gray scale images in toNDArray (#1592)
  • 51beaa0 feat: Add the descriptionExcludes parameter to AnalyzeImage (#1590)
  • b9ac22a docs: pin mybinder to v0.10.0 to avoid thrashing
  • 1808a0f chore: Bump spark to 3.2.2
  • 8e7d453 build: simplify e2e test pipeline with test matrix
  • 8e34c7b chore: bump spark version
  • 44c8ed5 feat: Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
  • e4f0883 fix: fix flaky forms test (#1584)
  • 7da5f49 chore: Format welcome message with emojis (#1583)
  • 0e6bb35 Serena/update issue template (#1582)
  • a6a2718 docs: add demo into videos on website (#1581)
  • 7c34fc4 test: stabilize unit tests (#1576)
  • 49f3a58 chore: Add welcome message to new PRs/Issues (#1573)
  • 4868e8b Add back LightGBM library initialization in booster (#1575)
  • d427b88 docs: update installation guidance of v0.10.0 (#1578)
  • 55a60c9 docs: add more .net samples (#1570)
  • 39fe2d8 chore: Add GH workflow to label new/reopened issues (#1571)
  • 0febe3c docs: add dotnet installation & example doc (#1567)
  • db95a10 chore: update website (#1566)

This list of changes was auto generated.

v0.10.0

18 Jul 02:50
e9986fe

SynapseML

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights

  • OpenAI Language Models: Embed 175-billion parameter models into your databases with ease
  • .NET, C#, and F# Support: Use or train any SynapseML model from .NET
  • Full MLFlow Support: Quick and easy MLOps, model management, and autologging
  • Live Demos in Browser: Explore the SynapseML library with zero setup

New Features

General ✨

Azure Cognitive Services for Big Data 🧠

Responsible AI at Scale 😇

  • Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
  • Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
  • Added a notebook for ICE and PDP feature explainers (#1318)
  • Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
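
To make the partial dependence idea concrete, here is a minimal pure-Python sketch, not the SynapseML API. The function `partial_dependence`, the `toy_model`, and the sample rows are all hypothetical and exist only for illustration: for each grid value, the chosen feature is overwritten in every row and the model's predictions are averaged.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows stay untouched
            modified[feature] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Stand-in for an opaque model: depends on both features."""
    return 2.0 * row["age"] + row["income"]


rows = [{"age": 30, "income": 10.0}, {"age": 50, "income": 20.0}]
pdp = partial_dependence(toy_model, rows, "age", [0, 1, 2])
print(pdp)  # [15.0, 17.0, 19.0]
```

The curve rises by 2.0 per unit of `age`, which matches the coefficient in the toy model. Keeping the per-row predictions instead of averaging them would give individual conditional expectation (ICE) curves.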

MLFlow 🔃

LightGBM on Spark 🌳

  • Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
  • Added seed parameters to LightGBM (#1387)
  • Added a method to get LightGBM native model string directly (#1515)
  • Fixed issue with validation data creation during useSingleDataset mode (#1527)
  • Fixed multiclass training with initial scores (#1526)
  • Fixed saving LightGBM model iterations with early stopping (#1497)
  • Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
  • Fixed issue where an empty partition is chosen as the main worker in singleDatasetMode (#1458)
  • Fixed bug with data repartitioning in LightGBMRanker (#1368)
  • Fixed outdated docs for useSingleDatasetMode (#1562)
  • Refactored LightGBM class structure to improve logging and debugging (#1557)

Vowpal Wabbit 🐇

  • Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
  • Fixed issues with building quadratic interaction terms (#1460)

Isolation Forests 🌲

Additional Updates

Maintenance 🔧

  • Removed unused debugging code (#1546)
  • Remove Synapse test exclusion for Explanation Dashboard notebook (#1531)
  • Made python style checks verbose (#1532)
  • Fixed library checking while installing library on Databricks cluster (#1488)
  • Upgraded and fixed Dockerfiles (#1472)
  • Added Developer Docker Image build to pipeline (#1480)
  • Fixed ADO area path in Issue Linker (#1464)
  • Fix master version badge display
  • Improved Databricks error reporting
  • Updated azure cli to stop build errors
  • Fixed SSL handshake flakiness
  • Added itsdangerous as a dependency to ADB tests (#1412)
  • Turned on debug for pr to work item workflow
  • Pointed pr linker to official implementation
  • Changed GitHub action trigger from pull_request_target to pull_request (#1413)
  • Fixed issue where Unit Tests were not executing ([#1409](https://github.com/Microsoft/SynapseML/issu...

SynapseML v0.9.5

12 Jan 22:42
79d92d3

SynapseML

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.9.5 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights

  • Geospatial Intelligence: Large-scale map and geocoding operations
  • Multivariate Anomaly Detection: Build custom time series anomaly detection systems
  • Responsible AI at Scale: Distributed Conditional Expectation and Partial Dependence Analysis
  • Text To Speech: Easy-to-use Neural Text to Speech for large datasets
  • Healthcare Analytics: Quickly understand entities and relationships in corpora of medical text.

New Features

Geospatial Intelligence 🗺️

  • Added support for distributed geospatial queries backed by the Azure Maps API
  • Added the geospatial usage overview (#1339)
  • Explore how to use the geospatial intelligence services to analyze flood risks. (#1339)
  • Added the AddressGeocoder transformer to map informal addresses to standardized addresses with latitude and longitude (#1294)
  • Added the ReverseGeocoder transformer to map latitude and longitude measurements to standardized addresses. (#1339)
  • Added CheckPointInPolygon to detect whether latitude and longitude queries lie inside regions of interest (#1339)

Azure Cognitive Services for Big Data 🧠

  • Added the Healthcare Analytics Transformer for extracting medical information, entities, and relationships from text. [Example Usage] (#1329)
  • Added the FitMultivariateAnomaly estimator for training custom anomaly detection models on DataFrames of multivariate time series data (#1272)
  • Added example notebook for Multivariate Anomaly Detector
  • See how to train a custom Multivariate Anomaly detector in the Estimators reference docs (#1323)
  • Added simplified Text Analytics transformers that support auto-batching (#1329)
  • Added the TextToSpeech Transformer for transforming DataFrames of text to audio files with neural voice synthesis (#1320)
  • Added the TextAnalyze transformer to support executing multiple text analytics workloads within a single API call (#1267, #1312)

Responsible AI at Scale 😇

  • Added Individual Conditional Expectation explanations and Partial Dependence Plots with the ICETransformer. This tool gives detailed explanations of how features in opaque-box models affect the model prediction. (#1284)
  • Learn about how to use the ICETransformer through an example with the Adult Census dataset

MLFlow 🔃

  • Add MLFlow support for saving and loading SynapseML models (#1277)

LightGBM on Spark 🌳

  • Improved LightGBM training performance 4x-10x by setting num_threads to be cores-1 (#1282)
  • Added the predict_disable_shape_check in LightGBM (#1273)
  • Reduced temporary file bloat by creating the LightGBM native temp directory lazily (#1326)
  • Added logging for number of columns and rows when creating datasets, set useSingleDatasetMode=True by default (#1222)
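
The cores-1 rule mentioned above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper `default_num_threads` is hypothetical, and using `os.cpu_count()` as the source of the core count is an assumption, since the release notes do not say how SynapseML determines the number of cores.

```python
import os


def default_num_threads(cores=None):
    """Return cores - 1, leaving one core free, but never fewer than 1."""
    if cores is None:
        # Assumption: fall back to the local CPU count when none is given.
        cores = os.cpu_count() or 1
    return max(1, cores - 1)


print(default_num_threads(8))  # 7
print(default_num_threads(1))  # 1
```

Leaving one core free avoids oversubscribing the machine while native training runs alongside other work on the same executor.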

Infrastructure 🏭

  • SynapseML now installable from Maven Central!
  • SynapseML now supports Spark v3.2.x

Additional Updates

Bug Fixes 🐞

  • Allowed FlattenBatch to propagate non-array values (#1286)
  • Fixed flaky tests (#1342)
  • Fixed website bugs and migrated docSearch (#1331)
  • Fixed issue where IsolationForestModel does not properly exchange params with the inner model (#1330)
  • Corrected the objective param when using fobj (#1292)
  • Fixed issue where broadcasted sum in breeze 1.0 breaks in Spark 3.2.0 (#1299)
  • Hotfixes for R test runners (#1283)
  • fix installation instruction (#1268)
  • Removing broadcast hint (#1255)
  • fix install instructions (#1259)

Build 🏭

  • bump algoliasearch-helper from 3.6.1 to 3.6.2 in /website (#1270)
  • remove some deps that cause sec issues (#1264)

Documentation 📘

  • Fixed broken link to CyberML notebook (#1322)
  • Added website announcement bar (#1263)
  • Updated and improved readme (#1262)
  • Removed references to runme in contributing.md
  • Supported Math expressions in website markdown (#1278)
  • Corrected Synapse typo in website (#1335)

Maintenance 🔧

  • Stopped LightGBM tests from timing out (#1315)
  • Fixed r test flakiness (#1314)
  • Updated VerifyLightGBMClassifier.scala (#1313)
  • Update speech SDK test results
  • Add in missing tests in build (#1300)
  • Fix flaky build steps (#1298)
  • Fix website telemetry (#1261)
  • Add website telemetry (#1260)
  • Added missing test classes to pipeline

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

Serena Ruan: Serena is an engineer on the Azure Synapse team in Beijing. In this release, Serena has continued her unbelievable speed of contributions with support for Multivariate Anomaly Detection, MLFlow, and installation from Maven Central. These contributions are just a few of the many projects Serena has contributed since she joined just a few months ago!

Ilya Matiach: Ilya is a prolific engineer on the Azure Machine Learning Boston team working on responsible AI. Ilya contributed LightGBM on Spark and worked tirelessly to improve and support this feature. Ilya has been an active contributor to the SynapseML project for 5 years and has built many of the tools in the library.

Sudhindra Kovalam: Sudhindra is an engineer on the Microsoft Maps team and has contributed intelligent geospatial APIs to SynapseML v0.9.5. Sudhindra developed new ways to automate generation of Spa...

SynapseML v0.9.4

16 Nov 05:19
e6da4d5

SynapseML

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights

  • General Availability on Synapse: We are ready to help you productionalize on Azure Synapse Analytics
  • ONNX on Spark: Distributed and hardware accelerated model inference on Spark
  • Responsible AI: Understand opaque-box models, measure dataset biases, Explainable Boosting Machines
  • Form Recognition and Translation: Parse PDFs and translate dataframes between over 100 languages
  • Reinforcement Learning: Contextual Bandit Reinforcement Learning with Vowpal Wabbit

New Features

General ✨

  • Renamed and rebranded! Microsoft ML for Apache Spark is now SynapseML
  • New modular library sub-packages for standalone install of each major set of features
  • Support Spark 3.1.2 and Scala 2.12
  • Support pip install synapseml for python bindings

ONNX on Spark 🕸

Cognitive Services for Big Data 🧠

  • Added Multilingual Translation APIs (#1108) (Tutorial)
  • Added FormRecognition APIs (Invoice, IDs, BusinessCards, Layouts, Custom Models) (#1099) (Tutorial)
  • Added the FormOntologyLearner to extract meaningful "ontologies" of objects from collections of forms
  • Add notebook to Create a Multilingual Search Engine from Forms
  • Updated Text Analytics API to V3.1 (#1193)
  • Add redactedText to PIIV3 (#1247)
  • Added Personally Identifying Information (PII) identification
  • Added Read API
  • Added Conversation Transcription API
  • Cognitive Services now support data exfiltration protected (DEP) VNETs, allowing for individualized security solutions on Synapse Analytics (Learn More)
  • Added support for the m4a codec in Speech to Text models
  • Added predictive maintenance notebook
  • Added Cognitive Service overview notebook
  • Added support for linked service authentication in Synapse Analytics
  • Simple no-code support in Synapse Analytics

Responsible AI at Scale 😇

  • Added Additive Shapley Explanations (SHAP) for understanding the predictions of opaque-box models (#1077)
  • New API for Locally Interpretable Model-Agnostic Explanations (LIME), which now supports background distributions and text models, and has the same API as SHAP (#1077)
  • Added Measure transformers for Data Balance Analysis (#1218)
  • Add more notebook samples for documentation (#1043)
  • Documentation and notebooks for Interpretability on Spark
  • Introduce Responsible AI section on website (Interpretability + DataBalanceAnalysis) (#1241)
  • Adding document and notebook for Data Balance Analysis (#1226)
  • Explainable Boosting Machines for performant and interpretable ML (Private preview on Synapse Analytics only)
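
For readers new to Shapley-based explanations, the idea behind the SHAP item above can be illustrated with a small brute-force calculation. This is a minimal sketch under stated assumptions, not SynapseML's implementation or API: the function name `shapley_values`, the single-row `background` baseline, and the toy linear model are all hypothetical, and the exact enumeration shown here is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[List[float]], float],
    instance: Sequence[float],
    background: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one instance against a single background row.

    Hypothetical helper for illustration only. Features outside a coalition
    are replaced by the corresponding background value.
    """
    n = len(instance)

    def value(coalition: frozenset) -> float:
        # Model output when only the coalition's features take the instance's values.
        mixed = [instance[i] if i in coalition else background[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


# Toy opaque-box model (an assumption for this example): f(x) = 2*x0 + 3*x1.
def toy_model(x: List[float]) -> float:
    return 2.0 * x[0] + 3.0 * x[1]


phis = shapley_values(toy_model, instance=[1.0, 1.0], background=[0.0, 0.0])
```

The attributions are additive: they sum to the difference between the prediction for the instance and the prediction for the background row, which is the property the "additive" in the name refers to.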

Vowpal Wabbit 🐇

  • Added ContextualBandit reinforcement learning (#896)
  • Added Vowpal Wabbit Overview Notebook

LightGBM 🌳

  • Added matrix type parameter and improve logic to automatically infer dataset sparsity (#1052)
  • Added several parameters related to dart boosting type (#1045)
  • Added chunk size parameter for copying java data to native (#1041)
  • Added number of threads parameter (#1055)
  • Added custom objective function to LightGBM learners (#1054)
  • Added singleton dataset mode for faster performance and reduced memory usage (#1066)
  • Add num iteration and start iteration parameters to LightGBM model (#1024)
  • Added the average precision metric (#1034)
  • Added overview notebook for LightGBM
  • Moved to new streaming API for dense data to reduce memory usage
  • Tuned chunking code for faster performance
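
As a reminder of what the average precision metric mentioned above measures, here is a minimal sketch of the standard definition for binary labels: the mean of precision@k taken at each rank k where a positive example appears. The function name `average_precision` is hypothetical, and this sketch does not claim to match LightGBM's own implementation (for example, it ignores sample weights and breaks score ties by input order).

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for binary labels (1 = positive, 0 = negative).

    Hypothetical helper for illustration only. Examples are ranked by
    descending score; ties keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        return 0.0  # Convention chosen for this sketch when there are no positives.

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision@rank at each positive
    return precision_sum / total_positives


# Positives land at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
ap = average_precision(labels=[1, 0, 1, 0], scores=[0.9, 0.8, 0.7, 0.1])
```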

Build and Infrastructure Improvements 🏭

  • New Docusaurus website generation system
  • E2E Tests on Synapse Analytics (#1014)
  • Split library into separately installable subprojects (#1073)
  • Added a unified logging and telemetry system (#1019)
  • Modernized R wrapper generation
  • New Automated Python test generation (#998)
  • New extensible code generation system
  • New two-tiered security for build secrets
  • Update Ubuntu version to 18.04
  • Automated backup of ACR images

Additional Updates

Bug Fixes 🐞

  • Enable backwards compatibility for mmlspark python namespace imports (#1244)
  • Fix publishing to maven and pypi (#1242)
  • Fix broken link to notebook in Data Balance Analysis doc (#1240)
  • min_data_in_leaf missing from dataset parameters in lightgbm (#1239)
  • Fix performance issue in interpretability notebooks (#1238)
  • Fixed cognitive service errors (#1176)
  • Fixed flaky tests
  • Rename NERPii to PII
  • Fixed cog service test flakes
  • Fixed setLinkedService issues in Synapse (#1177)
  • Improved LGBM error message for invalid slot names (#1160)
  • Fixed generated python code (#1121)
  • Updated notebookUtils class path (#1118)
  • Fixed LIME NaN weight output (#1117, #1112)
  • Fixed Guava version issue in Azure Synapse and Databricks (#1103)
  • Fixed flakiness in spark session stopping
  • Fixed result parsing for forms
  • Fixed explainers returning wrong results when targetClassesCol is specified
  • Fixed CNTKModel issue due to catalyst bug on databricks (#1076)
  • Fixed null handling in bing image response (#1067)
  • Avoided strange issue with databricks json parser
  • Fixed dependency exclusions and build secret querying
  • Fixed issue in tabular lime sampler (#1058)
  • Updated Bing search URLs (#1048)
  • Refactored python wrappers to use common class (#758)
  • Updated java params patch (#1027)
  • Added missing returns in new python lightGBM model methods
  • Stop R binding generation from failing silently
  • Fixed conversation transcription participant column functionality
  • Reduce verbosity to...

SynapseML v0.9.2

03 Nov 03:11
81f5f80

v0.9.2

Bug Fixes 🐞

  • fix publish to central maven (#1233)
  • fix website (#1234)
  • fix typo in sbt install
  • lightgbm default params should not be specified if optional (#1232)
  • fix website broken links (#1230)
  • improve azure search writer error message in Array[Array[]] case
  • update baseUrl and fix static images (#1217)
  • Fixing flaky unit tests (#1215)
  • Docker image should install openjdk-8-jre as opposed to default-… (#1211)
  • Fixing flaky test

Documentation 📘

  • add explanation dashboard integration example notebook (#1236)
  • fix links to developer readme and R setup (#1229)

Feat

  • Build our new website (#1190)

Features 🌈

  • support direct pip install (#1223)
  • Measure transformers for Data Balance Analysis (#1218)
  • Add the FormOntologyLearner

Maintenance 🔧

  • release synapseml 0.9.2 (#1237)

Performance Improvements 🚀

  • website enhancement (#1221)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • c5e1742 feat: Measure transformers for Data Balance Analysis (#1218)
  • 73c6a65 fix: improve azure search writer error message in Array[Array[]] case
  • d8344c5 feat: Add the FormOntologyLearner
  • 2d81b50 fix: update baseUrl and fix static images (#1217)
  • e23041f fix: Fixing flaky unit tests (#1215)
  • 5d31e3e fix: Docker image should install openjdk-8-jre as opposed to default-… (#1211)
  • 9623b3e Feat: Build our new website (#1190)
  • 3f74133 fix: Fixing flaky test

This list of changes was auto generated.