diff --git a/notebooks/features/cognitive_services/CognitiveServices - Overview.ipynb b/notebooks/features/cognitive_services/CognitiveServices - Overview.ipynb index 0bd793ac2c..fec62fa648 100644 --- a/notebooks/features/cognitive_services/CognitiveServices - Overview.ipynb +++ b/notebooks/features/cognitive_services/CognitiveServices - Overview.ipynb @@ -1,24 +1,8 @@ { - "metadata": { - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": 3 - }, - "orig_nbformat": 2 - }, - "nbformat": 4, - "nbformat_minor": 2, "cells": [ { "cell_type": "markdown", + "metadata": {}, "source": [ "\n", "\n", @@ -89,12 +73,12 @@ "\n", "### Search\n", "- [Bing Image search](https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/0.9.5/scala/com/microsoft/azure/synapse/ml/cognitive/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/0.9.5/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.BingImageSearch))\n", - "- [Azure Cognitive search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search) ([Scala](https://mmlspark.blob.core.windows.net/docs/0.9.5/scala/index.html#com.microsoft.azure.synapse.ml.cognitive.AzureSearchWriter$), [Python](https://mmlspark.blob.core.windows.net/docs/0.9.5/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AzureSearchWriter))\n" - ], - "metadata": {} + "- [Azure Cognitive search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search) ([Scala](https://mmlspark.blob.core.windows.net/docs/0.9.5/scala/index.html#com.microsoft.azure.synapse.ml.cognitive.AzureSearchWriter$), [Python](https://mmlspark.blob.core.windows.net/docs/0.9.5/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AzureSearchWriter))" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Prerequisites\n", "\n", @@ -104,21 +88,22 @@ "1. Replace any of the service subscription key placeholders with your own key.\n", "1. Choose the run button (triangle icon) in the upper right corner of the cell, then select **Run Cell**.\n", "1. View results in a table below the cell." 
- ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Shared code\n", "\n", "To get started, we'll need to add this code to the project:" - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from pyspark.sql.functions import udf, col\n", "from synapse.ml.io.http import HTTPTransformer, http_udf\n", @@ -127,13 +112,13 @@ "from pyspark.ml import PipelineModel\n", "from pyspark.sql.functions import col\n", "import os\n" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "if os.environ.get(\"AZURE_SERVICE\", None) == \"Microsoft.ProjectArcadia\":\n", " from pyspark.sql import SparkSession\n", @@ -148,13 +133,13 @@ " \"mmlspark-keys\", \"mmlspark-cs-key\")\n", " os.environ['AZURE_SEARCH_KEY'] = getSecret(\n", " \"mmlspark-keys\", \"azure-search-key\")" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from synapse.ml.cognitive import *\n", "\n", @@ -166,22 +151,22 @@ "anomaly_key = os.environ[\"ANOMALY_API_KEY\"]\n", "# A Translator subscription key\n", "translator_key = os.environ[\"TRANSLATOR_KEY\"]" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Text Analytics sample\n", "\n", "The [Text Analytics](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/) service provides several algorithms for extracting intelligent insights from text. For example, we can find the sentiment of given input text. The service will return a score between 0.0 and 1.0 where low scores indicate negative sentiment and high score indicates positive sentiment. This sample uses three simple sentences and returns the sentiment for each." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a dataframe that's tied to it's column names\n", "df = spark.createDataFrame([\n", @@ -202,25 +187,21 @@ "# Show the results of your text query in a table format\n", "display(sentiment.transform(df).select(\"text\", col(\n", " \"sentiment\")[0].getItem(\"sentiment\").alias(\"sentiment\")))" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "## Healthcare Analytics Sample" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%% md\n" - } - } + "## Text Analytics for Health Sample\n", + "\n", + "The [Text Analytics for Heatlth Service](https://docs.microsoft.com/en-us/azure/cognitive-services/language-service/text-analytics-for-health/overview?tabs=ner) extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records." 
+ ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, "outputs": [], "source": [ "df = spark.createDataFrame([\n", @@ -235,25 +216,21 @@ " .setOutputCol(\"response\"))\n", "\n", "display(healthcare.transform(df))" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Translator sample\n", "[Translator](https://azure.microsoft.com/en-us/services/cognitive-services/translator/) is a cloud-based machine translation service and is part of the Azure Cognitive Services family of cognitive APIs used to build intelligent apps. Translator is easy to integrate in your applications, websites, tools, and solutions. It allows you to add multi-language user experiences in 90 languages and dialects and can be used for text translation with any operating system. In this sample, we do a simple text translation by providing the sentences you want to translate and target languages you want to translate to." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from pyspark.sql.functions import col, flatten\n", "\n", @@ -276,21 +253,21 @@ " .withColumn(\"translation\", flatten(col(\"translation.translations\")))\n", " .withColumn(\"translation\", col(\"translation.text\"))\n", " .select(\"translation\"))" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Form Recognizer sample\n", "[Form Recognizer](https://azure.microsoft.com/en-us/services/form-recognizer/) is a part of Azure Applied AI Services that lets you build automated data processing software using machine learning technology. Identify and extract text, key/value pairs, selection marks, tables, and structure from your documents—the service outputs structured data that includes the relationships in the original file, bounding boxes, confidence and more. In this sample, we analyze a business card image and extract its information into structured data." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from pyspark.sql.functions import col, explode\n", "\n", @@ -311,22 +288,22 @@ " .transform(imageDf)\n", " .withColumn(\"documents\", explode(col(\"businessCards.analyzeResult.documentResults.fields\")))\n", " .select(\"source\", \"documents\"))" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Computer Vision sample\n", "\n", "[Computer Vision](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/) analyzes images to identify structure such as faces, objects, and natural-language descriptions. In this sample, we tag a list of images. Tags are one-word descriptions of things in the image like recognizable objects, people, scenery, and actions." 
- ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a dataframe with the image URLs\n", "df = spark.createDataFrame([\n", @@ -347,22 +324,22 @@ "# Show the results of what you wanted to pull out of the images.\n", "display(analysis.transform(df).select(\n", " \"image\", \"analysis_results.description.tags\"))\n" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Bing Image Search sample\n", "\n", "[Bing Image Search](https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) searches the web to retrieve images related to a user's natural language query. In this sample, we use a text query that looks for images with quotes. It returns a list of image URLs that contain photos related to our query." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Number of images Bing will return per query\n", "imgsPerBatch = 10\n", @@ -390,21 +367,21 @@ "\n", "# Show the results of your search: image URLs\n", "display(pipeline.transform(bingParameters))\n" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Speech-to-Text sample\n", "The [Speech-to-text](https://azure.microsoft.com/en-us/services/cognitive-services/speech-services/) service converts streams or files of spoken audio to text. In this sample, we transcribe one audio file." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a dataframe with our audio URLs, tied to the column called \"url\"\n", "df = spark.createDataFrame([(\"https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav\",)\n", @@ -421,22 +398,52 @@ "\n", "# Show the results of the translation\n", "display(speech_to_text.transform(df).select(\"url\", \"text.DisplayText\"))\n" - ], + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text-to-Speech sample\n", + "[Text to speech](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#overview) is a service that allows one to build apps and services that speak naturally, choosing from more than 270 neural voices across 119 languages and variants." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, "outputs": [], - "metadata": {} + "source": [ + "from synapse.ml.cognitive import TextToSpeech\n", + "\n", + "# Create a dataframe with text and an output file location\n", + "df = spark.createDataFrame([(\"Reading out lod is fun! Check out aka.ms/spark for more information\", \"dbfs:/output.mp3\")], [\"text\", \"output_file\"])\n", + " \n", + "tts = (TextToSpeech()\n", + " .setSubscriptionKey(service_key)\n", + " .setTextCol(\"text\")\n", + " .setLocation(\"eastus\")\n", + " .setVoiceName(\"en-US-JennyNeural\") \n", + " .setOutputFileCol(\"output_file\"))\n", + "\n", + "# Check to make sure there were no errors during audio creation\n", + "display(tts.transform(df))" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Anomaly Detector sample\n", "\n", "[Anomaly Detector](https://azure.microsoft.com/en-us/services/cognitive-services/anomaly-detector/) is great for detecting irregularities in your time series data. In this sample, we use the service to find anomalies in the entire time series." 
- ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a dataframe with the point data that Anomaly Detector requires\n", "df = spark.createDataFrame([\n", @@ -470,22 +477,22 @@ "# Show the full results of the analysis with the anomalies marked as \"True\"\n", "display(anamoly_detector.transform(df).select(\n", " \"timestamp\", \"value\", \"anomalies.isAnomaly\"))" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Arbitrary web APIs\n", "\n", "With HTTP on Spark, any web service can be used in your big data pipeline. In this example, we use the [World Bank API](http://api.worldbank.org/v2/country/) to get information about various countries around the world." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Use any requests from the python requests library\n", "\n", @@ -514,22 +521,22 @@ "display(client.transform(df)\n", " .select(\"country\", udf(get_response_body)(col(\"response\"))\n", " .alias(\"response\")))\n" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Azure Cognitive search sample\n", "\n", "In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using SynapseML." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "VISION_API_KEY = os.environ['VISION_API_KEY']\n", "AZURE_SEARCH_KEY = os.environ['AZURE_SEARCH_KEY']\n", @@ -555,9 +562,33 @@ " serviceName=search_service,\n", " indexName=search_index,\n", " keyCol=\"id\")\n" - ], - "outputs": [], - "metadata": {} + ] } - ] -} \ No newline at end of file + ], + "metadata": { + "application/vnd.databricks.v1+notebook": { + "dashboards": [], + "language": "python", + "notebookMetadata": { + "pythonIndentUnit": 2 + }, + "notebookName": "CognitiveServices - Overview.ipynb", + "notebookOrigID": 3559341777151035, + "widgets": {} + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": 3 + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}