Text Analytics with Azure Cognitive Services

When it comes to extracting valuable insights from unstructured data, I always fall back on Azure Cognitive Services. It provides a strong set of built-in AI capabilities for analyzing and classifying unstructured text with very little effort. The Text Analytics API is a cloud-based service that provides Natural Language Processing (NLP) features for text mining and text analysis, including sentiment analysis, opinion mining, key phrase extraction, language detection, and named entity recognition. With it you can discover insights in unstructured text: identify key phrases and entities such as people, places, and organizations to understand common topics and trends, classify medical terminology using domain-specific pretrained models, gain a deeper understanding of customer opinions with sentiment analysis, and evaluate text in a wide range of languages.

In this blog I am going to show you how you can use the Text Analytics API from a popular programming language like Python to build a simple text analyzer engine. Let's get started with the code.

The first step is to install the Python package for Azure Cognitive Services in your Python environment.

pip install azure-cognitiveservices-language-textanalytics
# -*- coding: utf-8 -*-
import os
from azure.cognitiveservices.language.textanalytics import TextAnalyticsClient
from msrest.authentication import CognitiveServicesCredentials

If the pip install ran successfully, the above code should execute without issues in a Jupyter notebook.

The next step is to create the Azure Cognitive Services resource in the Azure portal. This is very straightforward, and the steps to create the resource are here. Once the resource is created, retrieve the endpoint and the API keys.

Continuing in the Python notebook, define the authentication function as shown below.

def authenticateClient():
    # Replace with the API key and endpoint from your Azure portal resource
    credentials = CognitiveServicesCredentials('your api key here')
    text_analytics_client = TextAnalyticsClient(
        endpoint='https://your-text-analytics-service-name-here.cognitiveservices.azure.com/',
        credentials=credentials)
    return text_analytics_client

Alternatively, a better approach is to retrieve the keys from Azure Key Vault or from environment variables. I will be writing another blog on how to use Azure Key Vault to retrieve sensitive information. For now, below is the code to set and get environment variables.

import os 
# Set environment variables 
os.environ['API_KEY'] = 'KEY'  
# Get environment variables 
mykey = os.getenv('API_KEY')
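
Putting the two together, here is a minimal sketch of the authentication function rewritten to read both the key and the endpoint from environment variables. The variable names API_KEY and ENDPOINT and the function name authenticateClientFromEnv are just illustrative; use whatever names fit your setup.

import os
from azure.cognitiveservices.language.textanalytics import TextAnalyticsClient
from msrest.authentication import CognitiveServicesCredentials

def authenticateClientFromEnv():
    # API_KEY and ENDPOINT are illustrative names; set them to the values
    # copied from your Azure Cognitive Services resource
    key = os.getenv('API_KEY')
    endpoint = os.getenv('ENDPOINT')
    credentials = CognitiveServicesCredentials(key)
    return TextAnalyticsClient(endpoint=endpoint, credentials=credentials)

This keeps the secret out of the notebook itself, which matters if you plan to share or commit the code.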

Now that we have set up the Azure resource, authenticated to it, and set up the Python environment, we are ready to explore the functionality and use case. Here is a function that accepts a batch of documents and returns the sentiment score of each one. The score ranges from 0 to 1, with 0 being the most negative and 1 being the most positive.

def sentiment():
    client = authenticateClient()
    try:
        documents = [
            {"id": "1", "language": "en",
             "text": "Thank you for all that you're doing"},
            {"id": "2", "language": "en",
             "text": "I don't like this book at all."},
            {"id": "3", "language": "es",
             "text": "No tengo dinero ni nada que dar…"},
            {"id": "4", "language": "it",
             "text": "L'hotel veneziano era meraviglioso. È un bellissimo pezzo di architettura."},
            {"id": "5", "language": "en",
             "text": "IBM Intg Bus PVU Lic"}
        ]
        response = client.sentiment(documents=documents)
        for document in response.documents:
            print("Document Id: ", document.id, ", Sentiment Score: ",
                  "{:.2f}".format(document.score))
    except Exception as err:
        print("Encountered exception. {}".format(err))

sentiment()

Output

Document Id: 1 , Sentiment Score: 0.99
Document Id: 2 , Sentiment Score: 0.02
Document Id: 4 , Sentiment Score: 0.16
Document Id: 5 , Sentiment Score: 0.50
Document Id: 3 , Sentiment Score: 0.44
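
Sentiment analysis is only one of the features listed earlier; the same client also exposes methods for key phrase extraction, language detection, and entity recognition. As a rough sketch, key phrase extraction works very similarly, reusing the authenticateClient function defined above (the sample texts here are just illustrative):

def key_phrases():
    client = authenticateClient()
    try:
        documents = [
            {"id": "1", "language": "en",
             "text": "The hotel was dirty and the staff was rude."},
            {"id": "2", "language": "en",
             "text": "Azure Cognitive Services makes text analytics simple."}
        ]
        # key_phrases() returns the main talking points of each document
        response = client.key_phrases(documents=documents)
        for document in response.documents:
            print("Document Id: ", document.id,
                  ", Key Phrases: ", document.key_phrases)
    except Exception as err:
        print("Encountered exception. {}".format(err))

key_phrases()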

