Introducing Classifications in Watson Natural Language Understanding

Gaurav Kumbhat
IBM Data Science in Practice
4 min read · Aug 9, 2021

Natural language processing (NLP) is fast becoming one of the most widely used AI tools across industries. The field covers a wide variety of tasks that serve many use cases, and classifying text into groups is one of the most popular.

[Image: a three-dimensional grid with axes, showing a group of differently colored circles in each section]

Classification is the task of analyzing input text and assigning predefined labels to this text. Labels can be of any type depending on the application. For example, they can be “spam” and “ham” in the classic spam detection use case, or they can be “positive” and “negative” for a sentiment use case.

IBM’s Natural Language Understanding (NLU) provides a variety of NLP capabilities for analyzing text in many different languages.

To support a wide variety of text classification use cases, NLU has extended its capabilities with a new classifications feature. This feature lets users perform multi-label classification by training a model on their own data and using it alongside any other feature in the analyze API. In multi-label classification, each input can belong to more than one class. In other words, the labels are not mutually exclusive, and NLU’s classifications feature uses specialized algorithms that can predict several of them for the same input.
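To make the multi-label idea concrete, here is a minimal Python sketch. It does not show how NLU computes its predictions; it only illustrates how per-label confidence scores might be turned into a set of labels. The function name `predict_labels`, the 0.5 threshold, and the example labels and scores are all assumptions made up for illustration.

```python
from typing import Dict, List


def predict_labels(confidences: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose confidence meets the threshold.

    Each label is judged independently, so a single input may receive
    zero, one, or several labels. The 0.5 default is an arbitrary,
    illustrative choice.
    """
    return sorted(label for label, score in confidences.items() if score >= threshold)


# Hypothetical scores for one support ticket (labels and numbers are made up).
ticket_scores = {"billing": 0.91, "refund": 0.78, "login": 0.04}
print(predict_labels(ticket_scores))  # ['billing', 'refund']
```

A single-label classifier would instead keep only the highest-scoring label, which is why it cannot express that this hypothetical ticket is about both billing and a refund.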

Text classification can be used for popular use cases, like classifying emails, support tickets, resumes, and reviews, among many other things.

In this article, we will go through the popular example of spam detection to show how easily we can train a custom model with this new feature.

Create a classifications model using NLU

[Image: a cartoon bad guy in a black eye mask, holding a fishing pole and trying to catch one of many fish marked with “@” symbols in a pond]
  • Provision an NLU instance from the IBM Cloud Catalog, or use your existing NLU instance.
  • Copy the credentials generated after the provisioning, and keep them somewhere safe.
  • For demonstration purposes, we will use the spam-ham dataset available here. Download the SpamHam-Train.csv file.
  • To train a classifications model using NLU, we will use the curl command line tool.
  • Open a new terminal window, and go to the folder that contains the dataset downloaded above.
  • Using the dataset downloaded above, create a classifications model with the following command:
  • The above curl request will return a JSON response, which will contain a field called model_id. We will use this model_id to refer to this model later on.
  • Let’s check the status of our model training with the following curl command:
  • The above curl command will return a response that will show us the status of our model training (note the “status” field in the following response).
{
  "name": "Spam-Ham Classification",
  "user_metadata": null,
  "language": "en",
  "description": "Demo spam detection model",
  "model_version": "1.0.1",
  "version": "1.0.1",
  "workspace_id": null,
  "version_description": null,
  "status": "training",
  "notices": [],
  "model_id": "<CLASSIFICATIONS-MODEL-ID>",
  "features": [
    "classifications"
  ],
  "created": "2021-07-23T06:35:55Z",
  "last_trained": "2021-07-23T06:35:55Z",
  "last_deployed": null
}

A status of training means the model is still being trained. Once training finishes, the model is automatically deployed to NLU, and the status changes to available as soon as it is ready to use.
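If you would rather not re-run the status check by hand, the wait can be scripted. The Python sketch below polls until the "status" field reads available. Several things here are assumptions rather than details taken from this article: the helper names `fetch_model` and `wait_until_available`, the endpoint layout `<NLU-URL>/v1/models/classifications/<model_id>?version=<date>`, basic authentication with the user name `apikey`, and the polling interval. Check them against the NLU API reference before relying on them.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_model(nlu_url: str, api_key: str, model_id: str, version: str) -> Dict:
    """Fetch the model record over HTTP.

    Assumed endpoint layout and auth scheme (see the paragraph above);
    `version` is the API version date required by the service.
    """
    url = f"{nlu_url}/v1/models/classifications/{model_id}?version={version}"
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_available(
    fetch: Callable[[], Dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `fetch` repeatedly until the returned "status" is "available".

    Any other status is treated as "not ready yet"; the poll limit is the
    only stopping condition for a model that never becomes available.
    """
    for _ in range(max_polls):
        model = fetch()
        if model.get("status") == "available":
            return model
        sleep(poll_seconds)
    raise TimeoutError(f"model not available after {max_polls} polls")


# Offline demonstration: canned responses stand in for real HTTP calls.
canned = iter([{"status": "training"}, {"status": "training"}, {"status": "available"}])
model = wait_until_available(lambda: next(canned), sleep=lambda seconds: None)
print(model["status"])  # available
```

Against a real instance, the first argument would be something like `lambda: fetch_model("<NLU-URL>", "<API-KEY>", "<CLASSIFICATIONS-MODEL-ID>", "<VERSION-DATE>")`, with the placeholders replaced by your own values.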

Tip: Training the classifier can take some time. Meanwhile, you can explore more about classifications (or other NLU capabilities) in the documentation, or check out this cool notebook explaining how to use this feature in Python.

  • Once the model's status shows as available, we can start using it in our application through the Analyze API.
  • Let’s try to make an analyze request using the following spam text, and see what our model predicts.
Urgent! Please call 09061213237 from a landline. 5000 cash or a 4* holiday await collection. T &Cs SAE PO Box 177 M227XY. 16+
  • The following command shows how we can use this model for prediction using curl:
  • The above analyze request returns the following response:
{
  "usage": {
    "text_units": 1,
    "text_characters": 125,
    "features": 1
  },
  "language": "en",
  "classifications": [
    {
      "confidence": 0.974358,
      "class_name": "spam"
    },
    {
      "confidence": 0.024414,
      "class_name": "ham"
    }
  ]
}

This shows that our custom classifications model was able to mark the given text as spam with high confidence.
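In an application, you would typically build the analyze request body and read the prediction programmatically. The Python sketch below shows one way this might look. The helper names `build_analyze_payload` and `top_class` are hypothetical, and the request body shape (a "classifications" entry under "features" that names the model) is an assumption based on how other NLU features are requested, so confirm it in the API reference. The sample response is the relevant part of the one shown above.

```python
import json
from typing import Dict


def build_analyze_payload(text: str, model_id: str) -> Dict:
    """Build an analyze request body that enables only classifications.

    The body shape is an assumption (see the paragraph above).
    """
    return {"text": text, "features": {"classifications": {"model": model_id}}}


def top_class(response: Dict) -> Dict:
    """Return the classification entry with the highest confidence."""
    return max(response["classifications"], key=lambda item: item["confidence"])


# The "classifications" part of the analyze response shown above.
sample_response = json.loads("""
{
  "language": "en",
  "classifications": [
    {"confidence": 0.974358, "class_name": "spam"},
    {"confidence": 0.024414, "class_name": "ham"}
  ]
}
""")

payload = build_analyze_payload("Urgent! Please call ...", "<CLASSIFICATIONS-MODEL-ID>")
print(json.dumps(payload["features"]))

best = top_class(sample_response)
print(best["class_name"], best["confidence"])  # spam 0.974358
```

Because the model is multi-label, taking only the top class is a simplification that suits the two-class spam example; other use cases may want every class above a chosen confidence threshold.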

Conclusion

[Image: a group of circles of multiple sizes]
Photo by Munro Studio on Unsplash

Watson Natural Language Understanding’s classifications feature provides a scalable and powerful text classification solution. It allows users to train a custom text classification model using their own data in just a few steps.

Let us know how you would like to use the classifications feature in your own use cases. Check out the documentation for more details.

Sign up for Watson Natural Language Understanding here and try out classifications today!
