How Do Chatbots Learn?

How do chatbots learn on their own and become "intelligent"? Read on to learn about the major approaches to developing self-learning chatbots.


One of the questions customers ask us most often is “Can your bot learn on its own?” The popular belief is that a bot is truly intelligent only when it can learn on its own. Here, we examine what that question really means, based on our experience building enterprise chatbots.

Before we get started on self-learning bots, let’s first understand how bots are built. There are broadly two major approaches to building chatbots.

Approaches to Chatbot Development

1. Retrieval-Based

Retrieval-based bots work on the principle of directed flows or graphs. The bot is trained to rank candidate responses from a finite set of predefined responses and return the best one. The responses are either entered manually or drawn from a knowledge base of pre-existing information.

E.g. What are your store timings?

Answer: 9 am to 5 pm

These systems can also be extended to integrate with third-party systems.

E.g. Where’s my order?

Answer: It’s on its way and should reach you in 10 minutes.

Retrieval-based bots are the most common type of chatbot in use today. They allow bot developers and UX designers to control the experience and match it to customers’ expectations. They work best for goal-oriented bots in customer support, lead generation and feedback. We can decide the tone of the bot and design the experience, keeping in mind the customer’s brand and reputation.
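To make the ranking step concrete, here is a minimal sketch of how a retrieval-based bot might score a user message against a finite set of predefined responses. It is an illustration only, not how any production bot works: the RESPONSES table, the function names and the bag-of-words cosine similarity are all assumptions chosen for brevity, and a real system would typically use a trained ranking model.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical knowledge base: trigger phrases mapped to predefined responses.
RESPONSES = {
    "what are your store timings": "We are open from 9 am to 5 pm.",
    "where is my order": "Let me check the status of your order.",
    "how do i cancel my order": "You can cancel from the Orders page.",
}

def tokenize(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_responses(message):
    """Return (score, response) pairs for every predefined response, best first."""
    query = tokenize(message)
    scored = [(cosine(query, tokenize(trigger)), reply)
              for trigger, reply in RESPONSES.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

best_score, best_reply = rank_responses("What are the store timings today?")[0]
print(best_reply)
```

Because the bot can only ever return one of the entries in the table, the developer keeps full control over what it says, which is the property described above.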

2. Generative

Another method of building chatbots is to use a generative model. A good starting point for generative models is the seq2seq neural network. This architecture was originally developed for machine translation but has also proved quite effective for building generative chatbots. These chatbots are not built with predefined responses. Instead, they’re trained on a large number of previous conversations, from which responses to the user are generated. We won’t delve too deeply into how generative models work. You can learn more about them in this video.

Generative models are good for conversational chatbots with which the user is simply looking to exchange banter. These models will virtually always have a response ready for you. However, in many cases, the responses might be arbitrary and not make much sense. The chatbot is also prone to generating answers with incorrect grammar and syntax.

Generative chatbots also require a very large amount of conversational data to train. We trained the seq2seq implementation for our reminders bot on 2 million conversations. Customers usually don’t have this much data readily available.

Generative + Retrieval

It’s important to apply technology in the appropriate context to make sure we’re delivering value to customers as well as to the end users. Our approach has been to employ a mixed model that takes the best of both worlds.

The generative model primarily helps us improve small talk capabilities, i.e. the chit chat and banter that users might want to indulge in with the bot. You can select and customize the tone of small talk, for example funny or formal.

However, the primary focus of a bot is the customer’s goal: helping users resolve support queries and providing them with information. A retrieval-based system is best equipped to meet such needs today.
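One way to picture a mixed model is as a router: try the retrieval system first and fall back to the small-talk model only when retrieval has no confident answer. The sketch below assumes this routing rule; the source does not say how the two models are combined. All names, the threshold and the canned replies are hypothetical, and small_talk_reply is a stand-in for a real generative model.

```python
def retrieval_answer(message):
    """Hypothetical retrieval step: return (confidence, response)."""
    faq = {
        "store timings": "We are open from 9 am to 5 pm.",
        "my order": "Your order is on its way.",
    }
    text = message.lower()
    for phrase, reply in faq.items():
        if phrase in text:
            return 1.0, reply
    return 0.0, None

def small_talk_reply(message, tone="formal"):
    """Stand-in for a generative small-talk model; returns canned text here."""
    replies = {
        "formal": "I am doing well, thank you. How may I help you?",
        "funny": "Living the dream, one message at a time. What's up?",
    }
    return replies.get(tone, replies["formal"])

def respond(message, tone="formal", threshold=0.5):
    """Prefer the goal-oriented retrieval answer; otherwise fall back to small talk."""
    confidence, reply = retrieval_answer(message)
    if confidence >= threshold:
        return "retrieval", reply
    return "generative", small_talk_reply(message, tone)

print(respond("Where's my order?"))
print(respond("How are you?", tone="funny"))
```

Putting retrieval first reflects the priority stated above: goal-oriented queries get a controlled answer, and the generative path only handles banter.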


So how do we build a self-learning bot for retrieval-based systems and generative systems?

For generative systems, self-learning simply means feeding in the responses to questions that were missing from the data initially used to train the model. To make the bot self-learning, some bot developers have let users themselves train the bot. This has had a lot of unintended consequences.

The Microsoft Tay Bot, for example, was gamed by users through its ‘repeat after me’ function. This function was built to let users on Twitter train the bot and let it learn by itself. However, without any filter on what the bot was being trained on, it didn’t take long for users with malicious intent to retrain the bot with hate speech and extreme right-wing propaganda.

Thus, it’s evident that self-learning generative models can be quite risky, given the potential for bots to be reprogrammed by users.

For retrieval-based systems, keeping the bot learning on its own involves a few categories of self-learning:

1. New Intents

Users express an intent the bot was not built to handle. The bot might have been built only for ordering a pizza, but not for cancelling the order. This requires a bot developer to build the order cancellation intent and integrate it with the cancellation API.

Our system semi-automates this process. The system automatically aggregates user conversations and recommends new intents based on them. A human agent can then choose to add each recommended intent to the bot’s knowledge base or reject it.
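The aggregate-then-review idea can be sketched roughly as follows. This is not Haptik’s implementation: the stopword list, the Jaccard threshold, the greedy grouping and the approve callback (standing in for the human agent’s decision) are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list used to keep only the meaningful words.
STOPWORDS = {"i", "my", "to", "the", "a", "please", "want", "can", "you", "how", "do"}

def keywords(text):
    """Return the set of non-stopword tokens in a message."""
    return frozenset(w for w in re.findall(r"[a-z]+", text.lower())
                     if w not in STOPWORDS)

def jaccard(a, b):
    """Jaccard overlap between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_intents(unmatched, min_size=2, threshold=0.5):
    """Greedily group messages the bot could not handle; keep groups of min_size or more."""
    clusters = []  # each cluster is a list; its first message is the representative
    for message in unmatched:
        for cluster in clusters:
            if jaccard(keywords(message), keywords(cluster[0])) >= threshold:
                cluster.append(message)
                break
        else:
            clusters.append([message])
    return [c for c in clusters if len(c) >= min_size]

def review(suggestions, approve):
    """Keep only the suggestions the human agent approves."""
    return [c for c in suggestions if approve(c)]

unmatched = [
    "I want to cancel my order",
    "Please cancel the order",
    "cancel order",
    "Do you sell gift cards?",
]
suggestions = suggest_intents(unmatched)
accepted = review(suggestions, approve=lambda cluster: True)
print(accepted)
```

Nothing reaches the knowledge base unless the approve step returns True, so the machine only proposes and the human decides.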


2. Missing Variations

The other category is missing data: users phrase an existing intent in ways the bot has not seen before. The system surfaces this as well, by recommending new variations for an existing intent when a user message is similar to that intent’s current examples. However, to prevent bad data from polluting the bot’s intelligence, each variation also needs to be cleared by a human agent. The agent evaluates the data and tests the bot to make sure that no existing functionality has been affected.
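A minimal sketch of this flow, under stated assumptions, might look like the following. The intents table, the token-overlap similarity and the low/high band that defines a “near miss” are hypothetical choices, and the approved flag stands in for the human agent’s sign-off.

```python
import re

# Hypothetical training data: intent name -> known example phrasings.
intents = {
    "order_status": ["where is my order", "track my order"],
    "store_timings": ["what are your store timings"],
}

def tokens(text):
    """Return the set of lowercase word tokens in a message."""
    return set(re.findall(r"[a-z]+", text.lower()))

def similarity(a, b):
    """Jaccard overlap between the token sets of two phrases."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def suggest_variation(message, low=0.3, high=0.99):
    """Return (intent, score) if the message is a near miss for an intent, else None."""
    best_intent, best_score = None, 0.0
    for name, examples in intents.items():
        score = max(similarity(message, example) for example in examples)
        if score > best_score:
            best_intent, best_score = name, score
    if low <= best_score <= high:
        return best_intent, best_score
    return None

def apply_if_approved(message, suggestion, approved):
    """Add the variation only when a human agent has approved it."""
    if suggestion and approved:
        intents[suggestion[0]].append(message.lower())
        return True
    return False

suggestion = suggest_variation("Where is my package")
print(suggestion)
apply_if_approved("Where is my package", suggestion, approved=True)
print(intents["order_status"])
```

Messages that already match an example exactly score above the band and are skipped, and messages with almost no overlap fall below it, so only plausible new phrasings reach the reviewer.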


3. Incorrectly Labelled Variations

This is the hardest category to solve, because in this case the bot has managed to respond to the user, but with an incorrect response. We keep a close eye on user feedback through our built-in CSAT system, which is available in every chat. Users can mark a particular conversation as helpful or not helpful. Our Bot QA team then reviews these conversations to check whether the bot needs tuning.
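As a rough illustration of how such feedback could be turned into a review queue, the sketch below collects chats marked as not helpful and counts them per intent. The Chat record, its fields and the min_flags cutoff are hypothetical and are not taken from any real CSAT system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chat:
    chat_id: str
    intent: str                      # intent the bot matched for this chat
    helpful: Optional[bool] = None   # None means the user left no feedback

def qa_review_queue(chats):
    """Return the chats that users explicitly marked as not helpful."""
    return [chat for chat in chats if chat.helpful is False]

def intents_needing_tuning(chats, min_flags=2):
    """List intents flagged as not helpful at least min_flags times, most flagged first."""
    flags = Counter(chat.intent for chat in qa_review_queue(chats))
    return [intent for intent, count in flags.most_common() if count >= min_flags]

chats = [
    Chat("c1", "order_status", helpful=True),
    Chat("c2", "order_status", helpful=False),
    Chat("c3", "store_timings"),
    Chat("c4", "order_status", helpful=False),
    Chat("c5", "store_timings", helpful=False),
]
print([chat.chat_id for chat in qa_review_queue(chats)])
print(intents_needing_tuning(chats))
```

Chats without feedback are left out of the queue rather than treated as failures, since silence says nothing about whether the response was correct.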


4. Contextual Word Representations

This is an ongoing process to improve the word embeddings used by the bot. These embeddings define and expand the bot’s vocabulary. By adding more data about how users from different geographies use colloquial language, the bot gains a better understanding of these nuances. Haptik is fortunate to have hundreds of millions of past conversational messages to learn from. New conversational data with new vocabulary is regularly used to retrain the word embedding models. This allows bot A to get smarter with data from bot B!


We might see self-learning chatbots like Iron Man’s JARVIS in the coming decade, but they’re not quite production-ready solutions for businesses today.

The approach we’ve discussed above is known as Human-In-The-Loop (HITL) learning. HITL leverages both human and machine intelligence to create machine learning models. People are constantly involved in the training, tuning and testing of these bots. Our system is meant to empower these individuals, by giving them the tools to do their job better. Our unique chatbot analytics tool is built for that very purpose.

Our Active Learning system prompts our bot trainers to review messages that the model classified with a low confidence score. The trainers’ verified judgements are then fed back into the model.
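The selection step of such a loop can be sketched in a few lines. This is an assumption-laden illustration rather than the actual system: the prediction records, the 0.6 threshold and the verified dictionary (standing in for the trainers’ judgements) are all hypothetical.

```python
def select_for_review(predictions, threshold=0.6):
    """Return predictions below the confidence threshold, least confident first."""
    uncertain = [p for p in predictions if p["confidence"] < threshold]
    return sorted(uncertain, key=lambda p: p["confidence"])

def build_training_batch(queue, verified):
    """Pair each reviewed message with the label a trainer verified for it."""
    return [(item["text"], verified[item["text"]])
            for item in queue if item["text"] in verified]

# Hypothetical model output for three user messages.
predictions = [
    {"text": "cancel it", "intent": "order_cancel", "confidence": 0.41},
    {"text": "where is my order", "intent": "order_status", "confidence": 0.93},
    {"text": "timing?", "intent": "store_timings", "confidence": 0.55},
]

queue = select_for_review(predictions)
# The trainer confirms one label; the other message has not been reviewed yet.
verified = {"cancel it": "order_cancel"}
batch = build_training_batch(queue, verified)
print([item["text"] for item in queue])
print(batch)
```

Only messages with a trainer-verified label make it into the training batch, so the model never learns directly from its own uncertain guesses.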

So, as you can see, there’s a lot that goes into preparing a self-learning bot to successfully cater to the needs of any enterprise and their customers.

This article has been authored by Prateek Gupte, Director of Product and Engineering at Haptik.

Haptik is one of the world's largest Conversational AI platforms, reaching over 30 million devices monthly. The company has been at the forefront of the paradigm shift from apps to chatbots, having built a robust set of technology and tools that e...