Creating Policies

The Policy is the core of your bot, and it really just has one important method:

def predict_action_probabilities(self, tracker, domain):
    # type: (DialogueStateTracker, Domain) -> List[float]

    return []

This uses the current state of the conversation (provided by the tracker) to choose the next action to take. The domain is there if you need it, but only some policy types make use of it. The returned array contains the probabilities for each action to be executed next. The action that is most likely will be executed.
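As a minimal sketch of how such a probability list turns into a decision, the snippet below picks the index with the highest value. The action names and the helper most_likely_action are hypothetical and only serve the illustration; they are not part of Rasa Core.

```python
import numpy as np

# Hypothetical action names, used only for this illustration.
action_names = ["action_listen", "utter_greet", "utter_default", "utter_goodbye"]


def most_likely_action(probabilities):
    """Return the index of the action with the highest probability."""
    return int(np.argmax(probabilities))


probabilities = [0.1, 0.7, 0.15, 0.05]
chosen = most_likely_action(probabilities)
print(action_names[chosen])
```

Note that np.argmax returns the first index when several entries tie, so an all-zero array selects index 0.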

Let’s look at the simplest possible example, taken from our hello world bot:

class SimplePolicy(Policy):
    def predict_action_probabilities(self, tracker, domain):
        # type: (DialogueStateTracker, Domain) -> List[float]

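        # Map each known intent name to the index of the action to run.
        responses = {
            "greet": 3,
        }

        if tracker.latest_action_name == ACTION_LISTEN_NAME:
            # A user message just arrived: pick a canned response,
            # falling back to action 2 for unknown intents.
            key = tracker.latest_message.intent["name"]
            action = responses[key] if key in responses else 2
            return utils.one_hot(action, domain.num_actions)
        else:
            # We just said something, so listen again.
            return np.zeros(domain.num_actions)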

How does this work? When the controller processes a message from a user, it keeps asking the policy for the next most likely action using predict_action_probabilities. The bot executes each predicted action in turn until the policy predicts ActionListen. This breaks the loop and makes the bot wait for the next user message.

In pseudocode, what the SimplePolicy above does is:

-> a new message has come in

if we were previously listening:
    return a canned response
else:
    we must have just said something, so let's Listen again

Note that the policy itself is stateless, and all the state is carried by the tracker object.
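The controller loop described earlier can be sketched in a few lines. This is a toy illustration, not Rasa Core's actual controller: ToyTracker, toy_policy and handle_message are hypothetical names, and the listen action is assumed to sit at index 0.

```python
import numpy as np

ACTION_LISTEN = 0  # assumption: the listen action has index 0


class ToyTracker:
    """Minimal stand-in for the tracker: holds the latest intent and action."""

    def __init__(self, intent):
        self.latest_intent = intent
        self.latest_action = ACTION_LISTEN


def toy_policy(tracker, num_actions):
    """Canned response right after listening, otherwise all zeros."""
    responses = {"greet": 3}
    probabilities = np.zeros(num_actions)
    if tracker.latest_action == ACTION_LISTEN:
        probabilities[responses.get(tracker.latest_intent, 2)] = 1.0
    return probabilities


def handle_message(intent, num_actions=4):
    """Ask the policy for actions until it predicts the listen action."""
    tracker = ToyTracker(intent)
    executed = []
    while True:
        action = int(np.argmax(toy_policy(tracker, num_actions)))
        tracker.latest_action = action  # all state lives in the tracker
        if action == ACTION_LISTEN:
            break
        executed.append(action)
    return executed
```

The policy function keeps no state of its own between calls; everything it needs is read from the tracker object passed in.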

Creating Policies from Stories

Writing rules like in the SimplePolicy above is not a great way to build a bot: it gets messy fast and is hard to debug. If you’ve found Rasa Core, it’s likely you’ve already tried this approach and were looking for something better. A good next step is to use our story framework to build a policy by giving it some example conversations. We won’t use machine learning yet; we will just create a policy which memorises these stories.

We can use the MemoizationPolicy and the PolicyTrainer classes to do this.

Here is the PolicyTrainer class:

class PolicyTrainer(object):
    def __init__(self, ensemble, domain, featurizer):
        self.domain = domain
        self.ensemble = ensemble
        self.featurizer = featurizer

    def train(self, filename=None, max_history=3,
              augmentation_factor=20, max_training_samples=None,
              max_number_of_trackers=2000, **kwargs):
        """Trains a policy on a domain using training data from a file.

        :param augmentation_factor: how many stories should be created by
                                    randomly concatenating stories
        :param filename: story file containing the training conversations
        :param max_history: number of past actions to consider for the
                            prediction of the next action
        :param max_training_samples: specifies how many training samples to
                                     train on - `None` to use all examples
        :param max_number_of_trackers: limits the tracker generation during
                                       story file parsing - `None` for unlimited
        :param kwargs: additional arguments passed to the underlying ML trainer
                       (e.g. keras parameters)
        :return: trained policy
        """

        logger.debug("Policy trainer got kwargs: {}".format(kwargs))

        X, y = self._prepare_training_data(filename, max_history,
                                           augmentation_factor,
                                           max_training_samples,
                                           max_number_of_trackers)

        self.ensemble.train(X, y, self.domain, self.featurizer, **kwargs)

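    def _prepare_training_data(self, filename, max_history, augmentation_factor,
                               max_training_samples=None,
                               max_number_of_trackers=2000):
        """Reads training data from file and prepares it for the training."""

        from rasa_core.training_utils import extract_training_data_from_file

        if filename:
            # Keyword names below are assumed from the variables in scope.
            X, y = extract_training_data_from_file(
                filename,
                augmentation_factor=augmentation_factor,
                max_history=max_history,
                domain=self.domain,
                featurizer=self.featurizer,
                max_number_of_trackers=max_number_of_trackers)
            if max_training_samples is not None:
                X = X[:max_training_samples, :]
                y = y[:max_training_samples]
        else:
            # No story file given: return empty training data.
            X = np.zeros((0, self.domain.num_features))
            y = np.zeros(self.domain.num_actions)
        return X, y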

What the train() method does is the following:

  1. reads the stories from a file
  2. creates all possible dialogues from these stories
  3. creates the following variables:
    1. y - a 1D array representing all of the actions taken in the dialogues
    2. X - a 2D array where each row represents the state of the tracker when an action was taken
  4. calls the ensemble’s train() method to create a policy from these X, y state-action pairs (the ensemble is just a collection of policies, so you can combine multiple policies and train them all at once)


In fact, the rows in X describe the state of the tracker when the previous max_history actions were taken. See Featurization for more details.
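To make the shapes concrete, here is a toy sketch of building X and y from a single dialogue with a sliding window of max_history states. It is not Rasa Core's featurizer: the function build_training_data, the two-feature state vectors, and the use of -1 as padding for missing history are all assumptions made for illustration.

```python
import numpy as np


def build_training_data(states, actions, max_history):
    """Pair each action with the max_history states that led up to it."""
    num_features = states.shape[1]
    # Assumption: slots with no history yet are padded with -1.
    X = np.full((len(actions), max_history, num_features), -1.0)
    for i in range(len(actions)):
        window = states[max(0, i - max_history + 1): i + 1]
        X[i, max_history - len(window):] = window
    y = np.asarray(actions)
    return X, y


# Four tracker states with two made-up features each, and the action
# index taken in each state.
states = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
actions = [2, 0, 3, 0]

X, y = build_training_data(states, actions, max_history=3)
print(X.shape, y.shape)
```

In this sketch each entry of X holds max_history state vectors, so X has one more dimension than a plain table of states, while y stays a 1D array of action indices.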

For the MemoizationPolicy, the train() method just memorises the actions taken in the story, so that when your bot encounters an identical situation it will make the decision you intended.
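The idea of memorising state-action pairs can be sketched with a plain dictionary. This is a hypothetical toy class, not the MemoizationPolicy that ships with Rasa Core; states are simplified to flat tuples here.

```python
class ToyMemoizationPolicy:
    """Remember which action followed each exact state seen in training."""

    def __init__(self):
        self.lookup = {}

    def train(self, X, y):
        # Store every (state, action) pair from the training data.
        for state, action in zip(X, y):
            self.lookup[tuple(state)] = action

    def predict_action_probabilities(self, state, num_actions):
        # Probability 1 for the memorised action, all zeros if unseen.
        probabilities = [0.0] * num_actions
        action = self.lookup.get(tuple(state))
        if action is not None:
            probabilities[action] = 1.0
        return probabilities


policy = ToyMemoizationPolicy()
policy.train(X=[(1, 0, 0), (0, 1, 0)], y=[3, 0])
```

A state that never appeared in the training data has no entry in the lookup, so this kind of policy cannot say anything useful about it.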

Generalising to new Dialogues

The stories data format gives you a compact way to describe a large number of possible dialogues without much effort. But humans are infinitely creative, and you could never hope to describe every possible dialogue programmatically. Even if you could, it probably wouldn’t fit in memory :)

So how do we create a policy which behaves well even in scenarios you haven’t thought of? We will try to achieve this generalisation by creating a policy based on Machine Learning.

You can use whichever machine learning library you like to train your policy. One implementation that ships with Rasa is the KerasPolicy, which uses Keras as a machine learning library to train your dialogue model. This base class already implements the logic of persisting and reloading models.

By default, it trains an LSTM-based neural network to fit the X, y data.

The model is defined here:

    def _build_model(self, num_features, num_actions, max_history_len):
        """Build a keras model and return a compiled model.

        :param max_history_len: The maximum number of historical
                                turns used to decide on next action
        """
        from keras.layers import LSTM, Activation, Masking, Dense
        from keras.models import Sequential

        n_hidden = 32  # Neural Net and training params
        batch_shape = (None, max_history_len, num_features)
        # Build Model
        model = Sequential()
        model.add(Masking(-1, batch_input_shape=batch_shape))
        model.add(LSTM(n_hidden, batch_input_shape=batch_shape))
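        model.add(Dense(input_dim=n_hidden, units=num_actions))
        # Softmax turns the outputs into one probability per action.
        model.add(Activation("softmax"))

        # The loss matches the one-hot labels built in train(); the
        # optimizer choice here is an assumption.
        model.compile(loss="categorical_crossentropy",
                      optimizer="rmsprop",
                      metrics=["accuracy"])

        return model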

and the training is run here:

    def train(self, X, y, domain, **kwargs):
        # X has shape (samples, max_history, features).
        self.model = self._build_model(domain.num_features,
                                       domain.num_actions,
                                       X.shape[1])
        # Convert the action indices in y into one-hot rows.
        y_one_hot = np.zeros((len(y), domain.num_actions))
        y_one_hot[np.arange(len(y)), y] = 1

        # Shuffle samples and labels with the same permutation.
        number_of_samples = X.shape[0]
        idx = np.arange(number_of_samples)
        np.random.shuffle(idx)
        shuffled_X = X[idx, :, :]
        shuffled_y = y_one_hot[idx, :]

        validation_split = kwargs.get("validation_split", 0.0)
        logger.info("Fitting model with {} total samples and a validation "
                    "split of {}".format(number_of_samples, validation_split))
        self.model.fit(shuffled_X, shuffled_y, **kwargs)
        self.current_epoch = kwargs.get("epochs", 10)
        logger.info("Done fitting keras policy model")

You can implement the model of your choice by overriding these methods.
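If you override train(), the label encoding and shuffling steps are usually worth keeping. The sketch below isolates those two steps using only NumPy; the helper names one_hot_labels and shuffle_together are hypothetical and not part of Rasa Core, and the fixed seed is only there to make the example repeatable.

```python
import numpy as np


def one_hot_labels(y, num_actions):
    """Turn a 1D array of action indices into one-hot rows."""
    y = np.asarray(y)
    y_one_hot = np.zeros((len(y), num_actions))
    # Row i gets a 1 in column y[i].
    y_one_hot[np.arange(len(y)), y] = 1
    return y_one_hot


def shuffle_together(X, y_one_hot, seed=0):
    """Apply the same random permutation to samples and labels."""
    idx = np.random.RandomState(seed).permutation(X.shape[0])
    return X[idx], y_one_hot[idx]


labels = one_hot_labels([2, 0, 1], num_actions=3)
# Each toy sample stores its own label so the pairing can be checked.
samples = np.array([[[2.0]], [[0.0]], [[1.0]]])
shuffled_samples, shuffled_labels = shuffle_together(samples, labels)
```

Indexing both arrays with the same idx is what keeps each sample attached to its label after shuffling.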