Testing Models
Once you’ve created a model, you can test it. Testing takes a model and has it analyze a training data object (TDO). A moment’s thought will tell you that the training data object must be
- One with the same root category as the model
- Not the one that was used to create the model.
Schedule the test on the Testing Schedule tab: simply select the model, the TDO to test it on, and the start time for the test.
To see the test results for a model, select the model on either the Testing Schedule or Models tab and click the eye icon. The eye icon is active only if you’ve selected a model that has been tested.
Understanding the test results is where it gets interesting.
All Results tab
This tab shows the Average Results and Correct in Top N graphs, and the Category Confusion table.
Average Results
This graph shows the Precision (black) and Recall (blue) ratings (vertical axis) at a given Confidence level (horizontal axis). But what do those terms mean? Read on:
Confidence
This is a numerical score, from 1 to 100, that indicates the percent likelihood, according to the selected model, that a text object belongs in a certain category.
(In contrast, accuracy is an assessment, produced by testing, of the correctness of a model’s assignment of text objects to categories. In other words, confidence expresses a model’s guess about a categorization; accuracy rates the correctness of that guess.)
Precision and Recall
To understand Precision and Recall, consider several possible ways of looking at the performance of a model. If your model attempts to assign a certain number of items to a category X, you can make the following counts:
- a = the number of items the model correctly assigns to X
- b = the number of items the model incorrectly assigns to X
- c = the number of items the model incorrectly rejects from X (that is, items that the model should assign to X but does not)
From these quantities, you can calculate the following performance measures:
- Precision = a / (a + b)
- Recall = a / (a + c)
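If it helps to see the arithmetic, here is a minimal sketch of the two formulas in Python; the counts a, b, and c are invented for illustration.

```
# A minimal sketch of the precision and recall formulas above.
# The counts a, b, and c are hypothetical values, not real test output.

def precision(a: int, b: int) -> float:
    """Of the items the model assigns to X, the fraction that truly belong."""
    return a / (a + b)

def recall(a: int, c: int) -> float:
    """Of the items that truly belong to X, the fraction the model assigns."""
    return a / (a + c)

a, b, c = 80, 10, 20  # correct assignments, wrong assignments, missed items
print(f"Precision: {precision(a, b):.2f}")  # 80 / 90  = 0.89
print(f"Recall:    {recall(a, c):.2f}")     # 80 / 100 = 0.80
```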
Generally, you pay for increasing precision with decreasing recall. That is, the model assigns an item to a category only when it is very sure that the item belongs, but by insisting on being very sure, it runs the risk of rejecting items that really do belong in the category. In the figure Precision, Recall, and Confidence, you can see this effect above the 85% Confidence level.
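To make the trade-off concrete, here is a hedged sketch that sweeps a confidence threshold over some made-up scored items; none of the numbers come from a real model, but the pattern is the one described above: precision climbs as the threshold rises, and recall falls.

```
# Sketch of the precision/recall trade-off under a rising confidence
# threshold. Each pair is (confidence score, item truly belongs to X);
# the data are invented for illustration.
scored = [(95, True), (90, True), (85, False), (80, True),
          (70, True), (60, False), (50, True), (40, False)]

TOTAL_IN_X = 5  # items that actually belong to X, i.e. a + c

for threshold in (50, 70, 90):
    assigned = [belongs for conf, belongs in scored if conf >= threshold]
    a = sum(assigned)              # correct assignments
    b = len(assigned) - a          # incorrect assignments
    p = a / (a + b) if assigned else 0.0
    r = a / TOTAL_IN_X
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
# threshold 50: precision 0.71, recall 1.00
# threshold 70: precision 0.80, recall 0.80
# threshold 90: precision 1.00, recall 0.40
```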
Correct in Top N
When a model classifies a text object, it returns a list of categories and the probability (the Confidence rating) that the object belongs to them. Ranking the returned categories with the highest probability first, how likely is it that the correct category appears within the top two, the top three, and so on?
- Includes Correct Category. The vertical axis: percent likelihood.
- N Best Categories. The horizontal axis: best, best two, best three, and so on.
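If you are curious how such a curve could be computed, here is a minimal sketch; the ranked category lists and true categories are hypothetical.

```
# Sketch of a Correct in Top N calculation. Each test item carries the
# model's ranked category list (highest confidence first) and its true
# category; all values below are invented.
results = [
    (["wolf", "tiger", "bear"], "wolf"),
    (["tiger", "wolf", "bear"], "wolf"),
    (["bear", "tiger", "wolf"], "wolf"),
]

for n in range(1, 4):
    hits = sum(true in ranked[:n] for ranked, true in results)
    print(f"correct in top {n}: {100 * hits / len(results):.0f}%")
# correct in top 1: 33%
# correct in top 2: 67%
# correct in top 3: 100%
```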
Category Confusion
The Category Confusion table lists up to 10 pairs of categories that the model is likely to confuse.
The Confusion column shows the probability, as a percentage, that the model will mistakenly classify a Category 1 item as Category 2. For example, the figure Category Confusion shows that this model classifies tigers as wolves 4 percent of the time.
A rating of 50 would mean total confusion: the model cannot distinguish wolves from tigers. A rating of 100 would mean that the model always calls wolves tigers and always calls tigers wolves—a complete reversal.
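As a rough illustration of where such a rating could come from, the sketch below computes one pairwise confusion rate from invented (true category, assigned category) pairs; it reproduces the 4 percent tigers-as-wolves example.

```
# Sketch of a pairwise confusion rate: of the test items that truly
# belong to Category 1, the fraction the model assigns to Category 2.
# The (true, assigned) pairs are invented for illustration.
from collections import Counter

pairs = [("tiger", "tiger")] * 24 + [("tiger", "wolf")]  # 25 tiger items
counts = Counter(pairs)

tiger_total = sum(n for (true, _), n in counts.items() if true == "tiger")
confused = counts[("tiger", "wolf")]
print(f"tiger -> wolf confusion: {100 * confused / tiger_total:.0f}%")  # 4%
```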
If a pair of categories has a rating of over 20 and both categories have more than three or four members, you should consider modifying them. You can modify them in either of two directions:
- Merge them; that is, decide that they are so similar they amount to a single category.
- Further differentiate them by adding more highly contrasting training interactions to them in the Training Data Object.
Results by Category tab
This tab displays the same ratings as the All Results tab, but for a single category.
Category Confusion shows the categories that are likely to be confused with the category selected on the left.
