
Testing Models

Once you’ve created a model, you can test it. Testing takes a model and has it analyze a training object. A moment’s thought will tell you that the training object must be

  • One with the same root category as the model
  • Not the one that was used to create the model

Schedule the test on the Testing Schedule tab: simply select the model, the TDO to test it on, and the start time for the test.

To see the test results for a model, select the model on either the Testing Schedule or Models tab and click the eye icon. The eye icon is active only if you’ve selected a model that has been tested.

Understanding the test results is where it gets interesting.

All Results tab

Average Results

This tab shows three graphs.

Precision and Recall

The vertical axis: the Precision (black) and Recall (blue) ratings at a given Confidence level.

To understand Precision and Recall, consider several possible ways of looking at the performance of a model. If your model attempts to assign a certain number of items to a category X, you can make the following counts:

  • a = the number of items the model correctly assigns to X
  • b = the number of items the model incorrectly assigns to X
  • c = the number of items the model incorrectly rejects from X (that is, items that the model should assign to X but does not)

From these quantities, you can calculate the following performance measures:

  • Precision = a / (a + b)
  • Recall = a / (a + c)

Generally, increasing precision comes at the price of decreasing recall. To be more precise, the model assigns an item to a category only when it is very sure that the item belongs; but by insisting on being very sure, it runs the risk of rejecting items that really do belong in the category.
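
For concreteness, here is a minimal sketch (in Python, not the product’s own code) of computing Precision and Recall for one category from the counts above; the function name and the parallel-lists input format are assumptions made for this example.

  def precision_recall(predictions, truths, category):
      """predictions and truths are parallel lists of category labels."""
      pairs = list(zip(predictions, truths))
      a = sum(1 for p, t in pairs if p == category and t == category)  # correct assignments to the category
      b = sum(1 for p, t in pairs if p == category and t != category)  # incorrect assignments
      c = sum(1 for p, t in pairs if p != category and t == category)  # incorrect rejections
      precision = a / (a + b) if (a + b) else 0.0
      recall = a / (a + c) if (a + c) else 0.0
      return precision, recall

  # Example: three items truly belong to "X"; the model assigns two of them, plus one outsider.
  preds  = ["X", "X", "Y", "X"]
  truths = ["X", "X", "X", "Y"]
  print(precision_recall(preds, truths, "X"))  # (0.666..., 0.666...): a=2, b=1, c=1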

Confidence

The horizontal axis: a numerical score, from 1 to 100, that indicates the likelihood, according to the selected model, that a text object belongs in a certain category.

(By contrast, accuracy is an assessment, produced by testing, of the correctness of a model’s assignment of text objects to categories. In other words, confidence expresses a model’s guess about a categorization; accuracy rates the correctness of that guess.)
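
The two curves on this graph can be read as the result of sweeping a Confidence cutoff from 1 to 100 and, at each cutoff, counting only assignments made at or above that Confidence. A rough sketch of that calculation, assuming each test result is a simple (predicted category, confidence, true category) tuple rather than any actual API:

  def precision_recall_curve(results, category):
      """results: iterable of (predicted_category, confidence, true_category) tuples."""
      points = []
      for cutoff in range(1, 101):
          a = b = c = 0
          for predicted, confidence, truth in results:
              assigned = (predicted == category and confidence >= cutoff)
              if assigned and truth == category:
                  a += 1                      # correct assignment
              elif assigned:
                  b += 1                      # incorrect assignment
              elif truth == category:
                  c += 1                      # incorrect rejection
          precision = a / (a + b) if (a + b) else 1.0   # no assignments at this cutoff; shown as 1.0 by convention
          recall = a / (a + c) if (a + c) else 0.0
          points.append((cutoff, precision, recall))
      return points

Raising the cutoff generally pushes Precision (the black curve) up and Recall (the blue curve) down, which is the trade-off described above.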

Correct in Top N

When a model classifies a text object, it returns a list of categories and, for each, the probability (the Confidence rating) that the object belongs to it. Ranking the returned categories with the highest probability first, how likely is it that the correct category appears within the top one, the top two, the top three, and so on? (A small sketch of this calculation follows the axis descriptions below.)

  • Includes Correct Category. The vertical axis: per cent likelihood.
  • N Best Categories. The horizontal axis: best, best two, best three, and so on.
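
A minimal sketch of this measure, assuming each test result is just the model’s ranked category list plus the known correct category (the data layout and names here are illustrative, not the product’s API):

  def correct_in_top_n(results, max_n=5):
      """results: list of (ranked_categories, true_category) tuples,
      with ranked_categories ordered by descending Confidence."""
      rates = {}
      for n in range(1, max_n + 1):
          hits = sum(1 for ranked, truth in results if truth in ranked[:n])
          rates[n] = 100.0 * hits / len(results)   # per cent likelihood (the vertical axis)
      return rates

  # Example: with two test objects, the correct category is ranked 1st for one and 3rd for the other.
  results = [(["A", "B", "C"], "A"), (["B", "C", "A"], "A")]
  print(correct_in_top_n(results, max_n=3))  # {1: 50.0, 2: 50.0, 3: 100.0}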

Category Confusion


Results by Category tab
