Text Preprocessing Examples
Examples
This topic describes part of the functionality of Genesys Content Analyzer.
The table "Examples of Preprocessing Filters" displays simple examples of text-preprocessing filters.
Pattern Type | Pattern Body | Input Text | Test Result |
---|---|---|---|
DELETE AFTER | finch | one two finch three four | one two |
DELETE BEFORE | finch | one two finch three four | three four |
DELETE BEFORE | [Mm]essage_?[Ss]tart | x897 message_Start one two three | one two three |
DELETE ALL IF FIND | finch | one two finch three four | (text deleted; see note a) |
DELETE ALL IF FIND | finch | one two three four | one two three four |
DELETE ALL IF FIND | internal\d\d | one two three internal36 four | (text deleted; see note a) |
DELETE ALL IF NOT FIND | finch | one two finch three four | one two finch three four |
DELETE ALL IF NOT FIND | finch | one two three four | (text deleted; see note a) |
DELETE PATTERN | f.*ch\s | one two finch three four | one two three four |
DELETE PATTERN | f.*ch\s | one two fach three four | one two three four |
a. When you test a DELETE ALL IF FIND or DELETE ALL IF NOT FIND filter and it deletes the text, the result window contains the message TEXT HAS BEEN DELETED. In actual use of these two filter types, the entire text object is deleted from the training object.
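The single-filter behavior in "Examples of Preprocessing Filters" can be sketched in Python as a way to check your reading of each filter type. This is a minimal illustration, not Content Analyzer's implementation. It assumes case-sensitive matching with Python's re module, that DELETE AFTER and DELETE BEFORE act on the first match and trim the whitespace left at the cut, and that a filter leaves the text unchanged when it finds no match. The helper name apply_filter is hypothetical, and None stands for a deleted text object.

```python
import re
from typing import Optional


def apply_filter(pattern_type: str, pattern_body: str, text: str) -> Optional[str]:
    """Apply one preprocessing filter (hypothetical helper, not the product API).

    Returns the filtered text, or None when the whole text object is deleted.
    Assumes case-sensitive matching; DELETE AFTER and DELETE BEFORE use the
    first match and trim the whitespace left at the cut.
    """
    regex = re.compile(pattern_body)
    match = regex.search(text)
    if pattern_type == "DELETE AFTER":
        # Keep only what precedes the first match; the match itself is removed.
        return text[:match.start()].strip() if match else text
    if pattern_type == "DELETE BEFORE":
        # Keep only what follows the first match; the match itself is removed.
        return text[match.end():].strip() if match else text
    if pattern_type == "DELETE ALL IF FIND":
        # Any match deletes the entire text object.
        return None if match else text
    if pattern_type == "DELETE ALL IF NOT FIND":
        # No match deletes the entire text object.
        return text if match else None
    if pattern_type == "DELETE PATTERN":
        # Remove every non-overlapping match and keep everything else.
        return regex.sub("", text)
    raise ValueError(f"unknown pattern type: {pattern_type}")


# Rows of "Examples of Preprocessing Filters"; None marks deleted text.
TABLE_ROWS = [
    ("DELETE AFTER", "finch", "one two finch three four", "one two"),
    ("DELETE BEFORE", "finch", "one two finch three four", "three four"),
    ("DELETE BEFORE", r"[Mm]essage_?[Ss]tart", "x897 message_Start one two three", "one two three"),
    ("DELETE ALL IF FIND", "finch", "one two finch three four", None),
    ("DELETE ALL IF FIND", "finch", "one two three four", "one two three four"),
    ("DELETE ALL IF FIND", r"internal\d\d", "one two three internal36 four", None),
    ("DELETE ALL IF NOT FIND", "finch", "one two finch three four", "one two finch three four"),
    ("DELETE ALL IF NOT FIND", "finch", "one two three four", None),
    ("DELETE PATTERN", r"f.*ch\s", "one two finch three four", "one two three four"),
    ("DELETE PATTERN", r"f.*ch\s", "one two fach three four", "one two three four"),
]

# Check that the sketch reproduces every row of the table.
for pattern_type, body, text, expected in TABLE_ROWS:
    result = apply_filter(pattern_type, body, text)
    assert result == expected, (pattern_type, body, text, result)
print("all rows match")
```

Note that f.*ch\s is greedy, so on a longer text it would remove everything from the first "f" to the last "ch" followed by whitespace, not just one word.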
Tables "Preprocessing Filters Example" and "Results of Testing the Example" present a more complex example using all five filter types.
The table "Preprocessing Filters Example" lists the filters used in the example.
Filter Number | Pattern Type | Pattern Body |
---|---|---|
1 | DELETE BEFORE | MessageStart |
2 | DELETE AFTER | IDnumber= |
3 | DELETE ALL IF FIND | internal\d\d |
4 | DELETE ALL IF NOT FIND | nihil_obstat |
5 | DELETE PATTERN | company |
The table "Results of Testing the Example" shows an example of input text and the results of applying the filters from "Preprocessing Filters Example" to it.
Input Text | Test Result |
---|---|
x88_2 MessageStart nihil_obstat: Hello, companyyes, good-bye.IDnumber=7989 | nihil_obstat: Hello, yes, good-bye. |
The results in the table "Results of Testing the Example" come about as follows:
- Filter 1 deletes the text "x88_2 MessageStart".
- Filter 2 deletes the text "IDnumber=7989".
- Filter 3 does nothing (it fails to find a match for internal\d\d).
- Filter 4 does nothing (it finds a match for nihil_obstat).
- Filter 5 deletes "company".
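The sequence above can be traced with a short Python sketch that runs the five filters from "Preprocessing Filters Example" over the input text. It is illustrative only and assumes that the filters run in ascending filter-number order, that matching is case-sensitive, that DELETE AFTER and DELETE BEFORE act on the first match and trim leftover whitespace, and that a filter with no match leaves the text unchanged. FILTERS and run_filters are hypothetical names, and None stands for a deleted text object.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical encoding of "Preprocessing Filters Example":
# (filter number, pattern type, pattern body), applied in ascending order.
FILTERS: List[Tuple[int, str, str]] = [
    (1, "DELETE BEFORE", "MessageStart"),
    (2, "DELETE AFTER", "IDnumber="),
    (3, "DELETE ALL IF FIND", r"internal\d\d"),
    (4, "DELETE ALL IF NOT FIND", "nihil_obstat"),
    (5, "DELETE PATTERN", "company"),
]


def run_filters(text: str, filters: List[Tuple[int, str, str]]) -> Optional[str]:
    """Apply filters in order; return None if a filter deletes the whole text."""
    for _number, pattern_type, body in filters:
        match = re.search(body, text)
        if pattern_type == "DELETE BEFORE":
            # Drop the first match and everything before it.
            text = text[match.end():].strip() if match else text
        elif pattern_type == "DELETE AFTER":
            # Drop the first match and everything after it.
            text = text[:match.start()].strip() if match else text
        elif pattern_type == "DELETE ALL IF FIND":
            if match:
                return None  # text object deleted; later filters never run
        elif pattern_type == "DELETE ALL IF NOT FIND":
            if not match:
                return None  # text object deleted; later filters never run
        elif pattern_type == "DELETE PATTERN":
            # Remove every occurrence of the pattern.
            text = re.sub(body, "", text)
        else:
            raise ValueError(f"unknown pattern type: {pattern_type}")
    return text


source = "x88_2 MessageStart nihil_obstat: Hello, companyyes, good-bye.IDnumber=7989"
print(run_filters(source, FILTERS))
# → nihil_obstat: Hello, yes, good-bye.
```

The loop returns as soon as a DELETE ALL filter fires, on the reasoning that once the text object is gone there is nothing left for later filters to act on.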