Morphological analysis is a mechanism that recognizes individual words and phrases on a web page. If the page contains a sufficient number of prohibited words and phrases, access to the site is blocked.
Morphological analysis is performed both when checking the user request and before passing the web server's response to the user. Having received a response from the web server, UserGate analyzes the text of the web page and calculates its total "weight" from the word "weights" set in the morphological categories. If the "weight" of the page exceeds the weight of a morphological category, the corresponding rule is triggered. When calculating the "weight" of the page, all word forms of the prohibited words are included in the count. To search for word forms, UserGate uses its built-in Russian, English, Japanese, Arabic, and German dictionaries.
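As a rough illustration of the weighting described above, the following Python sketch sums word weights over a page and compares the total with a category weight. It is not UserGate's implementation: the function names, sample words, and weights are hypothetical, matching is exact on lowercase tokens (no word forms or phrases), and counting every occurrence of a word is an assumption.

```python
import re


def page_weight(text: str, word_weights: dict[str, int]) -> int:
    """Sum the weights of dictionary words found in the text.

    Assumption: every occurrence of a word adds its weight, and matching
    is exact on lowercase tokens (no word-form expansion, no phrases).
    """
    tokens = re.findall(r"\w+", text.lower())
    return sum(word_weights.get(token, 0) for token in tokens)


def rule_triggered(text: str, word_weights: dict[str, int], category_weight: int) -> bool:
    """Return True when the page weight exceeds the category weight."""
    return page_weight(text, word_weights) > category_weight


# Hypothetical category with weight 100 and two weighted words.
weights = {"casino": 60, "jackpot": 50}
sample = "Win the jackpot at our casino. Casino bonus inside!"

print(page_weight(sample, weights))          # 50 + 60 + 60 = 170
print(rule_triggered(sample, weights, 100))  # True, because 170 > 100
```

A page that mentions a single weighted word once (weight 60) would stay below the category weight of 100 in this sketch and would not trigger the rule.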
You can subscribe to the dictionaries supplied by UserGate. These dictionaries are not editable. To use the dictionaries, an appropriate license is required. For more details on product licensing, see the chapter UserGate Licensing.
Name | Description |
---|---|
Compliance with RU (Custom 460) | A morphological dictionary containing the list of words and phrases prohibited by the Ministry of Justice of the Russian Federation. |
Compliance with KZ (Custom 487) | A morphological dictionary containing the list of words and phrases prohibited by the Ministry of Justice of the Republic of Kazakhstan. |
Suicide | A morphological dictionary containing a list of words and phrases related to suicide. |
Terrorism | A morphological dictionary containing a list of words and phrases related to terrorism. |
Profanity | A morphological dictionary containing a list of words and phrases categorized as profanity. |
Gambling | A morphological dictionary containing a list of words and phrases related to gambling. |
Drugs | A morphological dictionary containing a list of words and phrases related to drugs. |
Compliance with RU FZ436 | A morphological dictionary containing a list of words and phrases on topics that are not safe for children. |
Pornography | A morphological dictionary containing a list of words and phrases related to pornography. |
Accounting | A morphological dictionary containing a list of terms, words, and phrases used in accounting. |
Marketing | A morphological dictionary containing a list of terms, words, and phrases used in marketing. |
Personal data (DLP) | A morphological dictionary containing a list of terms, words, and phrases encountered in personal data. |
Finance | A morphological dictionary containing a list of terms, words, and phrases used in finance. |
Legal | A morphological dictionary containing a list of terms, words, and phrases used in law. |
To configure filtering by the morphological content of a page, follow these steps:
Task | Description |
---|---|
Step 1. Create one or more morphological categories and specify the weight of each category. | Click Add and enter the name of the new category and its weight. |
Step 2. Specify the list of prohibited phrases along with their weights. | Click Add and specify the undesired words and phrases. When adding a word to the morphological dictionary, you can precede it with the "!" modifier, such as "!bassterd". In that case, the word is matched only as written and is not converted to its word forms, which can dramatically reduce the probability of erroneous blocking. |
Step 3. Create a content filtering rule containing one or more morphological categories. | See the section Content Filtering. |
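The effect of the "!" modifier can be illustrated with a small Python sketch. The FORMS table below is a hypothetical stand-in for the built-in dictionaries, and expand_entry is an illustrative helper, not part of UserGate. It only shows how an entry grows into several variants unless its words are marked with "!".

```python
from itertools import product

# Hypothetical word-form table standing in for the built-in dictionaries.
FORMS = {
    "place": ["place", "places", "placed", "placing"],
    "bet": ["bet", "bets", "betting"],
}


def expand_entry(entry: str) -> list[str]:
    """Expand a dictionary entry into every variant it would match.

    Words prefixed with "!" are kept as written; other words are replaced
    by all of their known forms (or left unchanged if no forms are known).
    """
    options = []
    for word in entry.split():
        if word.startswith("!"):
            options.append([word[1:]])
        else:
            options.append(FORMS.get(word, [word]))
    return [" ".join(combo) for combo in product(*options)]


print(len(expand_entry("place bet")))  # 4 * 3 = 12 variants
print(expand_entry("!place bet"))      # ['place bet', 'place bets', 'place betting']
print(expand_entry("!bassterd"))       # ['bassterd']
```

In this toy example, marking one word with "!" cuts the number of variants from 12 to 3. With longer phrases the reduction grows multiplicatively.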
The administrator can create a custom dictionary and distribute it centrally to all UserGate devices in the organization. To create such a morphological database, follow these steps:
Task | Description |
---|---|
Step 1. Generate a file with the undesired phrases. | Create a file named list.txt with the word list in the following format: !word1 !word2 !word3 word4 50 ... Lastword. In this case, the weight of the dictionary is 100. A word can be followed by its own weight, as in "word4 50"; the default word weight is 100. |
Step 2. Create an archive containing this file. | Put the file in a ZIP archive named list.zip. |
Step 3. Generate a version file for the dictionary. | Create a file named version.txt and specify the database version number inside it, such as 3. On each update of the morphological dictionary, the version number must be incremented. |
Step 4. Upload the files to a web server. | Upload the list.zip and version.txt files to your website so that they can be downloaded. |
Step 5. Create a morphological category and specify the URL for dictionary update. | On each UserGate server, create a morphological database. When creating the list, select Updatable as the list type and enter the address for downloading updates. UserGate will check for a new version on your website according to the set update download schedule. The schedule can be configured in the list properties, where several options are available. With the Advanced option, a crontab-like format is used where the date/time string consists of six space-separated fields. The fields specify the time as follows: (minutes: 0-59) (hours: 0-23) (days of the month: 1-31) (month: 1-12) (days of the week: 0-6, where 0 is Sunday). Each of the first five fields can be defined using, for example, an asterisk or range spacing, which spaces out the values in a range; the increment is given after a slash. Examples: "2-10/2" means "2,4,6,8,10" while "*/2" in the "hours" field means "every two hours". |
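The spacing notation in the Advanced schedule can be checked with a short Python sketch. The expand_field function is a hypothetical helper modeled on common crontab semantics. It is not UserGate's parser, and it handles only a single value, a range, or an asterisk, each with an optional "/increment".

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one schedule field such as "*/2" or "2-10/2" into its values.

    lo and hi are the bounds of the field, for example 0 and 23 for hours.
    """
    base, _, step_text = field.partition("/")
    step = int(step_text) if step_text else 1
    if step < 1:
        raise ValueError(f"increment in {field!r} must be at least 1")

    if base == "*":
        start, end = lo, hi          # asterisk covers the whole range
    elif "-" in base:
        start_text, end_text = base.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(base)      # a single value

    if not (lo <= start <= end <= hi):
        raise ValueError(f"field {field!r} is outside {lo}-{hi}")
    return list(range(start, end + 1, step))


print(expand_field("2-10/2", 0, 59))  # [2, 4, 6, 8, 10]
print(expand_field("*/2", 0, 23))     # every second hour: 0, 2, ..., 22
```

Both results agree with the examples given in the text: "2-10/2" yields 2, 4, 6, 8, 10, and "*/2" in the hours field selects every second hour.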
Note
When creating morphological dictionaries, it is not recommended to add phrases that consist of more than three words without prepending the words with the "!" character. Remember that in the process of building the database, each word is converted to all existing forms (declensions, conjugations, plural forms, tenses, etc.), making the resulting number of phrases quite large. When you add a long phrase, use the "!" modifier before the words that do not need modification (usually, these are various prepositions and conjunctions). For example, you can add the German phrase "wie man schmerzlos stirbt" (how to die painlessly) as "!wie !man schmerzlos stirbt". This will reduce the number of possible phrase variants while keeping all phrases with the intended meaning.
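Putting the file-preparation steps and the note above together, the dictionary files could be generated with a short Python script such as the sketch below. Several details are assumptions rather than documented behavior: one entry per line in list.txt, UTF-8 encoding, and the helper name build_dictionary and output directory dictionary_build are hypothetical. Uploading the resulting files to a web server is left out.

```python
import zipfile
from pathlib import Path


def build_dictionary(entries: list[str], version: int, out_dir: str) -> tuple[Path, Path]:
    """Write list.txt, pack it into list.zip, and write version.txt.

    Assumptions: one entry per line and UTF-8 encoding.
    Returns the paths of list.zip and version.txt.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Step 1: the word list.
    list_path = out / "list.txt"
    list_path.write_text("\n".join(entries) + "\n", encoding="utf-8")

    # Step 2: a ZIP archive named list.zip containing list.txt.
    zip_path = out / "list.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(list_path, arcname="list.txt")

    # Step 3: the version file; increment the number on every update.
    version_path = out / "version.txt"
    version_path.write_text(f"{version}\n", encoding="utf-8")
    return zip_path, version_path


if __name__ == "__main__":
    words = [
        "!bassterd",                    # matched only as written
        "word4 50",                     # word with an explicit weight of 50
        "!wie !man schmerzlos stirbt",  # only the last two words get word forms
    ]
    print(build_dictionary(words, version=3, out_dir="dictionary_build"))
```

Keeping the version number as a parameter makes it harder to forget the increment that each dictionary update requires.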