10.1. Morphology

Morphological analysis is a mechanism designed to recognize certain words and phrases on websites. If a text contains too many unwanted words or phrases, the system will block access to the website.

Morphological analysis is performed both when a user sends a new search query and when the requested web server responds to this query. Once the web server responds, UserGate scans the text on the web page and calculates its total "weight" by matching words and phrases from the morphological categories in use. If the total "weight" of the web page exceeds the weight set for a morphological category, the rule is triggered. All word forms of prohibited words count towards the "weight". UserGate looks up word forms in its built-in dictionaries, which are available for English, German, Russian, Japanese and Arabic.
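The weight comparison described above can be illustrated with a minimal Python sketch. It is not UserGate's implementation: the phrase list, the weights, the category weight and the function names are all hypothetical, word-form expansion is left out, and counting every occurrence of a phrase is an assumption the text does not confirm.

```python
import re

# Hypothetical phrase weights for one morphological category (not real UserGate data).
PHRASE_WEIGHTS = {
    "casino": 40,
    "poker": 30,
    "place a bet": 50,
}
CATEGORY_WEIGHT = 100  # hypothetical weight assigned to the category


def page_weight(text, phrase_weights):
    """Sum the weights of all listed phrases found in the text.

    Assumption: every occurrence of a phrase adds its weight once.
    """
    # Lowercase the text and reduce it to words separated by single spaces.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    total = 0
    for phrase, weight in phrase_weights.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += weight * len(re.findall(pattern, normalized))
    return total


def rule_triggers(text, phrase_weights, category_weight):
    """The rule fires when the page weight is higher than the category weight."""
    return page_weight(text, phrase_weights) > category_weight


page = "Online casino: play poker, place a bet, visit the casino again."
print(page_weight(page, PHRASE_WEIGHTS))  # 40 + 30 + 50 + 40 = 160
print(rule_triggers(page, PHRASE_WEIGHTS, CATEGORY_WEIGHT))  # True, since 160 > 100
```

In this sketch a page that mentions only "casino" and "poker" once each scores 70 and stays below the category weight, so the rule does not fire.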

You can also subscribe to additional dictionaries offered by UserGate. These dictionaries are read-only, and you need the corresponding license to use them. For more details on product licensing, please refer to UserGate licensing.

Name                                        Description

Suicide                                     Morphological dictionary containing words and phrases related to suicide
Terrorism                                   Morphological dictionary containing words and phrases related to terrorism
Profanity                                   Morphological dictionary containing profane words and phrases
Gambling                                    Morphological dictionary containing words and phrases related to gambling
Drugs                                       Morphological dictionary containing words and phrases related to drugs
Pornography                                 Morphological dictionary containing words and phrases related to pornography
Restricted materials (Custom country code)  Morphological dictionary containing words and phrases not recommended for children under certain national laws. The suffix in the dictionary name is the GS1 code of the country whose national laws the dictionary complies with. See http://www.gs1.org/company-prefix for details

To set up morphology-based filtering, perform the following steps:

Name

Description

Step 1. Create one or more morphological categories and specify their weights

Click Add and specify the name and weight of the new category

Step 2. Specify the list of prohibited phrases with their weights

Click Add and specify the necessary words and phrases. When adding a new word to any morphological dictionary, you can put the "!" modifier before the word, e.g. "!bassterd". In this case, the word will not be converted into word forms during analysis, which significantly reduces the risk of false positives

Step 3. Create a new content filtering rule containing one or more morphological categories

Please refer to Content filtering.
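The effect of the "!" modifier from Step 2 can be sketched in Python as follows. The suffix-based toy_word_forms function is only a placeholder for UserGate's built-in dictionaries, and both function names are hypothetical.

```python
def toy_word_forms(word):
    """Placeholder inflector: a few English suffixes, NOT UserGate's dictionaries."""
    return {word, word + "s", word + "ed", word + "ing"}


def expand_entry(entry):
    """Return the set of strings a single-word dictionary entry would match."""
    if entry.startswith("!"):
        # "!" modifier: match the word exactly as written, without word forms.
        return {entry[1:]}
    return toy_word_forms(entry)


print(sorted(expand_entry("bet")))        # the word plus its toy word forms
print(sorted(expand_entry("!bassterd")))  # only the literal word
```

Because an entry with "!" matches a single string, it cannot accidentally match an unrelated word that happens to coincide with one of its generated forms.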

Network administrators can create custom dictionaries and distribute them to all UserGate servers from a central location. To create a custom morphological database, perform the following steps:

Name

Description

Step 1. Create a new file with necessary phrases

Create a new file called list.txt and list the words and phrases in the following format, one entry per line:

!word1 !word2

!word3

word4 50

...

Lastword

In this case, the total weight of the dictionary will be 100. To give a word its own weight, add the number after the word, as in "word4 50"; entries without a number use the default weight of 100

Step 2. Put this file into a new archive

Zip the file into a new archive called list.zip

Step 3. Create a new file with the necessary version of your dictionary

Create a new file version.txt and specify the database version (e.g. "3") in it. Make sure to increment this value each time you update the morphological dictionary

Step 4. Publish files on your web server

Publish list.zip and version.txt on your website and make them available for download via HTTP

Step 5. Create a new morphological category and provide the URL for updating your dictionary

Create a new morphological database on every UserGate server. When creating a new database, make sure to provide a URL for installing updates. UserGate checks your website for a new version every 4 hours and automatically updates the dictionary once a newer version is released
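Steps 1 to 3 above can be scripted. The following Python sketch writes list.txt, packs it into list.zip and writes version.txt. The function names and the entry structure are hypothetical, and the UTF-8 encoding and the absence of a trailing newline in version.txt are assumptions not stated in this section. Publishing the files on a web server (Step 4) is not covered.

```python
import tempfile
import zipfile
from pathlib import Path

DEFAULT_WEIGHT = 100  # default weight of an entry, as stated in Step 1


def format_entry(words, weight=DEFAULT_WEIGHT):
    """Format one list.txt line; words is a list of (word, exact) pairs."""
    tokens = ["!" + word if exact else word for word, exact in words]
    line = " ".join(tokens)
    if weight != DEFAULT_WEIGHT:
        line += f" {weight}"  # explicit weight goes after the entry
    return line


def build_dictionary(out_dir, entries, version):
    """Write list.txt, list.zip and version.txt into out_dir and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    list_txt = out / "list.txt"
    lines = [format_entry(words, weight) for words, weight in entries]
    list_txt.write_text("\n".join(lines) + "\n", encoding="utf-8")  # assumed encoding
    with zipfile.ZipFile(out / "list.zip", "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(list_txt, arcname="list.txt")
    # Increment the version each time the dictionary is updated.
    (out / "version.txt").write_text(str(version), encoding="utf-8")
    return out


if __name__ == "__main__":
    entries = [
        ([("word1", True), ("word2", True)], DEFAULT_WEIGHT),  # -> "!word1 !word2"
        ([("word3", True)], DEFAULT_WEIGHT),                   # -> "!word3"
        ([("word4", False)], 50),                              # -> "word4 50"
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = build_dictionary(tmp, entries, version=3)
        print((out / "list.txt").read_text(encoding="utf-8"))
        print(sorted(p.name for p in out.iterdir()))
```

The demo writes into a temporary directory; for real use, point out_dir at the directory your web server publishes.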

Important! When creating a new morphological dictionary, it is highly recommended that you put the "!" modifier before each word in phrases containing more than three words. When building a new morphological database, the system converts each word into all possible word forms (including cases, plural forms, grammatical tenses, etc.), so the resulting number of phrase variants can become very large. When you add long phrases, make sure to put the "!" modifier before each word that does not have word forms, e.g. before articles, prepositions and conjunctions. For example, the phrase "how to commit a painless suicide" should be added as "!how !to commit !a suicide !painlessly". This reduces the number of possible phrase variants while preserving the main idea of the initial phrase.
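The saving from the "!" modifier can be estimated with a short Python sketch. It assumes that the number of phrase variants is the product of the number of forms of each word, and that every word without "!" has the same hypothetical number of forms. The FUNCTION_WORDS list and both function names are illustrative only. Unlike the manual example above, the sketch marks only function words and does not reword the phrase.

```python
# Assumed, incomplete list of English words that have no word forms.
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "for", "and", "or", "but", "how"}


def mark_function_words(phrase):
    """Prefix "!" to every function word so it is not expanded into word forms."""
    return " ".join(
        word if word.startswith("!") or word.lower() not in FUNCTION_WORDS else "!" + word
        for word in phrase.split()
    )


def variant_count(entry, forms_per_word):
    """Estimate phrase variants: words marked with "!" contribute exactly one form."""
    count = 1
    for word in entry.split():
        if not word.startswith("!"):
            count *= forms_per_word
    return count


phrase = "how to commit a painless suicide"
marked = mark_function_words(phrase)
print(marked)                     # !how !to commit !a painless suicide
print(variant_count(phrase, 5))   # 5 ** 6 = 15625 variants without modifiers
print(variant_count(marked, 5))   # 5 ** 3 = 125 variants with modifiers
```

With five hypothetical forms per word, marking three of the six words reduces the estimate from 15625 to 125 variants.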