Max Planck Institute for Psycholinguistics

WebCelex Help - N-grams



The N-grams function retrieves information on the constituents of the words in the home list, based on a dictionary list. The constituents are set by the N-gram Size, which is the length of the constituents to be examined. For example, the word aarde with an N-gram size of 2 has the constituents aa, ar, rd and de. If the Blanks flag is set, the word onset and offset are also included, marked with an underscore, so the constituents become _a, aa, ar, rd, de and e_.

If the retrieve mode within N-grams is set to N-grams, each N-gram found in the words of the home list is retrieved together with its number of occurrences in the dictionary list. For example, suppose the home list contains aarde and appel, the dictionary list contains aardig, aap and apert, and the N-gram size is 2 with no blanks. We then obtain:

aa\2
ap\2
ar\1
de\0
el\0
pe\1
pp\0
rd\1
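The N-grams retrieve mode can be sketched in Python as follows. This is a minimal illustration, not WebCelex's own implementation. It assumes plain character N-grams and that the Blanks flag pads each word with a single underscore on each side, which matches the _a ... e_ example but is not confirmed for larger N-gram sizes. It also assumes that counts are occurrences summed over all dictionary words. The function names `ngrams` and `ngram_counts` are hypothetical. Passing `None` as the dictionary stands in for leaving the Dictionary list field empty.

```python
from collections import Counter


def ngrams(word, size, blanks=False):
    """Return the N-grams of `word` in order, as a list of strings.

    Assumption: with blanks=True the word is padded with one underscore
    on each side, as in the _a ... e_ example.
    """
    if blanks:
        word = "_" + word + "_"
    return [word[i:i + size] for i in range(len(word) - size + 1)]


def ngram_counts(home, dictionary, size, blanks=False):
    """Map each N-gram found in the home list to its number of
    occurrences in the dictionary list (0 if it does not occur)."""
    if dictionary is None:
        # Hypothetical stand-in for an empty Dictionary list field:
        # the home list is used as its own dictionary.
        dictionary = home
    dict_counts = Counter()
    for word in dictionary:
        dict_counts.update(ngrams(word, size, blanks))
    home_grams = {g for word in home for g in ngrams(word, size, blanks)}
    # A Counter returns 0 for missing keys, so absent N-grams get 0.
    return {g: dict_counts[g] for g in sorted(home_grams)}


if __name__ == "__main__":
    result = ngram_counts(["aarde", "appel"], ["aardig", "aap", "apert"], 2)
    for gram, count in result.items():
        # Print in the backslash-separated style used in this help page.
        print(f"{gram}\\{count}")
```

Run on the example lists, this prints the eight lines aa\2 through rd\1 in the same alphabetical order as shown above.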

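If the retrieve mode within N-grams is set to Words, each word in the home list is retrieved with a single score. The score is the mean, over all N-grams in that word, of log(n + 1), where n is the number of occurrences of the N-gram in the dictionary list. Adding 1 keeps the logarithm defined for N-grams that do not occur in the dictionary list (n = 0). The values in the example correspond to the natural logarithm. In our example we obtain: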

aarde\0.62
appel\0.45
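The following minimal Python sketch illustrates the Words retrieve mode. It uses the natural logarithm, which reproduces the example values. The word aarde has the N-grams aa, ar, rd and de, with dictionary counts 2, 1, 1 and 0, so its score is (ln 3 + ln 2 + ln 2 + ln 1) / 4 ≈ 0.62. The word appel has ap, pp, pe and el, with counts 2, 0, 1 and 0, so its score is (ln 3 + ln 1 + ln 2 + ln 1) / 4 ≈ 0.45. The function names are hypothetical. The sketch also assumes underscore padding for the Blanks flag and skips words shorter than the N-gram size; neither choice is confirmed WebCelex behaviour.

```python
import math
from collections import Counter


def ngrams(word, size, blanks=False):
    # Hypothetical helper: character N-grams, padded with one underscore
    # on each side when blanks=True (an assumption).
    if blanks:
        word = "_" + word + "_"
    return [word[i:i + size] for i in range(len(word) - size + 1)]


def word_scores(home, dictionary, size, blanks=False):
    """Map each home-list word to the mean of ln(n + 1) over its N-grams,
    where n is the N-gram's number of occurrences in the dictionary list."""
    dict_counts = Counter()
    for word in dictionary:
        dict_counts.update(ngrams(word, size, blanks))
    scores = {}
    for word in home:
        grams = ngrams(word, size, blanks)
        if not grams:
            # Word shorter than the N-gram size: skipped here (assumption).
            continue
        total = sum(math.log(dict_counts[g] + 1) for g in grams)
        scores[word] = total / len(grams)
    return scores


if __name__ == "__main__":
    scores = word_scores(["aarde", "appel"], ["aardig", "aap", "apert"], 2)
    for word, score in scores.items():
        print(f"{word}\\{score:.2f}")
```

This prints aarde\0.62 and appel\0.45, as in the example.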

The lists to be examined are specified in the Home list and Dictionary list file selection fields. A list may be a table. In that case, the column to use must be given in the corresponding Column field, either by name or by number (1 denotes the first column). Tables must follow a specific format, the same as the default format WebCelex uses when it creates a lexicon.

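The format is as follows. The first line holds the column names, and each following line holds the values for one entry. Fields are separated by backslashes:
{Column Name 1}\{Column Name 2}\...
{Value 1}\{Value 2}\...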

In our case we could have:
Type\Freq
aarde\100
appel\5

as the home list, with the Column field set to 'Type' or '1'. To find the N-grams within a single list, that list must serve as both the home list and the dictionary list. This is done by leaving the Dictionary list field empty.
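A list in this table format can be read with a short Python function, as sketched below. The sketch makes several assumptions: the column names are on the first line, fields are separated by single backslashes, and no escaping is used inside values. It also treats a digit string as a 1-based column number rather than a column name. The function name `read_column` is hypothetical and not part of WebCelex.

```python
def read_column(lines, column):
    """Return the values of one column from a backslash-separated table.

    `lines` is an iterable of text lines whose first line holds the column
    names. `column` is a column name (e.g. "Type") or a 1-based column
    number given as an int or a digit string (e.g. 1 or "1").
    """
    # Drop line endings, skip blank lines, and split each line on backslashes.
    rows = [line.rstrip("\r\n").split("\\") for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    if str(column).isdigit():
        index = int(column) - 1  # 1 denotes the first column
    else:
        index = header.index(column)  # ValueError if the name is unknown
    return [row[index] for row in body]


if __name__ == "__main__":
    home_list = ["Type\\Freq", "aarde\\100", "appel\\5"]
    print(read_column(home_list, "Type"))  # ['aarde', 'appel']
    print(read_column(home_list, "1"))     # ['aarde', 'appel']
```

Because the function accepts any iterable of lines, an open text file can be passed to it directly in place of the in-memory list.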

If the Table checkbox is set, the retrieved results are displayed as a table; otherwise, they are displayed in the standard WebCelex format.