N-Gram Transformation
The NGramTransformation extracts n-grams (sequences of one or more words) from free-text fields such as descriptions. Those n-grams can then be used in a LookupCache transformation to find the correct tags. For example, take the IPTC description, extract its words, and check them against a keyword list for tagging.
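The tagging flow described above can be sketched in Python. This is a minimal illustration, not Picturepark's implementation. The regular-expression tokenizer, the case-insensitive match, the helper names extract_words and tag_from_text, and the keywords set (standing in for the keyword list behind a LookupCache) are all hypothetical choices made for this example.

```python
import re

def extract_words(text: str, min_len: int) -> list[str]:
    # Assumption: words are separated by whitespace and punctuation.
    return [w for w in re.findall(r"\w+", text) if len(w) >= min_len]

def tag_from_text(text: str, keyword_list: set[str], min_len: int = 5) -> list[str]:
    """Return the extracted words that appear in keyword_list (case-insensitive)."""
    lowered = {k.lower() for k in keyword_list}
    return [w for w in extract_words(text, min_len) if w.lower() in lowered]

# Hypothetical keyword list standing in for a LookupCache.
keywords = {"Breithorn", "Matterhorn", "Picturepark"}
description = "this is the new Breithorn Release of Picturepark"
print(tag_from_text(description, keywords))  # ['Breithorn', 'Picturepark']
```

"Release" is extracted as a word but produces no tag because it is not in the keyword list. Matching multi-word keywords would require n-grams larger than single words.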
Input: "this is the new Breithorn Release of Picturepark"
NGramTransformation:
Size: 1
Minimum word length: 5
Maximum word length: leave empty
Output: Breithorn Release Picturepark
Input: "this is the new Breithorn Release of Picturepark"
NGramTransformation:
Size: 2
Minimum word length: 5
Maximum word length: leave empty
Output: Breithorn Release
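A likely reason "Release Picturepark" does not appear in this output is that n-grams are built from consecutive words of the original text, so the short word "of" between them disqualifies that pair. The sketch below assumes exactly that rule and reproduces the output above. It isolates the two-word case, while the Size definition below states that a size also yields all shorter n-grams. The function name ngrams_of_length is hypothetical.

```python
import re

def ngrams_of_length(text: str, n: int, min_len: int) -> list[str]:
    """Return every run of n consecutive words whose words all have >= min_len characters."""
    words = re.findall(r"\w+", text)  # assumption: split at non-word characters
    result = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # A single short word anywhere in the window disqualifies the whole n-gram.
        if all(len(w) >= min_len for w in window):
            result.append(" ".join(window))
    return result

text = "this is the new Breithorn Release of Picturepark"
print(ngrams_of_length(text, 2, 5))  # ['Breithorn Release']
```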
Specific Definitions
Property | Value |
kind | NGramTransformation |
Size | The maximum n-gram size, counted in words. For example, 3 produces unigrams, bigrams, and trigrams. Words are delimited by whitespace and punctuation rather than counted in characters, so set the size to the number of words in the longest entry of your keyword list. |
Minimum word length | The minimum number of characters a word must have to be considered for n-gram production. |
Maximum word length | The maximum number of characters a word can have to be considered for n-gram production. Leave empty for no upper limit. |
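The three settings can be combined in a hypothetical re-implementation. It assumes that n-grams of every length from 1 to size are emitted (as the Size definition states), shortest first, over consecutive words that each satisfy both length bounds, with None standing for an empty maximum word length. Under these assumptions the size 2 example above would also return the single words, not only "Breithorn Release"; the tokenizer and output order are likewise assumptions.

```python
import re
from typing import Optional

def ngrams(text: str, size: int, min_len: int = 1, max_len: Optional[int] = None) -> list[str]:
    """Produce all n-grams of 1..size consecutive eligible words (illustrative only)."""
    words = re.findall(r"\w+", text)  # assumption: split at non-word characters

    def eligible(word: str) -> bool:
        # A word qualifies if it meets both bounds; None means no upper bound.
        return len(word) >= min_len and (max_len is None or len(word) <= max_len)

    result = []
    for n in range(1, size + 1):
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if all(eligible(w) for w in window):
                result.append(" ".join(window))
    return result

text = "this is the new Breithorn Release of Picturepark"
print(ngrams(text, 2, min_len=5))
# ['Breithorn', 'Release', 'Picturepark', 'Breithorn Release']
print(ngrams(text, 1, min_len=5, max_len=9))
# ['Breithorn', 'Release']
```

Setting a maximum of 9 characters drops "Picturepark" (11 characters), which shows how the upper bound filters long words.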
Example: produce n-grams of up to 4 words, counting every word of at least 1 character and setting no maximum word length:
{
  "kind": "NGramTransformation",
  "size": 4,
  "minWordLength": 1,
  "maxWordLength": null,
  "traceRefId": null
}
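The same configuration can be read from JSON and applied to the sample sentence. The field names come from the configuration snippet above. Treating a null maxWordLength as "no upper limit", treating a null minWordLength as "no lower limit", ignoring traceRefId, and the function name apply_ngram_config are assumptions for this sketch. Because every word qualifies with a minimum length of 1, an 8-word sentence yields 8 + 7 + 6 + 5 = 26 n-grams.

```python
import json
import re

CONFIG = """
{"kind": "NGramTransformation", "size": 4, "minWordLength": 1,
 "maxWordLength": null, "traceRefId": null}
"""

def apply_ngram_config(text: str, config: dict) -> list[str]:
    """Apply an NGramTransformation-style JSON config to text (illustrative only)."""
    if config.get("kind") != "NGramTransformation":
        raise ValueError("not an NGramTransformation config")
    size = config["size"]
    min_len = config.get("minWordLength") or 0  # null or missing -> no lower bound
    max_len = config.get("maxWordLength")       # JSON null -> None -> no upper bound
    words = re.findall(r"\w+", text)            # assumption: split at non-word characters
    ok = [len(w) >= min_len and (max_len is None or len(w) <= max_len) for w in words]
    result = []
    for n in range(1, size + 1):
        for i in range(len(words) - n + 1):
            if all(ok[i:i + n]):                # every word in the window must qualify
                result.append(" ".join(words[i:i + n]))
    return result

grams = apply_ngram_config("this is the new Breithorn Release of Picturepark", json.loads(CONFIG))
print(len(grams))  # 26
print(grams[-1])   # Breithorn Release of Picturepark
```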