The NGramTransformation extracts n-grams (words or word sequences) from free-text fields such as descriptions. Those n-grams can then be used in a LookupCache transformation to find the correct tags. For example, it can take the IPTC description and extract words to check against a keyword list for tagging.
Input: "this is the new Breithorn Release of Picturepark"
NGramTransformation:
- Size: 1
- Minimum word length: 5
- Maximum word length: leave empty
Output: "Breithorn", "Release", "Picturepark"
Input: "this is the new Breithorn Release of Picturepark"
NGramTransformation:
- Size: 2
- Minimum word length: 5
- Maximum word length: leave empty
Output: "Breithorn Release"
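The two examples above can be approximated with a short Python sketch. This is not Picturepark's implementation: the function name `ngrams`, the tokenization rule (words are runs of letters or digits), and the rule that an n-gram is kept only when every word in it satisfies the length limits are assumptions chosen to match the examples. With Size 2 the sketch also returns the unigrams, in line with the Size description, whereas the example above lists only the bigram.

```python
import re


def ngrams(text, size, min_word_length=None, max_word_length=None):
    """Return n-grams of 1..size contiguous words.

    Assumption: an n-gram is kept only if every word in it
    satisfies the minimum and maximum word length.
    """
    # Assumption: words are runs of letters/digits; whitespace and
    # punctuation act as separators.
    words = re.findall(r"\w+", text)

    def length_ok(word):
        if min_word_length is not None and len(word) < min_word_length:
            return False
        if max_word_length is not None and len(word) > max_word_length:
            return False
        return True

    result = []
    for n in range(1, size + 1):
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if all(length_ok(w) for w in window):
                result.append(" ".join(window))
    return result


text = "this is the new Breithorn Release of Picturepark"
print(ngrams(text, size=1, min_word_length=5))
# ['Breithorn', 'Release', 'Picturepark']
print(ngrams(text, size=2, min_word_length=5))
# ['Breithorn', 'Release', 'Picturepark', 'Breithorn Release']
```

Under these assumptions "Release Picturepark" is not produced, because the short word "of" sits between the two words in the input.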
Specific Definitions
Select the condition.
Property | Value |
kind | NGramTransformation |
Size | The maximum n-gram size. A value of 3 produces unigrams, bigrams, and trigrams. Size is counted in words (as separated by whitespace and punctuation), not in characters, so set it to the number of words in the longest entry of your keyword list. |
Minimum word length | The minimum length a word must have to be considered for n-gram production. |
Maximum word length | The maximum length a word can have to be considered for n-gram production. |
{ "kind": "NGramTransformation", "size": 4, "minWordLength": 1, "maxWordLength": null, "traceRefId": null },
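As a rough illustration of choosing Size, the following Python sketch derives it from a keyword list and prints a configuration with the same fields as the JSON above. The keyword list is hypothetical, and the sketch assumes that Size is counted in words and that a null "maxWordLength" corresponds to leaving the maximum word length empty.

```python
import json

# Hypothetical keyword list; in practice this would be the list
# that the lookup is checked against.
keywords = ["Picturepark", "Breithorn Release", "digital asset management"]

# Assumption: Size counts words, so use the word count of the
# longest keyword phrase.
size = max(len(keyword.split()) for keyword in keywords)

config = {
    "kind": "NGramTransformation",
    "size": size,
    "minWordLength": 1,
    # None is serialized as null, mirroring "leave empty" in the examples.
    "maxWordLength": None,
    "traceRefId": None,
}

print(json.dumps(config, indent=2))
```

With this list the longest entry has three words, so the sketch sets "size" to 3.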