The NGramTransformation extracts n-grams from free-text fields such as descriptions. The resulting n-grams (words or word sequences) can then be used in a LookupCache transformation to find the correct tags. For example, take the IPTC description and extract words to check against a keyword list for tagging.

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation:

Size: 1
Minimum word length: 5
Maximum word length: leave empty

Output: Breithorn Release Picturepark

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation:

Size: 2
Minimum word length: 5
Maximum word length: leave empty

Output: Breithorn Release
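
The two examples above can be reproduced with a small sketch. This is not Picturepark's implementation: the function name `extract_ngrams`, the `\w+` tokenizer, and the rule that a filtered-out word (such as "of") breaks adjacency are assumptions chosen so that the sketch matches the outputs shown. It only emits n-grams of exactly the given size, whereas the property description below treats Size as a maximum, so the real transformation may also emit shorter n-grams.

```python
import re


def extract_ngrams(text, size, min_word_length=1, max_word_length=None):
    """Return word n-grams of exactly `size` words (illustrative sketch only)."""
    # Assumed tokenizer: a word is any run of letters, digits, or underscores.
    words = re.findall(r"\w+", text)

    # Group consecutive words that pass the length filter into runs, so a
    # rejected word breaks adjacency between its neighbours (assumption).
    runs, current = [], []
    for word in words:
        too_short = len(word) < min_word_length
        too_long = max_word_length is not None and len(word) > max_word_length
        if too_short or too_long:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(word)
    if current:
        runs.append(current)

    # Slide a window of `size` words over each run.
    ngrams = []
    for run in runs:
        for start in range(len(run) - size + 1):
            ngrams.append(" ".join(run[start:start + size]))
    return ngrams


text = "this is the new Breithorn Release of Picturepark"
print(extract_ngrams(text, size=1, min_word_length=5))
print(extract_ngrams(text, size=2, min_word_length=5))
```

With size 1 the sketch yields the three words of at least five characters; with size 2 it yields only "Breithorn Release", because "Release" and "Picturepark" are separated by the rejected word "of".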

Specific Definitions

Select the condition.

Property	Value
kind	NGramTransformation
Size	The maximum n-gram size, counted in words. A value of 3 produces unigrams, bigrams, and trigrams. The size counts words (as separated by punctuation), not characters, so set it to the number of words in the longest entry of your keyword list.
Minimum word length	The minimum number of characters a word must have to be considered for n-gram production.
Maximum word length	The maximum number of characters a word can have to be considered for n-gram production. Can be left empty.
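
One way to pick a value for Size is to count the words in the longest keyword of the list you plan to match against. The sketch below assumes keywords are separated into words by whitespace; the keyword list and the helper name `suggested_size` are hypothetical and not part of Picturepark.

```python
# Hypothetical keyword list; real entries would come from your own tagging list.
keywords = ["Breithorn", "Breithorn Release", "Picturepark Content Platform"]


def suggested_size(keyword_list):
    """Return the number of words in the longest keyword (1 for an empty list)."""
    return max((len(keyword.split()) for keyword in keyword_list), default=1)


size = suggested_size(keywords)
print(size)
```

Here the longest keyword has three words, so a Size of 3 would be needed for that keyword to appear among the produced n-grams.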

{
	"kind": "NGramTransformation",
	"size": 4,
	"minWordLength": 1,
	"maxWordLength": null,
	"traceRefId": null
},
