N-Gram Transformation

The NGramTransformation extracts n-grams (sequences of one or more words) from free-text fields such as descriptions. The resulting n-grams can then be used in a LookupCache transformation to find the matching tags. For example, it can take the IPTC description, extract words from it, and check them against a keyword list for tagging.

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 1

  • Minimum word length: 5

  • Maximum word length: leave empty

Output: Breithorn Release Picturepark

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 2

  • Minimum word length: 5

  • Maximum word length: leave empty

Output: Breithorn Release
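
The two examples above can be reproduced with a minimal Python sketch. It is not the Picturepark implementation. The function name `ngrams`, its parameters, and the tokenization rule are assumptions made for illustration. The sketch assumes that an n-gram is built from adjacent words of the input and is kept only if every word in it satisfies the length limits, which is the reading consistent with both outputs shown.

```python
import re


def ngrams(text, size, min_word_length=1, max_word_length=None):
    """Return n-grams of exactly `size` adjacent words (hypothetical helper).

    Assumption: an n-gram is kept only if every word in it satisfies the
    minimum and maximum word length.
    """
    # Assumption: words are runs of letters/digits; punctuation separates them.
    words = re.findall(r"\w+", text)

    def within_limits(word):
        if len(word) < min_word_length:
            return False
        return max_word_length is None or len(word) <= max_word_length

    result = []
    for start in range(len(words) - size + 1):
        window = words[start:start + size]
        if all(within_limits(word) for word in window):
            result.append(" ".join(window))
    return result


text = "this is the new Breithorn Release of Picturepark"
size_one = ngrams(text, size=1, min_word_length=5)
size_two = ngrams(text, size=2, min_word_length=5)
print(size_one)  # ['Breithorn', 'Release', 'Picturepark']
print(size_two)  # ['Breithorn Release']
```

In the second call, "Breithorn Release" is the only pair of adjacent words in which both words have at least 5 characters.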

Specific Definitions

| Property | Value |
| --- | --- |
| kind | NGramTransformation |
| Size | The maximum n-gram size. For example, a size of 3 produces unigrams, bigrams, and trigrams. The size is counted in words (as delimited by punctuation), not in characters, so set it to the number of words in the longest entry of your keyword list. |
| Minimum word length | The minimum length a word must have to be considered for n-gram production. |
| Maximum word length | The maximum length a word can have to be considered for n-gram production. |
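
To illustrate why the size should cover the longest keyword, the following Python sketch generates all n-grams up to a maximum size and checks them against a keyword list. It is an illustration only. The names `ngrams_up_to`, `find_tags`, and `KEYWORDS`, as well as the tag IDs, are hypothetical, and the case-insensitive lookup merely stands in for what a LookupCache transformation might do.

```python
import re


def ngrams_up_to(text, max_size, min_word_length=1, max_word_length=None):
    """Return all n-grams from size 1 up to `max_size` (hypothetical helper)."""
    # Assumption: words are runs of letters/digits; punctuation separates them.
    words = re.findall(r"\w+", text)

    def within_limits(word):
        if len(word) < min_word_length:
            return False
        return max_word_length is None or len(word) <= max_word_length

    found = []
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            window = words[start:start + size]
            # Assumption: every word of an n-gram must satisfy the limits.
            if all(within_limits(word) for word in window):
                found.append(" ".join(window))
    return found


# Hypothetical keyword list: lower-cased keyword -> tag ID.
KEYWORDS = {
    "picturepark": "tag-product",
    "breithorn release": "tag-release",
}


def find_tags(text, keywords, min_word_length=1):
    """Look up every n-gram in the keyword list and collect the matching tags."""
    # Size = number of words in the longest keyword.
    max_size = max(len(keyword.split()) for keyword in keywords)
    tags = []
    for candidate in ngrams_up_to(text, max_size, min_word_length):
        tag = keywords.get(candidate.lower())
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags


text = "this is the new Breithorn Release of Picturepark"
tags = find_tags(text, KEYWORDS, min_word_length=5)
print(tags)  # ['tag-product', 'tag-release']
```

With a size of 1, the two-word keyword "breithorn release" could never match, because only single words would be produced.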

Example: transform the input into n-grams of up to 4 words, where each word must have at least 1 character:

{ "kind": "NGramTransformation", "size": 4, "minWordLength": 1, "maxWordLength": null, "traceRefId": null },