Excerpt

The NGramTransformation extracts n-grams (single words or sequences of adjacent words) from free-text fields such as descriptions. Those n-grams can then be passed to a LookupCache transformation to find the matching tags, for example by extracting the words of the IPTC description and checking them against a keyword list for tagging.



Tip

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 1
  • Minimum word length: 5
  • Maximum word length: leave empty

Output: Breithorn Release Picturepark


Tip

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 2
  • Minimum word length: 5
  • Maximum word length: leave empty

Output: Breithorn Release
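
The two examples above can be approximated with a short Python sketch. It is not the product's implementation: the function name `ngrams`, the regular-expression tokenizer, and the rule that a window is dropped as soon as one of its words falls outside the length limits are all assumptions made for illustration. The sketch also returns the smaller n-grams alongside the larger ones, which is why the Size 2 call lists the single words in addition to "Breithorn Release".

```python
import re


def ngrams(text, size, min_len=None, max_len=None):
    """Return n-grams of 1..size adjacent words (hypothetical helper).

    A window is kept only if every word in it satisfies the length limits.
    """
    # Assumption: words are runs of letters/digits; everything else separates them.
    words = re.findall(r"\w+", text)

    def within_limits(word):
        if min_len is not None and len(word) < min_len:
            return False
        if max_len is not None and len(word) > max_len:
            return False
        return True

    result = []
    for n in range(1, size + 1):
        # Slide a window of n words over the original word sequence.
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if all(within_limits(w) for w in window):
                result.append(" ".join(window))
    return result


text = "this is the new Breithorn Release of Picturepark"
print(ngrams(text, size=1, min_len=5))
# ['Breithorn', 'Release', 'Picturepark']
print(ngrams(text, size=2, min_len=5))
# ['Breithorn', 'Release', 'Picturepark', 'Breithorn Release']
```

"Release Picturepark" is not produced in this sketch because the short word "of" sits between the two words in the input, so no window of two adjacent words contains both.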

Specific Definitions

Select the condition.

Property | Value
kind | NGramTransformation
Size | The maximum n-gram size. For example, a Size of 3 produces unigrams, bigrams, and trigrams. Size counts words, as separated by punctuation, rather than characters, so set it to the number of words in the longest entry of the keyword list.
Minimum word length | The minimum length a word must have to be considered for n-gram production.
Maximum word length | The maximum length a word can have to be considered for n-gram production.
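
To illustrate the Size guidance, the following Python sketch derives a Size from a keyword list and shows why a multi-word keyword is missed when Size is too small. The keyword list, the tag names, and the helper `find_tags` are hypothetical, and the exact, case-sensitive match is an assumption; the real LookupCache transformation may match differently.

```python
# Hypothetical keyword list: keyword -> tag to assign.
keywords = {
    "Picturepark": "product",
    "Breithorn Release": "release-notes",
}

# Size should cover the keyword with the most words (here: 2).
required_size = max(len(keyword.split()) for keyword in keywords)


def find_tags(ngram_list, lookup):
    """Return the sorted tags of all n-grams that exactly match a keyword."""
    return sorted({lookup[g] for g in ngram_list if g in lookup})


# N-grams as they might be produced with Size 1 and with Size 2.
unigrams = ["Breithorn", "Release", "Picturepark"]
with_bigrams = unigrams + ["Breithorn Release"]

print(required_size)                      # 2
print(find_tags(unigrams, keywords))      # ['product']
print(find_tags(with_bigrams, keywords))  # ['product', 'release-notes']
```

With Size 1 only the single-word keyword can match; the two-word keyword is found once bigrams are produced.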

...