
The NGramTransformation extracts n-grams (sequences of one or more consecutive words) from free-text fields such as descriptions. Those n-grams can then be used in a LookupCache transformation to find the correct tags. For example, take the IPTC description, extract its words, and check them against a keyword list for tagging.

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 1
  • Minimum word length: 5
  • Maximum word length: leave empty

Output: Breithorn Release Picturepark

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 2
  • Minimum word length: 5
  • Maximum word length: leave empty

Output: Breithorn Release
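The two examples can be reproduced with the small Python sketch below. It is a hypothetical re-implementation, not Picturepark code, and it rests on assumptions the page does not state: words are separated by spaces and punctuation, an n-gram is a run of consecutive words in the original text whose words all satisfy the length limits, and every size from 1 up to Size is produced (following the Size description below). Under these assumptions "Release Picturepark" is never a bigram because the short word "of" sits between the two words. The size-2 example above lists only the bigram, while this sketch also returns the unigrams.

```python
import re
from typing import List, Optional


def ngram_transform(text: str, size: int, min_word_length: int = 1,
                    max_word_length: Optional[int] = None) -> List[str]:
    # Hypothetical sketch of NGramTransformation; the splitting and
    # filtering rules are assumptions, not confirmed Picturepark behaviour.
    # Assumption: words are runs of letters/digits (regex \w+).
    words = re.findall(r"\w+", text)

    def accepted(word: str) -> bool:
        # A word passes if it is at least min_word_length long and,
        # when a maximum is set, no longer than max_word_length.
        if len(word) < min_word_length:
            return False
        return max_word_length is None or len(word) <= max_word_length

    result: List[str] = []
    # Produce unigrams, bigrams, ... up to `size`.
    for n in range(1, size + 1):
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            # Assumption: every word of the n-gram must pass the length filter.
            if all(accepted(w) for w in window):
                result.append(" ".join(window))
    return result


text = "this is the new Breithorn Release of Picturepark"
print(ngram_transform(text, size=1, min_word_length=5))
# ['Breithorn', 'Release', 'Picturepark']
print(ngram_transform(text, size=2, min_word_length=5))
# ['Breithorn', 'Release', 'Picturepark', 'Breithorn Release']
```

Leaving the maximum word length empty corresponds to passing None here, so no upper limit is applied.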

Specific Definitions

Select the condition.

  • kind: NGramTransformation
  • Size (size in JSON): The maximum n-gram size. For example, a size of 3 produces unigrams, bigrams, and trigrams. Size counts words, not characters (words are delimited by spaces and punctuation), so set it to the number of words in the longest entry of your keyword list.
  • Minimum word length (minWordLength in JSON): the minimum length a word must have to be considered for n-gram production.
  • Maximum word length (maxWordLength in JSON): the maximum length a word can have to be considered for n-gram production. Leave it empty (null) for no upper limit.
{
	"kind": "NGramTransformation",
	"size": 4,
	"minWordLength": 1,
	"maxWordLength": null,
	"traceRefId": null
}
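To illustrate how this definition and the advice on Size fit together, the Python sketch below loads the JSON, checks that Size covers the longest keyword phrase, and looks the resulting n-grams up in a keyword list. The keyword dictionary and its tag names are hypothetical stand-ins for a LookupCache, not its actual API, and case-insensitive matching is an assumption. The n-gram rules are the same assumptions as in the earlier sketch (consecutive words, all passing the length filter).

```python
import json
import re
from typing import Dict, List, Optional

# The definition shown above, as a standalone JSON object.
DEFINITION = """
{
    "kind": "NGramTransformation",
    "size": 4,
    "minWordLength": 1,
    "maxWordLength": null,
    "traceRefId": null
}
"""

# Hypothetical keyword list standing in for a LookupCache: keyword -> tag.
KEYWORDS: Dict[str, str] = {
    "breithorn release": "Release/Breithorn",
    "picturepark": "Product/Picturepark",
}


def words_of(text: str) -> List[str]:
    # Assumption: words are runs of letters/digits separated by spaces or punctuation.
    return re.findall(r"\w+", text)


def ngrams(text: str, size: int, min_len: int, max_len: Optional[int]) -> List[str]:
    words = words_of(text)

    def ok(w: str) -> bool:
        return len(w) >= min_len and (max_len is None or len(w) <= max_len)

    # Every run of 1..size consecutive words whose words all pass the length filter.
    return [
        " ".join(words[i:i + n])
        for n in range(1, size + 1)
        for i in range(len(words) - n + 1)
        if all(ok(w) for w in words[i:i + n])
    ]


def required_size(keywords: Dict[str, str]) -> int:
    # Size counts words, so it must cover the keyword with the most words.
    return max(len(words_of(k)) for k in keywords)


def tag(text: str, definition: dict, keywords: Dict[str, str]) -> List[str]:
    size = definition["size"]
    if size < required_size(keywords):
        raise ValueError(f"size {size} is too small for the keyword list")
    grams = ngrams(text, size, definition["minWordLength"], definition["maxWordLength"])
    # Assumption: keyword matching is case-insensitive.
    return sorted({keywords[g.lower()] for g in grams if g.lower() in keywords})


config = json.loads(DEFINITION)  # JSON null becomes Python None
assert config["kind"] == "NGramTransformation"
print(tag("this is the new Breithorn Release of Picturepark", config, KEYWORDS))
# ['Product/Picturepark', 'Release/Breithorn']
```

With a size of 1, the two-word keyword "breithorn release" could never be matched, which is why the sketch rejects a size smaller than the longest keyword phrase.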