Excerpt

The NGramTransformation extracts n-grams from free-text fields (e.g. descriptions) so that those n-grams (words or word sequences) can be used in a LookupCache transformation to find the correct tags. For example, take the IPTC description and extract words to check against a keyword list for tagging.

Tip

Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 1

  • Minimum word length: 5

  • Maximum word length: leave empty

Output: Breithorn Release Picturepark
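
As a sketch, the settings in this example could be written as the following transformation entry. It assumes the JSON property names used in the configuration example on this page (size, minWordLength, maxWordLength), and that leaving the maximum word length empty corresponds to null.

```json
{
  "kind": "NGramTransformation",
  "size": 1,
  "minWordLength": 5,
  "maxWordLength": null
}
```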


Input: "this is the new Breithorn Release of Picturepark"

NGramTransformation: 

  • Size: 2

  • Minimum word length: 5

  • Maximum word length: leave empty

Output: Breithorn Release
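
A possible JSON form of this second example, assuming the property names from the configuration example on this page and that an empty maximum word length maps to null. Only size differs from the size 1 case above.

```json
{
  "kind": "NGramTransformation",
  "size": 2,
  "minWordLength": 5,
  "maxWordLength": null
}
```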

Specific Definitions


  • kind: NGramTransformation

  • Size: The maximum n-gram size. A value of 3 produces unigrams, bigrams, and trigrams. The size depends on punctuation (word boundaries), not on the number of characters, so set it to match the longest entry, counted in words, in your keyword list.

  • Minimum word length: The minimum length a word must have to be considered for n-gram production.

  • Maximum word length: The maximum length a word can have to be considered for n-gram production.
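
To illustrate both length bounds together, a hypothetical configuration might look like the one below. The values 3, 4, and 20 are chosen purely for illustration, and an integer maxWordLength is an assumption made by analogy with minWordLength in the configuration example on this page.

```json
{
  "kind": "NGramTransformation",
  "size": 3,
  "minWordLength": 4,
  "maxWordLength": 20
}
```

Under these assumptions, words shorter than 4 or longer than 20 characters would not be considered, and the remaining words would be combined into n-grams of up to three words.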

Example: transform the input into n-grams of up to 4 words, where each word has at least 1 character.

```json
{
  "kind": "NGramTransformation",
  "size": 4,
  "minWordLength": 1,
  "maxWordLength": null,
  "traceRefId": null
}
```