Simple Search Analyzer

Access in search queries: simple

The simple search analyzer is a custom Picturepark implementation that does not use the Elasticsearch defaults. It splits text with the following regex:

  • Regex

    Code Block
    "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
  • Outcome:

    • Lowercase / Uppercase

    • Digit / non-digit

    • Stemming

    • HTML Strip

  • Examples

    • Picturepark = Picturepark, picturepark

    • Case Study = Case, Study, case, study

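To test the simple search analyzer, check your terms in a regex tester to see the outcome. The pattern uses Java regex syntax (for example the && character class intersection), so pick a Java flavor where the tester offers one, and replace each doubled backslash with a single one.

  1. Open a regex checker

    1. https://regex101.com/

    2. https://regexr.com/

  2. Enter the regex and add your term as a test string

  3. Check the outcome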
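
As an offline alternative to a regex tester, the split rules of the regex can be approximated in Python. This is only a sketch under stated assumptions: Python's re module supports neither \p{L} nor the && class intersection, so the hypothetical simple_split function below re-implements the five alternatives of the pattern with unicodedata categories instead of running the pattern itself. It treats \d as ASCII digits and covers splitting only, not stemming, HTML stripping, or case handling.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Approximates \p{L}: any Unicode letter category (Lu, Ll, Lt, Lm, Lo).
    return ch != "" and unicodedata.category(ch).startswith("L")


def is_upper(ch: str) -> bool:
    # Approximates \p{Lu}: uppercase letters only.
    return ch != "" and unicodedata.category(ch) == "Lu"


def is_digit(ch: str) -> bool:
    # Approximates \d as ASCII digits (an assumption about the regex flags).
    return ch != "" and ch in "0123456789"


def simple_split(text: str) -> list[str]:
    tokens: list[str] = []
    current = ""
    for i, ch in enumerate(text):
        if not (is_letter(ch) or is_digit(ch)):
            # Alternative 1: runs of non-letter, non-digit characters separate tokens.
            if current:
                tokens.append(current)
            current = ""
            continue
        prev = current[-1] if current else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        boundary = prev != "" and (
            # Alternatives 2 and 3: digit / non-digit boundary.
            is_digit(prev) != is_digit(ch)
            # Alternative 4: non-uppercase letter followed by an uppercase letter.
            or (is_letter(prev) and not is_upper(prev) and is_upper(ch))
            # Alternative 5: end of an uppercase run, e.g. "HTML|Parser".
            or (is_upper(prev) and is_upper(ch) and is_letter(nxt) and not is_upper(nxt))
        )
        if boundary:
            tokens.append(current)
            current = ""
        current += ch
    if current:
        tokens.append(current)
    return tokens


print(simple_split("Case Study"))     # ['Case', 'Study']
print(simple_split("CaseStudy2020"))  # ['Case', 'Study', '2020']
```

Because the function mirrors the pattern rather than executing it, results from a Java regex tester remain the reference when the two disagree.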

No Diacritics Analyzer

Access in search queries: no-diacritics

The no diacritics analyzer:

  • only works for text fields

  • strips diacritic characters, so when the text value is Kovačić Mateo, you can search for “Kovačić Mateo” or “Kovacic Mateo”.

An example can be found in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html
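
The folding step can be illustrated with a short Python sketch. It is an approximation, not the analyzer itself: the hypothetical strip_diacritics function decomposes characters with Unicode NFD normalization and drops the combining marks, whereas the Elasticsearch ASCII folding filter also maps characters that have no such decomposition.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # Decompose each character into its base letter plus combining marks (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_diacritics("Kovačić Mateo"))  # Kovacic Mateo
```

Applying the same folding to both the indexed value and the search term is what lets either spelling find the asset.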

Path Hierarchy Analyzer

Access in search queries: pathHierarchy

The path hierarchy analyzer will:

  • Take a path found in a field (picturepark\platform\manual) and emit one term for each level of the hierarchy

  • Example

    • picturepark\platform\manual = picturepark, picturepark\platform, picturepark\platform\manual

    • Products/Family/Industry = Products, Products/Family, Products/Family/Industry

Configure this analyzer only if it is used via the API. The simple search in Picturepark escapes special characters, so searching for some of the tokens generated by this analyzer will not find assets.

An example can be found in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html
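
The token generation described above can be sketched in Python as follows. The function name path_hierarchy_tokens and its delimiter parameter are hypothetical, and the sketch assumes paths without a leading delimiter; it only illustrates how one term per hierarchy level is produced.

```python
def path_hierarchy_tokens(path: str, delimiter: str = "/") -> list[str]:
    # Split the path into its levels, then rebuild every prefix of the hierarchy.
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


print(path_hierarchy_tokens("Products/Family/Industry"))
# ['Products', 'Products/Family', 'Products/Family/Industry']

# Backslash-delimited paths work the same way with a different delimiter.
tokens = path_hierarchy_tokens(r"picturepark\platform\manual", delimiter="\\")
print(len(tokens))  # 3
```

Each token still contains the delimiter, which is why a search that escapes special characters cannot match the multi-level tokens.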

Edge NGram Analyzer

Access in search queries: edgeNGram

This tokenizer is very similar to nGram but only keeps n-grams that start at the beginning of a token. Settings define the minimum and maximum gram length created on indexing, and token_chars, the character classes to keep in the tokens. Elasticsearch splits on characters that do not belong to any of these classes.

Examples are in the Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
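
To make the "n-grams that start at the beginning of a token" rule concrete, here is a minimal Python sketch. The function name edge_ngrams and the gram lengths 3 to 5 are assumptions chosen for illustration; the values actually configured in Picturepark are not stated here.

```python
def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    # Every edge n-gram is a prefix of the token, from min_gram up to max_gram
    # characters (capped at the token length).
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("Raven", min_gram=3, max_gram=5))  # ['Rav', 'Rave', 'Raven']
```

Because only prefixes are indexed, this supports search-as-you-type style matching with far fewer tokens than a full n-gram index.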

Ngram Analyzer

Access in search queries: ngram

Ngram tokenizing is the starting point for exact substring matches: it indexes all substrings of a token up to length n. The drawback is the large amount of disk space the index then requires.

Best practice:

  • Use ngram only where required; apply it selectively and not for every string

Settings define the minimum and maximum gram length created on indexing, and token_chars, the character classes to keep in the tokens. Elasticsearch splits on characters that do not belong to any of these classes.
Example: Search "Raven"

  • NGrams (splits the term into overlapping substrings):

    • Rav

    • Rave

    • Raven

    • ave

    • aven

    • ven

    • ...

Example: Search "Pegasus"

  • Matches (terms that share n-grams with the search term):

    • Pegasus

    • Degas

Examples are in Elastic Search Documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
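
The substring generation can be sketched in Python as shown below. The function name ngrams and the gram lengths 3 to 5 are assumptions for illustration only. The second part lists the n-grams that "pegasus" and "degas" have in common, which is one plausible reading of why both terms appear in the Pegasus example; lowercasing the terms is likewise an assumption.

```python
def ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    # For every start position, emit all substrings of min_gram..max_gram characters.
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


print(ngrams("Raven", min_gram=3, max_gram=5))
# ['Rav', 'Rave', 'Raven', 'ave', 'aven', 'ven']

# N-grams shared by two different terms.
shared = set(ngrams("pegasus", 3, 5)) & set(ngrams("degas", 3, 5))
print(sorted(shared))  # ['ega', 'egas', 'gas']
```

With these settings the seven-letter term "Pegasus" alone produces 12 tokens, which illustrates the disk space cost mentioned above.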

Keyword Lowercase Filter Analyzer

No access in search queries; only available in filters

The filter analyzer converts keywords in various casings to lowercase but does not tokenize words, e.g., jpg, JPG, and Jpg will be converted into one keyword: "jpg" for admins to use in filters and search.

This filter only allows exact matches on non-translated values, ignoring letter casing, and does not tokenize (split) any words.
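
The behavior can be pictured with a small Python sketch. The helper names keyword_lowercase and matches are hypothetical; the sketch only shows that the whole value is kept as a single lowercase keyword and compared exactly.

```python
def keyword_lowercase(value: str) -> str:
    # The whole value stays one keyword; only the letter casing is normalized.
    return value.lower()


def matches(filter_value: str, field_value: str) -> bool:
    # Exact match on the normalized keyword; no partial or per-word matching.
    return keyword_lowercase(filter_value) == keyword_lowercase(field_value)


print(keyword_lowercase("JPG"))             # jpg
print(matches("case study", "Case Study"))  # True
print(matches("case", "Case Study"))        # False
```

The last call returns False because the value is not split into words, so a single word cannot match a multi-word keyword.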

Language Analyzer

Access in search queries: language

There are several language analyzers available for Elasticsearch. Language analyzers apply language-specific stemming and remove language-specific stop words.
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
The current implementation uses the default Elasticsearch language analyzers listed in the link, with the default stop words and stemming rules and no custom adaptation.

Useful Links in Elasticsearch Documentation

Simple Analyzer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html

No Diacritics: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html

Path Hierarchy: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html

Language: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html

NGram: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

EdgeNgram: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-edgengram-tokenizer.html
