Predefined Page Facts


If the input text comes from a web page, RL3 lets you use page metadata in conditions by means of predefined page facts.

Every page that serves as input to an RL3 annotator can carry the following information:


URL

Page URL, e.g. http://europe.yamaha.com/en/products/proaudio/


domain

The full host name of the page URL, e.g. the domain for http://europe.yamaha.com/en/products/proaudio/ is "europe.yamaha.com".


domain_name

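The registered domain, i.e. the host name without subdomains (referred to on this page as a "top-level domain"), e.g. the page http://europe.yamaha.com/en/products/proaudio/ has the domain name "yamaha.com".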


domain_basename

The "base name" of a domain, i.e. the domain name without a zone (e.g. .com, .ru, .co.uk) or hosting info (e.g. .narod.ru). For the page http://europe.yamaha.com/en/products/proaudio/, domain_basename is "yamaha". For the page http://www.zorallabs.com/, domain_basename is "zorallabs".
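
The relation between domain, domain_name and domain_basename can be illustrated with a small Python sketch. This is not how RL3 computes the facts internally (the wiki does not say); the function name page_domain_facts and the tiny KNOWN_SUFFIXES set are hypothetical, and treating a hosting suffix such as narod.ru the same way as a zone for domain_name is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny list of zones and hosting suffixes.
# A real implementation would need a complete list.
KNOWN_SUFFIXES = {"com", "ru", "co.uk", "narod.ru"}


def page_domain_facts(url):
    """Return (domain, domain_name, domain_basename) for a page URL."""
    domain = urlsplit(url).hostname or ""
    labels = domain.split(".")
    # Walk from the longest candidate suffix to the shortest one, so that
    # "co.uk" is preferred over "uk" and "narod.ru" over "ru".
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            basename = labels[i - 1]
            return domain, basename + "." + suffix, basename
    # No known suffix: fall back to the host name for all three values.
    return domain, domain, domain


print(page_domain_facts("http://europe.yamaha.com/en/products/proaudio/"))
# ('europe.yamaha.com', 'yamaha.com', 'yamaha')
```

The two pages used as examples above yield the values quoted in the text: "yamaha" and "zorallabs" as the base names.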


languages

Page languages. If a page uses more than one language, several values will be available for the fact.


E.g. to annotate only pages with German text:

  annotation
       Product=product
  search text
       {product={german_specific_product_pattern}}
  if
       each xx in languages
            match xx german


In the example above, the predicate is executed only if every language of the page matches \Agerman\Z, i.e. the page has a single language and it equals "german". If it is enough that at least one of the page's languages is German, use the following (with either 'match' or 'search'):

  annotation
       Product=product
  search text
       {product={german_specific_product_pattern}}
  if
       any xx in languages
            search xx german
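
The difference between 'each' and 'any', and between 'match' and 'search', can be modeled in Python as follows. This is only an illustration of the semantics described above, not RL3 code: the helper names are hypothetical, 'match' is assumed to test the whole value (\Agerman\Z) and 'search' to look for the pattern anywhere in the value.

```python
import re


def rl3_match(value, pattern):
    # 'match' modeled as a whole-value test, i.e. \Apattern\Z.
    return re.fullmatch(pattern, value) is not None


def rl3_search(value, pattern):
    # 'search' modeled as "pattern occurs somewhere in the value".
    return re.search(pattern, value) is not None


def each_matches(values, pattern):
    # 'each xx in values / match xx pattern': every value must match.
    # Note: Python's all() is True for an empty list; how RL3 treats a
    # page with no languages is not stated on this page.
    return all(rl3_match(v, pattern) for v in values)


def any_searches(values, pattern):
    # 'any xx in values / search xx pattern': one hit is enough.
    return any(rl3_search(v, pattern) for v in values)


languages = ["german", "english"]
print(each_matches(languages, "german"))  # False: "english" does not match
print(any_searches(languages, "german"))  # True: one language is German
```

For a bilingual German and English page the first condition fails and the second one holds, which is why the 'any' form is the one to use for mixed-language pages.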


category

Page category, e.g. "index", "about", "contacts". Category can have more than one value.


title

Page title, extracted from the HTML markup (i.e. the text of the <title>...</title> element). E.g. for the page http://zorallabs.com/ the title is "Home Page - Zoral Labs".


description

Page description, extracted from the HTML markup (i.e. the content attribute of the <meta name="description" content="..."> element).
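
To show where the title and description values come from in the markup, here is a minimal Python sketch built on the standard html.parser module. It is not the RL3 extractor; the class and function names are hypothetical, and reading the description from the content attribute of the meta element is an assumption based on how HTML defines that element.

```python
from html.parser import HTMLParser


class PageMetaParser(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # The description text lives in the 'content' attribute.
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>.
        if self._in_title:
            self.title += data


def page_title_and_description(html):
    parser = PageMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description.strip()


sample = (
    "<html><head><title>Home Page - Zoral Labs</title>"
    '<meta name="description" content="Example description">'
    "</head><body></body></html>"
)
print(page_title_and_description(sample))
```

The sample markup is made up for the illustration; only the title string is taken from the example above.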


pathway

So-called "breadcrumbs" (http://en.wikipedia.org/wiki/Breadcrumb_(navigation)), often used for navigation on corporate websites. E.g. for the page http://zorallabs.com/company/management-team the value is "Home › Company › Management Team".


TBD: the following facts are not documented yet.

abstract

lpaths

parent_categories

mf_parent_categories

mf_lpaths

smcat

ltexts

mf_ltexts

mp_ltexts

ltexts_from_index