RL3 Best Practices

Most Common Mistakes

Using ^ and $ anchors instead of \A and \Z

It is often necessary to check the context around a matched pattern. For example, we may need to forbid creation of a Person entity when the match is followed by Ltd.


    Person = person [weight="1.0"]
search text
    not match $> ^ Ltd

The ^ matcher matches the start of a line, and the right context of our entity may contain many line starts, so the rule above can be triggered by an Ltd at the beginning of any later line. It is highly recommended to use the \A matcher (the start of the sequence) instead.


Way 1 (with \A):

    Person = person [weight="1.0"]
search text
    not match $> \A Ltd

Way 2 (without \A):

    Person = person [weight="1.0"]
search text
    {person={pattern_person}}(?! Ltd)
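The difference between the two anchors can also be reproduced outside RL3. The following is a minimal sketch that uses Python's re module purely as a stand-in for the RL3 engine; the function names and the sample right context are hypothetical. In Python, ^ with re.MULTILINE matches at the start of every line, while \A matches only at the start of the string.

```python
import re

# Hypothetical right context of a Person match: "Ltd" does not follow the
# entity directly, but it happens to open a later line.
RIGHT_CONTEXT = " joined the board.\nLtd filings are due in May."


def blocked_with_caret(right_context):
    """True if 'Ltd' starts ANY line of the right context (the ^ behaviour)."""
    return re.search(r"^\s*Ltd\b", right_context, re.MULTILINE) is not None


def blocked_with_start_anchor(right_context):
    """True only if 'Ltd' opens the right context itself (the \\A behaviour)."""
    return re.search(r"\A\s*Ltd\b", right_context) is not None


print(blocked_with_caret(RIGHT_CONTEXT))         # line-start check
print(blocked_with_start_anchor(RIGHT_CONTEXT))  # sequence-start check
```

In this sketch the ^ version rejects the Person because of an unrelated Ltd on the next line, while the \A version only looks at what immediately follows the entity.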

Productivity and Optimization (Avoiding Slow-Down Issues)

Below are the most common RL3 constructs that can slow the annotator in the current version. They must be avoided.

(.*.*) vs (.*)

pattern fast
    (.*)
pattern slow1
    (.*.*)
pattern something_rare
    (this pattern rarely occurs in text)
pattern use_slow1
    ({slow1}{something_rare})
pattern use_fast
    ({fast}{something_rare})

Patterns fast and slow1 are semantically equivalent; however, there is a great difference in how much work the engine does to parse them.

Suppose {something_rare} = "test":

  • there is only one way to parse ({fast}{something_rare});
  • there are 5 ways to parse ({slow1}{something_rare}).

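The following table lists the splits in which both subexpressions are non-empty; the remaining two are the splits in which one of the subexpressions is empty:

    first subexpression .*    second subexpression .*
    t                         est
    te                        st
    tes                       t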

With use_slow1, the RL3 engine makes 5 useless parsing attempts at every position where the pattern does not match, and each attempt ends in a rejection (because there is no match for the pattern {something_rare}).
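The count of 5 can be checked by enumerating the splits directly. This is a small illustrative sketch in Python, not RL3 engine code; the helper names are hypothetical. It lists every way two consecutive .* subexpressions can divide a piece of text, compared with the single way one .* can consume it.

```python
def two_star_splits(text):
    """All ways for (.*.*) to divide text between its two subexpressions."""
    # The boundary can sit before the first character, between any two
    # characters, or after the last one: len(text) + 1 positions in total.
    return [(text[:i], text[i:]) for i in range(len(text) + 1)]


def one_star_splits(text):
    """The single way for (.*) to consume the same text."""
    return [(text,)]


for first, second in two_star_splits("test"):
    print(repr(first), repr(second))

print(len(two_star_splits("test")), "vs", len(one_star_splits("test")))
```

The number of splits grows with the length of the text, so longer inputs make the redundant attempts correspondingly more numerous.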

.*(delimiter)?.* vs .*(delimiter.*)?

".*(delimiter)?.*" is a disguised variant of ".*.*".

Since the (delimiter)? group is optional, whenever the delimiter is absent the pattern behaves like the following (incorrect):

    .*.*


As described above, the .*.* pattern causes slow-downs and hurts performance. Use the following pattern instead (correct):

    .*(delimiter.*)?
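To see why moving the second .* inside the optional group helps, the number of possible parses can be counted for each form. The sketch below is a hypothetical Python model, not RL3 code: it assumes a non-empty literal delimiter and that . may match any character, including those of the delimiter.

```python
def delimiter_positions(text, delimiter):
    """Indexes at which the literal delimiter occurs in text."""
    return [i for i in range(len(text) + 1) if text.startswith(delimiter, i)]


def parses_slow(text, delimiter):
    """Number of ways .*(delimiter)?.* can parse the whole text."""
    without_group = len(text) + 1  # group skipped: behaves like .*.*
    with_group = len(delimiter_positions(text, delimiter))
    return without_group + with_group


def parses_fast(text, delimiter):
    """Number of ways .*(delimiter.*)? can parse the whole text."""
    without_group = 1  # group skipped: the single .* takes everything
    with_group = len(delimiter_positions(text, delimiter))
    return without_group + with_group


for sample in ("abcd", "ab;cd"):
    print(sample, parses_slow(sample, ";"), parses_fast(sample, ";"))
```

Under these assumptions the first form always carries the len(text) + 1 redundant splits, while the second form adds only one parse per delimiter occurrence.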


Need for a defined left context

If the left margin is not clearly defined, e.g. (.{1,100}{entity_suffix}), it can cause a slow-down when parsing.

Consider the following solution instead:


Where the {left_margin} stopper may include: the start of the line, punctuation, a copyright sign, or particular words (e.g. address, hotel, corporate prefix, etc.).
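The effect of a left stopper can be illustrated with an ordinary regular expression. The following is a sketch in Python, used as a stand-in for RL3; the stopper set (start of line, punctuation, copyright sign), the suffix list and the sample sentence are all assumptions chosen for the example.

```python
import re

# Hypothetical stand-ins for {left_margin} and {entity_suffix}.
LEFT_MARGIN = r"(?:^|[.,;:©])"
SUFFIX = r"(?:Ltd|Inc)\b"

# Unbounded form: any 1-100 characters may precede the suffix, so a
# candidate can start at every position of the text.
UNBOUNDED = re.compile(r"(.{1,100}\s" + SUFFIX + ")")

# Bounded form: a candidate may start only right after a stopper and may
# not run across another stopper.
BOUNDED = re.compile(
    LEFT_MARGIN + r"\s*([^.,;:©\n]{1,100}?\s" + SUFFIX + ")",
    re.MULTILINE,
)

TEXT = "Copyright © Acme Widgets Ltd. All rights reserved."

print(UNBOUNDED.search(TEXT).group(1))
print(BOUNDED.search(TEXT).group(1))
```

In this sketch the bounded form both limits where a candidate can begin and keeps the unrelated word before the copyright sign out of the captured entity.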

Factoring out common prefixes

pattern slow
pattern faster
  • slow pattern: the prefix is matched 3 times, once for each combination with suffixes 1, 2 and 3;
  • faster pattern: the prefix is matched only once, which speeds up execution.
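The same refactoring can be shown with ordinary regular expressions. This is a minimal sketch in Python rather than RL3; the prefix and the three suffixes are hypothetical. It checks only that the two forms accept the same strings and does not measure speed, since Python's engine may behave differently from the RL3 annotator.

```python
import re

# Hypothetical prefix and suffixes 1, 2, 3.
PREFIX = "Bank of "
SUFFIXES = ["England", "Canada", "Japan"]

# Slow form: the prefix is repeated inside every alternative.
SLOW = re.compile("|".join(re.escape(PREFIX + s) for s in SUFFIXES))

# Faster form: the prefix is written once, followed by the alternatives.
FASTER = re.compile(
    re.escape(PREFIX) + "(?:" + "|".join(re.escape(s) for s in SUFFIXES) + ")"
)

SAMPLES = ["Bank of England", "Bank of Japan", "Bank of Mars", "Bank"]
for sample in SAMPLES:
    print(sample, bool(SLOW.fullmatch(sample)), bool(FASTER.fullmatch(sample)))
```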