RL3 Best Practices

Most Common Mistakes

Using ^ and $ anchors instead of \A and \Z

It is often necessary to check the context around a matched pattern. For example, suppose a Person entity must not be created when the match is followed by Ltd.

Incorrect:

annotation
    Person = person [weight="1.0"]
search text
    {person={pattern_person}}
if
    not match $> ^ Ltd

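The ^ matcher denotes the start of a line, and the right context of our entity may contain many line starts, so the condition can fire on an Ltd that begins any later line rather than only on one that directly follows the entity. It is highly recommended to use the \A matcher (the start of the sequence) instead.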

Correct:

Way 1 (with \A):

annotation
    Person = person [weight="1.0"]
search text
    {person={pattern_person}}
if
    not match $> \A Ltd

Way 2 (without \A):

annotation
    Person = person [weight="1.0"]
search text
    {person={pattern_person}}(?! Ltd)
if
    true
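
The difference between a line-start anchor and a sequence-start anchor can be reproduced in any regex engine that offers both. The sketch below uses Python's re module as a stand-in, not RL3 itself. The sample right contexts, the name John Smith, and the helper blocked_by are assumptions made for illustration, and re.MULTILINE is assumed to mirror the per-line behaviour of ^ described above.

```python
import re

# Hypothetical right contexts of a matched Person entity (illustration only).
FOLLOWED_BY_LTD = " Ltd announced its results."
LTD_ON_LATER_LINE = " went home.\n Ltd is a company suffix."

# '^' with MULTILINE matches at the start of every line in the context.
LINE_START = re.compile(r"^ Ltd", re.MULTILINE)
# '\A' matches only at the very start of the context.
SEQUENCE_START = re.compile(r"\A Ltd")
# Negative lookahead, analogous to Way 2 (the person pattern is hard-coded here).
PERSON_NOT_LTD = re.compile(r"John Smith(?! Ltd)")


def blocked_by(anchor: re.Pattern, right_context: str) -> bool:
    """Return True if the 'not match' condition would reject the entity."""
    return anchor.search(right_context) is not None


# Both anchors reject an entity that is directly followed by Ltd.
print(blocked_by(LINE_START, FOLLOWED_BY_LTD), blocked_by(SEQUENCE_START, FOLLOWED_BY_LTD))
# Only the line-start anchor rejects an entity because of an Ltd on a later line.
print(blocked_by(LINE_START, LTD_ON_LATER_LINE), blocked_by(SEQUENCE_START, LTD_ON_LATER_LINE))
# The lookahead variant behaves like the \A variant.
print(PERSON_NOT_LTD.search("John Smith" + LTD_ON_LATER_LINE) is not None)
```

In this sketch the line-start variant rejects a valid Person only because some later line happens to begin with Ltd, which is the false rejection the \A form avoids.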

Productivity and Optimization (Avoiding Slow-Down Issues)

Below are the most common RL3 constructs that can slow down the annotator in the current version. Avoid them.

(.*.*) vs (.*)

pattern fast
    (.*)

pattern slow1
    (.*.*)

pattern something_rare
    (this pattern rarely occurs in text)

pattern use_slow1
    ({slow1}{something_rare})

pattern use_fast
    ({fast}{something_rare})

Patterns fast and slow1 are semantically equivalent; however, their performance differs greatly.

Suppose {something_rare} = "test":

there is only one way to parse ({fast}{something_rare});
there are 5 ways to parse ({slow1}{something_rare}).

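The following table lists the five ways in which the two .* subexpressions of slow1 can divide the string test between them:

first subexpression .*    second subexpression .*
(empty)                   test
t                         est
te                        st
tes                       t
test                      (empty)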

With use_slow1, the RL3 engine makes 5 useless parsing attempts at every position where the pattern does not match, and each attempt ends in a rejection (as there are no matches for the pattern {something_rare}).
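
The count of parses can be checked with a few lines of Python. This is only a sketch of the combinatorics, not of the RL3 engine: the function names are hypothetical, and it assumes a naive matcher that tries every division of the text among consecutive .* subexpressions.

```python
import re
from itertools import combinations_with_replacement


def two_way_splits(text: str) -> list[tuple[str, str]]:
    """All ways two consecutive .* subexpressions can share out the text."""
    return [(text[:i], text[i:]) for i in range(len(text) + 1)]


def count_parses(text: str, stars: int) -> int:
    """Count ways to divide text among `stars` consecutive .* subexpressions."""
    # Each parse is a non-decreasing choice of (stars - 1) cut points.
    cut_points = range(len(text) + 1)
    return sum(1 for _ in combinations_with_replacement(cut_points, stars - 1))


for first, second in two_way_splits("test"):
    print(repr(first), repr(second))

print(count_parses("test", 1))  # one .*
print(count_parses("test", 2))  # two .* in a row
print(count_parses("test", 3))  # three .* in a row

# Both forms accept exactly the same strings; only the amount of work differs.
assert bool(re.fullmatch(r"(.*.*)", "test")) == bool(re.fullmatch(r"(.*)", "test"))
```

Under this assumption the number of parses grows with both the length of the text and the number of adjacent .* subexpressions, which is why the redundant form is costly when the rest of the pattern rarely matches.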

.*(delimiter)?.* vs .*(delimiter.*)?

The pattern .*(delimiter)?.* is a disguised variant of .*.*

Since the (delimiter)? group is optional, the pattern is equivalent to the following alternation, whose first branch is the slow .*.* pattern (incorrect):

   (.*.*|.*delimiter.*)

As described above, the .*.* pattern causes slow-downs and thus hurts performance. Use the following pattern instead (correct):

   .*(delimiter.*)?
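
The effect of moving the second .* inside the optional group can be sketched in Python. The counting functions below are hypothetical and model a naive backtracking matcher, not the actual RL3 engine; the delimiter ", " and the sample strings are assumptions chosen for illustration.

```python
import re


def count_parses_slow(text: str, delim: str) -> int:
    """Ways to parse text with .*(delim)?.* in a naive backtracking matcher."""
    ways = 0
    for i in range(len(text) + 1):
        ways += 1  # group skipped: first .* takes text[:i], second takes text[i:]
        if text.startswith(delim, i):
            ways += 1  # group taken: delimiter found at position i
    return ways


def count_parses_fast(text: str, delim: str) -> int:
    """Ways to parse text with .*(delim.*)? in a naive backtracking matcher."""
    ways = 1  # group skipped: the single .* takes the whole text
    for i in range(len(text) + 1):
        if text.startswith(delim, i):
            ways += 1  # group taken at this delimiter occurrence
    return ways


DELIM = ", "
SLOW = re.compile(r".*(" + re.escape(DELIM) + r")?.*")
FAST = re.compile(r".*(" + re.escape(DELIM) + r".*)?")

for sample in ["a, b", "ab", "", ", "]:
    # Both patterns accept the same strings.
    assert bool(SLOW.fullmatch(sample)) == bool(FAST.fullmatch(sample))
    print(repr(sample), count_parses_slow(sample, DELIM), count_parses_fast(sample, DELIM))
```

In this model the rewritten pattern has one parse plus one per delimiter occurrence, while the original adds a further parse for every position in the text.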

Need for a defined left context

If the left margin is not clearly defined, e.g. (.{1,100}{entity_suffix}), parsing can slow down.

Consider the following solution instead:

   ({left_margin}.{1,100}{entity_suffix})

Here the {left_margin} stopper may include: the start of a line, punctuation, a copyright sign, or particular words (e.g. address, hotel, corporate prefix, etc.).
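
One way to picture the benefit of a left margin is to count the positions at which a match attempt may begin. The Python sketch below is an illustration under stated assumptions, not RL3 behaviour: LEFT_MARGIN is a hypothetical stopper set (start of a line, or punctuation or a copyright sign followed by optional whitespace), and the sample text is invented.

```python
import re

# Hypothetical left-margin stoppers (assumption, not an RL3 definition).
LEFT_MARGIN = re.compile(r"^|[.,;:!?©]\s*", re.MULTILINE)


def candidate_starts(text: str) -> list[int]:
    """Positions right after a left-margin stopper, where an entity may begin."""
    return [m.end() for m in LEFT_MARGIN.finditer(text)]


text = "Contact us. Acme Widgets Ltd sells tools; Bolt Ltd repairs them."
starts = candidate_starts(text)

# Without a left margin, every position in the text is a candidate start.
print("positions without a left margin:", len(text) + 1)
print("positions with a left margin:", len(starts))
for pos in starts:
    print(pos, repr(text[pos:pos + 12]))
```

With the stopper in place, the expensive .{1,100} part only needs to be tried from a handful of positions instead of from every character.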

Factoring out common prefixes

pattern slow
    (
         {heavy_common_prefix}{suffix1}
        |{heavy_common_prefix}{suffix2}
        |{heavy_common_prefix}{suffix3}
    )

pattern faster
    (
        {heavy_common_prefix}({suffix1}|{suffix2}|{suffix3})
    )

slow pattern: the prefix is matched 3 times, once for each combination with suffixes 1, 2 and 3;
faster pattern: the prefix is matched only once, which speeds up execution.
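
The difference between the slow and faster patterns can be sketched with a toy matcher in Python that counts how often the prefix is evaluated. Everything here is hypothetical: the prefix string, the suffixes, and the matcher functions are invented for illustration and only model an engine that tries alternatives one by one without sharing work between them.

```python
# Counter for how many times the (supposedly heavy) prefix is evaluated.
calls = {"prefix": 0}

PREFIX = "International Business "  # hypothetical heavy common prefix


def match_prefix(text: str, pos: int):
    """Stand-in for an expensive prefix matcher; returns the end position or None."""
    calls["prefix"] += 1
    return pos + len(PREFIX) if text.startswith(PREFIX, pos) else None


def match_slow(text: str, suffixes: list[str]) -> bool:
    """prefix+suffix1 | prefix+suffix2 | ...: the prefix is re-matched per branch."""
    for suffix in suffixes:
        end = match_prefix(text, 0)
        if end is not None and text.startswith(suffix, end):
            return True
    return False


def match_faster(text: str, suffixes: list[str]) -> bool:
    """prefix(suffix1|suffix2|...): the prefix is matched once."""
    end = match_prefix(text, 0)
    return end is not None and any(text.startswith(s, end) for s in suffixes)


text = "International Business Group"
suffixes = ["Machines", "Systems", "Group"]

calls["prefix"] = 0
slow_result = match_slow(text, suffixes)
slow_calls = calls["prefix"]

calls["prefix"] = 0
faster_result = match_faster(text, suffixes)
faster_calls = calls["prefix"]

print(slow_result, slow_calls)
print(faster_result, faster_calls)
```

Both functions return the same answer; in this model the saving comes only from evaluating the shared prefix once instead of once per alternative.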
