Text Processing Pipeline

The goal of a text processing/annotation pipeline is to produce:

meaningful output (i.e. data annotated from text, e.g. a location) ...
from raw input (i.e. unstructured data, e.g. text of any kind: a webpage, a non-annotated corpus of news articles, e-mails, etc.).

In terms of RL3 basic concepts, both the input and the output are Facts. Therefore:

the pipeline takes Facts as input (e.g. the text of a webpage);
as a result of processing, the pipeline produces new Facts (e.g. the category of a webpage), which can either be extracted into an output Factsheet or used to update an already existing RL3 Factsheet.
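The Facts-in, Facts-out idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fact` class, its `label` and `value` fields, and the `toy_pipeline` function are hypothetical and are not part of RL3; the keyword check stands in for real annotation patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical stand-in for an RL3 Fact: a labelled piece of data."""
    label: str   # kind of fact, e.g. "text" or "category" (assumed naming)
    value: str   # the data itself


def toy_pipeline(input_facts):
    """Take Facts as input and return the new Facts derived from them.

    The keyword test below is a placeholder for real annotation logic.
    """
    new_facts = []
    for fact in input_facts:
        if fact.label == "text":
            is_weather = "forecast" in fact.value.lower()
            category = "weather" if is_weather else "other"
            new_facts.append(Fact("category", category))
    return new_facts


# Input Fact: the text of a webpage. Output Fact: the category of that webpage.
page = Fact("text", "Weekend forecast: sunny with light wind.")
new_facts = toy_pipeline([page])
```

The input list is left untouched and the derived Facts are returned separately, which mirrors the idea that the caller decides whether to keep them apart or merge them into an existing Factsheet.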

A simplified RL3 pipeline may consist of the following steps:

RL3_engine object creation.
RL3_engine object initialization (as an RL3 type object).
Processing annotation patterns, in one of two ways:
1. if an RL3 model has already been compiled (by the built-in Compiler), the file containing the compiled engine model can be loaded;
2. otherwise, the model can be compiled directly from RL3 sources:
  1. first, by parsing the source (inline, from a single RL3 Module file, or from a whole RL3 Project);
  2. then, by linking (compiling) the data from the individual parses.
Factsheet creation.
Assertion of fact(s) (e.g. input text) to a factsheet.
Running the engine to perform the necessary annotations.
Depending on the selected execution mode, this is done in one of two ways:
1. execute (e.g. extract all annotated items and add the resulting facts to an output factsheet);
2. update (e.g. update the facts in an existing factsheet).
RL3 object deletion.
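
The steps above can be mirrored in a small Python mock-up. It is a sketch of the lifecycle only, not the RL3 API: `ToyEngine`, `ToyFactsheet`, and every method name below are hypothetical stand-ins, regular expressions replace real RL3 patterns, and only the "compile from inline source" route is shown (loading a precompiled model file is omitted).

```python
import re


class ToyFactsheet:
    """Hypothetical stand-in for an RL3 Factsheet: a list of (label, value) facts."""

    def __init__(self):
        self.facts = []

    def assert_fact(self, label, value):
        self.facts.append((label, value))


class ToyEngine:
    """Hypothetical stand-in for RL3_engine; none of these methods are the real API."""

    def __init__(self):
        self.parsed = []   # results of the individual parses
        self.model = None  # linked (compiled) model, built by link()

    def parse_inline(self, label, pattern):
        # Parsing step: collect one annotation pattern given inline.
        self.parsed.append((label, pattern))

    def link(self):
        # Linking step: compile all parsed patterns into one model.
        self.model = [(label, re.compile(p)) for label, p in self.parsed]

    def _annotate(self, factsheet):
        if self.model is None:
            raise RuntimeError("no model has been loaded or linked")
        found = []
        for fact_label, value in factsheet.facts:
            if fact_label != "text":
                continue
            for label, regex in self.model:
                for match in regex.finditer(value):
                    found.append((label, match.group(0)))
        return found

    def execute(self, factsheet):
        # Execute mode: put the resulting facts into a new output factsheet.
        output = ToyFactsheet()
        for label, value in self._annotate(factsheet):
            output.assert_fact(label, value)
        return output

    def update(self, factsheet):
        # Update mode: add the resulting facts to the existing factsheet.
        for label, value in self._annotate(factsheet):
            factsheet.assert_fact(label, value)


engine = ToyEngine()                                        # creation and initialization
engine.parse_inline("location", r"\b(?:Paris|Berlin)\b")    # parse an inline source
engine.link()                                               # link the parses
sheet = ToyFactsheet()                                      # factsheet creation
sheet.assert_fact("text", "Flights from Paris to Berlin resume on Monday.")
extracted = engine.execute(sheet)                           # mode 1: execute
engine.update(sheet)                                        # mode 2: update
del engine                                                  # object deletion
```

Both modes are called here only to show the difference: `execute` leaves the input factsheet unchanged and returns a separate one, while `update` modifies the factsheet it was given. In a real pipeline one mode would normally be chosen.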
