Text Processing Pipeline
The goal of a text processing (annotation) pipeline is to produce:
- meaningful output (data annotated from text, e.g. a location) ...
- from raw input (unstructured data, e.g. text of any kind: a webpage, a non-annotated corpus of news articles, e-mails, etc.).
Both input and output are considered Facts in terms of RL3 basic concepts. Therefore, it can be stated that:
- the pipeline takes Facts as input (e.g. the text of a webpage);
- as a result of processing, the pipeline produces new Facts (e.g. the category of a webpage), which can either be extracted into an output Factsheet or used to update an already existing RL3 Factsheet.
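The "Facts in, Facts out" idea can be illustrated with a minimal Python sketch. This is not RL3 code: the `Fact` class, the `categorize` function, and the keyword rule are hypothetical stand-ins chosen only to show the shape of the data flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical stand-in for a fact: a label plus a value."""
    label: str
    value: str


def categorize(facts):
    """Toy 'pipeline': read 'text' facts and emit 'category' facts."""
    produced = []
    for fact in facts:
        if fact.label == "text":
            # Illustrative rule only; real annotation patterns are far richer.
            category = "news" if "reported" in fact.value.lower() else "other"
            produced.append(Fact("category", category))
    return produced


input_facts = [Fact("text", "The agency reported heavy rain in Berlin.")]
output_facts = categorize(input_facts)
```

Here the input text and the resulting category are both plain `Fact` values, so the output of one step can be fed into another or stored next to the input.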
A simplified RL3 pipeline may look as follows:
- RL3_engine object creation.
- RL3_engine object initialization (as an RL3 type object).
- Processing annotation patterns, in one of two ways:
  - if an RL3 model has already been compiled (by the built-in Compiler), the file with the compiled engine model can be loaded;
  - alternatively, the model can be compiled directly from RL3 sources:
    - first, by parsing the source (inline, from a single RL3 Module file, or from a whole RL3 Project);
    - then, by linking (compiling) the data from the individual parses.
- Factsheet creation.
- Assertion of fact(s) (e.g. input text) to a factsheet.
- Running the engine to perform the necessary annotations, in one of two ways (depending on the selected execution mode):
  - execute (e.g. extract all annotated items and add the resulting facts to an output factsheet);
  - update (e.g. update the facts in an existing factsheet).
- RL3 object deletion.
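The life cycle above can be sketched in Python with a toy engine. Everything in this sketch is an assumption made for illustration: `ToyEngine`, `ToyFactsheet`, and their methods are hypothetical names, the "model" is just a set of regular expressions stored as JSON, and none of it reflects the actual RL3 API.

```python
import json
import re
import tempfile
from pathlib import Path


class ToyFactsheet:
    """Hypothetical factsheet: maps a label to a list of values."""

    def __init__(self):
        self.facts = {}

    def assert_fact(self, label, value):
        self.facts.setdefault(label, []).append(value)


class ToyEngine:
    """Hypothetical annotation engine (not the RL3 API)."""

    def __init__(self):
        self.patterns = {}  # label -> compiled regular expression

    def compile_sources(self, sources):
        # "Parse" each source unit, then "link" all units into one model.
        parsed = [dict(unit) for unit in sources]
        for unit in parsed:
            for label, regex in unit.items():
                self.patterns[label] = re.compile(regex)

    def save_model(self, path):
        raw = {label: p.pattern for label, p in self.patterns.items()}
        Path(path).write_text(json.dumps(raw))

    def load_model(self, path):
        raw = json.loads(Path(path).read_text())
        self.patterns = {label: re.compile(rx) for label, rx in raw.items()}

    def run(self, factsheet, mode="execute"):
        # Annotate every "text" fact with every pattern in the model.
        found = ToyFactsheet()
        for text in factsheet.facts.get("text", []):
            for label, pattern in self.patterns.items():
                for match in pattern.findall(text):
                    found.assert_fact(label, match)
        if mode == "execute":
            return found  # new output factsheet; input left untouched
        if mode == "update":
            for label, values in found.facts.items():
                for value in values:
                    factsheet.assert_fact(label, value)
            return factsheet  # same factsheet, now with annotations
        raise ValueError(f"unknown mode: {mode}")


# Engine creation and compilation from sources.
engine = ToyEngine()
engine.compile_sources([{"location": r"\b(?:Berlin|Paris)\b"}])

# Alternative path: load a previously compiled model from a file.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.json"
    engine.save_model(model_path)
    reloaded = ToyEngine()
    reloaded.load_model(model_path)

# Factsheet creation and fact assertion.
sheet = ToyFactsheet()
sheet.assert_fact("text", "Flights from Berlin to Paris were delayed.")

# Run in both modes, then delete the engine objects.
extracted = reloaded.run(sheet, mode="execute")
updated = engine.run(sheet, mode="update")
del engine, reloaded
```

In this sketch, "execute" returns a separate factsheet holding only the extracted facts, while "update" writes the new facts back into the factsheet that was passed in. That mirrors the two execution modes only loosely; the real engine's behaviour should be taken from the RL3 documentation.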