The goal of this demo is to show how the Import RTF Tool of the CLaRK system can be used for loading RTF documents into the system.
When loaded, the RTF document is transformed into a well-formed XML document marked up according to the TEI.2 DTD.
Not all data in an RTF document is detected and transferred into XML: for structures that are not recognized in the RTF file, only the text content is kept. The system recognizes the following types of data: heading information (title, author, keywords, creation/last modification date and time), structural information (headers, paragraphs, sections) and text layout information (bold, italics, underlined). The following table shows the correspondence between RTF and XML (TEI.2).
RTF Heading Data | XML (TEI.2) Corresponding Data |
title | TEI.2 > teiHeader > fileDesc > titleStmt > title |
author | TEI.2 > teiHeader > fileDesc > titleStmt > author |
operator | TEI.2 > teiHeader > fileDesc > titleStmt > respStmt > name |
company | TEI.2 > teiHeader > fileDesc > publicationStmt > publisher |
category | TEI.2 > teiHeader > encodingDesc > classDecl > taxonomy > category |
keywords | TEI.2 > teiHeader > profileDesc > textClass > keywords > term |
creation date/time | TEI.2 > teiHeader > profileDesc > creation > date / time |
word count | TEI.2 > teiHeader > fileDesc > sourceDesc > bibl > extent > XXX words |
character count | TEI.2 > teiHeader > fileDesc > sourceDesc > bibl > extent > XXX characters |
character count incl. whitespace | TEI.2 > teiHeader > fileDesc > sourceDesc > bibl > extent > XXX characters with ws. |
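The heading rows above can be illustrated with a minimal Python sketch. It is not the CLaRK implementation: it assumes the RTF \info group contains simple, unescaped values such as {\title Demo}, reads the control words \title, \author, \operator, \company, \category, \keywords, \creatim, \nofwords, \nofchars and \nofcharsws with regular expressions, and places each value at the TEI.2 path listed in the table. The helper names parse_info, child_path and info_to_tei are hypothetical, splitting keywords on commas or semicolons is an assumption, the category text is wrapped in a catDesc child as TEI expects, and no attempt is made to produce a header that is valid against the DTD.

```python
import re
import xml.etree.ElementTree as ET

# Text-valued info fields, e.g. {\title Demo} (assumes no nested groups or escapes)
TEXT_FIELD = re.compile(r"\{\\(title|author|operator|company|category|keywords) ([^{}]*)\}")
# Document statistics, e.g. {\nofwords12}
COUNT_FIELD = re.compile(r"\\(nofwords|nofchars|nofcharsws)(\d+)")
# Creation time, e.g. {\creatim\yr2004\mo3\dy5\hr10\min30}
CREATIM = re.compile(r"\\creatim\\yr(\d+)\\mo(\d+)\\dy(\d+)(?:\\hr(\d+))?(?:\\min(\d+))?")


def parse_info(rtf: str) -> dict:
    """Collect the info-group values this sketch understands into a flat dict."""
    info = {name: value.strip() for name, value in TEXT_FIELD.findall(rtf)}
    info.update({name: int(n) for name, n in COUNT_FIELD.findall(rtf)})
    m = CREATIM.search(rtf)
    if m:
        yr, mo, dy, hr, mi = m.groups()
        info["date"] = f"{int(yr):04d}-{int(mo):02d}-{int(dy):02d}"
        info["time"] = f"{int(hr or 0):02d}:{int(mi or 0):02d}"
    return info


def child_path(parent: ET.Element, path: str) -> ET.Element:
    """Return the element at a '/'-separated path below parent, creating missing steps."""
    node = parent
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def info_to_tei(info: dict) -> ET.Element:
    """Place each parsed value at its TEI.2 header path (in TEI child order)."""
    root = ET.Element("TEI.2")
    header = ET.SubElement(root, "teiHeader")
    if "title" in info:
        child_path(header, "fileDesc/titleStmt/title").text = info["title"]
    if "author" in info:
        child_path(header, "fileDesc/titleStmt/author").text = info["author"]
    if "operator" in info:
        child_path(header, "fileDesc/titleStmt/respStmt/name").text = info["operator"]
    if "company" in info:
        child_path(header, "fileDesc/publicationStmt/publisher").text = info["company"]
    units = (("nofwords", "words"), ("nofchars", "characters"),
             ("nofcharsws", "characters with ws."))
    for key, unit in units:
        if key in info:
            bibl = child_path(header, "fileDesc/sourceDesc/bibl")
            ET.SubElement(bibl, "extent").text = f"{info[key]} {unit}"
    if "category" in info:
        path = "encodingDesc/classDecl/taxonomy/category/catDesc"
        child_path(header, path).text = info["category"]
    if "date" in info:
        creation = child_path(header, "profileDesc/creation")
        ET.SubElement(creation, "date").text = info["date"]
        ET.SubElement(creation, "time").text = info["time"]
    if "keywords" in info:
        keywords = child_path(header, "profileDesc/textClass/keywords")
        for term in re.split(r"[;,]", info["keywords"]):
            if term.strip():
                ET.SubElement(keywords, "term").text = term.strip()
    return root


sample = (r"{\rtf1{\info{\title Demo}{\author A. Author}{\keywords RTF, TEI}"
          r"{\creatim\yr2004\mo3\dy5\hr10\min30}{\nofwords12}}Hello\par}")
print(ET.tostring(info_to_tei(parse_info(sample)), encoding="unicode"))
```

Only the fields actually present in the \info group produce elements, so a document without a company entry simply gets no publicationStmt in this sketch.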
Structure & Layout Data | XML Tags |
paragraph | <p> ... </p> |
section | <div> ... </div> |
header | <head> ... </head> |
footer | <trailer> ... </trailer> |
bold | <hi rend="Bold"> ... </hi> |
italics | <hi rend="Italics"> ... </hi> |
underlined | <hi rend="Underlined"> ... </hi> |
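A similarly hedged sketch for the paragraph and layout rows: a toy converter for a tiny RTF subset, not the CLaRK tool. It assumes that \par ends a paragraph, that \b, \i and \ul switch formatting on (with a trailing 0, \ulnone or \plain switching it off), and that {...} groups restore the formatting in effect when they opened. Each text run is wrapped in nested <hi rend="..."> elements using the rend values from the table. Sections, headers, footers and escaped characters such as \'e9 are not handled, and other control words are skipped; body_to_tei and append_run are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Layout control words from the table above, mapped to TEI rend values
REND = {"b": "Bold", "i": "Italics", "ul": "Underlined"}
# A control word (\b, \b0, \par, ...), a group brace, or a run of plain text
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|([{}])|([^\\{}]+)")


def append_run(para: ET.Element, text: str, state: set) -> None:
    """Append text to para, nested in one <hi rend=...> per active format."""
    active = [key for key in ("b", "i", "ul") if key in state]
    if not active:
        # Unformatted text goes into para.text or the tail of the last child
        if len(para):
            para[-1].tail = (para[-1].tail or "") + text
        else:
            para.text = (para.text or "") + text
        return
    inner = ET.SubElement(para, "hi", rend=REND[active[0]])
    for key in active[1:]:
        inner = ET.SubElement(inner, "hi", rend=REND[key])
    inner.text = text


def body_to_tei(rtf: str) -> ET.Element:
    """Convert a tiny RTF subset into <body> with <p> and <hi> elements."""
    body = ET.Element("body")
    para = ET.SubElement(body, "p")
    state = set()   # active formats, a subset of {"b", "i", "ul"}
    stack = []      # formatting saved when a {...} group opens
    for word, num, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            stack.append(set(state))
        elif brace == "}":
            state = stack.pop() if stack else set()
        elif word == "par":
            para = ET.SubElement(body, "p")
        elif word in REND:
            # \b, \i, \ul switch a format on; \b0, \i0, \ul0 switch it off
            if num == "0":
                state.discard(word)
            else:
                state.add(word)
        elif word == "ulnone":
            state.discard("ul")
        elif word == "plain":
            state.clear()
        elif text:
            run = text.replace("\r", "").replace("\n", "")
            if run:
                append_run(para, run, state)
        # any other control word (\rtf1, \ansi, ...) is skipped
    return body


sample = r"{\rtf1\ansi Plain {\b bold {\i both}} text\par \ul under\ulnone  end}"
print(ET.tostring(body_to_tei(sample), encoding="unicode"))
```

Adjacent runs with the same formatting are not merged, so the output can contain consecutive <hi> elements with the same rend value; a fuller converter would coalesce them.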
Each document is loaded with respect to an XML DTD. At least one DTD must be compiled in the system.
When a document is imported, a message about its validity appears: Valid or Not valid.
The document used for this demo is RTFdemo.rtf.
To run the demo, perform the following steps:
1. Check that teixlite2x.dtd is compiled in the system. If it is not, compile it as described in the DTD Tool Demo.
2. Select File > ImportRTF; a standard File Chooser appears.
3. In the Files of Type: combo box, select the appropriate encoding (ASCII Text File).
4. Select RTFdemo.rtf and press the Open button. A dialog with the list of all DTDs imported in the system opens.
5. Select teixlite2x.dtd and press OK. This is the DTD according to which the chosen files are validated.
As a result, the document is opened in the system and the user can proceed with its processing.
Many applications do not follow the RTF specification strictly, so the RTF files they produce may deviate slightly from the standard, and the import result may vary accordingly.
Warning: The imported document is not saved in the system automatically; the user must save it explicitly.