Import RTF Tool Demo

The goal of this demo is to show how the Import RTF Tool of the CLaRK system can be used for loading RTF documents into the system.

When loaded, an RTF document is transformed into an XML document that is well-formed according to the TEI.2 DTD.

Not all the data in an RTF document is detected and transferred into XML. For structures that are not recognized in the RTF file, only the text content is taken. The types of data the system recognizes are: heading information (title, author, keywords, creation/last modification date and time), structural information (headers, paragraphs, sections) and text layout information (bold, italics, underlined). The correspondence table from RTF to XML (TEI.2) follows.

RTF Heading Data                      XML (TEI.2) Corresponding Data

title                                 TEI.2 > teiHeader > fileDesc > titleStmt > title
author                                TEI.2 > teiHeader > fileDesc > titleStmt > author
operator                              TEI.2 > teiHeader > fileDesc > titleStmt > respStmt > name
company                               TEI.2 > teiHeader > fileDesc > publicationStmt > publisher
category                              TEI.2 > teiHeader > encodingDesc > classDecl > taxonomy > category
keywords                              TEI.2 > teiHeader > profileDesc > textClass > keywords > term
creation date/time                    TEI.2 > teiHeader > profileDesc > creation > date / time
words count                           TEI.2 > teiHeader > fileDesc > sourceDesc > bibl > extent > XXX words
characters count                      TEI.2 > teiHeader > fileDesc > sourceDesc > bibl > extent > XXX characters
characters count incl. whitespaces    TEI.2 > teiHeader > fileDesc > sourceDesc > bibl > extent > XXX characters with ws.

Structure & Layout Data               XML Tags

paragraph                             <p> ... </p>
section                               <div> ... </div>
header                                <head> ... </head>
footer                                <trailer> ... </trailer>
bold                                  <hi rend="Bold"> ... </hi>
italics                               <hi rend="Italics"> ... </hi>
underlined                            <hi rend="Underlined"> ... </hi>
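To make the correspondence more concrete, here is a minimal Python sketch that builds a small TEI.2 tree from a few heading fields and one paragraph of styled text. It is only an illustration of the table, not the way the CLaRK system itself performs the import. The names build_tei, ensure_path, HEADING_PATHS and REND are hypothetical, the sample values are made up, and placing the paragraph under text > body > div is an assumption about the TEI.2 layout that the table does not state.

```python
import xml.etree.ElementTree as ET

# Part of the correspondence table: RTF heading field -> path below the TEI.2 root.
HEADING_PATHS = {
    "title": "teiHeader/fileDesc/titleStmt/title",
    "author": "teiHeader/fileDesc/titleStmt/author",
    "operator": "teiHeader/fileDesc/titleStmt/respStmt/name",
    "company": "teiHeader/fileDesc/publicationStmt/publisher",
    "keywords": "teiHeader/profileDesc/textClass/keywords/term",
}

# Layout property -> value of the rend attribute on <hi>.
REND = {"bold": "Bold", "italics": "Italics", "underlined": "Underlined"}


def ensure_path(root, path):
    """Return the element at `path` below `root`, creating missing steps."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def build_tei(heading, runs):
    """Build a TEI.2 tree from heading fields and one paragraph of styled runs.

    `heading` maps heading field names to strings. `runs` is a list of
    (text, layout) pairs, where layout is None or a key of REND.
    """
    root = ET.Element("TEI.2")
    for field, value in heading.items():
        ensure_path(root, HEADING_PATHS[field]).text = value

    # Assumption: the paragraph lives under text > body > div.
    para = ensure_path(root, "text/body/div/p")
    last_hi = None
    for text, layout in runs:
        if layout is None:
            # Plain text goes before the first <hi> or after the latest one.
            if last_hi is None:
                para.text = (para.text or "") + text
            else:
                last_hi.tail = (last_hi.tail or "") + text
        else:
            last_hi = ET.SubElement(para, "hi", rend=REND[layout])
            last_hi.text = text
    return root


if __name__ == "__main__":
    tree = build_tei(
        {"title": "Sample title", "author": "A. Author"},
        [("Plain, ", None), ("bold", "bold"), (" and ", None), ("italic", "italics")],
    )
    print(ET.tostring(tree, encoding="unicode"))
```

Running the sketch prints a single-line XML document in which the heading values sit at the paths listed in the first table and the styled runs are wrapped in the hi elements listed in the second one.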

Each document is loaded with respect to an XML DTD. At least one DTD must be compiled in the system.

When a document is imported, a message reports its validity with respect to the selected DTD: Valid or Not valid.

The document used for this demo is: RTFdemo.rtf.

In order to run the demo you have to perform the following steps:

  1. Check whether the DTD teixlite2x.dtd is compiled in the system. If it is not, compile it as described in the DTD Tool Demo.
  2. Select the Import RTF tool from the File menu. A standard File Chooser appears.
  3. Select the files from the appropriate directory. In our case it is the demo directory ImportRTF.
  4. From the Files of Type combo box, select the appropriate encoding: ASCII Text File.
  5. Click the Open button. A dialog opens with the list of all DTDs imported into the system.
  6. Select teixlite2x.dtd and press OK. This is the DTD against which the chosen files are validated.

As a result, the document is opened in the system and the user can proceed with processing it.

Many systems do not follow the RTF specification strictly, so the RTF files they produce might deviate slightly from the standard.

Warning: The imported document is not saved in the system automatically; it is up to the user to save it.