lt-proc(1)                                                          lt-proc(1)

NAME
       lt-proc - part of the lexical processing modules and tools
       (lttoolbox)

       This tool is part of the apertium machine translation architecture:
       http://www.apertium.org.

SYNOPSIS
       lt-proc [ -a | -b | -o | -d | -e | -g | -l | -n | -p | -s | -t | -v |
       -h ] [ -c ] [ -w ] [ -z ] fst_file [input_file [output_file]]

       lt-proc [ --analysis | --bilingual | --surf-bilingual |
       --debugged-gen | --decompose-nouns | --decompose-compounds |
       --generation | --tagged-gen | --non-marked-gen | --post-generation |
       --sao | --transliteration | --version | --help ] [ --case-sensitive ]
       [ --dictionary-case ] [ --null-flush ] fst_file [input_file
       [output_file]]

DESCRIPTION
       lt-proc is the application responsible for providing the four lexical
       processing functionalities:

              o morphological analyser (option -a)

              o lexical transfer (option -b)

              o morphological generator (option -g)

              o post-generator (option -p)

       It accomplishes these tasks by reading binary files containing a
       compact and efficient representation of dictionaries (a class of
       finite-state transducers called augmented letter transducers). These
       files are generated by lt-comp(1).

       Note that some characters (`[', `]', `$', `^', `/', `*') are special:
       they are used for format and encapsulation, and must be escaped with
       a backslash if they are to be used literally. For instance, text
       enclosed in `['...`]' is ignored, and each lexical unit is delimited
       as `^...$'.
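
       The escaping rule can be sketched in Python. This is only an
       illustration, not lt-proc code: the helper names escape and unescape
       are hypothetical, and both the backslash as escape character and its
       inclusion in the reserved set are assumptions of the sketch.

```python
# Reserved characters as listed in this page, plus the backslash itself
# (an assumption, so that escaping can be undone unambiguously).
RESERVED = set("[]$^/*\\")


def escape(text: str) -> str:
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text: str) -> str:
    """Drop each escaping backslash and keep the character after it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")  # a trailing lone backslash is dropped
        out.append(ch)
    return "".join(out)


print(escape("price [USD] 5$/unit"))
```

       Text escaped this way could then be passed through a pipeline
       without its brackets or slashes being read as format markers.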

OPTIONS
       -a, --analysis
              Tokenizes the text into surface forms (lexical units as they
              appear in texts) and delivers, for each surface form, one or
              more lexical forms consisting of lemma, lexical category and
              morphological inflection information. Tokenization is not
              straightforward due to the existence, on the one hand, of
              contractions, and, on the other hand, of multi-word lexical
              units. For contractions, the system reads in a single surface
              form and delivers the corresponding sequence of lexical forms.
              Multi-word surface forms are analysed in a left-to-right,
              longest-match fashion. Multi-word surface forms may be
              invariable (such as a multi-word preposition or conjunction) or
              inflected (for example, in Spanish, "echaban de menos", "they
              missed", is a form of the imperfect indicative tense of the verb
              "echar de menos", "to miss"). Limited support for some kinds of
              discontinuous multi-word units is also available. Analysis of
              single-word surface forms produces output like these examples:
              "cantar" -> `^cantar/cantar<vblex><inf>$' or "daba" ->
              `^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.
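
              The left-to-right, longest-match tokenization described in
              this entry can be sketched with a toy word-level analyser in
              Python. It is not how lt-proc is implemented (lt-proc works
              on compiled transducers); the lexicon entries, the `+' that
              joins the parts of a contraction, and the asterisk used for
              unknown words are assumptions made for illustration.

```python
# Toy lexicon with hypothetical entries: surface form -> lexical forms.
LEXICON = {
    "a": ["a<pr>"],
    "a pesar de": ["a pesar de<pr>"],          # invariable multi-word unit
    "del": ["de<pr>+el<det><def><m><sg>"],     # contraction (assumed format)
    "cantar": ["cantar<vblex><inf>"],
    "daba": ["dar<vblex><pii><p1><sg>", "dar<vblex><pii><p3><sg>"],
}
MAX_WORDS = max(len(surface.split()) for surface in LEXICON)


def analyse(text: str) -> str:
    """Tokenize left to right, always preferring the longest known unit."""
    words = text.split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate starting at word i, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            if surface in LEXICON:
                units.append("^" + "/".join([surface] + LEXICON[surface]) + "$")
                i += n
                break
        else:
            # Assumption: an unknown word is echoed with an asterisk mark.
            units.append("^" + words[i] + "/*" + words[i] + "$")
            i += 1
    return " ".join(units)


print(analyse("daba"))
print(analyse("a pesar de cantar"))
```

              In the second call the three-word preposition wins over the
              single-word entry "a" because longer candidates are tried
              first.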

       -b, --bilingual
              Performs lexical transfer, carrying over any trailing
              morphological symbols not specified in the dictionaries. Like
              the analysis mode, it supports multiple target-language lexical
              forms for a given source-language lexical form. It typically
              works on the output of apertium-pretransfer.
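
              How trailing symbols that the dictionary does not mention can
              be carried over to the target side is sketched below in
              Python. This is a toy model, not lt-proc's algorithm: the
              dictionary entries, the function name transfer, and the `@'
              mark for forms with no entry are all assumptions.

```python
import re

# Hypothetical bilingual entries: source lemma plus leading tags -> targets.
BIDIX = {
    "dar<vblex>": ["give<vblex>"],
    "banco<n><m>": ["bank<n>", "bench<n>"],
}
TAG = re.compile(r"<[^<>]+>")


def transfer(lexical_form: str) -> str:
    """Translate a lexical form, appending the tags the entry left out."""
    lemma = lexical_form.split("<", 1)[0]
    tags = TAG.findall(lexical_form)
    # Look for the longest tag prefix that has a dictionary entry.
    for k in range(len(tags), -1, -1):
        key = lemma + "".join(tags[:k])
        if key in BIDIX:
            tail = "".join(tags[k:])  # symbols not specified in the entry
            targets = [target + tail for target in BIDIX[key]]
            return "^" + "/".join([lexical_form] + targets) + "$"
    # Assumption: a form with no entry is echoed with an '@' mark.
    return "^" + lexical_form + "/@" + lexical_form + "$"


print(transfer("dar<vblex><pii><p3><sg>"))
print(transfer("banco<n><m><sg>"))
```

              The second call shows the case of several target-language
              forms for a single source-language form.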

       -o, --surf-bilingual
              Like -b, but takes input from apertium-tagger -p, which
              includes surface forms; if the lexical form is not found in the
              bilingual dictionary, it outputs the surface form of the word.

       -c, --case-sensitive
              Use the literal case of the incoming characters.

       -d, --debugged-gen
              Morphological generation (like -g) that keeps all debugging
              information in the output.

       -e, --decompose-compounds
              Try to treat unknown words as compounds, and decompose them.

       -w, --dictionary-case
              Use the case information contained in the lexicon, instead of
              the surface case (only applied in analysis mode).
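
              The interplay between surface case and dictionary case can be
              pictured with a small Python sketch. It is an assumption-laden
              toy, not lt-proc's code: the one-entry lexicon, the parameter
              names case_sensitive (standing in for -c) and dictionary_case
              (standing in for -w), the rule for copying capitalisation
              onto the lemma, and the asterisk for unknown words are all
              hypothetical.

```python
# Hypothetical one-entry lexicon, stored in lower case.
LEXICON = {"daba": "dar<vblex><pii><p3><sg>"}


def analyse(surface, case_sensitive=False, dictionary_case=False):
    """Analyse one word under the two case-related switches."""
    # With case_sensitive the incoming characters are looked up literally.
    key = surface if case_sensitive else surface.lower()
    lexical_form = LEXICON.get(key)
    if lexical_form is None:
        return "^" + surface + "/*" + surface + "$"
    if not dictionary_case and surface[:1].isupper():
        # Default in this sketch: copy the surface capitalisation
        # onto the lemma of the lexical form.
        lemma, sep, tags = lexical_form.partition("<")
        lemma = lemma.upper() if surface.isupper() else lemma.capitalize()
        lexical_form = lemma + sep + tags
    return "^" + surface + "/" + lexical_form + "$"


print(analyse("Daba"))
print(analyse("Daba", dictionary_case=True))
print(analyse("Daba", case_sensitive=True))
```

              With dictionary_case set, the lemma keeps the case stored in
              the lexicon; with case_sensitive set, the capitalised word no
              longer matches the lower-case entry at all.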

       -g, --generation
              Delivers a target-language surface form for each target-language
              lexical form, by suitably inflecting it.

       -n, --non-marked-gen
              Morphological generation (like -g) but without unknown word
              marks (asterisk `*').
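
              The difference between -g and -n can be sketched in Python as
              follows. The generation table, the function name generate and
              its keep_marks parameter are hypothetical, and the sketch
              simply treats every form missing from its table as an unknown
              word; it also ignores escaped characters, which a real
              processor has to handle.

```python
import re

# Hypothetical generation entries: lexical form -> surface form.
GENERATOR = {
    "dar<vblex><pii><p3><sg>": "daba",
    "cantar<vblex><inf>": "cantar",
}
UNIT = re.compile(r"\^(.*?)\$")  # one ^...$ lexical unit


def generate(stream: str, keep_marks: bool = True) -> str:
    """Replace each lexical unit by a surface form.

    keep_marks=True mimics -g (unknown words keep their asterisk);
    keep_marks=False mimics -n (the asterisk is removed).
    """
    def replace(match):
        lexical_form = match.group(1)
        if lexical_form in GENERATOR:
            return GENERATOR[lexical_form]
        word = lexical_form.lstrip("*")
        return "*" + word if keep_marks else word

    return UNIT.sub(replace, stream)


stream = "^dar<vblex><pii><p3><sg>$ ^*xyz$"
print(generate(stream))
print(generate(stream, keep_marks=False))
```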

       -l, --tagged-gen
              Morphological generation (like -g) but retaining part-of-speech
              tags.

       -p, --post-generation
              Performs orthographical operations such as contractions and
              apostrophations. The post-generator is usually dormant (just
              copies the input to the output) until a special alarm symbol
              contained in some target-language surface forms wakes it up to
              perform a particular string transformation if necessary; then it
              goes back to sleep.
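
              The dormant/awake behaviour can be sketched in Python. This
              is an illustration only: the choice of `~' as the alarm
              symbol, the two Spanish contraction rules, and the function
              name postgenerate are assumptions, and real post-generation
              dictionaries are compiled transducers rather than a lookup
              table.

```python
ALARM = "~"  # assumed alarm symbol
# Hypothetical post-generation rules: sequence after the alarm -> result.
RULES = {"de el": "del", "a el": "al"}


def postgenerate(text: str) -> str:
    """Copy text through unchanged until an alarm symbol is seen."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != ALARM:
            out.append(text[i])  # dormant: copy input to output
            i += 1
            continue
        i += 1  # awake: drop the alarm symbol and try the rules
        for source in sorted(RULES, key=len, reverse=True):
            end = i + len(source)
            # The match must end at a word boundary.
            if text.startswith(source, i) and not text[end:end + 1].isalnum():
                out.append(RULES[source])
                i = end
                break
        # Whether or not a rule applied, go back to copying.
    return "".join(out)


print(postgenerate("vengo ~de el mercado"))
```

              When no rule applies after the alarm symbol, the sketch
              simply resumes copying, which corresponds to the "if
              necessary" in the description.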

       -s, --sao
              Processes input in the orthoepikon (previously `sao')
              annotation system format: http://orthoepikon.sf.net.

       -t, --transliteration
              Apply a transliteration dictionary.

       -z, --null-flush
              Flush output on the null character.
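
              Null flushing is typically useful when a long-running process
              is fed one chunk at a time. A minimal Python sketch of the
              idea follows; the function name process_null_flushed and the
              handle callback are hypothetical, and echoing the null
              character back to the output is an assumption of the sketch.

```python
import io


def process_null_flushed(reader, writer, handle):
    """Process NUL-terminated chunks, flushing the output after each one."""
    buffer = []
    while True:
        ch = reader.read(1)
        if ch == "":  # end of input
            break
        if ch == "\0":
            writer.write(handle("".join(buffer)))
            writer.write("\0")  # assumed: the delimiter is echoed back
            writer.flush()      # make the result visible immediately
            buffer.clear()
        else:
            buffer.append(ch)
    if buffer:  # trailing data without a final NUL
        writer.write(handle("".join(buffer)))
        writer.flush()


source = io.StringIO("hello\0world\0")
sink = io.StringIO()
process_null_flushed(source, sink, str.upper)
print(repr(sink.getvalue()))
```

              A caller on the other end of a pipe can read up to the next
              null character and know that one complete chunk has been
              processed.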

       -v, --version
              Display the version number.

       -h, --help
              Display this help.

FILES
       fst_file   The input compiled dictionary.

SEE ALSO
       lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).

BUGS
       Lots of...lurking in the dark and waiting for you!

AUTHOR
       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.

                                  2006-03-23                        lt-proc(1)