DragonFly On-Line Manual Pages
AFLEX(1) DragonFly General Commands Manual AFLEX(1)
NAME
aflex - fast lexical analyzer generator for Ada
SYNOPSIS
aflex [ -bdfipstvEILT -Sskeleton_file ] [ filename ]
DESCRIPTION
aflex is a version of the Unix tool lex, but it is written in Ada and
generates scanners in Ada. It is upwardly compatible with the UCI tool
alex, but is much faster and generates smaller scanners.
OPTIONS
Command line options are given in a different format than in the old
UCI alex. Aflex options are as follows:
-t Write the scanner output to the standard output rather than to a
file. The default name of the scanner file for base.l is base.a.
Note that this option is not as useful with aflex because, in
addition to the scanner file, there are files for the externally
visible dfa functions (base_dfa.a) and the external IO functions
(base_io.a).
-b Generate backtracking information to aflex.backtrack. This is a
list of scanner states which require backtracking and the input
characters on which they do so. By adding rules one can remove
backtracking states. If all backtracking states are eliminated
and -f is used, the generated scanner will run faster (see the
-p flag). Only users who wish to squeeze every last cycle out
of their scanners need worry about this option.
-d makes the generated scanner run in debug mode. Whenever a
pattern is recognized the scanner will write to stderr a line of
the form:
--accepting rule #n
Rules are numbered sequentially with the first one being 1.
Rule #0 is executed when the scanner backtracks; Rule #(n+1)
(where n is the number of rules) indicates the default action;
Rule #(n+2) indicates that the input buffer is empty and needs
to be refilled and then the scan restarted. Rules beyond (n+2)
are end-of-file actions.
-f has the same effect as lex's -f flag (do not compress the
scanner tables); the mnemonic changes from fast compilation to
(take your pick) full table or fast scanner. The actual
compilation takes longer, since aflex is I/O bound writing out
the big table. The compilation of the Ada file containing the
scanner is also likely to take a long time because of the large
arrays generated.
-i instructs aflex to generate a case-insensitive scanner. The
case of letters given in the aflex input patterns will be
ignored, and the rules will be matched regardless of case. The
matched text given in yytext will have the preserved case (i.e.,
it will not be folded).
-p generates a performance report to stderr. The report consists
of comments regarding features of the aflex input file which
will cause a loss of performance in the resulting scanner. Note
that the use of the ^ operator and the -I flag entail minor
performance penalties.
-s causes the default rule (that unmatched scanner input is echoed
to stdout) to be suppressed. If the scanner encounters input
that does not match any of its rules, it aborts with an error.
This option is useful for finding holes in a scanner's rule set.
-v has the same meaning as for lex (print to stderr a summary of
statistics of the generated scanner). Many more statistics are
printed, though, and the summary spans several lines. Most of
the statistics are meaningless to the casual aflex user, but the
first line identifies the version of aflex, which is useful for
figuring out where you stand with respect to patches and new
releases.
-E instructs aflex to generate additional information about each
token, including line and column numbers. This is needed for
the advanced automatic error correction option in ayacc.
-I instructs aflex to generate an interactive scanner. Normally,
scanners generated by aflex always look ahead one character
before deciding that a rule has been matched. At the cost of
some scanning overhead, aflex will generate a scanner which only
looks ahead when needed. Such scanners are called interactive
because if you want to write a scanner for an interactive system
such as a command shell, you will probably want the user's input
to be terminated with a newline, and without -I the user will
have to type a character in addition to the newline in order to
have the newline recognized. This leads to dreadful interactive
performance.
If all this seems too confusing, here's the general rule: if a
human will be typing in input to your scanner, use -I, otherwise
don't; if you don't care about how fast your scanners run and
don't want to make any assumptions about the input to your
scanner, always use -I.
Note that -I cannot be used in conjunction with full tables,
i.e., the -f flag.
-L instructs aflex not to generate --#line comments (see
ENHANCEMENTS).
-T makes aflex run in trace mode. It will generate a lot of
messages to stdout concerning the form of the input and the
resultant non-deterministic and deterministic finite automata.
This option is mostly for use in maintaining aflex.
-Sskeleton_file
overrides the default internal skeleton from which aflex
constructs its scanners. You'll probably never need this option
unless you are doing aflex maintenance or development.
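The rule numbers printed by a scanner built with -d follow a fixed
layout, so they can be decoded mechanically. The following Python
sketch is illustrative only: describe_rule and the labels it returns
are hypothetical names chosen here, not part of aflex. It assumes
only the numbering described under -d above.

```python
def describe_rule(rule, n_rules):
    """Classify a rule number from a '--accepting rule #n' debug line.

    n_rules is the number of rules in the aflex input (hypothetical helper).
    """
    if rule < 0:
        raise ValueError("rule numbers are non-negative")
    if rule == 0:
        return "backtrack"            # executed when the scanner backtracks
    if rule <= n_rules:
        return "user rule"            # rules are numbered from 1
    if rule == n_rules + 1:
        return "default action"
    if rule == n_rules + 2:
        return "buffer refill"        # input buffer empty, scan restarts
    return "end-of-file action"       # anything beyond n+2


# For a scanner with 5 rules:
print(describe_rule(3, 5))   # user rule
print(describe_rule(6, 5))   # default action
print(describe_rule(8, 5))   # end-of-file action
```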
INCOMPATIBILITIES WITH LEX
aflex is fully compatible with lex with the following exceptions:
- Source file format:
The input specification file for aflex must use the following
format.
definitions section
%%
rules section
%%
user defined section
##
user defined section
- lex's %r (Ratfor scanners) and %t (translation table) options
are not supported.
- The do-nothing -n flag is not supported.
- When definitions are expanded, aflex encloses them in
parentheses. With lex, the following
NAME [A-Z][A-Z0-9]*
%%
foo{NAME}? text_io.put_line( "Found it" );
%%
will not match the string "foo" because when the macro is
expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the
precedence is such that the '?' is associated with "[A-Z0-9]*".
With aflex, the rule will be expanded to "foo([A-Z][A-Z0-9]*)?"
and so the string "foo" will match. Note that because of this,
the ^, $, <s>, and / operators cannot be used in a definition.
- Input can be controlled by redefining the YY_INPUT function.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".
Its action is to place up to max_size characters in the
character buffer "buf" and return in the integer variable
"result" either the number of characters read or the constant
YY_NULL to indicate EOF. The default YY_INPUT reads from
Standard_Input.
You can also add in things like keeping track of the input line
number this way, but don't expect your scanner to go very fast.
- Yytext is a function returning a vstring.
- aflex reads only one input file, while lex's input is made up of
the concatenation of its input files.
- The following lex constructs are not supported:
- REJECT
- %T -- character set tables
- %x -- changes to internal array sizes (see below)
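The effect of parenthesizing definitions when they are expanded can be
reproduced with any regular expression engine. The sketch below uses
Python's re module as a stand-in and is not aflex code. Because a bare
"*?" means a lazy star in Python, the lex-style expansion is written
with an explicit group to spell out the precedence described above.
The variable and function names are illustrative assumptions.

```python
import re

NAME = "[A-Z][A-Z0-9]*"

# lex-style textual substitution: the '?' binds only to "[A-Z0-9]*",
# so the leading [A-Z] is still mandatory.
lex_style = re.compile("foo[A-Z](?:[A-Z0-9]*)?")

# aflex-style substitution: the whole definition is parenthesized first,
# so the '?' makes the entire NAME optional.
aflex_style = re.compile("foo(" + NAME + ")?")


def matches(pattern, text):
    """True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


print(matches(lex_style, "foo"))      # False
print(matches(aflex_style, "foo"))    # True
print(matches(lex_style, "fooBAR1"))  # True
```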
ENHANCEMENTS
- Exclusive start-conditions can be declared by using %x instead
of %s. These start-conditions have the property that when they
are active, no other rules are active. Thus a set of rules
governed by the same exclusive start condition describe a
scanner which is independent of any of the other rules in the
aflex input. This feature makes it easy to specify "mini-
scanners" which scan portions of the input that are
syntactically different from the rest (e.g., comments).
- End-of-file rules. The special rule "<<EOF>>" indicates actions
which are to be taken when an end-of-file is encountered and
yywrap() returns non-zero (i.e., indicates no further files to
process). The action can either call text_io.set_input() to
switch to a new file to process, in which case the action should
finish with YY_NEW_FILE (this is a branch, so subsequent code in
the action won't be executed), or it should finish with a return
statement.
<<EOF>> rules may not be used with other patterns; they may only
be qualified with a list of start conditions. If an unqualified
<<EOF>> rule is given, it applies only to the INITIAL start
condition, and not to %s start conditions. These rules are
useful for catching things like unclosed comments. An example:
%x quote
%%
...
<quote><<EOF>> {
error( "unterminated quote" );
}
<<EOF>> {
set_input( next_file );
YY_NEW_FILE;
}
- aflex dynamically resizes its internal tables, so directives
like "%a 3000" are not needed when specifying large scanners.
- aflex generates --#line comments mapping lines in the output to
their origin in the input file.
- All actions must be enclosed by curly braces.
- Comments may be put in the first section of the input by
preceding them with '#'.
- Ada style comments are supported instead of C style comments.
- All template files are internalized.
- The input source file must end with a ".l" extension.
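The behaviour of exclusive start conditions and of a qualified
<<EOF>> rule can be modelled outside aflex. The Python sketch below is
a toy model under stated assumptions, not generated scanner code: the
RULES table, the "comment" condition, the brace-delimited comment
syntax, and the scan function are all hypothetical. It also tries
rules in order and takes the first match, which is simpler than the
matching a real scanner performs.

```python
import re

# Rules per start condition: (pattern, token name or None, next state or None).
# "comment" plays the role of a %x condition: while it is active, only its
# own rules are consulted, so the WORD rule cannot fire inside a comment.
RULES = {
    "INITIAL": [
        (re.compile(r"\{"), None, "comment"),     # enter the mini-scanner
        (re.compile(r"[a-z]+"), "WORD", None),
        (re.compile(r"\s+"), None, None),         # skip whitespace
    ],
    "comment": [
        (re.compile(r"\}"), None, "INITIAL"),     # leave the mini-scanner
        (re.compile(r"[^}]+"), None, None),       # discard comment text
    ],
}


def scan(text):
    """Return the (token, lexeme) pairs found in text."""
    state, pos, tokens = "INITIAL", 0, []
    while pos < len(text):
        for pattern, token, next_state in RULES[state]:
            m = pattern.match(text, pos)
            if m:
                if token is not None:
                    tokens.append((token, m.group()))
                if next_state is not None:
                    state = next_state
                pos = m.end()
                break
        else:
            raise ValueError(f"scanner jammed at offset {pos}")
    # Counterpart of a <comment><<EOF>> rule: report an unclosed comment.
    if state != "INITIAL":
        raise ValueError(f"unterminated {state}")
    return tokens


print(scan("abc {xyz} def"))   # [('WORD', 'abc'), ('WORD', 'def')]
```

Here "xyz" is never reported as a WORD, because the INITIAL rules are
inactive while the exclusive condition is in effect.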
FILES
The names of the files containing the generated scanner, IO,
and DFA packages are based on the basename of the input file.
For example if the input file is called scan.l then the scanner
file is called scan.a, the DFA package is in scan_dfa.a, and
scan_io.a is the IO package file. All of these file names may
be changed by modifying the external_file_manager package (see
the porting notes for more information.)
aflex.backtrack
backtracking information for -b
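The default naming scheme can be summarized in a few lines. The
Python sketch below is only an illustration of the convention
described in this section; output_files is a hypothetical helper, and
it returns bare file names without assuming which directory aflex
writes them to.

```python
import os


def output_files(input_path):
    """Derive the default generated file names from an aflex input file."""
    base, ext = os.path.splitext(os.path.basename(input_path))
    if ext != ".l":
        # The input source file must end with a ".l" extension.
        raise ValueError("aflex input files must end in .l")
    return {
        "scanner": base + ".a",
        "dfa": base + "_dfa.a",
        "io": base + "_io.a",
    }


print(output_files("scan.l"))
# {'scanner': 'scan.a', 'dfa': 'scan_dfa.a', 'io': 'scan_io.a'}
```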
SEE ALSO
lex(1)
M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator,
Computing Science Technical Report 39, Bell Telephone
Laboratories, Murray Hill, NJ, 1975.
Military Standard Ada Programming Language (ANSI/MIL-STD-1815A-1983),
American National Standards Institute, January 1983.
T. Nguyen and K. Forester, Alex - An Ada Lexical Analysis Generator,
Arcadia Document UCI-88-17, University of California, Irvine, 1988.
D. Taback and D. Tolani, Ayacc User's Manual, Arcadia Document
UCI-85-10, University of California, Irvine, 1986.
AUTHOR
John Self. Based on the tool flex written and designed by Vern Paxson.
It reimplements the functionality of the tool alex designed by Thieu Q.
Nguyen.
Send requests for aflex information to alex-info@ics.uci.edu
Send bug reports for aflex to alex-bugs@ics.uci.edu
DIAGNOSTICS
aflex scanner jammed - a scanner compiled with -s has encountered an
input string which wasn't matched by any of its rules.
old-style lex command ignored - the aflex input contains a lex command
(e.g., "%n 1000") which is being ignored.
BUGS
Some trailing context patterns cannot be properly matched and generate
warning messages ("Dangerous trailing context"). These are patterns
where the ending of the first part of the rule matches the beginning of
the second part, such as "zx*/xy*", where the 'x*' matches the 'x' at
the beginning of the trailing context. (Lex doesn't get these patterns
right either.)
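The ambiguity behind this warning can be made concrete by listing
every offset at which the leading part of such a rule may end. The
Python sketch below uses the re module as a stand-in and makes no
claim about how aflex itself matches trailing context; head_ends and
valid_splits are hypothetical helpers.

```python
import re


def head_ends(head, text):
    """Offsets at which the leading pattern can end when matched from 0."""
    return [i for i in range(len(text) + 1) if re.fullmatch(head, text[:i])]


def valid_splits(head, trail, text):
    """Head end offsets after which the trailing context also matches."""
    return [i for i in head_ends(head, text) if re.match(trail, text[i:])]


# "zx*/xy*": the head can stop in three places, and two of them are
# followed by valid trailing context, so the boundary is ambiguous.
print(head_ends("zx*", "zxxy"))             # [1, 2, 3]
print(valid_splits("zx*", "xy*", "zxxy"))   # [1, 2]

# "zx*/y": the trailing context cannot start with 'x', so only one
# boundary is possible.
print(valid_splits("zx*", "y", "zxxy"))     # [3]
```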
Variable trailing context (where both the leading and trailing parts do
not have a fixed length) entails a substantial performance loss.
For some trailing context rules, parts which are actually fixed-length
are not recognized as such, leading to the above-mentioned performance
loss. In particular, parts using '|' or {n} are always considered
variable-length.
Nulls are not allowed in aflex inputs or in the inputs to scanners
generated by aflex. Their presence generates fatal errors.
Pushing back definitions enclosed in ()'s can result in nasty,
difficult-to-understand problems like:
{DIG} [0-9] -- a digit
in which the pushed-back text is "([0-9] -- a digit)".
Due to both buffering of input and read-ahead, you cannot intermix
calls to text_io routines, such as text_io.get(), with aflex rules
and expect it to work. Call input() instead.
There are still more features that could be implemented (especially
REJECT). Also, the speed of the compressed scanners could be improved.
The utility needs more complete documentation.
Version 1.4 10 March 1994 AFLEX(1)