DragonFly On-Line Manual Pages
SGMLS(1) DragonFly General Commands Manual SGMLS(1)
NAME
sgmls - a validating SGML parser
An SGML System Conforming to
International Standard ISO 8879 --
Standard Generalized Markup Language
SYNOPSIS
sgmls [ -deglprsuv ] [ -cfile ] [ -iname ] [ -mfile ] [ filename... ]
DESCRIPTION
Sgmls parses and validates the SGML document entity in filename... and
prints on the standard output a simple ASCII representation of its
Element Structure Information Set. (This is the information set which
a structure-controlled conforming SGML application should act upon.)
Note that the document entity may be spread amongst several files; for
example, the SGML declaration, document type declaration and document
instance set could each be in a separate file. If no filenames are
specified, then sgmls will read the document entity from the standard
input. A filename of - can also be used to refer to the standard
input.
The following options are available:
-cfile Report any capacity limits that are exceeded and write a report
of capacity usage to file. The report is in the format of a
RACT result. RACT is the Reference Application for Capacity
Testing defined in the Proposed American National Standard
Conformance Testing for Standard Generalized Markup Language
(SGML) Systems (X3.190-199X), Draft July 1991.
-d Warn about duplicate entity declarations.
-e Describe open entities in error messages. Error messages always
include the position of the most recently opened external
entity.
-g Show the GIs of open elements in error messages.
-iname Pretend that
<!ENTITY % name "INCLUDE">
occurs at the start of the document type declaration subset in
the SGML document entity. Since repeated definitions of an
entity are ignored, this definition will take precedence over
any other definitions of this entity in the document type
declaration. Multiple -i options are allowed. If the SGML
declaration replaces the reserved name INCLUDE then the new
reserved name will be the replacement text of the entity.
Typically the document type declaration will contain
<!ENTITY % name "IGNORE">
and will use %name; in the status keyword specification of a
marked section declaration. In this case the effect of the
option will be to cause the marked section not to be ignored.
-l Output L commands giving the current line number and filename.
-mfile Map public identifiers and entity names to system identifiers
using the catalog entry file file. Multiple -m options are
allowed. Catalog entry files specified with the -m option will
be searched before the defaults.
-p Parse only the prolog. Sgmls will exit after parsing the
document type declaration. Implies -s.
-r Warn about defaulted references.
-s Suppress output. Error messages will still be printed.
-u Warn about undefined elements: elements used in the DTD but not
defined.
-v Print the version number.
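The precedence rule behind -i can be illustrated with a toy model. This Python sketch is not how sgmls parses a document type declaration. The pattern _PE_DECL and the functions effective_pe_values and marked_section_status are hypothetical, and they only understand the simple <!ENTITY % name "..."> form shown above. The sketch shows that the declaration supplied by -iname is seen first, so a later <!ENTITY % name "IGNORE"> in the DTD is ignored.

```python
import re

# Toy pattern for declarations of the form <!ENTITY % name "text">.
# Real SGML entity declarations allow much more than this.
_PE_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')


def effective_pe_values(dtd_subset, include_names=()):
    """Map each parameter entity name to its effective replacement text."""
    # Each -iname behaves as if this declaration preceded the DTD subset.
    prefix = "".join(f'<!ENTITY % {name} "INCLUDE">' for name in include_names)
    values = {}
    for name, text in _PE_DECL.findall(prefix + dtd_subset):
        # The first declaration wins; repeated definitions are ignored.
        values.setdefault(name, text)
    return values


def marked_section_status(keyword, values):
    """Resolve a status keyword such as '%draft;' to INCLUDE or IGNORE."""
    m = re.fullmatch(r"%([^;\s]+);", keyword)
    return values[m.group(1)] if m else keyword
```

For example, with the DTD subset <!ENTITY % draft "IGNORE">, the status of a marked section using %draft; resolves to IGNORE, while passing ["draft"] as include_names (the analogue of -idraft) makes it resolve to INCLUDE.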
Entity Manager
An external entity resides in one or more files. The entity manager
component of sgmls maps a sequence of files into an entity in three
sequential stages:
1. each carriage return character is turned into a non-SGML
character;
2. each newline character is turned into a record end character,
and at the same time a record start character is inserted at the
beginning of each line;
3. the files are concatenated.
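The three stages can be sketched in Python as follows. The sketch assumes 8-bit input decoded as Latin-1 and uses private-use characters as hypothetical stand-ins for the record start, record end and non-SGML characters; the codes sgmls uses internally are not documented here. The filenames are taken from a colon-separated system identifier, as described in the next paragraph, with - meaning the standard input. Files are opened with newline="" so that Python does not translate carriage returns before stage 1 sees them.

```python
import sys

# Hypothetical stand-ins for the record start, record end and
# non-SGML characters (not the codes sgmls actually uses).
RS = "\uE000"
RE = "\uE001"
NONSGML = "\uE002"


def file_to_records(text):
    """Apply stages 1 and 2 to the contents of one file."""
    text = text.replace("\r", NONSGML)   # stage 1: CR becomes non-SGML
    out = []
    at_line_start = True
    for ch in text:
        if at_line_start:                # stage 2: RS begins every line
            out.append(RS)
            at_line_start = False
        if ch == "\n":
            out.append(RE)               # stage 2: newline becomes RE
            at_line_start = True
        else:
            out.append(ch)
    return "".join(out)


def read_file(name):
    """Read one file named in a system identifier; "-" means stdin."""
    if name == "-":
        return sys.stdin.buffer.read().decode("latin-1")
    # newline="" keeps CR characters intact for stage 1.
    with open(name, encoding="latin-1", newline="") as f:
        return f.read()


def entity_text(system_id, read=read_file):
    """Stage 3: concatenate the processed files of a system identifier."""
    return "".join(file_to_records(read(name)) for name in system_id.split(":"))
```

The read parameter lets the sketch be exercised without touching the file system, for instance by passing a dictionary's __getitem__.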
A system identifier is interpreted as a list of filenames separated by
colons. A filename of - can be used to refer to the standard input.
If a system identifier is not specified, then the entity manager can
generate one using catalog entry files in the format defined in the
SGML Open Draft Technical Resolution on Entity Management. A catalog
entry file contains a sequence of entries in one of the following four
forms:
PUBLIC pubid sysid
This specifies that sysid should be used as the system
identifier if the public identifier is pubid. Sysid is a
system identifier as defined in ISO 8879 and pubid is a public
identifier as defined in ISO 8879.
ENTITY name sysid
This specifies that sysid should be used as the system
identifier if the entity is a general entity whose name is name.
ENTITY %name sysid
This specifies that sysid should be used as the system
identifier if the entity is a parameter entity whose name is
name. Note that there is no space between the % and the name.
DOCTYPE name sysid
This specifies that sysid should be used as the system
identifier if the entity is an entity declared in a document
type declaration whose document type name is name.
The last two forms are extensions to the SGML Open format. The
delimiters can be omitted from the sysid provided it does not contain
any white space. Comments are allowed between parameters delimited by
-- as in SGML. The environment variable SGML_CATALOG_FILES contains a
colon-separated list of catalog entry files. These will be searched
after any catalog entry files specified using the -m option. If this
environment variable is not set, then a system dependent list of
catalog entry files will be used. A match in a catalog entry file for
a PUBLIC entry will take precedence over a match in the same file for
an ENTITY or DOCTYPE entry. A filename in a system identifier in a
catalog entry file is interpreted relative to the directory containing
the catalog entry file.
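Here is a minimal Python sketch of catalog parsing and lookup, assuming the rules in the preceding paragraphs. The function names are hypothetical. The tokenizer handles only the cases described here (quoted or unquoted parameters and -- comments), and keywords are matched case-sensitively. The relative order of ENTITY and DOCTYPE matches within one file is not specified above, so the sketch simply takes whichever entry appears first.

```python
import posixpath
import re

# A "--" comment, a double- or single-quoted literal, or a bare token.
_TOKEN = re.compile(r"""\s*(?:--.*?--|"([^"]*)"|'([^']*)'|(\S+))""", re.S)


def tokenize(text):
    tokens = []
    for m in _TOKEN.finditer(text):
        for group in m.groups():          # all groups are None for a comment
            if group is not None:
                tokens.append(group)
                break
    return tokens


def parse_catalog(text):
    """Return (keyword, name, sysid) triples; ENTITY %name becomes PENTITY."""
    tokens = tokenize(text)
    entries = []
    i = 0
    while i < len(tokens):
        keyword = tokens[i]
        if keyword in ("PUBLIC", "ENTITY", "DOCTYPE") and i + 2 < len(tokens):
            name, sysid = tokens[i + 1], tokens[i + 2]
            if keyword == "ENTITY" and name.startswith("%"):
                keyword, name = "PENTITY", name[1:]
            entries.append((keyword, name, sysid))
            i += 3
        else:
            i += 1                        # skip anything not understood
    return entries


def resolve(catalog_path, sysid):
    """Interpret each filename relative to the catalog file's directory."""
    base = posixpath.dirname(catalog_path)
    return ":".join(name if name == "-" else posixpath.join(base, name)
                    for name in sysid.split(":"))


def catalog_files(m_options, environ, system_default=()):
    """-m files first, then SGML_CATALOG_FILES (or a system default list)."""
    env = environ.get("SGML_CATALOG_FILES")
    rest = [f for f in env.split(":") if f] if env is not None else list(system_default)
    return list(m_options) + rest


def lookup(catalogs, pubid=None, entity=None, parameter=False, doctype=None):
    """catalogs: list of (path, entries) pairs in search order."""
    entity_kw = "PENTITY" if parameter else "ENTITY"
    for path, entries in catalogs:
        # Within one file, a PUBLIC match beats ENTITY and DOCTYPE matches.
        for kw, name, sysid in entries:
            if kw == "PUBLIC" and pubid is not None and name == pubid:
                return resolve(path, sysid)
        for kw, name, sysid in entries:
            if (kw == entity_kw and name == entity) or (kw == "DOCTYPE" and name == doctype):
                return resolve(path, sysid)
    return None
```

A catalog containing ENTITY %ents "my ents.ent" therefore maps the parameter entity ents, but not a general entity of the same name, to "my ents.ent" in the catalog's own directory.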
If no match can be found in a catalog entry file, then the entity
manager will attempt to generate a filename using the public identifier
(if there is one) and other information available to it. Notation
identifiers are not subject to this treatment. This process is
controlled by the environment variable SGML_PATH; this contains a
colon-separated list of filename templates. A filename template is a
filename that may contain substitution fields; a substitution field is
a % character followed by a single letter that indicates the value of
the substitution. The value of a substitution can either be a string
or it can be null. The entity manager transforms the list of filename
templates into a list of filenames by substituting for each
substitution field and discarding any template that contained a
substitution field whose value was null. It then uses the first
resulting filename that exists and is readable. Substitution values
are transformed before being used for substitution: firstly, any names
that were subject to upper case substitution are folded to lower case;
secondly, space characters are mapped to underscores and slashes are
mapped to percents. The value of the %S field is not transformed. The
values of substitution fields are as follows:
%% A single %.
%D The entity's data content notation. This substitution will
succeed only for external data entities.
%N The entity, notation or document type name.
%P The public identifier if there was a public identifier,
otherwise null.
%S The system identifier if there was a system identifier, otherwise
null.
%X (This is provided mainly for compatibility with ARCSGML.) A
three-letter string chosen as follows:
                                          With public identifier
                            No public     Device        Device
                            identifier    independent   dependent
Data or subdocument entity  nsd           pns           vns
General SGML text entity    gml           pge           vge
Parameter entity            spe           ppe           vpe
Document type definition    dtd           pdt           vdt
Link process definition     lpd           plp           vlp
The device dependent version is selected if the public text
class allows a public text display version but no public text
display version was specified.
%Y The type of thing for which the filename is being generated:
SGML subdocument entity     sgml
Data entity                 data
General text entity         text
Parameter entity            parm
Document type definition    dtd
Link process definition     lpd
The value of the following substitution fields will be null unless a
valid formal public identifier was supplied.
%A Null if the text identifier in the formal public identifier
contains an unavailable text indicator, otherwise the empty
string.
%C The public text class, mapped to lower case.
%E The public text designating sequence (escape sequence) if the
public text class is CHARSET, otherwise null.
%I The empty string if the owner identifier in the formal public
identifier is an ISO owner identifier, otherwise null.
%L The public text language, mapped to lower case, unless the
public text class is CHARSET, in which case null.
%O The owner identifier (with the +// or -// prefix stripped).
%R The empty string if the owner identifier in the formal public
identifier is a registered owner identifier, otherwise null.
%T The public text description.
%U The empty string if the owner identifier in the formal public
identifier is an unregistered owner identifier, otherwise null.
%V The public text display version. This substitution will be null
if the public text class does not allow a display version or if
no version was specified. If an empty version was specified, a
value of default will be used.
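The template expansion can be sketched as follows. This is an illustration under stated assumptions, not the sgmls code. The caller supplies the substitution values in a dictionary, with None for a null value. The folded parameter is a hypothetical way of saying which values are names subject to upper case substitution, and a letter with no value is treated as null.

```python
import os


def _transform(value, folded):
    """Transform a substitution value as described above (not used for %S)."""
    if folded:                    # a name that was subject to upper case substitution
        value = value.lower()
    return value.replace(" ", "_").replace("/", "%")


def expand_template(template, fields, folded=frozenset()):
    """Expand one filename template; return None if a used field is null.

    fields maps a field letter ("N", "P", ...) to a string or None.
    Letters missing from fields are treated as null (an assumption).
    """
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            letter = template[i + 1]
            i += 2
            if letter == "%":
                out.append("%")
                continue
            value = fields.get(letter)
            if value is None:
                return None       # discard the whole template
            out.append(value if letter == "S" else _transform(value, letter in folded))
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


def generate_filename(sgml_path, fields, folded=frozenset(), usable=None):
    """Return the first expanded filename that exists and is readable."""
    if usable is None:
        usable = lambda p: os.path.isfile(p) and os.access(p, os.R_OK)
    for template in sgml_path.split(":"):
        name = expand_template(template, fields, folded)
        if name is not None and usable(name):
            return name
    return None
```

For instance, with N set to MEMO (a folded name), Y set to dtd and no system identifier, an SGML_PATH of /a/%S:/b/%N.%Y drops the first template because %S is null and tries /b/memo.dtd.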
Normally if the external identifier for an entity includes a system
identifier, the entity manager will use the specified system identifier
and not attempt to generate one. If, however, SGML_PATH uses the %S
field, then the entity manager will first search for a matching entry
in the catalog entry files. If a match is found, then this will be
used instead of the specified system identifier. Otherwise, if the
specified system identifier does not contain any colons, the entity
manager will use SGML_PATH to generate a filename. Otherwise the
entity manager will use the specified system identifier.
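The decision just described can be written as a small function. Here catalog_match and generate are hypothetical callbacks standing for the catalog search and the SGML_PATH expansion. The case where no system identifier was specified follows the earlier paragraphs: try the catalog first, then SGML_PATH. What happens when generation finds no file is not stated here, so the sketch simply returns whatever generate returns, possibly None.

```python
def uses_field(sgml_path, letter):
    """True if some template in SGML_PATH contains the field %<letter>."""
    i = 0
    while i < len(sgml_path) - 1:
        if sgml_path[i] == "%":
            if sgml_path[i + 1] == letter:
                return True
            i += 2        # skip the whole field, so "%%S" is "%%" then a plain "S"
        else:
            i += 1
    return False


def choose_system_id(specified, sgml_path, catalog_match, generate):
    """Return the system identifier the entity manager would use."""
    if specified is None:
        match = catalog_match()
        return match if match is not None else generate()
    if not uses_field(sgml_path, "S"):
        return specified                  # normal case: use it as given
    match = catalog_match()
    if match is not None:
        return match                      # a catalog entry overrides it
    if ":" not in specified:
        return generate()                 # expand SGML_PATH, which uses %S
    return specified
```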
System declaration
The system declaration for sgmls is as follows:
SYSTEM "ISO 8879:1986"
          CHARSET
BASESET   "ISO 646-1983//CHARSET
          International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET   0 128 0
CAPACITY  PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
          FEATURES
MINIMIZE  DATATAG NO   OMITTAG  YES  RANK     NO   SHORTTAG YES
LINK      SIMPLE  NO   IMPLICIT NO   EXPLICIT NO
OTHER     CONCUR  NO   SUBDOC   YES 1  FORMAL   YES
SCOPE     DOCUMENT
SYNTAX    PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
SYNTAX    PUBLIC  "ISO 8879:1986//SYNTAX Core//EN"
          VALIDATE
          GENERAL YES  MODEL    YES  EXCLUDE  YES  CAPACITY YES
          NONSGML YES  SGML     YES  FORMAL   YES
          SDIF
          PACK    NO   UNPACK   NO
Exceeding a capacity limit will be ignored unless the -c option is
given.
The memory usage of sgmls is not a function of the capacity points used
by a document; however, sgmls can handle capacities significantly
greater than the reference capacity set.
In some environments, higher values may be supported for the SUBDOC
parameter.
Documents that do not use optional features are also supported. For
example, if FORMAL NO is specified in the SGML declaration, public
identifiers will not be required to be valid formal public identifiers.
Certain parts of the concrete syntax may be changed:
The shunned character numbers can be changed.
Eight bit characters can be assigned to LCNMSTRT, UCNMSTRT,
LCNMCHAR and UCNMCHAR.
Uppercase substitution can be performed or not performed both
for entity names and for other names.
Either short reference delimiters assigned by the reference
delimiter set or no short reference delimiters are supported.
The reserved names can be changed.
The quantity set can be increased within certain limits subject
to there being sufficient memory available. The upper limit on
NAMELEN is 239. The upper limits on ATTCNT, ATTSPLEN, BSEQLEN,
ENTLVL, LITLEN, PILEN, TAGLEN, and TAGLVL are more than thirty
times greater than the reference limits. The upper limit on
GRPCNT, GRPGTCNT, and GRPLVL is 253. NORMSEP cannot be changed.
DTAGLEN and DTEMPLEN are irrelevant because sgmls does not support
the DATATAG feature.
SGML declaration
The SGML declaration may be omitted, in which case the following declaration will be
implied:
<!SGML "ISO 8879:1986"
          CHARSET
BASESET   "ISO 646-1983//CHARSET
          International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET     0   9  UNUSED
            9   2  9
           11   2  UNUSED
           13   1  13
           14  18  UNUSED
           32  95  32
          127   1  UNUSED
CAPACITY  PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
SCOPE     DOCUMENT
SYNTAX    PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
          FEATURES
MINIMIZE  DATATAG NO   OMITTAG  YES  RANK     NO   SHORTTAG YES
LINK      SIMPLE  NO   IMPLICIT NO   EXPLICIT NO
OTHER     CONCUR  NO   SUBDOC   YES 99999999  FORMAL   YES
APPINFO   NONE>
with the exception that characters 128 through 254 will be assigned to
DATACHAR.
Sgmls identifies base character sets using the designating sequence in
the public identifier. The following designating sequences are
recognized:
Designating       ISO           Minimum    Number
Escape            Registration  Character  of          Description
Sequence          Number        Number     Characters
ESC 2/5 4/0       -             0          128         full set of ISO 646 IRV
ESC 2/8 4/0       2             33         94          G0 set of ISO 646 IRV
ESC 2/8 4/2       6             33         94          G0 set of ASCII
ESC 2/13 4/1      100           32         96          G1 set of ISO 8859-1
ESC 2/1 4/0       1             0          32          C0 set of ISO 646
ESC 2/2 4/3       77            0          32          C1 set of ISO 6429
ESC 2/5 2/15 3/0  -             0          256         the system character set
When one of the G0 sets is used as a base set, the characters SPACE and
DELETE are treated as occurring at positions 32 and 127 respectively;
although these characters are not part of the character sets designated
by the escape sequences, this mimics the behaviour of ISO 2022 with
respect to these code positions.
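The table and the SPACE/DELETE rule can be expressed as a lookup. This Python sketch assumes that the designating sequence is the last //-separated field of the base set's public identifier. It collapses runs of white space to single spaces before matching. The dictionary and function names are hypothetical.

```python
# (ISO registration number, minimum character number, number of characters,
#  description), copied from the table above; None stands for "-".
DESIGNATING_SEQUENCES = {
    "ESC 2/5 4/0": (None, 0, 128, "full set of ISO 646 IRV"),
    "ESC 2/8 4/0": (2, 33, 94, "G0 set of ISO 646 IRV"),
    "ESC 2/8 4/2": (6, 33, 94, "G0 set of ASCII"),
    "ESC 2/13 4/1": (100, 32, 96, "G1 set of ISO 8859-1"),
    "ESC 2/1 4/0": (1, 0, 32, "C0 set of ISO 646"),
    "ESC 2/2 4/3": (77, 0, 32, "C1 set of ISO 6429"),
    "ESC 2/5 2/15 3/0": (None, 0, 256, "the system character set"),
}

G0_SETS = {"ESC 2/8 4/0", "ESC 2/8 4/2"}


def base_set_positions(public_id):
    """Return the code positions covered by a base set's public identifier."""
    # Assumption: the designating sequence is the last //-separated field.
    seq = " ".join(public_id.rsplit("//", 1)[-1].split())
    if seq not in DESIGNATING_SEQUENCES:
        raise ValueError(f"unrecognized designating sequence: {seq!r}")
    _, first, count, _ = DESIGNATING_SEQUENCES[seq]
    positions = set(range(first, first + count))
    if seq in G0_SETS:
        positions |= {32, 127}    # SPACE and DELETE, mimicking ISO 2022
    return positions
```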
Output format
The output is a series of lines. Lines can be arbitrarily long. Each
line consists of an initial command character and zero or more
arguments. Arguments are separated by a single space, but when a
command takes a fixed number of arguments the last argument can contain
spaces. There is no space between the command character and the first
argument. Arguments can contain the following escape sequences.
\\ A \.
\n A record end character.
\| Internal SDATA entities are bracketed by these.
\nnn The character whose code is nnn octal.
A record start character will be represented by \012. Most
applications will need to ignore \012 and translate \n into newline.
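An application reading the output might decode arguments along these lines. This Python sketch follows the advice above, dropping \012 and turning \n into a newline. It assumes that \nnn always has exactly three octal digits, and the sdata callback is a hypothetical hook for giving internal SDATA entity text some meaning.

```python
def decode_esis(arg, sdata=lambda text: text):
    r"""Decode the escape sequences in one ESIS argument.

    \012 (record start) is dropped and \n (record end) becomes a newline.
    Text bracketed by \| ... \| is passed to the sdata callback, which by
    default returns it unchanged.
    """
    out = []
    sdata_buf = None          # a list while inside \| ... \|, else None
    i = 0
    while i < len(arg):
        if arg[i] != "\\":
            piece, i = arg[i], i + 1
        elif arg[i + 1] == "\\":
            piece, i = "\\", i + 2
        elif arg[i + 1] == "n":
            piece, i = "\n", i + 2
        elif arg[i + 1] == "|":
            i += 2
            if sdata_buf is None:
                sdata_buf = []
            else:
                out.append(sdata("".join(sdata_buf)))
                sdata_buf = None
            continue
        else:
            # Assumption: \nnn always has exactly three octal digits.
            code, i = int(arg[i + 1:i + 4], 8), i + 4
            piece = "" if code == 0o12 else chr(code)
        (out if sdata_buf is None else sdata_buf).append(piece)
    return "".join(out)
```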
The possible command characters and arguments are as follows:
(gi The start of an element whose generic identifier is gi. Any
attributes for this element will have been specified with A
commands.
)gi The end of an element whose generic identifier is gi.
-data Data.
&name A reference to an external data entity name; name will have been
defined using an E command.
?pi A processing instruction with data pi.
Aname val
The next element to start has an attribute name with value val
which takes one of the following forms:
IMPLIED
The value of the attribute is implied.
CDATA data
The attribute is character data. This is used for
attributes whose declared value is CDATA.
NOTATION nname
The attribute is a notation name; nname will have been
defined using an N command. This is used for attributes
whose declared value is NOTATION.
ENTITY name...
The attribute is a list of general entity names. Each
entity name will have been defined using an I, E or S
command. This is used for attributes whose declared
value is ENTITY or ENTITIES.
TOKEN token...
The attribute is a list of tokens. This is used for
attributes whose declared value is anything else.
Dename name val
This is the same as the A command, except that it specifies a
data attribute for an external entity named ename. Any D
commands will come after the E command that defines the entity
to which they apply, but before any & or A commands that
reference the entity.
Nnname Define a notation nname. This command will be preceded by a p
command if the notation was declared with a public identifier,
and by an s command if the notation was declared with a system
identifier. A notation will only be defined if it is to be
referenced in an E command or in an A command for an attribute
with a declared value of NOTATION.
Eename typ nname
Define an external data entity named ename with type typ (CDATA,
NDATA or SDATA) and notation nname. This command will be preceded
by one or more f commands giving the filenames generated by the
entity manager from the system and public identifiers, by a p
command if a public identifier was declared for the entity, and
by an s command if a system identifier was declared for the
entity. nname will have been defined using an N command. Data
attributes may be specified for the entity using D commands. An
external data entity will only be defined if it is to be
referenced in a & command or in an A command for an attribute
whose declared value is ENTITY or ENTITIES.
Iename typ text
Define an internal data entity named ename with type typ (CDATA
or SDATA) and entity text text. An internal data entity will
only be defined if it is referenced in an A command for an
attribute whose declared value is ENTITY or ENTITIES.
Sename Define a subdocument entity named ename. This command will be
preceded by one or more f commands giving the filenames
generated by the entity manager from the system and public
identifiers, by a p command if a public identifier was declared
for the entity, and by an s command if a system identifier was
declared for the entity. A subdocument entity will only be
defined if it is referenced in a { command or in an A command
for an attribute whose declared value is ENTITY or ENTITIES.
ssysid This command applies to the next E, S or N command and specifies
the associated system identifier.
ppubid This command applies to the next E, S or N command and specifies
the associated public identifier.
ffilename
This command applies to the next E or S command and specifies an
associated filename. There will be more than one f command for
a single E or S command if the system identifier used a colon.
{ename The start of the SGML subdocument entity ename; ename will have
been defined using an S command.
}ename The end of the SGML subdocument entity ename.
Llineno file
Llineno
Set the current line number and filename. The filename argument
will be omitted if only the line number has changed. This will
be output only if the -l option has been given.
#text An APPINFO parameter of text was specified in the SGML
declaration. This is not strictly part of the ESIS, but a
structure-controlled application is permitted to act on it. No
# command will be output if APPINFO NONE was specified. A #
command will occur at most once, and may be preceded only by a
single L command.
C This command indicates that the document was a conforming SGML
document. If this command is output, it will be the last
command. An SGML document is not conforming if it references a
subdocument entity that is not conforming.
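A minimal reader for these commands can be sketched in Python. It relies on the rule that only the last of a fixed number of arguments may contain spaces; the argument counts in _ARG_COUNT are read off the command descriptions above. Only a few commands are turned into events, escape sequences are left undecoded, and all names are hypothetical.

```python
# Commands that take more than one argument, and how many. The last
# argument may contain spaces; other commands take a single argument,
# except C, which takes none.
_ARG_COUNT = {"A": 2, "D": 3, "E": 3, "I": 3, "L": 2}


def parse_esis_line(line):
    """Split one output line into (command character, list of arguments)."""
    cmd, rest = line[0], line[1:]
    if cmd == "C":
        return cmd, []
    return cmd, rest.split(" ", _ARG_COUNT.get(cmd, 1) - 1)


def esis_events(lines):
    """Yield simple events; A commands are attached to the next start tag."""
    pending = {}
    for line in lines:
        cmd, args = parse_esis_line(line.rstrip("\n"))
        if cmd == "A":
            name, value = args
            kind, _, text = value.partition(" ")   # e.g. "CDATA some text"
            pending[name] = (kind, text)
        elif cmd == "(":
            yield ("start", args[0], pending)
            pending = {}
        elif cmd == ")":
            yield ("end", args[0])
        elif cmd == "-":
            yield ("data", args[0])                # escapes left undecoded
        elif cmd == "C":
            yield ("conforming",)
        # Other commands (E, N, L, ...) are ignored in this sketch.
```

Collecting A commands until the next ( command mirrors the rule that attributes are reported before the start of the element they belong to.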
BUGS
Some non-SGML characters in literals are counted as two characters for
the purposes of quantity and capacity calculations.
SEE ALSO
The SGML Handbook, Charles F. Goldfarb
ISO 8879 (Standard Generalized Markup Language), International
Organization for Standardization
ORIGIN
ARCSGML was written by Charles F. Goldfarb.
Sgmls was derived from ARCSGML by James Clark (jjc@jclark.com), to whom
bugs should be reported.