VOCABULARY(1) User Contributed Perl Documentation VOCABULARY(1)
NAME
vocabulary -- extract vocabularies from Penn treebank files
SYNOPSIS
vocabulary [-NT ntfile] [-POS posfile] [-word wordfile] [-count]
[-binarized] [-verbose] file1 [file2...]
The arguments file1, file2, etc. are the names of Penn treebank files.
If none are specified, input is read from STDIN.
OPTIONS
NT
Write the non-terminal node vocabulary to ntfile.
POS
Write the part-of-speech vocabulary to posfile.
word
Write the word vocabulary to wordfile.
count
Print the frequency counts for each of the categories.
binarized
The input files are in binarized format.
verbose
Print filenames as they are processed.
DESCRIPTION
Given a list of Penn treebank files, this script extracts the words,
parts of speech, and non-terminal node names and emits each in a
separate file in order of frequency.
Note that giving a "-" argument for any of ntfile, posfile, or wordfile
causes the results to be written to STDOUT.
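The extraction described above can be illustrated with a minimal Python sketch. It is not the script's actual implementation (the script itself is written in Perl). It assumes plain bracketed Penn treebank trees, does not handle the binarized format, and treats "in order of frequency" as descending frequency with alphabetical tie-breaking. The function names count_vocabularies, by_frequency, and write_vocabulary, and the tab-separated count output, are hypothetical.

```python
import re
import sys
from collections import Counter

# A bracketed tree is split into "(", ")", and runs of other non-space characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")
PARENS = {"(", ")"}


def count_vocabularies(text):
    """Return (nonterminals, pos_tags, words) Counters for the trees in text."""
    nts, pos, words = Counter(), Counter(), Counter()
    tokens = TOKEN.findall(text)
    i = 0
    while i < len(tokens):
        if tokens[i] == "(" and i + 1 < len(tokens) and tokens[i + 1] not in PARENS:
            label = tokens[i + 1]
            # A preterminal has the shape "(TAG word)": label, one word, then ")".
            if (i + 3 < len(tokens)
                    and tokens[i + 2] not in PARENS
                    and tokens[i + 3] == ")"):
                pos[label] += 1
                words[tokens[i + 2]] += 1
                i += 4
                continue
            # Any other labelled node is counted as a non-terminal.
            nts[label] += 1
            i += 2
            continue
        i += 1
    return nts, pos, words


def by_frequency(counter):
    """Entries sorted by descending count, then alphabetically (an assumption)."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))


def write_vocabulary(counter, dest, with_counts=False):
    """Write one entry per line to dest; a dest of "-" means STDOUT."""
    lines = [f"{item}\t{n}" if with_counts else item
             for item, n in by_frequency(counter)]
    out = sys.stdout if dest == "-" else open(dest, "w", encoding="utf-8")
    try:
        out.write("".join(line + "\n" for line in lines))
    finally:
        if out is not sys.stdout:
            out.close()
```

Calling count_vocabularies on the text of a treebank file and passing each resulting counter to write_vocabulary with a file name or "-" roughly mirrors the -NT, -POS, and -word options; with_counts=True stands in for -count.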
AUTHOR
W.P. McNeill <billmcn@ssli.ee.washington.edu>
perl v5.20.2 2005-01-05 VOCABULARY(1)