VOCABULARY(1)         User Contributed Perl Documentation        VOCABULARY(1)

NAME

vocabulary -- extract vocabularies from Penn treebank files

SYNOPSIS

vocabulary [-NT ntfile] [-POS posfile] [-word wordfile] [-count] [-binarized] [-verbose] file1 [file2...]

File1, file2 etc. are the names of Penn treebank files. If none are specified, STDIN is used.

OPTIONS

-NT ntfile
    Write the non-terminal node vocabulary to ntfile.

-POS posfile
    Write the part of speech vocabulary to posfile.

-word wordfile
    Write the word vocabulary to wordfile.

-count
    Print the frequency counts for each of the categories.

-binarized
    The file is in binarized format.

-verbose
    Print filenames as they are processed.

DESCRIPTION

Given a list of Penn treebank files, this script extracts the words, parts of speech, and non-terminal node names and emits each in a separate file in order of frequency. Note that giving a "-" argument for any of ntfile, posfile, or wordfile causes the results to be written to STDOUT.
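
The extraction described above can be illustrated with a short Python sketch. This is not the script's own code (the script is written in Perl and its internals are not shown here); it is a minimal approximation that assumes ordinary bracketed Penn treebank input, treats every "(TAG word)" pair as a part of speech plus a word, and treats every other labeled bracket as a non-terminal. The names extract_vocabularies and by_frequency are hypothetical, and the binarized format is not handled.

```python
import re
from collections import Counter

# Split bracketed treebank text into "(", ")" and bare symbols.
TOKEN = re.compile(r"\(|\)|[^\s()]+")
PARENS = ("(", ")")


def extract_vocabularies(text):
    """Count non-terminals, parts of speech, and words in bracketed trees.

    Hypothetical helper: an approximation of the behaviour described in
    this manual page, not the original implementation.
    """
    nts, pos, words = Counter(), Counter(), Counter()
    tokens = TOKEN.findall(text)
    i = 0
    while i < len(tokens):
        # A labeled bracket is "(" followed by a symbol.
        if tokens[i] == "(" and i + 1 < len(tokens) and tokens[i + 1] not in PARENS:
            label = tokens[i + 1]
            if i + 2 < len(tokens) and tokens[i + 2] not in PARENS:
                # "(TAG word)": a preterminal with its word.
                pos[label] += 1
                words[tokens[i + 2]] += 1
                i += 3
            else:
                # "(LABEL (" ...: an internal, non-terminal node.
                nts[label] += 1
                i += 2
            continue
        i += 1
    return nts, pos, words


def by_frequency(counter):
    """Return (item, count) pairs, most frequent first."""
    return counter.most_common()


sample = "( (S (NP (DT The) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat)))) )"
nts, pos, words = extract_vocabularies(sample)
for item, count in by_frequency(nts):
    print(item, count)
```

Whether the script itself counts empty elements such as traces, or how it orders items with equal frequency, is not stated in this page; the sketch simply counts every labeled bracket it sees and leaves tie order to Counter.most_common.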

AUTHOR

W.P. McNeill <billmcn@ssli.ee.washington.edu>

perl v5.20.2                      2005-01-05                     VOCABULARY(1)
