DragonFly On-Line Manual Pages

TREEBANKFREQ(1)       User Contributed Perl Documentation      TREEBANKFREQ(1)

NAME

treebankFreq.pl - Compute Information Content from Penn Treebank 2

SYNOPSIS

treebankFreq.pl [--outfile=OUTFILE [--stopfile=STOPFILE] [--wnpath=WNPATH] [--resnik] [--smooth=SCHEME] PATH | --help | --version]

DESCRIPTION

This program reads the Penn Treebank, Release 2, from the Linguistic Data Consortium, <http://ldc.upenn.edu>, and computes the frequency count of each synset in WordNet. The Lin, Resnik, and Jiang & Conrath measures of semantic relatedness use these frequency counts to calculate the information content of concepts. The output is written in the format required by the WordNet::Similarity modules for computing semantic relatedness. A more detailed description of how information content is calculated can be found in the documentation of rawtextFreq.pl; this program uses exactly the same techniques.
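As a rough illustration only, the following Python sketch shows the standard idea behind information content. A concept's frequency includes the counts of every concept it subsumes, and IC(c) = -log(freq(c) / freq(root)). The hierarchy, the counts, and the function names are hypothetical toy data. The real program works over WordNet synsets, and its handling of details such as multiple roots or unseen synsets may differ.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent); NOT real WordNet data.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "animal",
    "cat": "animal",
    "animal": None,  # root of the toy hierarchy
}

def propagate(counts, parent):
    """Add each concept's own count to itself and to every ancestor."""
    freq = {c: 0.0 for c in parent}
    for concept, n in counts.items():
        node = concept
        while node is not None:
            freq[node] += n
            node = parent[node]
    return freq

def information_content(freq, root):
    """IC(c) = -log P(c), with P(c) = freq(c) / freq(root).

    Concepts with zero frequency are skipped because their IC is undefined.
    """
    total = freq[root]
    return {c: -math.log(f / total) for c, f in freq.items() if f > 0}

# Hypothetical observed counts for the leaf concepts.
counts = {"dog": 3, "wolf": 1, "cat": 4}
freq = propagate(counts, PARENT)
ic = information_content(freq, "animal")
```

Here the root always has IC 0, and more specific concepts such as "dog" receive higher values than their ancestors because they are subsumed by fewer observations.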

OPTIONS

--outfile=filename
    The name of the file to which output should be written.

--stopfile=filename
    A file containing a list of stop-listed words that will not be considered in the frequency counts. A sample file can be downloaded from http://www.d.umn.edu/~tpederse/Group01/WordNet/words.txt

--wnpath=path
    Location of the WordNet data files (e.g., /usr/local/WordNet-3.0/dict).

--resnik
    Use Resnik (1995) frequency counting.

--smooth=SCHEME
    The smoothing scheme to apply to the computed probabilities. SCHEME can only be ADD1 at this time.

--help
    Show a help message.

--version
    Display version information.

PATH
    Path to the raw Wall Street Journal portion of the Treebank corpus. This is usually the /raw/wsj subdirectory of the Treebank installation. Thus, you might run this program as:

        treebankFreq.pl [OPTIONS] /home/sid/treebank/raw/wsj
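The --stopfile, --resnik, and --smooth options all affect how raw counts are collected. The Python sketch below is a hypothetical illustration of those three ideas, not the program's actual code. It assumes that by default every sense of a word receives the full count, that Resnik-style counting splits each occurrence evenly among the word's senses, and that ADD1 adds one to the count of every concept, seen or unseen. The sense inventory and all names are made up for the example.

```python
# Hypothetical sense inventory: word -> concepts it can denote (toy data).
SENSES = {
    "bank": ["river_bank", "financial_bank"],
    "money": ["money"],
}

def count_concepts(tokens, senses, stopwords, resnik=False):
    """Collect concept counts from tokens, skipping stop-listed words."""
    counts = {}
    for tok in tokens:
        if tok in stopwords:
            continue  # stop-listed words are ignored entirely
        concepts = senses.get(tok, [])
        if not concepts:
            continue  # word not in the (toy) sense inventory
        # Assumed default: each sense gets the full count.
        # Assumed Resnik-style: the count is split evenly among the senses.
        share = 1.0 / len(concepts) if resnik else 1.0
        for c in concepts:
            counts[c] = counts.get(c, 0.0) + share
    return counts

def add_one(counts, all_concepts):
    """Assumed ADD1 smoothing: add 1 to every concept, seen or unseen."""
    return {c: counts.get(c, 0.0) + 1.0 for c in all_concepts}

tokens = ["the", "bank", "money", "bank"]
stop = {"the"}
plain = count_concepts(tokens, SENSES, stop)
resnik = count_concepts(tokens, SENSES, stop, resnik=True)
smoothed = add_one(resnik, ["river_bank", "financial_bank", "money", "loan"])
```

Splitting the count keeps an ambiguous word from contributing more total mass than an unambiguous one, while smoothing ensures that unseen concepts (such as "loan" above) still get a nonzero probability.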

BUGS

Report bugs to the WordNet::Similarity mailing list: <http://groups.yahoo.com/group/wn-similarity>

SEE ALSO

WordNet::Similarity

Penn Treebank: <http://ldc.upenn.edu>

WordNet home page: <http://wordnet.princeton.edu>

WordNet::Similarity home page: <http://wn-similarity.sourceforge.net>

AUTHORS

Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
banerjee+ at cs.cmu.edu

Siddharth Patwardhan, University of Utah, Salt Lake City
sidd at cs.utah.edu

COPYRIGHT

Copyright (c) 2005-2008, Ted Pedersen, Satanjeev Banerjee, and Siddharth Patwardhan.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

perl v5.20.2                      2015-08-31                    TREEBANKFREQ(1)