HUGE-SPLIT(1) User Contributed Perl Documentation HUGE-SPLIT(1)
NAME
huge-split.pl - Split bigram files from huge-count.pl into pieces.
DESCRIPTION
See perldoc huge-split.pl
USAGE
huge-split.pl [OPTIONS] SOURCE
INPUT
Required Arguments:
SOURCE
Input to huge-split.pl should be a file generated by huge-count.pl or
count.pl with the tokenlist option. The result files have the same name
as the input source file, and each split file has a sequence number as
its extension.
--split N
This parameter should be set. huge-split.pl divides the bigram
tokenlist generated by count.pl or huge-count.pl into parts. Each part
created with --split N will contain N lines. The value of N should be
chosen such that huge-sort.pl can be run efficiently on any part
containing N lines from the file that contains all the bigrams.
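The splitting behaviour described above can be illustrated with a minimal
Python sketch. This is not the huge-split.pl implementation. The function
name split_by_lines and the output naming scheme (SOURCE.1, SOURCE.2, ...)
are assumptions made for illustration, and in this sketch the final part
may hold fewer than N lines.

```python
import itertools
from pathlib import Path


def split_by_lines(source, lines_per_part):
    """Split `source` into consecutive parts of `lines_per_part` lines.

    Parts are written next to the source as <name>.1, <name>.2, ...
    (an assumed naming scheme, not taken from huge-split.pl itself).
    Returns the list of part paths in order.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")

    source = Path(source)
    parts = []
    with source.open("r", encoding="utf-8") as handle:
        for index in itertools.count(1):
            # Take at most `lines_per_part` lines without loading the file.
            chunk = itertools.islice(handle, lines_per_part)
            first = next(chunk, None)
            if first is None:
                break  # input exhausted, no empty part is created
            part = source.with_name(f"{source.name}.{index}")
            with part.open("w", encoding="utf-8") as out:
                out.write(first)
                out.writelines(chunk)
            parts.append(part)
    return parts
```

Lines are streamed one part at a time, so memory use does not depend on
the size of the whole input file.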
We suggest setting N to the number of KB of memory you have. For
example, if the computer has 8 GB of RAM, which is about 8,000,000 KB,
N should be set to 8000000.
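The rule of thumb above is simple arithmetic. As a small sketch, assuming
the decimal convention used in the example (1 GB = 1,000,000 KB), it can
be written as follows. The helper name suggested_split_size is
hypothetical and is not part of huge-split.pl.

```python
def suggested_split_size(ram_gb):
    """Return a suggested N for --split: the machine's RAM expressed in KB.

    Uses the decimal convention 1 GB = 1,000,000 KB, matching the
    8 GB -> 8000000 example in the text.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return int(ram_gb * 1_000_000)


# A machine with 8 GB of RAM gives N = 8000000.
n = suggested_split_size(8)
```

The resulting value would then be passed on the command line as
--split 8000000.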
Other Options:
--help
Displays this message.
--version
Displays the version information.
AUTHOR
Amruta Purandare, Ted Pedersen, Ying Liu. University of Minnesota at
Duluth.
COPYRIGHT
Copyright (c) 2004-2011
Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu
Ying Liu, University of Minnesota, Twin Cities. liux0395@umn.edu
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.
perl v5.20.2 2011-03-31 HUGE-SPLIT(1)