TEXTMAIL(1) User Contributed Perl Documentation TEXTMAIL(1)
NAME
textmail - mail filter to replace MS Word/HTML attachments with plain
text
SYNOPSIS
usage: textmail [options]
options:
-h - Print the help message then exit
-m - Print the manpage then exit
-w - Print the manpage in html format then exit
-r - Print the manpage in nroff format then exit
-M - Output in mailbox format (mboxrd)
-T - Output in raw mail format (for smtp)
-W - Don't replace MS Word attachments with text
-E - Don't replace MS Excel attachments with csv
-H - Don't replace HTML attachments with text
-R - Don't replace RTF attachments with text
-P - Don't replace PDF attachments with text
-U - Don't translate winmail.dat attachments
-L - Don't reduce appledouble attachments
-I - Don't delete image attachments
-A - Don't delete audio attachments
-V - Don't delete video attachments
-X - Don't delete MS Windows executable attachments
-B - Don't recode text that was base64-encoded
-S - Don't replace spaces in filenames with underscores
-Z - Do translate signed content (discards signatures)
-O - Delete all application/octet-stream attachments
-! - Delete all application/* attachments
-D hdrs - Delete headers (list of header prefixes and filenames)
-K types - Keep attachments (list of mimetypes and filenames)
-f - On translation error, keep translation, not original
-? - Print paths of helper applications then exit
DESCRIPTION
textmail filters a mail message or mbox, replacing MS Word, MS Excel,
HTML, RTF and PDF attachments with the plain text contained therein.
By default, the following attachments are also deleted: image, audio,
video and MS Windows executables. MS "winmail.dat" attachments are
replaced by any attachments contained therein, which are then replaced
by text or deleted in the same fashion. Any of these actions can be
suppressed with the command line options. Mail headers can also be
selectively deleted.
This is useful for increasing the accessibility of mail messages (by
reducing their dependence on proprietary file formats), for
dramatically reducing their size (and the time it takes to download
them and the time it takes to read them), and for dramatically reducing
the risk of mail-borne viruses. Its intended use is as a preprocessor
for mailing lists. This is more friendly than a strict "No Attachments"
policy.
OPTIONS
"-h"
Print the help message then exit.
"-m"
Print the manpage then exit. This is equivalent to executing "man
textmail" but this works even when the manpage isn't installed.
"-w"
Print the manpage in html format then exit. This lets you install
the manpage in html format with a command like:
    mkdir -p /usr/local/share/doc/textmail/html &&
    textmail -w > /usr/local/share/doc/textmail/html/textmail.1.html
"-r"
Print the manpage in nroff format then exit. This lets you install
the manpage with a command like:
    textmail -r > /usr/local/share/man/man1/textmail.1
"-M"
This option causes the output to be in mboxrd format by adding a
mailbox "From" line at the top if there isn't one already and by
ensuring that there is a blank line at the bottom of the output. It
also performs mailbox quoting on any lines in the body that look
like mailbox "From" headers. Use this when the output is to be
stored directly in a mailbox file. It is not necessary when
textmail is being used as a mail filter by procmail(1).
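The mailbox quoting performed on body lines can be sketched with
sed(1). This is only an illustration of the usual mboxrd rule (a line
made of zero or more ">" characters followed by "From " gains one more
">"), not textmail's own code. The function name mboxrd_quote is
hypothetical.

```shell
# Hypothetical helper, not part of textmail: apply mboxrd-style quoting
# to message body lines read from stdin.
mboxrd_quote() {
    # Prefix one '>' to any line that starts with zero or more '>'
    # followed by "From " (note the trailing space).
    sed 's/^\(>*From \)/>\1/'
}

printf '%s\n' 'From here on it gets odd' \
              '>From a quoted reply' \
              'Nothing special' | mboxrd_quote
```

Lines such as "From: someone" are left alone because the rule requires
a space immediately after "From".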
"-T"
This option causes the output to be in raw mail format by removing
any mailbox "From" line and by not performing mailbox quoting. Use
this when the output is to be sent directly to an SMTP server. It
is not necessary when textmail is being used as a mail filter by
procmail(1).
"-W"
By default, textmail replaces MS Word attachments with inline plain
text attachments that contain just the plain text within the
original document. This option leaves MS Word attachments intact.
"-E"
By default, textmail replaces MS Excel attachments with CSV file
attachments that contain just the data within the original
document. This option leaves MS Excel attachments intact.
"-H"
By default, textmail replaces HTML attachments with inline plain
text attachments that contain just the text within the original
document. It also reduces text-versus-html alternative attachments
to just the text attachment. This option leaves HTML (and
alternative) attachments intact.
"-R"
By default, textmail replaces RTF attachments with inline plain
text attachments that contain just the plain text within the
original document. This option leaves RTF attachments intact.
"-P"
By default, textmail replaces PDF attachments with inline plain
text attachments that contain just the plain text within the
original document. This option leaves PDF attachments intact.
"-U"
By default, textmail replaces MS TNEF (i.e. "winmail.dat")
attachments with the attachments contained therein which are then
translated to text as normal. This option leaves "winmail.dat"
attachments intact. This option, together with the "-!" option, will
cause winmail.dat attachments to be deleted rather than translated.
"-L"
By default, textmail replaces "multipart/appledouble" attachments
with just the data fork attachment contained therein which is then
translated to text as normal. This option leaves appledouble
attachments intact. However, the data fork attachment will still be
translated as normal, resulting in a probably inappropriate and
possibly broken resource fork attachment. Therefore, this option
should probably only be used in conjunction with other options that
suppress the translation of the data fork attachment.
"-I"
By default, textmail deletes image attachments. This option leaves
image attachments intact.
"-A"
By default, textmail deletes audio attachments. This option leaves
audio attachments intact.
"-V"
By default, textmail deletes video attachments. This option leaves
video attachments intact.
"-X"
By default, textmail deletes attachments containing MS Windows
executables. That means "application/octet-stream" attachments with
the following filename extensions: "com", "exe", "pif", "dll",
"ocx", "scr", "vbs" and "js". This option leaves MS Windows
executable attachments intact. To delete "zip" files as well, you
could use either the "-O" option or the "-!" option.
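The default rule described above can be sketched as a small shell
test. This is an illustration only, not textmail's own code: the
function name is_windows_executable is hypothetical, and matching the
extension case-insensitively is an assumption.

```shell
# Hypothetical helper, not part of textmail: succeed if an attachment
# with mimetype $1 and filename $2 counts as an MS Windows executable
# under the rule described above.
is_windows_executable() {
    [ "$1" = "application/octet-stream" ] || return 1
    case "$2" in
        *.*) ;;           # filename has an extension, keep checking
        *) return 1 ;;    # no extension at all
    esac
    # Lowercase the text after the last dot (assumption: case is ignored).
    ext=$(printf '%s' "${2##*.}" | tr 'A-Z' 'a-z')
    case "$ext" in
        com|exe|pif|dll|ocx|scr|vbs|js) return 0 ;;
        *) return 1 ;;
    esac
}

for name in setup.EXE notes.txt archive.zip; do
    if is_windows_executable application/octet-stream "$name"; then
        echo "$name: treated as an executable"
    else
        echo "$name: not treated as an executable"
    fi
done
```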
"-B"
By default, when text is encountered that is "base64"-encoded,
textmail will recode it as either "7bit" or "quoted-printable",
whichever is appropriate. This option suppresses this recoding.
Note that if the text is large enough and contains a high enough
proportion of non-ASCII characters, it will remain "base64"-encoded
to minimise space.
"-S"
When translating attachments, textmail replaces bad filename
characters such as space characters with the underscore character.
This option causes underscore characters to subsequently be
converted into space characters. In other words, you can use this
option to preserve space characters in attachment filenames (other
bad filename characters will then be converted to spaces as well).
"-Z"
By default, textmail will not translate "multipart/signed"
attachments. This option causes "multipart/signed" attachments to
be replaced by the signed attachment contained therein, discarding
the signature control data. The no-longer-signed data is then
translated to text as normal. Note that "multipart/encrypted"
attachments are never translated.
"-O"
Delete all "application/octet-stream" attachments, not just MS
Windows executables. Note that this overrides "-X" but "-K"
overrides this.
"-!"
Delete all "application/*" attachments. Note that this overrides
"-X" but "-K" overrides this. Also note that translated documents
are no longer "application/*" attachments so they aren't deleted
unless their translation is suppressed with the appropriate command
line option.
"-D" hdrs
Delete particular headers. The hdrs argument is a comma separated
list of header name prefixes and/or the names of files containing
header name prefixes (blank lines, whitespace and shell style
comments are ignored). For example, "textmail -DX-" deletes all
headers whose names begin with "X-".
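Prefix-based header deletion can be sketched with awk(1). This is an
illustration only, not textmail's own code: the function name
delete_headers is hypothetical, it takes a single prefix rather than a
list, and matching case-insensitively is an assumption.

```shell
# Hypothetical helper, not part of textmail: read one message on stdin
# and drop headers whose names start with the prefix given as $1.
delete_headers() {
    awk -v prefix="$1" '
        BEGIN    { inbody = 0; drop = 0; prefix = tolower(prefix) }
        inbody   { print; next }               # body lines pass through
        /^$/     { inbody = 1; print; next }   # blank line ends the headers
        /^[ \t]/ { if (!drop) print; next }    # continuation follows its header
                 { drop = (index(tolower($0), prefix) == 1)
                   if (!drop) print }
    '
}

printf '%s\n' 'Subject: hello' 'X-Mailer: example' ' (continued)' \
              'To: list@example.org' '' 'X- in the body is untouched' |
    delete_headers X-
```

Only the header block is examined, so body lines that happen to begin
with the prefix are kept.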
"-K" types
By default, textmail deletes several types of non-text attachment.
The "-O" and "-!" options delete even more. This option specifies,
by mimetype and/or filename extension, a list of attachments not to
delete. This overrides all deletions.
The types argument is a comma separated list of mimetypes and/or
filename extensions and/or the names of files containing mimetypes
and/or filename extensions (blank lines, whitespace and shell style
comments are ignored). An element that contains a slash character is
interpreted as a complete mimetype. An element without a slash is
interpreted either as the "*" in "application/*" or as a filename
extension. For example, "textmail -Wf!Kdoc" deletes all
"application/*" attachments except MS Word documents.
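The interpretation of each element can be sketched as follows. This
is an illustration only, not textmail's own code: the function name
explain_keep_list is hypothetical, and the sketch ignores elements
that name files.

```shell
# Hypothetical helper, not part of textmail: show how each element of a
# comma separated types list would be read under the rule above.
explain_keep_list() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r element; do
        case "$element" in
            '')  ;;    # skip empty elements
            */*) printf '%s: complete mimetype\n' "$element" ;;
            *)   printf '%s: application/%s or filename extension .%s\n' \
                        "$element" "$element" "$element" ;;
        esac
    done
}

explain_keep_list 'ps,pdf,image/png'
```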
"-f"
By default, whenever textmail is unable to translate an attachment
into text, it leaves the attachment intact. This happens when the
requisite translation software can't be found, when it runs but
returns an error code, and when it produces an empty file. It also
happens when "winmail.dat" attachments are corrupt. This option
causes the empty translation to take the place of the original
attachment. Only the name of the attachment is preserved. This is
needed to ensure plain text even in the face of an MS Word document
that contains no text (e.g. only images).
"-?"
Print the paths of all helper applications then exit.
EXAMPLES
A procmail(1) recipe that insists on pure text and no "X-" headers
(with output in mailbox format):
    :0 fw
    | textmail -Mf!DX-
Do the same but to an existing mailbox file:
    textmail -Mf!DX- < mailbox > mailbox-as-text
Delete all "application/*" attachments except for PostScript and PDF
(and don't translate PDF into text):
    textmail -!PKps,pdf
Delete all "application/*" attachments except for zip files and gzipped
tar files:
    textmail -!Ktar.gz,zip
A procmail(1) recipe that just unpacks winmail.dat attachments but
doesn't translate the attachments contained therein into text and
doesn't delete images, audio, video or MS Windows executables (with
output in mailbox format):
    :0 fw
    | textmail -MWEHRPLIAVXS
REQUIREMENTS
MS Word and RTF documents are translated into plain text using
antiword(1) or catdoc(1). If textmail can't find antiword(1) or
catdoc(1), then MS Word and RTF attachments are left intact. So make
sure that antiword(1) or catdoc(1) is installed and in the $PATH.
MS Excel documents are translated into csv files using xls2csv(1). If
textmail can't find xls2csv(1), then MS Excel attachments are left
intact. So make sure that xls2csv(1) is installed and in the $PATH.
HTML documents are translated into plain text using lynx(1). If
textmail can't find lynx(1), then HTML attachments are left intact. So
make sure that lynx(1) is installed and in the $PATH.
PDF documents are translated into plain text using pdftotext(1). If
textmail can't find pdftotext(1), then PDF attachments are left intact.
So make sure that pdftotext(1) is installed and in the $PATH.
textmail also requires perl(1) and pod2man(1) and pod2html(1) (which
come with perl(1)) and mktemp(1).
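Before deploying textmail, it may help to check which of the helpers
named above are missing from the $PATH. The following is a minimal
sketch using the standard "command -v" shell builtin. The function
name missing_helpers is hypothetical. textmail's own "-?" option
prints the helper paths it has found.

```shell
# Hypothetical helper, not part of textmail: print the name of each
# given program that cannot be found in the $PATH.
missing_helpers() {
    for helper in "$@"; do
        command -v "$helper" >/dev/null 2>&1 || printf '%s\n' "$helper"
    done
}

# Only one of antiword and catdoc is needed for MS Word and RTF.
missing_helpers antiword catdoc xls2csv lynx pdftotext mktemp
```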
If textmail fails to create a temporary directory, or if it is
instructed to do nothing (i.e. "-WEHRPULIAVX"), then it degenerates
into cat(1).
CAVEAT
The latest version of xls2csv(1) at the time of writing (i.e.
catdoc-0.93.3) loses data.
If textmail is unable to create a temporary directory (in "/tmp"), then
it degenerates into cat(1). Without a temporary directory, no
attachments will be translated or deleted no matter what options (even
"-f") were given to textmail. So make sure that "/tmp" is writable.
Also make sure that mktemp(1) is available; otherwise, an insecure
temporary directory will be created.
SEE ALSO
procmail(1), antiword(1), catdoc(1), xls2csv(1), lynx(1), pdftotext(1),
pod2man(1), pod2html(1), "http://raf.org/minimail/"
AUTHOR
20070803 raf <raf@raf.org>
URL
"http://raf.org/textmail/"
perl v5.20.3 2016-02-18 TEXTMAIL(1)