IPAGGCREATE(1) IPAGGCREATE(1)
NAME
ipaggcreate - produce aggregate statistics of network traffic or trace
SYNOPSIS
ipaggcreate [-r | -i | --netflow-summary] [--src, --dst, --sport,
--dport, ...] [other options] [files or interfaces]
DESCRIPTION
The ipaggcreate program reads IP packets from one or more data sources,
maps each packet to a label (such as "source address 192.4.10.9" or
"length 10"), and outputs a simply-formatted "aggregate" file reporting
the number of packets or bytes observed per label. The resulting file
is easy to process with text-based tools. (But see the --binary
option, which generates a compressed, quick-to-process binary file.)
Here are a few lines of ipaggcreate output, from `ipaggcreate -s
/home/kohler/largedump.gz':
!IPAggregate 1.0
!creator "src/ipaggcreate -s /home/kohler/largedump.gz"
!counts packets
!times 976937726.638704 977337361.804592 399635.165888
!num_nonzero 1437
!ip
4.2.49.2 1
4.2.49.4 1
4.17.143.9 1
4.21.203.29 104
The `-s' option, which is equivalent to `--src', tells ipaggcreate to
categorize each packet by its source IP address.
`/home/kohler/largedump.gz' is a compressed tcpdump(1) file. Each data
line represents a label; the first field is the label number (here, an
IP source address), and the second field the number of packets that had
that label. Labels with 0 counts are not reported.
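Because the text format is line-oriented, it can be read with a few lines of script. The following Python sketch is only an illustration, not part of ipaggcreate: the function name parse_aggregate is hypothetical, and it assumes that every header line starts with "!" and that every data line holds exactly two whitespace-separated fields, as in the sample above.

```python
def parse_aggregate(text):
    """Split ipaggcreate-style text output into header annotations and counts."""
    headers = {}
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("!"):
            # Header lines look like "!name value"; some ("!ip") carry no value.
            name, _, value = line[1:].partition(" ")
            headers[name] = value
        else:
            # Data lines: label first, count second.
            label, count = line.split()
            counts[label] = int(count)
    return headers, counts


sample = """\
!IPAggregate 1.0
!counts packets
!ip
4.2.49.2 1
4.21.203.29 104
"""
headers, counts = parse_aggregate(sample)
print(headers["counts"], counts["4.21.203.29"])
```

Labels are kept as strings here, so the same sketch works whether the label is an IP address or a plain number.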
OPTIONS
Data Sources
Data source options tell ipaggcreate what kind of data source to use:
tcpdump(1) raw-packet files (--tcpdump), live network interfaces
(--interface), NetFlow summary files (--netflow-summary), ipsumdump
output files (--ipsumdump), DAG or NLANR-formatted files (--dag,
--nlanr), or others.
Non-option arguments specify the files, or interfaces, to read. For
example, `ipaggcreate -r eth0 eth1' will read two tcpdump(1) files,
named "eth0" and "eth1"; `ipaggcreate -i eth0 eth1' will read from two
live network interfaces, "eth0" and "eth1".
Options that read files read from the standard input when you supply a
single dash "-" as a filename, or when you give no filenames at all.
--tcpdump, -r
Read from one or more files produced by tcpdump(1)'s -w option
(also known as "pcap files"). Stop when all the files are
exhausted. This is the default. Files (except for standard input)
may be compressed by gzip(1) or bzip2(1); ipaggcreate will uncompress
them on the fly.
--interface, -i
Read from live network interfaces. When run this way, ipaggcreate
will continue until interrupted with SIGINT or SIGHUP. When
stopped, ipaggcreate appends a comment to its output file, indicating
how many packets were dropped by the kernel before output.
--ipsumdump
Read from one or more ipsumdump files. Any packet characteristics
not specified by the input files are set to 0.
--format=format
Read from one or more ipsumdump files, using the specified default
format. The format should be a space-separated list of content
types; see ToIPSummaryDump(n) for a list.
--dag[=encap]
Read from one or more DAG-formatted trace files. For new-style ERF
dumps, which contain encapsulation type information, just say
--dag. For old-style dumps, you must supply the right encap
argument: "ATM" for ATM RFC-1483 encapsulation (the most common),
"ETHER" for Ethernet, "PPP" for PPP, "IP" for raw IP, "HDLC" for
Cisco HDLC, "PPP_HDLC" for PPP HDLC, or "SUNATM" for Sun ATM. See
<http://dag.cs.waikato.ac.nz/>.
--nlanr
Read from one or more NLANR-formatted trace files (fr, fr+, or tsh
format). See <http://pma.nlanr.net/Traces/>.
--ip-addresses
Read files containing IP addresses, one address per line. The
label must be either --src or --dst.
--tu-summary
Read TCP/UDP summary files. Each line represents one packet, and
carries the following information: timestamp, source address,
source port, destination address, destination port, protocol,
payload length. For example:
976937735.345744 18.26.4.9 22 64.55.139.202 26876 T 0
976937770.197008 128.10.5.110 63749 64.55.139.202 113 T 5
--bro-conn-summary
Read Bro connection summary files. Each line represents one
connection attempt, and carries the following information:
timestamp, source address, destination address, direction
(inbound/outbound).
--netflow-summary
Read from one or more NetFlow summary files. These are line-
oriented ASCII files; blank lines, and lines starting with '!' or
'#', are ignored. Other lines should contain 15 or more fields
separated by vertical bars '|'. Ipaggcreate pays attention to some
of these fields:
Field  Meaning                       Example
-----  ----------------------------  ----------
0      Source IP address             192.4.1.32
1      Destination IP address        18.26.4.44
5      Packet count in flow          5
6      Byte count in flow            10932
7      Flow timestamp (UNIX-style)   998006995
8      Flow end timestamp            998006999
9      Source port                   3917
10     Destination port              80
12     TCP flags (OR of all pkts)    18
13     IP protocol                   6
14     IP TOS bits                   0
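As an illustration of the field layout in the table, here is a minimal Python sketch that picks those fields out of one summary line. It is not ipaggcreate's own parser. The names FIELDS and parse_netflow_line are hypothetical, the numeric fields are assumed to be decimal integers (as the example column suggests), and the sample line fills the fields that the table does not describe with a placeholder "0".

```python
# Field index -> name, following the table in this manual page.
FIELDS = {
    0: "src", 1: "dst", 5: "packets", 6: "bytes", 7: "start", 8: "end",
    9: "sport", 10: "dport", 12: "tcp_flags", 13: "proto", 14: "tos",
}


def parse_netflow_line(line):
    """Return a dict for one summary line, or None for ignored lines."""
    line = line.strip()
    # Blank lines and lines starting with '!' or '#' are ignored.
    if not line or line[0] in "!#":
        return None
    parts = line.split("|")
    if len(parts) < 15:
        raise ValueError("expected 15 or more '|'-separated fields")
    record = {}
    for index, name in FIELDS.items():
        value = parts[index].strip()
        # Addresses stay as strings; everything else is assumed numeric.
        record[name] = value if name in ("src", "dst") else int(value)
    return record


# Example values from the table; unused fields 2, 3, 4 and 11 are placeholders.
line = "192.4.1.32|18.26.4.44|0|0|0|5|10932|998006995|998006999|3917|80|0|18|6|0"
print(parse_netflow_line(line))
```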
--tcpdump-text
Read from one or more files containing tcpdump(1) textual output.
It's much better to use the binary files produced by 'tcpdump -w',
but if someone threw those away and all you have is the ASCII
output, you can still make do. Only works with tcpdump versions
3.7 and earlier.
Label
These options determine how packets are labeled; you can supply at most
one.
--src, -s
Label by IP source address; all packets with the same source
address form an aggregate.
--dst, -d
Label by IP destination address. This is the default.
--length, -l
Label by IP length.
--ip field
Label by the named IP field. Examples include "ip src" (equivalent
to --src), "ip ttl", "ip off", "udp sport", and so forth. See
AggregateIP(1) for a full list.
--flows
Label by TCP or UDP flow, or, essentially, by end-to-end transport-
level connection. Two packets have the same label if and only if
they are part of the same TCP or UDP connection. Each flow is
assigned its own label. The label number is not meaningful;
non-TCP/UDP packets are ignored.
--unidirectional-flows
Label by unidirectional TCP or UDP flow. Like --flows, but packets
from a single connection but heading in different directions are
assigned different labels.
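The difference between --flows and --unidirectional-flows can be pictured as a choice of lookup key. The Python sketch below is an illustration under assumptions, not the algorithm ipaggcreate uses: flow_key and Labeler are hypothetical names, and label numbers are simply handed out in order of first appearance, which is consistent with the label number not being meaningful.

```python
def flow_key(src, sport, dst, dport, proto, bidirectional=True):
    """Build a dictionary key identifying the flow a packet belongs to."""
    a, b = (src, sport), (dst, dport)
    # For bidirectional flows, order the endpoints so that both
    # directions of a connection produce the same key.
    if bidirectional and b < a:
        a, b = b, a
    return (proto, a, b)


class Labeler:
    """Assign each previously unseen key the next unused label number."""

    def __init__(self):
        self._labels = {}

    def label(self, key):
        return self._labels.setdefault(key, len(self._labels) + 1)


flows = Labeler()
fwd = flows.label(flow_key("18.26.4.9", 22, "64.55.139.202", 26876, "T"))
rev = flows.label(flow_key("64.55.139.202", 26876, "18.26.4.9", 22, "T"))
print(fwd == rev)  # both directions share one label
```

Passing bidirectional=False keeps the endpoints in packet order, so the two directions of a connection receive different labels.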
--address-pairs
Label by address pair. Two packets have the same label if and only
if they involve the same pair of IP addresses. The label number is
not meaningful.
--unidirectional-address-pairs
Label by unidirectional address pair. Two packets have the same
label if and only if their source addresses match and their
destination addresses match.
Measurement Options
These options specify whether ipaggcreate should count packets or
bytes.
--packets
Count packets: the output file will report the number of packets
per label. This is the default.
--bytes, -B
Count bytes: the output file will report the number of bytes per
label. This number includes IP and transport headers, but not any
link headers.
Limit and Split Options
These options select portions of the trace file, and allow the user to
split trace data into multiple aggregate files.
--time-offset=time, -T time
Ignore the first time worth of packets in the input trace. If the
first packet has timestamp T, then all packets (including the
first) with timestamp less than T+time are ignored. The time
argument can be an absolute number of seconds (938.42), or use
suffixes such as "100s", "12ms", "1.5min", "2hr", and so forth.
--start-time=time
Ignore packets with timestamps less than time.
--interval=time, -t time
Stop after recording aggregate information for time worth of
packets. That is, if the first recorded packet has timestamp T,
then ipaggcreate will exit just before the first packet with
timestamp T+time, or the end of the trace, whichever comes first.
--limit-labels=count
Stop after recording information for count distinct labels. That
is, exit just before encountering a packet with the (count+1)th
distinct label, or at the end of the trace, whichever comes first.
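The way --time-offset and --interval combine can be sketched in a few lines of Python. This is a reading of the descriptions above, not ipaggcreate's code: select_window is a hypothetical name, timestamps are plain numbers in seconds, and a packet stamped exactly T+time is assumed to fall outside the interval.

```python
def select_window(timestamps, offset=0.0, interval=None):
    """Yield the timestamps kept by a --time-offset/--interval style window."""
    first = None   # timestamp of the first packet in the trace
    start = None   # timestamp of the first recorded packet
    for ts in timestamps:
        if first is None:
            first = ts
        # --time-offset: ignore everything earlier than first + offset.
        if ts < first + offset:
            continue
        if start is None:
            start = ts
        # --interval: stop just before the packet at start + interval.
        if interval is not None and ts >= start + interval:
            break
        yield ts


print(list(select_window([10, 11, 12.5, 13, 14, 20], offset=2, interval=1.5)))
# [12.5, 13]
```

Note that the interval is measured from the first recorded packet (12.5 here), not from the start of the trace.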
The four --split options generate multiple aggregate output files based
on characteristics of the input. To use --split, you must supply an
explicit --output filename containing a "%d"-style template; a file
number is plugged in to that template. For example, the template
"file%03d.txt" will generate files "file001.txt", "file002.txt", and so
forth.
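The template uses printf-style formatting, which Python's % operator also understands, so the expansion is easy to preview. In this sketch split_filenames is a hypothetical helper, and numbering is assumed to start at 1, as the example filenames above suggest.

```python
def split_filenames(template, n):
    """Expand a "%d"-style template into the first n output filenames."""
    return [template % number for number in range(1, n + 1)]


print(split_filenames("file%03d.txt", 2))
# ['file001.txt', 'file002.txt']
```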
--split-time=time
Start a new output file every time period. That is, each file will
contain data for at most time worth of packets.
--split-labels=count
Start a new output file every count distinct labels. That is, each
file will contain at most count different labels.
--split-packets=count
Start a new output file every count packets.
--split-bytes=count
Start a new output file every count bytes.
Other Options
--output=file, -o file
Write the aggregate output to file instead of to the standard output.
--binary, -b
Write the aggregate output in binary format. See BINARY FORMAT,
below, for more information.
--write-tcpdump=file, -w file
Write processed packets to a tcpdump(1) file -- or to the standard
output, if file is a single dash "-" -- in addition to the usual
summary output.
--filter=filter, -f filter
Only include packets and flows matching a tcpdump(1) filter. For
example, `ipaggcreate -f "tcp && src net 18/8"' will aggregate data
only for TCP packets from net 18. (The syntax for filter is
currently a subset of tcpdump's syntax.)
--anonymize, -A
Anonymize IP addresses in the output. The anonymization preserves
prefix and class. This means, first, that two anonymized addresses
will share the same prefix when their non-anonymized counterparts
share the same prefix; and second, that anonymized addresses will
be in the same class (A, B, C, or D) as their non-anonymized
counterparts. The anonymization algorithm comes from tcpdpriv(1);
it works like `tcpdpriv -A50 -C4'.
If --anonymize and --write-tcpdump are both on, the tcpdump output
file will have anonymized IP addresses. However, the file will
contain actual packet data, unlike tcpdpriv output.
--no-promiscuous
Do not place interfaces into promiscuous mode. Promiscuous mode is
the default.
--sample=p
Sample packets with probability p. That is, p is the chance that a
packet will cause output to be generated. The actual probability
may differ from the specified probability, due to fixed point
arithmetic; check the output for a `!sampling_prob' comment to
see the real probability. Strictly speaking, this option samples
records, not packets, so for NetFlow summaries without
--multipacket, it will sample flows.
--multipacket
Supply this option if you are reading NetFlow or IP summaries --
files where each record might represent multiple packets -- and you
would like the output summary to have one line per packet, instead
of the default one line per record. See also --packet-count, above.
--collate
Sort output packets by increasing timestamp. Use this option when
reading from multiple tcpdump(1) files to ensure that the output
has sorted timestamps. Combine --collate with --write-tcpdump to
collate overlapping tcpdump(1) files into a single, sorted
tcpdump(1) file.
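The effect of --collate resembles a merge of several streams that are each already in timestamp order. The Python sketch below shows only that idea and makes no claim about how ipaggcreate implements it. The function collate is a hypothetical name, packets are modeled as (timestamp, payload) tuples, and each input is assumed to be sorted on its own.

```python
import heapq


def collate(*traces):
    """Merge individually sorted traces into one stream ordered by timestamp."""
    # heapq.merge reads lazily, so whole traces need not fit in memory.
    return heapq.merge(*traces, key=lambda packet: packet[0])


trace_a = [(1.0, "a1"), (3.0, "a2")]
trace_b = [(2.0, "b1"), (4.0, "b2")]
print(list(collate(trace_a, trace_b)))
```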
--random-seed=seed
Set the random seed deterministically to seed, an unsigned integer.
By default, the random seed is initialized to a random value using
/dev/random, if it exists, combined with other data. The random
seed indirectly determines which packets are sampled, and the
values of anonymized IP addresses.
--quiet, -q
Do not print a progress bar to standard error. This is the default
when ipaggcreate isn't running interactively.
--config
Do not produce a summary. Instead, write the Click configuration
that ipaggcreate would run to the standard output.
--verbose, -V
Produce more verbose error messages.
--help, -h
Print a help message to the standard output, then exit.
--version, -v
Print version number and license information to the standard
output, then exit.
SIGNALS
When killed with SIGTERM or SIGINT, ipaggcreate will exit cleanly (and
generate an output file). If you want it to flush its buffers without
exiting, kill it with SIGHUP.
BINARY FORMAT
Binary ipaggcreate files begin with several ASCII lines, just like
regular ipaggcreate files. A line `!packed_be' or `!packed_le'
indicates that the rest of the file, starting immediately after the
newline, consists of binary records (in big-endian or little-endian
order, respectively). Each record is 8 bytes long, and looks like
this:
+---------------+---------------+
| label | count |
+---------------+---------------+
<---4 bytes---> <---4 bytes--->
The initial word of data contains the label number, the second the
count.
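A reader for this layout might look like the following Python sketch. It is an illustration built from the description above rather than a reference decoder: read_binary_aggregate is a hypothetical name, and both the label and the count are assumed to be unsigned 32-bit integers.

```python
import struct


def read_binary_aggregate(data):
    """Return (header_lines, records) from the bytes of a binary aggregate file."""
    headers = []
    order = None
    pos = 0
    # Consume ASCII header lines until the byte-order marker appears.
    # data.index raises ValueError if the marker line is never found.
    while order is None:
        end = data.index(b"\n", pos)
        line = data[pos:end].decode("ascii").strip()
        pos = end + 1
        if line == "!packed_be":
            order = ">"   # big-endian records
        elif line == "!packed_le":
            order = "<"   # little-endian records
        else:
            headers.append(line)
    body = data[pos:]
    usable = len(body) - len(body) % 8   # ignore a trailing partial record
    records = [struct.unpack_from(order + "II", body, offset)
               for offset in range(0, usable, 8)]
    return headers, records


blob = b"!IPAggregate 1.0\n!packed_be\n" + struct.pack(">II", 1, 104)
print(read_binary_aggregate(blob))
```

Each record is returned as a (label, count) tuple; when the label is an IP address, the label number can be converted to dotted form with the standard ipaddress module.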
CLICK
The ipaggcreate program uses the Click modular router, an extensible
system for processing packets. Click routers consist of C++ components
called elements. While some elements run only in a Linux kernel, most
can run either in the kernel or in user space, and there are user-level
elements for reading packets from libpcap or from tcpdump files.
Ipaggcreate creates and runs a user-level Click configuration.
However, you don't need to install Click to run ipaggcreate; the libclick
directory contains all the relevant parts of Click, bundled into a
library.
If you're curious, try running `ipaggcreate --config' with some other
options to see the Click configuration ipaggcreate would run.
This is, I think, a pleasant way to write a packet processor!
SEE ALSO
tcpdump(1), tcpdpriv(1), click(1), ipsumdump(1)
See http://www.pdos.csail.mit.edu/click/ for more on Click.
AUTHOR
Eddie Kohler <kohler@cs.ucla.edu>, based on the Click modular router.
Anonymization algorithm from tcpdpriv(1) by Greg Minshall.
Version 1.83 2013-09-29 IPAGGCREATE(1)