rwbagtool(1) SiLK Tool Suite rwbagtool(1)
NAME
rwbagtool - Perform high-level operations on binary Bag files
SYNOPSIS
rwbagtool { --add | --subtract | --minimize | --maximize
| --divide | --scalar-multiply=VALUE
| --compare={lt | le | eq | ge | gt} }
[--intersect=SETFILE | --complement-intersect=SETFILE]
[--mincounter=VALUE] [--maxcounter=VALUE]
[--minkey=VALUE] [--maxkey=VALUE]
[--invert] [--coverset] [--ipset-record-version=VERSION]
[--output-path=OUTPUTFILE]
[--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[BAGFILE[ BAGFILE...]]
rwbagtool --help
rwbagtool --version
DESCRIPTION
rwbagtool performs various operations on Bags. It can add Bags
together, subtract a subset of data from a Bag, perform key
intersection of a Bag with an IP set, extract the key list of a Bag as
an IP set, or filter Bag records based on their counter value.
BAGFILE is the name of a file or a named pipe, or the names "stdin"
or "-" to have rwbagtool read from the standard input. If no Bag file
names are given on the command line, rwbagtool attempts to read a Bag
from the standard input. If BAGFILE does not contain a Bag, rwbagtool
prints an error to stderr and exits abnormally.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an
exact match for an option. A parameter to an option may be specified
as --arg=param or --arg param, though the first form is required for
options that take optional parameters.
Operation switches
The first set of options are mutually exclusive; only one may be
specified. If none are specified, the counters in the Bag files are
summed.
--add
Sum the counters for each key for all Bag files given on the
command line. If a key does not exist, it has a counter of zero.
If no other operation is specified, the add operation is the
default.
--subtract
Subtract from the first Bag file all subsequent Bag files. If a
key does not appear in the first Bag file, rwbagtool assumes it has
a value of 0. If any counter subtraction results in a negative
number, the key will not appear in the resulting Bag file.
--minimize
Cause the output to contain the minimum counter seen for each key.
Keys that do not appear in all input Bags will not appear in the
output.
--maximize
Cause the output to contain the maximum counter seen for each key.
The output will contain each key that appears in any input Bag.
--divide
Divide the first Bag file by the second Bag file. It is an error
if more than two Bag files are specified. Every key in the first
Bag file must appear in the second file; the second Bag may have
keys that do not appear in the first, and those keys will not
appear in the output. Since Bags do not support floating point
numbers, the result of the division is rounded to the nearest
integer (values ending in .5 are rounded up). If the result of the
division is less than 0.5, the key will not appear in the output.
--scalar-multiply=VALUE
Multiply each counter in the Bag file by the scalar VALUE, where
VALUE is an integer in the range 1 to 18446744073709551615. This
switch accepts a single Bag as input.
--compare=OPERATION
Compare the key/counter pairs in exactly two Bag files. It is an
error if more than two Bag files are specified. The keys in the
output Bag will only be those whose counter in the first Bag is
OPERATION the counter in the second Bag. The counters for all keys
in the output will be 1. Any key that does not appear in both
input Bag files will not appear in the result. The possible
OPERATION values are the strings:
"lt"
GetCounter(Bag1, key) < GetCounter(Bag2, key)
"le"
GetCounter(Bag1, key) <= GetCounter(Bag2, key)
"eq"
GetCounter(Bag1, key) == GetCounter(Bag2, key)
"ge"
GetCounter(Bag1, key) >= GetCounter(Bag2, key)
"gt"
GetCounter(Bag1, key) > GetCounter(Bag2, key)
Masking/Limiting switches
The result of the above operation is an intermediate Bag file. The
following switches are applied next to remove entries from the
intermediate Bag:
--intersect=SETFILE
Mask the keys in the intermediate Bag using the set in SETFILE.
SETFILE is the name of a file or a named pipe containing an IPset,
or the name "stdin" or "-" to have rwbagtool read the IPset from
the standard input. If SETFILE does not contain an IPset,
rwbagtool prints an error to stderr and exits abnormally. Only
key/counter pairs where the key matches an entry in SETFILE are
written to the output. (IPsets are typically created by rwset(1)
or rwsetbuild(1).)
--complement-intersect=SETFILE
As --intersect, but only writes key/counter pairs for keys which do
not match an entry in SETFILE.
--mincounter=VALUE
Cause the output to contain only those records whose counter value
is VALUE or higher. The allowable range is 1 to the maximum
counter value; the default is 1.
--maxcounter=VALUE
Cause the output to contain only those records whose counter value
is VALUE or lower. The allowable range is 1 to the maximum counter
value; the default is the maximum counter value.
--minkey=VALUE
Cause the output to contain only those records whose key value is
VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an
integer or as an IP address in dotted decimal notation.
--maxkey=VALUE
Cause the output to contain only those records whose key value is
VALUE or lower. Default is 4294967295 (or 255.255.255.255).
Accepts input as an integer or as an IP address in dotted decimal
notation.
Output switches
The following switches control the output.
--invert
Generate a new Bag whose keys are the counters in the intermediate
Bag and whose counter is the number of times the counter was seen.
For example, this turns the Bag {sip:flow} into the Bag
{flow:count(sip)}. Any counter in the intermediate Bag that is
larger than the maximum possible key will be attributed to the
maximum key; to prevent this, specify "--maxcounter=4294967295".
--coverset
Instead of creating a Bag file as the output, write an IPset which
contains the keys contained in the intermediate Bag.
--ipset-record-version=VERSION
Specify the format of the IPset records that are written to the
output when the --coverset switch is used. Valid values are 0, 2,
3, and 4. When the switch is not provided, the
SILK_IPSET_RECORD_VERSION environment variable is checked for a
version. A VERSION of 2 creates a file compatible with SiLK 2.x,
and it can only be used for IPsets containing IPv4 addresses. A
VERSION of 3 creates a file that can only be read by SiLK 3.0 or
later. A VERSION of 4 creates a file that can only be read by
SiLK 3.7 or later. Version 4 files are smaller than version 3
files. The default VERSION is 0, which uses version 2 for IPv4
IPsets and version 3 for IPv6 IPsets. Available since SiLK 3.11.0.
--output-path=OUTPUTFILE
Redirect output to OUTPUTFILE. OUTPUTFILE is the name of a file or
a named pipe, or the name "stdout" or "-" to write the result to
the standard output.
--note-strip
Do not copy the notes (annotations) from the input files to the
output file. Normally notes from the input files are copied to the
output.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an
annotation. This switch may be repeated to add multiple
annotations to a file. To view the annotations, use the
rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of
the output file as an annotation. This switch may be repeated to
add multiple annotations. Currently the application makes no
effort to ensure that FILENAME contains text; be careful that you
do not attempt to add a SiLK data file as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given,
output to the standard output or to named pipes is not compressed,
and output to files is compressed using the default chosen when
SiLK was compiled. The valid values for COMP_METHOD are determined
by which external libraries were found when SiLK was compiled. To
see the available compression methods and the default method, use
the --help or --version switch. SiLK can support the following
COMP_METHOD values when the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always
compress the output regardless of the destination. Using zlib
produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression
library for compression, and always compress the output
regardless of the destination. This compression provides good
compression with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the
output when writing to a file.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was
configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ("$") represents the shell
prompt. The text after the dollar sign represents the command line.
Lines have been wrapped for improved readability, and the backslash
("\") is used to indicate a wrapped line.
The examples assume the following contents for the files:
    Bag1.bag    Bag2.bag    Bag3.bag    Bag4.bag    Mask.set
    3|  10|     1|   1|     2|   8|     1|   1|     2
    4|   7|     4|   2|     4|  10|     4|   3|     4
    6|  14|     7|  32|     6|  14|     6|   4|     6
    7|  23|     8|   2|     7|  12|     7|   4|     8
    8|   2|                 9|   8|     8|   6|
Adding Bag Files
$ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
$ rwbagcat --integer-keys Bag-sum.bag
1| 1|
3| 10|
4| 9|
6| 14|
7| 55|
8| 4|
$ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
$ rwbagcat --integer-keys Bag-sum2.bag
1| 1|
2| 8|
3| 10|
4| 19|
6| 28|
7| 67|
8| 4|
9| 8|
Subtracting Bag Files
$ rwbagtool --sub Bag1.bag Bag2.bag > Bag-diff.bag
$ rwbagcat --integer-keys Bag-diff.bag
3| 10|
4| 5|
6| 14|
$ rwbagtool --sub Bag2.bag Bag1.bag > Bag-diff2.bag
$ rwbagcat --integer-keys Bag-diff2.bag
1| 1|
7| 9|
Getting the Minimum Value
$ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
$ rwbagcat --integer-keys Bag-min.bag
4| 2|
7| 12|
Getting the Maximum Value
$ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
$ rwbagcat --integer-keys Bag-max.bag
1| 1|
2| 8|
3| 10|
4| 10|
6| 14|
7| 32|
8| 2|
9| 8|
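As a rough model of the four operations demonstrated so far, the following Python sketch treats a Bag as a plain dict that maps integer keys to counters. It is not how rwbagtool is implemented, and the function names are hypothetical. Dropping results of zero (not only negative results) is an assumption inferred from the Bag-diff.bag output above, where key 8 (2 - 2) is absent.

```python
# Toy model of --add, --subtract, --minimize, and --maximize.
# A Bag is modeled as a dict {key: counter}; names here are illustrative only.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}
BAG3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def bag_add(*bags):
    """Sum the counters per key; a missing key counts as zero."""
    out = {}
    for bag in bags:
        for key, counter in bag.items():
            out[key] = out.get(key, 0) + counter
    return out


def bag_subtract(first, *rest):
    """Subtract all later bags from the first one."""
    out = dict(first)
    for bag in rest:
        for key, counter in bag.items():
            out[key] = out.get(key, 0) - counter
    # Assumption: keys whose result is zero or negative are omitted.
    return {key: counter for key, counter in out.items() if counter > 0}


def bag_minimize(*bags):
    """Keep only keys present in every bag, with their smallest counter."""
    common = set(bags[0]).intersection(*bags[1:])
    return {key: min(bag[key] for bag in bags) for key in common}


def bag_maximize(*bags):
    """Keep every key seen in any bag, with its largest counter."""
    every = set().union(*bags)
    return {key: max(bag.get(key, 0) for bag in bags) for key in every}
```

With the example data, bag_minimize(BAG1, BAG2, BAG3) keeps only keys 4 and 7, the two keys that all three inputs share, which matches Bag-min.bag.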
Dividing Bag Files
$ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
$ rwbagcat --integer-keys Bag-div1.bag
1| 1|
4| 1|
7| 8|
However, when the order is reversed:
$ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
rwbagtool: Error dividing bags; key 6 not in divisor bag
To work around this issue, use the --coverset switch to create a copy
of Bag4.bag that contains only the keys in Bag2.bag:
$ rwbagtool --coverset Bag2.bag > Bag2-keys.set
$ rwbagtool --intersect=Bag2-keys.set Bag4.bag > Bag4-small.bag
$ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
$ rwbagcat --integer-keys Bag-div2.bag
1| 1|
4| 2|
8| 3|
Or, in a single piped command without writing the IPset to disk:
$ rwbagtool --coverset Bag2.bag \
| rwbagtool --intersect=- Bag4.bag \
| rwbagtool --divide - Bag2.bag \
| rwbagcat --integer-keys
1| 1|
4| 2|
8| 3|
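The rounding and key rules for --divide can be checked by hand with a small Python sketch. This is only a model under stated assumptions: Bags are plain dicts, the function names are hypothetical, and all divisor counters are assumed to be positive. Integer arithmetic is used so that quotients ending in .5 round up exactly.

```python
# Toy model of --divide: round to nearest integer, .5 rounds up,
# and quotients below 0.5 are dropped from the output.
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}
BAG4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}


def bag_divide(dividend, divisor):
    """Divide counters key by key; every dividend key must be in divisor."""
    missing = sorted(key for key in dividend if key not in divisor)
    if missing:
        raise KeyError(f"key {missing[0]} not in divisor bag")
    out = {}
    for key, counter in dividend.items():
        d = divisor[key]
        # (2a + d) // 2d equals floor(a/d + 1/2), i.e. round half up.
        quotient = (2 * counter + d) // (2 * d)
        if quotient > 0:
            out[key] = quotient
    return out


def restrict_to_keys(bag, keys):
    """Model of --coverset followed by --intersect: keep only listed keys."""
    return {key: counter for key, counter in bag.items() if key in keys}
```

Calling bag_divide(BAG4, BAG2) raises KeyError for key 6, while bag_divide(restrict_to_keys(BAG4, BAG2.keys()), BAG2) succeeds. Key 7 (4/32) falls below 0.5 and is dropped.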
Scalar Multiplication
$ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
$ rwbagcat --integer-keys Bag-multiply.bag
3| 70|
4| 49|
6| 98|
7| 161|
8| 14|
Comparing Bag Files
$ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
$ rwbagcat --integer-keys Bag-lt.bag
7| 1|
$ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
$ rwbagcat --integer-keys Bag-le.bag
7| 1|
8| 1|
$ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
$ rwbagcat --integer-keys Bag-eq.bag
8| 1|
$ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
$ rwbagcat --integer-keys Bag-ge.bag
4| 1|
8| 1|
$ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
$ rwbagcat --integer-keys Bag-gt.bag
4| 1|
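The five comparisons can be modeled compactly in Python. This is a sketch only, again treating Bags as dicts; bag_compare and OPERATIONS are hypothetical names and not part of SiLK.

```python
import operator

# Toy model of --compare=OPERATION on two Bags modeled as dicts.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}

OPERATIONS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ge": operator.ge,
    "gt": operator.gt,
}


def bag_compare(op_name, first, second):
    """Return {key: 1} for keys in both bags whose counters satisfy op_name."""
    compare = OPERATIONS[op_name]
    return {
        key: 1
        for key in first
        if key in second and compare(first[key], second[key])
    }
```

Only keys 4, 7, and 8 occur in both example Bags, so no other key can appear in any comparison result.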
Making a Cover Set
$ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
$ rwsetcat --integer-ips Cover.set
1
2
3
4
6
7
8
9
Inverting a Bag
$ rwbagtool --invert Bag1.bag > Bag-inv1.bag
$ rwbagcat --integer-keys Bag-inv1.bag
2| 1|
7| 1|
10| 1|
14| 1|
23| 1|
$ rwbagtool --invert Bag2.bag > Bag-inv2.bag
$ rwbagcat --integer-keys Bag-inv2.bag
1| 1|
2| 2|
32| 1|
$ rwbagtool --invert Bag3.bag > Bag-inv3.bag
$ rwbagcat --integer-keys Bag-inv3.bag
8| 2|
10| 1|
12| 1|
14| 1|
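Inversion amounts to counting how often each counter value occurs. The sketch below models that in Python, with Bags as dicts and hypothetical names. MAX_KEY is assumed to be 4294967295, the default given for --maxkey, and larger counters are attributed to that key as the --invert description states.

```python
from collections import Counter

# Toy model of --invert: the counters become keys, and the new counter is
# the number of input keys that carried that counter.
MAX_KEY = 4294967295  # assumption: taken from the documented --maxkey default

BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
BAG2 = {1: 1, 4: 2, 7: 32, 8: 2}
BAG3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}


def bag_invert(bag):
    """Count occurrences of each counter, clamping oversized counters."""
    return dict(Counter(min(counter, MAX_KEY) for counter in bag.values()))
```

In Bag2.bag the counter 2 occurs for keys 4 and 8, so the inverted Bag maps key 2 to a counter of 2.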
Masking Bag Files
$ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
$ rwbagcat --integer-keys Bag-mask.bag
4| 7|
6| 14|
8| 2|
$ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
$ rwbagcat --integer-keys Bag-mask2.bag
3| 10|
7| 23|
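The two masking switches are complementary filters on the keys. A minimal Python model, assuming a Bag is a dict and an IPset is a set of integers (both function names are hypothetical):

```python
# Toy model of --intersect and --complement-intersect.
BAG1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
MASK = {2, 4, 6, 8}


def bag_intersect(bag, ipset):
    """Keep key/counter pairs whose key is in the set."""
    return {key: counter for key, counter in bag.items() if key in ipset}


def bag_complement_intersect(bag, ipset):
    """Keep key/counter pairs whose key is not in the set."""
    return {key: counter for key, counter in bag.items() if key not in ipset}
```

Every key of the input lands in exactly one of the two results, so adding the two outputs together reproduces the original Bag.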
Restricting the Output
$ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
$ rwbagcat --integer-keys Bag-res1.bag
1| 1|
3| 10|
4| 9|
$ rwbagtool --minkey=3 --maxkey=6 Bag1.bag Bag2.bag > Bag-res2.bag
$ rwbagcat --integer-keys Bag-res2.bag
3| 10|
4| 9|
6| 14|
$ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
$ rwbagcat --integer-keys Bag-res3.bag
7| 55|
$ rwbagtool --sub --maxcounter=9 Bag1.bag Bag2.bag > Bag-res4.bag
$ rwbagcat --integer-keys Bag-res4.bag
4| 5|
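The limiting switches act on the intermediate Bag, that is, after the add or subtract operation has run. The Python sketch below models them as one inclusive range filter. The function name is hypothetical, and MAX_COUNTER is an assumption: the manual page only says "the maximum counter value", so the upper bound documented for --scalar-multiply is used here.

```python
# Toy model of --minkey, --maxkey, --mincounter, and --maxcounter, applied
# to an intermediate Bag (a dict) produced by an earlier operation.
MAX_KEY = 4294967295
MAX_COUNTER = 18446744073709551615  # assumption, see the lead-in

# Intermediate results for Bag1.bag and Bag2.bag from the earlier examples.
BAG_SUM = {1: 1, 3: 10, 4: 9, 6: 14, 7: 55, 8: 4}
BAG_DIFF = {3: 10, 4: 5, 6: 14}


def limit(bag, minkey=0, maxkey=MAX_KEY, mincounter=1, maxcounter=MAX_COUNTER):
    """Keep entries whose key and counter both fall in the inclusive ranges."""
    return {
        key: counter
        for key, counter in bag.items()
        if minkey <= key <= maxkey and mincounter <= counter <= maxcounter
    }
```

For instance, limit(BAG_DIFF, maxcounter=9) leaves only key 4, because the limit is tested against the difference and not against the counters in the input files.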
ENVIRONMENT
SILK_IPSET_RECORD_VERSION
This environment variable is used as the value for the
--ipset-record-version switch when that switch is not provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files.
Setting SILK_CLOBBER to a non-empty value removes this restriction.
SEE ALSO
rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1),
rwsetbuild(1), rwsetcat(1), silk(7), zlib(3)
SiLK 3.11.0.1 2016-02-19 rwbagtool(1)