rwbagbuild(1) SiLK Tool Suite rwbagbuild(1)
NAME
rwbagbuild - Create a binary Bag from non-flow data.
SYNOPSIS
rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }
[--delimiter=C] [--default-count=DEFAULTCOUNT]
[--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [--output-path=OUTPUTFILE]
rwbagbuild --help
rwbagbuild --version
DESCRIPTION
rwbagbuild builds a binary Bag file from an IPset file or from textual
input.
When creating a Bag from an IPset, the value associated with each IP
address is the value given by the --default-count switch, or 1 if the
switch isn't provided.
The textual input read from the argument to the --bag-input switch is
processed a line at a time. Comments begin with a '#' character and
continue to the end of the line; they are stripped from each line. Any
line that is blank or contains only whitespace is ignored. All other
lines must contain a valid key or key-count pair; whitespace around the
key and count is ignored.
If the delimiter character (specified by the --delimiter switch and
having pipe ('|') as its default) is not present, the line must
contain only an IP address or an integer key. If the delimiter is
present, the line must contain an IP address or integer key before the
delimiter and an integer count after the delimiter. These lines may
have a second delimiter after the integer count; the second delimiter
and any text to the right of it are ignored.
When the --default-count switch is specified, its value is used as the
count for each key, and the count value parsed from each line, if any,
is ignored. Otherwise, the parsed count is used, or 1 is used as the
count if no delimiter was present.
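The line-handling rules above can be modelled with a short Python sketch. This is an illustration of the documented behavior only, not SiLK's implementation: the function name parse_line is hypothetical, keys are kept as unvalidated strings, whitespace delimiters are not handled, and the sketch assumes that a delimiter with no count is an error even when a default count is given.

```python
def parse_line(line, delimiter="|", default_count=None):
    """Return (key, count) for one input line, or None if the line is ignored."""
    # Comments run from '#' to the end of the line and are stripped first.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None  # blank, whitespace-only, or comment-only line

    if delimiter not in text:
        # No delimiter: the line holds only a key, and the count is 1.
        key, count = text, 1
    else:
        fields = text.split(delimiter)
        key = fields[0].strip()
        count_text = fields[1].strip()
        if not count_text:
            # Assumption: this is an error regardless of default_count.
            raise ValueError("delimiter present but no count: %r" % line)
        # Anything after a second delimiter (fields[2:]) is ignored.
        count = int(count_text)

    if not key:
        raise ValueError("missing key: %r" % line)
    if default_count is not None:
        count = default_count  # models --default-count overriding the count
    return key, count


print(parse_line("10.1.2.4|5|ignored text"))      # ('10.1.2.4', 5)
print(parse_line("  10.1.2.4   # a comment"))     # ('10.1.2.4', 1)
print(parse_line("6| 138|", default_count=1))     # ('6', 1)
```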
For each key-count pair, the key is inserted into the Bag with its count
or, if the key is already present in the Bag, its total count is
incremented by the count from this line. When using the
--default-count switch, the count for a key that appears in the input N
times is the product of N and DEFAULTCOUNT.
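The accumulation rule can be illustrated with a small Python model in which a plain dictionary stands in for the Bag. The name build_bag and the sample pairs are hypothetical; the sketch only mirrors the arithmetic described above.

```python
def build_bag(pairs, default_count=None):
    """Sum the counts per key; default_count, if given, replaces every count."""
    bag = {}
    for key, count in pairs:
        if default_count is not None:
            count = default_count
        # A new key starts at 0; a repeated key has its total incremented.
        bag[key] = bag.get(key, 0) + count
    return bag


pairs = [("10.0.0.1", 5), ("10.0.0.2", 7), ("10.0.0.1", 2)]

print(build_bag(pairs))
# {'10.0.0.1': 7, '10.0.0.2': 7}

# 10.0.0.1 appears N=2 times, so its count is 2 * 3 = 6.
print(build_bag(pairs, default_count=3))
# {'10.0.0.1': 6, '10.0.0.2': 3}
```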
The IP address or integer key must be expressed in one of the following
formats. rwbagbuild complains if the key field contains a mixture of
IPv6 addresses and integer values.
o Dotted decimal (all 4 octets are required):
10.1.2.4
o An unsigned 32-bit integer:
167838212
o An IPv6 address in canonical form (when SiLK has been compiled with
IPv6 support):
2001:db8:a:1::2:4
::ffff:10.1.2.4
o Any of the above with a CIDR designation (for dotted decimal, all
four octets are still required):
10.1.2.4/31
167838212/31
2001:db8:a:1::2:4/127
::ffff:10.1.2.4/31
o SiLK IP wildcard notation. A SiLK IP Wildcard can represent
multiple IPv4 or IPv6 addresses. An IP Wildcard contains an IP in
its canonical form, except each part of the IP (where part is an
octet for IPv4 or a hexadectet for IPv6) may be a single value, a
range, a comma-separated list of values and ranges, or the letter
"x" to signify all values for that part of the IP (that is, "0-255"
for IPv4). You may not specify a CIDR suffix when using the IP
Wildcard notation.
10.x.1-2.4,5
2001:db8:a:x::1-2:4,5
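To make the CIDR and wildcard notations concrete, the following Python sketch lists the addresses that two of the examples above represent. It is a model of the notation for IPv4 only, not the parser SiLK uses; the helper names expand_part and expand_ipv4_wildcard are hypothetical.

```python
import ipaddress
from itertools import product


def expand_part(part, max_value=255):
    """Expand one wildcard part: a value, a range, a comma list, or 'x'."""
    if part == "x":
        return list(range(max_value + 1))
    values = []
    for item in part.split(","):
        if "-" in item:
            low, high = item.split("-")
            values.extend(range(int(low), int(high) + 1))
        else:
            values.append(int(item))
    return values


def expand_ipv4_wildcard(wildcard):
    """Return every IPv4 address matched by a wildcard such as 10.x.1-2.4,5."""
    parts = [expand_part(p) for p in wildcard.split(".")]
    if len(parts) != 4:
        raise ValueError("an IPv4 wildcard needs exactly four parts")
    return [".".join(map(str, octets)) for octets in product(*parts)]


# A /31 block covers two consecutive addresses.
print([str(ip) for ip in ipaddress.ip_network("10.1.2.4/31")])
# ['10.1.2.4', '10.1.2.5']

addresses = expand_ipv4_wildcard("10.x.1-2.4,5")
print(len(addresses))   # 1 * 256 * 2 * 2 = 1024
print(addresses[:4])
# ['10.0.1.4', '10.0.1.5', '10.0.2.4', '10.0.2.5']
```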
If an IP address or count cannot be parsed, or if a line contains a
delimiter character but no count, rwbagbuild prints an error and exits.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an
exact match for an option. A parameter to an option may be specified
as --arg=param or --arg param, though the first form is required for
options that take optional parameters.
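The abbreviation rule can be sketched in Python as a prefix lookup over the option names from the SYNOPSIS. The function resolve_option is hypothetical and only models the rule as stated here; the real option parser may differ in its details and messages.

```python
OPTIONS = [
    "set-input", "bag-input", "delimiter", "default-count", "key-type",
    "counter-type", "note-add", "note-file-add", "compression-method",
    "output-path", "help", "version",
]


def resolve_option(name, options=OPTIONS):
    """Map a possibly abbreviated option name to its full name."""
    if name in options:
        return name  # an exact match always wins
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:
        return matches[0]  # unique abbreviation
    if not matches:
        raise ValueError("unknown option: --%s" % name)
    raise ValueError("ambiguous option --%s: %s" % (name, ", ".join(matches)))


print(resolve_option("out"))   # output-path
print(resolve_option("k"))     # key-type

try:
    resolve_option("de")       # matches both delimiter and default-count
except ValueError as err:
    print(err)
```

In this model, --note is ambiguous because it is a prefix of both --note-add and --note-file-add, so at least --note-a or --note-f must be given.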
The following two switches control the type of input; one and only one
must be provided:
--set-input=SETFILE
Create a Bag from an IPset. SETFILE is a filename, a named pipe,
or the keyword "stdin" or "-" to read the IPset from the standard
input. Each IP address has a count of 1 when the --default-count
switch is not specified. (IPsets are typically created by rwset(1)
or rwsetbuild(1).)
--bag-input=TEXTFILE
Create a Bag from a delimited text file. TEXTFILE is a filename, a
named pipe, or the keyword "stdin" or "-" to read the text from the
standard input. See the "DESCRIPTION" section for the syntax of
the TEXTFILE.
--delimiter=C
The delimiter to expect between each key-count pair of the TEXTFILE
read by the --bag-input switch. The default delimiter is the
vertical pipe ('|'). The delimiter is ignored if the --set-input
switch is specified. When the delimiter is a whitespace character,
any amount of whitespace may surround and separate the key and
counter. Since '#' is used to denote comments and newline is
used to denote records, neither is a valid delimiter character.
--default-count=DEFAULTCOUNT
Override the counts of all keys in the input text or IPset with
the value of DEFAULTCOUNT. DEFAULTCOUNT must be a positive
integer.
--key-type=FIELD_TYPE
Write an entry into the header of the Bag file that specifies the
key contains FIELD_TYPE values. When this switch is not specified,
the key type of the Bag is set to "custom". The FIELD_TYPE is case
insensitive. The supported FIELD_TYPEs are:
sIPv4
source IP address, IPv4 only
dIPv4
destination IP address, IPv4 only
sPort
source port
dPort
destination port
protocol
IP protocol
packets
packets, see also "sum-packets"
bytes
bytes, see also "sum-bytes"
flags
bitwise OR of TCP flags
sTime
starting time of the flow record, seconds resolution
duration
duration of the flow record, seconds resolution
eTime
ending time of the flow record, seconds resolution
sensor
sensor ID
input
SNMP input
output
SNMP output
nhIPv4
next hop IP address, IPv4 only
initialFlags
TCP flags on first packet in the flow
sessionFlags
bitwise OR of TCP flags on all packets in the flow except the
first
attributes
flow attributes set by the flow generator
application
guess as to the content of the flow, as set by the flow
generator
class
class of the sensor
type
type of the sensor
icmpTypeCode
an encoded version of the ICMP type and code, where the type is
in the upper byte and the code is in the lower byte
sIPv6
source IP, IPv6
dIPv6
destination IP, IPv6
nhIPv6
next hop IP, IPv6
records
count of flows
sum-packets
sum of packet counts
sum-bytes
sum of byte counts
sum-duration
sum of duration values
any-IPv4
a generic IPv4 address
any-IPv6
a generic IPv6 address
any-port
a generic port
any-snmp
a generic SNMP value
any-time
a generic time value, in seconds resolution
custom
a number
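The icmpTypeCode description in the list above (type in the upper byte, code in the lower byte) can be sketched in Python as follows. The sketch assumes a two-byte value, and the function names are hypothetical.

```python
def encode_icmp_type_code(icmp_type, icmp_code):
    """Pack an ICMP type and code into one integer: type high, code low."""
    if not (0 <= icmp_type <= 255 and 0 <= icmp_code <= 255):
        raise ValueError("ICMP type and code must each fit in one byte")
    return (icmp_type << 8) | icmp_code


def decode_icmp_type_code(value):
    """Split an encoded value back into (type, code)."""
    return value >> 8, value & 0xFF


# Type 3 (destination unreachable), code 1 (host unreachable).
print(encode_icmp_type_code(3, 1))   # 769
# 2048 is type 8 (echo request), code 0.
print(decode_icmp_type_code(2048))   # (8, 0)
```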
--counter-type=FIELD_TYPE
Write an entry into the header of the Bag file that specifies the
counter contains FIELD_TYPE values. When this switch is not
specified, the counter type of the Bag is set to "custom". The
supported FIELD_TYPEs are the same as those for the key.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an
annotation. This switch may be repeated to add multiple
annotations to a file. To view the annotations, use the
rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of
the output file as an annotation. This switch may be repeated to
add multiple annotations. Currently the application makes no
effort to ensure that FILENAME contains text; be careful that you
do not attempt to add a SiLK data file as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given,
output to the standard output or to named pipes is not compressed,
and output to files is compressed using the default chosen when
SiLK was compiled. The valid values for COMP_METHOD are determined
by which external libraries were found when SiLK was compiled. To
see the available compression methods and the default method, use
the --help or --version switch. SiLK can support the following
COMP_METHOD values when the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always
compress the output regardless of the destination. Using zlib
produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression
library for compression, and always compress the output
regardless of the destination. This compression provides good
compression with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the
output when writing to a file.
--output-path=OUTPUTFILE
Redirect output to OUTPUTFILE. OUTPUTFILE is a filename, a named
pipe, or the keyword "stdout" or "-" to write the bag to the
standard output.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was
configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ("$") represents the shell
prompt. The text after the dollar sign represents the command line.
Lines have been wrapped for improved readability, and the backslash
("\") is used to indicate a wrapped line.
Create a bag with IP addresses as keys from a text file
Assume the file mybag.txt contains the following lines, where each line
contains an IP address, a comma as a delimiter, a count, and ends with
a newline.
192.168.0.1,5
192.168.0.2,500
192.168.0.3,3
192.168.0.4,14
192.168.0.5,5
To build a bag with it:
$ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag
Use rwbagcat(1) to view its contents:
$ rwbagcat mybag.bag
192.168.0.1| 5|
192.168.0.2| 500|
192.168.0.3| 3|
192.168.0.4| 14|
192.168.0.5| 5|
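When the counts come from another program, the text input can be generated rather than written by hand. The following Python sketch produces text in the form shown for mybag.txt; the function name to_bag_text is hypothetical, and the delimiter check mirrors the restriction stated for the --delimiter switch.

```python
def to_bag_text(counts, delimiter=","):
    """Format a {key: count} mapping as lines of KEY<delimiter>COUNT."""
    # '#' starts a comment and newline ends a record, so neither may be used.
    if len(delimiter) != 1 or delimiter in "#\n":
        raise ValueError("invalid delimiter: %r" % delimiter)
    return "".join(
        "%s%s%d\n" % (key, delimiter, count) for key, count in counts.items()
    )


counts = {
    "192.168.0.1": 5,
    "192.168.0.2": 500,
    "192.168.0.3": 3,
}
print(to_bag_text(counts), end="")
# 192.168.0.1,5
# 192.168.0.2,500
# 192.168.0.3,3
```

The resulting text could be saved to a file or piped to rwbagbuild --bag-input=- with a matching --delimiter value.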
Create a bag with protocols as keys from a text file
To create a Bag of protocol data from the text file myproto.txt:
1| 4|
6| 138|
17| 131|
use
$ rwbagbuild --key-type=protocol --bag-input=myproto.txt > myproto.bag
$ rwbagcat myproto.bag
1| 4|
6| 138|
17| 131|
When the --key-type switch is specified, rwbagcat knows the keys should
be printed as integers, and rwfileinfo(1) shows the type of the key:
$ rwfileinfo --fields=bag myproto.bag
myproto.bag:
bag key: protocol @ 4 octets; counter: custom @ 8 octets
Without the --key-type switch, rwbagbuild assumes the integers in
myproto.txt represent IP addresses:
$ rwbagbuild --bag-input=myproto.txt | rwbagcat
0.0.0.1| 4|
0.0.0.6| 138|
0.0.0.17| 131|
Although the --integer-keys switch on rwbagcat forces it to print keys
as integers, it is generally better to use the --key-type switch when
creating the bag.
$ rwbagbuild --bag-input=myproto.txt | rwbagcat --integer-keys
1| 4|
6| 138|
17| 131|
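The relationship between an integer key and its dotted-decimal form can be checked with Python's standard ipaddress module. This only illustrates the 32-bit integer to IPv4 mapping; it does not call any SiLK tool.

```python
import ipaddress

# The protocol numbers from myproto.txt, read as 32-bit IPv4 addresses.
for key in (1, 6, 17):
    print(key, "->", ipaddress.IPv4Address(key))
# 1 -> 0.0.0.1
# 6 -> 0.0.0.6
# 17 -> 0.0.0.17

# The same mapping in the other direction.
print(int(ipaddress.IPv4Address("10.1.2.4")))   # 167838212
```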
Create a bag and override the existing counter
To ignore the counts that exist in myproto.txt and set the counts for
each protocol to 1, use the --default-count switch, which overrides the
existing value:
$ rwbagbuild --key-type=protocol --bag-input=myproto.txt \
--default-count=1 --output-path=myproto1.bag
$ rwbagcat myproto1.bag
1| 1|
6| 1|
17| 1|
Create a bag with IP addresses as keys from an IPset file
Given the IPset myset.set, create a bag where every entry in the bag
has a count of 3:
$ rwbagbuild --set-input=myset.set --default-count=3 \
--out=mybag2.bag
Create a bag from multiple input files
Suppose we have three IPset files, A.set, B.set, and C.set:
$ rwsetcat A.set
10.0.0.1
10.0.0.2
$ rwsetcat B.set
10.0.0.2
10.0.0.3
$ rwsetcat C.set
10.0.0.1
10.0.0.2
10.0.0.4
We want to create a bag file from these IPset files where the count for
each IP address is the number of files that IP appears in. rwbagbuild
accepts a single file as an argument, so we cannot do the following:
$ rwbagbuild --set-input=A.set --set-input=B.set ... # WRONG!
(Even if we could repeat the --set-input switch, specifying it multiple
times would be annoying if we had 300 files instead of only 3.)
The IPset files are (mathematical) sets, so if we join them together
first with rwsettool(1) and then run rwbagbuild, each IP address gets a
count of 1:
$ rwsettool --union A.set B.set C.set \
| rwbagbuild --set-input=- \
| rwbagcat
10.0.0.1| 1|
10.0.0.2| 1|
10.0.0.3| 1|
10.0.0.4| 1|
When rwbagbuild is processing textual input, it sums the counters for
keys that appear in the input multiple times. We can use rwsetcat(1)
to convert each IPset file to text and feed that as a single textual
stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to
reduce the amount of input that rwbagbuild must process. This is
probably the best approach to the problem:
$ rwsetcat --cidr-blocks *.set | rwbagbuild --bag-input=- > total1.bag
$ rwbagcat total1.bag
10.0.0.1| 2|
10.0.0.2| 3|
10.0.0.3| 1|
10.0.0.4| 1|
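The difference between the union approach and the summed textual stream can be modelled in Python, with sets standing in for the three IPset files and a Counter standing in for the Bag. This is a sketch of the counting logic only; none of the SiLK tools are involved.

```python
import ipaddress
from collections import Counter

# Stand-ins for the contents of A.set, B.set, and C.set.
ipsets = {
    "A.set": {"10.0.0.1", "10.0.0.2"},
    "B.set": {"10.0.0.2", "10.0.0.3"},
    "C.set": {"10.0.0.1", "10.0.0.2", "10.0.0.4"},
}

# Union first: each address survives exactly once, so every count is 1.
union = set().union(*ipsets.values())
print(len(union))   # 4

# Concatenate the members of every set and sum per key, which is what
# feeding all the rwsetcat output to one rwbagbuild invocation achieves.
totals = Counter()
for members in ipsets.values():
    totals.update(members)

for ip in sorted(totals, key=ipaddress.ip_address):
    print("%s|%d|" % (ip, totals[ip]))
# 10.0.0.1|2|
# 10.0.0.2|3|
# 10.0.0.3|1|
# 10.0.0.4|1|
```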
A less efficient solution is to convert each IPset to a bag and then
use rwbagtool(1) to add the bags together:
$ for i in *.set ; do
rwbagbuild --set-input="$i" --output-path="/tmp/$i.bag" ;
done
$ rwbagtool --add /tmp/*.set.bag > total2.bag
$ rm /tmp/*.set.bag
There is no need to create a bag file for each IPset; we can get by
with only two bag files, the final bag file, total3.bag, and a
temporary file, tmp.bag. We initialize total3.bag to an empty bag. As
we loop over each IPset, rwbagbuild converts the IPset to a bag on its
standard output, rwbagtool creates tmp.bag by adding its standard input
to total3.bag, and we rename tmp.bag to total3.bag:
$ rwbagbuild --bag-input=/dev/null --output-path=total3.bag
$ for i in *.set ; do
rwbagbuild --set-input="$i" \
| rwbagtool --output-path=tmp.bag --add total3.bag stdin ;
/bin/mv tmp.bag total3.bag ;
done
$ rwbagcat total3.bag
10.0.0.1| 2|
10.0.0.2| 3|
10.0.0.3| 1|
10.0.0.4| 1|
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files.
Setting SILK_CLOBBER to a non-empty value removes this restriction.
SEE ALSO
rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwset(1),
rwsetbuild(1), rwsetcat(1), rwsettool(1), silk(7), zlib(3)
BUGS
The --default-count switch is poorly named.
SiLK 3.11.0.1 2016-02-19 rwbagbuild(1)