silkpython(3)                   SiLK Tool Suite                  silkpython(3)

NAME
       silkpython - SiLK Python plug-in

SYNOPSIS
        rwfilter --python-file=FILENAME [--python-file=FILENAME ...] ...

        rwfilter --python-expr=PYTHON_EXPRESSION ...

        rwcut --python-file=FILENAME [--python-file=FILENAME ...]
              --fields=FIELDS ...

        rwgroup --python-file=FILENAME [--python-file=FILENAME ...]
              --id-fields=FIELDS ...

        rwsort --python-file=FILENAME [--python-file=FILENAME ...]
              --fields=FIELDS ...

        rwstats --python-file=FILENAME [--python-file=FILENAME ...]
              --fields=FIELDS --values=VALUES ...

        rwuniq --python-file=FILENAME [--python-file=FILENAME ...]
              --fields=FIELDS --values=VALUES ...

DESCRIPTION
       The SiLK Python plug-in provides a way to use PySiLK (the SiLK
       extension for python(1) described in pysilk(3)) to extend the
       capability of several SiLK tools.

       o   In rwfilter(1), new partitioning rules can be defined in PySiLK to
           determine whether a SiLK Flow record is written to the
           --pass-destination or --fail-destination.

       o   In rwcut(1), new fields can be defined in PySiLK and displayed for
           each record.

       o   New fields can also be defined in rwgroup(1) and rwsort(1).  These
           fields are used as part of the key when grouping or sorting the
           records.

       o   For rwstats(1) and rwuniq(1), two types of fields can be defined:
           Key fields are used to categorize the SiLK Flow records into bins,
           and aggregate value fields compute a value across all the SiLK Flow
           records that are categorized into a bin.  (An example of a built-in
           aggregate value field is the number of packets that were seen for
           all flow records that match a particular key.)

       To extend the SiLK tools using PySiLK, the user writes a Python file
       that calls Python functions defined in the silk.plugin Python module
       and described in this manual page.  When the user specifies the
       --python-file switch to a SiLK application, the application loads the
       Python file and makes the new functionality available.

       The following sections will describe

       o   how to create a command line switch with PySiLK that allows one to
           modify the run-time behavior of their PySiLK code

       o   how to use PySiLK with rwfilter

       o   a simple API for creating fields in rwcut, rwgroup, rwsort,
           rwstats, and rwuniq

       o   the advanced API for creating fields in those applications

       Typically you will not need to explicitly import the silk.plugin
       module, since the --python-file switch does this for you.  In a module
       used by a Python plug-in, the module can gain access to the functions
       defined in this manual page by importing them from silk.plugin:

        from silk.plugin import *

       Hint: If you want to check whether the Python code in FILENAME is
       defining the switches and fields you expect, you can load the Python
       file and examine the output of --help, for example:

        rwcut --python-file=FILENAME --help

   User-defined command line switches
       Command line switches can be added and handled from within a SiLK
       Python plug-in.  In order to add a new switch, use the following
       function:

       register_switch(switch_name, handler=handler_func, [arg=needs_arg],
       [help=help_string])

       switch_name
           Provides the name of the switch you are registering, a string.  Do
           not include the leading "--" in the name.  If a switch already
           exists with the name switch_name, the application will exit with an
           error message.

       handler_func
           handler_func([string]).  Names a function that will be called by
           the application while it is processing its command line if and only
           if the command line includes the switch --switch_name.  (If the
           switch is not given, the handler_func function will not be called.)
           When the arg parameter is specified and its value is False, the
           handler_func function will be called with no arguments.  Otherwise,
           the handler_func function will be called with a single argument: a
           string representing the value the user passed to the --switch_name
           switch.  The return value from this function is ignored.  Note that
           the register_switch() function requires a handler argument which
           must be passed by keyword.

       needs_arg
           Specifies a boolean value that determines whether the user must
           specify an argument to --switch_name, and determines whether the
           handler_func function should expect an argument.  When arg is not
           specified or needs_arg is True, the user must specify an argument
           to --switch_name and the handler_func function will be called with
           a single argument.  When needs_arg is False, it is an error to
           specify an argument to --switch_name and handler_func will be
           called with no arguments.

       help_string
           Provides the usage text to print describing this switch when the
           user runs the application with the --help switch.  This argument is
           optional; when it is not provided, a simple "No help for this
           switch" message is printed.
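
       The following is a minimal sketch of a plug-in file that registers
       one switch that takes an argument and one that does not.  The switch
       names, the settings dictionary, and the try/except stand-in (which
       only lets the file run outside a SiLK tool) are assumptions made for
       illustration.

```python
# Sketch of a plug-in file that adds two switches.  The switch names and
# the settings dictionary are illustrative, not part of SiLK.
try:
    from silk.plugin import register_switch
except ImportError:
    # Stand-in so the file can be exercised outside a SiLK tool.
    def register_switch(*args, **kwargs):
        pass

settings = {"min_bytes": 0, "verbose": False}

def set_min_bytes(value):
    # Called with the switch's argument, which arrives as a string.
    settings["min_bytes"] = int(value)

def set_verbose():
    # Registered with arg=False, so the handler receives no argument.
    settings["verbose"] = True

# The handler must be passed by keyword.
register_switch("min-bytes", handler=set_min_bytes,
                help="Minimum byte count used by this plug-in")
register_switch("plugin-verbose", handler=set_verbose, arg=False,
                help="Print extra plug-in diagnostics")
```

       With such a file loaded, the output of --help should list
       --min-bytes and --plugin-verbose next to the tool's own switches.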

   rwfilter usage
       When used in conjunction with rwfilter(1), the SiLK Python plug-in
       allows users to define arbitrary partitioning criteria using the SiLK
       extension to the Python programming language.  To use this capability,
       the user creates a Python file and specifies its name with the
       --python-file switch in rwfilter.  The file should call the
       register_filter() function for each filter that it wants to create:

       register_filter(filter_func, [finalize=finalize_func],
       [initialize=initialize_func])

       filter_func
           Boolean = filter_func(silk.RWRec).  Names a function that must
           accept a single argument, a silk.RWRec object (see pysilk(3)).
           When the rwfilter program is run, it finds the records that match
           the selection options, and hands each record to the built-in
           partitioning switches.  A record that passes all of the built-in
           switches is handed to the first Python filter_func() function as an
           RWRec object.  The return value of the function determines what
           happens to the record.  The record fails the filter_func() function
           (and the record is immediately written to the --fail-destination,
           if specified) when the function returns one of the following:
           False, None, numeric zero of any type, an empty string, or an empty
           container (including strings, tuples, lists, dictionaries, sets,
           and frozensets).  If the function returns any other value, the
           record passes the first filter_func() function, and the record is
           handed to the next Python filter_func() function.  If all
           filter_func() functions pass the record, the record is written to
           the --pass-destination, if specified.  (Note that when the --plugin
           switch is present, the code it specifies will be called after the
           PySiLK code.)

       initialize_func
           initialize_func().  Names a function that takes no arguments.  When
           this function is specified, it will be called after rwfilter has
           completed its argument processing, and just before rwfilter opens
           the first input file.  The return value of this function is
           ignored.

       finalize_func
           finalize_func().  Names a function that takes no arguments.  When
           this function is specified, it will be called after all flow
           records have been processed.  One use of this function is to
           print any statistics that the filter_func() function was computing.
           The return value from this function is ignored.

       If register_filter() is called multiple times, the filter_func(),
       initialize_func(), and finalize_func() functions will be invoked in the
       order in which the register_filter() calls were seen.
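
       As a sketch of how the three callbacks fit together, the file below
       passes records that average at least 1000 bytes per packet and
       reports a tally when rwfilter finishes.  The threshold, the function
       names, and the try/except stand-in (present only so the file runs
       outside rwfilter) are assumptions; the bytes and packets attributes
       are those of silk.RWRec.

```python
import sys

try:
    from silk.plugin import register_filter
except ImportError:
    # Stand-in so the file can be exercised outside rwfilter.
    def register_filter(*args, **kwargs):
        pass

counts = {"seen": 0, "passed": 0}

def reset_counts():
    # initialize_func: runs before the first input file is opened.
    counts["seen"] = 0
    counts["passed"] = 0

def large_packets(rec):
    # filter_func: a true return value passes the record.
    counts["seen"] += 1
    passed = rec.packets > 0 and rec.bytes >= 1000 * rec.packets
    if passed:
        counts["passed"] += 1
    return passed

def report():
    # finalize_func: runs after all records have been processed.
    sys.stderr.write("plug-in passed %d of %d records\n"
                     % (counts["passed"], counts["seen"]))

register_filter(large_packets, initialize=reset_counts, finalize=report)
```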

       NOTE: For backwards compatibility, when the file named by --python-file
       does not call register_filter(), rwfilter will search the Python file
       for functions named rwfilter() and finalize().  If it finds the
       rwfilter() function, rwfilter will act as if the file contained:

        register_filter(rwfilter, finalize=finalize)

       The --python-file switch requires the user to create a file containing
       Python code.  To allow the user to write a small filtering check in
       Python, rwfilter supports the --python-expr switch.  The value of the
       switch should be a Python expression whose result determines whether a
       given record passes or fails, using the same criterion as the
       filter_func() function described above.  In the expression, the
       variable "rec" is bound to the current silk.RWRec object.  There is no
       support for the initialize_func() and finalize_func() functions.  The
       user may consider --python-expr=PYTHON_EXPRESSION as being implemented
       by

        from silk import *
        def temp_filter(rec):
            return (PYTHON_EXPRESSION)

        register_filter(temp_filter)
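
       The pass/fail criterion is ordinary Python truth testing.  The
       sketch below imitates the wrapper above in plain Python so the rule
       can be tried without rwfilter; make_expr_filter and record_passes
       are hypothetical helpers written for this illustration and are not
       how rwfilter itself is implemented.

```python
def make_expr_filter(expression):
    # Compile the expression once and evaluate it with "rec" bound to the
    # current record.  (rwfilter also makes the names from the silk module
    # available to the expression; this sketch binds only "rec".)
    code = compile(expression, "<python-expr>", "eval")

    def temp_filter(rec):
        return eval(code, {}, {"rec": rec})

    return temp_filter

def record_passes(result):
    # A record passes when the returned value is true in the Python sense.
    return bool(result)

# Every one of these return values causes a record to fail.
FAILING_RESULTS = (False, None, 0, 0.0, "", (), [], {}, set(), frozenset())
```

       For example, make_expr_filter("rec.sport == rec.dport") yields a
       function that passes only records whose two ports are equal.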

       The --python-file and --python-expr switches allow for much flexibility
       but at the cost of speed: converting a SiLK Flow record into an RWRec
       is expensive relative to most operations in rwfilter.  The user should
       use rwfilter's built-in partitioning switches to whittle down the input
       as much as possible, and only use the Python code to do what is
       difficult or impossible to do otherwise.

   Simple field registration functions
       The silk.plugin module defines a function that can be used to define
       fields for use in rwcut, rwgroup, rwsort, rwstats, and rwuniq.  That
       function is powerful, but it is also complex.  To make it easy to
       define fields for the common cases, the silk.plugin provides the
       functions described in this section that create a key field or an
       aggregate value field.  The advanced function is described later in
       this manual page ("Advanced field registration function").

       Once you have created a key field or aggregate value field, you must
       include the field's name in the argument to the --fields or --values
       switch to tell the application to use the field.

       Integer key field

       The following function is used to create a key field whose value is an
       unsigned integer.

       register_int_field(field_name, int_function, min, max, [width])

       field_name
           The name of the new field, a string.  If you attempt to add a key
           field that already exists, you will get an error message.

       int_function
           int = int_function(silk.RWRec).  A function that accepts a
           silk.RWRec object as its sole argument, and returns an unsigned
           integer which represents the value of this field for the given
           record.

       min A number representing the minimum integer value for the field.  If
           int_function returns a value less than min, an error is raised.

       max A number representing the maximum integer value for the field.  If
           int_function returns a value greater than max, an error is raised.

       width
           The column width to use when displaying the field.  This parameter
           is optional; the default is the number of digits necessary to
           display the integer max.
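
       A minimal sketch of an integer key field follows.  The field name
       lower_port and the try/except stand-in (present only so the file
       runs outside a SiLK tool) are assumptions of the example.

```python
try:
    from silk.plugin import register_int_field
except ImportError:
    # Stand-in so the file can be exercised outside a SiLK tool.
    def register_int_field(*args, **kwargs):
        pass

def lower_port(rec):
    # The smaller of the two port numbers; often the service port.
    return min(rec.sport, rec.dport)

# Ports range from 0 to 65535.  No width is given, so the column is as
# wide as the number of digits in the maximum.
register_int_field("lower_port", lower_port, 0, 65535)
```

       The field is then selected by name, for example with
       --fields=sport,dport,lower_port.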

       IPv4 address key field

       This function is used to create a key field whose value is an IPv4
       address.  (See also register_ip_field().)

       register_ipv4_field(field_name, ipv4_function, [width])

       field_name
           The name of the new field, a string.  If you attempt to add a key
           field that already exists, you will get an error message.

       ipv4_function
           silk.IPv4Addr = ipv4_function(silk.RWRec).  A function that accepts
           a silk.RWRec object as its sole argument, and returns a
           silk.IPv4Addr object.  This IPv4Addr object will be the IPv4
           address that represents the value of this field for the given
           record.

       width
           The column width to use when displaying the field.  This parameter
           is optional, and it defaults to 15.

       IP address key field

       The next function is used to create a key field whose value is an IPv4
       or IPv6 address.

       register_ip_field(field_name, ip_function, [width])

       field_name
           The name of the new field, a string.  If you attempt to add a key
           field that already exists, you will get an error message.

       ip_function
           silk.IPAddr = ip_function(silk.RWRec).  A function that accepts a
           silk.RWRec object as its sole argument, and returns a silk.IPAddr
           object which represents the value of this field for the given
           record.

       width
           The column width to use when displaying the field.  This parameter
           is optional.  The default width is 39.

       This key field requires more memory internally than fields registered
       by the register_ipv4_field() function.  If SiLK is compiled without
       IPv6 support, register_ip_field() works exactly like
       register_ipv4_field(), including the default width of 15.
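
       As an illustration, the sketch below defines an IP address key field
       that reports the address on the lower-port side of a flow.  The
       field name server_ip and the lower-port heuristic are assumptions of
       the example, as is the try/except stand-in that lets the file run
       outside a SiLK tool.

```python
try:
    from silk.plugin import register_ip_field
except ImportError:
    # Stand-in so the file can be exercised outside a SiLK tool.
    def register_ip_field(*args, **kwargs):
        pass

def server_ip(rec):
    # Heuristic: treat the side using the lower port as the server.
    # rec.sip and rec.dip are already address objects, so one of them is
    # returned unchanged.
    if rec.sport <= rec.dport:
        return rec.sip
    return rec.dip

register_ip_field("server_ip", server_ip)
```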

       Enumerated object key field

       The following function is used to create a key field whose value is any
       Python object.  The maximum number of different objects that can be
       represented is 4,294,967,296, or 2^32.

       register_enum_field(field_name, enum_function, width, [ordering])

       field_name
           The name of the new field, a string.  If you attempt to add a key
           field that already exists, you will get an error message.

       enum_function
           object = enum_function(silk.RWRec).  A function that accepts a
           silk.RWRec object as its sole argument, and returns a Python object
           which represents the value of this field for the given record.  For
           typical usage, the Python objects returned by the enum_function
           will be strings representing some categorical value.

       width
           The column width to use when displaying this field.  The parameter
           is required.

       ordering
           A list of objects used to determine ordering for rwsort and rwuniq.
           This parameter is optional.  If specified, it lists the objects in
           the order in which they should be sorted.  If the enum_function
           returns an object that is not in ordering, the object will be sorted
           after all the objects in ordering.
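
       A short sketch of an enumerated field that labels each record with
       a protocol class is shown here.  The field name, the labels, and the
       try/except stand-in (present only so the file runs outside a SiLK
       tool) are assumptions of the example.

```python
try:
    from silk.plugin import register_enum_field
except ImportError:
    # Stand-in so the file can be exercised outside a SiLK tool.
    def register_enum_field(*args, **kwargs):
        pass

# IP protocol numbers mapped to illustrative labels.
PROTO_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}

# Desired sort order.  "other" is not listed, so it sorts after these.
PROTO_ORDER = ["tcp", "udp", "icmp"]

def proto_class(rec):
    return PROTO_NAMES.get(rec.protocol, "other")

# A width of 5 is enough for the longest label, "other".
register_enum_field("proto_class", proto_class, 5, PROTO_ORDER)
```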

       Integer sum aggregate value field

       This function is used to create an aggregate value field that maintains
       a running unsigned integer sum.

       register_int_sum_aggregator(agg_value_name, int_function, [max_sum],
       [width])

       agg_value_name
           The name of the new aggregate value field, a string.  The
           agg_value_name must be unique among all aggregate values, but an
           aggregate value field and key field can have the same name.

       int_function
           int = int_function(silk.RWRec).  A function that accepts a
           silk.RWRec object as its sole argument, and returns an unsigned
           integer which represents the value that should be added to the
           running sum for the current bin.

       max_sum
           The maximum possible sum.  This parameter is optional; if not
           specified, the default is 2^64-1 (18,446,744,073,709,551,615).

       width
           The column width to use when displaying the aggregate value.  This
           parameter is optional.  The default is the number of digits
           necessary to display max_sum.
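
       For instance, a sum aggregator can count the records in each bin
       that satisfy some condition by adding 1 or 0 per record.  The name
       single_pkt_flows and the try/except stand-in (present only so the
       file runs outside a SiLK tool) are assumptions of this sketch.

```python
try:
    from silk.plugin import register_int_sum_aggregator
except ImportError:
    # Stand-in so the file can be exercised outside a SiLK tool.
    def register_int_sum_aggregator(*args, **kwargs):
        pass

def single_packet(rec):
    # Contribute 1 to the bin's running sum for a single-packet flow.
    return 1 if rec.packets == 1 else 0

# max_sum and width are omitted, so the documented defaults apply.
register_int_sum_aggregator("single_pkt_flows", single_packet)
```

       The aggregate is then requested by name in the --values switch of
       rwstats or rwuniq.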

       Integer maximum aggregate value field

       The following function is used to create an aggregate value field that
       maintains the maximum unsigned integer value.

       register_int_max_aggregator(agg_value_name, int_function, [max_max],
       [width])

       agg_value_name
           The name of the new aggregate value field, a string.  The
           agg_value_name must be unique among all aggregate values, but an
           aggregate value field and key field can have the same name.

       int_function
           int = int_function(silk.RWRec).  A function that accepts a
           silk.RWRec object as its sole argument, and returns an integer
           which represents the value that should be considered for the
           current highest value for the current bin.

       max_max
           The maximum possible value for the maximum.  This parameter is
           optional; if not specified, the default is 2^64-1
           (18,446,744,073,709,551,615).

       width
           The column width to use when displaying the aggregate value.  This
           parameter is optional.  The default is the number of digits
           necessary to display max_max.

       Integer minimum aggregate value field

       This function is used to create an aggregate value field that maintains
       the minimum unsigned integer value.

       register_int_min_aggregator(agg_value_name, int_function, [max_min],
       [width])

       agg_value_name
           The name of the new aggregate value field, a string.  The
           agg_value_name must be unique among all aggregate values, but an
           aggregate value field and key field can have the same name.

       int_function
           int = int_function(silk.RWRec).  A function that accepts a
           silk.RWRec object as its sole argument, and returns an integer
           which represents the value that should be considered for the
           current lowest value for the current bin.

       max_min
           The maximum possible value for the minimum.  When this optional
           parameter is not specified, the default is 2^64-1
           (18,446,744,073,709,551,615).

       width
           The column width to use when displaying the aggregate value.  This
           parameter is optional.  The default is the number of digits
           necessary to display max_min.
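
       The maximum and minimum aggregators can share one int_function.
       The sketch below tracks the largest and smallest average packet size
       seen in each bin; the names max_bpp and min_bpp and the try/except
       stand-in (present only so the file runs outside a SiLK tool) are
       assumptions of the example.

```python
try:
    from silk.plugin import (register_int_max_aggregator,
                             register_int_min_aggregator)
except ImportError:
    # Stand-ins so the file can be exercised outside a SiLK tool.
    def register_int_max_aggregator(*args, **kwargs):
        pass

    def register_int_min_aggregator(*args, **kwargs):
        pass

def bytes_per_packet(rec):
    # Integer average; max() guards against division by zero.
    return rec.bytes // max(rec.packets, 1)

register_int_max_aggregator("max_bpp", bytes_per_packet)
register_int_min_aggregator("min_bpp", bytes_per_packet)
```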

   Advanced field registration function
       The previous section provided functions to register a key field or an
       aggregate value field when dealing with common objects.  When you need
       to use a complex object, or you want more control over how the object
       is handled in PySiLK, you can use the register_field() function
       described in this section.

       Many of the arguments to the register_field() function are callback
       functions that you must create and that the application will invoke.
       (The simple registration functions above have already taken care of
       defining these callback functions.)

       Often the callback functions for handling fields will either take (as a
       parameter) or return a representation of a numeric value that can be
       processed from C.  The most efficient way to handle these
       representations is as a string containing binary characters, including
       the null byte.  We will use the term "byte sequence" for these
       representations; other possible terms include "array of bytes", "byte
       strings", or "binary values".  For hints on creating byte sequences
       from Python, see the "Byte sequences" section below.
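
       As a brief illustration, one way to build a fixed-length byte
       sequence is Python's standard struct module.  The helper names below
       are hypothetical; the "!" prefix selects network (big-endian) byte
       order.

```python
import struct

def int_to_bytes32(value):
    # Pack an unsigned 32-bit integer into exactly 4 bytes, big endian.
    return struct.pack("!I", value)

def bytes32_to_int(data):
    # Inverse of int_to_bytes32(); unpack() returns a 1-tuple.
    return struct.unpack("!I", data)[0]

# The result may contain null bytes and is always 4 bytes long.
assert int_to_bytes32(256) == b"\x00\x00\x01\x00"

# In big-endian form, comparing the bytes agrees with comparing the
# numbers, which is what sorting on a binary key relies on.
assert int_to_bytes32(255) < int_to_bytes32(256)
```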

       To define a new field or aggregate value, the user calls:

       register_field(field_name, [add_rec_to_bin=add_rec_to_bin_func,]
       [bin_compare=bin_compare_func,] [bin_bytes=bin_bytes_value,]
       [bin_merge=bin_merge_func,] [bin_to_text=bin_to_text_func,]
       [column_width=column_width_value,] [description=description_string,]
       [initial_value=initial_value,] [initialize=initialize_func,]
       [rec_to_bin=rec_to_bin_func,] [rec_to_text=rec_to_text_func])

       Although the keyword arguments to rreeggiisstteerr__ffiieelldd(()) are all optional
       from Python's perspective, certain keyword arguments must be present
       before an application will define the key or aggregate value.  The
       following table summarizes the keyword arguments used by each
       application.  An "F" means the argument is required for a key field, an
       "A" means the argument is required for an aggregate value field, "f"
       and "a" mean the application will use the argument for a key field or
       an aggregate value if the argument is present, and a dot means the
       application completely ignores the argument.

                          rwcut  rwgroup  rwsort  rwstats  rwuniq
        add_rec_to_bin      .       .       .        A       A
        bin_compare         .       .       .        A       .
        bin_bytes           .       F       F       F,A     F,A
        bin_merge           .       .       .        A       A
        bin_to_text         .       .       .       F,A     F,A
        column_width        F       .       .       F,A     F,A
        description         f       f       f       f,a     f,a
        initial_value       .       .       .        a       a
        initialize          f       f       f       f,a     f,a
        rec_to_bin          .       F       F        F       F
        rec_to_text         F       .       .        .       .

       The following sections describe how to use register_field() in each
       application.

   rwcut usage
       The purpose of rwcut(1) is to print attributes of (or attributes
       derived from) every SiLK record it reads as input.  A plug-in used by
       rwcut must produce a printable (textual) attribute from a SiLK record.
       To define a new attribute, the register_field() method should be called
       as shown:

       register_field(field_name, column_width=column_width_value,
       rec_to_text=rec_to_text_func, [description=description_string,]
       [initialize=initialize_func])

       field_name
           Names the field being defined, a string.  If you attempt to add a
           field that already exists, you will get an error message.  To
           display the field, include field_name in the argument to the
           --fields switch.

       column_width_value
           Specifies the length of the longest printable representation.
           rwcut will use it as the width for the field_name column when
           columnar output is selected.

       rec_to_text_func
           string = rec_to_text_func(silk.RWRec).  Names a callback function
           that takes a silk.RWRec object as its sole argument and produces a
           printable representation of the field being defined.  The length of
           the returned text should not be greater than column_width_value.
           If the value returned from this function is not a string, the
           returned value is converted to a string by the Python str()
           function.

       description_string
           Provides a string giving a brief description of the field, suitable
           for printing in --help-fields output.  This argument is optional.

       initialize_func
           initialize_func().  Names a callback function that will be invoked
           after the application has completed its argument processing, and
           just before it opens the first input file.  This function is only
           called when --fields includes field_name.  The function takes no
           arguments and its return value is ignored.  This argument is
           optional.

       If the rec_to_text argument is not present, the register_field()
       function will do nothing when called from rwcut.  If the column_width
       argument is missing, rwcut will complain that the textual width of the
       plug-in field is 0.
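
       A minimal sketch of a text-only field for rwcut follows.  The field
       name bpp, the width of 8, and the try/except stand-in (present only
       so the file runs outside rwcut) are assumptions of the example.

```python
try:
    from silk.plugin import register_field
except ImportError:
    # Stand-in so the file can be exercised outside rwcut.
    def register_field(*args, **kwargs):
        pass

def bpp_text(rec):
    # Average bytes per packet with one decimal place.
    return "%.1f" % (rec.bytes / float(max(rec.packets, 1)))

# column_width should cover the longest text bpp_text() can return; 8 is
# an assumed bound for this illustration.
register_field("bpp", column_width=8, rec_to_text=bpp_text,
               description="Average bytes per packet")
```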

   rwgroup and rwsort usage
       The rwsort(1) tool sorts SiLK records by their attributes or attributes
       derived from them.  rwgroup(1) reads sorted SiLK records and writes a
       common value into the next hop IP field of all records that have common
       attributes.  The output from both of these tools is a stream of SiLK
       records (the output typically includes every record that was read as
       input).  A plug-in used by these tools must return a value that the
       application can use internally to compare records.  To define a new
       field that may be included in the --id-fields switch to rwgroup or the
       --fields switch to rwsort, the register_field() method should be
       invoked as follows:

       register_field(field_name, bin_bytes=bin_bytes_value,
       rec_to_bin=rec_to_bin_func, [description=description_string,]
       [initialize=initialize_func])

       field_name
           Names the field being defined, a string.  If you attempt to add a
           field that already exists, you will get an error message.  To
           have rwgroup or rwsort use this field, include field_name in the
           argument to --id-fields or --fields.

       bin_bytes_value
           Specifies a positive integer giving the length, in bytes, of the
           byte sequence that the rec_to_bin_func() function produces; the
           byte sequence must be exactly this length.

       rec_to_bin_func
           byte-sequence = rec_to_bin_func(silk.RWRec).  Names a callback
           function that takes a silk.RWRec object and returns a byte sequence
           that represents the field being defined.  The returned value should
           be exactly bin_bytes_value bytes long.  For proper grouping or
           sorting, the byte sequence should be returned in network byte order
           (i.e., big endian).

       description_string
           Provides a string giving a brief description of the field, suitable
           for printing in --help-fields output.  This argument is optional.

       initialize_func
           initialize_func().  Names a callback function that will be invoked
           after the application has completed its argument processing, and
           just before it opens the first input file.  This function is only
           called when field_name is included in the list of fields.  The
           function takes no arguments and its return value is ignored.  This
           argument is optional.

       If the rec_to_bin argument is not present, the register_field()
       function will do nothing when called from rwgroup or rwsort.  If the
       bin_bytes argument is missing, rwgroup or rwsort will complain that the
       binary width of the plug-in field is 0.
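
       As a sketch, the field below produces a 4-byte key holding both
       ports in ascending order, so the two directions of a conversation
       share one key.  The field name port_pair, the use of the struct
       module, and the try/except stand-in (present only so the file runs
       outside a SiLK tool) are assumptions of the example.

```python
import struct

try:
    from silk.plugin import register_field
except ImportError:
    # Stand-in so the file can be exercised outside rwgroup or rwsort.
    def register_field(*args, **kwargs):
        pass

def port_pair_bin(rec):
    # Two unsigned 16-bit values in network byte order: exactly 4 bytes.
    low, high = sorted((rec.sport, rec.dport))
    return struct.pack("!HH", low, high)

# bin_bytes must equal the length of what port_pair_bin() returns.
register_field("port_pair", bin_bytes=4, rec_to_bin=port_pair_bin,
               description="Both ports, lower port first")
```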

   rwstats and rwuniq usage
       rwstats(1) and rwuniq(1) group SiLK records into bins based on key
       fields.  Once a record is matched to a bin, the record is used to
       update the aggregate values (e.g., the sum of bytes) that are being
       computed, and the record is discarded.  Once all records have been
       processed, the key fields and the aggregate values are printed.

       Key Field

       A plug-in used by rwstats or rwuniq for creating a new key field must
       return a value that the application can use internally to compare
       records, and there must be a function that converts that value to a
       printable representation.  The following invocation of register_field()
       will produce a key field that can be used in the --fields switch of
       rwstats or rwuniq:

       register_field(field_name, bin_bytes=bin_bytes_value,
       bin_to_text=bin_to_text_func, column_width=column_width_value,
       rec_to_bin=rec_to_bin_func, [description=description_string,]
       [initialize=initialize_func])

       The arguments are:

       field_name
           Contains the name of the field being defined, a string.  If you
           attempt to add a field that already exists, you will get an
           error message.  The field will only be active when field_name is
           specified as an argument to --fields.

       bin_bytes_value
           Contains a positive integer giving the length, in bytes, of the
           byte sequence that the rec_to_bin_func() function produces and that
           the bin_to_text_func() function accepts.  The byte sequences must
           be exactly this length.

       bin_to_text_func
           string = bin_to_text_func(byte-sequence).  Names a callback
           function that takes a byte sequence, of length bin_bytes_value, as
           produced by the rec_to_bin_func() function and returns a printable
           representation of the byte sequence.  The length of the text should
           be no longer than the value specified by column_width.  If the
           value returned from this function is not a string, the returned
           value is converted to a string by the Python str() function.

       column_width_value
           Contains a positive integer specifying the length of the longest
           textual field that the bin_to_text_func() callback function
           returns.  This length will be used as the column width when columnar
           output is requested.

       rec_to_bin_func
           byte-sequence = rec_to_bin_func(silk.RWRec).  Names a callback
           function that takes a silk.RWRec object and returns a byte sequence
           that represents the field being defined.  The returned value should
           be exactly bin_bytes_value bytes long.  For proper sorting, the
           byte sequence should be returned in network byte order (i.e., big
           endian).

       description_string
           Provides a string giving a brief description of the field, suitable
           for printing in --help-fields output.  This argument is optional.

       initialize_func
           initialize_func().  Names a callback function that is called after
           the command line arguments have been processed, and before opening
           the first file.  This function is only called when --fields
           includes field_name.  The function takes no arguments and its
           return value is ignored.  This argument is optional.
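
       A hedged sketch of a key field for rwstats or rwuniq follows; it
       bins records by the power-of-two range that contains the packet
       count.  The field name pkt_bucket, the column width of 21 (which
       assumes packet counts fit in 32 bits), and the try/except stand-in
       (present only so the file runs outside a SiLK tool) are assumptions
       of the example.

```python
import struct

try:
    from silk.plugin import register_field
except ImportError:
    # Stand-in so the file can be exercised outside rwstats or rwuniq.
    def register_field(*args, **kwargs):
        pass

def pkt_bucket_bin(rec):
    # One byte: the number of bits needed to represent the packet count.
    return struct.pack("!B", rec.packets.bit_length())

def pkt_bucket_text(data):
    # Turn the one-byte key back into a printable "low-high" range.
    bits = struct.unpack("!B", data)[0]
    if bits == 0:
        return "0"
    return "%d-%d" % (1 << (bits - 1), (1 << bits) - 1)

register_field("pkt_bucket", bin_bytes=1, bin_to_text=pkt_bucket_text,
               column_width=21, rec_to_bin=pkt_bucket_bin,
               description="Power-of-two range containing the packet count")
```

       Because the key is a single byte, byte order is not a concern here;
       wider keys should be packed in network byte order as noted above.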

       Aggregate Value

       A plug-in used by rwstats or rwuniq for creating a new aggregate value
       must be able to use a SiLK record to update an aggregate value, take
       two aggregate values and merge them to a new value, and convert that
       aggregate value to a printable representation.  To use an aggregate
       value for ordering the bins in rwstats, the plug-in must also define a
       function to compare two aggregate values.  The aggregate values are
       represented as byte sequences.

       To define a new aggregate value in rwstats, the user calls:

       register_field(agg_value_name, add_rec_to_bin=add_rec_to_bin_func,
       bin_bytes=bin_bytes_value, bin_merge=bin_merge_func,
       bin_to_text=bin_to_text_func, column_width=column_width_value,
       [bin_compare=bin_compare_func,] [description=description_string,]
       [initial_value=initial_value,] [initialize=initialize_func])

       The call to define a new aggregate value in rwuniq is nearly identical:

       register_field(agg_value_name, add_rec_to_bin=add_rec_to_bin_func,
       bin_bytes=bin_bytes_value, bin_merge=bin_merge_func,
       bin_to_text=bin_to_text_func, column_width=column_width_value,
       [description=description_string,] [initial_value=initial_value,]
       [initialize=initialize_func])

       The arguments are:

       agg_value_name
           Contains the name of the aggregate value field being defined, a
           string.  The name of the value must be unique among all aggregate
           values, but an aggregate value field and key field can have the
           same name.  The value will only be active when agg_value_name is
           specified as an argument to --values.

       add_rec_to_bin_func
           byte-sequence = add_rec_to_bin_func(silk.RWRec, byte-sequence).
           Names a callback function whose two arguments are a silk.RWRec
           object and an aggregate value.  The function updates the aggregate
           value with data from the record and returns a new aggregate value.
           Both aggregate values are represented as byte sequences of exactly
           bin_bytes_value bytes.

       bin_bytes_value
           Contains a positive integer representing the length, in bytes, of
           the binary aggregate value used by the various callback functions.
           Every byte sequence for this field must be exactly this length, and
           it also governs the length of the byte sequence specified by
           initial_value.

       bin_merge_func
           byte-sequence = bin_merge_func(byte-sequence, byte-sequence).
           Names a callback function which returns the result of merging two
           binary aggregate values into a new binary aggregate value.  This
           merge function will often be addition; however, if the aggregate
           value is a bitmap, the result of the merge function could be the union
           of the bitmaps.  The function should take two byte sequence
           arguments and return a byte sequence, where all byte sequences are
           exactly bin_bytes_value bytes in length.  If merging the aggregate
           values is not possible, the function should throw an exception.
           This function is used when the data structure used by rwstats or
           rwuniq runs out of memory.  When that happens, the application writes
           its current state to a temporary file, empties its buffers, and
           continues reading records.  Once all records have been processed,
           the application needs to merge the temporary files to produce the
           final output.  The bin_merge_func() function is used when merging
           these binary aggregate values.

       bin_to_text_func
           string = bin_to_text_func(byte-sequence).  Names a callback
           function that takes a byte sequence representing an aggregate value
           as an argument and returns a printable representation of that
           aggregate value.  The byte sequence input to bin_to_text_func()
           will be exactly bin_bytes_value bytes long.  The length of the text
           should be no longer than the value specified by column_width.  If
           the value returned from this function is not a string, the returned
           value is converted to a string by the Python str() function.

       column_width_value
           Contains a positive integer specifying the length of the longest
           textual field that the bin_to_text_func() callback function
           returns.  This length will be used as the column width when columnar
           output is requested.

       bin_compare_func
           int = bin_compare_func(byte-sequence, byte-sequence).  Names a
           callback function that is called with two aggregate values, each
           represented as a byte sequence of exactly bin_bytes_value bytes.
           The function returns (1) an integer less than 0 if the first
           argument is less than the second, (2) an integer greater than 0 if
           the first is greater than the second, or (3) 0 if the two values
           are equal.  This function is used by rwstats to sort the bins into
           top-N order.

       description_string
           Provides a string giving a brief description of the aggregate
           value, suitable for printing in --help-fields output.  This
           argument is optional.

       initial_value
           Specifies a byte sequence representing the initial state of the
           binary aggregate value.  This byte sequence must be of length
           bin_bytes_value bytes.  If this argument is not specified, the
           aggregate value is set to a byte sequence containing
           bin_bytes_value null bytes.

       initialize_func
           initialize_func().  Names a callback function that is called after
           the command line arguments have been processed, and before opening
           the first file.  This function is only called when --values
           includes agg_value_name.  The function takes no arguments and its
           return value is ignored.  This argument is optional.
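
       As a rough sketch of how the callbacks described above fit together,
       the following defines a hypothetical "packet-total" aggregate value
       that sums the packets attribute of each record.  The names
       packet-total, add_packets, merge_packets, packets_to_text,
       compare_packets, and FakeRec are illustrative assumptions and not part
       of SiLK.  FakeRec stands in for silk.RWRec so the callbacks can be
       exercised outside a SiLK tool, and the register_field() call is shown
       only as a comment.

```python
import struct
from collections import namedtuple

# One unsigned 64-bit counter, packed in network byte order.
u64 = struct.Struct("!Q")

def add_packets(rec, current):
    # add_rec_to_bin: fold one record into the running total
    (total,) = u64.unpack(current)
    return u64.pack(total + rec.packets)

def merge_packets(bin1, bin2):
    # bin_merge: combine two partial totals by addition
    (a,) = u64.unpack(bin1)
    (b,) = u64.unpack(bin2)
    return u64.pack(a + b)

def packets_to_text(bin_value):
    # bin_to_text: printable form of the total
    (total,) = u64.unpack(bin_value)
    return "%d" % total

def compare_packets(bin1, bin2):
    # bin_compare: negative, zero, or positive, as rwstats expects
    (a,) = u64.unpack(bin1)
    (b,) = u64.unpack(bin2)
    return (a > b) - (a < b)

# Inside a file loaded with --python-file, the callbacks would be
# registered roughly like this (commented out so the sketch runs
# without SiLK; 20 is the width of the largest 64-bit number):
#
# register_field("packet-total", add_rec_to_bin=add_packets,
#                bin_bytes=u64.size, bin_merge=merge_packets,
#                bin_to_text=packets_to_text, column_width=20,
#                bin_compare=compare_packets,
#                initial_value=u64.pack(0))

# Hypothetical stand-in for silk.RWRec, used only to try the callbacks.
FakeRec = namedtuple("FakeRec", "packets")
state = u64.pack(0)
for rec in (FakeRec(3), FakeRec(5)):
    state = add_packets(rec, state)
```

       Every byte sequence handled by these callbacks is exactly u64.size
       bytes long, which is the value that would be passed as bin_bytes.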

   Byte sequences
       The rwgroup, rwsort, rwstats, and rwuniq programs make extensive use of
       "byte sequences" (a.k.a., "array of bytes", "byte strings", or "binary
       values") in their plug-in functions.  The byte sequences are used in
       both key fields and aggregate values.

       When used as key fields, the values can represent uniqueness or
       indicate sort order.  Two records with the same byte sequence for a
       field will be considered identical with respect to that field.  When
       sorting, the byte sequences are compared in network byte order.  That
       is, the most significant byte is compared first, followed by the next-
       most-significant byte, etc.  This equates to string comparison starting
       with the left-hand side of the string.
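
       The effect of byte order on sorting can be illustrated with a small
       sketch that uses only Python's struct module.  The sample values are
       arbitrary; the point is that sorting big-endian byte sequences
       reproduces numeric order, while sorting little-endian byte sequences
       does not.

```python
import struct

values = [1, 256, 2, 65535]

# Sort the packed byte sequences, then decode them to see the order.
big = sorted(struct.pack("!H", v) for v in values)
little = sorted(struct.pack("<H", v) for v in values)

big_order = [struct.unpack("!H", b)[0] for b in big]
little_order = [struct.unpack("<H", b)[0] for b in little]

# big_order is numerically ascending; little_order is not.
```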

       When used as an aggregate field, the byte sequences are expected to
       behave more like numbers, with the ability to take a binary value and
       add a record's data to it, or to merge (e.g., add) two byte sequences
       outside the context of a SiLK record.

       Every byte sequence has an associated length, which is passed into the
       register_field() function in the bin_bytes argument.  The length
       determines how many values the byte sequence can represent.  A byte
       sequence with a length of 1 can represent up to 256 unique values (from
       0 to 255 inclusive).  A byte sequence with a length of 2 can represent
       up to 65536 unique values (0 to 65535).  To generalize, a byte sequence
       with a length of n can represent up to 2^(8n) unique values (0 to
       2^(8n)-1).
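
       One way to apply this rule when choosing bin_bytes is sketched below.
       The helper format_for_max is a hypothetical name, not part of SiLK or
       PySiLK; it returns the smallest unsigned, network-byte-order struct
       format whose byte sequence can hold a given maximum value.

```python
import struct

def format_for_max(max_value):
    """Return the smallest unsigned network-order format holding max_value."""
    for code in ("!B", "!H", "!I", "!Q"):
        size = struct.calcsize(code)      # 1, 2, 4, and 8 bytes
        # n bytes can represent the values 0 .. 256**n - 1
        if max_value < 256 ** size:
            return code
    raise ValueError("value does not fit in 8 bytes")
```

       The size of the chosen format, struct.calcsize(code), is then the
       natural value for bin_bytes.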

       How byte sequences are represented in Python depends on the version of
       Python.  Python represents a sequence of characters using either the
       bytes type (introduced in 2.6) or the unicode type.  The bytes type can
       encode byte sequences while the unicode type cannot.  In Python 2, the
       str (string) type was an alias for bytes, so that any Python 2 string
       is in effect a byte sequence.  In Python 3, str is an alias for
       unicode, thus Python 3 strings are unicode objects and cannot represent
       byte sequences.

       Python does not make conversions between integers and byte sequences
       particularly natural.  As a result, here are some pointers on how to do
       these conversions:

       Use the bytes() and ord() functions

       If you are converting a single integer value that is less than 256, the
       easiest way to convert it to a byte sequence is to use the bytes()
       function; to convert it back, use the ord() function.

        seq = bytes([num])
        num = ord(seq)

       The bytes() function takes a list of integers between 0 and 255
       inclusive, and returns a byte sequence of the length of that list.  To
       convert a single byte, use a list of a single element.  The ord()
       function takes a byte sequence of a single byte and returns an integer
       between 0 and 255.

       Note: In versions of Python earlier than 2.6, use the chr() function
       instead of the bytes() function.  It takes a single number as its
       argument.  chr() will work in Python 2.6 and 2.7 as well, but there are
       compatibility problems in Python 3.x.
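
       A version-independent alternative, offered here as a sketch rather
       than as the documented approach, is to use the struct module described
       in the next subsection with the single-byte format "B".  This avoids
       one pitfall: as far as we know, bytes is the same type as str in
       Python 2.6 and 2.7, so bytes([num]) there yields the text of the list
       instead of a single byte.  The helper names int_to_byte and
       byte_to_int are hypothetical.

```python
import struct

# "B" is one unsigned byte; "!" keeps the convention of network order.
one_byte = struct.Struct("!B")

def int_to_byte(num):
    # num must be between 0 and 255 inclusive, else struct.error is raised
    return one_byte.pack(num)

def byte_to_int(seq):
    # unpack() returns a tuple, even for a single value
    (num,) = one_byte.unpack(seq)
    return num
```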

       Use the struct module

       When the value you are converting to a byte sequence is 256 or greater,
       you have to go with another option.  One of the simpler options is to
       use Python's built-in struct module.  With this module, you can encode
       a number or a set of numbers into a byte sequence and convert the
       result back using a struct.Struct object.  Encoding the numbers to a
       byte sequence uses the object's pack() method.  To convert that byte
       sequence back to the number or set of numbers, use the object's
       unpack() method.  The length of the resulting byte sequences can be
       found in the size attribute of the struct.Struct object.  A
       formatting string is used to indicate how the numbers are encoded into
       binary.  For example:

        import struct

        # Set up the format for two 64-bit numbers
        two64 = struct.Struct("!QQ")
        # Encode two 64-bit numbers as a byte sequence
        seq = two64.pack(num1, num2)
        # Unpack a byte sequence back into two 64-bit numbers
        (num1, num2) = two64.unpack(seq)
        # Length of the encoded byte sequence
        bin_bytes = two64.size

       In the above, "Q" represents a single unsigned 64-bit number (an
       unsigned long long or quad).  The "!" at the beginning of the string
       forces network byte order.  (For sort comparison purposes, always pack
       in network byte order.)

       Here is another example, which encodes a signed 16-bit integer and a
       floating point number:

        import struct

        # Set up the format for a 16-bit signed integer and a float
        obj = struct.Struct("!hf")
        # Encode a 16-bit signed integer and a float as a byte sequence
        seq = obj.pack(intval, floatval)
        # Unpack a byte sequence back into a 16-bit signed integer and a float
        (intval, floatval) = obj.unpack(seq)
        # Length of the encoded byte sequence
        bin_bytes = obj.size

       Note that unpack() returns a sequence.  When unpacking a single value,
       assign the result of unpack to (variable_name,), as shown:

        import struct

        u32 = struct.Struct("!I")
        # Encode an unsigned 32-bit integer as a byte sequence
        seq = u32.pack(num1)
        # Unpack a byte sequence back into an unsigned 32-bit integer
        (num1,) = u32.unpack(seq)
        # Length of the encoded byte sequence
        bin_bytes = u32.size

       The full list of codes can be found in the Python library documentation
       for the struct module, <http://docs.python.org/library/struct.html>.

       Note: Python versions prior to 2.5 do not include support for the
       struct.Struct object.  For older versions of Python, you have to use
       struct's functional interface.  For example:

        import struct

        # Encode a 16-bit signed integer and a float as a byte sequence
        seq = struct.pack("!hf", intval, floatval)
        # Unpack a byte sequence back into a 16-bit signed integer and a float
        (intval, floatval) = struct.unpack("!hf", seq)
        # Length of the encoded byte sequence
        bin_bytes = struct.calcsize("!hf")

       This method works in Python 2.5 and above as well, but is inherently
       slower, as it requires re-evaluation of the format string for each
       packing and unpacking operation.  Only use this if there is a need to
       inter-operate with older versions of Python.

       Use the array module

       The Python array module provides another way to create byte sequences.
       Beware that the array module does not provide an automatic way to
       encode the values in network byte order.
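
       If the array module is used anyway, the byte order has to be handled
       by hand.  The following is a minimal sketch under two assumptions: the
       "H" type code is two bytes wide on the platform (the code checks
       this), and the values are unsigned 16-bit numbers.  The function name
       pack_u16_list is hypothetical.

```python
import array
import sys

def pack_u16_list(numbers):
    """Encode unsigned 16-bit numbers as a network-order byte sequence."""
    arr = array.array("H", numbers)
    if arr.itemsize != 2:
        raise RuntimeError("'H' is not 2 bytes wide on this platform")
    # array stores values in native order; swap on little-endian hosts
    if sys.byteorder == "little":
        arr.byteswap()
    # tobytes() exists in Python 3.2 and later; older versions use tostring()
    if hasattr(arr, "tobytes"):
        return arr.tobytes()
    return arr.tostring()
```

       The struct module with a "!" format string produces the same bytes
       without the explicit swap, which is why it is usually the simpler
       choice.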

OPTIONS
       The following options are available when the SiLK Python plug-in is
       used from rwfilter.

       --python-file=FILENAME
           Load the Python file FILENAME.  The Python code may call
           register_filter() multiple times to define new partitioning
           functions that take a silk.RWRec object as an argument.  The
           return value of the function determines whether the record passes
           the filter.  For backwards compatibility, if register_filter() is
           not called and a function named rwfilter() exists, that function is
           automatically registered as the filtering function.  Multiple
           --python-file switches may be used to load multiple plug-ins.

       --python-expr=PYTHON_EXPRESSION
           Pass the SiLK Flow record if the result of processing the
           record with the specified PYTHON_EXPRESSION is true.  The
           expression is evaluated in the following context:

           o   The record is represented by the variable named rec, which is a
               silk.RWRec object.

           o   There is an implicit from silk import * in effect.

       The following options are available when the SiLK Python plug-in is
       used from rwcut, rwgroup, rwsort, rwstats, or rwuniq:

       --python-file=FILENAME
           Load the Python file FILENAME.  The Python code may call
           register_field() multiple times to define new fields for use by the
           application.  When used with rwstats or rwuniq, the Python code may
           call register_field() multiple times to create new aggregate
           fields.  Multiple --python-file switches may be used to load
           multiple plug-ins.

EXAMPLES
       In the following examples, the dollar sign ("$") represents the shell
       prompt.  The text after the dollar sign represents the command line.
       Lines have been wrapped for improved readability, and the backslash
       ("\") is used to indicate a wrapped line.

   rwfilter --python-expr
       Suppose you want to find traffic destined to a particular host,
       10.0.0.23, that is either ICMP or coming from 1434/udp.  If you attempt
       to use:

        $ rwfilter --daddr=10.0.0.23 --proto=1,17 --sport=1434         \
               --pass=outfile.rw  flowrec.rw

       the --sport option will not match any of the ICMP traffic, and your
       result will not contain ICMP records.  To avoid having to use two
       invocations of rwfilter, you can use the SiLK Python plug-in to do the
       check in a single pass:

        $ rwfilter --daddr=10.0.0.23 --proto=1,17                      \
               --python-expr 'rec.protocol==1 or rec.sport==1434'      \
               --pass=outfile.rw  flowrec.rw

       Since the Python code is slower than the C code used internally by
       rwfilter, we want to limit the number of records processed in Python as
       much as possible.  We use the rwfilter switches to do the address check
       and protocol check, and in Python we only need to check whether the
       record is ICMP or if the source port is 1434 (if the record is not ICMP
       we know it is UDP because of the --proto switch).

   rwfilter --python-file
       To see all records whose protocol is different from the preceding
       record, use the following Python code.  The code also prints a message
       to the standard output on completion.

        import sys

        def filter(rec):
            global lastproto
            if rec.protocol != lastproto:
                lastproto = rec.protocol
                return True
            return False

        def initialize():
            global lastproto
            lastproto = None

        def finalize():
            sys.stdout.write("Finished processing records.\n")

        register_filter(filter, initialize = initialize, finalize = finalize)

       The preceding file, if called lastproto.py, can be used like this:

        $ rwfilter --python-file lastproto.py --pass=outfile.rw flowrec.rw

       Note: Be careful when using a Python plug-in to write to the standard
       output, since the Python output could get intermingled with the output
       from --pass=stdout and corrupt the SiLK output file.  In general,
       printing to the standard error is safer.

   Command line switch
       The following code registers the command line switch "count-protocols".
       This switch is similar to the standard --protocol switch on rwfilter,
       in that it passes records whose protocol matches a value specified in a
       list.  In addition, when rwfilter exits, the plug-in prints a count of
       the number of records that matched each specified protocol.

        import sys
        from silk.plugin import *

        pro_count = {}

        def proto_count(rec):
            global pro_count
            if rec.protocol in pro_count:
                pro_count[rec.protocol] += 1
                return True
            return False

        def print_counts():
            # items() works in both Python 2 and Python 3
            for p,c in pro_count.items():
                sys.stderr.write("%3d|%10d|\n" % (p, c))

        def parse_protocols(protocols):
            global pro_count
            for p in protocols.split(","):
                pro_count[int(p)] = 0
            register_filter(proto_count, finalize = print_counts)

        register_switch("count-protocols", handler=parse_protocols,
                        help="Like --proto, but prints count of flow records")

       When this code is saved to the file count-proto.py, it can be used with
       rwfilter as shown to get a count of TCP and UDP flow records:

        $ rwfilter --start-date=2008/08/08 --type=out                  \
               --python-file=count-proto.py --count-proto=6,17         \
               --print-statistics=/dev/null

       rwfilter does not know that the plug-in will be generating output, and
       rwfilter will complain unless an output switch is given, such as --pass
       or --print-statistics.  Since our plug-in is printing the data we want,
       we send the output to /dev/null.

   Create integer key field with simple API
       This example creates a field that contains the sum of the source and
       destination port.  While this value may not be interesting to display
       in rwcut, it provides a way to sort fields so traffic between two low
       ports will usually be sorted before traffic between a low port and a
       high port.

        def port_sum(rec):
            return rec.sport + rec.dport

        register_int_field("port-sum", port_sum)

       If the above code is saved in a file named portsum.py, it can be used
       to sort traffic prior to printing it (low-port to low-port will appear
       first):

        $ rwfilter --start-date=2008/08/08 --type=out,outweb       \
               --proto=6,17 --pass=stdout                          \
          | rwsort --python-file=portsum.py --fields=port-sum      \
          | rwcut

       To see high-port to high-port traffic first, reverse the sort:

        $ rwfilter --start-date=2008/08/08 --type=out,outweb       \
               --proto=6,17 --pass=stdout                          \
          | rwsort --python-file=portsum.py --fields=port-sum      \
               --reverse                                           \
          | rwcut
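
       The ordering that the port-sum key produces can be previewed without
       SiLK.  In the sketch below, Flow is a hypothetical stand-in for
       silk.RWRec with made-up port values, and Python's sorted() merely
       mimics what rwsort does with the field; it is not how rwsort is
       implemented.

```python
from collections import namedtuple

# Hypothetical stand-in for silk.RWRec; only the ports matter here.
Flow = namedtuple("Flow", "sport dport")

def port_sum(rec):
    return rec.sport + rec.dport

flows = [Flow(50000, 60000), Flow(22, 40000), Flow(53, 123)]

# Low-port to low-port traffic sorts first ...
ascending = sorted(flows, key=port_sum)
# ... and reversing the sort puts high-port to high-port traffic first.
descending = sorted(flows, key=port_sum, reverse=True)
```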

   Create IP key field with simple API
       SiLK stores uni-directional flows.  For network conversations that
       cross the network border, the source and destination hosts are swapped
       depending on the direction of the flow.  For analysis, you often want
       to know the internal and external hosts.

       The following Python plug-in file defines two new fields: "internal-ip"
       displays the destination IP for an incoming flow and the source IP for
       an outgoing flow, and "external-ip" shows the reverse.

        import silk

        # for convenience, create lists of the types
        in_types = ['in', 'inweb', 'innull', 'inicmp']
        out_types = ['out', 'outweb', 'outnull', 'outicmp']

        def internal(rec):
            "Returns the IP Address of the internal side of the connection"
            if rec.typename in out_types:
                return rec.sip
            else:
                return rec.dip

        def external(rec):
            "Returns the IP Address of the external side of the connection"
            if rec.typename in in_types:
                return rec.sip
            else:
                return rec.dip

        register_ip_field("internal-ip", internal)
        register_ip_field("external-ip", external)

       If the above code is saved in a file named direction.py, it can be used
       to show the internal and external IP addresses and flow direction for
       all traffic on 1434/udp from Aug 8, 2008.

        $ rwfilter --start-date=2008/08/08 --type=all              \
               --proto=17 --aport=1434 --pass=stdout               \
          | rwcut --python-file direction.py                       \
               --fields internal-ip,external-ip,3-12

   Create enumerated key field with simple API
       This example expands the previous example.  Suppose instead of printing
       the internal and external IP address, you wanted to group by the label
       associated with the internal and external addresses in a prefix map
       file.  The pmapfilter(3) manual page specifies how to print labels for
       source and destination IP addresses, but it does not support internal
       and external IPs.

       Here we take the previous example, add a command line switch to specify
       the path to a prefix map file, and have the internal and external
       functions return the label.

        import silk

        # for convenience, create lists of the types
        in_types = ['in', 'inweb', 'innull', 'inicmp']
        out_types = ['out', 'outweb', 'outnull', 'outicmp']

        # handler for the --int-ext-pmap command line switch
        def set_pmap(arg):
            global pmap
            pmap = silk.PrefixMap(arg)
            labels = pmap.values()
            width = max(len(x) for x in labels)
            register_enum_field("internal-label", internal, width, labels)
            register_enum_field("external-label", external, width, labels)

        def internal(rec):
            "Returns the label for the internal side of the connection"
            global pmap
            if rec.typename in out_types:
                return pmap[rec.sip]
            else:
                return pmap[rec.dip]

        def external(rec):
            "Returns the label for the external side of the connection"
            global pmap
            if rec.typename in in_types:
                return pmap[rec.sip]
            else:
                return pmap[rec.dip]

        register_switch("int-ext-pmap", handler=set_pmap,
                        help="Prefix map file for internal-label, external-label")

       Assuming the above is saved in the file int-ext-pmap.py, the following
       will group the flows by the internal and external labels contained in
       the file ip-map.pmap.

        $ rwfilter --start-date=2008/08/08 --type=all              \
               --proto=17 --aport=1434 --pass=stdout               \
          | rwuniq --python-file int-ext-pmap.py                   \
               --int-ext-pmap ip-map.pmap                          \
               --fields internal-label,external-label

   Create minimum/maximum integer value field with simple API
       The following example will create new aggregate fields to print the
       minimum and maximum byte values:

        register_int_min_aggregator("min-bytes", lambda rec: rec.bytes,
                                    (1 << 32) - 1)
        register_int_max_aggregator("max-bytes", lambda rec: rec.bytes,
                                    (1 << 32) - 1)

       The lambda expression allows one to create an anonymous function.  In
       this code, we need to return the number of bytes for the given record,
       and we can easily do that with the anonymous function.  Since the SiLK
       bytes field is 32 bits, the maximum 32-bit number is passed to the
       registration functions.

       Assuming the code is stored in a file bytes.py, it can be used with
       rwuniq to see the minimum and maximum byte counts for each source IP
       address:

        $ rwuniq --python-file=bytes.py --fields=sip               \
               --values=records,bytes,min-bytes,max-bytes

   Create IP key for rwcut with advanced API
       This example is similar to the simple IP example above, but it uses the
       advanced API.  It also creates another field to indicate the direction
       of the flow, and it does not print the IPs when the traffic does not
       cross the border.  Note that this code has to determine the column
       width itself.

        import silk, os

        # for convenience, create lists of the types
        in_types = ['in', 'inweb', 'innull', 'inicmp']
        out_types = ['out', 'outweb', 'outnull', 'outicmp']
        internal_only = ['int2int']
        external_only = ['ext2ext']

        # determine the width of the IP field depending on whether SiLK
        # was compiled with IPv6 support, and allow the IP_WIDTH environment
        # variable to override that width.
        ip_len = 15
        if silk.ipv6_enabled():
            ip_len = 39
        ip_len = int(os.getenv("IP_WIDTH", ip_len))

        def cut_internal(rec):
            "Returns the IP Address of the internal side of the connection"
            if rec.typename in in_types:
                return rec.dip
            if rec.typename in out_types:
                return rec.sip
            if rec.typename in internal_only:
                return "both"
            if rec.typename in external_only:
                return "neither"
            return "unknown"

        def cut_external(rec):
            "Returns the IP Address of the external side of the connection"
            if rec.typename in in_types:
                return rec.sip
            if rec.typename in out_types:
                return rec.dip
            if rec.typename in internal_only:
                return "neither"
            if rec.typename in external_only:
                return "both"
            return "unknown"

        def internal_external_direction(rec):
            """Generates a string pointing from the sip to the dip, assuming
            internal is on the left, and external is on the right."""
            if rec.typename in in_types:
                return "<---"
            if rec.typename in out_types:
                return "--->"
            if rec.typename in internal_only:
                return "-><-"
            if rec.typename in external_only:
                return "<-->"
            return "????"

        register_field("internal-ip", column_width = ip_len,
                       rec_to_text = cut_internal)
        register_field("external-ip", column_width = ip_len,
                       rec_to_text = cut_external)
        register_field("int_to_ext", column_width = 4,
                       rec_to_text = internal_external_direction)

       The cut_internal() and cut_external() functions may return an IPAddr
       object instead of a string.  For those cases, the Python str() function
       is invoked automatically to convert the IPAddr to a string.

       If the above code is saved in a file named direction.py, it can be used
       to show the internal and external IP addresses and flow direction for
       all traffic on 1434/udp from Aug 8, 2008.

        $ rwfilter --start-date=2008/08/08 --type=all              \
               --proto=17 --aport=1434 --pass=stdout               \
          | rwcut --python-file direction.py                       \
               --fields internal-ip,int_to_ext,external-ip,3-12

   Create integer key field for rwsort with the advanced API
       The following example Python plug-in creates one new field,
       "lowest_port", for use in rwsort.  Using this field will sort records
       based on the lesser of the source port or destination port; for
       example, flows where either the source or destination port is 22 will
       occur before flows where either port is 25.  This example shows using
       the Python struct module with multiple record attributes.

        import struct

        portpair = struct.Struct("!HH")

        def lowest_port(rec):
            if rec.sport < rec.dport:
                return portpair.pack(rec.sport, rec.dport)
            else:
                return portpair.pack(rec.dport, rec.sport)

        register_field("lowest_port", bin_bytes = portpair.size,
                       rec_to_bin = lowest_port)

       To use this example to sort the records in flowrec.rw, one saves the
       code to the file sort.py and uses it as shown:

        $ rwsort --python-file=sort.py --fields=lowest_port        \
               flowrec.rw > outfile.rw
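
       To see why the packed pair sorts on the lesser port first, the key
       function can be tried on stand-in records.  Flow is a hypothetical
       substitute for silk.RWRec, the port values are invented, and sorted()
       only imitates the byte-wise comparison that rwsort applies to the key.

```python
import struct
from collections import namedtuple

portpair = struct.Struct("!HH")
# Hypothetical stand-in for silk.RWRec; only the ports matter here.
Flow = namedtuple("Flow", "sport dport")

def lowest_port(rec):
    # The lesser port goes first, so it dominates the comparison.
    if rec.sport < rec.dport:
        return portpair.pack(rec.sport, rec.dport)
    return portpair.pack(rec.dport, rec.sport)

flows = [Flow(40000, 25), Flow(22, 51000), Flow(25, 1024), Flow(60000, 22)]

# Byte sequences compare from the most significant byte onward.
ordered = sorted(flows, key=lowest_port)
```

       Both flows involving port 22 come before both flows involving port
       25, and ties on the lesser port are broken by the greater port.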

   Create integer key for rwstats and rwuniq with advanced API
       The following example defines two key fields for use by rwstats or
       rwuniq: "prefixed-sip" and "prefixed-dip".  Using these fields, the
       user can count flow records based on the source and/or destination IPv4
       address blocks (CIDR blocks).  The default CIDR prefix is 16, but it
       can be changed by specifying the --prefix switch that the example
       creates.  This example uses the Python struct module to convert between
       the IP address and a binary string.

        import os, struct
        from silk import *

        default_prefix = 16

        u32 = struct.Struct("!L")

        def set_mask(prefix):
            global mask
            mask = 0xFFFFFFFF
            # the value we are handed is a string
            prefix = int(prefix)
            if 0 < prefix < 32:
                mask = mask ^ (mask >> prefix)

        # Convert from an IPv4Addr to a byte sequence
        def cidr_to_bin(ip):
            if ip.is_ipv6():
                raise ValueError("Does not support IPv6")
            return u32.pack(int(ip) & mask)

        # Convert from a byte sequence to an IPv4Addr
        def cidr_bin_to_text(string):
            (num,) = u32.unpack(string)
            return IPv4Addr(num)

        register_field("prefixed-sip", column_width = 15,
                       rec_to_bin = lambda rec: cidr_to_bin(rec.sip),
                       bin_to_text = cidr_bin_to_text,
                       bin_bytes = u32.size)

        register_field("prefixed-dip", column_width = 15,
                       rec_to_bin = lambda rec: cidr_to_bin(rec.dip),
                       bin_to_text = cidr_bin_to_text,
                       bin_bytes = u32.size)

        register_switch("prefix", handler=set_mask,
                        help="Set prefix for prefixed-sip/prefixed-dip fields")

        set_mask(default_prefix)

       The lambda expression allows one to create an anonymous function.  In
       this code, the lambda function is used to pass the appropriate IP
       address into the cidr_to_bin() function.  To write the code without the
       lambda would require separate functions for the source and destination
       IP addresses:

        def sip_cidr_to_bin(rec):
            return cidr_to_bin(rec.sip)

        def dip_cidr_to_bin(rec):
            return cidr_to_bin(rec.dip)

       The lambda expression helps to simplify the code.

       If the code is saved in the file mask.py, it can be used as follows to
       count the number of flow records seen in the /8 of each source IP
       address.  The flow records are read from flowrec.rw.  The
       --ipv6-policy=ignore switch is used to restrict processing to IPv4
       addresses.

        $ rwuniq --ipv6-policy=ignore --python-file mask.py        \
               --prefix 8 --fields prefixed-sip flowrec.rw
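
       To see what the mask arithmetic in set_mask() produces, the following
       standalone sketch repeats the same computation without PySiLK.  It
       uses the standard ipaddress module in place of IPv4Addr, and the
       helper names prefix_mask and block_of are hypothetical, chosen only
       for this illustration.

```python
import ipaddress
import struct

u32 = struct.Struct("!L")

def prefix_mask(prefix):
    """Return the 32-bit mask for a CIDR prefix, computed as in set_mask()."""
    mask = 0xFFFFFFFF
    if 0 < prefix < 32:
        # clear the low (32 - prefix) bits
        mask = mask ^ (mask >> prefix)
    return mask

def block_of(address, prefix):
    """Mask an IPv4 address, round-trip it through the packed key, return text."""
    packed = u32.pack(int(ipaddress.IPv4Address(address)) & prefix_mask(prefix))
    (num,) = u32.unpack(packed)
    return str(ipaddress.IPv4Address(num))

print(block_of("10.11.12.13", 8))    # 10.0.0.0
print(block_of("10.11.12.13", 16))   # 10.11.0.0
```

       As in the example above, a prefix of 0 or 32 leaves the mask at
       0xFFFFFFFF, so the full address is used as the key.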

   Create new average bytes value field for rwstats and rwuniq
       The following example creates a new aggregate value that can be used by
       rwstats and rwuniq.  The value is "avg-bytes", a value that calculates
       the average number of bytes seen across all flows that match the key.
       It does this by maintaining running totals of the byte count and number
       of flows.

        import struct

        fmt = struct.Struct("QQ")
        initial = fmt.pack(0, 0)
        textsize = 15
        textformat = "%%%d.2f" % textsize

        # add byte and flow count from 'rec' to 'current'
        def avg_bytes(rec, current):
            (total, count) = fmt.unpack(current)
            return fmt.pack(total + rec.bytes, count + 1)

        # return printable representation
        def avg_to_text(bin):
            (total, count) = fmt.unpack(bin)
            return textformat % (float(total) / count)

        # merge two encoded values.
        def avg_merge(rec1, rec2):
            (total1, count1) = fmt.unpack(rec1)
            (total2, count2) = fmt.unpack(rec2)
            return fmt.pack(total1 + total2, count1 + count2)

        # compare two encoded values
        def avg_compare(rec1, rec2):
            (total1, count1) = fmt.unpack(rec1)
            (total2, count2) = fmt.unpack(rec2)
            avg1 = float(total1) / count1
            avg2 = float(total2) / count2
            # cmp() does not exist in Python 3; compute -1, 0, or 1 directly
            return (avg1 > avg2) - (avg1 < avg2)

        register_field("avg-bytes",
                       column_width    = textsize,
                       bin_bytes       = fmt.size,
                       add_rec_to_bin  = avg_bytes,
                       bin_to_text     = avg_to_text,
                       bin_merge       = avg_merge,
                       bin_compare     = avg_compare,
                       initial_value   = initial)

       To use this code, save it as avg-bytes.py, specify the name of the
       Python file in the --python-file switch, and list the field in the
       --values switch:

        $ rwuniq --python-file=avg-bytes.py --fields=sip           \
               --values=avg-bytes infile.rw

       This particular example will compute the average number of bytes per
       flow for each distinct source IP address in the file infile.rw.
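
       The reason for storing a (total, count) pair rather than a running
       average is that two partial results can then be merged exactly.  The
       sketch below exercises the same callbacks outside of SiLK.  FakeRec
       and accumulate are hypothetical stand-ins, and the order in which
       rwuniq actually invokes the callbacks is an assumption made for
       illustration.

```python
import struct
from collections import namedtuple

# Hypothetical stand-in for a SiLK Flow record; only 'bytes' is needed here
FakeRec = namedtuple("FakeRec", "bytes")

fmt = struct.Struct("QQ")
initial = fmt.pack(0, 0)

def avg_bytes(rec, current):
    (total, count) = fmt.unpack(current)
    return fmt.pack(total + rec.bytes, count + 1)

def avg_merge(bin1, bin2):
    (total1, count1) = fmt.unpack(bin1)
    (total2, count2) = fmt.unpack(bin2)
    return fmt.pack(total1 + total2, count1 + count2)

def avg_to_text(bin):
    (total, count) = fmt.unpack(bin)
    return "%15.2f" % (float(total) / count)

def accumulate(byte_counts):
    """Fold a list of byte counts into one encoded bin."""
    current = initial
    for n in byte_counts:
        current = avg_bytes(FakeRec(n), current)
    return current

# Two partial bins for the same key, merged into one
merged = avg_merge(accumulate([100, 200]), accumulate([600]))
print(avg_to_text(merged).strip())   # 300.00
```

       Averaging the two partial averages (150 and 600) would give 375,
       whereas merging the totals and counts gives the correct 300.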

   Create integer key field for all tools that use fields
       The following example Python plug-in file defines two fields,
       "sport-service" and "dport-service".  These fields convert the source
       port and destination port to the name of the "service" as defined in
       the file /etc/services; for example, port 80 is converted to "http".
       This plug-in can be used by any of rwcut, rwgroup, rwsort, rwstats, or
       rwuniq.

        import os,socket,struct

        u16 = struct.Struct("!H")

        # utility function to convert number to a service name,
        # or to a string if no service is defined
        def num_to_service(num):
            try:
                serv = socket.getservbyport(num)
            except socket.error:
                serv = "%d" % num
            return serv

        # convert the encoded port to a service name
        def bin_to_service(bin):
            (port,) = u16.unpack(bin)
            return num_to_service(port)

        # width of service columns can be specified with the
        # SERVICE_WIDTH environment variable; default is 12
        col_width = int(os.getenv("SERVICE_WIDTH", 12))

        register_field("sport-service", bin_bytes = u16.size,
                       column_width = col_width,
                       rec_to_text = lambda rec: num_to_service(rec.sport),
                       rec_to_bin = lambda rec: u16.pack(rec.sport),
                       bin_to_text = bin_to_service)

        register_field("dport-service", bin_bytes = u16.size,
                       column_width = col_width,
                       rec_to_text = lambda rec: num_to_service(rec.dport),
                       rec_to_bin = lambda rec: u16.pack(rec.dport),
                       bin_to_text = bin_to_service)

       If this file is named service.py, it can be used by rwcut to print the
       source port and its service:

        $ rwcut --python-file service.py                           \
               --fields sport,sport-service flowrec.rw

       Although the plug-in can be used with rwsort, the records will be
       sorted in the same order as the numerical source port or destination
       port, not alphabetically by service name.

        $ rwsort --python-file service.py                          \
               --fields sport-service flowrec.rw > outfile.rw

       When used with rwuniq, it can count flows, bytes, and packets indexed
       by the service of the destination port:

        $ rwuniq --python-file service.py --fields dport-service   \
               --values=flows,bytes,packets flowrec.rw
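
       The remark about rwsort follows from how the key is encoded.  The
       sketch below assumes the tool orders records by comparing the packed
       key bytes; because "!H" packs the port in network (big-endian) byte
       order, that comparison matches numeric order.  The port-to-name table
       is hypothetical sample data used in place of /etc/services.

```python
import struct

u16 = struct.Struct("!H")

# Hypothetical sample of port -> service name
names = {22: "ssh", 80: "http", 443: "https", 8080: "http-alt"}
ports = [443, 22, 8080, 80]

# Assumed behavior: keys are ordered by their packed bytes
by_packed_key = sorted(ports, key=u16.pack)
# What an alphabetical sort on the displayed text would give instead
by_service_name = sorted(ports, key=names.get)

print(by_packed_key)     # [22, 80, 443, 8080]
print(by_service_name)   # [80, 8080, 443, 22]
```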

   Create human-readable fields for all tools that use fields
       The following example adds two fields, "hu-bytes" and "hu-packets",
       which can be used as either key fields or aggregate value fields.  The
       example uses the formatting capabilities of netsa-python
       (<http://tools.netsa.cert.org/netsa-python/index.html>) to present the
       bytes and packets fields in a more human-friendly manner.

       When used as a key, the "hu-bytes" field presents the value 1234567 as
       1205.6Ki or as 1234.6k when the HUMAN_USE_BINARY environment variable
       is set to "False".

       When used as a key, the "hu-packets" field adds a comma (or the
       character specified by the HUMAN_THOUSANDS_SEP environment variable) to
       the display of the packets field.  The value 1234567 becomes
       1,234,567.

       The "hu-bytes" and "hu-packets" fields can also be used as aggregate
       value fields, in which case they compute the sum of the bytes and
       packets, respectively, and display it as for the key field.

       The code for the plug-in is shown here, and an example of using the
       plug-in follows the code.

        import silk, silk.plugin
        import os, struct
        from netsa.data.format import num_prefix, num_fixed

        # Whether to use Base-2 (True) or Base-10 (False) values for
        # Kibi/Mebi/Gibi/Tebi/... vs Kilo/Mega/Giga/Tera/...
        use_binary = True
        if (os.getenv("HUMAN_USE_BINARY")):
            if (os.getenv("HUMAN_USE_BINARY").lower() == "false"
                or os.getenv("HUMAN_USE_BINARY") == "0"):
                use_binary = False
            else:
                use_binary = True

        # Character to use for Thousands separator
        thousands_sep = ','
        if (os.getenv("HUMAN_THOUSANDS_SEP")):
            thousands_sep = os.getenv("HUMAN_THOUSANDS_SEP")

        # Number of significant digits
        sig_fig=5

        # Use a 64-bit number for packing the bytes or packets data
        fmt = struct.Struct("Q")
        initial = fmt.pack(0)

        ### Bytes functions
        # add_rec_to_bin
        def hu_ar2b_bytes(rec, current):
            global fmt
            (cur,) = fmt.unpack(current)
            return fmt.pack(cur + rec.bytes)

        # rec_to_binary
        def hu_r2b_bytes(rec):
            global fmt
            return fmt.pack(rec.bytes)

        # bin_to_text
        def hu_b2t_bytes(current):
            global use_binary, sig_fig, fmt
            (cur,) = fmt.unpack(current)
            return num_prefix(cur, use_binary=use_binary, sig_fig=sig_fig)

        # rec_to_text
        def hu_r2t_bytes(rec):
            global use_binary, sig_fig
            return num_prefix(rec.bytes, use_binary=use_binary, sig_fig=sig_fig)

        ### Packets functions
        # add_rec_to_bin
        def hu_ar2b_packets(rec, current):
            global fmt
            (cur,) = fmt.unpack(current)
            return fmt.pack(cur + rec.packets)

        # rec_to_binary
        def hu_r2b_packets(rec):
            global fmt
            return fmt.pack(rec.packets)

        # bin_to_text
        def hu_b2t_packets(current):
            global thousands_sep, fmt
            (cur,) = fmt.unpack(current)
            return num_fixed(cur, dec_fig=0, thousands_sep=thousands_sep)

        # rec_to_text
        def hu_r2t_packets(rec):
            global thousands_sep
            return num_fixed(rec.packets, dec_fig=0, thousands_sep=thousands_sep)

        ### Non-specific functions
        # bin_compare
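        def hu_bin_compare(current1, current2):
            global fmt
            # compare the decoded integers; the packed bytes use native
            # byte order, so comparing them directly can misorder values
            (cur1,) = fmt.unpack(current1)
            (cur2,) = fmt.unpack(current2)
            if (cur1 < cur2):
                return -1
            return int(cur1 > cur2)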

        # bin_merge
        def hu_bin_merge(current1, current2):
            global fmt
            (cur1,) = fmt.unpack(current1)
            (cur2,) = fmt.unpack(current2)
            return fmt.pack(cur1 + cur2)

        ### Register the fields
        register_field("hu-bytes", column_width=10, bin_bytes=fmt.size,
                       rec_to_text=hu_r2t_bytes, rec_to_bin=hu_r2b_bytes,
                       bin_to_text=hu_b2t_bytes, add_rec_to_bin=hu_ar2b_bytes,
                       bin_merge=hu_bin_merge, bin_compare=hu_bin_compare,
                       initial_value=initial)

        register_field("hu-packets", column_width=10, bin_bytes=fmt.size,
                       rec_to_text=hu_r2t_packets, rec_to_bin=hu_r2b_packets,
                       bin_to_text=hu_b2t_packets, add_rec_to_bin=hu_ar2b_packets,
                       bin_merge=hu_bin_merge, bin_compare=hu_bin_compare,
                       initial_value=initial)

       This shows an example of the plug-in's invocation and output when the
       code above is stored in the file human.py.

        $ rwstats --count=5 --no-percent --python-file=human.py    \
               --fields=proto,hu-bytes,hu-packets                  \
               --values=records,hu-bytes,hu-packets data.rw
        INPUT: 501876 Records for 305417 Bins and 501876 Total Records
        OUTPUT: Top 5 Bins by Records
        pro|  hu-bytes|hu-packets|   Records|  hu-bytes|hu-packets|
         17|       328|         1|     15922|    4.98Mi|    15,922|
         17|      76.0|         1|     15482|    1.12Mi|    15,482|
          1|       840|        10|      5895|    4.72Mi|    58,950|
         17|      68.0|         1|      4249|     282Ki|     4,249|
         17|      67.0|         1|      4203|     275Ki|     4,203|
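
       The environment handling and the arithmetic behind the displayed
       values can be tried without netsa-python.  The sketch below is not a
       reimplementation of num_prefix or num_fixed; env_flag and
       group_thousands are hypothetical helpers that only mirror the
       plug-in's HUMAN_USE_BINARY test and the thousands separator.

```python
import os

def env_flag(name, default=True):
    """Unset or empty keeps the default; "false" (any case) or "0"
    disables; any other value enables."""
    value = os.getenv(name)
    if not value:
        return default
    return not (value.lower() == "false" or value == "0")

def group_thousands(number, sep=","):
    """Insert a separator between groups of three digits."""
    return format(number, ",d").replace(",", sep)

print(group_thousands(1234567))         # 1,234,567
print(group_thousands(1234567, "."))    # 1.234.567

# The two renderings of 1234567 quoted in the text
print("%.1fKi" % (1234567 / 1024.0))    # 1205.6Ki
print("%.1fk" % (1234567 / 1000.0))     # 1234.6k
```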

UPGRADING LEGACY PLUGINS
       Some functions were marked as deprecated in SiLK 2.0, and have been
       removed in SiLK 3.0.

       Prior to SiLK 2.0, the register_field() function was called
       register_plugin_field(), and it had the following signature:

       register_plugin_field(field_name, [bin_len=bin_bytes_value,]
       [bin_to_text=bin_to_text_func,] [text_len=column_width_value,]
       [rec_to_bin=rec_to_bin_func,] [rec_to_text=rec_to_text_func])

       To convert from register_plugin_field to register_field, change
       text_len to column_width, and change bin_len to bin_bytes.  (Even older
       code may use field_len; this should be changed to column_width as
       well.)
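
       When many legacy calls need converting, the renaming can be expressed
       as a small table.  The helper below, upgrade_field_kwargs, is
       hypothetical and not part of PySiLK; it only rewrites the keyword
       names listed above and leaves every other keyword untouched.

```python
# Old keyword -> new keyword, as listed in the conversion notes above
RENAMED_KEYWORDS = {
    "text_len": "column_width",
    "field_len": "column_width",
    "bin_len": "bin_bytes",
}

def upgrade_field_kwargs(old_kwargs):
    """Return a copy of old_kwargs with legacy keyword names renamed."""
    return dict((RENAMED_KEYWORDS.get(key, key), value)
                for key, value in old_kwargs.items())

legacy = {"bin_len": 4, "text_len": 15}
print(sorted(upgrade_field_kwargs(legacy).items()))
# [('bin_bytes', 4), ('column_width', 15)]
```

       Inside a plug-in file, the result could then be passed along as
       register_field(name, **upgrade_field_kwargs(legacy)).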

       The register_filter() function was introduced in SiLK 2.0.  In versions
       of SiLK prior to SiLK 3.0, when rwfilter was invoked with --python-file
       and the named Python file did not call register_filter(), rwfilter
       would search the Python input for functions named rwfilter() and
       finalize().  If it found the rwfilter() function, rwfilter would act as
       if the file contained:

        register_filter(rwfilter, finalize=finalize)

       To update your pre-SiLK 2.0 rwfilter plug-ins, simply add the above
       line to your Python file.

ENVIRONMENT
       PYTHONPATH
           This environment variable is used by Python to locate modules.
           When --python-file or --python-expr is specified, the application
           must load the Python files that comprise the PySiLK package, such
           as silk/__init__.py.  If this silk/ directory is located outside
           Python's normal search path (for example, in the SiLK installation
           tree), it may be necessary to set or modify the PYTHONPATH
           environment variable to include the parent directory of silk/ so
           that Python can find the PySiLK module.

       PYTHONVERBOSE
           If the SiLK Python extension or plug-in fails to load, setting this
           environment variable to a non-empty string may help you debug the
           issue.

       SILK_PYTHON_TRACEBACK
           When set, Python plug-ins will output traceback information
           regarding Python errors to the standard error.
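
       One way to check whether PYTHONPATH needs adjusting is to ask Python
       where it would load a package from.  The sketch below uses the
       standard importlib module; locate_package is a hypothetical helper,
       and it is only meaningful when run with the same Python interpreter
       that the SiLK tools use, which is an assumption to verify on your
       system.

```python
import importlib.util

def locate_package(name="silk"):
    """Return the file Python would load for 'name', or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin

location = locate_package("silk")
if location is None:
    print("silk not found; add the parent directory of silk/ to PYTHONPATH")
else:
    print("silk would be loaded from", location)
```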

SEE ALSO
       pysilk(3), rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1),
       rwuniq(1), pmapfilter(3), silk(7), python(1), <http://docs.python.org/>

SiLK 3.11.0.1                     2016-02-19                     silkpython(3)