DragonFly On-Line Manual Pages
GLARK(1) glark 1.8.0 GLARK(1)
NAME
glark - Search text files for complex regular expressions
SYNOPSIS
glark [options] expression file ...
DESCRIPTION
Similar to "grep", "glark" offers: Perl-compatible regular expressions,
color highlighting of matches, context around matches, complex
expressions ("and" and "or"), grep output emulation, and automatic
exclusion of non-text files. Its regular expressions should be familiar
to persons experienced in Perl, Python, or Ruby. File may also be a
list of files in the form of a path.
OPTIONS
Input
-0[nnn]
Use \nnn (octal) as the input record separator. If nnn is
omitted, use '\n\n' as the record separator, which treats
paragraphs as lines.
-d ACTION, --directories=ACTION
Directories are processed according to the given ACTION, which
by default is "read". If ACTION is "recurse", each file in the
directory is read and each subdirectory is recursed into
(equivalent to the "-r" option). If ACTION is "skip",
directories are not read, and no message is produced.
--binary-files=TYPE
Specify how to handle binary files, thus overriding the default
behavior, which is to denote the binary files that match the
expression, without displaying the match. TYPE may be one of:
"binary", the default; "without-match", which results in binary
files being skipped; and "text", which results in the binary
file being treated as text, the display of which may have bad
side effects with the terminal. Note that the default behavior
has changed; this previously was to skip binary files. The same
effect may be achieved by setting binary-files to
"without-match" in the ~/.glarkrc file.
--[with-]basename EXPR, --[with-]name EXPR
Search only files whose names match the given regular
expression. As in find(1), this works on the basename of the
file. This expression can be negated and modified with "!" and
"i", such as '!/io\.[hc]$/i'.
--[with-]fullname EXPR, --[with-]path EXPR
Search only files whose names, including path, match the given
regular expression. As in find(1), this works on the path of
the file. This expression can be negated and modified with "!"
and "i", such as '!/Dialog.*\.java$/i'.
--without-basename EXPR, --without-name EXPR
Do not search files with base names matching the given regular
expression.
--without-fullname EXPR, --without-path EXPR
Do not search files with full names matching the given regular
expression.
-M, --exclude-matching
Do not search files whose names match the given expression.
This can be useful for finding external references to a file,
or to a class (assuming that class names match file names).
-r, --recurse
Recurse through directories. Equivalent to --directories=read.
--split-as-path(=VALUE), --no-split-as-path
Sets whether, if a command line argument includes the path
separator (such as ":"), the argument should be split by the
path separator. This functionality is useful for using
environment variables as input, such as $PATH and $CLASSPATH,
which are automatically split and processed as a list of files
and directories. The default value of this option is "true".
"--no-split-as-path" is equivalent to "--split-as-path=false".
--size-limit=SIZE
If provided, files no larger than SIZE bytes will be searched.
This is useful when running the "--recurse" option on
directories that may contain large files.
Matching
-a NUM expr1 expr2
--and NUM expr1 expr2
--and=NUM expr1 expr2
( expr1 --and=NUM expr2 )
Match both of the two expressions, within NUM lines of each
other. See the EXPRESSIONS section for more information.
-b NUM[%], --before NUM[%]
Restrict the search to before the given location, which
represents either the number of the last line within the valid
range, or the percentage of lines to be searched.
--after NUM[%]
Restrict the search to after the given section, which
represents either the number of the first line within the valid
range, or the percentage of lines to be skipped.
-f FILE, --file=FILE
Use the lines in the given file as expressions. Each line
consists of a regular expression.
-i, --ignore-case
Match regular expressions without regard to case. The default
is case sensitive.
-m NUM, --match-limit NUM
Find only the first NUM matches in each file.
-o expr1 expr2
--or expr1 expr2
( expr1 --or expr2 )
Match either of the two expressions. See the EXPRESSIONS
section for more information.
-R, --range NUM[%],NUM[%]
Restrict the search to the given range of lines, as either line
numbers or a percentage of the length of the file.
-v, --invert-match
Show lines that do not match the expression.
-w, --word, --word-regexp
Put word boundaries around each pattern, thus matching only
where the full word(s) occur in the text. Thus, "glark -w Foo"
is the same as "glark '/\bFoo\b/'".
-x, --line-regexp
Select only where the entire line matches the pattern(s).
--xor expr1 expr2
( expr1 --xor expr2 )
Match either of the two expressions, but not both. See the
EXPRESSIONS section for more information.
Output
-A NUM, --after-context=NUM
Print NUM lines after a matched expression.
-B NUM, --before-context=NUM
Print NUM lines before a matched expression.
-C [NUM], -NUM, --context[=NUM]
Output NUM lines of context around a matched expression. The
default is no context. If no NUM is given for this option, the
number of lines of context is 2.
-c, --count
Instead of normal output, display only the number of matches in
each file.
-F, --file-color COLOR
Specify the highlight color for file names. See the
HIGHLIGHTING section for the values that can be used.
--no-filter
Display the entire file(s), presumably with matches
highlighted.
-g, --grep
Produce output like the grep default: file names, no line
numbers, and a single line of the match, which will be the
first line for matches that span multiple lines. If the EMACS
environment variable is set, this value is set to true. Thus,
running glark under Emacs results in the output format expected
by Emacs.
-h, --no-filename
Do not display the names of the files that matched.
-H, --with-filename
Display the names of the files that matched. This is the
default behavior.
-l, --files-with-matches
Print only the names of the file that matched the expression.
-L, --files-without-match
Print only the names of the file that did not match the
expression.
--label=NAME
Use NAME as output file name. This is useful when reading from
standard input.
-n, --line-number
Display the line numbers. This is the default behavior.
-N, --no-line-number
Do not display the line numbers.
--line-number-color
Specify the highlight color for line numbers. This defaults to
none (no highlighting). See the HIGHLIGHTING section for more
information.
-T, --text-color COLOR
Specify the highlight color for text. See the HIGHLIGHTING
section for more information.
--text-color-NUM COLOR
Specify the highlight color for the regular expression capture
NUM. Colors are used by regular expressions in the order they
are created (that is, with the "--and" and "--or" option), or
with captures within a regular expression (such as
'/(this)|(that)/'). is See the HIGHLIGHTING section for more
information.
-u, --highlight=[FORMAT]
Enable highlighting. This is the default behavior. Format is
"single" (one color) or "multi" (different color per regular
expression). See the HIGHLIGHTING section for more information.
-U, --no-highlight
Disable highlighting.
-y, --extract-matches
Display only the region that matched, not the entire line. If
the expression contains "backreferences" (i.e., expressions
bounded by "( ... )"), then only the portion captured will be
displayed, not the entire line. This option is useful with
"-g", which eliminates the default highlighting and display of
file names.
-Z, --null
When in -l mode, write file names followed by the ASCII NUL
character ('\0') instead of '\n'.
Debugging/Errors
-?, --help
Display the help message.
--config
Display the settings glark is using, and exit. Since this is
run after configuration files are read, this may be useful for
determining values of configuration parameters.
--explain
Write the expression in a more legible format, useful for
debugging.
-q, -s, --quiet, --no-messages
Suppress warnings.
-Q, --no-quiet
Enable warnings. This is the default.
-V, --version
Display version information.
--verbose
Display normally suppressed output, for debugging purposes.
EXPRESSIONS
Regular Expressions
Regular expressions are expected to be in the Perl/Ruby format.
"perldoc perlre" has more general information. The expression may be of
either form:
something
/something/
There is no difference between the two forms, except that with the
latter, one can provide the "ignore case" modifier, thus matching
"someThing" and "SoMeThInG":
% glark /something/i
Note that this is redundant with the "-i" ("--ignore-case") option.
All regular expression characters and options are available, such as
"\w", ".*?" and "[^9]". For example:
% glark '\b[a-z][^\d]\d{1,3}.*\s*>>\s*\d+\s*.*& +\d{3}'
If the and and or options are not used, the last non-option is
considered to be the expression to be matched. In the following,
"printf" is used as the expression.
% glark -w printf *.c
POSIX character classes (e.g., [[:alpha:]]) are also supported.
Complex Expressions
Complex expressions combine regular expressions (and complex
expressions themselves) with logical AND, OR, and XOR operators. Both
prefix and infix notation is supported.
-o expr1 expr2
--or expr1 expr2 --end-of-or
( expr1 --or expr2 )
Match either of the two expressions. The results of the two
forms are equivalent. In the latter syntax, the --end-of-or is
optional.
-a number expr1 expr2
--and=number expr1 expr2 --end-of-and
( expr1 --and number expr2 )
Match both of the two expressions, within <number> lines of
each other. As with the "or" option, the results of the two
forms are equivalent, and the "--end-of-and" is optional. The
forms "-aNUM" and "--and=NUM" are also supported.
If the number provided is -1 (negative one), the distance is
considered to be "infinite", and thus, the condition is
satisfied if both expressions match within the same file.
If the number provided is 0 (zero), the condition is satisfied
if both expressions match on the same line.
If the --and option is used, and the follow argument is not
numeric, then the value defaults to zero.
A warning will be issued if the value given in the number
position does not appear to be numeric.
--xor expr1 expr2 --end-of-xor
( expr1 --xor expr2 )
Match either of the two expressions, but not both.
"--end-of-xor" is optional.
Negated Regular Expressions
Regular expressions can be negated, by being prefixed with '!', and
using the '/' quote characters around the expression, such as:
!/expr/
This has the effect of "match anything other than this". For a single
expression, this is no different than the -v/--invert-match option, but
it can be useful in complex expressions, such as:
--and 0 this '!/that/'
which means "match and line that has "this", but not "that".
HIGHLIGHTING
Matching patterns and file names can be highlighted using ANSI escape
sequences. Both the foreground and the background colors may be
specified, from the following:
black
blue
cyan
green
magenta
red
white
yellow
The foreground may have any number of the following modifiers applied:
blink
bold
concealed
reverse
underline
underscore
The format is "MODIFIERS FOREGROUND on BACKGROUND". For example:
red
black on yellow (the default for patterns)
reverse bold (the default for file names)
green on white
bold underline red on cyan
By default text is highlighted as black on yellow. File names are
written in reversed bold text.
EXAMPLES
Basic Usage
% glark format *.h
Searches for "format" in the local .h files.
% glark --ignore-case format *.h
Searches for "format" without regard to case. Short form:
% glark -i format *.h
% glark --context=6 format *.h
Produces 6 lines of context around any match for "format".
Short forms:
% glark -C 6 format *.h
% glark -6 format *.h
% glark --exclude-matching Object *.java
Find references to "Object", excluding the files whose names
match "Object". Thus, SessionBean.java would be searched;
EJBObject.java would not. Short form:
% glark -M Object *.java
% glark --grep --extract-matches '\w+\.printStackTrace\(.*\)'
*.java
Show where exceptions are dumped. Note that the "--grep" option
is used, thus turning off highlighting and display of file
names. If the "--no-filename" option is used, the output will
consist of only the matching portions. The short form of this
command is:
% glark -gy '\w+\.printStackTrace\(.*\)' *.java
% glark --grep --extract-matches '(\w+)\.printStackTrace\(.*\)'
*.java
Show only the variable name of exceptions that are dumped.
Short form:
% glark -gy '(\w+)\.printStackTrace\(.*\)' *.java
% who | glark -gy '^(\S+)\s+\S+\s*May 15'
Display only the names of users who logged in today.
% glark -l '\b\w{25,}\b' *.txt
Display (only) the names of the text files that contain "words"
at least 25 characters long.
% glark --files-without-match '"\w+"'
Display (only) the names of the files that do not contain
strings consisting of a single word. Short form:
% glark -L '"\w+"'
% for i in *.jar; do jar tvf $i | glark --LABEL=$i Exception; done
Search the files for 'Exception', displaying the jar file name
instead of the standard input marker ('-').
Highlighting
% glark --text-color "red on white" '\b[[:digit:]]{5}\b' *.c
Display (in red text on a white background) occurrences of
exactly 5 digits. Short form:
% glark -T "red on white" '\b\d{5}\b' *.c
See the HIGHLIGHTING section for valid colors and modifiers.
Complex Expressions
% glark --or format print *.h
Searches for either "printf" or "format". Short form:
% glark -o format print *.h
% glark --and 4 printf format *.c *.h
Searches for both "printf" or "format" within 4 lines of each
other. Short form:
% glark -a 4 printf format *.c *.h
% glark --context=3 --and 0 printf format *.c
Searches for both "printf" or "format" on the same line
("within 0 lines of each other"). Three lines of context are
displayed around any matches. Short form:
% glark -3 -a 0 printf format *.c
% glark -8 -i -a 15 -a 2 pthx '\.\.\.' -o 'va_\w+t' die *.c
(In order of the options:) Produces 8 lines of context around
case insensitive matches of ("phtx" within 2 lines of '...'
(literal)) within 15 lines of (either "va_\w+t" or "die").
% glark --and -1 '#define\s+YIELD' '#define\s+dTHR' *.h
Looks for "#define\s+YIELD" within the same file (-1 ==
"infinite distance") of "#define\s+dTHR". Short form:
% glark -a -1 '#define\s+YIELD' '#define\s+dTHR' *.h
Range Limiting
% glark --before 50% cout *.cpp
Find references to "cout", within the first half of the file.
Short form:
% glark -b 50% cout *.cpp
% glark --after 20 cout *.cpp
Find references to "cout", starting at the 20th line in the
file. Short form:
% glark -b 50% cout *.cpp
% glark --range 20 50% cout *.cpp
Find references to "cout", in the first half of the file, after
the 20th line. Short form:
% glark -R 20 50% cout *.cpp
File Processing
% glark -r print .
Search for "print" in all files at and below the current
directory.
% glark --fullname='/\.java$/' -r println org
Search for "println" in all Java files at and below the "org"
directory.
% glark --basename='!/CVS/' -r '\b\d\d:\d\d:\d\d\b' .
Search for a time pattern in all files without "CVS" in their
basenames.
% glark --size-limit=1024 -r main -r .
Search for "main" in files no larger than 1024 bytes.
ENVIRONMENT
GLARKOPTS
A string of whitespace-delimited options. Due to parsing
constraints, should probably not contain complex regular
expressions.
$HOME/.glarkrc
A resource file, containing name/value pairs, separated by either
':' or '='. The valid fields of a .glarkrc file are as follows,
with example values:
after-context: 1
before-context: 6
context: 5
file-color: blue on yellow
highlight: off
ignore-case: false
quiet: yes
text-color: bold reverse
line-number-color: bold
verbose: false
grep: true
"yes" and "on" are synonymnous with "true". "no" and "off" signify
"false".
My ~/.glarkrc file is the following:
file-color: bold reverse
text-color: bold black on yellow
context: 2
highlight: on
verbose: false
ignore-case: false
quiet: yes
word: false
binary-files: without-match
local .glarkrc
See the local-config-files field below:
Fields
after-context
See the "--after-context" option. For example, for 3 lines of
context after the match:
after-context: 3
basename
See the "--basename" option. For example, to omit Subversion
directories:
basename: !/\.svn/
before-context
See the "--before-context" option. For example, for 7 lines of
context before the match:
before-context: 7
binary-files
See the "--binary-files" option. For example, to skip binary files:
binary-files: without-match
context
See the "--context" option. For example, for 2 lines before and
after matches:
context: 2
expression
See the EXPRESSION section. Example:
expression: --or '^\s*public\s+class\s+\w+' '^\s*\w+\(
file-color
See the "--file-color" option. For example, for white on black:
file-color: white on black
filter
See the "--filter" option. For example, to show the entire file:
filter: false
fullname
See the "--fullname" and "--basename" options. For example, to omit
CVS files:
fullname: !/\bCVS\b/
grep
See the "--grep" option. For example, to always run in grep mode:
grep: true
highlight
See the "--highlight" option. To turn off highlighting:
highlight: false
ignore-case
See the "--ignore-case" option. To make matching case-insensitive:
ignore-case: true
known-nontext-files
The extensions of files that should be considered to always be
nontext (binary). If a file extension is not known, the file
contents are examined for nontext characters. Thus, setting this
field can result in faster searches. Example:
known-nontext-files: class exe dll com
See the Exclusion of Non-Text Files section in NOTES for the
default settings.
known-text-files
The extensions of files that should be considered to always be
text. See above for more. Example:
known-text-files: ini bat xsl xml
See the Exclusion of Non-Text Files section in NOTES for the
default settings.
local-config-files
By default, glark uses only the configuration file ~/.glarkrc.
Enabling this makes glark search upward from the current directory
for the first .glarkrc file.
This can be used, for example, in a Java project, where .class
files are binary, versus a PHP project, where .class files are
text:
/home/me/.glarkrc
local-config-files: true
/home/me/projects/java/.glarkrc
known-nontext-files: class
/home/me/projects/php/.glarkrc
known-text-files: class
With this configuration, .class files will automatically be treated
as binary file in Java projects, and .class files will be treated
as text. This can speed up searches.
Note that the configuration file ~/.glarkrc is read first, so the
local configuration file can override those settings.
quiet
See the "--quiet" option.
show-break
Whether to display breaks between sections, when displaying
context. Example:
show-break: true
By default, this is false.
text-color
See the "--text-color" option. Example:
text-color: bold blue on white
verbose
See the "--verbose" option. Example:
verbose: true
verbosity
See the "--verbosity" option. Example:
verbosity: 4
NOTES
Exclusion of Non-Text Files
Non-text files are automatically skipped, by taking a sample of the
file and checking for an excessive number of non-ASCII characters. For
speed purposes, this test is skipped for files whose suffixes are
associated with text files:
c
cpp
css
h
f
for
fpp
hpp
html
java
mk
php
pl
pm
rb
rbw
txt
Similarly, this test is also skipped for files whose suffixes are
associated with non-text (binary) files:
Z
a
bz2
elc
gif
gz
jar
jpeg
jpg
o
obj
pdf
png
ps
tar
zip
See the "known-text-files" and "known-nontext-files" fields for
denoting file name suffixes to associate as text or nontext.
Exit Status
The exit status is 0 if matches were found; 1 if no matches were found,
and 2 if there was an error. An inverted match (the -v/--invert-match
option) will result in 1 for matches found, 0 for none found.
SEE ALSO
For regular expressions, the "perlre" man page.
Mastering Regular Expressions, by Jeffrey Friedl, published by
O'Reilly.
CAVEATS
"Unbalanced" leading and trailing slashes will result in those slashes
being included as characters in the regular expression. Thus, the
following pairs are equivalent:
/foo "/foo"
/foo\/ "/foo/"
/foo\/i "/foo/i"
foo/ "foo/"
foo/ "foo/"
The code to detect nontext files assumes ASCII, not Unicode.
AUTHOR
Jeff Pace <jpace at incava dot org>
COPYRIGHT
Copyright (c) 2006, Jeff Pace.
All Rights Reserved. This module is free software. It may be used,
redistributed and/or modified under the terms of the Lesser GNU Public
License. See http://www.gnu.org/licenses/lgpl.html for more
information.
glark 1.8.0 2006-12-30 GLARK(1)