NAME
  dirwatch v0.9.0

SYNOPSIS
  dirwatch [ options ]+ [ directory = ./ ] [ mode = watch ]

DESCRIPTION
  dirwatch is a tool used to rapidly build event driven processing systems.

  dirwatch manages an sqlite database that mirrors the state of a directory and
  then triggers user definable event handlers for certain filesystem activities
  such as file creation, modification, deletion, etc.  dirwatch can also
  implement a tmpwatch like behaviour to ensure files of a certain age are
  removed from the directory being watched.

  dirwatch normally runs as a daemon process by first synchronizing the
  database inventory with that of the directory and then firing appropriate
  triggers as they occur.

  -----------------------------------------------------------------------------
  the following actions may have triggers configured for them
  -----------------------------------------------------------------------------

    created   -> a file was created
    modified  -> a file has had its mtime updated
    updated   -> the union of created and modified
    deleted   -> a file was deleted
    existing  -> a file has not changed but still exists

  -----------------------------------------------------------------------------
  the command line 'mode' must be one of the following
  -----------------------------------------------------------------------------

    create   (c) -> initialize the database and supporting files
    watch    (w) -> monitor directory and trigger actions in the foreground
    start    (S) -> spawn a daemon watcher in the background
    restart  (R) -> (re)spawn a daemon watcher in the background
    stop     (H) -> stop/halt any currently running watcher
    status   (T) -> determine if any watcher is currently running
    truncate (D) -> truncate/delete all entries from the database
    archive  (a) -> create a hot-backup of a watch's database contents
    list     (l) -> dump database to stdout in silky smooth yaml format

  the default mode is 'watch'.
  for all modes the command line argument must be the name of the directory to
  which to apply the operation - this defaults to the current directory.

  -----------------------------------------------------------------------------
  mode: create (c)
  -----------------------------------------------------------------------------

  initializes a storage directory with all required database files, logs,
  command directories, sample configuration, sample programs, etc.

  examples:

    0) initialize the directory incoming_data/ to be dirwatched using all
       defaults

        ~ > dirwatch create incoming_data/

  -----------------------------------------------------------------------------
  mode: start (S)
  -----------------------------------------------------------------------------

  dirwatch is normally run in daemon mode.  the start mode is equivalent to
  running in 'watch' mode with the '--daemon' and '--quiet' flags.

  examples:

    0) start a background daemon process watching incoming_data/

        ~ > dirwatch start incoming_data/

  -----------------------------------------------------------------------------
  mode: restart (R)
  -----------------------------------------------------------------------------

  'restart' mode checks a watcher's pidfile and either restarts the currently
  running watcher or starts a new one as in 'start' mode.  this is equivalent
  to sending SIGHUP to the watcher daemon process.

  examples:

    0) re-start a background daemon process watching incoming_data/

        ~ > dirwatch restart incoming_data/

  -----------------------------------------------------------------------------
  mode: stop (H)
  -----------------------------------------------------------------------------

  'stop' mode checks for any process watching the specified directory and kills
  this process if it exists.  this is equivalent to sending SIGTERM to the
  watcher daemon process.  the process will not exit immediately but will do so
  at the first safe opportunity.  do __not__ kill -9 the daemon process.

  examples:

    0) stop the daemon process watching incoming_data/

        ~ > dirwatch stop incoming_data/

  -----------------------------------------------------------------------------
  mode: status (T)
  -----------------------------------------------------------------------------

  'status' mode reports whether or not a watcher is running for the given
  directory.

  examples:

    0) report on the watcher, if any, watching incoming_data/

        ~ > dirwatch status incoming_data/

  -----------------------------------------------------------------------------
  mode: truncate (D)
  -----------------------------------------------------------------------------

  'truncate' mode empties the database of all state in an atomic fashion.

  examples:

    0) empty the database in a safe way

        ~ > dirwatch truncate incoming_data/

  -----------------------------------------------------------------------------
  mode: archive (a)
  -----------------------------------------------------------------------------

  'archive' mode atomically creates a hot-backup tgz file of the storage
  directory for a given directory while respecting the locking subsystem.

  examples:

    0) make a hot-backup of the database and all supporting files in
       incoming_data/

        ~ > dirwatch archive incoming_data/

  -----------------------------------------------------------------------------
  mode: watch (w)
  -----------------------------------------------------------------------------

  this is the meat of dirwatch.

  dirwatch is designed to run as a daemon, updating a database inventory at the
  interval specified by the '--interval' option (5 minutes by default) and
  firing appropriate trigger commands.  two watchers may not watch the same dir
  simultaneously, and attempting to start a second watcher will fail when the
  second watcher is unable to obtain a lockfile.  it is a non-fatal error to
  attempt to start another watcher when one is running and this failure can be
  made silent by using the '--quiet' option.
  the reason for this is to allow a crontab entry to be used to make the daemon
  'immortal'.  for example, the following crontab entry

    */15 * * * * dirwatch directory --daemon

  will __attempt__ to start a daemon watching 'directory' every fifteen
  minutes.  if the daemon is not already running one will be started, otherwise
  dirwatch will simply fail silently (so no cron email is sent due to stderr
  output).

  this feature allows a normal user to set up daemon processes that will not
  only run after machine reboot, but which will continue to run after other
  unforeseen terminal program behaviour.  such a daemon is known as an
  'immortal' daemon.

  as the watcher runs and maintains the database inventory, it notes when
  files/directories (entries) have been created, modified, updated, deleted, or
  are existing.  these entries are then handled by user definable triggers as
  specified in the config file.  the config file is of the format

    ...
    actions :
      created :
        commands :
          ...
      updated :
        commands :
          ...
      ...
    ...

  where the commands to be run for each trigger type are enumerated.  each
  command entry is of the following format:

    ...
    - command : the command to run
      type    : calling convention, how info is passed to the program
      pattern : filter files by this regex
      timing  : synchronous or asynchronous execution
    ...

  further explanation of each field:

    command:
      this is the program to run.  the search path for the program is modified
      to first include the commands/ dir underneath the .dirwatch/ dir in the
      directory being watched.

    type:
      there are four types of commands.  the type merely indicates the calling
      convention of the program.  when commands are run there are two pieces of
      information which are passed to the program: the file in question and the
      mtime of that file.  the mtime is less important, but programs may use it
      to know if the file has been changed since they were last spawned or for
      other bookkeeping.  mtime will probably be ignored for most commands.
      the four types of commands fall into two categories: those called once
      for each file and those called once with __all__ files

        file at a time:

          simple:
            the command will be called with two arguments: the file in question
            and the mtime datetime, eg:

              command foobar.txt '2002-11-04 01:01:01.1234'

          expanded:
            the command will have the strings '@file' and '@mtime' replaced
            with appropriate values.  eg:

              command '@file' '@mtime'

            expands to (and is called as)

              command 'somefile' '2002-11-04 01:01:01.1234'

        files at once:

          filter:
            the stdin of the program will be given a list where each line
            contains two items, the file and the datetime.

          yaml:
            the stdin of the program will be given a list where each entry
            contains two items, the file and the mtime.  the format of the list
            is valid yaml and the schema is an array of hashes where each hash
            has the keys 'path' and 'mtime'.

    pattern:
      all the files for a given action are filtered by this pattern, and only
      those files matching pattern will have triggers fired.

    timing:
      if timing is asynchronous the command will be run and not waited for
      before starting the next command.  asynchronous commands may yield better
      performance but may also result in many commands being run at once.
      asynchronous commands should not be programs that load the system heavily
      unless one is looking to freeze a machine.

      synchronous commands are spawned and waited for before the next command
      is started.  a side effect of synchronous commands is that the time spent
      waiting may sum to an amount of time greater than the interval
      ('--interval' option) specified - if the amount of time spent running
      commands exceeds the interval the next inventory simply begins
      immediately with no pause.  because of this one should think of the
      interval as a minimum bound only, especially when synchronous commands
      are used.

  note that sample commands of each type are auto-generated in the
  dbdir/commands directory.
  reading these should answer any questions regarding the calling conventions
  of any of the four types.  for other questions see the sample config, which
  is also auto-generated.

  examples:

    0) run a watch from this terminal (non daemon)

        ~ > dirwatch directory watch

  -----------------------------------------------------------------------------
  mode: list (l)
  -----------------------------------------------------------------------------

  dump the contents of the database in yaml format for easy viewing/parsing

  examples:

    0) dump database as yaml

        ~ > dirwatch directory list

ENVIRONMENT
  for dirwatch itself:

    export SLDB_DEBUG=1     -> cause sldb lib actions (sql) to be logged
    export LOCKFILE_DEBUG=1 -> cause lockfile lib actions to be logged

  for programs run by dirwatch the following environment variables will be set:

    DIRWATCH_DIR -> the directory being watched

    DIRWATCH_ACTION -> action type, one of 'instance', 'created', 'modified',
      'updated', 'deleted', or 'existing'

    DIRWATCH_TYPE -> command type, one of 'simple', 'expanded', 'filter', or
      'yaml'

    DIRWATCH_N_PATHS -> the total number of paths for this action.  the paths
      themselves will be passed to the program in a different way depending on
      DIRWATCH_TYPE, for instance on the command line or on stdin, but this
      number will always be the total number of paths the program should
      expect.

    DIRWATCH_PATH_IDX -> for some command types, like 'simple', the program
      will be run more than once to handle all paths since the calling
      convention only allows the program to be called with one path at a time.
      this number is the index of the current path in such cases.  for
      instance, a 'simple' program may only be called with one path at a time,
      so if 10 files were created in the directory that would result in the
      program being called 10 times.  in each case DIRWATCH_N_PATHS would be 10
      and DIRWATCH_PATH_IDX would range from 0 to 9 for each of the 10 calls to
      the program.
      in the case of 'filter' and 'yaml' command types, where every path is
      given at once on stdin, this value will be equal to DIRWATCH_N_PATHS

    DIRWATCH_PATH -> for 'simple' and 'expanded' command types, which are
      called once for each path, this will contain the path the program is
      being called with.  in the case of 'filter' or 'yaml' command types the
      variable contains the string 'stdin', implying that all paths are
      available on stdin.

    DIRWATCH_MTIME -> for 'simple' and 'expanded' command types, which are
      called once for each path, this will contain the mtime the program is
      being called with.  in the case of 'filter' or 'yaml' command types the
      variable contains the string 'stdin', implying that all mtimes are
      available on stdin.

    DIRWATCH_PID -> the pid of the dirwatch watcher process

    DIRWATCH_ID -> an identifier for this action that will be unique for any
      given run of a dirwatch watcher process.  restarting the watcher resets
      the generator.  this identifier is logged in the dirwatch watcher logs,
      so it is useful for matching program logs with dirwatch logs

    PATH -> the normal shell path.  for each program run the PATH is modified
      to contain the commands dir of the dirwatch watcher process.  normally
      this will be $DIRWATCH_DIR/.dirwatch/commands/:$PATH

  note that all the sample programs generated show how to access these
  environment vars.

FILES
  directory/.dirwatch/              -> dirwatch data files
  directory/.dirwatch/dirwatch.conf -> default configuration file
  directory/.dirwatch/commands/     -> default location for triggers
  directory/.dirwatch/db            -> sldb/sqlite database
  directory/.dirwatch/dirwatch.pid  -> default pidfile
  directory/.dirwatch/logs/         -> automatically rolled log files

DIAGNOSTICS
  success -> $? == 0
  failure -> $? != 0

AUTHOR
  ara.t.howard@noaa.gov

BUGS
  1 < bugno && bugno < 42

OPTIONS
  --help, -h
      this message
  --log=path, -l
      set log file - (default stderr)
  --verbosity=verbosity, -v
      0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
  --config=path
      valid path - specify config file (default nil)
  --template=[path]
      valid path - generate a template config file in path (default stdout)
  --recursive, -r
      recurse into subdirectories (default do not recurse)
  --all, -a
      consider all filesystem entries, including directories (default files
      only)
  --follow, -f
      follow links (default does not follow links)
  --pattern=pattern, -p
      consider only filesystem entries that match pattern (default all entries)
  --daemon, -D
      specify daemon mode (default not daemon)
  --quiet, -Q
      be wery wery quiet (default not quiet)
  --dirwatch_dir=dirwatch_dir, -S
      specify dirwatch storage dir (default .dirwatch/ in dir being watched)
  --n_loops=n_loops, -N
      loop only this many times before exiting (default infinite)
  --interval=seconds, -I
      sleep at least this long between loops (default 300sec (5min))
  --lockfile, -L
      create a lockfile in dir while running (default no lockfile)
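
EXAMPLE HANDLER
  the 'simple' calling convention and the environment variables described
  under ENVIRONMENT can be sketched as a small handler program.  this is an
  illustrative sketch only, not one of the auto-generated sample programs: the
  function names and the output format are hypothetical, and the mtime format
  is assumed from the '2002-11-04 01:01:01.1234' example given for the 'simple'
  type (an mtime without fractional seconds would need a different format
  string).

```python
#!/usr/bin/env python3
"""Hypothetical 'simple' type dirwatch handler (illustration only)."""
import os
import sys
from datetime import datetime


def parse_mtime(text):
    # assumption: mtimes look like '2002-11-04 01:01:01.1234'
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def describe(path, mtime_text, env):
    """Build a one line summary from the two arguments and the DIRWATCH_* vars."""
    mtime = parse_mtime(mtime_text)
    action = env.get("DIRWATCH_ACTION", "unknown")
    idx = int(env.get("DIRWATCH_PATH_IDX", "0"))    # 0-based index of this path
    total = int(env.get("DIRWATCH_N_PATHS", "1"))   # total paths for this action
    return "%s %s (mtime %s) [%d of %d]" % (
        action, path, mtime.isoformat(), idx + 1, total)


# a 'simple' command is called as: command file mtime
# anything else is ignored here to keep the sketch short
if __name__ == "__main__" and len(sys.argv) == 3:
    print(describe(sys.argv[1], sys.argv[2], os.environ))
```

  a program like this could be placed in the commands/ dir and named in a
  command entry with type 'simple', in which case it would be called once for
  each matching file.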