Per project management and workflow
OpenGrok can be run with or without projects. A project is simply a directory directly underneath the OpenGrok source root directory. A project can have zero or more Source Code Management repositories underneath. In a setup without projects, all of the data has to be indexed at once. With projects, however, each project has its own index, so it is possible to index projects in parallel, which speeds up the overall process.
When working with project data, there are 2 types of processing that can take a long time:
- synchronization: updating project data so that it matches its origin
  - usually involves running commands like git pull in all the repositories for the given project.
- indexing: updating the index so that it matches the project data
For some projects either or both steps can take a long time. Say you have a repository whose origin resides on an NFS share across the Atlantic, so it has high latency, and it uses a legacy VCS that operates on individual files rather than changesets. Such a repository takes a long time (tens of minutes, if not hours) to synchronize. Or there is a repository with a large number of files, so the initial phase of indexing always takes a long time (due to scanning the whole project directory tree for changed files) even though the incremental changes are small.
Or maybe there are lots and lots of projects that exhibit some of these characteristics.
Previously, it was necessary to index all of the source root in order to discover new projects and add them to the configuration. Starting with OpenGrok 1.1, it is possible to manage and index projects separately.
As a result, indexing the complete source root is only necessary when upgrading across OpenGrok versions with incompatible Lucene indexes.
Combine these procedures with the parallel processing tools (see repository synchronization) and you have per-project management with parallel processing.
The following examples assume that the OpenGrok install base is under the /opengrok directory.
It is possible to start from scratch or from an OpenGrok instance that already indexes all projects in one go.
There are some design choices that need to be dealt with.
The indexer either has to discover projects and their repositories during the indexing preparation, or it has to know them in advance.
The following assumes that the opengrok-projadm, opengrok-groups and opengrok-config-merge tools are in PATH. You can install these from the opengrok-tools Python package available in the release tarball.
Using the opengrok-projadm tool (which utilizes the opengrok-config-merge tool and the RESTful API) it is possible to manage the projects.
The next sections start by suggesting that you back up the current configuration. This could be done e.g. by copying the configuration.xml file (written by the indexer when using the -W indexer option) aside, by taking a file-system snapshot of the directory the configuration is stored in, etc.
The backup is a precaution in case something goes wrong.
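The "copy the file aside" variant of the backup can be sketched as follows. This is a minimal illustration, not part of the OpenGrok tooling: the helper name backup_config and the timestamped naming scheme are assumptions, and the demonstration runs against a temporary directory rather than a real configuration.xml.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_config(config_path, backup_dir):
    """Copy the configuration file to backup_dir under a timestamped name.

    Hypothetical helper for illustration; OpenGrok does not ship it.
    """
    config_path = Path(config_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{config_path.name}.{stamp}"
    # copy2 also preserves the modification time and permissions where possible.
    shutil.copy2(config_path, target)
    return target


if __name__ == "__main__":
    # Demonstration against a throwaway directory, not a real instance.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "configuration.xml"
        cfg.write_text("<configuration/>\n")
        print(backup_config(cfg, Path(tmp) / "backup"))
```

On a real instance the first argument would be the path of the configuration.xml written by the indexer, wherever that lives in your setup.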
The indexing part of the wiki explains how to run the indexer.
TODO
- back up the current configuration
- add the project data to a directory under the source root directory
  - this usually involves running a VCS command such as git clone, extracting source code from an archive, etc.
- perform any necessary authorization adjustments
- add the project to the configuration (this also refreshes the configuration on disk):
  opengrok-projadm -b /opengrok -a PROJECT
- change any per-project settings (see Web services)
- index the project
  - it is recommended to use the opengrok-reindex-project script (it downloads fresh configuration from the webapp)
- save the configuration (this is necessary so that the indexed flag of the project is persistent). The -R indexer option can be used to supply the path to a read-only configuration so that it is merged with the current configuration.
  opengrok-projadm -b /opengrok -r
- now it is possible to sync and reindex the project in a cycle (see repository synchronization)
- back up the current configuration
- delete the project from the configuration (this deletes the project's index data and refreshes the on-disk configuration). The -R indexer option can be used to supply the path to a read-only configuration so that it is merged with the current configuration.
  opengrok-projadm -b /opengrok -d PROJECT
- perform any necessary authorization, group and per-project adjustments in the read-only configuration (if any)
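The opengrok-projadm invocations used in the add and delete procedures above can be assembled programmatically, for example when scripting project onboarding. The sketch below only builds the argument lists using the -b, -a, -d and -r options shown in this document; the function names and the dry-run default are assumptions made for illustration.

```python
import subprocess

BASE_DIR = "/opengrok"  # install base assumed by the examples in this document


def projadm_command(action, project=None, base_dir=BASE_DIR):
    """Build the opengrok-projadm argument list for add, delete or save."""
    flags = {"add": "-a", "delete": "-d", "save": "-r"}
    if action not in flags:
        raise ValueError(f"unknown action: {action}")
    cmd = ["opengrok-projadm", "-b", base_dir, flags[action]]
    if action == "save":
        if project is not None:
            raise ValueError("save does not take a project name")
    else:
        if not project:
            raise ValueError(f"{action} requires a project name")
        cmd.append(project)
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: the commands are printed, nothing is executed.
    run(projadm_command("add", "myproject"))
    run(projadm_command("save"))
    run(projadm_command("delete", "myproject"))
```

Keeping the command construction separate from execution makes it easy to review what would be run before touching a live configuration.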
The opengrok-sync script provides a way to run a sequence of commands for a set of projects in parallel.
The script accepts its configuration in either JSON or YAML.
Use it e.g. like this:
$ opengrok-sync -c /scripts/sync.conf -d /ws-local/
where the contents of the sync.conf file might look like this:
commands:
- command:
  - http://localhost:8080/source/api/v1/messages
  - POST
  - cssClass: info
    duration: PT1H
    tags: ['%PROJECT%']
    text: resync + reindex in progress
- command: [sudo, -u, wsmirror, /opengrok/dist/bin/opengrok-mirror, -c, /opengrok/etc/mirror-config.yml, -U, 'http://localhost:8080/source']
- command: [sudo, -u, webservd, /opengrok/dist/bin/opengrok-reindex-project, -J=-d64,
    '-J=-XX:-UseGCOverheadLimit', -J=-Xmx16g, -J=-server, --jar, /opengrok/dist/lib/opengrok.jar,
    -t, /opengrok/etc/logging.properties.template, -p, '%PROJ%', -d, /opengrok/log/%PROJECT%,
    -P, '%PROJECT%', -U, 'http://localhost:8080/source', --, --renamedHistory, 'on', -r, dirbased, -G, -m, '256', -c,
    /usr/local/bin/ctags, -U, 'http://localhost:8080/source', -o, /opengrok/etc/ctags.config,
    -H, '%PROJECT%']
  env: {LC_ALL: en_US.UTF-8}
  limits: {RLIMIT_NOFILE: 1024}
- command: ['http://localhost:8080/source/api/v1/messages?tag=%PROJECT%', DELETE, '']
- command: [/scripts/check-indexer-logs.ksh]
cleanup:
- command: ['http://localhost:8080/source/api/v1/messages?tag=%PROJECT%', DELETE, '']
Note: the -U 'http://localhost:8080/source' option appearing twice in the opengrok-reindex-project command above is not a typo. It must be specified twice: once for the Python script and once for the indexer.
The above opengrok-sync command will take all directories under /ws-local and for each of them run the sequence of commands specified in the sync.conf file. This is done in parallel, at the project level. The level of parallelism can be specified using the --workers option (by default it will use as many workers as there are CPUs in the system).
Another way to specify the list of projects to be synchronized is the --indexed option of opengrok-sync, which queries the webapp configuration for the list of indexed projects and uses that list. Alternatively, the --projects option can be used to process just the specified projects.
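The directory-driven, per-project parallelism described above can be illustrated with a small sketch. It is not the opengrok-sync implementation: the function names are hypothetical, a thread pool stands in for whatever worker mechanism the real script uses, and the handler is any callable that takes a project name.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def list_projects(source_root):
    """Return the names of directories directly under source_root, sorted."""
    return sorted(
        entry.name for entry in os.scandir(source_root) if entry.is_dir()
    )


def process_all(source_root, handler, workers=None):
    """Apply handler to every project in parallel; return {project: result}."""
    projects = list_projects(source_root)
    # Default to one worker per CPU, mirroring the behaviour described above.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(handler, projects)
        return dict(zip(projects, results))


if __name__ == "__main__":
    # Demonstration with a throwaway source root containing two projects.
    with tempfile.TemporaryDirectory() as root:
        for name in ("alpha", "beta"):
            os.mkdir(os.path.join(root, name))
        print(process_all(root, lambda project: f"synced {project}"))
```

Plain files directly under the source root are skipped, since only directories count as projects.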
The commands above will basically:
- mark the project with an alert (to let the users know it is being synchronized/indexed) using the RESTful API call (the %PROJECT% string is replaced with the current project name)
- pull the changes from all the upstream repositories that belong to the project using the opengrok-mirror command
- reindex the project using opengrok-reindex-project
- clear the alert using the second RESTful API call
- execute the /scripts/check-indexer-logs.ksh script to perform some pattern matching in the indexer logs to see if there were any serious failures there. The script can look e.g. like this:
#!/usr/bin/ksh
#
# Check OpenGrok indexer logs in the last 24 hours for any signs of serious
# trouble.
#

if (( $# != 1 )); then
    print -u2 "usage: $0 <project_name>"
    exit 1
fi

project_name=$1
typeset -r log_dir="/opengrok/log/$project_name/"
if [[ ! -d $log_dir ]]; then
    print -u2 "cannot open log directory $log_dir"
    exit 1
fi

# Check the last log file.
if grep SEVERE "$log_dir/opengrok0.0.log"; then
    exit 1
fi
The opengrok-sync script prints any errors to the console and uses file-level locking to ensure that only one instance runs at a time, so it is handy to run periodically from crontab.
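File-level locking of this kind can be illustrated with the standard fcntl module on Unix-like systems. The sketch below is an assumption about the general technique, not a description of how opengrok-sync implements it; the helper name and the lock file location are made up for the example.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_run(lock_path):
    """Hold an exclusive, non-blocking lock on lock_path for the with-block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes a second instance fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock file location, chosen only for this demonstration.
    lock_file = os.path.join(tempfile.gettempdir(), "opengrok-sync-demo.lock")
    try:
        with exclusive_run(lock_file):
            print("lock acquired, running")
    except BlockingIOError:
        print("another instance is running, exiting")
```

With such a guard, a cron job that fires while the previous run is still in progress exits right away rather than piling up behind it.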
Each "command" can be either normal command execution (supplying the list of program arguments) or RESTful API call (supplying the HTTP verb and optional payload).
Note that the cleanup section is a set of commands. If any of them fails (i.e. returns a non-zero value), the process is not interrupted, unlike the main command sequence.
Note that if the web application is listening on a non-standard host or port (localhost and 8080 are the defaults), the URI has to be specified everywhere it matters. Given that opengrok-sync performs RESTful API queries itself, one has to specify the location using the -U option of this script and then again in the configuration file: for any RESTful API calls and for the opengrok-indexer command (which also uses the -U option).
If any of the commands in "commands" fails, the "cleanup" commands will be executed. This is handy in this case since the first RESTful API call marks the project with an alert in the web UI, so if any of the commands that follow fails, the cleanup call is made to clear the alert.
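The control flow described above (stop the main sequence at the first failure, then run every cleanup command regardless of its own result) can be sketched as follows. This is a simplified model and an assumption about the semantics, not the opengrok-sync code: each step is represented by a callable that takes the project name and returns an exit code.

```python
def run_sequence(commands, cleanup, project):
    """Run commands in order for one project.

    Each step is a callable taking the project name and returning an exit
    code. Returns True if every main command succeeded.
    """
    for step in commands:
        if step(project) != 0:
            # A main command failed: stop here and run every cleanup step.
            for cleanup_step in cleanup:
                # Cleanup failures are deliberately not treated as fatal.
                cleanup_step(project)
            return False
    return True


if __name__ == "__main__":
    def mark_alert(project):
        print(f"{project}: alert set")
        return 0

    def failing_mirror(project):
        print(f"{project}: mirror failed")
        return 1

    def clear_alert(project):
        print(f"{project}: alert cleared")
        return 0

    run_sequence([mark_alert, failing_mirror], [clear_alert], "myproject")
```

In the demonstration the mirroring step fails, so the alert set by the first step is cleared by the cleanup step and the remaining main steps are skipped.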
Normal command execution can also be performed in the cleanup section.
Some projects can be notorious for producing spurious errors, so their errors are ignored via the "ignore_errors" section.
In the above example it is assumed that opengrok-sync is run as root and that synchronization and reindexing are done under different users. This is done so that the web application cannot tamper with the source code even if compromised.
The project name is appended to the arguments of each command unless one of its arguments contains %PROJECT%, in which case that string is substituted with the project name and nothing is appended.
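The substitute-or-append rule can be expressed in a few lines. The sketch below is a hypothetical helper that assumes all arguments are plain strings (RESTful API payloads, which may be mappings, are left out); it also shows why a %PROJ% argument is left untouched.

```python
PROJECT_PATTERN = "%PROJECT%"


def expand_command(args, project):
    """Substitute %PROJECT% in args, or append the project name if absent."""
    if any(PROJECT_PATTERN in arg for arg in args):
        return [arg.replace(PROJECT_PATTERN, project) for arg in args]
    return list(args) + [project]


if __name__ == "__main__":
    # No pattern present: the project name is appended as the last argument.
    print(expand_command(["/scripts/check-indexer-logs.ksh"], "myproject"))
    # Pattern present: it is substituted and nothing is appended.
    print(expand_command(["-d", "/opengrok/log/%PROJECT%", "-p", "%PROJ%"], "myproject"))
```

This is why the check-indexer-logs.ksh entry in the example configuration needs no explicit argument: the project name arrives as its first positional parameter.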
For per-project reindexing to work properly, opengrok-reindex-project uses the logging.properties.template file to make sure each project has its own log directory. The file can look e.g. like this:
handlers= java.util.logging.FileHandler
.level= FINE
java.util.logging.FileHandler.pattern = /opengrok/log/%PROJ%/opengrok%g.%u.log
# Create one file per indexer run. This makes indexer log easy to check.
java.util.logging.FileHandler.limit = 0
java.util.logging.FileHandler.append = false
java.util.logging.FileHandler.count = 30
java.util.logging.FileHandler.formatter = org.opengrok.indexer.logger.formatter.SimpleFileLogFormatter
java.util.logging.ConsoleHandler.level = WARNING
java.util.logging.ConsoleHandler.formatter = org.opengrok.indexer.logger.formatter.SimpleFileLogFormatter
The %PROJ% pattern is passed to the script for substitution in the logging template. This pattern must differ from the %PROJECT% pattern, otherwise the opengrok-sync script would substitute it in the command arguments and the substitution in the template file would not happen.
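The template substitution itself amounts to a plain string replacement. The following sketch is illustrative only: the function name is hypothetical and it does not reproduce what opengrok-reindex-project does with the rendered file afterwards.

```python
def render_logging_config(template_text, pattern, project):
    """Return template_text with every occurrence of pattern replaced."""
    return template_text.replace(pattern, project)


if __name__ == "__main__":
    line = (
        "java.util.logging.FileHandler.pattern = "
        "/opengrok/log/%PROJ%/opengrok%g.%u.log"
    )
    # The %g and %u placeholders belong to java.util.logging and stay as-is.
    print(render_logging_config(line, "%PROJ%", "myproject"))
```

Only the project pattern is replaced, so the log file rotation placeholders used by the Java logging framework survive the substitution.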
You can find a logging.properties.template file in the final release tarball, under the doc directory.