============
Using HPSSPy
============

Introduction
++++++++++++

The primary *command-line* interface to HPSSPy is the script
:command:`missing_from_hpss`, which is automatically generated by the
package install process. If you need to generate this script manually,
it is equivalent to::

    #!/usr/bin/env python
    from sys import exit
    from hpsspy.scan import main
    exit(main())

Options
+++++++

There are several command-line options. ``missing_from_hpss --help`` will
display all of them. Only the short forms of the options are shown here.

-c DIR     Cache files (described below) are written to ``$HOME/cache`` by
           default. This option allows the user to choose any directory.
-D         Delete and recreate the disk cache file (described below).
-E         Exit if an error is detected while processing files on disk
           or on HPSS.
-H         Delete and recreate the HPSS cache file (described below).
-l N       Limit archive files to this size in GB. The default is
           1024 GB (1 TB).
-p         Issue the HPSS commands necessary to actually back up the files
           that need to be backed up.
-r N       Issue a progress report on how many files have been analyzed
           after ``N`` files (default 10,000).
-t         Test mode. Try not to make any changes. Also pretend that
           there are no files backed up to HPSS.
-v         Print *lots* of extra information.
--version  Print a version string and exit.

Besides the options described above, :command:`missing_from_hpss` requires
two positional arguments::

    missing_from_hpss config.json section

The two arguments are the path to a configuration file and the section of
that file to process. These are extensively described in the
:doc:`configuration document `.

Cache Files
+++++++++++

:command:`missing_from_hpss` uses a few cache files, primarily to reduce its
memory footprint. These files are stored in ``$HOME/cache`` by default.
The files are:

Disk Cache
    A CSV file of the form ``disk_cache_<section>.csv``, where ``<section>``
    is the section (as defined above) specified on the command-line.
    The columns are file name, file size in bytes and modification time.

HPSS Cache
    A CSV file of the form ``hpss_cache_<section>.csv``, where ``<section>``
    is the section (as defined above) specified on the command-line.
    The columns are file name, file size in bytes and modification time.

Missing File Cache
    A JSON file of the form ``missing_files_<section>.json``, where
    ``<section>`` is the section (as defined above) specified on the
    command-line. It contains a map of HPSS archive files to the files that
    belong in each archive. The size of each resulting archive file
    (modulo small overheads from the archive file creation process) is also
    saved to this file.

These files are *not* cleaned up by default because they are very useful
for debugging purposes.

Testing and Quality Assurance
+++++++++++++++++++++++++++++

To test a configuration file, run :command:`missing_from_hpss` with the
``--test`` (``-t``) option described above. Aside from creating cache files
in the cache directory described above, this mode will not alter any data,
either on disk or on HPSS.

In addition to validating JSON files and regular expressions, as described
in the :doc:`configuration document `, :command:`missing_from_hpss` will:

1. Make sure all regular expressions are actually used.
2. Make sure all files actually match *one and only one* regular expression.
3. Create a manifest file containing the actual files on disk matched and
   the archive file they map to. This is one and the same as the
   "Missing File Cache" described above.
4. Make sure that all archive file sizes are less than a user-defined limit
   (default 1 TB), configurable on the command-line.

HPSSPy Library
++++++++++++++

For programmatic access to HPSS, the :doc:`HPSSPy library ` provides
equivalents of :mod:`os` and :mod:`os.path` that operate on the HPSS
filesystem.
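
As an illustration of the first two checks listed under Testing and Quality
Assurance, the following is a minimal sketch that uses only the standard
library. It is not HPSSPy's own implementation: the function name
``check_patterns``, the shape of the ``patterns`` mapping and the use of
:func:`re.match` (rather than :func:`re.search`) are assumptions made for the
example, and the patterns and file names are invented.

.. code-block:: python

    import re
    from collections import defaultdict


    def check_patterns(patterns, files):
        """Hypothetical helper: report problems with a set of patterns.

        patterns: dict mapping a regular expression string to an archive name.
        files: iterable of file paths to classify.
        Returns (unused_patterns, unmatched_files, ambiguous_files).
        """
        files = list(files)
        compiled = {p: re.compile(p) for p in patterns}
        hits = defaultdict(list)  # file path -> patterns that matched it
        used = set()              # patterns that matched at least one file
        for path in files:
            for pattern, regex in compiled.items():
                if regex.match(path):
                    hits[path].append(pattern)
                    used.add(pattern)
        unused = [p for p in patterns if p not in used]
        unmatched = [f for f in files if len(hits[f]) == 0]
        ambiguous = [f for f in files if len(hits[f]) > 1]
        return unused, unmatched, ambiguous


    # Invented example data, chosen to trigger each kind of problem.
    example_patterns = {
        r"data/raw/.*$": "data/raw.tar",
        r"data/.*\.fits$": "data/fits.tar",
        r"logs/.*$": "logs.tar",
    }
    example_files = ["data/raw/a.fits", "data/b.fits", "README.txt"]

    unused, unmatched, ambiguous = check_patterns(example_patterns, example_files)
    print("unused patterns:", unused)
    print("files matching no pattern:", unmatched)
    print("files matching more than one pattern:", ambiguous)

A configuration passes both checks only when all three returned lists are
empty. Running a sketch like this against the file names in the disk cache
can help narrow down which pattern is responsible before rerunning
:command:`missing_from_hpss` in test mode.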