HPSSPy API

hpsspy

Python interface to the HPSS system.

exception hpsspy.HpssError[source]

Generic exception class for HPSS Errors.

exception hpsspy.HpssOSError[source]

HPSS Errors that are similar to OSError.

hpsspy.os

Reproduces some features of the Python built-in os.

hpsspy.os.chmod(path, mode)[source]

Reproduces the behavior of os.chmod() for HPSS files.

Parameters:
  • path (str) – File to chmod.

  • mode (str or int) – Desired file permissions. This mode will be converted to a string.

Raises:

HpssOSError – If the underlying hsi reports an error.

hpsspy.os.listdir(path)[source]

List the contents of an HPSS directory, similar to os.listdir().

Parameters:

path (str) – Directory to examine.

Returns:

A list of HpssFile objects.

Return type:

list

Raises:

HpssOSError – If the underlying hsi reports an error.

hpsspy.os.lstat(path)[source]

Perform the equivalent of os.lstat() on the HPSS file path.

Parameters:

path (str) – Path to file or directory.

Returns:

An object that contains information similar to the data returned by os.stat().

Return type:

HpssFile

Raises:

HpssOSError – If the underlying hsi reports an error.

hpsspy.os.makedirs(path, mode=None)[source]

Reproduces the behavior of os.makedirs().

Parameters:
  • path (str) – Directory to create.

  • mode (str, optional) – String representation of the octal directory mode.

Raises:

HpssOSError – If the underlying hsi reports an error.

Notes

Unlike os.makedirs(), attempts to create existing directories raise no exception.

hpsspy.os.mkdir(path, mode=None)[source]

Reproduces the behavior of os.mkdir().

Parameters:
  • path (str) – Directory to create.

  • mode (str, optional) – String representation of the octal directory mode.

Raises:

HpssOSError – If the underlying hsi reports an error.

Notes

Unlike os.mkdir(), attempts to create existing directories raise no exception.
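
The notes on makedirs() and mkdir() describe a departure from the standard library that is easy to overlook. For comparison only, the sketch below uses Python's built-in os module on a local temporary directory (not HPSS, and not hpsspy itself). It shows that the stock functions raise on an existing directory, and that os.makedirs(..., exist_ok=True) is the closest built-in analogue of the behavior documented here. The helper name second_call_raises is hypothetical.

```python
import os
import tempfile


def second_call_raises(create):
    """Call create(path) twice on one new path; report whether the second call raised."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "data")
        create(target)
        try:
            create(target)
        except FileExistsError:
            return True
        return False


# Built-in behavior: creating an existing directory is an error.
stock_mkdir = second_call_raises(os.mkdir)
stock_makedirs = second_call_raises(os.makedirs)

# Closest built-in analogue of the "no exception" behavior described in the notes.
tolerant_makedirs = second_call_raises(lambda p: os.makedirs(p, exist_ok=True))

print(stock_mkdir, stock_makedirs, tolerant_makedirs)
```

Code ported from os.mkdir() that relies on FileExistsError to detect a pre-existing directory would therefore need an explicit check when moved to the HPSS versions.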

hpsspy.os.stat(path, follow_symlinks=True)[source]

Perform the equivalent of os.stat() on the HPSS file path.

Parameters:
  • path (str) – Path to file or directory.

  • follow_symlinks (bool, optional) – If False, makes stat() behave like os.lstat().

Returns:

An object that contains information similar to the data returned by os.stat().

Return type:

HpssFile

Raises:

HpssOSError – If the underlying hsi ls reports an error.
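
The follow_symlinks parameter mirrors the one in the standard library. As an illustration of the semantics being reproduced, the sketch below uses the built-in os.stat() and os.lstat() on a local temporary symlink on a POSIX system. It does not call hpsspy, and the HPSS versions return HpssFile objects rather than os.stat_result.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target.txt")
    link = os.path.join(root, "link.txt")
    with open(target, "w") as handle:
        handle.write("data")
    os.symlink(target, link)

    followed = os.stat(link)                              # describes the target file
    not_followed = os.stat(link, follow_symlinks=False)   # describes the link itself
    via_lstat = os.lstat(link)                            # also describes the link

followed_is_link = stat.S_ISLNK(followed.st_mode)
not_followed_is_link = stat.S_ISLNK(not_followed.st_mode)
same_as_lstat = not_followed.st_ino == via_lstat.st_ino
```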

hpsspy.os.walk(top, topdown=True, onerror=None, followlinks=False)[source]

Traverse a directory tree on HPSS, similar to os.walk().

Parameters:
  • top (str) – Starting directory.

  • topdown (bool, optional) – Direction to traverse the directory tree.

  • onerror (callable, optional) – Call this function if an error is detected.

  • followlinks (bool, optional) – If True, symlinks to directories are treated as directories.

Returns:

This function can be used in the same way as os.walk().

Return type:

iterable
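
Since walk() is meant to be used like os.walk(), the usual three-tuple loop should carry over. The sketch below demonstrates that calling pattern with the built-in os.walk() on a local temporary tree; the helper collect() is hypothetical and accepts any os.walk()-style callable. Whether the HPSS version yields plain strings or HpssFile objects is not stated in this section, so the sketch makes no assumption about that and does not call hpsspy.

```python
import os
import tempfile


def collect(walker, top):
    """Gather (directory, sorted file names) pairs from an os.walk()-style iterable."""
    found = []
    for dirpath, dirnames, filenames in walker(top):
        # With os.walk() in top-down mode, editing dirnames in place steers the walk.
        dirnames.sort()
        found.append((dirpath, sorted(filenames)))
    return found


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write("x")
    listing = [(os.path.relpath(d, root), f) for d, f in collect(os.walk, root)]

print(listing)
```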

hpsspy.os.path

Reproduces some features of the Python built-in os.path.

hpsspy.os.path.isdir(path)[source]

Reproduces the behavior of os.path.isdir() for HPSS files.

Parameters:

path (str) – Path to the file.

Returns:

True if path is a directory.

Return type:

bool

hpsspy.os.path.isfile(path)[source]

Reproduces the behavior of os.path.isfile() for HPSS files.

Parameters:

path (str) – Path to the file.

Returns:

True if path is a file.

Return type:

bool

hpsspy.os.path.islink(path)

Reproduces the behavior of os.path.islink() for HPSS files.

Parameters:

path (str) – Path to the file.

Returns:

True if path is a symlink.

Return type:

bool

hpsspy.scan

Functions for scanning directory trees to find files in need of backup.

hpsspy.scan._options()[source]

Parse command-line options.

Returns:

The parsed command-line arguments.

Return type:

argparse.Namespace

hpsspy.scan.compile_map(old_map, section)[source]

Compile the regular expressions in a map.

Parameters:
  • old_map (dict) – A dictionary containing regular expressions to compile.

  • section (str) – An initial key to determine the section of the dictionary of interest. Typically, this will be a top-level directory.

Returns:

A new dictionary containing compiled regular expressions.

Return type:

dict
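
To make the idea of compiling a map concrete, here is a minimal sketch. The layout of the dictionary (section, then a named group, then pattern-to-archive pairs), the function name compile_section, and the example paths are all hypothetical; the real map format used by compile_map() is not described in this section and may differ.

```python
import re


def compile_section(old_map, section):
    """Return a new dict for one section with every pattern string compiled.

    Assumed (hypothetical) layout: old_map[section][name] maps regex -> target.
    """
    return {
        name: [(re.compile(pattern), target) for pattern, target in rules.items()]
        for name, rules in old_map[section].items()
    }


example_map = {
    "release": {
        "spectra": {r"spectra/(\d+)/.*$": r"spectra/spectra_\1.tar"},
    }
}

compiled = compile_section(example_map, "release")
pattern, target = compiled["spectra"][0]

# A disk file that matches the pattern is mapped to an archive name.
match = pattern.match("spectra/1234/file.fits")
archive = match.expand(target)
print(archive)
```

Compiling once up front avoids recompiling the same expression for every file in a large scan.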

hpsspy.scan.extract_directory_name(filename)[source]

Extract a directory name from an HTAR filename that may contain various prefixes.

Parameters:

filename (str) – Name of HTAR file, including directory path.

Returns:

Name of a directory.

Return type:

str

hpsspy.scan.files_to_hpss(hpss_map_cache, section)[source]

Create a map of files on disk to HPSS files.

Parameters:
  • hpss_map_cache (str) – Data file containing the map.

  • section (str) – An initial key to determine the section of the dictionary of interest. Typically, this will be a top-level directory.

Returns:

A tuple containing the compiled mapping and an additional configuration dictionary.

Return type:

tuple()

hpsspy.scan.find_missing(hpss_map, hpss_files, disk_files_cache, missing_files, report=10000, limit=1024.0)[source]

Compare HPSS files to disk files.

Parameters:
  • hpss_map (dict) – A mapping of file names to HPSS files.

  • hpss_files (dict) – The list of actual HPSS files.

  • disk_files_cache (str) – Name of the disk cache file.

  • missing_files (str) – Name of the file that will contain the list of missing files.

  • report (int, optional) – Print an informational message when N files have been scanned.

  • limit (float, optional) – HPSS archive files should be smaller than this size (in GB).

Returns:

True if no serious problems were found.

Return type:

bool

hpsspy.scan.iterrsplit(s, c)[source]

Split string s on c and rejoin on c from the end of s.

Parameters:
  • s (str) – String to split.

  • c (str) – Split on this string.

Returns:

Iteratively return the joined parts of s.

Return type:

str
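
The one-line description of iterrsplit() is terse, so a sketch of one plausible reading may help: split the string, then yield progressively shorter rejoined prefixes, dropping one part from the end each time. The function name iter_prefixes and the exact order of the yielded values are assumptions, not taken from the hpsspy source.

```python
def iter_prefixes(s, c):
    """Yield s with progressively more trailing c-separated parts removed."""
    parts = s.split(c)
    for end in range(len(parts), 0, -1):
        yield c.join(parts[:end])


prefixes = list(iter_prefixes("a/b/c", "/"))
print(prefixes)  # ['a/b/c', 'a/b', 'a']
```

A generator of this shape is convenient for finding the longest leading portion of a path that matches some known directory.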

hpsspy.scan.main()[source]

Entry-point for command-line scripts.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

hpsspy.scan.physical_disks(release_root, config)[source]

Convert a root path into a list of physical disks containing data.

Parameters:
  • release_root (str) – The “official” path to the data.

  • config (dict) – A dictionary containing path information.

Returns:

A tuple containing the physical disk paths.

Return type:

tuple()

hpsspy.scan.process_missing(missing_cache, disk_root, hpss_root, dirmode='2770', test=False)[source]

Convert missing files into HPSS commands.

Parameters:
  • missing_cache (str) – Name of a JSON file containing the missing file data.

  • disk_root (str) – Missing files are relative to this root on disk.

  • hpss_root (str) – Missing files are relative to this root on HPSS.

  • dirmode (str, optional) – Create directories on HPSS with this mode (default drwxrws---).

  • test (bool, optional) – Test mode. Try not to make any changes.
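
The correspondence between the default dirmode string '2770' and the listing form drwxrws--- can be checked with the standard library. The sketch below only interprets the octal string; it does not call hpsspy or touch HPSS.

```python
import stat

dirmode = "2770"

# Interpret the string as octal: setgid bit (2), then rwx for owner and group.
mode_bits = int(dirmode, 8)

# Combine with the directory file-type flag to get the ls-style string.
listing = stat.filemode(stat.S_IFDIR | mode_bits)
print(listing)  # drwxrws---
```

The s in the group position is the setgid bit, which on POSIX filesystems makes new entries inherit the directory's group.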

hpsspy.scan.scan_disk(disk_roots, disk_files_cache, overwrite=False)[source]

Scan a directory tree on disk and cache the files found there.

Parameters:
  • disk_roots (list) – Names of the directories in which to start the scan.

  • disk_files_cache (str) – Name of a file to hold the cache.

  • overwrite (bool, optional) – If True, ignore any existing cache files.

Returns:

True if the cache is populated and ready to read.

Return type:

bool

hpsspy.scan.scan_hpss(hpss_root, hpss_files_cache, overwrite=False)[source]

Scan a directory on HPSS and return the files found there.

Parameters:
  • hpss_root (str) – Name of a directory in which to start the scan.

  • hpss_files_cache (str) – Name of a file to hold the cache.

  • overwrite (bool, optional) – If True, ignore any existing cache files.

Returns:

The set of files found on HPSS, with size and modification time.

Return type:

dict

hpsspy.scan.validate_configuration(config)[source]

Check the configuration file for validity.

Parameters:

config (str) – Name of the configuration file.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

hpsspy.util

Low-level utilities.

class hpsspy.util.HpssFile(*args)[source]

This class is used to store and access an HPSS file’s metadata.

Parameters:

args (iterable) – This object will normally be initialized by a tuple produced by hpsspy.os.listdir().

hpss_path

Path on the HPSS filesystem.

Type:

str

raw_type

Raw type string.

Type:

str

raw_permission

Raw permission string.

Type:

str

st_nlink

Number of hard links.

Type:

int

st_uid

Owner’s name.

Type:

str

st_gid

Group name.

Type:

str

st_size

File size in bytes.

Type:

int

raw_dow

Day-of-week of modification time.

Type:

str

raw_month

Month of modification time.

Type:

str

raw_day

Day of modification time.

Type:

int

raw_hms

H:M:S of modification time.

Type:

str

raw_year

Year of modification time.

Type:

int

raw_name

Name of file.

Type:

str

ishtar

True if the file is an htar file.

Type:

bool

htar_contents()[source]

Return (and cache) the contents of an htar file.

Returns:

List containing the contents.

Return type:

list

property isdir

True if the file is a directory or a symbolic link that points to a directory.

property islink

True if the file is a symbolic link.

property name

Name of the file.

property path

Full path to the file.

property readlink

Destination of symbolic link.

property st_mode

File permission mode.

property st_mtime

File modification time.
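
The raw_month, raw_day, raw_hms and raw_year attributes hold the pieces of the modification time, and st_mtime presumably combines them. As a rough illustration of how such pieces can be assembled, here is a sketch with a hypothetical helper, mtime_from_raw(). It assumes three-letter English month abbreviations and ignores time zones; the value and type actually returned by st_mtime are not specified in this section.

```python
from datetime import datetime

# Assumed month spelling, as in typical ls-style listings.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def mtime_from_raw(raw_month, raw_day, raw_hms, raw_year):
    """Combine separate date and time fields into a naive datetime."""
    hour, minute, second = (int(part) for part in raw_hms.split(":"))
    month = MONTHS.index(raw_month) + 1
    return datetime(raw_year, month, raw_day, hour, minute, second)


example = mtime_from_raw("Jan", 15, "12:34:56", 2020)
print(example.isoformat())  # 2020-01-15T12:34:56
```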

hpsspy.util.get_hpss_dir()[source]

Return the directory containing HPSS commands.

Returns:

Full path to the directory containing HPSS commands.

Return type:

str

Raises:

KeyError – If the HPSS_DIR environment variable has not been set.
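
Because a missing HPSS_DIR surfaces as a KeyError (here and in hsi() and htar()), a caller may prefer to check the environment up front. The helper below, hpss_configured(), is hypothetical and only inspects the variable; it does not reproduce how get_hpss_dir() builds its return value.

```python
import os


def hpss_configured(environ=None):
    """Return True if HPSS_DIR is present in the given (or process) environment."""
    env = os.environ if environ is None else environ
    return "HPSS_DIR" in env


# Passing a plain dict makes the check easy to exercise without touching os.environ.
with_variable = hpss_configured({"HPSS_DIR": "/usr/local"})
without_variable = hpss_configured({})
print(with_variable, without_variable)
```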

hpsspy.util.get_tmpdir(**kwargs)[source]

Return the path to a suitable temporary directory.

Resolves the path to the temporary directory in the following order:

  1. If tmpdir is present as a keyword argument, the value is returned.

  2. If TMPDIR is set, its value is returned.

  3. If neither is set, /tmp is returned.

Parameters:

kwargs (dict) – Keyword arguments from another function may be passed to this function. If tmpdir is present as a key, its value will be returned.

Returns:

The name of a temporary directory.

Return type:

str
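
The resolution order described above can be sketched in a few lines. This is an illustration of the documented precedence, not the hpsspy source; the function name resolve_tmpdir and its environ parameter (added so the lookup can be tried without changing the real environment) are hypothetical.

```python
import os


def resolve_tmpdir(environ=None, **kwargs):
    """Pick a temporary directory: tmpdir keyword, then TMPDIR, then /tmp."""
    env = os.environ if environ is None else environ
    if "tmpdir" in kwargs:
        return kwargs["tmpdir"]
    if "TMPDIR" in env:
        return env["TMPDIR"]
    return "/tmp"


from_keyword = resolve_tmpdir({"TMPDIR": "/scratch"}, tmpdir="/local")
from_environment = resolve_tmpdir({"TMPDIR": "/scratch"})
fallback = resolve_tmpdir({})
print(from_keyword, from_environment, fallback)
```

Accepting **kwargs lets another function forward its own keyword arguments unchanged, which matches the parameter description below.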

hpsspy.util.hsi(*args, **kwargs)[source]

Run hsi with arguments.

Parameters:
  • args (tuple()) – Arguments to be passed to hsi.

  • tmpdir (str, optional) – Write temporary files to this directory. Defaults to the value returned by hpsspy.util.get_tmpdir(). This option must be passed as a keyword!

Returns:

The standard output from hsi.

Return type:

str

Raises:

KeyError – If the HPSS_DIR environment variable has not been set.

hpsspy.util.htar(*args)[source]

Run htar with arguments.

Parameters:

args (tuple()) – Arguments to be passed to htar.

Returns:

The standard output and standard error from htar.

Return type:

tuple()

Raises:

KeyError – If the HPSS_DIR environment variable has not been set.