Thursday, April 28, 2011

Cooperative Limiting of Concurrent Process Instances

It is a common desire to limit the number of instances of a specific program for one of several reasons including resource management, ensuring programs not meant to handle multiple concurrent instances are run only once, or managing license restrictions. In this post I will detail a method for managing the number of concurrent instances of a process or group of processes in a manner that does not require alteration of the programs being limited and avoids a number of issues with existing methods that usually do.


There are several traditional ways to limit the number of instances of a process on a Unix system:

  • Using a pid file.
  • This method uses a file at a pre-determined location to store the pid of the running instance of the process. If the file exists when the program is started, the program checks to see if the pid in the file exists and if it does it tries to determine if it corresponds to an instance of the same process and exits if it does. Otherwise the file is overwritten with the pid of the new process. File-locking has to be used carefully to avoid race conditions that could break the mechanism. This is the primary method used by daemon programs on UNIX systems. The main drawback with this approach is that is doesn't scale well to allow a limited number of multiple instances of a program. Other complications include properly managing the file locking (especially in a shell script) and determining if the pid in the pid file corresponds to a running instance of the same program or another program.

  • Using a directory as a mutex.
  • When the process starts it tries to create a pre-determined directory. If this operation fails the program quits assuming another instance is already running. This prevents the complications of race conditions but doesn't scale and it is difficult or impossible to ensure that the directory will be deleted when the process terminates abnormally.

  • Using sockets.
  • In this scheme the process tries to create a socket on the machine which will fail if another instance of the program already owns the socket. This solution doesn't scale well, and requires some care in picking the port of the socket to use.


An ideal solution would achieve the following goals:
  • Flexibility - allow a configurable number of instances of the process to run
  • Robustness - don't assume that the process in question will be in a position to clean up after itself when it terminates
  • Usability - the solution should be a implementable as separate program that acts as a wrapper for other programs to enforce the concurrent limits
  • Portability - the solution should be fairly portable, at least amongst POSIX systems.

POSIX provides an advisory locking mechanism via fcntl. One of the nice things about fcntl is that its locks are managed by the kernel and are automatically released when the process terminates, even if the process is killed or aborts. A solution using POSIX advisory locks will help meet the robustness goal.

Using fcntl it is also possible to lock individual bytes in a file - this will be key in supporting multiple instances. fcntl locks are also kept across a call to the exec family of functions (but not across a fork) which allows us to preserve the lock from a wrapper program, this supports the usability goal.

The fcntl function is required by POSIX and is thus supported by virtually all Unix-like systems, this meets the portability goal.


Method

The idea is to use a lock file to track the number of instances of a process, or group of processes. The lock file is not a traditional lock file though; its presence alone does not convey anything about the program's usage. Instead, the lock file serves as a "lock repository". Each time the process starts, it obtains a POSIX advisory lock on an available byte of the file, if there is one. The availability of a byte to lock is determined by examining the bytes, counting the number that are locked, and testing whether the number of existing locks is less than the maximum allowed.

The first several bytes in the file will contain an integer value that represents the index of the maximum locked byte so the process knows how many bytes to check to determine how many processes are in use. The maximum number of instances allowed is not stored in the file but managed externally which allows the number to change during production. If the number of locked bytes encountered is greater than or equal to the max allowed instances, the program will not start. Otherwise it will obtain its own lock on an available byte and update the max index, if necessary.

I have created a wrapper program, called exectimes around this concept that takes a file to use as a lock repository, the number of max instances allowed, and the command to run as arguments. The program also supports the ability to check the number of locks in use as well as the ability to list the PIDs of the processes with the locks. The source code is C and can be obtained at git://github.com/rgamble/exectimes.git.

Instead of calling the target program directly, a shell script can be used which runs the program using exectimes to enforce concurrent limits.

Below is a discussion of the major parts of the implementation of exectimes.

Implementation

Step 1

Open the lock file in read-write mode and create the file if it does not exist. If the file is being created, the permissions allow read and write access by everyone as multiple users may need to be able to write to the file. If only specific users should be able to access the program and the lock file then it can simply be placed in a directory with limited access.

/* Open the lock file, create if doesn't exist */
    mask = umask(0);
    fd = open(argv[1], O_CREAT | O_RDWR, 0666);
    if (fd < 0) {
        perror("Failed to open file");
        exit(EXIT_FAILURE);
    }
    (void) umask(mask);

Step 2

After opening the file, attempt to lock the initial bytes of the file that correspond to the max index. This part of the file serves two purposes: as a mutex for the file and as an index of the max byte that needs to be checked. The process will attempt to take out an exclusive lock on this region and will wait until it is able to if necessary.

/* Lock the beginning of the lock file */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = lock_cnt_size;

    while (fcntl(fd, F_SETLKW, &fl) < 0) {
        if (errno == EINTR) {
            continue;
        }

        perror("Failed to lock file");
        exit(EXIT_FAILURE);
    }

After the process successfully obtains a read-write lock on the initial section of the lock file, it uses the index to determine how many bytes need to be examined for locks. It then scans these bytes to determine which ones of them are currently locked, keeps a count of the number of locks, and remembers the index of the first unlocked byte. Lastly, as a small optimization, the index of the last used lock is kept for later use. Since the read-write lock on the initial section is still held, no other process can obtain a lock during this time.

/* Read the max index */
    {
        const ssize_t result = read(fd, &max_idx, lock_cnt_size);
        if (result < 0) {
            perror("Failed to read file");
            exit(EXIT_FAILURE);
        }
        else if (result < (ssize_t) lock_cnt_size) {
            max_idx = lock_cnt_size;
        }
    }

    /* Get a count of the number of locks and the first available byte */
    locks_held = 0;
    for (i = lock_cnt_size; (i <= (max_idx + 1)) && (i < lock_cnt_max); i++) {
        fl.l_start = (off_t) i;
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_len = 1;

        if (fcntl(fd, F_GETLK, &fl) < 0) {
            perror("fcntl failure");
            exit(EXIT_FAILURE);
        }

        if (fl.l_type == F_UNLCK) {
            if (first_free == 0) {
                first_free = i;
            }
        }
        else {
            if (list_instances) {
                printf("Slot %lu held by PID %d\n",
                        (unsigned long) (i - lock_cnt_size) + 1,
                        (int) fl.l_pid);
            }
            locks_held++;
            last_used = i;
        }
    }

Step 3

If the number of currently locked bytes is greater than or equal to the maximum instances allowed, the process terminates, releasing it read-write lock.

if ((locks_held >= max_allowed) || (first_free == 0)) {
        fprintf(stderr, "Cannot start, %lu instances already running\n", 
                (unsigned long) locks_held);
        exit(EXIT_FAILURE);
    }

Otherwise the first free byte is locked. If the index of the locked byte exceeds the max index stored in the locked region, the index is updated. Otherwise, since we already scanned all of the potentially locked byes and know the location of the locked byte with the highest index, if this index is less than what is stored in the locked region we will update that number to lower the number of bytes that have to be examined in future checks.

/* At least one lock is free, try to get it */
    fl.l_start = (off_t) first_free;
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_len = 1;

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        perror("Failed to lock previously determined free byte in file");
        exit(EXIT_FAILURE);
    }
   
    if (first_free > last_used) {
        last_used = first_free;
    }
    
    /* Update the max_idx byte if neccessary */
    if (last_used != max_idx) {
        lseek(fd, SEEK_SET, 0); /*lint !e747*/
        write(fd, &last_used, lock_cnt_size);
    }

Step 4

With our locked byte in place, we can release the lock on the max index and attempt to exec the provided command. If the command fails then the process exits which will automatically release the lock, otherwise the new process starts and will maintain the lock until it exits.

/* Unlock initial bytes and exec the new process */
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = lock_cnt_size;

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        perror("Failed to release lock on file");
        exit(EXIT_FAILURE);
    }

    if (execvp(argv[3], &argv[3]) < 0) {
        perror("exec failed");
        exit(EXIT_FAILURE);
    }

    return EXIT_FAILURE;

One thing to note about the implementation is that nothing is written to the file except for the max index. Since POSIX allows the locking of bytes that do not exist, and we only use the bytes for the purpose of creating and checking locks, the only actual data in the lock file is the index of the maximum used byte.

Limitations

If this method is implemented by a wrapper program and the target process closes the file descriptor associated with the lock file, the lock will be released which would allow more than the anticipated number of instances to exist. This is not typical behavior of most programs but some daemon programs, instead of closing just close stdin, stdout, and stderr during daemonization, will try to close all of their descriptors.

Possible solutions to this issue include modifying the target program so it doesn’t do this or using dup2 in the wrapper program to open the file at a file descriptor larger than the largest one closed by the target program when it performs it closes. Another option would be to have exectimes call fork after obtaining its byte lock, have the child exec the program, and have the parent wait for the child. This will incur an overhead of an extra process for each running instance. Also, if the parent process is killed before the child process, the lock will be prematurely released.

In most kernels, including the Linux and FreeBSD kernels, the time it takes to check or acquire a lock increases linearly with the number of locks that already exist. This is because, in a typical implementation, the kernel keeps the locks in a linked list which must be fully traversed to determine if the requested section is already locked. Additionally, the number of existing locks increases the number of bytes that need to be checked before finding an available slot making the lock acquisition a potentially quadratic time operation. In practice, this doesn't start to become a concern until the number of locks exceeds several thousand, even in the worst case.

Final Notes

Because the maximum number of concurrent instances is specified by the wrapper program, this number can be dynamic. For example, you might have a group that can only launch the program is less than 10 instances are already running and another group that has a higher limit. Multiple programs can use the same lock file to accommodate a maximum number of total instances among a group of programs. Lastly, a single program can be associated with multiple lock files to enforce a maximum number of instances per user, for instance.

The fcntl locking mechanism used here should work across NFS allowing multiple machines to use the same lock file to coordinate access across a network. I haven't tested this so if you try this with NFS, please share your experience in the comments.

No comments: