JSON output from DF

So I’m adding more capabilities to my sysinfo.py program. The next thing I want to do is get a JSON result from df, the utility whose description in the man page reads “report file system disk space usage”.

Here is a sample of the output of df for one of my systems:

Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
udev                        1011376       4   1011372   1% /dev
tmpfs                        204092     288    203804   1% /run
none                           5120       0      5120   0% /run/lock
none                        1020452       0   1020452   0% /run/shm
/dev/sda1                    233191   50734    170016  23% /boot

So I started by writing a little Python program that used the subprocess.check_output() method to capture the output of df.

This went through various iterations and ended up with this single line of Python code, which requires eleven lines of comments to explain it:

#
# this next line of code is pretty tense ... let me explain what
# it does:
# subprocess.check_output(["df"]) runs the df command and returns
#     the output as a string
# rstrip() trims off trailing whitespace, which here is the final '\n'
# split('\n') breaks the string at the newline characters ... the
#     result is an array of strings
# the list comprehension then applies shlex.split() to each string,
#     breaking each into tokens
# when we're done, we have a two-dimensional array with rows of
# tokens and we're ready to make objects out of them
#
df_array = [shlex.split(x) for x in
            subprocess.check_output(["df"]).rstrip().split('\n')]
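To make the shape of that result concrete, here is a small sketch that runs the same pipeline over a hardcoded df sample (invented for illustration) instead of actually invoking df:

```python
import shlex

# a hardcoded stand-in for subprocess.check_output(["df"]),
# so this sketch runs without touching the system
sample = (
    "Filesystem  1K-blocks  Used Available Use% Mounted on\n"
    "/dev/sda1      233191 50734    170016  23% /boot\n"
)

# same pipeline as above: trim the trailing newline, split into
# lines, then tokenize each line with shlex.split()
df_array = [shlex.split(x) for x in sample.rstrip().split('\n')]

print(df_array[0][0])   # header row's first token: Filesystem
print(df_array[1][5])   # mount point token of the data row: /boot
```

The result is the promised two-dimensional structure: row 0 holds the header tokens, and each later row holds the tokens for one filesystem.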

My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization.
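David’s lesson in miniature: build an ordinary Python object and let a serializer handle the quoting, commas, and escaping. A minimal sketch, with made-up field values:

```python
import json

# build a plain Python dict first ...
fs = {"filesystem": {"name": "/dev/sda1", "mount_point": "/boot"}}

# ... then let the library render it as JSON
print(json.dumps(fs, sort_keys=True))
# prints {"filesystem": {"mount_point": "/boot", "name": "/dev/sda1"}}
```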

So I did some digging around and discovered that the Python json library includes a perfectly sensible serializer, json.dumps(), which can pretty-print its output. The result was a much cleaner piece of code:

# df.py
#
# parse the output of df and create JSON objects for each filesystem.
#
# $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
#

# now let's parse the output of df to get filesystem information
#
# Filesystem                1K-blocks    Used Available Use% Mounted on
# /dev/mapper/flapjack-root 959088096 3799548 906569700   1% /
# udev                        1011376       4   1011372   1% /dev
# tmpfs                        204092     288    203804   1% /run
# none                           5120       0      5120   0% /run/lock
# none                        1020452       0   1020452   0% /run/shm
# /dev/sda1                    233191   50734    170016  23% /boot

import subprocess
import shlex
import json

def main():
    """Main routine - call the df utility and return a json structure."""

    # this next line of code is pretty tense ... let me explain what
    # it does:
    # subprocess.check_output(["df"]) runs the df command and returns
    #     the output as a string
    # rstrip() trims off trailing whitespace, which here is the final '\n'
    # split('\n') breaks the string at the newline characters ... the
    #     result is an array of strings
    # the list comprehension then applies shlex.split() to each string,
    #     breaking each into tokens
    # when we're done, we have a two-dimensional array with rows of
    # tokens and we're ready to make objects out of them
    df_array = [shlex.split(x) for x in
                subprocess.check_output(["df"]).rstrip().split('\n')]

    df_json = {}
    df_json["filesystems"] = []
    for row in df_array[1:]:    # skip the header line
        df_json["filesystems"].append(df_to_json(row))
    print json.dumps(df_json, sort_keys=True, indent=2)
    return

def df_to_json(tokenList):
    """Take a list of tokens from df and return a python object."""
    # If df's output format changes, we'll be in trouble, of course.
    # token 0 is the name of the filesystem
    # token 1 is the size of the filesystem in 1K blocks
    # token 2 is the amount of the filesystem used
    # token 5 is the mount point
    return {"filesystem": {
        "name": tokenList[0],
        "size": tokenList[1],
        "used": tokenList[2],
        "mount_point": tokenList[5],
    }}

if __name__ == '__main__':
    main()

This, in turn, produces rather nice df output in JSON:

{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/", 
        "name": "/dev/mapper/flapjack-root", 
        "size": "959088096", 
        "used": "3802632"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/dev", 
        "name": "udev", 
        "size": "1011376", 
        "used": "4"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run", 
        "name": "tmpfs", 
        "size": "204092", 
        "used": "288"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/lock", 
        "name": "none", 
        "size": "5120", 
        "used": "0"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/shm", 
        "name": "none", 
        "size": "1020452", 
        "used": "0"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/boot", 
        "name": "/dev/sda1", 
        "size": "233191", 
        "used": "50734"
      }
    }
  ]
}

Quite a lot of fun, really.