Buying notebooks …

Well, I’m trained as a scientist and engineer, so I keep a notebook. This is something I have done religiously since I was in grad school, much to my wife’s dismay.

Since 1991 I have loved the National brand Chemistry Notebook (number 43-571), but National was bought a few years ago and the new owners cut a stupid corner by reducing the notebook from 128 pages to 120. Worse yet, this notebook has become rather expensive to buy, costing upward of $10 per book. The pages are still numbered for me, but the reduction from 128 to 120 remains an irritant.

national-brand-43-571-chemistry-notebook

So, when I recently changed jobs and, at the same time, ran out of notebooks, I decided to switch to the Clairefontaine 9542C.  This is a smaller notebook with paper that is slightly more opaque and quadrille ruled 5×5 to the inch.

9542C_3

Oddly, despite the fact that it is made in France and described with metric dimensions (14.8 cm x 21 cm), the ruling is specified as 5×5 to the inch.  I agree that this is a convenient grid size for technical notebooks, but is there no metric ruling that matches?  0.5 cm comes to mind, since that would end up very close to 5×5 to the inch: 5 x 0.5 cm is 2.5 cm, and an inch is 2.54 cm.  Perhaps it is marketed as a 0.5 cm square grid in Europe but as 5×5 to the inch in the US?
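The arithmetic behind that comparison is easy to check. This is only a back-of-the-envelope sketch; the variable names are made up for the illustration, and the 21 cm figure is the long side quoted above.

```python
# Compare one square of a 5-per-inch grid with one square of a 0.5 cm grid.
MM_PER_INCH = 25.4

inch_grid_mm = MM_PER_INCH / 5      # one square of a 5x5-to-the-inch ruling
metric_grid_mm = 5.0                # one square of a 0.5 cm ruling

# Relative difference between the two square sizes, in percent.
difference_pct = (inch_grid_mm - metric_grid_mm) / metric_grid_mm * 100

page_height_mm = 210.0              # 21 cm, the long side of the page

print("5-per-inch square: %.2f mm" % inch_grid_mm)
print("difference from 5 mm: %.1f%%" % difference_pct)
print("squares down the page: %.1f vs %.1f" % (
    page_height_mm / inch_grid_mm, page_height_mm / metric_grid_mm))
```

The two rulings differ by less than two percent, which is hard to see by eye on a single page.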

Anyway, I needed to buy some more of these notebooks.  Normally I pick them up from a stationery store near my apartment, but that is inconvenient and expensive.

I tried looking for them on Amazon (amazon.com, to be precise).  While I can find them, it’s hard to tell which product is being sold because Amazon’s product information for these Clairefontaine notebooks is dreadful.  And they’re expensive.

After being frustrated by the unusually low quality of Amazon’s offerings I tried searching Google for “clairefontaine 9542c”.  To my surprise, I found an amazon.de page near the top of the organic results.  Even more of a surprise was the fact that it was offering five of these lovely notebooks for about 10 euros, or only a little bit more than I was paying for one in the US.

Not reading German, I decided to try amazon.co.uk.  There I found these notebooks, again better described, priced at ten pounds for a package of five.  I ordered two packages.  Even with shipping to the US these notebooks come out at about half the price that I pay for them in the US.

Simple Python __str__(self) method for use during development

For my development work I want a simple way to display the data in an object instance without having to modify the __str__(self) method every time I add, delete, or rename members. Here’s a technique I’ve adopted that relies on the fact that an ordinary object instance stores its attributes in a dictionary called self.__dict__. Making a string representation of the object is just a matter of returning a string representation of __dict__. This can be achieved in several ways. The simplest is str(self.__dict__); another uses the JSON serializer json.dumps(), which lets you prettyprint the result.

Here’s a little Python demonstrator program:


#!/usr/bin/python

""" demo - demonstrate a simple technique to display text representations
    of Python objects using the __dict__ member and a json serializer.

    $Id: demo.py,v 1.3 2015/07/18 13:07:15 marc Exp marc $
"""

import json

class something(object):
    """ This is just a demonstration class. """
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def rename(self, name):
        self.name = name

    def __str__(self):
        return json.dumps(self.__dict__, indent=2, separators=(',', ': '))
        # return str(self.__dict__)

def main():
    o1 = something(1, "first object")
    o2 = something(2, "second object")

    print(str(o1))
    print(str(o2))

    o1.rename("dba third object")

    print(str(o1))

if __name__ == '__main__':
    main()

Running it produces this output:


$ python demo.py
{
  "id": 1,
  "name": "first object"
}
{
  "id": 2,
  "name": "second object"
}
{
  "id": 1,
  "name": "dba third object"
}
 

Nice and easy for testing and debugging. Once I’m ready for production and no longer want the JSON representations I can introduce a DEBUG flag so that the non-DEBUG behavior of __str__(self) is appropriate to the production use.
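One way the DEBUG switch could look is sketched below. The flag name DEBUG, the class name Something, and the choice of returning self.name in production are all assumptions made for the illustration, not part of the demo program above.

```python
import json

DEBUG = True  # hypothetical module-level switch; set to False for production


class Something(object):
    """Demonstration class with a __str__ that depends on DEBUG."""

    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __str__(self):
        if DEBUG:
            # Development: dump every attribute, whatever it is called.
            return json.dumps(self.__dict__, indent=2, sort_keys=True)
        # Production: assumed here to be just the name.
        return self.name


print(Something(1, "first object"))
```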

[update]

What’s wrong with this?  If I have a member that is itself an object, then the json.dumps() call fails.  Ideally Python would call __str__() on a member if __str__() was called on the object.

On reading some more goodies, it’s clear that what I should be using is repr() and not str().
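A possible way around the nested-object failure is the default= hook of json.dumps(), which is called for any value the serializer cannot handle itself. The sketch below is one approach under that assumption, not necessarily the fix this post had in mind; Debuggable, Owner, Something, and _fallback are hypothetical names. It defines __repr__ and reuses it for __str__.

```python
import json


def _fallback(value):
    """Called by json.dumps for values it cannot serialize on its own."""
    try:
        return vars(value)      # a nested instance: use its attribute dict
    except TypeError:
        return repr(value)      # no __dict__ at all: fall back to repr()


class Debuggable(object):
    """Hypothetical base class that dumps attributes as JSON."""

    def __repr__(self):
        return json.dumps(self.__dict__, indent=2, sort_keys=True,
                          default=_fallback)

    __str__ = __repr__


class Owner(Debuggable):
    def __init__(self, name):
        self.name = name


class Something(Debuggable):
    def __init__(self, id, owner):
        self.id = id
        self.owner = owner      # a member that is itself an object


print(Something(1, Owner("marc")))
```

Objects that refer back to each other would still send this into a loop, so it is a development aid rather than a general serializer.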

Economical NUC desktop running Ubuntu

The TV in the kitchen has long had a Mac Mini attached to one of its inputs. We used it to watch Youtube videos, listen to music from iTunes and Google Music, to browse the web, to show photographs from our trips, and so on.

Sadly, the little Mini passed away earlier this year, refusing to power up. When we priced out replacement machines we discovered that the new Minis were a lot more expensive, even if at the same time more capable.

2014-11-08-nuc-desktop

Given that we were not planning to store lots of data on the machine, we decided to leverage the lessons we had learned from building our little collection of NUC servers and design and build a small desktop on one of the NUC engines. We conducted some research and selected a machine sporting an i3 processor. The parts list we ended up with was:

  • Intel NUC DCCP847DYE [1 @ $ 146.22]
    • Intel Core i3 Processor
  • Crucial CT120M500SSD3 [1 @ $ 72.09]
    • 120GB mSATA SSD
  • Crucial CT25664BF160B [2 @ $ 20.97]
    • 2GB DDR3 1600 SODIMM 204-Pin 1.35V/1.5V Memory Module
  • Intel Network 7260.HMWG [1 @ $30.95]
    • WiFi and Bluetooth HMC
  • Belkin 6ft / 3 Prong Notebook Power Cord [1 @ $6.53]

Which brought the total expense to $ 297.73, substantially cheaper than the more highly configured i5-based servers that we described in a previous post.

We ordered the parts from Amazon and they arrived a few days later.

The next step was to get the BIOS patches needed for the machine and an install image.

The new BIOS image came from the Intel site.  Note that the BIOS for the DYE line is different from that in the i5-based WYK line that we used for the servers.  The BIOS patch that we downloaded is named gk0054.bio and we found it on an Intel page (easier to find with a search engine than with the Intel site navigation tools, but easy either way).

The Ubuntu desktop image is on the Ubuntu site … they ask you for a donation (give one if you can afford it, please).

The, by now familiar, steps to create an installable image on a USB flash drive are:

> diskutil list
> hdiutil convert -format UDRW -o ubuntu-14.04.1-desktop-amd64.img ubuntu-14.04.1-desktop-amd64.iso 
> diskutil unmountDisk /dev/disk2
> sudo dd if=ubuntu-14.04.1-desktop-amd64.img.dmg of=/dev/rdisk2 bs=1m

Where /dev/disk2 and /dev/rdisk2 are identified from examination of the output of the diskutil list call.

That done, we recorded the MAC address from the NUC packaging and updated our DHCP and DNS configurations so that the machine would get its host name and IP address from our infrastructure.

A couple of important differences between building a desktop and a server:

  • We added the WiFi and Bluetooth network card to the machine.  We did not use the WiFi capability, since we were installing the machine in a location with good hard-wired Ethernet connectivity, but we did plan to use a Bluetooth keyboard and mouse on the machine.
  • The desktop install image for Ubuntu 14.04 is big, about 1/3 larger than the server image.  The first device we used for the install was the same 1G drive that I had used for my initial server installs, before I got the network install working.  What we didn’t realize, and dd did not tell us, is that the image was too big for the 1G drive.  When we tried to do the install the first time we got a cryptic error message from the BIOS.  It took us a while, stumbling around in the dark, to realize that the install image was too big for the drive we were using.  After we rebuilt the install image on a 32G drive we had in a drawer, the install proceeded without error.
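The oversized-image problem in the second bullet could be caught before running dd with a size comparison. The sketch below assumes you already know the drive's capacity in bytes (for example from the size column of diskutil list or from the packaging); image_fits and device_bytes are hypothetical names.

```python
import os

ONE_GB = 10 ** 9  # flash drives are usually sold in decimal gigabytes


def image_fits(image_path, device_bytes):
    """Return True when the image file is no larger than the target device."""
    return os.path.getsize(image_path) <= device_bytes
```

Calling image_fits("ubuntu-14.04.1-desktop-amd64.img.dmg", ONE_GB) before the dd step would have flagged the 1G drive as too small.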

After the installation completed we had trouble getting the Bluetooth keyboard and mouse to work well.  The machine ultimately paired with the keyboard, but we could not get input to it.

We then thought back on some of the information we’d seen for our earlier NUC research and verified that the machine actually has an integrated antenna.  We opened up the case and found the antenna wires, which we connected to the wireless card as shown in this picture:

nuc-antenna-wires-connected

Shortly afterward we were logged on to the machine.  We installed Chrome and connected up to a Google Music library and were playing music as background to a photo slide show within a few minutes.

The only remaining problem is that the Apple Wireless Trackpad that we’re using seems to regularly stop talking to the machine.  The pointer freezes and we’re left using the tab key to navigate the fields of the active window.

Adding CPUInfo to Sysinfo

There is a lot of interesting information about the processor hardware in /proc/cpuinfo. Here is a little bit from one of my NUC servers:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 69
model name	: Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
stepping	: 1
microcode	: 0x16
cpu MHz		: 779.000
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips	: 3791.14
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

The content of “cat /proc/cpuinfo” is actually four copies of this, with small variations in the core id (ranging between 0 and 1), the processor (ranging between 0 and 3), and the apicid (ranging from 0 to 3).

In order to add this information to my sysinfo.py I wrote a new module, cpuinfo.py, modeled on the df.py module that I used to add filesystem information.

""" Parse the content of /proc/cpuinfo and create JSON objects for each cpu

Written by Marc Donner
$Id: cpuinfo.py,v 1.7 2014/11/06 18:25:30 marc Exp marc $

"""

import subprocess
import json
import re

def main():
    """Main routine"""
    print(CPUInfo().to_json())
    return

# Utility routine ...
#
# The /proc/cpuinfo content is a set of (attribute, value) records;
# the separator between attribute and value is "\t+: "
#
# When there are multiple CPUs, there's a blank line between sets
# of lines.
#

class CPUInfo(object):
    """ An object with key data from the content of the /proc/cpuinfo file """

    def __init__(self):
        self.cpus = {}
        self.populated = False

    def to_json(self):
        """ Display the object as a JSON string (prettyprinted) """
        if not self.populated:
            self.populate()
        return json.dumps(self.cpus, sort_keys=True, indent=2)

    def get_array(self):
        """ return the array of cpus """
        if not self.populated:
            self.populate()
        return self.cpus["processors"]

    def populate(self):
        """ get the content of /proc/cpuinfo and populate the arrays """
        self.cpus["processors"] = []
        cpu = {}
        cpu["processor"] = {}
        # universal_newlines=True returns text on both Python 2 and 3
        text = subprocess.check_output(["cat", "/proc/cpuinfo"],
                                       universal_newlines=True).rstrip()
        lines = text.split('\n')
        # Use re.split because there's a varying number of tabs :-(
        array = [re.split('\t+: ', x) for x in lines]
        # cpuinfo is structured as n blocks of data, one per logical processor
        # o each block has the processor id (0, 1, ...) as its first row.
        # o each block ends with a blank row
        # o some of the rows have attributes but no values
        #  (e.g. power_management)
        for row in range(len(array)):
            # New processor detected - attach this one to the output, then
            if len(lines[row]) == 0:
                # create a new processor
                self.cpus["processors"].append(cpu)
                cpu = {}
                cpu["processor"] = {}
            if len(array[row]) == 2:
                (attribute, value) = array[row]
                attribute = attribute.replace(" ", "_")
                cpu["processor"][attribute] = value
        self.cpus["processors"].append(cpu)
        self.populated = True

if __name__ == '__main__':
    main()

The state machine implicit in the main loop of populate() is plausibly efficient, though there remains something about it that annoys me. I need to think about edge cases and failure modes to see whether I can make it better.
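One alternative shape for the loop is to split the text on blank lines first, so that each block is handled on its own and no state is carried between rows. This is a sketch rather than a drop-in replacement: parse_cpuinfo and SAMPLE are hypothetical names, the result is a flat dict per processor instead of the nested "processor" wrapper, and attributes without values are kept as empty strings rather than skipped.

```python
import re


def parse_cpuinfo(text):
    """Return a list of {attribute: value} dicts, one per logical processor."""
    if not text.strip():
        return []
    processors = []
    # Blocks are separated by blank lines; strip() drops the trailing one.
    for block in re.split(r"\n\s*\n", text.strip()):
        cpu = {}
        for line in block.splitlines():
            # partition() copes with empty values such as "power management:"
            attribute, _, value = line.partition(":")
            cpu[attribute.strip().replace(" ", "_")] = value.strip()
        processors.append(cpu)
    return processors


# A cut-down sample in the same layout as /proc/cpuinfo.
SAMPLE = (
    "processor\t: 0\n"
    "core id\t\t: 0\n"
    "power management:\n"
    "\n"
    "processor\t: 1\n"
    "core id\t\t: 1\n"
    "power management:\n"
)

print(parse_cpuinfo(SAMPLE))
```

Because the function takes text rather than running cat itself, the edge cases (empty input, a missing final blank line) can be tested without a Linux machine.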

The result is an augmented json object including info on the logical processors:

cat crepe.sysinfo 
{
  "boot_time": "system boot  2014-09-14 16:03", 
  "bufferram": 193994752, 
  "distro_codename": "trusty", 
  "distro_description": "Ubuntu 14.04.1 LTS", 
  "distro_distributor": "Ubuntu", 
  "distro_release": "14.04", 
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/", 
        "name": "/dev/sda1", 
        "size": "444919888", 
        "used": "3038660"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/sys/fs/cgroup", 
        "name": "none", 
        "size": "4", 
        "used": "0"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/dev", 
        "name": "udev", 
        "size": "8169708", 
        "used": "4"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run", 
        "name": "tmpfs", 
        "size": "1636112", 
        "used": "564"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/lock", 
        "name": "none", 
        "size": "5120", 
        "used": "0"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/shm", 
        "name": "none", 
        "size": "8180548", 
        "used": "4"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/user", 
        "name": "none", 
        "size": "102400", 
        "used": "0"
      }
    }
  ], 
  "freeram": 12954943488, 
  "freeswap": 17103319040, 
  "hardware_platform": "x86_64", 
  "kernel_name": "Linux", 
  "kernel_release": "3.13.0-35-generic", 
  "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014", 
  "machine": "x86_64", 
  "mem_unit": 1, 
  "nodename": "crepe", 
  "operating_system": "GNU/Linux", 
  "processor": "x86_64", 
  "processors": [
    {
      "processor": {
        "address_sizes": "39 bits physical, 48 bits virtual", 
        "apicid": "0", 
        "bogomips": "3791.14", 
        "cache_alignment": "64", 
        "cache_size": "3072 KB", 
        "clflush_size": "64", 
        "core_id": "0", 
        "cpu_MHz": "779.000", 
        "cpu_cores": "2", 
        "cpu_family": "6", 
        "cpuid_level": "13", 
        "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", 
        "fpu": "yes", 
        "fpu_exception": "yes", 
        "initial_apicid": "0", 
        "microcode": "0x16", 
        "model": "69", 
        "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", 
        "physical_id": "0", 
        "processor": "0", 
        "siblings": "4", 
        "stepping": "1", 
        "vendor_id": "GenuineIntel", 
        "wp": "yes"
      }
    }, 
    {
      "processor": {
        "address_sizes": "39 bits physical, 48 bits virtual", 
        "apicid": "2", 
        "bogomips": "3791.14", 
        "cache_alignment": "64", 
        "cache_size": "3072 KB", 
        "clflush_size": "64", 
        "core_id": "1", 
        "cpu_MHz": "779.000", 
        "cpu_cores": "2", 
        "cpu_family": "6", 
        "cpuid_level": "13", 
        "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", 
        "fpu": "yes", 
        "fpu_exception": "yes", 
        "initial_apicid": "2", 
        "microcode": "0x16", 
        "model": "69", 
        "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", 
        "physical_id": "0", 
        "processor": "1", 
        "siblings": "4", 
        "stepping": "1", 
        "vendor_id": "GenuineIntel", 
        "wp": "yes"
      }
    }, 
    {
      "processor": {
        "address_sizes": "39 bits physical, 48 bits virtual", 
        "apicid": "1", 
        "bogomips": "3791.14", 
        "cache_alignment": "64", 
        "cache_size": "3072 KB", 
        "clflush_size": "64", 
        "core_id": "0", 
        "cpu_MHz": "779.000", 
        "cpu_cores": "2", 
        "cpu_family": "6", 
        "cpuid_level": "13", 
        "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", 
        "fpu": "yes", 
        "fpu_exception": "yes", 
        "initial_apicid": "1", 
        "microcode": "0x16", 
        "model": "69", 
        "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", 
        "physical_id": "0", 
        "processor": "2", 
        "siblings": "4", 
        "stepping": "1", 
        "vendor_id": "GenuineIntel", 
        "wp": "yes"
      }
    }, 
    {
      "processor": {
        "address_sizes": "39 bits physical, 48 bits virtual", 
        "apicid": "3", 
        "bogomips": "3791.14", 
        "cache_alignment": "64", 
        "cache_size": "3072 KB", 
        "clflush_size": "64", 
        "core_id": "1", 
        "cpu_MHz": "1000.000", 
        "cpu_cores": "2", 
        "cpu_family": "6", 
        "cpuid_level": "13", 
        "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", 
        "fpu": "yes", 
        "fpu_exception": "yes", 
        "initial_apicid": "3", 
        "microcode": "0x16", 
        "model": "69", 
        "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", 
        "physical_id": "0", 
        "processor": "3", 
        "siblings": "4", 
        "stepping": "1", 
        "vendor_id": "GenuineIntel", 
        "wp": "yes"
      }
    }
  ], 
  "report_date": "2014-11-06 13:27:06", 
  "sharedram": 0, 
  "totalhigh": 0, 
  "totalram": 16753766400, 
  "totalswap": 17103319040, 
  "uptime": 4573401
}

I am tempted to augment the module with a configuration capability that would let me set sysinfo up to restrict the set of data from /proc/cpuinfo that I actually include in the sysinfo structure. Do I need “fpu” and “fpu_exception” or “clflush_size” for the things that I will be using the sysinfo stuff for? I’m skeptical. If I make it a configurable filter I can always incorporate data elements after I decide they’re interesting.

Decisions, decisions.
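A configurable filter of that kind could be as small as the sketch below. It assumes the list-of-records shape returned by get_array() above; filter_attributes and the keep list are hypothetical names, and where the list would be configured is left open.

```python
def filter_attributes(processors, keep):
    """Return copies of the processor records restricted to the names in keep."""
    wanted = set(keep)
    return [
        {"processor": {name: value
                       for name, value in cpu["processor"].items()
                       if name in wanted}}
        for cpu in processors
    ]


# Example: keep only the identifying fields.
sample = [{"processor": {"processor": "0", "fpu": "yes", "model_name": "x"}}]
print(filter_attributes(sample, ["processor", "model_name"]))
```

The original records are left untouched, so attributes can be added back later just by extending the keep list.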

Moreover, the fourfold repetition of the CPU information is annoying. The four attributes that vary are processor, core id, apicid, and initial apicid. The values are structured thus (initial apicid seems never to vary from apicid):

processor  core id  apicid
0          0        0
1          1        2
2          0        1
3          1        3

It would be much more sensible to reduce the size and complexity of the processors section by consolidating the common parts and displaying the variant sections in some sensible subsidiary fashion.

These items are discussed in this Intel web page.
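The consolidation idea can be sketched by treating an attribute as common when every record carries the same value for it. The function name consolidate and the "common"/"variants" keys are hypothetical; the input is assumed to be the list returned by get_array(). Note that in the listing above cpu_MHz also differs in the last record, so it would land among the variants too.

```python
def consolidate(processors):
    """Split processor records into shared attributes and per-processor ones."""
    records = [cpu["processor"] for cpu in processors]
    if not records:
        return {"common": {}, "variants": []}
    first = records[0]
    # An attribute is common when every record has it with the same value.
    common = {name: value for name, value in first.items()
              if all(record.get(name) == value for record in records)}
    # Everything else stays with the individual processor.
    variants = [{name: value for name, value in record.items()
                 if name not in common}
                for record in records]
    return {"common": common, "variants": variants}
```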

JSON output from DF

So I’m adding more capabilities to my sysinfo.py program. The next thing that I want to do is get a JSON result from df. This is a function whose description, from the man page, says “report file system disk space usage”.

Here is a sample of the output of df for one of my systems:

Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
udev                        1011376       4   1011372   1% /dev
tmpfs                        204092     288    203804   1% /run
none                           5120       0      5120   0% /run/lock
none                        1020452       0   1020452   0% /run/shm
/dev/sda1                    233191   50734    170016  23% /boot

So I started by writing a little Python program that used the subprocess.check_output() method to capture the output of df.

This went through various iterations and ended up as this single statement of Python code, which requires eleven lines of comments to explain it:

#
# this next line of code is pretty tense ... let me explain what
# it does:
# subprocess.check_output(["df"]) runs the df command and returns
#     the output as a string
# rstrip() trims off the trailing whitespace, including the final '\n'
# split('\n') breaks the string at the newline characters ... the
#     result is an array of strings
# the list comprehension then applies shlex.split() to each string,
#     breaking each into tokens
# when we're done, we have a two-dimensional array with rows of
# tokens and we're ready to make objects out of them
#
df_array = [shlex.split(x) for x in
            subprocess.check_output(["df"], universal_newlines=True)
            .rstrip().split('\n')]

My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization.

So I did some digging around and discovered that the Python json library includes a fairly sensible serialization method that supports prettyprinting of the result. The result was a much cleaner piece of code:

# df.py
#
# parse the output of df and create JSON objects for each filesystem.
#
# $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
#

# now let's parse the output of df to get filesystem information
#
# Filesystem                1K-blocks    Used Available Use% Mounted on
# /dev/mapper/flapjack-root 959088096 3799548 906569700   1% /
# udev                        1011376       4   1011372   1% /dev
# tmpfs                        204092     288    203804   1% /run
# none                           5120       0      5120   0% /run/lock
# none                        1020452       0   1020452   0% /run/shm
# /dev/sda1                    233191   50734    170016  23% /boot

import subprocess
import shlex
import json

def main():
    """Main routine - call the df utility and return a json structure."""

    # this next line of code is pretty tense ... let me explain what
    # it does:
    # subprocess.check_output(["df"]) runs the df command and returns
    #     the output as a string
    # rstrip() trims off the trailing whitespace, including the final '\n'
    # split('\n') breaks the string at the newline characters ... the
    #     result is an array of strings
    # the list comprehension then applies shlex.split() to each string,
    #     breaking each into tokens
    # when we're done, we have a two-dimensional array with rows of
    # tokens and we're ready to make objects out of them
    df_array = [shlex.split(x) for x in
                subprocess.check_output(["df"], universal_newlines=True)
                .rstrip().split('\n')]
    df_num_lines = len(df_array)

    df_json = {}
    df_json["filesystems"] = []
    for row in range(1, df_num_lines):
        df_json["filesystems"].append(df_to_json(df_array[row]))
    print(json.dumps(df_json, sort_keys=True, indent=2))
    return

def df_to_json(tokenList):
    """Take a list of tokens from df and return a python object."""
    # If df's output format changes, we'll be in trouble, of course.
    # the 0 token is the name of the filesystem
    # the 1 token is the size of the filesystem in 1K blocks
    # the 2 token is the amount used of the filesystem
    # the 5 token is the mount point
    result = {}
    fsName = tokenList[0]
    fsSize = tokenList[1]
    fsUsed = tokenList[2]
    fsMountPoint = tokenList[5]
    result["filesystem"] = {}
    result["filesystem"]["name"] = fsName
    result["filesystem"]["size"] = fsSize
    result["filesystem"]["used"] = fsUsed
    result["filesystem"]["mount_point"] = fsMountPoint
    return result

if __name__ == '__main__':
    main()

which, in turn, produces a rather nice df output in JSON.

{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/", 
        "name": "/dev/mapper/flapjack-root", 
        "size": "959088096", 
        "used": "3802632"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/dev", 
        "name": "udev", 
        "size": "1011376", 
        "used": "4"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run", 
        "name": "tmpfs", 
        "size": "204092", 
        "used": "288"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/lock", 
        "name": "none", 
        "size": "5120", 
        "used": "0"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/run/shm", 
        "name": "none", 
        "size": "1020452", 
        "used": "0"
      }
    }, 
    {
      "filesystem": {
        "mount_point": "/boot", 
        "name": "/dev/sda1", 
        "size": "233191", 
        "used": "50734"
      }
    }
  ]
}
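The parsing step can also be separated from the call to df, which makes it testable against a saved sample. The sketch below assumes each filesystem sits on a single line, as in the sample output (df -P, the POSIX output format, is meant to keep it that way); parse_df and SAMPLE are hypothetical names.

```python
import json


def parse_df(text):
    """Turn the text printed by df into a {"filesystems": [...]} structure."""
    filesystems = []
    for line in text.strip().splitlines()[1:]:   # [1:] skips the header row
        # maxsplit=5 keeps a mount point containing spaces in one piece
        name, size, used, _avail, _pct, mount_point = line.split(None, 5)
        filesystems.append({"filesystem": {
            "name": name,
            "size": size,
            "used": used,
            "mount_point": mount_point,
        }})
    return {"filesystems": filesystems}


# Two rows taken from the sample df output earlier in this post.
SAMPLE = """Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
/dev/sda1                    233191   50734    170016  23% /boot
"""

print(json.dumps(parse_df(SAMPLE), sort_keys=True, indent=2))
```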

Quite a lot of fun, really.

Automatic Inventory

Now I have four machines.  Keeping them in sync is the challenge.  Worse yet, knowing whether they are in sync or out of sync is a challenge.

So the first step is to make a tool to inventory each machine.  In order to use the inventory utility in a scalable way, I want to design it to produce machine-readable results so that I can easily incorporate them into whatever I need.

What I want is a representation that is friendly to both humans and computers.  This suggests a self-describing text representation like XML or JSON.  After a little thought I picked JSON.

What sorts of things do I want to know about the machine?  Well, let’s start with the hardware and the operating system software plus things like the quantity of RAM and other system resources.  Some of that information is available from uname and the rest is available from the sysinfo(2) function.
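As a rough illustration of the uname half, Python's standard platform module exposes much of the same information without spawning a process. This is a sketch written for Python 3 and is not the approach used by sysinfo.py below, which shells out to uname; uname_info is a hypothetical name and the key names simply mirror the attribute names used in this post.

```python
import json
import platform


def uname_info():
    """Collect uname-style fields into a dict ready for json.dumps."""
    u = platform.uname()
    return {
        "kernel_name": u.system,
        "nodename": u.node,
        "kernel_release": u.release,
        "kernel_version": u.version,
        "machine": u.machine,
    }


print(json.dumps(uname_info(), sort_keys=True, indent=2))
```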

To get the information from the sysinfo(2) function I had to do several things:

  • Install sysinfo on each machine
    • sudo apt-get install sysinfo
  • Write a little program to call sysinfo(2) and report out the results
    • getSysinfo.c

Of course this program, getSysinfo.c, is quick and dirty: the error handling is almost nonexistent, and I ought to have generalized the mechanism to work from a data structure that includes the name of the flag and the attribute name, rather than the clumsy sequence of if statements.

/*
 * getSysinfo.c
 *
 * $Id: getSysinfo.c,v 1.4 2014/08/31 17:29:43 marc Exp $
 *
 * Started 2014-08-31 by Marc Donner
 *
 * Using the sysinfo(2) call to report on system information
 *
 */

#include <stdio.h> /* for printf */
#include <stdlib.h> /* for exit */
#include <unistd.h> /* for getopt */
#include <sys/sysinfo.h> /* for sysinfo */

int showHelp(void); /* defined after main() */

int main(int argc, char **argv) {

   /* Call the sysinfo(2) system call with a pointer to a structure */
   /* and then display the results */
   struct sysinfo toDisplay;
   int rc;

   if ( (rc = sysinfo(&toDisplay)) != 0 ) {
      printf("  rc: %d\n", rc);
      exit(rc);
   }

   int c;
   int opt_a = 0;
   int opt_b = 0;
   int opt_f = 0;
   int opt_g = 0;
   int opt_h = 0;
   int opt_m = 0;
   int opt_r = 0;
   int opt_s = 0;
   int opt_u = 0;
   int opt_w = 0;
   int opt_help = 0;
   int opt_none = 1;

   while ( (c = getopt(argc, argv, "abfghmrsuw?")) != -1) {
      opt_none = 0;
      switch (c) {
         case 'a':
            opt_a = 1;
            break;
         case 'b':
            opt_b = 1;
            break;
         case 'f':
            opt_f = 1;
            break;
         case 'g':
            opt_g = 1;
            break;
         case 'h':
            opt_h = 1;
            break;
         case 'm':
            opt_m = 1;
            break;
         case 'r':
            opt_r = 1;
            break;
         case 's':
            opt_s = 1;
            break;
         case 'u':
            opt_u = 1;
            break;
         case 'w':
            opt_w = 1;
            break;
         case '?':
            opt_help = 1;
            break;
      }
   }

   if ( opt_none || opt_help ) {
      showHelp();
      return 100;
   } else {
      if ( opt_u || opt_a ) { printf("  \"uptime\": %ld\n", toDisplay.uptime); }
      if ( opt_r || opt_a ) { printf("  \"totalram\": %lu\n", toDisplay.totalram); }
      if ( opt_f || opt_a ) { printf("  \"freeram\": %lu\n", toDisplay.freeram); }
      if ( opt_b || opt_a ) { printf("  \"bufferram\": %lu\n", toDisplay.bufferram); }
      if ( opt_s || opt_a ) { printf("  \"sharedram\": %lu\n", toDisplay.sharedram); }
      if ( opt_w || opt_a ) { printf("  \"totalswap\": %lu\n", toDisplay.totalswap); }
      if ( opt_g || opt_a ) { printf("  \"freeswap\": %lu\n", toDisplay.freeswap); }
      if ( opt_h || opt_a ) { printf("  \"totalhigh\": %lu\n", toDisplay.totalhigh); }
      if ( opt_m || opt_a ) { printf("  \"mem_unit\": %u\n", toDisplay.mem_unit); }
      return 0;
   }
}

int showHelp() {
   printf( "Syntax: getSysinfo [options]\n" );
   printf( "\nDisplay results from the sysinfo(2) result structure\n\n" );
   printf( "Options:\n" );
   printf( " -a : all of the fields below\n" );
   printf( " -b : bufferram\n" );
   printf( " -f : freeram\n" );
   printf( " -g : freeswap\n" );
   printf( " -h : totalhigh\n" );
   printf( " -m : mem_unit\n" );
   printf( " -r : totalram\n" );
   printf( " -s : sharedram\n" );
   printf( " -u : uptime\n" );
   printf( " -w : totalswap\n\n" );
   printf( "getSysinfo also accepts arbitrary combinations of permitted options." );
   return 100;
}
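The table-driven design mentioned before the listing can be sketched briefly. This version is in Python rather than C, purely for brevity; FIELDS and report() are hypothetical names, and only the flag letters and attribute names are taken from getSysinfo.c.

```python
# (flag, attribute) pairs copied from the option letters in getSysinfo.c
FIELDS = [
    ("u", "uptime"), ("r", "totalram"), ("f", "freeram"),
    ("b", "bufferram"), ("s", "sharedram"), ("w", "totalswap"),
    ("g", "freeswap"), ("h", "totalhigh"), ("m", "mem_unit"),
]


def report(flags, values):
    """Return the output lines for the requested flags; 'a' selects all."""
    show_all = "a" in flags
    return ['  "%s": %s' % (name, values[name])
            for flag, name in FIELDS
            if show_all or flag in flags]
```

Adding a field then means adding one row to the table instead of a new variable, a new case, and a new if statement.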

And with this in place, the Python program sysinfo.py required to pull together various other bits and pieces becomes possible:

#
# sysinfo
#
# report a JSON object describing the current system
#
# $Id: sysinfo.py,v 1.8 2014/08/31 21:04:30 marc Exp $
#

from subprocess import check_output
import time

# First we get the uname information
#
# kernel_name : -s
# nodename : -n
# kernel_release : -r
# kernel_version : -v
# machine : -m
# processor : -p
# hardware_platform : -i
# operating_system : -o
#

operating_system = check_output( ["uname", "-o"] ).rstrip()
kernel_name = check_output( ["uname", "-s"] ).rstrip()
kernel_release = check_output( ["uname", "-r"] ).rstrip()
kernel_version = check_output( ["uname", "-v"] ).rstrip()
nodename = check_output( ["uname", "-n"] ).rstrip()
machine = check_output( ["uname", "-m"] ).rstrip()
processor = check_output( ["uname", "-p"] ).rstrip()
hardware_platform = check_output( ["uname", "-i"] ).rstrip()

# now we get the boot time using who -b
boot_time = check_output( ["who", "-b"] ).strip()

# now we get information from our handy-dandy getSysinfo program
GETSYSINFO = "/home/marc/projects/s/sysinfo/getSysinfo"
getsysinfo_uptime = check_output( [GETSYSINFO, "-u"] ).strip()
getsysinfo_totalram = check_output( [GETSYSINFO, "-r"] ).strip()
getsysinfo_freeram = check_output( [GETSYSINFO, "-f"] ).strip()
getsysinfo_bufferram = check_output( [GETSYSINFO, "-b"] ).strip()
getsysinfo_sharedram = check_output( [GETSYSINFO, "-s"] ).strip()
getsysinfo_totalswap = check_output( [GETSYSINFO, "-w"] ).strip()
getsysinfo_freeswap = check_output( [GETSYSINFO, "-g"] ).strip()
getsysinfo_totalhigh = check_output( [GETSYSINFO, "-h"] ).strip()
getsysinfo_mem_unit = check_output( [GETSYSINFO, "-m"] ).strip()

print "{"
print "  \"report_date\": \"" + time.strftime("%Y-%m-%d %H:%M:%S") + "\","
print "  \"operating_system\": " + "\"" + operating_system + "\","
print "  \"kernel_name\": " + "\"" + kernel_name + "\","
print "  \"kernel_release\": " + "\"" + kernel_release + "\","
print "  \"kernel_version\": " + "\"" + kernel_version + "\","
print "  \"nodename\": " + "\"" + nodename + "\","
print "  \"machine\": " + "\"" + machine + "\","
print "  \"processor\": " + "\"" + processor + "\","
print "  \"hardware_platform\": " + "\"" + hardware_platform + "\","
print "  \"boot_time\": " + "\"" + boot_time + "\","
print "  " + getsysinfo_uptime + ","
print "  " + getsysinfo_totalram + ","
print "  " + getsysinfo_freeram + ","
print "  " + getsysinfo_sharedram + ","
print "  " + getsysinfo_totalswap + ","
print "  " + getsysinfo_totalhigh + ","
print "  " + getsysinfo_freeswap + ","
print "  " + getsysinfo_mem_unit
print "}"
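
The print statements above assemble the JSON by hand, which is fine as long as no value ever contains a quote or a backslash. As a minimal sketch of an alternative, here is how the same report could be built with the standard json module doing the quoting. It is written for Python 3 rather than the Python 2 of the script above, and the helper names parse_fragment and build_report are made up for this illustration; they are not part of sysinfo.py.

```python
import json
from collections import OrderedDict


def parse_fragment(fragment):
    """Turn one getSysinfo output line, e.g. '"uptime": 2118958', into (key, value)."""
    # Wrapping the fragment in braces makes it a complete JSON object.
    obj = json.loads("{" + fragment.strip() + "}")
    (key, value), = obj.items()
    return key, value


def build_report(text_fields, fragments):
    """Combine (name, string) pairs and getSysinfo fragments into one JSON document."""
    report = OrderedDict(text_fields)
    for fragment in fragments:
        key, value = parse_fragment(fragment)
        report[key] = value
    # json.dumps takes care of commas, quoting, and escaping.
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    fields = [("nodename", "flapjack"), ("kernel_name", "Linux")]
    print(build_report(fields, ['  "uptime": 2118958', '  "mem_unit": 1']))
```

In a real version the fields and fragments would come from the same check_output calls as before; only the final assembly changes.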

This in turn enables the Makefile:

#
# Makefile for sysinfo
#
# $Id: Makefile,v 1.9 2014/08/31 21:27:35 marc Exp $
#

FORCE := force

HOST := $(shell hostname)
HOSTS := flapjack waffle pancake frenchtoast
SSH_FILES := $(HOSTS:%=.%_ssh)
PUSH_HOSTS := $(filter-out ${HOST}, ${HOSTS})
PUSH_FILES := $(PUSH_HOSTS:%=.%_push)

help: ${FORCE}
	cat Makefile

FILES := Makefile sysinfo.py sysinfo.bash getSysinfo.c

checkin: ${FILES}
	ci -l ${FILES}

install: ~/bin/sysinfo

~/bin/sysinfo: ./sysinfo.bash
	cp $< $@
	chmod +x $@

getSysinfo: getSysinfo.c
	cc $ $*.sysinfo
	touch $@

test: ${FORCE}
	time python sysinfo.py

force:

Notice the little trick with the Makefile variables HOST, HOSTS, SSH_FILES, PUSH_HOSTS, and PUSH_FILES that lets one host push to the others for distributing the code but lets it call on all of the hosts when gathering data.
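
For anyone who does not read GNU make substitution references fluently, here is a rough Python equivalent of the three derived variables, purely as an illustration. The function name make_host_lists is made up for this sketch; the host names are the ones from the Makefile.

```python
def make_host_lists(host, hosts):
    """Mirror the Makefile's substitution references and its filter-out call."""
    # SSH_FILES := $(HOSTS:%=.%_ssh)  -> one marker file per host, local host included
    ssh_files = [".{}_ssh".format(h) for h in hosts]
    # PUSH_HOSTS := $(filter-out ${HOST}, ${HOSTS})  -> every host except this one
    push_hosts = [h for h in hosts if h != host]
    # PUSH_FILES := $(PUSH_HOSTS:%=.%_push)  -> one marker file per remote host
    push_files = [".{}_push".format(h) for h in push_hosts]
    return ssh_files, push_hosts, push_files


if __name__ == "__main__":
    hosts = ["flapjack", "waffle", "pancake", "frenchtoast"]
    ssh_files, push_hosts, push_files = make_host_lists("flapjack", hosts)
    print(ssh_files)
    print(push_hosts)
    print(push_files)
```

Run on flapjack, the SSH list names all four hosts while the push lists name only the other three, which is exactly the asymmetry described above.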

With all of this machinery in place and distributed to all of the UNIX machines in my little network, I was now able to type ‘make ssh’ and get the resulting output:

marc@flapjack:~/projects/s/sysinfo$ more *.sysinfo
::::::::::::::
flapjack.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:30",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.2.0-52-generic",
  "kernel_version": "#78-Ubuntu SMP Fri Jul 26 16:21:44 UTC 2013",
  "nodename": "flapjack",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-08-07 22:01",
  "uptime": 2118958,
  "totalram": 2089889792,
  "freeram": 145928192,
  "sharedram": 0,
  "totalswap": 2134896640,
  "totalhigh": 0,
  "freeswap": 2062192640,
  "mem_unit": 1
}
::::::::::::::
frenchtoast.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:31",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.13.0-32-generic",
  "kernel_version": "#57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014",
  "nodename": "frenchtoast",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-07-19 14:58",
  "uptime": 3785970,
  "totalram": 16753840128,
  "freeram": 14150377472,
  "sharedram": 0,
  "totalswap": 17103319040,
  "totalhigh": 0,
  "freeswap": 17103319040,
  "mem_unit": 1
}
::::::::::::::
pancake.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:31",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.13.0-35-generic",
  "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014",
  "nodename": "pancake",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-08-31 09:06",
  "uptime": 91840,
  "totalram": 16753819648,
  "freeram": 15609884672,
  "sharedram": 0,
  "totalswap": 17104367616,
  "totalhigh": 0,
  "freeswap": 17104367616,
  "mem_unit": 1
}
::::::::::::::
waffle.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:30",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.13.0-35-generic",
  "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014",
  "nodename": "waffle",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-08-31 09:07",
  "uptime": 91784,
  "totalram": 16752275456,
  "freeram": 15594139648,
  "sharedram": 0,
  "totalswap": 17104367616,
  "totalhigh": 0,
  "freeswap": 17104367616,
  "mem_unit": 1
}

So now I have the beginning of a structured inventory of all of my machines, and an easy way to scale it up.
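
As a sketch of where this could go next, the *.sysinfo files can be read straight back in with the json module and summarized. This is an illustration under a few assumptions, not part of the tooling above: it is written for Python 3, it expects the files to sit in the current directory, and the names load_inventory and ram_gib are made up here.

```python
import glob
import json
import os


def load_inventory(directory="."):
    """Read every *.sysinfo file in directory and index the reports by nodename."""
    inventory = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.sysinfo"))):
        with open(path) as handle:
            report = json.load(handle)
        inventory[report["nodename"]] = report
    return inventory


def ram_gib(report):
    """Total RAM in GiB; sysinfo(2) gives memory sizes in units of mem_unit bytes."""
    return report["totalram"] * report["mem_unit"] / float(2 ** 30)


if __name__ == "__main__":
    for name, report in sorted(load_inventory().items()):
        print("{:<12} {:6.2f} GiB  up {:.1f} days".format(
            name, ram_gib(report), report["uptime"] / 86400.0))
```

Adding a machine then only means adding its name to HOSTS; the summary picks up the new file on the next run.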
