Automatic Inventory

Now I have four machines.  Keeping them in sync is a challenge.  Worse yet, just knowing whether they are in sync is a challenge.

So the first step is to make a tool to inventory each machine.  In order to use the inventory utility in a scalable way, I want to design it to produce machine-readable results so that I can easily incorporate them into whatever I need.

What I want is a representation that is friendly both to humans and to computers.  This suggests a self-describing text representation like XML or JSON.  After a little thought I picked JSON.

What sorts of things do I want to know about the machine?  Well, let’s start with the hardware and the operating system, plus things like the quantity of RAM and other system resources.  Some of that information is available from uname and some from the sysinfo(2) system call.

To get the information from the sysinfo(2) system call I had to do several things:

  • Install sysinfo on each machine
    • sudo apt-get install sysinfo
  • Write a little program to call sysinfo(2) and report out the results
    • getSysinfo.c

Of course this program, getSysinfo.c, is quick-and-dirty: the error handling is almost nonexistent, and I ought to have generalized the mechanism to work from a table that pairs each flag with its attribute name instead of the clumsy sequence of if statements.

/*
 * getSysinfo.c
 *
 * $Id: getSysinfo.c,v 1.4 2014/08/31 17:29:43 marc Exp $
 *
 * Started 2014-08-31 by Marc Donner
 *
 * Using the sysinfo(2) call to report on system information
 *
 */

#include <stdio.h> /* for printf */
#include <stdlib.h> /* for exit */
#include <unistd.h> /* for getopt */
#include <sys/sysinfo.h> /* for sysinfo */

int showHelp(void);

int main(int argc, char **argv) {

   /* Call the sysinfo(2) system call with a pointer to a structure */
   /* and then display the results */
   struct sysinfo toDisplay;
   int rc;

   if ( (rc = sysinfo(&toDisplay)) != 0 ) {
      printf("  rc: %d\n", rc);
      exit(rc);
   }

   int c;
   int opt_a = 0;
   int opt_b = 0;
   int opt_f = 0;
   int opt_g = 0;
   int opt_h = 0;
   int opt_m = 0;
   int opt_r = 0;
   int opt_s = 0;
   int opt_u = 0;
   int opt_w = 0;
   int opt_help = 0;
   int opt_none = 1;

   while ( (c = getopt(argc, argv, "abfghmrsuw?")) != -1) {
      opt_none = 0;
      switch (c) {
         case 'a':
            opt_a = 1;
            break;
         case 'b':
            opt_b = 1;
            break;
         case 'f':
            opt_f = 1;
            break;
         case 'g':
            opt_g = 1;
            break;
         case 'h':
            opt_h = 1;
            break;
         case 'm':
            opt_m = 1;
            break;
         case 'r':
            opt_r = 1;
            break;
         case 's':
            opt_s = 1;
            break;
         case 'u':
            opt_u = 1;
            break;
         case 'w':
            opt_w = 1;
            break;
         case '?':
            opt_help = 1;
            break;
      }
   }

   if ( opt_none || opt_help ) {
      showHelp();
      return 100;
   } else {
      if ( opt_u || opt_a ) { printf("  \"uptime\": %ld\n", toDisplay.uptime); }
      if ( opt_r || opt_a ) { printf("  \"totalram\": %lu\n", toDisplay.totalram); }
      if ( opt_f || opt_a ) { printf("  \"freeram\": %lu\n", toDisplay.freeram); }
      if ( opt_b || opt_a ) { printf("  \"bufferram\": %lu\n", toDisplay.bufferram); }
      if ( opt_s || opt_a ) { printf("  \"sharedram\": %lu\n", toDisplay.sharedram); }
      if ( opt_w || opt_a ) { printf("  \"totalswap\": %lu\n", toDisplay.totalswap); }
      if ( opt_g || opt_a ) { printf("  \"freeswap\": %lu\n", toDisplay.freeswap); }
      if ( opt_h || opt_a ) { printf("  \"totalhigh\": %lu\n", toDisplay.totalhigh); }
      if ( opt_m || opt_a ) { printf("  \"mem_unit\": %u\n", toDisplay.mem_unit); }
      return 0;
   }
}

int showHelp(void) {
   printf( "Syntax: getSysinfo [options]\n" );
   printf( "\nDisplay results from the sysinfo(2) result structure\n\n" );
   printf( "Options:\n" );
   printf( " -b : bufferram\n" );
   printf( " -f : freeram\n" );
   printf( " -g : freeswap\n" );
   printf( " -h : totalhigh\n" );
   printf( " -m : mem_unit\n" );
   printf( " -r : totalram\n" );
   printf( " -s : sharedram\n" );
   printf( " -u : uptime\n" );
   printf( " -w : totalswap\n\n" );
   printf( "getSysinfo also accepts arbitrary combinations of permitted options." );
   return 100;
}

And with this in place, the Python program sysinfo.py, which pulls together various other bits and pieces, becomes possible:

#
# sysinfo
#
# report a JSON object describing the current system
#
# $Id: sysinfo.py,v 1.8 2014/08/31 21:04:30 marc Exp $
#

from subprocess import check_output
import time

# First we get the uname information
#
# kernel_name : -s
# nodename : -n
# kernel_release : -r
# kernel_version : -v
# machine : -m
# processor : -p
# hardware_platform : -i
# operating_system : -o
#

operating_system = check_output( ["uname", "-o"] ).rstrip()
kernel_name = check_output( ["uname", "-s"] ).rstrip()
kernel_release = check_output( ["uname", "-r"] ).rstrip()
kernel_version = check_output( ["uname", "-v"] ).rstrip()
nodename = check_output( ["uname", "-n"] ).rstrip()
machine = check_output( ["uname", "-m"] ).rstrip()
processor = check_output( ["uname", "-p"] ).rstrip()
hardware_platform = check_output( ["uname", "-i"] ).rstrip()

# now we get the boot time using who -b
boot_time = check_output( ["who", "-b"]).rstrip().lstrip()

# now we get information from our handy-dandy getSysinfo program
GETSYSINFO = "/home/marc/projects/s/sysinfo/getSysinfo"
getsysinfo_uptime = check_output( [GETSYSINFO, "-u"] ).rstrip().lstrip()
getsysinfo_totalram = check_output( [GETSYSINFO, "-r"] ).rstrip().lstrip()
getsysinfo_freeram = check_output( [GETSYSINFO, "-f"] ).rstrip().lstrip()
getsysinfo_bufferram = check_output( [GETSYSINFO, "-b"] ).rstrip().lstrip()  # gathered but not printed below
getsysinfo_sharedram = check_output( [GETSYSINFO, "-s"] ).rstrip().lstrip()
getsysinfo_totalswap = check_output( [GETSYSINFO, "-w"] ).rstrip().lstrip()
getsysinfo_freeswap = check_output( [GETSYSINFO, "-g"] ).rstrip().lstrip()
getsysinfo_totalhigh = check_output( [GETSYSINFO, "-h"] ).rstrip().lstrip()
getsysinfo_mem_unit = check_output( [GETSYSINFO, "-m"] ).rstrip().lstrip()

print "{"
print "  \"report_date\": \"" + time.strftime("%Y-%m-%d %H:%M:%S") + "\","
print "  \"operating_system\": " + "\"" + operating_system + "\","
print "  \"kernel_name\": " + "\"" + kernel_name + "\","
print "  \"kernel_release\": " + "\"" + kernel_release + "\","
print "  \"kernel_version\": " + "\"" + kernel_version + "\","
print "  \"nodename\": " + "\"" + nodename + "\","
print "  \"machine\": " + "\"" + machine + "\","
print "  \"processor\": " + "\"" + processor + "\","
print "  \"hardware_platform\": " + "\"" + hardware_platform + "\","
print "  \"boot_time\": " + "\"" + boot_time + "\","
print "  " + getsysinfo_uptime + ","
print "  " + getsysinfo_totalram + ","
print "  " + getsysinfo_freeram + ","
print "  " + getsysinfo_sharedram + ","
print "  " + getsysinfo_totalswap + ","
print "  " + getsysinfo_totalhigh + ","
print "  " + getsysinfo_freeswap + ","
print "  " + getsysinfo_mem_unit
print "}"

With sysinfo.py in hand, the next piece is the Makefile:

#
# Makefile for sysinfo
#
# $Id: Makefile,v 1.9 2014/08/31 21:27:35 marc Exp $
#

FORCE := force

HOST := $(shell hostname)
HOSTS := flapjack waffle pancake frenchtoast
SSH_FILES := $(HOSTS:%=.%_ssh)
PUSH_HOSTS := $(filter-out ${HOST}, ${HOSTS})
PUSH_FILES := $(PUSH_HOSTS:%=.%_push)

help: ${FORCE}
	cat Makefile

FILES := Makefile sysinfo.py sysinfo.bash getSysinfo.c

checkin: ${FILES}
	ci -l ${FILES}

install: ~/bin/sysinfo

~/bin/sysinfo: ./sysinfo.bash
	cp $< $@
	chmod +x $@

getSysinfo: getSysinfo.c
	cc -o $@ $<

ssh: ${SSH_FILES}

.%_ssh: ${FORCE}
	ssh $* bin/sysinfo > $*.sysinfo
	touch $@

test: ${FORCE}
	time python sysinfo.py

force:

Notice the little trick with the Makefile variables HOST, HOSTS, SSH_FILES, PUSH_HOSTS, and PUSH_FILES: it lets one host push the code to all of the others for distribution, but call on all of the hosts, itself included, when gathering data.
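For example, when make runs on flapjack the variables expand to:

HOST       = flapjack
PUSH_HOSTS = waffle pancake frenchtoast
SSH_FILES  = .flapjack_ssh .waffle_ssh .pancake_ssh .frenchtoast_ssh
PUSH_FILES = .waffle_push .pancake_push .frenchtoast_push

so the gather targets touch every host while the push targets skip the local one.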

With all of this machinery in place and distributed to all of the UNIX machines in my little network, I was now able to type ‘make ssh’ and get the resulting output:

marc@flapjack:~/projects/s/sysinfo$ more *.sysinfo
::::::::::::::
flapjack.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:30",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.2.0-52-generic",
  "kernel_version": "#78-Ubuntu SMP Fri Jul 26 16:21:44 UTC 2013",
  "nodename": "flapjack",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-08-07 22:01",
  "uptime": 2118958,
  "totalram": 2089889792,
  "freeram": 145928192,
  "sharedram": 0,
  "totalswap": 2134896640,
  "totalhigh": 0,
  "freeswap": 2062192640,
  "mem_unit": 1
}
::::::::::::::
frenchtoast.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:31",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.13.0-32-generic",
  "kernel_version": "#57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014",
  "nodename": "frenchtoast",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-07-19 14:58",
  "uptime": 3785970,
  "totalram": 16753840128,
  "freeram": 14150377472,
  "sharedram": 0,
  "totalswap": 17103319040,
  "totalhigh": 0,
  "freeswap": 17103319040,
  "mem_unit": 1
}
::::::::::::::
pancake.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:31",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.13.0-35-generic",
  "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014",
  "nodename": "pancake",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-08-31 09:06",
  "uptime": 91840,
  "totalram": 16753819648,
  "freeram": 15609884672,
  "sharedram": 0,
  "totalswap": 17104367616,
  "totalhigh": 0,
  "freeswap": 17104367616,
  "mem_unit": 1
}
::::::::::::::
waffle.sysinfo
::::::::::::::
{
  "report_date": "2014-09-01 10:37:30",
  "operating_system": "GNU/Linux",
  "kernel_name": "Linux",
  "kernel_release": "3.13.0-35-generic",
  "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014",
  "nodename": "waffle",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware_platform": "x86_64",
  "boot_time": "system boot  2014-08-31 09:07",
  "uptime": 91784,
  "totalram": 16752275456,
  "freeram": 15594139648,
  "sharedram": 0,
  "totalswap": 17104367616,
  "totalhigh": 0,
  "freeswap": 17104367616,
  "mem_unit": 1
}

So now I have the beginning of a structured inventory of all of my machines, and an easy way to scale it up.
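As a first taste of that scaling, here is a sketch (hypothetical, not part of the toolkit yet) of a few lines of Python that fold the collected *.sysinfo files into a single comparison:

# compare kernel releases across the collected reports
import glob
import json

for path in sorted(glob.glob("*.sysinfo")):
    with open(path) as f:
        report = json.load(f)
    print report["nodename"], report["kernel_release"]

Run against the reports above, it would make it obvious at a glance that flapjack is still back on a 3.2 kernel while the other three are on 3.13.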

Log consolidation

Well, my nice DNS service with two secondaries and a primary is all well and good, but my logs are now scattered across three machines. If I want to play with the stats or diagnose a problem or see when something went wrong, I now have to grep around on three different machines.

Obviously I could consolidate the logs using syslog. That’s what it’s designed for, so why don’t I do that? Let’s see what I have to do to make it work properly:

  1. Set up rsyslogd on flapjack to properly stash the DNS messages
  2. Set up DNS on flapjack to log to syslog
  3. Set up the rsyslogd service on flapjack to receive syslog messages over the network
  4. Set up rsyslog on waffle to forward DNS log messages to flapjack
  5. Set up rsyslog on pancake to forward DNS log messages to flapjack
  6. Set up the DNS secondary configurations to use syslog instead of local logs
  7. Distribute the updates and restart the secondaries
  8. Test everything

A side benefit of using syslog to accumulate my DNS logs is that they’ll now be timestamped, so I can do more sophisticated data analysis if I ever get a Round Tuit.

Here’s the architecture of the setup I’m going to pursue:

[Figure: DNS syslog architecture, 2014-08-04]

So the first step is to set up the primary DNS server on flapjack to write to syslog.  This has several parts:

  • Declare a “facility” in syslog for DNS to write to.  For historical reasons (Hi, Eric!) syslog has a limited number of separate facilities that can accumulate logs.  The configuration file maps facilities to destinations, allowing the configuration master to do various clever filtering of the log messages that come in.
  • Tell DNS to log to the “facility”
  • Restart both bind9 and rsyslogd to get everything working.

The logging for Bind9 is specified in a file called /etc/bind/named.conf.local.  The default setup appends log records to a file named /var/log/named/query.log.

We’ll keep using that file for our logs going forward, since some other housekeeping knows about that location and no one else is intent on interfering with it.

The old logging stanza was:

logging {
    channel query.log {
        file "/var/log/named/query.log";
        severity debug 3;
    };
    category queries { query.log; };
};

What I want is this:

logging {
    channel query.log {
        syslog local6;
        severity debug 3;
    };
    category queries { query.log; };
};

This is because I have decided to use the facility named local6 for DNS.

In order to make the rsyslogd daemon on flapjack listen to messages from DNS, I have to declare the facility active.

The syslog service on flapjack is provided by a daemon called rsyslogd.  It’s an alternative to the other two mainstream syslog products, syslog-ng and sysklogd.  I picked rsyslogd because it comes as the standard logging service on Ubuntu 12.04 and 14.04, the distros I am using in my house.  You might call me lazy, you might call me pragmatic, but don’t call me late for happy hour.

In order to make rsyslogd do what I need, I have to take control of the management of two configuration files: /etc/rsyslog.conf and /etc/rsyslog.d/50-default.conf.  As is my wont, I do this by creating a project directory ~/projects/r/rsyslog/ with a Makefile and the editable versions of the two files under RCS control.  Here’s the Makefile:

#
# rsyslog setup file
#
# As of 2014-08-01 syslog host is flapjack
#
# $Id: Makefile,v 1.4 2014/08/02 12:11:52 marc Exp $
#

FORCE = force

TARGETS = /etc/rsyslog.conf /etc/rsyslog.d/50-default.conf

FILES = Makefile rsyslog.conf 50-default.conf

help: ${FORCE}
	cat Makefile

# sudo
/etc/rsyslog.conf: rsyslog.conf
	cp $< $@ 

/etc/rsyslog.d/50-default.conf: 50-default.conf
	cp $< $@ 

# sudo
push: ${TARGETS}

# sudo
restart: ${FORCE}
	service rsyslog restart

verify: ${FORCE}
	rsyslogd -c5 -N1

compare: ${FORCE}
	diff /etc/rsyslog.conf rsyslog.conf
	diff /etc/rsyslog.d/50-default.conf 50-default.conf

checkin: ${FORCE}
	ci -l ${FILES}

force:

Actually, this Makefile ends up in ~/projects/r/rsyslog/flapjack, since waffle and pancake will end up with different rsyslogd configurations and I separate the different control directories this way.

In order to log using syslog I need to add a rule for the local6 facility to the 50-default.conf file. The new line (the leading “-” tells rsyslogd not to sync the file after every write) looks like this:

local6.*	-/var/log/named/query.log

With a restart of each of the appropriate daemons, we’re off to the races and the new logs appear in the log file. I needed to change the ownership of /var/log/named/query.log from bind to syslog so that the new writer could actually write, but that was the work of a moment.
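For the record, the ownership change was a single command (assuming, as on my Ubuntu boxes, that rsyslogd writes as the syslog user):

sudo chown syslog /var/log/named/query.log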

Now comes the task of making the logs from the two secondary DNS servers go across the network to flapjack. This involved a lot of little bits and pieces.

First of all, I had to tell the rsyslogd daemon on flapjack to listen on the rsyslog UDP port. I could have turned on the more reliable TCP logging facility or the even more reliable queueing facility, but let’s get real. These are DNS query logs we’re talking about. I don’t really care if some of them fall on the floor. And anyway, the traffic levels on donner.lan are so low that I’d be very surprised if the loss rate were significant.

To turn on UDP listening on flapjack all I had to do was uncomment two lines in the /etc/rsyslog.conf file:

# provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514

One more restart of rsyslogd on flapjack and we’re good to go.
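To double-check that the daemon is actually listening, something like this should show rsyslogd bound to UDP port 514:

sudo netstat -ulnp | grep 514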

The next step is to make the DNS name service on waffle and pancake send their logs to the local6 facility. In addition, I had to set up rsyslog on waffle and pancake with a local6 facility, though this time the facility has to know to send the logs across to flapjack by UDP rather than writing locally.

The change to the named.conf.local file for waffle and pancake’s DNS secondary service was identical to the change to flapjack’s primary service, so kudos to the designers of bind9 and syslogd for good modularization.

To make waffle and pancake forward their logs over to flapjack required that the /etc/rsyslog.d/50-default.conf file define local6 in this way:

local6.*	@syslog

Notice that the @ tells rsyslogd to forward the matching messages via UDP. I could have put the IP address of flapjack right after the @, or I could have put in the name flapjack. Instead, I created a DNS entry for a service host named syslog … it happens to have the same IP address as flapjack, but it gives me a level of indirection should I ever desire to relocate the syslog service to another host.
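In the zone file that indirection is one extra record; a CNAME does the job (a sketch; an A record carrying flapjack’s address would work equally well):

syslog    IN    CNAME    flapjack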

With a restart of rsyslogd and bind9 on both waffle and pancake, we are up and running. All DNS logs are now consolidated on a single host, namely flapjack.
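An easy end-to-end test is to inject a message on one of the secondaries and watch it arrive on flapjack:

marc@waffle:~$ logger -p local6.info "end-to-end syslog test"
marc@flapjack:~$ tail -1 /var/log/named/query.log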

Waiting for the File Server

Well, I now have four different UNIX machines and I’ve been doing sysadmin tasks on all of them.  As a result I now have four home directories that are out of sync.

How annoying.

Ultimately I plan to create a file server on one of my machines and provide the same home directory on all of them, but I haven’t done that yet, so I need some temporary crutches to tide me over until I get the file server built. In particular, I need to find out what is where.

The first thing I did was establish trust among the machines, making flapjack, the oldest, into the ‘master’ trusted by the others.  This I did by creating an SSH key pair using ssh-keygen on the master and putting the public key in .ssh/authorized_keys on each of the other machines.
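In command terms that’s roughly this, where ssh-copy-id appends the new public key to the remote machine’s .ssh/authorized_keys:

marc@flapjack:~$ ssh-keygen -t rsa
marc@flapjack:~$ ssh-copy-id waffle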

Then I decided to automate the discovery of what directories were on which machine.  This is made easier because of my personal trick for organizing files, namely to have a set of top level subdirectories named org/, people/, and projects/ in my home directory. Each of these has twenty-six subdirectories named a through z, with appropriately named subdirectories under them. This I find helps me put related things together. It is not an alternative to search but rather a complement.

Anyway, the result is that I could build a Makefile that automates reaching out to all of my machines and gathering information. Here’s the Makefile:

# $Id: Makefile,v 1.7 2014/07/04 18:57:44 marc Exp marc $

FORCE = force

HOSTS = flapjack frenchtoast pancake waffle

FILES = Makefile

checkin: ${FORCE}
	ci -l ${FILES}

uname: ${FORCE}
	for h in ${HOSTS}; \
	   do ssh $$h uname -a \
	      | sed -e 's/^/'$$h': /'; \
	   done

host_find: ${FORCE}
	echo > host_find.txt
	for h in ${HOSTS}; \
		do ssh $$h find -print \
		| sed -e 's/^/'$$h': /' \
		 >> host_find.txt; done

clusters.txt: host_find.txt
	sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' host_find.txt \
	| uniq -c \
	| grep -v '^ *1 ' \
	> clusters.txt

force:

Ideally, of course, I’d get the list of host names in the variable HOSTS from my configuration database, but having neglected to build one yet, I am just listing my machines by name there.

The first important target, host_find, does an ssh to all of the machines, including itself, and runs find, prefixing the host name to each line so that I can determine which files exist on which machine. This creates a file named host_find.txt, which I can probably dispense with now that the machinery is working.

The second important target, clusters.txt, passes the host_find.txt output through a sed script. This script does a rather careful substitution, rewriting patterns like /org/z/zodiac/blah-blah-blah to /org/z/zodiac. The pipe through uniq -c then counts the number of identical path prefixes. That’s fine, but there are lots of subdirectories like /org/f that are empty and I don’t want them cluttering up my result, so the grep -v '^ *1 ' pipe segment excludes the lines with a count of 1.
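To make that concrete, a (made-up) host_find.txt line like

flapjack: ./org/g/gnu/emacs/lisp/subr.el

comes out of the sed substitution as

flapjack: ./org/g/gnu

and since find emits the contents of each directory together, all the identical prefixes land on adjacent lines, which is exactly what uniq -c needs in order to count them.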

The result of running that tonight is the following report:

      8 flapjack: ./org/c/coursera
    351 flapjack: ./org/s/studiopress
   3119 flapjack: ./org/g/gnu
   1312 flapjack: ./org/f/freedesktop
    293 flapjack: ./org/m/minecraft
      9 flapjack: ./org/b/brother
      2 flapjack: ./org/n/national_center_for_access_to_justice
   1168 flapjack: ./org/w/wordpress
      4 flapjack: ./projects/c/cron
     10 flapjack: ./projects/c/cups
      6 flapjack: ./projects/d/dhcp
     33 flapjack: ./projects/d/dns
     15 flapjack: ./projects/s/sysadmin
      5 flapjack: ./projects/f/ftp
      3 flapjack: ./projects/p/printcap
      8 flapjack: ./projects/p/programming
      8 flapjack: ./projects/t/tftpd
     35 flapjack: ./projects/n/netboot
      7 flapjack: ./projects/l/logrotate
      8 flapjack: ./projects/r/rolodex
    189 flapjack: ./projects/h/html5reset
      6 frenchtoast: ./projects/p/printcap
      5 frenchtoast: ./projects/c/cups
    380 pancake: ./org/m/minecraft
      3 pancake: ./projects/l/logrotate
     15 pancake: ./projects/d/dns
      9 pancake: ./projects/s/sysadmin
     11 waffle: ./projects/s/sysadmin
      8 waffle: ./projects/t/tftpd
     15 waffle: ./projects/d/dns
      3 waffle: ./projects/l/logrotate
    375 waffle: ./org/m/minecraft

And … voila! I have a map that I can use to figure out how to consolidate the many scattered parts of my home directory.

[2014-07-04 – updated the Makefile so that it is more friendly to web browsers.]

[2014-07-29 – a friend of mine critiqued my Makefile code and pointed out that gmake has powerful iteration functions of its own, eliminating the need for me to incorporate shell code in my targets. The result is quite elegant, I must say!]

#
# Find out what files exist on all of the hosts on donner.lan
# Started in June 2014 by Marc Donner
#
# $Id: Makefile,v 1.12 2014/07/30 02:07:07 marc Exp $
#

FORCE = force

# This ought to be the result of a call to the CMDB
HOSTS = flapjack frenchtoast pancake waffle

FILES = Makefile host_find.txt clusters.txt

#
# This provides us with the ISO 8601 date (YYYY-MM-DD)
#
DATE := $(shell /bin/date +"%Y-%m-%d")

help: ${FORCE}
	cat Makefile

checkin: ${FORCE}
	ci -l ${FILES}

# A finger exercise to ensure that we can see the base info on the hosts
HOSTS_UNAME := $(HOSTS:%=.%_uname.txt)

uname: ${HOSTS_UNAME}
	cat ${HOSTS_UNAME}

.%_uname.txt: ${FORCE}
	ssh $* uname -a | sed -e 's/^/:'$*': /' > $@

HOSTS_UPTIME := $(HOSTS:%=.%_uptime.txt)

uptime: ${HOSTS_UPTIME}
	cat ${HOSTS_UPTIME}

.%_uptime.txt: ${FORCE}
	ssh $* uptime | sed -e 's/^/:'$*': /' > $@

# Another finger exercise to verify the location of the ssh landing
# point home directory

HOSTS_PWD := $(HOSTS:%=.%_pwd.txt)

pwd: ${HOSTS_PWD}
	cat ${HOSTS_PWD}

.%_pwd.txt: ${FORCE}
	ssh $* pwd | sed -e 's/^/:'$*': /' > $@

# Run find on all of the ${HOSTS} and prefix mark all of the results,
# accumulating them all in host_find.txt

HOSTS_FIND := $(HOSTS:%=.%_find.txt)

find: ${HOSTS_FIND}

.%_find.txt: ${FORCE}
	echo '# ' ${DATE} > $@
	ssh $* find -print | sed -e 's/^/:'$*': /' >> $@

# Get rid of the empty directories and report the number of files in each
# non-empty directory
clusters.txt: ${HOSTS_FIND}
	cat ${HOSTS_FIND} \
	| sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' \
	| uniq -c \
	| grep -v '^ *1 ' \
	| sort -t ':' -k 3 \
	> clusters.txt

force:
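Because the per-host pattern rules depend on ${FORCE}, a single command now refreshes the whole map:

make clusters.txt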

Two Intel NUC servers running Ubuntu

A week or two ago I took the plunge and ordered a pair of Intel NUC systems. Here’s what happened next as I worked to build a pair of Ubuntu servers out of the hardware:

I ordered the components for two Linux servers from Amazon:

  • Intel NUC D54250WYK [$364.99 each]
  • Crucial M500 240 GB mSATA [$119.99 each]
  • Crucial 16GB Kit [$134.99 each]
  • Cables Unlimited 6-Foot Mickey Mouse Power Cord [$5.99 each]

for a total of $625.96 per machine. Because I have a structured wiring system in my apartment I didn’t bother with the wifi card.

Assembly was fast, taking ten or fifteen minutes to open the bottom cover, snap in the RAM and the SSD, and button the machine up again.

Getting Ubuntu installed was rather more work (on an iMac):

Download the Ubuntu image from the Ubuntu site.

Prepare a bootable USB with the server image (used diskutil to learn that my USB stick was on /dev/disk4):

  • hdiutil convert -format UDRW -o ubuntu-14.04-server-amd64.img ubuntu-14.04-server-amd64.iso
  • diskutil unmountDisk /dev/disk4
  • sudo dd if=ubuntu-14.04-server-amd64.img.dmg of=/dev/rdisk4 bs=1m
  • diskutil eject /dev/disk4

This then booted on the NUC, and the install went relatively smoothly.

However, after the installation was complete the system would not boot; it did not recognize the SSD as a boot device.

Did a little searching around and learned that I needed to update the BIOS on the NUC. Downloaded the updated firmware from the Intel site, following a YouTube video from Intel, and applied the new firmware.

Redid the install, which ultimately worked, after one more glitch. The second machine went more smoothly.

Two little Linux boxes now working quite nicely – completely silent, 16G of RAM on each, 240G SSD on each.

They are physically tiny … hard to overemphasize how tiny, but really tiny. They sit on top of my Airport Extreme access point and make it look big.

2014 Five Borough Bike Tour – I’m riding

The Five Borough Bike Tour is an annual event in which tens of thousands of New Yorkers ride 40 or 50 miles from lower Manhattan up through the Bronx, Queens, Brooklyn, and over the Verrazano Narrows Bridge to Staten Island.  For the last three years I’ve supported a wonderful organization called Bronxworks (http://bronxworks.org/) that helps families in need in The Bronx.  I ride with a number of friends, some of whom live in the Bronx, and all of whom have adopted this wonderful group.

I rode with the Bronxworks team in 2011 and 2012 but a conflict prevented me from riding in 2013, though I donated to support the rest of the team.  Fortunately for me I will be riding again this year.  If you want to contribute to Bronxworks in support of my ride you may visit my fundraising page http://www.crowdrise.com/BronxWorks2014BikeTour/fundraiser/marcdonner.  If you do so, I will be eternally grateful!


From the Editors: The Invisible Computers

[Originally published in the November/December 2011 issue (Volume 9 number 6) of IEEE Security & Privacy magazine.]

Just over a decade ago, shortly before we launched IEEE Security & Privacy, MIT Press published Donald Norman‘s book The Invisible Computer. At the time, conversations about the book focused on the opportunities exposed by his powerful analogies between computers and small electric motors as system components.

Today, almost everything we use has one or more computers, and a surprising number have so many that they require internal networks. For instance, a new automobile has so many computers in it that it has at least two local area networks, separated by a firewall, to connect them, along with interconnects to external systems. There’s probably even a computer in the key!

Medical device makers have also embraced computers as components. Implantable defibrillators and pacemakers have computers and control APIs. If it’s a computer, it must have some test facilities, and these, if misused, could threaten a patient’s health. Doctors who have driven these designs, focused entirely on saving lives, are shocked when asked about safeguards to prevent unauthorized abuse. It’s probably good that their minds don’t go that way, but someone (that’s you) should definitely be thinking that way.

In 2007, the convergence battle in the mobile telephone world was resolved with the iPhone. The iPhone’s launch ended the mad competition to add more surfaces and smaller buttons to attach more “features” to each phone. Ever after, a mobile phone would be primarily a piece of software. One button was enough. After that, it was software all the rest of the way down, and control of the technology’s evolution shifted from mechanical to software engineers.

By now, the shape of the computer systems world is beginning to emerge. No longer is the familiar computer body plan of a screen, keyboard, and pointing device recognizable. Now computers lurk inside the most innocuous physical objects, specialized in function but increasingly sophisticated in behavior. Beyond the computer’s presence, however, is the ubiquity of interconnection. The new generation of computers is highly connected, and this is driving a revolution in both security and privacy issues.

It isn’t always obvious what threats to security and privacy this new reality will present. For example, it’s now possible to track stolen cameras using Web-based services that scan published photographs and index them by metadata included in JPEG or TIFF files. Although this is a boon for theft victims, the privacy risks have yet to be understood.

The computer cluster that is a contemporary automobile presents tremendous improvements in safety, performance, and functionality, but it also presents security challenges that are only now being studied and understood. Researchers have identified major vulnerabilities and, encouragingly, report engagement from the automobile industry in acting to mitigate the documented risks.

Security and privacy practitioners and researchers have become comfortable working in the well-lit neighborhood of the standard computer system lamppost. However, the computing world will continue to change rapidly. We should focus more effort on the challenges of the next generations of embedded and interconnected systems.

This is my valedictory editor-in-chief message. I helped George Cybenko, Carl Landwehr, and Fred Schneider launch this magazine and have served as associate EIC ever since. In recent years, my primary work moved into other areas, and lately I have felt that I was gaining more than I was contributing. Thus, at the beginning of 2011, I suggested to EIC John Viega that I would like to step down as associate EIC and give him an opportunity to bring some fresh blood to the team. The two new associate EICs — Shari Lawrence Pfleeger and Jeremy Epstein — are both impressive experts and a wonderful addition. The magazine, and the community it serves, are in excellent hands.

From the Editors: Privacy and the System Life Cycle

[Originally published in the March/April 2011 issue (Volume 9 number 2) of IEEE Security & Privacy magazine.]

Engineering long-lived systems is hard, and adding privacy considerations to such systems makes the work harder.

Who may look at private data that I put online? Certainly I may look at it, plus any person I explicitly authorize. When may the online system’s operators look at it? Certainly when customer service representatives are assisting me in resolving a problem, they might look at the data, though I would expect them to get my permission before doing so. I would also expect my permission to extend only for the duration of the support transaction and to cover just enough data elements to allow the problem’s analysis and resolution.

When may developers responsible for the software’s evolution and maintenance look at my data? Well, pretty much never. The exception is when they’re called in during escalation of a customer service transaction. Yes, that’s right: developers may not, in general, look at private data contained in the systems that they have written and continue to support. In practice, it’s probably infeasible to make developer access impossible, but we should make it highly visible.

Doesn’t the code have a role in this? Of course it does, but the code isn’t generally created by the consumer and isn’t private. Insofar as consumers create code—and they do when they write macros, filters, and configurations for the system—it’s part of this analysis. The system life cycle and privacy implications of user-created code are beyond the current state of the art and merit significant attention in their own right.

So what happens when an online system is forced to migrate data from one version of the software to another version? This happens periodically in the evolution of most long-lived systems, and it often involves a change to the underlying data model. How do software engineers ensure that the migration is executed correctly? They may not spot-check the data, of course, because it’s private. Instead, they build test datasets and run them through the migration system and carefully check the results. But experienced software engineers know very well that test datasets are generally way too clean and don’t exercise the worst of the system. Remember, no system can ever be foolproof because fools are way too clever. So we must develop tests that let us verify that data migration has been executed properly without being able to examine the result and spot-check it by eye. Ouch.

What’s the state of the art with respect to this topic? Our community has produced several documents that represent a start for dealing with private data in computer systems. By and large, these documents focus on foundational issues such as what is and isn’t private data, how to notify consumers that private data will be gathered and held, requirements of laws and regulations governing private data, and protecting private data from unauthorized agents and uses.

Rules and regulations concerning privacy fall along a spectrum. At one end are regulations that attempt to specify behavior to a high level of detail. These rules are well intended, but it’s sometimes unclear to engineers whether compliance is actually possible. At the other end are rules such as HIPAA (Health Insurance Portability and Accountability Act) that simply draw a bright line around a community of data users that comprise doctors, pharmacies, labs, insurers, and their agents and forbid any data flow across that line. HIPAA provides few restrictions on the handling or use of this data within that line. Of course, one irony with HIPAA is that the consumer is outside the line.

Given the current state of engineering systems for online privacy, regulations like HIPAA are probably better than heavy-handed attempts to rush solutions faster than the engineering community can figure out feasibility limits.

This is an important area of work, and some promising research is emerging, such as Craig Gentry’s recent PhD thesis on homomorphic encryption (http://crypto.stanford.edu/craig/craig-thesis.pdf), but full rescue looks to be years off. We welcome reports from practitioners and researchers on approaches to the problem of maintaining data that may not be examined.