Disk Space or DNS

Whenever I have a client call me up to tell me that a service that’s been working fine for ages has suddenly stopped, two prime culprits immediately spring to mind. One is very subtle, the other not so much.

When first logging in to the afflicted system, I run `df -h` to check available disk space on all volumes. It’s amazing how many of these incidents come down to a disk running out of space. Usually, a monitoring system warns when disk usage hits 90%. Sometimes, though, that last 10% can be churned through very quickly, catching even the monitoring service off guard. In a number of cases, I’ve had monitoring systems miss a full disk because the full disk itself stopped the agent from accurately reporting the current state (a topic for another entry). Bottom line: check your disk space.
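If you want something a little more pointed than eyeballing `df -h`, a few lines of shell will flag the volumes that are in trouble. This is only a sketch: the function name `full_volumes` and the 90% default are my own choices for illustration, and it assumes `df -P` (POSIX output, one line per filesystem) and mount points without spaces.

```shell
# Print mount point and usage for every filesystem at or above a threshold.
# Reads `df -P` style output on stdin, so it also works on saved output.
# Argument: threshold percentage (defaults to 90).
full_volumes() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5                  # capacity column, e.g. "95%"
            sub(/%$/, "", use)        # strip the trailing percent sign
            if (use + 0 >= limit) print $6, $5
        }'
}
```

Run it as `df -P | full_volumes 90`; no output means nothing is at or above the threshold.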

DNS issues are much more subtle and can manifest themselves in different ways. Often, you’ll see a sudden decrease in performance for some inexplicable reason. Things that used to work suddenly break. In many cases, this is because systems that refer to hosts by name can either no longer find those hosts or there’s a sudden delay in the process.

The `host` command is your friend here:

chris@utc-pdc1:~$ host www.dollmont.net
www.dollmont.net is an alias for dollmont.net.
dollmont.net has address 216.86.156.11

This tells me that www.dollmont.net points to dollmont.net, which in turn has an IP address of 216.86.156.11.

chris@utc-pdc1:~$ host 216.86.156.11
11.156.86.216.in-addr.arpa domain name pointer server1.fusednetwork.com.

Running `host` against the IP address tells me whether reverse DNS is working. In this case, it’s pointing me at a Fused Network server, which is correct: Fused Network does my site hosting. In a perfect world, I would have a DNS PTR record for the IP address that pointed back to dollmont.net, but this is more than good enough. Forward and reverse DNS both resolve.

If there’s an error in either one of these checks, it means that DNS is broken somewhere down the line. Troubleshooting that can be a nightmare, and is beyond the scope of this article.
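The two checks above can be strung together in a small script. Treat this as a rough sketch rather than a finished tool: the helper names (`first_address`, `ptr_name`, `check_dns`) are made up for this example, and the parsing assumes `host` output in the format shown above.

```shell
# Extract the first IPv4 address from `host NAME` output (read on stdin).
first_address() {
    awk '/ has address / { print $NF; exit }'
}

# Extract the PTR target from `host IP` output (read on stdin),
# dropping the trailing dot of the fully qualified name.
ptr_name() {
    awk '/ domain name pointer / { n = $NF; sub(/\.$/, "", n); print n; exit }'
}

# Run the forward lookup, then the reverse lookup on whatever it returned.
check_dns() {
    name="$1"
    ip=$(host "$name" | first_address)
    if [ -z "$ip" ]; then
        echo "forward lookup FAILED for $name"
        return 1
    fi
    ptr=$(host "$ip" | ptr_name)
    if [ -z "$ptr" ]; then
        echo "reverse lookup FAILED for $ip"
        return 1
    fi
    echo "$name -> $ip -> $ptr"
}
```

`check_dns www.dollmont.net` prints the name, address and PTR target on one line when both lookups resolve, and returns non-zero when either one fails.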

Disk space and DNS are the first two things you should look at whenever things start behaving in a wildly different manner. Often the problem will be one or the other and usually involves a quick fix.

The perils of bit rot…

Bit rot is a concept familiar to a lot of people. Usually, it refers to the slow degradation of data storage. Magnetic media can, over time, lose their magnetization in places, leaving holes in the data. Lose a few bits from a file and the file can become useless.

But there’s another form of bit rot that is just as insidious and can be far more dangerous. It’s the bit rot that comes from keeping old systems online and in production with no plan for an upgrade path. Let enough time pass and it could become impossible to upgrade the system.

I have a client that has a production system running an older version of Linux. The version of this particular distro is so old that file repositories are no longer maintained and security patches are no longer offered. Furthermore, this particular server is running a commercial piece of software that has been end-of-life’d by the vendor. Not only does the vendor no longer support the software, it’s impossible to get distribution media for the software. If this system were to blow up somehow, the client is basically screwed. We cannot reload the OS because we no longer have the original install media. We cannot reload the application for the same reason. The vendor might offer an upgrade to the software or we could just pay for the package all over again, but there’s no guarantee that the latest version will be backward-compatible with the existing system. Like I said—screwed.

The good news in this case is that we were able to uplift the entire system into a virtual machine. We can now back up the virtual machine image. In a worst case scenario, we can recover from backup and restore the virtual machine to be the same as the current hardware system. The process of uplifting such an old system was not trivial, and we were lucky that we were able to get it done. The client is now “protected” much more than he would have been had he continued down the same path.

Whenever you put a system in place, you need to plan for the eventual upgrade of that system. The reality is that software, and especially operating systems like Linux, matures. You need to plan for that. Distribution choice matters. Any Ubuntu LTS release is supported for 5 years. Commercial distributions like Red Hat and their open derivatives like CentOS have long term support options available, too. Using Open Source tools makes for a more favourable upgrade path.

Don’t let systemic bit rot happen to you…

Website optimization…

I’ve picked up 3 short contracts in the last week to optimize websites. The real goal here is not so much optimizing the website as it is optimizing the web server. In every case, the web server was Apache, the most popular and widely used web server on the Internet.

Invariably, the client has set up a server using Ubuntu server edition or CentOS or some other Linux distribution. The base server choice is invariably a good one, but problems start to appear after going live and coming under load. These problems occur because the sysadmin who set up the server ignored a few simple rules:

  1. Package managers are good for setting up generic systems. For small loads, packaged Apache is fine. But once you start to play with the big boys, you must be prepared to build Apache from scratch. Apache needs to be optimized for your hardware and stripped of cruft that you don’t need.
  2. Apache is a powerful server and is great for serving dynamic content. But if you have a lot of traffic, it can be overwhelmed trying to do too many things at once. Split the tasks and have a specialized, light-weight server like lighttpd serve up your static content and leave just the heavy lifting for Apache.
  3. Optimize your database, too. Don’t focus on just the webserving. If you’re getting data from MySQL, look carefully at how it’s set up. I’ll be covering this as a separate note soon.
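
Before rebuilding anything, it can help to see what the packaged Apache is actually loading. A rough sketch, assuming `apachectl -M` is available (Apache 2.2 or later) and prints one module per line with `(static)` or `(shared)` after the name; `shared_modules` is just an illustrative helper name:

```shell
# List the modules Apache loads dynamically. These are the ones that can be
# switched off in the config without a rebuild, so they are a reasonable
# first place to look for cruft.
# Reads `apachectl -M` output on stdin so it can be run against saved output.
shared_modules() {
    awk '$2 == "(shared)" { print $1 }'
}
```

On the web server itself, `apachectl -M 2>/dev/null | shared_modules` gives the list to review.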

In short, you need to look at all of the processes and individual components. Often, when I’m called in, decisions such as hardware and server software have already been made and I’m required to work within those constraints. But even with hands tied, there are a lot of things that can be done to improve website performance.

Admin confusion and sma on OpenSolaris

When coming from a Linux world to OpenSolaris, there’s a wee bit of retraining required. The latest example I came across is configuring SNMP on OpenSolaris.

SNMP services on OpenSolaris are provided by a service called sma. To get SNMP working on OpenSolaris:

$ pfexec pkg install SUNWsmmgr

$ pfexec svcadm enable sma

The configuration file is net-snmp standard stuff, but it’s concealed in /etc/sma/snmp/snmpd.conf. After changing this file, restart sma:

$ pfexec svcadm restart sma

OpenSolaris will now respond as normal to snmpwalk requests from your monitoring system.
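
As a quick check from the monitoring side, something like the following should work. This is a sketch: the community string `public`, the host name `nas.example.com` and the helper name `check_snmp` are placeholders, and it assumes the net-snmp `snmpwalk` client is installed on the monitoring host and that snmpd.conf contains a matching read-only community (a `rocommunity` line).

```shell
# Walk the standard "system" subtree of a host over SNMP v2c.
# Arguments: target host, then community string (defaults to "public").
check_snmp() {
    target="$1"
    community="${2:-public}"
    snmpwalk -v 2c -c "$community" "$target" system
}
```

Calling `check_snmp nas.example.com` should return the system description, uptime and contact details if sma is answering; a timeout usually means the community string or the access restrictions in snmpd.conf don’t match.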

By the way, pfexec is the OpenSolaris equivalent (roughly) to sudo on Linux. It allows you to execute commands as root. The first account created, by default, is allowed to execute pfexec against all OpenSolaris commands without requiring a password. A future note will tell you how to change that…

Windows—Vista, 7 in a Unix world.

I’ve been working with Windows 7 on a couple of laptops for the last few weeks, and I have to say I’m liking it quite a bit. I also run Vista on my main desktop, so I can run some comparisons. The user experience on Windows 7 is much better. I know you’d like to hear that it’s more stable than Vista but I can honestly say I’ve not had stability issues with Vista.

Every day I spend considerable time with Linux and Solaris. Unix is my job and it’s been my career for 20 years. I have laptops that run Linux for a lot of the work I do on the road. I have a NAS box running Solaris. I have a desktop running Linux and MythTV. People ask why I feel the need to run Windows at all.

I have kids. They like games. There are some games I enjoy playing. The main desktop is a Windows machine just for the gaming experience.

I have clients. Some run Windows. There are, from time to time, sound business reasons to be running Windows…

OK. I’ve gone back and re-read that last sentence 4 or 5 times. I can’t keep a straight face. Let me rephrase:

There are, from time to time, clients who convince themselves that there are sound business reasons to be running Windows. I have to work with these folks. One of my primary areas of expertise is integrating Open Source solutions with Windows. LDAP into Active Directory. Unix based CIFS servers as primary data storage on Windows networks. Postfix as a gateway device into Exchange. That sort of thing. I have to be familiar with Windows or my job is harder than it needs to be.

Some Linux fanboys don’t get it. They speak out against Microsoft at every opportunity. They deride Microsoft’s products. They refuse to have anything at all to do with Windows. They have their reasons, of course, but I question the validity of those reasons. To completely exclude yourself from working with something simply because it goes against your grain limits your options. Better to know your enemy and know the ways in which you can make it better.

Good, fast, cheap—pick…all three?!?

It’s long been a saying in this business: You can have it good, fast and cheap. Pick any two. That’s beginning to change on many levels, and the changes are starting to ripple outward.

Sun Fishworks aims to bring large volume NAS and SAN storage down to commodity pricing. They’re combining a lot of their different technologies and playing on all their strengths—most notably, Solaris. By combining the power of Solaris with low cost hardware, Sun is challenging companies like NetApp. With any luck, they’ll be able to pull it off from a marketing standpoint—an area of weakness at Sun.

The Fishworks philosophy is a great one—do more with less. I brought this home by building a NAS server for the house based around OpenSolaris and the MSI Wind PC.

OpenSolaris is installed on a 4 GB Compact Flash card that sits on the Wind PC’s motherboard. There are two 500 GB hard drives: one in the hard drive bay and one in the optical drive bay via a 5.25-inch to 3.5-inch adapter. There is no optical drive. In this configuration you can install OpenSolaris via a USB optical drive or, like I did, via a USB thumb drive.

Once installed, the two disks are placed into a ZFS pool and, in my case, mirrored. It’s amazing how flexible and easy ZFS is to manage, particularly with the power it gives you. CIFS is now an in-kernel driver on OpenSolaris and managed via ZFS settings. Sharing volumes and directories is easy. Even NFS sharing for the Linux boxes on my network is a no-brainer.
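
For a rough idea of what that looks like at the command line, here is a sketch. The pool name `tank`, the dataset name `share`, and the device names `c7d0` and `c8d0` are placeholders (use `format` to find your own disks), and the `run` wrapper is just a helper that prints each command unless `APPLY=1` is set, since `zpool create` is destructive. It also assumes the CIFS server packages and service are already installed and running.

```shell
# Print each command by default; only execute it (via pfexec) when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        pfexec "$@"
    else
        echo "pfexec $*"
    fi
}

# Create a two-disk mirrored pool with one dataset shared over CIFS and NFS.
# Arguments: pool name, first disk, second disk.
setup_mirror() {
    pool="$1"
    disk_a="$2"
    disk_b="$3"
    run zpool create "$pool" mirror "$disk_a" "$disk_b"
    run zfs create "$pool/share"
    run zfs set sharesmb=on "$pool/share"   # CIFS, for the Windows machines
    run zfs set sharenfs=on "$pool/share"   # NFS, for the Linux boxes
}
```

`setup_mirror tank c7d0 c8d0` prints the four commands for review; rerun it with `APPLY=1` in the environment once the device names are confirmed.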

I’m still tweaking and testing system performance, but once I’m done I’ll do a full writeup on the system. So far, it’s good, fast and cheap. Not bad at all…

OpenSolaris on Lenovo Thinkpad T61

I’ve made attempts at installing Solaris on my Thinkpad in the past, with mixed success. I’m happy to report that, as of OpenSolaris 2008.11, the installation is straightforward and almost everything works out of the box.

Go ahead and follow the procedures for installing OpenSolaris. Once the installation is complete, do a software update:

$ pfexec pkg image-update -v

The pfexec command raises your privileges to root, like sudo in Linux. The -v part of the command will give you feedback on how everything is going. Once you’re done, you’ll have an entirely new boot image, courtesy of ZFS and snapshots. Next time you boot, your updated OS will be the default option in GRUB, and you’ll still be able to boot back into your previous, unpatched OS.

To enable suspend/resume, edit /etc/power.conf:

$ pfexec vi /etc/power.conf

At the bottom, insert:

S3-support enable

Save the file and run:

$ pfexec pmconfig

On the next reboot, Suspend will be added as an option to the shutdown menu. Unlike other systems, you’ll have to press the power button to resume. In your Gnome Power Preferences, you’ll now be able to select Suspend as an option for “When laptop lid is closed:”

Wireless, including support for WPA, works out of the box. So does the nVidia card, including 3D acceleration and Compiz Fusion. The desktop is beautiful and performance is outstanding.

The only thing I’ve not got working yet is the softkeys for the volume controls. I’ll post an update when I get that sorted.

Sometimes, simple tools are the best…

Not just better, mind you, but the best.

When you’re forced to work on a Unix infrastructure on a Windows desktop, the very first tool you want is an ssh client. I’ve long used PuTTY for this. It’s an excellent tool and has all the bells and whistles that I need: certificate management, port forwarding, configurable terminal, screen support, etc. It has a couple of quirks that I don’t like, the most notable being that it stores its saved settings in the registry so it’s difficult to move saved settings from one machine to another. That’s not a show stopper, but as I’ve expanded my Windows work lately it became a real annoyance. I’ve been configuring multiple Windows machines for sys admin work and having to reenter the servers on every machine is a pain.

I took a look around at the state of Windows ssh tools and found a couple that looked really nice. I played around with them and enjoyed things like separate configuration files that are easily transportable. Some had built in scp clients. All were good, solid tools.

But I’m back to PuTTY. Despite not being updated since early 2007, PuTTY is still a solid tool. Its memory footprint is small. It’s easy to install. It’ll run in a standalone USB key environment. There’s now a Linux version, though I’ve not used it. It just works. Every time. With no futzing about. It’s old, stodgy and very, very reliable. It is the best ssh client for Windows.

Two additional points: if you need scp, the PuTTY scp client works, but it’s not got a lot of features. Try WinSCP instead. Secondly, when is Microsoft going to get with the program and put an ssh client directly into the Windows shell? Being able to run `ssh -L 8080:remote:8080 -X somemachine.com` directly from the command line is, for me, more intuitive and easier to do. One of the drawbacks of being a crotchety old Unix admin.