Wednesday, September 28, 2005

Web 2.0 and Puppet

Here's a letter I sent to O'Reilly Radar about my upcoming attendance at the Web 2.0 conference:

I just announced the beta release of my software startup's main product, Puppet (http://reductivelabs.com/projects/puppet), a GPLed configuration management solution written in Ruby. (No, this email isn't a pitch; I want to ask two questions related to Web 2.0.) At this point Puppet is analogous to cfengine, although I believe I've created a significantly superior product, especially Puppet's language. I wrote a couple of cfengine articles for onlamp.com last year, spent three years doing cfengine consulting, and spent four months trying (and failing) to rewrite cfengine's parser, so I know cfengine and its language pretty well.

I'm writing to O'Reilly Radar about Puppet because I plan to significantly develop the Puppet client into a kind of Puppet mesh network. I'm convinced this would add value analogous to what Web 2.0 sites add, but I can't quite pin down what that value is.

I'm attending the Web 2.0 conference in October, and I'd like to arrive with at least a few extra contacts so that my time in the hallway track (as it's called at LISA) is more productive. I'd especially love to start a dialog about this idea beforehand, so that I get the most out of the conference itself.

So, what am I thinking of? Here are some of the important aspects of the setup:

* Each Puppet daemon will model the entire configuration of the server on which it runs, using higher-level elements like packages, services, and files. Beyond the elements themselves, the client will also model all of the relationships between them: if a service requires a file, the client will know it.

* Each daemon will also do significant monitoring and record-keeping on the client, and will have information available on the work it has done: which packages it has installed or upgraded, which files needed their permissions or ownership fixed, which services had to be restarted, and so on.

* Daemons will eventually also be able to model relationships between different servers, although that probably won't happen until 2.0. At that point we'll have a mesh capable of modeling not just a single host's configuration but that of the entire network, hopefully including all of the interrelationships between hosts and the services on them. We'll also have historical information about what things previously looked like and what we had to do to keep the configuration correct.
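To make the client-side model a bit more concrete, here's a minimal Python sketch of what a daemon might keep in memory. It's an illustration under my own assumptions, not Puppet's actual implementation. The `Element` and `ClientModel` names are hypothetical, and relationships are stored as a simple "requires" map. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) orders elements so that anything a service requires is handled before the service itself.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Element:
    """A configuration element, e.g. Element("service", "apache") (hypothetical type)."""
    kind: str
    name: str


@dataclass
class ClientModel:
    """Hypothetical per-host model: elements, their relationships, and a work log."""
    requires: dict = field(default_factory=dict)  # Element -> set of Elements it needs
    history: list = field(default_factory=list)   # (Element, action) records

    def add(self, element, requires=()):
        # Record the element and make sure each dependency is known too.
        self.requires.setdefault(element, set()).update(requires)
        for dependency in requires:
            self.requires.setdefault(dependency, set())

    def apply_order(self):
        # TopologicalSorter takes node -> predecessors, so dependencies come first.
        return list(TopologicalSorter(self.requires).static_order())

    def record(self, element, action):
        # Keep a history of the work done, e.g. "upgraded" or "restarted".
        self.history.append((element, action))


config_dir = Element("file", "/etc/apache")
package = Element("package", "apache")
service = Element("service", "apache")

model = ClientModel()
model.add(service, requires=[config_dir, package])  # the service needs its files and package
model.record(package, "upgraded")
model.record(service, "restarted")

for element in model.apply_order():
    print(element.kind, element.name)
print(model.history)
```

The service always comes out after both the file and the package. The history list is the raw material for the "what work has it done" reporting.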

How can we throw some Web 2.0 goodness into the picture? I'm not really sure myself. That's partly because the definition of Web 2.0 isn't very clear, but it does seem to imply humans visiting websites, and here we have neither humans nor websites. So we first have to ask whether it even makes sense to talk about Web 2.0 without those two key features. Or rather, we should ask whether the general principles of Web 2.0 extend beyond the web and into general connectedness.

I think they do. Let's take a simple example: Puppet has classing capabilities, in which you collect objects and give the collection a name:


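define apache {
    service { apache: running => true }    # keep the apache service running
    package { apache: install => latest }  # keep the apache package at the latest version
    file { "/etc/apache":                  # copy the whole config tree from the server
        source => "puppet://server/source",
        recurse => true
    }
}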

It makes sense to write the configuration in this kind of hierarchical style, but we also want to get more out of the configuration than a simple hierarchy. So let's take the Flickr route and consider each of these elements to be tagged with 'apache'. We might also tag each one with the name of every server to which the 'apache' definition is applied. This doesn't seem too useful to start with, but now extend it all the way up, so that every element on a system is tagged with each class or definition that includes it. Because both classes and definitions can be nested, these tag sets could get pretty big:


import "apache" # the definition above

class webserver {
    apache {}
}

case $hostname {
    culain: { webserver {} }
}


This would result in each of the objects in the apache definition getting tagged with 'apache', 'webserver', and 'culain'.
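Here's a rough Python sketch of that tag propagation. It assumes a made-up representation: a class or definition is a dict with a `name` and `contents`, and a leaf element is a `(type, title)` tuple. The function names are hypothetical. The point is only that each element inherits the name of every container above it, plus the host it lands on.

```python
# Hypothetical in-memory versions of the manifests above.
APACHE = {"name": "apache", "contents": [
    ("service", "apache"),
    ("package", "apache"),
    ("file", "/etc/apache"),
]}
WEBSERVER = {"name": "webserver", "contents": [APACHE]}


def tag_elements(container, inherited=frozenset()):
    """Tag every leaf element with the names of all enclosing containers."""
    tags = inherited | {container["name"]}
    result = {}
    for item in container["contents"]:
        if isinstance(item, dict):  # nested class or definition: recurse
            for element, element_tags in tag_elements(item, tags).items():
                result.setdefault(element, set()).update(element_tags)
        else:  # leaf element such as ("service", "apache")
            result.setdefault(item, set()).update(tags)
    return result


def tag_host(hostname, classes):
    """Apply classes to a host, seeding every element's tags with the hostname."""
    tagged = {}
    for cls in classes:
        for element, tags in tag_elements(cls, frozenset({hostname})).items():
            tagged.setdefault(element, set()).update(tags)
    return tagged


tagged = tag_host("culain", [WEBSERVER])
for element, tags in sorted(tagged.items()):
    print(element, sorted(tags))
```

Running it prints each of the three apache elements with the tags `apache`, `culain`, and `webserver`.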

Let's go one better, though. This is only marginally interesting on one client, so let's extend it to the whole network and normalize all configuration elements across it. We apply the same tag-and-flatten to every element on every node (ignore for now where this happens, whether on a central server or elsewhere). Continuing our example, you now have a single apache package element (or maybe one for each major revision). It is tagged with every host that has apache installed, along with a tag for every class or definition that refers to an apache package.
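Here's a minimal sketch of the network-wide flattening, again in Python and again under my own assumptions. Each node is assumed to report `{element: tags}`, and elements are keyed by `(type, title)`; adding a version field to the key would give one entry per major revision instead. The hostname `hostb` and the class `devbox` are invented for the example.

```python
from collections import defaultdict


def flatten_network(per_host):
    """Merge every node's tagged elements into one network-wide index.

    per_host maps hostname -> {element: set of class/definition tags}.
    The result holds exactly one entry per element, however many hosts have it.
    """
    index = defaultdict(lambda: {"hosts": set(), "tags": set()})
    for hostname, tagged in per_host.items():
        for element, tags in tagged.items():
            entry = index[element]
            entry["hosts"].add(hostname)  # every host that has this element
            entry["tags"].update(tags)    # every class/definition that refers to it
    return dict(index)


# Hypothetical reports from two nodes; "hostb" and "devbox" are invented names.
per_host = {
    "culain": {
        ("package", "apache"): {"apache", "webserver"},
        ("service", "apache"): {"apache", "webserver"},
    },
    "hostb": {
        ("package", "apache"): {"apache", "devbox"},
        ("package", "ruby"): {"devbox"},
    },
}

index = flatten_network(per_host)
for element, entry in sorted(index.items()):
    print(element, sorted(entry["hosts"]), sorted(entry["tags"]))
```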

Now take this tagged-and-flattened list, make it available to every node, and add some CLI tools to access it. You can then connect to any machine on the network, query for the tags related to any element, and get back all kinds of metadata: which other hosts have that element, which classes care about it, which elements depend on it, and so on.
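Here's a sketch of the kind of CLI query tool I have in mind. The index contents, the `describe` function, and the command-line shape are all hypothetical. A real version would presumably read the index from wherever the flattening happens rather than from a hard-coded dict.

```python
import argparse
import sys

# Hypothetical network-wide index: element -> hosts, tags, and the elements it requires.
NETWORK_INDEX = {
    ("package", "apache"): {"hosts": {"culain", "hostb"}, "tags": {"apache", "webserver"}, "requires": set()},
    ("file", "/etc/apache"): {"hosts": {"culain"}, "tags": {"apache"}, "requires": set()},
    ("service", "apache"): {
        "hosts": {"culain"},
        "tags": {"apache"},
        "requires": {("package", "apache"), ("file", "/etc/apache")},
    },
}


def describe(index, element):
    """Return the metadata a query tool might show for one element."""
    entry = index[element]
    # Reverse lookup: which elements list this one in their requirements?
    required_by = sorted(other for other, info in index.items() if element in info["requires"])
    return {
        "hosts": sorted(entry["hosts"]),
        "tags": sorted(entry["tags"]),
        "required_by": required_by,
    }


def main(argv):
    parser = argparse.ArgumentParser(description="Query the network-wide element index (sketch).")
    parser.add_argument("type", nargs="?", default="package", help="element type, e.g. package")
    parser.add_argument("title", nargs="?", default="apache", help="element title, e.g. apache")
    args = parser.parse_args(argv)
    for key, values in describe(NETWORK_INDEX, (args.type, args.title)).items():
        print(f"{key}: {values}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, `query.py` (a hypothetical name), `python query.py package apache` would list the hosts with the apache package, the classes that tag it, and the service that requires it.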

Would this be useful? I can't see how it wouldn't be, even with just that. But then take this and start doing all kinds of weird things, like looking for clusters similar to Flickr's tag clusters. I can basically guarantee that you'll find all kinds of interesting patterns and clusters in this flattened-and-tagged list.
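Here's a crude stand-in for tag clustering. It is simple co-occurrence counting, not Flickr's actual algorithm. The sketch counts how often pairs of tags appear on the same element, then uses union-find to group tags linked by at least `min_count` shared elements. The sample elements and tags (`hostb`, `database`, `dbhost`) are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_elements):
    """Count how many elements carry each pair of tags together."""
    counts = Counter()
    for tags in tagged_elements.values():
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


def clusters(counts, min_count):
    """Group tags connected by pairs seen on at least min_count elements."""
    parent = {}

    def find(tag):
        # Union-find root lookup with path halving.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for (a, b), count in counts.items():
        if count >= min_count:
            parent[find(a)] = find(b)  # merge the two tags' groups

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical flattened list: element -> all of its tags (classes and hosts).
ELEMENTS = {
    ("package", "apache"): {"apache", "webserver", "culain", "hostb"},
    ("service", "apache"): {"apache", "webserver", "culain", "hostb"},
    ("file", "/etc/apache"): {"apache", "webserver", "culain"},
    ("package", "postgresql"): {"database", "dbhost"},
    ("service", "postgresql"): {"database", "dbhost"},
}

counts = cooccurrence(ELEMENTS)
print(clusters(counts, min_count=2))
print(clusters(counts, min_count=3))
```

Raising `min_count` keeps only the tightest groupings. Here the web-server tags survive at 3 while the database pair, seen on only two elements, drops out.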

So, my two questions to the O'Reilly Radar team are:

1) Is this Web 2.0?

2) Are any of you interested in having a bit of a conversation about this? If not, is there a forum where it makes sense to bring it up? I think Puppet will be a useful and popular tool regardless of whether I can apply Web 2.0 to it. But if I can take all of this information and do really interesting things with it, I think I could have a great tool, one that could seriously affect how systems are managed and monitored.

No idea if you're interested, but feel free to post this on the blog if you think it would generate interesting discussion.
