mulhern_at_yocto | Recent Entries

So, I've been using Perl yet again, as the utility I've been working on, buck-security, is written in Perl. And it does leave something to be desired. Here are a few of the things that have bothered me now that I've been using it a bit more.

1) There is no interactive interpreter (no read-eval-print loop). This is a pain, because it is an obstacle to experimentation. There is a trick, which is to run Perl in the debugger (perl -de 0), but this is a cumbersome trick and should be unnecessary. I have no real idea why there is no interactive interpreter; it could be history, it could be philosophy.

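2) The semantics are often strange. The comma is an operator whose meaning depends on context: in list context it builds lists, flattening any arrays it is given, while in scalar context it evaluates its left operand, throws the result away, and returns its right operand. If you are more used to Python, where a comma builds a tuple, the comma is a false friend.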

3) Perl positively encourages global variables: a variable is global unless you declare it with my. 99% of the time, a global variable eventually has to be changed to a more local variable, and then you always wish that you, or the person who originally wrote the code, had started out the right way, with a local variable.

4) The "grep" function is not about regular expressions, which is deceptive. It returns the elements of a list for which a given block or expression is true. It should have been called filter, because that is what it does. This problem is probably due to history.

5) Arguments and return values are always flat lists, but they are dressed up to look like something else, which is unfortunate. Every argument to a subroutine is flattened into the single array @_, so a small mistake will result in one argument swallowing or overwriting another. This has probably been the source of about half a million Perl bugs since Perl began. I've had a few myself.
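For readers more at home in Python, here is a rough Python model of complaints 4 and 5. It is a sketch, not Perl: perl_grep, perl_call and the two example subs are hypothetical names made up for illustration, and the model only imitates how Perl's grep filters a list and how every argument to a Perl sub is flattened into the one array @_.

```python
import re


def perl_grep(predicate, items):
    """Model of Perl's grep: keep the items for which predicate is true.

    Like Perl's  grep { $_ > 2 } @list , no regular expression is needed,
    which is why "filter" would have been a better name.
    """
    return [item for item in items if predicate(item)]


def perl_call(sub, *args):
    """Model of a Perl sub call: every list argument is flattened into @_."""
    at_underscore = []  # plays the role of Perl's @_
    for arg in args:
        if isinstance(arg, list):
            at_underscore.extend(arg)  # the array's boundaries are lost
        else:
            at_underscore.append(arg)
    return sub(at_underscore)


def array_then_scalar(at_underscore):
    """Like  my (@names, $count) = @_;  the array swallows everything."""
    names, count = list(at_underscore), None  # $count ends up undefined
    return names, count


def scalar_then_array(at_underscore):
    """Like  my ($count, @names) = @_;  the order that actually works."""
    count, *names = at_underscore
    return count, names


if __name__ == "__main__":
    print(perl_grep(lambda n: n > 2, [1, 2, 3, 4]))                  # [3, 4]
    print(perl_grep(lambda s: re.search(r"^b", s), ["ant", "bee"]))  # ['bee']
    print(perl_call(array_then_scalar, ["ann", "bob"], 2))  # (['ann', 'bob', 2], None)
    print(perl_call(scalar_then_array, 2, ["ann", "bob"]))  # (2, ['ann', 'bob'])
```

The second pair of calls shows the trap: put the array first and it quietly eats the scalar that was meant for $count.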

So, I spent a while working on a set of Python utilities of about a thousand lines altogether, and, not surprisingly, I had a lot of commits, around eighty, by the time I was done. Of course, during development those commits were useful, but now that I'm done, nobody should be forced to go through them all. Reviewers should get just a few commits, representing the logical divisions of the work, which they can then apply to the master branch.

Achieving this automatically is astonishingly easy. I have two branches, a master branch and my development branch. I switch back to master, pull the current version, and make sure it's in a good state. Then I make a new branch from master, my development-final branch, and switch to it. It is an exact copy of the master branch. Then I merge my changes from the development branch into the development-final branch, adding the --squash flag.

git figures out an important fact: my development branch branched from master at a particular point in the past (the merge base). Therefore, all subsequent updates to master, and consequently to the development-final branch, are accepted as they are and are not counted as differences between the development and development-final branches. The changes I merge into the development-final branch are exactly my own development and nothing else. Also, the changes are staged for commit but not yet committed. This means that I can unstage, restage, and commit in whatever way I like. Handy!
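The whole sequence fits in a handful of git commands. Here is a small Python sketch that lists them and, only if asked, runs them; the branch names are placeholders, and run() defaults to a dry run that just prints each command.

```python
import subprocess

# Branch names are placeholders for whatever your own branches are called.
MASTER = "master"
DEV = "development"
FINAL = "development-final"


def squash_workflow(master=MASTER, dev=DEV, final=FINAL):
    """Return, in order, the git commands for the squash-merge workflow."""
    return [
        ["git", "checkout", master],
        ["git", "pull"],                           # bring master up to date
        ["git", "checkout", "-b", final, master],  # new branch, a copy of master
        ["git", "merge", "--squash", dev],         # stage dev's changes, no commit
        ["git", "status"],                         # look at what is now staged
    ]


def run(commands, dry_run=True):
    """Print each command; actually execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(squash_workflow())  # dry run: prints the commands without running them
```

From there, git reset with no arguments unstages everything while leaving the working tree alone, and git add -p followed by git commit builds the few logical commits one at a time. Redoing it after a fix on the development branch amounts to checking out development-final again and repeating the merge --squash step.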

When this patch set is finally applied to master, I will be free to delete both the development and development-final branches.

Of course, what often happens is that as soon as I've made the development-final branch and begun working on my patch set, I realize that something is not quite right about my development. In that case I switch back to the development branch, and, unless I've already unstaged or committed something, my development-final branch simply becomes clean again. I can make my fixes in the development branch and, when I'm done, switch to the development-final branch and do a merge --squash again. If I have unstaged or committed anything, I'll have to do a little cleanup before I switch to the development branch, either by adding or by doing a reset --hard. But that is all second nature by now.

It's curious that a branch becomes dirty if you unstage something. In that case, if you try to switch to another branch, git will complain that your unstaged changes would be lost. But if all your changes are staged and you switch, git will not complain, but it will lose your changes. I don't yet quite understand the rationale.

My first experiences with Perl were pretty rough. There was a time when Perl was the hot new thing and I decided then that I ought to learn it. I tried, and there were quite a few things that made it a bad experience. First, I had no real experience with similar languages, having been formally instructed in C++ and being in the process of learning Java. Second, I had little experience with Perl's chosen domain, which coincided with the domain of shell scripts. Third, I picked a book which, while highly regarded then and having gone through many editions since, only confused and irritated me.

Later I became more sophisticated, more skilled and more knowledgeable. I learned some programming language theory and discovered that many PL theoreticians viewed Perl as an object lesson in poor programming language design. I experienced Perl pitfalls in my new, sophisticated persona and, with my new PL knowledge, realized that the language really was to blame and that it wasn't all my fault. Several times, I found myself debugging severely broken ten-line Perl programs written by some of the best programmers I knew. This did not inspire confidence.

Lately, though, I'm feeling better about Perl. I've seen Perl scripts written with considerable discipline, and I've encountered a text, "Learning Perl the Hard Way" by Allen B. Downey, which told me something useful on the second page. After I'd read one and a half chapters of this book and a bit of "Higher-Order Perl" by Mark Jason Dominus, I was able to write a simple script that automatically calculated the recursive CPAN dependencies of a particular script. Of course, I used existing modules from CPAN itself to do the heavy lifting, so the script was only 50 or so lines, but writing it was not a pain at all. Features the script uses are:

anonymous and higher-order functions

foreach

named function arguments

constructing and dereferencing references

object orientation

and, of course, I make my variables local, use warnings, and use strict. Not bad for a first attempt.
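Underneath, a script like that is just a graph traversal. Here is a minimal sketch of the idea in Python rather than Perl. The dependency table and module names are made up, and a real script would fetch each module's direct dependencies from CPAN metadata instead (which modules did that job isn't stated, so none are assumed here). Passing the lookup in as a function is a small nod to the higher-order style.

```python
from typing import Callable, Dict, Iterable, List, Set


def recursive_deps(roots: Iterable[str],
                   direct_deps: Callable[[str], List[str]]) -> Set[str]:
    """Return every module reachable from roots by following direct_deps.

    direct_deps is passed in as a function, so the same traversal works
    whether the data comes from a dict, a file, or a network query.
    """
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        module = stack.pop()
        for dep in direct_deps(module):
            if dep not in seen:  # also keeps dependency cycles from looping
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical dependency table, made up for illustration only.
TABLE: Dict[str, List[str]] = {
    "my-script": ["Foo::Bar", "Baz"],
    "Foo::Bar": ["Baz", "Qux"],
    "Baz": [],
    "Qux": ["Baz"],
}

if __name__ == "__main__":
    deps = recursive_deps(["my-script"], lambda m: TABLE.get(m, []))
    print(sorted(deps))  # ['Baz', 'Foo::Bar', 'Qux']
```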

Do I have a sophisticated, mature understanding of Perl? Not in the least. Do I finally have a bit of traction with the language? I think so.

quilt is a patch organizing utility. Recently, I was upgrading some packages and I got a chance, for the first time, to use some of quilt's more sophisticated features. I had added patches as I discovered the need to add them, rather than organizing them by theme, but I felt it was time to tidy up. In particular I wanted to combine three patches that were performing similar actions for identical reasons into one patch.

I popped all the patches I wanted to combine off the stack, along with a whole lot of others. Then I deleted the patches from the quilt series. Then I started a new patch, added the file to be changed to that patch, applied the deleted patches, and refreshed the new patch. I had combined the three patches into one.

I really think that quilt must have a better way built in, but I have not yet figured it out. I can kind of imagine a quilt merge command. It would work like this:

Pop all the patches you want to merge off the stack.

quilt new [the new patch]

quilt merge --- this applies these patches and deletes them from the series

quilt refresh --- the new patch is refreshed with all the changes

The merge is complete and the new patch contains all the changes of the merged patches while the merged patches have disappeared from the series.

There is a quilt subcommand, fold, which I now find I could have used about as effectively as the technique I did use. I would have popped all but the bottommost patch that I wanted to merge and then successively quilt folded in the remaining patches.
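The fold technique can be written down as a short sequence of quilt commands. This Python sketch only plans them (and can optionally run them). The patch names are hypothetical, and it assumes every patch starts out applied, the patches directory is the default patches, and the patches to fold are listed in series order. It relies on quilt fold reading a patch on standard input and merging it into the topmost patch, and on quilt delete without -r removing a patch from the series while leaving its file in place.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

Command = Tuple[List[str], Optional[Path]]  # (argv, file to feed on stdin, if any)


def fold_plan(bottom: str, others: List[str],
              patches_dir: Path = Path("patches")) -> List[Command]:
    """Plan the quilt commands that fold `others` into the patch `bottom`.

    Assumes all patches start out applied, so `quilt pop bottom` leaves
    `bottom` on top of the stack. List `others` in series order so that
    later patches fold in after the ones they may build on.
    """
    plan: List[Command] = [(["quilt", "pop", bottom], None)]
    for name in others:
        # quilt fold reads a patch on stdin and merges it into the top patch.
        plan.append((["quilt", "fold"], patches_dir / name))
        # Remove it from the series; without -r the patch file is kept.
        plan.append((["quilt", "delete", name], None))
    plan.append((["quilt", "refresh"], None))  # write out the combined patch
    return plan


def execute(plan: List[Command]) -> None:
    """Run the planned commands, stopping at the first failure."""
    for argv, stdin_path in plan:
        if stdin_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdin_path, "rb") as stdin:
                subprocess.run(argv, stdin=stdin, check=True)


if __name__ == "__main__":
    # Hypothetical patch names; print the plan instead of running it.
    for argv, stdin_path in fold_plan("fix-a.patch", ["fix-b.patch", "fix-c.patch"]):
        print(" ".join(argv) + (f" < {stdin_path}" if stdin_path else ""))
```

If a patch being folded depends on an unrelated patch that sits between it and the bottom patch in the series, the fold won't apply cleanly and has to be sorted out by hand.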

It appears that bitbake uses quilt to manage its patch process. I can come up with no other explanation for the presence of a patches subdirectory containing a series file and softlinks to all my patches.
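If that is what's going on, the patches directory can be inspected like any quilt patches directory. Here is a small Python sketch. It assumes the simple series format of one patch name per line, optionally followed by options such as -p1, with # starting a comment. The directory you pass to describe() is whatever your build left behind, and parse_series and describe are hypothetical helper names.

```python
import os
from typing import List, Tuple


def parse_series(text: str) -> List[Tuple[str, List[str]]]:
    """Parse a quilt series file into (patch name, options) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # skip blank and comment-only lines
        name, *options = line.split()
        entries.append((name, options))
    return entries


def describe(patches_dir: str) -> List[str]:
    """Say, for each patch in the series, whether it is a symlink or a file."""
    with open(os.path.join(patches_dir, "series")) as f:
        entries = parse_series(f.read())
    report = []
    for name, options in entries:
        path = os.path.join(patches_dir, name)
        if os.path.islink(path):
            kind = "symlink -> " + os.readlink(path)
        else:
            kind = "regular file" if os.path.exists(path) else "missing"
        label = " ".join([name] + options)
        report.append(f"{label}: {kind}")
    return report
```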

You can find out on the Yocto project's website, but here's my take.

Yocto is a build system, not for individual software applications, but for an entire configuration which includes the operating system, the hardware and the software configuration. A few well chosen commands will eventually yield an operating system and a root filesystem for a desired architecture which can be loaded on an embedded system (either real or emulated).

The front end is basically a package manager. That is, it will handle fetching upstream sources, configuring, compiling and installing as necessary. Of course, it's not installing those packages on your system, it is preparing them for installation on some other system. Like a package manager, it tracks dependencies. It must also track the particular open-source license under which each source has been released.
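Dependency tracking boils down to ordering the work so that nothing is built before the things it needs. Here is a minimal sketch using Python's standard graphlib module. The recipe names and their dependencies are made up for illustration. bitbake itself schedules individual tasks (fetch, configure, compile and so on) with a far more elaborate scheduler, so this shows only the underlying idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical recipes mapped to the recipes they need built first.
DEPENDS = {
    "core-image": {"busybox", "base-files"},
    "busybox": {"glibc"},
    "glibc": {"linux-headers"},
    "base-files": set(),
    "linux-headers": set(),
}


def build_order(depends):
    """Order recipes so that each one comes after everything it depends on.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDS))  # one valid order; ties may come out either way
```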

If compilation is necessary, the package is usually being cross-compiled. This is not a situation that regular package managers have to deal with, and it is a possibility that many configure scripts ignore. Thus, the Yocto package maintainer's task is a bit more challenging than a regular package maintainer's, as they must modify configure scripts to work with a cross-compiler.

The back end takes over after the build process has completed the install into a directory in the designated workspace. The files in the install directory are split out and repackaged into a bunch of sub-directories. This step gives the build process a way to limit the root filesystem to only those parts of the installed files that are necessary, a crucial step given the limited storage on embedded devices.
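The splitting step can be pictured as sorting each installed file into the first package whose patterns match it. Here is a minimal Python sketch of that idea. The package names and patterns are hypothetical, loosely in the spirit of the foo, foo-dev and foo-doc split that Yocto recipes commonly end up with, and files that match nothing are simply reported as left out.

```python
from fnmatch import fnmatch
from typing import Dict, List, Sequence, Tuple

# Hypothetical split rules, checked in order; the first match wins.
RULES: List[Tuple[str, List[str]]] = [
    ("foo-dev", ["/usr/include/*", "/usr/lib/*.so", "/usr/lib/*.a"]),
    ("foo-doc", ["/usr/share/man/*", "/usr/share/doc/*"]),
    ("foo", ["/usr/bin/*", "/usr/lib/*.so.*"]),
]


def split_install(files: Sequence[str], rules=RULES):
    """Assign each installed file to the first package whose pattern matches."""
    packages: Dict[str, List[str]] = {name: [] for name, _ in rules}
    left_out: List[str] = []
    for path in files:
        for name, patterns in rules:
            if any(fnmatch(path, pattern) for pattern in patterns):
                packages[name].append(path)
                break
        else:  # no rule matched this file
            left_out.append(path)
    return packages, left_out


if __name__ == "__main__":
    installed = ["/usr/bin/foo", "/usr/lib/libfoo.so.1", "/usr/lib/libfoo.so",
                 "/usr/include/foo.h", "/usr/share/man/man1/foo.1"]
    packages, left_out = split_install(installed)
    for name, files in packages.items():
        print(name, files)
```

Only the runtime package (foo here) would typically need to go into a small root filesystem. The headers and manual pages can stay behind, which is where the size saving comes from.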

Yocto can be configured to use a variety of package managers to repackage any necessary files obtained by the previous splitting and packaging process.

Eventually, everything is combined into a few largish files which constitute the ultimate goal of this whole process.

Last night I was at a dinner with some other Boston open source people and I learned something from another OPW intern.

Now for some history. Installing Ubuntu on my Mac was surprisingly painful. The initial bit wasn't so bad, but after I had completed the install I realized I had a chicken-and-egg problem. I needed to be online to download an appropriate device driver for my wireless interface, but I could not get online: my ISP (Comcast) restricts you to just one IP address, and I didn't want to risk disconnecting my router from my modem in order to get a wired connection for the machine on which I was installing. I might not have succeeded in reconnecting my modem afterward (Comcast makes this kind of thing hard), and my other machine, which I was using to read instructions and useful hints about the process, would not have been online during the crucial operation. So I had to schlep both my laptops to my office, where I could get a wired and a wireless connection simultaneously. I thought that kind of snag must happen to more or less everyone, but it turns out I was paying the price that one must sometimes pay for owning a Mac.

Because of open-source licensing, the Ubuntu install disk cannot include proprietary software, and the available device drivers for my wireless interface are proprietary. This would not necessarily have been true if I'd been installing Ubuntu on some other kind of computer. Because Apple is a closed (and very closely integrated) business, the details of its devices are kept unusually private. If I'd been installing on a Dell, I might have gotten the appropriate device drivers without all the hassle.

This is really about Twisted again, but it would be neat if it were possible to keep statistics about the number of revisions between the first patch and the final application. One could look them up and see how one was doing with respect to others, over time, etc. Of course, as time went by, one's patches would become more ambitious, and so the issues that might come up in a review would perhaps be more interesting, and so forth.

So, when I was applying to the OPW I tried out two projects: Twisted and Yocto. Twisted was nice and easy to get started on, just clone and start editing, and so I submitted a couple of patches during the application process. Several weeks later I'm still trying to get those patches through, and they've been back and forth to the reviewers a couple of times. There's a built-in lag, as it may take reviewers quite a while to send my patches back, and then, if I'm very busy, it takes me a while to revise my patches and return them. Moreover, and this is no surprise, one ticket spawned an additional ticket, and one ticket ramified a bit.

So, the main thing I get out of this article, http://rhaas.blogspot.com/2011/03/welcoming-community.html, is that this is not unusual. Now, the author speaks of pain and feelings of rejection when your patch gets criticized in this fashion, but I don't feel any of that. I do feel somewhat amused at the amount of work that is going into some patches that will have so little effect when finally applied, but I believe that the Twisted people are committed to their process, with no exceptions, and I'm happy to be experiencing it all.

One thing I noticed is that my reviewers have all been uniformly formal and polite and that every one has thanked me for my patch (before criticizing it). Now, in the past I've typically developed in environments where I knew all my fellow committers and so my commit messages are habitually terse and entirely technical without any extras as I always had the option to be as formal or polite as I wanted to be in person.

My new resolution: Remember to thank my reviewers for their comments every time I submit a revision. It's no less than they're doing for my patches in the first place.

I've been wondering where the word yocto, as in "Yocto Project", might come from, and recently I came across its use as a prefix indicating size.

I can count down by thousands as "milli-", "micro-", "nano-", and "pico-", but that's as far as I go. However, if you want to go further you can, with "femto-", "atto-", "zepto-", and "yocto-". yocto is from the Greek for eight, because it's (10^3)^-8, i.e., 10^-24. The y was prepended so that there would be some uniformity in the naming scheme. The number nine in Greek is something like ennea; maybe the prefix for 10^-27 could be "xena-"?
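The arithmetic is easy to check: the k-th prefix in the list is (10^3)^-k = 10^-3k, so yocto, the eighth, is 10^-24. Here is a tiny Python sketch that tabulates the prefixes exactly with fractions.

```python
from fractions import Fraction

# The small SI prefixes in order, each a factor of 1000 smaller than the last.
SMALL_PREFIXES = ["milli", "micro", "nano", "pico",
                  "femto", "atto", "zepto", "yocto"]


def prefix_table():
    """Map each prefix to (exponent of 10, exact value); the k-th is (10^3)^-k."""
    return {name: (-3 * k, Fraction(1, 1000 ** k))
            for k, name in enumerate(SMALL_PREFIXES, start=1)}


if __name__ == "__main__":
    for name, (exponent, value) in prefix_table().items():
        print(f"{name}-: 10^{exponent} = {value}")
```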

I think I can remember when I first learned about the diff and patch utilities, but I was a lot younger then. Even though it was so very long ago, context diffs were the norm and so I learned about them at the same time as I learned about these utilities.

What I didn't learn...or at least didn't learn very well...was that the context part of the diff is what matters. The patch utility uses the context to identify the location of the change; the line numbers just tell the patch utility where to start looking. Moreover, if the patch utility fails to find a location that precisely matches the context, it will look for a less precisely matching context by ignoring the outermost lines of the context when searching for a match.

What is the point of this very generous notion of a match? Well, it's to aid developers who may be submitting a patch generated against a slightly different code base from the copy to which the change is applied. If the patch utility had a stricter definition of a match, then the developers submitting the patch would have to do their work on a code base that precisely matches the one to which the patch is applied. That would force a good deal of syncing on the part of the developer and make the whole process more onerous.

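Of course, patch itself can be configured to be more or less precise by means of the fuzz factor option (-F or --fuzz in GNU patch, where the default maximum fuzz is 2). A fuzz factor of 0 makes patch require all the context to match exactly, although the hunk may still be found at an offset from its recorded line numbers.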
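The matching idea can be sketched in a few lines of Python. This is a simplified model written for illustration, not GNU patch's actual code: find_hunk and apply_hunk are hypothetical names, the search simply tries positions in order of distance from the recorded line number, and a fuzz of f means ignoring up to f of the outermost lines of leading and trailing context.

```python
from typing import List, Optional, Sequence


def find_hunk(lines: Sequence[str], before: Sequence[str], old: Sequence[str],
              after: Sequence[str], hint: int, max_fuzz: int = 2) -> Optional[int]:
    """Return the index where `old` (the lines the hunk replaces) starts, or None.

    before/after are the hunk's leading and trailing context; hint is the
    0-based line number recorded in the diff, used only as a starting point.
    """
    for fuzz in range(max_fuzz + 1):
        pre = list(before[fuzz:])                       # drop outermost leading context
        post = list(after[:max(len(after) - fuzz, 0)])  # drop outermost trailing context
        pattern = pre + list(old) + post
        # Try every candidate position, nearest to the recorded line number first.
        starts = sorted(range(len(lines) - len(pattern) + 1),
                        key=lambda s: abs(s + len(pre) - hint))
        for start in starts:
            if list(lines[start:start + len(pattern)]) == pattern:
                return start + len(pre)
    return None


def apply_hunk(lines: Sequence[str], before: Sequence[str], old: Sequence[str],
               new: Sequence[str], after: Sequence[str], hint: int,
               max_fuzz: int = 2) -> List[str]:
    """Replace `old` with `new` at the located position, or raise ValueError."""
    at = find_hunk(lines, before, old, after, hint, max_fuzz)
    if at is None:
        raise ValueError("hunk does not apply")
    return list(lines[:at]) + list(new) + list(lines[at + len(old):])


if __name__ == "__main__":
    # The diff was made against: a b c OLD d e f. The target has drifted:
    # a line was inserted at the top and "b" was edited to "B".
    target = ["x", "a", "B", "c", "OLD", "d", "e", "f"]
    print(apply_hunk(target, ["b", "c"], ["OLD"], ["NEW"], ["d", "e"], hint=3))
    # ['x', 'a', 'B', 'c', 'NEW', 'd', 'e', 'f']
```

In the example, the shift by one line is absorbed by the search alone, but the edited first context line means the hunk only applies with a fuzz of 1. With max_fuzz=0, find_hunk returns None.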

When is requiring a perfect match a good thing? Well, this all came up when I was working on updating a package for the Yocto Project. There were some patches that needed to be applied to the new version's configure script, and I updated the patches so that they were a precise match. If I hadn't updated them, however, they would still have been applied correctly, as the configure script had not changed much between the versions. So, should patches like this be required to be a perfect match? The code to which they must be applied is known exactly, which is an argument for requiring precision. On the other hand, perfect precision places a burden on package maintainers, who must remake these patches each time a version changes. Yet it's probably a good idea to require package maintainers to review and pay attention to those patches; after all, they did have an original purpose, which may become obsolete as the packages are updated. Requiring a perfect match would enforce a bit more oversight. But would it also be a large burden for a small benefit? Dunno.

Note that the quilt utility can be used to automate the process of updating patches making the whole thing less of a burden.

Two years ago I used git for a bit because GitHub uses git and GitHub was convenient for my needs at that time.

This allowed me to make one observation of some societal interest. People who use GitHub may not even know that "GitHub" is a portmanteau word, i.e., git + hub. In fact, they may not know about the existence of git at all. To some, GitHub is just the same as Dropbox, only, in some indefinable and mysterious way, cooler, and only for text. I don't know how that happened or how many people actually use it that way, but that some do is a fact.

Now, I knew that, in certain very important ways, git was not just like Subversion. It was, I was told, one of the new kind of distributed VCSs. Of course, in a perfect world I would have rapidly learned to exploit everything that was different, and perhaps better, about git. But I was very busy and git could be used just like Subversion and so...that's what I did.

Back when I was involved in a research project with multiple developers using Subversion, I noticed a kind of pattern. A colleague would commit a change that affected a whole bunch of files and changed a lot of lines. Then they would commit a bunch of small changes, one after the other, that really should have gone in with the initial commit. Often the note attached to the commit was just something very basic like "Should have gone in with previous." or "fixes a small bug in previous commit." I often thought how nice it would be if all those little emendations could somehow be attached to or combined with the original commit that got it all started. But in Subversion, you can't really combine commits without having superpowers and doing lots of fancy stuff. That kind of thing is just not part of the expected Subversion workflow.

Well, in git you can split, combine, and otherwise edit your commits freely using the git rebase command. This is because git divides your actions into two phases, whereas Subversion just has a single phase. In git, when you commit, you have only completed the first phase and you've only affected the copy of the code that you have. You haven't yet "pushed" your changes to some remote repository (which probably exists so that it can be shared with someone else). So you can rearrange all the commits in your private copy in many ways before you expose your work to the rest of the world by pushing it.

On the other hand, Subversion's commit combines git's commit and git's push into a single action.
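In git rebase -i, the combining is driven by a todo list that git opens in your editor. It has one line per commit, oldest first, and changing pick to fixup folds a commit into the one above it. As an illustration of the format only, here is a Python sketch that writes such a list for the Subversion-era pattern described earlier. The commit hashes, subjects, and the looks_like_follow_up heuristic are all made up.

```python
from typing import Callable, List, Tuple

Commit = Tuple[str, str]  # (abbreviated hash, subject line)

# Hypothetical history, oldest first, in the pattern described above.
EXAMPLE: List[Commit] = [
    ("a1b2c3d", "Rework option parsing in several modules"),
    ("b2c3d4e", "Should have gone in with previous."),
    ("c3d4e5f", "fixes a small bug in previous commit."),
    ("d4e5f6a", "Add tests for option parsing"),
]


def looks_like_follow_up(subject: str) -> bool:
    """A crude, made-up heuristic for 'belongs with the commit before it'."""
    return "previous" in subject.lower()


def todo_list(commits: List[Commit], is_follow_up: Callable[[str], bool]) -> str:
    """Build a `git rebase -i` todo list, oldest commit first.

    Follow-ups are marked `fixup`, which melds a commit into the one above it
    and discards its message; the first line must stay `pick`.
    """
    lines = []
    for i, (sha, subject) in enumerate(commits):
        action = "fixup" if i > 0 and is_follow_up(subject) else "pick"
        lines.append(f"{action} {sha} {subject}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(todo_list(EXAMPLE, looks_like_follow_up), end="")
```

In practice git can do the marking for you. If you commit a follow-up with git commit --fixup=<commit>, a later git rebase -i --autosquash lines it up as a fixup automatically. Either way, this is only safe for commits you have not yet pushed.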

How, really, will this difference change how I develop code and inflict it on the outside world? Will it make things better or just more nerve-wracking? We shall see.

I'm a computer scientist currently living in Massachusetts. I was always too busy, first as a graduate student at the University of Wisconsin-Madison and subsequently as a professor, to contribute much more to the open-source community than a few bug reports and some money, but that has all changed.

Last Monday I was notified that I had been accepted as an Outreach Program for Women Summer Intern for the Yocto project. I'm thrilled for a number of reasons:

I've benefited from open source for a long time, using many open-source applications and working for a research project that used open-source applications written in C as objects for study, but beyond the odd bug report and some money I've not really contributed. This is my chance to give back a little bit more.

I'm fascinated by the impact of the open-source community and want to understand its mechanisms better.

I believe that I may, sometime, develop some wonderful idea that is best achieved via an open-source collaboration; I'd like to know how to make that work when the time comes.

I've spent a long time in academia and this is a great way to move into industry.

One of the things that I really like about academia is the opportunity to teach, which is what I've been doing the last few years. Grading is not nearly as much fun. If you associate with open-source you can mentor (teach) people and not have to grade them. Can it be the best of teaching without the rest of it? We shall see!

About the Yocto project:

The build system, bitbake, is in Python and Python is my go-to language for practical tasks. I'm very comfortable with it and am eternally grateful to Guido van Rossum for the clever way he's made my life that much more pleasant.

Since it's all about building operating systems, much of the object code is in C. I know my way around that language pretty well; I'd say that I'm an expert with some things left to learn, but I'll recognize them when I see them.

Even making a contribution to the Yocto project got me doing a whole lot of new things. I've always been a Mac user, but I only succumbed to the allure when OS X came out. If I need a Unix tool, I just turn to MacPorts (another open-source project). But for Yocto I needed to go all the way and actually install a Linux distribution. I partitioned the drive on my older Mac, installed rEFIt (open-source), and installed Ubuntu on the partition from a USB stick. Then (I should have read more carefully!) I realized the partition, at 40G, was a bit smaller than the recommended size for the project. But, somehow, an extra, immovable partition had ended up on my drive and I could not get rid of it. It was stuck right in the middle of the disk, so there was no way I could repartition and make the partition for my chosen Linux distribution large enough. So I ended up having to do some dramatic surgery on the disk, repartition, and restore my Mac from backups. Now my Linux distribution has something like 100G, which seems to be enough. Once that was done it was relatively smooth sailing.

Two technical things I'll mention in subsequent posts: