Shortly after posting the Unfoggedbot I got this email from Stanley:
Is there anywhere that Joe Humanities Degree can go to to learn how to do this sort of thing?
Needless to say, this sort of request warms the cockles of my heart (whatever those are). And today, in the aftermath of various recent demonstrations of scripting languages’ utility, it might be a good time for me to press my case.
So: yes! Yes, a thousand times yes. You can learn a scripting language, which is the easy way to put together an IM answerbot, or a ballot-stuffing bot, or whatever else. In fact, you not only can but probably should. Scripting languages are a powerful way to control computerized systems. The computerization of the world is obviously nowhere near complete, but it’s getting there quickly — and that means that the relevance and power of scripting languages are only going to increase. You can already control various pieces of the physical world with this technology, like lights and thermostats. And if you use Excel or perform repetitive tasks on a computer in your daily life, you can likely save yourself some drudgery with a well-written script. Soon enough your toaster, garage door and television will be accessible through these languages, too.
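At its core, an answerbot is just a function that reads an incoming message and picks a reply. Here's a minimal Python sketch of that core. The trigger words and canned replies are made up. It also leaves out the part that actually connects to an IM network, which in practice you'd hand off to an add-on library.

```python
import re

# Hypothetical triggers and canned replies, purely for illustration.
RULES = [
    (re.compile(r"\bhello\b|\bhi\b", re.IGNORECASE), "Hello yourself."),
    (re.compile(r"\bweather\b", re.IGNORECASE), "Looks like rain. It always does."),
]
DEFAULT_REPLY = "I'm just a bot; try saying hello."

def answer(message: str) -> str:
    """Return the reply for the first rule whose pattern appears in the message."""
    for pattern, reply in RULES:
        if pattern.search(message):
            return reply
    return DEFAULT_REPLY
```

Calling answer("Hi there") gives the greeting, and anything the rules don't recognize falls through to the default. The \b markers make the bot match whole words only, so "This" doesn't count as "hi".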
There will be other ways to control the electronic world, it’s true. People will continue to invent visual metaphors like Yahoo! Pipes and Lego Mindstorms that expose some of the capabilities of programming in an accessible way. But we’re a long way from the day when these GUIs confer the same power that knowledge of a scripting language affords. Besides, they tend to rely on metaphors that are more easily learned through a text-based programming language.
And that’s important. Along with the ability to manipulate the devices and information around you, learning a little programming will help you understand how engineers think. If you familiarize yourself with the sausage-making side of computers, you’ll begin to develop an intuitive understanding of hardware and software interfaces which, as I mentioned, will make up a larger and larger portion of the world. You’ll never have to be one of those hopeless codgers reduced to asking their grandkids for help programming the hyper-VCR.
So how should you approach this worthy undertaking? The best way is probably to pick a scripting language and start perusing the materials that are available. But which scripting language? In some ways it doesn’t matter — you’ll need to learn what a variable is, how an if..then statement works and what a for loop does in any of them. But you may as well hear the nerd scuttlebutt on each and then make an informed choice. So let’s look at the big three first:
- Perl
- Perl’s one of the oldest scripting languages still in widespread use, and it powered much of the early web. Its biggest advantage is probably CPAN, a massive library of add-ons written by other programmers to handle the hard part of whatever neat thing you want to do, whether that’s creating an IM bot, manipulating Flickr or automatically generating charts. It’s also very nice for processing text, which, after all, is what the net mostly consists of. And because of its age you can find it preinstalled on most webhosts and Unix or Mac systems (it’s easy to get running on Windows, too).
The knock on Perl is that it’s slow and somewhat byzantine. Most importantly, it’s relatively hard to read Perl code (much harder than it is to write it), which makes learning by example difficult. As you may have noticed, I use Perl a lot for tasks where speed isn’t important, both because I’m used to it and because CPAN makes so many excellent tools available.
If you want to learn Perl, I’d recommend that you start by reading the freely available PDF of the book Beginning Perl.
- Ruby
- It’s the cool new kid on the block. Ruby’s very chic, and is frequently lauded for producing “beautiful” code. It really is a pretty elegant language. It’s also the foundation of the Ruby on Rails framework, which is a new and very popular way to get a dynamic, database-driven website up and running with minimal effort. Ruby’s cool, and it’s the next language on my list of things to learn.
But it’s still a work in progress. It’s a little slow, although that’s likely to improve. And its RubyGems package manager system — Ruby’s version of CPAN — is easy to use, but not yet all that fleshed out. If you want to talk to Flickr, for instance, you’ll find that most of the available Gems have serious limitations. It’ll get better, though, and these sorts of problems aren’t that relevant for beginners.
There are also particularly great learning tools for Ruby, many of them thanks to one man called Why. You should definitely have a look at Try Ruby, an in-browser demonstration of the language that walks you through a basic tutorial. After that, LearnRuby.com is probably the best clearinghouse for traditional learning materials. But if you like your programming primers weird, illustrated and told through elaborate, nonsensical stories, you should absolutely read Why’s (Poignant) Guide To Ruby. It’s definitely the strangest programming book I’ve ever read.
- Python
- Python kind of splits the difference. It’s not as old as Perl and not as new as Ruby. It’s got a lot of tools available for it, but perhaps not as many as Perl (it’s better-designed, though). It’s got its own hot web development framework — one that’s not quite as exciting as Rails but seems to have fewer scaling problems. It can be made blazingly fast when necessary — Google uses Python extensively, and it sort of seems like they know what they’re doing.
I don’t really care for it, though. I tried to learn Python by writing a series of bloggy tutorials way back when. But not only was I not very good at keeping my instruction properly focused, I also didn’t end up liking the language very much. The significant whitespace bugged me (Python uses indentation, rather than braces or keywords, to mark where a block of code begins and ends), but the sin for which I’ll never forgive Python is its awkward syntax for using regular expressions. Regexes are a common feature across languages, and are an incredibly powerful way to process text. Python requires a preparatory step before using them: you import the re module and call its functions, because regexes aren’t part of the core syntax. Perl and Ruby build them right into the language. For that reason I’m unlikely to dive back into Python.
Wolfson loves it, though. You may want to talk to him if you’re interested in the language — I’m sure he can make a better case for it, and point you toward good resources for learning it.
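To make the regex complaint concrete, here's a minimal Python sketch. The preparatory step is the import line: regular expressions live in the re module rather than in the language's syntax, whereas in Perl you can write $text =~ /pattern/ directly. Compiling a pattern first is optional but handy when you reuse it. The loop at the end also shows the whitespace rule, since indentation alone marks what belongs inside the for and the if. The sample addresses are made up.

```python
import re  # the preparatory step: regexes live in a module, not in the core syntax

text = "Contact: tom@example.com, stanley@example.org"

# One-off use: pass the pattern as a string to a function in the re module.
first = re.search(r"[\w.]+@[\w.]+", text)
print(first.group(0))  # tom@example.com

# Repeated use: compile once, then reuse the pattern object.
email = re.compile(r"[\w.]+@[\w.]+")
all_addresses = email.findall(text)

# Indentation, not braces, marks the body of the loop and of the if.
for address in all_addresses:
    if address.endswith(".org"):
        print("org address:", address)
```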
As I said, those are the big three. There are countless others, but these are the ones that you’ll run into the most. They’re used all the time to process data and glue applications together. Any one of them would be a fine choice.
But there are a couple of less conventional options that I think you may want to consider, too:
- AppleScript
- If you have a Mac, you have AppleScript. I don’t know it myself, but it’s an extremely readable language designed to let you automate tasks on your Mac. The applications it lets you control will be immediately familiar — you’ll be able to fetch a webpage with Safari rather than, say, wget or curl. Consequently you may be able to perform impressive feats more quickly than you would with a more general-purpose language. This PDF seems like a good place to start.
- PHP
- PHP is widely used, but generally in a web context (“Hypertext Preprocessor” makes up the final two thirds of the acronym). If you buy a cheap webhosting account, it probably comes with PHP. This entry exists on a page served by PHP. Recent versions of OS X have it installed by default, too.
It’s not frequently used for scripts that run for a long time (web pages need to be served quickly, after all). But you can run it from the command line just fine, if you’d like (our previous sysadmin had most of his maintenance scripts written in PHP). Depending on how PHP is configured, you may have to raise its execution time limit if you use it this way, though: scripts that take too long to run can be terminated automatically, since on a web server they usually represent a misbehaving page that’s a threat to system performance or stability. (The command-line version normally ships with that limit switched off.)
PHP is widely available, fast and easy to read. It uses a syntax that’s a sort of mutt conglomeration of other languages (rather than an idiosyncratic dead end), and it has a ton of stuff built in: if you want to generate graphics or process PDFs, the functionality is likely already available rather than existing in an add-on you need to install.
PHP has a bad reputation because it’s so easy to use — lots of lousy code has been written in it. It’s also not object-oriented (although it pretends to be), which is important to professional programmers but mostly irrelevant to beginners. But there’s tons of good documentation: nearly every page on PHP.net contains dozens of comments from users offering helpful tips and sample code.
If you think your scripting efforts might primarily end up living on the web, PHP is probably the best choice for you. Here’s a good starting point.
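Since “object-oriented” comes up above, here's a minimal Python sketch of the idea: a class bundles some data together with the functions (called methods) that act on it, and each object is one separate instance of that class. The Thermostat class is hypothetical, chosen to echo the lights-and-thermostats example earlier; it isn't tied to any real device or library.

```python
class Thermostat:
    """A hypothetical thermostat: its data (temperatures) travel with its behavior."""

    def __init__(self, target):
        self.target = target  # attribute: the temperature we want
        self.current = 18     # attribute: the temperature right now

    def set_target(self, degrees):
        """Method: change this object's own data."""
        self.target = degrees

    def heater_should_run(self):
        """Method: make a decision using this object's data."""
        return self.current < self.target

living_room = Thermostat(target=20)  # one object...
bedroom = Thermostat(target=16)      # ...and another, with its own separate data

print(living_room.heater_should_run())  # True  (18 < 20)
print(bedroom.heater_should_run())      # False (18 is not below 16)
```

Changing living_room's target has no effect on bedroom; that separation of each object's data is most of what beginners need from the idea.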
That’s it. At the very least you should poke through some starter documentation and get a sense of what’s involved. You can get a lot done with a relatively small amount of programming knowledge. And of course if you’ve got a particular question you should feel free to leave a comment or send me an email. As you can probably tell, I’m anxious to share the scripting gospel.
This is super helpful, Tom, thanks. But I usually get hung up on an even more general level about what programming languages are doing. Like, what’s a variable, what’s a “for loop” and what the hell does “object oriented” mean? I know these things are explained in the context of each language in most of the tutorials, but it doesn’t seem to sink in for me unless it’s explained more generally.
That’s helpful to know. Okay, I’ll try to take a pass at a rundown of that stuff shortly.
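For a first taste of two of those ideas plus the if..then statement, here's a minimal Python sketch. Python is used only because it's compact; the same concepts exist in Perl, Ruby and PHP. (Python spells if..then as just if followed by a colon.) The names in the list are arbitrary.

```python
# A variable is a name that refers to a value.
greeting = "Hello"
names = ["Stanley", "Joe", "Tom"]  # a list holding several values
lines = []                         # an empty list we'll fill in

# A for loop runs its body once for each item, with `name` taking each value in turn.
for name in names:
    # An if..then statement runs code only when its condition is true.
    if name == "Tom":
        lines.append(greeting + ", me!")
    else:
        lines.append(greeting + ", " + name + "!")

for line in lines:
    print(line)
```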
Good Python resource for non-programmers:
http://wiki.python.org/moin/BeginnersGuide/NonProgrammers
You’re doing a great thing here, Tom — programming to the masses!
Actually, Unfogged seems to have a lot of programmers and general Computer Literates reading it, along with all the Joe Six-pack Humanities types. Ogged should set up a ‘programming group’, along the lines of the old-now-defunct reading group, that lets Regulars ask programming or computing type questions, and others answer them, in a semi-structured way.
Just … my suggestion.
It’s a good one, and I’d be game for it. There’s definitely a lot of programming talent on Unfogged — I bet I could learn a lot.
And you could teach a lot, too. If I was half as productive as you seem to be sometimes…
I just think that, in the same way that someone can learn from group discussion when they’re working their way through Heidegger, so other people might really benefit from knowledgeable people who can tell them where their simple regex is going wrong, or explain the basics of for-loops and scripting languages, or … whatever. Suggest useful libraries. Turn them on to different languages. Pass around useful snippets of code.
I remember what it was like, trying to figure things out on my own in high school. Frustrating as hell.
Ogged should set up a ‘programming group’, along the lines of the old-now-defunct reading group, that lets Regulars ask programming or computing type questions, and others answer them, in a semi-structured way.
Tell me more. What would this look like?
Well, others might have different opinions about how to set it up, but I would envision it as being a separate-but-attached site (the way the old reading group blog used to be) with a couple different sections.
For one, it would provide a convenient place to put example code — like, say, the bots that Wolfson wrote. So people who wanted to use or modify code that had been written For The Mineshaft wouldn’t have to go hunting through the comments for links.
A second section would be for people to ask pointed technical/programming questions. “How does this particular script work?” or “How would I go about trying to write a program to do X?” or “Has someone ever seen problem Y before?” or “Can someone tell me whether scripting language A is usually preferred over B?” Questions-and-answers kind of stuff.
And I think a third section (or at least, a third use for a kind of site like this) is to host something like what Tom has written here. Mini-tutorials. Tom or Becks or Lunar Rockette or one of the other techy readers wants to write a little guest-post about … some particular subject. “How Does One Write a Vote-polling Bot in Perl?” or “What is a Compiler?” or “Why are your tech friends blathering on about ‘web 2.0’?” Or something along those lines.
I’ve outlined it as ‘sections’, but I don’t even think they need to be separate as such. I just think that, if you had a unified place at the Mineshaft for all three of these purposes, it might be really useful to a lot of the tech people who read — or, alternately, to the humanities people who want to learn.
Ok, I see what you’re saying. I think we decided that the separate-but-attached format was doomed to fail, but there’s no reason we can’t set up a category for posts such as what we’re talking about. In the next few days, I’ll float this idea on the blog and we’ll see what kind of feedback we get.
Even just an updated (update-able) index of posts/links/code/explanations might be really useful. Part of the problem tends to be that links get posted in comments, and ‘grand tutorials’ tend to get written piece-by-piece. It’s hard, sometimes, to track things down in the Unfogged archives after the fact (unless you make the effort to archive the interesting stuff on your own).
So even if it wasn’t a ‘separate-but-attached’ thing, it might still be useful to be able to keep track of topics that span multiple posts and multiple comments.
Actually, that might not be a bad idea for a programming project in and of itself….
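The core of such an index could be quite small. Here's a minimal Python sketch, assuming each entry is recorded by hand as a (topic, title, URL) triple. The entries and example.com URLs are hypothetical placeholders, and a real version would still need somewhere to store them and a page to display them.

```python
from collections import defaultdict

# Hypothetical entries: (topic, title, URL). The URLs are placeholders.
entries = [
    ("regex", "Why my regex fails", "https://example.com/post/1#comment-12"),
    ("bots", "Answerbot source", "https://example.com/post/2"),
    ("regex", "Regex performance thread", "https://example.com/post/7"),
]

# Group entries by topic, so one topic can span many posts and comments.
index = defaultdict(list)  # topic -> list of (title, URL) pairs
for topic, title, url in entries:
    index[topic].append((title, url))

for topic in sorted(index):
    print(topic)
    for title, url in index[topic]:
        print("  ", title, url)
```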
Wow! Thanks, Tom. Very helpful.
It amuses me that Python’s method of supporting regular expressions is what caused you to bail on the language; I’ve been working in it for years now and I think I can count the number of times I’ve written a regex on both hands. Maybe this is an indication I need to broaden the range of things I work on…
I don’t know; I think it may be an indication that we work on different sorts of things. I’m usually gluing things together with a simple little script. If you’re developing larger systems, I can imagine that you’d likely be interacting with data sources that are more reliably formed than the web junk that I’m typically scraping.
Plus, there’s an everything-looks-like-a-nail quality to consider: I use regexes a *lot* because I think they’re very efficient. But there are other ways to accomplish the same things.
For what it’s worth, we should probably distinguish the syntax of regular expressions from their implementation. The syntax is, obviously, pretty compact and efficient in a lot of cases. But not all implementations of regular expressions under-the-hood are created equal. I thought this was a pretty nice summary that I came across online a while ago. (Although perhaps you’ve linked to it here before, Tom? I can’t remember.)
Yeah, good point. I think they’re a really elegant way to accomplish a lot of tasks, but they can definitely be slow to execute. I’m extremely wary of writing them into frequently-run code at work.
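To illustrate the implementation point: Python's re module uses a backtracking matcher, so some patterns take time that grows exponentially with the input length when a match fails. Here's a minimal sketch using the classic nested-quantifier example. The two patterns accept exactly the same strings, but only the first is prone to blowing up.

```python
import re

# Nested quantifiers: on a run of a's followed by a non-matching character,
# a backtracking engine tries every way of splitting the a's between the
# inner and outer "+", which grows exponentially with the input length.
risky = re.compile(r"^(a+)+$")

# An equivalent pattern with no nesting: the backtracking stays linear.
safe = re.compile(r"^a+$")

samples = ["a", "aaaa", "aaab", "", "ba"]
for s in samples:
    # Both patterns agree on which strings match.
    assert bool(risky.match(s)) == bool(safe.match(s))

# Only try the risky pattern on "a" * 30 + "b" if you're willing to wait.
print([s for s in samples if safe.match(s)])  # ['a', 'aaaa']
```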
I think you’re right that part of it comes down to approach. Looking at your old post on regexes, it wouldn’t even occur to me to use them to extract links from HTML; I’d rely on something like Beautiful Soup to do the heavy lifting for me. But then, I’m lazy…
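Here's a sketch of the two approaches side by side, assuming a tiny hand-written HTML snippet. It uses Python's built-in html.parser module as a stand-in for Beautiful Soup, which is a separate install. The regex version is deliberately naive to show how easily it misses links.

```python
import re
from html.parser import HTMLParser

html = """
<p><a href="/one">One</a>
<A HREF='/two' class="x">Two</A>
<a class="y" href="/three">Three</a></p>
"""

# Regex approach: quick to write, but it only handles the case,
# quoting and attribute order its author anticipated.
naive = re.findall(r'<a href="([^"]*)"', html)

class LinkCollector(HTMLParser):
    """Parser approach: let a real HTML parser find every <a> tag's href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed(html)

print(naive)            # ['/one']
print(collector.links)  # ['/one', '/two', '/three']
```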
Any chance of setting up the programming group on a site where I’m not banned? This would be a great thing that I would like to participate in.
I’m in, because I’m nowhere near as good a coder as I ought to be.
“Expecto patronum” could be valid syntax.
You’re utterly mad, Wolfson.
“A plausible rationale for wanting to do such a thing is provided.”
…And yet I do not find any such rationale.
“provided” s/b “suggested.” Or “hinted at.”
And if we’re going to go that route, “plausible” s/b “fanciful”.
Damian Conway is utterly mad. I just have a good memory.
PHP5’s OO code is vastly more sensible than PHP4’s, if not as syntactically pretty as that of Ruby or Python. (I’ve been doing work lately on a project the client decided should be written in the Symfony app framework, so I’ve gotten more exposure to it than I want, frankly.) It still doesn’t support either multiple inheritance or mix-ins, although the godawful slow framework has a hack-around for that.
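For anyone wondering what those last two terms mean: multiple inheritance lets one class draw behavior from several parent classes, and a mix-in is a small class written specifically to be combined that way. Python supports both directly, so here's a minimal sketch in Python rather than PHP. The class names and sample text are hypothetical.

```python
import json

class JsonMixin:
    """Mix-in: adds to_json() to any class that keeps its data in attributes."""
    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)

class SummaryMixin:
    """Mix-in: adds summary(); assumes the class defines self.body."""
    def summary(self, length=20):
        suffix = "..." if len(self.body) > length else ""
        return self.body[:length] + suffix

class Post:
    def __init__(self, title, body):
        self.title = title
        self.body = body

# Multiple inheritance: BlogPost picks up behavior from all three parents.
class BlogPost(JsonMixin, SummaryMixin, Post):
    pass

p = BlogPost("Scripting", "Learn a scripting language this weekend.")
print(p.summary(10))  # Learn a sc...
print(p.to_json())    # {"body": "Learn a scripting language this weekend.", "title": "Scripting"}
```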
I don’t know AppleScript; people who love it can make it do wonderful things, but I think it would actually be a bad place to start for those interested in learning a more common scripting language thanks to its idiosyncrasies.
What about JavaScript? No joke. I think JS is actually a very good beginner language.
JavaScript’s awfully cool; until I looked through libraries like Prototype and jQuery, I had no idea what an interesting language it is. But I’m not really aware of what the options are for development in console space, rather than in the web browser. And the limitations and weirdnesses accompanying the browser would make any attempt to do cool stuff pretty confusing for newbies, I’d think. They shouldn’t have to learn how CSS and the DOM work to get Hello World (+1) working. But maybe there are options I’m not aware of.
Good points about AppleScript and PHP, although most beginners won’t have PHP5 installed on their OS X machines or webhosts by default (yet). And I still think it’s a sloppy language in general — parameter orders for similar functions aren’t consistent; some stuff is lifted from Perl, some from C, some from places I can’t identify. I think it’s really useful and I don’t mind working in it, but it’s definitely kludgy, even in PHP5.
The syntax in PHP5 is still a mess, and I don’t really like working in it; I just wanted to point out that it’s functional as an OO language, unlike PHP4.
You can run JavaScript from the CLI using SpiderMonkey, but that’s not available by default on any system that I know of and may be more work than beginners want to take on.
JScript is a fun scripting language, and it has the advantage of being the scripting language that runs inside web browsers. If you can write an HTML page, the next step, IMO, should be to learn a little JScript.
As an even better bonus, it gives you what you need to be able to write Greasemonkey scripts to live inside Firefox and do things to web pages that the page author didn’t intend, such as blog comment mangling…