In the shop: comparing the Data Scientist's Toolbox and Software Carpentry
I’m trying to learn more about online teaching. I have signed up for a couple of MOOCs in the past and (like many people who sign up for MOOCs) never got very far. This time I’m trying a little harder, with a couple of courses from Coursera. I don’t have any particularly strong reason to choose Coursera except that I got a marketing email from them recently. So far I’ve completed Finance for Non-Financial Professionals, in which I learned that what I actually wanted was an accounting course, not a finance course. (I won’t spoil the surprise by telling you what the difference is.) I’m part way through The Data Scientist’s Toolbox, which is the first course in the JHU data science specialization.
I find the Toolbox course interesting from a pedagogical point of view, since it covers a lot of the same ground as Software Carpentry workshops. I thought it would be interesting to do a comparison from the point of view of someone who is reasonably familiar with Software Carpentry but pretty new to the Coursera platform.
Obviously the course formats have to be different, since the Toolbox course has thousands of participants and is spread out over 4 weeks, while a typical Software Carpentry workshop has a few dozen learners and happens over a few days. The Toolbox course content is delivered via video lectures (voice over static slides with mouse emphasis) with accompanying lecture notes and multiple-choice quizzes while Software Carpentry has a live-coding instructor and in-room helpers; learners follow along and complete short exercises. The Toolbox course has the explicit goal of introducing learners to further courses in the data science sequence, while Software Carpentry workshops are generally aimed at people who are already doing research in a field which probably isn’t data science.
Both the Toolbox course and Software Carpentry include an introduction to the command-line shell, git, and GitHub: this is week 2 in the Toolbox course, and usually the first and third half-days in Software Carpentry. Software Carpentry’s other half-days include programming in Python or R, and sometimes SQL; R and databases are in separate courses in the data science sequence. The Toolbox course includes some video instructions on how to do the setup required for the course; due to time constraints, Software Carpentry asks learners to do some installation before the workshop by following this webpage. Both courses provide for peer interaction during the course, Toolbox with forums on the Coursera website and Software Carpentry with the Etherpad tool.
The Toolbox command-line shell lesson is shorter than the corresponding Software Carpentry one; I would say it covers most of items 1 through 3 in the SWC list. It’s interesting but perhaps not surprising that both lessons stress the dangers of rm -r. The Toolbox videos feature the instructor Jeff Leek using mouse pointing and drawing to emphasize specific points on the screen capture images on the slides. The material in the videos goes by at what I think would be a pretty fast clip for someone who had never seen it before. I think a learner who tried to type along, with use of the pause button, the notes, and the nice summary at the end, would be able to get the basic ideas. The multiple-choice quiz questions at the end of week 2 were better than I have seen in other courses and MOOCs (in that they got at understanding rather than just memorization), but the quiz was too short to really test understanding of all the material introduced.
The git/GitHub videos in the Toolbox course cover the basics of initializing git, making a new repository or forking an existing one, cloning, pushing and pulling. This would be items 1-4 and 7-8 in the Software Carpentry lesson. Branches are mentioned briefly. I particularly liked this image, which I intend to steal and use the next time I teach this stuff. Again, it’s a lot of material for about 20 minutes of video, but probably manageable with pausing and checking references, and the major “course project” for the Toolbox course involves putting the git and GitHub material into action, so learners would get a chance to practice. I can’t see suggesting adding any more material to this section, given the time constraints, but possibly unpacking the git jargon (“commit”, “repo”, “pull”, “push”) a bit more, or just acknowledging that it seems mysterious, would be helpful.
I can see these two different courses working well for different types of learners. Someone who doesn’t mind poking around on their own a bit (a hacker, as described in the Toolbox intro) would probably find the Toolbox approach workable, while learners with less confidence, or more desire to work directly with others, would probably be better suited to Software Carpentry. Learners need to be more self-regulated to work in the MOOC environment, but of course it has the advantage of not requiring physical presence at a specific place and time. I think there is a place for both approaches. I found listening to Dr Leek’s explanations on the videos helpful in terms of thinking about how I explain similar concepts; for the same reason I think that it’s good for scientists to follow science journalism, I think it’s helpful for instructors to see how others instruct.
Off to the third week of the Toolbox now. I may revisit the data science sequence later, once I get to one of the courses where I really know nothing about the content. That should be a different experience!