If you want to pursue a career analysing data about real-world things (and I'm not sure what other data there is/are), then your effectiveness will benefit greatly from actually visiting the real world. For one thing, you need to fully understand the context of the data at hand. Being briefed by the CEO on what the files contain is all very well, but speaking in confidence to the poor devils who actually assembled them gives you a whole new perspective. You also need to know where the flaws are, so that the analysis takes them into account. More than once, I've picked out the biggest pattern in a complex dataset only to find that it was, in fact, the switch from one coding system to another, or the arrival of a new source of data. In other words, it was not something of interest at all, but a structural feature that should have been built into a model for the data.
Getting out from behind the computer and talking to people could be the single most valuable skill to add to a data scientist's armoury. Anyone can write code; any fool can install trendy R/Python packages and press Go.
Here's an anecdote from Past, Present and Future of Statistical Science (which I reviewed in full in an earlier post):
Dennis Cook described (p. 98) a yearly cycle of experimental design, fieldwork, data collection and then, only then, analysis. He was involved in every step.
Starting in the late winter, we would prepare the fertilizer combinations to be tested … and lay out the experimental designs on paper. … plots would be planted in the spring … and tended throughout the summer … harvested in the fall, followed by threshing and weighing the wheat. Most of the winter was spent constructing analysis of variance tables with the aid of large desktop Monroe calculators and drawing conclusions prior to the next cycle of experimentation.
I’m sure there is something to be said for this experience, especially early in one’s career. He later (p. 106) got involved in developing capture-recapture methods by dangling from a helicopter and shooting paintballs at wild horses. Now that’s something most statisticians don’t get to try out very often.
Serendipity, the happy accident and subconscious problem-solving all feature here too. Many a theoretical or modelling problem in my career has been solved by putting my boots on and going out into the country for a few hours.
A parting tweet for you to ponder: