R tips for moderately large data

Some useful tips, recently featured on R-bloggers and originally posted at Mollie’s Research Blog, are worth reading. I say moderately large because I don’t really believe there is such a thing as big data (and it looks like Mollie doesn’t either, judging by the judicious use of the word ‘large’), but special computational problems do appear as your data grow. Maybe in ten years we’ll laugh at those problems, but I suspect the data will have kept pace, staying just ahead of our capabilities.

For example, did you know that by specifying the class of each variable (character, integer and so on) when reading a file into R, you can cut the time taken nearly in half? I certainly didn’t. What about not reading the file at all if the data are already in memory? That’s a good idea too. I’ll be keeping an eye on the blog for more top tips.
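A minimal sketch of both tips in base R, assuming a small CSV written to a temporary file. The object name `my_data` and the three-column layout are hypothetical, chosen only for illustration, and any speedup you see will depend on your own data. The class of each column is declared through the `colClasses` argument of `read.csv`, and `exists()` is used to skip the read when the object is already loaded:

```r
# Hypothetical example data, written to a temporary CSV so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5,
                     name = c("a", "b", "c", "d", "e"),
                     score = c(1.5, 2.5, 3.5, 4.5, 5.5)),
          path, row.names = FALSE)

# Tip 2: only read the file if my_data is not already in the current environment.
if (!exists("my_data", inherits = FALSE)) {
  # Tip 1: colClasses tells read.csv each column's type up front,
  # so it does not have to guess and convert the types itself.
  my_data <- read.csv(path, colClasses = c("integer", "character", "numeric"))
}

str(my_data)
```

Running the `if` block a second time in the same session skips the read entirely, which is the point of the second tip.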

It would be interesting to see how many of these have parallels in other stats software.

Filed under R

One response to “R tips for moderately large data”

  1. The data.table::fread function is an even faster way to read large text files!
