Speed up for loops in R

Are your for loops too slow in R? Are loops that should take seconds actually taking hours?

As I found out recently, how you structure your code can make a huge difference in execution time. Fortunately, a few small changes to your code can speed up these loops by several orders of magnitude.

This Stack Overflow post goes through a number of ways to optimise your for loops. I only implemented the first method, and my loop's run time went from over an hour to less than 10 seconds!!!

The secret? Loop over vectors rather than data frames. Indexing into a data frame on every iteration is slow, whereas R is optimised for vector and matrix operations.
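Here is a minimal sketch of the idea. It assumes a made-up data frame `df` with a numeric column `x` and uses a running total as the loop body; neither comes from the Stack Overflow post. Both functions compute the same result, but the second one touches only plain vectors inside the loop.

```r
# Hypothetical example data: 2,000 random values in one column
set.seed(42)
df <- data.frame(x = runif(2000))

# Slow pattern: read and write single data frame cells inside the loop
running_total_df <- function(df) {
  df$total <- NA_real_
  df[1, "total"] <- df[1, "x"]
  for (i in seq_len(nrow(df))[-1]) {
    df[i, "total"] <- df[i - 1, "total"] + df[i, "x"]
  }
  df
}

# Faster pattern: extract the column as a plain vector, preallocate the
# result, loop over vectors only, then write back to the data frame once
running_total_vec <- function(df) {
  x <- df$x
  total <- numeric(length(x))
  total[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    total[i] <- total[i - 1] + x[i]
  }
  df$total <- total
  df
}

system.time(slow <- running_total_df(df))
system.time(fast <- running_total_vec(df))
all.equal(slow$total, fast$total)  # TRUE
```

Timings depend on your machine and data size, so none are quoted here. For a running total in particular, `cumsum(df$x)` does the whole job in a single vectorised call, with no loop at all.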

Heat maps using R

One of the great things about following blogs on R is seeing what others are doing and being able to replicate it and try things out on my own data sets.

For example, I came across some great links on quickly creating heat maps in R.

The basic steps in the process are (i) scale the numeric data using the scale function, (ii) create a Euclidean distance matrix using the dist function, and then (iii) plot the heat map with the heatmap function, after converting the distance object to a matrix with as.matrix, since heatmap expects a numeric matrix.
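Here is a minimal sketch of those three steps. It uses R's built-in `mtcars` data set as a stand-in for your own numeric data (an assumption; the original links aren't reproduced here).

```r
# Built-in example data (a stand-in for your own numeric data)
data(mtcars)

# (i) Scale each column to mean 0 and standard deviation 1
scaled <- scale(mtcars)

# (ii) Euclidean distances between every pair of rows (cars)
d <- dist(scaled, method = "euclidean")

# (iii) heatmap() needs a numeric matrix, so convert the dist object;
# symm = TRUE tells heatmap() the matrix is symmetric (and skips rescaling)
m <- as.matrix(d)
heatmap(m, symm = TRUE)
```

Calling `heatmap(scaled)` directly is also common. In that case heatmap clusters the rows and columns itself (using `dist()` and `hclust()` by default) and shows the data values rather than the pairwise distances.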

tolower() – catching errors from unmappable characters

The tolower() function throws an error when it can't map a character in the input to a valid Unicode character. This happens often when analysing social media data that contains emoticons.

Emoticons are symbols such as 😊🐶🎅 that are commonly used on mobile phones but aren't recognised on every platform.

For example, when converting tweets sent to @delta (Delta Airlines) to lower case, I got the following error:

Error in tolower(text) :
invalid input '@ActualALove: First time I've seen a foot-rest in first class! Oh @Delta, how I love thee \ud83d\ude0a✈\ud83d\udc78 http://t.co/noKI9CiM' in 'utf8towcs'

When I looked up the original tweet, the escaped sequences turned out to be emoji.

The two Unicode characters that weren't recognised, shown in the error as UTF-16 surrogate pairs, were \ud83d\ude0a (U+1F60A, SMILING FACE WITH SMILING EYES) and \ud83d\udc78 (U+1F478, PRINCESS).
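As a small worked example, the standard UTF-16 decoding formula turns each escaped pair back into its Unicode code point, which confirms the two character names above. `surrogate_to_codepoint()` is a hypothetical helper written for illustration and is not part of the original post.

```r
# Hypothetical helper: combine a UTF-16 high/low surrogate pair into a
# Unicode code point using the standard UTF-16 decoding formula
surrogate_to_codepoint <- function(high, low) {
  (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000
}

smile    <- surrogate_to_codepoint(0xD83D, 0xDE0A)
princess <- surrogate_to_codepoint(0xD83D, 0xDC78)

sprintf("U+%X", c(smile, princess))  # "U+1F60A" "U+1F478"
intToUtf8(c(smile, princess), multiple = TRUE)  # the two characters as strings
```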

Gaston Sanchez has posted a solution to this problem on his blog Data Analysis Visually Enforced. I've used the code and it works well. When I have time, I'll extend it to replace the offending characters instead of returning NA for the entire string.
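Below is a minimal sketch of the error-catching idea. It is written from the description above and is not copied from Gaston Sanchez's post; `try_tolower()` and `strip_then_lower()` are hypothetical names. The first function returns `NA` for a string that `tolower()` can't handle. The second is one crude way to attempt the extension mentioned above: it uses `iconv()` to drop every non-ASCII character before lower-casing.

```r
# Lower-case one string; return NA instead of stopping if tolower() errors
try_tolower <- function(x) {
  tryCatch(tolower(x), error = function(e) NA_character_)
}

# One possible extension (an assumption, not from the original post):
# remove characters that can't be converted to ASCII, then lower-case.
# Note this also strips accented letters such as "é".
strip_then_lower <- function(x) {
  tolower(iconv(x, from = "UTF-8", to = "ASCII", sub = ""))
}

tweets <- c("Oh @Delta, how I love thee", "First time in First Class!")
vapply(tweets, try_tolower, character(1), USE.NAMES = FALSE)
strip_then_lower("Caf\u00e9 \u2708 OK")  # "caf  ok"
```

Applying `try_tolower()` element by element with `vapply()` means a single bad tweet only turns its own entry into `NA`. Calling it on the whole vector at once would return a single `NA` if any element failed.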

100 most read R posts in 2012

R-bloggers is a go-to source for R news and tutorials, aggregating posts from 425 R bloggers with daily updates.

I use it when I'm looking for help on a particular subject and also to see what cool things people are doing in the R community.

A great way to start is to check out the recent post "100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages".