100 most read R posts in 2012

R-bloggers is the source for R news and tutorials. Posts are aggregated from 425 R bloggers with daily updates.

I use it when I'm looking for help on a particular subject and also to see what cool things people are doing in the R community.

A great way to start is to check out the recent post 100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages.

Joining 2 R data sets with different column names

Joining or merging two data sets is one of the most common tasks in preparing and analysing data. In fact, a Google search returns 253 million results.

However, most examples assume that the columns you want to merge by have the same name in both data sets, which is often not the case. For example:

 

  mergedData <- merge(a, b, by = "ID")

 

Often you have data sets from different sources that do not share a naming convention. While it’s straightforward to merge on differently named columns, most Googled examples either don’t cover it explicitly or suggest renaming your columns so that they match!

Merge using the by.x and by.y arguments to specify the names of the columns to join by.

 

  mergedData <- merge(a, b, by.x = "colNameA", by.y = "colNameB")

 

where colNameA and colNameB are the column names in a and b to merge on.
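To see the by.x / by.y call end to end, here is a minimal sketch using two small made-up data frames. The data and the valueA / valueB columns are assumptions for illustration only; a, b, colNameA and colNameB follow the names used above.

```r
# Hypothetical example data: the key column has a different name in each data set.
a <- data.frame(colNameA = c(1, 2, 3),
                valueA   = c("x", "y", "z"))
b <- data.frame(colNameB = c(2, 3, 4),
                valueB   = c(TRUE, FALSE, TRUE))

# Inner join (merge's default): keeps only the keys found in both data sets.
mergedData <- merge(a, b, by.x = "colNameA", by.y = "colNameB")

# The key column in the result takes its name from by.x.
print(names(mergedData))
print(mergedData)

# Left join: keep every row of a; columns from b are NA where there is no match.
leftJoined <- merge(a, b, by.x = "colNameA", by.y = "colNameB", all.x = TRUE)
print(leftJoined)
```

Note that the merged result keeps the key column under its name from a (colNameA), and that all.x, all.y or all can be added when unmatched rows should be kept rather than dropped.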

Whenever I’m stuck

The first place I go whenever I’m stuck doing something that should be simple is Quick-R. Rob Kabacoff has done a great job of maintaining this site for the past 5 years.

It’s well structured and gives me enough information that I can use and experiment with.


minimalR

“Simplicity is about subtracting the obvious and adding the meaningful.”  – John Maeda

R is a simple and elegant language, but I’ve always struggled to use it simply and elegantly. Why? Two main reasons.

  1. The power and flexibility of R can make difficult things easy, but also easy things hard, and by hard I mean complex, counterintuitive and difficult.
  2. Documentation is dense, thorough and complete, but usually unusable. Most documentation reflects the software’s statistical analysis roots and makes minimal reference to the data preparation and transformation that usually take up to 80% of the effort in real-life modelling exercises.

Together, this makes programming in R an exercise in Googling and looking for help on what should be straightforward. The purpose of these posts is to document and share what I’ve learnt in trying to use R in real exercises.