If I have a database and need to run some ad-hoc, experimental queries interactively, I turn to Python's SqlSoup. But if it's CSV, I used to just do some file handling in Python: load it up and use lists (which are very cool) or some other straightforward data processing on it. Python's good at that. Then I heard about R. I don't have a lot of time to dig into the details at the moment, but the link below really opened my eyes to what it can do, and how easily at that.
The following page is a quick guide for using R to do most statistics necessary in an introductory statistics class.
Like loading in a table as a dataset:
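The guide's snippet is along these lines (paraphrasing from memory; the variable name is my own):

```r
# read a CSV into a data frame; file.choose() pops up a file-picker dialog,
# or you can paste a literal path string in its place
mydata <- read.csv(file.choose(), header = TRUE)
```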
I mean seriously! While it didn't pop up a dialog in iTerm (using the latest R for OS X), I just pasted in the path to the file and my data was there as a data frame, ready for manipulation. Some searches on filtering led to this post showing:
> tt <- matrix(1:20, ncol = 4)
> tt
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> tt[tt[,1] < 3, ]
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
and I tried
tt[tt[,1]>3 & tt[,3]<15]
based on a conditional seen here just to see if I understood it and got
 4 9 14 19
which is what I expected, but I hadn't realized how easy it would be. Nice! (Strictly speaking, row selection takes a trailing comma, as in tt[tt[,1]>3 & tt[,3]<15, ]; without it, R does element-wise selection on the flattened matrix, which happens to pull out the same four values here.) The subset command is more explicit, though, with its named columns: on a data frame this would read subset(as.data.frame(tt), V1 > 3 & V3 < 15), since as.data.frame names the columns V1 through V4. Either way, wow, that's really nice to work with. Of course, there is also RPy:
RPy is a very simple, yet robust, Python interface to the R Programming Language. It can manage all kinds of R objects and can execute arbitrary R functions (including the graphic functions). All errors from the R language are converted to Python exceptions. Any module installed for the R system can be used from within Python.
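A rough sketch of what that looks like, assuming the classic RPy 1.x interface as I remember it (rpy2 is the modern successor; this needs a local R installation to actually run):

```python
# the r object is a proxy for a running R interpreter:
# attribute access on it calls the R function of the same name
from rpy import r

data = [1.2, 3.4, 2.2, 5.0]
# call R's mean() on a plain Python list; RPy converts
# the arguments and the result between the two languages
print(r.mean(data))
```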
Best of both worlds? 🙂