R without fear: our interview with Dr. Ken Aho

Posted on 18 May, 2017 by Carlo Pecoraro

At the R Project website, R is defined as a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

There is no doubt about the power of R for statistical computing and data visualization. However, many biologists have found their first encounter with R to be discouraging and frustrating.

We discussed this topic with our instructor Dr. Ken Aho. Ken is an associate professor at Idaho State University and author of the book “Foundational and Applied Statistics for Biologists Using R.”

1) Ken, how was your first encounter with R, and when did you start using it?

My first encounter with an R-like environment was in a graduate statistics course that used S-PLUS. Like many others unused to a command-line format, I found this experience to be frustrating and daunting, although no more so than typical SAS operations.

I found S functions to be efficient and impressive even during my initial use of the software. However, I was left with a feeling that the program was somewhat mysterious and idiosyncratic, and probably “not for me.”

I started using R intensively several years later (around 2003), when I began to apply computationally intensive approaches, for instance matrix randomization and multivariate analysis, to my own data. It became clear that R could usually perform these operations faster than commercial software, and with much greater flexibility. I have used R more or less exclusively for statistics, data management, and graphics ever since.

2) What are your main suggestions for a biologist who wants to start using R without wasting time and energy?

My initial anxieties with R came from applying functions merely to get answers, without adequately understanding basic language operations and environmental constraints. Thus, I recommend that aspiring R users familiarize themselves with fundamental concepts before embarking on more advanced user-defined applications in R (for instance, looping or calling other languages), or even before using complex existing functions.

3) What were the main reasons that led you to write the book “Foundational and Applied Statistics for Biologists Using R”?

I came into my first faculty appointment in 2011 with a strong interest in statistics and a graduate degree in biostatistics. Given this, I was expected to be able to consult with students and faculty regarding analyses and experimental design and to teach statistics.

At the same time, I felt that R had been overlooked as an incredible environment for pedagogy, and that applied documentation of R functions for analyses important to biologists was often missing or not organized into a central reference. I wrote the book to be both an R-centered teaching resource for graduate and undergraduate statistics classes and a statistical resource and reference for biologists using R.

4) Could you tell us about your R package asbio and which analyses we can perform using it?

The asbio package has undergone a number of evolutions. I originally intended it to be a source for statistical functions with a biological/ecological bent, particularly multivariate analyses. However, while writing my book, asbio became a warehouse for many pedagogic functions, particularly graphical user interfaces for animations and demonstrations. As a result, the package currently serves two purposes: one, it provides companion software for my statistics textbook, and two, it is a collection of applied biostatistical functions, some of which are based on my own statistical and ecological research.

Applied functions in asbio include: a wide range of plot and print methods for pairwise comparisons that control for family-wise type I error following ANOVA-type tests; interval estimators for the true ratio of binomial proportions (i.e., relative risk) and for the true product of binomial proportions, including new methods; recently developed robust analogues to multiway ANOVAs; and multivariate community ecology functions.

Thanks, Ken, and see you in Berlin!

https://www.physalia-courses.org/courses/course13/