Taking a Ride on the Wild Function – Introducing the dostats package

February 21st, 2012 by andrew

Lately I have been rather productive in my programming and frustrated at the same time. Trying to create a demographics summary table proved to be a lesson in frustration with R. Since I love R, this was disheartening. I did eventually find the reporttools package, which does make a great LaTeX table, but only in LaTeX. The tables package also looks great, but it was not entirely what I was looking for either, so I did the first logical thing for an R user faced with this sort of thing: I created a package to fill in the missing functionality.

The dostats package/function

The new package is dostats. The package serves two purposes.

  1. Create summaries of vectors through the dostats function.
  2. Manipulate functions.

The package started out with the dostats function for creating more informative summary tables. It works much like tabular from the tables package, but it is designed to work with plyr functions. The idea is to pass in a vector as the first argument; the remaining arguments are functions that compute statistics on that vector. For example:

library(dostats)
set.seed(20120220)
dostats(rnorm(100), mean, sd, N = length)
##     mean     sd   N
## 1 0.0775 0.8975 100
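
The core idea can be sketched in base R. This is only an illustration under stated assumptions, not how dostats is implemented: stats_row is a hypothetical helper, and unlike dostats it requires every function to be passed with a name.

```r
# Hypothetical helper for illustration only; not part of the dostats package.
stats_row <- function(x, ...) {
  funs <- list(...)  # named functions, e.g. mean = mean
  # Apply each function to x and bind the results into a one-row data frame.
  as.data.frame(lapply(funs, function(f) f(x)))
}

res <- stats_row(c(2, 4, 6, 8, 10), mean = mean, sd = sd, N = length)
res
```

Each argument name becomes a column name, which is the same role the N = length form plays in the dostats call.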

A renaming construct is also built in: a named argument such as N = length sets the name of the resulting column. This is convenient because it makes it easy to pass dostats as an argument to ldply, such as

library(plyr)
ldply(mtcars, dostats, mean, sd, IQR)
##     .id     mean       sd     IQR
## 1   mpg  20.0906   6.0269   7.375
## 2   cyl   6.1875   1.7859   4.000
## 3  disp 230.7219 123.9387 205.175
## 4    hp 146.6875  68.5629  83.500
## 5  drat   3.5966   0.5347   0.840
## 6    wt   3.2172   0.9785   1.029
## 7  qsec  17.8487   1.7869   2.008
## 8    vs   0.4375   0.5040   1.000
## 9    am   0.4062   0.4990   1.000
## 10 gear   3.6875   0.7378   1.000
## 11 carb   2.8125   1.6152   2.000

This makes for a more logical summary data.frame object that has usable columns, each with a single data type. Unfortunately this does not work for every data set. The above example only has numerical data. In a data frame that also has categorical data, the categorical columns would be passed to the same functions as the numeric ones. Another limitation is that the results of each function must have the same dimension for each variable. For this reason I introduced functions that filter by the variable class.

  • class.stats creates a dostats function for a given class, tested by inherits.
  • integer.stats predefined class stats for integer variables. This is defined as class.stats('integer')
  • numeric.stats for numeric variables, which would also include integer variables.
  • factor.stats for factors.

When a class.stats function is passed to ldply, variables not matching that class are silently removed.

ldply(iris, numeric.stats, mean, sd)
##            .id  mean     sd
## 1 Sepal.Length 5.843 0.8281
## 2  Sepal.Width 3.057 0.4359
## 3 Petal.Length 3.758 1.7653
## 4  Petal.Width 1.199 0.7622
ldply(iris, factor.stats, N = length)
##       .id   N
## 1 Species 150
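
For comparison, the same kind of class filtering can be approximated in base R. This is a rough sketch that assumes is.numeric is an acceptable stand-in for the class test; it is not the class.stats implementation, and it returns a matrix rather than a data frame.

```r
# Keep only the numeric columns of iris (Species, a factor, is dropped).
numeric_cols <- Filter(is.numeric, iris)

# One row per remaining variable, one column per statistic.
summary_tab <- t(sapply(numeric_cols, function(x) c(mean = mean(x), sd = sd(x))))
summary_tab
```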

You can also chain together arguments to compute on subsets using ddply and ldply.

ddply(iris, .(Species), ldply, numeric.stats,
    mean, median, sd)
##       Species          .id  mean median     sd
## 1      setosa Sepal.Length 5.006   5.00 0.3525
## 2      setosa  Sepal.Width 3.428   3.40 0.3791
## 3      setosa Petal.Length 1.462   1.50 0.1737
## 4      setosa  Petal.Width 0.246   0.20 0.1054
## 5  versicolor Sepal.Length 5.936   5.90 0.5162
## 6  versicolor  Sepal.Width 2.770   2.80 0.3138
## 7  versicolor Petal.Length 4.260   4.35 0.4699
## 8  versicolor  Petal.Width 1.326   1.30 0.1978
## 9   virginica Sepal.Length 6.588   6.50 0.6359
## 10  virginica  Sepal.Width 2.974   3.00 0.3225
## 11  virginica Petal.Length 5.552   5.55 0.5519
## 12  virginica  Petal.Width 2.026   2.00 0.2747

Function manipulations

Passing all these functions around also requires some extra function manipulation functions. Now that is a mouthful, but something we do with R.

Composition

R lacks a function composition function, so I created one. function(x)any(is.na(x)) is just too long to type, and I find myself doing things like this far too often. The word “function” alone takes up lots of space. It is much easier to write any%.%is.na or compose(any, is.na), either of which returns a new function that tests whether there are any missing values. The two forms are

  1. compose(...)
  2. fun1%.%fun2

compose takes any number of arguments and nests them, with the rightmost being the innermost and the leftmost being the outermost. The easy way to remember this is that they read in the same order as they would when written as nested calls.
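
To make the nesting order concrete, here is a minimal base R sketch of composition. compose2 is a hypothetical name used only for illustration; it is not the dostats source and handles only single-argument calls.

```r
# Hypothetical stand-in for illustration; not the dostats implementation.
compose2 <- function(...) {
  fs <- list(...)
  function(x) {
    # Apply the rightmost function first, then work outward to the left.
    for (f in rev(fs)) x <- f(x)
    x
  }
}

any_missing <- compose2(any, is.na)  # same as function(x) any(is.na(x))
any_missing(c(1, NA, 3))
any_missing(c(1, 2, 3))
```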

Argument Manipulations

Composition and dostats only operate on the first argument, which necessitates functions for manipulating arguments.

  1. wargs: creates a new function with changed defaults. For example, wargs(mean, na.rm=TRUE) creates a new function that automatically removes missing values.
  2. onarg: specifies which argument is treated as the first argument of the function. For example, onarg(rep, 'times') makes times, the number of repetitions, the first argument.

One example of this that is included in dostats is contains, along with its operator form %contains%, which is %in% with the argument order reversed.
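
The two ideas can be sketched in base R as follows. The names with_args and %holds% are hypothetical and chosen so they do not clash with the package; the sketch only shows the general technique and makes no claim about how wargs or %contains% are written.

```r
# Hypothetical helper: fix some arguments of f ahead of time.
with_args <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(list(...), fixed))
}

mean_rm <- with_args(mean, na.rm = TRUE)
mean_rm(c(1, NA, 3))

# Hypothetical operator: %in% with the argument order reversed.
`%holds%` <- function(set, values) values %in% set
c("a", "b", "c") %holds% "b"
```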

Conclusion

There will likely be more functions as I come across the need for them. If you have an idea for something that should be included, submit it to the issue tracker.

Permanently Setting the CRAN repository

November 29th, 2011 by andrew

Setting the CRAN repository so that it does not ask every time you try to install a package  is something that I think few people bother to do, but it is so simple and can save a fair bit of frustration when working.  This is accomplished through a setting in one of the Rprofile files.  There is the site file found at either

/etc/R/Rprofile.site

on Linux or

C:\Program Files\R\R-2.14.0\etc\Rprofile.site

on Windows, for R-2.14.0.  In this file you will even find an example of setting the CRAN mirror.  You can edit it there if you have root or administrative privileges, but more likely you will copy the example and place it in the personal

.Rprofile

file in your home directory.

The command itself is simple; this is copied straight from the Rprofile.site file.

local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.cnr.berkeley.edu/"
  options(repos = r)
})

The local call keeps the temporary variable r from being left behind in the workspace of every session.  You would replace “http://cran.cnr.berkeley.edu/” with the best mirror for your location.  The list of CRAN mirrors is available on the main CRAN website.
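
A quick way to check the effect is to run the same block in a session and then inspect the option. This is a small sketch; the exists() result assumes nothing else in the session has defined a variable named r.

```r
local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.cnr.berkeley.edu/"
  options(repos = r)
})

# The option is set for the session ...
getOption("repos")[["CRAN"]]

# ... but the temporary variable did not leak out of local().
exists("r", envir = globalenv(), inherits = FALSE)
```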

Data manipulations

August 23rd, 2011 by andrew

In the last Utah R Users group meeting I gave a presentation on data manipulation in R, and today I found through the plyr mailing list two commands that I was previously unaware of and that definitely deserve a mention.

arrange

I was very pleased to find arrange because it fills the nagging hole for sorting data frames.  Calling

arrange(df, var1, var2)

is much better than calling

df[order(df$var1, df$var2),]

because it’s understandable by practically anyone, and when your code is understandable there is less chance of mistakes.
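
As a small check that the two calls sort the same way, here is a sketch on a made-up data frame (the values are invented for illustration). The plyr comparison only runs if plyr is installed.

```r
# Made-up data for illustration.
df <- data.frame(var1 = c(2, 1, 2, 1), var2 = c("b", "d", "a", "c"))

# Base R: sort by var1, then by var2 within ties.
by_order <- df[order(df$var1, df$var2), ]
by_order

if (requireNamespace("plyr", quietly = TRUE)) {
  by_arrange <- plyr::arrange(df, var1, var2)
  # Same row order; arrange also resets the row names.
  stopifnot(identical(as.character(by_arrange$var2),
                      as.character(by_order$var2)))
}
```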

mutate

mutate is not that different from transform, but I have to make the confession that when I was setting things up for my presentation I tried to see if transform could do the things that mutate does.  mutate can include previously defined variables in later defined variables.  I quote from the mutate help file,

# Things transform can't do
mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

Notice that Temp is first defined and then used.  Usually when I need to do something like that I resort to using within, but hopefully I will have to do that less now.
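
The difference is easy to see in base R with the same airquality columns. This sketch contrasts transform with within; it does not use plyr.

```r
# transform evaluates every expression against the original data,
# so OzT is computed from the original Fahrenheit Temp.
tr <- transform(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# within runs the statements in order, so OzT sees the converted Temp.
wi <- within(airquality, {
  Temp <- (Temp - 32) / 1.8
  OzT  <- Ozone / Temp
})

head(tr$OzT)
head(wi$OzT)
```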

NppToR 2.6.0 beta 2

July 29th, 2011 by andrew

http://sourceforge.net/projects/npptor/files/npptor%20installer/NppToR-2.6.0.beta2.exe/download

I’ve released beta 2 of NppToR 2.6.0.  Please take a look and report any problems.  This release improves the installer and the uninstaller and fixes a few bugs that popped up from the transition to Unicode.

Looking for NppToR beta testers.

July 20th, 2011 by andrew

I’m very happy to announce that there are great changes coming to NppToR.  There are a few changes under the hood but the biggest change is the announcement of a new feature I call quick keys.

Quick keys let you define your own commands for easy evaluation in R.  For example, the help command has been moved to the quick keys.  It is still Control+F1 (^F1), but it is now defined in the quickkeys.txt file; don’t worry, there is a menu shortcut to edit the quick keys.  It is “^F1=$word$” on its own line.  The $word$ section is then replaced with the current word as defined by Notepad++.  Unfortunately the period is a word separator in Notepad++, but $word$ will also capture any selection, with the selection being given priority.
The $word$ variable is the only variable available for beta testing, but I plan on adding several more: $line$, $file$, $directory$.  If there are any others you want to see, let me know, and we can see if they can be implemented.
Under the hood, NppToR is converting to Unicode AutoHotkey.  This makes the quick keys much easier to implement, but may also have an impact on some other areas.  For this reason, only interaction with the Unicode version of Notepad++ will be officially supported from now on.  It will remain a native x86 application, and there will not be a 64-bit version for now.
I’m asking for beta testers to look at the beta of NppToR 2.6.0 and let me know what problems they find.  Due to some of the changes under the hood, there is a noticeable increase in performance, and everyone should be happy about that.  The changes under the hood could be bigger than I anticipate and cause some problems.  All I am really asking is for people to use it and report back whether or not they have problems.  If you test the beta, please post your results in the forum.

Post-hoc Pairwise Comparisons of Two-way ANOVA

February 4th, 2011 by andrew

I read this post today by John Quick. I was a little taken aback when he used a pairwise t-test for the post hoc analysis. Contradictorily, the t-test did not show differences in the treatment means when the ANOVA model did. This is because pairwise.t.test does not take the two-way ANOVA into account; it only looks at the treatment groups marginally, ignoring Age, and so gives erroneous results. The more appropriate analysis is TukeyHSD applied to the fitted model.

> model1<-aov(StressReduction~Treatment+Age, data )
> TukeyHSD(model1, "Treatment")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = StressReduction ~ Treatment + Age, data = data)

$Treatment
                 diff         lwr        upr     p adj
mental-medical      2  0.92885267 3.07114733 0.0003172
physical-medical    1 -0.07114733 2.07114733 0.0702309
physical-mental    -1 -2.07114733 0.07114733 0.0702309

Here I already had the data read in as data, then I fit the model and applied a post-hoc pairwise test. This showed that the mental and medical treatments differ, but no other pair of treatments does at the 95% family-wise level. This is shown by the plot of the data.
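
Since the data from that post is not reproduced here, the same workflow can be tried on warpbreaks, a data set that ships with R. This is a sketch on a different data set, so its numbers have nothing to do with the stress reduction results above.

```r
# Two-way additive model on a built-in data set (not the data from the post).
fit <- aov(breaks ~ wool + tension, data = warpbreaks)

# Post-hoc comparisons for tension that account for the fitted model.
tukey <- TukeyHSD(fit, "tension")
tukey

# Marginal comparison that ignores wool, shown for contrast.
pairwise.t.test(warpbreaks$breaks, warpbreaks$tension)
```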

NppToR 2.5.2 Improves startup

December 21st, 2010 by Admin

I’ve been getting lots of feedback that there are problems starting NppToR with some of the latest versions. I took to the task of looking at that yesterday on the train home. I have made improvements to the way NppToR finds the RHome directory, so that it no longer relies entirely on the Windows registry. I also removed the error that pops up when R cannot be found. That should not have happened, since NppToR only needs to know the RHome directory to spawn new R processes, which is not critical to passing commands. A warning is now issued instead of the error that terminates NppToR.

The new files can be downloaded from sourceforge http://npptor.sf.net

Slides from first Utah.edu & R.P. RUG meeting

November 23rd, 2010 by Admin

Here are the slides from the first University of Utah and Research Park R Users Group meeting. They discuss getting help and finding packages.
R<-0:60 <- Click Here.

New R Users Group for University of Utah and Research Park

November 13th, 2010 by Admin

I’m organizing a new R Users Group for the University of Utah and Research Park, sponsored by the Study Design and Biostatistics Center. All are welcome, at every skill level. The first meeting will be dedicated to finding out what users’ needs and abilities are. I will also give a short presentation about some of the basics of R.

We will be meeting Tuesday, November 16th, from 12-1 PM in the Williams Building, Room 223, 295 Chipeta Way, Salt Lake City, UT 84132.

Please email me at Andrew.Redd at hsc.utah.edu for more information.

NppToR 2.5.0

October 19th, 2010 by Admin

NppToR 2.5.0 is finally here. There were some major changes in the functionality. The first, and the biggest for most people, is that NppToR 2.5 supports R-2.12.0, which was just released. The new dual file structure for R breaks the spawning procedure in previous versions of NppToR.

Another big change is the handling of settings. NppToR previously masked the settings in a user profile, but now NppToR also reads those files in and does nothing to mask the console settings besides enforcing Single Document Interface (SDI) mode.

One big change is that the syntax highlighter has been removed; this was partially by choice, partially not. The tools I was using with Ruby no longer work with the newest versions, so failure to compile was one issue. When faced with updating the code base, I found that I personally rarely use the custom syntax and usually default to the built-in language, which has improved since it was first introduced. These reasons, and the lack of time, convinced me that it is time to retire that portion.

This does involve some changes in the code base, so it is quite likely that bugs have been introduced. I am always happy to hear that people are finding bugs (that means people are using it), so please report them and I can fix them before they prevent someone else from adopting it. The place to report bugs is in the forums.