RStudio internship week 2

I think I can make it through the summer

By Daniel Chen

June 18, 2019

The main topics and events of last week were:

  1. Much git.
  2. Metaprogramming and non-standard evaluation (NSE) in R
  3. Four 1-hour workshops by Alison Hill on the summer-of-blogdown
    • moving things over from Jekyll will take some time

So many of the random things I’ve tinkered with in the past have come front and center. As an educator, I know that seeing these things again makes learning and understanding them easier. You build on your previous knowledge to solidify, fix, and fill in gaps in your mental model. The process repeats until you understand the topic.

For me, I’m getting a better foundation in how NSE works and how it all fits together within the Tidyverse.

Git

I got some things merged!

The pull request that was merged in a broken state on my first day was finally fixed and merged. I also got to work with the lintr package and merged that work into grader too.

This week was probably the first time I’ve used git commit --amend in a long time (if ever?). I’ve typically just made the commit and then run git rebase -i to squash and/or amend my commits. I can see why a common operation like changing the previous commit would have a shortcut. I typically don’t use these features because it’s another thing to teach, and understanding git rebase -i is more general than git commit --amend.

What --amend allows you to do is replace the previous git commit with another one. You can fix the commit message, or add or fix a file you missed. These are all ways to make the commit history cleaner.

The recipe looks like this:

git add <file>
git commit --amend

<Fix/modify the commit message>

git push -f origin master

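The last line force-pushes with -f because amending creates a new commit with a different hash from the one you started with. If the original commit was already pushed, the remote branch has to be overwritten, so be careful doing this on a branch other people are using.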

R package versions

There’s a version numbering convention of adding a fourth component of .9000 or higher after the patch (e.g., v0.1.4.9000) to mark a development version. You can couple this with the DESCRIPTION file by requiring a minimum development version of a dependency, which makes sure you and the team all have access to the same development features.

I came across this in grader, which has an Imports entry for learnr (>= 0.9.2.9001).
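Here is a small sketch of why that requirement works, using base R’s package_version() to compare version strings. The variable names are just for illustration; the version numbers are the ones mentioned above.

```r
# Version strings compare component by component, so a development
# version sorts after the release it builds on and before the next one.
dev_version     <- package_version("0.1.4.9000")
release_version <- package_version("0.1.4")
next_release    <- package_version("0.1.5")

dev_version > release_version   # TRUE
dev_version < next_release      # TRUE

# The requirement from grader's DESCRIPTION: learnr (>= 0.9.2.9001).
# A plain 0.9.2 release would not satisfy it.
required <- package_version("0.9.2.9001")
satisfied_by_release <- package_version("0.9.2") >= required       # FALSE
satisfied_by_dev     <- package_version("0.9.2.9001") >= required  # TRUE
```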

grader progress

We’re probably going to change the name of the package to gradethis because the package gradeR (note capital R) was submitted to CRAN right as I started.

Here’s what I learned about the library I’m working on this week:

  • learnr sets the checker function
    • In the knitr chunk, the exercise.checker option specifies the checker function
      • tutorial_options(exercise.timelimit = 60, exercise.checker = grade_learnr)
    • In this example, grade_learnr is the main entrypoint from learnr to grader and my work starts with this function.
  • checker is called on line 129 in exercise.R in learnr
    • the checker function (i.e., grade_learnr) returns a value depending on what is passed into it
      • if missing: returns a list() with message, correct, type, and location keys
      • if error: graded object (named checked_result of class grader_graded) with correct and message
      • or evaluated code
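To make the return value concrete, here is a minimal sketch of a checker that returns the list shape described above. This is not grader’s grade_learnr: my_checker, its argument names, the string-comparison grading rule, and the "success", "error", and "append" values are all assumptions for illustration.

```r
# Hypothetical checker: NOT grader's grade_learnr, only the return shape.
# Argument names are assumed; `...` absorbs anything else that is passed in.
my_checker <- function(label = NULL, user_code = NULL,
                       solution_code = NULL, ...) {
  # Nothing submitted: report that instead of grading
  if (is.null(user_code) || !nzchar(trimws(user_code))) {
    return(list(
      message  = "I didn't receive your code. Did you write any?",
      correct  = FALSE,
      type     = "error",
      location = "append"
    ))
  }

  # Naive grading rule (assumption): submission text equals solution text
  correct <- identical(trimws(user_code), trimws(solution_code))

  list(
    message  = if (correct) "Nice work!" else "Not quite, try again.",
    correct  = correct,
    type     = if (correct) "success" else "error",
    location = "append"
  )
}

graded  <- my_checker(user_code = "1 + 1 ", solution_code = "1 + 1")
missing <- my_checker(user_code = "", solution_code = "1 + 1")
```

In a tutorial it would be plugged in the same way as above, e.g. tutorial_options(exercise.checker = my_checker).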

There’s a bunch of stuff within exercise.R in learnr that captures information from shiny, sets up the knitr environment, and inserts the output and results into the correct place in the DOM. That’ll be a separate writeup when I leave the grader world.

For the next week or so the goal is to update the check_result API, which got me down the rabbit hole of non-standard evaluation in R (I’ll talk about it in a separate set of posts).

Non-standard Evaluation (NSE)

I gave a talk about writing functions in R which touched on NSE but it was pretty superficial. Since NSE is so crucial to grader I’ll write a series of posts about this topic and eventually turn it into a talk.

In the meantime, here are the materials (in no particular order) I’ll be reading:

Misc

Other random things I’ve discovered this week

dplyr::count vs dplyr::tally

From the docs:

tally() is a convenient wrapper for summarise that will either call n() or sum(n) depending on whether you’re tallying for the first time, or re-tallying. count() is similar but calls group_by() before and ungroup() after. If the data is already grouped, count() adds an additional group that is removed afterwards.

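dplyr::count can also report groups with 0 observations (useful for group_by operations): with .drop = FALSE, factor levels that never appear in the data are kept with a count of 0 instead of being dropped.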
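A small sketch of the difference, assuming a dplyr version that has the .drop argument (0.8.0 or later). The toy data frame and its column g are made up for illustration.

```r
library(dplyr)

# Toy data: level "c" is a valid factor level but never appears in the data
df <- tibble(g = factor(c("a", "a", "b"), levels = c("a", "b", "c")))

# count() groups, tallies, and ungroups in one step; empty levels are dropped
default_counts <- df %>% count(g)

# The long way around with group_by() + tally() gives the same counts
tallied <- df %>% group_by(g) %>% tally()

# .drop = FALSE keeps the empty level "c" with n = 0
with_zero <- df %>% count(g, .drop = FALSE)
```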

pryr::standardise_call

Manipulating a function call is black magic NSE voodoo. This is what happens within grader: it takes the student code, the solution code, and the learnr and grader arguments that are all passed into grade_learnr.

# code below was copy/pasted from the console

my_add <- function(x, y) {
  x + y
}

# pass in part of an expression
call <- pryr::standardise_call(quote(my_add(x = 3)))

# on the fly add more parameters!
call$y <- 10 

# evaluate the thing
eval(call) 

## [1] 13

This all uses the global environment, but grader will be doing this type of thing with separate environments for each exercise that will be checked.

Also, this is essentially what match.call does in base R.
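Here is a sketch of the same trick with base R’s match.call(), evaluated in its own environment instead of the global one. The names exercise_env and bonus are hypothetical, and this is only my guess at the general pattern, not how grader actually does it.

```r
my_add <- function(x, y) {
  x + y
}

# match.call() matches the arguments of an unevaluated call by name
std_call <- match.call(my_add, quote(my_add(3)))
# std_call is now the call my_add(x = 3)

# Add another argument on the fly; it refers to a variable by name
std_call$y <- quote(bonus)

# A separate environment (hypothetical stand-in for one exercise)
exercise_env <- new.env()
assign("bonus", 10, envir = exercise_env)

# `bonus` is looked up in exercise_env, not the global environment
result <- eval(std_call, envir = exercise_env)
result
```

Evaluating the call in a different environment with a different bonus would give a different answer from the same call object.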

checkmate package

There is a package called checkmate that is unit testing (e.g., testthat) on steroids. It allows you to do more specific type and argument checking in R. I haven’t worked with the package personally yet, but it does seem to be like type hints in Python, and it allows more specific checks on what objects in R contain.
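From reading the docs, argument checks look roughly like the sketch below. The function scale_scores and its arguments are made up for illustration; the assert_*, test_*, and check_* functions come from checkmate.

```r
library(checkmate)

# Hypothetical function that validates its arguments up front
scale_scores <- function(scores, max_score, label = "exercise") {
  # Non-empty numeric vector, no NAs, no negative values
  assert_numeric(scores, lower = 0, any.missing = FALSE, min.len = 1)
  # A single number of at least 1
  assert_number(max_score, lower = 1)
  # A single string
  assert_string(label)

  scores / max_score
}

scaled <- scale_scores(c(5, 10), max_score = 10)

# test_*() returns TRUE/FALSE instead of throwing an error
is_num <- test_numeric("ten")

# check_*() returns TRUE, or a string describing what went wrong
problem <- check_numeric("ten")
```

Calling scale_scores("a", 10) stops with an error that names the offending argument, which is the part that reminds me of type hints.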

Credentials

One of the coolest things about being an intern at RStudio is being on the Slack channel! I try to keep my questions reserved, but one of the things that has always bothered me is how to store and access credentials in R. Putting API keys in .Renviron is common practice, but I piggy-backed on another intern’s question by asking about storing passwords more securely than in a plain text file.

I’ve used the rstudioapi, secret, and getPass libraries before, but as Raymond Hettinger always says: There must be a better way.

The resource I was given was to look at how database credentials are stored: https://db.rstudio.com/best-practices/managing-credentials/
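For reference, the plain .Renviron approach I was asking about boils down to reading an environment variable. Here is a minimal sketch in base R; get_secret and DEMO_API_KEY are hypothetical names, and the Sys.setenv() call only stands in for a line in .Renviron so the example runs on its own.

```r
# Hypothetical helper: read a secret from an environment variable and
# fail loudly if it is not set, so the value is never hard-coded.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Stand-in for a line like DEMO_API_KEY=... in .Renviron (demo only)
Sys.setenv(DEMO_API_KEY = "not-a-real-key")

api_key <- get_secret("DEMO_API_KEY")
```

The value still lives in a plain text file, which is exactly why I was asking about something more secure.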

Summer of blogdown

Lastly, the great Alison Hill hosted a series of blogdown workshops for people who were interested, summer-of-blogdown. It ran for a total of 4 days, and we covered the basics of blogdown, how to pick a theme (we used Academic), deploying the site on Netlify, and the best ways to maintain it.

I didn’t realize how amazingly flexible the academic theme was until this workshop. I’ll be sure to move my own website over to blogdown + academic one of these days.

I’m currently trying to find out how to save URLs in a common location so they can be maintained in one place and used in links throughout the site. The search for how to use variables in an md document is ongoing (tl;dr: you can’t, but you might still be able to do what I want): https://discourse.gohugo.io/t/variables-in-markdown/7113/12. It almost seemed that site variables were going to be the way to go, but that turned out to be a dead end.

What I’m most excited about is the ability to write posts in Rmd and jupyter notebooks for R and Python posts.

Things I’ve learned over the 4 days:

  • Each folder in content is a “section” and each “section” has a “page”
  • /content/home/ contains widgets
  • Learn x in y minutes website for TOML
  • From Greg Wilson: Put a LICENSE and CITATION + orcid on your website. Make the librarians happy.