Case closed: Using assignment in R | Andreï V. Kostyrka’s web site

Upon reading a very heavily opinionated article on the use of forward pipes, I could barely contain my righteous indignation (righteous as long as my hatred for the tidyverse and its inflexible bloated commands does not subside). That’s a topic for another post (ggplot2 with smoothing? hiding all hyper-parameters under the hood?) in the vein of ‘simple black box with one “Compute” button versus understanding and being able to implement any step of the computation’, but for now, let’s make an equally heavily opinionated counterargument to John Mount’s Win-Vector blog post.

The section that bugs me is ‘Assignment in R’. Let’s not forget that the entire point of the post is defending the right of forward assignment to exist within the context of %>%-like pipes from magrittr, which is itself a package to be avoided: it turns one’s workflow into a bloody bash pipeline that obscures dangerous intermediate type casting and is simply slower. But enough of that. I deeply respect Hadley Wickham’s professionalism and aspiration to popularise R; I completely disagree with his approach of turning R into an alien-looking language, though.

Original punctuation preserved.

R‘s preferred assignment operator is <-. This is in the popular style guides. <...> This has some advantages, and is the public style. Also “=” is much harder to use inside R’s base::quote method than <-, so there are still cases where the semantics of = and <- are different (though I think they all involve the distinction trying specify argument binding versus assignment while inside a function call’s argument list).

That basically says it. In computer science, there is assignment, there is comparison, and there is argument binding. I strongly suspect that = was kept for the purposes of assignment out of user-friendliness considerations.
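The difference between argument binding and assignment is easy to check in base R. The following is only a minimal sketch: the scratch environment `env` is introduced purely for illustration, so that the side effect can be inspected without touching the global workspace.

```r
# Scratch environment (an illustrative assumption) so side effects are easy to inspect.
env <- new.env()

# Argument binding: `x` only names the formal argument of mean(); nothing is created in env.
m_bound <- evalq(mean(x = c(1, 2, 3)), env)
created_by_binding <- exists("x", envir = env, inherits = FALSE)

# Assignment inside the call: the value is passed to mean() AND stored as x in env.
m_assigned <- evalq(mean(x <- c(1, 2, 3)), env)
created_by_assignment <- exists("x", envir = env, inherits = FALSE)

c(binding = created_by_binding, assignment = created_by_assignment)
```

Both calls return the same mean; only the second one leaves a variable behind in the calling environment.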

I have previously written that given the choice I prefer = for assignment. It has the advantages that:

• <- is has a different meaning to many readers. In R x<-3 assigns the value 3 to a variable named x, in other popular programming languages (where new R users may be coming from) x<-3 denotes comparing x to -3.

The word ‘pain’ has a different meaning to an English speaker (an unpleasant physical sensation) and a French speaker (food made of flour, water, and yeast).

First of all, this is not proper coding style. One should not use scriptio continua: this practice had died out by the XIVth century. Mathematical operators should have spaces around them, and TeX, the world’s best typesetting system, places special emphasis on the many varieties of spaces that exist in mathematical typesetting: around relational operators, binary operators, named operators, etc. For now, let’s just use simple spaces around operators to improve legibility, or at least use them to distinguish outermost and innermost operators: x + 2*y + 3*z looks beautiful and legible. x<-3 does not. x <- 3 does. x < -3 does, and has a different meaning.
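How R’s parser reads each spelling can be checked directly. This is a small sketch; `top_operator` is a hypothetical helper written for this illustration, not part of base R.

```r
# Return the name of the top-level operator that R's parser sees in a code string.
top_operator <- function(code) as.character(parse(text = code)[[1]][[1]])

top_operator("x <- 3")   # assignment
top_operator("x<-3")     # still assignment: the parser reads `<-` as one token
top_operator("x < -3")   # comparison of x with -3
top_operator("x< -3")    # also a comparison: the space splits the token
```

The unspaced form is parsed as assignment, so the ambiguity exists for the human reader, not for the interpreter.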

• = is a single character, so it can not be ruined by the insertion of a space. x< -3 does not assign the value 3 to a variable named x, it compares x to -3. I would not mind so much if x< -3 was a syntax error (as x< =3 is), but it is valid code that quietly does something very different than x<-3. If you have taught R enough you have experience helping students undo this bug.

I agree that in general, basic operations should not be ruined by spaces, but the same can be said about trigraphs in C. Even the simplest comment symbol // breaks in almost every language that uses it if a space is inserted in between (/ / ). Some symbol combinations are not supposed to be broken with spaces, and this is true for almost all programming languages.
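A quick way to see which of R’s own multi-character operators survive an inserted space is to ask the parser. This is a rough sketch; `parses` is a hypothetical helper defined only for this check.

```r
# TRUE if the string is syntactically valid R, FALSE if the parser rejects it.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

intact <- c("2 <= 3", "2 == 3", "2 != 3", "TRUE && FALSE", "x <- 3")
broken <- c("2 < = 3", "2 = = 3", "2 ! = 3", "TRUE & & FALSE", "x < - 3")

intact_ok <- vapply(intact, parses, logical(1), USE.NAMES = FALSE)
broken_ok <- vapply(broken, parses, logical(1), USE.NAMES = FALSE)

data.frame(intact, intact_ok, broken, broken_ok)
```

In this set, every split operator is rejected by the parser except `x < - 3`, which is still valid code but now means a comparison.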

• Also = can not be broken up by line-splitting.

John, your line-splitting algorithm is broken. Apply soft wrapping to the damaged area or submit a fix to the highlighter you are using.

• = is on the keyboard (as ← was when arrow like assignments were themselves introduced).

On Linux, if one enables ‘Extra typographic characters’ in X options, they will obtain the same (AltGr + 0). [cough-cough APL cough-cough] If R supported Unicode commands, I would wholeheartedly advocate the use of a single-character assignment operator. Why isn’t John complaining about the less-or-equal-to operator <= and suggesting replacing it with ‘≤’, adding the latter onto the keyboard, though? Why not map all digraphs into equivalent single characters? 2 < = 3 throws an error as well!

• = is easier to paste into HTML as it does not require escape coding such as &lt;.

So is ‘≤’, but somehow John is panning <- exclusively, not <=.

• It is the symbol used in most every other popular current programming language for assignment.

Most other popular current programming languages cannot do in one line of code the things R can. Applied statisticians should not hard-code matrix inversion in C++. There are different language types, different paradigms, and different built-ins. Once I suggested the shortest solution for the Code Golf question about implementing the Kolmogorov–Smirnov test. The ‘every other’ argument does not apply here. Most every other popular current programming language has clunkier implementations of R’s one-liners.
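As a small illustration of such a one-liner, matrix inversion in base R is a single call to solve(). The matrix below is made up purely for the example.

```r
# A small invertible matrix, chosen only for illustration.
A <- matrix(c(2, 1, 1, 3), nrow = 2)

A_inv <- solve(A)                 # matrix inverse in one call
all.equal(A %*% A_inv, diag(2))   # A times its inverse should be the identity
```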

• There is an asymmetric cost of mistakes. Typing = when you meant <- is usually harmless. Typing <- in a context where = was needed is not caught by R and fairly bad (please see here for details). So if you get out of the habit of using <- one type of bug become less likely.
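The kind of silent mistake this bullet refers to can be sketched as follows. The function `rescale` and the scratch environment `env` are hypothetical, defined only for this illustration.

```r
# Hypothetical helper, defined only for this illustration.
rescale <- function(x, centre = 0, scale = 1) (x - centre) / scale

env <- new.env()  # scratch environment so the stray variable is easy to spot

intended <- evalq(rescale(10, scale = 2), env)    # binds `scale`: (10 - 0) / 2
mistyped <- evalq(rescale(10, scale <- 2), env)   # 2 lands in `centre`: (10 - 2) / 1
stray    <- exists("scale", envir = env, inherits = FALSE)  # the typo also created `scale`

list(intended = intended, mistyped = mistyped, stray_variable = stray)
```

With `<-`, the value is matched positionally to the second argument, and R raises no error or warning.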

Know thy tools. The cost of using an electric hammer drill on wood is not the same as the cost of using an ordinary drill on hard concrete. There are cases where a single character typed by force of habit can break things. E. g., if one, being spoiled by data.table, wants to select the second row of a matrix a <- matrix(1:9, 3) and types a[2], they will get 2 instead of the 2 5 8 yielded by a[2, ] (while a[, 2] yields the second column, 4 5 6). The same goes for multi-dimensional arrays. Yet John does not complain about those kinds of mistakes.
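For reference, a minimal base R sketch of the three indexing forms on that matrix (no data.table needed):

```r
a <- matrix(1:9, 3)   # filled column by column: the columns are 1:3, 4:6, 7:9

a[2]      # single index treats the matrix as a vector: 2
a[2, ]    # second row: 2 5 8
a[, 2]    # second column: 4 5 6
```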

• There is a cognitive benefit in reducing the number of low-value distinctions you need to maintain, especially for beginners. If we think of the mind as having “seven plus or minus two” slots for current information do we really want to waste 11 to 20 percent of our students’ attention on something like this when teaching? The beginner does not need to worry over the differences between value assignment and argument binding at all times. In fact it is a useful generalization to think of argument binding as a safe transient value assignment.

Appealing to outdated psychological theories is not a good argument to begin with. I am sure John and many other people know far more than 7 programming languages and more than 7 commands in every programming language. This rule is one of those old wives’ tales that needs to die as quickly as possible (‘5-second rule’, ‘hairy palms’ etc.). Someone has already reduced programming to as few distinct commands as possible in such Turing-complete programming languages as Brainf*ck and Whitespace, but most people did not like it.

To sum it up, dear John, one should not hammer in nails with a microscope. You drew a great distinction between <-, ==, and =, and you should not cherry-pick some digraphs and argue against their use.