data.table
If you use R a lot, you will meet two kinds of people: the dplyr people and the data.table people. Both packages are used for table manipulation. They are like the ggplot2 package for graphics: once you know them, they change your R experience and you cannot go back.
dplyr
I will not say much about the dplyr package, because I do not use it. However, you need to know a few things about it. dplyr is usually used with another package named tidyr, and more recently a single package named tidyverse loads both of them along with others (ggplot2, readr, etc.).
To keep it simple, dplyr is very useful for handling objects, especially for sorting, subsetting, etc., thanks to the %>% operator (the pipe, which dplyr re-exports from the magrittr package). This operator pipes an object into the next call so that you can apply further changes to it. It is very convenient because it avoids typing the object name multiple times.
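As a minimal sketch of what the pipe saves you, the two versions below do the same thing with and without %>%. The built-in mtcars data set and the object names cars_subset and cars_piped are only illustrative choices, not taken from the original example.

```r
library(dplyr)

# Without the pipe: the object name is repeated at every step
cars_subset <- filter(mtcars, cyl == 4)
cars_subset <- select(cars_subset, mpg, cyl, wt)
cars_subset <- arrange(cars_subset, desc(mpg))

# With the pipe: each result is passed as the first argument of the next call
cars_piped <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, wt) %>%
  arrange(desc(mpg))

# Both approaches run the same operations in the same order
identical(cars_subset, cars_piped)
```

The piped version reads top to bottom as a sequence of steps, which is the main reason people like it.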
Beyond the pipe, dplyr has many very nice functions to select, sub-select, sort, and replace data in tables.
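To give an idea of these functions, here is a small sketch using the built-in iris data set (my choice for illustration; the thresholds are arbitrary). It uses select(), filter(), arrange() and mutate() from dplyr, together with the base R replace() function.

```r
library(dplyr)

flowers <- iris %>%
  select(Species, Sepal.Length, Petal.Length) %>%   # keep three columns
  filter(Sepal.Length > 5) %>%                      # sub-select rows
  arrange(Species, desc(Petal.Length)) %>%          # sort by species, then petal length
  mutate(Petal.Length = replace(Petal.Length,       # replace small values with NA
                                Petal.Length < 2, NA))

head(flowers)
```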
If you are using emacs, you can define a key binding for %>%. Here is the lisp code you will need to put in your .emacs config file.
Thus, in my example, the shortcut is Meta (Alt) - F5. Of course, you can change it to whatever you like; just be sure that your shortcut is not already in use.
You will find many sites on the internet to learn more about the dplyr package. The user community is large, and you will find many questions and answers about dplyr on Stack Overflow.
data.table
I started using data.table a couple of years ago. At first it is very hard and the syntax is counter-intuitive, but you will get used to it.
The data.table package is known to be faster than the dplyr package, but the difference in speed only becomes significant for huge data sets (millions of rows and hundreds of columns).
data.table is less verbose than dplyr but, in my opinion, more complex to write. Here are some examples (the data are here):
1548 | 6
504 | 4
And you can add a new column:
504 | 5
As you can see, there is now a new column.
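Since the original data set is not reproduced here, the following is only a sketch of the same kind of steps on the built-in iris data: check the dimensions, subset rows and columns, then add a column with :=. The column name Sepal.Ratio is hypothetical.

```r
library(data.table)

# iris is a stand-in for the original data, which is not available here
dt <- as.data.table(iris)
dim(dt)

# dt[i, j]: filter rows in i, select columns in j
setosa <- dt[Species == "setosa", .(Species, Sepal.Length, Sepal.Width)]
dim(setosa)

# := adds a column by reference, without copying the table
setosa[, Sepal.Ratio := Sepal.Length / Sepal.Width]
dim(setosa)

# Grouped summary with by
means <- dt[, .(mean_length = mean(Sepal.Length)), by = Species]
means
```

Comparing the calls to dim() before and after the := line shows the same number of rows with one extra column.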
If you are using emacs and R, I suggest adding the electric-operator package to your .emacs so that the := operator gets spaces around it. You can add this code:
Also, electric-operator will make your code easier to read by adding spaces around operators such as " == " or " + ".
Because I am used to data.table, and especially love the fread and fwrite functions to read and save tables, I did not want to switch to dplyr. However, if you are a beginner, or you want to be able to read old code easily years later, I suggest you use dplyr.
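For reference, the fread and fwrite functions mentioned above can be sketched as a simple round trip. The mtcars data, the "model" column name and the temporary file path are assumptions made for the example.

```r
library(data.table)

# Turn the row names into a regular column so they are saved too
dt <- as.data.table(mtcars, keep.rownames = "model")

# Write to a temporary CSV file (hypothetical path, replace with your own)
csv_path <- tempfile(fileext = ".csv")
fwrite(dt, csv_path)

# Read it back; fread guesses the separator and the column types
dt_back <- fread(csv_path)

dim(dt_back)
names(dt_back)
```

Note that fread guesses column types from the file, so a column of whole numbers may come back as integer even if it was stored as double before writing.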