Resources
What makes a research project reproducible is not a simple question… Nonetheless, I do believe that one should be able to reproduce one’s own analysis without pain, even in the future. This may all sound obvious, but it is not so easy to achieve in practice!
Here are some tips for students, based on my iterations toward more reproducible practices:
- Start from the beginning: although it may sometimes feel like a waste of time, putting yourself in a position to readily redo your analysis will make your life as a researcher easier (and at the very least help you rerun chunks from earlier work).
- For each usual task (data analysis, plotting, citing references), you should master one tool down to its dirty details.
- Relying on text-based formats (e.g. Markdown, LaTeX, CSV) is critical in order to be able to use version control to maintain your code, to write manuscripts, etc. This may indeed guide your choice of tools. A good starting point to learn version control with git is the Software Carpentry tutorial. GitHub’s documentation provides help on more advanced topics.
handling data
- For data analysis and plotting, I very much enjoy (most of the time!) working with R’s tidyverse, in particular dplyr and ggplot2. If you are new to R or to data analysis from the command line, the companion book R for Data Science (available online) is the best introduction you can dream of. My advice: study sections 2 to 8 thoroughly; the later ones will be useful to go deeper into specific topics based on your needs.
Hint: if you need to speed up your analysis with dplyr, have a look at its parallelized counterpart multidplyr.
- Don’t overlook RStudio’s cheatsheets!
- Follow a well-established coding style guide. If you don’t know which one to pick, use the lintr package (along with styler for existing code) to follow the tidyverse’s style guide.
- Follow simple guidelines when recording data in spreadsheets.
- Use regular expressions whenever you can. Regexes are great, regexes are tough, and regexes are poorly taught (if at all!) unless you have a computer science background: luckily, Damian Conway’s presentations are eye-opening (e.g. this 50-minute video) and there is a great cheatsheet for R. You will also want to see what you are doing, e.g. within RStudio using the RegExplain addin or online with RegExr.com.
- To share large datasets, Zenodo is a great (free) service. If you use another one, make sure that your dataset gets a DOI.
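To make the tip about regular expressions a bit more concrete, here is a minimal sketch of one typical use: pulling metadata out of systematically named data files. It is written in Python only to keep the example self-contained; the file names and the naming scheme (date, strain, replicate) are invented for this illustration and do not come from any real dataset.

```python
import re

# Hypothetical file names, invented for this illustration only.
filenames = [
    "2019-03-14_strainA_rep1.csv",
    "2019-03-14_strainB_rep2.csv",
    "notes.txt",
]

# Named groups keep the pattern readable: date, strain, replicate number.
pattern = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"  # ISO-like date, e.g. 2019-03-14
    r"_(?P<strain>[A-Za-z0-9]+)"     # strain label (letters and digits)
    r"_rep(?P<rep>\d+)"              # replicate number after "rep"
    r"\.csv$"                        # only CSV files are accepted
)


def parse_filename(name):
    """Return a dict of metadata, or None if the name does not match."""
    match = pattern.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["rep"] = int(fields["rep"])  # store the replicate as a number
    return fields


# Keep only the files that follow the naming scheme.
parsed = [m for m in map(parse_filename, filenames) if m is not None]

for record in parsed:
    print(record)
```

Anchoring the pattern with ^ and $ makes files that do not follow the scheme (here notes.txt) fail loudly instead of being half-matched, which is usually what you want when the file names carry your experimental metadata.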
handling text
- Despite LaTeX’s popularity in quantitative fields, I believe that the time is ripe to leave it to advanced editing where microtypography matters… Simpler syntaxes (in particular Markdown) are sufficient for literate data analysis (e.g. with Rmarkdown or R notebooks) and even for more advanced tasks like writing a dissertation or an article; I put online a template to render a manuscript and its companion supplementary material with cross-references. Whatever format you choose to rely on, don’t miss that pandoc is an incredibly powerful conversion tool between most formats (.md, .tex, .rtf, .docx, etc).
Hint: the Markdown converter used by RStudio (pandoc-citeproc) is able to handle citations just like bibtex would (and in fact more simply!).
- For storing and citing articles, Zotero is the most versatile open-source software.
handling DNA sequences
- Benchling is the 21st century sequence editor to design and keep track of your molecular biology experiments: design primers, align sequencing chromatograms, test your next cloning in silico. It even has an integrated lab notebook!
Big drawback: your data must be hosted on their servers…
handling microscopy data
- Micro-Manager is open-source software for microscope control. If your hardware is in the list of (over 200) supported items, it is very easy to configure and can be extended to support advanced features such as hardware triggering (on which we wrote a step-by-step tutorial).
- A gentle introductory lecture on fluorescence microscopy, taken from a great and extensive series.
academia survival kit
- Uri Alon’s materials for nurturing scientists are probably the best “academia survival kit” that I know of (and it even has additional information). Below is my attempt to enrich it with a few links.
- Some Modest Advice for Graduate Students by S. Stearns and R. Huey’s Reply to Stearns: Some Acynical Advice for Graduate Students.
- Writings such as B. Latour’s Petites leçons de sociologie des sciences and I. Stengers’ Sciences et pouvoirs (Power and invention) have helped me to overcome various academic frustrations and hopefully to fight some of their causes. Making genes, making waves by J. Beckwith is another inspiring read.
- L’Atelier des Jours à Venir, a French non-profit company promoting reflexive and responsible research practices, has gathered valuable resources.
- On research integrity: I was a founder of the Scientific Red Cards initiative (discontinued). Related issues are currently addressed by Retraction Watch and PubPeer.
- Need a practical guide to research integrity? I find the guidelines of Piet Borst among the most insightful! In particular, his “advice is to be generous towards your colleagues and tough towards yourself. Nearly everybody has the tendency to overestimate his own contribution to the research of others and underestimate the contribution of other people to his own work.” Probably worth meditating on every time one is upset by the status of a collaboration…