I'm looking for an application or workflow which integrates data values into the writing process

One possible workflow is to use R to produce the results, LaTeX to write the report, and Sweave to integrate the two. With TeXstudio or LyX (or any text editor that supports track changes) as the writing environment, plus Dropbox, you can set up some sort of "collaboration".


Take a look at R Markdown. It lets you generate files (Markdown, HTML, LaTeX, PDF, or even interactive Shiny slides) from text, LaTeX formulae, and code (not only R; you can use other languages as well!). For a smooth start, you can try the real-time editor editR.

Alternatively, you can use IPython Notebook. It is easy to share, but harder to collaborate on or to convert into a nice LaTeX document.
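Whichever tool you pick, the underlying idea is the same: the text refers to computed values by name instead of hard-coding numbers. Here is a minimal sketch of that idea in plain Python, assuming made-up data; the variable names and the sentence are purely illustrative and are not part of any of the tools above.

```python
from statistics import mean
from string import Template

# Hypothetical data: energy intake (kcal) for five participants.
intake_kcal = [2100, 1850, 2400, 1990, 2230]

# The report text names its values; the numbers are filled in from the data.
report = Template("Mean energy intake was $mean_kcal kcal (n = $n).")

text = report.substitute(
    mean_kcal=f"{mean(intake_kcal):.0f}",
    n=len(intake_kcal),
)
print(text)
```

If the data change, rerunning the script updates the sentence; Sweave, R Markdown, and notebooks automate this same step for whole documents.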


While other answers have given some very good suggestions, I wish to focus on the part "if anyone has had this problem, and how they have solved it?" of the question.

I use Sweave and can only speak for this particular method. My general thoughts are that:

  1. Yes, it's awesome.
  2. However, getting the two sets of code (statistics and LaTeX) to work together may not be any quicker or less miserable than revising the statistics and tables by hand, and there is a learning curve. So I'd suggest considering this method if you have i) documents that need to be created repeatedly or data that are repeatedly appended, like periodic reports, or ii) an analysis that involves a large number of repetitions.
  3. The benefit really shines for tables and graphs. Embedded text, however, can be troublesome. For instance, you can end up with weird sentences like "the mean energy intake increased by -1357 kcal at the end of the study."
  4. As an extension of the above, sometimes the restructuring of the analysis is so drastic that the code needs to be revised extensively. And you'll have two sets of code to revise and two sets of bugs to catch.
  5. In my own circle of colleagues, it's hard enough to get them to keep the statistical syntax in a standardized format. I will not even ask if they use LaTeX, let alone Sweave.
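One way to soften the embedded-text problem in point 3 is to let the code choose the verb from the sign of the number. This is only a sketch in Python, not how Sweave itself does it; the function name `describe_change` and the sentence wording are my own assumptions.

```python
def describe_change(label: str, delta: float, unit: str) -> str:
    """Return a sentence whose verb matches the sign of delta."""
    if delta > 0:
        verb = "increased"
    elif delta < 0:
        verb = "decreased"
    else:
        return f"The {label} did not change by the end of the study."
    # Report the magnitude only; the verb already carries the direction.
    return f"The {label} {verb} by {abs(delta):.0f} {unit} by the end of the study."


print(describe_change("mean energy intake", -1357, "kcal"))
```

The same pattern works inside a Sweave or R Markdown inline expression; the catch is that every such sentence needs its own small piece of logic, which is part of the overhead mentioned in point 2.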

Having said that, it is indeed very satisfying to see a 100-page PDF analysis report being revised with one click. I'd suggest at least finding a suitable project to try it on once. By the way, the same approach also works with Stata and SAS (via StatWeave), so it is quite versatile.


Now, back to the root cause. I'd like to share with you how I minimize this Sisyphean situation.

  1. Remember, if you do not take charge, coworkers will take charge for you. Firm statements about leaving one stage of the analysis process and entering the next can be forceful and yield productive results. This is also true if you are just a student and they are your supervisors. Some reasonable assertiveness goes a long way.
  2. Put all the data set details, variables, research questions, proposed analyses, and a reasonable number of "plan B's" in what I call a DMAP (data management and analysis plan). Pay particular attention to: i) how missing values will be handled, ii) how outliers are defined in the key variables of interest, and iii) the recoding scheme, if any categorization is to be done. Gather input from everyone involved. Once the plan is finalized, carry out the analysis.
  3. In the next meeting, share the analysis report (but NOT the write-up). Prepare a descriptive statistics package, and then lay out the main findings in the same sequence as the research questions. After each summary output, state 1-3 main "talking points" that will be the foundation (or topic sentences) of the Discussion. Show only the necessary output and make sure it is reader-friendly. Highlight or bold the parts that you want the group to focus on. Have the group contribute their thoughts on revisions or sub-analyses. Revise the DMAP. Keep the previous DMAPs handy to avoid the "you said, I said" situation.
  4. Repeat steps 2 and 3 until no more input is given. Be very clear that "you are going to finalize this analysis and start writing the Discussion." Is there anyone who is not replying to your e-mails and could potentially disrupt this finalization? Deal with them individually before moving on.
  5. Go on to craft the Discussion based on the talking points that have been previously agreed upon.
  6. Throughout the process, keep clear documentation. Keep your syntax files and analysis report files clear and dated. Include section numbers corresponding to the research questions, page numbers, and line numbers. Date and sign (provide name and e-mail) all your reports and syntax files.

The main point is: do not write the Results and Discussion and distribute them before the analysis is finalized. You may draft them in private, but never circulate them while the analysis is still actively being evaluated/revised. Doing so provides too many distractions to the group, and it's just going to end up with a hot mess.


In my own experience, 75% or more of the so-called sub-analyses are what I call "brain farts." They are a healthy sign that the brain is working, but not pleasant if they happen too frequently. Most of them are "what if's," and they can get out of control, especially if the results do not fit how your coworkers want the world to work.

Yet, 1 out of 8-10 times the suggestions can be good. In that case I usually take the pain to revise the analysis plan and restart the process. Leave the writing, and come back to it when the new analysis is finalized.

Finally, some catch phrases.

  • "That is a great suggestion, but it deviates substantially from our original research questions. For the sake of being succinct, I'll write this idea down and we can pursue it in another setting."

  • "Sub-group analysis? Yes, but be prepared that it's going to be underpowered, and please don't get your hopes too high."

  • "Sub-group analysis? But the interaction terms are not even significant, and you can rest assured that the two groups will not show any difference."

  • "Another parameter? Another scenario? Sure, let's get this done with, once and for all. Let me know all possible parameters you want to try now. I will just loop through them."

  • "No, it's not related to our hypothesis."

  • "Would you like to follow up with that suggestion? I can send you the code."
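For the "I will just loop through them" reply, the idea is to collect every requested parameter up front and run the same analysis over all combinations in one go. Below is a minimal Python sketch, assuming toy records and a made-up `run_analysis` function; in practice that function would be your real analysis, and the parameters would be whatever the group asked for.

```python
from itertools import product
from statistics import mean

# Hypothetical records; in practice these come from your data set.
records = [
    {"group": "A", "value": 12.0},
    {"group": "A", "value": 40.0},
    {"group": "B", "value": 10.0},
    {"group": "B", "value": 14.0},
]

# Every scenario the group wants to try, gathered once.
groups = ["A", "B"]
outlier_cutoffs = [20.0, 50.0]


def run_analysis(group: str, cutoff: float) -> dict:
    """Mean of one group's values after dropping values above the cutoff."""
    kept = [r["value"] for r in records
            if r["group"] == group and r["value"] <= cutoff]
    return {"group": group, "cutoff": cutoff, "n": len(kept), "mean": mean(kept)}


# One pass over all combinations instead of one round of e-mails per request.
results = [run_analysis(g, c) for g, c in product(groups, outlier_cutoffs)]
for row in results:
    print(row)
```

Keeping the results in one list also makes it easy to put all scenarios side by side in a single table for the next meeting.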