Creating Tables for Microsoft Word output with R Markdown

R Markdown

I have switched to solely use R Markdown for the communication step of my tidy work flow and found it pleasant and actually prodocutive to use. However, there is one major hassle: output in Word format. In a perfect world, I would never need to use any Microsoft products ever again. In reality when I need to work with collaborators that do not use RMarkdown and funders of research projects that require reports in Word format (and a required style template), there are no other options. As a matter of fact, it seems that the only viable solution collaborating with anybody that doesn’t use RMarkdown and/or git is still MS Word as of 2017/10.

Rendering R Markdown file to word_document works for the most part, except for sophisticated tables, which, unfortunately, are prevalent in most of my work.

Possible Solutions

There are a few possible solutions, none of them good enough for me at this point:

  1. Gmisc::docx_document (the output is a html file with an additional css file; no support for Word style template);
  2. Convert html output to Word document through soffice (Tables in the output do not resemble the html input; some support for template);
  3. Pure R solution officer + flextable (They can generate good looking Word output, but they do not work with R Markdown yet (see below)).

My solution/hack (as of 2017/10)

  1. Render Rmd file with output set to rmarkdown::word_document;
  2. The Rmd file uses texreg/htmlreg to generate html tables for regression results and saves them into separate html files (one for each table) during rendering;
  3. Convert html to docx: a. Via Word COM through py32win module (the used option) b. Via docx-html-js c. Via soffice: soffice --headless --convert-to docx:"MS Word 2007 XML" --writer --outdir . Table\ 1.html
  4. Patch the rendered word_document and replace placeholder tables with actual tables

Steps and Files:

  1. rmarkdown::render("reg_tables_demo.Rmd")
  2. python html2docx.py tabs/ (requiring py32win and Microsoft Word on a Windows PC, tested with MS Office 2016 and python 2.7)
  3. source("patch_docx_tables.R"); patchFlexTables("reg_tables_demo.docx", "tabs/", "reg_tables_demo_final.docx")
  4. Final reg_tables_demo.docx

All files/scripts used can be founded on github: https://github.com/cities-lab/rmd-docx

Alternatives / Things to keep an eye on

  1. broom > huxtable > flextable > pandoc (2.0)

    This solution directly knitr::knit Rmd files into Word documents embedding flextables through xml chunks. It eliminates the need to run scripts outside knit and is a much more elegant solution than my solution above. However it is not yet an acceptable solution at the moment for a number of reasons:

    i. pandoc 2.0 is not yet released. Users will need to build their own pandoc installation from source. i. The flextable implementation, standing as a proposal by Maxim Nazarov, is yet to be incorporated in flextable's code base, although Maxim’s code should work already. i. broom supports a small set of model objects than texreg, although broom + huxtable seem to be a better solution in the long run given they follow the tidy philosophy and broom is now part of tidyverse.

  2. pandoc, when it improves its support for sophisticated tables in unknown future

    The table support currently in pandoc is rudimentary, in particular, no support for table row/column spans, which is what sent me down this rabbit hole. Once the issue is resolved, it’d be as easy and functional to produce Word outputs as html and pdf ones. Not sure when this would be resolved (if ever), as the github issue has been open for 4 years now.

Credits

  1. Carsten Behring for coming out with the idea and sharing his code for the Word patching hack;
  2. Max Gordon
Associate Professor of Urban Studies and Planning

Dr. Wang is an Associate Professor of Urban Studies and Planning and affiliated professor of Civil Engineering at Portland State University.