--- title: "Build Excel Reports from R" author: "Cole Arendt" date: "`r Sys.Date()`" output: rmarkdown::html_vignette Vignette: > %\VignetteIndexEntry{Build Excel Reports from R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteKeywords{spreadsheet, Excel, java, report} %\VignettePackage{xlsx} --- ```{r setup, echo=FALSE, message=FALSE} knitr::opts_chunk$set(echo=TRUE, collapse=T, comment='#>') library(rJava) library(xlsx) ``` ## Building Excel Reports in R Excel has a lot of buy-in. This is generally pretty unfortunate, as there are much better tools for data analysis (i.e. \R). Excel is also not platform-agnostic, so there are difficulties generating Excel reports on non-Windows systems. However, Java is, and the excellent [Apache POI](https://poi.apache.org/) project provides a nice interface to Excel documents in a platform-agnostic manner. Further, \R integrates nicely with this project through the `xlsx` package to provide a suite of tools to be used in data science and report generation on any operating system. # From Scratch The first step of any report generation from scratch is data preparation. Suffice it to say that R has many tools to expedite that process. For instance, the [`tidyverse`](https://www.tidyverse.org/) provides several packages for getting data into a nice, _tidy_ format for modeling and visualization. We will presume that you have your data structured the way that you want and focus on getting that presentation into Excel. ## Functional Cell Styles The `CellStyle` class in the `xlsx` package makes possible many of the desirable traits of a good Excel report - Borders, DataFormats, Alignment, Font, and Fill. While a bit verbose to type at times, the `+` operator is implemented to make *building* these styles more natural. For instance, we might define a theme which has a standard selection of fonts, colors, and formats. ```{r theme} ## fonts require a workbook createFonts <- function(wb) { list( data = Font(wb, heightInPoints = 11, name='Arial') , title = Font(wb, heightInPoints = 16, name='Arial', isBold = TRUE) , subtitle = Font(wb, heightInPoints = 13, name='Arial', isBold = FALSE, isItalic = TRUE) ) } ## alignment alignLeft <- Alignment(horizontal='ALIGN_LEFT', vertical='VERTICAL_CENTER', wrapText = TRUE) alignCenter <- Alignment(horizontal='ALIGN_CENTER', vertical='VERTICAL_CENTER', wrapText=TRUE) ## data formats dataFormatDate <- DataFormat('m/d/yyyy') dataFormatNumberD <- DataFormat('0.0') ## fill fillPrimary <- Fill('#cc0000','#cc0000','SOLID_FOREGROUND') fillSecondary <- Fill('#ff6666','#ff6666','SOLID_FOREGROUND') ``` Once we have defined such standard cell-styles, it is straightforward to use them with the additive CellStyle framework. ```{r prep_for_report, include=FALSE} ## todo: fix the xlsx jars being available when generating vignette #.jaddClassPath(rprojroot::is_r_package$find_file('inst/java/rexcel-0.5.1.jar')) ``` ```{r build_report, results='hide'} ## The dataset numbercol <- 9 mydata <- as.data.frame(lapply(1:numbercol,function(x){runif(15, 0,200)})) mydata <- setNames(mydata,paste0('col',1:numbercol)) ## Build report wb <- createWorkbook() sh <- createSheet(wb, 'Report') f <- createFonts(wb) headerrow <- createRow(sh, 1:2) headercell <- createCell(headerrow, 1:ncol(mydata)) ## title addMergedRegion(sh,1,1,1,ncol(mydata)) lapply(headercell[1,],function(cell) { setCellValue(cell, 'Title of Report') setCellStyle(cell, CellStyle(wb) + f$title + alignCenter) }) ## subtitle addMergedRegion(sh, 2,2, 1,ncol(mydata)) lapply(headercell[2,],function(cell) { setCellValue(cell, 'A fantastic report about nothing') setCellStyle(cell, CellStyle(wb) + f$subtitle + alignCenter ) }) ## cell styles for data cslist <- lapply(1:ncol(mydata), function(x){CellStyle(wb) + f$data + alignCenter + dataFormatNumberD}) cslist[1:2] <- lapply(cslist[1:2], function(x){x + alignLeft}) ## left align first two columns ## add data workrow <- 4 addDataFrame(mydata, sh , col.names=TRUE , row.names = FALSE , startRow = workrow , startColumn = 1 , colStyle = setNames(cslist,1:numbercol) , colnamesStyle = CellStyle(wb) + f$subtitle + alignCenter + fillPrimary ) workrow <- workrow + nrow(mydata) + 1 ## + 1 for header ## add total row... sorta ## - (just the first row because I am lazy) addDataFrame(mydata[1,], sh , col.names=FALSE , row.names=FALSE , startRow = workrow , startColumn = 1 , colStyle = setNames(lapply(cslist,function(x){x + fillSecondary}),1:numbercol) ) saveWorkbook(wb, 'excel_report.xlsx') ``` Once you get used to some of the verbosity, the elegance of automatically creating nicely formatted Excel reports on a UNIX platform from R begins to shine. Further, with a bit of work implementing an R API, we get the benefit of a robust Java community debugging issues behind the scenes at Apache. If you would like to contribute on improving the R API and adding functionality, you can do so [on github](https://github.com/colearendt/xlsx). ![Our Beautiful Report](`r rprojroot::is_r_package$find_file('vignettes/excel_report.png')`) Bonus: you can even customize print formatting and a whole host of other things! There is more to come on that.