knitr, Connect, and logging

This document summarizes my experiences running RMarkdown1 files on Posit Connect, the work I’ve done so far, and work yet to be done. The overall goal of my work is to explicitly make our our pro product experience better, but I think it will broadly help anyone using knitr in a non-interactive environment where the results are logged to a file.

Progress bar

The new knitr progress bar is fantastic for interactive use because it’s easy to see the current state and how much of the document remains. However, because it uses \r to repeatedly write to the same line, it’s less suitable for use in logs2.

For example, here’s a small snippet of the tidyverse dashboard progress bar, as shown in Connect:

| | | 0% | |. | 3% | |... | 6% (setup) | |.... | 8% | |...... | 11% (pr_stats) | |....... | 14% | |........ | 17%

In https://github.com/yihui/knitr/issues/2221, I recommend an alternative progress bar for batch usage that just prints the chunk name, number, and total number of code chunks. It’s possible to use this progress bar on Connect by creating a project .Rprofile but you can’t deploy a .Rprofile with the IDE UI so you need to use rsconnect::deployApp() or similar. So ideally, we’d figure out someway to bake this into knitr itself.

(There’s one other small problem with the current progress bar: it uses stderr for some messaging and stdout for other leading to out of order results. It’s hard to correctly interleave these streams so I’d recommend that the error bar also switch to using stderr.)

In batch usage, I think it would also be useful to always echo the code to the logging stream, so that you can see exactly what code has caused the failure, without having to look up the chunk in the original document. This can also be useful for crude performance analysis by inspecting the timestamps corresponding to each line. I have completed an implementation in evaluate but it will also require some new options in knitr.

Additionally, cli progress bars should probably appear in the log, rather than doc: https://github.com/r-lib/cli/issues/583.

Failure

When an Rmd fails due to an unexpected error in R code, you get R’s compact backtrace that looks something like this:

Error in 1 + "a" : non-numeric argument to binary operator
Calls: <Anonymous> ... eval -> f -> g -> h -> tryCatch -> tryCatchList -> i

This is not generally that useful for debugging, so it would be nice to get a more descriptive traceback like this:

Quitting from lines 2-8 (error.Rmd) 
Error in `1 + "a"`:
! non-numeric argument to binary operator
Backtrace:
    ▆
 1. └─global f()
 2.   └─global g()
 3.     └─global h()
 4.       ├─base::tryCatch(i())
 5.       │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 6.       └─global i()

To do this, you need to take advantage of rlang’s tooling to provide richer tracebacks:

rlang::global_entrace()
options(rlang_backtrace_on_error = "full")

Christoph has an initial PR to bake this behaviour into knitr: https://github.com/yihui/knitr/pull/2218.

Render success, but job failure

Another interesting case comes about in data engineering or data validation jobs where you might be using pointblank to validate the data. If data validation fails, you want the render to still succeed (so you can see the detailed table giving the validation failures) but the overall job to fail.

I think this probably requires instrumentation in both knitr (a function that you could call to tell knitr to error after rendering) and in Connect (to read custom failure messages from the log).

Do we also want an option so that a doc could fail with an expected error, but still render up to that point?

Warnings

Currently, by default, warnings are captured and shown in the doc. This means that they’re not available in the logging stream, making it hard to discover without careful inspection of the rendered document. I’d suggest we add the capability to knitr to also optionally log these. I think this needs to be complementary to the existing warning option because in some cases you might want to record both in the document and in the log file, and in other cases you might want to include in just the log file, or just the document. This work has been completed in evaluate but still needs a new option in knitr

Additionally, warnings are not currently very actionable as it is very hard to see where they have come from. We are working on infrastructure in rlang to add backtraces to warnings, like errors, so you can more easily see the sequence of calls that lead to the warning. This will also require some work in knitr to activated automatically when rlang is installed.

Additional logging

To send output to the log stream (rather than having it appear in the document), you need to send the output to stderr(). You can do this with cat(file = stderr()) or with a specialised logging package.

R has many logging packages and I have made no attempt to systematically survey them all, but at least log4r and logger3 print to stdout() by default, making them unsuitable for direct use in RMarkdown. I submitted an experimental PR to log4r, https://github.com/johnmyleswhite/log4r/pull/28 to resolve this issue by switching the default output stream to stderr() when used inside a knitr doc4 .

I’d recommend we also add an article to solutions.posit.co that discusses how and why to add logging to your knitr document.

Connect logging tweaks

While working with the logs on connect, I discovered a number of small features that would go a long way to improving the logging experience:

  • \r should be replaced with \n, rather than ignored, as currently.
  • Very long lines currently generate a horizontal scrollbar which is hard to use (especially on mobile). Could we instead wrap these lines?
  • Leading whitespace appears to be stripped.
  • Could we hide repeated times to make it easier to see when there are gaps?
  • Can we support ANSI colours and hyperlinks?

There are four bigger asks that I think are also worthwhile, but are likely to require a significant investment of time:

  • Could we have sectioning/grouping tools like github?
  • Can we improve the automated failure email to attach the log file (assuming we don’t believe there are security issues)?
  • Can we flag non-critical issues (like warnings)?

Footnotes

  1. The majority of these findings apply to Quarto as well, since in both cases the execution of R code happens through knitr↩︎

  2. Particularly in Connect, as noted below.↩︎

  3. I recommend that we pick one logging package to consistently recommend, and the front runner is log4r since it’s maintained by Aaron Jacobs)↩︎

  4. The PR also resolves another minor issue: currently both Connect and log4r add timestamps making the log output uneccesarily noisy.↩︎