One Cause of 404 Errors in Bookdown: URL Encoding

r
Be careful with URL encoding when using Japanese in Bookdown.
Published: 2024-06-18

Modified: 2024-06-18

This is a note about an error I encountered when creating a document with the {bookdown} package in R.

Situation

I created a directory inside a web server running on a Synology NAS. Then I copied all files from the Bookdown HTML output folder, whose default name is _book, into that directory so that the document could be accessed from outside.

Most pages could be viewed normally, but one page kept returning a 404 error.

Cause and Background

The cause turned out to be URL encoding. The file name was something like 〇〇マップ.html, so it contained Japanese characters.

Japanese characters are not ASCII characters, so they cannot be included in a URL as-is. When a page name contains Japanese, it is converted into a URL-safe representation through a process called URL encoding.

For example, the Japanese character あ is replaced by %E3%81%82. Because the conversion uses combinations of % and character codes, it is apparently also called percent encoding.

The string マップ is normally converted to %E3%83%9E%E3%83%83%E3%83%97. The mapping is roughly:

  • マ: %E3%83%9E
  • ッ: %E3%83%83
  • プ: %E3%83%97
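
The mapping above can be reproduced with a minimal Python sketch. It assumes only the standard library's urllib.parse.quote, which percent-encodes the UTF-8 bytes of each character; the helper name percent_encode_chars is hypothetical and used only for illustration.

```python
from urllib.parse import quote


def percent_encode_chars(text):
    """Return (character, percent-encoded UTF-8 bytes) pairs for each character."""
    # safe="" makes quote() encode every reserved character as well.
    return [(ch, quote(ch, safe="")) for ch in text]


# Show the encoding of each character in the precomposed string.
for ch, encoded in percent_encode_chars("マップ"):
    print(ch, encoded)

# Encoding the whole string simply concatenates the per-character results.
print(quote("マップ", safe=""))  # %E3%83%9E%E3%83%83%E3%83%97
```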

However, in the link generated by Bookdown, it had been converted to %E3%83%9E%E3%83%83%E3%83%95%E3%82%9A. The difference is that プ was encoded as two separate characters: フ (%E3%83%95) plus ゚ (%E3%82%9A), the combining half-voiced sound mark.

In other words, the link should have pointed to the ordinary マップ, but instead it pointed to a version where the half-voiced mark was separated. That led to a 404 error because the link pointed to a page that did not exist.
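
The two encodings correspond to two Unicode normalization forms of the same visible text: the precomposed form (NFC) and the decomposed form (NFD). The following Python sketch, using only the standard library, shows that NFD normalization yields exactly the byte sequence seen in the broken link. Which step of the toolchain actually produced the decomposed form is not something this note establishes; the sketch only illustrates the difference.

```python
import unicodedata
from urllib.parse import quote

# Normalize explicitly so the result does not depend on how this file was saved.
composed = unicodedata.normalize("NFC", "マップ")    # プ as a single code point
decomposed = unicodedata.normalize("NFD", "マップ")  # フ followed by U+309A

print(composed == decomposed)          # False: different code point sequences
print(len(composed), len(decomposed))  # 3 4

print(quote(composed, safe=""))    # %E3%83%9E%E3%83%83%E3%83%97
print(quote(decomposed, safe=""))  # %E3%83%9E%E3%83%83%E3%83%95%E3%82%9A
```

A web server that compares file names byte by byte treats these as two different paths, which is consistent with the 404 error.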

Workaround

For now, I avoided the error by changing the heading in the source Rmd file to English, such as map, before rendering.

Bookdown uses UTF-8, so Japanese text, including kanji, can be used normally. However, problems can occasionally appear when those strings become part of a URL.

Because Bookdown derives the file name of each rendered HTML file from the level-1 heading (#), using English headings may be the safer choice if you want to avoid this kind of error.

I have not checked whether this happens only with voiced and half-voiced sound marks, or whether it can also happen with other non-ASCII characters.
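
One way to see which characters could even be affected is to list those that Unicode decomposes into several code points. The Python sketch below does this with the standard library; the function name decomposable_chars is hypothetical. It only shows which characters have a decomposed form, not whether Bookdown actually generates a broken link for them.

```python
import unicodedata


def decomposable_chars(text):
    """Return the characters of `text` that NFD splits into multiple code points."""
    # Start from the precomposed form so each character is checked on its own.
    nfc = unicodedata.normalize("NFC", text)
    return [ch for ch in nfc if len(unicodedata.normalize("NFD", ch)) > 1]


print(decomposable_chars("マップ"))  # ['プ']
print(decomposable_chars("地図"))    # []
print(decomposable_chars("café"))    # ['é']
```

Under this check, kanji and plain kana are unchanged, while kana with voiced or half-voiced marks and accented Latin letters have a decomposed form.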

Closing Note

Both strings looked identical as マップ, so it took quite a while to identify the cause.

I noticed the problem because I could access the page by manually typing the URL in the address bar. Then I copied the URL from both the working and non-working cases into an editor and found that the encodings were different.
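
The same comparison can be done programmatically. The following Python sketch decodes the two percent-encoded strings from this note and lists their code points; the helper name code_points is hypothetical, and only standard-library functions are used.

```python
import unicodedata
from urllib.parse import unquote


def code_points(encoded):
    """Decode a percent-encoded string and return its code points as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in unquote(encoded)]


working = "%E3%83%9E%E3%83%83%E3%83%97"
broken = "%E3%83%9E%E3%83%83%E3%83%95%E3%82%9A"

print(code_points(working))  # ['U+30DE', 'U+30C3', 'U+30D7']
print(code_points(broken))   # ['U+30DE', 'U+30C3', 'U+30D5', 'U+309A']

# Print the Unicode name of each character in the broken link.
for ch in unquote(broken):
    print(unicodedata.name(ch, "unknown"))

# After NFC normalization the two strings are identical.
print(unicodedata.normalize("NFC", unquote(broken)) == unquote(working))  # True
```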

For documents intended for Japanese readers, writing in Japanese is usually better. Still, Japanese can become a source of subtle errors. This experience made me think that it may be safer to create documents mainly in English and add Japanese only as supplementary text.

Reference

I used the following site to check URL encoding.

URLエンコード・デコード

Thank you.