Doc improvements #2711

Merged
merged 8 commits into from
May 28, 2024

Conversation

behrmann
Contributor

No description provided.

@behrmann
Contributor Author

The "don't merge" label is there because I want to see whether there is a way to appease both GitHub's and pandoc's Markdown parsers.

Member

@keszybz keszybz left a comment

Very nice improvements. I tried to figure out how to handle the multi-para items before, but couldn't get it to work. Great that you figured this out.

mkosi/resources/mkosi.md (outdated)
@keszybz
Member

keszybz commented May 28, 2024

The "don't merge" label is there because I want to see whether there is a way to appease both GitHub's and pandoc's Markdown parsers.

Before this patchset, the rendering on github was mostly broken. After it, it's still mostly broken, but in different places ;) I would just accept this as reality and merge to improve the pandoc rendering.

The way we formatted definitions

term
: paragraph1

: paragraph2

gets clobbered into single text blocks by pandoc. The thing it can actually
parse is

term
:   paragraph1

    paragraph2

This (mostly) whitespace-only change unclobbers the text.
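
The rewrite from the first form to the second can be sketched as a small script. This is only an illustration, not the tooling used for this patchset: the function name `reindent_definitions` is made up, and it assumes every definition paragraph in the old style starts with `: ` at the beginning of a line and that a new term always follows a blank line.

```python
def reindent_definitions(text: str) -> str:
    """Rewrite ': paragraph' definition paragraphs into pandoc's indented form."""
    out = []
    in_definition = False  # True once the first paragraph of a definition was seen
    prev_blank = True      # whether the previous line was empty
    for line in text.splitlines():
        if line.startswith(": "):
            body = line[2:].lstrip()
            if in_definition and prev_blank:
                # Further paragraph of the same definition: no colon, four spaces.
                out.append("    " + body)
            else:
                # First paragraph of a definition: colon followed by three spaces.
                out.append(":   " + body)
                in_definition = True
        else:
            if line.strip() and prev_blank:
                # Non-blank line after a blank one: a new term or unrelated text.
                in_definition = False
            out.append(line)
        prev_blank = not line.strip()
    return "\n".join(out)


old = "term\n: paragraph1\n\n: paragraph2"
print(reindent_definitions(old))
```

Lines that do not start with `: ` are passed through untouched, so continuation lines keep whatever indentation they already had.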

pandoc has a weird algorithm to define the width of tables in markdown. The
width cannot be specified absolutely, but is made relative to the text width by
how many dashes are in the horizontal line under the header in each
column. This can lead to spurious word breaks even on wide displays where the
whole table would fit. Removing the prefix should somewhat ameliorate the
problem until a better solution is found.
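
To make the width rule concrete, here is a rough model of it. It is not pandoc's actual code: the helper name `relative_widths` is invented, and it assumes a pipe-table style separator row in which each column's share of the text width is its dash count divided by the total dash count.

```python
def relative_widths(separator: str) -> list[float]:
    """Return each column's share of the text width from a table separator row."""
    # Split a row such as "|---|------|---|" into its per-column cells.
    cells = separator.strip().strip("|").split("|")
    dashes = [cell.count("-") for cell in cells]
    total = sum(dashes)
    if total == 0:
        raise ValueError("separator row contains no dashes")
    return [n / total for n in dashes]


# A column with twice as many dashes gets twice the share of the width.
print(relative_widths("|---|------|---|"))
```

Under this model a column whose dash run is short relative to its neighbours gets a narrow share and wraps early, no matter how wide the terminal is.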

We already have two different X in there, X and x, which are hard to tell
apart, and since we want to say something positive, let's make it a checkmark.

The table directly follows the definitions, which makes it difficult to tell
apart from the previous definition.
@behrmann behrmann merged commit fd27f53 into systemd:main May 28, 2024
26 of 32 checks passed
@behrmann behrmann deleted the docimprov branch May 28, 2024 16:08
@septatrix
Contributor

Before this patchset, the rendering on github was mostly broken. After it, it's still mostly broken, but in different places ;) I would just accept this as reality and merge to improve the pandoc rendering.

I hadn't noticed significantly broken rendering before, but now some tables are broken, as is everything that contains more than one paragraph. The most notable cases are Mirror=, Packages= and ToolsTree=, where tables and code snippets are just formatted as monospace text :/

@septatrix
Contributor

One possible solution would be to use a less lossy format like reStructuredText (which GH can render) or AsciiDoc (which GH can render too, and it has native support for man page rendering).

Alternatively, it would be possible to write a pandoc filter which transforms definition lists before outputting the man page as these seem to be the thing causing the most trouble
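
A minimal sketch of such a filter, using only the standard library and pandoc's JSON AST. It rests on an assumption that should be checked against real output: that pandoc parses each `: paragraph` line as a separate definition of the same term. The sketch merges those definitions into one definition with several paragraphs. The function names and the file name `merge_deflist.py` are made up for illustration.

```python
import json
import sys


def as_para(block):
    """Turn a tight 'Plain' block into a 'Para' so merged paragraphs stay separate."""
    if block.get("t") == "Plain":
        return {"t": "Para", "c": block["c"]}
    return block


def merge_definitions(node):
    """Walk the AST and fold multiple definitions of a term into a single one."""
    if isinstance(node, list):
        return [merge_definitions(n) for n in node]
    if isinstance(node, dict):
        if node.get("t") == "DefinitionList":
            items = []
            # Each item is [term_inlines, [definition, ...]]; a definition is a list of blocks.
            for term, definitions in node["c"]:
                definitions = merge_definitions(definitions)
                if len(definitions) > 1:
                    blocks = [as_para(b) for d in definitions for b in d]
                    definitions = [blocks]
                items.append([term, definitions])
            return {"t": "DefinitionList", "c": items}
        return {k: merge_definitions(v) for k, v in node.items()}
    return node


# pandoc passes the target format as the first argument when it runs a filter.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(merge_definitions(json.load(sys.stdin)), sys.stdout)
```

It would be invoked along the lines of `pandoc -t man -s --filter ./merge_deflist.py mkosi.md`. Terms with a single definition are left as they are.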

@behrmann
Contributor Author

So the problem here is that Markdown is an underspecified mess, and different implementations add different extensions and implement them slightly differently.

The options in the docs are formatted as definition lists, i.e.

term
:    definition

This is an extension from the PHP world. GitHub doesn't implement it, which is why it looks like this on GitHub

term
: definition

The colon shouldn't be there, there should be indentation.

The implementation of this also varies a lot. pandoc, which at the moment generates our man page, insists on an indentation of four spaces for multi-paragraph definitions to be rendered correctly, and we use those for e.g. Packages=. Before this PR all paragraphs in the man page were smushed together.

Since GitHub doesn't support definition lists in the first place, things get outright bad for multi-paragraph definitions. Before this PR every paragraph in the definition was introduced with a colon, which led to the aforementioned rendering problems with pandoc; now GitHub parses the indentation as a code block. There's no appeasing both.

The solution is to generate a nice HTML from our docs and put it up at mkosi.systemd.io and not think of GitHub as the primary point to look up the docs. I haven't gotten around to making a nicer HTML yet, because pandoc has a rendering issue there: multi-paragraph definitions get paragraph tags for every paragraph, while single-paragraph definitions don't get any, which makes the HTML output look smushed.

Another point against pandoc is the slightly inane algorithm for line breaks in tables, which adds a few unwanted ones to our man page.

Is there something that generates nice HTML without these issues? Why yes, there is MyST, which is sort of the love child of Markdown and reStructuredText. It just takes the current Markdown and produces beautiful HTML output… but it doesn't output man pages, and its XML output cannot simply be used as DocBook.

Currently there are no nice options where everything is easy. At this point I'm considering just generating the roff output from MyST's parse tree. The alternatives are fixing pandoc or using a filter with pandoc. None of those options is appealing, but I know which one is least. :)

Long story short:

  • myst-docutils-html5 --myst-enable-extension=deflist mkosi.md will generate nice HTML
  • pandoc -t html -s mkosi.md will generate a passable HTML
  • pandoc -t man -s mkosi.md will generate a nice man page (much more readable than before)
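
For trying all three side by side, the commands above can be wrapped in a short script. This is just a convenience sketch: the output file names and the `render_all` helper are illustrative, and it assumes each tool writes its result to stdout when no destination is given.

```python
import shutil
import subprocess

SOURCE = "mkosi.md"

# Output file names are illustrative; the commands are the ones listed above.
COMMANDS = {
    "mkosi.myst.html": ["myst-docutils-html5", "--myst-enable-extension=deflist", SOURCE],
    "mkosi.pandoc.html": ["pandoc", "-t", "html", "-s", SOURCE],
    "mkosi.1": ["pandoc", "-t", "man", "-s", SOURCE],
}


def render_all(commands=COMMANDS):
    """Run every renderer that is installed, save its stdout, return the files written."""
    written = []
    for target, argv in commands.items():
        if shutil.which(argv[0]) is None:
            print(f"skipping {target}: {argv[0]} not found")
            continue
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
        with open(target, "w", encoding="utf-8") as fh:
            fh.write(result.stdout)
        written.append(target)
    return written
```

Calling `render_all()` in the directory containing mkosi.md writes whichever outputs the installed tools can produce and skips the rest.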

And having written this: Adding the HTML output to mkosi documentation would probably be quite reasonable.

@septatrix
Contributor

So the problem here is that Markdown is an underspecified mess, and different implementations add different extensions and implement them slightly differently.

Yeah, Markdown only specifies the most basic formatting, which leads implementations to add more and more custom extensions and flavors. (Regarding the underspecification, CommonMark seems to have prevailed nowadays, so at least the basic subset is rendered similarly by modern Markdown parsers.)

The options in the docs are formatted as definition lists, i.e.

term
:    definition

This is an extension from the PHP world. GitHub doesn't implement it, which is why it looks like this on GitHub

term : definition

The colon shouldn't be there, there should be indentation.

I wouldn't use "should" here. GitHub has no obligation to implement a rarely used extension.

[...]. There's no appeasing both.

Yes, sadly.

The solution is to generate a nice HTML from our docs and put it up at mkosi.systemd.io and not think of GitHub as the primary point to look up the docs.

While standalone online docs are great, people will inevitably use the sources in this repo, for example when checking out something from master or from an older version. (It is also beneficial for PRs to have the option of the rich diff.)

Is there something that generates nice HTML without these issues? Why yes, there is MyST, which is sort of the love child of Markdown and reStructuredText. It just takes the current Markdown and produces beautiful HTML output… but it doesn't output man pages, and its XML output cannot simply be used as DocBook.

While it looks like a neat idea it seems to be rather niche. Also the fact that it only produces HTML is a bummer. And the Github rendering would be even worse: https://github.com/executablebooks/MyST-Parser/blob/3d84ff87badc795d44451c79d7e78b8eef6c04bf/docs/intro.md

Currently there are no nice options where everything is easy. At this point I'm considering just generating the roff output from MyST's parse tree. The alternatives are fixing pandoc or using a filter with pandoc. None of those options is appealing, but I know which one is least. :)

I have written a small pandoc filter in the past and it was not the end of the world.
However, if you are already considering a different parser and a large extension of Markdown, it might be best to switch to a better-suited format. This is why I suggested AsciiDoc. It provides first-party support for HTML, DocBook and man pages, and it even renders well on GitHub: https://github.com/asciidoctor/asciidoctor/blob/main/man/asciidoctor.adoc
We also would not be the first to do so. The git man pages, for example, are generated using AsciiDoc.
