
rationalize handling of text headings #1790

Open
sujato opened this issue Jul 15, 2022 · 1 comment

@sujato
Contributor

sujato commented Jul 15, 2022

These are notes towards a technical implementation. See discussion:

https://discourse.suttacentral.net/t/oh-vagga-numbers-what-are-we-to-do-with-you/25544

step 1: make sure all segments following the <header> are numbered 1.0.x, not 0.x

Normally, the main page title (either sutta-title or range-title) is :0.2 or :0.3. It is the last segment in the top-level zero sequence, and it is followed by :1.1, or by :1.0 if the sutta starts with an h2.

In some cases, however, this pattern is not followed. This happens when extraneous elements (such as verses of homage) are included before the main text.

  "pli-tv-bu-vb-pc1:0.1": "<article id='pli-tv-bu-vb-pc1'><header><ul><li class='collection'>{}</li>",
  "pli-tv-bu-vb-pc1:0.2": "<li class='division'>{}</li>",
  "pli-tv-bu-vb-pc1:0.3": "<li class='kanda'>{}</li>",
  "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
  "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
  "pli-tv-bu-vb-pc1:0.6": "<p class='namo'>{}</p>",
  "pli-tv-bu-vb-pc1:0.7": "<section class='patimokkha'><p>{}</p></section>",
  "pli-tv-bu-vb-pc1:1.0": "<section class='nidana'><h2>{}</h2>",

Find them with this multi-line regex search:

</header>",
  "(.*?):0\.\d": "

There are 50 such texts, 81 cases in all.
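A rough Python sketch of the same search, done over parsed JSON rather than raw text. It assumes the HTML templates live in files named `*_html.json` somewhere under a bilara-data checkout, and relies on `json.load` keeping keys in file order (true for Python 3.7+ dicts). The function names are hypothetical. Unlike the regex, which only matches the first segment after `</header>`, it collects every :0.x segment up to the start of the main text, so its totals need not match the figures above exactly.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Top-level zero-sequence IDs such as "pli-tv-bu-vb-pc1:0.6".
ZERO_LEVEL = re.compile(r"^[^:]+:0\.\d+$")


def stray_zero_segments(segments: dict[str, str]) -> list[str]:
    """Return :0.x segment IDs that come after a segment closing </header>."""
    stray = []
    after_header = False
    for seg_id, html in segments.items():  # dicts keep file order
        if after_header:
            if ZERO_LEVEL.match(seg_id):
                stray.append(seg_id)
            else:
                after_header = False  # reached the main text (:1.0, :1.1, ...)
        if "</header>" in html:
            after_header = True
    return stray


def scan(root: Path) -> dict[str, list[str]]:
    """Map each *_html.json file under root to its stray segment IDs."""
    report = {}
    for path in sorted(root.rglob("*_html.json")):
        with path.open(encoding="utf-8") as f:
            stray = stray_zero_segments(json.load(f))
        if stray:
            report[str(path)] = stray
    return report
```

Calling `scan(Path("bilara-data/html"))` (assuming the HTML files sit under `html/`) and then taking `len(report)` and the total number of IDs gives a quick before/after check around the renumbering.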

If we remove the void levels, this will mess up the numbering of these segments. Also, it means that the top-level zero sequence is inconsistent. But it's really useful to make it consistent!

So, let's do this:

  "pli-tv-bu-vb-pc1:0.1": "<article id='pli-tv-bu-vb-pc1'><header><ul><li class='collection'>{}</li>",
  "pli-tv-bu-vb-pc1:0.2": "<li class='division'>{}</li>",
  "pli-tv-bu-vb-pc1:0.3": "<li class='kanda'>{}</li>",
  "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
  "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
  "pli-tv-bu-vb-pc1:1.0.1": "<p class='namo'>{}</p>",
  "pli-tv-bu-vb-pc1:1.0.2": "<section class='patimokkha'><p>{}</p></section>",
  "pli-tv-bu-vb-pc1:1.0.3": "<section class='nidana'><h2>{}</h2>",
  "pli-tv-bu-vb-pc1:1.1": "<p>{}",
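The renumbering can be sketched as a pure key-renaming pass. The rule here is inferred from this one example only: stray :0.x segments after `</header>` become :1.0.1, :1.0.2, and so on, and if the main text then opens with a :1.0 heading, that heading takes the next :1.0.n. Texts with no stray segments are left alone. `renumber_after_header` is a hypothetical helper, and the same mapping would presumably have to be applied to every root, translation, comment and variant file that shares these segment IDs.

```python
from __future__ import annotations


def renumber_after_header(segments: dict[str, str]) -> dict[str, str]:
    """Rename stray :0.x keys after </header> to :1.0.n, keeping order and values."""
    mapping: dict[str, str] = {}
    after_header = False
    n = 0  # how many stray segments have been moved for the current header
    for seg_id, html in segments.items():
        uid, _, num = seg_id.partition(":")
        if after_header:
            if num.startswith("0."):
                n += 1
                mapping[seg_id] = f"{uid}:1.0.{n}"
            else:
                # First main-text segment: an existing :1.0 heading joins the
                # moved segments, but only if something was actually moved.
                if num == "1.0" and n > 0:
                    mapping[seg_id] = f"{uid}:1.0.{n + 1}"
                after_header = False
        if "</header>" in html:
            after_header = True
            n = 0
    return {mapping.get(k, k): v for k, v in segments.items()}
```

Renaming keys rather than rebuilding the file keeps the HTML templates untouched, so the change can be reviewed as a pure ID diff.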

step 2: extract numbering into bilara-data references, or use /structure/text_extra_info

Follow ISO 2145

https://en.m.wikipedia.org/wiki/ISO_2145

I'm not sure how to do this. But anyway, let's keep the void levels intact until this is done.
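For reference, the core of ISO 2145 is simple: divisions are numbered with arabic numerals, a full stop separates the numbers of successive levels, and there is no full stop after the last number. A minimal Python sketch that numbers a hierarchy that way; the `Division` tree is a hypothetical stand-in for wherever the structure ends up (bilara-data references or /structure/text_extra_info), not an existing format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Division:
    title: str
    children: list[Division] = field(default_factory=list)


def iso2145_numbers(divisions: list[Division], prefix: tuple[int, ...] = ()) -> list[tuple[str, str]]:
    """Number divisions depth-first as "1", "1.1", "1.1.1", ... (ISO 2145 style)."""
    numbered = []
    for i, div in enumerate(divisions, start=1):
        number = prefix + (i,)
        # Arabic numerals joined by full stops, with no trailing full stop.
        numbered.append((".".join(map(str, number)), div.title))
        numbered.extend(iso2145_numbers(div.children, number))
    return numbered
```

For a tree of Devatāsaṃyutta > Naḷavagga > [Oghataraṇasutta, Nimokkhasutta] this yields `1`, `1.1`, `1.1.1` and `1.1.2`, the same shape as the MS numbering.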

We should probably follow the system used in MS, and we may even be able to import the numbers from there. For example:

<div class="i">1.1.1 Oghataraṇasutta</div>
<div class="h">Devatāsaṃyutta</div>
<div class="h">Naḷavagga</div>
<div class="h">Oghataraṇasutta</div>
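If the MS markup is as regular as this excerpt suggests, pulling the numbers out could be a one-regex job. A sketch, assuming every numbered text has a `<div class="i">` element whose content is the number, a space, and the title; that assumption rests on this one excerpt only, and the `class="h"` heading divs are ignored here. `ms_section_numbers` is a hypothetical name.

```python
from __future__ import annotations

import re

# Assumed shape (from a single excerpt): <div class="i">1.1.1 Oghataraṇasutta</div>
MS_INDEX = re.compile(r'<div class="i">(\d+(?:\.\d+)*)\s+(.+?)</div>')


def ms_section_numbers(markup: str) -> list[tuple[str, str]]:
    """Return (section number, title) pairs in document order."""
    return MS_INDEX.findall(markup)


def as_levels(number: str) -> tuple[int, ...]:
    """Split "1.1.1" into (1, 1, 1) so numbers sort numerically, not as text."""
    return tuple(int(part) for part in number.split("."))
```

Returning a list of pairs rather than a title-to-number dict avoids collisions where the same sutta name occurs in more than one place.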

step 3: implement section-numbering

Add option to the website to display sectional-numbering. This would be an option on the toolbar next to "spacing".

When it is enabled, section numbers appear in all relevant places in the navigation. They are distinguished by the section sign: §. I think there is no need to put them in the breadcrumbs: they just make them even longer.

  • In the suttaplex nerdy-row:
    • Brahmajālasutta DN 1 PTS 1.1–1.46 BJT-cs §1.1
  • in the suttaplex-list titles of vaggas, etc.
    • The Chapter on the Entire Spectrum of Ethics
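The display rule itself could be as small as one function: nothing when the toggle is off or the text has no number, nothing in breadcrumbs, otherwise the number prefixed with §. This Python sketch only illustrates that rule; the function names and the `context` strings are hypothetical, and the real change would live in the site's frontend code.

```python
from __future__ import annotations

from typing import Optional


def section_label(number: Optional[str], context: str, enabled: bool) -> str:
    """Return the § label for a navigation context, or "" if none should show."""
    if not enabled or number is None or context == "breadcrumb":
        return ""  # section numbers are left out of breadcrumbs to keep them short
    return f"§{number}"


def nerdy_row(refs: list[str], number: Optional[str], enabled: bool) -> str:
    """Join the suttaplex nerdy-row references, appending the § label if any."""
    parts = refs + [section_label(number, "nerdy_row", enabled)]
    return " ".join(p for p in parts if p)
```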

Maybe we can introduce a top-sheet dropdown for "extra references"

step 4: remove void nodes

Once it's all working on the site, we remove the void nodes. From then on:

  • all main headings have exactly two segments:
    • :0.1 the collection
    • :0.2 the sutta
  • all bilara-data text files have :0.1 and :0.2, so we can robustly assume :0.2 is the main sutta title.
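Once step 4 is done, these invariants could be checked mechanically before anything relies on them. A sketch that accepts a text only if its top-level zero sequence is exactly :0.1 followed by :0.2, and then reads the title from :0.2. The function names are hypothetical, and the input is assumed to be one bilara-data file parsed into an ordered dict.

```python
from __future__ import annotations

import re

# Top-level zero-sequence IDs only: "dn1:0.2" matches, "dn1:0.2.1" does not.
TOP_ZERO = re.compile(r"^[^:]+:0\.(\d+)$")


def zero_level_ok(segment_ids) -> bool:
    """True if the top-level zero sequence is exactly :0.1 then :0.2."""
    nums = [int(m.group(1)) for m in map(TOP_ZERO.match, segment_ids) if m]
    return nums == [1, 2]


def main_title(segments: dict[str, str], uid: str) -> str:
    """Return the main sutta title, which step 4 makes safe to read from :0.2."""
    if not zero_level_ok(segments):
        raise ValueError(f"{uid}: unexpected top-level zero sequence")
    return segments[f"{uid}:0.2"]
```

Failing loudly on any other zero sequence means a file that still carries void nodes is caught rather than silently yielding the wrong title.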
@firepick1
Collaborator

ISO is good. Zero is good. Consistency is good. Voice apps can continue to deduce formatting from this numbering, as a coarse facsimile of SC's fine formatting. Thanks for sorting this out, Bhante.
