rationalize handling of text headings #1790

sujato · 2022-07-15T00:55:30Z

These are notes towards a technical implementation. See discussion:

https://discourse.suttacentral.net/t/oh-vagga-numbers-what-are-we-to-do-with-you/25544

step 1: make sure all segments following `<header>` are 1.0 not 0.1

Normally, the main page title (either sutta-title or range-title) is :0.2 or :0.3, and it is the last segment in the top-level zero sequence. It is followed by :1.1, or by :1.0 if the sutta starts with an h2.

In some cases, however, this pattern is not followed. This happens when extraneous elements (such as verses of homage) are placed between the title and the main text, for example:

```
  "pli-tv-bu-vb-pc1:0.1": "<article id='pli-tv-bu-vb-pc1'><header><ul><li class='collection'>{}</li>",
  "pli-tv-bu-vb-pc1:0.2": "<li class='division'>{}</li>",
  "pli-tv-bu-vb-pc1:0.3": "<li class='kanda'>{}</li>",
  "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
  "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
  "pli-tv-bu-vb-pc1:0.6": "<p class='namo'>{}</p>",
  "pli-tv-bu-vb-pc1:0.7": "<section class='patimokkha'><p>{}</p></section>",
  "pli-tv-bu-vb-pc1:1.0": "<section class='nidana'><h2>{}</h2>",
```

Find them with this multi-line regex search:

```
</header>",
  "(.*?):0\.\d": "
```

There are 50 such texts, 81 cases in all.
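As a cross-check on the regex search, here is a minimal Python sketch that walks one html JSON file in order and flags every `:0.N` segment that comes after a closing `</header>`. It assumes the bilara-data html files are JSON objects named `*_html.json`; the function names `stray_zero_segments` and `scan` are hypothetical. Unlike the regex, which only sees the segment immediately after the header, this flags every such segment, so its totals may not match the figures above exactly.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# A segment in the top-level zero sequence, e.g. "pli-tv-bu-vb-pc1:0.6".
TOP_ZERO = re.compile(r":0\.\d+$")


def stray_zero_segments(html_map: dict[str, str]) -> list[str]:
    """Return IDs of :0.N segments that follow a closing </header>.

    Relies on the dict keeping file order (json.loads preserves it).
    """
    stray = []
    after_header = False
    for seg_id, template in html_map.items():
        if "<header>" in template:
            after_header = False  # a new header starts, so reset
        if after_header and TOP_ZERO.search(seg_id):
            stray.append(seg_id)
        if "</header>" in template:
            after_header = True
    return stray


def scan(html_dir: Path) -> dict[str, list[str]]:
    """Map each *_html.json file under html_dir to its stray segment IDs."""
    found = {}
    for path in sorted(html_dir.rglob("*_html.json")):
        stray = stray_zero_segments(json.loads(path.read_text(encoding="utf-8")))
        if stray:
            found[path.name] = stray
    return found
```

Pointing `scan` at a local checkout's html directory (the exact path depends on the repo layout) would list the affected files with their stray segments.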

If we remove the void levels, this will mess up the numbering of these segments. It also means the top-level zero sequence is inconsistent across texts, and it's really useful to make it consistent!

So, let's do this:

```
  "pli-tv-bu-vb-pc1:0.1": "<article id='pli-tv-bu-vb-pc1'><header><ul><li class='collection'>{}</li>",
  "pli-tv-bu-vb-pc1:0.2": "<li class='division'>{}</li>",
  "pli-tv-bu-vb-pc1:0.3": "<li class='kanda'>{}</li>",
  "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
  "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
  "pli-tv-bu-vb-pc1:1.0.1": "<p class='namo'>{}</p>",
  "pli-tv-bu-vb-pc1:1.0.2": "<section class='patimokkha'><p>{}</p></section>",
  "pli-tv-bu-vb-pc1:1.0.3": "<section class='nidana'><h2>{}</h2>",
  "pli-tv-bu-vb-pc1:1.1": "<p>{}",
```

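The renumbering in step 1 can be sketched in Python as an old-ID to new-ID mapping, assuming the html JSON keeps its segments in file order. Both `renumber_after_header` and `apply_mapping` are hypothetical helpers, not existing Bilara tooling. The idea is to compute the mapping once from the html file and then apply the same mapping to every other file for that text (root, translations, comments, and so on), so segment IDs stay aligned.

```python
from __future__ import annotations

import re

# Captures the text ID of a top-level zero segment, e.g. "pli-tv-bu-vb-pc1:0.6".
TOP_ZERO = re.compile(r"^(?P<text>.+):0\.\d+$")


def renumber_after_header(html_map: dict[str, str]) -> dict[str, str]:
    """Map each :0.N segment directly after </header> to :1.0.1, :1.0.2, ...

    An existing :1.0 segment right after those strays is folded in as the
    last :1.0.k, as in the pli-tv-bu-vb-pc1 example.
    """
    mapping: dict[str, str] = {}
    ids = list(html_map)  # file order
    i = 0
    while i < len(ids):
        if "</header>" not in html_map[ids[i]]:
            i += 1
            continue
        # Collect the run of :0.N segments that immediately follows the header.
        run = []
        j = i + 1
        while j < len(ids) and TOP_ZERO.match(ids[j]):
            run.append(ids[j])
            j += 1
        if run:
            text = TOP_ZERO.match(run[0]).group("text")
            if j < len(ids) and ids[j] == f"{text}:1.0":
                run.append(ids[j])
                j += 1
            for k, old in enumerate(run, start=1):
                mapping[old] = f"{text}:1.0.{k}"
        i = j
    return mapping


def apply_mapping(segments: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename segment IDs in any file for the same text, keeping their order."""
    return {mapping.get(seg_id, seg_id): value for seg_id, value in segments.items()}
```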
step 2: extract numbering into bilara-data references

Alternatively, use /structure/text_extra_info.

Follow ISO 2145 (numbering of divisions and subdivisions in written documents):

https://en.m.wikipedia.org/wiki/ISO_2145

I'm not sure yet how to do this. Either way, let's keep the void levels intact until it's done.

We should probably follow the system used in MS, and we may even be able to import the numbers from there. For example:

```
<div class="i">1.1.1 Oghataraṇasutta</div>
<div class="h">Devatāsaṃyutta</div>
<div class="h">Naḷavagga</div>
<div class="h">Oghataraṇasutta</div>
```
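A rough Python sketch of importing the MS numbers, assuming each MS entry looks like the snippet above: one `div` with class `i` holding the number and title, followed by one `div` with class `h` per level, outermost first. It also checks that the number has the ISO 2145 shape (Arabic numerals separated by full stops, no full stop after the last one). `MSDivParser` and `ms_section_numbers` are hypothetical names.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# ISO 2145 shape: Arabic numerals separated by full stops, no trailing full stop.
ISO_2145 = re.compile(r"\d+(?:\.\d+)*")


class MSDivParser(HTMLParser):
    """Collect [class, text] pairs for every <div> in an MS snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.divs: list[list[str]] = []
        self._in_div = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.divs.append([dict(attrs).get("class") or "", ""])
            self._in_div = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_div = False

    def handle_data(self, data):
        if self._in_div:
            self.divs[-1][1] += data


def ms_section_numbers(snippet: str) -> dict[str, str]:
    """Map each section-number prefix to its MS heading, e.g. "1.1" -> the vagga."""
    parser = MSDivParser()
    parser.feed(snippet)
    parser.close()
    index = [text.strip() for cls, text in parser.divs if cls == "i"]
    headings = [text.strip() for cls, text in parser.divs if cls == "h"]
    if not index:
        raise ValueError("no div.i entry found")
    number, _, _title = index[0].partition(" ")
    if not ISO_2145.fullmatch(number):
        raise ValueError(f"not an ISO 2145 section number: {number!r}")
    parts = number.split(".")
    if len(parts) != len(headings):
        raise ValueError("expected one div.h heading per level of the number")
    # Level 0 -> "1", level 1 -> "1.1", level 2 -> "1.1.1", ...
    return {".".join(parts[: level + 1]): heading for level, heading in enumerate(headings)}
```

For the Oghataraṇasutta snippet this would give "1" for Devatāsaṃyutta, "1.1" for Naḷavagga, and "1.1.1" for Oghataraṇasutta, which could then be stored as references once the target format is settled.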

step 3: implement section-numbering

Add an option to the website to display section numbering. This would be an option on the toolbar next to "spacing".

When it is enabled, section numbers appear in all relevant places in the navigation, distinguished by the section sign: §. I think there is no need to put them in the breadcrumbs; they would just make them even longer.

In the suttaplex nerdy-row:
- Brahmajālasutta DN 1 PTS 1.1–1.46 BJT-cs §1.1

In the suttaplex-list titles of vaggas, etc.:
- The Chapter on the Entire Spectrum of Ethics
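The display rule for step 3 might look like this Python sketch. The names `Place`, `format_section`, and `with_section_number` are hypothetical; appending the number at the end of the label simply follows the nerdy-row example, and the site itself would implement this in its own front-end code.

```python
from __future__ import annotations

from enum import Enum


class Place(Enum):
    """Places in the navigation where a label can appear."""
    NERDY_ROW = "nerdy-row"
    SUTTAPLEX_LIST = "suttaplex-list"
    BREADCRUMB = "breadcrumb"


def format_section(number: tuple[int, ...]) -> str:
    """Render a section number with the section sign, e.g. (1, 1) -> "§1.1"."""
    return "§" + ".".join(str(n) for n in number)


def with_section_number(label: str, number: tuple[int, ...] | None,
                        place: Place, show_numbers: bool) -> str:
    """Add the section number to a label when the toolbar option is on.

    Breadcrumbs are left alone so they don't get any longer.
    """
    if not show_numbers or number is None or place is Place.BREADCRUMB:
        return label
    return f"{label} {format_section(number)}"
```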

Maybe we can introduce a top-sheet dropdown for "extra references"

step 4: remove void nodes

Once it's all working on the site, we remove the void nodes. From then on:

- all main headings have exactly two segments:
  - :0.1 the collection
  - :0.2 the sutta
- all bilara-data text files have :0.1 and :0.2, so we can robustly assume :0.2 is the main sutta title.
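Once the void nodes are gone, these invariants can be checked mechanically. A minimal Python sketch, assuming html files are loaded as dicts keyed by segment ID and that the main title keeps the `sutta-title` or `range-title` class; `check_headings` and `main_title_id` are hypothetical names.

```python
from __future__ import annotations

import re

TOP_ZERO = re.compile(r"^(?P<text>.+):0\.(?P<n>\d+)$")
MAIN_TITLE = re.compile(r"class=['\"](?:sutta|range)-title['\"]")


def main_title_id(text_id: str) -> str:
    """Once void nodes are removed, the main title is always :0.2."""
    return f"{text_id}:0.2"


def check_headings(html_map: dict[str, str]) -> list[str]:
    """List violations of the step 4 invariants; an empty list means OK."""
    zero: dict[str, list[int]] = {}
    for seg_id in html_map:
        m = TOP_ZERO.match(seg_id)
        if m:
            zero.setdefault(m.group("text"), []).append(int(m.group("n")))
    if not zero:
        return ["no top-level zero segments found"]
    problems = []
    for text, ns in zero.items():
        if ns != [1, 2]:
            problems.append(f"{text}: top-level zero segments are {ns}, expected [1, 2]")
        elif not MAIN_TITLE.search(html_map[main_title_id(text)]):
            problems.append(f"{main_title_id(text)} is not marked as the main title")
    return problems
```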


firepick1 · 2022-07-15T11:42:00Z

ISO is good. Zero is good. Consistency is good. Voice apps can continue to deduce formatting from such numbering, as a coarse facsimile of the fine SC formatting. Thanks for sorting this out, Bhante.

sabbamitta mentioned this issue on Jul 15, 2022 in "Show sutta title in search results" (sc-voice/scv-server-DEPRECATED-#2, open).
