force all files of the same text to have the same segment IDs #2941

Open
sujato opened this issue Nov 1, 2023 · 3 comments

Comments

@sujato
Contributor

sujato commented Nov 1, 2023

Currently we allow bilara files of the same text to omit segment IDs. This is handy, especially for things like references and variants where there are only a few items. However, it has created issues for us in development. For example, it makes it complicated to test for segment correctness and sort order.
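To make the sort-order concern concrete, here is a minimal sketch of a natural sort key for segment IDs. It is only an illustration: `segment_sort_key` and `is_sorted` are hypothetical helpers, not part of Bilara, and the sketch assumes IDs look like `uid:number` (for example `sn6.7:1.2`), with runs of digits compared as integers.

```python
import re


def _parts(text: str) -> list:
    # Split into runs of digits and non-digits. Digit runs become
    # (0, int, "") and other runs become (1, 0, text), so any two
    # parts can be compared without mixing int and str.
    return [
        (0, int(tok), "") if tok.isdigit() else (1, 0, tok)
        for tok in re.findall(r"\d+|\D+", text)
    ]


def segment_sort_key(segment_id: str) -> tuple:
    # Assumption: the ID has the form "uid:number", e.g. "sn6.7:1.2".
    uid, _, number = segment_id.partition(":")
    return (_parts(uid), _parts(number))


def is_sorted(segments: dict) -> bool:
    # True when the keys of a loaded file are already in natural order.
    ids = list(segments)
    return ids == sorted(ids, key=segment_sort_key)
```

With a key like this, `1.10` sorts after `1.2`, which a plain string comparison would get wrong.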

After discussion with STXnext, we propose to ensure that every instance of the same text has the same segment IDs, with no omissions permitted.

  • this applies to everything in Bilara, i.e. to texts as well as to site, blurbs, name, etc.
  • even if there is only one item in a file of a thousand segment IDs, we have to list them all!
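A check for this rule could look something like the sketch below. It is not the Bilara 2.0 implementation; `load_segments` and `segment_id_mismatches` are hypothetical names, and the sketch assumes each file is a flat JSON object keyed by segment ID, as in the example further down.

```python
import json
from pathlib import Path


def load_segments(path: Path) -> dict:
    # json.load keeps the key order of the file, so the order of
    # segment IDs can be compared as well as their presence.
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def segment_id_mismatches(reference: dict, other: dict) -> dict:
    # Compare the segment IDs of two files of the same text.
    # "reference" would typically be the root file (an assumption).
    ref_ids = list(reference)
    other_ids = list(other)
    return {
        "missing": [sid for sid in ref_ids if sid not in other],
        "extra": [sid for sid in other_ids if sid not in reference],
        "same_order": ref_ids == other_ids,
    }
```

Under the proposed rule, a file passes only when `missing` and `extra` are both empty and `same_order` is true.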

Files that carry the complete, identical set of segment IDs are already allowed; we just do not enforce it. So this change will not introduce new situations, merely remove some flexibility. I therefore expect that it should not generally cause problems in our systems.

Nonetheless, the world is a weird and wonderful place so we should make sure we test it out well!

This will affect all the apps downstream of bilara-data:

  • SC
  • Voice
  • Publications
  • Third party apps

We will develop this initially in the new Bilara 2.0. Once that is ready we can test in other scenarios.

example

Let us take sn6.7 as an example.

current

sn6.7_html.json:

{
  "sn6.7:0.1": "<article id='sn6.7'><header><ul><li class='division'>{}</li>",
  "sn6.7:0.2": "<li>{}</li></ul>",
  "sn6.7:0.3": "<h1 class='sutta-title'>{}</h1></header>",
  "sn6.7:1.1": "<p>{}</p>",
  "sn6.7:1.2": "<p>{}",
  "sn6.7:1.3": "{}</p>",
  "sn6.7:1.4": "<p>{}</p>",
  "sn6.7:2.1": "<blockquote class='gatha'><p><span class='verse-line'>{}</span>",
  "sn6.7:2.2": "<span class='verse-line'>{}</span>",
  "sn6.7:2.3": "<span class='verse-line'>{}</span>",
  "sn6.7:2.4": "<span class='verse-line'>{}</span></p></blockquote></article>"
}

sn6.7_root-pli-ms.json:

{
  "sn6.7:0.1": "Saṁyutta Nikāya 6.7 ",
  "sn6.7:0.2": "1. Paṭhamavagga ",
  "sn6.7:0.3": "Kokālikasutta ",
  "sn6.7:1.1": "Sāvatthinidānaṁ. ",
  "sn6.7:1.2": "Tena kho pana samayena bhagavā divāvihāragato hoti paṭisallīno. ",
  "sn6.7:1.3": "Atha kho subrahmā ca paccekabrahmā suddhāvāso ca paccekabrahmā yena bhagavā tenupasaṅkamiṁsu; upasaṅkamitvā paccekaṁ dvārabāhaṁ nissāya aṭṭhaṁsu. ",
  "sn6.7:1.4": "Atha kho subrahmā paccekabrahmā kokālikaṁ bhikkhuṁ ārabbha bhagavato santike imaṁ gāthaṁ abhāsi: ",
  "sn6.7:2.1": "“Appameyyaṁ paminanto, ",
  "sn6.7:2.2": "Kodha vidvā vikappaye; ",
  "sn6.7:2.3": "Appameyyaṁ pamāyinaṁ, ",
  "sn6.7:2.4": "Nivutaṁ taṁ maññe puthujjanan”ti. "
}

sn6.7_variant-pli-ms.json:

{
  "sn6.7:0.3": "Kokālikasutta → kokālikasuttaṁ (1) (cck, pts2ed); paṭhamakokālikasuttaṁ (sya1ed, sya2ed) "
}

sn6.7_reference.json:

{
  "sn6.7:1.1": "ms12S1_1070, msdiv178, ndp12.149, sya15.218",
  "sn6.7:2.1": "cck15.200, ms12S1_1071, pts-vp-pli2ed1.323"
}

proposed

In this case, root and html are unchanged, but variant and reference now list every segment ID, with an empty string as the value wherever a segment has no entry.

sn6.7_html.json:

{
  "sn6.7:0.1": "<article id='sn6.7'><header><ul><li class='division'>{}</li>",
  "sn6.7:0.2": "<li>{}</li></ul>",
  "sn6.7:0.3": "<h1 class='sutta-title'>{}</h1></header>",
  "sn6.7:1.1": "<p>{}</p>",
  "sn6.7:1.2": "<p>{}",
  "sn6.7:1.3": "{}</p>",
  "sn6.7:1.4": "<p>{}</p>",
  "sn6.7:2.1": "<blockquote class='gatha'><p><span class='verse-line'>{}</span>",
  "sn6.7:2.2": "<span class='verse-line'>{}</span>",
  "sn6.7:2.3": "<span class='verse-line'>{}</span>",
  "sn6.7:2.4": "<span class='verse-line'>{}</span></p></blockquote></article>"
}

sn6.7_root-pli-ms.json:

{
  "sn6.7:0.1": "Saṁyutta Nikāya 6.7 ",
  "sn6.7:0.2": "1. Paṭhamavagga ",
  "sn6.7:0.3": "Kokālikasutta ",
  "sn6.7:1.1": "Sāvatthinidānaṁ. ",
  "sn6.7:1.2": "Tena kho pana samayena bhagavā divāvihāragato hoti paṭisallīno. ",
  "sn6.7:1.3": "Atha kho subrahmā ca paccekabrahmā suddhāvāso ca paccekabrahmā yena bhagavā tenupasaṅkamiṁsu; upasaṅkamitvā paccekaṁ dvārabāhaṁ nissāya aṭṭhaṁsu. ",
  "sn6.7:1.4": "Atha kho subrahmā paccekabrahmā kokālikaṁ bhikkhuṁ ārabbha bhagavato santike imaṁ gāthaṁ abhāsi: ",
  "sn6.7:2.1": "“Appameyyaṁ paminanto, ",
  "sn6.7:2.2": "Kodha vidvā vikappaye; ",
  "sn6.7:2.3": "Appameyyaṁ pamāyinaṁ, ",
  "sn6.7:2.4": "Nivutaṁ taṁ maññe puthujjanan”ti. "
}

sn6.7_variant-pli-ms.json:

{
  "sn6.7:0.1": "",
  "sn6.7:0.2": "",
  "sn6.7:0.3": "Kokālikasutta → kokālikasuttaṁ (1) (cck, pts2ed); paṭhamakokālikasuttaṁ (sya1ed, sya2ed) ",
  "sn6.7:1.1": "",
  "sn6.7:1.2": "",
  "sn6.7:1.3": "",
  "sn6.7:1.4": "",
  "sn6.7:2.1": "",
  "sn6.7:2.2": "",
  "sn6.7:2.3": "",
  "sn6.7:2.4": ""
}

sn6.7_reference.json:

{
  "sn6.7:0.1": "",
  "sn6.7:0.2": "",
  "sn6.7:0.3": "",
  "sn6.7:1.1": "ms12S1_1070, msdiv178, ndp12.149, sya15.218",
  "sn6.7:1.2": "",
  "sn6.7:1.3": "",
  "sn6.7:1.4": "",
  "sn6.7:2.1": "cck15.200, ms12S1_1071, pts-vp-pli2ed1.323",
  "sn6.7:2.2": "",
  "sn6.7:2.3": "",
  "sn6.7:2.4": ""
}
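
Converting an existing sparse file to the proposed form could be done roughly as follows. This is a sketch under assumptions, not a migration script: `pad_to_root` is a hypothetical helper, and it takes the root file as the authority for which segment IDs exist and in what order.

```python
def pad_to_root(root: dict, sparse: dict) -> dict:
    # Refuse to silently drop data: every ID in the sparse file
    # must already exist in the root file.
    unknown = [sid for sid in sparse if sid not in root]
    if unknown:
        raise ValueError(f"segment IDs not present in root: {unknown}")
    # Emit every root ID in root order, with "" where there is no entry.
    return {sid: sparse.get(sid, "") for sid in root}


# Segment IDs of sn6.7, taken from the example above (values omitted).
root_ids = [
    "sn6.7:0.1", "sn6.7:0.2", "sn6.7:0.3",
    "sn6.7:1.1", "sn6.7:1.2", "sn6.7:1.3", "sn6.7:1.4",
    "sn6.7:2.1", "sn6.7:2.2", "sn6.7:2.3", "sn6.7:2.4",
]
root = {sid: "" for sid in root_ids}

reference = {
    "sn6.7:1.1": "ms12S1_1070, msdiv178, ndp12.149, sya15.218",
    "sn6.7:2.1": "cck15.200, ms12S1_1071, pts-vp-pli2ed1.323",
}

padded = pad_to_root(root, reference)
```

Here `padded` has all eleven segment IDs in root order, with the two reference entries kept and every other value set to an empty string.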
@thesunshade
Collaborator

As someone who likes to make things from your data, this sounds like a great thing.

@firepick1
Collaborator

This would allow us to delete the merging code in Voice, which exists because not all translations have the same segments. Thanks for the notification.
... I have a vague memory that Ven. Brahmali may have added extra segments (i.e., more segments than root). If so, then this might adversely impact his translations.
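
To illustrate why merging code becomes unnecessary, here is a rough sketch of how a consumer could pair segments once IDs are guaranteed identical. It is not Voice's actual code; `aligned_rows` is a hypothetical helper. A translation with extra segments would be rejected by the same check.

```python
def aligned_rows(root: dict, translation: dict) -> list:
    # With identical segment IDs in identical order, no merge step is
    # needed: each root segment pairs directly with its translation.
    if list(root) != list(translation):
        extra = [sid for sid in translation if sid not in root]
        missing = [sid for sid in root if sid not in translation]
        raise ValueError(
            f"segment IDs differ (extra: {extra}, missing: {missing})"
        )
    return [(sid, root[sid], translation[sid]) for sid in root]
```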

@ihongda
Contributor

ihongda commented Nov 3, 2023

Thanks, Bhante.
I'm going to check the relevant code to see if any changes need to be made.
