
suttacentral/bilara-data-integrity


sutta_processor

Process, check, and validate text from bilara-data and original ms_yuttadhammo source.

Setting up

This package requires Python. It was tested with Python 3.7, so make sure that version (or a newer one) is available on your machine. If it isn't, download it from https://www.python.org/downloads/.

  1. Clone the repository to your machine.
git clone https://github.com/suttacentral/bilara-data-integrity.git
  2. From inside the bilara-data-integrity directory, clone the bilara-data repository.
git clone https://github.com/suttacentral/bilara-data.git

You should now have a directory structure like path/to/bilara-data-integrity/bilara-data/.

  3. From the directory that contains bilara-data-integrity, create a new virtual environment inside the cloned bilara-data-integrity directory.
python3 -m venv ./bilara-data-integrity/
  4. Activate your virtual environment.
source ./bilara-data-integrity/bin/activate
  5. From inside bilara-data-integrity, install the requirements.
pip install -r requirements.txt
  6. Install the application in developer mode. Note the "." at the end of the command.
pip install -e .
  7. Try running the sutta-processor app. It takes the name of the script to run (-e) and the config file (-c) as arguments.
sutta-processor -e run_all_checks -c sutta_processor_config.yaml

The config file is located at bilara-data-integrity/sutta_processor_config.yaml.
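Before running any checks, it can help to confirm the layout described in the setup steps. The function below is a hypothetical helper, not part of sutta-processor; it assumes it is called from the directory that contains bilara-data-integrity (or given that directory as an argument) and only tests that the bilara-data checkout and the config file exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with sutta-processor.
# Checks the layout from the setup steps relative to a base directory
# (default: the current directory, i.e. the parent of bilara-data-integrity).
check_layout() {
  local base="${1:-.}"
  local repo="$base/bilara-data-integrity"
  local status=0
  if [ ! -d "$repo/bilara-data" ]; then
    echo "missing: $repo/bilara-data" >&2
    status=1
  fi
  if [ ! -f "$repo/sutta_processor_config.yaml" ]; then
    echo "missing: $repo/sutta_processor_config.yaml" >&2
    status=1
  fi
  return "$status"
}
```

Usage: check_layout && echo "layout ok". Each missing item is reported on stderr, so one run shows everything that needs fixing.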

Running the application

To run a particular script, change the first argument (the value passed to -e). For example:

sutta-processor -e bilara_check_root -c sutta_processor_config.yaml

The available scripts are defined in src/sutta_processor/application/use_cases and are also listed below.
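If you switch scripts often, a small wrapper saves retyping the config path. This is a hypothetical convenience function; it only uses the -e and -c options shown above, and the CONFIG variable is an assumption of this sketch (it defaults to sutta_processor_config.yaml in the current directory, i.e. when run from inside bilara-data-integrity).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented invocation:
#   sutta-processor -e <script> -c <config>
# CONFIG (an assumption of this sketch) overrides the config file path.
run_script() {
  local script="${1:?usage: run_script <script-name>}"
  local config="${CONFIG:-sutta_processor_config.yaml}"
  sutta-processor -e "$script" -c "$config"
}
```

Usage: run_script bilara_check_root, or CONFIG=/path/to/other_config.yaml run_script noop to point at a different config file.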

sutta-processor operates in two different scopes:

  1. all files found in the relevant directories, like html or root
  2. a list of files supplied to the application as arguments

Scope 2 is meant to run on a list of changed files from a git commit.
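One way to prepare a file list for Scope 2 is to ask git which files changed in a commit. The sketch below is an assumption about how such a list could be built, not the project's own CI setup: git diff --name-only lists the changed paths, and the hypothetical filter keeps only JSON files under the root, translation, html, comment, and variant directories (names taken from the bilara_check_* scripts; adjust them to your checkout). How the resulting list is handed to check_all_changes is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read paths on stdin and keep JSON files under the
# bilara-data directories named by the bilara_check_* scripts.
# "|| true" keeps an empty result from counting as a failure.
filter_bilara_files() {
  grep -E '^(root|translation|html|comment|variant)/.*\.json$' || true
}

# List bilara-data files changed between two commits
# (default: the most recent commit). Run inside the bilara-data checkout.
changed_bilara_files() {
  git diff --name-only "${1:-HEAD~1}" "${2:-HEAD}" | filter_bilara_files
}
```

Usage: cd bilara-data && changed_bilara_files, or changed_bilara_files <old-commit> <new-commit> to compare a wider range.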

List of available scripts (unless otherwise noted, all scripts run in Scope 1):

  • check_all_changes - run checks on supplied list of files (Scope 2)
  • run_all_checks - run all available checks except check_migration
  • check_migration - cross-validate bilara-data text against the original ms_yuttadhammo source files; the results are saved to the path specified in sutta_processor_config.yaml (by default ./bilara-data/migration_differences)
  • bilara_check_comment - check that the path to comment files is set up correctly
  • bilara_check_html - check that the path to html files is set up correctly
  • bilara_check_root - check that the path to root files is set up correctly
  • bilara_check_translation - check that the path to translation files is set up correctly
  • bilara_check_variant - check that the path to variant files is set up correctly
  • bilara_load - load bilara-data
  • noop - no operation, available just for checking purposes
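When a path is misconfigured, running the bilara_check_* scripts one at a time shows which one is at fault. The loop below is a hypothetical sketch; it assumes sutta-processor exits with a non-zero status when a check fails, which this README does not state, so verify that behaviour before relying on the summary.

```shell
#!/usr/bin/env bash
# Hypothetical loop over the documented path checks.
# Assumes a failing check makes sutta-processor exit non-zero.
run_path_checks() {
  local config="${1:-sutta_processor_config.yaml}"
  local failed=()
  local script
  for script in bilara_check_comment bilara_check_html bilara_check_root \
                bilara_check_translation bilara_check_variant; do
    if ! sutta-processor -e "$script" -c "$config"; then
      failed+=("$script")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "failed: ${failed[*]}" >&2
    return 1
  fi
  echo "all path checks passed"
}
```

Every check runs even after a failure, so a single pass reports all misconfigured paths rather than stopping at the first one.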

Notes on exceptions

sutta-processor generates several false positives. These are listed in false_positives.yaml, which sutta-processor requires in order to skip them instead of raising errors.

Note: ghost_suttas.json has been added. It lists suttas that exist only as numbers, without any actual text. Currently they raise an exception like "File with the key: 'sn48.137-146' is missing in the root or reference directory." They can be added to false_positives.yaml.

unused_references.json has also been added. Currently these raise the exception "Verses from Yuttadhammo which have not been used". These are references that are either omitted from our files or not scanned because they are unusual. Typically they are headings, or extra material at the beginning or end of files. Sometimes they are in fact present in our files but fall into a zeroth level, which is not checked by default (because we handle headings differently from ms). In any case, they are all expected and can be added to false_positives.yaml.
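The structure of ghost_suttas.json is not described here. Because JSON strings are always double-quoted, one format-agnostic way to check whether a key from a "missing file" error is a known ghost sutta is a fixed-string search for the quoted key. This is a hypothetical helper; it assumes each key appears in the file as a JSON string such as "sn48.137-146", and that the file sits in the current directory unless a path is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if KEY appears as a JSON string in FILE.
# Searching for the quoted form avoids matching a key that is only a
# prefix of another (e.g. "sn48.13" against "sn48.137-146").
is_ghost_sutta() {
  local key="$1"
  local file="${2:-ghost_suttas.json}"
  grep -qF "\"$key\"" "$file"
}
```

Usage: is_ghost_sutta 'sn48.137-146' && echo "ghost sutta: candidate for false_positives.yaml".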
