[sword-devel] Script to find a best fit v11n
Greg Hellings
greg.hellings at gmail.com
Thu Jun 19 00:12:58 EDT 2025
And here's an example now that I've fixed the output of the osisIDs when
there are fewer than 100 of them:
[vagrant at localhost ~]$ ./av11n.py kjv.osis.xml
Checking Calvin:
----------------
The following IDs don’t appear in your file:
%s 1Kgs.22.54, 1Sam.20.43, 1Sam.24.23, 3John.1.15, Acts.24.28, Eccl.12.15,
Eccl.12.16, Ezek.21.33, Ezek.21.34, Ezek.21.35, Ezek.21.36, Ezek.21.37,
Hos.12.15, Isa.8.23, Job.39.31, Job.39.32, Job.39.33, Job.39.34, Job.39.35,
Job.39.36, Job.39.37, Job.39.38, Job.40.25, Job.40.26, Job.40.27, Job.40.28,
Jonah.2.11, Mark.10.53,
Mark.9.51, Num.13.34, Num.30.17, Ps.102.29, Ps.108.14, Ps.12.9, Ps.140.14,
Ps.142.8, Ps.18.51, Ps.19.15, Ps.20.10, Ps.21.14, Ps.22.32, Ps.3.9,
Ps.30.13, Ps.31.25, Ps.34.23, Ps.36.13, Ps.38.23, Ps.39.14, Ps.4.9, Ps.40.18,
Ps.41.14, Ps.42.12, Ps.44.27,
Ps.45.18, Ps.46.12, Ps.47.10, Ps.48.15, Ps.49.21, Ps.5.13, Ps.51.20,
Ps.51.21, Ps.52.10, Ps.52.11, Ps.53.7, Ps.54.8, Ps.54.9, Ps.55.24,
Ps.56.14, Ps.57.12, Ps.58.12, Ps.59.18, Ps.6.11, Ps.60.13, Ps.60.14, Ps.61.9,
Ps.62.13, Ps.63.12, Ps.64.11, Ps.65.14, Ps.67.8,
Ps.68.36, Ps.69.37, Ps.7.18, Ps.70.6, Ps.75.11, Ps.76.13, Ps.77.21,
Ps.8.10, Ps.80.20, Ps.81.17, Ps.83.19, Ps.84.13, Ps.85.14, Ps.88.19,
Ps.89.53, Ps.9.21, Ps.92.16, Rev.12.18
There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.
The following IDs don’t appear in v11n:
%s 1Kgs.22.54, 1Sam.20.43, 1Sam.24.23, 3John.1.15, Acts.24.28, Eccl.12.15,
Eccl.12.16, Ezek.21.33, Ezek.21.34, Ezek.21.35, Ezek.21.36, Ezek.21.37,
Hos.12.15, Isa.8.23, Job.39.31, Job.39.32, Job.39.33, Job.39.34, Job.39.35,
Job.39.36, Job.39.37, Job.39.38, Job.40.25, Job.40.26, Job.40.27, Job.40.28,
Jonah.2.11, Mark.10.53,
Mark.9.51, Num.13.34, Num.30.17, Ps.102.29, Ps.108.14, Ps.12.9, Ps.140.14,
Ps.142.8, Ps.18.51, Ps.19.15, Ps.20.10, Ps.21.14, Ps.22.32, Ps.3.9,
Ps.30.13, Ps.31.25, Ps.34.23, Ps.36.13, Ps.38.23, Ps.39.14, Ps.4.9, Ps.40.18,
Ps.41.14, Ps.42.12, Ps.44.27,
Ps.45.18, Ps.46.12, Ps.47.10, Ps.48.15, Ps.49.21, Ps.5.13, Ps.51.20,
Ps.51.21, Ps.52.10, Ps.52.11, Ps.53.7, Ps.54.8, Ps.54.9, Ps.55.24,
Ps.56.14, Ps.57.12, Ps.58.12, Ps.59.18, Ps.6.11, Ps.60.13, Ps.60.14, Ps.61.9,
Ps.62.13, Ps.63.12, Ps.64.11, Ps.65.14, Ps.67.8,
Ps.68.36, Ps.69.37, Ps.7.18, Ps.70.6, Ps.75.11, Ps.76.13, Ps.77.21,
Ps.8.10, Ps.80.20, Ps.81.17, Ps.83.19, Ps.84.13, Ps.85.14, Ps.88.19,
Ps.89.53, Ps.9.21, Ps.92.16, Rev.12.18
There are 1 OT IDs and 29 NT IDs in your file which don’t appear in
v11n.
On Wed, Jun 18, 2025 at 11:00 PM Greg Hellings <greg.hellings at gmail.com>
wrote:
> Here is an example of the first lines of running my script against the
> kjv.osis.xml file from the git repo:
>
>
> Checking Calvin:
> ----------------
> There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.
> There are 0 OT IDs and 30 NT IDs in your file which don’t appear
> in v11n.
>
> Checking Catholic:
> ------------------
> There are 4530 OT IDs and 3 NT IDs in v11n which aren’t in your
> file.
> There are 0 OT IDs and 133 NT IDs in your file which don’t appear
> in v11n.
>
> Checking Catholic2:
> -------------------
> There are 4638 OT IDs and 3 NT IDs in v11n which aren’t in your
> file.
> There are 0 OT IDs and 133 NT IDs in your file which don’t appear
> in v11n.
>
> Checking DarbyFr:
> -----------------
> There are 31 OT IDs and 4 NT IDs in v11n which aren’t in your file.
> There are 0 OT IDs and 30 NT IDs in your file which don’t appear
> in v11n.
>
> This continues on to include such output as
>
>
>
> Checking KJV:
> -------------
> Your file has all the references in this v11n
> Your file has no extra references
>
>
>
> Checking KJVA:
> --------------
> There are 5717 OT IDs and 0 NT IDs in v11n which aren’t in your
> file.
> Your file has no extra references
>
> giving a clear example of a winner for this particular file.
>
> Meanwhile, running it against the kjva.osis.xml file includes this in the
> results:
>
> ...
>
> Checking KJV:
> -------------
> Your file has all the references in this v11n
> There are 2 OT IDs and 5715 NT IDs in your file which don’t appear
> in v11n.
>
> Checking KJVA:
> --------------
> Your file has all the references in this v11n
> Your file has no extra references
> ...
>
> Fiddling with the file has shown me there are a couple of places where I
> need to tweak it for Python 3 compatibility that I missed the last time I
> updated. But fixing those couple of little syntax issues resulted in it
> running just fine in a Fedora 41 VM with nothing more to do than invoke
> `dnf install python3-sword` to set up the system to use it.
>
> --Greg
>
> On Wed, Jun 18, 2025 at 10:40 PM Greg Hellings <greg.hellings at gmail.com>
> wrote:
>
>> My script eschews percentages because they seemed relatively pointless to
>> me for measuring a mismatch like this. Instead, for a given versification,
>> it gives two counts, each split into Old and New Testament: the osisIDs
>> the versification expects but the file is missing, and the osisIDs the
>> file contains that the versification does not. If the total of either
>> count is fewer than 100, the IDs for that particular count are printed to
>> the console. It does this for every versification registered in the
>> version of the library it was compiled against, allowing the user to
>> select whichever one seems best to them based on the results.
>>
>> On Wed, Jun 18, 2025, 10:25 PM David Haslam <dfhdfh at protonmail.com>
>> wrote:
>>
>>> It’s not just the number of “missing” verses that should figure in the
>>> percentage score, but also the number of verses that get concatenated to
>>> the last one in a chapter.
>>>
>>> The differences in v11n for the Psalms will be especially significant
>>> for this, in that some v11ns renumber many of them. Likewise for the last
>>> few chapters in the book of Job.
>>>
>>> Aside: It would be cool to enhance the utility emptyvss by providing a
>>> command line option that would ignore books that are not included in the
>>> scope parameter in the conf file.
>>>
>>> Regards,
>>>
>>> David
>>>
>>> On Thu, Jun 19, 2025 at 03:18, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> David,
>>>
>>> Because it only considers the xml, scope is automatically built into it.
>>> It is only comparing what is present in the xml with what is part of the
>>> av11ns.
>>>
>>> It might be good to add the enumeration of missing verses.
>>>
>>> — DM
>>>
>>> On Jun 18, 2025, at 4:02 PM, David Haslam <dfhdfh at protonmail.com>
>>> wrote:
>>>
>>> Does it take account of the Scope key in the .conf file for a
>>> less-than-complete Bible?
>>>
>>> David
>>>
>>>
>>> On Wed, Jun 18, 2025 at 20:51, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> Hi,
>>>
>>> Several have commented on how hard it is to test an OSIS XML file
>>> against v11ns, especially since it goes off into an infinite loop. (I’ve
>>> posted a patch that fixes that.) But it is still a process of trial and
>>> error to find an appropriate v11n.
>>>
>>> So, I’ve been iterating with ChatGPT to create a Python script to find a
>>> best fit v11n. Since I don’t know Python, I can’t vouch for the script
>>> beyond the fact that it worked for a simple test case that had an extra
>>> chapter for Genesis and some extra verses at the end of a chapter in that
>>> book.
>>>
>>> I offer it, as a starting place. See the attached file.
>>>
>>> It has a --debug flag.
>>> The first argument is expected to be the OSIS XML file.
>>> The second argument is optional and gives the location of the include
>>> directory of svn/sword/trunk/include with all the canon*.h files. If you
>>> don’t supply the argument, it uses the web to load the canon*.h files from
>>> https://www.crosswire.org/svn/sword/trunk/include.
>>>
>>> It will score the fitness of each of the v11ns. It gives the score as a
>>> %, but I don’t know what that means. I told it that it should prioritize
>>> book matches, then chapter matches and finally verse matches. I don’t know
>>> how well it did that scoring. I didn’t test for that.
>>>
>>> The output is alphabetized. If more than one v11n has the same high
>>> score, they are all listed.
>>>
>>> In His Service,
>>> DM
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>>
>>>
>>