[sword-devel] Script to find a best fit v11n
Greg Hellings
greg.hellings at gmail.com
Thu Jun 19 00:00:42 EDT 2025
Here is an example of the first lines of running my script against the
kjv.osis.xml file from the git repo:
Checking Calvin:
----------------
There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.
There are 0 OT IDs and 30 NT IDs in your file which don’t appear in
v11n.
Checking Catholic:
------------------
There are 4530 OT IDs and 3 NT IDs in v11n which aren’t in your
file.
There are 0 OT IDs and 133 NT IDs in your file which don’t appear
in v11n.
Checking Catholic2:
-------------------
There are 4638 OT IDs and 3 NT IDs in v11n which aren’t in your
file.
There are 0 OT IDs and 133 NT IDs in your file which don’t appear
in v11n.
Checking DarbyFr:
-----------------
There are 31 OT IDs and 4 NT IDs in v11n which aren’t in your file.
There are 0 OT IDs and 30 NT IDs in your file which don’t appear in
v11n.
This continues on to include such output as
Checking KJV:
-------------
Your file has all the references in this v11n
Your file has no extra references
Checking KJVA:
--------------
There are 5717 OT IDs and 0 NT IDs in v11n which aren’t in your
file.
Your file has no extra references
giving a clear example of a winner for this particular file.
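As an illustration only (this is not part of the script discussed here, which leaves the choice to the user), counts like those above could be ranked by total mismatch. The name `rank_by_mismatch` and the tuple layout are hypothetical; the numbers are copied from the kjv.osis.xml output shown above.

```python
# Hypothetical helper: rank versifications by total mismatched IDs.
# Each tuple is (missing_ot, missing_nt, extra_ot, extra_nt), copied from
# the kjv.osis.xml output quoted above.
results = {
    "Calvin": (93, 5, 0, 30),
    "Catholic": (4530, 3, 0, 133),
    "Catholic2": (4638, 3, 0, 133),
    "DarbyFr": (31, 4, 0, 30),
    "KJV": (0, 0, 0, 0),
    "KJVA": (5717, 0, 0, 0),
}


def rank_by_mismatch(results):
    """Return (name, total) pairs sorted by total mismatches, smallest first."""
    totals = {name: sum(counts) for name, counts in results.items()}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


ranking = rank_by_mismatch(results)
for name, total in ranking:
    print(f"{name}: {total}")
```

A plain sum treats a missing ID and an extra ID as equally bad, which is an assumption; a reader may well prefer to weigh them differently or simply eyeball the counts.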
Meanwhile, running it against the kjva.osis.xml file includes this in the
results:
...
Checking KJV:
-------------
Your file has all the references in this v11n
There are 2 OT IDs and 5715 NT IDs in your file which don’t appear
in v11n.
Checking KJVA:
--------------
Your file has all the references in this v11n
Your file has no extra references
...
Fiddling with the file showed me that there are a couple of places I still
need to tweak for Python 3 compatibility, which I missed the last time I
updated it. After fixing those few small syntax issues, it ran just fine
in a Fedora 41 VM with nothing more to do than run
`dnf install python3-sword` to set up the system to use it.
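For readers who want to see the shape of the comparison described in the quoted message below, here is a minimal sketch, not the actual script. It assumes the set of IDs for a versification is already available as a Python set (the real script gets it from the SWORD library; `expected` here is a hand-written stand-in), and the function names and the `NT_BOOKS` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: OSIS abbreviations of the 27 New Testament books; everything
# else is counted as "OT" in this sketch.
NT_BOOKS = {
    "Matt", "Mark", "Luke", "John", "Acts", "Rom", "1Cor", "2Cor", "Gal",
    "Eph", "Phil", "Col", "1Thess", "2Thess", "1Tim", "2Tim", "Titus",
    "Phlm", "Heb", "Jas", "1Pet", "2Pet", "1John", "2John", "3John",
    "Jude", "Rev",
}


def osis_ids(xml_text):
    """Collect osisID values from every <verse> element, ignoring namespaces."""
    ids = set()
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.rsplit("}", 1)[-1] == "verse" and "osisID" in elem.attrib:
            # One osisID attribute may hold several space-separated IDs.
            ids.update(elem.attrib["osisID"].split())
    return ids


def split_by_testament(ids):
    """Return (ot_ids, nt_ids) based on the book part of each ID."""
    nt = {i for i in ids if i.split(".")[0] in NT_BOOKS}
    return ids - nt, nt


def compare(file_ids, v11n_ids, limit=100):
    """Count IDs missing from the file and IDs the v11n does not expect."""
    report = {}
    groups = (("missing", v11n_ids - file_ids), ("extra", file_ids - v11n_ids))
    for label, group in groups:
        ot, nt = split_by_testament(group)
        report[label] = {
            "ot": len(ot),
            "nt": len(nt),
            # Only list the IDs themselves when there are fewer than `limit`.
            "ids": sorted(group) if len(group) < limit else [],
        }
    return report


sample = """<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">
<osisText><div type="book" osisID="Gen"><chapter osisID="Gen.1">
<verse osisID="Gen.1.1">In the beginning</verse>
<verse osisID="Gen.1.2 Gen.1.3">two verses in one element</verse>
</chapter></div>
<div type="book" osisID="John"><chapter osisID="John.1">
<verse osisID="John.1.1"/></chapter></div></osisText></osis>"""

expected = {"Gen.1.1", "Gen.1.2", "Gen.1.4", "John.1.1", "John.1.2"}
report = compare(osis_ids(sample), expected)
print(report)
```

Using plain set differences keeps the two directions of mismatch separate, which is why no percentage is needed.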
--Greg
On Wed, Jun 18, 2025 at 10:40 PM Greg Hellings <greg.hellings at gmail.com>
wrote:
> My script eschews percentages because they seemed relatively pointless to
> me for measuring a mismatch like this. Instead it gives two counts for a
> given versification, each split into Old and New Testament: the osisIDs
> that it finds missing and those that it finds unexpectedly. If the total
> of either count is fewer than 100, the IDs for that particular count are
> printed to the console. It does this for every versification registered
> in the version of the library it was compiled against, allowing the user
> to select whichever one seems best to them based on the results.
>
> On Wed, Jun 18, 2025, 10:25 PM David Haslam <dfhdfh at protonmail.com> wrote:
>
>> It’s not just the number of “missing” verses that should figure in the
>> percentage score, but also the number of verses that get concatenated to
>> the last one in a chapter.
>>
>> The differences in v11n for the Psalms will be especially significant for
>> this, in that some v11n renumber many of them. Likewise for the last few
>> chapters in the book of Job.
>>
>> Aside: It would be cool to enhance the utility emptyvss by providing a
>> command line option that would ignore books that are not included in the
>> scope parameter in the conf file.
>>
>> Regards,
>>
>> David
>>
>> On Thu, Jun 19, 2025 at 03:18, DM Smith <dmsmith at crosswire.org> wrote:
>>
>> David,
>>
>> Because it only considers the xml, scope is automatically built into it.
>> It is only comparing what is present in the xml with what is part of the
>> v11ns.
>>
>> It might be good to add the enumeration of missing verses.
>>
>> — DM
>>
>> On Jun 18, 2025, at 4:02 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>>
>> Does it take account of the Scope key in the .conf file for a less than
>> complete Bible ?
>>
>> David
>>
>> On Wed, Jun 18, 2025 at 20:51, DM Smith <dmsmith at crosswire.org> wrote:
>>
>> Hi,
>>
>> Several have commented on how hard it is to test an OSIS xml file against
>> v11ns, especially since it goes off into an infinite loop. (I’ve posted a
>> patch that fixes that.) But it is still a process of trial and error to find
>> an appropriate v11n.
>>
>> So, I’ve been iterating with chatGPT to create a python script to find a
>> best fit v11n. Since I don’t know python, I can’t vouch for the script
>> beyond it worked for a simple test case that had an extra chapter for
>> Genesis and had some extra verses at the end of a chapter in that book.
>>
>> I offer it, as a starting place. See the attached file.
>>
>> It has a --debug flag.
>> The first argument is expected to be the OSIS xml file.
>> The second argument is optional and gives the location to the include
>> directory of svn/sword/trunk/include with all the canon*.h files. If you
>> don’t supply the argument, it uses the web to load the canon*.h files from
>> https://www.crosswire.org/svn/sword/trunk/include.
>>
>> It will score the fitness of each of the v11ns. It gives the score as a
>> %, but I don’t know what that means. I told it that it should prioritize
>> book matches, then chapter matches and finally verse matches. I don’t know
>> how well it did that scoring. I didn’t test for that.
>>
>> The output is alphabetized. If more than one v11n has the same high
>> score, they are all listed.
>>
>> In His Service,
>> DM
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>>
>>
>