For the last few weeks I have been working on a new dynamic and semantic search system for Bible Analyzer that mimics the semantic search found in some AI systems. For a couple of weeks I looked into what would be involved in implementing a vector index of the Bible built with a large language model (LLM) using current AI capabilities. After downloading many gigabytes of libraries and models trying to find what works best with Bible data (most AI models are not trained specifically on Bible data), I got a prototype semantic search of the KJV Bible working. It is available here:
https://semantic-search-kjv.streamlit.app/
Unlike a keyword search, where specific words are searched for, semantic search is a search technology that interprets the meaning of words and phrases. It searches by intent rather than by exact string matches and is also known as natural language search.
Semantic search is powered by a mathematical vector search, which lets it find and rank content by how closely it matches the context and intent of a query. For this page, the entire text of the King James Bible was vectorized: each word and passage was turned into a vector (a list of numbers) using a pre-trained language model. When a query is supplied, it is also vectorized and then mathematically matched to the closest vectors in the KJV data.
Think of a vector index as a cube filled with coordinates, where each word in a text has a specific coordinate determined by its context and the words related to it. When one searches for a certain word, the word is converted to its coordinates, and then complex mathematical algorithms find its nearest neighbors and return words with similar (semantic) meaning. Under the hood this is a complicated system that is well beyond me, but it works pretty well... but there is a downside.
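To make the "nearest neighbors" idea concrete, here is a minimal Python sketch of brute-force vector search using cosine similarity. The three-number vectors and the verses they are attached to are made up for illustration (a real language model produces vectors with hundreds of dimensions), and this is not the code behind the demo above.

```python
import math

# Made-up 3-D "embeddings" for illustration only; a real model would
# produce much longer vectors for each passage.
passages = {
    "Gen 1:1": [0.9, 0.1, 0.0],    # leans toward "beginning/creation"
    "John 3:16": [0.1, 0.9, 0.2],  # leans toward "love"
    "Rom 3:23": [0.0, 0.2, 0.9],   # leans toward "sin"
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, index, k=2):
    """Brute-force k-nearest-neighbor search: score every stored vector."""
    scored = [(cosine(query_vec, vec), ref) for ref, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [ref for _, ref in scored[:k]]

# Pretend the query "charity" was embedded close to the "love" direction.
query = [0.2, 0.8, 0.1]
print(nearest(query, passages))  # ['John 3:16', 'Gen 1:1']
```

Larger systems usually replace the brute-force loop with an approximate nearest-neighbor index, but the scoring idea is the same.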
The main downside to these LLM-based vector systems is their size, both on disk and in memory. To make the demo listed above I had to use huge libraries (TensorFlow, Torch, Transformers, etc.) just to work with the indexes, plus a nearly 1-gigabyte "model" to build upon, and all of these are also required simply to use the search. The server that hosts the demo had to load them all (which can take up to 30 seconds) and then keep them in memory so the search would be responsive. Needless to say, all this overhead makes this type of search very impractical. Who wants to download an app over a full gigabyte in size that takes up that much or more in memory?
So after all the testing I put LLM vector indexing aside... but I didn't put aside the idea of semantic searching, and the Lord has allowed me to develop a text-based semantic system. So far it is working quite well.
This system uses BM25/IDF document ranking with stemmed matching, dynamic synonym weighting, and full and partial phrase scoring. The heart of the system is the weighted synonym matching. The results are very similar to those of the vector indexes, without all the overhead.
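For readers unfamiliar with BM25, here is a minimal sketch of standard Okapi BM25 ranking with stemmed matching in Python. It is not Bible Analyzer's implementation: the `stem` function is a toy suffix stripper standing in for a real stemmer, `k1 = 1.5` and `b = 0.75` are common textbook defaults, the IDF is a common non-negative variant, and the verse texts are short fragments with punctuation removed.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # common default BM25 parameters (assumed, not Bible Analyzer's)

def stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's.
    It maps KJV forms like 'loveth', 'loved' and 'love' to one stem."""
    word = word.lower()
    for suffix in ("eth", "ing", "ed", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    return [stem(w) for w in text.split()]

def bm25_rank(query, docs):
    """Rank docs (ref -> text) against query with Okapi BM25; drop zero scores."""
    tokenized = {ref: tokenize(text) for ref, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: how many verses contain each stem.
    df = Counter(s for toks in tokenized.values() for s in set(toks))
    scores = {}
    for ref, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: long verses are penalized relative to avgdl.
            norm = K1 * (1 - B + B * len(toks) / avgdl)
            score += idf * tf[term] * (K1 + 1) / (tf[term] + norm)
        if score > 0:
            scores[ref] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

VERSES = {
    "John 3:16": "For God so loved the world",
    "1 John 4:8": "God is love",
    "Rom 4:5": "believeth on him that justifieth the ungodly",
}

print(bm25_rank("loveth", VERSES))  # 1 John 4:8 first, then John 3:16
```

Because "loveth", "loved", and "love" all reduce to the same stem, the query hits both verses about love, and BM25's length normalization ranks the shorter verse first.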
What I need help with is compiling synonyms for Bible terms. I have extracted some from Strong's definitions and other material, but the more the better. All we need are lists of related Bible words, plus other words people may use as search terms. Here are some examples:
['beginning', 'start', 'first', 'basic', 'primary', 'early'],
['love', 'charity', 'devotion', 'passion', 'tender'],
['sin', 'wicked', 'unrighteous', 'evil', 'darkness'],
['soul', 'heart', 'mind'],
['belief', 'compassion', 'devotion', 'faith', 'trust'],
Lists of fewer than 10 items work best, but they can be longer.
These will be weighted for use in the system, so the terms can be listed in any order.
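One way lists like these could be turned into weights is sketched below in Python. The scheme is a hypothetical illustration, not the weighting used in Bible Analyzer: every member of a group links to every other member (so order inside a list doesn't matter), the words the user typed get weight 1.0, and synonyms get a fixed discount, `SYNONYM_WEIGHT = 0.6`, chosen arbitrarily.

```python
from collections import defaultdict

# Some of the example groups from the post above.
SYNONYM_GROUPS = [
    ['beginning', 'start', 'first', 'basic', 'primary', 'early'],
    ['love', 'charity', 'devotion', 'passion', 'tender'],
    ['soul', 'heart', 'mind'],
    ['belief', 'compassion', 'devotion', 'faith', 'trust'],
]

SYNONYM_WEIGHT = 0.6  # hypothetical discount for a synonym hit vs. an exact hit

def build_synonym_index(groups):
    """Map each word to the other members of every group it belongs to.
    Order inside a group doesn't matter: each member links to all others."""
    index = defaultdict(set)
    for group in groups:
        for word in group:
            index[word].update(w for w in group if w != word)
    return index

def expand_query(terms, index, weight=SYNONYM_WEIGHT):
    """Return {term: weight}: typed terms at 1.0, synonyms discounted."""
    weighted = {term: 1.0 for term in terms}
    for term in terms:
        for syn in index.get(term, ()):
            weighted.setdefault(syn, weight)  # never downgrade a typed term
    return weighted

syn_index = build_synonym_index(SYNONYM_GROUPS)
print(expand_query(["charity"], syn_index))
```

A ranker such as BM25 could then multiply each term's contribution by its weight. Note that "devotion" appears in two of the example lists, so a search for it pulls in both groups; this is one way occasional false hits creep in.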
If anyone wants to help with this, it will make the system better, and the data could later be used to train an LLM when that becomes feasible. I also have a large list of related words taken from Strong's numbers (over 7000 rows!) that needs to be filtered and better sorted, if someone wants to try that.
Let me know if you are interested. It is this type of "grunt work" that makes the nice things work.
New Semantic Searching in the Works - Help
Tim Morton
Developer, Bible Analyzer
But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)
Re: New Semantic Searching in the Works - Help
I'd be willing to help with some of these, as the Lord provides the time.
Some terms will require two or three words to express, such as "inner man", and others will require some type of regular expression. Example:
[ "in (Christ|Jesus|Christ Jesus|Jesus Christ|(the|my|our) Lord)" ] # where double-quote allows grouping
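A quick Python check of the pattern above shows one subtlety. For finding verses, the order of the alternatives doesn't matter, since any match finds the verse. For highlighting or phrase scoring it does: Python's `re` module tries alternatives from left to right and keeps the first one that succeeds, so `Christ` wins before `Christ Jesus` is ever tried. Listing longer alternatives first (and adding `\b` word boundaries) returns the full phrase; the rewritten pattern is only a suggestion.

```python
import re

# The pattern as posted: "Christ" comes before "Christ Jesus".
IN_CHRIST = re.compile(r"in (Christ|Jesus|Christ Jesus|Jesus Christ|(the|my|our) Lord)")

# Suggested variant: longest alternatives first, word boundaries,
# and a non-capturing group for the determiners.
LONGEST_FIRST = re.compile(
    r"\bin (Christ Jesus|Jesus Christ|Christ|Jesus|(?:the|my|our) Lord)\b")

verse = "There is therefore now no condemnation to them which are in Christ Jesus"
print(IN_CHRIST.search(verse).group(0))      # in Christ
print(LONGEST_FIRST.search(verse).group(0))  # in Christ Jesus
print(LONGEST_FIRST.search("Rejoice in the Lord alway").group(0))  # in the Lord
```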
Prepositional phrases such as "in him" and "in whom" will also be relevant. When I look at items like
['soul', 'heart', 'mind'],
I can expand them to:
['soul', 'heart', 'mind', 'reins', 'spirit', 'imagination', 'belly', 'thought', 'idea']
It does seem like a lot of work. You should be able to find a free copy of Roget's Thesaurus online and adapt it for your needs.
* Project Gutenberg: https://www.gutenberg.org/ebooks/22
* Copyright free Roget's: https://www.bartleby.com/lit-hub/rogets ... d-phrases/
* Concept analysis for using a thesaurus: https://www.roget.org/Sedelow96.pdf
Hope this helps.
Eric Pement
2 Cor. 4:5
Re: New Semantic Searching in the Works - Help
Thanks for the offer.
I already have several lists put together, including multi-word groups. The code to use the lists is pretty much ready; both just need refining.
I'm not using regular expressions in this tool. The primary keyword search in Bible Analyzer relies heavily on them, but this one is pure text hacking with decreasing n-gram grouping, SQL proximity matching, dynamic synonym substitution, and the like. It is a whole new system that will offer several types of search methods chained together to produce instant results.
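The actual implementation isn't shown in the thread, so the following is only a guess at what decreasing n-gram grouping combined with SQL proximity matching might look like, written in Python with the standard `sqlite3` module. The `words(ref, pos, word)` table, the `slack` parameter, and the rule that a verse scores by the length of the longest query group it matches are all hypothetical. Verse texts are shortened KJV fragments, lowercased with punctuation removed.

```python
import sqlite3

# Hypothetical schema: one row per word, with its position in the verse.
VERSES = {
    "John 3:16": "for god so loved the world",
    "1 John 4:8": "he that loveth not knoweth not god for god is love",
    "Rom 5:8": "but god commendeth his love toward us",
    "1 John 5:3": "for this is the love of god that we keep his commandments",
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (ref TEXT, pos INTEGER, word TEXT)")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)",
                 [(ref, i, w) for ref, text in VERSES.items()
                  for i, w in enumerate(text.split())])

def decreasing_ngrams(tokens):
    """Yield word groups from the whole query down to single words."""
    for n in range(len(tokens), 0, -1):
        for i in range(len(tokens) - n + 1):
            yield tokens[i:i + n]

def verses_near(conn, words, window):
    """Refs where `words` occur in order, all within `window` positions of the
    first word. window == len(words) - 1 means an exact phrase match."""
    joins, conds = [], ["t0.word = ?"]
    join_params, cond_params = [], [words[0]]
    for i, w in enumerate(words[1:], start=1):
        joins.append(f"JOIN words t{i} ON t{i}.ref = t0.ref "
                     f"AND t{i}.pos > t{i-1}.pos AND t{i}.pos - t0.pos <= ?")
        join_params.append(window)
        conds.append(f"t{i}.word = ?")
        cond_params.append(w)
    sql = (f"SELECT DISTINCT t0.ref FROM words t0 {' '.join(joins)} "
           f"WHERE {' AND '.join(conds)} ORDER BY t0.ref")
    # JOIN placeholders appear before WHERE placeholders in the SQL text.
    return [row[0] for row in conn.execute(sql, join_params + cond_params)]

def rank(conn, query, slack=2):
    """Longest matching word group wins; shorter groups only add new verses."""
    best = {}
    for gram in decreasing_ngrams(query.lower().split()):
        for ref in verses_near(conn, gram, len(gram) - 1 + slack):
            best.setdefault(ref, len(gram))  # keep the first (longest) hit
    return sorted(best, key=lambda r: (-best[r], r))

print(rank(conn, "love of god"))  # 1 John 5:3 first (whole phrase matched)
```

With `slack=0` only exact phrases match; a larger `slack` lets other words fall between the query words, which is the proximity part.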
When I get the synonym lists more complete, I'll post them for you and others to look at, add to, or refine. Synonym substitution cannot be flawless because words have different meanings in different contexts, but the false hits aren't that common or distracting.
Tim Morton
Developer, Bible Analyzer
But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)
Re: New Semantic Searching in the Works - Help
Here is a video of my progress so far on the semantic search work.
https://youtu.be/H51I71IKQyQ
Take note of the extended/topical searching in the latter part. This is getting closer to what I have in mind. In general it closely matches the results of the LLM vector search, and with the specific names added it surpasses it.
If this works out and people find it valuable, I may index all the people by ID instead of by name (using the ExLBP data), add all the pronouns referring to deity, and make the search more comprehensive. That will be a little way down the road, though.
I need more topic lists.
Tim Morton
Developer, Bible Analyzer
But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)
Re: New Semantic Searching in the Works - Help
Tim, I like what you are doing with searches. I realize I'm probably pushing the envelope, but I'd really like to see you open up original language searching. But know that I like and will use the semantic search stuff. And yes, the search speed is impressive!
Darrel
Re: New Semantic Searching in the Works - Help
Here is a demo video of what I came up with for semantic searching and other ways to search the Bible:
https://youtu.be/Rj_K9cqHY_g
It should be released in a week or so as an Add-On or Plugin for Bible Analyzer.
Bible Analyzer v5.6.5 will be released at the same time.
Tim Morton
Developer, Bible Analyzer
But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)