New Semantic Searching in the Works - Help

Tim
Site Admin
Posts: 1580
Joined: Sun Dec 07, 2008 1:14 pm

New Semantic Searching in the Works - Help

Post by Tim »

For the last few weeks I have been working on a new dynamic and semantic search system for Bible Analyzer that mimics the semantic search found in some AI systems. For a couple of weeks I looked into what would be involved in implementing a vector index of the Bible built with a large language model (LLM) using current AI capabilities. After downloading many gigabytes of libraries and models to find what works best with Bible data (most AI models are not trained specifically on Bible data), I got a prototype semantic search of the KJV Bible working. It is available here:

https://semantic-search-kjv.streamlit.app/

Unlike a Keyword Search where specific words are searched for, Semantic Search is a search technology that interprets the meaning of words and phrases. It is a search based on intent rather than exact string matches and is also known as Natural Language Search.

Semantic search is powered by a mathematical vector search, which enables it to deliver and rank content based on context and intent relevance. For this page the entire text of the King James Bible was vectorized: each word and passage was converted to a set of numbers (a vector) using a pre-trained language model. When a query is supplied it is also vectorized and then mathematically matched to the closest vectors of the KJV data.

Think of a vector index as a cube filled with coordinates, where each word in a text has a specific coordinate in relation to its context and the words related to it. When one searches for a certain word, the word is converted to its coordinates, then mathematical algorithms find its nearest neighbors and return words with a similar meaning. Under the hood this is a complicated system that is well beyond me, but it works pretty well...but there is a downside.
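
The "nearest neighbors" idea can be sketched in a few lines of Python. This is only a toy illustration, not what the demo runs: the three-number vectors below are invented by hand, where a real model would produce hundreds of numbers per word or passage, and the names VECTORS, cosine, and nearest are made up for this example.

```python
import math

# Toy 3-dimensional "coordinates". These numbers are invented for
# illustration; a real language model would supply them.
VECTORS = {
    "love":    [0.90, 0.10, 0.05],
    "charity": [0.85, 0.15, 0.10],
    "sin":     [0.05, 0.90, 0.10],
    "wicked":  [0.10, 0.85, 0.20],
    "soul":    [0.30, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, k=2):
    """Return the k words whose vectors are most similar to the given word."""
    query = VECTORS[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in VECTORS.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]

print(nearest("love", 1))
print(nearest("sin", 1))
```

With these made-up coordinates, "charity" comes out as the closest neighbor of "love" and "wicked" as the closest neighbor of "sin".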

The main downside to these vectored LLM-based systems is their size, both on disk and in memory. To make the demo listed above I had to use huge libraries (TensorFlow, Torch, Transformers, etc.) just to work with the indexes, plus a nearly 1 gigabyte "model" to build upon, and all of these are also required simply to use the search. The server that hosts the demo has to load them all (which can take up to 30 seconds) and then keep them in memory so the search stays responsive. Needless to say, all this overhead makes this type of search very impractical. Who wants to download an app over a full gigabyte in size that takes up that much or more in memory?

So after all the testing I put LLM vector indexing aside...but I didn't put the semantic searching idea aside and the Lord has allowed me to work on developing a text-based semantic system, and so far it is working quite well.

This system uses BM25/IDF Document Ranking with Stemmed Matching, Dynamic Synonym Weighting, and Full & Partial Phrase Scoring. The heart of this system is the weighted synonym matching. The results are very similar to the vectored indexes without all the overhead.
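
For anyone curious how BM25 ranking with stemming and weighted synonyms can fit together, here is a rough Python sketch. It is not the Bible Analyzer code. The three-verse corpus, the synonym weights, the crude stem function, and the k1 and b values are all assumptions chosen for illustration, and phrase scoring is left out.

```python
import math
from collections import Counter

# Tiny sample corpus (punctuation removed, lowercased).
VERSES = {
    "1Jn 4:8": "he that loveth not knoweth not god for god is love",
    "1Co 13:13": "and now abideth faith hope charity these three "
                 "but the greatest of these is charity",
    "Rom 3:23": "for all have sinned and come short of the glory of god",
}

# Hypothetical weights: 1.0 for the query word itself, less for a synonym.
SYNONYMS = {"love": {"love": 1.0, "charity": 0.7}}

def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("eth", "est", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word[:-1] if word.endswith("e") and len(word) > 3 else word

def bm25_search(query, k1=1.5, b=0.75):
    """Rank verses by BM25, scaling each term by its synonym weight."""
    docs = {ref: Counter(stem(w) for w in text.split())
            for ref, text in VERSES.items()}
    n_docs = len(docs)
    avg_len = sum(sum(c.values()) for c in docs.values()) / n_docs
    scores = Counter()
    for word in query.lower().split():
        for term, weight in SYNONYMS.get(word, {word: 1.0}).items():
            t = stem(term)
            df = sum(1 for c in docs.values() if t in c)  # document frequency
            if df == 0:
                continue
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            for ref, counts in docs.items():
                tf = counts[t]
                if tf == 0:
                    continue
                length = sum(counts.values())
                norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
                scores[ref] += weight * idf * norm
    return scores.most_common()  # best match first

for ref, score in bm25_search("love"):
    print(ref, round(score, 3))
```

A search for "love" ranks 1Jn 4:8 first, because "loveth" and "love" both stem to the query term at full weight, and 1Co 13:13 second, because "charity" only counts at the lower synonym weight. Rom 3:23 does not appear at all.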

What I need help with is putting together synonyms of Bible terms. I have some that I extracted from Strong's definitions and other material, but the more the better. All we need are lists of related Bible words and other words people may use as search terms. Here are some examples:

['beginning', 'start', 'first', 'basic', 'primary', 'early'],
['love', 'charity', 'devotion', 'passion', 'tender'],
['sin', 'wicked', 'unrighteous', 'evil', 'darkness'],
['soul', 'heart', 'mind'],
['belief', 'compassion', 'devotion', 'faith', 'trust'],

Lists of fewer than 10 items work best, but they can be larger.

These will be weighted for use in the system, so the terms can be listed in any order.
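
To give contributors a feel for how lists in this format could be consumed, here is a small Python sketch that turns the example groups into a lookup table and expands a query with it. The function names build_index and expand are made up for this example, and the weighting step is omitted. Note that a word may sit in more than one list ('devotion' does above), in which case it simply picks up the relatives from every list it appears in.

```python
from collections import defaultdict

GROUPS = [
    ['beginning', 'start', 'first', 'basic', 'primary', 'early'],
    ['love', 'charity', 'devotion', 'passion', 'tender'],
    ['sin', 'wicked', 'unrighteous', 'evil', 'darkness'],
    ['soul', 'heart', 'mind'],
    ['belief', 'compassion', 'devotion', 'faith', 'trust'],
]

def build_index(groups):
    """Map every term to the set of other terms sharing a list with it."""
    index = defaultdict(set)
    for group in groups:
        for term in group:
            index[term].update(t for t in group if t != term)
    return index

def expand(query, index):
    """Return the query words plus all of their listed relatives, sorted."""
    words = query.lower().split()
    expanded = set(words)
    for word in words:
        expanded |= index.get(word, set())
    return sorted(expanded)

INDEX = build_index(GROUPS)
print(expand("soul", INDEX))
print(sorted(INDEX["devotion"]))
```

Because the order inside a list does not matter for this lookup, contributors can add terms wherever is convenient.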

If anyone wants to help with this, it will make the system better, and the data could also be used later to train an LLM when that becomes feasible. I have a large list of related words taken from Strong's numbers (over 7000 rows!) that needs to be filtered and better sorted, if someone wants to try that.

Let me know if you are interested. It is this type of "grunt work" that makes the nice things work.
Tim Morton
Developer, Bible Analyzer

But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)

epement
Posts: 114
Joined: Fri Sep 09, 2011 9:00 pm
Location: Florida

Re: New Semantic Searching in the Works - Help

Post by epement »

I'd be willing to help with some of these, as the Lord provides the time.

Some terms will require two or three words to express, such as "inner man", and others will require some type of regular expression. Example:

[ "in (Christ|Jesus|Christ Jesus|Jesus Christ|(the|my|our) Lord)" ] # where double-quote allows grouping
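
As a sketch of how a pattern like that might be written with Python's re module (the names IN_CHRIST and find_phrases are made up here), the longer alternatives are listed first so that "in Christ Jesus" is matched whole rather than stopping at "in Christ":

```python
import re

# Longer alternatives come first; \b keeps "within Christ" from matching.
IN_CHRIST = re.compile(
    r"\bin (?:Christ Jesus|Jesus Christ|Christ|Jesus|(?:the|my|our) Lord)\b"
)

def find_phrases(text):
    """Return every full 'in ...' phrase found in the text."""
    return [m.group(0) for m in IN_CHRIST.finditer(text)]

print(find_phrases("no condemnation to them which are in Christ Jesus"))
print(find_phrases("Rejoice in the Lord alway"))
```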

Prepositional phrases such as "in him" and "in whom" will also be relevant. I look at items like

['soul', 'heart', 'mind'],

and can expand it to:

['soul', 'heart', 'mind', 'reins', 'spirit', 'imagination', 'belly', 'thought', 'idea']

It does seem like a lot of work. You should be able to find a free copy of Roget's Thesaurus online and adapt it for your needs.

* Project Gutenberg: https://www.gutenberg.org/ebooks/22
* Copyright free Roget's: https://www.bartleby.com/lit-hub/rogets ... d-phrases/
* Concept analysis for using a thesaurus: https://www.roget.org/Sedelow96.pdf

Hope this helps.
Eric Pement
2 Cor. 4:5

Tim
Site Admin
Posts: 1580
Joined: Sun Dec 07, 2008 1:14 pm

Re: New Semantic Searching in the Works - Help

Post by Tim »

Thanks for the offer.

I already have several lists put together that also include multiple-word groups. I pretty much have the code ready to use the lists; both the code and the lists just need refining.

I'm not using regular expressions in this tool. The primary keyword search in Bible Analyzer relies heavily on them, but this one is pure text hacking with decreasing n-gram grouping, SQL proximity matching, dynamic synonym substitution, and the like. It is a whole new system that will offer several types of search methods chained together to produce instant results.
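
For readers unfamiliar with the term, "decreasing n-gram grouping" can be pictured roughly like this in Python. This is a guess at the general idea, not the actual Bible Analyzer code, and the function names are made up: the query is tried first as a whole phrase, then as progressively shorter word groups, so the longest phrase a verse contains can be scored highest.

```python
def decreasing_ngrams(query, min_n=1):
    """List the query's word groups from the longest down to single words."""
    words = query.lower().split()
    grams = []
    for n in range(len(words), min_n - 1, -1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

def best_phrase_match(query, verse):
    """Return the longest query word group found in the verse, or None."""
    # Pad with spaces so only whole words match.
    text = " " + " ".join(verse.lower().split()) + " "
    for gram in decreasing_ngrams(query):
        if f" {gram} " in text:
            return gram
    return None

print(decreasing_ngrams("faith without works"))
print(best_phrase_match("faith without works",
                        "so faith without works is dead also"))
```

The first call lists "faith without works", then "faith without" and "without works", then the three single words. A verse containing the full phrase matches on the first try, while a verse with only "faith" falls through to the single-word level.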

When I get the synonym lists more complete, I'll post them for you and others to look at and add to or refine. Synonym substitution cannot be flawless because words have different meanings in different contexts, but the false hits aren't that common or distracting.

Tim
Site Admin
Posts: 1580
Joined: Sun Dec 07, 2008 1:14 pm

Re: New Semantic Searching in the Works - Help

Post by Tim »

Here is a video of my progress so far on the semantic search work.
https://youtu.be/H51I71IKQyQ

Take note of the extended/topical searching in the latter part. This is getting closer to what I have in mind. In general it pretty much matches the results of the LLM vectored search, and with the specific names added it surpasses it.

If this works out and people find it of value I may put all the people in by ID instead of name (ExLBP data) plus all the pronouns of deity and make the search more comprehensive. That will be down the road a little, though.

I need more topic lists.

darrel_jw
Posts: 283
Joined: Sun Dec 13, 2015 3:38 am

Re: New Semantic Searching in the Works - Help

Post by darrel_jw »

Tim, I like what you are doing with searches. I realize I'm probably pushing the envelope, but I'd really like to see you open up original language searching. But know that I like and will use the semantic search stuff. And yes, the search speed is impressive!

Darrel

Tim
Site Admin
Posts: 1580
Joined: Sun Dec 07, 2008 1:14 pm

Re: New Semantic Searching in the Works - Help

Post by Tim »

Here is a demo video of what I came up with for semantic searching and other ways to search the Bible:
https://youtu.be/Rj_K9cqHY_g

It should be released in a week or so as an Add-On or Plugin for Bible Analyzer.
Bible Analyzer v5.6.5 will be released at the same time.
