Count every occurrence of a word (query)

Topics of Interest other than Bible Analyzer
Post Reply
believehim
Posts: 7
Joined: Sat May 04, 2013 8:32 am

Count every occurrence of a word (query)

Post by believehim »

Hi Tim, ** Regarding Sqlite / Python ** I'm studying Python/Sqlite by querying a KJV database.

I know this is a little off topic, but would you mind sharing how you are counting the number of occurrences of a word, per verse? For example, on a normal Sqlite3 table, I can get the verse occurrences by using LIKE...

for example,
SELECT bname,cnumber, vnumber FROM kjv_bible where vtext like '%David%' ORDER BY bnumber,cnumber,vnumber;

This returns all of the chapters and verses, however it doesn't get every occurrence in a verse...for example, in 1 Samuel 17:39, David's name occurs 3 times.

I assume your probably using a Python loop or the Full Text Search feature in Sqlite, or maybe both.

It would help my studies. If you prefer email, please email.

Thanks for the great Bible study tool.

Marshall

Tim
Site Admin
Posts: 1454
Joined: Sun Dec 07, 2008 1:14 pm

Re: Count every occurrence of a word (query)

Post by Tim »

Database queries using SQL will only get you so far.

As you found one can search for strings in an entry, but that is about it. There are the issues of word case, partial words, multiple words, counts, etc., that can't be "quickly" found using only SQL. The Full Text Search will work for most of it, but it is overkill and slow on a larger database because it has to examine every record much more closely to check for a match.

Since computers have gotten much faster over the last few years search speed is not as critical as it used to be. But it still is best practice to make a search as efficient as possible. So the answer is to do the queries in stages.

A plain SQL string query as you are doing is very fast. It is faster than nearly any other method, even a Python string search, but limited. So here are the steps I have found are the most efficient for doing a Bible search in a DB.

1. Do a raw SQL search for the basic string.
If searching for "David" your
SELECT bname,cnumber, vnumber FROM kjv_bible where vtext like '%David%'
should work.

This will get you every verse that has the string "david" in it regardless of case or word boundary.

2. Take all your hits from above and run a regular expression on them to filter out the actual hits.
This is done in Python (or any other language). (Running a RegEx on each record of the DB is much slower.) This way you are just running it on records that can have a hit.

To do this, save your SQL results to a list and run a RegEx on each item with a "for loop." Here is a RegEx that will find "David" only if it is a separate word and capitalized and then put the true hits in another list,

trueHits = []
for item in hitList:
trueHit = re.search(r'\bDavid\b', item, re.I)
if trueHit:
trueHits.append(item)

Now as to your question on counting the hits. If you are searching for a very specific string like "David," you can just do a count on that item string:

cnt = item.count('David')

But there is another way to do it with a RegEx:

item, cnt = re.subn(r'(\bDavid\b)', r'\1', item, re.I)

The subn RegEx is a substitution and counting one. Here we are not substituting anything (just putting the found string back in) but counting at the same time.

This is the basics of searching the Bible or any other text.
Tim Morton
Developer, Bible Analyzer

But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)

believehim
Posts: 7
Joined: Sat May 04, 2013 8:32 am

Re: Count every occurrence of a word (query)

Post by believehim »

Thats very interesting. Thank you very much.

Post Reply