regex bug in negated character set

epement
Posts: 74
Joined: Fri Sep 09, 2011 9:00 pm
Location: Florida

regex bug in negated character set

Post by epement »

Thank you, Tim, for all the work you have put into Bible Analyzer. I am using BA version 5.5.1.18 for Windows, and have come across a bug.

I am using Advanced Search, search method Regular Expression, and using this valid Basic Regular Expression.

Code:

h[^ ]d

This matches three characters: the letter 'h', followed by any single character other than a space (a negated character set), and then the letter 'd'.
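
For anyone who wants to check the pattern outside Bible Analyzer, here is a minimal sketch using Python's standard `re` module, chosen only as a convenient reference engine. The sample text is made up for illustration and is not taken from any Bible Analyzer module.

```python
import re

# Hypothetical sample text, chosen only to exercise the pattern.
sample = "he had hid the hod"

# 'h', then any single character except a space, then 'd'.
pattern = re.compile(r"h[^ ]d")

matches = pattern.findall(sample)
print(matches)  # ['had', 'hid', 'hod']

# A space between 'h' and 'd' is excluded by the negated set.
space_case = pattern.search("h d")
print(space_case)  # None
```

The space inside the brackets is treated as an ordinary literal character, so the pattern compiles and matches without any escaping.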

Bible Analyzer returns an error message:
"Search error. Please see the User Manual for information on how to use each search method."

If you don't mind my asking, which regex engine does Bible Analyzer use? It looks like PCRE (Perl Compatible Regular Expressions), but Perl permits a space character inside a character set, so this pattern is not an error in Perl or in most PCRE-compatible tools.
Eric Pement
2 Cor. 4:5

Tim
Site Admin
Posts: 1460
Joined: Sun Dec 07, 2008 1:14 pm

Re: regex bug in negated character set

Post by Tim »

It uses the Python regex engine.

For your search, try

h\Sd

where \S matches any non-whitespace character.
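
One caveat worth noting: the two patterns are close but not strictly identical, since `\S` excludes every whitespace character while `[^ ]` excludes only the space. The sketch below, using Python's standard `re` module with made-up test strings, shows where they diverge. For ordinary Bible text the difference is unlikely to matter.

```python
import re

negated_space = re.compile(r"h[^ ]d")   # excludes only the space character
non_whitespace = re.compile(r"h\Sd")    # excludes every whitespace character

# Both patterns agree on ordinary words.
word_negated = negated_space.search("had") is not None
word_non_ws = non_whitespace.search("had") is not None
print(word_negated, word_non_ws)  # True True

# A tab is whitespace but not a space, so only the negated set matches here.
tab_case = "h\td"
tab_negated = negated_space.search(tab_case) is not None
tab_non_ws = non_whitespace.search(tab_case) is not None
print(tab_negated, tab_non_ws)  # True False
```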
Tim Morton
Developer, Bible Analyzer

But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. (Rom 4:5 AV)

epement
Posts: 74
Joined: Fri Sep 09, 2011 9:00 pm
Location: Florida

Re: regex bug in negated character set

Post by epement »

Thanks for replying quickly, Tim. I know that I can use other synonyms for the regex that I used. The escape sequence \S is one of them, or I could have used [^[:space:]] or [^\x20] instead. My point is that the pattern I used should have matched, and an embedded space in a character set is legal.

Proof:

Code:

$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.search(r'a[^ ]c', 'abc')
<re.Match object; span=(0, 3), match='abc'>
>>> re.search(r'a\Sc', 'abc')
<re.Match object; span=(0, 3), match='abc'>

As you can see, Python matched the pattern. I was just filing a bug report, and I appreciate the workaround you offered.
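
A related check is whether the engine itself rejects the pattern at compile time. The sketch below assumes only Python's standard `re` module. `compiles` is a hypothetical helper written for this post and says nothing about how Bible Analyzer actually processes search input.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"h[^ ]d"))  # True
print(compiles(r"h[^ d"))   # False: the character set is never closed
```

Since the engine accepts `h[^ ]d`, the error message probably comes from somewhere other than the regex compiler itself, though that is only a guess from the outside.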
Eric Pement
2 Cor. 4:5
