
regex bug in negated character set

Posted: Mon Aug 14, 2023 11:23 pm
by epement
Thank you, Tim, for all the work you have put into Bible Analyzer. I am using BA version 5.5.1.18 for Windows, and have come across a bug.

I am using Advanced Search with the Regular Expression search method and this valid Basic Regular Expression:

Code:

h[^ ]d
This matches three characters: the letter 'h', then any single character other than a space (a negated character set), then the letter 'd'.

Bible Analyzer returns an error message:
"Search error. Please see the User Manual for information on how to use each search method."

If you don't mind me asking, exactly which regex engine does Bible Analyzer use? It looks like PCRE (Perl Compatible Regular Expressions), but Perl permits a literal space character inside a character set. This is not an error for Perl or most PCRE-compatible tools.

Re: regex bug in negated character set

Posted: Tue Aug 15, 2023 6:11 am
by Tim
It uses the Python regex engine.

For your search try
h\Sd
with \S matching any non-whitespace character.
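
One caveat, sketched below with Python's standard re module: \S and [^ ] are close but not identical. [^ ] rejects only the space character, while \S rejects every whitespace character, including tab and newline. The sample strings are made up for illustration and say nothing about how Bible Analyzer calls the engine.

```python
import re

# Hypothetical sample strings, chosen only to show where the two patterns differ.
space_only = re.compile(r'h[^ ]d')   # middle character: anything except a space
any_ws = re.compile(r'h\Sd')         # middle character: anything except whitespace

samples = ['had', 'h d', 'h\td', 'h\nd']

# Map each sample to (matched by [^ ], matched by \S).
results = {s: (bool(space_only.search(s)), bool(any_ws.search(s))) for s in samples}

for text, (with_set, with_escape) in results.items():
    print(repr(text), with_set, with_escape)
```

For ordinary verse text, where words are separated by plain spaces, the two patterns should find the same hits.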

Re: regex bug in negated character set

Posted: Tue Aug 15, 2023 7:27 pm
by epement
Thanks for replying quickly, Tim. I know that I can use alternatives to the regex that I used. The escape sequence \S is one of them, or I could have used [^\x20] instead (or [^[:space:]] in engines that support POSIX character classes, which Python's re module does not). My point is that the pattern I used should have matched, and an embedded space in a character set is legal.

Proof:

Code:

$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.search(r'a[^ ]c', 'abc')
<re.Match object; span=(0, 3), match='abc'>
>>> re.search(r'a\Sc', 'abc')
<re.Match object; span=(0, 3), match='abc'>
As you can see, Python matched the pattern. I was just filing a bug report, and I appreciate the workaround you offered.
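
The same check can be run against the original pattern as a script. This is a minimal sketch using only Python's standard re module. The sample text is made up, and nothing here reflects how Bible Analyzer passes the pattern to the engine. It also shows that a POSIX-style class such as [^[:space:]] is not read as a class by re, at least in the Python versions I am aware of.

```python
import re
import warnings

# Hypothetical sample text; any string with "h?d" words would do.
text = 'he had hid the h d'

# A literal space and its hex escape are the same character set.
literal_space = re.findall(r'h[^ ]d', text)
hex_escape = re.findall(r'h[^\x20]d', text)
print(literal_space, hex_escape)

# re does not implement POSIX classes: this parses as the negated set
# "[:space" followed by a literal "]", so it does not match 'hid'.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # recent Pythons warn about a possible nested set
    posix_attempt = re.search(r'h[^[:space:]]d', 'hid')
print(posix_attempt)
```

Since re itself accepts the space inside the set, the error most likely comes from how the search string is handled before it reaches the engine, though that is only a guess from the outside.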