[FREE] Extension Xreg V1.2 - PCRE regular expressions for App Inventor

1-Introduction
The Xreg extension contains only two methods but covers 95% of Java's regular expression functionality.
What does it offer beyond the two existing extensions? It handles matches, capture groups, and any kind of replacement in a string! See the following post:
[Regular Expression]

Both methods are designed to be simple enough to integrate into App Inventor as blocks that operate on strings. AutoIt integrates regular expressions in the same way:
https://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm

Many sites have introductions to regular expressions, aimed at beginners.
The following page contains the exhaustive list of PCRE features:
https://www.pcre.org/original/doc/html/pcrepattern.html

2-Blocks Images
Regexp
Check if a string fits a given regular expression pattern.

RegexpReplace
Replace text in a string based on regular expression pattern.

3-Documentation Regexp()
Regexp ( text:text, pattern:text, mode:int, start:int) : list
• text - The subject string to check
• pattern - The regular expression to match
• mode - A number to indicate how the function behaves.
        1 - return data for FIRST match: [match1] if successful, else []
        2 - return data for ALL matches: [match1, match2, ....] if successful, else []
• start - The position in the string at which to start matching.

What is the data for a match?

  1. first character index of the match
  2. full match text

That's all if the pattern does not contain any capture groups.
If the pattern designer has requested the capture of numbered groups, the text of each group is added to the list, but not its position.
For one match, the data is:

  1. first character index of the match
  2. full match text
  3. text of capture group number 1
  4. text of capture group number 2
  5. etc. The programmer knows how many capture groups the pattern contains!

What counts as an error?

The following cases are not errors:
• start > length(text): the function returns [].
• start < 1: the function assumes start=1.
• mode < 1: the function assumes mode=1.
• mode > 1: the function assumes mode=2.
Regardless of the mode, if the function fails, it returns the list: [0, error message].
The most frequent error comes from the regular expression itself; the error message indicates the approximate index of the syntax error in the pattern.
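
For readers who know Java, the rules above can be sketched roughly as follows. This is only my reading of the documentation, not the extension's source: the class name RegexpSketch and its helper are hypothetical, indexes are assumed to be 1-based, mode 1 is assumed to return the flat list [index, fullMatch, groups...], and a group that did not take part in the match is assumed to come back as an empty text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical emulation of the Regexp block; this is NOT the extension's source code.
class RegexpSketch {

    // Builds [index, fullMatch, group1, group2, ...] for the current match.
    // Assumptions: 1-based index, unmatched groups reported as empty text.
    private static List<Object> matchData(Matcher m) {
        List<Object> data = new ArrayList<>();
        data.add(m.start() + 1);
        data.add(m.group());
        for (int g = 1; g <= m.groupCount(); g++) {
            data.add(m.group(g) == null ? "" : m.group(g));
        }
        return data;
    }

    static List<Object> regexp(String text, String pattern, int mode, int start) {
        List<Object> result = new ArrayList<>();
        if (start > text.length()) return result;   // start beyond the text: []
        if (start < 1) start = 1;                    // start < 1 is treated as 1
        boolean all = mode > 1;                      // mode < 1 acts as 1, mode > 1 as 2
        try {
            Matcher m = Pattern.compile(pattern).matcher(text);
            boolean found = m.find(start - 1);       // Java offsets are 0-based
            while (found) {
                if (!all) return matchData(m);       // mode 1: data of the first match
                result.add(matchData(m));            // mode 2: one entry per match
                found = m.find();
            }
        } catch (PatternSyntaxException e) {
            result.clear();
            result.add(0);
            result.add(e.getMessage());              // [0, error message]
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(regexp("one two", "(\\w)(\\w+)", 1, 1)); // [1, one, o, ne]
        System.out.println(regexp("a1 b22", "\\d+", 2, 1));         // [[2, 1], [5, 22]]
        System.out.println(regexp("abc", "(", 1, 1));               // [0, <error message>]
    }
}
```

The error branch relies on Java's PatternSyntaxException, whose message includes the approximate position of the problem in the pattern.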

4-Documentation RegexpReplace()
RegexpReplace(text:text, pattern:text, replacement:text, count:int): text
• text - The subject string to check
• pattern - The regular expression to match
• replacement - The text to replace the regular expression matching text with. To insert full match use $0 (or \0). To insert captured groups text, use $1,...,$9 (or \1,...,\9) as back-references.
• count - The number of times to execute the replacement in the string. Use 0 for global replacement.

Remarks
1-To separate a back-reference from a literal digit that follows it, wrap the group number in curly braces, e.g. "${1}5".
2-If a "\" needs to appear in the replacement string, it must be doubled. This is a consequence of the back-reference mechanism.
3-The "\" and "$" replacement formats are the only back-reference formats supported.

5-Examples
#1: The search is case sensitive.
Searches for the three letters AND or ING, in uppercase only. Since the text contains neither, the function returns the list: [].

#2: The options (?i) and (?x)
Placed at the beginning of the pattern, the option (?i) makes the search case-insensitive. The (?x) option makes whitespace in the pattern insignificant.
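
Since Xreg is built on Java regular expressions, examples #1 and #2 can be tried in plain Java. The sketch below is only an illustration: the class name OptionsSketch, the helper found() and the sample text are my own assumptions, not the blocks from the original screenshots.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the sample text is invented for illustration.
class OptionsSketch {

    // True if the pattern matches somewhere in the text.
    static boolean found(String pattern, String text) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "standing and waiting";
        // Case-sensitive by default: no uppercase AND or ING in the text.
        System.out.println(found("AND|ING", text));             // false
        // (?i) switches to case-insensitive matching.
        System.out.println(found("(?i)AND|ING", text));         // true
        // (?x) lets us space out the pattern without changing its meaning.
        System.out.println(found("(?ix) A N D | I N G", text)); // true
    }
}
```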

#3: Using mode=1
We have a supposedly very long text and we are looking for the FIRST word ending with "ing".
The result is the list: [index of FIRST match, match text].

#4: Using mode=2
We now want ALL the matches that result from the search.

#5: Using the start parameter

#6: Capture groups
In addition to the match also called full match, it is possible to recover match portions called groups.
They are delimited in the regular expression by parentheses. The first opening parenthesis defines group 1, the second defines group 2, etc.
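
As a small illustration in plain Java (the pattern, the sample text and the names GroupsSketch and firstPair are assumptions of mine, not the original example), two pairs of parentheses give two groups that can be read back separately from the full match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: group 1 is the key, group 2 is the numeric value.
class GroupsSketch {

    // Returns "key|value" for the first key=value pair, or "" if there is none.
    static String firstPair(String text) {
        Matcher m = Pattern.compile("(\\w+)=(\\d+)").matcher(text);
        return m.find() ? m.group(1) + "|" + m.group(2) : "";
    }

    public static void main(String[] args) {
        System.out.println(firstPair("width=120; height=80")); // width|120
    }
}
```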

#7: Subpattern
Sometimes you need to group statements with parentheses without defining a capturing group.
For this we use (?: ... ).
Below we are looking for signed integers in a text.
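
A possible pattern for this, tried in plain Java, is sketched below. The pattern and the names SignedIntSketch and signedIntegers are my own assumptions; the original blocks may use a different expression. The point is that (?: ... ) groups the optional sign without creating a capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of a non-capturing group.
class SignedIntSketch {

    // Optional sign grouped with (?: ... ), followed by one or more digits.
    static final Pattern SIGNED = Pattern.compile("(?:\\+|-)?\\d+");

    static List<String> signedIntegers(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SIGNED.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(signedIntegers("a=-12, b=+7, c=30")); // [-12, +7, 30]
        System.out.println(SIGNED.matcher("").groupCount());     // 0: nothing is captured
    }
}
```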

#8: Comments in regular expression
The example below validates an entire string as a strong password.
The example also shows how a regular expression can be written using the (?x) option. The special character # begins a comment that extends to the end of the line.

The subpattern of the form (?=X), which appears several times in the pattern, is called a positive look-ahead: it is a test that succeeds if the subpattern X matches from the current position. This test does not consume characters, i.e. the current position in the text is not modified.
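
Here is a minimal Java sketch of such a commented pattern. The password rules used (at least 8 characters, one lowercase letter, one uppercase letter, one digit) are assumptions for illustration and may differ from the rules in the original blocks; the names PasswordSketch and isStrong are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of (?x) comments combined with positive look-aheads.
class PasswordSketch {

    static final Pattern STRONG = Pattern.compile(
        "(?x)            # free-spacing mode: spaces and comments are ignored\n" +
        "^               # start of the string\n" +
        "(?=.*[a-z])     # look-ahead: at least one lowercase letter\n" +
        "(?=.*[A-Z])     # look-ahead: at least one uppercase letter\n" +
        "(?=.*\\d)       # look-ahead: at least one digit\n" +
        ".{8,}           # only now consume the characters, 8 or more\n" +
        "$               # end of the string\n");

    static boolean isStrong(String password) {
        return STRONG.matcher(password).find();
    }

    public static void main(String[] args) {
        System.out.println(isStrong("Passw0rdX")); // true
        System.out.println(isStrong("password"));  // false: no uppercase, no digit
    }
}
```

Each look-ahead is evaluated from the start of the string because none of them moves the current position; only the final .{8,} consumes text.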

#9: Option DotAll
We consider the source text of a Java program and look for block comments /* ... */ in order to color them green, translate them, delete them, etc.
To do this, we use the dot ".", which by default matches any character except a line terminator (CR, LF, CRLF). As our comments contain line breaks, we remove this restriction with the option (?s), called DotAll.
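
The effect of (?s) can be checked in plain Java with the sketch below. The pattern is my own guess at what the example uses (in particular the lazy quantifier .*? is an assumption), and the names DotAllSketch, stripComments and stripWithoutDotAll are hypothetical.

```java
// Hypothetical demo: deleting /* ... */ comments with and without DotAll.
class DotAllSketch {

    // (?s) lets "." match line terminators, so multi-line comments are found.
    // The lazy .*? stops at the first closing */ instead of the last one.
    static String stripComments(String source) {
        return source.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    // Same pattern without (?s): "." cannot cross a line break.
    static String stripWithoutDotAll(String source) {
        return source.replaceAll("/\\*.*?\\*/", "");
    }

    public static void main(String[] args) {
        String source = "int a; /* one\ntwo */ int b;";
        System.out.println(stripComments(source));      // comment removed
        System.out.println(stripWithoutDotAll(source)); // text unchanged
    }
}
```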

#10: Date format conversion.
In this example we convert the format "dd/mm/yyyy" to "yyyy-mm-dd" in all the text.

Since dd is the first captured group, it is referred to as $1; mm is the second captured group, referred to as $2; and yyyy is the third captured group, referred to as $3.
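
The same conversion can be written in plain Java with replaceAll(), which uses the same $1, $2, $3 references. The class and method names and the sample sentence are assumptions for illustration.

```java
// Hypothetical demo of the dd/mm/yyyy to yyyy-mm-dd conversion.
class DateSketch {

    static String toIsoDates(String text) {
        // group 1 = dd, group 2 = mm, group 3 = yyyy
        return text.replaceAll("(\\d\\d)/(\\d\\d)/(\\d{4})", "$3-$2-$1");
    }

    public static void main(String[] args) {
        System.out.println(toIsoDates("Due 06/10/2023 and 25/12/2024."));
        // Due 2023-10-06 and 2024-12-25.
    }
}
```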

#11: Remove all tags from html text.
We search for < followed by any character except > repeated any number of times, followed by >. We replace each match with the empty text.
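
In plain Java the description above translates to the pattern <[^>]*>. A minimal sketch, with hypothetical names and sample text:

```java
// Hypothetical demo: removing every tag from a piece of HTML text.
class TagSketch {

    static String stripTags(String html) {
        // "<", then any run of characters other than ">", then ">"
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // Hello world
    }
}
```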

#12: The following example illustrates the need for a double backslash
As above we search for % followed by any character except % and repeated any number of times, and finally followed by %.
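
The original blocks are not reproduced here, so the Java sketch below uses an invented case: replacing a %...% placeholder with the Windows path C:\Users. It shows why the backslash has to be doubled in the replacement text. Note that inside a Java string literal every backslash is escaped once more, which is why four appear in the source; in an App Inventor text block you would type only two.

```java
// Hypothetical demo of the doubled backslash in a replacement text.
class BackslashSketch {

    // Replacement text is C:\\Users (doubled backslash), giving C:\Users.
    static String doubled(String text) {
        return text.replaceAll("%[^%]*%", "C:\\\\Users");
    }

    // Replacement text is C:\Users (single backslash): the backslash only
    // escapes the next character and disappears from the result.
    static String single(String text) {
        return text.replaceAll("%[^%]*%", "C:\\Users");
    }

    public static void main(String[] args) {
        System.out.println(doubled("cd %HOME%")); // cd C:\Users
        System.out.println(single("cd %HOME%"));  // cd C:Users
    }
}
```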

#13: RegexpReplace function limit
The RegexpReplace function recognizes the character pairs $0, $1,... in the replacement text, which covers about 9 cases out of 10. However, if you need a system function or a custom function to compute the replacement text, RegexpReplace falls short. This is because Java accepts a lambda expression as a parameter, while App Inventor does not.
The example below proposes a reusable function ReplaceAllEx, which calls a ComputeReplace function that you adapt to each case. ComputeReplace takes a matchResult list as an argument:
[iPos, fullMatch] or [iPos, fullMatch, group1, group2,....]
and returns the text that will replace fullMatch.
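
For comparison, here is roughly how the same idea looks in Java, where the ComputeReplace step can be passed in as a lambda. This is a sketch under my own assumptions (1-based iPos, the example rule that doubles each integer, the class name ReplaceAllExSketch); it is not a transcription of the App Inventor blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java counterpart of the ReplaceAllEx / ComputeReplace pair.
class ReplaceAllExSketch {

    static String replaceAllEx(String text, String pattern,
                               Function<List<Object>, String> computeReplace) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<Object> matchResult = new ArrayList<>();
            matchResult.add(m.start() + 1);   // iPos (assumed 1-based)
            matchResult.add(m.group());       // fullMatch
            for (int g = 1; g <= m.groupCount(); g++) {
                matchResult.add(m.group(g));  // group1, group2, ...
            }
            // quoteReplacement keeps any '$' or '\' in the computed text literal.
            String replacement = computeReplace.apply(matchResult);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Example ComputeReplace: doubles the integer found in fullMatch.
    static String doubleNumbers(String text) {
        return replaceAllEx(text, "\\d+",
            match -> String.valueOf(2 * Integer.parseInt((String) match.get(1))));
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 10 pears")); // 6 apples and 20 pears
    }
}
```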

6-Download:
fr.danielm.Xreg.aix (6.6 KB)
The latest version 1.5 is here


[FREE] Extension Xreg V1.4- Java regular expressions for App Inventor.
This version supersedes the previous one; the example blocks have been modified.
Download New Version V1.4
Xreg.aix (6.4 KB)

1-Introduction
The Xreg extension only contains two methods but covers 95% of Java functionality on regular expression.

Many sites have introductions to regular expressions, aimed at beginners.
I'm creating full documentation for App Inventor and a tool to learn how to create regular expressions.

2-Blocks Images
Regexp
Check if a string fits a given regular expression pattern.

RegexpReplace
Replace text in a string based on regular expression pattern.
The count parameter was removed in version 1.4

3-Documentation Regexp()
Regexp ( text:text, pattern:text, mode:int, start:int) : list
Searches for a sequence of text that satisfies the criteria defined by the regular expression.

  • text - The subject string to check
  • pattern - The regular expression to match
  • mode - A number to indicate how the function behaves.
    1 - return data for FIRST match: [match1] if successful, else []
    2 - return data for ALL matches: [match1, match2, ....] if successful, else []
  • start - Index in text to start the match.

What is the data for a match?
If the pattern does not contain any capture groups, the match result is:

  • first character index of the match;
  • full match text.

If the pattern designer has requested the capture of numbered (or named) groups, the text of each group is added to the match result, but not its position.
For one match, the data is:

  • first character index of the match
  • full match text
  • text of capture group number 1
  • text of capture group number 2
    etc. The programmer knows how many capture groups the pattern contains!

What counts as an error?

The following cases are not errors:
  • start > length(text): the function returns [].
  • start < 1: the function assumes start=1.
  • mode < 1: the function assumes mode=1.
  • mode > 1: the function assumes mode=2.
Regardless of the mode, if the function fails, it returns the list: [0, error message].
The most frequent error comes from the regular expression itself; the error message indicates the approximate index of the syntax error in the pattern.

4-Documentation RegexpReplace() V1.4
RegexpReplace(text:text, pattern:text, replacement:text): text
Replaces each full match in text with replacement text built on the fly by the function from instructions in the "replacement" text.
Similar to Java's replaceAll().

  • text - The subject string to check
  • pattern - The regular expression to match
  • replacement - The text to replace the regular expression matching text with. To insert full match use $0. To insert captured groups text, use $1,...,$99 as back-references.

5-Versions update
V1.1 : Regexp uses 3 modes: 0 (bool), 1 (first), 2 (all).
V1.2 : Removed mode=0; Regexp returns [] instead of false.
V1.3 : Fixed ambiguous cases, for example a* and (a)|\w.
V1.4 : Removed the count parameter from RegexpReplace().


New title:
[FREE] Extension Xreg V1.5- Java regular expressions for App Inventor.
Version V1.5 adds features to version V1.4.
Download New Version V1.5
Xreg.aix (9.5 KB)

New features in the "replacement" string of the RegexpReplace block:
The replacement string can contain special sequences, defined after each match, which the function interprets in the following way:

Sequence Description
$0 The sequence $0 denotes the complete match. The function first replaces each occurrence of $0 in replacement with the full match, then replaces the full match in the original text with replacement.
  ➤ RegexpReplace("-123-", "\d", "$0$0") returns the text "-112233-".
$1,...,$99 The sequence $ followed by one or two digits is replaced in the replacement string by the capture group with that number.
The parser first considers the two digits following '$', if there are two. If a capture group with that number exists, "$dd" is replaced by capture group n°dd; otherwise the parser repeats the test with "$d" and appends the second digit after group n°d.
You can create up to 99 capture groups in Java.
  ➤ RegexpReplace("06/10/2023", "(\d\d)/(\d\d)/(\d\d\d\d)", "$0 -> $3-$2-$1") returns the text "06/10/2023 -> 2023-10-06".
  ➤ RegexpReplace("123", "(\d)", "$14") returns the text "142434" because the group n°14 does not exist so $14 is interpreted as $1 followed by the number 4 for all three occurrences.
  ➤ RegexpReplace("345789a", "(((3)(4))(5))((7)(?8))(9)(a)", "$10") returns "a" because group 10 exists (count the opening parentheses).
  ➤ RegexpReplace("345789a", "(((3)(4))(5))((7)(?8))(9)(a)", "$12") returns "3452" because group 12 does not exist so the function adopts the interpretation "$1 followed by the number 2".
${1}2 Represents group n°1 followed by the number 2. If capture group n°12 does not exist, writing $12 is equivalent; if it does exist, the braces force the first interpretation rather than the second.
  ➤ RegexpReplace("345789a", "(((3)(4))(5))((7)(?8))(9)(a)", "${1}0") returns "3450", group n°1 followed by '0'.
${name} The sequence ${name} will be replaced in the replacement string by the capture group named "name". This method of referencing groups is especially useful when there are many capture groups to manage.
  ➤ RegexpReplace("06/10/2023", "(?<day>\d{2})/(?<month>\d{2})/(?<year>\d{4})", "${year}-${month}-${day}") returns the text "2023-10-06".
In addition to the previous sequences supported by Java, RegexpReplace() recognizes the following syntaxes:
$Ln, $L{n}, $L{name} The letter 'L' or 'l' (Lower) inserted between '$' and the group number or name will make the group lowercase.
  ➤ RegexpReplace("App Inventor", ".+", "$L0") returns the text "app inventor": the full text in lower case.
$Un, $U{n}, $U{name} The letter 'U' or 'u' (Upper) inserted between '$' and the group number or name will convert the group to uppercase.
  ➤ RegexpReplace("Le Corbeau et le Renard", "(.)e", "$u1e") returns "Le CorBeau et Le Renard": every character immediately before a letter e is converted to uppercase, when it has an uppercase form!
$Tn, $T{n}, $T{name} The letter 'T' or 't' (Title) inserted between '$' and the number or name of the group will capitalize the first letter of each word in the group.
  ➤ RegexpReplace("app inventor", "(.+)", "$t1") returns "App Inventor".
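
The sequences that Java itself understands ($0, $n, ${name}) can be checked directly with String.replaceAll(), as in the sketch below. The $L, $U and $T sequences are Xreg additions, so the helper upperGroup1 is only a hypothetical illustration of what "$U1" asks for; it is not how the extension is implemented, and the sample text "hello world" is my own.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of replacement sequences in plain Java.
class ReplacementSketch {

    // Replaces every match by its group 1 converted to uppercase,
    // roughly what a "$U1" replacement requests from Xreg.
    static String upperGroup1(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String upper = m.group(1).toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(upper));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // $0 is the full match.
        System.out.println("-123-".replaceAll("\\d", "$0$0"));      // -112233-
        // Only one group exists, so $14 is read as $1 followed by '4'.
        System.out.println("123".replaceAll("(\\d)", "$14"));       // 142434
        // Named groups are referenced with ${name}.
        System.out.println("06/10/2023".replaceAll(
            "(?<day>\\d{2})/(?<month>\\d{2})/(?<year>\\d{4})",
            "${year}-${month}-${day}"));                            // 2023-10-06
        // Emulated "$U1": first letter of each word in uppercase.
        System.out.println(upperGroup1("hello world", "\\b(\\w)")); // Hello World
    }
}
```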

[FREE] Extension Xreg V1.6- Regular Expressions for App Inventor.
Version V1.6 adds features to version V1.5.
Download New Version V1.6:
Xreg.aix (10.7 KB)

(?(DEFINE)(?<name>X)) Procedure subpattern: imitates the behavior of a non-recursive subprogram named "name". This concept is not supported by Java 8; the syntax of the procedure declaration is borrowed from PCRE.
  ➤ (?(DEFINE) (?<byte> 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d) )
is an example of a procedure subpattern that defines a byte and can be called repeatedly with the syntax (?&byte), see below for a complete example.
Note that 25[0-5] covers bytes from 250 to 255, 2[0-4]\d from 200 to 249, 1\d\d from 100 to 199 and [1-9]?\d from 0 to 99. The order of the alternatives is important, but despite these precautions the pattern would still match twice in 256, once on 25 and once on 6. To improve the pattern we must also tell it "no digit before, nor after" (see below).

(?&name) Calling a procedure subpattern: Xreg pre-processes the pattern by replacing each call (?&name) with the subpattern (?:X).
The procedure subpattern definition can be anywhere in the pattern, before or after the call, and one procedure subpattern can call another, but not itself.
  ➤ Validation of an IPV4 address:
(?x)
(?(DEFINE) (?<byte> (?<! \d) (?: 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]? \d) (?! \d) )) # proc byte
\b (?&byte) (\.(?&byte)){3} \b

An alternative valid form with two procedure subpatterns:
(?x) (?&IPV4) # call to IPV4
(?(DEFINE) (?<IPV4> \b (?&byte) (\. (?&byte)){3} \b) ) # proc IPV4
(?(DEFINE) (?<byte> (?<! \d) (?: 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]? \d) (?! \d) )) # proc byte
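
To make the pre-processing idea concrete, here is a rough Java sketch of the call expansion. It is not Xreg's code: to stay short it takes the procedure definitions from a map instead of parsing the (?(DEFINE)...) blocks, and the names DefineSketch, expand and isIPv4 are hypothetical. Each (?&name) call is replaced by (?:X) until no call remains, which also handles one procedure calling another.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of expanding (?&name) calls before compiling a pattern.
class DefineSketch {

    // Matches a call such as (?&byte) and captures the procedure name.
    private static final Pattern CALL = Pattern.compile("\\(\\?&(\\w+)\\)");

    static String expand(String pattern, Map<String, String> procs) {
        String current = pattern;
        // Bounded number of passes: a recursive definition would never finish.
        for (int pass = 0; pass < 20; pass++) {
            Matcher m = CALL.matcher(current);
            if (!m.find()) return current;           // no call left: done
            StringBuffer out = new StringBuffer();
            do {
                String body = procs.get(m.group(1));
                if (body == null) {
                    throw new IllegalArgumentException("Unknown subpattern: " + m.group(1));
                }
                m.appendReplacement(out, Matcher.quoteReplacement("(?:" + body + ")"));
            } while (m.find());
            m.appendTail(out);
            current = out.toString();
        }
        throw new IllegalArgumentException("Calls nested too deeply (recursive definition?)");
    }

    static boolean isIPv4(String s) {
        Map<String, String> procs = new LinkedHashMap<>();
        procs.put("byte", "25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d");
        procs.put("IPV4", "(?&byte)(?:\\.(?&byte)){3}");   // IPV4 calls byte
        // matches() tests the whole string, so no \b or look-around is needed here.
        return Pattern.compile(expand("(?&IPV4)", procs)).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIPv4("192.168.1.255")); // true
        System.out.println(isIPv4("256.1.1.1"));     // false
    }
}
```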

Title: Xreg Guide (FR and EN )
Regular expression syntax of Xreg, with examples.
Please rename the .txt extension to .htm to view the guide in a browser.
XregGuideENv1.6.txt (75.0 KB)
XregGuideFRv1.6.txt (81.7 KB)

Table of contents:
1-Regular expressions
2-The Xreg V1.6 extension for App Inventor
3-Analyzer operating options
4-Special characters
5-Custom Character Sets
6-Predefined sets of characters
7-POSIX Character Sets
8-Java Character Sets
9-Anchors
10-Assertions
11-Parenthesised groups
12-Alternatives
13-Quantifiers
14-Appendix - Unicode Characters