Regular Expression

danielmistura · May 1, 2023, 6:17pm

My text is 150 pages long and contains punctuation such as , or ; or ! or ...

Juan_Antonio · May 1, 2023, 7:03pm

Can you give another example of input and output of what you want?

For example: I have aaa bbb ccc ddd and I want to get aa bb cc.

danielmistura · May 1, 2023, 7:35pm

Another example:
I would like to search a text for all the words followed by the characters "; ".
More generally, I would like to use regex in App Inventor because I'm used to doing it in other languages.
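
For reference, in plain Java (the language the extension code later in this thread is written in) that search could be sketched with a lookahead. The class name, the method name and the sample text are hypothetical, chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsBeforeSemicolon {
    // Hypothetical helper: returns every word immediately followed by "; ".
    public static List<String> wordsBefore(String text) {
        List<String> words = new ArrayList<>();
        // \w+ matches a word; (?=; ) requires "; " right after it without consuming it.
        Matcher m = Pattern.compile("\\b\\w+(?=; )").matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [two, three]
        System.out.println(wordsBefore("one, two; three; four."));
    }
}
```

Because the lookahead does not consume the "; ", the returned words contain no punctuation.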

Juan_Antonio · May 1, 2023, 8:06pm

danielmistura · May 1, 2023, 8:49pm

Very interesting method, I had not thought of it. THANKS.
Nevertheless, the complexity of the implementation removes all the charm of regex.

TIMAI2 · May 1, 2023, 8:58pm

danielmistura · May 1, 2023, 9:07pm

How can I retrieve capture groups with the KevinkunRegex extension?
regex = "\b((\w)[\w]*?\2\b)"
text = "x abaa, bac; 1211."

Kevinkun · May 2, 2023, 5:34am

The KevinkunRegex extension can capture groups, but you need to figure out how to write the right regular expression.

danielmistura · May 2, 2023, 6:05am

It does not retrieve capture groups, only complete matches.
If I need a smaller group, for example (\w), I don't know how to get it.

Kevinkun · May 2, 2023, 6:47am

Here is the source code of GetMatches:

	public List<String> GetMatches(String string, String pattern) {
		List<String> ls = new ArrayList<String>();
		Pattern p = Pattern.compile(pattern);
		Matcher m = p.matcher(string);
		while (m.find()) {
			ls.add(m.group());
		}
		return ls;
	}

I have no idea how to change it to meet your need.

danielmistura · May 2, 2023, 7:11am

I am preparing the specifications of just two functions:
Regexp(text,reg,flag,start)
RegexpReplace(text,reg,replace)
and I am studying Java regex, in particular m.group().
Then I will come back to you.

danielmistura · May 2, 2023, 7:31am

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroup {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("i(s)");
        String input = "My name is Khan and m not a terrerist.";
        Matcher m = pattern.matcher(input);
        m.find();
        String grp0 = m.group(0);
        String grp1 = m.group(1);
        System.out.println("Group 0 " + grp0);
        System.out.println("Group 1 " + grp1);
        System.out.println(input);
    }
}

output:
Group 0 is ----------> full match of occurrence 1
Group 1 s ----------> captured group number 1 of occurrence 1

Kevinkun · May 2, 2023, 7:35am

In fact, I already tried this:

	@SimpleFunction(description = "Gets the fragments that match the regular expression; returns a list")
	public List<String> GetMatches2(String string, String pattern) {
		List<String> ls = new ArrayList<String>();
		Pattern p = Pattern.compile(pattern);
		Matcher m = p.matcher(string);
		m.find();
		for (int i = 0; i < m.groupCount(); i++) {
			ls.add(m.group(i));

		}

		return ls;
	}

and I got this:

danielmistura · May 2, 2023, 9:29am

Instead of ["aba", "a"]

Patryk_F · May 2, 2023, 10:09am

Try changing:

for (int i = 1; i <= m.groupCount(); i++) {
			ls.add(m.group(i));

So that it would show group 1 and 2 instead of 0 and 1.
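
To see why the loop bounds matter: groupCount() does not count group 0 (the full match), so looping from 0 while i < groupCount() returns groups 0 and 1 and never reaches the last group. Below is a small standalone sketch using the regex and text posted earlier in this thread; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Collects groupCount() groups of the first match, starting at index `first`.
    private static List<String> collect(String text, String regex, int first) {
        List<String> ls = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            for (int i = first; i < m.groupCount() + first; i++) {
                ls.add(m.group(i));
            }
        }
        return ls;
    }

    // Same bounds as "i = 0; i < m.groupCount()": groups 0 and 1 for this regex.
    public static List<String> originalLoop(String text, String regex) {
        return collect(text, regex, 0);
    }

    // Same bounds as "i = 1; i <= m.groupCount()": groups 1 and 2 for this regex.
    public static List<String> suggestedLoop(String text, String regex) {
        return collect(text, regex, 1);
    }

    public static void main(String[] args) {
        String regex = "\\b((\\w)[\\w]*?\\2\\b)";
        String text = "x abaa, bac; 1211.";
        System.out.println(originalLoop(text, regex));  // prints [abaa, abaa]
        System.out.println(suggestedLoop(text, regex)); // prints [abaa, a]
    }
}
```

With this regex, group 0 and group 1 happen to be identical because the outer parentheses wrap the whole pattern after \b, which is why the original loop appears to return the full match twice.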

I tested in a compiled extension:

@SimpleFunction(description = "Gets the fragments that match the regular expression; returns a list")
public List<String> GetMatches2(String string, String pattern) {
    List<String> ls = new ArrayList<String>();
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(string);
    // For every match, collect capture groups 1..groupCount(); group 0 (the full match) is skipped.
    while (m.find()) {
        for (int i = 1; i <= m.groupCount(); i++) {
            ls.add(m.group(i));
        }
    }
    return ls;
}

danielmistura · May 2, 2023, 10:32am

Project:

Only one block!
Change "error in pattern" to "error in pattern or start"

danielmistura · May 2, 2023, 12:06pm

EX1
Regexp("Aujourd'hui c'est Dimanche", "\bDi")
---> 1
Regexp("Aujourd'hui c'est dimanche", "\bDi")
---> 0

EX2
Regexp("Dimanche Lundi Mardi Mercredi", "\b\w+di\b", 1)
---> [1, [15, "Lundi"]]

EX3
Regexp("Dimanche Lundi Mardi Mercredi", "\b(\w+)(di)\b", 1)
---> [1, [15, "Lundi", "Lun", "di"]]

EX4
str="Dimanche Lundi Mardi Mercredi"
list = Regexp(str, "\b(\w+)(di)\b", 1)
while list[1] = 1
    # process list
    list = Regexp(str, "\b(\w+)(di)\b", 1, list[2][1])

--->[1, [15, "Lundi", "Lun", "di"]]
--->[1, [21, "Mardi", "Mar", "di"]]
--->[1, [30, "Mercredi", "Mercre", "di"]]
--->[-1]

EX5
str="Dimanche Lundi Mardi Mercredi"
list = Regexp(str, "\b(\w+)(di)\b", 2)

--->[1, ["Lundi", "Lun", "di"], ["Mardi", "Mar", "di"], ["Mercredi", "Mercre", "di"]]
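
One possible reading of EX1 to EX5, sketched in plain Java. This is not the extension's code, and several points are assumptions inferred from the examples: the number returned with flag 1 is the 1-based position just after the match (m.end() + 1), [0] means no match, [-1] means an invalid pattern or a start outside 1..length, and the result is always a list (EX1 shows a bare number instead). The names RegexpSketch, regexp and addGroups are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexpSketch {
    // Appends the full match (group 0) followed by every capture group.
    private static void addGroups(Matcher m, List<Object> out) {
        for (int i = 0; i <= m.groupCount(); i++) {
            out.add(m.group(i));
        }
    }

    // flag 0: test only; flag 1: first match at or after `start` (1-based); flag 2: all matches.
    public static List<Object> regexp(String text, String reg, int flag, int start) {
        List<Object> result = new ArrayList<>();
        Matcher m;
        try {
            m = Pattern.compile(reg).matcher(text);
        } catch (PatternSyntaxException e) {
            result.add(-1); // assumption: -1 signals an error in the pattern
            return result;
        }
        if (start < 1 || start > text.length()) {
            result.add(-1); // assumption: -1 also signals an invalid start
            return result;
        }
        if (!m.find(start - 1)) {
            result.add(0); // assumption: 0 signals "no match"
            return result;
        }
        result.add(1);
        if (flag == 1) {
            List<Object> one = new ArrayList<>();
            one.add(m.end() + 1); // 1-based position from which the next search can start
            addGroups(m, one);
            result.add(one);
        } else if (flag == 2) {
            do {
                List<Object> one = new ArrayList<>();
                addGroups(m, one);
                result.add(one);
            } while (m.find());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Dimanche Lundi Mardi Mercredi";
        String reg = "\\b(\\w+)(di)\\b";
        // EX4: repeat the search, restarting after each match.
        List<Object> list = regexp(str, reg, 1, 1);
        while ((Integer) list.get(0) == 1) {
            System.out.println(list);
            int next = (Integer) ((List<?>) list.get(1)).get(0);
            list = regexp(str, reg, 1, next);
        }
        System.out.println(list);
        // EX5: all matches in one call.
        System.out.println(regexp(str, reg, 2, 1));
    }
}
```

Under these assumptions the EX4 loop ends with [-1] because the last restart position (30) is past the end of the 29-character string.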

Kevinkun · May 4, 2023, 5:21am

I added 3 blocks; the block names and the image above explain what they do.
You can download the new extension here: (Regular Expression Extension · 浮云小站)

danielmistura · May 4, 2023, 6:06am

Kevin, being able to access captured groups is not just a personal need; any serious regexp user needs it.
I will test your new version.
Cordially

danielmistura · May 4, 2023, 7:52am

It still lacks a ReplaceAll that supports groups.

EX1
We want to replace all vowels with @
str="Where have all the flowers gone?"
str=RegexpReplace(str, "[aeiou]", "@")
---> "Wh@r@ h@v@ @ll th@ fl@w@rs g@n@?"

EX2
We want to change the format of a date
str="date: 14:27 28/03/2023"
str=RegexpReplace(str, "(\d{2})/(\d{2})/(\d{4})", "$2.$1.$3")
---> "date: 14:27 03.28.2023"

$0 is full match: "28/03/2023"
$1 is captured group number 1: "28"
$2 is captured group number 2: "03"
$3 is captured group number 3: "2023"
The "/" is not captured; it is replaced by ".".

Cordially
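
In plain Java, Matcher.replaceAll already understands $1, $2, ... in the replacement string, so a group-aware replace could be sketched as below. The class and method names are hypothetical, and this is not the extension's code:

```java
import java.util.regex.Pattern;

public class RegexpReplaceSketch {
    // Replaces every match of `reg` in `text`; `replace` may refer to groups as $0, $1, $2, ...
    public static String regexpReplace(String text, String reg, String replace) {
        return Pattern.compile(reg).matcher(text).replaceAll(replace);
    }

    public static void main(String[] args) {
        // EX1: prints Wh@r@ h@v@ @ll th@ fl@w@rs g@n@?
        System.out.println(regexpReplace("Where have all the flowers gone?", "[aeiou]", "@"));
        // EX2: prints date: 14:27 03.28.2023
        System.out.println(regexpReplace("date: 14:27 28/03/2023",
                "(\\d{2})/(\\d{2})/(\\d{4})", "$2.$1.$3"));
    }
}
```

Note that a literal $ or \ in the replacement string has to be escaped, for example with Matcher.quoteReplacement.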