Problem in replaceAll predicate

ahmednadi
VIP Member
Posts: 84
Joined: 15 Sep 2009 14:06

Unread post by ahmednadi » 22 Oct 2014 12:55

Dear Sir;
Could you help me?

I have an Arabic text in which I want to change the way some characters are written.

Code:

Str1=string::replaceAll(Str," ب "," ب"),
Str2=string::replaceAll(Str1," ل "," ل"),
Str3=string::replaceAll(Str2," و "," و"),
But it doesn't work, and the output is still the same as the input.

The aim is to remove the white space after each of these characters, on the condition that the character also has white space before it.

How can I overcome this?

Thank you in advance.

regards;

Ahmed Nady
Last edited by ahmednadi on 22 Oct 2014 13:19, edited 2 times in total.

Thomas Linder Puls
VIP Member
Posts: 1622
Joined: 28 Feb 2000 0:01

Re: Problem in replaceAll predicate

Unread post by Thomas Linder Puls » 22 Oct 2014 14:45

The [code]...[/code] tags cannot handle your Arabic characters correctly. This is how it looks without the tags (and I guess this is what you have written in your editor):
ahmednadi wrote:
Str1=string::replaceAll(Str," ب "," ب"),
Str2=string::replaceAll(Str1," ل "," ل"),
Str3=string::replaceAll(Str2," و "," و"),
I guess the problem may be the reading direction. Unicode has a mechanism for switching between left-to-right and right-to-left reading direction based on the characters in use. Some Windows routines use this mechanism, but others don't.

For example, the sciLexer editor has a problem in this respect: Windows writes the text in reading-direction order, but the editor actually assumes that everything is laid out left-to-right. So what you see on the screen may not correspond to what is in the file.

If it is such a problem, it can be in one or more of the following places:
  1. The input can be different from what you thought
  2. Your code can be different from what you thought
  3. The thing you use to view the result can behave differently from what you thought
You can make sure the code is what you think it is by using \uXXXX instead of writing the Arabic letters directly in the editor:

Code:

Str1=string::replaceAll(Str," \u0628 "," \u0628"),
Str2=string::replaceAll(Str1," \u0644 "," \u0644"),
Str3=string::replaceAll(Str2," \u0648 "," \u0648"),
(Notice that XXXX in \uXXXX must be four hexadecimal digits. I believe I have converted the ones used in your example to the proper hex values.)

But that only fixes item 2. I don't know how to deal with items 1 and 3.
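The hex values can be double-checked mechanically. The following is only a sketch in Python (not Visual Prolog), used here as a quick calculator; the helper name to_escape is made up for this illustration. It prints the \uXXXX form of each of the three letters from the question.

```python
# Arabic letters from the question, typed directly into the source.
letters = ["ب", "ل", "و"]


def to_escape(ch):
    """Return the backslash-u escape (four hex digits) for one character."""
    code = ord(ch)
    if code > 0xFFFF:
        # Four hex digits only cover U+0000..U+FFFF.
        raise ValueError("character needs more than four hex digits")
    return "\\u{:04X}".format(code)


escapes = [to_escape(ch) for ch in letters]
print(" ".join(escapes))  # prints: \u0628 \u0644 \u0648
```

If the printed values agree with the escapes used in the replaceAll calls, the code itself (item 2) is probably not where the mismatch comes from.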
Regards Thomas Linder Puls
PDC


Unread post by Thomas Linder Puls » 22 Oct 2014 15:05

The change in reading direction mentioned above is implicit. Each character is ltr (left-to-right), rtl (right-to-left) or neutral:
  • When an ltr character is met the reading direction changes to ltr
  • When an rtl character is met the reading direction changes to rtl
  • When a neutral character is met the reading direction remains what it currently is
The normal space character is neutral, but I believe explicit ltr and rtl space characters also exist.
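The direction class of an individual character can be looked up in the Unicode character database. A small sketch in Python (not Visual Prolog), assuming the standard unicodedata module; the sample names are just labels chosen for this example. In the result, "L" is strong left-to-right, "R" and "AL" are strong right-to-left, "WS" (whitespace) is neutral, and "CS" is a weak class that has no direction of its own either. U+200E and U+200F are invisible marks with a fixed direction.

```python
import unicodedata

# A few characters relevant to this thread, keyed by a descriptive label.
samples = {
    "latin A": "A",
    "arabic lam": "\u0644",
    "space": " ",
    "non-breakable space": "\u00A0",
    "left-to-right mark": "\u200E",
    "right-to-left mark": "\u200F",
}

# Map each label to the bidirectional class recorded in the Unicode database.
bidi = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, cls in bidi.items():
    print("{:22} {}".format(name, cls))
```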

There are also invisible characters for changing the direction (they only affect the direction of immediately following neutral characters).

There is also a non-breakable space (I think it is \u00A0). If your input contains any such characters, the "find" patterns may not match even though they appear to match at first sight.
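A sketch of that effect, again in Python rather than Visual Prolog and with a made-up sample string: the pattern written with normal spaces is not found until the non-breakable spaces have been normalized. Listing the code points makes the otherwise invisible difference visible.

```python
NBSP = "\u00A0"  # non-breakable space
LAM = "\u0644"   # Arabic letter lam

# Hypothetical input: looks like "x ل y" on screen, but uses NBSP on both sides.
text = "x" + NBSP + LAM + NBSP + "y"
pattern = " " + LAM + " "


def code_points(s):
    """List the characters of s as U+XXXX so invisible differences show up."""
    return ["U+{:04X}".format(ord(ch)) for ch in s]


found_raw = pattern in text            # NBSP is not a normal space
normalized = text.replace(NBSP, " ")   # turn every NBSP into U+0020
found_normalized = pattern in normalized
result = normalized.replace(pattern, " " + LAM)

print(code_points(text))
print(found_raw, found_normalized)
print(code_points(result))
```

Normalizing first also covers mixed cases (a normal space on one side and a non-breakable one on the other), which a pattern written entirely with \u00A0 would miss. In Visual Prolog the same idea could presumably be expressed as one extra string::replaceAll(Str, "\u00A0", " ") call before the other replacements; that is an untested suggestion.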

" A" does not match " A" if one of them has a non-breakable space and the other a normal space.
Regards Thomas Linder Puls
PDC


Unread post by ahmednadi » 1 Nov 2014 12:21

Dear Sir;
Thank you.
I tried all of these, but some cases still have problems.

I tried the following:

Code:

Str1=string::replaceAll(Str," \u0644 "," \u0644"),  %ل
Str2=string::replaceAll(Str1,"\u00A0\u0644\u00A0","\u00A0\u0644"),  %ل
But I get a bad result.

Could you help me to understand this problem?

Regards;

AHMED NADY


Unread post by Thomas Linder Puls » 1 Nov 2014 20:46

I will have to see the contents of Str, but it has to be in

I think it will be best if you add code like this to get accurate data in a file (attach the file to your reply):

Code:

        O = outputStream_file::createUtf8("text.txt"),
        Bin = binary::createAtomicFromPointer(convert(pointer, Str), sizeof(Str)),
        O:writef("%\n============================================\n%\n", Str, Bin),
        O:close()
Regards Thomas Linder Puls
PDC


Unread post by ahmednadi » 2 Nov 2014 11:17

Dear Sir;
Please find attached herewith the STRING file.
Regards;
AHMED
Attachments
arbSent.txt
(848 Bytes) Downloaded 340 times


Unread post by Thomas Linder Puls » 2 Nov 2014 18:25

It seems the other place you want to change has the form:

Code:

"\u0020\u0627\u0644\u0020"
" ال "

  • \u0627 is ا
  • \u0644 is ل
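This would also explain why a pattern such as " \u0644 " does not fire at that place: in " ال " the ل is preceded by ا, not by a space. A sketch in Python (not Visual Prolog) that treats the two-letter sequence as one more prefix; the prefix list and the function name attach_prefixes are assumptions made for this example, not something taken from the attached file.

```python
import re

# Hypothetical prefix list: the two-letter form first, then the single letters.
PREFIXES = ["\u0627\u0644", "\u0628", "\u0644", "\u0648"]

# A space (normal or non-breakable), one prefix, then the space to be dropped.
_ALTERNATIVES = "|".join(re.escape(p) for p in PREFIXES)
_PATTERN = re.compile("([ \u00A0])(" + _ALTERNATIVES + ")[ \u00A0]")


def attach_prefixes(text):
    """Remove the space after a prefix that itself follows a space."""
    # Keep the leading space (group 1) and the prefix (group 2) unchanged.
    return _PATTERN.sub(r"\1\2", text)


sample = "x \u0627\u0644 y"          # contains " ال "
print(" \u0644 " in sample)           # the single-letter pattern is not present
print(attach_prefixes(sample) == "x \u0627\u0644y")
```

With string::replaceAll the corresponding step would be one more replacement whose search string is " \u0627\u0644 " and whose replacement is " \u0627\u0644", assuming that this is indeed the intended result for that form.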

Regards Thomas Linder Puls
PDC
