Problem in replaceAll predicate

Posted: 22 Oct 2014 12:55
by ahmednadi
Dear Sir;
Could you help me?

I have an Arabic text and I want to modify some characters so that they are written in a certain way.

Code:

Str1=string::replaceAll(Str," ب "," ب"),
Str2=string::replaceAll(Str1," ل "," ل"),
Str3=string::replaceAll(Str2," و "," و"),
But it doesn't work: the output is still the same as the input.

The aim is to remove the white space after each of these characters, on condition that the character has a white space before it.

How can I overcome this?

Thank you in advance.

regards;

Ahmed Nady

Re: Problem in replaceAll predicate

Posted: 22 Oct 2014 14:45
by Thomas Linder Puls
The [code]...[/code] tags cannot handle your Arabic characters correctly. This is how it looks without the tags (and I guess this is what you have written in your editor):
ahmednadi wrote:
Str1=string::replaceAll(Str," ب "," ب"),
Str2=string::replaceAll(Str1," ل "," ل"),
Str3=string::replaceAll(Str2," و "," و"),
I guess the problem could be reading direction. Unicode has a mechanism for switching between left-to-right and right-to-left reading direction based on the characters in use. Some Windows routines use this mechanism, but others don't.

For example, the sciLexer editor has a problem in that respect: Windows writes the text in reading-direction order, but the editor actually thinks that everything is laid out left-to-right. So what you see on the screen may not actually correspond to what is in the file.

If it is such a problem, it can be in one or more of the following places:
  1. The input can be different from what you thought
  2. Your code can be different from what you thought
  3. The thing you use to view the result can behave differently from what you thought
You can control the code by using \uXXXX instead of writing the Arabic letters directly in the editor:

Code:

Str1=string::replaceAll(Str," \u0628 "," \u0628"),
Str2=string::replaceAll(Str1," \u0644 "," \u0644"),
Str3=string::replaceAll(Str2," \u0648 "," \u0648"),
(Notice that XXXX in \uXXXX must be four hexadecimal digits. I believe I have converted the ones used in your example to the proper hex values.)

But that only fixes bullet 2. I don't know how to deal with bullets 1 and 3.
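
As a cross-check on the hex values, here is a small sketch in Python (not Visual Prolog; the function name `to_u_escapes` is made up for this illustration) that rewrites every non-ASCII character of a pattern as a four-digit \uXXXX escape. It assumes the characters lie in the Basic Multilingual Plane, which is the case for the Arabic letters in this thread.

```python
def to_u_escapes(text: str) -> str:
    """Rewrite every non-ASCII character in text as a \\uXXXX escape."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            # Plain ASCII (including the normal space) stays as it is.
            out.append(ch)
        elif code <= 0xFFFF:
            # Exactly four upper-case hexadecimal digits.
            out.append("\\u{:04X}".format(code))
        else:
            raise ValueError("U+{:X} does not fit in four hex digits".format(code))
    return "".join(out)


if __name__ == "__main__":
    # The three letters from the original question, given by code point
    # so that the editor's reading direction cannot interfere.
    for pattern in [" \u0628 ", " \u0644 ", " \u0648 "]:
        print(to_u_escapes(pattern))
```

Run as a script, this prints the three patterns with the escapes \u0628, \u0644 and \u0648, which agrees with the values used in the code above.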

Posted: 22 Oct 2014 15:05
by Thomas Linder Puls
The change in reading direction mentioned above is implicit: each character is ltr (left-to-right), rtl or neutral:
  • When an ltr character is met the reading direction changes to ltr
  • When an rtl character is met the reading direction changes to rtl
  • When a neutral character is met the reading direction remains what it currently is
The normal space character is neutral, but I believe there also exist explicit ltr and rtl space characters.

There are also invisible characters for changing the direction (that will only affect the direction of immediately following neutral characters).
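
To see which direction class a given character has, and whether a string hides invisible formatting characters, something like the following Python sketch can help (Python rather than Visual Prolog; the helper names are made up for this illustration). It relies on the standard `unicodedata` module. U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK) are used as examples of invisible direction characters; whether the actual input contains them is not known.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of one character.

    'L' is left-to-right, 'R' and 'AL' are right-to-left,
    'WS' is the class of the normal space.
    """
    return unicodedata.bidirectional(ch)


def invisible_format_chars(text: str) -> list:
    """Return (index, 'U+XXXX') for every format character (category Cf)."""
    return [
        (i, "U+{:04X}".format(ord(ch)))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


if __name__ == "__main__":
    for ch in ["A", "\u0628", " ", "\u200E", "\u200F"]:
        print("U+{:04X} {}".format(ord(ch), bidi_class(ch)))
    # A pattern that looks like " ل " but carries a hidden U+200F.
    print(invisible_format_chars(" \u200F\u0644 "))
```

If `invisible_format_chars` reports anything for the input string, a search pattern typed without those characters will not match it.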

There is also a non-breakable space (I think it is \u00A0). If your input contains any such characters, the "find" patterns may not match even though they appear to match at first sight.

" A" does not match " A" if one of them has a non-breakable space and the other a normal space.

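The mismatch between the two kinds of space can be sketched as follows, in Python rather than Visual Prolog (the helper names are made up for this illustration). One possible workaround, shown here as an assumption and not as a confirmed fix for the original input, is to turn every non-breakable space into a normal space before doing the replacements.

```python
NBSP = "\u00A0"  # non-breakable space


def normalize_spaces(text: str) -> str:
    """Replace every non-breakable space with a normal space."""
    return text.replace(NBSP, " ")


def join_single_letter(text: str, letter: str) -> str:
    """Remove the space after `letter` when it also has a space before it."""
    return text.replace(" " + letter + " ", " " + letter)


if __name__ == "__main__":
    print(" A" == NBSP + "A")  # the two strings look alike but differ

    lam = "\u0644"
    text = "x" + NBSP + lam + NBSP + "y"
    # Without normalization the pattern with normal spaces finds nothing.
    print(join_single_letter(text, lam) == text)
    # After normalization the same pattern matches.
    print(join_single_letter(normalize_spaces(text), lam) == "x " + lam + "y")
```

Note that normalizing changes the non-breakable spaces in the output as well, which may or may not be acceptable for the text in question.
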
Posted: 1 Nov 2014 12:21
by ahmednadi
Dear Sir;
Thank you.
I tried all of these, but some cases still have problems.

I tried the following:

Code:

Str1=string::replaceAll(Str," \u0644 "," \u0644"),  %ل
Str2=string::replaceAll(Str1,"\u00A0\u0644\u00A0","\u00A0\u0644"),  %ل
But I still get a bad result.

Could you help me to understand this problem?

Regards;

AHMED NADY

Posted: 1 Nov 2014 20:46
by Thomas Linder Puls
I will have to see the contents of Str, but it has to be in

I think it will be best if you add code like this to get accurate data in a file (attach the file to your reply):

Code:

        O = outputStream_file::createUtf8("text.txt"),
        Bin = binary::createAtomicFromPointer(convert(pointer, Str), sizeof(Str)),
        O:writef("%\n============================================\n%\n", Str, Bin),
        O:close()
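
Once such a file exists, the exact characters can be listed independently of any editor's reading direction. The following is a minimal Python sketch of that idea (not Visual Prolog; the function name `dump_code_points` is made up for this illustration), printing one code point per line together with its Unicode name so that a normal space and a non-breakable space are easy to tell apart.

```python
import unicodedata


def dump_code_points(text: str) -> str:
    """Return one line per character: index, code point and Unicode name."""
    lines = []
    for i, ch in enumerate(text):
        name = unicodedata.name(ch, "<no name>")
        lines.append("{:4d}  U+{:04X}  {}".format(i, ord(ch), name))
    return "\n".join(lines)


if __name__ == "__main__":
    # A sample that looks like " ل " but ends in a non-breakable space.
    print(dump_code_points(" \u0644\u00A0"))
```

The file written above could be read with `open("text.txt", encoding="utf-8")` and its text passed to this function; that step is left out here because the exact layout of the file is not shown in the thread.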

Posted: 2 Nov 2014 11:17
by ahmednadi
Dear Sir;
Please find attached herewith the STRING file.
Regards;
AHMED

Posted: 2 Nov 2014 18:25
by Thomas Linder Puls
It seems the other places you want to change have the form:

Code:

"\u0020\u0627\u0644\u0020"
" ال "

  • \u0627 is ا
  • \u0644 is ل