Problem with readline and file UTF16BE

Post by PERRAUD » 14 Dec 2018 17:04

Hello,

A document with screenshots, and my file, are here: https://1drv.ms/f/s!AuypDPPFdHXsikQBQWJiZs6kgqIL

I use Visual Prolog 7.5 CE (Build 7502).
I have a problem with the predicates openFile and readLine.
I have a file that looks fine and opens normally in Notepad++ or Notepad.
The format is UTF-16 Big Endian with a BOM.

My code:

    . . . .
    try
        FileStream = inputStream_File::openFile(MyFile)
    catch _ do
        messageLog_E(2004, Merr),
        fail
    end try,
    Enr = FileStream:readLine(),
    MM = format("Enr:%", Enr),
    writelog(MM),
    . . . .

The readLine call does not decode the contents of the file correctly.
(See the linked document with screenshots.)

Note: the call file::isUnicode(Full) yields false (?)

I need help please.

Thank you.
Daniel Perraud

Post by Thomas Linder Puls » 15 Dec 2018 21:21

PFC does not have any support for Big Endian Unicode.

Apparently, the relevant Windows functions don't support it either.
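
One possible workaround is to re-encode the file before PFC reads it. The following is only a minimal sketch, written in Python rather than Visual Prolog; the function name and the paths passed to it are hypothetical, and it assumes the whole file fits in memory and really is UTF-16 Big Endian. That PFC then reads the converted file correctly is also an assumption.

```python
import codecs


def convert_utf16be_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-16 Big Endian file as UTF-8 with a BOM (hypothetical helper)."""
    with open(src_path, "rb") as src:
        raw = src.read()
    # Drop the big-endian BOM (bytes FE FF) if the file starts with one.
    if raw.startswith(codecs.BOM_UTF16_BE):
        raw = raw[len(codecs.BOM_UTF16_BE):]
    text = raw.decode("utf-16-be")
    # "utf-8-sig" writes the UTF-8 BOM (bytes EF BB BF) before the text.
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8-sig"))
```

Both files are opened in binary mode, so line endings pass through unchanged.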
Regards Thomas Linder Puls
PDC

Post by PERRAUD » 17 Dec 2018 8:27

Thank you for your reply.

So which format that can hold Unicode characters would you advise me to use?
Daniel Perraud

Post by Thomas Linder Puls » 17 Dec 2018 9:51

UTF-16 Big Endian is natural on platforms that use big-endian storage, which historically included many Unix workstations and older (PowerPC) Macs. Byte order is a property of the processor rather than of the operating system.

Windows, on the other hand, uses little-endian storage, so UTF-16 Little Endian (i.e. not Big Endian) is more natural there. Windows has full support for that format, and strings in Visual Prolog are in that format.

However, UTF-8 is highly preferable for files, unless you are mainly dealing with Chinese characters and the like. It has the advantage that it is based on single bytes and therefore does not have any "endian" issues. So the format is equally natural on Windows and Linux/Unix.

Moreover, files in English (and other languages that only or mainly use the letters A-Z) are half the size in UTF-8 that they are in UTF-16.
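
As a rough illustration of the byte order and size points (in Python, not Visual Prolog; the sample strings are arbitrary), the byte layouts of the three encodings can be compared directly:

```python
text = "Hello"

utf16_be = text.encode("utf-16-be")  # high byte of each 16-bit unit first
utf16_le = text.encode("utf-16-le")  # low byte of each 16-bit unit first
utf8 = text.encode("utf-8")          # one byte per ASCII letter, no byte order

print(utf16_be.hex(" "))  # 00 48 00 65 00 6c 00 6c 00 6f
print(utf16_le.hex(" "))  # 48 00 65 00 6c 00 6c 00 6f 00
print(utf8.hex(" "))      # 48 65 6c 6c 6f
print(len(utf16_le), len(utf8))  # 10 5

# For Chinese text the size advantage reverses.
cjk = "中文"
print(len(cjk.encode("utf-16-le")), len(cjk.encode("utf-8")))  # 4 6
```

The two UTF-16 variants contain the same bytes with each pair swapped, which is why a reader expecting one order misinterprets the other.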

The Visual Prolog IDE only stores files in UTF-8 (with a UTF-8 byte order mark).
Regards Thomas Linder Puls
PDC

Post by PERRAUD » 17 Dec 2018 14:36

Yes, you are right.
Alas, this file format is imposed on me.
I have a Delphi function library that can read this type of file, so I will call it as external functions.
Thanks again for your answers.

I will be investing in version 8.0 soon. I hope that project migration will be easy.
Going from 7.0 to 7.5 required some rewriting.
Daniel Perraud

Post by Thomas Linder Puls » 20 Dec 2018 23:45

Regarding upgrades, it is easier when there are fewer versions in between. Here there is only one step:
Regards Thomas Linder Puls
PDC
