readString problem

Discussions related to Visual Prolog
Tonton Luc
VIP Member
Posts: 814
Joined: 16 Oct 2001 23:01

Unread post by Tonton Luc » 19 May 2009 15:39

Hi,
When I use the following code to read a text file of 169,976 bytes, why does the note display only 137, and why is myNewFile.txt only 139 bytes? (Str contains only the beginning of the file.)

Code:

        Str = file::readString("c:\\myFile.txt", _),
        note(toString(string::length(Str))),
        file::writeString("c:\\myNewFile.txt", Str, false()),

I can't post myFile.txt (generated by an OCR program) because it's private.

If I delete 38 characters starting at character 139, readString reads the whole of the new text file (myNewFile.txt).
I get the same result if I delete more than 38 characters starting at character 139.
I get the same result if I delete more than 39 characters starting at character 138.
I get the same result if I delete more than 40 characters starting at character 137.
But if I delete only 37 characters starting at character 139, readString reads ONLY the first 139 characters of myNewFile.txt.
It's very strange. I don't understand what's happening.
The 38 deleted characters are attached (zzzz_139.txt). Maybe somebody can explain why the string that file::readString produces from this file has length 0?

Code:

        Bin_ini = file::readBinary("c:\\myFile.txt"),
        X = 139,
        Byt = 38,
        binary::splitBinary(Bin_ini, X, _, Bin_ini_suite),
        binary::splitBinary(Bin_ini_suite, Byt, Bin_del, _),
        file::writeBinary(string::format("c:\\zzzz_%.txt", X), Bin_del),
        NewBin = binary::delete(Bin_ini, X, Byt),
        file::writeBinary("c:\\myNewFile.txt", NewBin),
        NewStr = file::readString("c:\\myNewFile.txt", _),

Attachment: zzzz_139.txt (38 bytes)

Thomas Linder Puls
VIP Member
Posts: 2438
Joined: 28 Feb 2000 0:01

Unread post by Thomas Linder Puls » 25 May 2009 18:21

Your file is not a proper text file, because text cannot contain null-characters. Even though the entire file is read into a large piece of memory, the string terminates at the first null-character (because that is the nature of strings).
Regards Thomas Linder Puls
PDC
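
To make the point about null characters concrete, here is a minimal sketch. It is written in Python, not Visual Prolog, so that no PFC predicate names have to be guessed, and it assumes single-byte (ANSI) text. The helper name `c_string_prefix` and the sample bytes are hypothetical. It shows how a reader that stops at the first zero byte sees only the start of the data, and sees nothing at all when the data begins with a zero byte.

```python
def c_string_prefix(data: bytes) -> bytes:
    """Return the bytes a null-terminated string reader would see."""
    end = data.find(b"\x00")  # index of the first zero byte, -1 if there is none
    return data if end == -1 else data[:end]


# Hypothetical sample: readable text, one zero byte, then more text.
sample = b"number of white cars = 150\x00rest of file"
visible = c_string_prefix(sample)

# The whole buffer is in memory, but the "string" ends at the zero byte.
print(len(sample), len(visible))

# Data that starts with a zero byte yields an empty string.
print(len(c_string_prefix(b"\x00abc")))
```

This only mimics the behaviour described above; it does not reproduce what file::readString does internally.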

Unread post by Tonton Luc » 26 May 2009 6:58

Hi Thomas,
Thanks for your reply.
How can I detect null characters?
Does zzzz_139.txt contain several null characters or only one?

Unread post by Thomas Linder Puls » 26 May 2009 7:46

It contains many null-chars (see image).

Since it (obviously) isn't a text file (despite the txt extension) you should not read it as a text file.

You can read it into a binary:

Code:

        Bin = file::readBinary("c:\\myfile.txt"),
        ...

Whether this is a good approach depends on what you intend to do with the contents of the file.
Attachment: zzzz_139.png (zzzz_139.txt in hex, 5.19 KiB)
Regards Thomas Linder Puls
PDC

Unread post by Tonton Luc » 26 May 2009 8:06

Hi,
I need to read this file to search for certain words and get the values that follow them.
For example, if myFile.txt contains "number of white cars = 150", then to get "150" I search for "number of white cars = " and use frontstr and fronttoken.

Judging by your answer, I suppose my approach is not a good one. It would be better to use readBinary instead of readString, no?
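
The label-then-value lookup described here can be sketched as follows. This is Python, not Visual Prolog, and `value_after` is a hypothetical helper. Taking the next whitespace-delimited token is a simplification and is not the same tokenisation that fronttoken performs.

```python
def value_after(text: str, label: str):
    """Return the first whitespace-delimited token after label, or None."""
    start = text.find(label)
    if start == -1:
        return None  # label not present
    rest = text[start + len(label):]
    tokens = rest.split(None, 1)  # split on any run of whitespace
    return tokens[0] if tokens else None


line = "number of white cars = 150"
print(value_after(line, "number of white cars = "))
```

The lookup only works once the file contents are available as one complete string, which is exactly what the null characters prevent.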

Unread post by Thomas Linder Puls » 26 May 2009 18:55

It is difficult to give good advice without knowing the format of the file. Your description sounds like a text file, but the contents you have shown do not look at all like a text file.
Regards Thomas Linder Puls
PDC

Gildas Menier
VIP Member
Posts: 78
Joined: 8 Jun 2004 23:01

Unread post by Gildas Menier » 26 May 2009 19:07

Hi Luc,

Dumb (?) suggestion: load your file with readBinary, replace the nulls with 32 (or any space value) (binary::search and setValue?), convert it back to a string and perform your test (?).
(The best way would of course be to have access to some kind of grammar for your file and analyze it with a parser.)

Regards
Gildas
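
A rough sketch of this replace-then-search idea, in Python rather than Visual Prolog (so none of the binary:: predicates mentioned above are assumed to exist with any particular signature). The helper name `nulls_to_spaces` and the sample bytes are hypothetical, and the sketch assumes the file holds single-byte (ANSI) text. On two-byte Unicode text the same replacement would damage every character that has a zero byte in it.

```python
def nulls_to_spaces(data: bytes) -> str:
    """Replace every zero byte with a space (32) and decode as single-byte text."""
    return data.replace(b"\x00", b" ").decode("latin-1")


# Hypothetical sample: text fragments separated by runs of zero bytes.
raw = b"header\x00\x00number of white cars = 150\x00\x00tail"
text = nulls_to_spaces(raw)

# The length is unchanged, so byte offsets still match the original file.
print(len(raw) == len(text))
print("number of white cars = 150" in text)
```

Replacing rather than deleting the zero bytes keeps positions in the string aligned with positions in the file, which helps when comparing against a hex view.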

Unread post by Tonton Luc » 27 May 2009 6:53

...and what binary value do I need to search for to replace a null character?

Code:

        Bin_ini = file::readBinary("c:\\myFile.txt"),
        if Pos = binary::search(Bin_ini, $[???]) then
            binary::setValue(Bin_ini, Pos, $[???])
        end if,

Unread post by Thomas Linder Puls » 27 May 2009 18:08

Whether this makes sense at all depends on the format of the file.

Does it contain ANSI or Unicode strings?

What is between the strings? Unicode characters are two bytes each, so it is important to know where they start.

A Unicode null character is two zero bytes. But two consecutive zero bytes are only a null character if both bytes belong to the same character. Latin-based Unicode text has a zero byte in nearly every second position, so a Latin character followed by a null character gives three consecutive zero bytes, and the first two of those belong to different characters.

All in all, my advice is to obtain more information about the format of the file.
Regards Thomas Linder Puls
PDC
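
The alignment issue can be seen in a small sketch. It is in Python rather than Visual Prolog, and it assumes "Unicode" means UTF-16 little-endian with no byte-order mark and with characters starting at offset 0. `utf16_null_index` is a hypothetical helper.

```python
def utf16_null_index(data: bytes) -> int:
    """Byte offset of the first aligned 00 00 code unit, or -1 if none."""
    for i in range(0, len(data) - 1, 2):  # step through whole 2-byte characters
        if data[i] == 0 and data[i + 1] == 0:
            return i
    return -1


# "A", "B", a null character, "C" as two-byte little-endian characters.
encoded = "AB\x00C".encode("utf-16-le")
print(encoded.hex(" "))

# A naive byte search matches one byte too early: it pairs the high byte
# of "B" with the low byte of the null character.
print(encoded.find(b"\x00\x00"), utf16_null_index(encoded))
```

If the strings in the file do not start at an even offset, the loop would have to start from wherever the text actually begins, which is why the file format matters.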

Unread post by Tonton Luc » 3 Jun 2009 6:44

Hi,
Thanks for your help.
I've changed the format generated by the OCR: I now get a Unicode text file and everything works.
Thanks.
