
Filtering the needed facts of very large internal database files

Posted: 10 Jun 2022 17:38
by Kari Rastas
I have an internal database containing US customs statistics of monthly trade in goods with different countries. The database now holds 17.5 million facts, and new facts are added to it every month.

The problem in creating pages and charts of different product groups' imports and exports with different countries (for example Gases, SITC 34) is that going through the large database in memory is very time consuming. Producing the yearly and monthly charts and HTML pages for nearly 100 product groups took almost 24 hours, so developing and/or correcting the pages was rather "difficult".

Finally I divided the database into smaller files, each holding the facts of one product group. Reading the rows of such a large data file is fast, and writing the wanted fact rows to a new file is easy. With the new internal database files, making the charts and pages now takes only some tens of minutes.

Is it possible to create the needed internal database by asserting the chosen rows while reading the rows of the input stream?

Code: Select all

clauses
    readProductTerms(InFile, ProductCode) :-
        Input = inputStream_file::openFile8(InFile),
        readProductStream(Input, string::format(",\"%\",", ProductCode)),
        Input:close(),
        !.

predicates
    readProductStream : (inputStream Input, string SearchStr) procedure (i,i).

clauses
    readProductStream(Input, _) :-
        Input:endOfStream(),
        !.
    readProductStream(Input, SearchStr) :-
        String = Input:readLine(),
        handleString(String, SearchStr),
        readProductStream(Input, SearchStr).

predicates
    handleString : (string RowsString, string SearchString).

clauses
    handleString(String, SearchStr) :-
        LEN = string::length(String),
        LEN > 8,
        string::search(String, SearchStr) = N,
        N > 0,
        !,
        ????? % assert this fact in the string to the internal database in memory
        .
    handleString(_, _). % fallback so the reading loop continues when a row does not match

Re: Filtering the needed facts of very large internal database files

Posted: 10 Jun 2022 21:10
by Harrison Pratt
Maybe something like this?

Code: Select all

class facts
    myData : (string).

class predicates
    readProductTerms : (string InFile, string ProductCode).
clauses
    readProductTerms(InFile, ProductCode) :-
        Input = inputStream_file::openFile8(InFile),
        SearchCode = string::format(",\"%\",", ProductCode),
        foreach Input:repeatToEndOfStream() and S = Input:readLine()
                and string::length(S) > 8 and string::search(S, SearchCode) > 0 do
            % do your custom parsing here
            assert(myData("Some data you extract from S"))
        end foreach,
        Input:close().
Given the large size of your application data, you might want to create a productReader class and instantiate one productReader per ProductCode, for example as sketched below.
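
A rough, untested sketch of what such a class could look like, with one facts section per instance (the names productReader, read and tradeFacts.txt are just for illustration):

Code: Select all

% productReader.cl
interface productReader
predicates
    read : (string InFile).
end interface productReader

class productReader : productReader
constructors
    new : (string ProductCode).
end class productReader

% productReader.pro
implement productReader
facts
    productCode : string := "".  % the product code this instance filters on
    row : (string).              % one fact per matching input row

clauses
    new(ProductCode) :-
        productCode := ProductCode.

clauses
    read(InFile) :-
        Input = inputStream_file::openFile8(InFile),
        SearchCode = string::format(",\"%\",", productCode),
        foreach Input:repeatToEndOfStream() and S = Input:readLine()
                and string::search(S, SearchCode) > 0 do
            assert(row(S))
        end foreach,
        Input:close().
end implement productReader

Each product group then gets its own reader, e.g. Reader = productReader::new("34"), Reader:read("tradeFacts.txt").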

Re: Filtering the needed facts of very large internal database files

Posted: 11 Jun 2022 7:48
by Kari Rastas
The readLine predicate produces a string.

Assert needs that string to be identified as a fact.

That is the problem to which I do not know the solution. I suppose there must be some predicate that performs that conversion.

It would be rather useful, in the case of very large internal databases, to pick the wanted/needed facts using an input stream and form a smaller database straight in memory. Naturally, that can also easily be accomplished by writing the picked fact strings to an output stream, forming a new, smaller database in a file, and then consulting that new database.
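
For reference, that file-based route could be sketched roughly like this (untested; the facts section productDB and the fact shape are only examples, and the exact file::consult signature should be checked against your PFC version):

Code: Select all

class facts - productDB
    % hypothetical fact shape; it must match the fact lines in the big file
    trade : (string Country, string Sitc, integer Year, integer Month, real Value).

class predicates
    filterToFile : (string InFile, string OutFile, string ProductCode).
clauses
    filterToFile(InFile, OutFile, ProductCode) :-
        Input = inputStream_file::openFile8(InFile),
        Output = outputStream_file::create8(OutFile),
        SearchCode = string::format(",\"%\",", ProductCode),
        foreach Input:repeatToEndOfStream() and S = Input:readLine()
                and string::search(S, SearchCode) > 0 do
            Output:write(S),  % copy the matching fact line unchanged
            Output:nl()
        end foreach,
        Output:close(),
        Input:close(),
        file::consult(OutFile, productDB).  % load the smaller database into memory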

Re: Filtering the needed facts of very large internal database files

Posted: 11 Jun 2022 10:54
by Harrison Pratt
If the data on disk is already in the form of VIP Prolog facts, you can do something like the following to assert facts into different databases (if those facts have different structures).

Code: Select all

class facts - myDataDB
    myData : (string).

class facts - yourDataDB
    yourData : (integer).

clauses
    run() :-
        MyS = "myData( \"III\" )",
        YourS = "yourData(333)",
        if MyTerm = tryToTerm(myDataDB, MyS) and YourTerm = tryToTerm(yourDataDB, YourS) then
            assert(MyTerm),
            assert(YourTerm)
        end if.
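
Combined with the reading loop from the earlier post, the whole filtered load could then be sketched like this (untested; myDataDB and the search pattern are carried over from the examples above):

Code: Select all

class predicates
    loadProduct : (string InFile, string ProductCode).
clauses
    loadProduct(InFile, ProductCode) :-
        Input = inputStream_file::openFile8(InFile),
        SearchCode = string::format(",\"%\",", ProductCode),
        foreach Input:repeatToEndOfStream() and S = Input:readLine()
                and string::search(S, SearchCode) > 0 do
            if Term = tryToTerm(myDataDB, S) then
                assert(Term)  % the row becomes a myDataDB fact directly
            end if
        end foreach,
        Input:close().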

Re: Filtering the needed facts of very large internal database files

Posted: 11 Jun 2022 16:26
by Kari Rastas
Of course the large data file is in VIP data format; I would not have asked the question otherwise. I have used PDC Prolog for over 30 years.

That tryToTerm(myDataDB, String) is exactly what I need and what I was originally searching for, but there is a "problem": it doesn't exist, at least not in VIP 7.5, which I mainly use; that version only accepts tryToTerm(_). When I tried toTerm before writing this question to the forum, the answer was that the type of the term cannot be decided. I have not yet checked the situation with VIP 10.

I have VIP 10 (I updated it 5 months ago), but I have not yet started to use it, because when I updated the code for my picture DLL the new font "type" caused some problems (with choosing the angle of the text). I have not had the energy and time to figure out the needed changes. Nowadays it takes time and lots of coffee to find the needed new information; VIP has become so large because of the demands of the development of computers and programming languages.

Re: Filtering the needed facts of very large internal database files

Posted: 12 Jun 2022 14:04
by Harrison Pratt
Have you considered writing a tiny tool in VIP 10 just to do the data allocation, so you can continue to use your VIP 7.5 application? It would be a "safe" and easy way to get up to speed on VIP 10. Converting your 7.5 legacy app(s) in their entirety could be tedious.

Re: Filtering the needed facts of very large internal database files

Posted: 13 Jun 2022 9:33
by Thomas Linder Puls
Writing:

Code: Select all

hasDomain(myDataDB, MyTerm), MyTerm = tryToTerm(MyS)
will give the same effect as:

Code: Select all

MyTerm = tryToTerm(myDataDB, MyS)
When we have that much data we always store it in an SQL database (but usually we also need to share it simultaneously between many clients).

But when keeping it in memory you should really consider creating some "indexes" in the form of maps and the like; see the Collection library. I can't remember how the collection library looked in VIP 7.3, but I am pretty sure that there were at least algebraic red-black trees.
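
For example, a red-black map keyed on product code could serve as such an index (a rough sketch, assuming mapM_redBlack and the set/tryGet predicates of the current Collection library; the stored row strings are just an example):

Code: Select all

class facts
    byProduct : mapM{string, string*} := mapM_redBlack::new().

class predicates
    addRow : (string ProductCode, string Row).
clauses
    addRow(ProductCode, Row) :-
        if Rows = byProduct:tryGet(ProductCode) then
            byProduct:set(ProductCode, [Row | Rows])  % prepend to existing rows
        else
            byProduct:set(ProductCode, [Row])         % first row for this code
        end if.

class predicates
    rowsOf : (string ProductCode) -> string* Rows.
clauses
    rowsOf(ProductCode) = Rows :-
        if R = byProduct:tryGet(ProductCode) then
            Rows = R
        else
            Rows = []
        end if.

With such a map, rowsOf("34") returns all stored rows for SITC 34 without scanning the whole database.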