Which is a better structure for a database?
For storing data is it better to use the internal fact database, or an array or other collection of objects?
- Ferenc Nagy
- VIP Member
- Posts: 215
- Joined: 24 Apr 2007 12:26
It depends on the mode of filling and the retrieval of the data
Fact (databases)
mean the simplest way and the oldest elegant own feature of the Prolog since the birth of the ancestor of VIP, that is since Turbo Prolog (or earlier cavemen's Prologs).
The terms can be looked using partly bound compound variables:
will give out all one legged pirates not only Long John Silver.
The beginners are able to learn assert, asserta, assertz, retract and retractall predicates easily.
You can easily use facts in LIFO, FIFO and random access.
I do not know the size limit when the above predicates slow down because of walking all along the stored facts.
The chained data bases
are very elegant but not designed for beginners. You can walk along chains, insert new terms at its beginning, and end, even before and after the current element, or replace a term. It supports b-trees indexing (pointing to) the elements on a chain using a string as key.
They are the best off all but they are not included in the Personal Edition.
They can be hold in the memory or on the hard disk very easily. The opening predicate needs a parameter to decide it then you need not to take care of the storage mode.
The terms can be looked for partly bound compound variables, too:
will give out all one legged pirates not only Long John Silver.
Their limit was about one million term. I could program the logging of the changes using chains. This feature used to be very important to me because I used TurboImage on HP with logging and recovery to a given moment but unavailable in MS Access.
Binary arrays
provide access to one ore more dimensional arrays equally sized atoms using their index.
E. g. you have a table of mutual distances of 10 cities. You can store them in an 10*10 real array or in an 10*(10-1) div 2 array. The correction of a data at a given index is very simple.
The other collections
appeared in the recent versions are taken from the C language.
Search using more than a field in not allowed as above. The records must have unique identifiers.
If you have migrated from C to Prolog then you may like them the best.
mean the simplest way and the oldest elegant own feature of the Prolog since the birth of the ancestor of VIP, that is since Turbo Prolog (or earlier cavemen's Prologs).
The terms can be looked using partly bound compound variables:
Code: Select all
person(Family,FirstName,NickName,Gender,Age,1,"pirate")
The beginners are able to learn assert, asserta, assertz, retract and retractall predicates easily.
You can easily use facts in LIFO, FIFO and random access.
I do not know the size limit when the above predicates slow down because of walking all along the stored facts.
The chained data bases
are very elegant but not designed for beginners. You can walk along chains, insert new terms at its beginning, and end, even before and after the current element, or replace a term. It supports b-trees indexing (pointing to) the elements on a chain using a string as key.
They are the best off all but they are not included in the Personal Edition.
They can be hold in the memory or on the hard disk very easily. The opening predicate needs a parameter to decide it then you need not to take care of the storage mode.
The terms can be looked for partly bound compound variables, too:
Code: Select all
person(Family,FirstName,NickName,Gender,Age,1,"pirate")
Their limit was about one million term. I could program the logging of the changes using chains. This feature used to be very important to me because I used TurboImage on HP with logging and recovery to a given moment but unavailable in MS Access.
Binary arrays
provide access to one ore more dimensional arrays equally sized atoms using their index.
E. g. you have a table of mutual distances of 10 cities. You can store them in an 10*10 real array or in an 10*(10-1) div 2 array. The correction of a data at a given index is very simple.
The other collections
appeared in the recent versions are taken from the C language.
Search using more than a field in not allowed as above. The records must have unique identifiers.
If you have migrated from C to Prolog then you may like them the best.
Last edited by Ferenc Nagy on 15 Sep 2014 3:40, edited 1 time in total.
TIA, Regards,
Frank Nagy
Frank Nagy
- Thomas Linder Puls
- VIP Member
- Posts: 1437
- Joined: 28 Feb 2000 0:01
I will have to say that I disagree rather strongly on these matters.
In short:
I first learned about sets and maps in mathematics in first class (age 6-7). It was more or less when the C-programming language was "invented".
Sets and maps comes from mathematics, and first emerged into functional programming languages (which are heavily inspired by mathematics). Because they are such a good idea they have in various forms spread to practically all programming languages. Including C++ and Visual Prolog (it is likely that collection libraries also exist for plain C).
Enough about that and back to Visual Prolog.
Nondeterm fact databases
Pros
Pros
Cons
Pros
In short:
- Collections: Yes! these are the ones to use. See Collection library
- Nondeterm fact databases can be used where it makes sense
- ChainDB and B+-trees: Don't use them.
I first learned about sets and maps in mathematics in first class (age 6-7). It was more or less when the C-programming language was "invented".
Sets and maps comes from mathematics, and first emerged into functional programming languages (which are heavily inspired by mathematics). Because they are such a good idea they have in various forms spread to practically all programming languages. Including C++ and Visual Prolog (it is likely that collection libraries also exist for plain C).
Enough about that and back to Visual Prolog.
Nondeterm fact databases
Pros
- Easy to learn.
- Easy to use.
- Flexible search mechanisms.
- Can be serialized to files.
- Fully embedded in the language.
- Bad performance for large databases (search/look up is O(N)).
- Serialized forms are hard to deal with over time (i.e. when changing the code).
Pros
- Learning is medium difficult.
- Use is easy.
- Integrates well in the language (due to polymorphism and generics).
- Code is very clear.
- Good performance (search/look up O(log(N)), and even O(1) for some variants).
- Any datatype can be used as keys.
Cons
- No serialization
Pros
- Good performance (search/look up O(log(N))).
- Can be on disc (serialized) provided the data is serializable (i.e. if data doesn't contain objects and predicates)
- Learning is rather hard.
- Integrates poorly in the language.
- Code is obscure (even to people that master it).
- Serialized forms are fragile (one byte wrong can mean that everything is lost, with no means for recovery).
- Keys must be strings of fixed length.
Regards Thomas Linder Puls
PDC
PDC
- Thomas Linder Puls
- VIP Member
- Posts: 1437
- Joined: 28 Feb 2000 0:01
A little extra advice. Let this be your natural consideration order of collections:
Especially for arrays notice this:
Pros
So unless your problem really fit to the array concept, I suggest that you first make sure that the additional performance is really needed, and that a hash data structure is not a better solution.
- redBlack tree based (e.g. mapM_redBlack)
- hash table based (e.g. mapM_hash)
- array's (e.g. arrayM)
Especially for arrays notice this:
Pros
- Look up in an array is very efficient (= O(1)). But hash tables are also O(1).
- The keys must be numbers from 0 to "Size".
- The array will need storage for all elements from 0 to "Size" (there cannot be holes)
So unless your problem really fit to the array concept, I suggest that you first make sure that the additional performance is really needed, and that a hash data structure is not a better solution.
Last edited by Thomas Linder Puls on 21 Sep 2014 16:33, edited 1 time in total.
Regards Thomas Linder Puls
PDC
PDC
- Ferenc Nagy
- VIP Member
- Posts: 215
- Joined: 24 Apr 2007 12:26
Connection of databases with the outer world
Connection of databases with the outer world
Nondeterm fact databases
The save predicate writes the whole contents to a disk file of them in one program statement.
The consult predicate read the whole contents from a disk file of them in one program statement.
The disk files look almost like a comma separated data file that is they are easily handled by Excel or Access.
You can import these file using delimiters comma and opening parentheses, text surrounded by double quotes.
Step #1 of using saved facts by Excel.
Step #2 Omit some columns is not shown.
Step #3 Deletion of extra closing parentheses and periods.
You can write a macro to do these steps.
The chained databases
can be moved from memory to disk and back but their structure is so obscure that they cannot be uses with Excel and Access.
The collections have not predicates to save and read from disk files as a whole.
You have to write out their element one by one and fill from disk files from scratch.
The binary arrays can be written to the disk but the result is not for human inspection.
Nondeterm fact databases
The save predicate writes the whole contents to a disk file of them in one program statement.
The consult predicate read the whole contents from a disk file of them in one program statement.
The disk files look almost like a comma separated data file that is they are easily handled by Excel or Access.
Code: Select all
clauses
compose(p("Left Scale","Grid","Lines","Style"),s(1,"Dotted","Pen")).
compose(p("Bottom Scale","Grid","Lines","Style"),s(1,"Dotted","Pen")).
compose(p("Picture","Frame","Lines","Style"),s(5,"Solid","Pen")).
compose(p("Right Scale","Grid","Lines","Style"),s(1,"Dash Dot Dot","Pen")).
compose(p("Left Scale","Axes","Lines","Style"),s(3,"Solid","Pen")).
compose(p("Top Scale","Axes","Lines","Style"),s(2,"Solid","Pen")).
compose(p("Right Scale","Axes","Lines","Style"),s(2,"Solid","Pen")).
compose(p("Bottom Scale","Axes","Lines","Style"),s(3,"Solid","Pen")).
compose(p("Top Scale","Grid","Lines","Style"),s(1,"Dash Dot Dot","Pen")).
compose(p("Plot","Frame","Lines","Style"),s(5,"Solid","Pen")).
compose(p("Foreground","Edges","Lines","Style"),s(1,"Solid","(~Screen) & Pen)")).
compose(p("Foreground","Cut Edges","Lines","Style"),s(1,"Dashed","(~Screen) & Pen)")).
compose(p("Background","Cut Edges","Lines","Style"),s(1,"Dotted","Pen")).
compose(p("Background","Edges","Lines","Style"),s(1,"Solid","Pen")).
Step #1 of using saved facts by Excel.
Step #2 Omit some columns is not shown.
Step #3 Deletion of extra closing parentheses and periods.
You can write a macro to do these steps.
The chained databases
can be moved from memory to disk and back but their structure is so obscure that they cannot be uses with Excel and Access.
The collections have not predicates to save and read from disk files as a whole.
You have to write out their element one by one and fill from disk files from scratch.
The binary arrays can be written to the disk but the result is not for human inspection.
- Attachments
-
- Step #1 of using saved facts by Excel.
Step #2 Omit some columns is not shown. - Conversion of facts by Excel.png (26.3 KiB) Viewed 18401 times
- Step #1 of using saved facts by Excel.
-
- Step #3 Deletion of extra closing parentheses and periods.
- Delete closing parentheses and dots.png (21.29 KiB) Viewed 18401 times
TIA, Regards,
Frank Nagy
Frank Nagy
I use the Chaindb and every record insert I duplicate with an append file function to a text-file for data security and data duplication/mirror
I have noticed that for example Mysql databases can crash as well.
So should i change my chain db code, to start using for example Mysql and Odbc to have beter performance?
Or should I use the redblack tree, is the redblack tree managed on disk or in memory, and how can i save the redblack tree to a file, and how to use with multiple users
thankyou
I have noticed that for example Mysql databases can crash as well.
So should i change my chain db code, to start using for example Mysql and Odbc to have beter performance?
Or should I use the redblack tree, is the redblack tree managed on disk or in memory, and how can i save the redblack tree to a file, and how to use with multiple users
thankyou
- Ferenc Nagy
- VIP Member
- Posts: 215
- Joined: 24 Apr 2007 12:26
Everything is good for something but nothing good for everything
My father often told:
Original version rephrased for the discussion forum:
The sultan has a dungeon with 100 cells having remote control. The prisoners do not notice at once if the door is unlocked. His majesty got out of the bed in good mood. He ordered the Prolog programmer sitting in the control tower to unlock each cells. Then he changed his mind and ordered: "Turn back the lock of every second cells." He reconsidered himself again: . Soon, whimsically: "Turn over the lock of every fourth cells." ... So on, in a cycle from 1 to 100.
Which cells remain open in the final state?
I think for this problem the best storage mode is the binary array.
Second version: The programmer has a detailed personal inventory about the captives.
The sultan orders "Release the shiite captives who have been jailed in 1981."
The chained database and the simple fact handling is the best way to fulfill his command.
Third kind of majestic wish: "Decrease the temperature of each cell with 5 centigrades for the sunnite captives named Ali."
If the programmer in the control tower tries to solve the problem with simple retract+assert statements then he makes a program in his hurry which fells in an endless loop and poor prisoners fin themselves on the absolute zero.
The usage of chained databases or the collection is recommended.
If the sultan exactly gives the unique identifier of the prisoners to be released then the usage of chained databases or the collection is recommended.
I explain here how his sermon can be applied to the questionEverything is good for something but nothing good for everything.
I twist the mathematical problem of the capricious sultan and his dungeon.Which is the best database handling tool of VP?
Original version rephrased for the discussion forum:
The sultan has a dungeon with 100 cells having remote control. The prisoners do not notice at once if the door is unlocked. His majesty got out of the bed in good mood. He ordered the Prolog programmer sitting in the control tower to unlock each cells. Then he changed his mind and ordered: "Turn back the lock of every second cells." He reconsidered himself again: . Soon, whimsically: "Turn over the lock of every fourth cells." ... So on, in a cycle from 1 to 100.
Which cells remain open in the final state?
I think for this problem the best storage mode is the binary array.
Second version: The programmer has a detailed personal inventory about the captives.
The sultan orders "Release the shiite captives who have been jailed in 1981."
The chained database and the simple fact handling is the best way to fulfill his command.
Third kind of majestic wish: "Decrease the temperature of each cell with 5 centigrades for the sunnite captives named Ali."
If the programmer in the control tower tries to solve the problem with simple retract+assert statements then he makes a program in his hurry which fells in an endless loop and poor prisoners fin themselves on the absolute zero.
The usage of chained databases or the collection is recommended.
If the sultan exactly gives the unique identifier of the prisoners to be released then the usage of chained databases or the collection is recommended.
TIA, Regards,
Frank Nagy
Frank Nagy
- Thomas Linder Puls
- VIP Member
- Posts: 1437
- Joined: 28 Feb 2000 0:01
Notice that we only maintain the chain db for backwards compatibility. We consider it deprecated. And will eventually discontinue it.
The only recommendation we have about chain db is: migrate away from it.
Our "positive" recommendations only involve fact databases, collections and SQL databases.
The only recommendation we have about chain db is: migrate away from it.
Our "positive" recommendations only involve fact databases, collections and SQL databases.
Regards Thomas Linder Puls
PDC
PDC
- Ferenc Nagy
- VIP Member
- Posts: 215
- Joined: 24 Apr 2007 12:26
Why is the chain db deprecated?
I know that every new development pushes out some old long standing stuff.Notice that we only maintain the chain db for backwards compatibility. We consider it deprecated. And will eventually discontinue it.
This is a general rule of software development.
Please explain the special case of chain db
- Where did it come from to the Turbo Prolog?
- Who invented it?
- Was it present in other languages?
- What were its best application in commercial Prolog programs?
- Why have the VIP Development Team deprecate it?
TIA, Regards,
Frank Nagy
Frank Nagy
- Thomas Linder Puls
- VIP Member
- Posts: 1437
- Joined: 28 Feb 2000 0:01
That was a lot of questions.
I am not really aware of the history of the chainDB in Visual Prolog.
chainDB consist of two things chains and B+trees. Chains is a rather trivial data structure quite similar to fact databases, you can read the wikipedia article about B+trees.
B+trees and data structures similar to chains is most likely used in many other places and languages. But our chainDB API and disk representation is homecooked (i.e. a propriety format).
"What were its best application in commercial Prolog programs?" I cannot answer that question. We have used chainDB as a tool in our programs, just like we use lists, windows, files, fact databases, etc . To me the question is similar to asking: What is the best application of lists in ...
The reasons for deprecating it is the most interesting ones.
Concurrent access from several programs is another problem. The ChainDB have a very simple synchronisation mechanism for this, which will almost definitely mean that the database becomes a (show stopping) bottleneck as soon as you scale to just a few concurrent users.
It is not for nothing that huge companies spends lots of efforts on SQL databases.[/list]
I am not really aware of the history of the chainDB in Visual Prolog.
chainDB consist of two things chains and B+trees. Chains is a rather trivial data structure quite similar to fact databases, you can read the wikipedia article about B+trees.
B+trees and data structures similar to chains is most likely used in many other places and languages. But our chainDB API and disk representation is homecooked (i.e. a propriety format).
"What were its best application in commercial Prolog programs?" I cannot answer that question. We have used chainDB as a tool in our programs, just like we use lists, windows, files, fact databases, etc . To me the question is similar to asking: What is the best application of lists in ...
The reasons for deprecating it is the most interesting ones.
- It is too fragile to be uses as a real (persistent) database (more on that below)
- It is far too complicated to learn and use
- It has many strange restrictions, especially the fixed lenght of keys in B+tres
- It have many type flaws (many things uses "uncheckedConverts" rather than being compile time type safe).
- The collection library is much better for in memory data (flexible, type safe, efficient, easy to learn and use).
Concurrent access from several programs is another problem. The ChainDB have a very simple synchronisation mechanism for this, which will almost definitely mean that the database becomes a (show stopping) bottleneck as soon as you scale to just a few concurrent users.
It is not for nothing that huge companies spends lots of efforts on SQL databases.[/list]
Regards Thomas Linder Puls
PDC
PDC
-
- VIP Member
- Posts: 341
- Joined: 14 Nov 2002 0:01
On the other hand SQL databases produce a large overhead to accomplish all their nice features. And worse the load, they put on computers, is, that installation/patching/administration of for example an Oracle database system is a complicated and time consuming task for humans. Furthermore learning SQL, especially Oracle SQL and PL/SQL, is a lot more complicated than learning to use the ChainDB ..the Oracle docu is huge, check it at http://docs.oracle.com/database/121/index.htm.
In many cases, when concurrent access to the data is not needed, an SQL database is an oversized solution. It is more beneficial then, to have a small storage layer, which completely resides inside the program, rather than having to run a database system outside of the program.
The ChainDB has one useful functionality, which is missing in SQL databases: It supports chains (= doubly linked lists), which allow data to be inserted after/before any node in O(1). SQL databases in contrast only support ordering data according to sort keys (for which processing can be speed up by creating indexes).
I plead for not letting ChainDB simply die, but replacing it in some later VIP version by a modern solution. The solution does not need to support concurrent data access, but it should provide type safety, variable key length, non-string keys, cooperation with collections, improved efficiency, chains, and maybe more.
Ideally a modernized ChainDB would also support storing objects. Means, when you have developed automatic object serialization in some future VIP version, you could accompany it with a modernized ChainDB, which together will deliver an easy-to-use and low-overhead way of maintaining objects in non-volatile storage.
Regards
Martin
In many cases, when concurrent access to the data is not needed, an SQL database is an oversized solution. It is more beneficial then, to have a small storage layer, which completely resides inside the program, rather than having to run a database system outside of the program.
The ChainDB has one useful functionality, which is missing in SQL databases: It supports chains (= doubly linked lists), which allow data to be inserted after/before any node in O(1). SQL databases in contrast only support ordering data according to sort keys (for which processing can be speed up by creating indexes).
I plead for not letting ChainDB simply die, but replacing it in some later VIP version by a modern solution. The solution does not need to support concurrent data access, but it should provide type safety, variable key length, non-string keys, cooperation with collections, improved efficiency, chains, and maybe more.
Ideally a modernized ChainDB would also support storing objects. Means, when you have developed automatic object serialization in some future VIP version, you could accompany it with a modernized ChainDB, which together will deliver an easy-to-use and low-overhead way of maintaining objects in non-volatile storage.
Regards
Martin
- Thomas Linder Puls
- VIP Member
- Posts: 1437
- Joined: 28 Feb 2000 0:01
Re:
Well, as said we consider chainDB deprecated, and we will not develop or replace it. We can only use databases which are highly reliable, have backup facilities can scale to thousands of concurrent users, can be used remote, etc., etc. And therefore we will not put effort into such software.
That does not rule out the possibility of persisting objects, that issue is completely orthogonal to the actual storage medium.
For persisting smaller amounts of data, I will suggest saving and consulting fact databases. After the consult data it can be transferred to collections for better performance.
For large amount of data I will strongly suggest using a proper database. But I agree that there is a learning overhead for that.
Large databases that are heavily used can give a huge load on a machine. But small databases with little access give completely insignificant load.
I will not recommend starting with Oracle. Oracle should only be used when you have very huge data, load, etc. Oracle is expensive, complex and have large footprints in your computer.
MS SQL Express is free and it is quite simple to use and have a little footprint.
MySQL is also free (open source; now under Oracle wings). I have found it a little more complicated to use, but it has been a while since I last used it so things may have improved. There used to be a problem with parameterized queries in their ODBC driver, but that may also be solved.
(By the way, this discussion forum is in a MS SQL database and the wiki is in a MySQL database, but none of them uses Visual Prolog for anything).
That does not rule out the possibility of persisting objects, that issue is completely orthogonal to the actual storage medium.
For persisting smaller amounts of data, I will suggest saving and consulting fact databases. After the consult data it can be transferred to collections for better performance.
For large amount of data I will strongly suggest using a proper database. But I agree that there is a learning overhead for that.
Large databases that are heavily used can give a huge load on a machine. But small databases with little access give completely insignificant load.
I will not recommend starting with Oracle. Oracle should only be used when you have very huge data, load, etc. Oracle is expensive, complex and have large footprints in your computer.
MS SQL Express is free and it is quite simple to use and have a little footprint.
MySQL is also free (open source; now under Oracle wings). I have found it a little more complicated to use, but it has been a while since I last used it so things may have improved. There used to be a problem with parameterized queries in their ODBC driver, but that may also be solved.
(By the way, this discussion forum is in a MS SQL database and the wiki is in a MySQL database, but none of them uses Visual Prolog for anything).
Regards Thomas Linder Puls
PDC
PDC
and how is this solution?; use the QDBM tool for the indexes and the chaindb only for storing records in the chains without using the btree of the chaindb tool, and dont use large text-fields in the records,
this is the link of the qdbm tool: 3rd:QDBM
ty
this is the link of the qdbm tool: 3rd:QDBM
ty