DEMML.org - Classification System

DEMCS™ Basics / Background

Storing Content: basic storage methods

In order to understand why certain decisions were made when designing the DEMCS™, it is important to understand why certain other decisions were made about the way DEMML™ content is stored. There are two basic ways that data can be stored: in a database, or as separate files stored on a hard drive.

Databases

A database is a system in which many individual pieces of data are kept together in one file, or in a limited set of very large files, organized into tables, with certain fields used to relate one table to another. The data can be indexed in multiple ways so that records can be found based on specific fields. Once the appropriate indexes have been built, finding things can be very fast. Each additional index, however, increases the amount of work that the database software, often called a "database engine", must do. The data can also be presented in multiple different ways depending on the needs of the user. Unfortunately, databases also require special, often proprietary, software to present that data. When there is a lot of data on a server being accessed by many people, the database engine must be very robust and powerful, which can make it very expensive to create or maintain.
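The effect of an index can be seen with a small experiment. The sketch below is only an illustration and is not part of DEMML™: it uses Python's built-in sqlite3 module, and the table name `items`, the index name `idx_items_category`, and the sample rows are all invented for the demonstration. It asks the database engine how it plans to run the same lookup before and after an index is built.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is a human-readable detail string.
    return " | ".join(row[-1] for row in rows)


# An in-memory database with a hypothetical table of shop items.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO items (category, name) VALUES (?, ?)",
    [
        ("printer", "Ink-jet A"),
        ("printer", "Laser B"),
        ("scanner", "Flatbed C"),
    ],
)

lookup = "SELECT name FROM items WHERE category = ?"

# Without an index, the engine has to look at every row in the table.
plan_before = query_plan(conn, lookup, ("printer",))

# Building the index is extra work, but lets the engine jump to matches.
conn.execute("CREATE INDEX idx_items_category ON items (category)")
plan_after = query_plan(conn, lookup, ("printer",))

names = sorted(row[0] for row in conn.execute(lookup, ("printer",)))

print("before:", plan_before)
print("after: ", plan_after)
print("found: ", names)
```

The exact wording of the plan varies between SQLite versions, but the plan after the index is built should name `idx_items_category`, while the plan before it should not.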

Many web sites are actually databases with a web-based front end. When the user searches for all the ink-jet printers on a shopping web site, a database engine retrieves all the matching items from its database and sends that data to another program, which then sends it out to the user in the form of a temporary, custom-made web page. This requires a lot of complicated, expensive, customized software that must work correctly all the time, or the user can't do their shopping.

A wiki is another example of a database with a web-based front end. When a user edits a wiki, they are not editing the web site directly. They are editing a form, which is sent to software that works out which parts of the content the user edited, then sends those changes to a database that stores them along with all the other changes made by all the other users. The final version of an article is built up from all the changes made by all the users. This software is usually very robust and has become very popular. But it is still a database, with all the problems that come along with maintaining a database.

The main problem with a database is that users must have access to the database engine in order to use the content in any meaningful way. Yes, one can cut and paste from a web page presented by a wiki, but that data then becomes nothing more than text marked up with some HTML. It has lost all context and any metadata that may have been associated with that content within the back-end database. A user can't just copy a part of the database to their computer and use it in the same way it was used within the database.

Finally, if you want to add more types of data to a database, you must redesign the database and convert all of the existing data, even if none of that data will use the additions. This can be a huge, costly, and time-consuming process.

To sum up, a database:

Can be fast when searching for specific, predefined things.
Can easily present the data in different sort orders and organizations by using different indexes.
Cannot search quickly for anything that has not already been indexed.
Requires complicated software on the server to "serve up" the data.
This software is often expensive in either time or money to produce, purchase, and/or maintain.
Requires the user to have access to the database engine or the web site in order to access the data. Users can't just copy pieces of the data without complicated export and import procedures.
Is difficult to redesign.

Files-in-Folders

In contrast to a database system, placing files in folders is very simple to do and maintain. Basically, the files just sit there. The only "maintenance" required is backing up the files on a regular basis, and even this is far simpler and less expensive than backing up a database.

To provide access to the data, all that is needed is a simple web server with no fancy features.
Users can simply copy the files they need using their web browser, FTP software, or by direct copying if they have access to the files themselves.
Once copied, the files are directly usable with no modification or importing into another database.
If you decide to add more types of data to some of the files, there is no reason to modify all the existing files.
What you gain in simplicity, you lose in flexibility and searchability.
- The files can only be presented in the manner in which they are organized on the hard drive. You can't look at the entire collection at once, just one folder at a time.
- There are only two ways to search for data in the files: look at each file in turn, or run software that will look at each file in turn, like the standard Windows search function.
Fortunately, there is software that can make up for each of these shortcomings, and much of this software already exists.
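The second search method, software that looks at each file in turn, can be sketched in a few lines. The following Python example is only an illustration and is not part of DEMML™: the function name `search_files`, the folder names, and the file contents are all made up for the demonstration. It walks the whole folder tree, so it also shows how software can cover the entire collection rather than one folder at a time.

```python
import os
import tempfile


def search_files(root, needle):
    """Yield the path of every file under root whose text contains needle."""
    for folder, _subfolders, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(folder, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    if needle in handle.read():
                        yield path
            except OSError:
                # Skip files that cannot be opened or read.
                continue


def demo():
    """Build a small throwaway folder tree and search it."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical sample content, keyed by folder path and file name.
        samples = {
            ("science", "biology", "plants.txt"):
                "Plants use photosynthesis to make food.",
            ("science", "physics", "motion.txt"):
                "Objects in motion stay in motion.",
            ("history", "rome.txt"):
                "Rome was not built in a day.",
        }
        for parts, text in samples.items():
            path = os.path.join(root, *parts)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)

        matches = search_files(root, "photosynthesis")
        # Report paths relative to the root, with "/" on every platform.
        return sorted(
            os.path.relpath(match, root).replace(os.sep, "/")
            for match in matches
        )


if __name__ == "__main__":
    print(demo())
```

Because every file is opened and read, the time this takes grows with the size of the collection, which is the trade-off against a prebuilt database index.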

Why DEMML™ uses Files-in-Folders

Databases can be incredibly useful and powerful. Otherwise they wouldn't have been invented in the first place. But a database of the magnitude required to hold all the educational material in the world would be quite a behemoth. It would require loads of processing power, and significant people and brain power to keep it running. Either all the students in the entire world would have to access the data on the same server, or special software would have to be devised to copy subsets of the database to other servers. Then, custom software would need to be created just to display the data to students, either through their web browsers or using software on their desktops. If the structure of the data changed, then all of that software would have to be redesigned as well. Finally, students wouldn't be able to simply copy files in order to acquire new content to study.

By using the basic files-in-folders storage system, DEMML™ allows simple servers to distribute content, and it allows the most flexibility in how that content is distributed. Users can keep that content on a thumb drive and easily share it with others. Content can be e-mailed, or even just copied and pasted from a web site.

The final and very important reason for using the files-in-folders data storage method is that a strictly enumerative classification system can also be used as the naming system for all the folders in the file system. If a user wants to know where to store a particular file, all they need to do is look at the classification code for the content in the file. If they want to know the classification code for a file or set of files (which are already in their correct locations), all they have to do is look at the path name for the current folder. This consistency is very important for the long-term usability of the DEMML™ system.
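The two-way relationship between classification codes and folder paths can be sketched as a pair of small functions. This is only an illustration under loud assumptions: the actual syntax of DEMCS™ codes is not described on this page, so the dot-separated code "5.3.1", the root folder name "content", and the function names below are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical: the real DEMCS code syntax is not given on this page.
SEPARATOR = "."


def code_to_folder(code, root="content"):
    """Turn a classification code into the folder where its files belong."""
    parts = code.split(SEPARATOR)
    if not all(parts):
        raise ValueError(f"malformed classification code: {code!r}")
    # Each level of the code becomes one folder level.
    return PurePosixPath(root, *parts)


def folder_to_code(folder, root="content"):
    """Recover the classification code from a folder's path name."""
    relative = PurePosixPath(folder).relative_to(root)
    return SEPARATOR.join(relative.parts)


if __name__ == "__main__":
    folder = code_to_folder("5.3.1")
    print(folder)
    print(folder_to_code(folder))
```

Under these assumptions, neither direction needs a lookup table or a database: the code and the path are the same information written two ways.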

Next: DEMCS™ tree structure...

First Published: May 15, 2007 — Last Modified: May 15, 2007

Distributable Educational Material Markup Language™
