How DEMML™ was Invented
A Little Background
If you had the patience to read my bio, you saw that I have always had a penchant for designing systems to organize information. When I was a kid, I had a lot of fish tanks. I used to keep track of things like water temp, acidity, when I changed water, etc. all in a little notebook. I don't know why. I thought I might learn something from it. Rather than rewrite the headings every day, I created a template with holes in it for the data. I would lay that over the page and fill in the information. To read the page, I would lay the template on top and read the numbers through the holes. I had created a database and I didn't even know it. Much later, when asked to create a database for one of my employers, I designed an entire database to track all accounting and manufacturing for a printed-circuit-board plant. I just did what seemed to make the most sense and be most efficient without knowing that there were rules for how to do this, called database normalization. As it turns out, my design was almost all what they call 4th normal form which is usually pretty difficult to do. I just have a knack for these kinds of things.
Definitions
For this discussion, I make a distinction between a subject and a topic. A subject is a generalized area of study. A subject can be broken down into sub-subjects but each is still general in that there are multiple things that fall under that one subject or sub-subject. A topic, on the other hand, is very specific. In fact, even something as specific as the integral of the sine function is still considered a subject. A topic would be the history of how the integral of the sine function was derived; the proof of what the integral of the sine function is; what one can do with the integral of the sine function; and then, very specifically, what is the integral of the sine function. All are topics contained within the subject called "Integral of Sine Function." Mathematicians could probably find even more specific topics for this subject. A topic is the smallest, most narrow set of information that makes any sense together. While many books teach the proof and the history on the same page, they are still in separate paragraphs and usually under separate headings.
Finally, a topic can contain one or more "facts." For instance, the proof of the integral of the sine function may require several steps. Each step is a fact. In the history of the integral of the sine function, the mathematician and the year it was discovered are separate facts. A fact would not make sense apart from the other facts in the topic. There is much more to the DEMML™ format than subjects, topics, and facts but these are the terms that have widely differing definitions so these are the ones that needed clarification. The other parts will make sense when you read them.
Learning is Hard Work
When I came back to school I realized that learning was not going to be nearly as easy as it used to be. First of all, the pace was much faster than it used to be in 1978. There is a lot more to learn, yet they still expect you to squeeze it into four years. Second, somehow it seemed that the textbooks had gotten a lot harder to read. More often than not I had no clue what the author was getting at. Usually because the writing was very terse or the author assumed that the reader had a complete mastery of all previous topics. Sometimes it seemed as if the author was "documenting" the topic rather than explaining it for someone who, by definition, doesn't know anything about it.
I tried searching the internet but it took hours to find even slightly related web pages. I often had to wait till the next day or next week to get my questions answered. It was a lot of work. For some chapters, I had to go through this routine for almost all of the 10 or 20 main topics contained in the chapter. It was grueling. And the delays made it very difficult to learn anything at all. By the time I could get my questions asked it was often too late to really help.
Then there are the professors. I have had a few really good teachers in my long, drawn-out academic career. The rest? Let's just say that my learning style did not seem to mesh with the teacher's teaching style very well. But that's a big part of the problem. A teacher can only have one teaching style, yet they have a room full of students who each have a different learning style. If they are lucky, one in forty will match.
Flashcards and the Leitner Cardfile System
After all this work, I could usually understand the concepts being taught, but memorizing certain things was much more difficult for me than before. While one cannot get through college by memorizing everything, there are certain things that one does need to just know, as well as one's own phone number, if one is going to get anything done in a reasonable amount of time. I started making a bunch of flashcards but I quickly lost track of which ones I had made and which ones I hadn't. Once I had made a flashcard it was lost in the stack till it happened to come up again. Being an old computer nerd and a big Palm® Handheld fan, I started looking online for a flashcard program. There are quite a lot of them out there. Most of them are really lame and can only handle plain text. Almost none of them could handle mathematical equations. The ones that could handle pictures required about a five-step process just to get the picture in the database and then some fancy coding to get it to show up. Certainly not ideal or efficient.
While looking for flashcard programs I came across something called the Leitner Cardfile System, a simple form of what is called spaced repetition. The basic premise is that you take a stack of flashcards and a set of boxes. You put all the cards in the first box and work through them. If you know the answer you move the card back to the second box. If not, you put it back in the first box. After a delay, you work through the cards in the second box. If you know the answer you move the card back one more box but, if you don't, the card goes all the way back to the first box. You continue working through all of these cards in all of these boxes till everything has made it to the last box. You work through each successive box progressively less often. This way you work on the cards you don't know more often than those you do know. This is a great trick and has helped me through more than one math test. You can do it without all the boxes simply by moving cards you know further to the back of a deck than ones you don't know. Try it. It really works.
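For anyone who would rather see the box shuffling spelled out, here is a minimal Python sketch of the procedure described above. The class name, the method names, and especially the review schedule (box i comes due every 2^i sessions) are my own assumptions for illustration; the Leitner system itself only says that later boxes are reviewed less often.

```python
class LeitnerDeck:
    """Flashcards sorted into boxes; box 0 is reviewed most often."""

    def __init__(self, num_boxes=3):
        self.boxes = [[] for _ in range(num_boxes)]

    def add(self, card):
        # Every new card starts in the first box.
        self.boxes[0].append(card)

    def review(self, box_index, knows_answer):
        """Work through one box; knows_answer(card) returns True or False."""
        cards = self.boxes[box_index]
        self.boxes[box_index] = []
        last = len(self.boxes) - 1
        for card in cards:
            # A known card moves back one box; a missed card returns to box 0.
            target = min(box_index + 1, last) if knows_answer(card) else 0
            self.boxes[target].append(card)

    def due_boxes(self, session):
        # Assumed schedule: box i is due every 2**i sessions (sessions count from 1).
        return [i for i in range(len(self.boxes)) if session % (2 ** i) == 0]

    def finished(self):
        # Done when every card has reached the last box.
        return all(not box for box in self.boxes[:-1])


deck = LeitnerDeck(num_boxes=3)
for card in ["sin", "cos", "tan"]:
    deck.add(card)

known = {"sin", "cos"}
deck.review(0, lambda card: card in known)
print(deck.boxes)  # [['tan'], ['sin', 'cos'], []]
```

The card I missed stays in the first box, where it will come up again at the very next session, while the two I knew will not be seen again until the second box is due.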
I couldn't find a decent program to do flashcards on my Palm® Handheld but I did buy a couple for Windows. The first was so full of bugs that I got my money back, and the second was such a pain to use that in the end I just gave up. A big problem with all of these flashcard programs is that you have to do a lot of work to enter everything yourself. Some of them have several web pages trying to teach the user how to create effective flashcards. Entering anything other than plain text is still a real pain. When they do work, all you really learn are the answers to those exact questions. To really learn the material one would have to create over a dozen flashcards for each and every topic. It is so time-consuming that it just isn't worth it in the end. Finally, the flashcard programs that do use spaced repetition do so for each individual card. There is nothing that ties together all the cards associated with one topic as far as making use of spaced repetition is concerned.
There had to be a better way
After another semester or so of grinding my way through some pretty indecipherable textbooks I thought there just had to be a better way. I often ended up buying several old textbooks at the used book store to use for reference. If one didn't explain something clearly, then sometimes one of the others would. Unfortunately, usually not. It was as if there were some kind of conspiracy to hide the knowledge from us through obfuscation. I found myself wishing I could just turn to a book or web site or something that had multiple different explanations for each and every topic in my textbooks. But there isn't one. Each book or web page has its own single explanation, and finding one that worked for me was too much work. And that's when it hit me. All of the educational material out there is really just data that has never been completely organized before. If I took all the different explanations for any one topic and put them all in one place then it would be easy for people to learn what they needed to know and move on, rather than spend hours searching for answers.
Classification
In order to put all these explanations together and put them where people could find them I knew I would need a classification system more detailed than any used before. There are a lot of books in the Library of Congress. But in each book there are hundreds or thousands of topics. If you take all the books on a certain subject, there are definitely thousands of different very specific topics. However, many of these topics overlap. So, I spent a month studying classification systems. I learned that the Library of Congress Classification (LCC) system is considered "enumerative" because every subject can be broken down into a discrete number of sub-subjects. I also learned about other systems that are called "faceted" because many topics have multiple facets, such as the history of wheat. Such a subject could be classified in either history or agriculture depending on what was considered the most important. The only way to really employ a faceted classification system is in a database where you can index things on more than one topic. Because I knew I wanted to keep things as simple as possible I chose to use an enumerative approach. This simply means that I can put each subject in one file folder with sub-subjects in sub-folders.
It took me a while but I also finally figured out how to encode each of these subjects, sub-subjects, and topics in such a way that new sub-subjects can always be inserted in between any other sub-subjects and any existing subject can be broken up into previously un-thought-of sub-subjects. I decided to use these code numbers as the names of the file folders where the actual content would be stored. If each folder was just named for the subject then the pathnames would quickly get too long for any computer system to handle. By using the code numbers, the pathnames are kept reasonably short. The complete system is explained over in the Classification part of the standard. Suffice it to say that trillions and trillions of topics can be classified using fewer than 30 characters in the path name. And that includes the slashes. In order to keep the system somewhat familiar to most people, I decided to use the LCC system for the first couple of levels. The LCC system won't work for the whole thing because it was only designed to classify books. There just isn't enough room in the LCC numbering system for all the different very specific topics that are taught in classrooms today. But at least it brings some familiarity to the system right from the start. Anyone who has been in a library can quickly find the general area to start looking for the topic they need. And, after using this system for a while, people will also know where to find related books in the library with no cross referencing required. All they have to do is look at the first part of the DEMCS™ code.
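To give a feel for how a code can always leave room for a new sub-subject between two existing ones, here is a small Python sketch. It is not the DEMCS™ encoding, which is defined in the Classification part of the standard. It is one well-known way to get the same property, assumed here purely for illustration: each folder name is a string of digits read as a decimal fraction ("25" means 0.25), so a new name can always be found halfway between two neighbors. The function names and the example path are hypothetical.

```python
def code_between(low, high):
    """Return a digit string that sorts strictly between low and high.

    Codes are read as decimal fractions ("25" means 0.25) and must not
    end in "0", so plain string sorting matches numeric order.
    """
    width = max(len(low), len(high)) + 1   # one extra digit always leaves room
    lo = int(low.ljust(width, "0"))
    hi = int(high.ljust(width, "0"))
    if lo >= hi:
        raise ValueError("low must sort before high")
    mid = (lo + hi) // 2
    return str(mid).zfill(width).rstrip("0")


def make_path(levels):
    """Join the code for each level into a folder path."""
    return "/".join(levels)


new_code = code_between("2", "3")
print(new_code)                           # 25
print(code_between("2", new_code))        # 225

# Hypothetical path: LCC-style letters for the first two levels,
# then digit codes for the finer levels.
path = make_path(["Q", "QA", "5", new_code])
print(path, len(path))                    # Q/QA/5/25 9
```

Breaking an existing subject into new sub-subjects needs no trick at all in a scheme like this: you simply add one more level to the end of its path.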
The File Format
Right off the bat, I knew I wanted to use XML to mark up and store the data. I wanted people to be able to create content using technologies they already knew. It was also important that developers be able to use technologies they were already familiar with or it would be really hard to get them to write any programs. So, I decided that all of the actual content should be in HTML (or XHTML). People already know how to do that. Because I had never even seen XML before, I spent another month learning XML, sort of. I spent a couple of weeks trying to learn XML Schema after that. Not easy stuff, let me tell you. I learned enough to know that I can do what I need to do using these technologies. It will just take a while to finish things up.
I have spent a considerable amount of time working out the structure of the file format. With XML everything is nested within something else. I have quite a long list of what should be nested inside of what. A topic can have one or more facts with multiple explanations for any one of these facts. A topic can also have associated questions or problems with multiple hints, solutions, and answers for each one. But each of those could have multiple different explanations. Naturally, there is much more than can be described here. In the end, it becomes an incredibly rich resource of material on just this one topic. I call each of these parts of a topic an "item." Students can choose to use as few or as many of these items as they need to understand the topic.
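As a rough picture of that nesting, here is a Python sketch that builds one small topic with the standard library's xml.etree.ElementTree module. Every element and attribute name in it (topic, fact, explanation, question, hint, solution, answer, title, id, style) is a placeholder I made up for this example, since the real structure has not been finalized.

```python
import xml.etree.ElementTree as ET


def build_topic():
    """Build one topic with two facts and one question (names are placeholders)."""
    topic = ET.Element("topic", {"title": "Proof of the integral of the sine function"})

    # A fact can carry several explanations of the same thing.
    fact1 = ET.SubElement(topic, "fact", {"id": "f1"})
    ET.SubElement(fact1, "explanation", {"style": "symbolic"}).text = (
        "The derivative of -cos(x) is sin(x)."
    )
    ET.SubElement(fact1, "explanation", {"style": "verbal"}).text = (
        "Differentiating negative cosine gives back sine."
    )

    fact2 = ET.SubElement(topic, "fact", {"id": "f2"})
    ET.SubElement(fact2, "explanation", {"style": "symbolic"}).text = (
        "So the integral of sin(x) is -cos(x) + C."
    )

    # A question can carry hints, solutions, and answers.
    question = ET.SubElement(topic, "question", {"id": "q1"})
    question.text = "What is the integral of sin(x)?"
    ET.SubElement(question, "hint").text = "Which function has sin(x) as its derivative?"
    ET.SubElement(question, "solution").text = "Differentiate -cos(x) and compare."
    ET.SubElement(question, "answer").text = "-cos(x) + C"
    return topic


topic = build_topic()
print(ET.tostring(topic, encoding="unicode"))
print(len(topic.findall("fact")), "facts,",
      len(topic.findall("fact/explanation")), "explanations")  # 2 facts, 3 explanations
```

A program reading such a file could show a student only the first explanation of each fact and offer the others when that one does not click.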
As I have thought about this over the months, I keep thinking of new things that need to be incorporated. I decided that each item in each topic should have prerequisites so the students can know what other things to study before starting any topic. I realized that teachers would want to be able to indicate to students what they should learn for a course so I introduced the concept of an electronic syllabus. This can tell students not only what they need to learn but how well they need to learn it. (Each item in each topic has an assigned proficiency level.) Sometimes, it is easier to learn if material is presented in an orderly fashion, so I added in the concept of an electronic lesson plan that tells the student exactly which items to study for each topic and in what order. I have even figured out a way so that programmers can make use of spaced repetition to help students really learn the material. But, rather than focus on individual questions and answers, the programs can raise their focus to the concepts within a topic and the topic as a whole. Since each fact within each topic will eventually have dozens or hundreds of questions submitted, the software will be able to continuously ask different questions, all testing one basic fact. It goes on and on. I think I have incorporated just about everything that any educator could want into this system. But I need to consult with professional educators to make sure I am not leaving anything out.
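Here is a minimal Python sketch of what tracking spaced repetition per fact, rather than per flashcard, might look like. The class, its fields, and the rule of simply cycling through the question pool are assumptions for illustration only, not part of any finished design.

```python
from dataclasses import dataclass


@dataclass
class FactProgress:
    """Spaced-repetition state kept per fact, not per question."""
    fact_id: str
    questions: list   # pool of questions that all test this one fact
    box: int = 0      # Leitner-style box number for the whole fact
    asked: int = 0    # how many questions have been asked so far

    def next_question(self):
        # Cycle through the pool so the student rarely sees the same wording twice.
        question = self.questions[self.asked % len(self.questions)]
        self.asked += 1
        return question

    def record(self, correct, last_box=4):
        # The result moves the fact itself, whichever question was asked.
        self.box = min(self.box + 1, last_box) if correct else 0


progress = FactProgress("f1", [
    "What is the integral of sin(x)?",
    "Which function has sin(x) as its derivative?",
    "Differentiate -cos(x).",
])
print(progress.next_question())   # What is the integral of sin(x)?
progress.record(correct=True)
print(progress.next_question())   # Which function has sin(x) as its derivative?
print(progress.box)               # 1
```

Because the box belongs to the fact, a right answer to any one wording pushes the whole fact further out, and a wrong answer brings it back no matter which question tripped the student up.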
Current Status
DEMML is really an invention in progress. I have hundreds of pages of notes on how to design this system. I still need to meet with educators to get their feedback. Then I will finalize the structure and start designing the schema. That is going to be a pretty big chore. There are so many different ways to do the same things. I want to design it so that it is easiest for programmers to read it in and use it in their programs. I have figured out exactly how I will encode all of the branches of the vast hierarchical tree of knowledge but I haven't developed the actual trees for each of the major academic disciplines. Again, I need to meet with educators and their professional associations to work these out.
I have figured out exactly how I will distribute the content and how to handle communications between contributors, vetters, and students. Now I just need to build up a test system and make it all work. All this takes time. And, let's not forget, I am a student myself. Right now this is a part-time project, even though it is my full-time obsession. Hopefully, I will be able to get some help and get this thing moving pretty quickly.
Wish me luck.
References:
- Leitner Cardfile System
- Spaced Repetition
- http://en.wikipedia.org/wiki/Spaced_repetition
- http://www.supermemo.com/english/ol/background.html
- Spaced Repetition for Learning Concepts: A new neurobiological foundation for research and a computer-aided means of performing said research, by Grant S Robertson (http://www.ideationizing.com/2010/11/span.html)
A simple internet search will reveal hundreds of additional web sites discussing both of these topics.