Given what little I know about how to design a new programming language, what would this new language be? Since describing a programming language in detail can take an entire book, I will try to describe here the "main characteristics", i.e. the "vision", the "basic flavor" of this new programming language.
Officially, this new programming language doesn't yet have a name. "GePSyPL" (pronounced "JEPP-see-pill") is more of an internal project name (it's just an acronym of "GEneral-Purpose SYstems Programming Language"). I'm no marketing expert, but some candidate names for this new language might be:
- Nunskin
- TeamTractor
- SyntaxBride
- Positionish
Let's examine each of these potential names for this new programming language.
Why talk about the "skin" of a "nun" to name this language?
2.1) Good nuns love their students, not their paycheck. Nunskin doesn't care about getting tenure or publishing arcane papers. If the same thing can be explained with simpler words, then that's the terminology Nunskin will adopt, since students will understand faster and with less pain. Some things you'll just never hear coming out of the mouth of a good nun:
"Classes are themselves objects, and therefore are members (instances) of some other class. For example, Integer belongs to Integer class. Integer class is called a metaclass. All metaclasses belong to class Metaclass, which is the only instance of Metaclass class, which is itself an instance of Metaclass." (Raphael A. Finkel, Advanced Programming Language Design, 1996, p. 148.)
2.2) Good nuns teach good habits. Steve McConnell quotes Aristotle, with a little twist (Code Complete, 2004, p. 833): "The [programming] virtues, then, are engendered in us neither by nor contrary to nature [...] their full development in us is due to habit [...] Anything that we have to learn to do we learn by the actual doing of it [...] Men will become good builders as a result of building well and bad ones as the result of building badly [...] So it is a matter of no little importance what sort of habits we form from the earliest age -- it makes a vast difference, or rather all the difference in the world."
Nunskin has a very clear notion of good and evil, and intends to inculcate good programming habits. For example:
- Can you imagine a strict nun letting a "naked new" run around without
smacking its bottom? Would you dare leave garbage lying around with such
a nun watching you? (i.e. Nunskin encourages a clear understanding of who
owns what piece of memory);
- Can you imagine an old-fashioned nun letting you frolic with object-oriented
programming, while you still haven't mastered old-fashioned structured programming?
(For example, subprograms with single points of entry and exit, i.e. no multiple
"returns");
- Have you ever received a written note from a real old-fashioned nun? The
handwriting is perfect, there are no English mistakes, everything is well-documented.
Nunskin encourages good documentation, among other things with "sticky comments"
(comments immediately above something are automatically "harvested" as the
documentation for it) and "data islands" (which allow the source code to be a kind
of "public highway" for all other development tools), etc.
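The "sticky comments" idea can be illustrated with a small harvesting tool. What follows is only a rough sketch in Python, not Nunskin's actual implementation: the function name `harvest_sticky_comments` and the declaration shape (a line starting with "var" or "type", followed by a name) are assumptions made for the example. Only the "#" comment marker comes from this text.

```python
def harvest_sticky_comments(source):
    """Map each declared name to the comment lines immediately above it.

    Assumptions (hypothetical, for illustration only): a declaration is a
    line starting with "var" or "type" followed by a name, and a comment
    is a line starting with "#".
    """
    docs = {}
    pending = []  # comment lines seen since the last non-comment line
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            pending.append(line[1:].strip())
            continue
        words = line.split()
        if len(words) >= 2 and words[0] in ("var", "type") and pending:
            docs[words[1]] = " ".join(pending)
        # Any non-comment line (including a blank one) breaks the "stickiness".
        pending = []
    return docs


sample = """\
# Largest number of retries
# before giving up.
var maxRetries = 3

# This comment is followed by a blank line, so it does not stick.

var orphan = 0
"""

sticky_docs = harvest_sticky_comments(sample)
print(sticky_docs)
```

Here only `maxRetries` gets documentation, because the second comment is not immediately above a declaration.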
2.3) Nunskin is just a "skin" over C++. Good nuns look at your heart, and don't get hung up on your external appearances. The "heart" of C++ is good (i.e. the "conceptual structure", often called "the object model" of C++). So internally, Nunskin is just plain C++.
Nunskin's first implementation will also be a simple compiler that generates C++, which will then need to be compiled again with a C++ compiler to give an executable program. (Of course, eventually Nunskin will compile straight to machine code.) Why C++? On top of having a good object model, C++ is dirt cheap, widely-used, efficient, and employers cannot have "the rug pulled out from under their feet" if ever Nunskin stops being supported, because they can just push a button and continue coding in C++.
2.4) Good nuns just do little things with great love (Mother Teresa). Putting a simple "skin" over C++ is not exactly a revolutionary software engineering Big Bang. On the other hand, this is only an unfunded research project by a guy who is trying to dust off his programming skills, so aiming for a humble goal seems totally appropriate!
2.5) Honor thy Father and Mother. Today, we are just dwarfs sitting on the shoulders of Computer Science giants like Charles Babbage, John von Neumann, Donald Knuth, Edsger Dijkstra, C.A.R. Hoare, Niklaus Wirth, Dennis Ritchie, Ole-Johan Dahl and Kristen Nygaard, Barbara Liskov, Bjarne Stroustrup, Steve McConnell, John Lakos, etc. Nunskin has an "Easter Egg" mode in which every keyword and operator in the language can be replaced with the name of a famous computer scientist.
For example, "goto" can be replaced by "Edsger_Dijkstra" (author of "Go To Statement Considered Harmful", and inventor of the expression "structured programming"). "|>" (i.e. the inheritance symbol) can become "Ole_Johan_Dahl" or "Kristen_Nygaard", who were important for the development of object-oriented programming. "gepsypl" of course is "Charles_Babbage", one of the inventors of the computer. "++" is "Bjarne_Stroustrup", the inventor of C++. And so on.
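One simple way to picture the "Easter Egg" mode is as a lookup table applied to tokens before normal processing. The Python sketch below is just that, a sketch: the alias pairs are the ones listed above, but the function `decode_easter_eggs`, the whitespace-based tokenizing, and the sample line "Cat Ole_Johan_Dahl Animal" are assumptions for illustration.

```python
# Alias pairs taken from the examples in the text; the mechanism is assumed.
EASTER_EGG_ALIASES = {
    "Edsger_Dijkstra": "goto",
    "Ole_Johan_Dahl": "|>",
    "Kristen_Nygaard": "|>",
    "Charles_Babbage": "gepsypl",
    "Bjarne_Stroustrup": "++",
}


def decode_easter_eggs(line):
    """Replace each whitespace-separated alias with its keyword or operator.

    Note: this naive version collapses runs of whitespace, so a real tool
    would have to preserve indentation some other way.
    """
    tokens = line.split()
    return " ".join(EASTER_EGG_ALIASES.get(token, token) for token in tokens)


decoded = decode_easter_eggs("Cat Ole_Johan_Dahl Animal")
print(decoded)  # Cat |> Animal
```

Because two names map to the same inheritance symbol, the table goes from alias to keyword rather than the other way around.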
Why compare a programming language to a farm tractor? And tractors have only one steering wheel, so why a team?
3.1) A Ferrari or Porsche is elegant and refined. Farm tractors, on the contrary, are somewhat ugly and crude, but hard to break and easy to fix. The TeamTractor programming language is similar in many ways, for example:
- "memCopy[p%q]" is considered superior to "while (*q++ = *p++);" In other words, no
matter how elegant and sophisticated a programming language, the most terse (and
often most efficient) way to get something done is to just call the right subprogram
with the right parameters. It won't impress the girls, but usually gets the job done
faster and with fewer bugs.
- Instead of revving the "template metaprogramming" engine while stopped at
a red light, TeamTractor prefers to just cross the profitability finish line first.
- While others brag about how quickly they can return "tuple rvalue references",
TeamTractor programmers prefer waiting until they've figured out what exactly they
are supposed to return, and who owns it.
Etc.
3.2) Typical Ferrari owners don't encourage others to drive their car. A tractor, on the other hand, is usually a "communal vehicle": Dad, Mom, kids, neighbors, even temporarily hired workers will drive it, since they all have the same goal: a good farm. Similarly, professional software development is a team activity. Each individual has to make sacrifices for the common good (i.e. producing high-quality source code in this case).
3.3) Tractors are powerful, but dangerous. Every year, farmers are killed in tractor accidents. Sure, tricycles would be much safer, but farmers prefer to accept the risk in return for a powerful machine that gets the job done. In a similar way, TeamTractor is not some child-proof "toy" programming language.
3.4) Tractors are useless by themselves. There is nothing wrong with having a Ferrari or a Porsche all alone in your garage, but a tractor is useless without a plow, a disk harrow, a seeder, a manure spreader, etc. TeamTractor is useless in itself, but it can solve all your problems in five lines of code, if you connect the right modules!
One of the most striking features of C++ is just how bad its syntax is. Just about every rule for a clear language is broken by C++. If the underlying object model of C++ is the "soul", and the syntax of C++ is the "body", then the body and soul of C++ are divorced! (Actually, even "divorce" isn't a strong enough term, since many divorced couples manage to remain polite and civilized toward each other!)
This new language wants a happy and stable marriage between the "body" of the language (syntax, keywords, operators, everything visible in other words) and its "soul" (the underlying concepts). For example:
4.1) Significant whitespace. As far as reasonable, the logical structure is the physical formatting (indentation instead of "{}" or "BEGIN-END" pairs, carriage returns instead of noisewords like "do" or "then", etc.). For example:
    if someVariable '< SOME_MAX_LIMIT
        doThis[]
        doThat[]
        andDoThatAlso[]
    else
        someOtherVariable = 1 + 2
        keyboard requestFromUser
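To show that indentation alone carries the block structure, here is a rough Python sketch that puts the braces back, as a first step toward the C++ that the first implementation is supposed to generate. It is not the actual compiler: the function name, the fixed indent width of four spaces, and the sample statements are assumptions, and statements are copied verbatim instead of being translated.

```python
def indentation_to_braces(source, indent_width=4):
    """Emit the same lines with "{" and "}" marking each indented block.

    Only the block structure is converted; statements are copied verbatim.
    Assumes spaces only and a fixed indent width (both are assumptions).
    """
    output = []
    depth = 0
    for raw_line in source.splitlines():
        if not raw_line.strip():
            continue  # blank lines carry no structure
        stripped = raw_line.lstrip(" ")
        new_depth = (len(raw_line) - len(stripped)) // indent_width
        while depth < new_depth:  # a deeper line opens a block
            output.append(" " * indent_width * depth + "{")
            depth += 1
        while depth > new_depth:  # a shallower line closes blocks
            depth -= 1
            output.append(" " * indent_width * depth + "}")
        output.append(raw_line.rstrip())
    while depth > 0:  # close blocks still open at end of file
        depth -= 1
        output.append(" " * indent_width * depth + "}")
    return "\n".join(output)


example = """\
if someVariable '< SOME_MAX_LIMIT
    doThis[]
    doThat[]
else
    someOtherVariable = 1 + 2
"""

braced = indentation_to_braces(example)
print(braced)
```

The braces in the output add no information that the indentation did not already contain, which is the point of 4.1.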
4.2) ASCII metaphors. As far as reasonable, choose ASCII characters that actually look like the operations they mean. In other words, imagine you're a computer science teacher, and you have a classroom of kids in front of you, kids who've never programmed before in their lives, and you have to explain that concept. What little story would you tell them? What drawings would you make on the blackboard? A few suggestions:
- Comments, i.e. "#";
- Subprogram call, i.e. "mySubprogram[inputParam % outputParam]";
- Expressions, i.e. parentheses.
4.3) Keywords that are obvious, meaningful, consistent, etc. For example, "var" for "variable" instead of "auto" in C++, "import/export" instead of the C++ "include/public", etc.
4.4) "Public Highway" of software development tools. C++ is so painfully difficult to parse that software development tools tend to stay away from it. I'd like to do the exact opposite. I'd like SyntaxBride to be like a "public highway" of software development tools: so easy to parse that any tool can understand and collaborate with it. Class Browsers? No problem, a programmer can quickly whip up a tool that reads the source code and makes a list of all types in the program. "IntelliSense" or "code palette" tool? No problem, a programmer can read the source code and pop up a menu offering you the possible parameters for the subprogram you are trying to call, etc. UML diagrams? No problem, any programmer can parse the source code and find all the types and their relationships. Etc.
This is why we put so much effort into increasing the parsing speed (by improving the syntax): in SyntaxBride, the source code becomes a kind of "big database" for a large portion of all software development information, so it becomes very important to quickly seek, update, add, and delete any subset of a potentially huge program. A lot of careful database design must go into this new programming language.
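As a taste of how small such a "public highway" tool could be, here is a Python sketch of the class-browser case: it lists every type in a program. It assumes (and this is only an assumption, since the real syntax is not specified here) that a type declaration is a line whose first word is the "type" keyword and whose second word is the type's name.

```python
def list_types(source):
    """Return the names declared with a leading "type" keyword, in file order.

    Assumption: a type declaration is a line whose first word is "type" and
    whose second word is the type's name. The real syntax may differ.
    """
    names = []
    for line in source.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "type":
            names.append(words[1])
    return names


program = """\
# Hypothetical source text, for illustration only.
type Proton
    var charge = 1
type Electron
    var charge = 0 - 1
"""

type_names = list_types(program)
print(type_names)  # ['Proton', 'Electron']
```

No preprocessing, symbol table, or full grammar is needed for this job, which is the kind of tooling cost the "public highway" goal is aiming for.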
"Positionish" is like English and Spanish, but for the position in the source code. In Positionish, everything has a kind of "locational gravity". It's like a magic kitchen where the potato peeler is attracted to the same drawer every time, so any cook can walk into that kitchen and find things quickly. But how can we achieve this "locational gravity"?
5.1) Left to right and top to bottom. Because of the nature of writing (applying fine lines of pigment, usually suspended in a liquid which needs to dry) and the fact most people are right-handed, we're used to writing from left to right, and top to bottom (going the other way tends to smudge what we've just written). No programming language that I can think of goes against this convention (although nothing in the nature of computers would prevent it), and "Positionish" is no exception. But many programming languages have some constructs which go "against the flow", i.e. right to left. "Positionish" tries to keep things "left to right and top to bottom".
5.2) Genus before species. Our intellect naturally tends to go from the confused to the distinct. Aristotle's famous example goes something like: "Ah, something is coming. Now it's getting closer, so I can see this something is a man and not a horse or a bear. Now that it's even closer, I can see this man is Socrates." Because "Positionish" uses the left-to-right order, genus comes as much as possible before species. This rule applies for example to:
- Digraphs:
  - "#" means "comment", so "#(" means "start of multiline comment" and "#)"
    means "end of multiline comment";
  - "`" (the "back-tick") means "bit", so bit-wise logical operators are simply "`and",
    "`or", "`not", etc.
- Keywords: the usual keywords like "var", "type", "module", etc., are the "genus", they give us a general idea of the "species", the details that are coming next (like a variable declaration, with all the specific details like the name of the variable, its type, its initial value, etc.).
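Genus before species also makes life easy for a tool reading the source: the first character already says which family a token belongs to, and the rest only narrows it down. The Python sketch below illustrates this with the two digraph families mentioned above. The function name, the "single-line comment" label, and the "other" category are my own assumptions for the example.

```python
# Species within the "comment" genus, as described in the text.
MULTILINE_COMMENT_SPECIES = {
    "#(": "start of multiline comment",
    "#)": "end of multiline comment",
}


def classify_token(token):
    """Classify a token by reading it left to right: genus first, then species."""
    if token.startswith("#"):
        # Genus is known after one character; the second one picks the species.
        species = MULTILINE_COMMENT_SPECIES.get(token[:2], "single-line comment")
        return "comment", species
    if token.startswith("`"):
        # Back-tick means "bit"; what follows names the logical operator.
        return "bit", "bit-wise " + token[1:]
    return "other", token


for token in ["#(", "#)", "#", "`and", "`not", "var"]:
    print(token, "->", classify_token(token))
```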
5.3) Upside-down bricklaying. Building a brick wall is simple: you lay the first row of bricks, then you add mortar and lay the second row of bricks, and so on. Each successive row of bricks depends on the previous rows for its solidity. In "Positionish" it's the same, except the first row is at the top of the page (i.e. "upside-down", like the yoga guy in the picture), and the last rows are at the bottom (because of the usual writing and reading order).
So if you're writing a program that contains cats and dogs and protons and electrons and DNA and muscle cells, you'll start at the top by programming the protons and electrons, then you'll "lay the next row" by programming the DNA and muscle cells (which depend on protons and electrons), then you'll continue with another "row of source code" farther down the page, to program the cats and dogs (which depend on all the previous "rows of source code"). So the "heavy" and "complicated" and "important" things in your program naturally end up toward the bottom of the page (like the head of the yoga guy!).
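The bricklaying rule boils down to "nothing may depend on a row that comes later". A checker for that is short, as the Python sketch below suggests. Whether Positionish would enforce this rule or merely encourage it is not settled here, and the function name and the (name, dependencies) representation are assumptions for illustration.

```python
def find_misplaced_rows(rows):
    """Return (name, dependency) pairs where the dependency is not defined earlier.

    `rows` is a list of (name, dependencies) in top-to-bottom file order.
    """
    defined = set()
    problems = []
    for name, dependencies in rows:
        for dependency in dependencies:
            if dependency not in defined:
                problems.append((name, dependency))
        defined.add(name)
    return problems


well_laid = [
    ("Proton", []),
    ("Electron", []),
    ("MuscleCell", ["Proton", "Electron"]),
    ("Cat", ["MuscleCell"]),
]
badly_laid = [
    ("Cat", ["MuscleCell"]),
    ("MuscleCell", []),
]

print(find_misplaced_rows(well_laid))   # []
print(find_misplaced_rows(badly_laid))  # [('Cat', 'MuscleCell')]
```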
5.4) Oriented separators, not tag-value pairs. When organizing information, many languages use something called "tag-value" pairs. So if you want to transmit information about your shoes, you could have "color=black; size=10; style=classic". By their very nature, "tag-value" pairs normally don't have a fixed order. In "Positionish", the preferred method is what you might call an "oriented separator". For example, a subprogram or "function" has input and output parameters. So if we just decide once and for all that the input always comes before the output, we just need a separator (like the "%" symbol). Whatever precedes "%" is an input parameter, and whatever follows "%" is output. So "memCopy[p%q]" clearly copies something from "p" into "q". Same thing with the "export" keyword. We just set the convention that everything inside a type or a module is "private" by default, so anything coming after the "export" keyword is "public". The information content is the same as "tag-value" pairs, and no extra typing is required, but the position of things becomes far more predictable from one programmer to the next.
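To make the "oriented separator" concrete, here is a Python sketch that splits a subprogram call into its inputs and outputs using nothing but the position of "%". The function name `parse_call` is hypothetical, and so is the use of commas between several parameters on the same side, since the examples above only show one parameter per side.

```python
def parse_call(call_text):
    """Split "name[inputs % outputs]" into (name, inputs, outputs).

    Assumption: parameters on each side are separated by commas; the text
    only shows one parameter per side.
    """
    name, _, rest = call_text.partition("[")
    if not rest.endswith("]"):
        raise ValueError("call must end with ']'")
    inside = rest[:-1]
    # Everything before "%" is input, everything after it is output.
    before, _, after = inside.partition("%")
    inputs = [p.strip() for p in before.split(",") if p.strip()]
    outputs = [p.strip() for p in after.split(",") if p.strip()]
    return name.strip(), inputs, outputs


print(parse_call("memCopy[p%q]"))  # ('memCopy', ['p'], ['q'])
print(parse_call("doThis[]"))      # ('doThis', [], [])
```

No parameter needs an "in" or "out" tag: the direction is read off the position relative to the separator.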
5.5) Source code files are "closed". Some languages allow "spooky action at a distance". For example, in C++ a macro defined in one file can silently change the meaning of code you are looking at in another file. (A less-extreme example: namespaces can span several files, so looking at a definition of a namespace in one file doesn't tell you what else is in that namespace.) In Positionish, if something is in a file, then it's in that file, and all you need to understand that thing is in that file. (The filemap statement is one example, but the principle applies to many other elements in Positionish.)