What is Intelligence?
Before addressing the problems with the Turing Test, the notions of intelligence and thought must be clarified. Obviously I do not consider the definition of thought to be, precisely, that which the Turing Test is testing for. The Turing Test was designed to avoid having to define thought, which is clever, but unfortunate. It merely provides a condition that, if met, asserts that thought exists, while saying nothing for the case where the condition is not met.
Thought, unfortunately, is too elusive to define in any comfortable form for the purposes of this document. (I do have my own notions, but they are specific to a framework and philosophy whose concepts would be alien to other philosophies and frameworks.) The purpose of this document isn't to undermine the Turing Test on its own grounds, but rather to establish a metric that provides more incentive for machine intelligence, and across a broader spectrum of work. Further, I don't think it's useful to argue about whether thought is a prerequisite for intelligence, intelligence for thought, whether neither is prerequisite to the other, or whether they're one and the same.
Intelligence is a more agreeable term to define, or at least to state requirements for, and to highlight observable symptoms of whether or not those requirements have been met.
Perhaps intelligence is a result of evolution that has aided the survival of all species having it, or perhaps not. Regardless, given the language used when describing intelligence, or intelligent behaviour, I think that it's fair to consider intelligence a faculty of using acquired knowledge (whether explicit or implicit) and learned skills to solve problems presented to the entity in question. Intelligence, therefore, is not only the ability to store, relate, and use information, but also to adjust behaviour based on experience. While not every problem must be solved, some must -- inherently, any threat to survival is a problem that must be solved.
Finally, the Intelligence Quotient is commonly used as a measure of intelligence. Though it is often considered a poor metric, it remains more popular than the Turing Test in general, and it deals exclusively with solutions to problems presented to the testee. I do not propose an IQ test, however; I mention it merely to support the notion that problem-solving ability is core to intelligence.
The Problem with the Turing Test
The problem is that much of AI is concerned with a kind of software known as the chatterbot. A chatterbot is a conversational program, designed to simulate conversation with a human as convincingly as possible. Chatterbots largely fail to achieve thought or intelligence because it is easy to create a superficial algorithm that approximates the desired semblance without requiring either. As such, contestants for the Loebner Prize can pursue the reward without actually advancing research toward true machine intelligence.
This is not to say that there is no merit in chatterbots, nor that all chatterbots are designed without an approach to real machine intelligence. Many, however, are. Most of the ones I have personally interacted with are of this nature, though many of their authors will claim that their approach may yield intelligence. I won't name examples, since I'm not here to pick fights or disparage anyone's work.
Conventional approaches that work more towards semblance than actual intelligence may have more utility, at least at present. In fact, I'm quite confident that they do. The most stunning example of real machine intelligence that I've personally witnessed was a game with fictional animals. To the public, it was a form of entertainment.
On the other hand, the most useful applications of AI seem to be fuzzy logic circuits that can land a helicopter with a broken blade, or neural nets that can form their own heuristics with training. Neither of these is truly intelligent, but they do provide us with solutions to problems that have not been solved by other means.
Despite the relative lack of utility in genuinely intelligence-oriented directions at present, I hope that the true goal of AI is to arrive at genuine intelligence, artificially, as opposed to using the adjective "artificial" to degrade the notion of intelligence, lest AI become a derogatory term.
Finally, regardless of accuracy, the Turing Test, as applied in the Loebner Prize competitions, does not permit any work that cannot engage in conversation. This means that every entry must be capable of language, however intelligent it may be, thinking or not. This fact alone bars much of AI research, and I feel it biases almost all of the popularity towards a very small share of the work being done.
As previously mentioned, Turing's metric was cleverly constructed to be deliberately open-ended in its implied definition of the term "thought." I would like to extend this somewhat to intelligence, with a new focus. The adjustment is simple but effective: instead of judging how well software can behave in the semblance of a human, a superior test, in my opinion, measures how convincingly software can resemble an animal, in the general case. To clarify, I mean that the software should not be expected to exhibit the behaviour of any animal in specific, but to exhibit convincing animal behaviour.
Since cosmetics are problematic, and mostly irrelevant, artificial animals should be described by intermediaries, who report to judges. Additionally, real animals should be used as controls, and described similarly, and in similar detail. It is best if the complexity of behavior in the control animals roughly matches that of the artificial animal, so that an appropriate contrast can be made.
In competitions, where it is unfeasible to match each artificial animal to a real one, classes should be established, and each participant is free to choose the class in which to compete. Each class is worth only a limited number of points, regardless of how convincing the entry is. Additionally, points may be awarded according to the detail permitted by the entry (with an established minimum -- "you must be this detailed to ride") up to the maximum permitted by the class.
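The class-and-points scheme above can be sketched in code. This is only an illustrative sketch; the class names, point caps, detail minimum, and the particular scaling rule are all assumptions of mine, not part of the proposal itself.

```python
# Hypothetical sketch of the class-based point scheme described above.
# The class names, caps, and detail minimum are invented for illustration.

DETAIL_MINIMUM = 3          # "you must be this detailed to ride"

CLASS_CAPS = {              # maximum points available in each class
    "insect": 10,
    "reptile": 25,
    "mammal": 50,
}

def score_entry(entry_class: str, detail_level: int, convincingness: float) -> int:
    """Award points for an entry: detail earns points up to the class cap,
    scaled by how convincing the judges found the behaviour (0.0 to 1.0)."""
    if detail_level < DETAIL_MINIMUM:
        return 0                                 # below the required minimum
    cap = CLASS_CAPS[entry_class]
    raw = min(detail_level, cap)                 # detail counts only up to the cap
    return round(raw * max(0.0, min(1.0, convincingness)))

# An extremely detailed entry still cannot exceed its class cap:
print(score_entry("reptile", detail_level=30, convincingness=0.8))  # prints 20
```

The cap is what keeps a dazzling entry in an easy class from outscoring a modest entry in a hard one, which is the point of having classes at all.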
The reason that detail must be reported equally between control and artificial animals is simply that an extant organism in full detail has the advantage of being obvious, whereas an artificial animal will likely be missing cosmetics, particularly if it is purely software. Consider that full detail, in this sense, could mean simply forgoing the use of intermediaries. Additionally, fictional animals would automatically lose, unless contrasted against controls of which the judges have no knowledge.
Finally, because fictional animals should be permitted -- why not? -- the animals can never be named. This further removes unfair cues from judging: since the animals will be judged on animal behaviour in general, the judges will not attempt to fit each into a specific mold.
Complexity of Behaviour
Further elaboration is required on the use of the term "complexity of behaviour" in this context. It must be meaningful, and it must relate strongly to the notion of intelligence in order to be useful. For example, an intricate ritual that could easily be pre-programmed does not denote complexity of behaviour in this sense.
The behaviour that counts must be based on experience and must be a component of the solution to a problem, or, stated a little more naturally, of the achievement of some goal or desire. Unfortunately this is extremely difficult to define, but I think it is safe to leave it to the discretion of the judges; as with the Turing Test, the goal is to convince the judges rather than to meet an arbitrary technical requirement.
Since learning can be subtle and take time, especially in many animals, it's best to audition each artificial animal over a greater period of time than would be taken with conversational software. This provides both an advantage to truly intelligent works, and a severe disadvantage to superficial ones. Learning will show gradually over time, and in a realistic manner, where attempts at mere semblance are bound to betray their nature eventually.
Finally, interaction should be carried out by intermediaries or another party independent of the entry or judging. The auditions should not be permitted to run unmolested; rather, an environment should be provided that permits interaction, both to ensure that events are not scripted and to allow those interacting with the animals to provoke as much complex behaviour from them as possible. This also serves as a disadvantage to superficial approaches and an advantage to genuine ones.
I feel these guidelines are conducive to a healthy spirit of achievement in AI research. Not only do they better fit present technology, but they also provide a broader range of behaviour. On an aesthetic aside, what's cuter than baby animals learning about their environment? (Perhaps human babies, but that's usually punctuated by the mother yelling "don't!" during the interesting moments.)
These are just guidelines, however. I'd be happy with anything that adequately reflects the same spirit found here. Additionally, they're not meant to replace the Turing Test or the Loebner Prize; I wouldn't ask the authors of countless bots to give up their work. I could never compete in their field, and they probably feel the same way about mine. I confess that my original thought was to replace the existing tests, but after consideration, diversity is simply a much better idea.
Finally, I feel the need to point out that I've left a lot out. The guidelines I've described don't fully account for everything required in a competition. In particular, they do not settle scoring, but I hope I've provided enough detail to seed a complete set of guidelines. I also hope that I have demonstrated that the environment is not as rich as it ought to be.