During the initial stage of Qbo's software selection, I was able to survey the many tools and "standalone" projects for artificial vision, chatterbots, and speech recognition and synthesis available on the Internet. Some of these projects are even used by major research centres in their robots.
Each project contributed something different, but they all had one thing in common: they required prior training in order to work well.
Speech recognition engines require two kinds of input: a grammar file and an acoustic model trained beforehand on recordings of many different voices together with their text transcriptions.
Every word in the grammar file must already be covered by the acoustic model, which is a very difficult job for a small group of people, considering that there was no open-source English acoustic model available to the community large enough to cover the grammar our platform required.
To give you an idea, we would need hundreds of hours of transcribed speech from different users in order to build an acceptable acoustic model.
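To make the split between grammar and acoustic model more concrete, here is a minimal sketch using CMU PocketSphinx as one example of an open-source engine (not necessarily the one Qbo ends up using); the model directories and file names are hypothetical placeholders. The key point is that every word listed in the grammar must also be covered by the pronunciation dictionary and the trained acoustic model.

```python
# Minimal sketch, assuming CMU PocketSphinx and hypothetical model/file paths.
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')           # acoustic model, trained on many voices
config.set_string('-dict', 'model/qbo.dict')       # pronunciation dictionary
config.set_string('-jsgf', 'model/commands.gram')  # grammar: the phrases the robot accepts

decoder = Decoder(config)

# Decode a raw 16 kHz mono recording of the user speaking a phrase from the grammar.
decoder.start_utt()
with open('utterance.raw', 'rb') as audio:
    decoder.process_raw(audio.read(), False, True)
decoder.end_utt()

if decoder.hyp() is not None:
    print('Recognized:', decoder.hyp().hypstr)
```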
The same happened with chatterbots: there are many projects, but only a few have conversational databases large enough to hold an acceptable conversation, and most of them work simply by looking up a stored answer in their database, as the sketch below illustrates.
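This lookup-style behaviour is easy to picture. The following is purely illustrative (not any particular chatterbot's code, and the table name and schema are invented): the bot searches a stored question/answer table and falls back to a canned reply when the question is not in its database.

```python
import sqlite3

# Illustrative toy chatterbot that only knows what is stored in its database.
conn = sqlite3.connect('chatterbot.db')
conn.execute('CREATE TABLE IF NOT EXISTS replies (question TEXT PRIMARY KEY, answer TEXT)')
conn.execute("INSERT OR IGNORE INTO replies VALUES ('hello', 'Hello! How are you?')")
conn.commit()

def reply(user_input: str) -> str:
    row = conn.execute(
        'SELECT answer FROM replies WHERE question = ?',
        (user_input.strip().lower(),),
    ).fetchone()
    # If the question was never stored, the bot has nothing better to say.
    return row[0] if row else "Sorry, I don't know how to answer that yet."

print(reply('Hello'))                       # -> Hello! How are you?
print(reply('What is the weather like?'))   # -> fallback reply
```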
In 1950, Alan Turing proposed a test to determine whether a machine exhibits intelligence, and 60 years later no machine has passed it. In 1990, the Loebner Prize was created, offering $100,000 to the first computer programme able to deceive a human jury, something that has not been achieved yet. Ray Kurzweil predicts it will happen in 2029.
So, at the moment the community has suitable tools, but they lack the external support needed to make them really powerful. Maybe that is because nobody will sit in front of a computer with a microphone, read out sentences and upload them to the Internet without knowing what will come of that work. Or perhaps these tools simply do not give the community enough incentive to help them grow.
A very clear example of this is the language translator of the all-powerful Google. Nobody doubts that Google has engineers and mathematicians on this project generating and testing all kinds of approximation algorithms, neural networks, and natural language and semantic systems. Yet, despite some successful results, they are still far from offering a "natural", fully comprehensible translation. So what was Google's latest step to improve its translation system? Asking users for help: Google proposes a translation and the user tries to improve it. That is the point of this post: "sharing is necessary to improve and grow".
What would happen if we integrated all these open-source developments, a speech synthesizer, a speech recognition engine, a chatterbot and an artificial vision system, into a platform able to interact with the outside world, while at the same time each platform shared its data with the rest?
Imagine, for example, that one platform builds and stores a complete map of the Louvre Museum using artificial vision (SLAM) and shares that map with other platforms through a centralized system. From that moment on, any new platform placed in that environment would already have a ready-made map and could navigate it.
Now imagine thousands of platforms sharing all kinds of data, the conversations stored by the chatterbot, speech transcriptions, objects, maps, images, gestures… all of it updated in real time thanks to neural networks and learning algorithms.
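To show what that sharing could look like in practice, here is a purely illustrative sketch: one platform pushes its SLAM map to a hypothetical central service and a newly deployed platform later fetches it. The server URL, endpoints and payload format are invented for illustration; this is not an existing Qbo service.

```python
import requests

SERVER = 'https://example.org/qbo-cloud'   # hypothetical centralized system

def share_map(environment: str, map_file: str) -> None:
    """Upload a locally built SLAM map so other platforms can reuse it."""
    with open(map_file, 'rb') as f:
        requests.post(f'{SERVER}/maps/{environment}', data=f.read(), timeout=30)

def fetch_map(environment: str, destination: str) -> bool:
    """Download a previously shared map of the environment, if one exists."""
    resp = requests.get(f'{SERVER}/maps/{environment}', timeout=30)
    if resp.status_code != 200:
        return False
    with open(destination, 'wb') as f:
        f.write(resp.content)
    return True

# One robot shares its map of the museum; a newly placed robot reuses it.
share_map('louvre', 'louvre_slam.map')
if fetch_map('louvre', 'local_copy.map'):
    print('Map available: the new platform can navigate immediately.')
```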