Deep learning has provided state-of-the-art LLM models capable of generating answers in a specific dialog context, having a desired speech tone that creates an impression of a personality to the user, thus manifesting a specific character and attitude. Trustworthiness of responses can be further increased when the system implements retrieval augmented generation (RAG), a method where relevant and accurate information found in documents is provided to the input prompt of a pre-trained transformer. The RAG method alleviates the problem of confabulation, where a response by the model has high confidence but it is inaccurate or undesired, since the probabilistic behavior is restricted to follow the content of these documents. Accurate automatic speech recognition (ASR) and realistic text-to-speech (TTS) synthesis are now available as a ready-to-use cloud service, which can facilitate verbal communication between the system and the user. Using a rapid software development (RAD) tool, like the Unity development platform, we can animate 3D avatars and sync the movement of lips with the generated voice, as part of an augmented reality (AR) experience.
Blueavatars’ VirtualZeus is a prototype mobile application of an HCI for a chatbot, where the user can ask questions to a virtual character from the Greek mythology, incarnated through the combination of computer animation and augmented reality with spoken natural language processing and generation. The ancient god Zeus has been chosen as a testbed for the four aspects of the project’s implementation that are visualization, verbal interaction, information retrieval and natural language generation. The modules in the AI tier of the architecture use the Google Cloud Speech-To-Text API and the ElevenLabs Text-To-Speech generative models, while a locally deployed LLM enables robust English and Greek generation. This requirement is satisfied by Llama Krikri, a lightweight model that was created by ISLP by fine-tuning Meta’s Llama 3.1 8B on Greek corpora. The 3D Avatar of Zeus was created with Blender, where shape keys have been designed to allow the lips to move according to the desired visemes, the facial characteristic that many phonemes may potentially share. The prototype is an Android application, which uses the ARCore component in Unity to project the face of Zeus in the environment that is captured by the device camera.
The user speaks to the microphone to prompt a locally deployed LLM, and hears the response while the face of VirtualZeus is animated. The virtual persona was designed through extensive prompt engineering, where a role was specified and guided with a list of additional instructions. The RAG system has a collection of documents on Greek mythology that are composed by human experts, and contain information like names, relationships, genealogy, places and events, mostly from the works of Hesiod and Homer. A limited evaluation of the system revealed that most of the users prefer the HCI demonstrated by VirtualZeus that resembles the natural dialog between humans, instead of a text-based chatbot UX. Additionally, 78% of the users found the answers given on Greek mythology accurate.