State of Text-to-Speech

#14

I can see a lot of players failing to make it funny. I don’t particularly look forward to having the game repeatedly tell me “You have been targeted by coxcoxcoxcoxcox!”

#15

Personally, I would much prefer that TTS not be used for any story-related text; I’d rather have no voice acting there than TTS.

However, I would like to see it used in all private/guild chat and in side quests/missions. And, of course, there should be an option to turn it off, because it’ll most likely irritate some people.

#16

Yeah I am definitely of the opinion that no TTS is better than bad TTS.

An option for visually impaired players? Sure. But definitely off by default.

#17

I think it would be pertinent to this discussion if we all went and spent an hour or two playing Moonbase Alpha.

#18

I watched about 20 seconds of the first YouTube hit on “Moonbase Alpha”. There are many ways to design something wrong, and the developers of Moonbase Alpha certainly found one. For those uninterested in going through that exercise, consider how long it takes to ignore a spoken sentence that you don’t want to hear versus a text message you don’t want to read. The spoken sentence just won’t go away.

I hope that I-Novae Studios will pursue Teamspeak integration so that voice communication is a standard part of gameplay. I am a gamer who wants to interact with other players through the game, so I have no interest in text-to-speech or even canned voice (i.e. voice acting). That said, I can imagine experiences in game that are scripted, and text-to-speech or canned voice seems appropriate there. I’ll just do my best to avoid them.

An example of how canned voice is not interesting to me is the “Jumping” announcement in EVE Online. The woman has a beautiful voice and the delivery is velvety smooth, but with so many jumps in the game the repetition got to me. I’d rather have the game fiddle with the sound effects so that they’re not always the same. That is, if the game world is procedural, then the sound effects can modulate according to the environment. A nearby star in a jump causes reverb to one element of the jump sound. Low fuel produces a rattle sound in another element. Starting a jump in a bendy jump tunnel plays with the volume. And so on.
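The idea of modulating one jump sound from the game state, instead of replaying a fixed clip, could be sketched roughly as below. Everything here is hypothetical: the `JumpState` fields, the effect parameter names and the thresholds are invented for illustration and do not come from Infinity Battlescape or EVE Online.

```python
from dataclasses import dataclass


@dataclass
class JumpState:
    # Hypothetical inputs describing the situation at the moment of the jump.
    distance_to_star_km: float  # distance to the nearest star
    fuel_fraction: float        # 0.0 (empty) to 1.0 (full)
    tunnel_curvature: float     # 0.0 (straight) to 1.0 (very bendy)


def jump_sound_params(state: JumpState) -> dict:
    """Map game state to per-element effect settings for a single jump sound."""
    # A nearby star adds reverb to one element; it fades out by 1,000,000 km.
    reverb = max(0.0, 1.0 - state.distance_to_star_km / 1_000_000)
    # A rattle layer fades in once fuel drops below 25 %.
    rattle = max(0.0, (0.25 - state.fuel_fraction) / 0.25)
    # A bendy tunnel pulls the volume of another element down toward 0.5.
    volume = 1.0 - 0.5 * min(1.0, max(0.0, state.tunnel_curvature))
    return {"reverb_wet": reverb, "rattle_gain": rattle, "core_volume": volume}
```

Because every parameter is a continuous function of the state, two jumps only sound identical when their circumstances are identical, which is the kind of variety the post asks for.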

#19

The canned voice in the X series is one of the greatest parts of it, in my opinion.

#20

Speaking of TS integration, I’m not so sure. Most of the time, players already have their own server. And more importantly, how would you give management rights to players? Because the TS server would be public, it would most likely be a happy mess at best and a troll nest at worst.

But well, as long as it is only an option and not compulsory, it’ll be fine.

#21

I think he was talking more about integrating the voice tech, not the dedicated standalone server stuff. Think voice chat in Counter-Strike, Battlefield, or Call of Duty. See here:

http://teamspeak.com/?page=teamspeak3sdk

#22

He was. It may be that I-Novae Studios is relying on the fact that players can modify the game to permit stuff like Teamspeak integration. For example, ARMA 3 has multiple mods that use Teamspeak to implement various radios. In one mod the short range radio has fairly low quality audio, while the longer range (and much bulkier) radio has better quality audio. There are mic-press chirps and everything. I’m fairly certain that game sounds can also be sent through the Teamspeak channels to indicate background activity to the receiver.
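For anyone curious how a mod might make a short-range radio sound worse than a long-range one, here is a rough sketch of one common trick: sample-and-hold downsampling plus coarse quantization. It is not taken from any ARMA 3 mod; `degrade_radio` and the two presets are hypothetical.

```python
def degrade_radio(samples: list[float], keep_every: int, levels: int) -> list[float]:
    """Crude 'cheap radio' effect on float samples in [-1.0, 1.0].

    keep_every: hold each kept sample this many times (lower effective sample rate).
    levels: quantization steps per polarity (fewer steps = grainier audio).
    """
    out = []
    held = 0.0
    for i, s in enumerate(samples):
        if i % keep_every == 0:
            # Snap the new sample to a coarse grid and hold it.
            held = round(s * levels) / levels
        out.append(held)
    return out


# Hypothetical presets: a small handset versus a bulkier, clearer long-range set.
SHORT_RANGE = {"keep_every": 4, "levels": 8}
LONG_RANGE = {"keep_every": 1, "levels": 128}
```

Usage: `degrade_radio(voice_samples, **SHORT_RANGE)` before the audio is handed to the voice channel.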

#23

It’s called positional audio; there’s a TeamSpeak plugin to enable it, and Mumble has the feature by default, IIRC. It’s really great for games like ARMA: the game basically tells TeamSpeak where the person is, what effects to add to their voice, etc. Though I’m not sure how useful it would be for a game like Infinity; surely realistic voice carry distance isn’t so important in the vacuum of space.
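As a rough illustration of what "the game tells TeamSpeak where the person is" boils down to, this sketch turns two positions into a volume and a stereo pan. The inverse-distance rolloff, the 2D simplification and the parameter names are assumptions; this is not the TeamSpeak or Mumble plugin API.

```python
import math


def positional_gain_pan(listener, facing_rad, speaker, ref_dist=1.0, max_dist=100.0):
    """Return (gain, pan) for a voice heard from `speaker` by `listener` in 2D.

    gain: inverse-distance rolloff, 1.0 inside ref_dist, 0.0 at or beyond max_dist.
    pan: -1.0 is fully left, 1.0 fully right, relative to the listener's facing.
    Positions are (x, y) tuples; facing_rad is measured counter-clockwise from +x.
    """
    dx = speaker[0] - listener[0]
    dy = speaker[1] - listener[1]
    dist = math.hypot(dx, dy)
    if dist >= max_dist:
        return 0.0, 0.0
    gain = ref_dist / max(dist, ref_dist)
    # Counter-clockwise from the facing direction is the listener's left side.
    rel = math.atan2(dy, dx) - facing_rad
    pan = -math.sin(rel)
    return gain, pan
```

For a space game, `max_dist` could stand for a radio's range rather than a realistic voice-carry distance.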

#24

The important bit for me is that the sounds the pilot hears are also heard by anyone the pilot communicates with. Other players would hear the pilot’s background chatter on other communications channels, warning claxons, impacts, weapon fire, possibly simulated explosion sounds, scanner noises, atmospheric noises, etc.
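Mixing game audio into what other players hear could be as simple as adding a scaled copy of the cockpit sounds to the microphone signal before it is encoded and sent. A minimal sketch, assuming both streams are already float samples at the same rate; `mix_for_transmission` and `game_gain` are hypothetical names, not part of the TeamSpeak SDK.

```python
def mix_for_transmission(mic, game, game_gain=0.3):
    """Mix cockpit/game audio underneath the pilot's microphone signal.

    Both inputs are sequences of float samples in [-1.0, 1.0] at the same
    sample rate. A shorter buffer counts as silence once it runs out, and
    each summed sample is hard-clipped back into [-1.0, 1.0].
    """
    out = []
    for i in range(max(len(mic), len(game))):
        m = mic[i] if i < len(mic) else 0.0
        g = game[i] if i < len(game) else 0.0
        out.append(max(-1.0, min(1.0, m + game_gain * g)))
    return out
```

Keeping `game_gain` well below 1.0 lets claxons and impacts come through without drowning out the pilot's voice.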

Using a proper microphone with a separate Teamspeak server means that ambient noises (such as people yelling in the background) are not heard by others. But that also means that game noises are not heard.

It could be argued that a space-faring society would have the pilot using a proper microphone so that the ambient noises of combat wouldn’t interfere with their critical communications. I would say that gameplay invites keeping those ambient noises.

#25

AEIOU!

(John Madden)

#26

Right, time for some nightmares because that crap is going to stick in my head for a long time.

#27

Of course, not by default! :smiley:

[edit] Though it seems IVONA does a bit better than Sam: given ‘???’, it reads only one. But given ‘uuuuuuuu’, it pronounces every vowel individually, in a stuttering repeat that never pauses for breath… ever.
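Both the "coxcoxcox" worry earlier in the thread and the stuttering over ‘uuuuuuuu’ suggest cleaning player-supplied text before it reaches any TTS engine. A minimal sketch, assuming a plain regex pass is good enough; `tame_repeats` and its `keep` parameter are hypothetical, and real engines have their own text-normalization front ends.

```python
import re


def tame_repeats(text: str, keep: int = 2) -> str:
    """Limit spammy repetition before text is sent to a TTS engine.

    1. A multi-letter chunk repeated more than `keep` times in a row
       ("coxcoxcoxcox") is cut down to `keep` copies ("coxcox").
    2. A single character repeated more than `keep` times in a row
       ("uuuuuuuu", "!!!!") is cut down to `keep` copies ("uu", "!!").
    """
    chunk = re.compile(r"(\w{2,}?)\1{%d,}" % keep)
    text = chunk.sub(lambda m: m.group(1) * keep, text)
    char = re.compile(r"(.)\1{%d,}" % keep)
    return char.sub(lambda m: m.group(1) * keep, text)
```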

#28

*in a kill streak voice*

#ULTRA BUMP!

#29

I use IVONA TTS to read text back to me on Windows. Indian English is my favorite, and a lot more TTS companies now offer a version of Indian English. I just really like the female Indian English accent.

#30

Yeah, but as Flavien points out:

The issue is that IVONA (now part of Amazon) and others offer TTS as a service. This means that audio already synthesized for one client would have to be saved to I-Novae Studios’ Azure servers to avoid paying a few cents for the same conversion over and over.
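The point above amounts to a cache: synthesize each distinct (voice, text) pair once, store the audio, and serve repeats from storage. A minimal sketch, with local disk standing in for the Azure storage mentioned; `TTSCache` and the `synthesize` callback are hypothetical placeholders, not IVONA's or Amazon's API.

```python
import hashlib
from pathlib import Path
from typing import Callable


class TTSCache:
    """Store synthesized audio on disk, keyed by a hash of voice + text.

    `synthesize(voice, text) -> bytes` stands in for the paid TTS service.
    """

    def __init__(self, cache_dir: Path, synthesize: Callable[[str, str], bytes]):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize
        self.calls = 0  # how many times the (paid) service was actually used

    def _path(self, voice: str, text: str) -> Path:
        key = hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.audio"

    def get(self, voice: str, text: str) -> bytes:
        path = self._path(voice, text)
        if path.exists():
            return path.read_bytes()  # cache hit: no charge
        audio = self.synthesize(voice, text)  # cache miss: pay once
        self.calls += 1
        path.write_bytes(audio)
        return audio
```

This only pays off for lines that recur, such as fixed notifications; free-form player chat would still hit the service for nearly every message.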

#31

Seems we will soon have a way to do this client-side.
https://15.ai/
DeepThroat (/ˈdēpˌTHrōt/): Natural emotive high-fidelity text-to-speech synthesis with minimal viable data

Warning: Includes swearing/cursing

Not dubbed by the voice actors of Team Fortress 2.

Another example… the little ponies skit:
https://www.youtube.com/watch?v=Fj6dufqpQOw

P.S. I do find the Big Chungus meme funny.
https://knowyourmeme.com/memes/big-chungus

#32

So I tinkered with an open-source TTS engine called MaryTTS for several hours and wrote a Python script that just reads the log file for text notifications.


There is no need to compile MaryTTS; a Windows release already exists (marytts-installer-5.2.zip), in which you just run the marytts.bat file to start the MaryTTS web server locally.

Download link for the log reader executable, which watches the log for changes, extracts the notification lines, and sends them to the local MaryTTS web server:
https://1drv.ms/u/s!AuEqVKK0eUKDhuQIs5ommY8AI_Alig?e=fpbLFq (SHA-256 hash = 09C3B8E6EAC3C26C08EA6064E1AF03390114FCE641A19AA35086BC3001B6049E)
Source code:
https://gist.github.com/Pendrokar/9311de9c2be9e51c608372637d4e8583
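For anyone wanting to build something similar, the core loop of such a log reader might look like the sketch below. It is not the code from the gist above: the `[Notification]` marker is a hypothetical stand-in for however the game actually tags notification lines, and playback through `winsound` limits it to Windows. The request parameters follow the MaryTTS 5.x HTTP server's `/process` interface on its default port 59125.

```python
import time
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Iterator, Optional

# MaryTTS 5.x serves HTTP on port 59125 by default.
MARY_URL = "http://localhost:59125/process"
# Hypothetical tag; check the real log to see how notifications are marked.
MARKER = "[Notification]"


def mary_request_url(text: str, locale: str = "en_US") -> str:
    """Build a MaryTTS /process URL that returns `text` as a WAV file."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    return MARY_URL + "?" + urllib.parse.urlencode(params)


def extract_notification(line: str, marker: str = MARKER) -> Optional[str]:
    """Return the text after `marker`, or None if the line has no notification."""
    if marker not in line:
        return None
    text = line.split(marker, 1)[1].strip()
    return text or None


def follow(path: Path, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield complete lines appended to `path` after it is opened (like tail -f)."""
    with path.open("r", encoding="utf-8", errors="replace") as f:
        f.seek(0, 2)  # jump to the end so old notifications are not read aloud
        pending = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_seconds)
                continue
            pending += chunk
            if pending.endswith("\n"):  # only speak fully written lines
                yield pending
                pending = ""


def main(log_path: Path) -> None:
    import winsound  # Windows-only standard module, used here for playback

    for line in follow(log_path):
        text = extract_notification(line)
        if text is None:
            continue
        with urllib.request.urlopen(mary_request_url(text)) as resp:
            wav = resp.read()
        winsound.PlaySound(wav, winsound.SND_MEMORY)  # blocks until done
```

Usage: start the MaryTTS server first, then call `main()` with the path of the newest log file in the game's Logs folder.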

Demonstration video:



Installation steps:

  1. Unpack the MaryTTS installer and run its marytts.bat file, which starts the TTS engine and provides an interface accessible through a web browser.
  2. Download and move the executable file to Documents\I-Novae Studios\Infinity Battlescape\Logs
    (Valid path in File Explorer)
  3. Launch Infinity Battlescape, so that a new log file is created
  4. Run the exe file
  5. Play Battlescape
  6. You can adjust the TTS volume through the Windows Volume Mixer

The voice may sound more robotic due to two randomized parameters I add to it, so that the same line is never read in the same manner.
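Which two parameters the script randomizes is not stated here. One way to get a similar effect, sketched under the assumption that the engine accepts SSML input (MaryTTS does have an SSML input type), is to jitter the speaking rate and pitch per line. `varied_ssml` and its ranges are hypothetical.

```python
import random
from xml.sax.saxutils import escape


def varied_ssml(text: str, rng: random.Random, lang: str = "en-US") -> str:
    """Wrap `text` in SSML with a randomly jittered speaking rate and pitch.

    Two random parameters per line mean repeated notifications rarely sound
    identical. The ranges are arbitrary, and whether an engine honours these
    SSML 1.0-style relative <prosody> values is an assumption to verify.
    """
    rate = rng.randint(-10, 15)   # percent change in speaking rate
    pitch = rng.randint(-10, 10)  # percent change in pitch
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<prosody rate="{rate:+d}%" pitch="{pitch:+d}%">{escape(text)}</prosody>'
        "</speak>"
    )
```

Passing a seeded `random.Random` makes the variation reproducible when debugging.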

But this is what an open-source solution can already do. I used the plain-text input option, but there are other options that allow attaching certain emotions to phrases. That would help, since friendly fire and critical hits are currently read just as slowly and calmly as everything else.
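Rather than MaryTTS's emotion-specific input options, this sketch swaps in plain SSML prosody to show the idea: detect urgent messages and read them faster, higher and louder. The phrase list comes from the post, but the substring matching and the prosody values are hypothetical choices.

```python
from xml.sax.saxutils import escape

# Phrases the post mentions; simple substring matching is a hypothetical rule.
URGENT_PHRASES = ("friendly fire", "critical hit")


def prosody_for(text: str) -> dict:
    """Pick SSML prosody settings: urgent lines are faster, higher and louder."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in URGENT_PHRASES):
        return {"rate": "+25%", "pitch": "+15%", "volume": "loud"}
    return {"rate": "+0%", "pitch": "+0%", "volume": "medium"}


def to_ssml(text: str, lang: str = "en-US") -> str:
    """Build an SSML document that reads `text` with category-based prosody."""
    attrs = " ".join(f'{name}="{value}"' for name, value in prosody_for(text).items())
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f"<prosody {attrs}>{escape(text)}</prosody></speak>"
    )
```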

#33

Went through some of the TTS and AI voice actor services. Sadly, all AI-enhanced TTS services still offer the same type of deal: pay to have their cloud generate an audio file. Even though most of their costs go into training the neural network, that network just makes audio files from text input. There is no reason to have a supercomputer generate the files, other than to keep control. :unamused:

While the following is impressive, it cannot be mass-produced:

Even if these NPCs had simple backgrounds: