State of Text-to-Speech

There were discussions about this in the old forums (lomsor.com restored thread), and I believe Battlescape hadn’t been announced at the time. Still, the feature would be just as useful in Battlescape as in the Infinity MMO. Perhaps it could be set as a stretch goal for the Kickstarter campaign.

There has been significant progress in the area of voice synthesis since Microsoft Sam. While the artificial nature may still be felt, it can be practically eliminated when adding voice modifying effects.

TTS could read information useful for gameplay purposes (awareness).
Advantages of using TTS include:

  • No need to hire a professional voice actor as a ship computer narrator
  • Reading player usernames
  • Reading default and custom names players give to their space ships
  • Reading default and custom ranks specified by players
  • Reading default and custom mission objectives
  • Relaying ship status
  • Reading chat and ingame messages
  • Players having the ability to customize their ship commander’s voice. Who wouldn’t want their own CABAL? (Samples extracted from Command & Conquer: Tiberian Sun, on YouTube.)

Example scenario: a player joins team “Clan” against team “Axis”, both custom faction names.

Text read by TTS:

Welcome back, Captain (custom rank) Sha-bai-yev (phonetic username).

In the last 37 hours (time passed since last login) the Axis (faction name) have conquered an extra 30% (custom amount) of our planet Beta (default planet name), up from 10%. The Khan (highest rank of Clan faction) is ordering all free units to defend Planet Beta’s starbase, ISS Infinity (custom name).

The Clan’s (other team name) Star Colonel (Custom rank), Figel Ratovsky (username), has promoted you to rank, Star Commander. Congratulations, Commander Shabeyev.
Star Colonel’s message reads: “Rusty-Duzel is AFK. I need you dude!” (chat message)

Distance to ISS Infinity, 500 kilometres. 300. 100. Now arriving. Local Axis forces consist of 10 fighters (count per ship type), 3 corvettes and a frigate.

Seventh fleet (custom fleet name) has arrived with IBS Ogre (custom ship name).

Star Colonel has a new mission for your squad:
Primary Objective: Destroy the main Mark III turret (component on ship) on FTD Tyrant (custom ship name).
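A rough sketch of how lines like the ones above could be assembled from game data before being handed to a TTS engine. Everything here is hypothetical: the function names, the pronunciation table and the data layout are made up for illustration and are not taken from any actual game code or TTS API.

```python
from string import Template

# Hypothetical sentence template; the fields come from player and faction data.
WELCOME = Template("Welcome back, $rank $spoken_name.")


def build_welcome(rank, username, pronunciations):
    """Fill the welcome line, preferring a phonetic spelling if one is stored."""
    spoken_name = pronunciations.get(username, username)
    return WELCOME.substitute(rank=rank, spoken_name=spoken_name)


def describe_forces(faction, counts):
    """Turn {ship type: count} into one sentence for the ship computer to read."""
    parts = []
    for ship_type, n in counts.items():
        # Naive pluralisation, good enough for a sketch.
        parts.append(f"a {ship_type}" if n == 1 else f"{n} {ship_type}s")
    if not parts:
        return f"No local {faction} forces detected."
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"Local {faction} forces consist of {listing}."


print(build_welcome("Captain", "Shabeyev", {"Shabeyev": "Sha-bai-yev"}))
print(describe_forces("Axis", {"fighter": 10, "corvette": 3, "frigate": 1}))
```

The resulting strings would then go to whichever TTS engine is used; that step is left out because it depends entirely on the engine.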

Plain IVONA (now a subsidiary of Amazon), Amy voice (British English):
https://drive.google.com/open?id=0B_GltoryokNZcHY5TWZHWl9aV2s&authuser=0

Mechanized voice, GoldWave expression evaluator function: wave(n)*cos(2*pi*f*t)+wave(n-(1/T)*x)*y:
https://drive.google.com/open?id=0B_GltoryokNZazE4RmRmRzBjWDQ&authuser=0

Note: These audio samples were created only for demonstration purposes and should not be reused anywhere else.
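For anyone without GoldWave, here is a minimal Python sketch of how I read the expression above: the first term multiplies each sample by a cosine at frequency f (ring modulation), and the second adds a copy delayed by x seconds and scaled by y. The values of f, x and y below are placeholders, since the ones used for the sample are not stated, and treating everything before the start of the clip as silence is an assumption.

```python
import math


def mechanize(samples, rate, f=50.0, x=0.01, y=0.5):
    """Apply wave(n)*cos(2*pi*f*t) + wave(n-(1/T)*x)*y to a list of samples.

    rate is the sampling rate in Hz, so T = 1/rate and (1/T)*x = x*rate samples.
    """
    delay = int(round(x * rate))  # delay in whole samples
    out = []
    for n, s in enumerate(samples):
        t = n / rate  # time of this sample in seconds
        ring = s * math.cos(2 * math.pi * f * t)
        # Assumption: samples before the start of the clip count as silence.
        delayed = samples[n - delay] if n >= delay else 0.0
        out.append(ring + delayed * y)
    return out
```

With f set to 0 the cosine is always 1, so only the delayed copy is added, which is a quick way to hear the two terms separately.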

Disadvantages:

  • Extra coding rather than requesting the voice actor to record samples
  • TTS may read out words we find obscene or silly… Dude!
  • Offline sample generation will consume extra resources (CPU, storage) on the player’s PC

It’s funny you bring this up. I just started playing with TTS to initially narrate our KS video. My impression so far is that it is NOT a replacement for quality voice acting. Also, the robotic nature of the voices added a comedic component to the narration that was not intended. I do agree TTS has made strides, but IMO it’s just not there yet.

It really depends on which TTS you use; the Microsoft one is pretty stock. The technology is not 100% there yet, but there are some professional solutions that do make it sound good. These guys are pretty good by today’s standards. I was interested in this years ago and sadly Hawking’s voice was not available back then…

/edit
Very nice find there @Pendrokar.

I have been using IVONA. I admit I am still quite noobish with TTS. Some of the voices sound great, but holistically, it’s just not good enough. There is so much that goes into quality voice acting etc.

I admit, some of the sample voices sound great, mix well, yet you still can’t get the voice inflections required when you want them. Some of the voices just don’t mix well at all.

It would be great to not have to deal with voice acting, but tbh, I feel it is still required to get the depth we want, at least for narration etc. I:B itself likely will not need a ton of VO.

I think the main problem is that TTS voices only have one tone and thus cannot sound human. A human voice adjusts its tone to suit the context, whereas the AI is always flat and is trying its best just to piece together a few words. To use this in a game, the content would actually need to be adjusted for the shortcomings of the TTS.

Also, why are you guys so self-conscious that you are considering using TTS for the KS pitch? I’m sure your US English is good enough, and you have some 10 guys on the team; one of you must have a decent voice.

Most guys that use TTS on Youtube for instance are crooks from God knows where and probably sound horrific enough for nobody to trust their scam…

That first TTS sample sounds pretty darned good to me. But only once. If I hear it again, it destroys its ability to add to the environment. And it will be repeated. Also, using TTS to announce chat messages is a clunky interaction. Other players still have to use their hands to type a message, and the TTS stuff is not going to communicate inflection, pace and myriad other cues that we rely on with the spoken word.

If Infinity:Battlescape is a multiplayer game, put your money in player voice communication. NPC speech is for solo gaming. I play games to interact with the other players. The very purpose of the game is to give me an entertaining environment in which I can interact with those players. Having the game drone on about something for the 15th time is not enhancing the environment. Quite the reverse.

So ultimately I want quality in the environmental sounds and the player voice communication because I want to interact with other players in a fun environment.

An example of a game that gets this right is ARMA 3, which is designed from the start to be a multiplayer game. An example of a game that gets this wrong is Diablo 3, which tries to straddle the fence of single and multiplayer gaming and gets them both wrong.

I don’t know about you guys, but I prefer good writing to voice acting in pretty much all cases. For example, they had a larger budget for Skyrim than for any previous Elder Scrolls game, but the dialogue, story and writing were so inane that I found it pointless/difficult to enjoy. In other words, good voice acting is pointless if the text that defines it reads like an essay a student wrote the day before its due date.

So, for a stretch goal, I plead for you guys to put more time/thought/money/week-long-Caribbean-workshop-for-the-team into writing that is actually good.

I actually wanted to bring this up two weeks ago, but not as narration for the KS video. It definitely has not gone that far.

I don’t believe IVONA offers modification of the tone. If it did, non-custom text could be fine-tuned. But even a simple feature like taking a breath is not included: try “011111111111111111111111”, where it repeats in the same tone. Though that could just have been an optimization.

As I said, a lot goes into voice acting. It’s not solely about the ability to speak good English and have a decent voice. Besides, there ofc will be parts of the KS video in which we speak, so if we are self conscious about it, too bad, we’ll have to get over it. But for our intro video, we’re definitely trending towards professional voice acting atm.

I’ve yet to hear one that wasn’t obviously synthesised. They just lack the simple stuff that the brain notices - correct timing, inflection, emphasis.

Interestingly, I didn’t notice anything unusual about the foreign speech in that Voice Reader Studio 15 video. The two English voices were obviously fake but I couldn’t tell the foreign ones weren’t real people.

I wonder what it’s like for people who understand a 2nd (or 3rd etc) language but don’t speak it regularly… can they tell the difference?

Probably because you’re not used to hearing the presented foreign languages.
Just as the English (both UK and US, I guess) was instantly categorized as “synthesised”, I had exactly the same thought about the French version. Moreover, the female voice does not articulate well enough (at least for the text they chose), and even as a Frenchman myself I had a hard time guessing what she was actually saying for two words. She also had some weird intonation…

As for the Russian, it sounded more “chopped” than the other languages somehow. Though I can’t say I speak it :smile:

Best choice IMO. There’s a reason not everybody’s a singer … and that commercial ads choose specific voices according to the emotions / stimuli they want you to feel.

Though I would love to get my hands on the same kind of AI as Kane acquired (rather creepy, though, if you remember the whole story), we’re still miles from anything close to that.
Your proposition is a very good one, but not feasible with current tech. Or not without a huge portion of players making fun of / being irritated by the chopped “one-tone” voices currently on the market.

This one-tone thing seems to have been noticed even in sound effects. Most developers solved it by changing the pitch a little each time the sound plays. I have noticed this being done even to voice samples (clicking on characters in the Total War series campaign maps). Others, starting with games like Half-Life 2 and Battlefield: Bad Company, modify played sounds based on distance and environment.
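To make the two tricks concrete, here is a small sketch: a random playback-rate factor within a fraction of a semitone, and a simple inverse-distance volume falloff. This is not how any of the games mentioned actually do it; the function names, the one-semitone range and the falloff rule are all assumptions.

```python
import random


def varied_pitch(rng, max_semitones=1.0):
    """Return a playback-rate factor within +/- max_semitones of the original."""
    semitones = rng.uniform(-max_semitones, max_semitones)
    # One semitone is a frequency ratio of 2^(1/12).
    return 2.0 ** (semitones / 12.0)


def distance_gain(distance_m, reference_m=1.0):
    """Inverse-distance falloff: full volume up to reference_m, then 1/distance."""
    return reference_m / max(distance_m, reference_m)


rng = random.Random(42)  # seeded only so the sketch is repeatable
for _ in range(3):
    print(round(varied_pitch(rng), 3), distance_gain(10.0))
```

The same clip played with a slightly different factor each time is what hides the repetition; the distance gain is the crudest possible version of environment-based modification.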

By the way, I have actually tried to recreate CABAL’s voice, but without directly contacting former Westwood employees I won’t get the same effect. It seems the effect is a mix of multiple samples with different pitches in the same timeframe, plus an added metallic effect. This is actually too heavy on the CPU to be done in real time. Anyway, here is what I got by using IVONA’s Brian British English voice:

Westwood’s CABAL Sample

IVONA’s Brian saying the same without added effects

My attempt to recreate the effect

For me Brian sounds good, except for the sudden high-pitched “frontLINE”.
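The layering guessed at above (several copies of the same sample at different pitches, mixed in the same timeframe) could be sketched roughly as follows. This is only a guess at the general idea, not Westwood’s method: the pitch factors are placeholders, the naive resampling changes duration along with pitch, and the metallic effect is not covered at all.

```python
def resample(samples, factor):
    """Naive pitch shift: read the samples `factor` times faster, interpolating linearly."""
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += factor
    return out


def layer_pitches(samples, factors=(1.0, 0.94, 1.06)):
    """Average several pitch-shifted copies, cut to the shortest copy."""
    layers = [resample(samples, f) for f in factors]
    length = min(len(layer) for layer in layers)
    return [sum(layer[n] for layer in layers) / len(layers) for n in range(length)]
```

A proper pitch shifter would keep the duration constant, which is a good part of why doing this well in real time is expensive.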

If some of you mean the content that the TTS would read, yes, they would be very similar. I haven’t looked at or read any computer-generated articles, sports articles especially. But such generators would probably be expensive and require heavy modifications.
Computer Generated Sports Article

What really matters is player and corp names, so when creating a name for either of those there would be a TTS window that allows players to fine-tune how they want the TTS to read their name. If I name myself “bob the miner” but TTS reads it as “boob the minor”, I can choose a different variation until it hits the intended one.

EDIT: I can see some players intentionally making it funny… xD
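A minimal sketch of what could sit behind such a window: the game stores the respelling the player picked and feeds that to the TTS instead of the written name. The class and method names are made up, and the respellings would have to come from the player or some candidate generator that is not shown here.

```python
class PronunciationBook:
    """Maps written names to the spoken form their owner picked (hypothetical)."""

    def __init__(self):
        self._chosen = {}

    def choose(self, name, spoken):
        """Remember the respelling selected in the TTS window."""
        self._chosen[name.casefold()] = spoken

    def spoken(self, name):
        """Text to hand to the TTS: the chosen respelling, else the name as written."""
        return self._chosen.get(name.casefold(), name)


book = PronunciationBook()
book.choose("bob the miner", "bobb the my-ner")
print(book.spoken("Bob the Miner"))
```

Names without a stored choice simply fall through to whatever the TTS makes of the raw text.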

I can see a lot of players failing to make it funny. I don’t particularly look forward to having the game repeatedly tell me “You have been targeted by coxcoxcoxcoxcox!”

Personally I would much prefer that TTS is not used for any story related text, I’d rather there be no voice acting there than TTS.

However, I would like to see it used in all private/guild chat and sidequests/missions. And of course there should be an option to turn it off, because it’ll most likely irritate some people.

Yeah I am definitely of the opinion that no TTS is better than bad TTS.

An option for visually impaired players? Sure. But definitely off by default.

I think it would be pertinent to this discussion if we all went and spent an hour or two playing Moonbase Alpha.

I watched about 20 seconds of the first YouTube hit on “Moonbase Alpha”. There are many ways to design something wrong, and the developers of Moonbase Alpha certainly found one. For those uninterested in going through that exercise, consider how long it takes to ignore a spoken sentence that you don’t want to hear versus a text message you don’t want to read. The spoken sentence just won’t go away.

I hope that I-Novae Studios will pursue Teamspeak integration so that voice communication is a standard part of gameplay. I am a gamer who wants to interact with other players through the game, so I have no interest in text-to-speech or even canned voice (i.e. voice acting). That said, I can imagine experiences in game that are scripted, and text-to-speech or canned voice seems appropriate there. I’ll just do my best to avoid them.

An example of how canned voice is not interesting to me is the “Jumping” announcement in EVE Online. The woman has a beautiful voice and the delivery is velvety smooth, but with so many jumps in the game the repetition got to me. I’d rather have the game fiddle with the sound effects so that they’re not always the same. That is, if the game world is procedural, then the sound effects can modulate according to the environment. A nearby star in a jump causes reverb to one element of the jump sound. Low fuel produces a rattle sound in another element. Starting a jump in a bendy jump tunnel plays with the volume. And so on.
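To show what I mean by modulating the jump sound from the environment, here is a toy mapping from game state to effect parameters. Every name, threshold and formula in it is invented for illustration; it only turns the three examples above into numbers a sound engine could consume.

```python
def clamp01(value):
    """Clamp a number into the range 0.0 to 1.0."""
    return max(0.0, min(1.0, value))


def jump_sound_params(star_distance_au, fuel_fraction, tunnel_curvature):
    """Map game state to effect strengths between 0 and 1 (all rules hypothetical)."""
    return {
        # The closer the star, the more reverb on one element of the jump sound.
        "reverb": clamp01(1.0 / (1.0 + star_distance_au)),
        # The rattle fades in once fuel drops below 25%.
        "rattle": clamp01((0.25 - fuel_fraction) / 0.25),
        # A bendy tunnel pulls the volume down by up to half.
        "volume": 1.0 - 0.5 * clamp01(tunnel_curvature),
    }


print(jump_sound_params(star_distance_au=1.0, fuel_fraction=0.125, tunnel_curvature=1.0))
```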

The canned voice in the X series is one of the greatest parts about it, in my opinion.

Speaking of TS integration, I’m not so sure. Most of the time, players already have their own server. And more importantly, how would you give management rights to players? Because the TS would be public, it would most likely be a happy mess at best, a troll nest at worst.

But well, as long as it is only an option and not compulsory, it’ll be fine.