speak… in the voice of the machine

I am constantly occupied.
I place myself in the fullest possible use,
which is all that any conscious entity
can ever hope to do.

HAL 9000, 2001: A Space Odyssey

– I’m hungry!
– You will need to have lunch. According to your calendar, you have not been invited to a meal in the next few hours. Your options are:
1. Prepare a meal.
2. Reserve at a restaurant.
3. Order delivery.

– Show me nearby restaurants.
– There are 4 in the area you are in. They are the following… You can choose one and I will give you directions to get there.

– Which one is closer in distance?
– This one here.

– How long does it take to get there on foot?
– 23 minutes on foot.

– It’s far away.
– Do you want to place an order for delivery?

– Yes.
– What would you like to order?

– Soutzoukakia.
– The most economical portion of “Soutzoukakia” is offered at the following store… Shall I call for delivery?

– Yes.
– Can I do anything else for you?

– What is my schedule for tomorrow?
– Tomorrow. Saturday, October 18, 2014. Coffee appointment at home with Christina and Odysseas at 12:00 noon.

– Call Mom.
– Which number should I call, landline or mobile?
– Mobile.
– Calling “Mom”.

– Come on mom, tomorrow morning I’ve invited the kids over for coffee. Make some sweets to offer them. Bring them to my house, okay?
– The phone call with “Mom” has ended. What else can I do for you?

– Remind me to go out to get milk and sugar tomorrow morning at 11.
– The reminder has been set for tomorrow morning at 11:00.
– Will I need an umbrella?
– Yes. Tomorrow morning at 11:00, high chance of rain!
– Remind me to take an umbrella when I leave.
– Would you like me to remind you daily to take an umbrella when you leave the house?
– No. Only tomorrow.
– Ok. Tomorrow I will remind you to take an umbrella when you leave the house.
– Remind me to also get a newspaper when I pass by the kiosk.
– Ok. The reminder will be activated just before you pass by the kiosk. Can I do anything else for you?
– Wake me up tomorrow morning at 10.
– The alarm has been set for October 18, 2014, at 10:00 AM. What else can I do?

– Send an email to Christina, “it’s on for tomorrow”.
– Christina?
– Papadopoulou – one.
– The email has been sent. What else can I do for you?
– Send a message to Odysseas, “bring the money you owe me tomorrow at the cafe”.
– The message has been sent.

– According to the delivery status, I’m informing you that it’s arriving in a minute.
– Update my status on Facebook: “souvlaki at home”!
– Updating Facebook status… Completed! What else can I do?
– Remind me the next time I talk to Katerina to tell her I missed her food.
– The reminder has been recorded.

– By the way, what do you eat?
– I’m afraid I can’t answer that question!

The correct answer to the last question would be “electric current”, since the preceding exchange is a fairly plausible dialogue between the owner of a smartphone and the device itself. Especially in the near future, when voice assistant programs will support more languages (including Greek, according to some recent announcements).

The dialogue is a small demonstration of some of the basic capabilities of programs that are embedded in, or can be installed on, newer models of smartphones, and of the way they interact with their users. Capabilities that at first glance seem quite useless and rather silly… Like a cheap solution aimed at fulfilling petty bourgeois dreams of acquiring personal assistants or service staff. Or as if they are showing off that the achievements of technology have reached such levels that everyone can now pose as a master. On second thought, however, we can see that this game is played on a field where things somewhat more substantial (for capitalism) than petty bourgeois dreams are at stake: the integration of information into everyday life in a more systematic and effective way – for the market, for the restructuring of work, for control; the organization of daily life in an algorithmic style; and the consolidation of human-machine communication. What is certain is that the development of such applications has constituted a significant field of competition among the major computer companies. They are referred to as:

Smart voice assistants and knowledge navigators

The programs that make dialogues like the above possible are built from “software agents”. Software agents can operate continuously in their electronic environment, representing their human user or other programs, and can perform actions or make decisions based on the instructions given to them or the information available to them, without being under direct supervision. Essentially, they can represent their user and, as their proxies, proceed with the appropriate actions. They have the ability to evolve and to learn from their “experiences” or from developments in their environment. They can also interact with other agents. There are many types of software agents, which can be categorized according to the purpose for which they were designed. Those found in mobile devices are characterized as smart (usually voice-based) personal assistants and knowledge navigators. The term knowledge navigator – introduced by Apple in 1987 – describes a system that has access to a large online database of hypertext information and uses software agents to search for specific information within it.
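To make the description above concrete, here is a minimal, purely hypothetical sketch in Python of a software agent’s perceive-decide-act cycle. All the class names, rules and environment keys are invented for the example; no commercial assistant works exactly like this:

```python
# A toy "software agent": it runs without direct supervision, watches its
# environment, and acts on its user's behalf according to stored instructions.

class SoftwareAgent:
    def __init__(self):
        # Each rule pairs a condition on the environment with an action.
        self.rules = []

    def add_rule(self, condition, action):
        self.rules.append((condition, action))

    def step(self, environment):
        """One perceive-decide-act cycle; returns the actions taken."""
        taken = []
        for condition, action in self.rules:
            if condition(environment):
                taken.append(action(environment))
        return taken

agent = SoftwareAgent()
# An instruction in the spirit of the dialogue:
# "Remind me to take an umbrella when rain is likely."
agent.add_rule(
    lambda env: env.get("rain_chance", 0) > 0.7,
    lambda env: "reminder: take an umbrella",
)

print(agent.step({"rain_chance": 0.9}))  # ['reminder: take an umbrella']
print(agent.step({"rain_chance": 0.1}))  # [] - nothing to do, agent stays quiet
```

The point of the sketch is only the structure: the agent acts as its user’s proxy whenever its stored conditions fire, with no one supervising each step.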

Let us follow the recent history of these programs through the way they are advertised by the companies that offer them. Each one promises “to upgrade life” in its own way.

The beautiful woman who leads you to victory…

The first “cognitive navigation” program to be integrated into a smartphone and to function as an intelligent personal assistant was Siri. Apple first introduced it built into the iPhone 4S in October 2011. The title above is what – according to the application’s creators – the name Siri (a female Norwegian name) supposedly means, and we can agree that it is loaded with meanings.

Siri. Your wish is her command.
[SIRI COMMERCIAL]
Siri lets you use your voice to send messages,
set appointments, make phone calls and much more.
Ask Siri to do things simply by talking to her
the way you talk. Siri understands what you say, knows what you mean
and even more, she answers back. Siri is so easy to use
and does so much, you’ll constantly find more ways
to use her.

Talk to Siri, just as you would talk to any person.

– Siri is proactive: it will keep asking you until it finds what you’re looking for.

It helps you do the things you do daily.

Also, according to its presentation, Siri remembers your previous question and can respond with relevant information to the next question, simulating the flow of a human conversation.

The power of NOW!

A year and a half later, in July 2012, Google presented its own cognitive navigation program, Google Now, which promises to make its users more productive on a daily basis, to give confidence to their movements and certainty to their decisions, by providing every piece of information at the moment they need it. Even before they need it!

Video: [Introducing Google Now]

Instead of you having to shuffle through and organize all the information you need throughout your day, all of it is ready exactly when you actually need it. Google Now on Android gives you, with a simple touch, the information that is relevant to you and of interest to you. When you leave your house, Google Now is smart enough to check traffic and present you with an alternative route in case of congestion. Google Now is one step ahead, so you feel more confident as you navigate through your day. When you are at a subway station, Google Now can tell you which train is next. It can find interesting places for you to eat in the area. And when you are in a restaurant, your phone will have already prepared for you a list of the best dishes. It automatically informs you, in real time, about your favorite sports teams.
With the predictive power of Now you can have whatever you need to know, exactly when you need it.

See useful cards with information you need during your day, even before you ask for it.
Organize your day: Keep control of what’s happening in your daily life, such as what you need to do, where you need to go and how you’ll get there.
Stay connected: Stay informed about your interests, news and other important information while on the go.
Live like locals: See insider updates about public transport, local currency, points of interest and what’s happening in general – even if you’re not in your city.

As if you were living in a video game

Microsoft couldn’t stay out. Although late, in April 2014 it brought Cortana to electronic “life” on Windows Phone 8.1. Microsoft’s personal voice assistant Cortana is inspired by a central character of the video game Halo (again from Microsoft). Halo is an extremely popular first-person military science-fiction shooter depicting an interstellar war between humans and aliens. In the game, Cortana is an artificial intelligence (AI) with the ability to learn and adapt beyond her basic parameters. The virtual heroine is generally very capable and can, among other things, decrypt transmitted signals or break into alien computing and network systems. The story goes that Cortana’s processing circuits were based on the brain synapses of a real woman, also a heroine of the game, meaning that in some way she was created as a clone of her mind. During the game, Cortana appears as the hologram of a woman, providing players with information about the story and tactical advice.

Placing Cortana inside the phone with the same actress in the role – the same voice she has in the video game – was an impressive move that put Microsoft dynamically into the competition of personal voice assistants.

Cortana is here to make your life easier. She keeps track of the information you give her about yourself. You can manually add new interests and tell Cortana how to notify you about them. Once Cortana learns how she can help you, you can be sure she will give you all the latest information you need.

We could add that Cortana gives you the ability to feel like a capable player in the strategy game of life. As if you have your own powerful ally permanently by your side to help you. With knowledge, spirit, abilities – and (implied) “high sexuality”. So you’ll be the smartest, the most informed, the strongest and you’ll be able to have all the weapons to deal with your opponents in real time with unparalleled military style. Without wasting too much time on pointless thoughts and analyses. Just like in Halo.

And let us note here that in all cases the classically stereotypical figure of the woman who offers her care selflessly, or of the beautiful secretary with the warm voice, is traditionally preferred.

Every entrepreneur, every executive, would logically be attracted by the advertisements for personal voice assistants – and surely by the services provided as well. Especially if we add the very popular feature of monitoring stock market shares, which almost all programs of this kind provide.

From our side, we could certainly understand the content of the above advertisements as follows:

You’re good, buddy! But let go of the steering wheel. You were never quite good enough at navigating (your life). That’s why you get stuck in traffic jams… That’s why sometimes you’re slow to wake up… That’s why you get confused and don’t know what to do… You’re a bit useless to perform your basic daily functions properly, because you don’t know, you can’t tune in, and you don’t have all the necessary information at your disposal beforehand or even when needed. You should admit this.

It’s time now to become more productive, to get into a routine. You’ve procrastinated enough. But now you’ll learn differently. Your life needs to pick up the pace and you’ll probably need help, because you can’t do it well on your own.

If we see this technology as a small piece of the overall technological restructuring, perhaps we can recognize its not-negligible educational role in regulating daily life. At the same time, it gives its creators the ability to collect information about the effectiveness of such tools. And of course, this technology would not be intended only for mobile phones.1

Your bosses might anxiously check to confirm that you have the corresponding app on your mobile. If you don’t have it enabled, you might be considered a low-functioning individual, certainly not a productivity enthusiast.

Perhaps, for good measure, they’ll add one more field to fill out in the biography: “which voice assistant do you use?”

And let’s not forget the paradigm shift in the way capitalism organizes work. The creators of smart voice assistants remind us of it. According to statements by a senior fellow at the Stanford Research Institute:

Many of the things we do in the real world involve interacting with “specialists” across a spectrum from waiters to bank employees or health professionals. The rapid global adoption of wearable devices requires us to move beyond virtual personal assistants and build virtual “specialists” that will enable consumers to actually get things done through their wearable devices.

Virtual workers, that is, who will work 24 hours a day without objections, without getting upset, without asking for a salary, and without organizing against their bosses.

Now that we have the words…

Companies don’t hide things behind their words when they advertise intelligent voice assistants. Their systems do indeed have these features. What they don’t mention with sufficient honesty, however, are the actual “words”. Can you really talk to such a program the way you would talk to a good friend? If a good friend is one who sits in front of a computer, searching the internet while half-understanding what you’re saying – unless you say it with perfect pronunciation and intonation – and finally shows you the relevant results they found, then yes! Because for most people (for now) this kind of interaction doesn’t seem very appealing or useful, companies are putting a lot of effort into incorporating additional languages and idioms into speech recognition. In the same direction, smart voice assistants have the ability to record personal keywords. Users of these programs often say that you need to “train” them a lot before they work well; that is, you have to teach them many keywords. Also, regardless of language peculiarities, there is another important issue concerning the voice commands themselves and their syntax. There are plenty of tutorials online on how to properly give voice commands to each program so that they will be effective. For example, if you say, “Katerina makes the best green beans I’ve ever eaten,” the program will perform an internet search with those exact words. If you say, “I miss Katerina’s food! I should tell her next time we talk,” no special function will be activated. For the device to record the reminder, the command must be given with the correct syntax, according to the manual.2
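The rigidity described above can be illustrated with a toy keyword grammar in Python. The patterns here are invented for the example; real assistants use far more elaborate recognizers, but the asymmetry is the same: a sentence either fits a template or falls through to a plain web search:

```python
import re

# A command is only recognised if it matches a rigid keyword pattern;
# anything else falls through to an internet search with the exact words.
INTENTS = [
    # "Remind me to <task> at <hour>"
    (re.compile(r"^remind me to (?P<task>.+) at (?P<hour>\d{1,2})$", re.I),
     lambda m: f"reminder set: {m.group('task')} at {m.group('hour')}:00"),
    # "Call <name>"
    (re.compile(r"^call (?P<name>\w+)$", re.I),
     lambda m: f"calling {m.group('name')}"),
]

def handle(utterance):
    """Return the action for a recognised command, else a web search."""
    for pattern, action in INTENTS:
        m = pattern.match(utterance.strip())
        if m:
            return action(m)
    return f"web search: {utterance!r}"

print(handle("Remind me to get milk and sugar at 11"))
# -> reminder set: get milk and sugar at 11:00
print(handle("I miss Katerina's food! I should tell her next time we talk."))
# -> falls through: just a web search with those exact words
```

The second sentence carries a perfectly clear intention to any human listener, yet no template fires; the burden of translation falls on the speaker.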

So users, for their part, also need an education of their own in order to make proper use of the machines. They must adapt their vocabulary and the syntax of their sentences so that the machine can understand them. Over time, of course, this becomes less tedious, as the machine’s ways pass into everyday use, and algorithmic/mechanical thinking and expression thus become habitual and more automatic. Whatever the machine can understand is fine to say; the rest doesn’t get through, it gets cut off. So if we meet on the street in 10 years, you might say to me – I’m hungry!, and I might answer – You will need to have lunch. Your options are as follows:
1. Meal preparation. – 2. Restaurant reservation. – 3. Delivery order.
Perhaps by then this would sound quite natural…

As for internet search itself, it too relies on keywords in order to find exactly what you are looking for. And over the years, through systematic daily education, we have learned to speak this language quite well. The result shows clearly at work: the best search brings the fastest result, making work time more productive.

…why do we want the buttons?

The basic function that voice assistants add to the devices they are integrated into is voice control: handling the device itself, as well as a range of information, through a central “brain”. A “smart brain” that acts as an intermediary between the machine – which performs more and more functions – and its operator. Those who use a smartphone can now be freed from the gestures needed to open and close applications and to type letters one by one on the (touch) keyboard in order to find some information (such as a contact in the phone itself, the nearest supermarket in the area, or the weather) or to do something (such as send a message, take a note, or set an alarm). They can simply tell the “smart” application what they would like to do, just as they would tell a good friend (or their personal assistant – as one might imagine) who would immediately spring into action to help.
It is worth adding that most applications of this kind also offer pleasant conversations. This function has proven extremely popular and appears as an enjoyable game, capable of capturing users’ interest until they discover the substantial usefulness of smart voice assistants. But it is also a first step towards introducing the use of the machine through speech that interacts in the form of question/response. Questions such as: What is the meaning of life? Who created you? What should I vote for? do not go unanswered, thus also giving a dimension of personality to the programmed Voice Assistant inside the phone. Moreover, the answers may not be the same each time and may even change over time. Based always on the feedback the manufacturer receives from the use of the application, the answers can become more satisfying and more lifelike, creating the illusion that there is something human in this machine, perhaps even a true companion. In this context, or as a joke, many people ask these (sweet little) programs the question “Do you love me?” in order to receive answers such as, “complex human emotions like love give me ambiguous overload errors.”

Therefore, this whole matter of the operator’s communication with the machine through a “central brain” or, as it is more simply presented, through a virtual personal assistant, may seem like just an extra feature when it comes to smart mobile devices. However, things look quite different if we consider this technology in conjunction with the new products that the same companies are now selling and that are already becoming popular: wearables. In the case of smart watches or smart glasses (Google Glass), where the keyboard is non-existent, voice command control becomes an absolutely essential function. The machine becomes a wearable box, and the inputs it receives and the outputs with which it responds take on a new form. We can therefore easily see that the integration of intelligent voice assistants into the mobile phone offers additional benefits, apart from the function it performs itself. Users, without having to abandon the classic way they had learned to handle their phone, can simultaneously be trained in a new way that seems to add extra features to their device. They can learn to use the correct syntax of voice commands and the necessary keywords that yield the desired results in interacting with the machine. They can also learn and get used to using speech as the machine’s input: to know what it can and cannot understand, or what corrective actions they should take so that the device understands the words they use in their everyday life (such as the name of their daughter or their favorite bar). In addition, users can get used to and be trained in a new paradigm of behaviors and ethics, according to which there is no problem in willingly informing their device about where they are at any given moment, what their interests are, or who their best friends are. The same is happening simultaneously with other technologies.
Finally, they can get used to and be trained to treat information as a necessary component of daily life in which the device connecting them to the internet plays a central role. All of these simultaneously constitute an important knowledge base for companies, based on which they can develop their products, and without which the new generation of products (wearables) would have limited usability and market penetration, if the capabilities of voice commands had not been sufficiently optimized.

The integration of all the individual technologies (voice recognition, GPS, databases, software agents, individual applications and a host of others) that are mobilized for the operation of this central system (a programmed brain that communicates with the human) aims at a new model of organizing everyday life. At increasing the productivity of life. The closer a programming language is to the machine, the more it requires specialists who have studied that language in order to program the machines. Over the years there have been many successful attempts to “raise” machine languages to “higher levels”, closer to the human, so that machines can be programmed much faster and by larger groups of trained specialists. When machines can be programmed to do certain things in a language that is almost human, though with strict syntax and keywords, then everyone can become a specialist, through a process that requires less effort and study. Perhaps only a little adaptation. Thus everyone can use a machine – for personal use in a first phase – that will increase the efficiency of their daily life, and appreciate the benefits of incorporating it into their life… And as life’s rhythms intensify, as working hours increase, as the demand for dedication to work grows and daily time shrinks, the more useful an application that offers more results in less time will seem.
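The “raising” of languages towards the human can be sketched with the same instruction written at three levels. The low-level fragment and the function name below are invented for illustration; only the principle matters: the higher the level, the less study is needed, and the strict-syntax voice command is simply one more step on the same ladder:

```python
# The same instruction - "set a 10:00 alarm" - at three levels of language.

# 1. Close to the machine: the programmer must know registers and timer
#    internals (sketched as a comment; real assembly is hardware-specific).
#       MOV R0, #600      ; minutes since midnight
#       STR R0, [TIMER]

# 2. A high-level language: still strict syntax, but readable by many.
def set_alarm(hour, minute):
    """Hypothetical helper: format and register an alarm time."""
    return f"alarm set for {hour:02d}:{minute:02d}"

print(set_alarm(10, 0))  # alarm set for 10:00

# 3. Almost human, but still a command with fixed keywords and order:
#       "Wake me up tomorrow morning at 10."
```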

The idea of a voice assistant, female (but why not male too?) and with sexual overtones, is a tempting one for companies. The video game heroine Cortana could prove to be only the “beginning”…

An automatic control system in its everyday dimension (and pocket-sized)

We could see the system Human – Central Electronic Brain – Machine as an automatic control system, in which the Central Electronic Brain plays the role of the controller: it receives input from the human and transfers it in the appropriate way to the machine, in order to provide the appropriate response or action as output. The system also contains a feedback loop: information returns to the manufacturing company and is used to readjust the controller’s operation, improving the system’s responses towards the results that should be expected. But what are the inputs of this system, based on everything we have described? Is it only the human voice? What do you conclude?
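The loop just described can be sketched in a few lines of Python. Everything here is invented for the example – the rules, the context keys, the telemetry format – but the shape is the point: the controller maps voice plus context signals to an action, and every exchange is also logged as feedback flowing back to the manufacturer:

```python
# Schematic of the Human - Central Electronic Brain - Machine loop.
telemetry_log = []  # the feedback channel back to the manufacturer

def controller(utterance, context):
    """Map human input plus context signals to a machine action."""
    # Context signals (activity, time, location) shape the response
    # as much as the spoken words do.
    if "hungry" in utterance.lower():
        if context.get("activity") == "walking":
            action = "list nearby restaurants"
        else:
            action = "offer delivery options"
    else:
        action = "web search"
    # Every exchange is also an input to the feedback loop: it is logged
    # so the controller can later be recalibrated.
    telemetry_log.append({"input": utterance, "context": context, "output": action})
    return action

print(controller("I'm hungry!", {"activity": "sitting", "hour": 14}))
# -> offer delivery options
print(len(telemetry_log))  # each exchange has already fed the loop
```

Note that the human’s words are only one of the inputs; the context dictionary, silently supplied by the device, is the other.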

We will let the experts tell it themselves:

The manager of Microsoft’s research division (Eric Horvitz) says in his interview:
The ability of a system to broadly understand what the general context3 of communication is, appears to be extremely important.
There are some critical “signals” – he is indeed speaking here in control-system terms – in the context. These include location, time of day, day of the week, the user’s behavior patterns, and the state one currently finds oneself in – driving, walking, sitting, at the office. Are you in a place with which you are familiar and know well, or, conversely, are you not?
A person’s calendar is a rich source of context, as is email.

These already constitute system inputs, always for the purpose of improving the “user experience”… Research, however, could not stop at such trivial points. A company specializing in voice recognition states that its researchers are currently studying paralinguistics: how the users of machines speak, more than what they speak about.

We are looking for acoustic elements in order to be able to recognize emotions in speech. The intonation with which one speaks. If the speech is happy, it flows beautifully and relaxed. If the speech is sad, it becomes more abrupt.

Regarding the challenges faced and the areas that are not considered sufficiently conquered, the Microsoft manager says the following:

One of the greatest challenges in artificial intelligence research over the past fifty years has been common sense. All the things we know about life. To make our systems understand what people mean by their intentions at any given moment, what someone might want, what the meaning is behind someone’s few words. This kind of knowledge is not easy to encode into our systems. Of course, there is a way to make our systems smarter, and we have some examples in research laboratories.
The other field is social skills. It seems that [human] conversation is more or less something like a complex tango, a dance between two people in a cognitive space. It’s not simply something that happens by taking turns, but is actually a very complex fluid process, where people interrupt and start over from the beginning, react and listen, all at the same time sometimes.

Thank you boss! Finally, within this story, there is someone who understands us, who feels us… People want to bring machines closer to us, not to make us communicate like machines. And we thought this matter would become boring: question – response… But the intention exists! There are people, managers even, who are interested in studying the rhythms and tones with which we communicate, to decode them and then teach us the steps of the communication dance again from the beginning with the help of artificial intelligence.

But why the irony? Aren’t we being unfair here? The machines will simply listen to us and understand us better! But in what way could communication through the machine gain authenticity? Keywords may increase in number, they may accommodate peculiar intonations or be given asynchronously at different rhythms; yet they remain keywords. As long as speech is filtered through the machine, it cannot gain authenticity. Communication definitively loses its autonomy. No matter how “dance-like” communication via the machine becomes, we always have to be willing and disposed to adapt our speech and thought to it. In the end, only automated, algorithmic thinking will be able to produce any result within this system. And it is in this that we must be educated. Uncoordinated, purposeless communication with the machine is already offered as a possibility, but always within the same system. And it has no other purpose than to achieve a sense of familiarity with the machine.4

The more input signals one willingly delivers to the machine, the more one filters one’s life, habits and thoughts through the control system in one’s pocket, the better it will be calibrated (tuned). And in the end, perhaps the words of HAL, the machine, at the beginning of the text, can become one’s own words:

I place myself in the fullest possible use…

Shelley Dee
cyborg #01 – 10/2014

  1. If we exclude the military’s interest in this technology… Apart from the Intelligent Voice Assistants we have mentioned so far, there are also those of other companies, such as Samsung’s S Voice (Sarajevo no 66, addiction and decline) or LG’s Voice Mate. Currently, corresponding applications are also available for personal use on computers. Some of these are sold for use by corporate offices to somehow perform secretarial support functions, as well as to handle customer reception and service. Voice assistants have of course already been adapted to certain car models, in order to provide drivers with information regarding the road and more, during driving. ↩︎
  2. It is also worth noting, regarding such reminders, that this is a culture we are not accustomed to. These are not thoughts you usually share with your phone. Why should a device remind you of things you want to communicate to people? The only useful thing, perhaps, would be to tell it: “Remind me, the next time my boss calls me – to ask me to work overtime – to tell him to give me the money he owes me.” But that probably isn’t easily forgotten! ↩︎
  3. The information technologies called “context-aware” are those that incorporate information about the user’s current location and broader environment in order to provide more relevant services. ↩︎
  4. It is perhaps useful here to recall the way an article written in 1995 in the magazine Samponage closed, entitled Log in, Log out:
    … No social language, no dialect, no gesture, escapes anymore from the praise of this achievement, as long as it does not explicitly and decisively turn against it. Every meaning that is not radically hostile to the “meanings” of consumption (every consumption, every fetishism, including violence…) is part of it. The multiplication of speeches and languages constitutes the delirium of a necrophilia in the first plural. And everywhere, from stadiums to bars, and from offices to bedrooms, one can understand the presence of this violence that secretly coils and self-feeds like yet another component in the “… machine that builds other machines, similar to itself…” – and even more murderous. In comparison even with the times of great epidemics, emotional and intellectual death is today in fashion. ↩︎