Authorities are collaborating with iFlytek, a Chinese company that produces 80 percent of all speech recognition technology in the country, to develop a pilot surveillance system that can automatically identify targeted voices in phone conversations. Human Rights Watch wrote to iFlytek on August 2, 2017, asking about its business relationship with the Ministry of Public Security, the description on its website of a mass automated voice recognition and monitoring system it has developed, and whether it has any human rights policies. iFlytek has not responded.
“The Chinese government has been collecting the voice patterns of tens of thousands of people with little transparency about the program or laws regulating who can be targeted or how that information is going to be used,” said Sophie Richardson, China director. “Authorities can easily misuse that data in a country with a long history of unchecked surveillance and retaliation against critics.”
The Chinese government has stepped up the use of biometric technology in recent years – including the construction of large-scale biometric databases – to bolster its existing mass surveillance and social control efforts. Compared with other biometric databases run by the police, the voice pattern database appears to be less established, with fewer samples in it. By 2015, police had collected 70,000 voice patterns in Anhui province, one of the main pilot provinces identified by the ministry for such collection. In comparison, national police databases hold more than one billion faces and DNA samples from 40 million people.
The collection of voice biometrics is part of the Chinese government’s drive to form a “multi-modal” biometric portrait of individuals and to gather ever more data about citizens. This voice biometric data is linked in police databases to the person’s identification number, which in turn can then be linked to a person’s other biometric and personal information on file, including their ethnicity, home address, and even their hotel records.
It is extremely difficult in China for individuals to remove such personal information, challenge its collection, or otherwise obtain redress for government surveillance. Unlike other types of biometric collection, such as fingerprinting or DNA sampling, individuals may not even realize their voice pattern has been collected, or that they are under surveillance.
Police officers can collect voice patterns from anyone suspected of “violating the law or committing crimes” (违法犯罪), including misdemeanors. In one case, for example, police collected the voice patterns of three women who were suspected of sex work – including two suspected of administrative offenses – as police filed the case in an Anhui county.
No public official policy documents attempt to justify the creation or use of such voice pattern databases, but academic articles by scientists who are leading their development state that their purpose is to help identify the speaker in voice materials collected during a crime. An artificial intelligence program, known as an Automatic Speaker Recognition (ASR) system, is used to speed up the matching process.
Government reports in the media claim that Automatic Speaker Recognition forensics have been used to match voice patterns to solve cases involving telecommunications fraud, drug trafficking, kidnapping, and blackmail. According to these same reports, it will also be applied for counterterrorism and “stability maintenance” purposes – terms authorities sometimes use to justify the suppression of peaceful dissent.
As the government weaves a tightened web of surveillance, there are more ways ordinary citizens can get caught for criticizing the government, as well as for mobilizing and organizing for social change. There have been documented cases in which activists and netizens have been sentenced for their peaceful expression on communication tools, including on social media applications like WeChat.
The government has stepped up efforts to enforce “real-name registration” requirements for a range of services, including when purchasing mobile SIM cards, narrowing the space for anonymity and privacy. There are also cases in which activists have been tracked down by police when traveling on trains, as the authorities require “real-name registration” for this and other forms of public transportation. Authorities have also installed CCTV cameras in front of the residences of activists to intimidate and monitor them.
Government collection or use of biometric data is not inherently illegal and has been justified at times as a permissible investigative tactic. But to meet international privacy standards enshrined in the International Covenant on Civil and Political Rights, which China has signed but not ratified, each government instance of collection, retention, and use of biometrics must be comprehensively regulated, narrow in scope, and necessary as well as proportionate to meeting a legitimate security goal.
Given the sensitivity of biometric data, government officials should not collect or use such information unless necessary for the investigation of serious crime, and not for minor offenses or administrative purposes such as tracking migrants. Both collection and use should be limited to people found to be involved in wrongdoing, and not broad populations who have no specific link to crime. Collection, use, and retention should never be based on a person’s sex, sexual orientation, race, ethnicity, or religious, political, or other views. Individuals should have the right to know what biometric data the government holds on them.
Technology companies also have a human rights responsibility to ensure that their products and services do not contribute to human rights abuses, including violations of privacy and fair trial rights.
“Chinese authorities’ arsenal of surveillance tools just keeps getting bigger while privacy rights lag far behind,” Richardson said. “The Chinese authorities should immediately stop gathering highly sensitive biometric data until legal protections are clear – and clearly reliable.”
Voice Pattern Database; Automatic Speaker Recognition
In 2012, the Ministry of Public Security started the construction of a national voice pattern database and designated Anhui as one of the pilot provinces.
In 2014, the Anhui provincial police bureau issued an order to accelerate the database construction. Since then, police bureaus across that province have purchased voice pattern collection systems, according to official tender documents.
Similar purchases of voice pattern collection systems were also made in 2016 by the police bureaus in Xinjiang, a repressive region with 11 million ethnic minority Uyghurs, following the “Notice to Fully Carry Out the Construction of Three-Dimensional Portraits, Voice Pattern, and DNA Fingerprint Biometrics Collection System” (关于全面开展三维人像、声纹、DNA指纹生物信息采集系统建设相关工作的通知). A local police station reported that front-line officers are given monthly quotas for biometric collection.
Police and media reports also indicate that police units have been constructing voice pattern databases in Guangdong province, Anxi county in Fujian province, Wuhan city in Hubei province, and Nanjing city in Jiangsu province.
Human Rights Watch also found that police have collected voice patterns of ordinary citizens. For example:
- A police station in Xuancheng city, Anhui province, stated on April 27, 2017, that it is collecting voice patterns along with fingerprints and blood samples of migrant workers “to effectively grasp the actual situation regarding the migrant population”;
- In Bole city, Xinjiang Autonomous Region, an office responsible for managing domestic migrants described 14 new voice pattern collection systems in its 2016 annual report as part of its efforts to “strengthen the collection of information on migrants”;
- Two separate police reports, dated April and May 2017, from Zhengzhou city, Henan province, note that the voice patterns of Uyghur migrants in their jurisdictions have been collected, along with other biometrics;
- Human Rights Watch has previously documented that Xinjiang passport applicants are required to submit their biometrics, including a voice pattern sample, to the police.
A February 2017 report by the news website The Paper, since deleted inside China but still available on the overseas website China Digital Times, described how Anhui police were piloting an Automatic Speaker Recognition system to monitor phone conversations in real time, automatically picking out the targeted voice patterns of individuals and alerting the police:
A woman in Huainan, Anhui, received a scam call … just as the scammer was instructing her, step-by-step, how to transfer her money … the voice pattern recognition system, recognizing the scammers’ voice patterns, alerted the police; the police then directly cut off the phone conversation.
The technology is integrated into a surveillance system put in place by iFlytek and an unnamed local telecommunications company.
iFlytek, based in Anhui province, is a major artificial intelligence company focused on speech and speaker recognition. iFlytek’s website touts the company’s achievement in developing the country’s first “mass automated voice recognition and monitoring system.” Its website states that it has helped the Ministry of Public Security in building a national voice pattern database. It is also the designated supplier of voice pattern collection systems purchased by Xinjiang and Anhui police bureaus. It says it has set up, jointly with the ministry’s forensics center, a key ministry laboratory in intelligent voice technology (智能语音技术公安部重点实验室) that has “helped solve cases” in Anhui, Gansu, Tibet, and Xinjiang. The company states it can develop artificial intelligence systems that can handle minority languages, including Tibetan and Uyghur.
iFlytek’s website also claims it has developed other audio-related applications, including “keyword spotting” for “public security” and “national defense” purposes. The web page gives no further details of what these keywords or the security threats might be. In a patent it filed in August 2013, iFlytek states that it has developed a system to discover “repeated audio files” in the telecommunications system and on the internet that may be useful in “monitoring public opinion”:
[Such a system] … which can automatically pick up, from a massive amount of audio information, audio clips that appear repeatedly is very important in information security and in monitoring public opinion.… For audio information on the phone [system], the use of the technology can quickly find illegal phone recordings that are being transmitted. For audio and video data on the internet, the technology can quickly and accurately dig out the most popular audio and video clips.
iFlytek has a joint laboratory with the Department of Electronic Engineering at Tsinghua University. The department has a long history of developing speech and speaker recognition for automated telephone surveillance, and is a major player in the Golden Shield Project, the Ministry of Public Security’s ambitious plan to bolster and broaden surveillance using technology.
iFlytek also has a range of commercial text-to-speech and speech recognition applications for mobile phones, including a voice assistance app for Android phones in China. The company states it has 890 million users, which would provide a large speech data set that can be used to train and improve its speech recognition software for a range of purposes, potentially including surveillance.
It is unclear to what extent iFlytek shares the personal information it collects for commercial purposes with the Ministry of Public Security. While iFlytek promises confidentiality in its customer privacy statement, it also says that it may disclose personal information “according to the demands of the relevant government departments.” China’s Cybersecurity Law requires companies to provide undefined “technical support” to security agencies to aid in investigations, and provides no privacy protections against state surveillance. iFlytek is not required to inform users of government information requests, for example.
During the 2014 annual meeting of the National People’s Congress (NPC) – China’s rubber-stamp legislature – Liu Qingfeng, chairman of iFlytek and a deputy to the NPC, urged the authorities to “employ big data in countering terrorism as soon as possible, and to speed up the construction of the voice pattern database … to protect national security.”
Other governments have used automated speaker recognition programs, including the United States for monitoring prison calls and Australia for verifying callers accessing social services; the Spanish police have more than 3,500 voice samples from people convicted of crimes.
While some governments pursue voice pattern collection for identification or authentication in limited situations, there are significant challenges to applying such technology for crime control and surveillance. The accuracy of an Automatic Speaker Recognition system is affected by the circumstances of speech, including the speaker’s emotional state.
According to a speech recognition expert who spoke to Human Rights Watch but did not wish to be named, a system’s ability to conduct real-time surveillance is also limited. With current technology, such a system can “listen” to at most 50 phone lines at one time to trace one targeted voice. The consequences of false positives, where the system incorrectly matches a voice to a stored voice pattern, could be severe when the technology is used to investigate and prosecute crimes, especially in countries such as China, where the conviction rate is above 99 percent and few effective redress mechanisms exist.
Governments and private sector companies alike face additional challenges in securing large-scale biometric databases. These can become prime targets for cybercriminals, who could attempt to breach them to acquire biometrics to commit identity theft and fraud. Unlike with a national ID number or password, people cannot generally change their voice, face, or other biometrics, and so they may be left with little recourse or protection if such data is breached.
Biometric Collection and Wiretapping in Chinese and International Law
Chinese law appears to limit police collection of biometric samples to people connected to the investigation of a specific criminal case. Article 130 of the Criminal Procedure Law (CPL) states that in the course of criminal investigations, to “ascertain certain features, conditions of injuries, or physical conditions of a victim or a criminal suspect, a physical examination may be conducted, and fingerprints, blood, urine and other biological samples may be collected. If a criminal suspect refuses to be examined, the investigators, when they deem it necessary, may conduct a compulsory examination.”
But there are no legal guidelines or limitations on how long biometric samples can be stored, on how they can be shared or used, or on how their collection or use can be challenged. While there are Ministry of Public Security internal departmental rules that focus on the administrative and technical aspects of voice pattern collection, most are not publicly available.
The collection of biometrics from migrants may also be taking place outside the law. While there are provincial-level rules authorizing local governments to collect migrants’ “basic data,” they do not explicitly include biometrics as part of the collected data.
Chinese law also does not authorize the police to collect individuals’ biometric data in cases of administrative offenses, though this may be changing. In early 2017, the Chinese government issued new draft amendments to its Public Security Administrative Punishments Law, in which a new provision, article 112, authorizes police to collect biometrics to identify victims and offenders in minor administrative cases.
In cases involving serious crimes, including endangering state security, terrorism, organized crime, drug-related crimes, and corruption, Article 148 of the Criminal Procedure Law allows criminal investigators to wiretap criminal suspects as well as anyone connected to the crime. Such wiretapping does not require a court warrant – approval from supervisors in the relevant criminal investigation units suffices under the law.
The National People’s Congress should review and revise legislation relevant to biometric data collection and wiretapping to ensure it complies with standards under the International Covenant on Civil and Political Rights. These standards must be part of a legal framework that ensures the collection, use, and retention of such data a) are necessary in the sense that less intrusive measures are unavailable; b) are appropriately restricted to ensure the action is proportionate to a legitimate purpose such as public safety; and c) do not impair the essence of the right to privacy and other related rights.
To ensure these standards are enforced, any biometric data program should also include independent authorization for collection and use, public notification, and means of independent oversight, as well as avenues for people to challenge abuses and have access to remedies. The authorities should also publish information about the collection and use of voice pattern recognition technology, including disclosure about databases that have been created and specific searches they conduct.
iFlytek should cease technology transfers and support for surveillance systems provided to the Ministry of Public Security and provincial authorities until regulations are in place that ensure privacy and other human rights are protected. Technology companies should refrain from sharing voice pattern or other personal information collected for commercial purposes with security agencies without a specific court warrant targeting an individual under suspicion of a serious crime.
Companies should not use voice patterns that were collected for commercial purposes to train or otherwise develop technology for surveillance purposes, as information collected from individuals for one purpose should not be used for another without their consent. Companies should also submit voice recognition technology developed for surveillance applications to public, independent accuracy competitions and publish performance results, including tests that address accuracy for ethnic minority languages and potential algorithmic bias that would affect minorities.