Voice Pattern Database; Automatic Speaker Recognition
In 2012, the Ministry of Public Security began building a national voice pattern database and designated Anhui as one of the pilot provinces.
In 2014, the Anhui provincial police bureau issued an order to accelerate construction of the database. Since then, according to official tender documents, police bureaus across the province have purchased voice pattern collection systems.
Police bureaus in Xinjiang, a region under repressive rule that is home to 11 million ethnic minority Uyghurs, made similar purchases of voice pattern collection systems in 2016, following the “Notice to Fully Carry Out the Construction of Three-Dimensional Portraits, Voice Pattern, and DNA Fingerprint Biometrics Collection System” (关于全面开展三维人像、声纹、DNA指纹生物信息采集系统建设相关工作的通知). A local police station reported that front-line officers are given monthly quotas for biometric collection.
Police and media reports also indicate that police units have been constructing voice pattern databases in Guangdong province, Anqi county in Fujian province, Wuhan city in Hubei province, and Nanjing city in Jiangsu province.
Human Rights Watch also found that police have collected the voice patterns of ordinary citizens. For example:
- A police station in Xuancheng city, Anhui province, stated on April 27, 2017, that it is collecting voice patterns along with fingerprints and blood samples of migrant workers “to effectively grasp the actual situation regarding the migrant population”;
- In Bole city, Xinjiang Uyghur Autonomous Region, an office responsible for managing domestic migrants described 14 new voice pattern collection systems in its 2016 annual report as part of its efforts to “strengthen the collection [of] information on migrants”;
- Two separate police reports, dated April and May 2017, from Zhengzhou city, Henan province, note that the voice patterns of Uyghur migrants in their jurisdictions have been collected, along with other biometrics;
- Human Rights Watch has previously documented that Xinjiang passport applicants are required to submit their biometrics, including a voice pattern sample, to the police.
A February 2017 report by the news website The Paper, since deleted within China but still available on the overseas website China Digital Times, described how Anhui police were piloting an automatic speaker recognition system that monitors phone conversations in real time, automatically detects the voice patterns of targeted individuals, and alerts the police:
A woman in Huainan, Anhui, received a scam call … just as the scammer was instructing her, step-by-step, how to transfer her money … the voice pattern recognition system, recognizing the scammers’ voice patterns, alerted the police; the police then directly cut off the phone conversation.
The technology is integrated into a surveillance system put in place by iFlytek and an unnamed local telecommunications company.
iFlytek
iFlytek, based in Anhui province, is a major artificial intelligence company focused on speech and speaker recognition. iFlytek’s website touts the company’s achievement in developing the country’s first “mass automated voice recognition and monitoring system.” Its website states that it has helped the Ministry of Public Security build a national voice pattern database. It is also the designated supplier of the voice pattern collection systems purchased by Xinjiang and Anhui police bureaus. It says it has set up, jointly with the ministry’s forensics center, a key ministry laboratory for intelligent speech technology (智能语音技术公安部重点实验室) that has “helped solve cases” in Anhui, Gansu, Tibet, and Xinjiang. The company states it can develop artificial intelligence systems that handle minority languages, including Tibetan and Uyghur.
iFlytek’s website also claims it has developed other audio-related applications, including “keyword spotting” for “public security” and “national defense” purposes. The web page gives no further details of what these keywords or the security threats might be. In a patent it filed in August 2013, iFlytek states that it has developed a system to discover “repeated audio files” in the telecommunications system and on the internet that may be useful in “monitoring public opinion”:
[Such a system] … which can automatically pick up, from a massive amount of audio information, audio clips that appear repeatedly is very important in information security and in monitoring public opinion.… For audio information on the phone [system], the use of the technology can quickly find illegal phone recordings that are being transmitted. For audio and video data on the internet, the technology can quickly and accurately dig out the most popular audio and video clips.
iFlytek has a joint laboratory with the Department of Electronic Engineering at Tsinghua University. The department has a long history of developing speech and speaker recognition technologies for automated telephone surveillance, and is a major player in the Golden Shield Project, the Ministry of Public Security’s ambitious plan to bolster and broaden surveillance using technology.
iFlytek also offers a range of commercial text-to-speech and speech recognition applications for mobile phones, including a voice assistant app for Android phones in China. The company states it has 890 million users, which would give it a large speech dataset that could be used to train and improve its speech recognition software for a range of purposes, potentially including surveillance.
It is unclear to what extent iFlytek shares the personal information it collects for commercial purposes with the Ministry of Public Security. While iFlytek promises confidentiality in its customer privacy statement, it also says that it may disclose personal information “according to the demands of the relevant government departments.” China’s Cybersecurity Law requires companies to provide undefined “technical support” to security agencies to aid in investigations, and provides no privacy protections against state surveillance. For example, iFlytek is not required to inform users of government requests for their information.
During the 2014 annual meeting of the National People’s Congress (NPC), China’s rubber-stamp legislature, iFlytek chairman and NPC deputy Liu Qingfeng urged the authorities to “employ big data in countering terrorism as soon as possible, and to speed up the construction of the voice pattern database … to protect national security.”
Other governments have used automatic speaker recognition programs, including the United States, to monitor prison calls, and Australia, to verify callers accessing social services. Spanish police hold more than 3,500 voice samples from people convicted of crimes.
While some governments pursue voice pattern collection for identification or authentication in limited situations, there are significant challenges to applying such technology to crime control and surveillance. The accuracy of an automatic speaker recognition system is affected by the conditions under which speech is produced, including the speaker’s emotional state.
According to a speech recognition expert who spoke to Human Rights Watch but did not wish to be named, a system’s ability to conduct real-time surveillance is also limited. With current technology, such a system can at most “listen” to 50 phone lines at a time when tracing a single targeted voice. The consequences of false positives, where the system incorrectly matches a voice to a stored voice pattern, could be severe when the technology is used to investigate and prosecute crimes, especially in countries such as China, where the conviction rate is above 99 percent and few effective redress mechanisms exist.
Governments and private sector companies alike face additional challenges in securing large-scale biometric databases. These can become prime targets for cybercriminals, who could attempt to breach them to acquire biometrics for identity theft and fraud. Unlike a national ID number or password, a person’s voice, face, or other biometrics generally cannot be changed, so people whose data is breached may be left with little recourse or protection.
Biometric Collection and Wiretapping in Chinese and International Law
Chinese law appears to limit police collection of biometric samples to people connected to the investigation of a specific criminal case. Article 130 of the Criminal Procedure Law (CPL) states that in the course of criminal investigations, to “ascertain certain features, conditions of injuries, or physical conditions of a victim or a criminal suspect, a physical examination may be conducted, and fingerprints, blood, urine and other biological samples may be collected. If a criminal suspect refuses to be examined, the investigators, when they deem it necessary, may conduct a compulsory examination.”
But there are no legal guidelines or limitations on how long biometric samples can be stored, shared, or used, or on how their collection or use can be challenged. While the Ministry of Public Security has internal departmental rules addressing the administrative and technical aspects of voice pattern collection, most are not publicly available.
The collection of biometrics from migrants may also be taking place outside the law. While there are provincial-level rules authorizing local governments to collect migrants’ “basic data,” they do not explicitly include biometrics as part of the collected data.
Chinese law also does not authorize the police to collect individuals’ biometric data in cases of administrative offenses, though this may be changing. In early 2017, the Chinese government issued draft amendments to its Public Security Administrative Punishments Law that include a new provision, Article 112, which would authorize police to collect biometrics to identify victims and offenders in minor administrative cases.
Article 148 of the Criminal Procedure Law allows investigators of serious crimes, including endangering state security, terrorism, organized crime, drug-related crimes, and corruption, to wiretap criminal suspects as well as anyone connected to the crime. Such wiretapping does not require a court warrant; under the law, approval from supervisors in the relevant criminal investigation units is sufficient.
The National People’s Congress should review and revise legislation relevant to biometric data collection and wiretapping to ensure it complies with standards under the International Covenant on Civil and Political Rights. These standards must be part of a legal framework ensuring that any collection, use, or retention of such data a) is necessary, in the sense that less intrusive measures are unavailable; b) is appropriately restricted so that it is proportionate to a legitimate purpose such as public safety; and c) does not impair the essence of the right to privacy and other related rights.
To ensure these standards are enforced, any biometric data program should also include independent authorization for collection and use, public notification, and means of independent oversight, as well as avenues for people to challenge abuses and obtain remedies. The authorities should also publish information about the collection and use of voice pattern recognition technology, including the databases that have been created and the specific searches conducted on them.
iFlytek should stop transferring technology to, and supporting surveillance systems for, the Ministry of Public Security and provincial authorities until regulations are in place that ensure privacy and other human rights are protected. Technology companies should refrain from sharing voice patterns or other personal information collected for commercial purposes with security agencies without a specific court warrant targeting an individual suspected of a serious crime.
Companies should not use voice patterns collected for commercial purposes to train or otherwise develop technology for surveillance, as information collected from individuals for one purpose should not be used for another without their consent. Companies should also submit voice recognition technology developed for surveillance applications to public, independent accuracy competitions and publish the performance results, including tests that address accuracy for ethnic minority languages and potential algorithmic bias that would affect minorities.