China: Voice Biometric Collection Threatens Privacy

(New York) – The Chinese government is collecting “voice pattern” samples of individuals in order to build a national voice biometric database, Human Rights Watch said today.

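Chinese authorities are working with iFlytek (科大讯飞), a Chinese company that produces 80 percent of all speech recognition technology in China, to develop a pilot surveillance system that can automatically identify targeted voices in phone conversations. On August 2, 2017, Human Rights Watch wrote to iFlytek asking about its partnership with the Ministry of Public Security, the statements on its website about the mass automated voice recognition and monitoring system it is developing, and whether it has a human rights policy. Human Rights Watch has not yet received a response.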

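“The Chinese government is collecting voice patterns of tens of thousands of people with little transparency about the program, or about laws regulating who can be targeted or how that information is used,” said Sophie Richardson, China director at Human Rights Watch. “In a country where unchecked surveillance and reprisals against government critics persist, authorities can all too easily misuse such data.”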

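In recent years, the Chinese government has expanded its use of biometric technology, including by building massive biometric databases, to bolster its existing large-scale surveillance and social control efforts. Compared with other biometric databases run by the police, voice pattern databases are not yet well established and hold relatively few samples. In Anhui, one of the main pilot provinces the Ministry of Public Security selected for this collection, police had gathered 70,000 voice patterns by 2015. By comparison, the national police database holds the faces of more than 1 billion people and DNA samples from 40 million.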

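The collection of voice biometrics is part of the Chinese government’s push to build “multi-modal” biometric portraits of individuals and to gather ever more information about citizens. Voice biometrics stored in police databases are linked to ID numbers, which in turn may be linked to other biometrics and to personal information such as ethnicity, home address, and hotel stays.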

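In China, it is extremely difficult to have such personal information deleted, to challenge its collection, or to obtain redress for government surveillance. Unlike other types of biometric collection, such as fingerprinting or DNA sampling, people may not know that their voice patterns have been collected, or even that they are under surveillance.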

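According to public proposals and police reports, police collect voice patterns as part of “standardized” and “integrated” “information collection,” along with other biometrics such as fingerprints, palm prints, facial images, and urine and DNA samples.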

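Police officers can subject anyone suspected of “violating the law or committing a crime,” including minor offenses, to this procedure. For example, in one case in Anhui province, police collected the voice patterns of three women suspected of sex work when filing the cases, which included two suspected administrative offenses.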

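No official documents justify the construction and use of such voice pattern databases, but a number of academic papers by the scientists leading their development indicate that the government intends to use voice material to identify speakers while crimes are in progress. Artificial intelligence (AI) programs known as Automatic Speaker Recognition (ASR) systems are used to speed up the matching process.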

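According to state media reports, the government claims that ASR systems matching voice patterns have been used forensically to help solve a number of cases, including telecommunications fraud, drug trafficking, kidnapping, and blackmail. These reports also say the systems are being applied for counterterrorism and “stability maintenance” purposes, terms the authorities have often used to justify the suppression of peaceful expression.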

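As the government strengthens and extends its surveillance network, ordinary people are increasingly being detained for criticizing the government or for activities aimed at social change. Activists and netizens have previously been convicted for peaceful speech on communication tools such as the social media app WeChat.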

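The government has stepped up efforts to enforce “real-name registration” for a wide range of services, such as buying SIM cards for mobile phones, further shrinking the space for anonymity and privacy. Because “real-name registration” is also required to buy public transportation tickets, police have been able to follow activists as they travel by train. Authorities have also installed CCTV cameras outside activists’ homes to intimidate and monitor them.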

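The collection and use of biometric data by governments is not inherently unlawful, and it has at times been justified as a permissible investigative technique. However, to meet the international privacy standards guaranteed by the International Covenant on Civil and Political Rights, which China has signed but not yet ratified, governments must have comprehensive regulations governing the collection, storage, and use of biometric data, and the scope of those activities must be necessary and proportionate to achieving a legitimate security goal.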

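Given the sensitivity of biometric data, government officials should not collect or use it unless it is necessary to investigate a serious crime, and never for minor offenses or administrative purposes such as tracking migrants. Collection and use should be limited to people found to have been involved in a crime, rather than broadly targeting members of the public who are not connected to a specific crime. Collection, use, and retention should not be based on sex, sexual orientation, race, ethnicity, or religious, political, or other beliefs. Individuals have the right to know what biometric data the government holds about them.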

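Technology companies also have a human rights responsibility to ensure that their products and services do not contribute to human rights violations, including violations of the rights to privacy and to a fair trial.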

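“Chinese authorities’ surveillance tools keep multiplying, while protection of privacy rights lags far behind,” Richardson said. “Until there are clear and credible legal protections, Chinese authorities should immediately stop collecting highly sensitive biometric data.”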

More details about iFlytek and the legal framework follow below.

Voice Pattern Database; Automatic Speaker Recognition

In 2012, the Ministry of Public Security started the construction of a national voice pattern database and designated Anhui as one of the pilot provinces.

In 2014, the Anhui provincial police bureau issued an order to accelerate construction of the database. Official tender documents show that police bureaus across the province have since purchased voice pattern collection systems.

In 2016, police bureaus in Xinjiang, a repressive region home to 11 million ethnic minority Uyghurs, made similar purchases of voice pattern collection systems following the “Notice to Fully Carry Out the Construction of Three-Dimensional Portraits, Voice Pattern, and DNA Fingerprint Biometrics Collection System” (关于全面开展三维人像、声纹、DNA指纹生物信息采集系统建设相关工作的通知). A local police station reported that front-line officers are given monthly quotas for biometric collection.

Police and media reports also indicate that police units have been constructing voice pattern databases in Guangdong province, Anqi county in Fujian province, Wuhan city in Hubei province, and Nanjing city in Jiangsu province.

Human Rights Watch also found that police have collected voice patterns of ordinary citizens. For example:

A police station in Xuancheng city, Anhui province, stated on April 27, 2017, that it is collecting voice patterns along with fingerprints and blood samples of migrant workers “to effectively grasp the actual situation regarding the migrant population”;
In Bole city, Xinjiang Uyghur Autonomous Region, an office responsible for managing domestic migrants described 14 new voice pattern collection systems in its 2016 annual report as part of its efforts to “strengthen the collection of information on migrants”;
Two separate police reports, dated April and May 2017, from Zhengzhou city, Henan province, note that the voice patterns of Uyghur migrants in their jurisdictions have been collected, along with other biometrics;
Human Rights Watch has previously documented that Xinjiang passport applicants are required to submit biometrics, including a voice pattern sample, to the police.

A February 2017 report by the news website The Paper, since deleted inside China but still available on the overseas website China Digital Times, described how Anhui police were piloting an Automatic Speaker Recognition system to monitor phone conversations in real time, automatically picking out the targeted voice patterns of individuals and alerting the police:

A woman in Huainan, Anhui, received a scam call … just as the scammer was instructing her, step-by-step, how to transfer her money … the voice pattern recognition system, recognizing the scammers’ voice patterns, alerted the police; the police then directly cut off the phone conversation.

The technology is integrated into a surveillance system put in place by iFlytek and an unnamed local telecommunications company.

iFlytek

iFlytek, based in Anhui province, is a major artificial intelligence company focused on speech and speaker recognition. iFlytek’s website touts the company’s achievement in developing the country’s first “mass automated voice recognition and monitoring system.” The website also states that the company has helped the Ministry of Public Security build a national voice pattern database. iFlytek is also the designated supplier of the voice pattern collection systems purchased by the Xinjiang and Anhui police bureaus. It says that, jointly with the ministry’s forensics center, it has set up a ministry key laboratory for intelligent voice technology (智能语音技术公安部重点实验室) that has “helped solve cases” in Anhui, Gansu, Tibet, and Xinjiang. The company states that it can develop artificial intelligence systems capable of handling minority languages, including Tibetan and Uyghur.

iFlytek’s website also claims it has developed other audio-related applications, including “keyword spotting” for “public security” and “national defense” purposes. The web page gives no further details of what these keywords or the security threats might be. In a patent it filed in August 2013, iFlytek states that it has developed a system to discover “repeated audio files” in the telecommunications system and on the internet that may be useful in “monitoring public opinion”:

[Such a system] … which can automatically pick up, from a massive amount of audio information, audio clips that appear repeatedly is very important in information security and in monitoring public opinion.… For audio information on the phone [system], the use of the technology can quickly find illegal phone recordings that are being transmitted. For audio and video data on the internet, the technology can quickly and accurately dig out the most popular audio and video clips.

iFlytek has a joint laboratory with the Department of Electronic Engineering at Tsinghua University. The department has a long history of developing speech and speaker recognition for automated telephone surveillance, and is a major player in the Golden Shield Project, the Ministry of Public Security’s ambitious plan to bolster and broaden surveillance using technology.

iFlytek also offers a range of commercial text-to-speech and speech recognition applications for mobile phones, including a voice assistant app for Android phones in China. The company states that it has 890 million users, a user base that would provide a large speech data set that could be used to train and improve its speech recognition software for a range of purposes, potentially including surveillance.

It is unclear to what extent iFlytek shares the personal information it collects for commercial purposes with the Ministry of Public Security. While iFlytek promises confidentiality in its customer privacy statement, it also says that it may disclose personal information “according to the demands of the relevant government departments.” China’s Cybersecurity Law requires companies to provide undefined “technical support” to security agencies to aid in investigations, and provides no privacy protections against state surveillance. iFlytek is not required to inform users of government information requests, for example.

During the 2014 annual meeting of the National People’s Congress (NPC) – China’s rubber-stamp legislature – Liu Qingfeng, chairman of iFlytek and a deputy to the NPC, urged the authorities to “employ big data in countering terrorism as soon as possible, and to speed up the construction of the voice pattern database … to protect national security.”

Other governments have used automated speaker recognition programs, including the United States for monitoring prison calls and Australia for verifying callers accessing social services; the Spanish police hold more than 3,500 voice samples from people convicted of crimes.

While some governments pursue voice pattern collection for identification or authentication in limited situations, applying such technology to crime control and surveillance poses significant challenges. The accuracy of an Automatic Speaker Recognition system is affected by the circumstances in which the speech occurs, including the speaker’s emotional state.

According to a speech recognition expert who spoke to Human Rights Watch but did not wish to be named, a system’s ability to conduct real-time surveillance is also limited: with current technology, such a system can “listen” to at most 50 phone lines at a time to trace a single targeted voice. False positives, in which the system incorrectly matches a voice to a stored voice pattern, could have severe consequences when the technology is used to investigate and prosecute crimes, especially in countries such as China, where the conviction rate is above 99 percent and few effective redress mechanisms exist.
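The expert’s concern about false positives can be made concrete with simple base-rate arithmetic: when a system screens many calls in which the targeted voice is rare, even a small false-match rate produces far more wrong alerts than correct ones. The sketch below is illustrative only. The function name and every input (the number of screened calls, the number of calls involving the target, and the match and false-match rates) are hypothetical assumptions, not figures reported for iFlytek or any Chinese police system.

```python
from typing import Tuple

# Hypothetical base-rate sketch: of the alerts a speaker recognition
# system raises, how many concern the targeted speaker?
# All inputs are illustrative assumptions, not measured values.


def alert_breakdown(screened_calls: int, target_calls: int,
                    true_match_rate: float,
                    false_match_rate: float) -> Tuple[float, float, float]:
    """Return (expected true alerts, expected false alerts, share of alerts that are correct)."""
    non_target_calls = screened_calls - target_calls
    # Calls that really involve the target and are correctly flagged.
    true_alerts = target_calls * true_match_rate
    # Calls by other people that are wrongly flagged as the target.
    false_alerts = non_target_calls * false_match_rate
    total_alerts = true_alerts + false_alerts
    # Precision: fraction of all alerts that point at the right person.
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


if __name__ == "__main__":
    # Assumed scenario: 1,000,000 screened calls, 100 of them involving the
    # target, a 95% chance of flagging the target, and a 1% chance of
    # wrongly flagging anyone else.
    tp, fp, precision = alert_breakdown(1_000_000, 100, 0.95, 0.01)
    print(f"true alerts: {tp:.0f}, false alerts: {fp:.0f}, precision: {precision:.2%}")
```

Under these assumed inputs, the function yields about 95 correct alerts against roughly 9,999 false ones, so fewer than 1 alert in 100 would concern the targeted speaker. Real error rates for the systems described here are not reported, so the sketch shows only the shape of the problem, not its actual size.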

Governments and private sector companies alike face additional challenges in securing large-scale biometric databases. These can become prime targets for cybercriminals, who could attempt to breach them to acquire biometrics to commit identity theft and fraud. Unlike with a national ID number or password, people cannot generally change their voice, face, or other biometrics, and so they may be left with little recourse or protection if such data is breached.

A booth displays face recognition software at an exhibition in Beijing, China, September 27, 2017. © 2017 Reuters

Biometric Collection and Wiretapping in Chinese and International Law

Chinese law appears to limit police collection of biometric samples to people connected to the investigation of a specific criminal case. Article 130 of the Criminal Procedure Law (CPL) states that in the course of criminal investigations, to “ascertain certain features, conditions of injuries, or physical conditions of a victim or a criminal suspect, a physical examination may be conducted, and fingerprints, blood, urine and other biological samples may be collected. If a criminal suspect refuses to be examined, the investigators, when they deem it necessary, may conduct a compulsory examination.”

But there are no legal guidelines or limitations on how long biometric samples can be stored, shared, or used, or how their collection or use can be challenged. While there are Ministry of Public Security internal departmental rules that focus on the administrative and technical aspects of voice pattern collection, most are not publicly available.

The collection of biometrics from migrants may also be taking place outside the law. While there are provincial-level rules authorizing local governments to collect migrants’ “basic data,” they do not explicitly include biometrics as part of the collected data.

Chinese law also does not authorize the police to collect individuals’ biometric data in cases of administrative offenses, though this may be changing. In early 2017, the Chinese government issued draft amendments to its Public Security Administrative Punishments Law that include a new provision, article 112, which would authorize police to collect biometrics to identify victims and offenders in minor administrative cases.

Article 148 of the Criminal Procedure Law allows criminal investigators, in cases involving serious crimes such as endangering state security, terrorism, organized crime, drug-related crimes, and corruption, to wiretap criminal suspects as well as anyone connected to the crime. Such wiretapping does not require a court warrant – under the law, approval from supervisors in the relevant criminal investigation unit is sufficient.

The National People’s Congress should review and revise legislation relevant to biometric data collection and wiretapping to ensure that it complies with standards under the International Covenant on Civil and Political Rights. These standards must be part of a legal framework ensuring that the collection, use, and retention of such data a) are necessary, in the sense that less intrusive measures are unavailable; b) are appropriately restricted so that the action is proportionate to a legitimate purpose such as public safety; and c) do not impair the essence of the right to privacy and other related rights.

To ensure these standards are enforced, any biometric data program should also include independent authorization for collection and use, public notification, and means of independent oversight, as well as avenues for people to challenge abuses and obtain remedies. The authorities should also publish information about the collection and use of voice pattern recognition technology, including disclosing which databases have been created and the specific searches conducted on them.

iFlytek should cease technology transfers and support for surveillance systems provided to the Ministry of Public Security and provincial authorities until regulations are in place that ensure privacy and other human rights are protected. Technology companies should refrain from sharing voice pattern or other personal information collected for commercial purposes with security agencies without a specific court warrant targeting an individual under suspicion of a serious crime.

Companies should not use voice patterns collected for commercial purposes to train or otherwise develop technology for surveillance, as information collected from individuals for one purpose should not be used for another without their consent. Companies should also submit voice recognition technology developed for surveillance applications to public, independent accuracy competitions and publish the performance results, including tests that address accuracy for ethnic minority languages and potential algorithmic bias that would affect minorities.