Some observations on the Great Firewall of China

Suddenly, without any warning:
Aug  8 02:04:32 neolith1 sshd[12862]: Bad protocol version identification 
Say what?

Aug 20 05:57:34 neolith1 sshd[14803]: Bad protocol version identification 

I work as a security officer at the National Supercomputer Centre at Linköping 
University. It is my job to be paranoid.

I pay a lot of attention to our ssh logs. We have something like fifty thousand 
ssh logins per day, and anywhere up to half a million failed login attempts. I 
don't like seeing things in my logs that I don't understand. It makes me twitch.

I certainly didn't understand those log entries that suddenly started 
appearing. From out of the blue, Chinese IP addresses would connect to the ssh 
port on one of our systems and throw what looked like random bytes at it. Each 
address would connect just once. We had never seen those IP addresses before, 
and they would go away again, never to return.

The plot thickens: Moreover, it seemed that nobody had seen these addresses in 
any relevant context. I checked all of my usual sources for information on 
botnets, on ssh brute-force scanners, on machines exhibiting generally
anti-social behaviour. Nothing. These addresses were clean. Nobody had seen so
much as an insulting blog post comment from them.

I started discreetly contacting colleagues at other sites, to check if they saw 
anything similar. Again, nothing. Oh, sometimes people would see some "Bad 
protocol version identification" messages, but on closer look there was always 
some simple explanation. Perhaps somebody had run an nmap or Nessus scan
against them - these scans are easily recognizable. Sometimes, somebody was 
scanning for open http proxies. But this stuff? Nope.

Except, a few select colleagues did see the same thing. Chinese addresses, 
making one-shot connections to port 22 on a small number of target systems, 
sending pseudo-random binary data. To date, I am aware of four sites
around the world that have received these probes. In all likelihood, there must
be more targeted sites, but I haven't found them.

This really made my admin senses tingle. Were we compromised? Were these weird 
payloads triggers for some kind of backdoor on our system? I will spare you the 
details, but the level of scrutiny I subjected the system to was, well, 
scrutinous. I found nothing. Unfortunately, this didn't necessarily mean that 
we weren't compromised. It might just mean that I hadn't looked hard enough. So 
I looked some more. Once again, nothing.

We captured some of the probe payloads in full. I could see no pattern in the 
data, no kind of protocol that I could recognize. Was this some kind of 0day 
sshd exploit? I threw the data at a carefully monitored sshd process, to see if 
it would trigger some unexpected code path. Nothing.

Eureka: Then, almost a year after I saw that first probe, I was looking at the 
log data from yet another angle to try to figure this out, and I finally saw 
the pattern. Epiphany. I felt like a genius, and at the same time like the 
world's greatest idiot.

The blindingly obvious thing I had been missing for a year was that each and 
every one of these probes was followed after 5-20 seconds by an ssh login 
attempt from China, and the attempts were mostly successful and apparently 
legitimate. The legitimate logins came from somewhere else entirely, but
still in China.

What can I say? We have lots of Chinese users. We have lots of logins from 
China, nothing strange about that. In our systems, 5-20 seconds means several 
pages of log data to scroll through. Still, I should have caught this much
earlier.

My colleagues at the other targeted sites confirmed my observations. Once they
knew what to look for, they saw the same thing. My only consolation is that 
those very clever people also hadn't found this correlation.

So, to more precisely describe what we have found: a small subset of the ssh 
logins from Chinese IPs to two of our systems are preceded by one or two 
connections from unrelated Chinese IP addresses, in which opaque binary data is 
thrown at sshd. These addresses come from all over China, from all sorts of
networks.
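
The correlation described above can be sketched in a few lines of Python. This
is only an illustration, not the tooling actually used: the record layout, the
function name correlate, the example addresses and the user names are all
made up, and the 5-20 second window is simply the interval mentioned earlier.

```python
from datetime import datetime, timedelta


def correlate(probes, logins, min_gap=5, max_gap=20):
    """Pair each probe with logins that follow it by min_gap..max_gap seconds.

    probes: list of (timestamp, source_ip)
    logins: list of (timestamp, source_ip, user)
    Both are hypothetical, already-parsed records, not a real sshd log format.
    """
    lo = timedelta(seconds=min_gap)
    hi = timedelta(seconds=max_gap)
    matches = []
    for p_time, p_ip in probes:
        for l_time, l_ip, user in logins:
            # The probe and the login come from different addresses.
            if l_ip != p_ip and lo <= l_time - p_time <= hi:
                matches.append((p_time, p_ip, l_time, l_ip, user))
    return matches


# Made-up example data (documentation addresses, invented user names).
t0 = datetime(2011, 8, 8, 2, 4, 32)
probes = [(t0, "192.0.2.10")]
logins = [
    (t0 + timedelta(seconds=12), "198.51.100.7", "alice"),  # inside the window
    (t0 + timedelta(seconds=90), "203.0.113.5", "bob"),     # too late
]
matches = correlate(probes, logins)
for match in matches:
    print(match)
```

Working on parsed records rather than raw log lines keeps the sketch independent
of any particular sshd log format. The nested loop is quadratic, so on logs of
the size mentioned earlier one would sort both lists by time and scan them
together instead.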

I see no particular pattern in which users are targeted, or what IP addresses
they log in from, and I see no pattern or structure in the opaque data, except
for the following:

For a few weeks around May/June 2011, the probe payloads actually looked like 
SSL handshakes.

We see lots and lots of ssh brute-force attacks from Chinese IP addresses. 
However, these ssh login attempts do not appear to trigger any probes.

So, what is this all about? Well, I have a theory.

Conjecture: This is the Great Firewall of China, working to protect its 
citizens from unlawful foreign influences.

My hypothesis is that just over a year ago, a new function in the firewall went 
into limited beta test, where a sample of outgoing ssh connections from China 
is carefully selected for secondary screening.

I don't know what the selection criteria are, but apparently the probing only 
involves certain Chinese source networks (hence the lack of probes in 
connection to brute-force attacks) and certain target hosts.

For the selected ssh connections, the target system is probed from one or two 
IP addresses under the control of the Chinese government. These may be 
otherwise innocent addresses that are spoofed at the level of the great 
firewall, or they may be actual computers under remote control by the 
government - I have no way to tell.

I don't know what the probes are supposed to accomplish. My only guess is that 
the government is looking for certain services it doesn't approve of, like open 
proxies or Tor relays, and that precise fingerprinting may be too expensive. 
Instead, they resort to an inspection method similar to fuzzing, where 
pseudo-random data is thrown at the server, just to see what happens.

In some cases, the legitimate ssh connections are unsuccessful; they appear to 
be interrupted. This may be a result of the firewall deciding the target system 
to be unsuitable and injecting RST packets into the TCP stream to kill it.

Over the last few weeks, the frequency of the probing has increased. This might mean
the beta test period is nearing its end, and that this function is about to 
become more widely deployed.

Conclusion: I do believe my theory above is more or less correct. I have no way 
to prove it, but it does fit all observed facts (including some which I am 
unable to disclose), and I have no tenable alternative explanations. It also 
matches the known repulsive censorship the Chinese government subjects its 
citizens to. I strongly dislike this probing of our systems that the Chinese 
government appears to be performing.

Of course, I may be mistaken. Any feedback or alternative explanations are 
welcome. You can reach me at ni...@nsc.liu.se.

- Leif Nixon, 2011-11-07

