Matt Williams wrote:
On Aug 12, 2009, at 5:57, Leam Hall <[email protected]> wrote:

What does Java provide that PHP can't do faster and with lighter resource usage?

Concurrency and threading to name a couple...
I've got a system that's gotten complicated enough that it's "outgrowing" PHP. One big advantage in PHP is that you can get more productivity out of rookie programmers. It takes a good programmer 1 1/2 years to be able to produce usable Java, and some programmers never get good at it. The ideas in this system are complicated enough that I think I'd have a hard time hiring another programmer who could handle it, so the simplicity advantage of PHP is gone. I'm starting to want static types so that the compiler is watching my back and so that my IDE can do automated refactoring.

I'm thinking of gradually moving to the JVM but using Scala instead of Java. After 2 years of working in C#, Java really seems like C#--. I mean, even PHP has closures today. Type inference, generics and other features in C# make Java seem like it's going backwards. On the other hand, if I'm doing my own sysadmin or paying somebody to sysadmin my systems, I don't want to be stuck in Windows. I know a lot of people think the type system of Scala is over-complicated, but after 2 years of lover's quarrels with the C# type system, Scala provided the general theory that informs my practice in C#.

I'm interested in logic programming and other inference systems, as well as specialized databases: there's a lot of that written in Java. Java's never quite going to have the efficiency of C, but it's better for systems work than PHP. If I feel the need for scripting there's always Groovy, Jython, etc.

My big beef with the JVM (and the CLR) is the UTF-16 scandal; perhaps I'm a cultural imperialist, but I process lots of text (billions and billions of characters) that is mainly:

(i) us-ascii,
(ii) iso-latin-1, and
(iii) Unicode that is mainly us-ascii with an occasional spattering of iso-latin-1 and other Unicode characters

For me, UTF-8 encodes text at about (1+epsilon) bytes per character; the JVM and CLR encode text at (2+epsilon) bytes per character. A few years ago, when I was stuck on 32-bit machines, that was often the difference between a program that could run in RAM and a program that couldn't. Since text processing is often limited by memory bandwidth, large text-processing programs can run about half as fast on the JVM as they do in UTF-8 based environments.
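To make the (1+epsilon) versus (2+epsilon) figures concrete, here is a minimal sketch in Python (chosen only because it makes encoded sizes easy to measure; it is not the JVM or CLR). The sample sentence and the helper name bytes_per_char are made up for illustration, and the sketch measures encoded size only, not any runtime's actual in-memory string layout.

```python
def bytes_per_char(text, encoding):
    """Average number of encoded bytes per Unicode code point."""
    return len(text.encode(encoding)) / len(text)


# Hypothetical sample: mostly us-ascii with a couple of iso-latin-1 letters.
# Escapes are used so the source file itself stays pure ASCII.
sample = "The caf\u00e9 served 12 na\u00efve customers. " * 1000

utf8_ratio = bytes_per_char(sample, "utf-8")
utf16_ratio = bytes_per_char(sample, "utf-16-le")  # "-le" avoids the 2-byte BOM

print("UTF-8 :", round(utf8_ratio, 3), "bytes per character")
print("UTF-16:", round(utf16_ratio, 3), "bytes per character")
```

For text like this the UTF-8 figure stays just above 1 and the UTF-16 figure is exactly 2, so the same corpus needs roughly twice the memory and twice the memory traffic in a UTF-16 representation.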

What makes it a scandal is that UTF-16 pretends to be a fixed-width encoding when it really isn't. Code that works correctly with, say, English or Japanese can break when you're processing the less common Chinese characters or mathematical symbols that fall outside the Basic Multilingual Plane, because each of those takes two 16-bit code units (a surrogate pair) instead of one. Code written with the fast random access that Java provides doesn't generalize to all languages, so you need to fall back to the same sequential access methods that you use handling UTF-8 in PHP.
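The variable-width problem can be sketched by looking at the raw 16-bit code units. This is an illustrative Python sketch, not JVM code: utf16_units and count_code_points are hypothetical helpers that mimic what a UTF-16 string holds internally, and the sample strings are arbitrary.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units a UTF-16 string would hold for text."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def count_code_points(units):
    """Count characters by scanning sequentially, pairing up surrogates."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A high surrogate followed by a low surrogate is ONE character.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


english = "set A"
japanese = "\u65e5\u672c\u8a9e"   # three common kanji, all one unit each
math = "set \U0001D538"           # U+1D538, a double-struck capital A
rare_cjk = "\U00020000"           # a CJK Extension B ideograph

for label, text in [("english", english), ("japanese", japanese),
                    ("math", math), ("rare_cjk", rare_cjk)]:
    units = utf16_units(text)
    print(label, "code units:", len(units),
          "characters:", count_code_points(units))
```

For the first two strings the unit count and the character count agree, so indexing by position is safe. For the last two they differ: index 4 of the math string lands on half of a surrogate pair, which is why correct code has to scan sequentially, much as it would over UTF-8 bytes.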

A big advantage of PHP for Unicode handling is that it "does no harm": it treats strings as plain bytes and passes them through untouched. I've often seen Java and CLR systems fail seriously because of limitations in how they handle Unicode characters, particularly when dealing with junky input data.
_______________________________________________
New York PHP User Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/show_participation.php
