This content was published by Andrew Tomazos and written by several hundred members of the former Internet Knowledge Base project.

Sun and Java 5 Language Proper

Welcome to the sixth edition of the IKB Newsletter.

We want to talk about a company called Sun Microsystems, Inc., and one of their products called Java.

Sun is an American corporation listed on the NASDAQ stock market.

Most of the big money in the computing industry (at least in software) is on the NASDAQ -- and most computing corporation headquarters are located somewhere on the west coast of the USA, many near San Francisco. The Bay Area and the nearby Santa Clara Valley are sometimes called "Silicon Valley" because of this.

++ COMPUTING CORPORATIONS ++

If you assume that a share in a company is worth the price people currently buy and sell it at -- and then multiply that price by the number of shares for that company that exist -- you arrive at the total value of the company.

This number is called a company's market capitalization. It gives you an idea of how much power and size a company has, because there are ways for the company to leverage its capitalization to invest in things (like hiring armies of people to do work for them).

Sometimes a company has a high market capitalization because people expect that it will make a big profit in the future, other times because it is doing well right now.

You can tell the difference by simply dividing the market capitalization by the yearly profit the company is making. This is called the P/E ratio (price to earnings). If this number is high it means that people expect the company will make a bigger profit in the future than it is making now. But just because most people expect something to happen, doesn't mean that it will. There is a lot of money in proving people wrong in this respect.

For example, Microsoft’s market cap is about $300 billion (US dollars). Their P/E is quite low at 25. This is one of the biggest computing companies. The low P/E indicates that people expect it to get smaller.

Apple’s market cap is about $40 billion. Their P/E is about average at 40. This is a medium-sized corporation, and people expect it to maintain its size.

Google’s market cap is $80 billion. Their P/E is high at about 90. This is a big corporation by virtue of the fact that people expect it to have a big future.

Some others quickly:
Yahoo: $50 billion and 30;
IBM: $130 billion and 15;
Cisco: $120 billion and 20.

And finally Sun Microsystems, Inc.: $15 billion and 80.
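For the curious, the P/E arithmetic can be run backwards: divide a market capitalization by its P/E and you get the yearly profit those two numbers imply. Here is a rough sketch of that in Java. The class and method names are made up for this illustration, and the inputs are only the approximate figures quoted above, so treat the results as ballpark.

```java
// Hypothetical helper for illustration only; not part of any library.
class PriceToEarnings {
    // P/E = market cap / yearly profit, so yearly profit = market cap / P/E.
    static double impliedAnnualProfit(double marketCap, double peRatio) {
        return marketCap / peRatio;
    }

    public static void main(String[] args) {
        // Approximate figures quoted in this article, in US dollars.
        double sun = impliedAnnualProfit(15e9, 80);
        double microsoft = impliedAnnualProfit(300e9, 25);

        System.out.println("Sun implied yearly profit (millions): " + sun / 1e6);
        System.out.println("Microsoft implied yearly profit (billions): " + microsoft / 1e9);
    }
}
```

On these rough numbers Sun's implied profit is a small fraction of Microsoft's, which is one way to see how much of Sun's valuation rests on expectations.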


++ SUN MICROSYSTEMS, INC. ++

Sun is a medium-sized corporation with a relatively long history. They’re very network-focused, and do a lot of things. They are strong in enterprise computing and do end-to-end hardware and software; however, they are best known for being the birthplace and still the copyright holder of Java.

The brand-name "Java" became wildly popular, and now Sun has started naming everything Java to cash in on it. So as not to confuse things, we are here to talk about Java the computer programming language.

++ JAVA ++

Java is first and foremost a computer programming language. The language is tightly integrated with a very large set of standard libraries. Libraries are reusable source code (sometimes already compiled) that a programmer can reference when writing a new program, rather than having to write everything from scratch.

When Java source code files are compiled from human-readable form into binary form for the computer, the binary form is not run directly -- it is run by a Java interpreter program.

There are Java interpreters for all the major operating systems, and chances are one has quietly made it on to your system already.

Even though a Java program runs a bit slower because it is not directly executed by your computer, the cool thing is that any compiled Java program will run as-is on any operating system.

This is a huge win for software developers because it means that they don’t have to prepare different versions for different operating systems. Java's tagline is "Write once, run anywhere."

It also allows a layer of security to be enabled. This means that by default a Java program you download over the internet can be stopped from accessing your local hard drive, or from using the network. This helps protect your system from spyware. Java is by far the most powerful and flexible platform that has this layer of security.

Sun is constantly developing Java, and last year a major new version was released.

[Begin rant.] The version naming scheme (or lack thereof) that Sun is using is a bit of a corporate comedy act. Originally there was “Java 1.0” and then “Java 1.1”, then came “Java 2.0 or 1.2”, then it forked into three flavors called J2SE, J2EE and J2ME. They went “1.3 or 2.0” and “1.4 or 2.0” and then last year there was “J2SE 1.5 or 5.0 or 2.0” and now instead of “J2EE 1.5 or 5.0 or 2.0” we are going to have Java EE 5, and presumably next year Java SE 6 and Java ME 6. Next they are going to realize that the E is redundant and we are going to have JavaS7, JavaE7 and JavaM7. I hate to think what the marketing department is going to do when they hit 10. [End rant.]

Anyway, for a bit of fun we are going to go through some of the changes that were made to the Java Language Proper (as opposed to the libraries or the interpreter) from version 4 to version 5. A lot of people formed an opinion about the Java Language Proper based on the older versions, and there were some major changes that were interesting.

I also think it would be fun for the non-programmers to take a look inside the evolution of programming language design.

++ THE PROBLEM ++

The chief complaint about Java was its verbosity. Java is very closely related to and largely based on an older programming language called C++. The Java language designers fixed a lot of the problems with C++. Java is also a lot easier for a computer to understand (to parse). Along with these pluses, they introduced one major new flaw.

Almost the entire problem with the early language proper was that it simply took too many characters to say one thing. Sun calls this drudgery or writing boilerplate code. There are fine balances in the details of language design. Let's go through some of the specific changes to get a feel for that.

++ STATIC IMPORT ++

In Java, nearly all symbols are bundled up into “object classes.” Like a number of languages, this allows you to put a set of related things under a unified name. For example, you could put all of your math-related stuff under the name Math. One of those things might be a function for getting the cosine of a number, which you name cos. Then to refer to it you just say, for example Math.cos, which refers to the cos thing under Math.

The problem was that if you then want to get "the cosine of x cubed, times 5, rounded down to the nearest integer," you had to type “Math.floor(5*Math.cos(Math.pow(x,3)))” which is a little silly, and a mouthful. In most other modern languages you could just type “floor(5*cos(x**3))” which is a lot saner.

In Java 5 this is fixed. You simply type “import static Whatever.*” at the top of your program and all the symbols from Whatever will be available without putting “Whatever.” in front.
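Here is a small sketch of the before and after, using the real java.lang.Math class from the standard library. The class name StaticImportDemo and the two method names are just made up for the example. It imports the three functions by name; “import static java.lang.Math.*;” would pull in everything at once.

```java
import static java.lang.Math.cos;
import static java.lang.Math.floor;
import static java.lang.Math.pow;

class StaticImportDemo {
    // Old style: every call carries the "Math." prefix.
    static double verbose(double x) {
        return Math.floor(5 * Math.cos(Math.pow(x, 3)));
    }

    // Java 5 style: the static imports at the top make the prefix unnecessary.
    static double concise(double x) {
        return floor(5 * cos(pow(x, 3)));
    }

    public static void main(String[] args) {
        double x = 2.0;
        // Both versions compute exactly the same thing.
        System.out.println(verbose(x) == concise(x)); // prints "true"
    }
}
```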

++ AUTOBOXING ++

Quick definition: The term "bit" is short for “binary digit,” which is always either a 1 or a 0. From now on when we say bit, we mean a binary digit. Binary data is a sequence of bits, or a sequence of ones and zeroes.

In a program you have small clumps of bits which are used to represent things.

The smallest chunks of bits you deal with directly usually come in lots of between 8 and 64. Within a 64-bit floating-point number we can represent, with fairly good accuracy, numbers so large they could count every atom in the universe (10 to the power of 80 or so).

As it turns out there are real applications for larger and more accurate numbers, but most of us will never need them. (Interestingly, they are a standard part of the Java library under the names BigDecimal and BigInteger.)
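To make the sizes concrete, here is a short sketch that prints the limits of the 64-bit types and then builds 10 to the power of 80 exactly with BigInteger. The class and method names are invented for the example; the library calls are standard ones.

```java
import java.math.BigInteger;

class BigNumbers {
    // 10 to the power of 80, stored exactly with no rounding.
    static BigInteger atomsEstimate() {
        return BigInteger.TEN.pow(80);
    }

    public static void main(String[] args) {
        // Largest whole number a 64-bit "long" holds exactly: roughly 9.2 * 10^18.
        System.out.println(Long.MAX_VALUE);

        // Largest 64-bit floating-point "double": roughly 1.8 * 10^308,
        // but only about 15 to 17 significant digits are kept.
        System.out.println(Double.MAX_VALUE);

        // A 1 followed by 80 zeroes, every digit exact.
        System.out.println(atomsEstimate());

        // How many bits it takes to hold that number exactly.
        System.out.println(atomsEstimate().bitLength());
    }
}
```

So a 64-bit whole number falls well short of 10 to the power of 80, a 64-bit floating-point number reaches it only approximately, and BigInteger gets there exactly by using as many bits as it needs.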

Apart from a number, the other exciting things these small “primitive type” bit spaces can hold are: (1) a single character symbol from almost any known human language, (2) the data store address of another piece of data, or (3) a raw instruction for your computer's processor.

In Java there are eight different types of these chunks you can use. We give each type a name. The type “int” represents 32 bits interpreted together as a number. The type “char” represents a 16-bit encoded character symbol. The type “byte” represents an 8-bit number. And so on.

For example, when I want to work with a number, I can say “int x = 3”. This gives me a symbol “x” representing a 32-bit chunk with the number 3 stored on it.
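A quick sketch of a few of these types in use follows. The class name and the variable names other than x are made up for the example; the SIZE constants are part of the standard library and report how many bits each type occupies.

```java
class PrimitiveSizes {
    public static void main(String[] args) {
        int x = 3;                 // a 32-bit chunk holding the number 3
        char letter = 'J';         // a 16-bit chunk holding a character code
        byte small = 100;          // an 8-bit number, from -128 to 127
        long large = 9000000000L;  // a 64-bit number, too big to fit in an int

        System.out.println("int:  " + Integer.SIZE + " bits, x = " + x);
        System.out.println("char: " + Character.SIZE + " bits, letter = " + letter);
        System.out.println("byte: " + Byte.SIZE + " bits, small = " + small);
        System.out.println("long: " + Long.SIZE + " bits, large = " + large);
    }
}
```

The other four primitive types are short, float, double and boolean.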

Out of the eight primitive types we need to build more complex types by combining them. For example, I might want to represent a point on a plane (x,y).

One way I can do that is by chopping a 32-bit number in half and using the left part for the x and the right part for the y. It would be really fast for the computer to work with, and most likely what we would do in assembler, but it is messy to work with, and the approach doesn't scale well.
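For the programmers, the chopping-in-half trick might look roughly like this. It is only a sketch: the class and method names are invented, and it assumes both x and y fit in 16 bits (between -32768 and 32767), which is exactly the kind of limit that makes the approach scale badly.

```java
class PackedPoint {
    // Put x in the upper 16 bits and y in the lower 16 bits of one int.
    // Assumes both values fit in 16 bits; anything larger is silently mangled.
    static int pack(int x, int y) {
        return (x << 16) | (y & 0xFFFF);
    }

    // Shifting right by 16 brings the upper half back down, sign included.
    static int unpackX(int packed) {
        return packed >> 16;
    }

    // Casting to short keeps only the lower 16 bits and restores the sign.
    static int unpackY(int packed) {
        return (short) packed;
    }

    public static void main(String[] args) {
        int p = pack(3, 5);
        System.out.println(unpackX(p)); // prints 3
        System.out.println(unpackY(p)); // prints 5
    }
}
```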

Another way I could do it is by saying “int x = 3” and “int y = 5” to store (3,5), but I want a way to combine them together under one handle.

I could create an “object class” called Point, and then put “int x = 3” and “int y = 5” in there. That is closer to how it's done.

Actually, what we do is put “int x” and “int y” into an “object class” Point. Then when we want a point we say “Point p”, and that creates a symbol p I can toss around. The symbol p has its own x and y inside it.

If we want a second Point we can just say "Point q" and now we have two without having to declare "int x" and "int y" again for "q".

When we want to get at the individual xs and ys again we just say “p.x”, “p.y”, "q.x" or "q.y". The cool part is that they are combined into one symbol for the rest of the time when we don't need to access the individual parts.

By doing this I have created a new “type” of symbol. There are a variety of ways of making new types that are based on other types. You can also put chunks of code into the object class that can make changes to x and y. For example, I could make a function called “rotate” in Point that rotated the point around the origin (0,0). To call it on the point p I just say “p.rotate().”
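Here is roughly what that Point could look like in real Java. Two things in this sketch go beyond the description above and should be read as assumptions: the object is actually created with the keyword “new”, and “rotate” is taken to mean a quarter turn counter-clockwise, chosen only because it keeps x and y as whole numbers. PointDemo is a made-up name for the class that tries it out.

```java
class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Assumed behavior: rotate a quarter turn counter-clockwise
    // around the origin (0,0), so (x, y) becomes (-y, x).
    void rotate() {
        int oldX = x;
        x = -y;
        y = oldX;
    }
}

class PointDemo {
    public static void main(String[] args) {
        Point p = new Point(3, 5);
        Point q = new Point(1, 0); // a second point with its own x and y

        p.rotate();
        System.out.println(p.x + "," + p.y); // prints -5,3
        System.out.println(q.x + "," + q.y); // prints 1,0 (q is untouched)
    }
}
```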

These ideas spawned what is called the “object-oriented programming paradigm.” There is a huge body of knowledge on the subject. Java, like C++, is an object-oriented programming language. Many of the features of the language center around building objects from other objects to make more complex objects.

There are also several ways to put up "firewalls" between different objects so that they cannot break each other. This supposedly reduces complexity by putting things into small modular pieces with well-defined connections between them.

Anyway, all object classes in Java are based on one type called “Object.” Objects can do basic things like copy themselves, and determine if one is equal to another one.

The eight primitive types (int, char, etc.) are not object classes. Each of the primitive types has a corresponding object class built for it based on Object. They are called things like “Integer” for the “int” and “Character” for the “char.”

Why have both kinds?

The problem was that a lot of the Java language likes dealing with object classes and not primitive types. On the other hand all this object-fuss is slow, because the computer deals with bits directly in their natural state, not these “objects.” The computer can work with non-object primitive types a lot faster.

So the Java language designers made both available.

This duality led to annoying conversions between the primitive types and their corresponding object classes. You ended up typing “new Integer(x)” or “x.intValue()” when you should just be typing “x”.

In Java 5 these conversions are made automatically. It's called "autoboxing" because the object is like a box we wrap around the primitive type. The compiler either automatically boxes up a primitive into an object type -- or does the opposite -- automatically unboxing one of these objects back into its primitive type.
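A small sketch of the difference, with made-up class and method names. One assumption to flag: the old-style version uses “Integer.valueOf(x)” to do the boxing, because the “new Integer(x)” form mentioned above is marked as deprecated in recent Java versions.

```java
class AutoboxingDemo {
    // Old style: box and unbox by hand.
    static int doubleItOldStyle(int x) {
        Integer boxed = Integer.valueOf(x); // wrap the int in an object
        int unboxed = boxed.intValue();     // take the int back out
        return unboxed * 2;
    }

    // Java 5 style: the compiler inserts the same conversions for us.
    static int doubleItNewStyle(int x) {
        Integer boxed = x;   // autoboxing
        int unboxed = boxed; // automatic unboxing
        return unboxed * 2;
    }

    public static void main(String[] args) {
        System.out.println(doubleItOldStyle(21)); // prints 42
        System.out.println(doubleItNewStyle(21)); // prints 42
    }
}
```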

++ FOREACH ++

Some objects represent a group of objects. Sometimes you want to do something with each member of a group individually. Before you had to manually create an "Iterator" object that helped you do this.

For example if you had a “GroupOfPoints gp” you had to say something like “for (Iterator iter = gp.iterator(); iter.hasNext(); ) { Point p = (Point) iter.next(); ...” which is quite a mouthful.

Now you just say “for (Point p : gp) { ...” which means “for each Point p in gp do... something”.

Much more sensible, less typing and nicer to look at.
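Side by side in a complete program, the two styles look something like this. It is a sketch only: a standard List of Points stands in for the “GroupOfPoints” above, and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class ForEachDemo {
    // Old style: create the Iterator by hand and cast each member back to Point.
    static int sumOfXOldStyle(List gp) {
        int total = 0;
        for (Iterator iter = gp.iterator(); iter.hasNext(); ) {
            Point p = (Point) iter.next();
            total += p.x;
        }
        return total;
    }

    // Java 5 style: "for each Point p in gp".
    static int sumOfXNewStyle(List<Point> gp) {
        int total = 0;
        for (Point p : gp) {
            total += p.x;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Point> gp = new ArrayList<Point>();
        gp.add(new Point(3, 5));
        gp.add(new Point(1, 2));

        System.out.println(sumOfXOldStyle(gp)); // prints 4
        System.out.println(sumOfXNewStyle(gp)); // prints 4
    }
}
```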

++ GENERICS ++

This is going to be tricky. The Generics feature allows you to do what is called parametric polymorphism. C++ has been able to do this since forever. Originally Sun left it out of Java, either out of laziness, or because they genuinely thought it was unnecessarily complex. Turns out they backflipped and have made it part of Java 5.

Parametric polymorphism means literally that you can change something based on a parameter. It’s a silly name. The feature should be called “type arguments.”

What it allows you to do is use an “object type” as a parameter to another “object type.” In doing so a new special-purpose type is created.

Example: rather than the GroupOfPoints example above, we define a Group in general and then just pass the symbol “Point” as the type it holds.

In the definition of Group you can just use some symbol “T” in all the places where you would have used “Point” in “GroupOfPoints.”

That way, when you need a GroupOfFrogs you can just say “Group (of Frogs)” and not have to rewrite Group to handle Frogs.

As it turns out the way you write it is just “Group<Point>” or “Group<Frog>.” You can use such a symbol anywhere you can use a normal symbol like “GroupOfPoints.”

Prior to Java 5 the way to do this was to make a GroupOfObjects class that dealt with Objects. Because everything is based on an Object, it works.

The problem is that you had to change stuff back to its original type when you took it out of the Group which meant more typing -- and it also meant you could accidentally put stuff into the Group of the wrong type.
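A bare-bones sketch of such a Group is below. Everything in it (Group, Frog, the method names) is hypothetical and written just for this article; the real Java library has its own, much richer collection classes.

```java
import java.util.ArrayList;
import java.util.List;

// T is a placeholder for whatever type this particular Group holds.
class Group<T> {
    private final List<T> members = new ArrayList<T>();

    void add(T member) {
        members.add(member);
    }

    T get(int index) {
        return members.get(index);
    }

    int size() {
        return members.size();
    }
}

class Frog {
    final String name;

    Frog(String name) {
        this.name = name;
    }
}

class GenericsDemo {
    public static void main(String[] args) {
        Group<Frog> frogs = new Group<Frog>();
        frogs.add(new Frog("Hopper"));

        // No cast needed: the compiler knows this Group holds Frogs.
        Frog first = frogs.get(0);
        System.out.println(first.name); // prints Hopper

        // frogs.add("not a frog"); // would be rejected by the compiler
    }
}
```

The commented-out line is the point of the whole feature: the mistake is caught when the program is compiled, rather than when it is running.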

++ OTHER NEW FEATURES ++

The other new features are:

(a) “varargs” which allows you to write a function that takes a variable number of arguments (also reintroduced from C and C++).

(b) good enumerated types, which allow you to write in a simple and safe way types with only a few named values (a “Season” for example, can hold a value of “Winter,” “Summer,” “Spring,” or “Fall” and nothing else). Before you had to do this with int symbols and map each to a number manually.

(c) “annotations” which are too complicated to explain in a way everyone would understand. They are a convenience that let you add meta-data to source code in the file itself. Normally you would have to keep them in a second file.
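For the programmers, here is a compact sketch touching all three. The names are invented for the example, and the only annotation shown is the standard “@Override”, which is about the simplest one there is.

```java
// (b) An enumerated type: a Season can only ever be one of these four values.
enum Season { WINTER, SPRING, SUMMER, FALL }

class OtherFeaturesDemo {
    // (a) Varargs: callers may pass any number of ints, including none.
    static int sum(int... numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    // Enumerated values can be used directly in a switch statement.
    static String describe(Season s) {
        switch (s) {
            case WINTER: return "cold";
            case SUMMER: return "hot";
            default:     return "mild";
        }
    }

    // (c) An annotation: @Override asks the compiler to check that this
    // method really does replace one inherited from Object.
    @Override
    public String toString() {
        return "OtherFeaturesDemo";
    }

    public static void main(String[] args) {
        System.out.println(sum());                   // prints 0
        System.out.println(sum(1, 2, 3));            // prints 6
        System.out.println(describe(Season.WINTER)); // prints cold
        System.out.println(new OtherFeaturesDemo()); // prints OtherFeaturesDemo
    }
}
```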

++ SUMMARY ++

There are many different ways in which you can control a computer. Beyond the tedious use of assembly, there are a number of common industrial languages like C, C++, Java, C#, Visual Basic, Perl, Python and even Ruby that allow fine-grained authoring of a range of programs.

A developer has to choose between many different tools beyond simply the details of the programming language. Indeed a lot of programmers find the features of the languages themselves not as relevant as the rest of these choices.

Some find the “programmer's assistant programs” more interesting (like the IDE), others the sturdiness and speed of the libraries they will reuse. Still others find the style of programming more interesting (the programming paradigm).

In any case, Java is a popular component of many developers’ work-flow, and it is worth knowing about. This is neither a recommendation nor a criticism.

More generally interesting is language design as a field of study. It is interesting to see the factors involved in getting a human and a computer able to “talk to” each other.

++ DISCLAIMER ++

The IKB is in no way associated with Sun Microsystems, Inc. Some employees of Sun have chosen to sign up to the IKB. This article is written without the approval, endorsement or consent of Sun (or of the IKB members who are also Sun employees). The IKB is an independent project and by policy does not accept advertisements of any kind.

Comment from David Salzman: Sun's P/E may be high because of its leadership in software, but its revenues come primarily from hardware. This presents a conundrum: Becoming more of a software company and less of a systems (including hardware) company would make it *less* valuable.

Comment from Aaron Daves: Does the same logic apply to Apple?

[Morley Chalmers: these IKB articles are becoming steadily more and more interesting (and simultaneously better and better written). I found this one particularly apt since I'm currently working within Servoy, a SQL front end written in Java, within which I work with JavaScript. While JavaScript bears no real relationship with Java other than a confusingly similar name and object-orientedness, I found the above very helpful in grasping the concepts of languages and what's going on within the environment I'm working within. Much appreciation.]
