Servlet internationalization

Many web servers also support transparent solutions, where a single URL can be used to view the same content in multiple languages, with the language chosen according to the client's preferences or location.

For example, the United Nations website can be read in English, German, French, Slovenian, and other languages. Which language you see depends on how you have configured your browser. You may get the impression that dynamic translation is occurring, but the server is simply choosing among static copies of the content that are available in several different languages.
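The selection step described above can be sketched with the standard java.util.Locale matching API. This is only an illustration under stated assumptions: the browser's language configuration is assumed to arrive in the HTTP Accept-Language header, and the class name LanguageChooser, the AVAILABLE list, and the English fallback are hypothetical, not taken from any particular server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LanguageChooser {
    // Hypothetical set of languages for which static copies of a page exist.
    static final List<Locale> AVAILABLE = Arrays.asList(
            Locale.ENGLISH, Locale.GERMAN, Locale.FRENCH, Locale.forLanguageTag("sl"));

    // Picks the best available language for an Accept-Language header value.
    // Falls back to English (an assumption) when the header is missing or nothing matches.
    // Note: LanguageRange.parse throws IllegalArgumentException on malformed input.
    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return Locale.ENGLISH;
        }
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        Locale match = Locale.lookup(ranges, AVAILABLE);
        return match != null ? match : Locale.ENGLISH;
    }

    public static void main(String[] args) {
        // A browser configured to prefer Slovenian, then English.
        System.out.println(choose("sl-SI,sl;q=0.9,en;q=0.5"));
        // A browser asking for a language with no static copy.
        System.out.println(choose("ja"));
    }
}
```

In a servlet, the header value would typically come from the request object, and the chosen Locale would select which static file to serve.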

JDK 1.0.2 and JSDK 1.0 introduced Streams for byte-based data. Servlets have a ServletInputStream for reading the request body and a ServletOutputStream for writing the response body.

When a 16-bit Unicode character is written to an OutputStream, its upper byte is discarded. When a byte is read from an InputStream, it is converted to a Unicode character by adding a zero upper byte. Therefore, only the ISO-8859-1 character encoding can be used with Streams.
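The byte truncation described above can be reproduced with an in-memory stream. The sketch below is illustrative only. The class and method names are hypothetical, and a ByteArrayOutputStream stands in for the servlet streams.

```java
import java.io.ByteArrayOutputStream;

public class ByteTruncation {
    // Writes each char through OutputStream.write(int), which keeps only the low 8 bits.
    static byte[] writeChars(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : text.toCharArray()) {
            out.write(c);
        }
        return out.toByteArray();
    }

    // Widens each byte back to a char by giving it a zero upper byte.
    static String readChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 fits in one byte and survives the round trip.
        System.out.println(readChars(writeChars("caf\u00e9")).equals("caf\u00e9"));
        // U+0391 (Greek capital alpha) loses its upper byte and comes back as U+0091.
        System.out.println((int) readChars(writeChars("\u0391")).charAt(0));
    }
}
```

Characters up to U+00FF round-trip unchanged, which is exactly the ISO-8859-1 range. Anything above it is silently corrupted.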

JDK 1.1 added support for other character encodings through Readers and Writers, which use char-to-byte and byte-to-char converter classes that you select by specifying the desired encoding.
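As a minimal sketch of the Reader and Writer approach, the following example wraps in-memory byte streams in an OutputStreamWriter and an InputStreamReader with an explicitly named encoding. The class name EncodingRoundTrip and its helper methods are hypothetical. Checked I/O exceptions are rethrown unchecked only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

public class EncodingRoundTrip {
    // Converts chars to bytes using the named encoding.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charsetName)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Converts bytes back to chars using the named encoding.
    static String decode(byte[] data, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alphaBeta = "\u0391\u0392"; // Greek capital alpha and beta
        byte[] utf8 = encode(alphaBeta, "UTF-8");
        System.out.println(utf8.length); // each of these letters takes two bytes in UTF-8
        System.out.println(decode(utf8, "UTF-8").equals(alphaBeta));
    }
}
```

The same text must be decoded with the encoding it was written in. Decoding the UTF-8 bytes as ISO-8859-1 would yield four unrelated characters instead of two Greek letters.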

In versions 2.0+ of the Servlet API, Streams are still used for byte-based binary data. Text is read from a Reader returned by ServletRequest.getReader() and written to a PrintWriter returned by ServletResponse.getWriter().

The getReader() method returns a Reader that uses the character encoding specified in the request message's Content-Type header. The getWriter() method uses the character encoding that was specified for the response before the PrintWriter was requested.
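To make the role of the Content-Type header concrete, here is a rough sketch of how a charset parameter could be pulled out of a header value. It is not the servlet engine's actual implementation. The class ContentTypeCharset, the method charsetOf, and the caller-supplied fallback are all hypothetical.

```java
public class ContentTypeCharset {
    // Returns the value of the charset parameter in a Content-Type header value,
    // or the given fallback when the header is absent or has no charset parameter.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // The media type comes first; parameters follow, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8", "ISO-8859-1"));
        System.out.println(charsetOf("text/plain", "ISO-8859-1"));
    }
}
```

The resulting name could then be handed to an InputStreamReader, which is conceptually what a Reader obtained from the request does on the servlet's behalf.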

If you don't specify the encoding, the servlet engine can choose it automatically.

Even if the character encoding can be determined automatically, it is highly advisable to specify it explicitly, to avoid unnecessary caching of a response body written in the wrong character encoding.

For example, the following code sends the Greek alphabet (which cannot be represented in ISO-8859-1) in a text/html body using the UTF-8 character encoding:

// rsr is the servlet's response object; set the charset before calling getWriter()
rsr.setContentType("text/html;charset=UTF-8");
PrintWriter out = rsr.getWriter();
for (char c = '\u0391'; c <= '\u03A9'; c++)
    out.print(c);