Monday, May 28, 2012

Character Encoding Troubles

In the past, I encountered an issue with a legacy application that had been migrated from WebLogic Server running on an old Windows server to a newer Tomcat application server running on Red Hat Linux. The application had been running without issues for weeks until the developer suddenly started to complain that it was not saving the registered ® and copyright © symbols to the database. The developer also said that when he ran the application from his Windows laptop, he was able to save these two symbols to the database.

Earlier in my career, I was a developer for a software company that built multilingual software specializing in Asian languages, so I recognized this as a character encoding issue. I asked the developer to send me the source code related to this issue, and here is the relevant part:

    // Sending updates to the database
    update.setValue("data",encodeString(text));
    // Inserting the data to the database
    insert.setValue("data",encodeString(text));

That seemed odd. What does the encodeString method do? Here's the implementation:

public String encodeString(String value) throws java.io.UnsupportedEncodingException
{
  log("encodeString", "Begin");
  if (value == null)
  {
    log("encodeString", "value == null");
    return value;
  }
  
  byte [] btValue = value.getBytes();
  String encodedValue = new String(btValue, _ISO88591);
  
  /*
  Charset utf8charset = Charset.forName("UTF-8");
  Charset iso88591charset = Charset.forName("ISO-8859-1");

  ByteBuffer inputBuffer = ByteBuffer.wrap(btValue);

  // decode UTF-8
  CharBuffer data = utf8charset.decode(inputBuffer);

  // encode ISO-8559-1
  ByteBuffer outputBuffer = iso88591charset.encode(data);
  byte[] outputData = outputBuffer.array();
  byte[] inputData = inputBuffer.array();
 
  log("ISO-8859-1: ", new String(outputData));
  log("UTF-8: ", new String(inputData));
  
  //String encodedValue = new String(btValue, _ISO88591);
  String encodedValue = new String(inputData);
  String encodedValue_ISO88591 = new String(inputData, _ISO88591);
  //encodedValue.getBytes("UTF-8");
  log("Encoded UTF: ", encodedValue);
  log("Encoded ISO88591: ", encodedValue_ISO88591);
  */
  
  return encodedValue;
}

Wow! I can see that the developer was trying to get a handle on this encoding business, hence all the commented-out R&D code, but clearly this developer was not familiar with character set encoding issues.

The main problem is with the following line of code:

  byte [] btValue = value.getBytes();

According to the Java API documentation, getBytes() encodes the string into a sequence of bytes using the platform's default charset. On the old Windows server that this application originally ran on, the default was probably Windows code page 1252, which agrees with ISO-8859-1 for these two characters (® is 0xAE and © is 0xA9 in both), so decoding the bytes as ISO-8859-1 gave back the original symbols. On the Red Hat Linux server, however, the default encoding was ASCII, which cannot represent either symbol, so getBytes() replaced the registered and copyright symbols with question marks.
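
The difference between the two servers can be reproduced on a single machine. The following is a minimal sketch, not the application's code: it mirrors the two steps of encodeString() but takes the platform default charset as a parameter, assuming windows-1252 for the old Windows server and US-ASCII for the Red Hat server. The class name EncodeStringDemo and the sample text are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodeStringDemo {
    // Mirrors the legacy encodeString(), except that the platform default
    // charset is passed in so both servers can be simulated on one machine.
    static String encodeString(String value, Charset platformDefault) {
        // Stands in for value.getBytes(), which uses the platform default.
        byte[] bytes = value.getBytes(platformDefault);
        // Same as new String(btValue, _ISO88591) in the original method.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "Acme\u00AE \u00A9 2012"; // "Acme® © 2012"

        // Assumption: the old Windows server defaulted to windows-1252.
        String onWindows = encodeString(text, Charset.forName("windows-1252"));
        // Assumption: the Red Hat server defaulted to US-ASCII.
        String onLinux = encodeString(text, StandardCharsets.US_ASCII);

        System.out.println(onWindows.equals(text));          // true
        System.out.println(onLinux.equals("Acme? ? 2012"));  // true
    }
}
```

Booleans are printed instead of the strings themselves so that the console's own encoding does not blur the result.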

Java strings are internally Unicode (UTF-16), and JDBC drivers typically handle the conversion to and from the database's character set. The fix for this application is therefore simple: change the following two lines of code and get rid of the entire encodeString() method:

    // Changed from : update.setValue("data",encodeString(text));
    update.setValue("data",text);
    // Changed from : insert.setValue("data",encodeString(text));
    insert.setValue("data",text);
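
For code that genuinely needs raw bytes (writing to a file or a socket, for example), naming the charset on both sides removes the dependency on the platform default. A minimal sketch, assuming Java 7 or later for StandardCharsets; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Encode with a named charset instead of relying on the platform default.
    static byte[] toUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same named charset that was used for encoding.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String symbols = "\u00AE\u00A9"; // "®©"
        byte[] encoded = toUtf8(symbols);

        // Each symbol takes two bytes in UTF-8, on any operating system.
        System.out.println(encoded.length);                    // 4
        System.out.println(fromUtf8(encoded).equals(symbols)); // true
    }
}
```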

Monday, May 14, 2012

Working with Tibco EMS Message Topics using F# and Clojure

In my previous blog post, I showed how to connect to a Tibco EMS queue with F# and Clojure to demonstrate integration interoperability between the .NET and Java platforms. Message queues are one way to implement the Request-Reply pattern, one of the many enterprise integration patterns described in the book Enterprise Integration Patterns. Another basic enterprise integration pattern is the Publish-Subscribe pattern, which can be implemented with JMS topics. This blog post shows how to connect to a JMS topic from .NET and Java using Tibco EMS as the JMS provider.
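
As a rough illustration of what Publish-Subscribe means, here is a small in-memory sketch in Java (8 or later). It is not JMS or Tibco EMS code, and InMemoryTopic is a made-up class; it only shows the delivery rule that every subscriber receives its own copy of each published message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class InMemoryTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Register a callback that will see every message published from now on.
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver the message to all registered subscribers, not just one.
    void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }

    public static void main(String[] args) {
        InMemoryTopic topic = new InMemoryTopic();
        List<String> firstClient = new ArrayList<>();
        List<String> secondClient = new ArrayList<>();
        topic.subscribe(firstClient::add);
        topic.subscribe(secondClient::add);

        topic.publish("Barrayar");
        topic.publish("Komarr");

        System.out.println(firstClient);  // [Barrayar, Komarr]
        System.out.println(secondClient); // [Barrayar, Komarr]
    }
}
```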

Here is the F# version:

#r @"C:\tibco\ems\6.3\bin\TIBCO.EMS.dll"

open System
open TIBCO.EMS

let serverUrl = "tcp://localhost:7222"
let producer = "producer"
let consumer = "consumer"
let password = "testpwd"
let topicName = "testTopic"


let subscribeToTopic serverUrl userid password topicName messageProcessor =
    async {
        let connection = (userid,password)
                         |> (new TopicConnectionFactory(serverUrl)).CreateTopicConnection
        let session = connection.CreateTopicSession(false,Session.AUTO_ACKNOWLEDGE)
        let topic = session.CreateTopic(topicName)
        let subscriber =  session.CreateSubscriber(topic)
        connection.Start()
        printf "Subscriber connected!\n"
        while true do
            try
                subscriber.Receive() |> messageProcessor
            with _ ->  ()
        connection.Close()
    }

let publishTopicMessages serverUrl  userid password topicName messages =
    let connection = (userid,password)
                     |> (new TopicConnectionFactory(serverUrl)).CreateTopicConnection
    let session = connection.CreateTopicSession(false,Session.AUTO_ACKNOWLEDGE)
    let topic = session.CreateTopic(topicName)
    let publisher = session.CreatePublisher(topic)
    connection.Start()

    messages
    |> Seq.iter (fun item -> session.CreateTextMessage(Text=item)
                             |> publisher.Send)
                             
    connection.Close()

// Just dump message to console for now
let myMessageProcessor (msg:Message) =
    msg.ToString() |> printf "%s\n"

let consumeMessageAsync = subscribeToTopic serverUrl consumer password


let produceMessages topicName messages = publishTopicMessages serverUrl producer password topicName messages


// Asynchronously start the topic subscriber
Async.Start(consumeMessageAsync "testTopic" myMessageProcessor)


// Publish messages to the Tibco EMS topic
[ "Aslund"; "Barrayar"; "Beta Colony"; "Cetaganda"; "Escobar"; "Komarr"; "Marilac"; "Pol"; "Sergyar"; "Vervain"]
|> produceMessages "testTopic"


printf "Done!"

One thing to point out is that Tibco, unfortunately, did not implement IDisposable for its Connection objects, perhaps in its bid to stay faithful to the Java API. That design choice seems unfortunate to me because I can no longer leverage C#'s using keyword or F#'s use keyword to close the connection automatically. I suppose it is fairly trivial to subclass the QueueConnection and TopicConnection classes and add the IDisposable interface, but I feel that Tibco should have done this and developed the Tibco .NET API using .NET-specific idioms.

Putting my rants aside, here is the equivalent Clojure code to connect to Tibco Topics:

(import '(java.util Enumeration)
        '(com.tibco.tibjms TibjmsTopicConnectionFactory)
        '(javax.jms Message JMSException  Session
                    Topic TopicConnectionFactory
                    TopicConnection TopicSession
                    TopicSubscriber))
                  
(def serverUrl "tcp://localhost:7222")
(def producer "producer")
(def consumer "consumer")
(def password "testpwd")
(def topicName "testTopic")

;------------------------------------------------------------------------------
; Subscribe to Topic asynchronously
;------------------------------------------------------------------------------
(defn subscribe-topic [server-url user password topic-name process-message]
    (future
        (with-open [connection (-> (TibjmsTopicConnectionFactory. server-url)
                                   (.createTopicConnection user password))]
            (let [session (.createTopicSession connection false Session/AUTO_ACKNOWLEDGE)
                  topic (.createTopic session  topic-name)]
                (with-open [subscriber (.createSubscriber session topic)]
                    (.start connection)
                    (loop []                       
                        (process-message (.receive subscriber))
                        (recur)))))))

;------------------------------------------------------------------------------
; Publishing to a Topic
;------------------------------------------------------------------------------
(defn publish-to-topic [server-url user password topic-name messages]
    (with-open [connection (-> (TibjmsTopicConnectionFactory. server-url)
                               (.createTopicConnection user password))]
        (let [session (.createTopicSession connection false Session/AUTO_ACKNOWLEDGE)
              topic (.createTopic session  topic-name)
              publisher (.createPublisher session topic)]
            (.start connection)
            (doseq [item messages]
                (let [message (.createTextMessage session)]
                    (.setText message item)
                    (.publish publisher message))))))
                    
                      
; Create function aliases with connection information embedded                    
(defn produce-messages [topic-name messages]
    (publish-to-topic serverUrl producer password topic-name messages))

(defn consume-messages [topic-name message-processor]
    (subscribe-topic serverUrl consumer password topic-name message-processor))

; Just dump messages to console for now
(defn my-message-processor [message]
    (println (.toString message)))
    
; Start subscribing messages asynchronously
(consume-messages "testTopic" my-message-processor)                            
    
; Publish to topic
(def my-messages '("alpha" "beta" "gamma" "delta"
                   "epsilon" "zeta" "eta" "theta"
                   "iota" "kappa" "lambda" "mu" "nu"
                   "xi" "omicron" "pi" "rho"
                   "sigma" "tau" "upsilon" "phi"
                   "chi" "psi" "omega"))

(produce-messages  "testTopic"  my-messages)    

When I fire up both scripts, the messages published to the topic are received by both the .NET and Java clients. With these scripts, I can easily swap out the message generators or message processors as needed for future testing scenarios.