Sarasi's Blog

Thursday, March 31, 2011

Client-Server Computing

Client-server computing means that one program (the client) communicates with another program (the server) for the purpose of exchanging information.

Clients are applications that run on computers. Servers are computers or processes that manage network resources.

The client's responsibility is to handle the user interface, translate the user's request into the desired protocol, send the request to the server, wait for the server's response, translate the response into "human-readable" results, and present the results to the user.

The server's functions are to listen for a client's query, process that query, and return the results to the client.

Networks connect clients and servers.

A typical client-server interaction goes like this:

User runs client software to create a query.

Client connects to the server.

Client sends the query to the server.

Server analyzes the query.

Server computes the results of the query.

Server sends the results to the client.

Client presents the results to the user.
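The steps above can be sketched with Python's standard socket module. This is only a minimal illustration, not a production server: the function names (handle_query, serve_once, run_query, demo) and the upper-casing used as "query processing" are made up for this example.

```python
import socket
import threading


def handle_query(query: str) -> str:
    """Server-side processing of a query (here: simply upper-case it)."""
    return query.upper()


def serve_once(server_sock: socket.socket) -> None:
    """Server: accept one client, analyze its query, send back the result."""
    conn, _addr = server_sock.accept()
    with conn:
        query = conn.recv(1024).decode("utf-8")
        conn.sendall(handle_query(query).encode("utf-8"))


def run_query(host: str, port: int, query: str) -> str:
    """Client: connect, send the query, wait for the server's response."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(query.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def demo(query: str = "hello server") -> str:
    """Run one server and one client on the local machine."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    try:
        return run_query("127.0.0.1", port, query)
    finally:
        worker.join()
        server_sock.close()


if __name__ == "__main__":
    print(demo())  # the client presents the result to the user
```

The server here handles a single short query and then stops; a real server would loop on accept() and read until the whole request has arrived.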

2-Tier Model

2-Tier architecture describes client-server systems in which the client requests resources and the server responds directly to the request, using its own resources. This means that the server does not call on another application to provide the service.

3-Tier Model

In 3-tier architecture there is an intermediary level, so the architecture is generally split between:

A client: requests the resources and is equipped with a user interface for presentation purposes.

The application server (middleware): provides the requested resources by calling on another server.

The data server: provides the application server with the data it requires.

3-Tier architecture generally contains a Presentation Layer (the client), a Business Logic Layer (the middleware) and a Data Access Layer (the data server).
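The split between the three tiers can be sketched in a few lines of Python. Everything here is hypothetical (the class names, the sample records and the greeting rule); in a real system each tier would usually run as a separate process or machine, while this sketch only shows who calls whom.

```python
class DataServer:
    """Data tier: owns the stored records."""

    def __init__(self):
        self._rows = {1: "Alice", 2: "Bob"}  # hypothetical sample data

    def fetch_name(self, user_id):
        return self._rows.get(user_id)


class ApplicationServer:
    """Middle tier: applies the business rules and calls on the data server."""

    def __init__(self, data_server):
        self._data = data_server

    def greeting_for(self, user_id):
        name = self._data.fetch_name(user_id)
        if name is None:
            return "Unknown user"
        return f"Hello, {name}!"


def client_view(app_server, user_id):
    """Presentation tier: formats the result for the user."""
    return f"[screen] {app_server.greeting_for(user_id)}"


if __name__ == "__main__":
    app = ApplicationServer(DataServer())
    print(client_view(app, 1))
```

Note that the client never touches DataServer directly; it only talks to the application server, which is what distinguishes this from the 2-tier model.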

Wednesday, February 2, 2011

Converting Printed Sinhala Documents to Formatted Editable Text

This is my final year project at the Department of Computer Science, Faculty of Engineering, University of Peradeniya. Dr. Roshan Ragel, Shahina Ajward, Nalani Jayasundara and I worked on this research project. We won the Best Student Paper Award for ICT and Social Transformation at ICIAfS 2010.

INTRODUCTION

There are situations where we only have a printed copy of a document and need to modify it further or merge the content of two documents. In the worst case, even to add a small piece of text we have to apply all the font features and re-adjust the whole document again. There are also cases where we need to digitize books and other material into editable text so that search engines and tools can be used on them. The typical process of digitizing a text document is to scan the printed copies to images and convert them to editable text. Currently, optical character recognition (OCR) plays a vital role in converting scanned images of books, magazines, and newspapers into machine-readable text. It avoids the need to retype already printed material in order to edit it.

Most existing OCR solutions are commercial, and they produce editable text documents for international languages such as English. In Sri Lanka both Sinhala and Tamil are widely used in print, yet there have been only a few attempts to develop such a system for Sinhala. In this project we identified an OCR algorithm to be used for Sinhala and developed an application for digitizing Sinhala characters. Any OCR implementation consists of a number of pre-processing steps and a classification method to recognize characters. The approach used in this study to recognize characters is language independent, and therefore we believe that our system can be extended to Tamil as well.

In addition to the digitization of Sinhala characters, we have developed a method of preserving a number of selected formatting features of a printed document (such as the font size of characters). We believe that this is a useful addition to Sinhala text digitization.

The project can be divided into two phases: character recognition using an OCR technique, and extraction and preservation of the layout (formatting) information of the document. An editable Sinhala document that preserves formatting is obtained by integrating the outcomes of phases one and two.

In phase one, an optical character recognition method is used to identify characters. This phase comprises identifying the connected components in an image, selecting the portion of the image corresponding to each connected component, and extracting the features of the connected components. A neural network is trained on these features, which enables the system to identify characters that are not predetermined.
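To make the "connected components" step concrete, here is a small sketch in plain Python. It is not the code used in the project: it assumes the image has already been turned into rows of 0/1 values (1 meaning ink), treats diagonal neighbours as connected, and the function names are invented for this example.

```python
from collections import deque


def connected_components(image):
    """Return a list of components; each is a list of (row, col) ink pixels.

    `image` is a list of rows of 0/1 values, where 1 marks an ink pixel.
    Pixels are connected if they touch horizontally, vertically or diagonally.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component, inclusive."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return min(rows), min(cols), max(rows), max(cols)
```

The bounding box of each component gives the portion of the image to cut out before extracting features from it.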

In phase two, projection profiles are used to extract selected features of characters. Extracting and preserving the layout and font features of a document greatly reduces the burden on the user when editing and reproducing the same document with modifications.

The recognized characters (the outcome of phase one) are combined with the identified features (the outcome of phase two) to reconstruct the original document in Rich Text Format (RTF) in an editor.

CHARACTER RECOGNITION

Since the system is mainly focused on character recognition, the major analysis targeted Optical Character Recognition (OCR) technology. From our study of OCR, we concluded that the problem consists of two areas: image pre-processing and training the system.

Pre-processing
Since the soft copy of the scanned document is an image file, pre-processing is done to enhance the quality of the image. After identifying and analyzing several processing steps, we concluded that the required processed image could be obtained. We assumed that the image comes from high-quality paper, so that it does not need noise removal, and that documents are scanned without introducing skew.

Training the System
A neural network approach is used to train the system to recognize characters. Among the various types of neural networks, we chose a feed-forward back-propagation neural network. With back-propagation, errors are propagated backward through the network to control the weight adjustment, while in a feed-forward network information moves in only one direction. As a result, the output can be obtained efficiently and with higher accuracy.

The neural network was trained on characters obtained from the pre-processed image. It maps a set of inputs to a set of target values (outputs). Characters are recognized by referring to these target values.
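As an illustration of the idea (not the network used in the project), here is a tiny feed-forward network with one hidden layer, trained by back-propagation in plain Python. The layer sizes, learning rate, sigmoid activation, squared-error loss and the class name TinyNetwork are all assumptions made for this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """One hidden layer, sigmoid activations, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, inputs):
        """Feed-forward pass: information moves from inputs to outputs only."""
        hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs + [1.0])))
                  for row in self.w_hidden]
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden + [1.0])))
                  for row in self.w_out]
        return hidden, output

    def train_step(self, inputs, targets, lr=0.5):
        """One back-propagation update for a single (inputs, targets) pair."""
        hidden, output = self.forward(inputs)
        # Error terms at the output layer (squared error through the sigmoid).
        delta_out = [(o - t) * o * (1 - o) for o, t in zip(output, targets)]
        # Propagate the errors backward to the hidden layer.
        delta_hidden = [
            h * (1 - h) * sum(delta_out[k] * self.w_out[k][j]
                              for k in range(len(delta_out)))
            for j, h in enumerate(hidden)
        ]
        # Adjust the weights against the gradient.
        for k, row in enumerate(self.w_out):
            for j, value in enumerate(hidden + [1.0]):
                row[j] -= lr * delta_out[k] * value
        for j, row in enumerate(self.w_hidden):
            for i, value in enumerate(inputs + [1.0]):
                row[i] -= lr * delta_hidden[j] * value

    def loss(self, samples):
        """Total squared error over (inputs, targets) pairs."""
        return sum((o - t) ** 2
                   for x, ts in samples
                   for o, t in zip(self.forward(x)[1], ts))
```

In use, each character image would be reduced to a feature vector (the inputs, given as a list of floats) and each known character to a target vector, for example one output per character; train_step is then called repeatedly over the training samples.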

EXTRACTING FORMATTING INFORMATION

The projection profile of a text line is a feature-based approach to font attribute recognition. Different features are used for font discrimination; they can be derived from visual observation of different fonts and their projection profiles. The selected features are extracted from the horizontal or vertical profiles.
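A projection profile is simply a count of ink pixels per row (horizontal profile) or per column (vertical profile). The sketch below assumes a binary image given as rows of 0/1 values. The text_line_height function is only an assumed stand-in for one possible size-related feature, since the exact features used are not listed here.

```python
def horizontal_profile(image):
    """Number of ink pixels in each row of a 0/1 image."""
    return [sum(row) for row in image]


def vertical_profile(image):
    """Number of ink pixels in each column of a 0/1 image."""
    return [sum(col) for col in zip(*image)]


def text_line_height(image):
    """Rows spanned from the first to the last row that contains ink."""
    rows = [i for i, count in enumerate(horizontal_profile(image)) if count > 0]
    return rows[-1] - rows[0] + 1 if rows else 0


if __name__ == "__main__":
    line = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(horizontal_profile(line), vertical_profile(line), text_line_height(line))
```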

RECONSTRUCTING THE DOCUMENT

Terp-Word is an open source word processor that supports the ‘.html’ file format in addition to a number of other formats such as RTF. As explained earlier, the system is developed as two modules: character recognition and layout preservation. The scanned image goes through two separate processes, and each process generates a text file as its output. The text file with the recognized characters and the encoded file with the extracted features are used to generate an html file mapped to the original scanned document. In this html file, the encoded features are decoded and applied to the corresponding recognized characters. The html file can be loaded into the editor and converted to an RTF file, which allows any further modifications. The resulting RTF file preserves the selected font features of the original document.
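The merging step could look roughly like the sketch below. The actual encoding of the feature file is not described in this post, so the input here is a hypothetical list of (text, font size) pairs, one per recognized text line, and the function name build_html is invented.

```python
import html


def build_html(lines):
    """Build an HTML page from (text, font_size_pt) pairs, one per text line."""
    body = []
    for text, size_pt in lines:
        # Escape the text so characters such as < and & stay literal.
        body.append(f'<p style="font-size:{size_pt}pt">{html.escape(text)}</p>')
    return (
        '<html><head><meta charset="utf-8"></head><body>\n'
        + "\n".join(body)
        + "\n</body></html>"
    )


if __name__ == "__main__":
    print(build_html([("Title line", 18), ("Body line", 12)]))
```

The resulting file can then be opened in an editor that reads html and saved as RTF, as described above.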

CONCLUSION AND FUTURE WORK

The main idea of our system was to build a tool that supports editing of a scanned image of a Sinhala document. Being familiar with the current technologies used in character recognition for international languages, we focused our objective on character recognition for local languages.

The first-phase implementation produces only character recognition, without preserving the original format of the scanned image file. The tool was then extended with the second-phase implementation, which preserves the original layout of the document.

Our objective was mainly focused on character recognition for the Sinhala language, and so far our tool has been tested only for Sinhala. However, it may also support Tamil, since our implementation is language independent. Due to the shape of Sinhala characters, there are some limitations regarding the properties of the characters.

The final outcome of the project is a software tool that allows users to get an editable text file from a scanned image while preserving its original formatting.

Due to limited time, we had to restrict ourselves to a few selected features. The following suggestions can be made for further improvement of the system. The intensity values of the original document could be used to recognize colors. By encoding the font attributes word-wise, we would be able to apply formats word by word instead of line by line as we do now. Though we have managed to avoid merging of characters in general, a few characters still suffer from this issue due to the rounded shape of Sinhala characters. Further, the system can be trained on Tamil character samples so that it supports the Tamil language as well.

Sunday, August 15, 2010

Victoria Dam

Beauty of Peradeniya University

Thursday, October 15, 2009

How to Setup WSO2 Mashup on JBoss

The WSO2 Mashup Server is a powerful yet simple and quick way to tailor Web-based information to the personal needs of individuals and organizations.
WSO2 Mashup can be deployed on most application servers. This post describes the steps to deploy WSO2 Mashup on JBoss.

Step1
Download the WSO2 Mashup Server (wso2mashup-2.0.0) and unzip the package.

Step2
Download jboss-5.0.0.GA.zip and unzip the package.

Step3
Copy the conf, database, repository and resources directories from the unzipped Mashup package into a new folder.
Ex: /home/sarasi/ms/msRepo

Step4

I refer to my JBoss installation directory as JBOSS_HOME.
Go to the JBOSS_HOME/server/default/deploy directory.
Create a new folder named ms.war inside that directory.
Copy wso2mashup-2.0.0/webapps/ROOT/WEB-INF to JBOSS_HOME/server/default/deploy/ms.war.

Step5

Now, you need to enable https in JBoss.
Open server.xml in JBOSS_HOME/server/default/deploy/jbossweb.sar directory.
Edit it by adding the following entry.

<Connector protocol="HTTP/1.1" SSLEnabled="true"
port="8443" address="${jboss.bind.address}"
scheme="https" secure="true" clientAuth="false"
keystoreFile="/home/sarasi/ms/msRepo/resources/security/wso2carbon.jks"
keystorePass="wso2carbon" sslProtocol="TLS" />

Set keystoreFile to the exact location of wso2carbon.jks on your machine.

Step6
Next, update carbon.xml, axis2.xml, registry.xml and user-mgt.xml, which can be found in the msRepo/conf directory.

Open carbon.xml and update the ServerURL element as follows.

<ServerURL>https://localhost:8443/ms/services/</ServerURL>

Open registry.xml and user-mgt.xml and update the DB URL as follows.

jdbc:derby:/home/sarasi/ms/msRepo/database/WSO2CARBON_DB;create=true

We must also change the HTTP and HTTPS ports in the Transports section of axis2.xml as follows.

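<transportReceiver name="http" class="org.wso2.carbon.core.transports.http.HttpTransportListener">
<parameter name="port">8080</parameter>
</transportReceiver>
<transportReceiver name="https" class="org.wso2.carbon.core.transports.http.HttpsTransportListener">
<parameter name="port">8443</parameter>
</transportReceiver>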
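If you would rather script the port change than edit axis2.xml by hand, something like the following Python sketch could do it. It assumes that each transport is declared as a transportReceiver element with a name attribute ("http" or "https") and a child parameter element named "port"; check your own axis2.xml before relying on it. The function name set_transport_ports is made up, and note that ElementTree drops XML comments when it rewrites a file, so keep a backup of the original.

```python
import xml.etree.ElementTree as ET


def set_transport_ports(axis2_xml, ports):
    """Return axis2.xml text with the "port" parameter of each listed
    transportReceiver replaced. `ports` maps receiver name to port number."""
    root = ET.fromstring(axis2_xml)
    for receiver in root.iter("transportReceiver"):
        new_port = ports.get(receiver.get("name"))
        if new_port is None:
            continue  # leave receivers that were not asked for untouched
        for param in receiver.findall("parameter"):
            if param.get("name") == "port":
                param.text = str(new_port)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Small stand-in document; a real axis2.xml contains much more.
    sample = (
        '<axisconfig>'
        '<transportReceiver name="http" class="example.Http">'
        '<parameter name="port">9763</parameter></transportReceiver>'
        '<transportReceiver name="https" class="example.Https">'
        '<parameter name="port">9443</parameter></transportReceiver>'
        '</axisconfig>'
    )
    print(set_transport_ports(sample, {"http": 8080, "https": 8443}))
```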

Step7

Open a new command window/shell and change the directory to JBOSS_HOME/bin.
Set the environment variable named “CARBON_HOME” to point to the folder named “msRepo”.

On Windows: set CARBON_HOME=C:\ms\msRepo

On Linux: export CARBON_HOME=/home/sarasi/ms/msRepo

Start JBoss from the same command window/shell.
Now you can access the management console at https://localhost:8443/ms/carbon.

Wednesday, October 14, 2009

How to deploy WSO2 Mashup on Apache Tomcat

The WSO2 Mashup Server provides a platform for rapidly deploying Web service Mashups. Combining simple yet rich mashups with reusability, security, reliability and governance, the WSO2 Mashup Server offers enterprise-class service composition.

WSO2 Mashup can be deployed on most application servers. In this post I describe how to deploy WSO2 Mashup on Apache Tomcat.

Step1
Download wso2mashup-2.0.0.zip and unzip the package. Copy the conf, database, repository and resources directories into a new folder (e.g. /home/sarasi/mashup/mahupRepo).

Step2
Download Apache Tomcat (apache-tomcat-6.0.14.zip) and unzip the package.

Step3
Set the environment variable named “CARBON_HOME” to point to the folder you created in Step1.
Ex: your folder is “/home/sarasi/mashup/mahupRepo”
If you are using Linux,
export CARBON_HOME=/home/sarasi/mashup/mahupRepo
If you are using Windows,
set CARBON_HOME=D:\mashup\mahupRepo
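Before going further, it can be worth checking that CARBON_HOME really points at a folder containing the four directories copied in Step1. The following Python sketch does that; the function name missing_carbon_dirs is made up, and the check only looks for the directories, not their contents.

```python
import os

# The four directories copied out of the Mashup package in Step1.
REQUIRED_DIRS = ("conf", "database", "repository", "resources")


def missing_carbon_dirs(carbon_home):
    """Return the names of required sub-directories missing from carbon_home."""
    return [name for name in REQUIRED_DIRS
            if not os.path.isdir(os.path.join(carbon_home, name))]


if __name__ == "__main__":
    home = os.environ.get("CARBON_HOME")
    if not home:
        print("CARBON_HOME is not set")
    else:
        missing = missing_carbon_dirs(home)
        print("OK" if not missing else "Missing: " + ", ".join(missing))
```

Run it from the same shell in which you set CARBON_HOME, since the variable is only visible to processes started from that shell.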

Step4
Now, Mashup needs to be deployed in Apache Tomcat. To do so, follow the instructions below:

Copy the “WEB-INF” folder from webapps/ROOT in the unzipped Mashup package.
Create a folder in the Tomcat “webapps” folder and paste the WEB-INF folder there. (Ex: /home/sarasi/apache-tomcat-6.0.14/webapps/ms/WEB-INF)

Step5
Now you need to enable HTTPS in your Tomcat installation. (By default the HTTPS protocol is disabled in Apache Tomcat.)

Open server.xml in the Apache Tomcat conf folder and add the following Connector element to it. Replace CARBON_HOME in keystoreFile with the full path of the folder that CARBON_HOME points to.

<Connector port="8443" minSpareThreads="5" maxSpareThreads="75"
enableLookups="true" disableUploadTimeout="true"
acceptCount="100" maxThreads="200"
scheme="https" secure="true" SSLEnabled="true"
keystoreFile="CARBON_HOME/resources/security/wso2carbon.jks" keystorePass="wso2carbon"
clientAuth="false" sslProtocol="TLS"/>

Port 8443 is now open for HTTPS in Apache Tomcat.

Step6
Next, you should change the Mashup “Server URL” in the carbon.xml file. The server URL must contain the name of the folder you created in the Tomcat webapps folder. The steps to change carbon.xml are as follows:

Open the carbon.xml file in your CARBON_HOME/conf folder.
Change the HTTPS port number from 9443 to 8443. (Tomcat enables HTTPS on port 8443.)
If the name of the folder in the Tomcat webapps folder is “ms”, your final server URL should be:
```
https://localhost:8443/ms/services/
```

Step7
You now need to make a small modification in axis2.xml as well. In Tomcat, the HTTP transport is enabled on port 8080, but in Mashup it is enabled on port 9763. Because of this you have to re-map the HTTP and HTTPS port numbers in the Mashup axis2.xml. Change the HTTP port number from 9763 to 8080 and the HTTPS port number from 9443 to 8443. The result should be as follows:

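<transportReceiver name="http" class="org.wso2.carbon.core.transports.http.HttpTransportListener">
<parameter name="port">8080</parameter>
</transportReceiver>
<transportReceiver name="https" class="org.wso2.carbon.core.transports.http.HttpsTransportListener">
<parameter name="port">8443</parameter>
</transportReceiver>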

Step8
Open registry.xml and user-mgt.xml. Update the DB URL as follows.

jdbc:derby:/home/sarasi/mashup/mahupRepo/database/WSO2CARBON_DB

Step9
The Apache Tomcat configuration is now complete. Start the Tomcat server as follows:

Go to Tomcat's bin folder in a command prompt and start Tomcat:
Windows: apache-tomcat-6.0.14\bin>catalina.bat run
Linux: apache-tomcat-6.0.14/bin$ ./catalina.sh run

Step10
Upon starting up Apache Tomcat, use the following URL to access Mashup:
https://localhost:8443/ms/carbon