Providing Information on the Internet

by Satoshi Yasuda


1. Introduction

The participants in the Asian Historical Statistics Project aim not just to compile information but to make it available to researchers worldwide. The Hitachi H9000 VR100 (HP9000-809) workstation which we installed in February of this year will play an important role in the project. The workstation includes a 100-gigabyte hard disk, which will enable us to store, in addition to the Statistics Project materials and data, articles and other contributions from researchers worldwide.

2. Sending Data

Until recently, the standard means of obtaining data from a computer was to use a database system. A user entered a keyword for the information sought, and the information in turn appeared on the computer screen. For example, a Statistics Project researcher searching for information on a particular country would enter the country name and specify the organization and time period, and the appropriate time-series data would appear on the screen. One example is the data from the Long-Term Economic Statistics of Japan (also conducted by the Institute for Economic Research, and the forerunner of the Statistics Project), which are provided by Hitotsubashi University's Information and Documentation Centre for Japanese Economic Statistics.

Researchers could call economic data onto their screens, but only in rare cases could the information actually be analyzed there. Usually, the retrieved information had to be re-entered into an analysis system, and the researcher had to go through time-consuming trial and error, inputting additional data as the analysis proceeded. Until quite recently this re-entry step was unavoidable, so there was no alternative to relying on a server system. Only a few analysis systems allowed users to retrieve the desired information directly instead of re-entering the results of the reference search.

However, new computers do much more than handle calculations. Expanded internal memory and the addition of external memory mean that they can hold all the data necessary for the trial-and-error experimentation which accompanies analysis. A researcher can conduct all the necessary calculations with a personal computer if he or she has an analysis system with an appropriate database function. In addition, improvements to data server systems mean that it is no longer necessary to provide additional data as the research proceeds. Instead, a researcher can first send all the data he or she wants to use to his or her own PC, then use an analysis system with a database function to conduct analysis. Research is expedited since only the information necessary to conduct data analysis is transferred between the data server and PCs.

The Statistics Project will create files holding a variety of time-series data for particular countries and store them in our server system. This will enable researchers to copy data directly from the server system to their personal computers, and to conduct analysis without further assistance from the server system. In addition, researchers conducting comparative analysis will be able to download data for a number of countries in the same manner. At present, the best means of sending files is to use workstations together with the Internet.
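As a rough sketch of the workflow described above, the following Python fragment assumes that each downloaded file is plain comma-separated text with a header line and one `year,value` row per year. The file layout, the sample figures, and the function names are illustrative assumptions, not the Project's actual format.

```python
import csv
import io


def parse_series(text):
    """Parse 'year,value' rows (with a header line) into a {year: value} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["year"]): float(row["value"]) for row in reader}


def growth_rates(series):
    """Return {year: percent change from the previous year}."""
    rates = {}
    for year in sorted(series):
        prev = series.get(year - 1)
        if prev:  # skips the first year, gaps in the series, and zero bases
            rates[year] = (series[year] - prev) / prev * 100.0
    return rates


def common_years(*series):
    """Years present in every series, for comparing several countries."""
    years = set(series[0])
    for s in series[1:]:
        years &= set(s)
    return sorted(years)


# Made-up sample data standing in for two downloaded country files.
COUNTRY_A = "year,value\n1930,100\n1931,110\n1932,121\n"
COUNTRY_B = "year,value\n1931,50\n1932,45\n1933,54\n"

if __name__ == "__main__":
    a = parse_series(COUNTRY_A)
    b = parse_series(COUNTRY_B)
    print(growth_rates(a))
    print(common_years(a, b))
```

Once the files are on the researcher's own machine, every step here runs locally; the server is contacted only for the initial download.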

3. Providing Information on the Internet

The Internet has developed rapidly in just a few years. Means of sending information (including numerical data) have evolved through the following three stages:

1. anonymous FTP

2. Gopher

3. WWW (World Wide Web)

In short, thanks to the Internet, one has only to send a file and then, depending on the nature of the file, one can use an MPEG player to view video images, use an audio program to reproduce sounds, or call up information either manually or automatically. What differentiates these systems from previous database systems is that numerical data are no longer the only thing that can be sent; video and audio "information" can now be sent in the form of digitized files.

In addition to numerical estimates, the Statistics Project system will send handwritten materials and numerical values that are difficult to enter into computers as machine-readable characters. It will be possible to load and view images of any materials which might prove useful in making calculations. Below we examine the main characteristics of the three modes of Internet transmission.

(1) Anonymous FTP

As the name suggests, anonymous FTP (File Transfer Protocol) enables users to log into a server (workstation) without a specific identification (ID) and to transfer data to their own workstations or personal computers. In short, this is a file transfer system for which the user does not need a personal ID. Instead, he or she uses a special "anonymous" or "guest" ID. If the user is able to connect to the desired workstation, no real password is necessary (generally, the user is asked to enter his or her e-mail address instead). It is possible to freely access the programs and data in locations intended to be open to the public. If the desired data is in the accessed workstation, it can be transferred to the user's workstation or personal computer.

In order to use anonymous FTP, the terminal must be equipped with either an FTP program or a program which supports FTP functions. A drawback of the FTP method is that the user must know the name of the workstation with the information and the name of the appropriate file.
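The anonymous FTP procedure can be sketched with Python's standard `ftplib` module. The host name, file path, and e-mail address mentioned below are placeholders chosen for illustration, and the helper names `fetch_anonymous` and `download` are assumptions of this sketch rather than anything provided by the Project.

```python
from ftplib import FTP


def fetch_anonymous(ftp, remote_name, email="guest@example.org"):
    """Log in as 'anonymous' and return the contents of remote_name as bytes.

    `ftp` is a connected ftplib.FTP object (or any object offering the same
    login/retrbinary/quit methods). The e-mail address is sent in place of
    a password, following the convention described above.
    """
    chunks = []
    ftp.login(user="anonymous", passwd=email)
    try:
        # RETR needs the exact file name; FTP itself offers no keyword search.
        ftp.retrbinary("RETR " + remote_name, chunks.append)
    finally:
        ftp.quit()
    return b"".join(chunks)


def download(host, remote_name):
    """Connect to `host` and fetch one file anonymously."""
    return fetch_anonymous(FTP(host), remote_name)


# Usage with a hypothetical server and path:
#     data = download("ftp.example.org", "pub/data/sample.csv")
```

Note that both the host and the file name must be supplied by the caller, which is exactly the drawback mentioned above.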

(2) Gopher

With FTP, there is no way for a user to find out the names of files he or she does not already know. To address this, a server can provide an index file which makes it possible to view the contents of a location. Users can search Gopherspace by keyword to find sites, which is not possible with FTP, then call up a site's index file, view the contents, and search for the data file they wish to retrieve. This menu-driven process is called Gopher.

To use Gopher, a terminal must be equipped with either a Gopher program or a program which supports Gopher functions. Some of these programs are equipped with various video or audio multimedia functions, but most of them just retrieve the information, leaving individual users to make use of other functions as they wish.
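To make the idea of an index file concrete: in the Gopher protocol (RFC 1436), a menu is sent as lines of text, each consisting of a one-character item type followed by a title, a selector, a host, and a port separated by tab characters, with a lone period marking the end. The sketch below parses such a menu; the sample menu and the host name are made up for illustration.

```python
from collections import namedtuple

MenuItem = namedtuple("MenuItem", "kind title selector host port")

# A few of the item types defined for Gopher menus (RFC 1436).
KINDS = {"0": "text file", "1": "menu", "7": "search", "9": "binary file"}


def parse_menu(raw):
    """Parse the text of a Gopher menu into a list of MenuItem tuples."""
    items = []
    for line in raw.splitlines():
        if line == ".":  # a lone period ends the listing
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].isdigit():
            continue  # ignore malformed lines
        title, selector, host, port = fields[:4]
        kind = KINDS.get(line[0], "other")
        items.append(MenuItem(kind, title, selector, host, int(port)))
    return items


# Made-up menu standing in for a server's response.
SAMPLE_MENU = (
    "1Statistics by country\t/stats\tgopher.example.org\t70\r\n"
    "0About this server\t/about.txt\tgopher.example.org\t70\r\n"
    ".\r\n"
)

if __name__ == "__main__":
    for item in parse_menu(SAMPLE_MENU):
        print(item.kind, "-", item.title)
```

A Gopher client displays the titles as a menu and, when the user picks an entry, sends that entry's selector to the listed host and port.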

(3) World Wide Web (WWW)

A new system overcomes the deficiencies of the above programs and allows users to view any kind of information at will. This is the WWW viewing system (generally called a browser), which combines the functions of anonymous FTP and Gopher to enable users to freely retrieve information from the Internet. Among the best-known WWW products are Mosaic, provided by NCSA; Netscape, by the firm of the same name; and Microsoft's Internet Explorer. These products can be used with Windows 3.1, Windows 95, or Macintosh machines as well as workstations.

The World Wide Web presently serves as the standard for transmitting information on the Internet. Moreover, the standard is being advanced by new functions which continue to be added to existing capabilities.

The Asian Historical Statistics Project intends to make full use of the new technology by making the voluminous information presently being compiled by our researchers available on the Web. Below is an example of the World Wide Web system now being tested by Project researchers. The URL, or address, is

http://www.ier.hit-u.ac.jp/COE/index.html

Using browsers such as Netscape to access this address will enable users to view the image seen in Figure 1. By moving the scroll bar on the right side, users can see the underlined topics. Clicking the mouse on a topic brings its contents into view or, in some cases, calls up further topics. Figure 2 shows the first issue of the English version of the newsletter as it appears online.
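The underlined topics are hyperlinks embedded in the page. As a minimal sketch of what a browser does with them, the following Python fragment collects the links from a page and resolves them against the page's address. The sample page and the example.org address are invented for illustration and do not reproduce the Project's actual pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative links are resolved against the page's own address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return the list of (URL, text) pairs found in an HTML page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page standing in for a real index page.
SAMPLE_PAGE = """<html><body>
<h1>Sample index</h1>
<ul>
<li><a href="newsletter/no1.html">Newsletter No. 1</a></li>
<li><a href="/papers.html">Discussion papers</a></li>
</ul>
</body></html>"""

if __name__ == "__main__":
    base = "http://www.example.org/project/index.html"
    for url, text in extract_links(SAMPLE_PAGE, base):
        print(text, "->", url)
```

Clicking a topic in a browser amounts to requesting the address attached to that link.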

We also plan to expand our use of the Web to make discussion papers and other materials available.

4. Providing Materials for Analysis

In the last few years, the capabilities and memory capacity of personal computers have come to exceed those of the ordinary large-size computers of the 1980s. As a result, the database systems described in Section 2 can now be operated by clients working at their own terminals. In short, personal computers become part of client-server-type database systems. The data server does not process or analyze the data requested but sends data in its original form to the client, who processes and analyzes the information on his or her own machine.

Given the major changes in the research environment, the Statistics Project requires the computer capacity to handle voluminous data inputting and compilation, and needs a server system capable of processing and analyzing the data. To meet these demands, we have recreated our working environment by linking the mainframe computer of the Institute for Economic Research, the recently installed workstation of the Statistics Project, and the Project's personal computers into a physical circuit (for Internet access), thus helping to ensure an organic flow of information and effective data analysis. The diagram below depicts the nature of the connection between the mainframe, workstation, and PCs.

Figure 3 The Flow of Information Between the Statistics Project Computers

Large volumes of data can be processed by first using the mainframe computer, then passing the results on to PCs which can use analysis or graphics programs for verification. In addition, some of the data can be stored in the workstation for further analysis or sent to other researchers in Japan and abroad.

With this type of network, needless to say, it is necessary for the different computers to work closely together to perform efficiently. It is particularly important that the workstation's data server function be organically integrated into the analysis system.

One part of this type of PC analysis system uses the programs VisualData and VisualEconometrics as tools for time-series analysis. They also cause the workstation to function as a client server. These two programs were developed by Professor Stefan P. Schleicher of the University of Graz, in Austria, and by Prof. Tong Li of Wirtschafts- und Sozialwissenschaftliches Rechenzentrum (WSR), also in Austria. (Both programs need Windows 3.1 or Windows 95.) The copyright for these programs is held by WSR and by Österreichisches Institut für Wirtschaftsforschung (WIFO) as well as by the two creators. These systems include a local database on PCs and a remote database on the workstation; they can not only use these databases as necessary but are also capable of linking them to conduct analysis. We can input and output data either in the original form or by using spreadsheets such as Excel.

Figure 4 VisualData's Data Selection Screen

Figure 5 VisualData's Graph Screen

5. Conclusion

The researchers in the Statistics Project will make the information and results which we compile freely available to the research community on the World Wide Web. As necessary, we will also develop our capacity to provide information on the Internet, whose capabilities are continuously growing. Naturally, we will announce any new developments in our system of providing information, so we invite you to check the Statistics Project's Web site periodically.


Satoshi Yasuda
Hitotsubashi University, Institute of Economic Research