HTML Text

Use standard HTML for websites

You can learn HTML by example, just by looking at standard HTML source. Before attempting to develop a website, please also read http://usability.gov/. This website, developed by NCI, is an excellent resource for designing usable, useful and accessible websites and user interfaces. It also links to a helpful HTML Primer archived at NCSA at UIUC, and to an article on Web users.

All source at the lakdivaCoins site was hand edited with emacs. See for example the source of the HTML template I use to guide the creation of a new page for a coin I want to add to my site. Simple HTML is browser independent and easy to update since it has a minimum of formatting commands. Pages written by the numerous HTML editors such as FrontPage are ridiculously complicated, often 10 times larger, and will not display on all browsers. Extensions such as "javascript" and "style sheets" are not needed for most of the web pages in which editors like FrontPage put them. Micro (Brained) software tries to propagate HTML pages which work only on their browser, in an attempt to fool computer-illiterate persons into assuming, when surfing, that they have a better browser. I don't give in to that kind of anti-social behavior.

I personally prefer Netscape since I can switch off both auto-loading of images and javascript. This allows much faster surfing, avoids loading advertisements, which are almost always images, and avoids distraction by javascript-driven browser windows that keep popping up to try to grab your attention. Javascript has some nice features, but it has been abused so much by advertisers that I see no advantage in making a website dependent on it. The most important features of a website can be done in plain HTML. For example, one of the most frequent uses of javascript is to open a link in a new browser window so that the main index is still displayed. This can be done with the standard HTML option target="xxx" within the link, which opens the image in a new window, where xxx is any name for the window. If the same name is used in every link, the same window is reused, so the user does not have to keep closing windows. Only once, to be able to blink two images, have I felt the need to use javascript.
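
As a minimal sketch of such links (the file names and the window name "coinimage" are made up for illustration), both links below open in the same named window, so the second click reuses the window opened by the first:

```html
<!-- Hypothetical file names; "coinimage" is an arbitrary window name. -->
<a href="images/coin_obverse.jpg" target="coinimage">Obverse at full size</a>
<a href="images/coin_reverse.jpg" target="coinimage">Reverse at full size</a>
```

Giving each link a different target name would instead open a separate window for each image.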

Links are one of the most useful features that websites have over printed books. To illustrate, I have put a number of links on this page. Although the linked pages are not too difficult to find using a search engine such as the now popular Google, it is sometimes not obvious which site has the most relevant information. Use links as often as possible. They also make it easy to save bookmarks you can get back to when necessary.

It is also important to keep each webpage focused on a single topic, with a very explicit title, which is what is displayed in the search output. Make the title explain the content fully but briefly, and don't assume the reader is already at your site, since he probably isn't and is just reviewing a list returned by a search engine. Readers looking for information will clearly select pages with the search keywords in the title line, and these are often listed first in search results. The more pages you have with different title lines, the better indexed your site is, and the more frequently it is found via search engines.
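
To illustrate the difference, here are two alternative title lines. The wording is placeholder text, not taken from any actual page:

```html
<!-- Vague: means nothing to someone scanning a list of search results. -->
<title>Coin 12</title>

<!-- Explicit: names the subject and the site. Placeholder wording only. -->
<title>Country, ruler, date and denomination of the coin - name of your site</title>
```

A page should of course carry only one title element; the two are shown together only for comparison.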

I have a page for each coin I have posted on the Lankan Coin website. These are linked in a grouped hierarchy of index pages which discuss the common properties and history of each era. The mean file size of the about 300 HTML text documents created so far is 3.5 Kbytes, with a mode at 2.5 Kbytes. About 90% of the files are under 5.5 Kbytes, with the largest being 18 Kbytes.

Individual pages for each coin type simplify specific cross-links with other sites. Many such webpages could some day merge into a comprehensive global coin catalog online, with a link from each coin type to a specific webpage for that coin. This is clearly an almost impossible task for any individual collector, but it could become a reality if many web-enthusiastic collectors create pages for the coins in their collections.

Many printers have been set up to optimally print pages from browsers with about 720 pixels across. Full screen on many older PC monitors was 800x600 pixels. When a webpage requires a larger width because of the images and text displayed on it, I have seen some printers squeeze the last image horizontally to fit. It would have been better to scale the whole webpage equally in both vertical and horizontal directions to fit the printed page, which seems the most logical choice, but that is not what browsers do, for reasons I don't understand. Allowing for margins, I find that I can place two 250 pixel images to the right of a small table in my page layout and have the page print out OK. So each coin image is scaled down when displayed to a width between 200 and 250 pixels. Someday I will make them clickable so that just the two images are displayed at full original scale. Currently, to see an image at full resolution, it needs to be opened in a new window.
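
A rough sketch of such a layout follows. It is an assumption about how the row could be written, not the actual template, and the file names and table contents are placeholders:

```html
<!-- Hypothetical file names and contents, for illustration only. -->
<table>
  <tr>
    <!-- Small block of coin data on the left; no width is forced on it. -->
    <td>
      Denomination: ...<br>
      Metal: ...<br>
      Weight: ...
    </td>
    <!-- Two images scaled down to 250 pixels each, 500 pixels in total. -->
    <td><img src="coin_obverse.jpg" width="250" alt="Obverse"></td>
    <td><img src="coin_reverse.jpg" width="250" alt="Reverse"></td>
  </tr>
</table>
```

With the two images taking 500 pixels, roughly 200 pixels remain for the data column and margins within a 720 pixel print width.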

The LakdivaCoin pages are developed on a 10-year-old UNIX SunOS 4.1.4 workstation, typing in the HTML formatting text with an ordinary text editor, EMACS. They are tested on an old Netscape 4.07 browser I still use on this computer, which is always up and rarely needs to be rebooted, unlike a PC with a MS/OS. I have seen images sometimes print in a very garbled fashion from a MS/IE browser, for reasons I don't understand. To get optimum results please view/print the LakdivaCoin pages on a Netscape web browser.

Another fundamental reason to keep websites in basic HTML is to ensure that pages will be properly indexed and archived by robot crawlers. The value of a website is lost if the server replies to a robot crawler with, for example, "This page uses frames, but your browser doesn't support them." That is all that will get indexed by the search engine, and the pages you painstakingly created may go unnoticed in CyberSpace. The example site given, that of a leading book publisher, had fewer pages indexed in google.com than this site. Read the interesting note about Search Engines.
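
For illustration, a frameset page of the kind that produces such a reply might look like the sketch below. The file names are hypothetical. A crawler that does not load the frames sees only the text inside the noframes element:

```html
<!-- Sketch of a problematic page; menu.html and main.html are hypothetical. -->
<html>
<head>
<title>Home</title>
</head>
<frameset cols="20%,80%">
  <frame src="menu.html">
  <frame src="main.html">
  <noframes>
    <body>
    This page uses frames, but your browser doesn't support them.
    </body>
  </noframes>
</frameset>
</html>
```

None of the real content of the site appears in this file, which is why so little of it gets indexed.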

There are many HTML options one can use to set up pages. However, these options often put unnecessary constraints on the browser, preventing it from displaying the page in the best possible way. Since the same page will be read with different browsers with a multitude of different display areas, optimizing the layout for one's own browser is a waste of time. I have found it best to use as few options as possible and let the page fit as well as it can to the available display screen. For example, one should never specify both the width and the height of an image, which could lead to distortion if they have not been matched to the dimensions of the image. Using either the width or the height alone achieves the same image reduction without the possibility of distortion. Width specifications on table elements can also lead to poor displays.
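
The following sketch shows the difference, assuming a hypothetical image that is 600 pixels wide and 580 pixels high:

```html
<!-- Hypothetical image, assumed to be 600 pixels wide and 580 pixels high. -->

<!-- Only the width is given: the browser works out the matching height,
     so the proportions of the coin are kept. -->
<img src="coin_obverse.jpg" width="250" alt="Obverse">

<!-- Both are given but do not match the 600:580 proportions,
     so the coin is squeezed vertically. -->
<img src="coin_obverse.jpg" width="250" height="200" alt="Obverse, distorted">
```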

One fact I have come to realize over time is that what is obvious to those of us who develop websites and have been on the web for many years is not obvious to persons with less web interaction. The two issues below were highlighted by replies I got from persons I directed to my web page. Both are very educated persons, but they probably don't use the computer 14 hours per day like I do. For example it seems to be not obvious that

When one uses a word in a sentence as a link, that word needs to be clicked to follow it for more details.
I didn't use links such as "More", "Details" and "Click Here", so that if pages are printed they will look more like a publication, which is one of the site's final aims. I also have lots of cross-links within the text, and such links would reduce the readability of the text.
Instructions about surfing the links placed at the bottom of the page may not be seen. Placing them at the top of the page means that only the instructions may be seen ...
There could be more on the page than is displayed, and one must pull the scroll bar on the right down to the bottom to see all of the page.
Since links are rarely followed, it is advised that one put all links on the top page so that they catch the reader's attention, but that makes the page longer and the links at the bottom of the page are not seen.
I hardly put any images on the link pages, to ensure they load fast, although thumbnail images would probably have helped to get the reader's attention.

I subsequently added some surfing instructions on the long index pages. If you have other suggestions, particularly from an infrequent web user's point of view, please send me comments by E-mail.

Loading webpages and Images from other sites on the Internet

The ethical convention on the Internet when referencing another work is to put an anchor href to the external URL. To avoid the reader losing your site, one can make the external link open in a new browser window. This can easily be done with an HTML link without the use of JavaScript. IMHO this is better than hiding the new page within a Frame and not displaying the source URL. To avoid this trap on my own pages, I request the reader to click on a URL which will then jump my website out of the Frame and display my URL in a new browser window to be bookmarked.
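
One way such a frame-escaping link could be written in plain HTML is sketched below. This is an assumption about the technique, not a copy of the link actually used on these pages, and index.html is a hypothetical file name:

```html
<!-- Replaces the whole browser window, including any enclosing frameset,
     so the real URL shows in the location bar. -->
<a href="index.html" target="_top">Show this site outside the frame</a>

<!-- Opens the page in a new, unnamed browser window instead. -->
<a href="index.html" target="_blank">Show this site in a new window</a>
```

Both target names are reserved names defined in standard HTML, so no JavaScript is involved.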

The strength of the Internet is hyperlinks. Technically there is nothing stopping one from using an image on an external server at another site as the image source to display on your page. This does not violate the other server's copyright since you have not copied and archived an image without permission. The only drawback is that the other server might take the image offline or change the name of the referenced URL, breaking the link. If the owner of the other server does not like you doing so, since he has to pay for you exploiting his bandwidth, he could change the image to something you would not like to see displayed on your page. Many mark their images explicitly to get a free advertisement from anyone who uses them.

A classic example of this on ebay motivated me to put down my thoughts on the issue. Is it a self-correcting system, in which persons will stop linking to images, particularly those of owners who object to then paying for the bandwidth used? This is more of a concern when the images are used for someone else's commercial enterprise. I have for example stopped my server from replying to references from major Internet auction sites such as ebay. That does not stop someone copying the text, and I have sometimes seen text from these pages copied directly into auction listings. All I have done so far is to explicitly not grant permission for anyone to do that.

Even links made to useful legitimate sites can lead to problems. Frequently the webpage goes offline, like older issues of an online newspaper. If the information is critical I generally save a copy of the source text. I have also seen good free websites which have established a lot of incoming links sell the domain name to an advertising agency, which then uses the hits it gets from those links. This scam works since many link pages, once created, are never corrected. The solution to both problems is to regularly check and correct your webpages.

IMHO this reflects that the Internet was designed for free academic use, and problems like this arise when it grows commercially and the server often needs to pay for the amount of usage, rather than the surfer, who is at most charged by duration of usage rather than download volume.

Taking your website on a laptop or CDROM

It is useful to be able to take your website with you when traveling and to show it or refer to it without needing an Internet connection. I have given talks with it using a screen projector. Maybe I will publish it on CDROM when it is complete some day. To be able to do this, one needs to be careful in the way one writes links. When addressing page or image URLs on your own site, you need to avoid explicit ‹a href=http://yourdomain.org/directory/filename.html› links and instead link to the page using the relative location of the file, for example ‹a href=directory/filename.html›. One must also remove any ‹base href=http://yourdomain.org› statements, which automatically prepend the base URL to relative URL specifications.
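
A short sketch of the two styles side by side follows. The names image.jpg and index.html are hypothetical additions, and yourdomain.org stands for your own site as above:

```html
<!-- Tied to the server: these break when the files are read from a
     laptop disk or CDROM without an Internet connection. -->
<a href="http://yourdomain.org/directory/filename.html">A page</a>
<img src="http://yourdomain.org/directory/image.jpg" alt="An image">

<!-- Relative to the current page: these work online and offline alike. -->
<a href="directory/filename.html">A page</a>
<img src="directory/image.jpg" alt="An image">

<!-- From a page inside directory/, a file one level up is reached with ../ -->
<a href="../index.html">Main index</a>
```

As long as the directory structure is copied intact, the relative links resolve the same way from a local disk as they do from the server.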

Web sites are replacing part of the role of publications in numismatic Journals. Personally one of the main reasons I restarted my collection is because it was so much simpler to find items and easy to display them on the web. Electronic groups on the Internet are replacing part of the role of local Coin Clubs. It is easy to get help with information from experts on the topic all over the world and to share your knowledge with others on the Internet. Connected to the Internet you could be the guru online even if there are many others offline who have far more knowledge on the subject.

There are lots of sophisticated tools in the PC CyberSpace to allow even a novice to develop webpages. I personally have no idea about them since I use UNIX workstations and type and edit the HTML directly rather than use any web editor. It seems about time that organizations such as ANA encourage the use of the Internet, particularly among the younger collectors who are growing up with computers and to whom the Internet comes more naturally than to the older generation. Maybe online exhibitions with awards could attract entries from all over the US and even the world. The exhibits could also remain online indefinitely, rather than for the few days of a coin show, and be seen and judged by collectors all over the world rather than just the few who have the time and can afford to travel to a particular coin show.

Optical Character Recognition (OCR)

First let me make a comment about copyright. Books older than 1922, I am told, are OK. Indian publishers are reprinting them anyway, so why not on the web? I wrote to Macmillan for permission to put a 1926 book on the web and they never replied. I assumed they didn't care but didn't want to grant permission. It used to be 50 years after the death of the author, in this case Codrington, who died in 1942, so copyright went away in 1992 but returned in 1997 when they made it 70 years after death. The law is also not clear to me about books that were reprinted. Copyright law for publications on the Internet is poorly defined and has hardly been tested legally. So until that time ...

Scans of text should be sent through OCR, since readable images are over 100 times larger than the text and in any case the text in them is not searchable. This is necessary to ensure your website is found by those interested in the content. You need to scan at about 300 dpi to give the OCR software the best chance of identifying the characters. A lower dpi does not provide sufficient resolution, and a higher resolution could confuse it with deformities in the print. Particularly if the book is old and discolored, the scan needs to be done in color so the OCR can suppress the brown and identify the black print. If it is necessary to use photocopies, some higher quality photocopiers are able to suppress the brown tone. I OCR with TextBridge 9.0. It allows in-line correction of text which the software feels it cannot read, which is highlighted to be read by human eye. Personally I feel that the OCR software on the store shelf is not as good as it could be, probably because they want to sell better software at exorbitant prices to businesses who can afford to pay.

The text is saved as ordinary text, although the system allows you to save as HTML. The HTML written by the software is almost unreadable and impossible to edit. It is much simpler to insert the few lines of HTML formatting code using a computer program of a few lines.
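
The exact lines that the program inserts are not given here, so the following is only an assumed sketch of what the result might look like: the plain OCR text with a minimal wrapper and paragraph marks added. The title and paragraph text are placeholders:

```html
<!-- Assumed output layout; title and text are placeholders. -->
<html>
<head>
<title>Title of the scanned chapter</title>
</head>
<body>
<h1>Title of the scanned chapter</h1>
<p>
First paragraph of the corrected OCR text ...
</p>
<p>
Second paragraph ...
</p>
</body>
</html>
```

A file like this stays easy to read and correct by hand in any text editor.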

However it is questionable whether it is faster to OCR old text and carefully proofread and correct the output than to just have it typed in twice and detect the typing errors by comparing the files. If an average touch typist can enter about 35 words per minute, or about 2000 per hour, it clearly took us a lot more than 6 hours to OCR, edit and proofread 16500 words. When I asked Prof. Raj Reddy, a former Dean of Computer Science at CMU who heads a Universal Library, how to get a rare 1822 book on Hindoostan to the web, he recommended that it be sent to India to be typed. I am seeking such a data entry facility in Lanka.