Web, Japanese and UTF8 Posted: Aug | Author: Doug | Filed under: Bsd, Japanese, Language, Linux, Macintosh, Technology, Windows | 2 Comments »

Getting Japanese text to work on a webpage was something that came up recently at work, so I felt inspired to touch on the subject briefly. This piece is a bit more technical than other posts, but on the other hand, I am not an expert either, so hopefully this will find some middle ground. This post will contain a few tips about being able to read Japanese.

If you go back in computing all the way to the 1970s, the process of improving internationalization runs like so:

The ASCII standard supplants earlier, more difficult character sets like EBCDIC. It was widely adopted and is still more or less in use today. The ASCII standard was 7 bits in total, so you could only cram in 128 characters. The trouble is, it was English-only, but it was the de facto standard of computing back then.

To compensate for the English-centric behavior of ASCII, similar code pages were developed for other languages. They all had the same restriction of 7 bits, but now if you used the French code page, for example, you could get nice accented letters. Still, this was useless for languages like Japanese and Chinese, which have lots and lots more characters.

In the early 1980s, Microsoft, in conjunction with a Japanese company, created a new character set known as Shift-JIS. Being a Microsoft invention, it gained widespread use, though EUC-JP is thought by some to produce cleaner, easier-to-manipulate text. Later, the ISO-2022 standard was established for presenting languages like Japanese and Korean, which led to well-known character sets like EUC-JP (Extended Unix Code – JP). As you can guess from the name, it was widely used on UNIX and other systems. Both EUC-JP and Shift-JIS competed for presenting Japanese text on the Internet, and both are still frequently in use.

Though begun in 1986, the famous Unicode Standard came out in 1991/1992 as a way of having a single character set and standard to store all characters from all languages. At first, it only had the English/ASCII characters mapped out, but as it grew, it absorbed more languages and characters into its massive mapping scheme. Already by 1991, the Japanese hiragana and katakana letters were mapped out, but not the Chinese characters (used in Japanese as "kanji"). By 1992, the first 20,000 Chinese characters were mapped out, with another 40,000 added in 2001. To make Unicode easier to program with, the UTF-8 encoding scheme was presented to the public in 1993, and it took off after that.

The problem is that, over time, lots of cruft, old websites and outdated software still persists in computing and on the Internet, so although most modern programs, web browsers and such use UTF-8 by default, there are plenty of leftovers that don't. So if you look at Japanese websites, even ones representing major companies, you can see different character sets in use.

In your web browser, you can view the character set of a website using the "view source" option. For Firefox and Opera users, this is a simple matter of clicking on "view" and then either "source" or "page source". In both cases, you can use the shortcut of typing CTRL+U or Command+U (Mac users). You should look for a line near the top that declares the character set. Here, the character set of the page I chose defaults to "utf-8". Nihon Terebi News, which I watch to practice listening, uses Shift-JIS.
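For illustration (this is not the exact line from the page I was looking at), the character-set declaration near the top of an HTML document typically looks like one of these, the first being the modern HTML5 form and the second the older style you will still see on legacy Japanese sites:

```html
<!-- Modern declaration, used by most UTF-8 pages today -->
<meta charset="utf-8">

<!-- Older style, common on legacy pages; Shift_JIS shown as an example -->
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
```

If neither line is present, the character set may instead be sent by the web server in the `Content-Type` HTTP header, which "view source" won't show you.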
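To see concretely how the same Japanese text ends up as different bytes under each character set, here is a small sketch (assuming a Python 3 interpreter; the sample string is just the name of the TV network mentioned above):

```python
# The same five characters, encoded three different ways.
text = "日本テレビ"  # "Nihon Terebi", 5 characters

utf8_bytes = text.encode("utf-8")      # 3 bytes per character here -> 15 bytes
sjis_bytes = text.encode("shift_jis")  # 2 bytes per character -> 10 bytes
eucjp_bytes = text.encode("euc-jp")    # 2 bytes per character -> 10 bytes

print(len(utf8_bytes), len(sjis_bytes), len(eucjp_bytes))

# Decoding with the character set it was encoded with round-trips cleanly;
# decoding with the wrong one produces garbage ("mojibake") or an error.
assert sjis_bytes.decode("shift_jis") == text
```

This is exactly why a browser that guesses the wrong character set shows gibberish: the bytes are fine, but they are being interpreted against the wrong mapping table.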