About the .TXT File Format
Computer files can be divided into two broad categories:
binary and text. The distinction is vague because
in many contexts, any file is a sequence of digital bits. For instance,
to the circuits which handle information read from or written to a disk,
there is no distinction between text data and any other sort. The software
concerned with those circuits likewise makes no such distinction. Humans, on the
other hand, are concerned with this distinction.
Text files (plain text files) are files in which there is generally
a one-to-one correspondence between the bytes and ordinary readable characters
such as letters and digits. Any simple program for viewing a file can therefore
display them in human-readable form. Generally, they contain ASCII characters and some
control characters such as tabs, line feeds and carriage returns, without any embedded
information such as fonts, hyperlinks or inline images. Text files can, however,
contain more than ASCII characters if they are written in
an East Asian encoding such as Shift JIS (SJIS)
or in Unicode. If the files are written in
Unicode, a Unicode Transformation Format (UTF) such
as UTF-8 defines the encoding format. Although text files are generally
human-readable, they can of course be used for data storage by computer
programs. This may be done because text files avoid problems which may arise with binary
files, such as problems of endianness or the byte-length of integers.
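The endianness point can be illustrated with a minimal Python sketch. The value 1000 and the 4-byte unsigned layout are assumptions chosen for illustration only; they do not come from any particular file format.

```python
import struct

value = 1000  # illustrative value, not taken from any real file

# Binary storage: the same integer produces different bytes depending
# on the byte order (and on the integer width chosen by the writer).
little = struct.pack("<I", value)  # 4-byte unsigned, little-endian
big = struct.pack(">I", value)     # 4-byte unsigned, big-endian

# A reader that assumes the wrong byte order silently gets a wrong number.
misread = struct.unpack(">I", little)[0]

# Text storage: the digits "1000" are the same bytes on every platform.
as_text = str(value).encode("ascii")
restored = int(as_text.decode("ascii"))

print(little, big, misread, as_text, restored)
```

The text form costs more work to parse, but the reader needs no agreement with the writer about byte order or integer size.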
Text files can have the MIME type "text/plain", often with a charset parameter indicating the
encoding (for example, "text/plain; charset=UTF-8"). Common
encodings for plain text include Unicode UTF-8, Unicode
UTF-16, ISO 8859, and ASCII.
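As a rough illustration of how these encodings differ, the following Python sketch encodes one short string in several of them. The sample word is an arbitrary choice, and ISO 8859-1 is used here as one member of the ISO 8859 family.

```python
text = "caf\u00e9"  # "café": three ASCII letters plus one accented letter

utf8 = text.encode("utf-8")         # the accented letter takes two bytes
utf16 = text.encode("utf-16-le")    # every character here takes two bytes
latin1 = text.encode("iso-8859-1")  # every character takes one byte

# Plain ASCII has no code for the accented letter, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(utf8), len(utf16), len(latin1), ascii_ok)
```

The same characters therefore occupy a different number of bytes depending on the encoding, which is why a reader has to know which one was used.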
Plain text is textual material, usually in a disk file, that is
(largely) unformatted. A webpage with formatted text is not in plain text
in this sense, but the HTML source is. The distinction is usually not
clear-cut.
The source code of computer programs is usually written as a text file,
but once compiled, it is turned into a binary file as described below.
Transferring text files between Unix, Macintosh, and
Microsoft Windows or DOS computers can
be problematic, as each platform uses different characters to signify a line
break: Unix uses a line feed (LF), classic Mac OS used a carriage return (CR), and
Windows and DOS use the pair CR LF. See newline for a discussion of this confusion. Further cross-platform
confusion occurs because many non-Unix systems have traditionally used
an Extended ASCII character encoding, where the first 128 byte
values conform to ASCII and where the upper 128 byte
values are mapped to textual or punctuation characters, such as curly quotes
or characters having a diacritical mark. Prior to the advent
of Mac OS X, Macintosh users would call
a document a text file so long as all of its non-whitespace bytes were
printable in the Macintosh environment.
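Both sources of confusion can be sketched in a few lines of Python. The helper name normalize_newlines is hypothetical, and the two code pages shown (Windows-1252 and Mac OS Roman) are assumed here as examples of Extended ASCII encodings; the text above does not name specific ones.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF (Windows/DOS) and lone CR (classic Mac OS) to LF (Unix)."""
    # Replace the two-byte pair first so it does not become two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"dos line\r\nold mac line\runix line\n"
unified = normalize_newlines(mixed)

# The same upper-range byte maps to different characters in different
# Extended ASCII code pages.
byte = b"\x93"
on_windows = byte.decode("cp1252")   # a curly opening quote
on_mac = byte.decode("mac_roman")    # an accented letter

print(unified, repr(on_windows), repr(on_mac))
```

Bytes below 128 decode identically under both code pages, which is why plain ASCII text survives the transfer while curly quotes and accented letters often do not.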
The related term, plaintext, is most commonly used in a cryptographic context,
while clear text usually refers to lack of protection from eavesdropping. Usage
of these terms is such that there is some confusion amongst them, especially
among those new to computers, cryptography, or data communications.
Binary files, in contrast, usually contain bytes that do not correspond to printable characters,
and may contain any byte value at all. They are generally used to store data rather
than textual material in plain text form. Computer programs are typical examples,
as the data and CPU instructions they contain can, in principle, be any binary
value. As a result, compiled applications are often simply referred
to as binaries, as opposed to source code, which is contained
in plain text files. But binary files can also be image files, sound files,
compressed files, and so on; in short, any file content whatsoever, including plain text.
Usually the specification of a binary file's file format indicates how
to handle that file.
Binary files are often encoded into a plain text representation to improve
survivability during transit, using encoding schemes such as Base64.
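A minimal Python sketch of this idea, using the standard library's base64 module. The four sample bytes are arbitrary and chosen only because some of them are not printable.

```python
import base64

# Arbitrary binary data, including bytes that are not printable characters.
raw = bytes([0, 255, 16, 128])

# Base64 represents every 3 input bytes as 4 characters drawn from a small
# alphabet of letters, digits, '+', '/' and '=' padding.
encoded = base64.b64encode(raw)

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)

print(encoded, decoded == raw)
```

The encoded form is roughly a third larger than the original, which is the price paid for passing safely through channels that only handle text.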
It is a common misconception that geeks and nerds can read a binary file directly. The
fact is that binary is nothing more than a number system, and the computer can
interpret the file in any of a number of ways. Binary files are usually organized
in bytes, which means the binary digits are grouped in eights. If you
open such a file in Notepad, for example, each group of eight bits will
be translated as a single character, and you will see what looks like text, though most of it will be unreadable. If, however, you were to open it in some other application, that
application would have its own use for each byte: maybe the application will treat each
byte as a number, and it will output a stream of numbers between
0 and 255. If the file were an EXE file, then Windows would attempt
to treat each byte or set of bytes as an instruction.
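The point that the same bytes can be read in several ways can be sketched in Python. The three byte values below are an arbitrary example, picked so that the text interpretation happens to be readable.

```python
# Three arbitrary bytes; nothing in the data says how to interpret them.
data = bytes([72, 105, 33])

# Interpretation 1: groups of eight binary digits.
as_bits = " ".join(f"{b:08b}" for b in data)

# Interpretation 2: one character per byte, as a text viewer would show.
as_text = data.decode("ascii")

# Interpretation 3: a stream of numbers between 0 and 255.
as_numbers = list(data)

print(as_bits)
print(as_text)
print(as_numbers)
```

Which interpretation is meaningful depends entirely on the program reading the file, not on the bytes themselves.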