Case sensitivity

From TheAlmightyGuru
Comparing text with and without case sensitivity.

Case sensitivity describes whether a computer will treat upper and lower case letters as equal. Those systems which view them as different are "case-sensitive" while those that view them the same are "case-insensitive." A case-sensitive system would treat the words "she," "She," and "SHE" differently, while a case-insensitive system would treat them as equals.
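
The difference can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the case-insensitive comparison is done by case folding both strings before comparing them, which is one common approach rather than the only one.

```python
def equal_case_sensitive(a: str, b: str) -> bool:
    """Compare two strings exactly, so 'she' and 'She' differ."""
    return a == b


def equal_case_insensitive(a: str, b: str) -> bool:
    """Compare two strings after case folding, so 'she' and 'SHE' match."""
    return a.casefold() == b.casefold()


words = ["she", "She", "SHE"]

# Only the exact spelling matches in a case-sensitive comparison.
sensitive_matches = [w for w in words if equal_case_sensitive(w, "she")]

# All three spellings match in a case-insensitive comparison.
insensitive_matches = [w for w in words if equal_case_insensitive(w, "she")]
```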

Case-insensitive systems are further distinguished by case preservation: they are either "case-preserving" or "non-case-preserving." Non-case-preserving means that all entered text is converted to a specific case used by the system. For example, the FAT16 file system is non-case-preserving, so if a user names a file "She.txt," the file system converts it to "SHE.TXT," losing the provided case. The NTFS file system, however, is case-preserving, so a file named "She.txt" is stored as "She.txt," yet is still found by a search for "SHE.TXT."
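
As a rough sketch of the two behaviors, the toy class below stores names in a dictionary keyed by their uppercase form. The class and its methods are hypothetical and are not how FAT16 or NTFS are actually implemented; they only model what the user sees.

```python
from typing import Dict, Optional


class CaseInsensitiveStore:
    """Toy case-insensitive name store (hypothetical, not a real file system API)."""

    def __init__(self, preserve_case: bool) -> None:
        self.preserve_case = preserve_case
        self._names: Dict[str, str] = {}  # uppercase lookup key -> stored name

    def create(self, name: str) -> str:
        """Store a name and return the spelling that was actually kept."""
        stored = name if self.preserve_case else name.upper()
        self._names[name.upper()] = stored
        return stored

    def find(self, name: str) -> Optional[str]:
        """Look a name up regardless of case; return the stored spelling."""
        return self._names.get(name.upper())


fat16_like = CaseInsensitiveStore(preserve_case=False)
ntfs_like = CaseInsensitiveStore(preserve_case=True)

fat16_stored = fat16_like.create("She.txt")  # the provided case is lost
ntfs_stored = ntfs_like.create("She.txt")    # the provided case is kept
ntfs_found = ntfs_like.find("SHE.TXT")       # still found under another case
```

In both modes the lookup ignores case; the only difference is which spelling is kept when the name is created.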

In computers, case sensitivity applies primarily to text comparison, file systems, and programming languages.

Personal

I grew up with MS-DOS using the FAT16 file system and programming in various forms of the BASIC programming language, both of which are case-insensitive. I didn't encounter case sensitivity until I first tried programming in C in my teens, and found it rather off-putting. In my 20s, when I switched my Web server from a Windows host to a Linux host, I discovered my site had scores of broken links because the Linux server used a case-sensitive file system, which frustrated me further. I use case-sensitive languages and operating systems more frequently now, but I find them to be unnecessary relics that only serve to make computing harder to learn than it needs to be. The only time I can see case sensitivity being a benefit is during a few forms of text comparison.

Case-Sensitivity in Text Comparison

Text comparison can benefit from case sensitivity for things like password checks, but the disadvantages typically outweigh the advantages. Because of this, most text comparisons are case-insensitive by default, and, when they are case-sensitive, this is usually made clear to the user.

Advantages

  • By adding more characters to choose from, passwords can be more complex which results in better security.
  • In languages which use capitalization to distinguish words (like English capitalizing proper nouns but not common nouns), it adds to the precision of a comparison or search, which may give more accurate results. For example, a person searching for the novel "Dune" is probably not interested in a sand "dune."
  • It helps exclude search results with poorly formatted text. If "DUne" occurs in some text, it's a good indication the document isn't very scholarly, so people probably won't want to see it.
  • There are some edge cases where the case can't be converted automatically without losing information. For example, the lowercase German letter "ß" traditionally had no uppercase form and is usually converted into uppercase as "SS," so the German word for street, "Straße," becomes "STRASSE." The conversion can't be reliably reversed: "MASSE" could be either "Masse" (mass) or "Maße" (measurements). To help alleviate this problem, German orthography added "ẞ" as an uppercase "ß," but it only did so in 2017, and it's still not fully adopted.
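
Python's built-in string methods illustrate the "ß" edge case. The sketch below only shows what Python's `str.upper`, `str.lower`, and `str.casefold` do; other languages and libraries may map these letters differently.

```python
# Uppercasing expands the single letter "ß" into the two letters "SS".
street_upper = "Straße".upper()

# Lowercasing the result does not restore the original spelling.
street_round_trip = street_upper.lower()

# The capital "ẞ" (U+1E9E) lowercases to "ß", but "ß" does not uppercase to it.
capital_eszett_lower = "\u1e9e".lower()
eszett_upper = "ß".upper()

# Case folding, which is meant for caseless matching, treats "ß" as "ss".
folded_equal = "Straße".casefold() == "STRASSE".casefold()
```

Because the round trip through uppercase changes the spelling, a comparison that converts both sides to one case can treat two differently spelled words as equal.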

Disadvantages

  • The vast majority of humans interpret differing case as equal by default. For example, if a human searched a database of cars looking for an interior color of "Dune brown," they would expect cars to be included in the results even if they were labeled as "Dune Brown" or "DUNE BROWN." It's ridiculous to expect a user to correctly guess the case of the color.
  • Western languages capitalize the first word in a sentence which creates false-positives in case-sensitive searches. For example, a document which includes the sentence, "Dune sand is coarse," will be included in the results for a search of the novel "Dune." If the search engine tries to correct for this by ignoring the capitalization of the first word in a sentence, it will create a false-negative with the sentence, "Dune is my favorite novel."
  • Case typos are more problematic. It's common to accidentally hold down the shift key too long and type "DUne," which would then match only other instances of the same typo.
  • To alleviate problems with case-sensitive comparison, some engines only take case sensitivity into account if the input contains upper case. That is, if the user searches for "dune" it will match "dune," "Dune," and "DUNE," but, if the user enters "DUNE," it will only match "DUNE." Since this differs across different implementations, there will always be confusion as to how it works.
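
The "only case-sensitive when the input contains upper case" rule from the last point can be sketched as follows. The function name is hypothetical, and real engines differ in the details, which is exactly the source of the confusion described above.

```python
def smart_case_equal(query: str, candidate: str) -> bool:
    """Match case-sensitively only if the query contains an uppercase letter."""
    if any(ch.isupper() for ch in query):
        return query == candidate
    return query.casefold() == candidate.casefold()


candidates = ["dune", "Dune", "DUNE"]

# An all-lowercase query matches every spelling.
lower_query_matches = [c for c in candidates if smart_case_equal("dune", c)]

# A query containing uppercase letters matches only the exact spelling.
upper_query_matches = [c for c in candidates if smart_case_equal("DUNE", c)]
```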

Case-Sensitivity in File Systems

Typically, file systems for home users are case-insensitive (Windows, Macintosh), while those for enterprise use are case-sensitive (Unix, Linux).

Advantages

  • Names have more characters to choose from. Many old file systems only supported short file names (a length of 9 to 11 characters was common) with only a few character variations (often only letters, numbers, and some punctuation). However, modern file systems usually support hundreds of Unicode characters, so having 26 extra uppercase letters to choose from is trivial.
  • Your file names can more accurately match spelling. If you have a file about sand dunes and one about the novel, you can have "Dune" and "dune" in the same area without having to add an identifier like "Dune - novel."
  • On extremely old platforms, case-sensitive file systems are slightly faster. On modern systems, the speed difference is negligible.

Disadvantages

  • As with text comparison, most humans view "Dune" and "dune" as matching. By default, they will also assume files with similar names are the same. Trying to train this out of someone requires a lot of repetition and accidents before they get it right.
  • Describing a case-sensitive file path is confusing. If a co-worker tells you to go into the "log" directory and open the "clock" file, you won't be able to because the directory is actually named "LOG" and the file "Clock." This leads to wasted time clarifying the case. Even users who use case-sensitive file systems often forget to include the case when describing files or directories to other people.
  • File maintenance is confusing. If you're told to open the "read me" file and the directory contains "ReadMe," "README," "readme," "Readme," and "readMe," you have to ask for clarification which can't easily be conveyed. Since modern file systems allow for long file names, it's much less confusing to just use longer file names.
  • Searching for files is more complicated. All of the disadvantages for text comparison apply when trying to find files or directories. Consider looking for everything related to the program FileZilla. Did the developers name their files "FileZilla" to match the title, or did they use the more common all-lower "filezilla?" Perhaps, as programmers, they used the lowerCamelCase "fileZilla?" There is no way to be sure without searching for all possible case variations, of which there are 512 (two choices for each of the nine letters)!
  • Alleviating the disadvantages defeats all the advantages. In order to alleviate all the problems that occur with case-sensitive file systems, users will often name everything in a single case, typically lowercase. But, in doing so, all of the advantages of having a case-sensitive file system are lost. It even becomes worse than a case-preserving case-insensitive file system because you don't even get the benefit of using proper case names.
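
The number of case variations of a name can be checked directly: each letter can be either lowercase or uppercase, so a name with n letters has 2^n variations. The sketch below enumerates them for "FileZilla"; the function name is hypothetical.

```python
from itertools import product


def case_variants(name: str) -> set[str]:
    """Return every spelling of name that differs only by letter case."""
    # Each letter contributes two choices; other characters contribute one.
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in name
    ]
    return {"".join(combo) for combo in product(*choices)}


variants = case_variants("FileZilla")
variant_count = len(variants)  # nine letters, so 2 ** 9 spellings
```

A case-insensitive search makes this enumeration unnecessary, since every one of these spellings compares equal.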

Case-Sensitivity in Programming Languages

Programming languages are highly varied, but technical languages like those based on C (C++, C#, Java, JavaScript) are case-sensitive, while learning languages like BASIC and Pascal are not. Some are a mix, like PHP, where variables are case-sensitive but functions are case-insensitive.

Online searches for the pros and cons of case sensitivity in programming mostly turn up opinions on coding preferences, not on case sensitivity itself. For example, it's common to see people argue that it's easier to read code when classes start with an uppercase letter but variables start with a lowercase letter. However, this is possible regardless of case sensitivity.

Advantages

  • It promotes consistency by forcing programmers to use the same case for everything.
  • It promotes additional consistency when a case convention is set by the language itself. For example, in the built-in Java code, classes use UpperCamelCase, variables use lowerCamelCase, and constants use ALL_UPPER_CASE.
  • You have more characters to use for a name. This isn't very compelling since short names are typically viewed as bad programming.
  • The compiler and IDE will run faster since they won't have to perform case conversions in memory. Again, this isn't very compelling since the speed difference on modern computers is negligible; you would only notice the difference on older platforms.

Disadvantages

  • There are many conflicting standards for how to apply case, which creates confusion. For example, some case standards say abbreviations should be displayed as all uppercase, like "XMLHTTPRequest," while others say only the first letter in the abbreviation should be uppercase, like "XmlHttpRequest." This differs across languages, but there is often disagreement within a language, sometimes even in a single name, like the JavaScript object "XMLHttpRequest." Compound words also cause confusion. Should you use "FileName" or "Filename?" A stickler for the English language would use "FileName" since most dictionaries do not recognize "filename" as a word, but a techie would probably use "Filename" since it's an accepted compound word in computer parlance. Every time a programmer learns a new language or works on a different program, they have to learn a new set of, often internally inconsistent, rules. A case-insensitive language alleviates all these problems.
  • Using case alone to distinguish between classes, functions, and variables sometimes creates unexpected problems. If you have a class named "Object," a variable named "object," and you mistype one for the other, you'll get an error, but it's usually a pretty simple fix. However, if the mistyped code is valid, the program will compile and run and give very unexpected results which could take a long time to debug.
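
The last point can be sketched in Python, which is case-sensitive. The names `limit`, `LIMIT`, and `allowed` are hypothetical and exist only to show a case typo that is still valid code.

```python
limit = 3      # intended per-request limit
LIMIT = 100    # unrelated global cap that differs only by case


def allowed(requests: int) -> bool:
    """Check a request count against the per-request limit."""
    # Typo: the author meant `limit`, but `LIMIT` is also a valid name,
    # so nothing fails and the check is silently far too permissive.
    return requests <= LIMIT


result = allowed(50)  # intended to be rejected, but it is accepted
```

No error is raised anywhere, so the mistake only shows up as wrong behavior at run time. In a case-insensitive language the two names could not coexist, and the conflict would surface when the second one was declared.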
