I see a lot of Ruby code that uses String#force_encoding, but most of those uses are wrong. You should not use the method.
Ruby 1.9 Era
This year, the first release of the Ruby 1.9 series shipped, and I will soon release Ruby 1.9.1-p376. 2009 was the year of Ruby 1.9.
Next year, Ruby 1.9.2 will be released. It will be fully compatible with Rails 3. It will also fully pass RubySpec, and JRuby will soon become compatible with Ruby 1.9.2. You should start porting your code to Ruby 1.9 right now. Ruby 1.9 is a very good language.
Encoding
When you port your Ruby code to Ruby 1.9, the biggest problem is character encoding. Ruby 1.9 treats a string as a sequence of characters rather than a sequence of bytes. JEG2 explained this topic in his article series "Understanding M17N".
encode and force_encoding
There are three methods: String#encode, #encode! and #force_encoding.
For a String, String#encode keeps the characters but changes the encoding in which those characters are represented. #encode is non-destructive; #encode! is the destructive version. The byte representation of a character depends on the encoding, so #encode and #encode! generally change the byte representation of the string.
#force_encoding, in contrast, keeps the byte representation but changes the characters. After #force_encoding, the string is sometimes invalid as a character sequence.
In other words, #encode treats a String object as a character sequence, but #force_encoding treats it as a byte sequence.
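The difference can be seen in a small sketch. The variable names are only for illustration, and the string is written with a \u escape so that it is UTF-8 regardless of the source encoding.

```ruby
# One character, U+3042 (HIRAGANA LETTER A), as a UTF-8 string.
utf8 = "\u3042"
utf8.length                    # 1 character
utf8.bytes                     # [227, 129, 130] (0xE3 0x81 0x82)

# encode: the character is kept, the bytes are rewritten for the new encoding.
euc = utf8.encode("EUC-JP")
euc.length                     # still 1 character
euc.bytes                      # [164, 162] (0xA4 0xA2)
euc.encode("UTF-8") == utf8    # true: the same character comes back

# force_encoding: the bytes are kept, only the encoding label changes.
# It modifies the receiver, so this sketch works on copies.
latin1 = utf8.dup.force_encoding("ISO-8859-1")
latin1.bytes == utf8.bytes     # true: not a single byte changed
latin1.length                  # 3: the same bytes now read as three characters

# The same bytes are not even a valid character sequence in EUC-JP.
broken = utf8.dup.force_encoding("EUC-JP")
broken.valid_encoding?         # false
```

Note that #force_encoding never raises here. It silently produces different or invalid characters, which is why misuse is easy to overlook.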
Abuse of force_encoding
force_encoding is overused. I think this is because of example code.
The following code is quoted from the rdoc of Regexp#fixed_encoding?:
r = /\u{6666}/
r.fixed_encoding?                          #=> true
r.encoding                                 #=> #<Encoding:UTF-8>
r =~ "\u{6666} a"                          #=> 0
r =~ "\xa1\xa2".force_encoding("euc-jp")   #=> ArgumentError
r =~ "abc".force_encoding("euc-jp")        #=> nil
M17N example code needs to create strings in various encodings in order to show how M17N works, so that kind of code tends to use force_encoding a lot.
In addition, example code sometimes cannot assume its source encoding. In particular, when the code is printed in a book, a string literal cannot carry any encoding. So book authors sometimes need to use force_encoding.
On the other hand, ordinary application code can declare its correct source encoding with a magic comment. In most cases, such code should be concerned with characters rather than their byte representation.
Ruby 1.9 was designed so that you can write M17N applications with only #encode and magic comments. You don't need force_encoding.
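As a minimal sketch of that style, assume a UTF-8 source file that must hand text to a system expecting Shift_JIS. The helper name to_legacy and the choice of Shift_JIS are hypothetical. The magic comment has to be the first line of the file (or the second, after a shebang). Ruby 2.0 and later already default to UTF-8 for source files, so the comment matters most on 1.9 or for sources in other encodings.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: convert text to the encoding a legacy system expects.
# The characters stay the same; only their byte representation changes.
def to_legacy(text, legacy_encoding = "Shift_JIS")
  text.encode(legacy_encoding)
end

# Because of the magic comment, this literal is a UTF-8 string.
greeting = "日本語"

sjis = to_legacy(greeting)
sjis.encoding                      # Shift_JIS
sjis.length                        # 3 characters, as before
sjis.encode("UTF-8") == greeting   # true
```

No force_encoding is needed here: the literal gets its encoding from the source file, and every conversion goes through #encode.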
force_encoding is for middleware authors: for example, the authors of net/http, Rack, Rails or a PostgreSQL adapter. These kinds of libraries must accept a byte sequence from an external stream and reinterpret it as a string with an encoding, so they need to use force_encoding. They are also responsible for providing library users with strings in the correct encoding.
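What such a library does might look roughly like the following. This is only a sketch: tag_external_bytes is a hypothetical helper, not the API of any of the libraries named above, and it assumes the charset name comes from protocol metadata such as a Content-Type header.

```ruby
# Hypothetical middleware helper: label raw bytes with their real encoding.
def tag_external_bytes(raw, charset)
  encoding = Encoding.find(charset)        # raises ArgumentError for an unknown name
  text = raw.dup.force_encoding(encoding)  # relabel a copy; no byte is changed
  # The library is responsible for handing out correct strings,
  # so refuse bytes that are not valid in the declared encoding.
  unless text.valid_encoding?
    raise ArgumentError, "payload is not valid #{encoding.name}"
  end
  text
end

# Bytes as they might arrive from a socket, tagged ASCII-8BIT (binary).
raw = "\xE3\x81\x82".b

text = tag_external_bytes(raw, "UTF-8")
text.encoding   # UTF-8
text.length     # 1
```

The application that receives text can then work with characters and call #encode when it needs another encoding.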
If you need force_encoding in your application, it must be a bug in the middleware.
Conclusion
What you need:
- magic comments,
- String#encode.
You don't need to use String#force_encoding in your application.