How can I globally ignore invalid byte sequences in UTF-8 strings?

If you only want to operate on the raw bytes, you can try tagging the string as ASCII-8BIT/BINARY.

str.force_encoding("BINARY").split("n")
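As a minimal sketch of what the BINARY approach does, the snippet below uses a sample string with a stray 0xFC byte (an assumption chosen for illustration) and `String#b`, which returns a BINARY-tagged copy instead of mutating the original:

```ruby
# Sample input (assumed for illustration): a UTF-8 tagged literal with a stray 0xFC byte.
raw = "- Men\xFC -"

# String#b returns a copy tagged ASCII-8BIT (BINARY); the bytes themselves are unchanged.
bytes = raw.b

# Byte-level operations now work without any encoding checks.
parts = bytes.split("n")

puts bytes.encoding   # the BINARY (ASCII-8BIT) encoding
p parts               # two byte strings; the 0xFC byte is still just a raw byte
```

Note that the 0xFC byte is carried along untouched; nothing here turns it back into a character.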

However, this will not bring your ü back, because in this case the source string is ISO-8859-1 (or something similar):

"- Men\xFC -".force_encoding("ISO-8859-1").encode("UTF-8")

=> "- Menü -"
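The conversion above can be checked step by step. This is a small sketch, assuming the source really is ISO-8859-1; `valid_encoding?` is used only to show the before and after state:

```ruby
# The same sample bytes, tagged as UTF-8 by default in a UTF-8 source file.
raw = "- Men\xFC -"

# 0xFC on its own is not a valid UTF-8 sequence.
puts raw.valid_encoding?        # false

# Re-tag a copy as ISO-8859-1 (no bytes change), then transcode to UTF-8.
converted = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts converted.valid_encoding?  # true
puts converted.bytesize         # one byte more than raw, since ü takes two bytes in UTF-8
```

`force_encoding` only changes the label on the bytes, while `encode` actually rewrites them, which is why the order of the two calls matters.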

If you want to get multibyte characters, you must know what the source character set is.

Once you force the encoding to BINARY, Ruby treats the string as a plain sequence of bytes. If the data comes from your database, you can change the connection to use an ASCII-8BIT or BINARY encoding; Ruby should then tag the strings accordingly. Alternatively, you can monkeypatch the database driver to force an encoding on every string read from it. That is a very heavy-handed approach, though, and may well be the wrong thing to do.

The right answer is to fix your string encodings. This may require a database fix, a fix to the database driver's connection encoding, or some combination of the two. All the bytes are still there, but if you are working with a given character set, you should, if at all possible, let Ruby know that you expect your data to be in that encoding. A common mistake is to use the mysql2 driver to connect to a MySQL database that holds Latin-1 encoded data while specifying a UTF-8 character set for the connection. The Latin-1 data from the DB is then interpreted as UTF-8, instead of being interpreted as Latin-1 and then converted to UTF-8.
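If the connection cannot be fixed right away, the mis-tagged strings can be repaired after the fact. The following is only a sketch: `repair_mistagged` is a hypothetical helper name, and ISO-8859-1 as the real source encoding is an assumption you would need to confirm for your own data.

```ruby
# Hypothetical helper: repair a string whose bytes are really in another
# encoding (assumed ISO-8859-1 here) but which was tagged as UTF-8.
def repair_mistagged(str, actual_encoding = Encoding::ISO_8859_1)
  # Leave strings alone when they are already valid UTF-8.
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Re-tag a copy with the real encoding, then transcode to UTF-8.
  str.dup.force_encoding(actual_encoding).encode(Encoding::UTF_8)
end

puts repair_mistagged("Men\xFC")    # Latin-1 byte, wrongly tagged as UTF-8
puts repair_mistagged("Men\u00FC")  # already valid UTF-8, returned unchanged
```

The validity check is a heuristic: Latin-1 bytes that happen to form a valid UTF-8 sequence would pass through unrepaired, which is one more reason to fix the connection encoding itself.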

If you can elaborate on where the strings come from, a more complete answer may be possible. You can also look at this answer for a possible global(-ish) Rails solution for default string encodings.