02/14/19, 06:56 AM   #4
merlight
Join Date: Jul 2014
Posts: 671
Originally Posted by sirinsidiator View Post
This is not a bug, but simply an encoding issue.
If this is not a bug, it's a terrible feature.

TL;DR: When all strings in the game are UTF-8, the string handling functions should not operate in LATIN-1.

Originally Posted by sirinsidiator View Post
The Lua string functions assume your input sequence is ASCII, but you used UTF8 for your .lua file.
Pure Lua 5.1 most likely runs with the C locale, which means the string functions work on individual bytes, classify them as ASCII, and put byte values 128 and above in no character class at all. Because of this, string.find("\195\160", "%s") returns nil: neither 195 nor 160 is an ASCII character, so neither can match "%s" (whitespace).

Enter ESOLua, a modified interpreter. Although strings in the ESO API are, for obvious reasons, UTF-8 encoded, the string matching functions treat them as LATIN-1. Therefore, string.find("\195\160", "%s") returns 2, matching the trailing byte of this two-byte character (in LATIN-1, 160 is a no-break space). This is BOLLOCKS.
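
One way to sidestep the problem is to avoid "%s" and "%S" altogether and spell out the ASCII whitespace bytes in the pattern. The following is only a sketch: splitAsciiWhitespace is a hypothetical helper name, not an ESO API function, and I am assuming that literal bytes inside a character class are compared byte for byte regardless of the interpreter's character tables.

```lua
-- Hypothetical helper, not part of the ESO API.
-- Splits on ASCII whitespace only. The class lists the six ASCII whitespace
-- bytes explicitly instead of using "%s", so byte 160 (0xA0) is never
-- treated as a separator.
local ASCII_WORD = "[^ \t\n\r\f\v]+"

local function splitAsciiWhitespace(text)
    local words = {}
    for word in string.gmatch(text, ASCII_WORD) do
        words[#words + 1] = word
    end
    return words
end

-- "voil\195\160 tout" is "voilà tout" written with escaped UTF-8 bytes.
local words = splitAsciiWhitespace("voil\195\160 tout")
print(#words)     -- number of words found
print(words[2])   -- second word
```

With this pattern the two bytes of "à" stay together in the first word, because only the real ASCII space between the words splits the string.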

Originally Posted by sirinsidiator View Post
This means the à character in your test corresponds to the two byte sequence "c3 a0" instead of "e0". According to https://www.ascii-code.com/ "c3" is "Latin capital letter A with tilde" and "a0" "Non-breaking space".
ASCII is a 7-bit encoding. The described meaning of "c3" and "a0" comes from LATIN-1, which is a superset of ASCII, but not ASCII.

Originally Posted by sirinsidiator View Post
The game font cannot properly render the first one since it uses utf8 instead of ASCII, so it shows a box
It's not a font issue; it's because "c3" on its own is not a valid UTF-8 sequence. I don't know why the OP's client renders tofu (mine didn't render anything), but either way a "c3" lead byte with no trailing byte is an error in UTF-8, not a character.
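
To make the "error, not a character" point concrete, here is a rough structural check for UTF-8. It is a simplified sketch under my own assumptions: isWellFormedUtf8 is a hypothetical name, and the function only verifies that each lead byte is followed by the right number of continuation bytes; it does not reject overlong forms or surrogates.

```lua
-- Hypothetical helper: true if `s` consists only of complete UTF-8 sequences.
-- Simplified: checks lead/continuation structure only.
local function isWellFormedUtf8(s)
    local i, n = 1, #s
    while i <= n do
        local b = string.byte(s, i)
        local trailing
        if b < 0x80 then
            trailing = 0                      -- plain ASCII byte
        elseif b >= 0xC2 and b <= 0xDF then
            trailing = 1                      -- lead byte of a 2-byte sequence
        elseif b >= 0xE0 and b <= 0xEF then
            trailing = 2                      -- lead byte of a 3-byte sequence
        elseif b >= 0xF0 and b <= 0xF4 then
            trailing = 3                      -- lead byte of a 4-byte sequence
        else
            return false                      -- stray continuation or invalid lead
        end
        for j = i + 1, i + trailing do
            local c = string.byte(s, j)       -- nil when the string ends early
            if not c or c < 0x80 or c > 0xBF then
                return false
            end
        end
        i = i + trailing + 1
    end
    return true
end

print(isWellFormedUtf8("\195\160"))  -- both bytes of "à"
print(isWellFormedUtf8("\195"))      -- lead byte "c3" alone
print(isWellFormedUtf8("\160"))      -- continuation byte "a0" alone
```

Splitting "à" at byte 160 leaves exactly the second case: a lead byte whose continuation byte has been eaten by the matcher.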

Originally Posted by sirinsidiator View Post
and the space is handled by gmatch.
And that's the problem. It's a space only for gmatch, which assumes the wrong encoding; for everyone else it's the second byte of "à".

Originally Posted by sirinsidiator View Post
Try to convert your .lua file to ASCII and it should work as expected although it will break any "real" UTF8 strings you use and the letter will be rendered as a box unless you use a custom font.
I find this advice confusing. Converting Lua source to ASCII means replacing every non-ASCII character with decimal escapes of the form "\123" (the escaped bytes being the UTF-8 encoding, of course). That would be tedious and wouldn't solve the OP's issue: because "\195\160" == "à", the matching function will see the same bytes as before.
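
A quick way to convince yourself that escaping changes nothing is to dump the bytes. This is a minimal sketch; hexBytes is a hypothetical helper I made up for the illustration, and the comparison against a literal "à" only holds if the .lua file itself is saved as UTF-8.

```lua
-- Hypothetical helper: formats every byte of `s` as two hex digits.
local function hexBytes(s)
    local out = {}
    for i = 1, #s do
        out[i] = string.format("%02x", string.byte(s, i))
    end
    return table.concat(out, " ")
end

local escaped = "\195\160"               -- decimal escapes for the UTF-8 bytes of "à"
local built = string.char(0xC3, 0xA0)    -- the same two bytes, built explicitly

print(hexBytes(escaped))   -- c3 a0
print(escaped == built)    -- true
print(#escaped)            -- 2
-- In a file saved as UTF-8, escaped == "à" is true as well, because the
-- literal is stored as the same two bytes.
```

Whichever spelling ends up in the source, gmatch is handed the bytes c3 a0, so the escaped version splits in exactly the same place.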