02/14/19, 09:32 AM   #5
sirinsidiator
AddOn Author
Join Date: Apr 2014
Posts: 1,578
I admit I may not have been completely correct about everything I wrote, but the point still stands: this is not a bug, just a wrong assumption about how the pattern functions handle non-ASCII text.

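Since the pattern character classes do not support Unicode, you need to replace them with explicit byte sets to get the expected output. Here the negated set [^\t-\r ] lists the ASCII whitespace bytes (tab through carriage return, plus space) instead of relying on a class: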
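Lua Code:
  -- "à" is stored as the two UTF-8 bytes \195\160.
  local inStr = "1à1"
  -- [^\t-\r ]+ matches runs of bytes that are not ASCII whitespace
  -- (\t, \n, \v, \f, \r and space), so both bytes of "à" stay in the match.
  for outStr in inStr:gmatch("[^\t-\r ]+") do
    d(outStr) -- d() is ESO's debug output to the chat window
  end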
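Wrapped in a reusable function, the same approach might look like the sketch below. It assumes only the standard Lua string library (5.1 or later); `splitOnAsciiWhitespace` is a hypothetical helper, not part of the ESO API, and `print` stands in for `d()` so it also runs outside the game. Spelling the whitespace bytes out avoids depending on `%s`, which Lua implements with the C library's `isspace()` and whose result for bytes above 127 therefore depends on the locale.

```lua
-- Hypothetical helper (not an ESO API function): split a string on ASCII
-- whitespace only, using an explicit byte set instead of the %s class.
local function splitOnAsciiWhitespace(s)
  local parts = {}
  -- [^\t-\r ] = any byte except 9..13 (\t \n \v \f \r) and 32 (space)
  for token in s:gmatch("[^\t-\r ]+") do
    parts[#parts + 1] = token
  end
  return parts
end

-- "à" is the byte pair 195, 160; neither byte is in the excluded set,
-- so the multi-byte character stays inside its token.
local words = splitOnAsciiWhitespace("1à1 foo\tbar")
for i, w in ipairs(words) do
  print(i, w)
end
```

This prints three tokens: "1à1", "foo" and "bar".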
I'm not sure it would be a good idea to change the string library to support Unicode if it then no longer matches the Lua documentation on the web. Maybe they should add luautf8 instead? That way we'd have a Unicode-enabled string library. They already added the utf8 module from Lua 5.3 after I requested it a while ago.
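To illustrate what a Unicode-aware layer adds on top of the byte-oriented string library, here is a minimal sketch that splits a string into characters instead of bytes. It assumes that the `utf8` table, when present, mirrors Lua 5.3's (which provides `utf8.charpattern`); on plain Lua 5.1 it falls back to a hand-written pattern. The helper name `characters` is hypothetical, and this is not luautf8's API.

```lua
-- Use Lua 5.3's utf8.charpattern when available; otherwise fall back to a
-- hypothetical equivalent byte pattern (no validation, skips NUL bytes).
local charPattern = (utf8 and utf8.charpattern)
  or "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: return the UTF-8 characters of s as a table.
local function characters(s)
  local chars = {}
  for ch in s:gmatch(charPattern) do
    chars[#chars + 1] = ch
  end
  return chars
end

local s = "1à1"
local chars = characters(s)
print(#s)                     -- 4: the string library counts bytes
print(#chars)                 -- 3: one entry per character
print(s:sub(2, 2) == "\195")  -- true: string.sub slices bytes, not characters
```

The last line shows why byte-based functions can cut a character in half, which is the gap a Unicode-enabled library is meant to close.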

Originally Posted by merlight
I find this advice confusing. Converting Lua source to ASCII means replacing all non-ASCII characters with "\123" escapes (UTF-8-encoded, of course), which would be tedious and wouldn't solve the OP's issue: because "\195\160" == "à", the matching function would see the same bytes as before.
I guess I wasn't clear about that. I was referring to Notepad++'s encoding conversion feature, which attempts to replace each UTF-8 byte sequence with the corresponding Latin-1 code. Of course that doesn't solve anything, but it demonstrates that the code works when the input uses the encoding the pattern functions expect.
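The difference between the two encodings can be sketched in plain Lua. The hypothetical `utf8ToLatin1` helper below roughly mimics what such an editor conversion does, under the assumption that only code points U+0080 to U+00FF (UTF-8 lead bytes 194 and 195) need converting; anything else is left untouched. It is an illustration, not the algorithm Notepad++ uses.

```lua
-- Hypothetical helper: convert two-byte UTF-8 sequences for U+0080..U+00FF
-- into single Latin-1 bytes. Other bytes pass through unchanged.
local function utf8ToLatin1(s)
  return (s:gsub("[\194\195][\128-\191]", function(seq)
    local lead, cont = seq:byte(1, 2)
    -- 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
    return string.char((lead - 192) * 64 + (cont - 128))
  end))
end

-- Holds when this file is saved as UTF-8, as in the quoted post.
print("\195\160" == "à")        -- true

local utf8Text = "1à1"
local latin1Text = utf8ToLatin1(utf8Text)
print(utf8Text:byte(1, -1))     -- prints the bytes 49 195 160 49
print(latin1Text:byte(1, -1))   -- prints the bytes 49 224 49
```

After conversion "à" is the single byte 224, so byte-oriented functions see one byte per character; as the post says, this does not fix anything by itself, it only shows how the input bytes differ.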

Last edited by Dolby : 02/14/19 at 05:37 PM.