05/31/14, 05:51 PM   #1
Sharlikran
 
AddOn Author - Click to view addons
Join Date: Apr 2014
Posts: 626
Unicode in a map name

Lua Code:
["bleakrock/bleakrockvillage_base"] = {"Bleakrock Village", "Ödfels^N,in", "\xd6dfels^N,auf", "\xEF\xBF\xBDdfels^N,auf", "village de Morneroc^md",},
I have some map names with escape sequences in them. One is a BOM sequence.
Lua Code:
function Harvest.GetNewMapName(mapName)
    local result = nil
    -- Scan every known map and check whether mapName is one of its translations.
    for newMapName, translations in pairs(Harvest.mapSystem) do
        if Harvest.contains(translations, mapName) then
            if result then
                return nil -- more than one possible map; skip to prevent wrong data
            else
                result = newMapName
            end
        end
    end
    return result
end
That function is used to look for possible results. It will find "Ödfels^N,in" but not "\xd6dfels^N,auf" and "\xEF\xBF\xBDdfels^N,auf" because I think the string is being treated as a literal match, not as a unicode string.

Lua Code:
-- Decodes a UTF-8 string into a table of code points, terminated by 0.
-- Requires the bit32 library (standard in Lua 5.2).
function Utf8to32(utf8str)
    assert(type(utf8str) == "string")
    local res, seq, val = {}, 0, nil
    for i = 1, #utf8str do
        local c = string.byte(utf8str, i)
        if seq == 0 then
            -- Start of a new character: store the previous one (nil on the first pass).
            table.insert(res, val)
            -- Sequence length is derived from the lead byte.
            seq = c < 0x80 and 1 or c < 0xE0 and 2 or c < 0xF0 and 3 or
                  c < 0xF8 and 4 or --c < 0xFC and 5 or c < 0xFE and 6 or
                  error("invalid UTF-8 character sequence")
            val = bit32.band(c, 2^(8-seq) - 1)
        else
            -- Continuation byte: append its low 6 bits.
            val = bit32.bor(bit32.lshift(val, 6), bit32.band(c, 0x3F))
        end
        seq = seq - 1
    end
    table.insert(res, val)
    table.insert(res, 0)
    return res
end
I found that by googling, but I'm not really trying to check for valid UTF-8 chars or make sure there aren't any errors. What I am wondering is which simple command I can use to take a string like "\xEF\xBF\xBDdfels^N,auf" and match it with "�dfels^N,auf", where "�" is the BOM sequence.

When "\xEF\xBF\xBDdfels^N,auf" == "�dfels^N,auf", the result would be true.
Lua Code:
string.escape(string1, string2)
Is there something like the above, where string1 = "\xEF\xBF\xBDdfels^N,auf" and string2 = "�dfels^N,auf", so that the result is true?

Last edited by Sharlikran : 05/31/14 at 05:55 PM.
06/01/14, 06:15 AM   #2
Iyanga
AddOn Author - Click to view addons
Join Date: Apr 2014
Posts: 183
Originally Posted by Sharlikran View Post
I have some map names with escape sequences in them. One is a BOM sequence.
That's very unlikely.

That function is used to look for possible results. It will find "Ödfels^N,in" but not "\xd6dfels^N,auf" and "\xEF\xBF\xBDdfels^N,auf" because I think the string is being treated as a literal match, not as a unicode string.
"\xEF\xBF\xBD" is just...garbage, so is \xd6.

You can verify this easily with the commands
/script d("\xEF\xBF\xBD")
and
/script d("\195\150")
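The two commands behave differently because of how Lua reads escapes. The Lua 5.1 reference manual only defines decimal escapes of the form \ddd for arbitrary bytes; \xXX hex escapes were added in Lua 5.2. Assuming the game's interpreter follows the 5.1 rules, "\xd6" is not read as the byte 0xD6, while "\195\150" is the two-byte UTF-8 encoding of Ö (U+00D6). Here is a minimal sketch of building strings from hex byte values without \x escapes. fromHex and toHex are hypothetical helper names, not part of any addon API.

```lua
-- Build a string from pairs of hex digits, e.g. "C396" -> "\195\150".
-- Works without "\x" escapes because string.char takes numeric byte values.
local function fromHex(hex)
    return (hex:gsub("%x%x", function(pair)
        return string.char(tonumber(pair, 16))
    end))
end

-- Render every byte of a string as two uppercase hex digits for inspection.
local function toHex(s)
    return (s:gsub(".", function(ch)
        return string.format("%02X", string.byte(ch))
    end))
end

local oUmlaut = fromHex("C396")      -- same bytes as "\195\150"
local suspect = fromHex("EFBFBD")    -- same bytes as "\239\191\189"
```

With these helpers, toHex(mapName) shows exactly which bytes a stored name contains, which is usually the quickest way to see why two names that look alike do not compare equal.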
06/02/14, 12:37 AM   #3
Sharlikran
 
AddOn Author - Click to view addons
Join Date: Apr 2014
Posts: 626
Originally Posted by Iyanga View Post
That's very unlikely.
I'm surprised you would say that. I don't see why you would doubt me. What do I have to gain by lying? The names are from the Esohead.lua file from the daily merge site.

Originally Posted by Iyanga View Post
"\xEF\xBF\xBD" is just...garbage, so is \xd6.

You can verify this easily with the commands
/script d("\xEF\xBF\xBD")
and
/script d("\195\150")
How do I specify a hex value of a char in Lua, then? Most programming languages accept the above syntax, so EFBFBD becomes \xEF\xBF\xBD. I understand the syntax may be wrong; that's why I'm asking. Do I define a BOM sequence with %EFBFBDx or what?

Also understand I am working from a file that has been through who knows what. I have to stick with what is given and try to come up with ways to match the chars I mentioned in the OP. In the event I can't do it I really don't care, but I wanted to at least try.

Last edited by Sharlikran : 06/02/14 at 12:41 AM.
06/03/14, 10:49 AM   #4
Iyanga
AddOn Author - Click to view addons
Join Date: Apr 2014
Posts: 183
Originally Posted by Sharlikran View Post
I'm surprised you would say that. I don't see why you would doubt me. What do I have to gain by lying? The names are from the Esohead.lua file from the daily merge site.
http://en.wikipedia.org/wiki/Byte_order_mark


Do I define a BOM sequence with %EFBFBDx or what?
The BOM sequence is:
"\239\187\191"
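For comparison, the bytes from the first post, EF BF BD, are "\239\191\189" in decimal. That is the UTF-8 encoding of U+FFFD, the Unicode replacement character, whereas the BOM above is EF BB BF (U+FEFF). A small sketch that tells the two apart is shown here. startsWith, classifyPrefix and stripBom are hypothetical helpers written for illustration.

```lua
local BOM = "\239\187\191"          -- EF BB BF, UTF-8 encoding of U+FEFF
local REPLACEMENT = "\239\191\189"  -- EF BF BD, UTF-8 encoding of U+FFFD

-- True if s begins with the given byte sequence.
local function startsWith(s, prefix)
    return s:sub(1, #prefix) == prefix
end

-- Report which of the two sequences, if any, a string begins with.
local function classifyPrefix(s)
    if startsWith(s, BOM) then
        return "bom"
    elseif startsWith(s, REPLACEMENT) then
        return "replacement"
    end
    return "none"
end

-- Remove a leading BOM; strings without one are returned unchanged.
local function stripBom(s)
    if startsWith(s, BOM) then
        return s:sub(#BOM + 1)
    end
    return s
end
```

Applied to the name from the first post, classifyPrefix("\239\191\189dfels^N,auf") reports a replacement character rather than a BOM, so the original first letter was most likely lost when the file was converted.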
06/03/14, 04:05 PM   #5
Sharlikran
 
AddOn Author - Click to view addons
Join Date: Apr 2014
Posts: 626
Originally Posted by Iyanga View Post
http://en.wikipedia.org/wiki/Byte_order_mark

The BOM sequence is:
"\239\187\191"


OK, I'm a moron. It's something else, sorry; too many Google results made me think it was a BOM sequence. I still have to figure out what to do with it. Thanks for being patient with me when I declared it a BOM and it wasn't.
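One possible way to handle such names, sketched under the assumption that the sequence "\239\191\189" (U+FFFD, the Unicode replacement character) marks a single lost character: treat it as a wildcard for exactly one UTF-8 character (or one stray high byte) when comparing against known translations. escapePattern and matchesDamaged are hypothetical names, and whether a wildcard match is acceptable for the harvest data is a judgment call.

```lua
local REPLACEMENT = "\239\191\189"               -- U+FFFD in UTF-8
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*" -- one lead byte plus continuation bytes

-- Escape Lua pattern magic characters so the text is matched literally.
local function escapePattern(s)
    return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- True if candidate equals damaged once every U+FFFD in damaged is
-- allowed to stand for any single character.
local function matchesDamaged(damaged, candidate)
    local parts = {}
    local pos = 1
    while true do
        -- Plain find (4th argument true) locates the next replacement sequence.
        local s, e = damaged:find(REPLACEMENT, pos, true)
        if not s then
            parts[#parts + 1] = escapePattern(damaged:sub(pos))
            break
        end
        parts[#parts + 1] = escapePattern(damaged:sub(pos, s - 1))
        parts[#parts + 1] = UTF8_CHAR
        pos = e + 1
    end
    return candidate:find("^" .. table.concat(parts) .. "$") ~= nil
end
```

With this, matchesDamaged("\239\191\189dfels^N,auf", "\195\150dfels^N,auf") is true. A wildcard can match more than one map, but Harvest.GetNewMapName already returns nil when several maps fit, so ambiguous names would still be skipped.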

Last edited by Sharlikran : 06/03/14 at 04:08 PM.
