08/09/15, 09:30 AM   #1
merlight
[outdated] Bug in string.gsub

lua5.1 / lua5.2 console (from ubuntu packages):
Lua Code:
function df(...) print(string.format(...)) end
df("%q %d", string.gsub("une rune de puissance", "^(%l+) ", {une="<une>", de="<de>"}))
output:
Code:
"<une>rune de puissance"	1
This is correct output. The pattern "^(%l+) " is supposed to match a lower-case word and a space at the start of the string, and if that word is a key in the table ("une" or "de"), replace it with the corresponding value from the table. There's one such word, and the function makes 1 replacement.

lua5.1* in ESO:
Lua Code:
/script df("%q %d", string.gsub("une rune de puissance", "^(%l+) ", {une="<une>", de="<de>"}))
output:
Code:
"<une>rune <de>puissance"	3
This is incorrect. First, the function replaces "de" that is not at the start of the string. Second, it says it made 3 replacements.

Edit: minor correction. The second value returned from gsub is the number of matches, not the number of replacements (words that match but are not in the table are not replaced). Of course 3 is still wrong, as there should be only 1 match.
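
To illustrate the distinction made in the edit above, here is a small sketch for a stock Lua 5.1 interpreter (not the ESO client). The sample string is chosen only for illustration: its first word matches the pattern but is not a key in the table, so nothing is replaced, yet the returned count is still 1.

```lua
-- Stock Lua 5.1 behavior; the input string here is an illustrative assumption.
-- "rune " matches the pattern, but "rune" is not a key in the table,
-- so the original text is kept while the match is still counted.
local result, matches = string.gsub("rune de puissance", "^(%l+) ", {une="<une>", de="<de>"})
print(result, matches)  --> rune de puissance	1
```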

Last edited by merlight : 08/10/15 at 10:34 AM.
 
08/16/15, 09:23 PM   #2
Sasky
It's just not recognizing the beginning anchor (^).
Lua Code:
/script string.gsub("une rune de puissance", "^(%l+) ", function(str) d(str) end)
output:
Code:
une
rune
de

As for the number of matches, 3 is correct if the beginning anchor is ignored: the pattern matches three words followed by a space, even though only two of them are in the table and can be replaced.

As a hacky workaround, you could use a function and only replace the first time it's called:
Lua Code:
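function singleSub(str, pattern, lookup)
    local first = true
    return str:gsub(pattern, function(word)
        -- Returning nil keeps the whole match unchanged, including any
        -- text outside the capture (such as a trailing space).
        if not first then
            return nil
        end
        first = false
        -- nil (word not in lookup) also leaves the match as it was
        return lookup[word]
    end)
end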
The fix for ZOS would be to see why ^ is being ignored in patterns.
 
08/17/15, 06:03 AM   #3
merlight
string.gsub has an optional max_replacements parameter, so atm I'm passing 1 to prevent multiple replacements.
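
A minimal sketch of that workaround, using the optional fourth argument of string.gsub (called n in the Lua 5.1 manual), which caps the number of substitutions. The output shown is what a stock Lua 5.1 interpreter produces; that the ESO client honors the cap the same way is an assumption based on the report above.

```lua
-- The fourth argument (1) stops gsub after the first match,
-- so a later incorrect match of the anchor can never be reached.
local result, matches = string.gsub("une rune de puissance", "^(%l+) ", {une="<une>", de="<de>"}, 1)
print(string.format("%q %d", result, matches))  --> "<une>rune de puissance" 1
```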

Some more tests:
Lua Code:
/script function gsubtest(str, pat) local i=0; local function repl(m) i=i+1; return string.format("(%d:%s)", i, m); end; df("gsub(%q, %q) -> %q", str, pat, string.gsub(str, pat, repl)); end

/script gsubtest("une rune de puissance", "^r*une")
-> "(1:une) rune de puissance"

/script gsubtest("une rune de puissance", "^r*une ")
-> "(1:une )(2:rune )de puissance"

/script gsubtest(" une rune de puissance", "^r*une ")
-> " une rune de puissance"

It appears the anchor actually works at first. The 3rd example string starts with a space, and there's no match, which is correct. But in the 2nd example, after the first replacement (1:une ), the anchor incorrectly matches at "rune", as if the matcher was reset and thought it was at the start of the string.
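
The behavior described above can be modeled in plain Lua. This is only a hypothetical sketch of the observed symptom, not the actual ESO implementation: the name restartingGsub and its structure are assumptions, and it only handles patterns that begin with ^ together with a table replacement keyed by the first capture.

```lua
-- Hypothetical model: after each match, matching restarts on the remaining
-- text, so "^" is evaluated against the new start instead of the original one.
local function restartingGsub(s, pattern, lookup)
    local out, count = {}, 0
    local rest = s
    while true do
        -- string.find treats "^" as the start of `rest`
        local first, last, capture = string.find(rest, pattern)
        -- stop on no match, or on an empty match (which would loop forever)
        if not first or last < first then break end
        count = count + 1
        local matched = string.sub(rest, first, last)
        -- like gsub with a table: keep the original match if the key is missing
        out[#out + 1] = lookup[capture] or matched
        rest = string.sub(rest, last + 1)
    end
    out[#out + 1] = rest
    return table.concat(out), count
end

local result, matches = restartingGsub("une rune de puissance", "^(%l+) ", {une="<une>", de="<de>"})
print(string.format("%q %d", result, matches))  --> "<une>rune <de>puissance" 3
```

Run in a stock Lua 5.1 interpreter, this model reproduces the output reported from ESO in the first post, which is consistent with the "matcher was reset" explanation but does not prove it.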
 
08/17/15, 12:24 PM   #4
Sasky
I wonder if it's consuming the input string as it matches. Something like:

First match of "une rune de puissance" is "une ".
It then tries to match against "rune de puissance", then "de puissance", etc.

If that's the case, you could get the space in a separate step.
Lua Code:
string.gsub("une rune de puissance", "^(%l+)", {une="<une@>", de="<de@>"}):gsub("@> ",">",1)
 
08/17/15, 04:10 PM   #5
circonian
Originally Posted by merlight View Post
string.gsub has an optional max_replacements parameter, so atm I'm passing 1 to prevent multiple replacements.
Sorry, I would have posted something sooner if I'd thought you needed a fix; I thought you were just reporting the bug. Specifying a maximum of 1 replacement seems to work. Alternatively, you could remove the space from the pattern: then it can't match more than once, and it can only match the first word, because the anchor is honored on the first match (so it's not completely ignoring the anchor). Either seems to work.

It looks like it's acting more like gmatch (although yes, ^ doesn't work with that), iterating through the string: when it finds the first match and makes a replacement, it starts over from that point and treats that as the ^ anchor point.
Lua Code:
-- Change "^(%l+) " to "^(%l+)"
df("%q %d", string.gsub("une rune de puissance", "^(%l+)", {une="<une>", de="<de>"}))

EDIT:
Originally Posted by Sasky View Post
I wonder if it's consuming the input string as it matches. Something like:

First match of "une rune de puissance" is "une ".
It then tries to match against "rune de puissance", then "de puissance", etc.

If that's the case, you could get the space in a separate step.
Lua Code:
string.gsub("une rune de puissance", "^(%l+)", {une="<une@>", de="<de@>"}):gsub("@> ",">",1)
Oh, and it looks like Sasky already posted that :P
But yes, that was my conclusion too when I tested it.

Last edited by circonian : 08/17/15 at 04:24 PM.
 
