[outdated] Bug in string.gsub

merlight · 08/17/15, 06:03 AM

string.gsub has an optional max_replacements parameter, so atm I'm passing 1 to prevent multiple replacements.
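
A minimal sketch of that workaround, which should run in stock Lua 5.1 or later as well as in game. The `repl` helper and the sample string are only illustrative, and in game you would print with df instead of print.

```lua
-- The fourth argument to string.gsub caps the number of replacements.
local str = "une rune de puissance"
local function repl(m) return "(" .. m .. ")" end

-- With the cap set to 1, at most one substitution is made,
-- so a second, unwanted anchored match can never be replaced.
local result, count = string.gsub(str, "^r*une ", repl, 1)
print(result, count)
```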

Some more tests:

Lua Code:

/script function gsubtest(str, pat) local i=0; local function repl(m) i=i+1; return string.format("(%d:%s)", i, m); end; df("gsub(%q, %q) -> %q", str, pat, string.gsub(str, pat, repl)); end
 
/script gsubtest("une rune de puissance", "^r*une")
-> "(1:une) rune de puissance"
 
/script gsubtest("une rune de puissance", "^r*une ")
-> "(1:une )(2:rune )de puissance"
 
/script gsubtest(" une rune de puissance", "^r*une ")
-> " une rune de puissance"

It appears the anchor actually works at first: the 3rd example string starts with a space and there's no match, which is correct. But in the 2nd example, after the first replacement (1:une ), the anchor incorrectly matches again at "rune ", as if the matcher had been reset and thought it was at the start of the string.
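
For comparison, the reference Lua implementation treats a leading ^ as anchoring to the start of the subject only, so the 2nd test should make exactly one replacement there. A small sketch to run in a stock Lua interpreter (5.1 or later), outside the game; the variable names are only illustrative.

```lua
-- Same replacement callback as in gsubtest: numbers each match.
local i = 0
local function repl(m)
  i = i + 1
  return string.format("(%d:%s)", i, m)
end

-- In stock Lua the anchored pattern is tried once, at the start of the string.
local result, count = string.gsub("une rune de puissance", "^r*une ", repl)
print(result, count)  -- stock Lua: (1:une )rune de puissance    1
```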

Sasky · 08/17/15, 12:24 PM

I wonder if it's consuming the input pattern as it matches. Something like:

First match of "une rune de puissance" is "une ".
It then tries to match against "rune de puissance", then "de puissance", etc.

If that's the case, you could get the space in a separate step.

Lua Code:

string.gsub("une rune de puissance", "^(%l+)", {une="<une@>", de="<de@>"}):gsub("@> ",">",1)

circonian · 08/17/15, 04:10 PM

Originally Posted by merlight

string.gsub has an optional max_replacements parameter, so atm I'm passing 1 to prevent multiple replacements.

Sorry, I would have posted something sooner if I'd thought you needed a fix; I thought you were just reporting the bug. Specifying a max of 1 replacement seems to work. Alternatively, you could remove the trailing space from the pattern: then it can't match more than once and can only match the first word, because the anchor is honored on the first match, so it's not ignoring the anchor completely. Either seems to work. It looks like it's acting more like a gmatch (although yes, ^ doesn't work with that), iterating through the string: when it finds the first match and makes a replacement, it starts over from that point and treats that as the ^ anchor point.

Lua Code:

-- Change "^(%l+) " to "^(%l+)"
df("%q %d", string.gsub("une rune de puissance", "^(%l+)", {une="<une>", de="<de>"}))
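
The "starts over and treats that point as the anchor" idea can be sketched in plain Lua. This is only an emulation of the behaviour guessed at in this thread, not the game's actual implementation. `buggy_anchored_gsub` and `tag` are made-up names, the pattern is passed without its leading ^, and only a function replacement on the whole match is handled.

```lua
-- Hypothetical emulation of the observed behaviour (an assumption, not real game code).
local function buggy_anchored_gsub(str, pat, repl)
  local out, pos, n = {}, 1, 0
  while pos <= #str do
    -- Try to match only at the current position, as if it were the string start.
    local s, e = string.find(str, "^" .. pat, pos)
    if not s or e < s then break end  -- no match (or empty match): stop scanning
    n = n + 1
    out[#out + 1] = repl(string.sub(str, s, e))
    pos = e + 1  -- restart here and treat it as the new anchor point
  end
  out[#out + 1] = string.sub(str, pos)  -- copy the unmatched tail unchanged
  return table.concat(out), n
end

local function tag(m) return "<" .. m .. ">" end
print(buggy_anchored_gsub("une rune de puissance", "r*une ", tag))
print(buggy_anchored_gsub("une rune de puissance", "%l+", tag))
print(buggy_anchored_gsub(" une rune de puissance", "r*une ", tag))
```

Under this model, the pattern with the trailing space replaces both "une " and "rune ", while the pattern without it stops after the first word because the next character is a space.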

EDIT:

Originally Posted by Sasky

I wonder if it's consuming the input pattern as it matches. Something like:

First match of "une rune de puissance" is "une ".
It then tries to match against "rune de puissance", then "de puissance", etc.

If that's the case, you could get the space in a separate step.

Lua Code:

string.gsub("une rune de puissance", "^(%l+)", {une="<une@>", de="<de@>"}):gsub("@> ",">",1)

Oh, and it looks like Sasky already posted that :P
But yes, that was my conclusion too when I tested it.