(This is a draft for a potential Kata series, left as a Kumite for now)
Regex engine from scratch
Throughout this series you'll be gradually piecing together a regex engine, with each successive Kata adding an additional piece of syntax into the mix.
You may wish to copy code you've developed in earlier parts of the series for later parts so that you can gradually build up the engine as you go, or you can cater each solution to the specific syntax proposed. The latter might be better for the first few steps as they're very simple.
More or less the same
Following on from the previous part, we'll now be adding support for the simple quantifiers: *
, +
, ?
. Support for braced quantifiers such as {3}
will not be added, so braces should be treated like letters.
In part 1 you implemented the dot syntax, so you must implement that in this part too.
All other syntax should be treated as letters (and just matched on), even if they would be valid regex syntax in a real regex parser.
For example, even though [a]
would typically be a character class containing the letter a, we haven't implemented character classes yet, so it should be treated as \[a\]
. This also means you can't just import re
, it won't work!
Your function will be provided with a pattern as a string. This pattern should be handled somehow to produce a function that can be used to match on a string. You should expect this prepared function to be called multiple times with different inputs, it can't match just once.
The produced match function should behave as if it had anchors on either end, so it should match the entire string.
Examples
match('.*')('cat') => True # .* should match anything
match('c.?t')('ct') => True # the dot is made optional by the ?
match('c.?t')('cat') => True
match('c[a]t')('cat') => False # We haven't implemented character classes
match('c[a]t')('c[a]t') => True
match('c{2}t')('cct') => False # We also won't be implementing braced quantifiers
match('c{2}t')('c{2}t') => True
# This is a k=1 solution and I'm not too happy with it.
# I was hoping to try and find an okay balance for something
# that could be written and understood in a Kata solution without being a full
# blown automata, but it's kind of unwieldy.
class PeekableIterator(object):
def __init__(self, iterator):
self.iterator = iterator
self.ahead = None
def peek(self):
if self.ahead:
return self.ahead
self.ahead = self.read()
return self.ahead
def read(self):
if (result := self.ahead):
self.ahead = None
return result
try:
return next(self.iterator)
except StopIteration:
return ''
def match(pattern):
ops = []
for part in pattern:
if part == '?':
op = ops.pop()
ops.append(('optional', op))
elif part == '*':
op = ops.pop()
ops.append(('many_or_none', op))
elif part == '+':
op = ops.pop()
ops.append(('many', op))
else:
ops.append(('match', part))
def do_op(stream, op, component):
if op == 'match':
return (component == '.' and stream.peek()) or stream.peek() == component
elif op == 'optional':
op, component = component
if do_op(stream, op, component) and op == 'match':
stream.read()
return True
elif op == 'many_or_none' or op == 'many':
original_op = op
op, component = component
i = 0
while do_op(stream, op, component):
if op == 'match':
stream.read()
i += 1
return original_op == 'many_or_none' or i > 0
raise "Huh?"
def matcher(string):
stream = PeekableIterator(iter(string))
for op, component in ops:
if not do_op(stream, op, component):
return False
if op == 'match':
stream.read()
return stream.read() == ''
return matcher
import codewars_test as test
# TODO Write tests
import solution # or from solution import example
# test.assert_equals(actual, expected, [optional] message)
@test.describe("Example")
def test_group():
@test.it("test case")
def test_case():
# Part 1
cat = match('c.t')
test.assert_equals(cat('cat'), True)
test.assert_equals(cat('cbt'), True)
test.assert_equals(cat('catastrophic'), False)
test.assert_equals(match('c.t')('abc'), False)
test.assert_equals(match('[.]')('[a]'), True)
test.assert_equals(match('[a]')('a'), False)
# Part 2 (could do with more tests)
test.assert_equals(match('.*')('cat'), True)
test.assert_equals(match('.*')(''), True)
test.assert_equals(match('a*')(''), True)
test.assert_equals(match('a*')('a'), True)
test.assert_equals(match('a*')('aa'), True)
test.assert_equals(match('ab?c')('ac'), True)
test.assert_equals(match('ab?c')('abc'), True)
test.assert_equals(match('a+')(''), False)
test.assert_equals(match('a+')('a'), True)
test.assert_equals(match('a+')('aa'), True)
def match(pattern): def matcher(string): return len(pattern) == len(string) and all(a == '.' or a == b for a, b in zip(pattern, string)) return matcher
- def match(pattern):
- def matcher(string):
return len(pattern) == len(string) and all(a == b if a != '.' else True for a, b in zip(pattern, string))- return len(pattern) == len(string) and all(a == '.' or a == b for a, b in zip(pattern, string))
- return matcher
(This is a draft for a potential Kata series, left as a Kumite for now)
Regex engine from scratch
Throughout this series you'll be gradually piecing together a regex engine, with each successive Kata adding an additional piece of syntax into the mix.
You may wish to copy code you've developed in earlier parts of the series for later parts so that you can gradually build up the engine as you go, or you can cater each solution to the specific syntax proposed. The latter might be better for the first few steps as they're very simple.
It all begins with a dot.
For the first step we're going to implement regex's .
syntax.
To keep things simple, every other symbol except for a period should be treated as if it was inside a character set, even if the symbol would be treated as syntax by a real regex engine.
For example, the input [.]
, even though it looks like a character class, should be handled as if it was \[.\]
because we're only implementing the dot metacharacter for now. This also means you can't cheat and just import re!
Your function will be provided a pattern as a string. This pattern should be handled somehow to produce a function that can then be used to match on a string. You should expect this prepared function to be called multiple times with different inputs, it can't match just once.
The produced match function should behave as if it had anchors on either end, so it should match the entire string.
Examples
match('c.t')("cat") => True
match('c.t')("cbt") => True
match('c.t')("catastrophic") => False (must match entire string)
match('c.t')("abc") => False (not a match)
match('[.]')('[a]') => True (we don't support character classes, treat them like letters)
match('[a]')('a') => False (we've not implemented character classes yet)
def match(pattern):
def matcher(string):
return len(pattern) == len(string) and all(a == b if a != '.' else True for a, b in zip(pattern, string))
return matcher
import codewars_test as test
# TODO Write tests
import solution # or from solution import example
# test.assert_equals(actual, expected, [optional] message)
@test.describe("Example")
def test_group():
@test.it("test case")
def test_case():
cat = match('c.t')
test.assert_equals(cat('cat'), True)
test.assert_equals(cat('cbt'), True)
test.assert_equals(cat('catastrophic'), False)
test.assert_equals(match('c.t')('abc'), False)
test.assert_equals(match('[.]')('[a]'), True)
test.assert_equals(match('[a]')('a'), False)
# Shorter form of the last version. add=lambda s,o:sum(filter(lambda n:not o or~(n+o)&1,map(int,s)))
# 1 & 1 is 1, 10 & 1 is 0, we know that the least significant bit for any integer is 1 for odd and 0 for even# so we can just compare the least significant bit of both values to answer the questionadd=lambda s,o:sum(filter(lambda n: n & 1 == o & 1 if o and o < 3 else n, map(int, s)))# Ever so slightly shorteradd=lambda s,o:sum(filter(lambda n: ~(n + o) & 1 if o and o < 3 else n, map(int, s)))- # Shorter form of the last version.
- add=lambda s,o:sum(filter(lambda n:not o or~(n+o)&1,map(int,s)))
# 1 & 1 is 1, 10 & 1 is 0, we know that the least significant bit for any integer is 1 for odd and 0 for even # so we can just compare the least significant bit of both values to answer the question add=lambda s,o:sum(filter(lambda n: n & 1 == o & 1 if o and o < 3 else n, map(int, s))) # Ever so slightly shorter add=lambda s,o:sum(filter(lambda n: ~(n + o) & 1 if o and o < 3 else n, map(int, s)))
add=lambda s,o:sum(filter((id,(1).__and__,lambda x:~x&1)[o%3],map(int,s)))- # 1 & 1 is 1, 10 & 1 is 0, we know that the least significant bit for any integer is 1 for odd and 0 for even
- # so we can just compare the least significant bit of both values to answer the question
- add=lambda s,o:sum(filter(lambda n: n & 1 == o & 1 if o and o < 3 else n, map(int, s)))
- # Ever so slightly shorter
- add=lambda s,o:sum(filter(lambda n: ~(n + o) & 1 if o and o < 3 else n, map(int, s)))