Friday, 28 March 2014

An Alternative Approach to Assignment 2

As we all know, second part of assignment 2 was a total pain to code and especially, debug. After implementing the all_regex_permutations part by simply permuting the given string and checking whether each permutation is valid or not, I decided to try a different approach to make it faster since this implementation fails to complete in a sensible amount of time after a string which is longer than 12 characters is given as input. The first idea in my mind was to construct a really basic regex tree out of the given string and then permute the tree itself to get every possible variation of the regex. So my first attempt looked like this:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
def all_regex_permutations(s):
    '''
    Return a SET of valid regex permutations of s

    >>> all_regex_permutations('(1|0)')
    {'(1|0)','(0|1)'}
    >>> all_regex_permutations('((01.))|e')
    {'(1|(e.0))', '(0.(e|1))', '((1|e).0)', '(e.(0|1))', '(e|(0.1))', '((0|1).e)', '(0|(1.e))', '(1|(0.e))', '(e|(1.0))', '((1.e)|0)', '(0.(1|e))', '((e|1).0)', '(e.(1|0))', '((e|0).1)', '((1|0).e)', '(1.(0|e))', '((0|e).1)', '((0.1)|e)', '((e.1)|0)', '((0.e)|1)', '(0|(e.1))', '(1.(e|0))', '((e.0)|1)', '((1.0)|e)'}
    '''
    starcount = 0
    leaves = []

    def addperm(x, l):
        return [l[0:i] + [x] + l[i:] for i in range(len(l) + 1)]

    def perm(l):
        if len(l) == 0:
            return [[]]
        return [x for y in perm(l[1:]) for x in addperm(l[0], y)]

    def sampletree(_str):
        if((_str.count('(') + _str.count(')')) ==
           2 * (_str.count('.') + _str.count('|'))):
            tree = []
            for _char in _str:
                if(_char in ['0', '1', '2', 'e']):
                    leaves.append(_char)
                    tree.append(Leaf('e'))
            for i in range(_str.count('.')):
                tree.append(DotTree(tree.pop(), tree.pop()))
            for i in range(_str.count('|')):
                tree.append(BarTree(tree.pop(), tree.pop()))
            if len(tree) == 1:
                return tree[0]
            raise ValueError()
        raise ValueError()

    def tostring(r):
        if isinstance(r, Leaf):
            return r.get_symbol()
        if isinstance(r, StarTree):
            return tostring(r.get_children()[0]) + '*'
        if isinstance(r, DotTree) or isinstance(r, BarTree):
            return ('(' + tostring(r.get_children()[0]) +
                    r.get_symbol() + tostring(r.get_children()[1]) + ')')

    def all_tree_permutations(r):
        permlist = []
        if isinstance(r, Leaf):
            permlist.append(r)
        if isinstance(r, BarTree) or isinstance(r, DotTree):
            firstchild = r.get_children()[0]
            secondchild = r.get_children()[1]
            for _fprm in all_tree_permutations(firstchild):
                for _sprm in all_tree_permutations(secondchild):
                    if isinstance(r, BarTree):
                        permlist.append(BarTree(_fprm, _sprm))
                        permlist.append(BarTree(_sprm, _fprm))
                    else:
                        permlist.append(DotTree(_fprm, _sprm))
                        permlist.append(DotTree(_sprm, _fprm))
        return permlist

    def addstars(strlst):
        lst = []
        for _str in strlst:
            for index, _char in enumerate(_str):
                if(_char in ['0', '1', '2', 'e', '*', ')']):
                    lst.append(_str[:index + 1] + '*' + _str[index + 1:])
        return lst

    def permuteleaves(strlst):
        lst = []
        for _str in strlst:
            for permedleaves in perm(leaves):
                s = _str
                for index, _char in enumerate(_str):
                    if _char == 'e':
                        s = (s[:index] + permedleaves.pop() + s[index + 1:])
                lst.append(s)
        return lst

    while s.find('*') != -1:
        starcount += 1
        s = s[:s.find('*')] + s[s.find('*') + 1:]
    try:
        firstree = sampletree(s)
    except:
        return set()
    strlist = [tostring(x) for x in all_tree_permutations(firstree)]
    strlist = permuteleaves(strlist)
    while starcount > 0:
        strlist = addstars(strlist)
        starcount -= 1
    return set(strlist)

To go through the code, addperm and perm methods are there to be used to permute any list or string you throw at them, then we sampletree method which creates a regex tree disregarding every syntax rule you had to use at build_regex_tree and parsing through every string like an insatiable monster that devours any string as a meal and exudes a regex tree. I use sampletree to create my initial regex tree which I then use to permute. We then have the tostring method, it is a really simple methods that gives you a string representation of any regex tree you present. Then we come to the heart of the code, all_tree_permutations method takes a string and then supposedly returns every possible permutation of it by iterating through each permutation of its children. It does not always work as intended and it needs a bit of rethinking. I also have an addstars method which adds stars to every possible place you can put it on. I am using addstars method as it gets rid of loads of complexity on the algorithm if you tried to just parse unary trees, I simply modify the string representation with every possible permutation of stars. Permuteleaves function essentially combines every possible representation of leaves with every other representation of trees therefore completing the problem. When I first drafted the code, I did not feel a need to use a method to permute the leaves since if I could find every possible combination of tree permutations, I would be done with the problem. After a bit of coding and trying a myriad of ways, I simply could not find an easy way to permute trees altogether with their leaves. Permuteleaves method is there for sake of convenience, if you can find a better way to merge two parts, please contact!

Anyways after fiddling with tree permutations for a while, I could not come up with a 100% complete solution therefore I moved onto another approach. This way, the solution is complete and the code is also a bit easier to follow. The basic idea is to use a variation of the addstars method on the code above to add each operator iteratively. Here is the code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
def all_regex_permutations(s: str) -> set:
    '''
    Return a set of valid regex permutations of s

    >>> len(all_regex_permutations('(1|0)'))
    2
    >>> len(all_regex_permutations('(0.1*)'))
    6
    >>> len(all_regex_permutations('(0.1*)*'))
    12
    >>> len(all_regex_permutations('((01.))|e'))
    24
    '''
    permset = set([])
    starcount = s.count('*')
    numcharacters = len(s)
    operatorlist = (['.' for i in range(s.count('.'))] +
                    ['|' for i in range(s.count('|'))])

    for char in s:
        if not char in ['0', '1', '2', 'e', '(', ')', '.', '|', '*']:
            return permset

    def allperms(_str):
        '''Return a generator of all permutations of a string'''
        if len(_str) <= 1:
            yield _str
        else:
            for perm in allperms(_str[1:]):
                for i in range(len(perm) + 1):
                    yield perm[:i] + _str[0:1] + perm[i:]

    def addoperator(strlst, operator):
        lst = []
        for _str in strlst:
            for index in range(1, len(_str)):
                if (
                    _str[index - 1] in ['0', '1', '2', 'e', ')'] and
                    _str[index] in ['0', '1', '2', 'e', '(']
                ):
                    startindex = index - 1
                    endindex = index + 1
                    balance = 1
                    if _str[index - 1] == ')':
                        while _str[startindex] != '(' or balance != 1:
                            if _str[startindex] == ')':
                                balance += 1
                            if _str[startindex] == '(':
                                balance -= 1
                                if balance == 1:
                                    break
                            startindex -= 1
                    balance = 2
                    if _str[index] == '(':
                        while _str[endindex] != ')' or balance != 1:
                            if _str[endindex] == '(':
                                balance += 1
                            if _str[endindex] == ')':
                                balance -= 1
                                if balance == 1:
                                    break
                            endindex += 1
                    lst.append(
                        _str[:startindex] +
                        '(' +
                        _str[startindex:index] +
                        operator +
                        _str[index:endindex] +
                        ')' +
                        _str[endindex:]
                    )
        return lst

    def addstars(strlst):
        lst = []
        for _str in strlst:
            for index, _char in enumerate(_str):
                if(_char in ['0', '1', '2', 'e', '*', ')']):
                    lst.append(_str[:index + 1] + '*' + _str[index + 1:])
        return lst

    for _char in ['(', ')', '*', '.', '|']:
        while s.find(_char) != -1:
            s = s[:s.find(_char)] + s[s.find(_char) + 1:]

    leafperms = [i for i in allperms(s)]
    for oplist in allperms(operatorlist):
        lst = leafperms[:]
        for operator in oplist:
            lst = addoperator(lst, operator)
        for i in range(starcount):
            lst = addstars(lst)
        permset |= set(lst)

    for i in permset:
        if numcharacters != len(i):
            return set([])

    return permset

In this version I still have a helper method that gives you the permutation of any list or string you throw at it.(Such convenience!) Then we have the main method that does the manual labor. Addoperator function accepts a list of strings and an operator then gives a list of strings in which the given operator is added to every possible part of each string in the list. The way it does this is by checking any possible indexes to insert an operator, and then checking the balance of parentheses to insert its respective opening and closing parentheses.(This part was hell to debug!) Then we still have our addstars method which I described earlier and that's it for helpers. Firstly, code strips the string of all the characters that are not leaves and then permutes them into a list. We iterate through every operator and add these to our permutation of leaves and that's it! I personally like this way so much that I stuck with this code instead working on the previous way to compute permutations, feel free to derive anything that's working from the tree implementation and thanks for reading!

8 comments:

  1. Wow, nice complex code there! Very heuristical—analyzes what the final thing looks like and creates those patterns, rather than a more naive approach of trying unnecessary places. It's also great because I strongly doubt anyone else had exactly the same solution. (I thought of doing something like addstars at first, but the route I ended up going was very different.)

    One interesting point is that this still doesn't handle all types of long strings. For example, all_regex_permutations('((01.))|e*********') always yields a MemoryError on my machine, while all_regex_permutations('((01.))|e******') is okay (and, indeed executes pretty quickly).

    Note that a bug in your code actually allows any character outside of '(', ')', '.', '|', and '*' to be a leaf in certain circumstances. The following calls all produce something when they shouldn't: all_regex_permutations('8'), all_regex_permutations('123'), all_regex_permutations('123*'). But when you introduce a binary operator, as in all_regex_permutations('13().'), it correctly returns the empty set.

    ReplyDelete
  2. Thanks for noticing that! I changed the code so that it checks whether each character is legal, that's ought to fix it!

    ReplyDelete
  3. The extra 3 lines of code starts at line 20

    ReplyDelete
  4. nice job, i don't think i would have found an alternative method!!

    ReplyDelete
  5. what does the yield keyword do?

    ReplyDelete
    Replies
    1. When you use yield instead of return for a method, the method becomes a generator so it basically calculates permutations every time it needs instead of calculating them all at once which would be a waste of memory. You can read more about them here: https://wiki.python.org/moin/Generators

      Delete