Python - Regular Expressions
matchObj = re.search(r'(\d+\.\d+)', text) if matchObj: price = float( '%s' % (matchObj).group(0) )
Search and Replace
re.sub(regex, replacement, str) performs a search-and-replace across subject, replacing all matches of regex in str with replacement. The result is returned by the sub() function. The str string you pass is not modified.
split() splits a string into a list delimited by the passed pattern. The method is invaluable for converting textual data into data structures that can be easily read and modified by Python as demonstrated in the following example that creates a phonebook.
First, here is the input. Normally it may come from a file, here we are using triple-quoted string syntax:
>>> input = """Ross McFluff: 834.345.1254 155 Elm Street ... ... Ronald Heathmore: 892.345.3428 436 Finley Avenue ... Frank Burger: 925.541.7625 662 South Dogwood Way ... ... ... Heather Albrecht: 548.326.4584 919 Park Place"""
The entries are separated by one or more newlines. Now we convert the string into a list with each nonempty line having its own entry:
>>> entries = re.split("\n+", input) >>> entries ['Ross McFluff: 834.345.1254 155 Elm Street', 'Ronald Heathmore: 892.345.3428 436 Finley Avenue', 'Frank Burger: 925.541.7625 662 South Dogwood Way', 'Heather Albrecht: 548.326.4584 919 Park Place']
Finally, split each entry into a list with first name, last name, telephone number, and address. We use the maxsplit parameter of split() because the address has spaces, our splitting pattern, in it:
>>> [re.split(":? ", entry, 3) for entry in entries] [['Ross', 'McFluff', '834.345.1254', '155 Elm Street'], ['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'], ['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'], ['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]
perl grep and map
def grep(list, pattern): expr = re.compile(pattern) return [elem for elem in list if expr.match(elem)]
def map(list, was, womit): return list(map(lambda i: re.sub(was, womit, i), list)) # was = '.*"(\d+)".*' # womit = r'\1'