strops
This module gathers text operations to be run on a string.
cut
class textops.cut(sep=None, col=None, default='')
Extract columns from a string or a list of strings
This works like the Unix shell command ‘cut’. It uses the str.split() function.
- if the input is a simple string, cut() returns a list of strings: the fields of the split input string.
- if the input is a list of strings or a string with newlines, cut() returns a list of lists of strings: each line of the input is split and put in a list.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str) – a string used as the column separator. The default is None, which means ‘any run of whitespace’
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A string, a list of strings or a list of lists of strings
Examples
>>> s='col1 col2 col3'
>>> s | cut()
['col1', 'col2', 'col3']
>>> s | cut(col=1)
'col2'
>>> s | cut(col='1,2,10',default='N/A')
['col2', 'col3', 'N/A']
>>> s='col1.1 col1.2 col1.3\ncol2.1 col2.2 col2.3'
>>> s | cut()
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cut(col=1)
['col1.2', 'col2.2']
>>> s | cut(col='0,1')
[['col1.1', 'col1.2'], ['col2.1', 'col2.2']]
>>> s | cut(col=[1,2])
[['col1.2', 'col1.3'], ['col2.2', 'col2.3']]
>>> s='col1.1 | col1.2 |  col1.3\ncol2.1 | col2.2 | col2.3'
>>> s | cut()
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cut(sep=' | ')
[['col1.1', 'col1.2', ' col1.3'], ['col2.1', 'col2.2', 'col2.3']]
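The column-selection rules above can be sketched with plain str.split(). This is only an illustration under stated assumptions: pick_columns is a hypothetical helper written for this example, not part of textops, and it handles a single line only.

```python
def pick_columns(line, sep=None, col=None, default=''):
    """Hypothetical helper: split one line and select columns by index."""
    fields = line.split(sep)  # sep=None splits on any run of whitespace

    def get(i):
        # Fall back to `default` when the requested column does not exist
        return fields[i] if 0 <= i < len(fields) else default

    if col is None:
        return fields                      # all columns
    if isinstance(col, int):
        return get(col)                    # one column: a bare string
    if isinstance(col, str):
        col = [int(c) for c in col.split(',')]
    return [get(i) for i in col]


line = 'col1 col2 col3'
print(pick_columns(line))                                # ['col1', 'col2', 'col3']
print(pick_columns(line, col=1))                         # col2
print(pick_columns(line, col='1,2,10', default='N/A'))   # ['col2', 'col3', 'N/A']
print(pick_columns('a | b |  c', sep=' | '))             # ['a', 'b', ' c']
```

The last call shows why an explicit separator can leave stray whitespace in a field: str.split() with a literal separator does not collapse extra spaces.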
cutca
class textops.cutca(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture
This works like textops.cutre except that it needs a pattern with capture parentheses to define the columns. It uses re.match() for the capture, which means the pattern must match from the beginning of the line.
- if the input is a simple string, cutca() returns a list of strings: the captured groups.
- if the input is a list of strings or a string with newlines, cutca() returns a list of lists of strings: the captured groups of each line are put in a list.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A list of strings or a list of lists of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')
['col1', 'col2', 'col3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_','0,2,4','not present')
[['col1', 'col3', 'not present'], ['col11', 'col33', 'not present']]
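As a rough illustration of capture-based extraction, the standard library call that cutca is described as using can be tried directly. The helper name captured is hypothetical, and returning an empty list for a non-matching line is a choice made for this sketch only.

```python
import re

# Same pattern as in the example above: one capture group per column
pattern = re.compile(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')


def captured(line):
    """Hypothetical helper: return the captured groups of one line."""
    m = pattern.match(line)  # re.match() is anchored at the start of the line
    return list(m.groups()) if m else []


print(captured('-col1- =col2= _col3_'))   # ['col1', 'col2', 'col3']
print(captured('no delimiters here'))     # []
```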
cutdct
class textops.cutdct(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture
This works like textops.cutca except that it needs a pattern with named capture parentheses to define the columns.
- if the input is a simple string, cutdct() returns a dict mapping each group name to its captured string.
- if the input is a list of strings or a string with newlines, cutdct() returns a list of dicts: one dict per input line.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with named capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A dict or a list of dicts
Examples
>>> s='item="col1" count="col2" price="col3"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"')
{'item': 'col1', 'i_count': 'col2', 'i_price': 'col3'}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"') # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[{'item': 'col1', 'i_count': 'col2', 'i_price': 'col3'},
{'item': 'col11', 'i_count': 'col22', 'i_price': 'col33'}]
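Named captures map naturally onto dicts. The sketch below reproduces the shape of the results above with the standard re module only; it assumes nothing about how textops implements cutdct internally.

```python
import re

pattern = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)
text = ('item="col1" count="col2" price="col3"\n'
        'item="col11" count="col22" price="col33"')

# One dict per matching line; groupdict() maps group names to captured text
rows = [m.groupdict() for m in map(pattern.match, text.splitlines()) if m]

for row in rows:
    print(row)
```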
cutkv
class textops.cutkv(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture
This works like textops.cutdct except that it returns a dict whose key is the value captured by the group named in the ‘key_name’ parameter, and whose value is the full dict of captured values. This makes it easy to merge the results into a bigger dict: see merge_dicts()
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with named capture parentheses
- key_name (str) – the named capture to use as the key of the returned dict. The default value is ‘key’
Note
key_name must be passed as a keyword argument (key_name=...); it is not a positional parameter.
Returns: A dict or a list of dicts
Examples
>>> s='item="col1" count="col2" price="col3"'
>>> pattern=r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
>>> s | cutkv(pattern,key_name='item')
{'col1': {'item': 'col1', 'i_count': 'col2', 'i_price': 'col3'}}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutkv(pattern,key_name='item') # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[{'col1': {'item': 'col1', 'i_count': 'col2', 'i_price': 'col3'}},
{'col11': {'item': 'col11', 'i_count': 'col22', 'i_price': 'col33'}}]
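To see why keyed dicts are convenient to merge, here is a minimal sketch that builds one combined dict from several lines. It uses plain dict assignment rather than textops' merge_dicts(), whose exact behaviour is not described in this section, so treat it as an assumption-laden illustration.

```python
import re

pattern = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)
lines = [
    'item="col1" count="col2" price="col3"',
    'item="col11" count="col22" price="col33"',
]

merged = {}
for line in lines:
    m = pattern.match(line)
    if m:
        fields = m.groupdict()
        # Key each record by the value captured in the 'item' group
        merged[fields['item']] = fields

print(sorted(merged))            # ['col1', 'col11']
print(merged['col11']['i_price'])  # col33
```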
cutm
class textops.cutm(sep=None, col=None, default='')
Extract exactly one column by using re.match()
It returns the matched pattern. Beware: the pattern must match from the beginning of the line. You may use capture parentheses to return only a part of the matched pattern.
- if the input is a simple string, textops.cutm returns a string: the captured substring.
- if the input is a list of strings or a string with newlines, textops.cutm returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object, optionally with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'[^=]*=[^=]*=')
'-col1- =col2='
>>> s | cutm(r'=[^=]*=')
''
>>> s | cutm(r'[^=]*=([^=]*)=')
'col2'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutm(r'[^-]*-([^-]*)-')
['col1', 'col11']
>>> s | cutm(r'[^-]*-(badpattern)-',default='-')
['-', '-']
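The second example returns an empty string because re.match() only succeeds at the start of the line. The sketch below mimics that behaviour with the standard library; first_match is a hypothetical helper, and choosing group 1 when the pattern has a capture group is an assumption drawn from the examples above.

```python
import re


def first_match(pattern, line, default=''):
    """Hypothetical helper: anchored match with an optional capture group."""
    m = re.match(pattern, line)
    if m is None:
        return default
    # With a capture group, return group 1; otherwise the whole match
    return m.group(1) if m.re.groups else m.group(0)


s = '-col1- =col2= _col3_'
print(first_match(r'[^=]*=[^=]*=', s))                   # -col1- =col2=
print(repr(first_match(r'=[^=]*=', s)))                  # ''
print(first_match(r'[^=]*=([^=]*)=', s))                 # col2
print(first_match(r'[^-]*-(badpattern)-', s, default='-'))  # -
```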
cutmi
class textops.cutmi(sep=None, col=None, default='')
Extract exactly one column by using re.match() (case insensitive)
This works like textops.cutm except that it is case insensitive.
- if the input is a simple string, textops.cutmi returns a string: the captured substring.
- if the input is a list of strings or a string with newlines, textops.cutmi returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object, optionally with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'.*(COL\d+)',default='no found')
'no found'
>>> s='-col1- =col2= _col3_'
>>> s | cutmi(r'.*(COL\d+)',default='no found') #as .* is the longest possible, only last column is extracted
'col3'
cutre
class textops.cutre(sep=None, col=None, default='')
Extract columns from a string or a list of strings with re.split()
This works like the Unix shell command ‘cut’. It uses the re.split() function.
- if the input is a simple string, cutre() returns a list of strings: the fields of the split input string.
- if the input is a list of strings or a string with newlines, cutre() returns a list of lists of strings: each line of the input is split and put in a list.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object used as the column separator
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A string, a list of strings or a list of lists of strings
Examples
>>> import re
>>> s='col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
>>> print(s)
col1.1 | col1.2 | col1.3
col2.1 | col2.2 | col2.3
>>> s | cutre(r'\s+')
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cutre(r'[\s|]+')
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cutre(r'[\s|]+','0,2,4','-')
[['col1.1', 'col1.3', '-'], ['col2.1', 'col2.3', '-']]
>>> mysep = re.compile(r'[\s|]+')
>>> s | cutre(mysep)
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
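The difference between the two separators above can be checked with re.split() alone. This is a standard-library sketch of the splitting step only, without the column selection that cutre adds on top.

```python
import re

s = 'col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'

# Whitespace-only separator: the '|' characters survive as fields
ws_rows = [re.split(r'\s+', line) for line in s.splitlines()]

# Separator made of whitespace and '|': only the data fields remain
sep = re.compile(r'[\s|]+')
rows = [sep.split(line) for line in s.splitlines()]

print(ws_rows[0])   # ['col1.1', '|', 'col1.2', '|', 'col1.3']
print(rows)         # [['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
```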
cuts
class textops.cuts(sep=None, col=None, default='')
Extract exactly one column by using re.search()
This works like textops.cutm except that it searches for the first occurrence of the pattern anywhere in the string. You may use capture parentheses to return only a part of the found pattern.
- if the input is a simple string, textops.cuts returns a string: the captured substring.
- if the input is a list of strings or a string with newlines, textops.cuts returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object, optionally with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_[^_]*_')
'_col3_'
>>> s | cuts(r'_([^_]*)_')
'col3'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cuts(r'_([^_]*)_')
['col3', 'col33']
cutsa
class textops.cutsa(sep=None, col=None, default='')
Extract all columns having the specified pattern.
It uses re.finditer() to find all occurrences of the pattern.
- if the input is a simple string, textops.cutsa returns a list of found strings.
- if the input is a list of strings or a string with newlines, textops.cutsa returns a list of lists of found strings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object, optionally with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A list of strings or a list of lists of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutsa(r'col\d+')
['col1', 'col2', 'col3']
>>> s | cutsa(r'col(\d+)')
['1', '2', '3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutsa(r'col\d+')
[['col1', 'col2', 'col3'], ['col11', 'col22', 'col33']]
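Collecting every occurrence with re.finditer() can be sketched as follows. The helper find_all and its flags parameter are hypothetical names for this illustration; passing re.IGNORECASE shows the case-insensitive variant of the same idea.

```python
import re


def find_all(pattern, line, flags=0):
    """Hypothetical helper: every occurrence of pattern in one line."""
    found = []
    for m in re.finditer(pattern, line, flags):
        # With a capture group, keep group 1; otherwise the whole match
        found.append(m.group(1) if m.re.groups else m.group(0))
    return found


s = '-col1- =col2= _col3_'
print(find_all(r'col\d+', s))                    # ['col1', 'col2', 'col3']
print(find_all(r'col(\d+)', s))                  # ['1', '2', '3']
print(find_all(r'COL\d+', s))                    # []
print(find_all(r'COL\d+', s, re.IGNORECASE))     # ['col1', 'col2', 'col3']
```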
cutsai
class textops.cutsai(sep=None, col=None, default='')
Extract all columns having the specified pattern. (case insensitive)
It works like textops.cutsa but is case insensitive if the pattern is given as a string.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object, optionally with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A list of strings or a list of lists of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutsa(r'COL\d+')
[]
>>> s | cutsai(r'COL\d+')
['col1', 'col2', 'col3']
>>> s | cutsai(r'COL(\d+)')
['1', '2', '3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutsai(r'COL\d+')
[['col1', 'col2', 'col3'], ['col11', 'col22', 'col33']]
cutsi
class textops.cutsi(sep=None, col=None, default='')
Extract exactly one column by using re.search() (case insensitive)
This works like textops.cuts except that it is case insensitive.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object, optionally with capture parentheses
- col (int or list of int or str) – the column or columns you want to get back. You can specify:
  - an int as a single column number (starting with 0)
  - a list of int as the list of columns
  - a string containing a comma-separated list of int
  - None (default value) for all columns
- default (str) – a string to use in place of a requested column that does not exist
Returns: A string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_(COL[^_]*)_')
''
>>> s='-col1- =col2= _col3_'
>>> s | cutsi(r'_(COL[^_]*)_')
'col3'
echo
class textops.echo
Identity operation
It returns the same text, except that it uses textops extended classes (StrExt, ListExt …). This can be useful to access str methods (upper, replace, …) just after a pipe.
Returns: the input text, wrapped in a textops extended class
Return type: StrExt, ListExt or another textops extended class
Examples
>>> s='this is a string'
>>> type(s)
<class 'str'>
>>> t=s | echo()
>>> type(t)
<class 'textops.base.StrExt'>
>>> s.upper()
'THIS IS A STRING'
>>> s | upper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
>>> s | echo().upper()
'THIS IS A STRING'
>>> s | strop.upper()
'THIS IS A STRING'
length
class textops.length
Returns the length of a string, list or generator
Returns: the length of the input
Return type: int
Examples
>>> s='this is a string'
>>> s | length()
16
>>> s=StrExt(s)
>>> s.length()
16
>>> ['a','b','c'] | length()
3
>>> def mygenerator():yield 3; yield 2
>>> mygenerator() | length()
2
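Generators have no len(), so counting them means iterating. A minimal sketch of one way to do this in plain Python follows; count_items is a hypothetical name, and this is not necessarily how textops computes the length.

```python
def count_items(obj):
    """Hypothetical helper: length of a sized object or of a generator."""
    try:
        return len(obj)
    except TypeError:
        # No __len__: iterate and count (this consumes the generator)
        return sum(1 for _ in obj)


def mygenerator():
    yield 3
    yield 2


print(count_items('this is a string'))   # 16
print(count_items(['a', 'b', 'c']))      # 3
print(count_items(mygenerator()))        # 2
```

Note that counting a generator exhausts it, so its items cannot be read again afterwards.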
matches
class textops.matches(pattern)
Tests whether a pattern is present or not
Uses re.match() to match a pattern against the string.
Parameters: pattern (str) – a regular expression string
Returns: The pattern found
Return type: re.RegexObject
Note
Be careful: the pattern is tested from the beginning of the string; it is NOT searched for in the middle of the string.
Examples
>>> state=StrExt('good')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> state=StrExt('looks like all is good')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
>>> print('OK' if state.matches(r'.*(good|not_present|charging)') else 'CRITICAL')
OK
>>> state=StrExt('Error')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
searches
class textops.searches(pattern)
Search a pattern
Uses re.search() to find a pattern in the string.
Parameters: pattern (str) – a regular expression string
Returns: The pattern found
Return type: re.RegexObject
Examples
>>> state=StrExt('good')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> state=StrExt('looks like all is good')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> print('OK' if state.searches(r'.*(good|not_present|charging)') else 'CRITICAL')
OK
>>> state=StrExt('Error')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
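The contrast between matches and searches comes straight from re.match() and re.search(). The sketch below compares the two standard-library calls on the same inputs; the status helper is a hypothetical name used only to print OK or CRITICAL.

```python
import re

PATTERN = r'good|not_present|charging'


def status(found):
    """Hypothetical helper: map a match result to a status word."""
    return 'OK' if found else 'CRITICAL'


text = 'looks like all is good'
print(status(re.match(PATTERN, text)))     # CRITICAL: 'good' is not at the start
print(status(re.search(PATTERN, text)))    # OK: found anywhere in the string
print(status(re.match(r'.*(good|not_present|charging)', text)))  # OK
print(status(re.search(PATTERN, 'Error'))) # CRITICAL
```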
splitln
class textops.splitln
Transforms a string with newlines into a list of lines
It uses Python's str.splitlines(): the newline separator can be \n, \r or both. Separators are removed during the process.
Returns: The split text
Return type: list
Example
>>> s='this is\na multi-line\nstring'
>>> s | splitln()
['this is', 'a multi-line', 'string']
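The handling of mixed line endings can be checked with str.splitlines() directly. This short sketch uses only the built-in method that splitln is described as relying on.

```python
# Mixed line endings: \n, \r\n and a lone \r
s = 'this is\na multi-line\r\nstring\rend'

lines = s.splitlines()
print(lines)   # ['this is', 'a multi-line', 'string', 'end']

# A trailing newline does not produce an extra empty item
print('one\ntwo\n'.splitlines())   # ['one', 'two']
```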