<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>1 Using the pyparsing module — PyParsing 3.0.9 documentation</title> <link rel="stylesheet" href="_static/alabaster.css" type="text/css" /> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script> <script type="text/javascript" src="_static/jquery.js"></script> <script type="text/javascript" src="_static/underscore.js"></script> <script type="text/javascript" src="_static/doctools.js"></script> <link rel="index" title="Index" href="genindex.html" /> <link rel="search" title="Search" href="search.html" /> <link rel="next" title="pyparsing" href="modules.html" /> <link rel="prev" title="1 What’s New in Pyparsing 3.0.0" href="whats_new_in_3_0_0.html" /> <link rel="stylesheet" href="_static/custom.css" type="text/css" /> <meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" /> </head><body> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body" role="main"> <div class="section" id="using-the-pyparsing-module"> <h1><a class="toc-backref" href="#id1">1 Using the pyparsing module</a><a class="headerlink" href="#using-the-pyparsing-module" title="Permalink to this headline">¶</a></h1> <table class="docutils field-list" frame="void" rules="none"> <col class="field-name" /> <col class="field-body" /> <tbody valign="top"> <tr class="field-odd field"><th class="field-name">author:</th><td class="field-body">Paul McGuire</td> </tr> <tr class="field-even field"><th class="field-name">address:</th><td class="field-body"><a class="reference external" 
href="mailto:ptmcg.pm+pyparsing%40gmail.com">ptmcg<span>.</span>pm+pyparsing<span>@</span>gmail<span>.</span>com</a></td> </tr> <tr class="field-odd field"><th class="field-name">revision:</th><td class="field-body">3.0.0</td> </tr> <tr class="field-even field"><th class="field-name">date:</th><td class="field-body">October, 2021</td> </tr> <tr class="field-odd field"><th class="field-name">copyright:</th><td class="field-body">Copyright © 2003-2022 Paul McGuire.</td> </tr> </tbody> </table> <table class="docutils field-list" frame="void" rules="none"> <col class="field-name" /> <col class="field-body" /> <tbody valign="top"> <tr class="field-odd field"><th class="field-name">abstract:</th><td class="field-body">This document provides how-to instructions for the pyparsing library, an easy-to-use Python module for constructing and executing basic text parsers. The pyparsing module is useful for evaluating user-definable expressions, processing custom application language commands, or extracting data from formatted reports.</td> </tr> </tbody> </table> <div class="contents topic" id="contents"> <p class="topic-title first">Contents</p> <ul class="auto-toc simple"> <li><a class="reference internal" href="#using-the-pyparsing-module" id="id1">1 Using the pyparsing module</a><ul class="auto-toc"> <li><a class="reference internal" href="#steps-to-follow" id="id2">1.1 Steps to follow</a><ul class="auto-toc"> <li><a class="reference internal" href="#hello-world" id="id3">1.1.1 Hello, World!</a></li> <li><a class="reference internal" href="#usage-notes" id="id4">1.1.2 Usage notes</a></li> </ul> </li> <li><a class="reference internal" href="#classes" id="id5">1.2 Classes</a><ul class="auto-toc"> <li><a class="reference internal" href="#classes-in-the-pyparsing-module" id="id6">1.2.1 Classes in the pyparsing module</a></li> <li><a class="reference internal" href="#basic-parserelement-subclasses" id="id7">1.2.2 Basic ParserElement subclasses</a></li> <li><a class="reference 
internal" href="#expression-subclasses" id="id8">1.2.3 Expression subclasses</a></li> <li><a class="reference internal" href="#expression-operators" id="id9">1.2.4 Expression operators</a></li> <li><a class="reference internal" href="#positional-subclasses" id="id10">1.2.5 Positional subclasses</a></li> <li><a class="reference internal" href="#converter-subclasses" id="id11">1.2.6 Converter subclasses</a></li> <li><a class="reference internal" href="#special-subclasses" id="id12">1.2.7 Special subclasses</a></li> <li><a class="reference internal" href="#other-classes" id="id13">1.2.8 Other classes</a></li> <li><a class="reference internal" href="#exception-classes-and-troubleshooting" id="id14">1.2.9 Exception classes and Troubleshooting</a></li> </ul> </li> <li><a class="reference internal" href="#miscellaneous-attributes-and-methods" id="id15">1.3 Miscellaneous attributes and methods</a><ul class="auto-toc"> <li><a class="reference internal" href="#helper-methods" id="id16">1.3.1 Helper methods</a></li> <li><a class="reference internal" href="#helper-parse-actions" id="id17">1.3.2 Helper parse actions</a></li> <li><a class="reference internal" href="#common-string-and-token-constants" id="id18">1.3.3 Common string and token constants</a></li> <li><a class="reference internal" href="#unicode-character-sets-for-international-parsing" id="id19">1.3.4 Unicode character sets for international parsing</a></li> </ul> </li> <li><a class="reference internal" href="#generating-railroad-diagrams" id="id20">1.4 Generating Railroad Diagrams</a><ul class="auto-toc"> <li><a class="reference internal" href="#usage" id="id21">1.4.1 Usage</a></li> <li><a class="reference internal" href="#example" id="id22">1.4.2 Example</a></li> <li><a class="reference internal" href="#naming-tip" id="id23">1.4.3 Naming tip</a></li> <li><a class="reference internal" href="#customization" id="id24">1.4.4 Customization</a></li> </ul> </li> </ul> </li> </ul> </div> <p>Note: While this content is 
still valid, there are more detailed descriptions and extensive examples at the <a class="reference external" href="https://pyparsing-docs.readthedocs.io/en/latest/pyparsing.html">online doc server</a>, and in the online help for the various pyparsing classes and methods (viewable using the Python interpreter’s built-in <code class="docutils literal notranslate"><span class="pre">help()</span></code> function). You will also find many example scripts in the <a class="reference external" href="https://github.com/pyparsing/pyparsing/tree/master/examples">examples</a> directory of the pyparsing GitHub repo.</p> <hr class="docutils" /> <p><strong>Note</strong>: <em>In pyparsing 3.0, many method and function names which were originally written using camelCase have been converted to PEP8-compatible snake_case. So <code class="docutils literal notranslate"><span class="pre">parseString()</span></code> is being renamed to <code class="docutils literal notranslate"><span class="pre">parse_string()</span></code>, <code class="docutils literal notranslate"><span class="pre">delimitedList</span></code> to <code class="docutils literal notranslate"><span class="pre">delimited_list</span></code>, and so on. You may see the old names in legacy parsers, and they will be supported for a time with synonyms, but the synonyms will be removed in a future release.</em></p> <p><em>If you are using this documentation, but working with a 2.4.x version of pyparsing, you’ll need to convert methods and arguments from the documented snake_case names to the legacy camelCase names. In pyparsing 3.0.x, both forms are supported, but the legacy forms are deprecated; they will be dropped in a future release.</em></p> <hr class="docutils" /> <div class="section" id="steps-to-follow"> <h2><a class="toc-backref" href="#id2">1.1 Steps to follow</a><a class="headerlink" href="#steps-to-follow" title="Permalink to this headline">¶</a></h2> <p>To parse an incoming data string, the client code must follow these steps:</p> <ol class="arabic simple"> <li>First define the tokens and patterns to be matched, and assign the resulting expression to a program variable.
Optional results names or parse actions can also be defined at this time.</li> <li>Call <code class="docutils literal notranslate"><span class="pre">parse_string()</span></code> or <code class="docutils literal notranslate"><span class="pre">scan_string()</span></code> on this variable, passing in the string to be parsed. During the matching process, whitespace between tokens is skipped by default (although this can be changed). When token matches occur, any defined parse action methods are called.</li> <li>Process the parsed results, returned as a <a class="reference internal" href="#parseresults">ParseResults</a> object. The <a class="reference internal" href="#parseresults">ParseResults</a> object can be accessed as if it were a list of strings. Matching results may also be accessed as named attributes of the returned results, if names are defined in the definition of the token pattern, using <code class="docutils literal notranslate"><span class="pre">set_results_name()</span></code>.</li> </ol> <div class="section" id="hello-world"> <h3><a class="toc-backref" href="#id3">1.1.1 Hello, World!</a><a class="headerlink" href="#hello-world" title="Permalink to this headline">¶</a></h3> <p>The following complete Python program will parse the greeting <code class="docutils literal notranslate"><span class="pre">"Hello,</span> <span class="pre">World!"</span></code>, or any other greeting of the form “&lt;salutation&gt;, &lt;addressee&gt;!”:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pyparsing</span> <span class="k">as</span> <span class="nn">pp</span>

<span class="n">greet</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Word</span><span class="p">(</span><span class="n">pp</span><span class="o">.</span><span class="n">alphas</span><span class="p">)</span> <span class="o">+</span> <span class="s2">","</span> <span class="o">+</span> <span class="n">pp</span><span class="o">.</span><span class="n">Word</span><span class="p">(</span><span class="n">pp</span><span class="o">.</span><span class="n">alphas</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"!"</span>
<span class="k">for</span> <span class="n">greeting_str</span> <span class="ow">in</span> <span class="p">[</span>
    <span class="s2">"Hello, World!"</span><span class="p">,</span>
    <span class="s2">"Bonjour, Monde!"</span><span class="p">,</span>
    <span class="s2">"Hola, Mundo!"</span><span class="p">,</span>
    <span class="s2">"Hallo, Welt!"</span><span class="p">,</span>
<span class="p">]:</span>
    <span class="n">greeting</span> <span class="o">=</span> <span class="n">greet</span><span class="o">.</span><span class="n">parse_string</span><span class="p">(</span><span class="n">greeting_str</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">greeting</span><span class="p">)</span>
</pre></div> </div> <p>The parsed tokens are returned in the following form:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="s1">'Hello'</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="s1">'World'</span><span class="p">,</span> <span class="s1">'!'</span><span class="p">]</span>
<span class="p">[</span><span class="s1">'Bonjour'</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="s1">'Monde'</span><span class="p">,</span> <span class="s1">'!'</span><span class="p">]</span>
<span class="p">[</span><span class="s1">'Hola'</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="s1">'Mundo'</span><span class="p">,</span> <span class="s1">'!'</span><span class="p">]</span>
<span class="p">[</span><span class="s1">'Hallo'</span><span class="p">,</span> <span class="s1">','</span><span 
class="p">,</span> <span class="s1">'Welt'</span><span class="p">,</span> <span class="s1">'!'</span><span class="p">]</span> </pre></div> </div> </div> <div class="section" id="usage-notes"> <h3><a class="toc-backref" href="#id4">1.1.2 Usage notes</a><a class="headerlink" href="#usage-notes" title="Permalink to this headline">¶</a></h3> <ul> <li><p class="first">The pyparsing module can be used to interpret simple command strings or algebraic expressions, or can be used to extract data from text reports with complicated format and structure (“screen or report scraping”). However, it is possible that your defined matching patterns may accept invalid inputs. Use pyparsing to extract data from strings assumed to be well-formatted.</p> </li> <li><p class="first">To keep up the readability of your code, use <a class="reference internal" href="#operators">operators</a> such as <code class="docutils literal notranslate"><span class="pre">+</span></code>, <code class="docutils literal notranslate"><span class="pre">|</span></code>, <code class="docutils literal notranslate"><span class="pre">^</span></code>, and <code class="docutils literal notranslate"><span class="pre">~</span></code> to combine expressions. You can also combine string literals with <code class="docutils literal notranslate"><span class="pre">ParseExpressions</span></code> - they will be automatically converted to <a class="reference internal" href="#literal">Literal</a> objects. 
For example:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">integer</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span>  <span class="c1"># simple unsigned integer</span>
<span class="n">variable</span> <span class="o">=</span> <span class="n">Char</span><span class="p">(</span><span class="n">alphas</span><span class="p">)</span>  <span class="c1"># single letter variable, such as x, z, m, etc.</span>
<span class="n">arith_op</span> <span class="o">=</span> <span class="n">one_of</span><span class="p">(</span><span class="s2">"+ - * /"</span><span class="p">)</span>  <span class="c1"># arithmetic operators</span>
<span class="n">equation</span> <span class="o">=</span> <span class="n">variable</span> <span class="o">+</span> <span class="s2">"="</span> <span class="o">+</span> <span class="n">integer</span> <span class="o">+</span> <span class="n">arith_op</span> <span class="o">+</span> <span class="n">integer</span>  <span class="c1"># will match "x=2+2", etc.</span>
</pre></div> </div> <p>In the definition of <code class="docutils literal notranslate"><span class="pre">equation</span></code>, the string <code class="docutils literal notranslate"><span class="pre">"="</span></code> is converted to a <code class="docutils literal notranslate"><span class="pre">Literal("=")</span></code>, which keeps the grammar more readable.</p> </li> <li><p class="first">The pyparsing module’s default behavior is to ignore whitespace, which suits the great majority of parsers. This allows you to write simple, clean grammars, such as the above <code class="docutils literal notranslate"><span class="pre">equation</span></code>, without having to clutter them with extraneous <code class="docutils literal notranslate"><span class="pre">ws</span></code> markers.
The <code class="docutils literal notranslate"><span class="pre">equation</span></code> grammar will successfully parse all of the following statements:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">x</span><span class="o">=</span><span class="mi">2</span><span class="o">+</span><span class="mi">2</span>
<span class="n">x</span> <span class="o">=</span> <span class="mi">2</span><span class="o">+</span><span class="mi">2</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">4</span>
<span class="n">r</span><span class="o">=</span> <span class="mi">1234</span><span class="o">/</span> <span class="mi">100000</span>
</pre></div> </div> <p>This example can be extended to support more elaborate expressions, such as nesting with parentheses, floating point numbers, scientific notation, and named constants (such as <code class="docutils literal notranslate"><span class="pre">e</span></code> or <code class="docutils literal notranslate"><span class="pre">pi</span></code>). See <a class="reference external" href="https://github.com/pyparsing/pyparsing/blob/master/examples/fourFn.py">fourFn.py</a> and <a class="reference external" href="https://github.com/pyparsing/pyparsing/blob/master/examples/simpleArith.py">simpleArith.py</a>, both included in the examples directory.</p> </li> <li><p class="first">To modify pyparsing’s default whitespace skipping, you can use one or more of the following methods:</p> <ul> <li><p class="first">use the static method <code class="docutils literal notranslate"><span class="pre">ParserElement.set_default_whitespace_chars</span></code> to override the normal set of whitespace chars (<code class="docutils literal notranslate"><span class="pre">'</span> <span class="pre">\t\n'</span></code>).
For instance when defining a grammar in which newlines are significant, you should call <code class="docutils literal notranslate"><span class="pre">ParserElement.set_default_whitespace_chars('</span> <span class="pre">\t')</span></code> to remove newline from the set of skippable whitespace characters. Calling this method will affect all pyparsing expressions defined afterward.</p> </li> <li><p class="first">call <code class="docutils literal notranslate"><span class="pre">leave_whitespace()</span></code> on individual expressions, to suppress the skipping of whitespace before trying to match the expression</p> </li> <li><p class="first">use <code class="docutils literal notranslate"><span class="pre">Combine</span></code> to require that successive expressions must be adjacent in the input string. For instance, this expression:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">real</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'.'</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> </pre></div> </div> <p>will match “3.14159”, but will also match “3 . 12”. It will also return the matched results as [‘3’, ‘.’, ‘14159’]. 
By changing this expression to:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">real</span> <span class="o">=</span> <span class="n">Combine</span><span class="p">(</span><span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'.'</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">))</span> </pre></div> </div> <p>it will not match numbers with embedded spaces, and it will return a single concatenated string ‘3.14159’ as the parsed token.</p> </li> </ul> </li> <li><p class="first">Repetition of expressions can be indicated using <code class="docutils literal notranslate"><span class="pre">*</span></code> or <code class="docutils literal notranslate"><span class="pre">[]</span></code> notation. An expression may be multiplied by an integer value (to indicate an exact repetition count), or indexed with a tuple, representing min and max repetitions (with <code class="docutils literal notranslate"><span class="pre">...</span></code> representing no min or no max, depending whether it is the first or second tuple element). 
See the following examples, where n is used to indicate an integer value:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">expr*3</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">expr</span> <span class="pre">+</span> <span class="pre">expr</span> <span class="pre">+</span> <span class="pre">expr</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">expr[2,</span> <span class="pre">3]</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">expr</span> <span class="pre">+</span> <span class="pre">expr</span> <span class="pre">+</span> <span class="pre">Opt(expr)</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">expr[n,</span> <span class="pre">...]</span></code> or <code class="docutils literal notranslate"><span class="pre">expr[n,]</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">expr*n</span> <span class="pre">+</span> <span class="pre">ZeroOrMore(expr)</span></code> (read as “at least n instances of expr”)</li> <li><code class="docutils literal notranslate"><span class="pre">expr[...</span> <span class="pre">,n]</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">expr*(0,</span> <span class="pre">n)</span></code> (read as “0 to n instances of expr”)</li> <li><code class="docutils literal notranslate"><span class="pre">expr[...]</span></code> and <code class="docutils literal notranslate"><span class="pre">expr[0,</span> <span class="pre">...]</span></code> are equivalent to <code class="docutils literal notranslate"><span class="pre">ZeroOrMore(expr)</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">expr[1,</span> <span class="pre">...]</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">OneOrMore(expr)</span></code></li> </ul> 
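<p>The repetition forms listed above can be tried out with a short script. This is a minimal sketch, assuming pyparsing 3.0 or later is installed; the <code class="docutils literal notranslate"><span class="pre">word</span></code> expression, the sample input strings, and the variable names are illustrative choices and do not come from the pyparsing documentation.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>
```python
import pyparsing as pp

# Illustrative expression: one run of letters per token.
word = pp.Word(pp.alphas)

# expr*3 matches exactly three words; the trailing "d" is left unparsed
# because parse_all defaults to False.
exactly_three = (word * 3).parse_string("a b c d").as_list()

# expr[2, 3] matches two words plus an optional third.
two_or_three = word[2, 3].parse_string("a b c d").as_list()

# expr[2, ...] matches at least two words and then as many as follow.
at_least_two = word[2, ...].parse_string("a b c d").as_list()

# expr[..., 2] stops after two words but does not reject a third one.
up_to_two = word[..., 2].parse_string("a b c").as_list()

# Appending a negative lookahead turns the upper bound into a hard limit.
strict_up_to_two = word[..., 2] + ~word
try:
    strict_up_to_two.parse_string("a b c")
    rejected = False
except pp.ParseException:
    rejected = True

print(exactly_three)  # ['a', 'b', 'c']
print(two_or_three)   # ['a', 'b', 'c']
print(at_least_two)   # ['a', 'b', 'c', 'd']
print(up_to_two)      # ['a', 'b']
print(rejected)       # True
```
</pre></div> </div>
<p>The last two cases show the difference that matters in practice: the bare upper bound simply stops consuming input, while the form with the negative lookahead fails the parse when an extra occurrence follows.</p>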
<p>Note that <code class="docutils literal notranslate"><span class="pre">expr[...,</span> <span class="pre">n]</span></code> does not raise an exception if more than n exprs exist in the input stream; that is, <code class="docutils literal notranslate"><span class="pre">expr[...,</span> <span class="pre">n]</span></code> does not enforce a maximum number of expr occurrences. If this behavior is desired, then write <code class="docutils literal notranslate"><span class="pre">expr[...,</span> <span class="pre">n]</span> <span class="pre">+</span> <span class="pre">~expr</span></code>.</p> </li> <li><p class="first"><a class="reference internal" href="#matchfirst">MatchFirst</a> expressions are matched left-to-right, and the first match found will skip all later expressions within, so be sure to define less-specific patterns after more-specific patterns. If you are not sure which expressions are most specific, use <a class="reference internal" href="#or">Or</a> expressions (defined using the <code class="docutils literal notranslate"><span class="pre">^</span></code> operator) - they will always match the longest expression, although they are more compute-intensive.</p> </li> <li><p class="first"><a class="reference internal" href="#or">Or</a> expressions will evaluate all of the specified subexpressions to determine which is the “best” match, that is, which matches the longest string in the input data. 
In case of a tie, the left-most expression in the <a class="reference internal" href="#or">Or</a> list will win.</p> </li> <li><p class="first">If parsing the contents of an entire file, pass it to the <code class="docutils literal notranslate"><span class="pre">parse_file</span></code> method using:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">expr</span><span class="o">.</span><span class="n">parse_file</span><span class="p">(</span><span class="n">source_file</span><span class="p">)</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ParseExceptions</span></code> will report the location where an expected token or expression failed to match. For example, if we tried to use our “Hello, World!” parser to parse “Hello World!” (leaving out the separating comma), we would get an exception, with the message:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pyparsing</span><span class="o">.</span><span class="n">ParseException</span><span class="p">:</span> <span class="n">Expected</span> <span class="s2">","</span> <span class="p">(</span><span class="mi">6</span><span class="p">),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span> </pre></div> </div> <p>In the case of complex expressions, the reported location may not be exactly where you would expect. See more information under <a class="reference internal" href="#parseexception">ParseException</a> .</p> </li> <li><p class="first">Use the <code class="docutils literal notranslate"><span class="pre">Group</span></code> class to enclose logical groups of tokens within a sublist. 
This will help organize your results into more hierarchical form (the default behavior is to return matching tokens as a flat list of matching input strings).</p> </li> <li><p class="first">Punctuation may be significant for matching, but is rarely of much interest in the parsed results. Use the <code class="docutils literal notranslate"><span class="pre">suppress()</span></code> method to keep these tokens from cluttering up your returned lists of tokens. For example, <code class="docutils literal notranslate"><span class="pre">delimited_list()</span></code> matches a succession of one or more expressions, separated by delimiters (commas by default), but only returns a list of the actual expressions - the delimiters are used for parsing, but are suppressed from the returned output.</p> </li> <li><p class="first">Parse actions can be used to convert values from strings to other data types (ints, floats, booleans, etc.).</p> </li> <li><p class="first">Results names are recommended for retrieving tokens from complex expressions. It is much easier to access a token using its field name than using a positional index, especially if the expression contains optional elements. 
You can also shortcut the <code class="docutils literal notranslate"><span class="pre">set_results_name</span></code> call:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">stats</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"AVE:"</span> <span class="o">+</span> <span class="n">real_num</span><span class="o">.</span><span class="n">set_results_name</span><span class="p">(</span><span class="s2">"average"</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"MIN:"</span> <span class="o">+</span> <span class="n">real_num</span><span class="o">.</span><span class="n">set_results_name</span><span class="p">(</span><span class="s2">"min"</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"MAX:"</span> <span class="o">+</span> <span class="n">real_num</span><span class="o">.</span><span class="n">set_results_name</span><span class="p">(</span><span class="s2">"max"</span><span class="p">))</span> </pre></div> </div> <p>can more simply and cleanly be written as this:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">stats</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"AVE:"</span> <span class="o">+</span> <span class="n">real_num</span><span class="p">(</span><span class="s2">"average"</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"MIN:"</span> <span class="o">+</span> <span class="n">real_num</span><span class="p">(</span><span class="s2">"min"</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"MAX:"</span> <span class="o">+</span> <span class="n">real_num</span><span class="p">(</span><span class="s2">"max"</span><span class="p">))</span> </pre></div> </div> </li> <li><p class="first">Be careful when defining parse actions that modify global variables or data structures (as in <a class="reference external" 
href="https://github.com/pyparsing/pyparsing/blob/master/examples/fourFn.py">fourFn.py</a>), especially for low level tokens or expressions that may occur within an <a class="reference internal" href="#and">And</a> expression; an early element of an <a class="reference internal" href="#and">And</a> may match, but the overall expression may fail.</p> </li> </ul> </div> </div> <div class="section" id="classes"> <h2><a class="toc-backref" href="#id5">1.2 Classes</a><a class="headerlink" href="#classes" title="Permalink to this headline">¶</a></h2> <div class="section" id="classes-in-the-pyparsing-module"> <h3><a class="toc-backref" href="#id6">1.2.1 Classes in the pyparsing module</a><a class="headerlink" href="#classes-in-the-pyparsing-module" title="Permalink to this headline">¶</a></h3> <p><code class="docutils literal notranslate"><span class="pre">ParserElement</span></code> - abstract base class for all pyparsing classes; methods for code to use are:</p> <ul> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">parse_string(source_string,</span> <span class="pre">parse_all=False)</span></code> - only called once, on the overall matching pattern; returns a <a class="reference internal" href="#parseresults">ParseResults</a> object that makes the matched tokens available as a list, and optionally as a dictionary, or as an object with named attributes; if <code class="docutils literal notranslate"><span class="pre">parse_all</span></code> is set to True, then <code class="docutils literal notranslate"><span class="pre">parse_string</span></code> will raise a <a class="reference internal" href="#parseexception">ParseException</a> if the grammar does not process the complete input string.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">parse_file(source_file)</span></code> - a convenience function, that accepts an input file object or filename. 
The file contents are passed as a string to <code class="docutils literal notranslate"><span class="pre">parse_string()</span></code>. <code class="docutils literal notranslate"><span class="pre">parse_file</span></code> also supports the <code class="docutils literal notranslate"><span class="pre">parse_all</span></code> argument.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">scan_string(source_string)</span></code> - generator function, used to find and extract matching text in the given source string; for each matched text, returns a tuple of:</p> <ul class="simple"> <li>matched tokens (packaged as a <a class="reference internal" href="#parseresults">ParseResults</a> object)</li> <li>start location of the matched text in the given source string</li> <li>end location in the given source string</li> </ul> <p><code class="docutils literal notranslate"><span class="pre">scan_string</span></code> allows you to scan through the input source string for random matches, instead of exhaustively defining the grammar for the entire source text (as would be required with <code class="docutils literal notranslate"><span class="pre">parse_string</span></code>).</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">transform_string(source_string)</span></code> - convenience wrapper function for <code class="docutils literal notranslate"><span class="pre">scan_string</span></code>, to process the input source string, and replace matching text with the tokens returned from parse actions defined in the grammar (see <a class="reference internal" href="#set-parse-action">set_parse_action</a>).</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">search_string(source_string)</span></code> - another convenience wrapper function for <code class="docutils literal notranslate"><span class="pre">scan_string</span></code>, returns a list of the matching tokens returned 
from each call to <code class="docutils literal notranslate"><span class="pre">scan_string</span></code>.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_name(name)</span></code> - associate a short descriptive name for this element, useful in displaying exceptions and trace information</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">run_tests(tests_string)</span></code> - useful development and testing method on expressions, to pass a multiline string of sample strings to test against the expression. Comment lines (beginning with <code class="docutils literal notranslate"><span class="pre">#</span></code>) can be inserted and they will be included in the test output:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">digits</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span><span class="o">.</span><span class="n">set_name</span><span class="p">(</span><span class="s2">"numeric digits"</span><span class="p">)</span> <span class="n">real_num</span> <span class="o">=</span> <span class="n">Combine</span><span class="p">(</span><span class="n">digits</span> <span class="o">+</span> <span class="s1">'.'</span> <span class="o">+</span> <span class="n">digits</span><span class="p">)</span> <span class="n">real_num</span><span class="o">.</span><span class="n">run_tests</span><span class="p">(</span><span class="s2">"""</span><span class="se">\</span> <span class="s2"> # valid number</span> <span class="s2"> 3.14159</span> <span class="s2"> # no integer part</span> <span class="s2"> .00001</span> <span class="s2"> # no decimal</span> <span class="s2"> 101</span> <span class="s2"> # no decimal value</span> <span class="s2"> 101.</span> <span class="s2"> """</span><span class="p">)</span> </pre></div> </div> <p>will print:</p> <div 
class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># valid number</span> <span class="mf">3.14159</span> <span class="p">[</span><span class="s1">'3.14159'</span><span class="p">]</span> <span class="c1"># no integer part</span> <span class="o">.</span><span class="mi">00001</span> <span class="o">^</span> <span class="n">FAIL</span><span class="p">:</span> <span class="n">Expected</span> <span class="n">numeric</span> <span class="n">digits</span><span class="p">,</span> <span class="n">found</span> <span class="s1">'.'</span> <span class="p">(</span><span class="n">at</span> <span class="n">char</span> <span class="mi">0</span><span class="p">),</span> <span class="p">(</span><span class="n">line</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="n">col</span><span class="p">:</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># no decimal</span> <span class="mi">101</span> <span class="o">^</span> <span class="n">FAIL</span><span class="p">:</span> <span class="n">Expected</span> <span class="s2">"."</span><span class="p">,</span> <span class="n">found</span> <span class="n">end</span> <span class="n">of</span> <span class="n">text</span> <span class="p">(</span><span class="n">at</span> <span class="n">char</span> <span class="mi">3</span><span class="p">),</span> <span class="p">(</span><span class="n">line</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="n">col</span><span class="p">:</span><span class="mi">4</span><span class="p">)</span> <span class="c1"># no decimal value</span> <span class="mf">101.</span> <span class="o">^</span> <span class="n">FAIL</span><span class="p">:</span> <span class="n">Expected</span> <span class="n">numeric</span> <span class="n">digits</span><span class="p">,</span> <span class="n">found</span> <span class="n">end</span> <span class="n">of</span> <span class="n">text</span> 
<span class="p">(</span><span class="n">at</span> <span class="n">char</span> <span class="mi">4</span><span class="p">),</span> <span class="p">(</span><span class="n">line</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="n">col</span><span class="p">:</span><span class="mi">5</span><span class="p">)</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_results_name(string,</span> <span class="pre">list_all_matches=False)</span></code> - name to be given to tokens matching the element; if multiple tokens match within a repetition group (such as <code class="docutils literal notranslate"><span class="pre">ZeroOrMore</span></code> or <code class="docutils literal notranslate"><span class="pre">delimited_list</span></code>), the default is to return only the last matching token; if <code class="docutils literal notranslate"><span class="pre">list_all_matches</span></code> is set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, then a list of all the matching tokens is returned.</p> <p><code class="docutils literal notranslate"><span class="pre">expr.set_results_name("key")</span></code> can also be written <code class="docutils literal notranslate"><span class="pre">expr("key")</span></code> (a results name with a trailing ‘*’ character will be interpreted as setting <code class="docutils literal notranslate"><span class="pre">list_all_matches</span></code> to <code class="docutils literal notranslate"><span class="pre">True</span></code>).</p> <p>Note: <code class="docutils literal notranslate"><span class="pre">set_results_name</span></code> returns a <em>copy</em> of the element so that a single basic element can be referenced multiple times and given different names within a complex grammar.</p> </li> </ul> <ul id="set-parse-action"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_parse_action(*fn)</span></code> - specify one or more functions to call after 
successful matching of the element; each function is defined as <code class="docutils literal notranslate"><span class="pre">fn(s,</span> <span class="pre">loc,</span> <span class="pre">toks)</span></code>, where:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">s</span></code> is the original parse string</li> <li><code class="docutils literal notranslate"><span class="pre">loc</span></code> is the location in the string where matching started</li> <li><code class="docutils literal notranslate"><span class="pre">toks</span></code> is the list of the matched tokens, packaged as a <a class="reference internal" href="#parseresults">ParseResults</a> object</li> </ul> <p>Parse actions can have any of the following signatures:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">fn</span><span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">loc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">tokens</span><span class="p">:</span> <span class="n">ParseResults</span><span class="p">)</span> <span class="n">fn</span><span class="p">(</span><span class="n">loc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">tokens</span><span class="p">:</span> <span class="n">ParseResults</span><span class="p">)</span> <span class="n">fn</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="n">ParseResults</span><span class="p">)</span> <span class="n">fn</span><span class="p">()</span> </pre></div> </div> <p>Multiple functions can be attached to a <code class="docutils literal notranslate"><span class="pre">ParserElement</span></code> by specifying multiple arguments to <code class="docutils literal notranslate"><span class="pre">set_parse_action</span></code>, or by 
calling <code class="docutils literal notranslate"><span class="pre">add_parse_action</span></code>. Calls to <code class="docutils literal notranslate"><span class="pre">set_parse_action</span></code> will replace any previously defined parse actions. <code class="docutils literal notranslate"><span class="pre">set_parse_action(None)</span></code> will clear all previously defined parse actions.</p> <p>Each parse action function can return a modified <code class="docutils literal notranslate"><span class="pre">toks</span></code> list, to perform conversions or string modifications. For brevity, <code class="docutils literal notranslate"><span class="pre">fn</span></code> may also be a lambda; here is an example of using a parse action to convert matched integer tokens from strings to integers:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">int_number</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span><span class="o">.</span><span class="n">set_parse_action</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">,</span> <span class="n">l</span><span class="p">,</span> <span class="n">t</span><span class="p">:</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">t</span><span class="p">[</span><span class="mi">0</span><span class="p">])])</span> </pre></div> </div> <p>If <code class="docutils literal notranslate"><span class="pre">fn</span></code> modifies the <code class="docutils literal notranslate"><span class="pre">toks</span></code> list in-place, it does not need to return anything; pyparsing will use the modified <code class="docutils literal notranslate"><span class="pre">toks</span></code> list.</p> <p>A convenient shortcut for calling <code class="docutils literal notranslate"><span class="pre">set_parse_action</span></code> is to use it as a decorator:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">identifier</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">,</span> <span class="n">alphanums</span><span class="o">+</span><span class="s2">"_"</span><span class="p">)</span>

<span class="nd">@identifier</span><span class="o">.</span><span class="n">set_parse_action</span>
<span class="k">def</span> <span class="nf">resolve_identifier</span><span class="p">(</span><span class="n">results</span><span class="p">:</span> <span class="n">ParseResults</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">variable_values</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">results</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
</pre></div> </div> <p>(Posted by @MisterMiyagi in this SO answer: <a class="reference external" href="https://stackoverflow.com/a/63031959/165216">https://stackoverflow.com/a/63031959/165216</a>)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">add_parse_action</span></code> - similar to <code class="docutils literal notranslate"><span class="pre">set_parse_action</span></code>, but instead of replacing any previously defined parse actions, appends the given action or actions to those already defined.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">add_condition</span></code> - a simplified form of <code class="docutils literal notranslate"><span 
class="pre">add_parse_action</span></code> for cases where the parse action only needs to do some validation, raising an exception if the validation fails. Takes a function that accepts the same arguments as a parse action, but simply returns <code class="docutils literal notranslate"><span class="pre">True</span></code> or <code class="docutils literal notranslate"><span class="pre">False</span></code>. If <code class="docutils literal notranslate"><span class="pre">False</span></code> is returned, an exception will be raised.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_break(break_flag=True)</span></code> - if <code class="docutils literal notranslate"><span class="pre">break_flag</span></code> is <code class="docutils literal notranslate"><span class="pre">True</span></code>, calls <code class="docutils literal notranslate"><span class="pre">pdb.set_trace()</span></code> as this expression is about to be parsed</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">copy()</span></code> - returns a copy of a <code class="docutils literal notranslate"><span class="pre">ParserElement</span></code>; can be used to reuse the same parse expression in different places in a grammar, with different parse actions attached to each; the short form <code class="docutils literal notranslate"><span class="pre">expr()</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">expr.copy()</span></code></p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">leave_whitespace()</span></code> - change default behavior of skipping whitespace before starting matching (mostly used internally to the pyparsing module, rarely used by client code)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_whitespace_chars(chars)</span></code> - define the set of chars to be ignored as whitespace before trying to match a 
specific <code class="docutils literal notranslate"><span class="pre">ParserElement</span></code>, in place of the default set of whitespace (space, tab, newline, and return)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_default_whitespace_chars(chars)</span></code> - class-level method to override the default set of whitespace chars for all subsequently created ParserElements (including copies); useful when defining grammars that treat one or more of the default whitespace characters as significant (such as a line-sensitive grammar, to omit newline from the list of ignorable whitespace)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">suppress()</span></code> - convenience function to suppress the output of the given element, instead of wrapping it with a <code class="docutils literal notranslate"><span class="pre">Suppress</span></code> object.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ignore(expr)</span></code> - function to specify parse expression to be ignored while matching defined patterns; can be called repeatedly to specify multiple expressions; useful to specify patterns of comment syntax, for example</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">set_debug(debug_flag=True)</span></code> - function to enable/disable tracing output when trying to match this element</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">validate()</span></code> - function to verify that the defined grammar does not contain infinitely recursive constructs</p> </li> </ul> <ul class="simple" id="parse-with-tabs"> <li><code class="docutils literal notranslate"><span class="pre">parse_with_tabs()</span></code> - function to override default behavior of converting tabs to spaces before parsing the input string; rarely used, except when specifying 
whitespace-significant grammars using the <a class="reference internal" href="#white">White</a> class.</li> <li><code class="docutils literal notranslate"><span class="pre">enable_packrat()</span></code> - a class-level static method to enable a memoizing performance enhancement, known as “packrat parsing”. Packrat parsing is disabled by default, since it may conflict with some user programs that use parse actions. To activate the packrat feature, your program must call the class method <code class="docutils literal notranslate"><span class="pre">ParserElement.enable_packrat()</span></code>. For best results, call <code class="docutils literal notranslate"><span class="pre">enable_packrat()</span></code> immediately after importing pyparsing.</li> <li><code class="docutils literal notranslate"><span class="pre">enable_left_recursion()</span></code> - a class-level static method that enables pyparsing to handle left-recursive (LR) grammars. As with <code class="docutils literal notranslate"><span class="pre">ParserElement.enable_packrat()</span></code>, your program must call the class method <code class="docutils literal notranslate"><span class="pre">ParserElement.enable_left_recursion()</span></code> to enable this feature. 
<code class="docutils literal notranslate"><span class="pre">enable_left_recursion()</span></code> uses a separate packrat cache, and so is incompatible with <code class="docutils literal notranslate"><span class="pre">enable_packrat()</span></code>.</li> </ul> </div> <div class="section" id="basic-parserelement-subclasses"> <h3><a class="toc-backref" href="#id7">1.2.2 Basic ParserElement subclasses</a><a class="headerlink" href="#basic-parserelement-subclasses" title="Permalink to this headline">¶</a></h3> <ul class="simple" id="literal"> <li><code class="docutils literal notranslate"><span class="pre">Literal</span></code> - construct with a string to be matched exactly</li> </ul> <ul class="simple" id="caselessliteral"> <li><code class="docutils literal notranslate"><span class="pre">CaselessLiteral</span></code> - construct with a string to be matched, but without case checking; results are always returned as the defining literal, NOT as they are found in the input string</li> </ul> <ul class="simple" id="keyword"> <li><code class="docutils literal notranslate"><span class="pre">Keyword</span></code> - similar to <a class="reference internal" href="#literal">Literal</a>, but must be immediately followed by whitespace, punctuation, or other non-keyword characters; prevents accidental matching of a non-keyword that happens to begin with a defined keyword</li> <li><code class="docutils literal notranslate"><span class="pre">CaselessKeyword</span></code> - similar to <a class="reference internal" href="#keyword">Keyword</a>, but with caseless matching behavior</li> </ul> <ul id="word"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">Word</span></code> - one or more contiguous characters; construct with a string containing the set of allowed initial characters, and an optional second string of allowed body characters; for instance, a common <code class="docutils literal notranslate"><span class="pre">Word</span></code> construct is 
to match a code identifier - in C, a valid identifier must start with an alphabetic character or an underscore (‘_’), followed by a body that can also include numeric digits. That is, <code class="docutils literal notranslate"><span class="pre">a</span></code>, <code class="docutils literal notranslate"><span class="pre">i</span></code>, <code class="docutils literal notranslate"><span class="pre">MAX_LENGTH</span></code>, <code class="docutils literal notranslate"><span class="pre">_a1</span></code>, <code class="docutils literal notranslate"><span class="pre">b_109_</span></code>, and <code class="docutils literal notranslate"><span class="pre">plan9FromOuterSpace</span></code> are all valid identifiers; <code class="docutils literal notranslate"><span class="pre">9b7z</span></code>, <code class="docutils literal notranslate"><span class="pre">$a</span></code>, <code class="docutils literal notranslate"><span class="pre">.section</span></code>, and <code class="docutils literal notranslate"><span class="pre">0debug</span></code> are not. 
To define an identifier using a <code class="docutils literal notranslate"><span class="pre">Word</span></code>, use either of the following:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="o">+</span><span class="s2">"_"</span><span class="p">,</span> <span class="n">alphanums</span><span class="o">+</span><span class="s2">"_"</span><span class="p">)</span>
<span class="n">Word</span><span class="p">(</span><span class="n">srange</span><span class="p">(</span><span class="s2">"[a-zA-Z_]"</span><span class="p">),</span> <span class="n">srange</span><span class="p">(</span><span class="s2">"[a-zA-Z0-9_]"</span><span class="p">))</span>
</pre></div> </div> <p>Pyparsing also provides pre-defined strings <code class="docutils literal notranslate"><span class="pre">identchars</span></code> and <code class="docutils literal notranslate"><span class="pre">identbodychars</span></code> so that you can also write:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Word</span><span class="p">(</span><span class="n">identchars</span><span class="p">,</span> <span class="n">identbodychars</span><span class="p">)</span> </pre></div> </div> <p>If only one string is given, the character set defined for the initial character is also used for the word body; for instance, to define an identifier that can only be composed of capital letters and underscores, use one of:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Word</span><span class="p">(</span><span class="s2">"ABCDEFGHIJKLMNOPQRSTUVWXYZ_"</span><span class="p">)</span>
<span class="n">Word</span><span class="p">(</span><span class="n">srange</span><span class="p">(</span><span class="s2">"[A-Z_]"</span><span class="p">))</span>
</pre></div> </div> <p>A <code class="docutils literal notranslate"><span class="pre">Word</span></code> may also be constructed with any of the following optional parameters:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span 
class="pre">min</span></code> - indicating a minimum length of matching characters</li> <li><code class="docutils literal notranslate"><span class="pre">max</span></code> - indicating a maximum length of matching characters</li> <li><code class="docutils literal notranslate"><span class="pre">exact</span></code> - indicating an exact length of matching characters</li> </ul> <p>If <code class="docutils literal notranslate"><span class="pre">exact</span></code> is specified, it will override any values for <code class="docutils literal notranslate"><span class="pre">min</span></code> or <code class="docutils literal notranslate"><span class="pre">max</span></code>.</p> <p>Sometimes you want to define a word using all characters in a range except for one or two of them; you can do this with the <code class="docutils literal notranslate"><span class="pre">exclude_chars</span></code> argument. This is helpful if you want to define a word with all <code class="docutils literal notranslate"><span class="pre">printables</span></code> except for a single delimiter character, such as ‘.’. Rather than building a custom string to pass to <code class="docutils literal notranslate"><span class="pre">Word</span></code>, 
you can just write <code class="docutils literal notranslate"><span class="pre">Word(printables,</span> <span class="pre">exclude_chars='.')</span></code>.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">Char</span></code> - a convenience form of <code class="docutils literal notranslate"><span class="pre">Word</span></code> that will match just a single character from a string of matching characters:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">single_digit</span> <span class="o">=</span> <span class="n">Char</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">CharsNotIn</span></code> - similar to <a class="reference internal" href="#word">Word</a>, but matches characters not in the given constructor string (accepts only one string for both initial and body characters); also supports <code class="docutils literal notranslate"><span class="pre">min</span></code>, <code class="docutils literal notranslate"><span class="pre">max</span></code>, and <code class="docutils literal notranslate"><span class="pre">exact</span></code> optional parameters.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">Regex</span></code> - a powerful construct that accepts a regular expression to be matched at the current parse position; accepts an optional <code class="docutils literal notranslate"><span class="pre">flags</span></code> parameter, corresponding to the flags parameter in the <code class="docutils literal notranslate"><span class="pre">re.compile</span></code> method; if the expression includes named sub-fields, they will be represented in the returned <a class="reference internal" href="#parseresults">ParseResults</a>.</p> </li> <li><p class="first"><code class="docutils 
literal notranslate"><span class="pre">QuotedString</span></code> - supports the definition of custom quoted string formats, in addition to pyparsing’s built-in <code class="docutils literal notranslate"><span class="pre">dbl_quoted_string</span></code> and <code class="docutils literal notranslate"><span class="pre">sgl_quoted_string</span></code>. <code class="docutils literal notranslate"><span class="pre">QuotedString</span></code> allows you to specify the following parameters:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">quote_char</span></code> - string of one or more characters defining the quote delimiting string</li> <li><code class="docutils literal notranslate"><span class="pre">esc_char</span></code> - character to escape quotes, typically backslash (default=None)</li> <li><code class="docutils literal notranslate"><span class="pre">esc_quote</span></code> - special quote sequence to escape an embedded quote string (such as SQL’s <code class="docutils literal notranslate"><span class="pre">""</span></code> to escape an embedded <code class="docutils literal notranslate"><span class="pre">"</span></code>) (default=None)</li> <li><code class="docutils literal notranslate"><span class="pre">multiline</span></code> - boolean indicating whether quotes can span multiple lines (default=False)</li> <li><code class="docutils literal notranslate"><span class="pre">unquote_results</span></code> - boolean indicating whether the matched text should be unquoted (default=True)</li> <li><code class="docutils literal notranslate"><span class="pre">end_quote_char</span></code> - string of one or more characters defining the end of the quote delimited string (default=None => same as <code class="docutils literal notranslate"><span class="pre">quote_char</span></code>)</li> </ul> </li> </ul> <ul id="skipto"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">SkipTo</span></code> - skips ahead in the input string, accepting any characters up to the specified pattern; may be constructed with the following optional parameters:</p> <ul 
class="simple"> <li><code class="docutils literal notranslate"><span class="pre">include</span></code> - if set to true, also consumes the match expression (default is false)</li> <li><code class="docutils literal notranslate"><span class="pre">ignore</span></code> - allows the user to specify patterns to not be matched, to prevent false matches</li> <li><code class="docutils literal notranslate"><span class="pre">fail_on</span></code> - if a literal string or expression is given for this argument, it defines an expression that should cause the <a class="reference internal" href="#skipto">SkipTo</a> expression to fail, and not skip over that expression</li> </ul> <p><code class="docutils literal notranslate"><span class="pre">SkipTo</span></code> can also be written using <code class="docutils literal notranslate"><span class="pre">...</span></code>:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">LBRACE</span><span class="p">,</span> <span class="n">RBRACE</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="n">Literal</span><span class="p">,</span> <span class="s2">"</span><span class="si">{}</span><span class="s2">"</span><span class="p">)</span> <span class="n">brace_expr</span> <span class="o">=</span> <span class="n">LBRACE</span> <span class="o">+</span> <span class="n">SkipTo</span><span class="p">(</span><span class="n">RBRACE</span><span class="p">)</span> <span class="o">+</span> <span class="n">RBRACE</span> <span class="c1"># can also be written as</span> <span class="n">brace_expr</span> <span class="o">=</span> <span class="n">LBRACE</span> <span class="o">+</span> <span class="o">...</span> <span class="o">+</span> <span class="n">RBRACE</span> </pre></div> </div> </li> </ul> <ul class="simple" id="white"> <li><code class="docutils literal notranslate"><span class="pre">White</span></code> - also similar to <a class="reference internal" 
href="#word">Word</a>, but matches whitespace characters. Not usually needed, as whitespace is implicitly ignored by pyparsing. However, some grammars are whitespace-sensitive, such as those that use leading tabs or spaces to indicate grouping or hierarchy. (If matching on tab characters, be sure to call <a class="reference internal" href="#parse-with-tabs">parse_with_tabs</a> on the top-level parse element.)</li> <li><code class="docutils literal notranslate"><span class="pre">Empty</span></code> - a null expression, requiring no characters - will always match; useful for debugging and for specialized grammars</li> <li><code class="docutils literal notranslate"><span class="pre">NoMatch</span></code> - opposite of <code class="docutils literal notranslate"><span class="pre">Empty</span></code>, will never match; useful for debugging and for specialized grammars</li> </ul> </div> <div class="section" id="expression-subclasses"> <h3><a class="toc-backref" href="#id8">1.2.3 Expression subclasses</a><a class="headerlink" href="#expression-subclasses" title="Permalink to this headline">¶</a></h3> <ul id="and"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">And</span></code> - construct with a list of <code class="docutils literal notranslate"><span class="pre">ParserElements</span></code>, all of which must match for <code class="docutils literal notranslate"><span class="pre">And</span></code> to match; can also be created using the ‘+’ operator; multiple expressions can be <code class="docutils literal notranslate"><span class="pre">Anded</span></code> together using the ‘*’ operator as in:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ip_address</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="s1">'.'</span> <span 
class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">))</span> <span class="o">*</span> <span class="mi">3</span> </pre></div> </div> <p>A tuple can be used as the multiplier, indicating a min/max:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">us_phone_number</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="s1">'-'</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">))</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> </pre></div> </div> <p>A special form of <code class="docutils literal notranslate"><span class="pre">And</span></code> is created if the ‘-’ operator is used instead of the ‘+’ operator. In the <code class="docutils literal notranslate"><span class="pre">ip_address</span></code> example above, if no trailing ‘.’ and <code class="docutils literal notranslate"><span class="pre">Word(nums)</span></code> are found after matching the initial <code class="docutils literal notranslate"><span class="pre">Word(nums)</span></code>, then pyparsing will back up in the grammar and try other alternatives to <code class="docutils literal notranslate"><span class="pre">ip_address</span></code>. 
However, if <code class="docutils literal notranslate"><span class="pre">ip_address</span></code> is defined as:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">strict_ip_address</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="s1">'.'</span><span class="o">+</span><span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">))</span><span class="o">*</span><span class="mi">3</span> </pre></div> </div> <p>then no backing up is done. If the first <code class="docutils literal notranslate"><span class="pre">Word(nums)</span></code> of <code class="docutils literal notranslate"><span class="pre">strict_ip_address</span></code> is matched, then any mismatch after that will raise a <code class="docutils literal notranslate"><span class="pre">ParseSyntaxException</span></code>, which will halt the parsing process immediately. 
By careful use of the ‘-‘ operator, grammars can provide meaningful error messages close to the location where the incoming text does not match the specified grammar.</p> </li> </ul> <ul class="simple" id="or"> <li><code class="docutils literal notranslate"><span class="pre">Or</span></code> - construct with a list of <code class="docutils literal notranslate"><span class="pre">ParserElements</span></code>, any of which must match for <code class="docutils literal notranslate"><span class="pre">Or</span></code> to match; if more than one expression matches, the expression that makes the longest match will be used; can also be created using the ‘^’ operator</li> </ul> <ul class="simple" id="matchfirst"> <li><code class="docutils literal notranslate"><span class="pre">MatchFirst</span></code> - construct with a list of <code class="docutils literal notranslate"><span class="pre">ParserElements</span></code>, any of which must match for <code class="docutils literal notranslate"><span class="pre">MatchFirst</span></code> to match; matching is done left-to-right, taking the first expression that matches; can also be created using the ‘|’ operator</li> </ul> <ul id="each"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">Each</span></code> - similar to <a class="reference internal" href="#and">And</a>, in that all of the provided expressions must match; however, <code class="docutils literal notranslate"><span class="pre">Each</span></code> permits matching to be done in any order; can also be created using the ‘&’ operator</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">Opt</span></code> - construct with a <code class="docutils literal notranslate"><span class="pre">ParserElement</span></code>, but this element is not required to match; can be constructed with an optional <code class="docutils literal notranslate"><span class="pre">default</span></code> argument, containing a default string or 
object to be supplied if the given optional parse element is not found in the input string; parse action will only be called if a match is found, or if a default is specified.</p> <p>(<code class="docutils literal notranslate"><span class="pre">Opt</span></code> was formerly named <code class="docutils literal notranslate"><span class="pre">Optional</span></code>, but since the standard Python library module <code class="docutils literal notranslate"><span class="pre">typing</span></code> now defines <code class="docutils literal notranslate"><span class="pre">Optional</span></code>, the pyparsing class has been renamed to <code class="docutils literal notranslate"><span class="pre">Opt</span></code>. A compatibility synonym <code class="docutils literal notranslate"><span class="pre">Optional</span></code> is defined, but will be removed in a future release.)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ZeroOrMore</span></code> - similar to <code class="docutils literal notranslate"><span class="pre">Opt</span></code>, but can be repeated; <code class="docutils literal notranslate"><span class="pre">ZeroOrMore(expr)</span></code> can also be written as <code class="docutils literal notranslate"><span class="pre">expr[...]</span></code>.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">OneOrMore</span></code> - similar to <code class="docutils literal notranslate"><span class="pre">ZeroOrMore</span></code>, but at least one match must be present; <code class="docutils literal notranslate"><span class="pre">OneOrMore(expr)</span></code> can also be written as <code class="docutils literal notranslate"><span class="pre">expr[1,</span> <span class="pre">...]</span></code>.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">FollowedBy</span></code> - a lookahead expression, requires matching of the given expressions, but does not advance 
the parsing position within the input string</p> </li> </ul> <ul class="simple" id="notany"> <li><code class="docutils literal notranslate"><span class="pre">NotAny</span></code> - a negative lookahead expression, prevents matching of named expressions, does not advance the parsing position within the input string; can also be created using the unary ‘~’ operator</li> </ul> </div> <div class="section" id="expression-operators"> <span id="operators"></span><h3><a class="toc-backref" href="#id9">1.2.4 Expression operators</a><a class="headerlink" href="#expression-operators" title="Permalink to this headline">¶</a></h3> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">+</span></code> - creates <a class="reference internal" href="#and">And</a> using the expressions before and after the operator</li> <li><code class="docutils literal notranslate"><span class="pre">|</span></code> - creates <a class="reference internal" href="#matchfirst">MatchFirst</a> (first left-to-right match) using the expressions before and after the operator</li> <li><code class="docutils literal notranslate"><span class="pre">^</span></code> - creates <a class="reference internal" href="#or">Or</a> (longest match) using the expressions before and after the operator</li> <li><code class="docutils literal notranslate"><span class="pre">&</span></code> - creates <a class="reference internal" href="#each">Each</a> using the expressions before and after the operator</li> <li><code class="docutils literal notranslate"><span class="pre">*</span></code> - creates <a class="reference internal" href="#and">And</a> by multiplying the expression by the integer operand; if expression is multiplied by a 2-tuple, creates an <a class="reference internal" href="#and">And</a> of <code class="docutils literal notranslate"><span class="pre">(min,max)</span></code> expressions (similar to <code class="docutils literal notranslate"><span class="pre">{min,max}</span></code> form in 
regular expressions); if <code class="docutils literal notranslate"><span class="pre">min</span></code> is <code class="docutils literal notranslate"><span class="pre">None</span></code>, interpret as <code class="docutils literal notranslate"><span class="pre">(0,max)</span></code>; if <code class="docutils literal notranslate"><span class="pre">max</span></code> is <code class="docutils literal notranslate"><span class="pre">None</span></code>, interpret as <code class="docutils literal notranslate"><span class="pre">expr*min</span> <span class="pre">+</span> <span class="pre">ZeroOrMore(expr)</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">-</span></code> - like <code class="docutils literal notranslate"><span class="pre">+</span></code> but with no backup and retry of alternatives</li> <li><code class="docutils literal notranslate"><span class="pre">~</span></code> - creates <a class="reference internal" href="#notany">NotAny</a> using the expression after the operator</li> <li><code class="docutils literal notranslate"><span class="pre">==</span></code> - matching expression to string; returns <code class="docutils literal notranslate"><span class="pre">True</span></code> if the string matches the given expression</li> <li><code class="docutils literal notranslate"><span class="pre"><<=</span></code> - inserts the expression following the operator as the body of the <code class="docutils literal notranslate"><span class="pre">Forward</span></code> expression before the operator (<code class="docutils literal notranslate"><span class="pre"><<</span></code> can also be used, but <code class="docutils literal notranslate"><span class="pre"><<=</span></code> is preferred to avoid operator precedence misinterpretation of the pyparsing expression)</li> <li><code class="docutils literal notranslate"><span class="pre">...</span></code> - inserts a <a class="reference internal" href="#skipto">SkipTo</a> expression leading to the next 
expression, as in <code class="docutils literal notranslate"><span class="pre">Keyword("start")</span> <span class="pre">+</span> <span class="pre">...</span> <span class="pre">+</span> <span class="pre">Keyword("end")</span></code>.</li> <li><code class="docutils literal notranslate"><span class="pre">[min,</span> <span class="pre">max]</span></code> - specifies repetition similar to <code class="docutils literal notranslate"><span class="pre">*</span></code> with <code class="docutils literal notranslate"><span class="pre">min</span></code> and <code class="docutils literal notranslate"><span class="pre">max</span></code> specified as the minimum and maximum number of repetitions. <code class="docutils literal notranslate"><span class="pre">...</span></code> can be used in place of <code class="docutils literal notranslate"><span class="pre">None</span></code>. For example <code class="docutils literal notranslate"><span class="pre">expr[...]</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">ZeroOrMore(expr)</span></code>, <code class="docutils literal notranslate"><span class="pre">expr[1,</span> <span class="pre">...]</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">OneOrMore(expr)</span></code>, and <code class="docutils literal notranslate"><span class="pre">expr[...,</span> <span class="pre">3]</span></code> is equivalent to “up to 3 instances of <code class="docutils literal notranslate"><span class="pre">expr</span></code>”.</li> </ul> </div> <div class="section" id="positional-subclasses"> <h3><a class="toc-backref" href="#id10">1.2.5 Positional subclasses</a><a class="headerlink" href="#positional-subclasses" title="Permalink to this headline">¶</a></h3> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">StringStart</span></code> - matches beginning of the text</li> <li><code class="docutils literal notranslate"><span 
class="pre">StringEnd</span></code> - matches the end of the text</li> <li><code class="docutils literal notranslate"><span class="pre">LineStart</span></code> - matches beginning of a line (lines delimited by <code class="docutils literal notranslate"><span class="pre">\n</span></code> characters)</li> <li><code class="docutils literal notranslate"><span class="pre">LineEnd</span></code> - matches the end of a line</li> <li><code class="docutils literal notranslate"><span class="pre">WordStart</span></code> - matches a leading word boundary</li> <li><code class="docutils literal notranslate"><span class="pre">WordEnd</span></code> - matches a trailing word boundary</li> </ul> </div> <div class="section" id="converter-subclasses"> <h3><a class="toc-backref" href="#id11">1.2.6 Converter subclasses</a><a class="headerlink" href="#converter-subclasses" title="Permalink to this headline">¶</a></h3> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">Combine</span></code> - joins all matched tokens into a single string, using specified <code class="docutils literal notranslate"><span class="pre">join_string</span></code> (default <code class="docutils literal notranslate"><span class="pre">join_string=""</span></code>); expects all matching tokens to be adjacent, with no intervening whitespace (can be overridden by specifying <code class="docutils literal notranslate"><span class="pre">adjacent=False</span></code> in constructor)</li> <li><code class="docutils literal notranslate"><span class="pre">Suppress</span></code> - clears matched tokens; useful to keep returned results from being cluttered with required but uninteresting tokens (such as list delimiters)</li> </ul> </div> <div class="section" id="special-subclasses"> <h3><a class="toc-backref" href="#id12">1.2.7 Special subclasses</a><a class="headerlink" href="#special-subclasses" title="Permalink to this headline">¶</a></h3> <ul class="simple"> <li><code class="docutils literal 
notranslate"><span class="pre">Group</span></code> - causes the matched tokens to be enclosed in a list; useful in repeated elements like <code class="docutils literal notranslate"><span class="pre">ZeroOrMore</span></code> and <code class="docutils literal notranslate"><span class="pre">OneOrMore</span></code> to break up matched tokens into groups for each repeated pattern</li> <li><code class="docutils literal notranslate"><span class="pre">Dict</span></code> - like <code class="docutils literal notranslate"><span class="pre">Group</span></code>, but also constructs a dictionary, using the <code class="docutils literal notranslate"><span class="pre">[0]</span></code>’th elements of all enclosed token lists as the keys, and each token list as the value</li> <li><code class="docutils literal notranslate"><span class="pre">Forward</span></code> - placeholder token used to define recursive token patterns; when defining the actual expression later in the program, insert it into the <code class="docutils literal notranslate"><span class="pre">Forward</span></code> object using the <code class="docutils literal notranslate"><span class="pre"><<=</span></code> operator (see <a class="reference external" href="https://github.com/pyparsing/pyparsing/blob/master/examples/fourFn.py">fourFn.py</a> for an example).</li> </ul> </div> <div class="section" id="other-classes"> <h3><a class="toc-backref" href="#id13">1.2.8 Other classes</a><a class="headerlink" href="#other-classes" title="Permalink to this headline">¶</a></h3> <ul id="parseresults"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ParseResults</span></code> - class used to contain and manage the lists of tokens created from parsing the input using the user-defined parse expression. 
<code class="docutils literal notranslate"><span class="pre">ParseResults</span></code> can be accessed in a number of ways:</p> <ul> <li><p class="first">as a list</p> <ul> <li><p class="first">the total number of elements can be found using <code class="docutils literal notranslate"><span class="pre">len()</span></code></p> </li> <li><p class="first">individual elements can be accessed using <code class="docutils literal notranslate"><span class="pre">[0],</span> <span class="pre">[1],</span> <span class="pre">[-1],</span></code> etc., or retrieved using slices</p> </li> <li><p class="first">elements can be deleted using <code class="docutils literal notranslate"><span class="pre">del</span></code></p> </li> <li><p class="first">the last element (<code class="docutils literal notranslate"><span class="pre">[-1]</span></code>) can be extracted and removed in a single operation using <code class="docutils literal notranslate"><span class="pre">pop()</span></code>, or any element can be extracted and removed using <code class="docutils literal notranslate"><span class="pre">pop(n)</span></code></p> </li> <li><p class="first">a nested <a class="reference internal" href="#parseresults">ParseResults</a> can be created by using the pyparsing <code class="docutils literal notranslate"><span class="pre">Group</span></code> class around elements in an expression:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)</span> <span class="o">+</span> <span class="n">Group</span><span class="p">(</span><span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)[</span><span class="o">...</span><span 
class="p">])</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)</span> </pre></div> </div> <p>will parse the string “abc 100 200 300 end” as:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="s1">'abc'</span><span class="p">,</span> <span class="p">[</span><span class="s1">'100'</span><span class="p">,</span> <span class="s1">'200'</span><span class="p">,</span> <span class="s1">'300'</span><span class="p">],</span> <span class="s1">'end'</span><span class="p">]</span> </pre></div> </div> <p>If the <code class="docutils literal notranslate"><span class="pre">Group</span></code> is constructed using <code class="docutils literal notranslate"><span class="pre">aslist=True</span></code>, the resulting tokens will be a Python list instead of a <a class="reference internal" href="#parseresults">ParseResults</a>.</p> </li> </ul> </li> <li><p class="first">as a dictionary</p> <ul class="simple"> <li>if <code class="docutils literal notranslate"><span class="pre">set_results_name()</span></code> is used to name elements within the overall parse expression, then these fields can be referenced as dictionary elements or as attributes</li> <li>the <code class="docutils literal notranslate"><span class="pre">Dict</span></code> class generates dictionary entries using the data of the input text - in addition to <a class="reference internal" href="#parseresults">ParseResults</a> listed as <code class="docutils literal notranslate"><span class="pre">[</span> <span class="pre">[</span> <span class="pre">a1,</span> <span class="pre">b1,</span> <span class="pre">c1,</span> <span class="pre">...],</span> <span class="pre">[</span> <span class="pre">a2,</span> <span class="pre">b2,</span> <span class="pre">c2,</span> <span class="pre">...]</span>  <span class="pre">]</span></code> it also acts as a dictionary with entries defined as 
<code class="docutils literal notranslate"><span class="pre">{</span> <span class="pre">a1</span> <span class="pre">:</span> <span class="pre">[</span> <span class="pre">b1,</span> <span class="pre">c1,</span> <span class="pre">...</span> <span class="pre">]</span> <span class="pre">},</span> <span class="pre">{</span> <span class="pre">a2</span> <span class="pre">:</span> <span class="pre">[</span> <span class="pre">b2,</span> <span class="pre">c2,</span> <span class="pre">...</span> <span class="pre">]</span> <span class="pre">}</span></code>; this is especially useful when processing tabular data where the first column contains a key value for that line of data; when constructed with <code class="docutils literal notranslate"><span class="pre">asdict=True</span></code>, <code class="docutils literal notranslate"><span class="pre">Dict</span></code> will return an actual Python <code class="docutils literal notranslate"><span class="pre">dict</span></code> instead of a <a class="reference internal" href="#parseresults">ParseResults</a>.</li> <li>list elements that are deleted using <code class="docutils literal notranslate"><span class="pre">del</span></code> will still be accessible by their dictionary keys</li> <li>supports <code class="docutils literal notranslate"><span class="pre">get()</span></code>, <code class="docutils literal notranslate"><span class="pre">items()</span></code> and <code class="docutils literal notranslate"><span class="pre">keys()</span></code> methods, similar to a dictionary</li> <li>a keyed item can be extracted and removed using <code class="docutils literal notranslate"><span class="pre">pop(key)</span></code>. 
Here <code class="docutils literal notranslate"><span class="pre">key</span></code> must be non-numeric (such as a string), in order to use dict extraction instead of list extraction.</li> <li>new named elements can be added (in a parse action, for instance), using the same syntax as adding an item to a dict (<code class="docutils literal notranslate"><span class="pre">parse_results["X"]</span> <span class="pre">=</span> <span class="pre">"new</span> <span class="pre">item"</span></code>); named elements can be removed using <code class="docutils literal notranslate"><span class="pre">del</span> <span class="pre">parse_results["X"]</span></code></li> </ul> </li> <li><p class="first">as a nested list</p> <ul class="simple"> <li>results returned from the Group class are encapsulated within their own list structure, so that the tokens can be handled as a hierarchical tree</li> </ul> </li> <li><p class="first">as an object</p> <ul class="simple"> <li>named elements can be accessed as if they were attributes of an object: if an element is referenced that does not exist, it will return <code class="docutils literal notranslate"><span class="pre">""</span></code>.</li> </ul> </li> </ul> <p><a class="reference internal" href="#parseresults">ParseResults</a> can also be converted to an ordinary list of strings by calling <code class="docutils literal notranslate"><span class="pre">as_list()</span></code>. Note that this will strip the results of any field names that have been defined for any embedded parse elements. (The <code class="docutils literal notranslate"><span class="pre">pprint</span></code> module is especially good at printing out the nested contents given by <code class="docutils literal notranslate"><span class="pre">as_list()</span></code>.)</p> <p>Finally, <a class="reference internal" href="#parseresults">ParseResults</a> can be viewed by calling <code class="docutils literal notranslate"><span class="pre">dump()</span></code>. 
<code class="docutils literal notranslate"><span class="pre">dump()</span></code> will first show the <code class="docutils literal notranslate"><span class="pre">as_list()</span></code> output, followed by an indented structure listing parsed tokens that have been assigned results names.</p> <p>Here is sample code illustrating some of these methods:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">number</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">name</span> <span class="o">=</span> <span class="n">Combine</span><span class="p">(</span><span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)[</span><span class="o">...</span><span class="p">],</span> <span class="n">adjacent</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">join_string</span><span class="o">=</span><span class="s2">" "</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">parser</span> <span class="o">=</span> <span class="n">number</span><span class="p">(</span><span class="s2">"house_number"</span><span class="p">)</span> <span class="o">+</span> <span class="n">name</span><span class="p">(</span><span class="s2">"street_name"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">result</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_string</span><span class="p">(</span><span class="s2">"123 Main St"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="go">['123', 'Main St']</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span 
class="nb">type</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>
<span class="go"><class 'pyparsing.ParseResults'></span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>
<span class="go">(['123', 'Main St'], {'house_number': ['123'], 'street_name': ['Main St']})</span>
<span class="gp">>>> </span><span class="n">result</span><span class="o">.</span><span class="n">house_number</span>
<span class="go">'123'</span>
<span class="gp">>>> </span><span class="n">result</span><span class="p">[</span><span class="s2">"street_name"</span><span class="p">]</span>
<span class="go">'Main St'</span>
<span class="gp">>>> </span><span class="n">result</span><span class="o">.</span><span class="n">as_list</span><span class="p">()</span>
<span class="go">['123', 'Main St']</span>
<span class="gp">>>> </span><span class="n">result</span><span class="o">.</span><span class="n">as_dict</span><span class="p">()</span>
<span class="go">{'house_number': '123', 'street_name': 'Main St'}</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">result</span><span class="o">.</span><span class="n">dump</span><span class="p">())</span>
<span class="go">['123', 'Main St']</span>
<span class="go">- house_number: '123'</span>
<span class="go">- street_name: 'Main St'</span>
</pre></div> </div> </li> </ul> </div> <div class="section" id="exception-classes-and-troubleshooting"> <h3><a class="toc-backref" href="#id14">1.2.9 Exception classes and Troubleshooting</a><a class="headerlink" href="#exception-classes-and-troubleshooting" title="Permalink to this headline">¶</a></h3> <ul id="parseexception"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ParseException</span></code> - exception raised when a grammar parse fails; <code class="docutils 
literal notranslate"><span class="pre">ParseExceptions</span></code> have attributes <code class="docutils literal notranslate"><span class="pre">loc</span></code>, <code class="docutils literal notranslate"><span class="pre">msg</span></code>, <code class="docutils literal notranslate"><span class="pre">line</span></code>, <code class="docutils literal notranslate"><span class="pre">lineno</span></code>, and <code class="docutils literal notranslate"><span class="pre">column</span></code>; to view the text line and location where the reported ParseException occurs, use:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">except</span> <span class="n">ParseException</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">err</span><span class="o">.</span><span class="n">line</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">" "</span> <span class="o">*</span> <span class="p">(</span><span class="n">err</span><span class="o">.</span><span class="n">column</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"^"</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
</pre></div> </div> <p><code class="docutils literal notranslate"><span class="pre">ParseExceptions</span></code> also have an <code class="docutils literal notranslate"><span class="pre">explain()</span></code> method that gives this same information:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">except</span> <span class="n">ParseException</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span 
class="n">err</span><span class="o">.</span><span class="n">explain</span><span class="p">())</span>
</pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">RecursiveGrammarException</span></code> - exception raised by <code class="docutils literal notranslate"><span class="pre">validate()</span></code> if the grammar contains a recursive infinite loop, such as:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">bad_grammar</span> <span class="o">=</span> <span class="n">Forward</span><span class="p">()</span>
<span class="n">good_token</span> <span class="o">=</span> <span class="n">Literal</span><span class="p">(</span><span class="s2">"A"</span><span class="p">)</span>
<span class="n">bad_grammar</span> <span class="o"><<=</span> <span class="n">Opt</span><span class="p">(</span><span class="n">good_token</span><span class="p">)</span> <span class="o">+</span> <span class="n">bad_grammar</span>
</pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ParseFatalException</span></code> - exception that parse actions can raise to stop parsing immediately. 
It should be used when a semantic error is found in the input text, such as a mismatched XML tag.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ParseSyntaxException</span></code> - subclass of <code class="docutils literal notranslate"><span class="pre">ParseFatalException</span></code> raised when a syntax error is found, based on the use of the ‘-’ operator when defining a sequence of expressions in an <a class="reference internal" href="#and">And</a> expression.</p> </li> <li><p class="first">You can also gain insight into the parsing logic using diagnostic parse actions and <code class="docutils literal notranslate"><span class="pre">set_debug()</span></code>, or test the matching of expression fragments using <code class="docutils literal notranslate"><span class="pre">search_string()</span></code> or <code class="docutils literal notranslate"><span class="pre">scan_string()</span></code>.</p> </li> <li><p class="first">Use <code class="docutils literal notranslate"><span class="pre">with_line_numbers</span></code> from <code class="docutils literal notranslate"><span class="pre">pyparsing.testing</span></code> to display the input string being parsed, with line and column numbers that correspond to the values reported in <code class="docutils literal notranslate"><span class="pre">set_debug()</span></code> output:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pyparsing</span> <span class="k">as</span> <span class="nn">pp</span>
<span class="n">ppt</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">testing</span>
<span class="n">data</span> <span class="o">=</span> <span class="s2">"""</span><span class="se">\</span>
<span class="s2">   A</span>
<span class="s2">      100"""</span>
<span class="n">expr</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Word</span><span class="p">(</span><span 
class="n">pp</span><span class="o">.</span><span class="n">alphanums</span><span class="p">)</span><span class="o">.</span><span class="n">set_name</span><span class="p">(</span><span class="s2">"word"</span><span class="p">)</span><span class="o">.</span><span class="n">set_debug</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">ppt</span><span class="o">.</span><span class="n">with_line_numbers</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="n">expr</span><span class="p">[</span><span class="o">...</span><span class="p">]</span><span class="o">.</span><span class="n">parse_string</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div> </div> <p>prints:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">.</span>          <span class="mi">1</span>
  <span class="mi">1234567890</span>
<span class="mi">1</span><span class="p">:</span>   <span class="n">A</span><span class="o">|</span>
<span class="mi">2</span><span class="p">:</span>      <span class="mi">100</span><span class="o">|</span>
<span class="n">Match</span> <span class="n">word</span> <span class="n">at</span> <span class="n">loc</span> <span class="mi">3</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span>
     <span class="n">A</span>
     <span class="o">^</span>
<span class="n">Matched</span> <span class="n">word</span> <span class="o">-></span> <span class="p">[</span><span class="s1">'A'</span><span class="p">]</span>
<span class="n">Match</span> <span class="n">word</span> <span class="n">at</span> <span class="n">loc</span> <span class="mi">11</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span>
        <span class="mi">100</span>
        <span class="o">^</span>
<span class="n">Matched</span> <span 
class="n">word</span> <span class="o">-></span> <span class="p">[</span><span class="s1">'100'</span><span class="p">]</span> </pre></div> </div> <p><cite>with_line_numbers</cite> has several options for displaying control characters, end-of-line and space markers, Unicode symbols for control characters - these are documented in the function’s docstring.</p> </li> <li><p class="first">Diagnostics can be enabled using <code class="docutils literal notranslate"><span class="pre">pyparsing.enable_diag</span></code> and passing one of the following enum values defined in <code class="docutils literal notranslate"><span class="pre">pyparsing.Diagnostics</span></code></p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">warn_multiple_tokens_in_named_alternation</span></code> - flag to enable warnings when a results name is defined on a <a class="reference internal" href="#matchfirst">MatchFirst</a> or <a class="reference internal" href="#or">Or</a> expression with one or more <a class="reference internal" href="#and">And</a> subexpressions</li> <li><code class="docutils literal notranslate"><span class="pre">warn_ungrouped_named_tokens_in_collection</span></code> - flag to enable warnings when a results name is defined on a containing expression with ungrouped subexpressions that also have results names</li> <li><code class="docutils literal notranslate"><span class="pre">warn_name_set_on_empty_Forward</span></code> - flag to enable warnings when a <code class="docutils literal notranslate"><span class="pre">Forward</span></code> is defined with a results name, but has no contents defined</li> <li><code class="docutils literal notranslate"><span class="pre">warn_on_parse_using_empty_Forward</span></code> - flag to enable warnings when a <code class="docutils literal notranslate"><span class="pre">Forward</span></code> is defined in a grammar but has never had an expression attached to it</li> <li><code class="docutils literal 
notranslate"><span class="pre">warn_on_assignment_to_Forward</span></code> - flag to enable warnings when a <code class="docutils literal notranslate"><span class="pre">Forward</span></code> is defined but is overwritten by assigning using <code class="docutils literal notranslate"><span class="pre">'='</span></code> instead of <code class="docutils literal notranslate"><span class="pre">'<<='</span></code> or <code class="docutils literal notranslate"><span class="pre">'<<'</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">warn_on_multiple_string_args_to_oneof</span></code> - flag to enable warnings when <code class="docutils literal notranslate"><span class="pre">one_of</span></code> is incorrectly called with multiple str arguments</li> <li><code class="docutils literal notranslate"><span class="pre">enable_debug_on_named_expressions</span></code> - flag to auto-enable debug on all subsequent calls to <code class="docutils literal notranslate"><span class="pre">ParserElement.set_name</span></code></li> </ul> <p>All warnings can be enabled by calling <code class="docutils literal notranslate"><span class="pre">pyparsing.enable_all_warnings()</span></code>. 
Sample:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pyparsing</span> <span class="k">as</span> <span class="nn">pp</span> <span class="n">pp</span><span class="o">.</span><span class="n">enable_all_warnings</span><span class="p">()</span> <span class="n">fwd</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Forward</span><span class="p">()</span><span class="o">.</span><span class="n">set_results_name</span><span class="p">(</span><span class="s2">"recursive_expr"</span><span class="p">)</span> <span class="o">>>></span> <span class="ne">UserWarning</span><span class="p">:</span> <span class="n">warn_name_set_on_empty_Forward</span><span class="p">:</span> <span class="n">setting</span> <span class="n">results</span> <span class="n">name</span> <span class="s1">'recursive_expr'</span> <span class="n">on</span> <span class="n">Forward</span> <span class="n">expression</span> <span class="n">that</span> <span class="n">has</span> <span class="n">no</span> <span class="n">contained</span> <span class="n">expression</span> </pre></div> </div> <p>Warnings can also be enabled using the Python <code class="docutils literal notranslate"><span class="pre">-W</span></code> switch (using <code class="docutils literal notranslate"><span class="pre">-Wd</span></code> or <code class="docutils literal notranslate"><span class="pre">-Wd:::pyparsing</span></code>) or setting a non-empty value to the environment variable <code class="docutils literal notranslate"><span class="pre">PYPARSINGENABLEALLWARNINGS</span></code>. 
(If using <code class="docutils literal notranslate"><span class="pre">-Wd</span></code> for testing, but wishing to disable pyparsing warnings, add <code class="docutils literal notranslate"><span class="pre">-Wi:::pyparsing</span></code>.)</p> </li> </ul> </div> </div> <div class="section" id="miscellaneous-attributes-and-methods"> <h2><a class="toc-backref" href="#id15">1.3 Miscellaneous attributes and methods</a><a class="headerlink" href="#miscellaneous-attributes-and-methods" title="Permalink to this headline">¶</a></h2> <div class="section" id="helper-methods"> <h3><a class="toc-backref" href="#id16">1.3.1 Helper methods</a><a class="headerlink" href="#helper-methods" title="Permalink to this headline">¶</a></h3> <ul> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">delimited_list(expr,</span> <span class="pre">delim=',')</span></code> - convenience function for matching one or more occurrences of expr, separated by delim. By default, the delimiters are suppressed, so the returned results contain only the separate list elements. Can optionally specify <code class="docutils literal notranslate"><span class="pre">combine=True</span></code>, indicating that the expressions and delimiters should be returned as one combined value (useful for scoped variables, such as <code class="docutils literal notranslate"><span class="pre">"a.b.c"</span></code>, or <code class="docutils literal notranslate"><span class="pre">"a::b::c"</span></code>, or paths such as <code class="docutils literal notranslate"><span class="pre">"a/b/c"</span></code>). 
Can also optionally specify <code class="docutils literal notranslate"><span class="pre">allow_trailing_delim</span></code> to accept a trailing delimiter at the end of the list.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">counted_array(expr)</span></code> - convenience function for a pattern where a list of instances of the given expression is preceded by an integer giving the count of elements in the list. Returns an expression that parses the leading integer, reads exactly that many expressions, and returns the array of expressions in the parse results - the leading integer is suppressed from the results (although it is easily reconstructed by calling <code class="docutils literal notranslate"><span class="pre">len</span></code> on the returned array).</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">one_of(choices,</span> <span class="pre">caseless=False,</span> <span class="pre">as_keyword=False)</span></code> - convenience function for quickly declaring an alternative set of <a class="reference internal" href="#literal">Literal</a> expressions. <code class="docutils literal notranslate"><span class="pre">choices</span></code> can be passed as a list of strings or as a single string of values separated by spaces. The values are sorted so that longer matches are attempted first; this ensures that a short value does not mask a longer one that starts with the same characters. If <code class="docutils literal notranslate"><span class="pre">caseless=True</span></code>, <code class="docutils literal notranslate"><span class="pre">one_of</span></code> will create an alternative set of <a class="reference internal" href="#caselessliteral">CaselessLiteral</a> tokens. 
If <code class="docutils literal notranslate"><span class="pre">as_keyword=True</span></code>, <code class="docutils literal notranslate"><span class="pre">one_of</span></code> will declare <a class="reference internal" href="#keyword">Keyword</a> expressions instead of <a class="reference internal" href="#literal">Literal</a> expressions.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">dict_of(key,</span> <span class="pre">value)</span></code> - convenience function for quickly declaring a dictionary pattern of <code class="docutils literal notranslate"><span class="pre">Dict(ZeroOrMore(Group(key</span> <span class="pre">+</span> <span class="pre">value)))</span></code>.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">make_html_tags(tag_str)</span></code> and <code class="docutils literal notranslate"><span class="pre">make_xml_tags(tag_str)</span></code> - convenience functions to create definitions of opening and closing tag expressions. Returns a pair of expressions, for the corresponding <code class="docutils literal notranslate"><span class="pre"><tag></span></code> and <code class="docutils literal notranslate"><span class="pre"></tag></span></code> strings. Includes support for attributes in the opening tag, such as <code class="docutils literal notranslate"><span class="pre"><tag</span> <span class="pre">attr1="abc"></span></code> - attributes are returned as named results in the returned <a class="reference internal" href="#parseresults">ParseResults</a>. 
<code class="docutils literal notranslate"><span class="pre">make_html_tags</span></code> is less restrictive than <code class="docutils literal notranslate"><span class="pre">make_xml_tags</span></code>, especially with respect to case sensitivity.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">infix_notation(base_operand,</span> <span class="pre">operator_list)</span></code> - convenience function to define a grammar for parsing infix notation expressions with a hierarchical precedence of operators. To use the <code class="docutils literal notranslate"><span class="pre">infix_notation</span></code> helper:</p> <ol class="arabic simple"> <li>Define the base “atom” operand term of the grammar. For this simple grammar, the smallest operand is either an integer or a variable. This will be the first argument to the <code class="docutils literal notranslate"><span class="pre">infix_notation</span></code> method.</li> <li>Define a list of tuples for each level of operator precedence. 
Each tuple is of the form <code class="docutils literal notranslate"><span class="pre">(operand_expr,</span> <span class="pre">num_operands,</span> <span class="pre">right_left_assoc,</span> <span class="pre">parse_action)</span></code>, where:<ul> <li><code class="docutils literal notranslate"><span class="pre">operand_expr</span></code> - the pyparsing expression for the operator; may also be a string, which will be converted to a <a class="reference internal" href="#literal">Literal</a>; if <code class="docutils literal notranslate"><span class="pre">None</span></code>, indicates an empty operator, such as the implied multiplication operation between ‘m’ and ‘x’ in “y = mx + b”.</li> <li><code class="docutils literal notranslate"><span class="pre">num_operands</span></code> - the number of terms for this operator (must be 1, 2, or 3)</li> <li><code class="docutils literal notranslate"><span class="pre">right_left_assoc</span></code> is the indicator whether the operator is right or left associative, using the pyparsing-defined constants <code class="docutils literal notranslate"><span class="pre">OpAssoc.RIGHT</span></code> and <code class="docutils literal notranslate"><span class="pre">OpAssoc.LEFT</span></code>.</li> <li><code class="docutils literal notranslate"><span class="pre">parse_action</span></code> is the parse action to be associated with expressions matching this operator expression (the <code class="docutils literal notranslate"><span class="pre">parse_action</span></code> tuple member may be omitted)</li> </ul> </li> <li>Call <code class="docutils literal notranslate"><span class="pre">infix_notation</span></code> passing the operand expression and the operator precedence list, and save the returned value as the generated pyparsing expression. 
You can then use this expression to parse input strings, or incorporate it into a larger, more complex grammar.</li> </ol> <p><code class="docutils literal notranslate"><span class="pre">infix_notation</span></code> also supports optional arguments <code class="docutils literal notranslate"><span class="pre">lpar</span></code> and <code class="docutils literal notranslate"><span class="pre">rpar</span></code>, to parse groups with symbols other than “(” and “)”. They may be passed as strings (in which case they will be converted to <code class="docutils literal notranslate"><span class="pre">Suppress</span></code> objects, and suppressed from the parsed results), or passed as pyparsing expressions, in which case they will be kept as-is, and grouped with their contents.</p> <p>For instance, to use “<” and “>” for grouping symbols, you could write:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">expr</span> <span class="o">=</span> <span class="n">infix_notation</span><span class="p">(</span><span class="n">int_expr</span><span class="p">,</span>
    <span class="p">[</span>
        <span class="p">(</span><span class="n">one_of</span><span class="p">(</span><span class="s2">"+ -"</span><span class="p">),</span> <span class="mi">2</span><span class="p">,</span> <span class="n">OpAssoc</span><span class="o">.</span><span class="n">LEFT</span><span class="p">),</span>
    <span class="p">],</span>
    <span class="n">lpar</span><span class="o">=</span><span class="s2">"<"</span><span class="p">,</span>
    <span class="n">rpar</span><span class="o">=</span><span class="s2">">"</span>
    <span class="p">)</span>
<span class="n">expr</span><span class="o">.</span><span class="n">parse_string</span><span class="p">(</span><span class="s2">"3 - <2 + 11>"</span><span class="p">)</span>
</pre></div> </div> <p>returning:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span 
class="mi">3</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">,</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">,</span> <span class="mi">11</span><span class="p">]]</span> </pre></div> </div> <p>If the grouping symbols are to be retained, then pass them as pyparsing <code class="docutils literal notranslate"><span class="pre">Literals</span></code>:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">expr</span> <span class="o">=</span> <span class="n">infix_notation</span><span class="p">(</span><span class="n">int_expr</span><span class="p">,</span>
    <span class="p">[</span>
        <span class="p">(</span><span class="n">one_of</span><span class="p">(</span><span class="s2">"+ -"</span><span class="p">),</span> <span class="mi">2</span><span class="p">,</span> <span class="n">OpAssoc</span><span class="o">.</span><span class="n">LEFT</span><span class="p">),</span>
    <span class="p">],</span>
    <span class="n">lpar</span><span class="o">=</span><span class="n">Literal</span><span class="p">(</span><span class="s2">"<"</span><span class="p">),</span>
    <span class="n">rpar</span><span class="o">=</span><span class="n">Literal</span><span class="p">(</span><span class="s2">">"</span><span class="p">)</span>
    <span class="p">)</span>
<span class="n">expr</span><span class="o">.</span><span class="n">parse_string</span><span class="p">(</span><span class="s2">"3 - <2 + 11>"</span><span class="p">)</span>
</pre></div> </div> <p>returning:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">,</span> <span class="p">[</span><span class="s1">'<'</span><span class="p">,</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'+'</span><span 
class="p">,</span> <span class="mi">11</span><span class="p">],</span> <span class="s1">'>'</span><span class="p">]]</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">match_previous_literal</span></code> and <code class="docutils literal notranslate"><span class="pre">match_previous_expr</span></code> - function to define an expression that matches the same content as was parsed in a previous parse expression. For instance:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">first</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="n">match_expr</span> <span class="o">=</span> <span class="n">first</span> <span class="o">+</span> <span class="s2">":"</span> <span class="o">+</span> <span class="n">match_previous_literal</span><span class="p">(</span><span class="n">first</span><span class="p">)</span> </pre></div> </div> <p>will match “1:1”, but not “1:2”. 
Since this matches at the literal level, this will also match the leading “1:1” in “1:10”.</p> <p>In contrast:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">first</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span> <span class="n">match_expr</span> <span class="o">=</span> <span class="n">first</span> <span class="o">+</span> <span class="s2">":"</span> <span class="o">+</span> <span class="n">match_previous_expr</span><span class="p">(</span><span class="n">first</span><span class="p">)</span> </pre></div> </div> <p>will <em>not</em> match the leading “1:1” in “1:10”; the expressions are evaluated first, and then compared, so “1” is compared with “10”.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">nested_expr(opener,</span> <span class="pre">closer,</span> <span class="pre">content=None,</span> <span class="pre">ignore_expr=quoted_string)</span></code> - method for defining nested lists enclosed in opening and closing delimiters.</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">opener</span></code> - opening character for a nested list (default=”(“); can also be a pyparsing expression</li> <li><code class="docutils literal notranslate"><span class="pre">closer</span></code> - closing character for a nested list (default=”)”); can also be a pyparsing expression</li> <li><code class="docutils literal notranslate"><span class="pre">content</span></code> - expression for items within the nested lists (default=None)</li> <li><code class="docutils literal notranslate"><span class="pre">ignore_expr</span></code> - expression for ignoring opening and closing delimiters (default=``quoted_string``)</li> </ul> <p>If an expression is not provided for the content argument, the nested expression will capture all whitespace-delimited content between 
delimiters as a list of separate values.</p> <p>Use the <code class="docutils literal notranslate"><span class="pre">ignore_expr</span></code> argument to define expressions that may contain opening or closing characters that should not be treated as opening or closing characters for nesting, such as <code class="docutils literal notranslate"><span class="pre">quoted_string</span></code> or a comment expression. Specify multiple expressions using an <a class="reference internal" href="#or">Or</a> or <a class="reference internal" href="#matchfirst">MatchFirst</a>. The default is <code class="docutils literal notranslate"><span class="pre">quoted_string</span></code>, but if no expressions are to be ignored, then pass <code class="docutils literal notranslate"><span class="pre">None</span></code> for this argument.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">IndentedBlock(statement_expr,</span> <span class="pre">recursive=False,</span> <span class="pre">grouped=True)</span></code> - function to define an indented block of statements, similar to indentation-based blocking in Python source code:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">statement_expr</span></code> - the expression defining a statement that will be found in the indented block; a valid <code class="docutils literal notranslate"><span class="pre">IndentedBlock</span></code> must contain at least 1 matching <code class="docutils literal notranslate"><span class="pre">statement_expr</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">recursive</span></code> - flag indicating whether the IndentedBlock can itself contain nested sub-blocks of the same type of expression (default=False)</li> <li><code class="docutils literal notranslate"><span class="pre">grouped</span></code> - flag indicating whether the tokens returned from parsing the IndentedBlock should be grouped 
(default=True)</li> </ul> </li> </ul> <ul id="originaltextfor"> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">original_text_for(expr)</span></code> - helper function to preserve the originally parsed text, regardless of any token processing or conversion done by the contained expression. For instance, the following expression:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">full_name</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)</span> </pre></div> </div> <p>will return the parse of “John Smith” as [‘John’, ‘Smith’]. In some applications, the actual name as it was given in the input string is what is desired. To do this, use <code class="docutils literal notranslate"><span class="pre">original_text_for</span></code>:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">full_name</span> <span class="o">=</span> <span class="n">original_text_for</span><span class="p">(</span><span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">))</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">ungroup(expr)</span></code> - function to “ungroup” returned tokens; useful to undo the default behavior of <a class="reference internal" href="#and">And</a> to always group the returned tokens, even if there is only one in the list.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">lineno(loc,</span> <span class="pre">string)</span></code> - function to give the line 
number of the location within the string; the first line is line 1, newlines start new rows</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">col(loc,</span> <span class="pre">string)</span></code> - function to give the column number of the location within the string; the first column is column 1, newlines reset the column number to 1</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">line(loc,</span> <span class="pre">string)</span></code> - function to retrieve the line of text representing <code class="docutils literal notranslate"><span class="pre">lineno(loc,</span> <span class="pre">string)</span></code>; useful when printing out diagnostic messages for exceptions</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">srange(range_spec)</span></code> - function to define a string of characters, given a string of the form used by regexp string ranges, such as <code class="docutils literal notranslate"><span class="pre">"[0-9]"</span></code> for all numeric digits, <code class="docutils literal notranslate"><span class="pre">"[A-Z_]"</span></code> for uppercase characters plus underscore, and so on (note that <code class="docutils literal notranslate"><span class="pre">range_spec</span></code> does not include support for generic regular expressions, just string range specs)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">trace_parse_action(fn)</span></code> - decorator function to debug parse actions. 
Lists each call, its arguments, and its return value or exception</p> </li> </ul> </div> <div class="section" id="helper-parse-actions"> <h3><a class="toc-backref" href="#id17">1.3.2 Helper parse actions</a><a class="headerlink" href="#helper-parse-actions" title="Permalink to this headline">¶</a></h3> <ul> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">remove_quotes</span></code> - removes the first and last characters of a quoted string; useful to remove the delimiting quotes from quoted strings</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">replace_with(repl_string)</span></code> - returns a parse action that simply returns the <code class="docutils literal notranslate"><span class="pre">repl_string</span></code>; useful when using <code class="docutils literal notranslate"><span class="pre">transform_string</span></code>, or converting HTML entities, as in:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">nbsp</span> <span class="o">=</span> <span class="n">Literal</span><span class="p">(</span><span class="s2">"&nbsp;"</span><span class="p">)</span><span class="o">.</span><span class="n">set_parse_action</span><span class="p">(</span><span class="n">replace_with</span><span class="p">(</span><span class="s2">"<BLANK>"</span><span class="p">))</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">original_text_for</span></code> - restores any internal whitespace or suppressed text within the tokens for a matched parse expression. 
This is especially useful when defining expressions for <code class="docutils literal notranslate"><span class="pre">scan_string</span></code> or <code class="docutils literal notranslate"><span class="pre">transform_string</span></code> applications.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">with_attribute(*args,</span> <span class="pre">**kwargs)</span></code> - helper to create a validating parse action to be used with start tags created with <code class="docutils literal notranslate"><span class="pre">make_xml_tags</span></code> or <code class="docutils literal notranslate"><span class="pre">make_html_tags</span></code>. Use <code class="docutils literal notranslate"><span class="pre">with_attribute</span></code> to qualify a starting tag with a required attribute value, to avoid false matches on common tags such as <code class="docutils literal notranslate"><span class="pre"><TD></span></code> or <code class="docutils literal notranslate"><span class="pre"><DIV></span></code>.</p> <p><code class="docutils literal notranslate"><span class="pre">with_attribute</span></code> can be called with:</p> <ul class="simple"> <li>keyword arguments, as in <code class="docutils literal notranslate"><span class="pre">(class="Customer",</span> <span class="pre">align="right")</span></code>, or</li> <li>a list of name-value tuples, as in <code class="docutils literal notranslate"><span class="pre">(("ns1:class",</span> <span class="pre">"Customer"),</span> <span class="pre">("ns2:align",</span> <span class="pre">"right"))</span></code></li> </ul> <p>An attribute can be specified to have the special value <code class="docutils literal notranslate"><span class="pre">with_attribute.ANY_VALUE</span></code>, which will match any value - use this to ensure that an attribute is present but any attribute value is acceptable.</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span 
class="pre">match_only_at_col(column_number)</span></code> - a parse action that verifies that an expression was matched at a particular column, raising a <code class="docutils literal notranslate"><span class="pre">ParseException</span></code> if matching at a different column number; useful when parsing tabular data</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.convert_to_integer()</span></code> - converts all matched tokens to int</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.convert_to_float()</span></code> - converts all matched tokens to float</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.convert_to_date()</span></code> - converts matched token to a datetime.date</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.convert_to_datetime()</span></code> - converts matched token to a datetime.datetime</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.strip_html_tags()</span></code> - removes HTML tags from matched token</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.downcase_tokens()</span></code> - converts all matched tokens to lowercase</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.upcase_tokens()</span></code> - converts all matched tokens to uppercase</p> </li> </ul> </div> <div class="section" id="common-string-and-token-constants"> <h3><a class="toc-backref" href="#id18">1.3.3 Common string and token constants</a><a class="headerlink" href="#common-string-and-token-constants" title="Permalink to this headline">¶</a></h3> <ul> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">alphas</span></code> - same as <code class="docutils literal notranslate"><span 
class="pre">string.ascii_letters</span></code></p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">nums</span></code> - same as <code class="docutils literal notranslate"><span class="pre">string.digits</span></code></p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">alphanums</span></code> - a string containing <code class="docutils literal notranslate"><span class="pre">alphas</span> <span class="pre">+</span> <span class="pre">nums</span></code></p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">alphas8bit</span></code> - a string containing alphabetic 8-bit characters:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ</span> </pre></div> </div> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">printables</span></code> - same as <code class="docutils literal notranslate"><span class="pre">string.printable</span></code>, minus all whitespace characters (space, tab, newline, and so on)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">empty</span></code> - a global <code class="docutils literal notranslate"><span class="pre">Empty()</span></code>; will always match</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">sgl_quoted_string</span></code> - a string of characters enclosed in single quotes; may include whitespace, but not newlines</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">dbl_quoted_string</span></code> - a string of characters enclosed in double quotes; may include whitespace, but not newlines</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">quoted_string</span></code> - 
<code class="docutils literal notranslate"><span class="pre">sgl_quoted_string</span> <span class="pre">|</span> <span class="pre">dbl_quoted_string</span></code></p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">c_style_comment</span></code> - a comment block delimited by <code class="docutils literal notranslate"><span class="pre">'/*'</span></code> and <code class="docutils literal notranslate"><span class="pre">'*/'</span></code> sequences; can span multiple lines, but does not support nesting of comments</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">html_comment</span></code> - a comment block delimited by <code class="docutils literal notranslate"><span class="pre">'&lt;!--'</span></code> and <code class="docutils literal notranslate"><span class="pre">'--&gt;'</span></code> sequences; can span multiple lines, but does not support nesting of comments</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">comma_separated_list</span></code> - similar to <code class="docutils literal notranslate"><span class="pre">delimited_list</span></code>, except that the list expressions can be any text value, or a quoted string; quoted strings can safely include commas without incorrectly breaking the string into two tokens</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">rest_of_line</span></code> - all remaining printable characters up to but not including the next newline</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.integer</span></code> - an integer with no leading sign; parsed token is converted to int</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.hex_integer</span></code> - a hexadecimal integer; parsed token is converted to int</p> </li> <li><p class="first"><code class="docutils literal 
notranslate"><span class="pre">common.signed_integer</span></code> - an integer with optional leading sign; parsed token is converted to int</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.fraction</span></code> - signed_integer ‘/’ signed_integer; parsed tokens are converted to float</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.mixed_integer</span></code> - signed_integer ‘-’ fraction; parsed tokens are converted to float</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.real</span></code> - real number; parsed tokens are converted to float</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.sci_real</span></code> - real number with optional scientific notation; parsed tokens are converted to float</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.number</span></code> - any numeric expression; parsed tokens are returned as converted by the matched expression</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.fnumber</span></code> - any numeric expression; parsed tokens are converted to float</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.identifier</span></code> - a programming identifier (follows Python’s syntax convention of leading alpha or “_”, followed by 0 or more alpha, num, or “_”)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.ipv4_address</span></code> - IPv4 address</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.ipv6_address</span></code> - IPv6 address</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.mac_address</span></code> - MAC address (with “:”, “-”, or 
“.” delimiters)</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.iso8601_date</span></code> - date in <code class="docutils literal notranslate"><span class="pre">YYYY-MM-DD</span></code> format</p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.iso8601_datetime</span></code> - datetime in <code class="docutils literal notranslate"><span class="pre">YYYY-MM-DDThh:mm:ss.s(Z|+-00:00)</span></code> format; trailing seconds, milliseconds, and timezone optional; accepts separating <code class="docutils literal notranslate"><span class="pre">'T'</span></code> or <code class="docutils literal notranslate"><span class="pre">'</span> <span class="pre">'</span></code></p> </li> <li><p class="first"><code class="docutils literal notranslate"><span class="pre">common.url</span></code> - matches URL strings and returns a ParseResults with named fields like those returned by <code class="docutils literal notranslate"><span class="pre">urllib.parse.urlparse()</span></code></p> </li> </ul> </div> <div class="section" id="unicode-character-sets-for-international-parsing"> <h3><a class="toc-backref" href="#id19">1.3.4 Unicode character sets for international parsing</a><a class="headerlink" href="#unicode-character-sets-for-international-parsing" title="Permalink to this headline">¶</a></h3> <p>Pyparsing includes the <code class="docutils literal notranslate"><span class="pre">unicode</span></code> namespace that contains definitions for <code class="docutils literal notranslate"><span class="pre">alphas</span></code>, <code class="docutils literal notranslate"><span class="pre">nums</span></code>, <code class="docutils literal notranslate"><span class="pre">alphanums</span></code>, <code class="docutils literal notranslate"><span class="pre">identchars</span></code>, <code class="docutils literal notranslate"><span class="pre">identbodychars</span></code>, and <code class="docutils 
literal notranslate"><span class="pre">printables</span></code> for character ranges besides 7- or 8-bit ASCII. You can access them using code like the following:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pyparsing</span> <span class="k">as</span> <span class="nn">pp</span>
<span class="n">ppu</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">unicode</span>
<span class="n">greek_word</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Word</span><span class="p">(</span><span class="n">ppu</span><span class="o">.</span><span class="n">Greek</span><span class="o">.</span><span class="n">alphas</span><span class="p">)</span>
<span class="n">greek_word</span><span class="p">[</span><span class="o">...</span><span class="p">]</span><span class="o">.</span><span class="n">parse_string</span><span class="p">(</span><span class="s2">"Καλημέρα κόσμε"</span><span class="p">)</span>
</pre></div> </div> <p>The following language ranges are defined.</p> <table border="1" class="docutils"> <colgroup> <col width="29%" /> <col width="19%" /> <col width="53%" /> </colgroup> <tbody valign="top"> <tr class="row-odd"><td>Unicode set</td> <td>Alternate names</td> <td>Description</td> </tr> <tr class="row-even"><td>Arabic</td> <td>العربية</td> <td> </td> </tr> <tr class="row-odd"><td>Chinese</td> <td>中文</td> <td> </td> </tr> <tr class="row-even"><td>Cyrillic</td> <td>кириллица</td> <td> </td> </tr> <tr class="row-odd"><td>Greek</td> <td>Ελληνικά</td> <td> </td> </tr> <tr class="row-even"><td>Hebrew</td> <td>עִברִית</td> <td> </td> </tr> <tr class="row-odd"><td>Japanese</td> <td>日本語</td> <td>Union of Kanji, Katakana, and Hiragana sets</td> </tr> <tr class="row-even"><td>Japanese.Kanji</td> <td>漢字</td> <td> </td> </tr> <tr class="row-odd"><td>Japanese.Katakana</td> <td>カタカナ</td> <td> </td> </tr> <tr 
class="row-even"><td>Japanese.Hiragana</td> <td>ひらがな</td> <td> </td> </tr> <tr class="row-odd"><td>Hangul</td> <td>Korean, 한국어</td> <td> </td> </tr> <tr class="row-even"><td>Latin1</td> <td> </td> <td>All Unicode characters up to code point 255</td> </tr> <tr class="row-odd"><td>LatinA</td> <td> </td> <td> </td> </tr> <tr class="row-even"><td>LatinB</td> <td> </td> <td> </td> </tr> <tr class="row-odd"><td>Thai</td> <td>ไทย</td> <td> </td> </tr> <tr class="row-even"><td>Devanagari</td> <td>देवनागरी</td> <td> </td> </tr> <tr class="row-odd"><td>BasicMultilingualPlane</td> <td>BMP</td> <td>All Unicode characters up to code point 65535</td> </tr> <tr class="row-even"><td>CJK</td> <td> </td> <td>Union of Chinese, Japanese, and Korean sets</td> </tr> </tbody> </table> <p>The base <code class="docutils literal notranslate"><span class="pre">unicode</span></code> class also includes definitions based on all Unicode code points up to <code class="docutils literal notranslate"><span class="pre">sys.maxunicode</span></code>. This set will include emojis, wingdings, and many other specialized and typographical variant characters.</p> </div> </div> <div class="section" id="generating-railroad-diagrams"> <h2><a class="toc-backref" href="#id20">1.4 Generating Railroad Diagrams</a><a class="headerlink" href="#generating-railroad-diagrams" title="Permalink to this headline">¶</a></h2> <p>Grammars are conventionally represented in what are called “railroad diagrams”, which allow you to visually follow the sequence of tokens in a grammar along lines which are a bit like train tracks. 
You might want to generate a railroad diagram for your grammar in order to better understand it yourself, or maybe to communicate it to others.</p> <div class="section" id="usage"> <h3><a class="toc-backref" href="#id21">1.4.1 Usage</a><a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h3> <p>To generate a railroad diagram in pyparsing, you first have to install pyparsing with the <code class="docutils literal notranslate"><span class="pre">diagrams</span></code> extra. To do this, just run <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">pyparsing[diagrams]</span></code>, and make sure you add <code class="docutils literal notranslate"><span class="pre">pyparsing[diagrams]</span></code> to any <code class="docutils literal notranslate"><span class="pre">setup.py</span></code> or <code class="docutils literal notranslate"><span class="pre">requirements.txt</span></code> that specifies pyparsing as a dependency.</p> <p>Create your parser as you normally would. 
Then call <code class="docutils literal notranslate"><span class="pre">create_diagram()</span></code>, passing the name of an output HTML file:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">street_address</span> <span class="o">=</span> <span class="n">Word</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span><span class="o">.</span><span class="n">set_name</span><span class="p">(</span><span class="s2">"house_number"</span><span class="p">)</span> <span class="o">+</span> <span class="n">Word</span><span class="p">(</span><span class="n">alphas</span><span class="p">)[</span><span class="mi">1</span><span class="p">,</span> <span class="o">...</span><span class="p">]</span><span class="o">.</span><span class="n">set_name</span><span class="p">(</span><span class="s2">"street_name"</span><span class="p">)</span>
<span class="n">street_address</span><span class="o">.</span><span class="n">set_name</span><span class="p">(</span><span class="s2">"street_address"</span><span class="p">)</span>
<span class="n">street_address</span><span class="o">.</span><span class="n">create_diagram</span><span class="p">(</span><span class="s2">"street_address_diagram.html"</span><span class="p">)</span>
</pre></div> </div> <p>This will result in the railroad diagram being written to <code class="docutils literal notranslate"><span class="pre">street_address_diagram.html</span></code>.</p> <p>Diagrams usually will vertically wrap expressions containing more than 3 terms. 
You can override this by passing the <cite>vertical</cite> argument to <cite>create_diagram</cite> with a larger value.</p> </div> <div class="section" id="example"> <h3><a class="toc-backref" href="#id22">1.4.2 Example</a><a class="headerlink" href="#example" title="Permalink to this headline">¶</a></h3> <p>You can view an example railroad diagram generated from <a class="reference external" href="_static/sql_railroad.html">a pyparsing grammar for SQL SELECT statements</a>.</p> </div> <div class="section" id="naming-tip"> <h3><a class="toc-backref" href="#id23">1.4.3 Naming tip</a><a class="headerlink" href="#naming-tip" title="Permalink to this headline">¶</a></h3> <p>Parser elements that are separately named will be broken out as their own sub-diagrams. As a short-cut alternative to going through and adding <code class="docutils literal notranslate"><span class="pre">.set_name()</span></code> calls on all your sub-expressions, you can use <code class="docutils literal notranslate"><span class="pre">autoname_elements()</span></code> after defining your complete grammar. 
For example:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">a</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Literal</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Literal</span><span class="p">(</span><span class="s2">"b"</span><span class="p">)</span><span class="o">.</span><span class="n">set_name</span><span class="p">(</span><span class="s2">"bbb"</span><span class="p">)</span>
<span class="n">pp</span><span class="o">.</span><span class="n">autoname_elements</span><span class="p">()</span>
</pre></div> </div> <p><cite>a</cite> will get named “a”, while <cite>b</cite> will keep its name “bbb”.</p> </div> <div class="section" id="customization"> <h3><a class="toc-backref" href="#id24">1.4.4 Customization</a><a class="headerlink" href="#customization" title="Permalink to this headline">¶</a></h3> <p>You can customize the resulting diagram in a few ways. To do so, run <code class="docutils literal notranslate"><span class="pre">pyparsing.diagram.to_railroad</span></code> to convert your grammar into a form understood by the <a class="reference external" href="https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md">railroad-diagrams</a> module, and then <code class="docutils literal notranslate"><span class="pre">pyparsing.diagram.railroad_to_html</span></code> to convert that into an HTML document. 
For example:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyparsing.diagram</span> <span class="k">import</span> <span class="n">to_railroad</span><span class="p">,</span> <span class="n">railroad_to_html</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'output.html'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fp</span><span class="p">:</span>
    <span class="n">railroad</span> <span class="o">=</span> <span class="n">to_railroad</span><span class="p">(</span><span class="n">my_grammar</span><span class="p">)</span>
    <span class="n">fp</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">railroad_to_html</span><span class="p">(</span><span class="n">railroad</span><span class="p">))</span>
</pre></div> </div> <p>This will result in the railroad diagram being written to <code class="docutils literal notranslate"><span class="pre">output.html</span></code>.</p> <p>You can then pass in additional keyword arguments to <code class="docutils literal notranslate"><span class="pre">pyparsing.diagram.to_railroad</span></code>, which will be passed into the <code class="docutils literal notranslate"><span class="pre">Diagram()</span></code> constructor of the underlying library, <a class="reference external" href="https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md#diagrams">as explained here</a>.</p> <p>In addition, you can edit global options in the underlying library, by editing constants:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyparsing.diagram</span> <span class="k">import</span> <span class="n">to_railroad</span><span class="p">,</span> <span class="n">railroad_to_html</span>
<span
class="kn">import</span> <span class="nn">railroad</span>
<span class="n">railroad</span><span class="o">.</span><span class="n">DIAGRAM_CLASS</span> <span class="o">=</span> <span class="s2">"my-custom-class"</span>
<span class="n">my_railroad</span> <span class="o">=</span> <span class="n">to_railroad</span><span class="p">(</span><span class="n">my_grammar</span><span class="p">)</span>
</pre></div> </div> <p>These options <a class="reference external" href="https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md#options">are documented here</a>.</p> <p>Finally, you can edit the HTML produced by <code class="docutils literal notranslate"><span class="pre">pyparsing.diagram.railroad_to_html</span></code> by passing in certain keyword arguments that will be used in the HTML template. Currently, these are:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">head</span></code>: A string containing HTML to use in the <code class="docutils literal notranslate"><span class="pre">&lt;head&gt;</span></code> tag. This might be a stylesheet or other metadata.</li> <li><code class="docutils literal notranslate"><span class="pre">body</span></code>: A string containing HTML to use in the <code class="docutils literal notranslate"><span class="pre">&lt;body&gt;</span></code> tag, above the actual diagram. 
This might consist of a heading, description, or JavaScript.</li> </ul> <p>If you want to provide a custom stylesheet using the <code class="docutils literal notranslate"><span class="pre">head</span></code> keyword, you can make use of the following CSS classes:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">railroad-group</span></code>: A group containing everything relating to a given element group (i.e. something with a heading)</li> <li><code class="docutils literal notranslate"><span class="pre">railroad-heading</span></code>: The title for each group</li> <li><code class="docutils literal notranslate"><span class="pre">railroad-svg</span></code>: A div containing only the diagram SVG for each group</li> <li><code class="docutils literal notranslate"><span class="pre">railroad-description</span></code>: A div containing the group description (unused)</li> </ul> </div> </div> </div> </div> </div> </div> <div class="sphinxsidebar" role="navigation" aria-label="main navigation"> <div class="sphinxsidebarwrapper"> <h1 class="logo"><a href="index.html">PyParsing</a></h1> <h3>Navigation</h3> <p class="caption"><span class="caption-text">Contents:</span></p> <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="whats_new_in_3_0_0.html">1 What’s New in Pyparsing 3.0.0</a></li> <li class="toctree-l1 current"><a class="current reference internal" href="#">1 Using the pyparsing module</a><ul> <li class="toctree-l2"><a class="reference internal" href="#steps-to-follow">1.1 Steps to follow</a></li> <li class="toctree-l2"><a class="reference internal" href="#classes">1.2 Classes</a></li> <li class="toctree-l2"><a class="reference internal" href="#miscellaneous-attributes-and-methods">1.3 Miscellaneous attributes and methods</a></li> <li class="toctree-l2"><a class="reference internal" href="#generating-railroad-diagrams">1.4 Generating Railroad Diagrams</a></li> </ul> </li> <li class="toctree-l1"><a 
class="reference internal" href="modules.html">pyparsing</a></li> <li class="toctree-l1"><a class="reference internal" href="CODE_OF_CONDUCT.html">Contributor Covenant Code of Conduct</a></li> </ul> <div class="relations"> <h3>Related Topics</h3> <ul> <li><a href="index.html">Documentation overview</a><ul> <li>Previous: <a href="whats_new_in_3_0_0.html" title="previous chapter">1 What’s New in Pyparsing 3.0.0</a></li> <li>Next: <a href="modules.html" title="next chapter">pyparsing</a></li> </ul></li> </ul> </div> <div id="searchbox" style="display: none" role="search"> <h3>Quick search</h3> <div class="searchformwrapper"> <form class="search" action="search.html" method="get"> <input type="text" name="q" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </div> </div> <script type="text/javascript">$('#searchbox').show(0);</script> </div> </div> <div class="clearer"></div> </div> <div class="footer"> ©2018-2021, Paul T. McGuire. | Powered by <a href="http://sphinx-doc.org/">Sphinx 1.7.6</a> & <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.9</a> | <a href="_sources/HowToUsePyparsing.rst.txt" rel="nofollow">Page source</a> </div> </body> </html>