early-access version 3088

pineappleEA
2022-11-05 15:35:56 +01:00
parent 4e4fc25ce3
commit b601909c6d
35519 changed files with 5996896 additions and 860 deletions

@@ -0,0 +1,156 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Char Delimiters Separator</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p><font color="red">Note: This class is deprecated. Please use
<a href="char_separator.htm"><tt>char_separator</tt></a> instead.</font>
<h1 align="center">Char Delimiters Separator</h1>
<pre>
template &lt;class Char, class Traits = std::char_traits&lt;Char&gt; &gt;
class char_delimiters_separator{
</pre>
<p>The char_delimiters_separator class is an implementation of the <a href=
"tokenizerfunction.htm">TokenizerFunction</a> concept that can be used to
break text up into tokens. It is the default TokenizerFunction for
tokenizer and token_iterator_generator. An example is below.</p>
<h2>Example</h2>
<pre>
// simple_example_4.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "This is, a test";
   tokenizer&lt;char_delimiters_separator&lt;char&gt; &gt; tok(s);
   for(tokenizer&lt;char_delimiters_separator&lt;char&gt; &gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
<h2>Construction and Usage</h2>
<p>There is one constructor of interest. It is as follows:</p>
<pre>
explicit char_delimiters_separator(bool return_delims = false,
                                   const Char* returnable = 0, const Char* nonreturnable = 0)
</pre>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Parameter</strong></p>
</td>
<td>
<p align="center"><strong>Description</strong></p>
</td>
</tr>
<tr>
<td>return_delims</td>
<td>Whether or not to return the delimiters that have been found. Note
that not all delimiters can be returned. See the other two parameters
for explanation.</td>
</tr>
<tr>
<td>returnable</td>
<td>This specifies the returnable delimiters. These are the delimiters
that can be returned as tokens when return_delims is true. Since these
are typically punctuation, if 0 is provided as the argument, then the
returnable delimiters will be all characters C for which std::ispunct(C)
yields a true value. If an argument of "" is provided, then this is
taken to mean that there are no returnable delimiters.</td>
</tr>
<tr>
<td>nonreturnable</td>
<td>This specifies the nonreturnable delimiters. These are delimiters
that cannot be returned as tokens. Since these are typically
whitespace, if 0 is specified as an argument, then the nonreturnable
delimiters will be all characters C for which std::isspace(C) yields a
true value. If an argument of "" is provided, then this is taken to
mean that there are no nonreturnable delimiters.</td>
</tr>
</table>
<p>The reason there is a distinction between nonreturnable and returnable
delimiters is that some delimiters are used only to split up tokens and
carry no meaning of their own. Take for example the string "b c +". Assume
you are writing a simple calculator to parse expressions in postfix
notation. While both the space and the + separate tokens, you are only
interested in the + and not in the space. Indeed, having the space returned
as a token would only complicate your code. In this case you would specify
+ as a returnable delimiter and space as a nonreturnable delimiter.</p>
<p>To use this class, pass an object of it anywhere a TokenizerFunction
object is required.</p>
<h3>Template Parameters</h3>
<table border="1" summary="">
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
<tr>
<td><tt>Char</tt></td>
<td>The type of the elements within a token, typically
<tt>char</tt>.</td>
</tr>
<tr>
<td>Traits</td>
<td>The traits class for Char, typically
std::char_traits&lt;Char&gt;</td>
</tr>
</table>
<h2>Model of</h2>
<p><a href="tokenizerfunction.htm">TokenizerFunction</a></p>
<p>&nbsp;</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

@@ -0,0 +1,230 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Char Separator</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p>
<h1>char_separator&lt;Char, Traits&gt;</h1>
<p>The <tt>char_separator</tt> class breaks a sequence of characters into
tokens based on character delimiters much in the same way that
<tt>strtok()</tt> does (but without all the evils of non-reentrancy and
destruction of the input sequence).</p>
<p>The <tt>char_separator</tt> class is used in conjunction with the
<a href="token_iterator.htm"><tt>token_iterator</tt></a> or <a href=
"tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.</p>
<h2>Definitions</h2>
<p>The <tt>strtok()</tt> function does not include matches with the
character delimiters in the output sequence of tokens. However, sometimes
it is useful to have the delimiters show up in the output sequence,
therefore <tt>char_separator</tt> provides this as an option. We refer to
delimiters that show up as output tokens as <b><i>kept delimiters</i></b>
and delimiters that do not show up as output tokens as <b><i>dropped
delimiters</i></b>.</p>
<p>When two delimiters appear next to each other in the input sequence,
there is the question of whether to output an <b><i>empty token</i></b> or
to skip ahead. The behaviour of <tt>strtok()</tt> is to skip ahead. The
<tt>char_separator</tt> class provides both options.</p>
<h2>Examples</h2>
<p>This first example shows how to use <tt>char_separator</tt> as a
replacement for the <tt>strtok()</tt> function. We've specified three
character delimiters, and they will not show up as output tokens. We have
not specified any kept delimiters, and by default any empty tokens will be
ignored.</p>
<blockquote>
<pre>
// char_sep_example_1.cpp
#include &lt;cstdlib&gt;   // for EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  boost::char_separator&lt;char&gt; sep("-;|");
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>The output is:
<blockquote>
<pre>
&lt;Hello&gt; &lt;world&gt; &lt;foo&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt;
</pre>
</blockquote>
<p>The next example shows tokenizing with two dropped delimiters '-' and
';' and a single kept delimiter '|'. We also specify that empty tokens
should show up in the output when two delimiters are next to each
other.</p>
<blockquote>
<pre>
// char_sep_example_2.cpp
#include &lt;cstdlib&gt;   // for EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  boost::char_separator&lt;char&gt; sep("-;", "|", boost::keep_empty_tokens);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>The output is:
<blockquote>
<pre>
&lt;&gt; &lt;&gt; &lt;Hello&gt; &lt;|&gt; &lt;world&gt; &lt;|&gt; &lt;&gt; &lt;|&gt; &lt;&gt; &lt;foo&gt; &lt;&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; &lt;|&gt; &lt;&gt;
</pre>
</blockquote>
<p>The final example shows tokenizing on punctuation and whitespace
characters using the default constructor of the
<tt>char_separator</tt>.</p>
<blockquote>
<pre>
// char_sep_example_3.cpp
#include &lt;cstdlib&gt;   // for EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main()
{
  std::string str = "This is, a test";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; Tok;
  boost::char_separator&lt;char&gt; sep; // default constructed
  Tok tok(str, sep);
  for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>The output is:
<blockquote>
<pre>
&lt;This&gt; &lt;is&gt; &lt;,&gt; &lt;a&gt; &lt;test&gt;
</pre>
</blockquote>
<h2>Template parameters</h2>
<table border summary="">
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Default</th>
</tr>
<tr>
<td><tt>Char</tt></td>
<td>The type of elements within a token, typically <tt>char</tt>.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td><tt>Traits</tt></td>
<td>The <tt>char_traits</tt> for the character type.</td>
<td><tt>std::char_traits&lt;Char&gt;</tt></td>
</tr>
</table>
<h2>Model of</h2><a href="tokenizerfunction.htm">Tokenizer Function</a>
<h2>Members</h2>
<hr>
<pre>
explicit char_separator(const Char* dropped_delims,
const Char* kept_delims = "",
empty_token_policy empty_tokens = drop_empty_tokens)
</pre>
<p>This creates a <tt>char_separator</tt> object, which can then be used to
create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> or
<a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. The
<tt>dropped_delims</tt> and <tt>kept_delims</tt> are strings of characters
where each character is used as a delimiter during tokenizing. Whenever a
delimiter is seen in the input sequence, the current token is finished, and
a new token begins. The delimiters in <tt>dropped_delims</tt> do not show
up as tokens in the output whereas the delimiters in <tt>kept_delims</tt>
do show up as tokens. If <tt>empty_tokens</tt> is
<tt>drop_empty_tokens</tt>, then empty tokens will not show up in the
output. If <tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt> then empty
tokens will show up in the output.</p>
<hr>
<pre>
explicit char_separator()
</pre>
<p>The function <tt>std::isspace()</tt> is used to identify dropped
delimiters and <tt>std::ispunct()</tt> is used to identify kept delimiters.
In addition, empty tokens are dropped.</p>
<hr>
<pre>
template &lt;typename InputIterator, typename Token&gt;
bool operator()(InputIterator&amp; next, InputIterator end, Token&amp; tok)
</pre>
<p>This function is called by the <a href=
"token_iterator.htm"><tt>token_iterator</tt></a> to perform tokenizing. The
user typically does not call this function directly.</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001-2002 Jeremy Siek and John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

@@ -0,0 +1,231 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Escaped List Separator</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<h1 align="left"><img src="../../../boost.png" alt="C++ Boost" width="277"
height="86"></h1>
<h1 align="center">Escaped List Separator</h1>
<div align="left">
<pre>
escaped_list_separator&lt;Char, Traits = std::char_traits&lt;Char&gt; &gt;
</pre>
</div>
<p>The <tt>escaped_list_separator</tt> class is an implementation of the
<a href="tokenizerfunction.htm">TokenizerFunction</a> concept. The
escaped_list_separator parses a superset of the CSV (comma-separated value)
format. Examples of this format are below. It is assumed that the
default characters for separator, quote, and escape are used.</p>
<p>Field 1,Field 2,Field 3<br>
Field 1,"Field 2, with comma",Field 3<br>
Field 1,Field 2 with \"embedded quote\",Field 3<br>
Field 1, Field 2 with \n new line,Field 3<br>
Field 1, Field 2 with embedded \\ ,Field 3</p>
<p>Fields are normally separated by commas. If you want to put a comma in a
field, you need to put quotes around it. Three escape sequences are also
supported:</p>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Escape Sequence</strong></p>
</td>
<td>
<p align="center"><strong>Result</strong></p>
</td>
</tr>
<tr>
<td>&lt;escape&gt;&lt;quote&gt;</td>
<td>&lt;quote&gt;</td>
</tr>
<tr>
<td>&lt;escape&gt;n</td>
<td>newline</td>
</tr>
<tr>
<td>&lt;escape&gt;&lt;escape&gt;</td>
<td>&lt;escape&gt;</td>
</tr>
</table>
<p>Where &lt;quote&gt; is any character specified to be a quote
and &lt;escape&gt; is any character specified to be an escape character.</p>
<h2>Example</h2>
<pre>
// simple_example_2.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3";
   tokenizer&lt;escaped_list_separator&lt;char&gt; &gt; tok(s);
   for(tokenizer&lt;escaped_list_separator&lt;char&gt; &gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
<p>&nbsp;</p>
<h2>Construction and Usage</h2>
<p>escaped_list_separator has two constructors. They are as follows:</p>
<pre>
explicit escaped_list_separator(Char e = '\\', Char c = ',',Char q = '\"')
</pre>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Parameter</strong></p>
</td>
<td>
<p align="center"><strong>Description</strong></p>
</td>
</tr>
<tr>
<td>e</td>
<td>Specifies the character to use for escape sequences. It defaults to
the C style \ (backslash). However you can override by passing in a
different character. An example of when you might want to do this is
when you have many fields which are Windows style filenames. Instead of
escaping out each \ in the path, you can change the escape to something
else.</td>
</tr>
<tr>
<td>c</td>
<td>Specifies the character to use to separate the fields</td>
</tr>
<tr>
<td>q</td>
<td>Specifies the character to use for the quote.</td>
</tr>
</table>
<p>&nbsp;</p>
<pre>
escaped_list_separator(string_type e, string_type c, string_type q)
</pre>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Parameter</strong></p>
</td>
<td>
<p align="center"><strong>Description</strong></p>
</td>
</tr>
<tr>
<td>e</td>
<td>Any character in the string e is considered to be an escape
character. If an empty string is given, then there are no escape
characters.</td>
</tr>
<tr>
<td>c</td>
<td>Any character in the string c is considered to be a separator. If
an empty string is given, then there are no separator characters.</td>
</tr>
<tr>
<td>q</td>
<td>Any character in the string q is considered to be a quote. If an
empty string is given, then there are no quote characters.</td>
</tr>
</table>
<p>&nbsp;</p>
<p>To use this class, pass an object of it anywhere in the Tokenizer
package where a TokenizerFunction is required.</p>
<p>&nbsp;</p>
<h2>Template Parameters</h2>
<table border="1" summary="">
<tr>
<th><strong>Parameter</strong></th>
<th><strong>Description</strong></th>
</tr>
<tr>
<td><tt>Char</tt></td>
<td>The type of the elements within a token, typically
<tt>char</tt>.</td>
</tr>
<tr>
<td>Traits</td>
<td>The traits class for the Char type. This is used for comparing
values of type Char. It defaults to std::char_traits&lt;Char&gt;.</td>
</tr>
</table>
<p>&nbsp;</p>
<h2>Model of</h2>
<p><a href="tokenizerfunction.htm">TokenizerFunction</a></p>
<p>&nbsp;</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

@@ -0,0 +1,96 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Tokenizer Overview</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p>
<h1 align="center">Table Of Contents</h1>
<p align="left">&nbsp;</p>
<h2 align="left"><a href="introduc.htm">Introduction</a></h2>
<h2 align="left">Containers and Iterators</h2>
<ul>
<li>
<h3 align="left"><a href="tokenizer.htm">tokenizer</a></h3>
</li>
<li>
<h3 align="left"><a href="token_iterator.htm">token iterator</a></h3>
</li>
</ul>
<h2><a href="tokenizerfunction.htm">TokenizerFunction Concept</a></h2>
<h2>TokenizerFunction Models</h2>
<ul>
<li>
<h3><a href="char_separator.htm">char_separator</a></h3>
</li>
<li>
<h3><a href=
"escaped_list_separator.htm">escaped_list_separator</a></h3>
</li>
<li>
<h3><a href="offset_separator.htm">offset_separator</a></h3>
</li>
<li><font color="red">Deprecated:</font> <a href=
"char_delimiters_separator.htm">char_delimiters_separator</a></li>
</ul>
<h2>&nbsp;</h2>
<h2>Acknowledgements</h2>
<p>I wish to thank the members of the boost mailing list, whose comments,
compliments, and criticisms during both the development and formal review
helped make the Tokenizer library what it is. I especially wish to thank
Aleksey Gurtovoy for the idea of using a pair of iterators to specify the
input, instead of a string. I also wish to thank Jeremy Siek for his idea
of providing a container interface for the token iterators and for
simplifying the template parameters for the TokenizerFunctions. He and
Daryle Walker also emphasized the need to separate interface and
implementation. Gary Powell sparked the idea of using the isspace and
ispunct as the defaults for char_delimiters_separator. Jeff Garland
provided ideas on how to change the order of the template parameters in
order to make tokenizer easier to declare. Thanks to Douglas Gregor who
served as review manager and provided many insights both on the boost list
and in e-mail on how to polish up the implementation and presentation of
Tokenizer. Finally, thanks to Beman Dawes who integrated the final version
into the boost distribution.</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2000 Jeremy Siek<br>
Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

@@ -0,0 +1,120 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Introduction</title>
</head>
<body bgcolor="#FFFFFF">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"/><br></p>
<h1 align="center">Introduction</h1>
<p align="left">The Boost Tokenizer package provides a flexible and
easy-to-use way to break a string or other character sequence into a series
of tokens. Below is a simple example that will break up a phrase into
words.</p>
<div align="left">
<pre>
// simple_example_1.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "This is, a test";
   tokenizer&lt;&gt; tok(s);
   for(tokenizer&lt;&gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
</div>
<p align="left">You can choose how the string gets parsed by using the
TokenizerFunction. If you do not specify anything, the default
TokenizerFunction is <em>char_delimiters_separator&lt;char&gt;</em> which
defaults to breaking up a string based on space and punctuation. Here is an
example using another TokenizerFunction called
<em>escaped_list_separator</em>. This TokenizerFunction parses a superset
of comma-separated value (CSV) lines. The format looks like this:</p>
<p align="left">Field 1,"putting quotes around fields, allows commas",Field
3</p>
<p align="left">Below is an example that will break the previous line into
its three fields.</p>
<div align="left">
<pre>
// simple_example_2.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3";
   tokenizer&lt;escaped_list_separator&lt;char&gt; &gt; tok(s);
   for(tokenizer&lt;escaped_list_separator&lt;char&gt; &gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
</div>
<p align="left">Finally, for some TokenizerFunctions you have to pass
something into the constructor in order to do anything interesting. An
example is the offset_separator. This class breaks a string into tokens based
on offsets. For example, when <em>12252001</em> is parsed using offsets of
2,2,4 it becomes <em>12 25 2001</em>. Below is the code used.</p>
<div align="left">
<pre>
// simple_example_3.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "12252001";
   int offsets[] = {2,2,4};
   offset_separator f(offsets, offsets+3);
   tokenizer&lt;offset_separator&gt; tok(s,f);
   for(tokenizer&lt;offset_separator&gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
</div>
<p align="left">&nbsp;</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B %Y" startspan -->9 June 2010<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

@@ -0,0 +1,131 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Offset Separator</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p>
<h1 align="center">Offset Separator</h1>
<pre>
class offset_separator
</pre>
<p>The <tt>offset_separator</tt> class is an implementation of the <a href=
"tokenizerfunction.htm">TokenizerFunction</a> concept that can be used with
the <a href="tokenizer.htm">tokenizer</a> class to break text up into
tokens. The <tt>offset_separator</tt> breaks a sequence of <tt>Char</tt>'s
into strings based on a sequence of offsets. For example, if you had the
string "12252001" and offsets (2,2,4) it would break the string into 12 25
2001. Here is an example.</p>
<h2>Example</h2>
<pre>
// simple_example_3.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "12252001";
   int offsets[] = {2,2,4};
   offset_separator f(offsets, offsets+3);
   tokenizer&lt;offset_separator&gt; tok(s,f);
   for(tokenizer&lt;offset_separator&gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
<p>&nbsp;</p>
<h2>Construction and Usage</h2>
<p>The offset_separator has 1 constructor of interest. (The default
constructor is just there to make some compilers happy). The declaration is
below</p>
<pre>
template&lt;typename Iter&gt;
offset_separator(Iter begin,Iter end,bool bwrapoffsets = true, bool breturnpartiallast = true)
</pre>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Parameter</strong></p>
</td>
<td>
<p align="center"><strong>Description</strong></p>
</td>
</tr>
<tr>
<td>begin, end</td>
<td>Specify the sequence of integer offsets.</td>
</tr>
<tr>
<td>bwrapoffsets</td>
<td>Tells whether to wrap around to the beginning of the offsets when
all the offsets have been used. For example, the string
"1225200101012002" with offsets (2,2,4) and bwrapoffsets set to true
would parse to 12 25 2001 01 01 2002. With bwrapoffsets set to false, it
would parse to 12 25 2001 and then stop because all the offsets have
been used.</td>
</tr>
<tr>
<td>breturnpartiallast</td>
<td>Tells whether, when the parsed sequence terminates before yielding
the number of characters in the current offset, to create a token with
what was parsed, or to ignore it. For example the string "122501" with
offsets (2,2,4) with breturnpartiallast set to true will parse to 12 25
01. With it set to false, it will parse to 12 25 and then will stop
because there are only 2 characters left in the sequence instead of the
4 that should have been there.</td>
</tr>
</table>
<p>To use this class, pass an object of it anywhere a TokenizerFunction is
required. If you default construct the object, it will just return
every character in the parsed sequence as a token (i.e., it defaults to an
offset of 1, and bwrapoffsets is true).</p>
<p>&nbsp;</p>
<h2>Model of</h2>
<p><a href="tokenizerfunction.htm">TokenizerFunction</a></p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

@@ -0,0 +1,171 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Token Iterator</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p>
<h1 align="center">Token Iterator</h1>
<pre>
template &lt;
class TokenizerFunc = char_delimiters_separator&lt;char&gt;,
class Iterator = std::string::const_iterator,
class Type = std::string
&gt;
class token_iterator_generator
</pre>
<pre>
template&lt;class Type, class Iterator, class TokenizerFunc&gt;
typename token_iterator_generator&lt;TokenizerFunc,Iterator,Type&gt;::type
make_token_iterator(Iterator begin, Iterator end,const TokenizerFunc&amp; fun)
</pre>
<p>The token iterator serves to provide an iterator view of the tokens in a
parsed sequence.</p>
<h2>Example</h2>
<pre>
// simple_example_5.cpp
#include &lt;iostream&gt;
#include &lt;boost/token_iterator.hpp&gt;
#include &lt;string&gt;
int main(){
   using namespace std;
   using namespace boost;
   string s = "12252001";
   int offsets[] = {2,2,4};
   offset_separator f(offsets, offsets+3);
   typedef token_iterator_generator&lt;offset_separator&gt;::type Iter;
   Iter beg = make_token_iterator&lt;string&gt;(s.begin(),s.end(),f);
   Iter end = make_token_iterator&lt;string&gt;(s.end(),s.end(),f);
   // The above statement could also have been what is below
   // Iter end;
   for(;beg!=end;++beg){
       cout &lt;&lt; *beg &lt;&lt; "\n";
   }
}
</pre>
<p>&nbsp;</p>
<h3>Template Parameters</h3>
<table border="1" summary="">
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
<tr>
<td><tt>TokenizerFunc</tt></td>
<td>The TokenizerFunction used to parse the sequence.</td>
</tr>
<tr>
<td><tt>Iterator</tt></td>
<td>The type of the iterator that specifies the sequence.</td>
</tr>
<tr>
<td><tt>Type</tt></td>
<td>The type of the token, typically string.</td>
</tr>
</table>
<h2>Model of</h2>
<p>The category of Iterator, up to and including Forward Iterator. Anything
higher will get scaled down to Forward Iterator.</p>
<h2>Related Types</h2>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Type</strong></p>
</td>
<td>
<p align="center"><strong>Remarks</strong></p>
</td>
</tr>
<tr>
<td>token_iterator_generator::type</td>
<td>The type of the token iterator.</td>
</tr>
</table>
<h2>Creation</h2>
<pre>
template&lt;class Type, class Iterator, class TokenizerFunc&gt;
typename token_iterator_generator&lt;TokenizerFunc,Iterator,Type&gt;::type
make_token_iterator(Iterator begin, Iterator end,const TokenizerFunc&amp; fun)
</pre>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Parameter</strong></p>
</td>
<td>
<p align="center"><strong>Description</strong></p>
</td>
</tr>
<tr>
<td>begin</td>
<td>The beginning of the sequence to be parsed.</td>
</tr>
<tr>
<td>end</td>
<td>Past the end of the sequence to be parsed.</td>
</tr>
<tr>
<td>fun</td>
<td>A functor that is a model of TokenizerFunction.</td>
</tr>
</table>
<p>&nbsp;</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

View File

@@ -0,0 +1,244 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>Boost Tokenizer Class</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p>
<h1 align="center">Tokenizer Class</h1>
<pre> template &lt;
class TokenizerFunc = char_delimiters_separator&lt;char&gt;,
class Iterator = std::string::const_iterator,
class Type = std::string
&gt;
class tokenizer
</pre>
<p>The tokenizer class provides a container view of a series of tokens
contained in a sequence. You set the sequence to parse and the
TokenizerFunction to use to parse the sequence either upon construction or
using the assign member function. Note: No parsing is actually done upon
construction. Parsing is done on demand as the tokens are accessed via the
iterator provided by begin.</p>
<h2>Example</h2>
<pre>// simple_example_1.cpp
#include&lt;iostream&gt;
#include&lt;boost/tokenizer.hpp&gt;
#include&lt;string&gt;
int main(){
using namespace std;
using namespace boost;
string s = "This is, a test";
tokenizer&lt;&gt; tok(s);
for(tokenizer&lt;&gt;::iterator beg=tok.begin(); beg!=tok.end();++beg){
cout &lt;&lt; *beg &lt;&lt; "\n";
}
}
</pre>
<p>The output from simple_example_1 is:</p>
<blockquote>
<p><code>This<br>
is<br>
a<br>
test</code></p>
</blockquote>
<h3>Template Parameters</h3>
<table border="1" summary="">
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
<tr>
<td><tt>TokenizerFunc</tt></td>
<td>The TokenizerFunction used to parse the sequence.</td>
</tr>
<tr>
<td><tt>Iterator</tt></td>
<td>The type of the iterator that specifies the sequence.</td>
</tr>
<tr>
<td><tt>Type</tt></td>
<td>The type of the token, typically string.</td>
</tr>
</table>
<p>&nbsp;</p>
<h2>Related Types</h2>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Type</strong></p>
</td>
<td>
<p align="center"><strong>Remarks</strong></p>
</td>
</tr>
<tr>
<td>iterator</td>
<td>The type returned by begin and end. Note: the category of iterator
will be at most ForwardIterator. It will be InputIterator if the
Iterator template parameter is an InputIterator. For any other
category, it will be ForwardIterator.</td>
</tr>
<tr>
<td>const_iterator</td>
<td>Same type as iterator.</td>
</tr>
<tr>
<td>value_type</td>
<td>Same type as the template parameter Type</td>
</tr>
<tr>
<td>reference</td>
<td>Same type as value_type&amp;</td>
</tr>
<tr>
<td>const_reference</td>
<td>Same type as const reference</td>
</tr>
<tr>
<td>pointer</td>
<td>Same type as value_type*</td>
</tr>
<tr>
<td>const_pointer</td>
<td>Same type as const pointer</td>
</tr>
<tr>
<td>size_type</td>
<td>void</td>
</tr>
<tr>
<td>difference_type</td>
<td>void</td>
</tr>
</table>
<p>&nbsp;</p>
<h2>Construction and Member Functions</h2>
<pre>tokenizer(Iterator first, Iterator last,const TokenizerFunc&amp; f = TokenizerFunc())
template&lt;class Container&gt;
tokenizer(const Container&amp; c,const TokenizerFunc&amp; f = TokenizerFunc())
void assign(Iterator first, Iterator last)
void assign(Iterator first, Iterator last, const TokenizerFunc&amp; f)
template&lt;class Container&gt;
void assign(const Container&amp; c)
template&lt;class Container&gt;
void assign(const Container&amp; c, const TokenizerFunc&amp; f)
iterator begin() const
iterator end() const
</pre>
<table border="1" summary="">
<tr>
<td>
<p align="center"><strong>Parameter</strong></p>
</td>
<td>
<p align="center"><strong>Description</strong></p>
</td>
</tr>
<tr>
<td>c</td>
<td>A container that contains the sequence to parse. Note: c.begin()
and c.end() must be convertible to the template parameter
Iterator.</td>
</tr>
<tr>
<td>f</td>
<td>A functor that is a model of TokenizerFunction that will be used to
parse the sequence.</td>
</tr>
<tr>
<td>first</td>
<td>The iterator that represents the beginning position in the sequence
to be parsed.</td>
</tr>
<tr>
<td>last</td>
<td>The iterator that represents the past the end position in the
sequence to be parsed.</td>
</tr>
</table>
<p>&nbsp;</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->16 February, 2008<!--webbot bot="Timestamp" endspan i-checksum="40414" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>

View File

@@ -0,0 +1,182 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>TokenizerFunction Concept</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../../boost.png" alt="C++ Boost" width="277" height="86"></p>
<h1 align="center">TokenizerFunction Concept</h1>
<p>A TokenizerFunction is a functor that parses a given sequence until
exactly one token has been found or the end of the sequence is reached. It
then updates the token and tells the caller where parsing should resume: at
the element immediately after the last element that was consumed for the
current token.</p>
<h2>Refinement of</h2>
<p>Assignable, CopyConstructible</p>
<h2>Notation</h2>
<table border="1" summary="">
<tr>
<td valign="top"><tt>X</tt></td>
<td valign="top">A type that is a model of TokenizerFunction</td>
</tr>
<tr>
<td valign="top"><tt>func</tt></td>
<td valign="top">Object of type <tt>X</tt></td>
</tr>
<tr>
<td valign="top"><tt>tok</tt></td>
<td valign="top">Object of type Token</td>
</tr>
<tr>
<td>next</td>
<td>iterator that points to the first unparsed element of the sequence
being parsed</td>
</tr>
<tr>
<td>end</td>
<td>iterator that points past the end of the sequence being
parsed</td>
</tr>
</table>
<h2>Definitions</h2>
<p>A token is the result of parsing a sequence.</p>
<h2>Valid expressions</h2>
<p>In addition to the expressions in Assignable and CopyConstructible, the
following expressions are valid:</p>
<table border="1" summary="">
<tr>
<th>Name</th>
<th>Expression</th>
<th>Return type</th>
</tr>
<tr>
<td valign="top">Functor</td>
<td valign="top"><tt>func(next, end, tok)</tt></td>
<td valign="top"><tt>bool</tt></td>
</tr>
<tr>
<td valign="top">reset</td>
<td valign="top"><tt>func.reset()</tt></td>
<td valign="top"><tt>void</tt></td>
</tr>
</table>
<h2>Expression semantics</h2>
<p>In addition to the expression semantics in Assignable and
CopyConstructible, TokenizerFunction has the following expression
semantics:</p>
<table border="1" summary="">
<tr>
<th>Name</th>
<th>Expression</th>
<th>Precondition</th>
<th>Semantics</th>
<th>Postcondition</th>
</tr>
<tr>
<td>operator()</td>
<td><tt>func(next, end, tok)</tt></td>
<td><tt>next</tt> and <tt>end</tt> are valid iterators to the same
sequence. <tt>next</tt> is passed by reference, and the function is free
to modify it. <tt>tok</tt> is already constructed.</td>
<td>The return value indicates whether a new token was found in the
sequence [next,end).</td>
<td>If the return value is true, the new token is assigned to tok. next
is always updated to the position where parsing should start on the
subsequent call.</td>
</tr>
<tr>
<td>reset</td>
<td><tt>func.reset()</tt></td>
<td><tt>None</tt></td>
<td>Clears out all state variables that are used by the object in
parsing the current sequence.</td>
<td>A new sequence to parse can be given.</td>
</tr>
</table>
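<p>To make the requirements above concrete, here is a minimal sketch of a
type that is intended to model TokenizerFunction. The names
<tt>digit_run_separator</tt> and <tt>digit_runs</tt> are hypothetical and not
part of Boost. The driver function calls the functor by hand, roughly the way
a token iterator would, so the sketch depends only on the standard
library.</p>

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical model of TokenizerFunction (not part of Boost): each token is
// a maximal run of decimal digits; every other character is skipped.
struct digit_run_separator {
    // The functor keeps no per-sequence state, so reset() has nothing to clear.
    void reset() {}

    template <class InputIterator, class Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        // Skip leading non-digits.
        while (next != end && !std::isdigit(static_cast<unsigned char>(*next))) {
            ++next;
        }
        if (next == end) {
            return false;  // no further token in [next, end)
        }
        // Collect the run of digits into a fresh token.
        Token found;
        while (next != end && std::isdigit(static_cast<unsigned char>(*next))) {
            found += *next;
            ++next;
        }
        tok = found;
        return true;  // next now marks where the following call should start
    }
};

// Hypothetical driver: applies the functor repeatedly until it reports
// that no token is left.
std::vector<std::string> digit_runs(const std::string& text) {
    digit_run_separator func;
    func.reset();
    std::string::const_iterator next = text.begin();
    std::string tok;
    std::vector<std::string> runs;
    while (func(next, text.end(), tok)) {
        runs.push_back(tok);
    }
    return runs;
}
```

<p>Because the functor is copyable, assignable, and provides both
<tt>reset()</tt> and the three-argument call operator, it should also be
usable as the <tt>TokenizerFunc</tt> argument of <tt>tokenizer</tt>, although
that pairing is not exercised here.</p>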
<h2>Complexity guarantees</h2>
<p>No guarantees. Models of TokenizerFunction are free to define their own
complexity.</p>
<h2>Models</h2>
<p><a href="escaped_list_separator.htm">escaped_list_separator</a></p>
<p><a href="offset_separator.htm">offset_separator</a></p>
<p><a href=
"char_delimiters_separator.htm">char_delimiters_separator</a></p>
<p>&nbsp;</p>
<hr>
<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>
<p><i>Copyright &copy; 2001 John R. Bandela</i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>