GParse/readme.md
2025-03-21 00:45:01 -06:00


# GParse ![(Beta)](https://img.shields.io/badge/BETA-yellow?style=plastic)
A library of useful delimited-text parsers with a common interface.
## Description
All parsers in this library implement the `ITextParser` interface, which has a single method:
```csharp
public IEnumerable<string> Parse(...);
```
To support deferred execution or more dynamic input, a text provider interface can be used instead of a string for the input.
```csharp
public interface ITextInputProvider
{
    public string GetText();
}
```
The library contains these parsers:
|Class Name|Description|
|----:|:----|
|SplitParser |A simple wrapper around the .NET string.Split() method. |
|DelimitedTextParser |A custom implementation that looks for delimiters of any size. May be modified in the future to accept multiple delimiters. |
|QuoteAwareParser | A delimited text parser which knows to ignore instances of the delimiter when it is found within quotes. Useful for space-delimited files where the fields are human-readable text and may contain spaces, for example. |
## Usage Instructions
- Create an instance of your chosen parser.
- Call the `Parse()` method with your delimited input.
- Execution is deferred, so iterate the returned collection to retrieve the values.
See the details for each class below for more information and examples of usage.
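The deferred execution mentioned above is ordinary C# iterator behavior. The sketch below uses a hypothetical stand-in class, `DeferredDemo`, which is not part of GParse, to show that nothing is parsed until the sequence is enumerated and that each enumeration runs the parse again. It assumes GParse's parsers behave like a typical iterator method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a parser; NOT part of GParse.
public static class DeferredDemo
{
    // Counts how many times the iterator body has started running.
    public static int Runs;

    // An iterator method: its body does not execute until enumeration begins.
    public static IEnumerable<string> Parse(string input, char delimiter)
    {
        Runs++;
        foreach (string token in input.Split(delimiter))
        {
            yield return token;
        }
    }
}

// Usage:
//   IEnumerable<string> tokens = DeferredDemo.Parse("1|2|3", '|'); // Runs is still 0
//   List<string> list = tokens.ToList();                           // Runs is now 1
//   int count = tokens.Count();                                    // Runs is now 2
```

If the values are needed more than once, materializing them with `ToList()` avoids parsing the input again.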
## ITextParser Interface
The `ITextParser` interface has only one method, `Parse()`. There are overloads to take two types of input, a `string` or an `ITextInputProvider`.
## ITextInputProvider Interface
The `ITextInputProvider` interface acts as a string factory. An implementation can defer collecting the input string until the moment it is needed, or it can wrap a function instead of a literal value so that the input can be parameterized.
```csharp
public interface ITextInputProvider
{
    public string GetText();
}
```
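As an illustration of deferring input collection, the sketch below implements the interface with a provider that reads a file only when `GetText()` is called. `FileTextInputProvider` is a hypothetical name and is not part of GParse. The interface is repeated so that the block compiles on its own.

```csharp
using System.IO;

// Repeated from above so this sketch is self-contained.
public interface ITextInputProvider
{
    public string GetText();
}

// Hypothetical example class; NOT part of GParse.
public sealed class FileTextInputProvider : ITextInputProvider
{
    private readonly string path;

    public FileTextInputProvider(string path)
    {
        this.path = path;
    }

    // The file is read here, on demand, rather than in the constructor.
    public string GetText()
    {
        return File.ReadAllText(path);
    }
}
```

Because the read is deferred, a provider constructed before the file is written still returns the file's contents as of the moment `GetText()` is called.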
## AnonDelimitedTextInputProvider Class
For convenience, the `AnonDelimitedTextInputProvider` class has been included to provide a universal implementation of `ITextInputProvider`. Its `GetText()` implementation is provided as a function through its constructor.
_Example:_
```csharp
ITextInputProvider provider = new AnonDelimitedTextInputProvider(
    static () => Console.ReadLine());
Console.WriteLine(provider.GetText());
```
## SplitParser
The `SplitParser` class is a thin wrapper around .NET's `string.Split()` method that exposes it through the `ITextParser` interface.
_Example:_
```csharp
ITextParser parser = new SplitParser("|");
IEnumerable<string> oneTwoThree = parser.Parse("1|2|3");
```
## DelimitedTextParser
The `DelimitedTextParser` class is a custom replacement for .NET's `string.Split()` method. Having a custom implementation leaves room to add features in the future; the planned ones are listed in the Roadmap.
_Example:_
```csharp
ITextParser parser = new DelimitedTextParser("|");
IEnumerable<string> oneTwoThree = parser.Parse("1|2|3");
```
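To give a feel for what scanning for a delimiter of any size involves, here is a minimal sketch built on `string.IndexOf`. It is an illustration only: `DelimiterScanSketch` is a hypothetical name, and GParse's `DelimitedTextParser` may be implemented quite differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; NOT GParse's actual implementation.
public static class DelimiterScanSketch
{
    // Validates eagerly, then hands off to the deferred iterator below.
    public static IEnumerable<string> Split(string input, string delimiter)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (string.IsNullOrEmpty(delimiter))
        {
            throw new ArgumentException("Delimiter must not be empty.", nameof(delimiter));
        }
        return Scan(input, delimiter);
    }

    private static IEnumerable<string> Scan(string input, string delimiter)
    {
        int start = 0;
        while (true)
        {
            // Find the next occurrence of the whole delimiter string.
            int next = input.IndexOf(delimiter, start, StringComparison.Ordinal);
            if (next < 0)
            {
                // No more delimiters: the rest of the input is the last token.
                yield return input.Substring(start);
                yield break;
            }
            yield return input.Substring(start, next - start);
            start = next + delimiter.Length;
        }
    }
}

// Usage:
//   DelimiterScanSketch.Split("a::b::c", "::") yields "a", "b", "c"
```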
## QuoteAwareParser
The `QuoteAwareParser` class ignores delimiters that it finds within quotes. This is useful, for example, for space-delimited input whose tokens are human-readable text and are likely to contain spaces. The constructor accepts `openQuote` and `closeQuote` parameters, so the quotes need not be actual quotation characters. They can be any string.
If the parsed text has an open quote without a corresponding closing quote, a `ParseException` is thrown.
_Example:_
```csharp
ITextParser parser = new QuoteAwareParser(" ", "{", "}");
List<string> containsSpacesText = parser
    .Parse("{This contains spaces} {and so does this}")
    .ToList();
Console.WriteLine(containsSpacesText[0]);
Console.WriteLine(containsSpacesText[1]);
// Output is:
//{This contains spaces}
//{and so does this}
```
Note in the example above that the `{` and `}` characters are not removed from the tokens during parsing. The quotation characters are maintained. It is up to the caller to remove them if that is what's desired. To facilitate this, see the `string.Unquote()` extension method.
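The behavior described in this section (delimiters inside quotes are ignored, quotes are kept in the tokens, and an unclosed quote is an error) can be sketched as follows. This is a hypothetical illustration, not GParse's implementation: `QuoteAwareSketch` is an invented name, and `FormatException` stands in for the library's `ParseException`. It assumes the delimiter and both quote strings are non-empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; NOT GParse's actual implementation.
public static class QuoteAwareSketch
{
    // True when 'value' occurs in 'text' starting exactly at 'index'.
    private static bool MatchesAt(string text, int index, string value)
    {
        return string.CompareOrdinal(text, index, value, 0, value.Length) == 0;
    }

    public static IEnumerable<string> Split(
        string input, string delimiter, string openQuote, string closeQuote)
    {
        int tokenStart = 0;
        int i = 0;
        while (i < input.Length)
        {
            if (MatchesAt(input, i, openQuote))
            {
                // Jump past the matching close quote; delimiters in between are ignored.
                int close = input.IndexOf(
                    closeQuote, i + openQuote.Length, StringComparison.Ordinal);
                if (close < 0)
                {
                    // Stand-in for the ParseException that the library throws.
                    throw new FormatException(
                        "Open quote at index " + i + " has no closing quote.");
                }
                i = close + closeQuote.Length;
            }
            else if (MatchesAt(input, i, delimiter))
            {
                // A delimiter outside quotes ends the current token.
                yield return input.Substring(tokenStart, i - tokenStart);
                i += delimiter.Length;
                tokenStart = i;
            }
            else
            {
                i++;
            }
        }
        // Whatever remains is the last token; quotes are left in place.
        yield return input.Substring(tokenStart);
    }
}

// Usage:
//   QuoteAwareSketch.Split("{This contains spaces} {and so does this}", " ", "{", "}")
//   yields "{This contains spaces}" and "{and so does this}"
```

Because the sketch is an iterator, its exception surfaces only when the sequence is enumerated. Whether GParse reports the error at the same point is not something this sketch can confirm.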
## string.Unquote() Extension Method
The `string.Unquote()` extension method removes the quotes that `QuoteAwareParser` leaves in place during its `Parse()` operation. Call it on the token string and pass the opening and closing quotation strings. Here is the `QuoteAwareParser` example revised to use it after parsing.
```csharp
ITextParser parser = new QuoteAwareParser(" ", "{", "}");
List<string> containsSpacesText = parser
    .Parse("{This contains spaces} {and so does this}")
    .Select(static s => s.Unquote("{", "}"))
    .ToList();
Console.WriteLine(containsSpacesText[0]);
Console.WriteLine(containsSpacesText[1]);
// Output is:
//This contains spaces
//and so does this
```
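For readers curious about what such an extension method involves, here is a minimal sketch under the assumption that quotes are stripped only when the token both starts with the opening string and ends with the closing string. The method is named `UnquoteSketched` to make clear that it is hypothetical; the library's own `Unquote()` may treat edge cases differently.

```csharp
using System;

// Hypothetical sketch; GParse's own Unquote() may behave differently.
public static class UnquoteSketch
{
    public static string UnquoteSketched(
        this string token, string openQuote, string closeQuote)
    {
        // The length check stops a one-character token such as "\"" from
        // being counted as both its own opening and closing quote.
        bool wrapped =
            token.Length >= openQuote.Length + closeQuote.Length &&
            token.StartsWith(openQuote, StringComparison.Ordinal) &&
            token.EndsWith(closeQuote, StringComparison.Ordinal);

        return wrapped
            ? token.Substring(
                openQuote.Length,
                token.Length - openQuote.Length - closeQuote.Length)
            : token;
    }
}

// Usage:
//   "{abc}".UnquoteSketched("{", "}") returns "abc"
//   "abc".UnquoteSketched("{", "}")   returns "abc" (unchanged)
```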
## Roadmap
### AutoParser
If it proves useful and feasible, create a parser factory that inspects an input sample and determines which parser to use and with which parameters. For example, if the sample contains spaces and an even number of single or double quotes, it calls for a quote-aware parser, and the delimiter is whatever appears between the quoted sections. For data without quotes, the delimiter can be inferred if the sample contains only one distinct non-alphanumeric, non-whitespace character; in that case, a `DelimitedTextParser` is used.
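The unquoted-data heuristic could look something like the sketch below. This is one possible reading of the idea, not a committed design: `DelimiterGuessSketch` and `GuessDelimiter` are hypothetical names, and the sketch treats "only one" as "only one distinct" character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the roadmap idea; nothing like this exists in GParse yet.
public static class DelimiterGuessSketch
{
    // Returns the single candidate delimiter, or null when the sample is ambiguous
    // (no candidate at all, or more than one distinct candidate).
    public static char? GuessDelimiter(string sample)
    {
        List<char> candidates = sample
            .Where(static c => !char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c))
            .Distinct()
            .Take(2) // two distinct candidates are enough to know it is ambiguous
            .ToList();

        return candidates.Count == 1 ? candidates[0] : (char?)null;
    }
}

// Usage:
//   DelimiterGuessSketch.GuessDelimiter("1|2|3") returns '|'
//   DelimiterGuessSketch.GuessDelimiter("a,b;c") returns null
```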
### DelimitedTextParser
Future features for the `DelimitedTextParser` class include:
- Multiple delimiters
- Case-insensitive delimiters
- Convert from SafeSubstring to use `Span<char>` and read it one character at a time for performance.
### QuoteAwareParser
Future features for the `QuoteAwareParser` class include:
- Multiple delimiters
- Case-insensitive delimiters
- Convert from SafeSubstring to use `Span<char>` and read it one character at a time for performance.
### Unparsers
Unparsers will reverse the parse, joining an `IEnumerable<string>` into a single concatenated string. This may be moot in light of what LINQ can do, but we'll see if it's more readable or more usable.
Planned unparser capabilities:
- Concatenation of a list of tokens, separated by a delimiter
- Conditional delimiters (omit some delimiters based on predicates)
- Conditional tokens (omit some tokens based on predicates)
- Token transforms / projections (surround a token with brackets, etc.)
- Calculated delimiters (e.g., "1:A", "2:B", "3:C")
- Overall prefix (e.g., "MyPrefix 1,2,3")
- Overall suffix (e.g., "1,2,3 MySuffix")
- Token alignment by space-padding the fields
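For comparison, several of these capabilities can already be approximated with LINQ and `string.Join`. The sketch below covers concatenation, conditional tokens, token transforms, and an overall prefix and suffix. It does not attempt conditional or calculated delimiters or alignment. `UnparseSketch` is a hypothetical name, not a planned API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of what LINQ already offers; not a planned GParse API.
public static class UnparseSketch
{
    public static string Unparse(
        IEnumerable<string> tokens,
        string delimiter,
        Func<string, bool> include,      // conditional tokens
        Func<string, string> transform,  // token transforms / projections
        string prefix = "",
        string suffix = "")
    {
        IEnumerable<string> kept = tokens.Where(include).Select(transform);
        return prefix + string.Join(delimiter, kept) + suffix;
    }
}

// Usage:
//   UnparseSketch.Unparse(
//       new[] { "1", "", "3" }, ",",
//       t => t.Length > 0, t => "[" + t + "]",
//       "MyPrefix ", " MySuffix")
//   returns "MyPrefix [1],[3] MySuffix"
```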
## More Examples
The GParse library is fully unit-tested. You can find examples of use in the unit tests.
## Source Code
You can find the source code online at my Git server.
`https://git.pillidar.com/PillidarPublic/GParse`
## Issues
No known issues.
## Notes
Notes go here.