Possibility to exclude certain strings from parsing #11

ges1227 · 2017-10-19T14:05:37Z

Hey @simonpoole,
I tried to parse some of the following examples:

daily 05.00 am - 09.00 pm
and also
06.00 a.m. - 07.00 p.m..

Unfortunately, neither passed the parser's non-strict mode, because of the fragments 'daily' and 'a.m.'. Is support for these planned?

In the meantime, would it be possible to provide some additional functionality to help us out? Perhaps a way for the user to exclude certain strings such as 'a.m.', 'daily' and others by defining them in advance?


simonpoole · 2017-10-19T15:10:45Z

In general there are two ways this could be done:

- restarting parsing after it fails for unknown tokens (however this wouldn't result in a valid spec in many cases)
- skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense
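
The second option can be approximated outside the parser by stripping a fixed list of words before the string reaches the lexer. The following is a minimal sketch under that assumption; the class `SkipWords`, its method `strip`, and the contents of the skip list are hypothetical and not part of OpeningHoursParser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-lexing filter; not part of OpeningHoursParser.
final class SkipWords {
    // Assumed skip list. A real list would have to be based on observed usage.
    private static final List<String> SKIP = Arrays.asList("daily");

    // Removes every whole-word occurrence of a skip-list entry, ignoring case,
    // then collapses the whitespace left behind.
    static String strip(String input) {
        String result = input;
        for (String word : SKIP) {
            // \b on both sides keeps the entry from matching inside a longer word
            Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll(" ");
        }
        return result.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // prints "05.00 am - 09.00 pm"
        System.out.println(strip("daily 05.00 am - 09.00 pm"));
    }
}
```

Word-boundary matching like this only suits entries that begin and end with a letter or digit; something like 'a.m.' ends in a period and would need its own pattern.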

As to "a.m." - "p.m.", the same goes as above: there needs to be a non-negligible amount of use for adding these to make sense. We already have a bad case of diminishing returns with many of the special cases we handle in non-strict mode.

ypid · 2017-10-19T17:39:41Z

> skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense

Maybe that helps: https://github.com/opening-hours/opening_hours.js/blob/master/locales/word_error_correction.yaml

simonpoole · 2017-10-22T13:55:33Z

@ges1227 I've done some work on this by skipping such tokens in lexical analysis. This works well from a purely functional point of view. Unfortunately, it makes the implementation of the parser's strict/non-strict modes rather messy, and it forces us to throw a Java Error instead of an Exception if we detect such a token in strict mode, which will cause validators to moan endlessly (this is due to an architectural wart of JavaCC). So I'm not quite sure whether the code should really be included. I need to think about it a bit.
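
To illustrate why the Error/Exception distinction matters to callers, here is a stand-alone sketch that does not use JavaCC. All class and method names in it (`StrictModeLexicalError`, `ThrowableDemo`, `lex`, `classify`) are hypothetical stand-ins, and the check for 'daily' only simulates a lexer rejecting a skippable token in strict mode.

```java
// Hypothetical stand-in for an Error raised from the lexer side.
final class StrictModeLexicalError extends Error {
    StrictModeLexicalError(String message) {
        super(message);
    }
}

final class ThrowableDemo {
    // Simulated lexer step: in strict mode a skippable token is rejected.
    static void lex(String input, boolean strict) {
        if (strict && input.contains("daily")) {
            throw new StrictModeLexicalError("unexpected token: daily");
        }
    }

    // Shows which handler a typical caller would end up in.
    static String classify(String input, boolean strict) {
        try {
            lex(input, strict);
            return "ok";
        } catch (Exception e) {
            // The usual handler; an Error does not land here.
            return "exception";
        } catch (Error e) {
            // Needs an explicit handler, which code checkers tend to flag.
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("daily 05.00 am - 09.00 pm", true));  // prints "error"
        System.out.println(classify("daily 05.00 am - 09.00 pm", false)); // prints "ok"
    }
}
```

A caller that only catches Exception would let the strict-mode failure propagate, so every caller has to handle Error explicitly.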

ges1227 · 2017-10-26T12:21:14Z

@simonpoole, @ypid Thanks for your support, it really brought me further!
So I have been experimenting with the YAML file and achieved a rather satisfying solution. Basically, my input strings for the OpeningHoursParser are filtered by replacing unwanted words according to the definitions in the YAML file (small example attached).

So no more worries about a messy parser; your tips helped me manage the problem.
As a side note, the code is far from perfect. Maybe two YAML files (one for regexes, one for 'normal' words) would make it easier to distinguish whether a regex or a 'normal' replacement should be applied to the string.
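
The attached example is not reproduced in this thread. As a rough illustration of the idea, the sketch below keeps regex rules and literal rules in two separate tables, mirroring the two-file split suggested above. It is not the attached code: the class `InputFilter`, its methods, and the rules themselves are hypothetical, and the tables are filled in code rather than loaded from YAML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical input filter applied before the string is handed to the parser.
final class InputFilter {
    // Two separate rule tables, standing in for the two YAML files.
    private final Map<Pattern, String> regexRules = new LinkedHashMap<>();
    private final Map<String, String> literalRules = new LinkedHashMap<>();

    InputFilter regex(String pattern, String replacement) {
        regexRules.put(Pattern.compile(pattern, Pattern.CASE_INSENSITIVE), replacement);
        return this;
    }

    InputFilter literal(String word, String replacement) {
        literalRules.put(word, replacement);
        return this;
    }

    // Applies regex rules first, then literal rules, in insertion order,
    // and finally collapses leftover whitespace.
    String apply(String input) {
        String result = input;
        for (Map.Entry<Pattern, String> rule : regexRules.entrySet()) {
            result = rule.getKey().matcher(result).replaceAll(rule.getValue());
        }
        for (Map.Entry<String, String> rule : literalRules.entrySet()) {
            result = result.replace(rule.getKey(), rule.getValue());
        }
        return result.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        InputFilter filter = new InputFilter()
                .regex("\\ba\\.m\\.", "am")   // assumed rule: "a.m." -> "am"
                .regex("\\bp\\.m\\.", "pm")   // assumed rule: "p.m." -> "pm"
                .literal("daily", "");        // assumed rule: drop "daily"
        System.out.println(filter.apply("06.00 a.m. - 07.00 p.m.")); // prints "06.00 am - 07.00 pm"
        System.out.println(filter.apply("daily 05.00 am - 09.00 pm")); // prints "05.00 am - 09.00 pm"
    }
}
```

Keeping the tables apart means a literal such as 'a.m.' never has to be escaped by hand, while regex rules stay free to use boundaries and character classes.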

simonpoole · 2017-10-26T21:01:05Z

a.m. and p.m. are now supported via 0acfaa6

simonpoole added the enhancement label Oct 19, 2017

simonpoole mentioned this issue Oct 27, 2017

Remove strings before auto correction simonpoole/OpeningHoursFragment#14

