New code for detecting what an ASCII column contains #8050

Merged
merged 6 commits into from
Nov 19, 2023


PaulWessel
Member

Description of proposed changes

This PR adds a set of new functions dedicated to determining what type of data they are given. These will eventually replace the checks we run on the first input record to detect things like absolute time and geographic coordinates. I have split the work into a number of local functions in gmt_io.c:

GMT_LOCAL bool gmtio_is_integer (char *string)
GMT_LOCAL unsigned int gmtio_count_text (char *string)
GMT_LOCAL unsigned int gmtio_count_digits (char *string)
GMT_LOCAL unsigned int gmtio_count_special (struct GMT_CTRL *GMT, char *string)
GMT_LOCAL bool gmtio_is_valid_integer (char *string, int max)
GMT_LOCAL bool gmtio_bad_char_in_float (char c)
GMT_LOCAL bool gmtio_is_float (struct GMT_CTRL *GMT, char *string, bool allow_exp, double max)
GMT_LOCAL unsigned int gmtio_is_dimension (struct GMT_CTRL *GMT, char *text)
GMT_LOCAL unsigned int gmtio_is_coordinate (struct GMT_CTRL *GMT, unsigned int type, char *text)
GMT_LOCAL unsigned int gmtio_is_time (struct GMT_CTRL *GMT, char *text)
GMT_LOCAL unsigned int gmtio_is_string (struct GMT_CTRL *GMT, char *string)
GMT_LOCAL unsigned int gmtio_determine_datatype (struct GMT_CTRL *GMT, char *text)
GMT_LOCAL char *gmtio_type_name (unsigned int kind)

All of these are called, directly or via one another, from a single library function:

void gmtlib_string_parser (struct GMT_CTRL *GMT, char *file)
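
To give a feel for the kind of check such helpers perform, here is a minimal sketch and not the code in gmt_io.c. All `sketch_*` names are hypothetical, and the unit letters are only the ones that appear in the sample strings.txt; this is not the full set GMT accepts. It distinguishes only FLOAT, DIMENSION, GEODIMENSION and STRING; time and geographic strings fall through to STRING here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes; the real code has more kinds (ABSTIME, LONGITUDE, ...) */
enum sketch_kind { SKETCH_STRING, SKETCH_FLOAT, SKETCH_DIMENSION, SKETCH_GEODIMENSION };

/* True if the first len characters of s form one complete decimal number */
static bool sketch_is_float (const char *s, size_t len) {
	char buf[64], *end;
	size_t k;
	if (len == 0 || len >= sizeof (buf)) return false;
	memcpy (buf, s, len);
	buf[len] = '\0';
	/* Only allow characters of a plain decimal so strtod cannot accept inf, nan or hex */
	for (k = 0; k < len; k++)
		if (!isdigit ((unsigned char)buf[k]) && strchr ("+-.eE", buf[k]) == NULL) return false;
	strtod (buf, &end);
	return (end == buf + len);	/* The whole string must have been consumed */
}

/* Classify a string as float, float plus a trailing unit, or plain string */
static enum sketch_kind sketch_classify (const char *s) {
	size_t len = strlen (s);
	if (sketch_is_float (s, len)) return SKETCH_FLOAT;
	if (len > 1 && sketch_is_float (s, len - 1)) {	/* Number followed by one character */
		char unit = s[len-1];
		if (strchr ("cip", unit)) return SKETCH_DIMENSION;	/* Assumed plot units */
		if (strchr ("kunMd", unit)) return SKETCH_GEODIMENSION;	/* Assumed distance units */
	}
	return SKETCH_STRING;
}
```

Calling `sketch_classify ("55.c")` yields `SKETCH_DIMENSION`, while `sketch_classify ("55.dms")` yields `SKETCH_STRING` because the text before the final character is not a complete number. Note that this sketch does not reproduce every expected answer in strings.txt: for example, `-e+05` is listed there as FLOAT but is rejected by `strtod` and so ends up as a string here.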

At the moment it is only used as a guru test via gmtconvert -/, a hidden option. So this PR has zero effect on the code, and all other tests pass since nothing is checked by this function (yet). I added a new test (test/gmtconvert/testparser.sh) that reads a bunch of strings from the strings.txt file and tries to determine the type of data column (LON, LAT, GEO, TIME, STRING, DIMENSION, GEODIMENSION, DURATION). It currently passes. We need to add things to strings.txt to see if we can break the parser. Once it is robust I will replace the current fickle checks with these solid checks. For now I just want to merge this into master so more people can be asked to participate in trying to break the checks.

The current strings.txt file looks like this, with the string on the left and the expected answer on the right, after the vertical bar.

# Test for the STRING detector:
Myfiles			|STRING
HO88&3#			|STRING
565.565.7654		|STRING
<myvariable>$$		|STRING
http://A44B44		|STRING
Testtring		|STRING
098sdf098r		|STRING
nbnidf			|STRING
44:55:61		|STRING
55.cip			|STRING
55.dms			|STRING
-44.8cm			|STRING
# Test for the FLOAT detector:
-33.56			|FLOAT
-0.8e+04		|FLOAT
-e+05			|FLOAT
+15			|FLOAT
# Test for the DIMENSION detector:
3.55p			|DIMENSION
-13.55c			|DIMENSION
443.55i			|DIMENSION
55.c			|DIMENSION
# Test for the GEODIMENSION detector:
3.55k			|GEODIMENSION
-13.55u			|GEODIMENSION
443.55n			|GEODIMENSION
55.M			|GEODIMENSION
55.d			|GEODIMENSION
# Test for the ABSTIME detector:
2000-12-04T		|ABSTIME
2000-12-04T21:09:15.5	|ABSTIME
1998-09-03		|ABSTIME
1998-OCT-01T12		|ABSTIME
12/16/2022T		|ABSTIME
7654/11/31T22:23:24.07	|ABSTIME
# Test for the LONGITUDE detector:
-33:45.8		|GEOGRAPHIC
-355:16:44		|LONGITUDE
-355:16:44.45454	|LONGITUDE
33:45.8W		|LONGITUDE
355:16:44E		|LONGITUDE
355:16:44.45454W	|LONGITUDE
# Test for the LATITUDE detector:
33:45.8S		|LATITUDE
55:16:44N		|LATITUDE
55:16:44.45454S		|LATITUDE
# Test for the GEOGRAPHIC detector:
-35:16:44.45454		|GEOGRAPHIC
-33:45.8		|GEOGRAPHIC
75:45:55		|GEOGRAPHIC
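
For anyone adding cases, the line layout above (string, tabs, vertical bar, answer, with `#` comment lines) can be split with a few lines of C. This is only a sketch of one way to read the file; `sketch_split_case` is a hypothetical helper and not something the PR or testparser.sh is known to contain.

```c
#include <stdbool.h>
#include <string.h>

/* Split one line of the form "<string><tabs>|<ANSWER>" in place.
 * Returns false for comment lines, blank lines and lines without a bar. */
static bool sketch_split_case (char *line, char **input, char **expected) {
	char *bar, *end;
	if (line[0] == '#' || line[0] == '\0' || line[0] == '\n') return false;
	bar = strrchr (line, '|');	/* Use the last bar so the test string may contain one */
	if (bar == NULL) return false;
	*bar = '\0';
	*expected = bar + 1;
	(*expected)[strcspn (*expected, "\r\n")] = '\0';	/* Chop off any line ending */
	end = bar;
	while (end > line && (end[-1] == '\t' || end[-1] == ' ')) *--end = '\0';	/* Trim padding */
	*input = line;
	return (line[0] != '\0');
}
```

Given a writable buffer holding `55.c`, three tabs and `|DIMENSION`, the helper sets `input` to `55.c` and `expected` to `DIMENSION`. Splitting at the last bar rather than the first is a design choice that lets a test string itself contain a vertical bar.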

Much more exhaustive search for what input strings might be: geographical, time, dimensions, or just strings.
Also add new test
@PaulWessel PaulWessel added the maintenance Boring but important stuff for the core devs label Nov 19, 2023
@PaulWessel PaulWessel requested a review from joa-quim November 19, 2023 16:54
@PaulWessel PaulWessel self-assigned this Nov 19, 2023
Member

@joa-quim joa-quim left a comment


I think I might have a use for gmtio_is_coordinate(), gmtio_is_time(), gmtio_is_string() and even gmtio_determine_datatype() in Julia, so don't make those GMT_LOCAL.

@PaulWessel
Member Author

I think I might have a use for gmtio_is_coordinate(), gmtio_is_time(), gmtio_is_string() and even gmtio_determine_datatype() in Julia, so don't make those GMT_LOCAL.

OK, done (gmtio* became gmtlib* and the GMT_LOCALs are gone for the ones you mentioned). I think the rest are OK as static. If you change your mind on that, just do a PR and put the prototypes in gmt_internals.h.

@PaulWessel PaulWessel merged commit a8f05c9 into master Nov 19, 2023
@PaulWessel PaulWessel deleted the new-input-checks branch November 19, 2023 18:03
Labels
maintenance Boring but important stuff for the core devs