Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kart truncates trailing zeroes in values of type numeric #863

Closed
craigds opened this issue Jun 9, 2023 · 10 comments · Fixed by #867
Closed

Kart truncates trailing zeroes in values of type numeric #863

craigds opened this issue Jun 9, 2023 · 10 comments · Fixed by #867
Labels
bug Something isn't working

Comments

@craigds
Copy link
Member

craigds commented Jun 9, 2023

Describe the bug
Importing from a database table where a field has NUMERIC type causes any trailing zeroes in the PK to be stripped.

To Reproduce

  1. init a postgres db with this:
begin;

create table importsource (
    mypk numeric PRIMARY KEY,
    otherfield numeric NULL
);
INSERT INTO importsource (mypk, otherfield) VALUES ('13390', '13390');
INSERT INTO importsource (mypk, otherfield) VALUES ('200000', '200000');
INSERT INTO importsource (mypk, otherfield) VALUES ('12345', '12345');
INSERT INTO importsource (mypk, otherfield) VALUES ('8888', '8888');
INSERT INTO importsource (mypk, otherfield) VALUES ('0', '0');
INSERT INTO importsource (mypk, otherfield) VALUES ('9999', '9999');

commit;
  1. import the table into a Kart repo
kart init --bare xyz
kart import postgresql://postgres:@localhost:15432/postgres importsource:dataset --message='initial' --no-checkout --primary-key=mypk

Expected behaviour

the values should be unchanged

Output

commit ed000ef99004290ba89c878ef649add804b8a076
Author: Craig de Stigter <[email protected]>
Date:   Fri Jun  9 13:07:09 2023 +1200

    initial

+++ dataset:meta:schema.json
+ [
+   {
+     "id": "4f5068af-f107-7712-9a5e-c762426bdb41",
+     "name": "mypk",
+     "dataType": "numeric",
+     "primaryKeyIndex": 0
+   },
+   {
+     "id": "b9dfa572-020a-1f24-d7e4-fb0db17f668e",
+     "name": "otherfield",
+     "dataType": "numeric"
+   }
+ ]
+++ dataset:feature:
+                                     mypk =
+                               otherfield =
+++ dataset:feature:12345
+                                     mypk = 12345
+                               otherfield = 12345
+++ dataset:feature:1339
+                                     mypk = 1339
+                               otherfield = 1339
+++ dataset:feature:2
+                                     mypk = 2
+                               otherfield = 2
+++ dataset:feature:8888
+                                     mypk = 8888
+                               otherfield = 8888
+++ dataset:feature:9999
+                                     mypk = 9999
+                               otherfield = 9999

Values 13390, 200000, and 0 all had their trailing zeroes stripped.

This isn't just a diff issue - the PKs were mangled during the import.

As far as I got with debugging was that it happens inside SQLalchemy (here)

**Version Info **

  • OS: macos and linux
  • Version: kart master (99e48a2)
@craigds craigds added the bug Something isn't working label Jun 9, 2023
@craigds craigds changed the title kart import mangles PK fields with type numeric kart import truncates trailing zeroes in values of type numeric Jun 9, 2023
@craigds
Copy link
Member Author

craigds commented Jun 9, 2023

💥 found it

str(value).rstrip("0").rstrip(".")

@olsen232
Copy link
Collaborator

olsen232 commented Jun 9, 2023

Did a quick test - this affects reading of PG numeric types in general, not just at import time (ie, also at diff / commit time).

@craigds
Copy link
Member Author

craigds commented Jun 9, 2023

Actually I suspect we shouldn't be removing trailing zeroes at all - the idea of numeric is that it's fixed-precision, but removing trailing zeroes even from the right of the decimal point lowers the precision. Any idea why that's there?

@olsen232
Copy link
Collaborator

olsen232 commented Jun 9, 2023

Probably something do with fixing this roundtrip so that nothing changes:
"5" (how we store decimals in Kart / msgpack) -> PG-numeric(5) -> python-decimal(5) -> str -> "5.0" -> "5"
It's pretty blockheaded 😢

@rcoup
Copy link
Member

rcoup commented Jun 9, 2023

Actually I suspect we shouldn't be removing trailing zeroes at all - the idea of numeric is that it's fixed-precision, but removing trailing zeroes even from the right of the decimal point lowers the precision.

Does it?

From the PG docs:

The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits to both sides of the decimal point. The scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.

Both the maximum precision and the maximum scale of a numeric column can be configured. To declare a column of type numeric use the syntax:

NUMERIC(precision, scale)

The precision must be positive, while the scale may be positive or negative (see below). Alternatively:

NUMERIC(precision)

selects a scale of 0. Specifying:

NUMERIC

without any precision or scale creates an “unconstrained numeric” column in which numeric values of any length can be stored, up to the implementation limits. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit useless. If you're concerned about portability, always specify the precision and scale explicitly.)

The latter is PG-specific obviously.

From the above, for a column defined as NUMERIC(12, 3) then IIUC 1 == 1.000? The scale of 3 is fixed, regardless of whether it's present/populated.

@olsen232
Copy link
Collaborator

olsen232 commented Jun 9, 2023

Rob is right - removing trailing zeroes after a decimal point when we store it in the Kart model - which is when this code is fired - won't do anything to what we put in the PG database - it doesn't affect the value of the numeric, and it can't affect the precision of the numeric, which is stored separately.
The reason that we can and probably should remove trailing zeroes after a decimal point (not what this code does, but probably what I intended it to do) is that we should have a canonical form ie we should be storing either "5" or "5.0" not a mixture of both. These strings, which are stored in the kart model, are also what Kart outputs in kart diff or kart show , so it would not be ideal if Kart model would happily store either and would show you that in revision X you changed the value of some numeric field from "5" to "5.0"

@craigds
Copy link
Member Author

craigds commented Jun 9, 2023

unconstrained numeric

is what I was referring to. The precision in the value is supposed to be retained, so 1.000 != 1

@craigds
Copy link
Member Author

craigds commented Jun 9, 2023

it would not be ideal if Kart model would happily store either and would show you that in revision X you changed the value of some numeric field from "5" to "5.0"

I think it should - that's a material change to an unconstrained numeric.

For constrained numerics it matters less but the canonical form ideally would be the one with the intended number of zeroes present

@craigds
Copy link
Member Author

craigds commented Jun 9, 2023

https://github.com/koordinates/kart/blob/master/docs/pages/development/table_v3.rst#data-types doesn't make it clear whether Kart supports unconstrained numerics - maybe we should clarify that and ensure we can roundtrip various types of values sanely between PG and GPKG etc

@craigds craigds changed the title kart import truncates trailing zeroes in values of type numeric Kart truncates trailing zeroes in values of type numeric Jun 9, 2023
@olsen232
Copy link
Collaborator

Hmm you are right - probably we can add a separate "1.0" vs "1" to the Kart model, since:

  • PG has unconstrained numerics as you say
  • Kart model uses strings
  • GPKG approximates numerics as TEXT so just mimics the Kart model

This is a minor issue compared to the main truncating zeroes issue however

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants