Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load a GeoPackage dataset #13

Closed
ErickChacon opened this issue Apr 22, 2023 · 48 comments
Closed

Load a GeoPackage dataset #13

ErickChacon opened this issue Apr 22, 2023 · 48 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@ErickChacon
Copy link
Member

I have a problem loading a GeoPackage dataset (https://drive.google.com/file/d/16--kcGrC56zayrK04-Brmfo3ImOwtQ5k/view?usp=sharing). The dataset can be loaded using ArchGDAL.

import ArchGDAL as AG
using DataFrames
stations = @chain "stations.gpkg" begin
    AG.read()
    AG.getlayer(0)
    DataFrame()
end

However, I can not load it using GeoTables.

using GeoTables
@time stations = GeoTables.load("stations.gpkg")

The previous code does not finish to be executed. I am using the latest version of GeoTables.jl.

@ErickChacon ErickChacon changed the title load a GeoPackage dataset Load a GeoPackage dataset Apr 22, 2023
@juliohm juliohm added bug Something isn't working help wanted Extra attention is needed labels Apr 23, 2023
@juliohm
Copy link
Member

juliohm commented Apr 23, 2023

@ErickChacon I can reproduce the issue. Nailed it down to a specific function call.

First we load the GDAL geometries:

using GeoTables

# auxiliary packages
import ArchGDAL as AG
import Tables

# load layer as a Tables.jl table
data  = AG.read("stations.gpkg")
table = AG.getlayer(data, 0)

# get the column with geometries
cols  = Tables.columns(table)
geoms = Tables.getcolumn(cols, :geom)
2-element Vector{ArchGDAL.IGeometry{ArchGDAL.wkbPoint}}:
 Geometry: POINT (-77.55011 -11.06068)
 Geometry: POINT (-77.36839 -11.36014)

Second we try to convert to Meshes.jl geometries:

# convert GDAL geometries to Meshes.jl geometries
items = GeoTables.geom2meshes.(geoms)

but the function GeoTables.geom2meshes hangs now.

It used to work, but the GeoInterface.jl is a never ending effort to convert geometry types, so our function may be broken now. Can you please check if our implementation is up-to-date and is the most efficient?

Feel free to submit a PR if you discover what is causing the function call to hang now.

@juliohm
Copy link
Member

juliohm commented Apr 23, 2023

Also, consider updating to the latest release that I just triggered with the latest version of ArchGDAL.jl.

@ErickChacon
Copy link
Member Author

Hi @juliohm. Thank for the explanation. This is something that I need so I will take a look to geom2meshes and GeoInterface.jl. I will come back when I find the issue.

@juliohm
Copy link
Member

juliohm commented Apr 26, 2023

Thank you @ErickChacon , this issue must be specific to GDAL because the same function works with Shapefile.jl and GeoJSON.jl geometries.

@ErickChacon
Copy link
Member Author

ErickChacon commented Apr 26, 2023

HI @juliohm. Yes, this is specific of GDAL but is not clear for me why it does not work. Continuing with your example. We can easily conver the first point with:

using Meshes
GI.coordinates(geoms[1]) |> Point

However a simple function like the following does not work (it hangs):

function topoint(geom)
    coords = GI.coordinates(geom)
    Point(coords)
end
topoint(geoms[1])

If I splat the coordinates vector, it works without problem.

function topoint2(geom)
    coords = GI.coordinates(geom)
    Point(coords...)
end
topoint2(geoms[1])

I have tried other options like using the GeoInterface.convert approach, but the same problem occurs when trying to create the Point from the coordinates vector. To isolate the problem I do not load GeoTables at all. Let me know if yout have any thought about how to proceed.

@juliohm
Copy link
Member

juliohm commented Apr 26, 2023

Thank you @ErickChacon for investigating this further. Can you confirm that the other geometry types from GeoJSON.jl and Shapefile.jl work fine with your topoint function?

Our constructors for Point are defined here: https://github.com/JuliaGeometry/Meshes.jl/blob/master/src/points.jl

@juliohm
Copy link
Member

juliohm commented Apr 26, 2023

I also noticed that this behavior only happens inside a function and that doing it manually with GDAL types on the REPL works. It must be one of the GDAL pointer idiosyncrasies...

@ErickChacon
Copy link
Member Author

Exactly, it only happens inside the function. Shapefile.jl works without problems for both functions I presented before.

@juliohm
Copy link
Member

juliohm commented Apr 26, 2023

Worth checking with the maintainers of GDAL.jl and ArchGDAL.jl if there is a better method with GeoInterface.jl to make these conversions. I am not following these developments myself.

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

The return value of GI.coordinates is just a Vector{Float64} like it always has been?

julia> using ArchGDAL

julia> mypoint = ArchGDAL.createpoint(1.0, 5.0)
Geometry: POINT (1 5)

julia> GeoInterface.coordinates(mypoint)
2-element Vector{Float64}:
 1.0
 5.0

GeoInterface.coordinates is a pretty stable interface, and not at all part of our ongoing compatability efforts. So this is probably a Meshes.jl problem.

We usually use GeoInterface.getpoint in an iteration to do these conversions efficiently these days, and they work pretty nicely for everything else, although of course `ArchGDAL.jl will be slower due to the API overheads.

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023 via email

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

julia> f(geom) = GeoInterface.coordinates(geom)
f (generic function with 1 method)

julia> f(mypoint)
2-element Vector{Float64}:
 1.0
 5.0

?

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023 via email

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

I defined a function and ran that

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023 via email

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

I mean it has to be right? Or at least some combination?

Because this works fine everywhere else.

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023 via email

@ErickChacon
Copy link
Member Author

I think it might be a combination of both Meshes.jl and ArchGDAL.jl that does not work. For example, other Meshes constructors for Point work with ArchGDAL.jl like the following:

mypoint = AG.createpoint(1.0, 5.0)
function topoint(geom)
    coords = GI.coordinates(geom)
    Meshes.Point(coords...)
end
topoint(mypoint)

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023 via email

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

Shapefile.jl will be the same as ArchGDAL.jl.

julia> GeoInterface.coordinates(Shapefile.Point(1.0, 5.0))
2-element Vector{Float64}:
 1.0
 5.0

It cant be the return values causing this unless you are testing with another floating point type in the other packages and thats not triggering the bug.

So how are you imagining that Meshes.jl would cause a low-level GDAL error that we don't get without Meshes.jl in the same function?

What is the mechanism? memory use? For this one point that shouldn't be an issue.

(but yes there are definately more problems with using these C interfaces and its not impossible this is one of them)

@ErickChacon
Copy link
Member Author

Not sure where because the lack of error trace. I also tried using Debugger.jl, the wierd thing is that it it does not hang when calling with @enter topoint(mypoint)

But which constructor is hanging? Is it a Julia issue or a low-level issue in GDAL?

@ErickChacon
Copy link
Member Author

In particular I am exploring the following contructors https://github.com/JuliaGeometry/Meshes.jl/blob/a0487c6824d6ee9d7389edc25ae937f1e4cf26fd/src/points.jl#L53-L55.

The constructor that uses Meshes.Vec does not work with ArchGDAL defined in topoint3:

import ArchGDAL as AG
import GeoInterface as GI
using Meshes

mypoint = AG.createpoint(1.0, 5.0)

function topoint1(geom)
    coords = GI.coordinates(geom)
    Meshes.Point(coords...)
end

function topoint2(geom)
    coords = GI.coordinates(geom)
    Meshes.Point{2,Float64}(coords)
end

function topoint3(geom)
    coords = GI.coordinates(geom)
    Meshes.Point(Meshes.Vec(coords))
end

topoint1(mypoint)
topoint2(mypoint)
topoint3(mypoint)

@ErickChacon
Copy link
Member Author

Shapefile.jl will be the same as ArchGDAL.jl.

julia> GeoInterface.coordinates(Shapefile.Point(1.0, 5.0))
2-element Vector{Float64}:
 1.0
 5.0

Yes, in theory they are the same. But the previous functions work without any problem when the geom is from Shapefile.jl.

So how are you imagining that Meshes.jl would cause a low-level GDAL error that we don't get without Meshes.jl in the same function?

When it hangs, the cpu continue working like getting into a infinite bucle. Maybe it is a method table issue?

@ErickChacon
Copy link
Member Author

Reducing the problem, I can notice that it happens when calling Meshes.Vec.

import ArchGDAL as AG
import GeoInterface as GI
using Meshes

mypoint = AG.createpoint(1.0, 5.0)

function tovec(geom)
    coords = GI.coordinates(geom)
    Meshes.Vec(coords)
end

tovec(mypoint)

However, it works if I am more specific:

function tovec2(geom)
    coords = GI.coordinates(geom)
    Meshes.Vec{2,Float64}(coords)
end
tovec2(mypoint)

Might this be related with the following line https://github.com/JuliaGeometry/Meshes.jl/blob/77e90c2efaba40ef5e28a1fa058396bf95994b29/src/vectors.jl#L66, where we make a call to length?

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023

Thanks for investigating this further @ErickChacon , can you explain why do you think the call to length could be problematic? Isn't the coordinates vector returned by GI.coordinates a simple Julia Vector?

@ErickChacon
Copy link
Member Author

For now, the only reason is because the only intermediate step between Vec(coords) and Vec{2,Float64}(coords) is the use of length, but I am not sure.

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

Ah this might actually be a compilation problem? You are calling length on a runtime object and lifting it to the type domain.

Much better to clarify the options for the compiler

function Vec(coords::AbstractVector{T}) where {T} 
    if length(coords) == 2
        return Vec{2,T}(coords)
    else
        return Vec{3,T}(coords)
    end
    # Maybe you have more dimensions to add here...
end

Then this is a known small union optimisation problem rather than complex instability.

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023

@eliascarv can you please double check this?

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

My ArchGDAL PR may also reduce the weight from the GDAL side, currently getting points has too much overhead:

yeesian/ArchGDAL.jl#369

It may interact with the poor design of Vec here to cause this problem.

Additional to this fix I would call GeoInterface.is3d as far out as possible in your conversion methods, hoist the 2 or 3 to the type domain and propagate that to Vec from the top rather than checking at the bottom. This will likely be many times faster.

@ErickChacon
Copy link
Member Author

Investigating a bit further, I could see that it hangs in the following line https://github.com/JuliaGeometry/Meshes.jl/blob/cb556d34900c64b6f40ac827360039c0fb3549f3/src/vectors.jl#L60, but only when interacting with ArchGDAL. It hangs when calling NTuple{Dim,T}(coords).

@rafaqz
Copy link

rafaqz commented Apr 27, 2023

Did you make that change?

Also try my ArchGDAL PR for better point performance.

But ultimately using coordinates is not the way to do this. You are losing the compile-time known 2d/3d length and getting it back again at runtime from the lowest point level in a geometry. Its terrible for performance no matter what you do.

Iterating over GI.getgeom(geom) with GI.x and GI.y, and GI.is3d to get the D parameter is the way to do it fast.

@juliohm
Copy link
Member

juliohm commented Apr 27, 2023

It would help if this "preferred new way" of doing things with GI was implemented somewhere for reference. We could just copy/paste the code that is supposedly fast and more aligned with the latest version of the interface.

@ErickChacon
Copy link
Member Author

Did you make that change?

When I say that it hangs when calling NTuple{Dim,T}(coords), I mean without any change in Meshes.jl. But I agree with you that the main issue is by using length in the constructor make it non typestable. If I follow your suggestion, it works as long as length is not used inside the definiction of Vec. The question there would be how to generalize this constructor for higher dimensions?

Also try my ArchGDAL PR for better point performance.

But ultimately using coordinates is not the way to do this. You are losing the compile-time known 2d/3d length and getting it back again at runtime from the lowest point level in a geometry. Its terrible for performance no matter what you do.

Iterating over GI.getgeom(geom) with GI.x and GI.y, and GI.is3d to get the D parameter is the way to do it fast.

Happy to follow that approach. As @juliohm mentioned, which package would you suggest to check for the geointerface implementation? GeometryBasics.jl?

@ErickChacon
Copy link
Member Author

I can confirm that using the following constructor solves the problem and passes all test. @juliohm would you like a pull request on Meshes.jl?

function Vec(coords::AbstractVector{T}) where {T}
    n = length(coords)
    if n == 1
        Vec{1,T}(coords)
    elseif n == 2
        Vec{2,T}(coords)
    elseif n == 3
        Vec{3,T}(coords)
    else
        throw(ErrorException("not implemented"))
    end
end

Of course, it would still be great to update the geointerface for Meshes.jl depending of the package @rafaqz suggest to look at.

@juliohm
Copy link
Member

juliohm commented Apr 28, 2023

@ErickChacon a PR request would be very welcome. The code you shared seems fine, I think we can assume that Meshes.jl only supports up to 3 dimensions currently.

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

Maybe LibGEOS.jl is the best to compare to?

https://github.com/JuliaGeo/LibGEOS.jl/blob/master/src/geo_interface.jl

You may also notice that it accepts all GeoInterface.jl objects in all functions without explicit conversions by users.

That could also work in Meshes.jl

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

It may be easiest to follow this PR:

JuliaGeo/LibGEOS.jl#156

Something to keep in mind is not all geometry objects in the ecosystem have their dimensions known at compile time.

So as with the fix here, if you need D at compile time, always try to find the number of dimensions as early as possible (e.g. on a multypolygon object rather than the points, linestrings or polygons it holds). Use GI.is3dand keep it as a type.

Next is that GI.getgeom(geom) is an iterator so you can stream from one geom to the next after getting the length with GI.ngeom and allocating once. coordinates was terrible for allocations but now we can do much better.

Then where you dont care about structure you can use GI.getpoint(geom) or GI.getring(geom)` to ignore the intermediate objects and hit optimised paths in some cases. E.g. rings and points are very cheap to get from shapefiles, but correct polygons are expensive.

@juliohm
Copy link
Member

juliohm commented Apr 28, 2023

This has to do with the fact that GI returns a Union{Vector{T}, Vector{Union{T,Nothing}} etc as the resulting type. This is a MWE where Julia apparently has a bug:

test(v::AbstractVector{T}) where {T} = NTuple{length(v),T}(v)

test([1,2,3]) # works

test([1,2,nothing]) # hangs

Thanks @eliascarv for finding this MWE.

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

That's interesting, does seem like some kind of compiler bug.

But that return type sucks, we should track down where that comes from.

Edit: this is probably an inference problem with 2d/3d geoms from gdal. We should force the type from the first two values.

But not using coordinates here and getting my PR over there merged for points at ArchGDAL will probably solve this too. We should also deprecate coordinates now.

@juliohm
Copy link
Member

juliohm commented Apr 28, 2023

@ErickChacon you mentioned that we could fix the issue here by avoiding that constructor, right? Can you prepare a PR here as well to use the splatting operator as you suggested? It would be great if you could create a test set with GDAL geometries making sure that our geom2meshes function is working as expected with all of them.

@ErickChacon
Copy link
Member Author

ErickChacon commented Apr 28, 2023

@juliohm, yes. I am checking https://github.com/JuliaGeo/LibGEOS.jl/blob/master/src/geo_interface.jl to make the interface similar to it. We will just use GI.x, GI.y, and GI.z as used in LibGEOS.jl. For example, they use the following function:

function GI.convert(::Type{Point}, ::PointTrait, geom; context=get_global_context())
    if GI.is3d(geom)
        return Point(GI.x(geom), GI.y(geom), GI.z(geom), context)
    else
        return Point(GI.x(geom), GI.y(geom), context)
    end
end

I am still checking other details to have an interface aligned with @rafaqz suggestions. I will start a pull request soon.

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

FYI context is purely an annoying libgeos thing ;)

But otherwise that should be generic

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

FYI context is purely an annoying libgeos thing ;)

But otherwise that should be generic.

Also note Meshes.jl will need a Meshes.geointerface_geomtype method to implement package level conversions.

Probably all this code should be in extensions in Meshes.jl anyway at some point, the dependencies on GeoJSON.jl and Shapefile.jl are obsolete now, they have Tables.jl interfaces.

@juliohm
Copy link
Member

juliohm commented Apr 28, 2023

We don't have plans to add dependencies related to GeoInterface.jl in Meshes.jl, it is not aligned with our philosophy and we don't want to promote use cases where users are fiddling with different geometry types all over the place. We are providing these conversion methods here in GeoTables.jl so that people can do IO, proceed with geometric processing constrained to a single set of geometries in Meshes.jl, and then export the results back to disk. That is the only place where GeoInterface.jl is envisioned in our stack.

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

Sure thing, love the strict, clean philosophy!

But mostly the idea is that theres no fiddling to be done, interop will be seemless ;)

@juliohm
Copy link
Member

juliohm commented Apr 28, 2023

But mostly the idea is that theres no fiddling to be done, interop will be seemless ;)

Perhaps you are too optimistic about the co-existence of different geometry types in advanced geometric processing pipelines. At some point you will hit function calls that are not implemented in terms of the GeoInterface.jl and the application will just crash. It is as simple as that in my mind, but I would be happy to be proven wrong. Looking forward to the seamless experience!

@rafaqz
Copy link

rafaqz commented Apr 28, 2023

Its actually pretty easy to add methods for all functions in a package to accept GeoInterface types. See e.g. my LibGEOS PR. There is very little code used on that interop.

And note my comment was in aid of your package getting high level conversion like GeoInterface.convert(Meshes, any_geom). Otherwise users need to convert type to type manually for all geometry types they might use.

Meshes.jl does need some minor coordination with GeoInterface.jl for that to work, not even a dependency, but at least a function stub that can have methods added to it here.

@juliohm
Copy link
Member

juliohm commented May 3, 2023

Thank you @ErickChacon for fixing this ❤️

@juliohm juliohm closed this as completed May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants