Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to use registry, and to take custom schemas #6

Closed
pwalsh opened this issue Jun 4, 2015 · 12 comments
Closed

Refactor to use registry, and to take custom schemas #6

pwalsh opened this issue Jun 4, 2015 · 12 comments

Comments

@pwalsh
Copy link
Contributor

pwalsh commented Jun 4, 2015

@rgrp /cc @gvidon

I'd like to suggest we make the following changes to datapackage-validate:

  • Remove schemas submodule
  • Instead, get our "official" schemas via the new datapackage-registry module
  • Don't read from a filepath, but read a schema object. For schemas in our registry, could pass a key which is the unique id of the schema

Goals

  • Package works seamlessly in browser and server environments (doesn't have to read a file from path)
  • Package always stays up-to-date with changes to the core Data Package schemas
  • Package has instant access to any new schemas in our registry
  • Package can be used with any other custom Data Package schemas that a user may have

Cons

  • The only con is see is that datapackage-registry gets the "official" schemas via an API call over the web. I think it is not really a problem though - if the user does not want to rely on having a connection, that user could simply keep local copies of the schemas as part of their codebase, as we are adding the ability to load any schema

Pros

  • Because of the API call to the schemas - the official schemas are always up-to-date, and new schemas are seamlessly integrated :)
@rufuspollock
Copy link
Contributor

Comments:

  • +1 on removing the submodule
  • I think there is an offline, command line use case
  • I suggest we support 3 arguments: js object (hash), file path, url, and "identifier" in the registry
    • could just support js object and have other 3 come from 3rd party library ... (that seems very reasonable and KISS)

@gvidon
Copy link
Contributor

gvidon commented Jun 12, 2015

@rgrp @pwalsh so, it means that .validate() should take second param which can only be a js object, and will be applied directly as a scheme?

@pwalsh
Copy link
Contributor Author

pwalsh commented Jun 14, 2015

@gvidon @rgrp this is how I imagine it:

validate will have to support:

  • the thing being validated
  • the thing to validate against

so the function signature will become:

  • validate(data, schema)

Where

  • data is a string parseable as json, or an object, and it is the thing being validated
  • schema is one of:
    • object
    • filepath
    • url
    • registry_id

The logic on schema:

  • IF schema is an object, we can pass it as is to the actual validator function
  • IF schema is a filepath, we'll open it as per the current functionality that opens a file from a path
  • IF schema is a url, we'll open it too
  • IF schema is a registry_id we get the schema out of the registry
  • IF schema is null/undefined, we use 'base' as a registry_id and get schema out of registry

The logic probably doesn't need to be in that order - maybe first we check if the value of schema is an id in the registry, and then fallback into other checks....

So the change here is:

  • we no longer have the schemas submodule or the hard coded file read
  • we default to using the datapackage registry
  • we support the user directly passing us (any) schema that is already an object, and likewise, a filepath or url to an object

Is that clear? Any suggestions to improve on that?

@gvidon
Copy link
Contributor

gvidon commented Jun 14, 2015

@pwalsh thank you, now it is clear. But to me it looks bit complicated, it seems like datapackage-validate will be overwhelmed with features which are not core features.

I would rather keep just one type of schema param — just json. And let user decide himself on how and where to get schema from. Additionally I would create datapackage-registry-schema module which gets schema from registry by id or name.

This way user will be able to comfortably fit datapackage-validate into his existing file/url routines.

@pwalsh
Copy link
Contributor Author

pwalsh commented Jun 14, 2015

@gvidon ok understood: that is what @rgrp was suggesting (object only).

But about datapackage-registry-schema - why do we need this as well as datapackage-registry - maybe we are getting too modular? Also, if datapackage-validate is object only, then the user has to have created the object, and I feel like we've missed that really basic use case of validating against the "base" data package schema, without the user needing to know too much/use other libs.

@gvidon
Copy link
Contributor

gvidon commented Jun 14, 2015

@pwalsh I absolutely agree to the point about basic usage. It is a real issue. If the most common way of using datapackage-validate is validating against base schema, then I would hardcode (or something like this) base schema and for other cases I would allow to pass schema object.

Also agree to datapackage-registry, maybe we need to extend it to be able to get actual schema object.

@pwalsh
Copy link
Contributor Author

pwalsh commented Jun 14, 2015

@gvidon

So how about:

  1. we can add a get query method to datapackage-registry
  2. If schema is undefined, we do a get on "base" from datapackage-registry

?

@gvidon
Copy link
Contributor

gvidon commented Jun 14, 2015

@pwalsh for consistency I would make datapackage-validate operating synchronous in case of any type params. That is why I suggest hardcoding base schema.

@gvidon
Copy link
Contributor

gvidon commented Jun 15, 2015

@pwalsh do we go according your last comment?

@pwalsh
Copy link
Contributor Author

pwalsh commented Jun 15, 2015

@gvidon and I had a quick skype chat, and it was decided:

  • async api
  • datapackage-validate does depend on datapackage-registry
  • schema can be either an object, or a registry id
    • (so files and urls, if the user had them, would be read by user and then passed in as an object)

@gvidon
Copy link
Contributor

gvidon commented Jun 15, 2015

Quick recap

  • remove loading schema from file
  • remove link to schemas repo
  • accept second param schema — either object or registry id (string). Validate input over passed schema if it is an object, return Promise
  • get schema using datapackage-registry and validate input over returned object if schema is a registry profile id, return Promise
  • if no second param passed consider it as a 'base' (default registry id) and do what described in previous step
  • update tests

@gvidon gvidon assigned alexeyk and gvidon and unassigned gvidon and alexeyk Jun 15, 2015
@pwalsh
Copy link
Contributor Author

pwalsh commented Jun 16, 2015

@gvidon that's correct.

@gvidon gvidon added the inwork label Jun 16, 2015
gvidon pushed a commit that referenced this issue Jun 16, 2015
gvidon pushed a commit that referenced this issue Jun 16, 2015
gvidon pushed a commit that referenced this issue Jun 16, 2015
gvidon pushed a commit that referenced this issue Jun 16, 2015
@gvidon gvidon removed the inwork label Jun 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants