Skip to content

frictionlessdata/TableSchema.jl

Repository files navigation

TableSchema.jl

Travis Coveralls SemVer Release Codebase Support

A library for working with Table Schema in Julia:

Table Schema is a simple language- and implementation-agnostic way to declare a schema for tabular data. Table Schema is well suited for use cases around handling and validating tabular data in text formats such as CSV, but its utility extends well beyond this core usage, towards a range of applications where data benefits from a portable schema format.

Features

  • Table class for working with data and schema
  • Schema class for working with schemata
  • Field class for working with schema fields
  • validate function for validating schema descriptors
  • infer function that creates a schema based on a data sample

Status

🚧 This package is pre-release and under heavy development. Please see DESIGN.md for a detailed overview of our goals, and visit the issues page to contribute and make suggestions. For questions that need to a real time response, reach out via Gitter. Thanks! 🚧

We aim to make this library compatible with all widely used approaches to work with tabular data in Julia.

Please visit our wiki for a list of related projects that we are tracking, and contibute use cases there or as enhancement issues.

Usage

See examples folder and unit tests in runtests.jl for current usage.

Table

using TableSchema

table = Table("cities.csv")
table.headers
# ['city', 'location']
table.read(keyed=True)
# [
#   {city: 'london', location: '51.50,-0.11'},
#   {city: 'paris', location: '48.85,2.30'},
#   {city: 'rome', location: 'N/A'},
# ]
rows = table.source
# 6×5 Array{Any,2}:
#   "id"    "height"   "age"  "name"     "occupation"         
#  1      10.0        1       "string1"  "2012-06-15 00:00:00"
#  2      10.1        2       "string2"  "2013-06-15 01:00:00"
# ...
err = table.errors # handle errors
...

Schema

schema = Schema("schema.json")
schema.fields
# <Field1, Field2...>
err = schema.errors # handle errors

Field

Add fields to create or expand your schema like this:

schema = Schema()
field = Field()
field.descriptor._name = "A column"
field.descriptor.typed = "Integer"
add_field(schema, field)

Installation

🚧 Work In Progress. The following documentation is relevant only after package release. In the interim, please see DataPackage.jl

The package use semantic versioning, meaning that major versions could include breaking changes. It is highly recommended to specify a version range in your REQUIRE file e.g.:

v"0.0.1-" <= TableSchema < v"1.0.0-"

At the Julia REPL, install the package with:

(v1.0) pkg> add "https://github.com/loleg/TableSchema.jl"

Development

Code examples here require Julia 0.7, as we are now migrating to Julia 1.0. See Pkg documentation for further information.

Clone this repository, enter the REPL (press ] at the Julia prompt) to activate and test it using:

cd <path-to-my-folder>/TableSchema.jl
julia
# Press ]
(v1.0) pkg> activate .
(TableSchema) pkg> test

You can also install the package locally and run unit tests from the console:

(v1.0) pkg> add .
julia test/runtests.jl

A new feature of Julia's package manager is the dev command. To get a copy of this package installed into your ~/.julia folder and updated with every change, use:

(v1.0) pkg> dev TableSchema