Commit

Heavy package-level refactoring. _mirrorsubsets table now always in `public`.

The refactoring is meant to finally reflect the CoStoSys name and to
adhere more closely to Java naming conventions.
khituras committed May 20, 2019
1 parent 1d046a1 commit 709cf19
Showing 49 changed files with 6,045 additions and 139 deletions.
8 changes: 2 additions & 6 deletions README.md
@@ -1,4 +1,5 @@
[![Build Status](https://travis-ci.com/JULIELab/costosys.svg?branch=master)](https://travis-ci.com/JULIELab/costosys)
[![Build Status](https://travis-ci.com/JULIELab/costosys.svg?branch=master)](https://travis-ci.com/JULIELab/costosys)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/6c06345e4f6b4a18a0e38043f11c6e60)](https://app.codacy.com/app/khituras/costosys?utm_source=github.com&utm_medium=referral&utm_content=JULIELab/costosys&utm_campaign=Badge_Grade_Dashboard)[![Automated Release Notes by gren](https://img.shields.io/badge/%F0%9F%A4%96-release%20notes-00B2EE.svg)](https://github-tools.github.io/github-release-notes/)

# CoStoSys
The Corpus Storage System (CoStoSys) is a tool and abstraction layer for a PostgreSQL document database.
@@ -46,11 +47,6 @@ have been predefined, including
| medline_2017 | Defines the columns 'pmid' and 'xml'. Import data is expected to be in PubMed XML PubmedArticleSet format where one large XML file contains a bulk of PubMed articles. The individual articles must be located at XPath /PubmedArticleSet/PubmedArticle/MedlineCitation. This format is employed by the downloadable PubMed distribution since 2017. XML data are stored in GZIP format.|
| medline_2016 | Defines the columns 'pmid' and 'xml'. Import data is expected to be in MEDLINE XML MedlineCitationSet format where one large XML file contains a bulk of MEDLINE articles. The individual articles must be located at XPath /MedlineCitationSet/MedlineCitation. This format was employed by the downloadable MEDLINE distribution until 2016. XML data are stored in GZIP format. |
| pubmed_gzip | The same as medline_2017. |
| xmi_text | Used internally. Defines the columns 'pmid', 'xmi', 'max_xmi_id' and 'sofa_mapping'. Used by the JeDIS components [jcore-xmi-db-reader](https://github.com/JULIELab/jcore-base/tree/b2128199bd548dd989b0d7c198634ed79670e8c7/jcore-xmi-db-reader) and [jcore-xmi-db-writer](https://github.com/JULIELab/jcore-base/tree/b2128199bd548dd989b0d7c198634ed79670e8c7/jcore-xmi-db-writer) to read and store UIMA annotation graphs in XMI format that were segmented into annotation types with separate storage.|
| xmi_annotation | Used internally. Defines the columns 'pmid' and 'xmi'. This table schema is used for the annotation data segmented away from full XMI annotation graphs, see xmi_text. |
| xmi_text_gzip | Used internally. The same as xmi_text but the contents of the xmi column are stored in GZIP format.|
| max_id_addition | Used internally. Defines the fields 'pmid', 'xmi' and 'max_xmi_id' but only marks the 'max_xmi_id' column for retrieval. This schema is not supposed to be used for data import but for a table with xmi_text schema for which only the current maximum XMI ID should be retrieved. Technical detail of the JeDIS architecture.|
| xmi_annotation_gzip | Used internally. The same as xmi_annotation but with the XMI data in GZIP format.|

Custom table schemas may be added to the configuration at XPath `/databaseConnectorConfiguration/DBSchemaInformation/tableSchemas`. Refer to the DocBook documentation and the XML schema for details.
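
The `medline_2017` row above names two mechanics: articles are located at the XPath `/PubmedArticleSet/PubmedArticle/MedlineCitation`, and the XML is stored GZIP-compressed. The following is a minimal sketch of both steps using only JDK classes. It is not CoStoSys's import code. The class name `MedlineXPathSketch`, its methods, and the tiny `SAMPLE` document are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MedlineXPathSketch {

    // XPath taken from the medline_2017 schema description in the README.
    static final String CITATION_XPATH = "/PubmedArticleSet/PubmedArticle/MedlineCitation";

    // Hypothetical, heavily reduced stand-in for a PubMed bulk file.
    static final String SAMPLE =
            "<PubmedArticleSet>"
            + "<PubmedArticle><MedlineCitation><PMID>11</PMID></MedlineCitation></PubmedArticle>"
            + "<PubmedArticle><MedlineCitation><PMID>22</PMID></MedlineCitation></PubmedArticle>"
            + "</PubmedArticleSet>";

    /** Returns the PMID of every MedlineCitation element found at CITATION_XPATH. */
    static List<String> extractPmids(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList citations = (NodeList) xpath.evaluate(
                    CITATION_XPATH, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> pmids = new ArrayList<>();
            for (int i = 0; i < citations.getLength(); i++) {
                // "PMID" is evaluated relative to the current MedlineCitation node.
                pmids.add(xpath.evaluate("PMID", citations.item(i)));
            }
            return pmids;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    /** Compresses a string to GZIP bytes, e.g. for storage in a bytea column. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses gzip(). Uses InputStream.readAllBytes, which requires Java 9 or later. */
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractPmids(SAMPLE));
        System.out.println(gunzip(gzip(SAMPLE)).equals(SAMPLE));
    }
}
```

The sketch parses the whole file into a DOM, which is only reasonable for small inputs. A real bulk import of PubMed files would more likely need a streaming parser.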

4 changes: 2 additions & 2 deletions pom.xml
@@ -2,7 +2,7 @@
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<artifactId>costosys</artifactId>
<version>1.3.2</version>
<version>1.4.0-SNAPSHOT</version>
<name>Corpus Storage System</name>
<description>A utility for managing documents stored in a PostgreSQL database. The documents are imported into a
PostgreSQL DB as full texts with the goal to be able to retrieve the documents by their PubMedID efficiently.
@@ -21,7 +21,7 @@
</descriptors>
<archive>
<manifest>
<mainClass>de.julielab.xmlData.cli.CLI</mainClass>
<mainClass>de.julielab.costosys.cli.CLI</mainClass>
</manifest>
</archive>
</configuration>
116 changes: 116 additions & 0 deletions src/main/java/de/julielab/costosys/Constants.java
@@ -0,0 +1,116 @@
/**
* Constants.java
*
* Copyright (c) 2010, JULIE Lab.
* All rights reserved. This program and the accompanying materials
* are made available under the terms of the Common Public License v1.0
*
* Author: faessler
*
* Current version: 1.0
* Since version: 1.0
*
* Creation date: 19.11.2010
**/

package de.julielab.costosys;

/**
* This class provides constants that are useful for common tasks. Examples
* include database field names for the import or retrieval of Medline
* documents, table names etc.
*
* @author faessler
*/
public final class Constants {

// Field attribute names

/**
* The default PostgreSQL schema in which all data related tables are
* stored. The schema is {@value #DEFAULT_DATA_SCHEMA}.
*/
public static final String DEFAULT_DATA_SCHEMA = "_data";

/**
* Constant for the name of a database table holding at least document ID
* and document data (e.g. PubmedId and Medline XML). Value:
* {@value #DEFAULT_DATA_TABLE_NAME}.
*/
public static final String DEFAULT_DATA_TABLE_NAME = DEFAULT_DATA_SCHEMA
+ "._data";

public static final String VALUE = "value";

// SQL type constants

public static final String TYPE_TEXT = "text";

public static final String TYPE_TEXT_ARRAY = "text[]";

public static final String TYPE_VARCHAR_ARRAY = "varchar[]";

public static final String TYPE_BINARY_DATA = "bytea";

/**
* Constant for a possible value of <code>type</code>.
* <p>
* Used to create a timestamp without time zone.
*/
public static final String TYPE_TIMESTAMP_WITHOUT_TIMEZONE = "timestamp without time zone";

public static final String TYPE_INTEGER = "integer";

public static final String TYPE_BOOLEAN = "boolean";

public static final String TYPE_XML = "xml";

public static final String XML_FIELD_NAME = "xml";

public static final String PMID_FIELD_NAME = "pmid";

public static final String DATE_FIELD_NAME = "date";

public static final String NLM_ID_FIELD_NAME = "nlm_id";

public static final String AUTO_ID_FIELD_NAME = "autoID";

public static final String HAS_ERRORS = "has_errors";

public static final String LOG = "log";

public static final String IN_PROCESS = "is_in_process";

public static final String IS_PROCESSED = "is_processed";

public static final String LAST_COMPONENT = "last_component";

public static final String HOST_NAME = "host_name";

public static final String PROCESSING_TIMESTAMP = "processing_timestamp";

public static final String PID = "pid";

@Deprecated
public static final String DOC_ID_FIELD_NAME = "doc_id";

public static final String PROCESSED = "is_processed";

public static final String HIDDEN_CONFIG_PATH = "dbcTest.hiddenConfigPath";

public static final String COSTOSYS_CONFIG_FILE = "costosys.configurationfile";

public static final String MIRROR_COLLECTION_NAME = "public._mirrorSubsets";

public static final String MIRROR_COLUMN_DATA_TABLE_NAME = "datatablename";

public static final String MIRROR_COLUMN_SUBSET_NAME = "subsettablename";

public static final String MIRROR_COLUMN_DO_RESET = "performreset";

public static final String TIMESTAMP_FIELD_NAME = "timestamp";

public static final String TOTAL = "total";


}
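
As a rough illustration of how constants like these might be used, the sketch below assembles two SQL statements from them. It is an assumption-laden example and not code from this commit. The class `ConstantsUsageSketch` and both helper methods are hypothetical, and the constant values are copied into the sketch so that it compiles on its own.

```java
public class ConstantsUsageSketch {

    // Values copied from the Constants class shown above.
    static final String DEFAULT_DATA_SCHEMA = "_data";
    static final String DEFAULT_DATA_TABLE_NAME = DEFAULT_DATA_SCHEMA + "._data";
    static final String PMID_FIELD_NAME = "pmid";
    static final String XML_FIELD_NAME = "xml";
    static final String MIRROR_COLLECTION_NAME = "public._mirrorSubsets";
    static final String MIRROR_COLUMN_DATA_TABLE_NAME = "datatablename";
    static final String MIRROR_COLUMN_SUBSET_NAME = "subsettablename";

    /** Hypothetical helper: statement for fetching one document by ID via a JDBC placeholder. */
    static String selectDocumentSql() {
        return String.format("SELECT %s, %s FROM %s WHERE %s = ?",
                PMID_FIELD_NAME, XML_FIELD_NAME, DEFAULT_DATA_TABLE_NAME, PMID_FIELD_NAME);
    }

    /** Hypothetical helper: statement listing the mirror subsets registered for a data table. */
    static String selectMirrorSubsetsSql() {
        return String.format("SELECT %s FROM %s WHERE %s = ?",
                MIRROR_COLUMN_SUBSET_NAME, MIRROR_COLLECTION_NAME, MIRROR_COLUMN_DATA_TABLE_NAME);
    }

    public static void main(String[] args) {
        System.out.println(selectDocumentSql());
        System.out.println(selectMirrorSubsetsSql());
    }
}
```

Because the table name is not quoted, PostgreSQL folds `public._mirrorSubsets` to `public._mirrorsubsets`, which matches the spelling used in the commit title.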