Refactor language utility data out from utils.py #52

wkyoshida · 2023-10-14T23:18:41Z

Terms

I have searched open and closed issues
I agree to follow Scribe-Data's Code of Conduct

Issue

This issue is for refactoring the language data out of src/scribe_data/utils.py into a separate file, likely a JSON file, as suggested by @m-charlton in #51 🙌

As proposed by @m-charlton,

This file would be loaded once on module import and then the utils functions could interrogate this loaded object.
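A minimal sketch of that load-once pattern, assuming a hypothetical layout with a top-level "languages" object keyed by language name. The field names and helper functions are illustrative and not the actual contents of utils.py. The QIDs are the real Wikidata IDs for English and German. To keep the sketch self-contained the JSON is inlined as a string; in the module itself it would be read from the data file.

```python
import json
from typing import Any

# Hypothetical stand-in for the language data file's contents.
# In the real module this text would be read from the JSON file on disk.
_LANGUAGE_DATA_JSON = """
{
  "languages": {
    "english": {"qid": "Q1860", "iso": "en"},
    "german": {"qid": "Q188", "iso": "de"}
  }
}
"""

# Parsed exactly once, when the module is first imported; every
# utility function below reads from this module-level object.
_LANGUAGE_DATA: dict[str, Any] = json.loads(_LANGUAGE_DATA_JSON)


def _get_entry(language: str) -> dict[str, Any]:
    """Return the data for a language (case-insensitive) or raise ValueError."""
    entry = _LANGUAGE_DATA["languages"].get(language.lower())
    if entry is None:
        raise ValueError(f"{language.capitalize()} is not currently a supported language.")
    return entry


def get_language_qid(language: str) -> str:
    """Return the Wikidata QID for the given language."""
    return _get_entry(language)["qid"]


def get_language_iso(language: str) -> str:
    """Return the ISO code for the given language."""
    return _get_entry(language)["iso"]
```

Because lookups go through the parsed object, supporting a new language would mostly mean adding an entry to the JSON rather than editing each function.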

We can use this issue to track discussion of details for the implementation and the work itself.

wkyoshida · 2023-10-14T23:35:10Z

I do like how this could facilitate the inclusion of new languages in the future as well 🙌 Instead of having to hunt down where all modifications would be needed in code, a lot of it can be centralized in this JSON file.

@m-charlton we can absolutely go over what format might work best too, as you've already started discussing. Would using a gist make it easier to collaborate, for example to comment on and revise the file?

m-charlton · 2023-10-16T11:58:09Z

@wkyoshida thanks for opening this issue. The following gist contains the truncated version of the language data file (only two languages are included).

The "used by" and "description" fields are comments; the data proper is held in "languages". My main question concerns the placement of the "ignore" and "remove" fields. Is the following preferable?

  "remove-words": [],
  "ignore-words": []

A second question concerns the location of the resources directory containing the JSON data file: currently it's in Scribe-Data/src/scribe_data and called resources. Is this the best location? I notice that there's also a _resources directory in extract_transform.
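Wherever the directory ends up, one common way to keep loading independent of the current working directory is to resolve the path relative to the module file. A minimal sketch, assuming the JSON sits in a resources directory next to utils.py; the file name below is hypothetical, since the thread doesn't name it.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical file name: the thread only says the data lives in a
# "resources" directory under src/scribe_data.
DATA_FILE_NAME = "language_meta_data.json"


def default_resources_dir() -> Path:
    """Locate "resources" next to this module, independent of the working directory."""
    return Path(__file__).resolve().parent / "resources"


def load_language_data(resources_dir: Path) -> dict[str, Any]:
    """Read and parse the language data file from the given directory."""
    data_file = resources_dir / DATA_FILE_NAME
    with data_file.open(encoding="utf-8") as f:
        return json.load(f)
```

Taking the directory as a parameter keeps the loader easy to test with a temporary directory. For data shipped inside an installed package, `importlib.resources.files("scribe_data")` (Python 3.9+) is an alternative that doesn't assume the package lives on a plain filesystem path.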

wkyoshida · 2023-10-18T04:02:17Z

I'm thinking just having the structure of

  "remove-words": [],
  "ignore-words": []

as you mentioned makes sense, actually. Tying them under a "words" property feels like unnecessary nesting, perhaps? The only common denominator is really that both are lists of, well.., words hahah 😆 Do others feel the same? Definitely open to discussion if there's a benefit to the current structure that I'm overlooking.
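To illustrate the flat layout, here is a minimal sketch of how per-language accessors might read the two lists. The entry structure and function names are hypothetical, and the word lists are placeholders rather than the project's actual data.

```python
import json
from typing import Any

# Hypothetical entry using the flat "remove-words" / "ignore-words" layout.
# The word lists are placeholders, not real project data.
_LANGUAGE_DATA: dict[str, Any] = json.loads("""
{
  "languages": {
    "english": {
      "qid": "Q1860",
      "remove-words": ["of", "the"],
      "ignore-words": []
    }
  }
}
""")


def get_language_words_to_remove(language: str) -> list[str]:
    """Return the "remove-words" list; the flat layout needs one key lookup."""
    return _LANGUAGE_DATA["languages"][language.lower()]["remove-words"]


def get_language_words_to_ignore(language: str) -> list[str]:
    """Return the "ignore-words" list for the given language."""
    return _LANGUAGE_DATA["languages"][language.lower()]["ignore-words"]
```

Nesting both lists under a "words" property would add one more level to every lookup while grouping nothing beyond the fact that both are word lists.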

andrewtavis · 2023-10-18T07:06:58Z

Makes sense to me, @wkyoshida :)

refactor(utils.py): move language data to JSON file (resolves #52)

andrewtavis · 2023-10-30T01:18:32Z

951bf86 sends along some minor changes to address what we discussed in the sync and also improves the docstrings in a few places. Thanks for pointing out that one was copied and not updated, @m-charlton! I think the next thing to do here would be #55 so we simplify this process even more. I noted in #54 which parts could then be removed :)

wkyoshida added this to Scribe Board Oct 14, 2023

github-project-automation bot moved this to Todo in Scribe Board Oct 14, 2023

wkyoshida mentioned this issue Oct 14, 2023

Add tests for utility functions (resolves #50) #51

Merged


wkyoshida added the good first issue, data, and refactor labels Oct 14, 2023

wkyoshida moved this from Todo to In Progress in Scribe Board Oct 14, 2023

andrewtavis assigned m-charlton and wkyoshida Oct 16, 2023

m-charlton mentioned this issue Oct 18, 2023

refactor(utils.py): move language data to JSON file (resolves #52) #54

Merged


andrewtavis closed this as completed in f7a38b5 Oct 30, 2023

andrewtavis added a commit that referenced this issue Oct 30, 2023

Merge pull request #54 from m-charlton/refactor-utils

ad07bb1

refactor(utils.py): move language data to JSON file (resolves #52)

github-project-automation bot moved this from In Progress to Done in Scribe Board Oct 30, 2023

andrewtavis added a commit that referenced this issue Oct 30, 2023

refactor(utils.py/json): docstrings, combine lines, add return type #52

951bf86
