Refactor language utility data out from utils.py #52

Closed
2 tasks done
wkyoshida opened this issue Oct 14, 2023 · 5 comments
Labels: data (Relates to data or Wikidata), good first issue (Good for newcomers), refactor (Refactor code to improve quality)

Comments

@wkyoshida
Member

Terms

Issue

This issue is for refactoring the language data out of src/scribe_data/utils.py into a separate file, likely a JSON file, as suggested by @m-charlton in #51 🙌

As proposed by @m-charlton,

This file would be loaded once on module import and then the utils functions could interrogate this loaded object.

We can use this issue to track discussion of details for the implementation and the work itself.
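
To make the proposal concrete, here is a minimal sketch of the "load once on import, then interrogate" pattern. The file name, the sample entries, the key names, and the `get_language_iso` helper are all assumptions for illustration; the real data file and utils functions may look different.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data; the real file's keys and values may differ.
SAMPLE = {
    "languages": [
        {"language": "english", "iso": "en"},
        {"language": "german", "iso": "de"},
    ]
}


def load_language_data(path):
    """Read the JSON data file and return the parsed object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Stand-in for the packaged data file so that this sketch runs on its own.
_tmp_dir = tempfile.mkdtemp()
_data_file = Path(_tmp_dir) / "language_data.json"  # hypothetical file name
_data_file.write_text(json.dumps(SAMPLE), encoding="utf-8")

# Loaded once, when the module is imported.
_LANGUAGE_DATA = load_language_data(_data_file)


def get_language_iso(language):
    """Look up the ISO code for a language in the already-loaded object."""
    for entry in _LANGUAGE_DATA["languages"]:
        if entry["language"] == language.lower():
            return entry["iso"]
    raise ValueError(f"{language} is not a supported language.")
```

In a real package the path would more likely be resolved relative to the module itself rather than a temporary directory, and every utils function would read from the same loaded object instead of its own hard-coded values.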

@wkyoshida
Member Author

I do like how this could facilitate the inclusion of new languages in the future as well 🙌 Instead of having to hunt down where all modifications would be needed in code, a lot of it can be centralized in this JSON file.

@m-charlton we can absolutely go over what format might work best too, since you've already started that discussion. Would using a gist perhaps make it easier to collaborate, and to comment on and revise the file?

wkyoshida added the good first issue, data, and refactor labels Oct 14, 2023
wkyoshida moved this from Todo to In Progress in Scribe Board Oct 14, 2023
@m-charlton
Contributor

@wkyoshida thanks for opening this issue. The following gist contains the truncated version of the language data file (only two languages are included).

The "used by" and "description" fields are comments; the data proper is held in "languages". My main question concerns the placement of the "ignore" and "remove" fields. Is the following to be preferred?

  "remove-words": [],
  "ignore-words": []

A second question concerns the location of the resources directory containing the JSON data file: currently it's in Scribe-Data/src/scribe_data and is called resources. Is this the best location? I notice that there's a _resources directory in extract_transform.

@wkyoshida
Member Author

I'm thinking just having the structure of

  "remove-words": [],
  "ignore-words": []

as you mentioned makes sense actually. Tying them under a "words" property feels like unnecessary nesting, perhaps? The common denominator is really only that both lists are of, well... words hahah 😆 Do others feel the same? Definitely open to discussion if there's a benefit to the current structure that I'm overlooking.
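
For comparison, a rough sketch of how the two layouts would be read from code. The nested shape is only a guess at what the gist contains (a "words" property holding "remove" and "ignore"), and the entry contents and helper names are made up for illustration.

```python
# Guess at the nested layout: both lists grouped under a "words" property.
nested_entry = {
    "language": "english",
    "words": {"remove": ["of"], "ignore": []},
}

# Flat layout, as in the snippet above: both lists sit directly on the entry.
flat_entry = {
    "language": "english",
    "remove-words": ["of"],
    "ignore-words": [],
}


def remove_words_nested(entry):
    """Two lookups are needed to reach the list in the nested layout."""
    return entry["words"]["remove"]


def remove_words_flat(entry):
    """A single lookup reaches the list in the flat layout."""
    return entry["remove-words"]
```

Both hold the same information; the flat form just drops one level of lookup for each access.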

@andrewtavis
Member

Makes sense to me, @wkyoshida :)

andrewtavis added a commit that referenced this issue Oct 30, 2023
refactor(utils.py): move language data to JSON file (resolves #52)
github-project-automation bot moved this from In Progress to Done in Scribe Board Oct 30, 2023
@andrewtavis
Member

951bf86 sent along some minor changes to fix what we discussed in the sync and also improved the docstrings in a few places. Thanks for the note that one was copied and not updated, @m-charlton! I think the next thing to do here would be #55 so we simplify this process even more. I noted in #54 what parts would be able to be removed :)
