Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add map gender function to utils.py #84

Merged
merged 3 commits into from
Mar 17, 2024

Conversation

ikeadeoyin
Copy link
Contributor

Contributor checklist


Description

Added the map_gender function to utils.py

Related issue

Copy link

github-actions bot commented Feb 27, 2024

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. It'd be great to have you!

Maintainer checklist

  • The commit messages for the remote branch should be checked to make sure the contributor's email is set up correctly so that they receive credit for their contribution

    • The contributor's name and icon in remote commits should be the same as what appears in the PR
    • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Data repo
  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@wkyoshida wkyoshida self-requested a review February 27, 2024 23:40
Copy link
Member

@wkyoshida wkyoshida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking this on @ikeadeoyin! 🚀

Added some comments to the PR! ✌️

Comment on lines +12 to +14
PATH_TO_SCRIBE_ORG = os.path.dirname(sys.path[0]).split("Scribe-Data")[0]
PATH_TO_SCRIBE_DATA_SRC = f"{PATH_TO_SCRIBE_ORG}Scribe-Data/src"
sys.path.insert(0, PATH_TO_SCRIBE_DATA_SRC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I guess I'm a little confused as to why the above is needed. Does something not work without it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a problem in loading the function into the file that was discussed on Matrix. I can take a bit more of a look into this and try to find a solution :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to import the map_genders function in the utils.py file without these lines. It gives an error message (ModuleNotFound: no module named scribe_data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work to figure this out after the merge, @ikeadeoyin :)

@@ -420,3 +420,21 @@ def check_and_return_command_line_args(
python {all_args[0]} '["comma_separated_sets_in_quotes"]'
"""
)

def map_genders(wikidata_gender):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding #69 correctly, it seems that what we'd rather want here actually is that map_genders() also works for all supported languages. There are other map_genders() functions for other languages that exist elsewhere in the code, as pointed out in the issue. The changes here only address the map_genders() for German.

So to make the function general and reconcile all supported languages, it could make sense to modify map_genders(wikidata_gender) into something like map_genders(wikidata_gender, iso_code) instead perhaps - where iso_code is "en", "fr", "es", etc. We can then use the iso_code for the language-specific checks, such as "Q499327" here for German masculine.

Let me know if there are any questions regarding the above 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the review, it was helpful in understanding the codebase. I can see that we have map_genders() function for other languages.

I have tried implementing this feedback however on running update.py file, I got a couple of files modified and a new file created(src/scribe_data/extract_transform/languages/German/nouns/nouns_queried.json).

Can I commit these extra files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not commit the extra files like nouns_queried.json, @ikeadeoyin :) These should have been deleted by the format_nouns.py file in the German directory. The way it works is that the query results are brought down, they're formatted and saved in the final version, and then the original file is deleted. Let's just have this focus on the Python files 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright @andrewtavis,

Following the contribution guide, I have committed the Python files however, I have this error

Scribe-Data git:(add-map-gender-function) ✗ git pull --rebase upstream main
error: cannot pull with rebase: You have unstaged changes.
error: Please commit or stash them. 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ikeadeoyin 👋
Are the unstaged changes the *.json files?
If so, I guess you could either delete them from your local machine or run git stash to stash them so that your working directory is clean before pulling

elif wikidata_gender in ["neuter", "Q1775461"]:
return "N"
else:
return "" # nouns could have a gender that is not valid as an attribute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Add a newline at the end of file


There are some uncommon issues that can occasionally occur with the absence of an ending newline 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this, thanks.

Copy link
Member

@andrewtavis andrewtavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have the weird sys import magic at the top of the files, but this will be worked out when I got through and convert all of the formatting processes over to being lexeme based rather than string based 😊 Thanks for all the work here, @ikeadeoyin! Great to be improving the packages structure like this 🚀

@andrewtavis andrewtavis merged commit 0ed0669 into scribe-org:main Mar 17, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants