Yegane: Task 721-724 : DSTC2 and hate_speech18 Dataset #108
@SavanDoshi, wondering if you can help review this PR?
@yeganehkordi, wondering if you can help review this PR?
Sure! I'll review them now.
Thanks for your work!
One note: Labels in task 721 look skewed towards the "0" class. Make sure the distribution is not skewed toward a class in all the tasks.
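A quick way to check for the skew mentioned above is to count the output labels of a task. The following is only a minimal sketch: it assumes the instances are available as a list of dicts with "input" and "output" keys, as in the examples quoted in this PR. The helper names and the 0.5 threshold are hypothetical, not part of the repository.

```python
from collections import Counter


def label_distribution(instances):
    """Return the share of each output label as a dict {label: fraction}."""
    counts = Counter(str(inst["output"]) for inst in instances)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_skewed(instances, max_share=0.5):
    """True if any single label exceeds max_share (hypothetical threshold)."""
    shares = label_distribution(instances)
    return any(share > max_share for share in shares.values())


# Toy data for illustration only, not taken from the real task file.
toy = [{"input": "a", "output": "0"}] * 9 + [{"input": "b", "output": "3"}]
print(label_distribution(toy))
print(is_skewed(toy))
```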
"Nisarg Patel"
],
"Source": [
"https://huggingface.co/datasets/hate_speech18"
Please add the dataset name in the source field of each task. (in addition to the link)
"Categories": [
"Classification"
],
"Definition": "Given a statement, determine if the sentiment is of hatred, no hate,relation or nothing at all.",
Please explain the numbers in the input. Anyone should be able to do the task by just reading the definition. You can ask one of your friends to read the definition and solve one of the instances.
Also, consider adding a space after the comma in "no hate,".
{
"input": "Jeeze its worst than the UK.",
"output": "3",
"explanation": "Its a relative comparision with United Kingdom."
Please correct the typo "Its" to "It's", and replace "United Kingdom" with "the United Kingdom". Consider adding a comma before "and" in the previous explanation.
"Categories": [
"Classification"
],
"Definition": "Given the customer side of conversation, determine what the goal of the customer is.",
Please elaborate more on the definition. For example, you can say that each part of the conversation is indicated with a "\n" and a number, and all the enquires of the customer should be reflected in the output.
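To illustrate the input format that the definition should describe, here is a minimal sketch of how one such input could be split into numbered turns. It assumes every turn has the form "<number>.<utterance>" followed by a newline, as in the examples quoted in this PR. The function name parse_turns is hypothetical.

```python
import re


def parse_turns(conversation):
    """Split a conversation string into a list of (turn_number, utterance)."""
    turns = []
    for line in conversation.splitlines():
        # Each turn is assumed to look like "<number>.<utterance>".
        match = re.match(r"(\d+)\.(.*)", line)
        if match:
            turns.append((int(match.group(1)), match.group(2).strip()))
    return turns


example = (
    "0.noise\n1.cheap restaurant in the east part of town\n"
    "2.phone number\n3.noise\n4.phone number\n5.good bye\n"
)
turns = parse_turns(example)
for number, utterance in turns:
    print(number, utterance)
```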
{
"input": "0.noise\n1.cheap restaurant in the east part of town\n2.phone number\n3.noise\n4.phone number\n5.good bye\n",
"output": "You want to find a cheap restaurant and it should be in the east part of town. You want to know the phone number.",
"explanation": "All the enquires of the customer with every detail is reflected"
Consider adding a period at the end of the sentence.
{
"input": "0.traditional food\n1.traditional\n2.spanish food\n3.spanish\n4.food\n5.address\n6.price range\n7.thank you good bye\n",
"output": "You want to find and it should serve african food. You don't care about the price range. Make sure you get the address, phone number, and area of the venue.",
"explanation": "The food type did not match and the price range was mis calculated."
Please correct the typo "mis calculated" to "miscalculated".
Maybe change the explanation of the example to: "The defined goals do not match the requirements of the customer."
"Categories": [
"Classification"
],
"Definition": "Given the reponses by restaurant system, determine the experience of the customer understanding as strongly agree, agree,slightly agree,slightly disagree and strongly disagree.",
Please correct the typo "reponses" to "responses", consider adding a space after commas, and elaborate more on the definition as explained in the previous task.
I suggest merging strongly agree, agree, and slightly agree into agree and the others into disagree because the border between them might be fuzzy. Hence, average humans might not be able to clearly distinguish them.
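The suggested merge can be sketched as a simple lookup table. This is only an illustration: it covers the five labels listed in the task definition, and the names COARSE_LABEL and coarsen are hypothetical.

```python
# Mapping from the five fine-grained labels in the definition to two classes.
COARSE_LABEL = {
    "strongly agree": "agree",
    "agree": "agree",
    "slightly agree": "agree",
    "slightly disagree": "disagree",
    "strongly disagree": "disagree",
}


def coarsen(label):
    """Map a fine-grained label to "agree" or "disagree"."""
    # Raises KeyError for any label outside the five listed above.
    return COARSE_LABEL[label.strip().lower()]


print(coarsen("Slightly agree"))
print(coarsen("strongly disagree"))
```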
tasks/README.md
Outdated
`task722_DSTC2_classification` | Classify if the speaker is a customer or restaurant system | Classification
`task723_DSTC2_classification` | Determine the customer goals from the given customer side of conversation | Classification
`task724_DSTC2_classification` | Classify the experience of the speaker understanding in terms of strongly agree, agree,slightly agree,slightly disagree and strongly disagree. | Classification
Maybe change to: Given the responses by a restaurant system, classify the experience of the customer understanding.
Please consider rewriting the summaries in this format.
@nisargpatel58, can you address reviewer comments so that I can evaluate?
A kind reminder if you have forgotten this task.
I assume you are not planning to revise this PR. @nisargpatel58
Thanks! Some remaining comments:
- Please change the source fields to something like "Hate Speech Recognition (https://huggingface.co/datasets/hate_speech18)". Since you have used one dataset, the list of sources should have one item.
- Labels in task 721 look skewed towards the "0" class. I guess task 724 has the same problem. Make sure the distribution is not skewed toward a class in all the tasks.
}
],
"Negative Examples": [
{
"input": "0.traditional food\n1.traditional\n2.spanish food\n3.spanish\n4.food\n5.address\n6.price range\n7.thank you good bye\n",
"output": "You want to find and it should serve african food. You don't care about the price range. Make sure you get the address, phone number, and area of the venue.",
- "explanation": "The food type did not match and the price range was mis calculated."
+ "explanation": "The food type did not match and the price range was miscalculated."
},
{
"input": "0.im looking for something in the west side\n1.doesnt matter as long as it is moderately priced\n2.can i have the address of it\n3.whats its phone number\n4.thank you\n5.goodbye\n",
Maybe change the explanation of this example (the second negative example) to: "The defined goals do not match the requirements of the customer."
The data in both of these datasets is skewed by nature. In task 721, most instances fall under class 0; the distribution is roughly 180:20 (class 0 : other classes). Task 724 has the same issue, but I will try to make it a little less skewed.
Thanks! We prefer not to have skewness, even though the data is skewed in nature.
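One common way to remove such skew is to downsample every class to the size of the smallest one. The sketch below is only an assumption about how this could be done, not the procedure used for this PR. It again assumes instances are dicts with "input" and "output" keys, and the function name downsample_to_balance is hypothetical.

```python
import random
from collections import Counter, defaultdict


def downsample_to_balance(instances, seed=0):
    """Keep an equal number of instances per label (size of the rarest label)."""
    if not instances:
        return []
    by_label = defaultdict(list)
    for inst in instances:
        by_label[str(inst["output"])].append(inst)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data mirroring the 180:20 split mentioned above (illustration only).
toy = [{"input": f"a{i}", "output": "0"} for i in range(180)]
toy += [{"input": f"b{i}", "output": "3"} for i in range(20)]
balanced = downsample_to_balance(toy)
print(Counter(inst["output"] for inst in balanced))
```

Downsampling discards majority-class instances, so it trades dataset size for balance; whether that is acceptable depends on how many instances the task needs.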
Last reminder, @nisargpatel58, to update this PR.
@nisargpatel58 Please check the files. I guess you have pushed the first version of the tasks before addressing the comments.
I guess just the source field names were not formatted. All the other changes, for the skewed data and the grammatical errors, have been taken care of.
I saw that you had addressed these comments once, but now it seems that you have pushed the first version. Here are two examples. Would you please recheck the comments?
"Categories": [
"Classification"
],
"Definition": "Given a statement, determine if the sentiment is of hatred, no hate,relation or nothing at all.",
Please explain the numbers in the input. Anyone should be able to do the task by just reading the definition. You can ask one of your friends to read the definition and solve one of the instances.
Also, consider adding a space after the comma in "no hate,".
{
"input": "Jeeze its worst than the UK.",
"output": "3",
"explanation": "Its a relative comparision with United Kingdom."
Please correct the typo "Its" to "It's", and replace "United Kingdom" with "the United Kingdom". Consider adding a comma before "and" in the previous explanation.
grading complete
No description provided.