-
Notifications
You must be signed in to change notification settings - Fork 834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Phi3poc #2301
base: master
Are you sure you want to change the base?
[WIP] Phi3poc #2301
Conversation
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2301 +/- ##
==========================================
- Coverage 84.55% 84.37% -0.18%
==========================================
Files 328 328
Lines 16848 16848
Branches 1513 1513
==========================================
- Hits 14246 14216 -30
- Misses 2602 2632 +30 ☔ View full report in Codecov by Sentry. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
core/src/main/python/synapse/ml/llm/HuggingFaceCausallmTransform.py
Outdated
Show resolved
Hide resolved
core/src/main/python/synapse/ml/llm/HuggingFaceCausallmTransform.py
Outdated
Show resolved
Hide resolved
core/src/main/python/synapse/ml/llm/HuggingFaceCausallmTransform.py
Outdated
Show resolved
Hide resolved
self.config.update(kwargs) | ||
|
||
|
||
def camel_to_snake(text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there might already be one in library to use
"output column", | ||
typeConverter=TypeConverters.toString, | ||
) | ||
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain difference between model params and other params (you can just link to other docs if easier)
typeConverter=TypeConverters.toString, | ||
) | ||
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters") | ||
modelConfig = Param(Params._dummy(), "modelConfig", "Model configuration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain difference between model config and other params (you can just link to other docs if easier)
useFabricLakehouse = Param( | ||
Params._dummy(), | ||
"useFabricLakehouse", | ||
"Use FabricLakehouse", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is for a local cache then you might be able to make the verbage generic like useLocalCache
deviceMap = Param( | ||
Params._dummy(), | ||
"deviceMap", | ||
"device map", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might need to explain a bit more about this param and what it takes
torchDtype = Param( | ||
Params._dummy(), | ||
"torchDtype", | ||
"torch dtype", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise here
def load_model(self): | ||
""" | ||
Loads model and tokenizer either from Fabric Lakehouse or the HuggingFace Hub, | ||
depending on the 'useFabricLakehouse' param. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you name it more generically place that name here
"Use FabricLakehouse", | ||
typeConverter=TypeConverters.toBoolean, | ||
) | ||
lakehousePath = Param( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be able to get rid of earlier param just check if this is None
if self.getUseFabricLakehouse(): | ||
local_path = ( | ||
self.getLakehousePath() or f"/lakehouse/default/Files/{model_name}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch to just use cachePath and then in our docs well say this is a good place to store things
|
||
if self.getUseFabricLakehouse(): | ||
local_path = ( | ||
self.getLakehousePath() or f"/lakehouse/default/Files/{model_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: hf_cache
def __init__( | ||
self, | ||
base_cache_dir="./cache", | ||
base_url="https://mmlspark.blob.core.windows.net/huggingface/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets use
%sh
azcopy cp https://mmlspark.blob.core.windows.net/huggingface/blah /lakehouse/blah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Youy can also put in the mardown cell a little explanation of this and how its just for a speedup otherwise it will download from the huggingface hub
|
||
def _predict_single_chat(self, prompt, model, tokenizer): | ||
param = self.getModelParam().get_param() | ||
chat = [{"role": "user", "content": prompt}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the prompt is a list, then assume its of structure of "chat"
|
||
def _predict_single_chat(self, prompt, model, tokenizer): | ||
param = self.getModelParam().get_param() | ||
chat = [{"role": "user", "content": prompt}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chat = [{"role": "user", "content": prompt}] | |
if isinstance(prompt, list): | |
chat = prompt | |
else: | |
chat = [{"role": "user", "content": prompt}] | |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
Briefly describe the changes included in this Pull Request.
How is this patch tested?
Does this PR change any dependencies?
Does this PR add a new feature? If so, have you added samples on website?
website/docs/documentation
folder.Make sure you choose the correct class
estimators/transformers
and namespace.DocTable
points to correct API link.yarn run start
to make sure the website renders correctly.<!--pytest-codeblocks:cont-->
before each python code blocks to enable auto-tests for python samples.WebsiteSamplesTests
job pass in the pipeline.