[WIP] Phi3poc #2301

Open
wants to merge 14 commits into
base: master

Conversation

JessicaXYWang
Contributor

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Briefly describe the changes included in this Pull Request.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following the steps below.
  1. Find the corresponding markdown file for your new feature in the website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in the markdown file and add another section for your new API, including pyspark, scala (and potentially .NET) samples.
  3. Make sure the DocTable points to the correct API link.
  4. Navigate to the website folder and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each Python code block to enable auto-tests for Python samples.
  6. Make sure the WebsiteSamplesTests job passes in the pipeline.

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.37%. Comparing base (bab6aed) to head (e59a981).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2301      +/-   ##
==========================================
- Coverage   84.55%   84.37%   -0.18%     
==========================================
  Files         328      328              
  Lines       16848    16848              
  Branches     1513     1513              
==========================================
- Hits        14246    14216      -30     
- Misses       2602     2632      +30     

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

self.config.update(kwargs)


def camel_to_snake(text):
Collaborator

There might already be one in a library that we could use.
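
If no suitable library helper turns up, a standard-library version is short. This is only a sketch: the body of the PR's camel_to_snake is not shown in this thread, so the regexes below are an assumption about the intended behavior (camelCase identifiers converted to snake_case).

```python
import re


def camel_to_snake(text):
    """Convert a camelCase identifier to snake_case (assumed behavior)."""
    # First pass: split before a capitalized word, e.g. "HTTPResponse" -> "HTTP_Response".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", text)
    # Second pass: split a lowercase letter or digit from a following uppercase letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


print(camel_to_snake("maxNewTokens"))
print(camel_to_snake("deviceMap"))
```

Input that is already snake_case passes through unchanged, so the helper is safe to apply to mixed sets of names.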

"output column",
typeConverter=TypeConverters.toString,
)
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters")
Collaborator

Maybe explain the difference between model params and other params (you can just link to other docs if that's easier).

typeConverter=TypeConverters.toString,
)
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters")
modelConfig = Param(Params._dummy(), "modelConfig", "Model configuration")
Collaborator

Maybe explain the difference between model config and other params (you can just link to other docs if that's easier).

useFabricLakehouse = Param(
    Params._dummy(),
    "useFabricLakehouse",
    "Use FabricLakehouse",
Collaborator

@mhamilton723 Jan 2, 2025

If this is for a local cache, then you might be able to make the verbiage generic, like useLocalCache.

deviceMap = Param(
    Params._dummy(),
    "deviceMap",
    "device map",
Collaborator

might need to explain a bit more about this param and what it takes

torchDtype = Param(
    Params._dummy(),
    "torchDtype",
    "torch dtype",
Collaborator

likewise here

def load_model(self):
    """
    Loads model and tokenizer either from Fabric Lakehouse or the HuggingFace Hub,
    depending on the 'useFabricLakehouse' param.
Collaborator

If you name it more generically, put that name here.

"Use FabricLakehouse",
typeConverter=TypeConverters.toBoolean,
)
lakehousePath = Param(
Collaborator

Might be able to get rid of the earlier param and just check whether this is None.

Comment on lines 163 to 166
if self.getUseFabricLakehouse():
    local_path = (
        self.getLakehousePath() or f"/lakehouse/default/Files/{model_name}"
    )
Collaborator

Switch to just using cachePath, and then in our docs we'll say this is a good place to store things.
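
One way the single-param version could look, as a rough sketch only. The helper name resolve_model_source and the cache_path argument are hypothetical stand-ins for the suggested cachePath param, not existing APIs in this PR. The idea is that an unset (None) cache path replaces the separate boolean flag and means "load from the hub".

```python
def resolve_model_source(model_name, cache_path=None):
    """Pick where to load the model and tokenizer from.

    Hypothetical helper: cache_path stands in for the suggested cachePath
    param and is not an existing API in this PR.
    """
    if cache_path is not None:
        # A cache location was configured, so load from the local copy.
        return cache_path
    # No cache configured: fall back to the hub model identifier.
    return model_name


# Placeholder model name and path, for illustration only.
print(resolve_model_source("some-org/some-model"))
print(resolve_model_source("some-org/some-model", "/lakehouse/default/Files/some-model"))
```

On Fabric, the docs could then recommend a location under /lakehouse/default/Files/ as a good value for the cache path, instead of hard-coding it as a default.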


if self.getUseFabricLakehouse():
    local_path = (
        self.getLakehousePath() or f"/lakehouse/default/Files/{model_name}"
Collaborator

nit: hf_cache

def __init__(
    self,
    base_cache_dir="./cache",
    base_url="https://mmlspark.blob.core.windows.net/huggingface/",
Collaborator

Let's use

%sh
azcopy cp https://mmlspark.blob.core.windows.net/huggingface/blah /lakehouse/blah

Collaborator

You can also put a little explanation of this in the markdown cell, noting that it's just for a speedup; otherwise it will download from the Hugging Face Hub.


def _predict_single_chat(self, prompt, model, tokenizer):
    param = self.getModelParam().get_param()
    chat = [{"role": "user", "content": prompt}]
Collaborator

If the prompt is a list, then assume it already has the structure of a "chat".


def _predict_single_chat(self, prompt, model, tokenizer):
    param = self.getModelParam().get_param()
    chat = [{"role": "user", "content": prompt}]
Collaborator

@mhamilton723 Jan 2, 2025

Suggested change
- chat = [{"role": "user", "content": prompt}]
+ if isinstance(prompt, list):
+     chat = prompt
+ else:
+     chat = [{"role": "user", "content": prompt}]
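
Pulled out of the method, the same branching can be exercised on its own. This is a sketch under assumptions: normalize_prompt is a hypothetical name (the PR keeps this logic inline in _predict_single_chat), and it trusts that a list input is already a sequence of role/content messages without validating it.

```python
def normalize_prompt(prompt):
    """Return a chat-style message list for either a string or a list prompt.

    Hypothetical helper mirroring the suggested change; not part of the PR.
    """
    if isinstance(prompt, list):
        # Assumed to already be a list of {"role": ..., "content": ...} dicts.
        return prompt
    # A plain string becomes a single user turn.
    return [{"role": "user", "content": prompt}]


print(normalize_prompt("What is Spark?"))
print(normalize_prompt([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "What is Spark?"},
]))
```

Keeping the check as a small function would also make it easy to add validation of the message dicts later, if that turns out to be needed.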

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).


3 participants