feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model) #12632
base: master
Conversation
Force-pushed from a204827 to 98aa10a
Force-pushed from 8e36548 to 51a13d7
Every entity needs container aspects
I tried to separate metadata extraction and workunit generation like this. @hsheth2
left a few additional comments
},
{
  "entityType": "mlModelGroup",
  "entityUrn": "urn:li:mlModelGroup:(urn:li:dataPlatform:vertexai,test-project-id.model.mock_prediction_model_2,PROD)",
this is a pretty critical part of the connector - so we should figure out how to mock and test it, even if it's tricky
2. Download a service account JSON keyfile.
Example credential file:
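The example file content is not shown in the thread; as a reference, a service-account keyfile downloaded from the Google Cloud console generally has this shape (every value below is a placeholder):

```json
{
  "type": "service_account",
  "project_id": "<project-id>",
  "private_key_id": "<private-key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<key>\n-----END PRIVATE KEY-----\n",
  "client_email": "<service-account>@<project-id>.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```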
config:
  project_id: "acryl-poc"
  region: "us-west2"
  # Note that GOOGLE_APPLICATION_CREDENTIALS or credential section below is required for authentication.
Suggested change:
- # Note that GOOGLE_APPLICATION_CREDENTIALS or credential section below is required for authentication.
+ # You must either set GOOGLE_APPLICATION_CREDENTIALS or provide credential as shown below.
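For context, a full recipe using this config might look like the sketch below. The field names under `credential` are assumptions based on the standard service-account keyfile format, and the sink is purely illustrative:

```yaml
source:
  type: vertexai
  config:
    project_id: "acryl-poc"
    region: "us-west2"
    # Either export GOOGLE_APPLICATION_CREDENTIALS, or inline the keyfile:
    credential:
      private_key_id: "<private-key-id>"
      private_key: "<private-key>"
      client_email: "<service-account-email>"
      client_id: "<client-id>"

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```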
super().__init__(**data)

if self.credential:
    self._credentials_path = self.credential.create_credential_temp_file(
do we actually need to create a credentials file?
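One alternative the question points at: assemble the keyfile fields into an in-memory dict instead of writing a temp file. A minimal sketch, with a hypothetical helper name (`credentials_info_from_config`) and the google-auth usage shown only as comments since it needs the library installed:

```python
# Hypothetical helper illustrating the reviewer's point: instead of writing
# the credential to a temp file and pointing GOOGLE_APPLICATION_CREDENTIALS
# at it, the config fields can be assembled into an in-memory dict whose
# keys follow the standard service-account keyfile format.
def credentials_info_from_config(credential: dict) -> dict:
    return {
        "type": "service_account",
        "project_id": credential["project_id"],
        "private_key_id": credential["private_key_id"],
        "private_key": credential["private_key"],
        "client_email": credential["client_email"],
        "client_id": credential.get("client_id", ""),
        "token_uri": "https://oauth2.googleapis.com/token",
    }

# Sketch of downstream use (requires google-auth; not executed here):
#   from google.oauth2 import service_account
#   creds = service_account.Credentials.from_service_account_info(info)
#   aiplatform.init(project=info["project_id"], credentials=creds)

info = credentials_info_from_config(
    {
        "project_id": "acryl-poc",
        "private_key_id": "abc123",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...",
        "client_email": "ingest@acryl-poc.iam.gserviceaccount.com",
    }
)
```

This avoids leaving key material on disk and sidesteps temp-file cleanup.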
)
return job_meta

def _gen_endpoint_mcps(
there could be multiple endpoints, right?
Suggested change:
- def _gen_endpoint_mcps(
+ def _gen_endpoints_mcps(
    job_meta.output_model = model
    job_meta.output_model_version = model_version
except GoogleAPICallError:
    logger.error(
use structured error reporting https://acryldata.notion.site/Error-reporting-in-ingestion-5343cc6ea0c84633b38070d1a409c569?pvs=74
if func_to_mock == "google.cloud.aiplatform.Model.list":
    mock.return_value = gen_mock_models()
these probably should not be in the for loop
instead, do something like this
exit_stack.enter_context(patch("google.cloud.aiplatform.Model.list")).return_value = gen_mock_models()
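The suggested `ExitStack` pattern can be sketched with the standard library alone. `time.time` stands in for a target like `google.cloud.aiplatform.Model.list` (which would need the library installed), and `fake_models` stands in for `gen_mock_models`:

```python
from contextlib import ExitStack
from unittest.mock import patch


def fake_models():
    # Stand-in for gen_mock_models() in the test under review.
    return ["model_a", "model_b"]


with ExitStack() as exit_stack:
    # enter_context registers the patch on the stack, so every mock set up
    # this way stays active for the whole test and is torn down together
    # when the stack closes -- no per-iteration patching inside a for loop.
    mocked = exit_stack.enter_context(patch("time.time"))
    mocked.return_value = fake_models()

    import time
    assert time.time() == ["model_a", "model_b"]
```

Once the `with` block exits, all patches are unwound in reverse order automatically.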
mock.return_value = [mock_automl_job]
elif (
    func_to_mock
    == "datahub.ingestion.source.vertexai.VertexAISource._get_training_job_metadata"
not sure it makes sense to mock this method - we should be mocking whatever it fetches from vertex ai
mock_endpoint.description = "test endpoint"
mock_endpoint.create_time = datetime.now()
mock_endpoint.display_name = "test endpoint display name"
return mock_endpoint
is this stuff copy-pasted from the other test file? if so, we should probably pull it into a vertex_ai_mocks.py that both import
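A minimal sketch of the suggested shared module (the module and function names here are the reviewer's suggestion plus assumptions); both test files would import their fixtures from it instead of duplicating them:

```python
# vertex_ai_mocks.py (hypothetical shared module): factory functions for
# the MagicMock fixtures currently duplicated across the two test files.
from datetime import datetime
from unittest.mock import MagicMock


def gen_mock_endpoint() -> MagicMock:
    mock_endpoint = MagicMock()
    mock_endpoint.description = "test endpoint"
    mock_endpoint.create_time = datetime.now()
    mock_endpoint.display_name = "test endpoint display name"
    return mock_endpoint


# In each test file:
#   from vertex_ai_mocks import gen_mock_endpoint
endpoint = gen_mock_endpoint()
```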
# Run _gen_ml_model_mcps
mcp = [mcp for mcp in source._gen_ml_model_mcps(model_meta)]
assert len(mcp) == 1
assert hasattr(mcp[0], "aspect")
these unit tests need some work
- removing the hasattr calls
- the `for mcp in mcps: if ...` flow does not work. I left a comment about this earlier as well
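One way to tighten such assertions, sketched with stand-in classes (real tests would use `MetadataChangeProposalWrapper` and the concrete aspect classes from `datahub.metadata.schema_classes`): index aspects by type and assert exact values, so a missing aspect fails loudly instead of being skipped by a `hasattr` check or a `for`/`if` scan that matches nothing:

```python
from dataclasses import dataclass


# Stand-ins for an aspect class and MetadataChangeProposalWrapper.
@dataclass
class MLModelProperties:
    name: str


@dataclass
class MCP:
    entityUrn: str
    aspect: object


mcps = [MCP("urn:li:mlModel:x", MLModelProperties(name="mock_model"))]

# Index aspects by type: a missing aspect raises KeyError (a loud failure),
# and values are asserted exactly rather than via hasattr().
aspects = {type(mcp.aspect): mcp.aspect for mcp in mcps}
props = aspects[MLModelProperties]
assert props.name == "mock_model"
```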
Checklist