feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model) #12632
Open: ryota-cloud wants to merge 67 commits into datahub-project:master from ryota-cloud:vertex_src
45ce05e feat(ingestion) Adding vertexAI ingestion source (ryota-cloud)
9a1355d lintfix (ryota-cloud)
04315d4 minor comment change (ryota-cloud)
e3a17b5 minor (ryota-cloud)
2a5ea58 minor change in unit test (ryota-cloud)
3739c20 Adding sources and documents (ryota-cloud)
520eda6 delete unnecessary file (ryota-cloud)
c320a6c fetch list of training jobs (ryota-cloud)
bc9e451 adding comments (ryota-cloud)
960129b feat(ingest): add vertex AI sample data ingestion (ryota-cloud)
95712f5 Update vertexai.py (ryota-cloud)
78d184b added endopint workunit creation and refactored (ryota-cloud)
d746a4c commit temporarily (ryota-cloud)
5fbe0e5 lintfix (ryota-cloud)
9f8e8a3 removing unnecesary commits (ryota-cloud)
85d1830 cleanup recipe (ryota-cloud)
aae6893 minor change in config (ryota-cloud)
764f8fd fixing dataset (ryota-cloud)
29ddcff adding comments for dataset (ryota-cloud)
437e7d2 minor fix (ryota-cloud)
a2a1f0a adding vertex to dev requirements in setup.py (ryota-cloud)
bf869da minor fix (ryota-cloud)
c1f24b7 caching dataset list acquisitions (ryota-cloud)
453688d review comment on dataset (ryota-cloud)
be03cf5 minor chagne (ryota-cloud)
8c76435 change name (ryota-cloud)
33a19c9 lint fix (ryota-cloud)
b76ec25 Refactor code to use auto_workunit (ryota-cloud)
c7d5165 flattern make_vertexai_name (ryota-cloud)
482c159 lint type error is fixed (ryota-cloud)
1032630 adding credentail config (ryota-cloud)
616b76a refactor and changed GCP credential to pass project_id (ryota-cloud)
1dcfce1 Adding more unit test case coverage, fixed lint and test case (ryota-cloud)
f16c8f5 fix platform name (ryota-cloud)
1de43a0 fixed _get_data_process_input_workunit test case (ryota-cloud)
ea577cb Adding subtype and container to dataset and training job (ryota-cloud)
46ff526 fix UI issue on timestamp and refactor (ryota-cloud)
9b6c01e Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
7b0fb70 removed token (ryota-cloud)
cf9c242 Adding integration test for VertexAI (ryota-cloud)
398c380 Adding unit test cases (ryota-cloud)
4703cd9 increasing unit test coverage (ryota-cloud)
63e8e8e Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
ba26abb adding more unit tests (ryota-cloud)
3a85d8a Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
84ebae0 fixed review comments (ryota-cloud)
0b6b7db Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
5472929 fixed review comments, adding unit test cases (ryota-cloud)
0eeeb72 minor change (ryota-cloud)
6c43ecc Change BigQueryCredentail to common function: GCPCredential (ryota-cloud)
d381b9e Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
1f64a95 fixed one unit test case failure, and naming chagne (ryota-cloud)
b559286 Added Enum and refactoring (ryota-cloud)
4edd575 add comment (ryota-cloud)
5765025 fixed review comments (ryota-cloud)
4b09365 delete test case using real model (ryota-cloud)
eb261c3 delete commented out code (ryota-cloud)
e6feb8a consolidate use of auto_workunit and change func output to mcps (ryota-cloud)
a8d7980 Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
b31d0f6 fix comment (ryota-cloud)
99269aa Add POJO for model and change logic of model extraction and mcps crea… (ryota-cloud)
a517173 Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
f900f6d use datetime_to_ts_millis helper (ryota-cloud)
5c46c59 refactored unit test case for better assertion (ryota-cloud)
1772b7e Modified integration test to cover relationship between job to datase… (ryota-cloud)
8e40b7c fix import error in test case (ryota-cloud)
2a91e6d Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp (ryota-cloud)
@@ -0,0 +1,48 @@
Ingesting metadata from Vertex AI requires the **Vertex AI** module.

#### Prerequisites
Please refer to the [Vertex AI documentation](https://cloud.google.com/vertex-ai/docs) for basic information on Vertex AI.

#### Credentials to access GCP
Please read the [GCP docs](https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to) to understand how to set up Application Default Credentials for GCP.

#### Create a service account and assign roles

1. Set up a service account as per the [GCP docs](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console) and assign the required roles to this service account.
2. Download a service account JSON keyfile.
   - Example credential file:

```json
{
  "type": "service_account",
  "project_id": "project-id-1234567",
  "private_key_id": "d0121d0000882411234e11166c6aaa23ed5d74e0",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----",
  "client_email": "test@suppproject-id-1234567.iam.gserviceaccount.com",
  "client_id": "113545814931671546333",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40suppproject-id-1234567.iam.gserviceaccount.com"
}
```

3. To provide credentials to the source, you can either:

   - Set an environment variable:

     ```sh
     $ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
     ```

     _or_

   - Set the credential config in your source, using the values from the credential JSON file. For example:

     ```yml
     credential:
       private_key_id: "d0121d0000882411234e11166c6aaa23ed5d74e0"
       private_key: "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----\n"
       client_email: "test@suppproject-id-1234567.iam.gserviceaccount.com"
       client_id: "113545814931671546333"
     ```
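A quick check on a downloaded keyfile can catch a missing field before ingestion runs. The sketch below is not part of this PR: `missing_keyfile_fields` and `load_keyfile_from_env` are hypothetical helpers, and the required-field list is simply the four fields that the `credential` example in this doc sets explicitly.

```python
import json
import os
from typing import Any, Dict, List, Mapping, Optional

# Assumption: these are the fields worth checking, taken from the
# `credential` example in the doc rather than from any official schema.
REQUIRED_FIELDS = ("private_key_id", "private_key", "client_email", "client_id")


def missing_keyfile_fields(keyfile: Mapping[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty in a keyfile dict."""
    return [name for name in REQUIRED_FIELDS if not keyfile.get(name)]


def load_keyfile_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Load the JSON keyfile that GOOGLE_APPLICATION_CREDENTIALS points to."""
    env = os.environ if environ is None else environ
    path = env["GOOGLE_APPLICATION_CREDENTIALS"]  # raises KeyError if unset
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)
```

Calling `missing_keyfile_fields(load_keyfile_from_env())` returns an empty list when all four fields are present.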
16 changes: 16 additions & 0 deletions
metadata-ingestion/docs/sources/vertexai/vertexai_recipe.yml
@@ -0,0 +1,16 @@
source:
  type: vertexai
  config:
    project_id: "acryl-poc"
    region: "us-west2"
    # Note: either the GOOGLE_APPLICATION_CREDENTIALS environment variable or the credential section below is required for authentication.
    # credential:
    #   private_key: '-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n'
    #   private_key_id: "project_key_id"
    #   client_email: "client_email"
    #   client_id: "client_id"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
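The note in the recipe says authentication needs either `GOOGLE_APPLICATION_CREDENTIALS` or a `credential` section. A pre-flight check along those lines might look like the following. `has_gcp_auth` is a hypothetical helper written for illustration, not something this PR adds, and the recipe is represented as the plain dict a YAML loader would produce.

```python
import os
from typing import Any, Mapping, Optional


def has_gcp_auth(
    recipe: Mapping[str, Any], environ: Optional[Mapping[str, str]] = None
) -> bool:
    """True if the recipe has a credential section or the env var is set."""
    env = os.environ if environ is None else environ
    source_config = recipe.get("source", {}).get("config", {})
    return bool(source_config.get("credential")) or bool(
        env.get("GOOGLE_APPLICATION_CREDENTIALS")
    )


# The recipe from the diff, with the credential section left commented out.
recipe = {
    "source": {
        "type": "vertexai",
        "config": {"project_id": "acryl-poc", "region": "us-west2"},
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}

# With no credential section, the result depends entirely on the environment.
print(has_gcp_auth(recipe, environ={}))  # prints False
```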
53 changes: 53 additions & 0 deletions
metadata-ingestion/src/datahub/ingestion/source/common/gcp_credentials_config.py
@@ -0,0 +1,53 @@
import json
import tempfile
from typing import Any, Dict, Optional

from pydantic import Field, root_validator

from datahub.configuration import ConfigModel
from datahub.configuration.validate_multiline_string import pydantic_multiline_string


class GCPCredential(ConfigModel):
    project_id: Optional[str] = Field(description="Project id to set the credentials")
    private_key_id: str = Field(description="Private key id")
    private_key: str = Field(
        description="Private key in a form of '-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n'"
    )
    client_email: str = Field(description="Client email")
    client_id: str = Field(description="Client Id")
    auth_uri: str = Field(
        default="https://accounts.google.com/o/oauth2/auth",
        description="Authentication uri",
    )
    token_uri: str = Field(
        default="https://oauth2.googleapis.com/token", description="Token uri"
    )
    auth_provider_x509_cert_url: str = Field(
        default="https://www.googleapis.com/oauth2/v1/certs",
        description="Auth provider x509 certificate url",
    )
    type: str = Field(default="service_account", description="Authentication type")
    client_x509_cert_url: Optional[str] = Field(
        default=None,
        description="If not set it will be default to https://www.googleapis.com/robot/v1/metadata/x509/client_email",
    )

    _fix_private_key_newlines = pydantic_multiline_string("private_key")

    @root_validator(skip_on_failure=True)
    def validate_config(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        if values.get("client_x509_cert_url") is None:
            values["client_x509_cert_url"] = (
                f"https://www.googleapis.com/robot/v1/metadata/x509/{values['client_email']}"
            )
        return values

    def create_credential_temp_file(self, project_id: Optional[str] = None) -> str:
        configs = self.dict()
        if project_id:
            configs["project_id"] = project_id
        with tempfile.NamedTemporaryFile(delete=False) as fp:
            cred_json = json.dumps(configs, indent=4, separators=(",", ": "))
            fp.write(cred_json.encode())
            return fp.name
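To see roughly what `create_credential_temp_file` ends up writing without installing DataHub, the standalone sketch below mirrors the class's logic using only the standard library. `write_keyfile` is a hypothetical stand-in, not code from this PR, and the newline handling is an assumption about what `pydantic_multiline_string` does (turning literal `\n` sequences into real newlines); check that helper's source before relying on it.

```python
import json
import os
import tempfile
from typing import Dict, Optional


def write_keyfile(credential: Dict[str, str], project_id: Optional[str] = None) -> str:
    """Write a service-account style JSON keyfile and return its path."""
    configs = {
        # Defaults copied from the GCPCredential fields in the diff.
        "type": "service_account",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        **credential,
    }
    # Assumed behaviour of pydantic_multiline_string: unescape literal "\n".
    configs["private_key"] = configs["private_key"].replace("\\n", "\n")
    # Same fallback as validate_config in the diff.
    if configs.get("client_x509_cert_url") is None:
        configs["client_x509_cert_url"] = (
            "https://www.googleapis.com/robot/v1/metadata/x509/"
            + configs["client_email"]
        )
    if project_id:
        configs["project_id"] = project_id
    with tempfile.NamedTemporaryFile(delete=False) as fp:
        fp.write(json.dumps(configs, indent=4).encode())
    return fp.name


# Placeholder values taken from the commented-out recipe section.
path = write_keyfile(
    {
        "private_key_id": "project_key_id",
        "private_key": "-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n",
        "client_email": "client_email",
        "client_id": "client_id",
    },
    project_id="acryl-poc",
)
# One way to hand the file to Google client libraries is the ADC variable.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
```

Because the file is created with `delete=False`, it stays on disk after the `with` block exits, so the caller is responsible for removing it once the credentials are no longer needed.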
this should be nested under list item 2 - not dedented
Fixed. The BigQuery doc probably needs the same fix.
not fixed; this code block should be indented, similar to how the code blocks below are indented