- Docker
Note that the port mapping will conflict with running make test
- Redis
docker run --name dev-redis -d -p 6379:6379 redis
- Install Go 1.11 and Python 3.7
- Install the Go packages via dep
dep ensure
- Install the tesseract OCR framework.
- Build and install re2 (optional; Presidio will use regex instead of pyre2 if re2 is not installed)
re2_version="2018-12-01"
wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz
mkdir re2
tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1
cd re2 && make install
- Install pipenv
Pipenv is a Python workflow manager that handles dependencies and virtual environments for Python packages. Presidio's Analyzer project uses it as its dependency manager.
Using pip:
pip3 install --user pipenv
Or, using Homebrew:
brew install pipenv
Additional installation instructions: https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv
- Create a virtualenv for the project and install all requirements in the Pipfile, including dev requirements. To install the Python packages for the analyzer, run the following in the presidio-analyzer folder:
pipenv install --dev --sequential
- Run all tests
pipenv run pytest
- To run arbitrary scripts within the virtual env, start the command with pipenv run. For example:
pipenv run flake8 analyzer --exclude "*pb2*.py"
pipenv run pylint analyzer
pipenv run pip freeze
- Start a shell:
pipenv shell
- Run commands in the shell:
pytest
pylint analyzer
pip freeze
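The optional re2 step in the list above implies a fallback between regex engines. The sketch below shows one way such a fallback can be written; it is not taken from Presidio's code. The function name `load_regex_engine` is hypothetical, and the final fallback to the standard-library `re` module is an assumption added so the example runs even when neither pyre2 nor regex is installed.

```python
import importlib


def load_regex_engine(candidates=("re2", "regex", "re")):
    """Return the first importable regex module from candidates.

    "re2" is the module installed by pyre2, "regex" is the third-party
    regex package, and "re" (standard library) is an assumed last resort.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no regex engine available: " + ", ".join(candidates))


engine = load_regex_engine()
# A simple pattern that all three engines accept.
pattern = engine.compile(r"[A-Z]{2}\d{6}")
print(engine.__name__, bool(pattern.search("license AC432223")))
```

Which engine name is printed depends on what is installed in the active virtualenv.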
Presidio leverages protobuf to create API classes and services across multiple environments. The proto files are stored in a separate GitHub repo, presidio-genproto.
Follow these steps to change Presidio's API:
- Fork the presidio-genproto repo into YOUR_ORG/presidio-genproto
- Clone the repo into the $GOPATH/src/github.com/YOUR_ORG/presidio-genproto folder
- Make the desired changes to the .proto files in /src
- Make sure you have protobuf installed
- Generate the Go and Python files. Run the following commands in the src folder of presidio-genproto:
python -m grpc_tools.protoc -I . --python_out=../python --grpc_python_out=../python ./*.proto
protoc -I . --go_out=plugins=grpc:../golang ./*.proto
- Copy all the files in the python folder into presidio-analyzer/analyzer. All generated files end with *pb2.py or *pb2_grpc.py
- Change the constraint in Gopkg.toml that points to the location of presidio-genproto
From:
[[constraint]]
branch = "master"
name = "github.com/Microsoft/presidio-genproto"
To:
[[constraint]]
branch = "YOUR_GENPROTO_BRANCH"
name = "github.com/YOUR_ORG/presidio-genproto"
- Update Gopkg.lock by calling dep ensure or dep ensure --update github.com/YOUR_ORG/presidio-genproto
- Push all the changes (generated Python files, Gopkg.toml and Gopkg.lock) into your presidio repo
For more info, see https://grpc.io/docs/tutorials/basic/python.html
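The copy step above can be scripted so that only generated files are moved. The following is a minimal sketch, assuming the two folders are passed in by the caller; the helper name `copy_generated_files` and the example paths in the comment are hypothetical and not part of the Presidio repo.

```python
import shutil
from pathlib import Path


def copy_generated_files(python_dir, analyzer_dir):
    """Copy *pb2.py and *pb2_grpc.py files; return the sorted names copied."""
    src, dst = Path(python_dir), Path(analyzer_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    # The two patterns do not overlap: "*pb2.py" does not match "*pb2_grpc.py".
    for pattern in ("*pb2.py", "*pb2_grpc.py"):
        for path in src.glob(pattern):
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return sorted(copied)


# Example call (hypothetical checkout locations):
# copy_generated_files("presidio-genproto/python",
#                      "presidio/presidio-analyzer/analyzer")
```

Restricting the copy to the two suffixes keeps unrelated files in the python folder out of the analyzer package.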
- Build the bins with
make build
- Build the base containers with
make docker-build-deps DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL}
(If you do not specify a valid, logged-in registry, a warning is echoed to standard output.)
- Build the Docker image with
make docker-build DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL} PRESIDIO_LABEL=${PRESIDIO_LABEL}
- Push the Docker images with
make docker-push DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_LABEL=${PRESIDIO_LABEL}
- Run the tests with
make test
- After adding a Go file, run
make go-format
before running and building the service.
- Run functional tests with
make test-functional
- Updating python dependencies instructions
- GRPC_PORT: 3001, GRPC listen port (matches the analyzer address below)
- GRPC_PORT: 3002, GRPC listen port (matches the anonymizer address below)
- WEB_PORT: 8080, HTTP listen port
- REDIS_URL: localhost:6379, Redis address (optional)
- ANALYZER_SVC_ADDRESS: localhost:3001, Analyzer address
- ANONYMIZER_SVC_ADDRESS: localhost:3002, Anonymizer address
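To illustrate how a service might consume the variables listed above, here is a minimal sketch that reads them from the environment. Treating the listed values as fallbacks is an assumption, and the names `FALLBACKS`, `load_settings` and `split_address` are hypothetical; none of this is taken from Presidio's code.

```python
import os

# Values copied from the list above; using them as fallbacks is an assumption.
FALLBACKS = {
    "WEB_PORT": "8080",
    "REDIS_URL": "localhost:6379",
    "ANALYZER_SVC_ADDRESS": "localhost:3001",
    "ANONYMIZER_SVC_ADDRESS": "localhost:3002",
}


def load_settings(environ=None):
    """Read the variables from environ (os.environ when None is given)."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, value) for name, value in FALLBACKS.items()}
    settings["WEB_PORT"] = int(settings["WEB_PORT"])
    return settings


def split_address(address):
    """Split "host:port" into a (host, port) tuple with an integer port."""
    host, _, port = address.rpartition(":")
    return host, int(port)
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to check without changing the real environment.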
Run the core services Presidio needs locally (the commands assume a Docker network named testnetwork already exists):
docker run --rm --name test-redis --network testnetwork -d -p 6379:6379 redis
docker run --rm --name test-presidio-anonymizer --network testnetwork -d -p 3001:3001 -e GRPC_PORT=3001 mcr.microsoft.com/presidio-anonymizer:latest
docker run --rm --name test-presidio-recognizers-store --network testnetwork -d -p 3004:3004 -e GRPC_PORT=3004 -e REDIS_URL=test-redis:6379 mcr.microsoft.com/presidio-recognizers-store:latest
Navigate to <Presidio folder>\presidio-analyzer\
Install the Python packages if you haven't done so yet:
pipenv install --dev --sequential
To simply run unit tests, execute:
pipenv run pytest --log-cli-level=0
If you want to experiment with analyze requests, navigate into the analyzer folder and start serving the analyzer service:
pipenv run python __main__.py serve --grpc-port 3000
In a new pipenv shell window you can run analyze requests, for example:
pipenv run python __main__.py analyze --text "John Smith drivers license is AC432223" --fields "PERSON" "US_DRIVER_LICENSE" --grpc-port 3000
- Edit post.lua. Change the template name
- Run wrk
wrk -t2 -c2 -d30s -s post.lua http://<api-service-address>/api/v1/projects/<my-project>/analyze
- If deploying from a private registry, verify that Kubernetes has access to the Docker Registry.
- If using a Kubernetes secret to manage the registry authentication, make sure it is registered under the 'presidio' namespace
Edit charts/presidio/values.yaml to:
- Set up the secret name (for private registries)
- Change presidio services version
- Change default scale