
Development

Setting up the environment - Golang

  1. Docker

     Note that the port mapping will conflict with running make test.

       1. Redis

            docker run --name dev-redis -d -p 6379:6379 redis
  2. Install go 1.11 and Python 3.7

  3. Install the golang packages via dep

    dep ensure
  4. Install tesseract OCR framework.
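Because the Redis port mapping can clash with make test, it helps to check whether something is already listening on a port before starting containers or tests. Below is a minimal sketch that uses only Python's standard socket module. The helper name is_port_open is illustrative and is not part of Presidio.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: nothing is accepting connections there
        return False
```

For example, is_port_open("localhost", 6379) should return True once the dev-redis container is up. If it returns True before you have started anything, another process already holds the port.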

Setting up the environment - Python

  1. Build and install re2 (optional: if re2 is not installed, Presidio uses regex instead of pyre2)

    re2_version="2018-12-01"
    wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz
    mkdir re2 
    tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1
    cd re2 && make install
  2. Install pipenv

    Pipenv is a Python workflow manager that handles dependencies and virtual environments for Python packages. Presidio's Analyzer project uses it as its dependency manager.

    Using Pip3:

    pip3 install --user pipenv
    

    Using Homebrew:

    brew install pipenv
    

    Additional installation instructions: https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv

  3. Create a virtualenv for the project and install all the requirements in the Pipfile, including dev requirements. In the presidio-analyzer folder, run:

    pipenv install --dev --sequential
    
  4. Run all tests

    pipenv run pytest
    
  5. To run arbitrary scripts within the virtual env, start the command with pipenv run. For example:

    1. pipenv run flake8 analyzer --exclude "*pb2*.py"
    2. pipenv run pylint analyzer
    3. pipenv run pip freeze

Alternatively, activate the virtual environment and use the commands by starting a pipenv shell:

  1. Start shell:

    pipenv shell
    
  2. Run commands in the shell

    pytest
    pylint analyzer
    pip freeze
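The optional re2 step at the start of this section depends on a fallback: when the pyre2 bindings cannot be imported, a regular-expression module that is always available is used instead. The sketch below illustrates the common try/except ImportError pattern for an optional dependency like this. It is not Presidio's own code. It uses the standard library re module as the fallback, which may differ from the module Presidio actually falls back to, and find_all is a hypothetical helper.

```python
# Optional-dependency fallback: prefer the pyre2 bindings, else the stdlib `re` module.
try:
    import re2 as regex_engine  # importable only when re2 and its Python bindings are installed

    ENGINE_NAME = "re2"
except ImportError:
    import re as regex_engine  # always available in the standard library

    ENGINE_NAME = "re"


def find_all(pattern: str, text: str) -> list:
    """Return all non-overlapping matches of `pattern` in `text` using whichever engine loaded."""
    return regex_engine.findall(pattern, text)
```

Inside the virtual env, pipenv run python -c "import re2" fails with ImportError when re2 is absent, which is a quick way to tell which branch would be taken.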
    

Changing Presidio's API

Presidio leverages protobuf to create API classes and services across multiple environments. The proto files are stored in a separate GitHub repo, presidio-genproto.

Follow these steps to change Presidio's API:

  1. Fork the presidio-genproto repo into YOUR_ORG/presidio-genproto

  2. Clone the repo into the $GOPATH/src/github.com/YOUR_ORG/presidio-genproto folder

  3. Make the desired changes to the .proto files in /src

  4. Make sure you have protobuf installed

  5. Generate the Go and Python files. Run the following commands in the src folder of presidio-genproto:

    python -m grpc_tools.protoc -I . --python_out=../python --grpc_python_out=../python ./*.proto
    
    protoc -I . --go_out=plugins=grpc:../golang ./*.proto
  6. Copy all the files in the python folder into presidio-analyzer/analyzer. All generated files end with *pb2.py or *pb2_grpc.py

  7. Change the constraint in Gopkg.toml that points to the location of presidio-genproto. From:

    [[constraint]]
      branch = "master"
      name = "github.com/Microsoft/presidio-genproto"

     To:

    [[constraint]]
      branch = "YOUR_GENPROTO_BRANCH"
      name = "github.com/YOUR_ORG/presidio-genproto"

  8. Update Gopkg.lock by calling dep ensure or dep ensure --update github.com/YOUR_ORG/presidio-genproto

  9. Push all the changes (generated Python files, Gopkg.toml and Gopkg.lock) into your presidio repo.

For more info, see https://grpc.io/docs/tutorials/basic/python.html
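Step 6 (copying the generated Python modules into the analyzer) can be scripted. Below is a minimal sketch using glob and shutil. It assumes the generated files sit directly in the python folder of presidio-genproto and that you pass the path of presidio-analyzer/analyzer as the destination. copy_generated_files is a hypothetical helper, not part of either repo.

```python
import glob
import os
import shutil


def copy_generated_files(src_dir: str, dst_dir: str) -> list:
    """Copy generated protobuf/gRPC modules (*pb2.py, *pb2_grpc.py) from src_dir into dst_dir.

    Returns the sorted names of the copied files.
    """
    copied = []
    # "*pb2.py" does not match "*_pb2_grpc.py", so each file is copied once
    for pattern in ("*pb2.py", "*pb2_grpc.py"):
        for path in glob.glob(os.path.join(src_dir, pattern)):
            shutil.copy2(path, dst_dir)  # copy2 also preserves file timestamps
            copied.append(os.path.basename(path))
    return sorted(copied)
```

For example, copy_generated_files("presidio-genproto/python", "presidio/presidio-analyzer/analyzer") (adjust both paths to your checkout) copies only the generated modules and leaves any other files in the folder alone.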

Development notes

  • Build the bins with make build
  • Build the base containers with make docker-build-deps DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL} (if you do not specify a valid registry that you are logged in to, a warning is printed to standard output)
  • Build the Docker image with make docker-build DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL} PRESIDIO_LABEL=${PRESIDIO_LABEL}
  • Push the Docker images with make docker-push DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_LABEL=${PRESIDIO_LABEL}
  • Run the tests with make test
  • After adding a Go file, run make go-format before building and running the service.
  • Run functional tests with make test-functional
  • Updating python dependencies instructions

Set the following environment variables:

presidio-analyzer

  • GRPC_PORT: 3001 GRPC listen port

presidio-anonymizer

  • GRPC_PORT: 3002 GRPC listen port

presidio-api

  • WEB_PORT: 8080 HTTP listen port
  • REDIS_URL: localhost:6379, Optional: Redis address
  • ANALYZER_SVC_ADDRESS: localhost:3001, Analyzer address
  • ANONYMIZER_SVC_ADDRESS: localhost:3002, Anonymizer address
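The services read these variables themselves. For local Python helper scripts that need the same settings (for example, a script that calls the API), the sketch below reads the presidio-api variables and falls back to the values listed above. The function name load_api_settings and the choice to leave a missing REDIS_URL as None, because it is optional, are assumptions of this sketch.

```python
import os
from typing import Mapping, Optional


def load_api_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the presidio-api variables, falling back to the values listed above."""
    env = os.environ if env is None else env
    return {
        "WEB_PORT": int(env.get("WEB_PORT", "8080")),
        # REDIS_URL is optional, so a missing value stays None instead of getting a default
        "REDIS_URL": env.get("REDIS_URL"),
        "ANALYZER_SVC_ADDRESS": env.get("ANALYZER_SVC_ADDRESS", "localhost:3001"),
        "ANONYMIZER_SVC_ADDRESS": env.get("ANONYMIZER_SVC_ADDRESS", "localhost:3002"),
    }
```

Passing an explicit mapping instead of relying on os.environ keeps the helper easy to test.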

Developing only for Presidio Analyzer in a Windows environment

Run the core services that Presidio needs locally:

docker network create testnetwork
docker run --rm --name test-redis --network testnetwork -d -p 6379:6379 redis
docker run --rm --name test-presidio-anonymizer --network testnetwork -d -p 3001:3001 -e GRPC_PORT=3001 mcr.microsoft.com/presidio-anonymizer:latest
docker run --rm --name test-presidio-recognizers-store --network testnetwork -d -p 3004:3004 -e GRPC_PORT=3004 -e REDIS_URL=test-redis:6379 mcr.microsoft.com/presidio-recognizers-store:latest

Navigate to <Presidio folder>\presidio-analyzer\

Install the Python packages if you haven't done so yet:

pipenv install --dev --sequential

To simply run unit tests, execute:

pipenv run pytest --log-cli-level=0

If you want to experiment with analyze requests, navigate into the analyzer folder and start serving the analyzer service:

pipenv run python __main__.py serve --grpc-port 3000

In a new pipenv shell window you can run analyze requests, for example:

pipenv run python __main__.py analyze --text "John Smith drivers license is AC432223" --fields "PERSON" "US_DRIVER_LICENSE" --grpc-port 3000

Load test

  1. Edit post.lua and change the template name.

  2. Run wrk

    wrk -t2 -c2 -d30s -s post.lua http://<api-service-address>/api/v1/projects/<my-project>/analyze
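Where wrk is not available, a rough substitute for a quick smoke test is a small script that sends concurrent POSTs and tallies the status codes. The sketch below uses only the standard library (urllib and concurrent.futures). It is not a benchmark: unlike wrk, it reports no latency or throughput figures. The request body lives in post.lua, which this document does not show, so pass your own payload. post_once and run_load are hypothetical names.

```python
import concurrent.futures
import urllib.error
import urllib.request


def post_once(url: str, body: bytes, timeout: float = 10.0) -> int:
    """Send one POST with a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; record the status instead of aborting the run
        return err.code


def run_load(url: str, body: bytes, workers: int = 2, total_requests: int = 20) -> dict:
    """Send total_requests POSTs across `workers` threads; return {status_code: count}."""
    counts = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(post_once, url, body) for _ in range(total_requests)]
        for future in concurrent.futures.as_completed(futures):
            status = future.result()  # connection errors propagate from here
            counts[status] = counts.get(status, 0) + 1
    return counts
```

For example, run_load("http://<api-service-address>/api/v1/projects/<my-project>/analyze", payload, workers=2, total_requests=100) roughly mirrors the two connections used by the wrk command above.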

Running in kubernetes

  1. If deploying from a private registry, verify that Kubernetes has access to the Docker Registry.

  2. If using a Kubernetes secret to manage the registry authentication, make sure it is registered under the 'presidio' namespace.

Further configuration

Edit charts/presidio/values.yaml to:

  • Set up the secret name (for private registries)
  • Change the Presidio services version
  • Change the default scale