This repository was archived by the owner on Jan 29, 2026. It is now read-only.
adding archive note #201
Open
jamaya2001 wants to merge 88 commits into IBM:master from jamaya2001:patch-1
[don't merge yet] Fix package names and travis config
Update instructions with Cloud Object Storage
* arch diag * arch diag * Update README.md * adding specs * adding specs
* update none VM_TYPE * polish export commands
* Extend make test-submit waiting time
…BM#22) * assign default edit role to lcm * add helm value options for 1.7 and below
* adding prereqs and bumping user guide in front * adding prereqs and bumping user guide in front
…ors) (IBM#24) * add caffe2 and pytorch cpu support * update LCM, learner config file, and example jobs * fix pytorch example bug * Update gpu-guide.md * Update gpu-guide.md * merge CPU and GPU examples into a single example * add more tf framework versions * fix typo * add S3 prereq
* update UI instructions * fix command
* adding contributors * Update README.md
* Updating maintainers file * Update MAINTAINERS.md
* add converting script * update converter readme and update tensorflow version * update troubleshooting * Update README.md * Update gpu-guide.md * Update README.md * Update README.md * Update README.md
* Adding references to Watson Studio * Update README.md * Rename README.md to ffdl-wml.md * Update README.md * Create train-deploy-wml.md * Update train-deploy-wml.md * Update README.md * Update ffdl-wml.md * Update ffdl-wml.md * Update ffdl-wml.md * Update README.md * Update README.md * update WML instructions * revert tf example * update caffe manifest
* Update feature-gates for k8s 1.9.4 and above * Update troubleshooting * Update README.md
…ild (IBM#51) * * Add codebase configuration for device plugin and custom learner images * Add developer guide for those who want to do a custom FfDL build * update developer-guide * fix declare type
Plus minor fixes
* Creating CLA * Update CLA.md * Update CONTRIBUTING.md
* Remove 4 minute timeout for log follow process (IBM#106)

The process that follows the training logs of an ongoing training job should not time out after 4 minutes. Instead, the log follow process should complete after the training job itself is finished. This behavior is necessary to enable chaining commands to create machine learning pipelines, where subsequent commands require the output data of the training job whose logs are being "followed", as in our ART notebook. This commit reinstates the log follow behavior from before the merge of PR IBM#79.

* Updates suggested by sboagibm

The intention was to not rely on a long-term stream being held open, but to be able to re-open a new stream starting from where the old one left off if the connection terminates.
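The re-open behavior described in this commit message can be illustrated with a minimal Python sketch. It is not the FfDL implementation: `follow_logs`, `open_stream`, `job_finished`, and `StreamClosed` are hypothetical names introduced here, and the sketch assumes the log source can be re-opened at a given line offset.

```python
class StreamClosed(Exception):
    """Hypothetical error raised when the log connection drops."""


def follow_logs(open_stream, job_finished, max_reconnects=100):
    """Yield log lines until the job is finished, resuming after disconnects.

    open_stream(position) is assumed to return an iterator of log lines
    starting at line offset `position`; job_finished() is assumed to
    report whether the training job has completed.
    """
    position = 0      # number of lines already delivered to the caller
    reconnects = 0
    while True:
        try:
            for line in open_stream(position):
                position += 1
                yield line
        except StreamClosed:
            # Connection dropped: fall through and re-open from `position`.
            pass
        else:
            # Stream ended cleanly: stop only once the job itself is done.
            if job_finished():
                return
        reconnects += 1
        if reconnects > max_reconnects:
            raise RuntimeError("too many reconnect attempts")
```

The caller never sees a gap or a repeated line, because each new stream starts at the count of lines already yielded rather than at the beginning.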
* Update ART Notebook after PR IBM#79
  - Load cluster configuration from environment variables
  - Require PUBLIC_IP and KUBECONFIG instead of CLUSTER_NAME and VM_TYPE
  - Use storage type "mount_cos" (s3fs) instead of "s3_datastore"
* Update ART demo notebook after PR IBM#79
  - Load cluster configuration from environment variables
  - Require PUBLIC_IP and KUBECONFIG instead of CLUSTER_NAME and VM_TYPE
  - Use storage type "mount_cos" (s3fs) instead of "s3_datastore"
* update dl framework versions * update examples with new framework tags
* update fashion mnist example with seldon 0.2 * fix readme
* Pointed travis testing to do hostmount minikube * Debugging permissions error. * Fix to mkdir problems. * Fixed Makefile syntax. * Printing debugging information about pods. * Printing debugging information about pods. * Printing debugging information about pods. * Printing debugging information incl kubectl get pod. * Enabled debug mode. * Again. * Set debug as default. * tracing from the trainer to lcm * more debugging * added lower level logging * dist: xenial * Update .travis.yml * fix typo * Trying to fix Travis issue. * Fixed Travis issue. * Followed Tommy's request and increased resource limits to values from before. Might break CI. * Parameterized memory values like Tommy requested. * Attempt to fix CI. * Removed excessive debug statements and cleaned comments. Probably breaks code. * DLaaS pull june 14, with security mods * fixed glide problem * Added Image.go etc. files, deleted learner_test.go * temporarily disable framework validation * FIXME: Disable validation check for bucket until conditionalize for s3fs vs. option. * fixed two bugs related to volume mounting * I think mostly just logging changes * basic success * Add FfDL.iml to .gitignore * removed docker ref to csf_env.properties * Test for mount_cos before attempting s3 validation * fixed hostmount by pre-setup of model code in Makefile * fixed missing import * log HELM_DEPLOY_DIR, add a bunch of logging for the ci test * Added create-volumes to jenkins file, more verbose docker build for ui * Wound back Angular to 6.0.8 * Quiet docker-build-ui docker build * merged bin/create_static_volumes_config2.sh into bin/create_static_volumes_config.sh
* update prebuild image version, update helm chart to 0.1.1 * fix make deploy bug
…ce (IBM#110) * make helm charts and scripts compatible to deploy FfDL on any namespace * allow users to export all the environment variables in a txt file * Update readme with new notice * Fix typo * Update static volumes config v2 namespace parameter * capitalize NAMESPACE, update Makefile, developer guide, and troubleshooting.
LGTM. Ran fine / fixed statsd issue on Ubuntu 18.04 Vagrant VM.
* Simplifying README * Simplifying README * Create detailed-installation-instructions.md * Update README.md * Update detailed-installation-instructions.md * Update and rename detailed-installation-instructions.md to detailed-installation-guide.md * Update detailed-installation-guide.md * Update detailed-installation-guide.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md
* added pytorch distributed example draft * use pytorch community image * added experimental pytorch example * update launch example * add onnx example * update c10d dist example * update cuda method for gpu and group definition for c10d * update c10d to be based on SEP 11 build on PyTorch master. * update custom pytorch version name * update data parallelism examples * update distributed training examples * update pytorch c10d examples * delete dummy file * fix minor bugs * update example to sep 25 build * update readme * update changes for distributed CPU job * update multi-gpu code * update multi-gpu code * update world_size for multi-gpu scenario * update world_size for multi-gpu scenario * add seldon ngraph example, update data parallelism with multi gpu * remove unnecessary device code * added pytorch mpi core changes and seldon readme * added pytorch mpi core changes and seldon readme * update readme * update example readme and remove old distributed example * update c10d-paralleism example naming and readme. * update readme with better naming and consistency
* Create PyTorch.md * Update PyTorch.md * Update PyTorch.md * Update PyTorch.md * Update PyTorch.md * arch-image * Update PyTorch.md * Adding temporary linkage to PyTorch 1.0 * Fix broken link for Seldon example * Correcting Horovod naming
* Travis CI: lint Python for syntax errors and undefined names

In Travis CI, add a Python linting step that runs [flake8](http://flake8.pycqa.org) to find syntax errors and undefined names.

[flake8](http://flake8.pycqa.org) testing of https://github.com/IBM/FfDL on Python 3.7.0

$ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__
```
./etc/examples/c10d-onnx-mpi/model-files/train_dist_onnx_mpi.py:93:55: F821 undefined name 'bsz'
    num_batches = ceil(len(train_set.dataset) / float(bsz))
                                                      ^
./etc/examples/c10d-dist-onnx/model-files/train_dist_onnx.py:121:14: E999 SyntaxError: positional argument follows keyword argument
             ', epoch ', epoch, '. avg_loss: ',
             ^
1     E999 SyntaxError: positional argument follows keyword argument
1     F821 undefined name 'bsz'
2
```
__E901,E999,F821,F822,F823__ are the "_showstopper_" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. Most other flake8 issues are merely "style violations" -- useful for readability, but they do not affect runtime safety.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable name referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree

* Undefined name: bsz --> batch_size
* Fix syntax error: print() does not accept a 'rank=' parameter
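The two issue classes reported in this commit message can be reproduced in isolation. The sketch below is an illustration only: the `print` statements are hypothetical reconstructions (the original call is only partially visible in the flake8 output), and `compiles` and `num_batches` are helper names introduced here rather than functions from the FfDL examples.

```python
from math import ceil

# Hypothetical reconstruction of the E999 case: a positional argument
# after a keyword argument is rejected when the file is compiled.
BROKEN_PRINT = "print('Rank ', rank=0, ', epoch ', 1, '. avg_loss: ', 0.5)"
FIXED_PRINT = "print('Rank ', 0, ', epoch ', 1, '. avg_loss: ', 0.5)"


def compiles(source):
    """Return True if `source` parses and compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


def num_batches(dataset_len, batch_size):
    """F821 fix in spirit: use the defined name batch_size, not bsz."""
    return ceil(dataset_len / float(batch_size))
```

An E999-class error prevents the whole module from loading, while an F821-class error only surfaces as a `NameError` when the offending line runs, which is why both are worth catching in CI before a training job is submitted.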
* update converter bx commands to ibmcloud * update converter bx commands to ibmcloud
* Updated setup script to K8s 1.13. * Modified Makefile
Total -- 4,710.69kb -> 3,241.39kb (31.19%)

/dashboard/src/assets/img/ffdl-blue.png -- 6.10kb -> 3.07kb (49.66%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/ffdl-blue.png -- 6.10kb -> 3.07kb (49.66%)
/dashboard/src/assets/img/ffdl-text.png -- 10.97kb -> 5.73kb (47.73%)
/dashboard/src/assets/img/ffdl.png -- 6.04kb -> 3.31kb (45.14%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/p1.png -- 488.27kb -> 275.07kb (43.67%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/ui-example.png -- 80.93kb -> 47.47kb (41.35%)
/docs/images/ui-example.png -- 80.93kb -> 47.47kb (41.35%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/ffdl-fashion.png -- 301.63kb -> 177.72kb (41.08%)
/docs/images/ffdl-architecture.png -- 529.86kb -> 314.10kb (40.72%)
/docs/images/ffdl-pattern-arch.png -- 413.11kb -> 246.18kb (40.41%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/fashion-arch.png -- 444.84kb -> 266.69kb (40.05%)
/docs/images/horovod.png -- 73.13kb -> 44.93kb (38.56%)
/demos/fashion-mnist-adversarial/images/ffdl-art-jupyter.png -- 258.40kb -> 160.20kb (38%)
/community/FfDL-H2Oai/images/ffdl-h203.png -- 246.33kb -> 171.45kb (30.4%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/1.jpg -- 308.56kb -> 221.44kb (28.23%)
/etc/examples/images/pytorch-ffdl-onnx.png -- 277.25kb -> 199.65kb (27.99%)
/demos/fashion-mnist-training/sample-test-data/sneaker.jpg -- 29.37kb -> 24.93kb (15.13%)
/demos/fashion-mnist-training/sample-test-data/trouser.jpg -- 26.84kb -> 22.79kb (15.08%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/5.jpg -- 42.41kb -> 36.41kb (14.15%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/4.jpg -- 29.22kb -> 25.33kb (13.3%)
/demos/fashion-mnist-training/sample-test-data/sandal3.jpg -- 40.43kb -> 35.15kb (13.04%)
/docs/images/ffdl-arch-web.png -- 191.11kb -> 166.52kb (12.87%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/7.jpg -- 34.31kb -> 30.11kb (12.25%)
/demos/fashion-mnist-adversarial/images/adv_sample_predictions.png -- 142.53kb -> 125.72kb (11.79%)
/demos/fashion-mnist-training/sample-test-data/coatwhite.jpg -- 34.56kb -> 30.51kb (11.73%)
/demos/fashion-mnist-training/sample-test-data/sneakerbrown.jpg -- 46.75kb -> 41.43kb (11.37%)
/demos/fashion-mnist-training/sample-test-data/dress2.jpg -- 46.95kb -> 41.65kb (11.3%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/0.jpg -- 37.63kb -> 33.53kb (10.9%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/8.jpg -- 81.46kb -> 72.65kb (10.82%)
/demos/fashion-mnist-training/sample-test-data/trouser2.jpg -- 29.28kb -> 26.11kb (10.82%)
/demos/fashion-mnist-training/sample-test-data/redtshirt.jpg -- 34.92kb -> 31.46kb (9.91%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/3.png -- 2.68kb -> 2.44kb (8.98%)
/demos/fashion-mnist-training/sample-test-data/boot2.jpg -- 45.36kb -> 41.31kb (8.93%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/9.jpg -- 76.05kb -> 69.67kb (8.39%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/images/6.jpg -- 46.66kb -> 42.80kb (8.28%)
/demos/fashion-mnist-adversarial/images/ffdl.png -- 15.40kb -> 14.30kb (7.18%)
/demos/fashion-mnist-training/fashion-mnist-webapp/static/img/codait-logo.jpg -- 144.32kb -> 139.03kb (3.67%)
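The percentages in this image-compression report appear to be the relative size reduction, (before - after) / before. A small sketch of that arithmetic, using a helper name `savings_percent` that is introduced here for illustration and is not part of whatever tool produced the report:

```python
def savings_percent(before_kb, after_kb):
    """Relative size reduction in percent, given sizes in kilobytes."""
    if before_kb <= 0:
        raise ValueError("before_kb must be positive")
    return (before_kb - after_kb) / before_kb * 100.0


# Reported total: 4,710.69kb -> 3,241.39kb
total = round(savings_percent(4710.69, 3241.39), 2)
```

Per-file figures recomputed this way can differ from the listed values in the last digit, since the report shows sizes already rounded to two decimals.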
Adding name and year per when the file was added. Signed-off-by: Sahdev Zala <spzala@us.ibm.com>
* refactor helm charts * first draft of the new Travis CI script * first draft of the new Travis CI script * add helm init for CI * enhance CI script * fix typo in script * package helm chart and move to docs repo for helm chart hosting preparation * condense the 4 helm charts into 3 * update detailed installation guide * update developer guide * update helm chart naming
Pattern is being retired from the IBM Developer site, so adding archive note to the GH repo.
@Tomcli Can you approve/merge?
Developer's Certificate of Origin 1.1