Enabling Autonomous Cloud Native Infra as Code powered by Python by Vijayakumar Ravindran

As software development ecosystems move towards a cloud-native approach, on-demand infrastructure provisioning and availability are a must to support the platforms. Applications need to be failure-resistant, decentralized, evolutionary, built for the business, and able to integrate with multiple components. This challenge led me to an innovative approach: designing an autonomous software controller system built from microservices. I started with a container-based automation controller using Ansible, which builds fully on Python modules. With configuration driven entirely by Python modules, such as pyVmomi and community modules, I succeeded in configuring virtual infrastructure and deploying applications in an autonomous fashion. I used the open-source tool ecosystem for the critical infrastructure pieces: the microservices for identity & access management and TLS keys are FreeIPA-based, and the other components include a file repository, an ELK stack with Fluentd for logging, and Prometheus monitoring with Grafana for metric and analytics dashboards. Every component is deployed as a container, and I call an OpenAPI service to reach the authentication and data services. This approach brought speed and agility to provisioning and enabled CI/CD for a few projects. I also included a student with a reading disability, who used audiobooks to learn container and VMware technologies and, through hard work, went on to write Ansible YAML playbooks and update Python modules. Finally, I want to share this success story of how, with open source and Python modules, any infrastructure or service can be automated in an innovative, low-cost way.
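
To give a flavour of the Python-driven configuration described above, here is a minimal, hypothetical pyVmomi sketch that connects to vCenter and lists virtual machines; the host, credentials and inventory traversal are placeholders, not details from the talk:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (host and credentials are placeholders).
context = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="admin",
                  pwd="secret", sslContext=context)
try:
    content = si.RetrieveContent()
    # Walk the inventory and collect every VirtualMachine object.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        print(vm.name, vm.runtime.powerState)
    view.Destroy()
finally:
    Disconnect(si)
```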


Cloud Containers Ansible

Speaker bio
Moving a machine learning app from notebook to production by Matas Šeimys

Oxipit's experience of moving machine learning models for medical image analysis into production: adding centralized logging, setting up continuous integration pipelines, splitting and refactoring code, bundling and distributing obfuscated Docker images, and deploying with Salt.


Python Machine learning DevOps

Speaker bio
Pug Training by Christopher Lozinski

Pug (formerly Jade) is an interesting HTML templating language. Like Python and CoffeeScript, it uses indentation to define structure. It can generate static HTML, Chameleon Page Templates, or JavaScript. I find it much simpler than hand-writing HTML. In this Pug class, we will learn the basics and try out all three approaches. The focus will be on using it to create Bootstrap menus, first rendering them on the server using Chameleon Page Templates, and then rendering them on the client using JavaScript. The idea is to make you comfortable with the technology, so that you can explore more on your own. Basic HTML, JavaScript and Python experience is required. Here is a demo: https://demo.forestwiki.com/HtmlContentTypes/Javascript-Folder/Pug/acedemo


Pug Tools

Speaker bio
Anvil - Build a full-stack web app with nothing but Python by Shaun Taylor-Morgan

Anvil is a platform for rapidly developing web apps using nothing but Python. It's like Visual Basic for the web. It has a drag-and-drop editor for constructing a user interface, controlled by Python code in both the client and a hosted server environment. This is an interactive workshop where you'll be guided through creating an app. Either:

  • A TODO list - a simple example of a Create, Read, Update, Delete (CRUD) app
  • A weather data dashboard
  • An app that controls a remote machine from the web
  • Any other idea you might have!
You’ll need a laptop to follow along. For a preview, have a look at this short video https://anvil.works/#about
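
For a taste of the workshop's client/server split, here is a rough sketch of the TODO-list example, assuming a data table named tasks, a text box named task_box, and a form generated by Anvil's visual designer (the two halves live in different Anvil modules):

```python
# --- Client form (runs in the browser) ---
from ._anvil_designer import Form1Template
from anvil import *
import anvil.server

class Form1(Form1Template):
    def __init__(self, **properties):
        self.init_components(**properties)

    def add_button_click(self, **event_args):
        # Call the server function defined below.
        anvil.server.call('add_task', self.task_box.text)

# --- Server module (runs in the hosted server environment) ---
import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def add_task(title):
    # Write to the built-in data table.
    app_tables.tasks.add_row(title=title, done=False)
```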


Speaker bio
Container based architecture for Django applications by Ibukun Oluwayomi

Have you heard about Docker? Do you think it's too complicated? Are you intimidated by the whole concept of containerization? I answered yes to these questions less than a year ago. Now I cannot start a new project without Docker. I have tamed the beast and I am now able to take advantage of all of its benefits. You will never say the phrase "But it works on my computer" ever again. You will be able to deploy to multiple regions of your delivery pipeline in the exact same way, and you will be able to completely isolate your applications and their associated stacks from each other. Finally, if you use platform-as-a-service providers like Heroku, Azure and AWS, you will be able to mitigate vendor lock-in with this architecture.


Containers Django Docker

Speaker bio
How to use machine learning for better ranking results? The curious case of HomeToGo. by Gediminas Žylius

In this talk we will (a short ranking sketch follows the list):
  • 1) shortly review state-of-the-art ranking strategies used for information retrieval, advertising and e-commerce;
  • 2) reveal the challenges we face at HomeToGo when applying advanced ranking methods with ML (biases, imbalance, non-stationarity of data, training target formulation, etc.);
  • 3) review the ML ranking pipeline at HomeToGo: 3.1) data pre-processing (how we use PySpark on AWS EMR for data preprocessing and feature engineering, plus a Python example); 3.2) model training (how we use XGBoost on AWS SageMaker, plus a Python example); 3.3) model deployment (how to export a model in PMML and deploy it with JPMML, TreeLite, or the raw XGBoost API, plus a Python example).
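
As an appetizer for the model-training part, a self-contained sketch of learning-to-rank with XGBoost's scikit-learn-style API; the data is a toy stand-in, not HomeToGo's pipeline:

```python
import numpy as np
from xgboost import XGBRanker

# Toy learning-to-rank setup: 2 queries with 3 candidate listings each.
X = np.random.rand(6, 4)            # candidate features
y = np.array([2, 1, 0, 1, 2, 0])    # relevance labels per candidate
groups = [3, 3]                     # candidates per query, in order

ranker = XGBRanker(objective="rank:pairwise", n_estimators=50)
ranker.fit(X, y, group=groups)

# Higher scores mean the model ranks the candidate higher.
print(ranker.predict(X[:3]))
```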


Speaker bio
Package and Dependency Management with Poetry by Steph Samson

Managing a Python project can be overwhelming -- one can need anything from just a requirements.txt file to an array of other files: setup.py, setup.cfg, MANIFEST.in. The question for the new or unfamiliar developer becomes: when to use what, and why?


Poetry, a new packaging and dependency management tool, was built by its author as "the one tool to manage [my] Python projects from start to finish". This is accomplished with a single configuration file, the PEP-518-recommended pyproject.toml.


In this talk, I will present how one can use Poetry for new and existing projects.
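
For illustration, a minimal pyproject.toml roughly as Poetry (circa 0.12) generates it; the project name, author and dependencies are invented placeholders:

```toml
[tool.poetry]
name = "my-project"
version = "0.1.0"
description = "A demo project managed by Poetry"
authors = ["Jane Doe <jane@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
requests = "^2.21"

[tool.poetry.dev-dependencies]
pytest = "^4.0"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```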


Poetry Tools Packaging

Speaker bio
Challenges everyone struggles with while productionizing Apache Spark workloads by Chetan Khatri

Spark is a good tool for processing large amounts of data, but there are many pitfalls to avoid in order to build large-scale systems in production. This talk will help you understand the kinds of challenges you face when you productionize Spark for terabytes of data, and will guide you through practical use cases with best-practice solutions for fast data processing.
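
As a preview of the kind of pitfall the talk covers, a small PySpark sketch (paths and columns are placeholders): a reused DataFrame is cached once, and data is repartitioned by key before a wide operation instead of relying on defaults.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pitfalls-demo").getOrCreate()

df = spark.read.parquet("s3://bucket/events/")  # placeholder path

# Pitfall 1: recomputing a reused DataFrame. Cache it once instead.
filtered = df.filter(df.status == "ok").cache()

# Pitfall 2: letting a huge shuffle run with unsuitable partitioning.
# Repartition by the aggregation key before the wide operation.
daily = (filtered
         .repartition(200, "event_date")
         .groupBy("event_date")
         .count())

daily.write.mode("overwrite").parquet("s3://bucket/daily_counts/")
```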


Speaker bio
PyOdide/Iodide Tutorial by Christopher Lozinski

Iodide/Pyodide, offered by Mozilla, is like Jupyter Notebooks, but the CPython kernel runs in the browser, compiled to WebAssembly. All that is needed is a file server and a web browser; no compute server is required. This is particularly good for schools, so that servers do not get overloaded by thousands of students. There is already a rich collection of Python libraries compiled to WebAssembly: 32 as of last week, certainly all the ones which I need. We will start with the Pyodide tutorial (it is the last one at https://demo.forestwiki.com/Iodide-notebooks) and then make some small modifications. The goal is to get students comfortable with the basics, so that they can explore more on their own.
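
As a taste of Python running in the browser, a short sketch of the kind of cell you might write, using Pyodide's js bridge; the specifics are illustrative, not taken from the tutorial:

```python
# Runs inside a Pyodide cell in the browser, not in a normal CPython.
import js          # Pyodide's bridge to the JavaScript world
import numpy as np # one of the libraries already compiled to WebAssembly

data = np.linspace(0, 1, 5)
js.document.title = "Pyodide demo"        # touch the DOM from Python
js.console.log("mean = " + str(data.mean()))
```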


Speaker bio
Practical NLP by Šarūnas Navickas

Machine learning is a very interesting field. It attracts vast amounts of talent working on interesting problems. However, it is mostly stuck in an endless prototyping phase. What if we want to actually use what has already been created? This talk shows what unique (or not so unique) challenges appear when trying to use ML in real-world scenarios, and the ways they can be dealt with. It gives a brief introduction to what NLP (Natural Language Processing) is and why it is an interesting problem. At a higher level, it is meant to showcase current problems and propose quick solutions for avoiding them, but in general to show what is wrong with the whole community and how it can be fixed.


Machine Learning NLP

Speaker bio
The Power of Dict by Jurgis Pralgauskis

Overview of how dicts penetrate Python code, plus tips & tricks (a short sketch follows the list):

  • Dicts are often better than arrays/lists, because they relate "human readable" meaning with "computer readable" values (and can model any graph!);
  • simple dict instantiation: {}, dict() (ps.: keys need __hash__);
  • simple syntax sugar: get, setdefault;
  • internal usage: locals(), globals() (vars() and dir()), __dict__;
  • passing (context) around trick: **kwargs;
  • from collections import defaultdict, Counter, OrderedDict;
  • serializing: json (and how to make it ordered before 3.6); ps.: pprint, ast.literal_eval;
  • advanced dict instantiation: from (zipped) tuples, dict comprehension;
  • dict merging/chaining;
  • using dot notation to access items;
  • tool library: pydash (pluck, pick, map_values[_deep], merge).
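
The promised sketch, pulling several of the listed tricks together (defaultdict, Counter, **kwargs, dict comprehensions and merging):

```python
from collections import Counter, defaultdict

# defaultdict: group words by their first letter, no KeyError handling.
words = ["ant", "bee", "bat", "cat", "cow"]
groups = defaultdict(list)
for w in words:
    groups[w[0]].append(w)

# Counter: a frequency table in one line.
print(Counter(w[0] for w in words))   # Counter({'b': 2, 'c': 2, 'a': 1})

# **kwargs: passing a context dict around.
def render(template, **context):
    return template.format(**context)
print(render("{user} has {n} tasks", user="jurgis", n=3))

# dict comprehension from zipped tuples, then merging.
keys, values = ("a", "b"), (1, 2)
d1 = {k: v for k, v in zip(keys, values)}
d2 = {**d1, "c": 3}                   # merge (Python 3.5+)
print(d2)
```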


Dictionaries Collections Pydash

Speaker bio
Using Python automated text processing to study group behavior in space analog facilities by Inga Popovaite

NASA plans to go back to the Moon in five years. And the first human may step on Mars in a couple of decades. But to get there, we need a well-functioning crew who can work and live together in claustrophobic quarters for a very long time. Gender is an important variable that shapes crew interaction: on one hand, mixed-gender groups tend to perform better on various tasks, but on the other hand, gender differences lead to additional stress, especially for women. In a perfect world, researchers interested in people in space would study people in space. But in reality these data are very hard to get, and scientists conduct studies in space analogs instead. Space analogs are Earth-based facilities that resemble the conditions of a space flight or space colony. The Mars Desert Research Station (MDRS) in the Utah desert is one of them. It replicates a habitat on Mars that could be built using current technology, and it can accommodate a crew of six people at a time. MDRS crews rotate every 2-3 weeks, and more than 200 crews have participated in the simulation over the course of 17 field seasons. All crews are required to document their stay. In this project I use simulated EVA (extravehicular activity) logs from all previous crews to investigate co-working patterns. I use Python's NLTK to process thousands of EVA logs to see who was working with whom on each simulated space walk. Then I use these data to construct co-working networks and to calculate who is the most central person in each crew (who worked with the most people on the most EVAs). Later, I use logit regression models to see whether men are more likely to be the most central, accounting for their official role and other socio-demographic characteristics. My preliminary results from 29 randomly selected and hand-coded crews show that men are 2.85 times more likely to be the most central in a crew than women, accounting for their official crew roles.
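
A heavily simplified sketch of the core idea: tokenize a log, spot crew names, and link everyone on the same EVA. The log lines, names and processing below are invented illustrations, not the study's actual pipeline:

```python
import nltk
import networkx as nx

# nltk.download("punkt")  # one-time tokenizer download

# Invented examples of EVA log entries naming crew members.
logs = [
    "EVA 12: Smith and Jones surveyed the ridge north of the Hab.",
    "EVA 13: Jones, Lee and Smith collected soil samples.",
]
CREW = {"Smith", "Jones", "Lee"}

G = nx.Graph()
for log in logs:
    tokens = set(nltk.word_tokenize(log))
    present = sorted(CREW & tokens)
    # Everyone on the same EVA is linked pairwise.
    for i, a in enumerate(present):
        for b in present[i + 1:]:
            G.add_edge(a, b)

# Who worked with the most people?
print(nx.degree_centrality(G))
```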


social science nltk data science r

Speaker bio
Can Python help save lives? by Migle Gabrielaite

While Python is known for web applications, machine learning and even software development, it is also one of the most important programming languages in bioinformatics: a subject where biology and programming meet. Snakemake, a Python-based workflow management system, is irreplaceable in health science, where large amounts of data have to be analyzed fast and precisely. I will show Python's use in bioinformatics and the health sciences. Most importantly, I will try to answer the question of whether Python is truly capable of saving lives.
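
For a flavour of the workflows the talk refers to, a minimal hypothetical Snakefile; Snakemake rules are written in a Python-based DSL, and the tools and file names below are placeholders:

```python
# Snakefile: align reads, then call variants (hypothetical pipeline).
rule all:
    input:
        "results/sample1.vcf"

rule align:
    input:
        "data/sample1.fastq"
    output:
        "results/sample1.bam"
    shell:
        "bwa mem ref.fa {input} | samtools sort -o {output}"

rule call_variants:
    input:
        "results/sample1.bam"
    output:
        "results/sample1.vcf"
    shell:
        "bcftools mpileup -f ref.fa {input} | bcftools call -mv > {output}"
```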


Speaker bio
Interactive Data Visualization Web Apps with no JavaScript: what you can, should and probably shouldn't do with plotly/Dash. by Dom Weldon

Your data science or machine learning project probably won't just produce a written report. Instead, projects are increasingly expected to produce interactive tools to allow end-users to explore data and results with rich, interactive visualizations. Inevitably, this will be done in a web browser, meaning you'll need to add a quantitatively-trained web developer to your team, or have your data scientists spend time learning HTML, JavaScript and CSS. Dash, a project by the team that makes Plotly, solves some of these problems by allowing data scientists to build rich and interactive websites in pure Python, with minimal knowledge of HTML and absolutely no JavaScript. At decisionLab, a London-based data science consultancy producing decision tools, we've embraced Dash to produce proof-of-concept models for our projects in alpha. Along the way, we've learned many lessons and best practices we'd like to share. This talk will give an overview of Dash, how it works and what it can be used for, before outlining some of the common problems that emerge when data scientists are let loose to produce web applications, and web developers have to work with the pydata ecosystem, and discussing effective working practices for producing cool interactive statistical web applications, fast. We'll also identify some of the pitfalls of Dash, and how and when to make the decision to stop using Dash and start building a proper web application.
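
To make this concrete, a minimal sketch of a Dash app in the style the talk discusses (Dash 1.x-era imports; the data is invented): a dropdown drives a Plotly figure through a single callback, all in pure Python.

```python
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="n-points",
                 options=[{"label": str(n), "value": n} for n in (10, 100)],
                 value=10),
    dcc.Graph(id="scatter"),
])

@app.callback(Output("scatter", "figure"), [Input("n-points", "value")])
def update(n):
    # Rebuild the figure whenever the dropdown changes.
    xs = list(range(n))
    return {"data": [{"x": xs, "y": [x ** 2 for x in xs], "type": "scatter"}]}

if __name__ == "__main__":
    app.run_server(debug=True)
```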


Speaker bio
Pitfalls of emotion detection from video in production by Justin Shenk

Deep learning provides many opportunities for businesses to easily scale technology which would have otherwise required thousands of hours of labelling. Using the FER2013 dataset, emotion detection was developed with Peltarion's deep neural network model builder (Deep Emotion, https://github.com/justinshenk/deepemotion). It was implemented as an API with both Keras and the Peltarion API. Some of the challenges in developing this and putting it into production are discussed.


Speaker bio
Data pipelines. Slice and dice your data. by Oleg Shydlouski

Right now there are many ways to process your data. But for some reason, when it comes to parallel or distributed computation, it is always hard, and it is always a headache. I would like to explain why data pipelines can be an elegant, simple and powerful way of processing data. In my talk we will dive under the hood of the data pipeline approach, understand why more and more solutions are built around it, look at some examples where data pipelines shine, and take a small look at a framework called Stairs, which allows you to solve a lot of data-related tasks.
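
To make the pipeline idea concrete before the talk, a tiny framework-free sketch in plain Python generators; this is illustrative only and is not the Stairs API:

```python
# Each stage consumes an iterator and yields results: easy to chain,
# easy to test in isolation, and each stage can later be distributed.
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def parse(lines):
    for line in lines:
        yield line.split(",")

def keep_valid(rows):
    for row in rows:
        if len(row) == 3:
            yield row

# "events.csv" is a placeholder input file.
pipeline = keep_valid(parse(read_lines("events.csv")))
for row in pipeline:
    print(row)
```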


Parallel Distributed Data Science

Speaker bio
Building an Open-source Django Side-project Into a Business by Pēteris Caune

Healthchecks.io, a cron job monitoring service, is an ongoing "serious" side-project of mine. The project's code is open-source and can be self-hosted, and I'm also offering it as a hosted SaaS service. With this project I've been dipping out of my comfort zone (frontend & backend development, devops) and doing everything else that needs doing: product design, customer support, marketing, billing setup and accounting, maintaining the open-source project, keeping on top of things like GDPR, and so on. In the presentation I will talk about both the technical and non-technical challenges and decisions I've faced so far.


Speaker bio
Network Science, Game of Thrones and Airports by Mridul Seth

This tutorial will introduce the basics of network theory and working with graphs/networks using Python and the NetworkX package. This will be a hands-on tutorial and will require writing a lot of code snippets. Participants should be comfortable with basic Python (loops, dictionaries, lists) and have some (minimal) experience of working inside a Jupyter notebook. Broadly, the tutorial is divided into three parts: Part A (20 mins) - basics of graph theory, NetworkX and various examples of networks in real life. Part B (35 mins) - study the Game of Thrones network and find important characters and communities in the network. Part C (35 mins) - analyze the structure of the US airport dataset and look at the temporal evolution of the network from 1990 to 2015. By the end of the tutorial everyone should be comfortable with hacking on the NetworkX API, modelling data as networks and performing basic analysis on networks using Python. The tutorial and the datasets are available on GitHub at: https://github.com/MridulS/pydata-networkx
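
A small sketch of the kind of code the tutorial builds up to, using NetworkX with a few invented edges:

```python
import networkx as nx

# Build a small undirected network (characters or airports alike).
G = nx.Graph()
G.add_edges_from([
    ("Jon", "Sansa"), ("Jon", "Arya"), ("Sansa", "Tyrion"),
    ("Tyrion", "Cersei"), ("Cersei", "Jaime"),
])

# Two standard notions of importance covered in the tutorial.
print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))
```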


Speaker bio
Data Science in Python: Past, Present and Future by Radovan Kavicky

  • In my talk I will walk you through the history of Data Science itself (from John Tukey and the 60s, when the first programming languages for data analysis were created, to the creation of Python by Guido van Rossum and today's mass popularity of Python within Data Science), and I will also offer a short preview of the future (Pandas 2.0, Feather, bokeh, JupyterLab and so on).
  • I will also walk you through the Data Science cycle and show you the Python toolbox and some best practices of a modern Data Scientist (the modules and libraries for anybody who wants to start with Data Science in Python).
  • As founder of PyData Bratislava I will also talk about the history of the PyData movement and its foundations within Slovakia and our region (the V4 countries; we started as one of the first Data Science communities in our region), and about our plans and goals (hopefully shared with PyData Vilnius) for the future.


Speaker bio
Fun with histograms and the physt library by Jan Pipek

Histograms are a somewhat naive but also very powerful statistical tool for describing data. In many scientific Python libraries (and also your spreadsheet software), there are efficient methods for calculating and visualizing histograms, but there is much more (fun as well as boring) stuff that can be done with them. The physt library focuses especially on the fun parts. What if you want to fill your histograms continuously (perhaps not even specifying a proper value range from the start)? What if you want to automatically find human-friendly bin edges (ever wondered why we should count people that are from 168.478 to 173.456 cm tall?)? What if you want to project or slice your multidimensional histograms? What if you want cylindrical or spherical histograms? What if you want to add the values of two histograms? What if you want to persist bins and meta-data alongside the calculated values? The physt library treats histograms as proper objects and combines the computing power of numpy with the visualization possibilities of matplotlib (and optionally other backends), adding a level of semantics and more advanced functionality. In this talk, I will describe the object model behind the library and show some examples of how it can be used.
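
A brief sketch of that object-based style, assuming physt's h1 constructor and its "human" binning as described in the talk:

```python
import numpy as np
from physt import h1

heights = np.random.normal(170, 10, 1000)

# Human-friendly bin edges instead of 168.478...173.456-style cut-offs.
hist = h1(heights, "human", bin_count=10)

hist.fill(171.3)            # continuous filling after construction
combined = hist + hist      # histograms are proper objects: add them
print(combined.frequencies)

hist.plot()                 # matplotlib backend by default
```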


Histograms Statistics Physt PyData

Speaker bio
Aspect Oriented Programming in Python by Ibukun Oluwayomi

In this talk I will explain what Aspect Oriented Programming (AOP) is and why and when it should be used. I will talk about the features in Python that enable AOP, such as properties, decorators, context managers and metaclasses. AOP complements object-oriented programming by allowing developers to introduce functionality into existing objects without modifying the definition of those objects. A typical use case is injecting logging and monitoring capabilities into already existing code without modifying its source.
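
A minimal sketch of the decorator flavour of this idea: injecting logging around an existing function without modifying its body (the function and its arguments are illustrative):

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)

def logged(func):
    """Aspect: wrap logging around a function, leaving its source untouched."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.info("%s returned %r", func.__name__, result)
        return result
    return wrapper

@logged
def transfer(amount, to_account):
    return {"amount": amount, "to": to_account}

transfer(100, "LT12 3456")
```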


Aspect oriented programming AOP

Speaker bio
From ML experiments to production: versioning and reproducibility with MLV-tools by Sarah Diot Girard

You're a data scientist. You have a bunch of analyses you performed in Jupyter Notebooks, but anything older than 2 months is totally useless because it's never working right when you open the notebook again. You're working with software engineers. They can't imagine life without Git, reviews on readable files, tests, code analysis, CI. They are aghast that you cannot reproduce your machine learning analysis seamlessly. And when your team wants to bring anything into production, it's a nightmare. We had these kinds of issues at PeopleDoc. Building on open-source solutions, we have developed a set of open-source tools and designed a process that works for us. It helps us version machine learning experiments and smooths the path to production. We are thrilled to present our project and we hope to spark a discussion with the community. See you on GitHub: https://github.com/peopledoc/ml-versioning-tools


Speaker bio
Visualizing PyTorch and Keras layer saturation with delve by Justin Shenk

Identifying the number of hidden units in a fully connected layer is considered a heuristically-guided craft. delve, a library for PyTorch and Keras, was developed to identify the degree of over-parameterization of a layer, thus guiding architecture selection. The library compares the intrinsic dimensionality of a layer over training, providing the user with live feedback and suggesting improvements. Try it out with Deep Playground at http://104.155.67.242 or install delve with `pip install delve`.


data science data engineering deep-learning keras pytorch

Speaker bio
Embeddings: advanced dimension reduction technique in practice by Tomasz Chabinka

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words (Google's "Machine Learning Crash Course"). During the presentation you will learn (see the sketch after this list):
  • why you often need to reduce the dimensionality of your data
  • what standard techniques are used to do that
  • what embeddings are in a machine learning context and how they can help you reduce the dimensionality of your data
  • why embeddings are used in natural language processing, and how you can use them to solve your own NLP problem
  • how to train your own embeddings in TensorFlow
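
The sketch promised above: a minimal tf.keras model with an Embedding layer, trained on invented data, showing where the learned embedding matrix lives:

```python
import numpy as np
import tensorflow as tf

# Toy corpus: word indices (0..999) for 32 sequences of length 8.
x = np.random.randint(0, 1000, size=(32, 8))
y = np.random.randint(0, 2, size=(32, 1))

model = tf.keras.Sequential([
    # A 1000-word vocabulary squeezed into 16-dimensional vectors.
    tf.keras.layers.Embedding(input_dim=1000, output_dim=16, input_length=8),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=2, verbose=0)

# The learned embedding matrix: one 16-d vector per vocabulary entry.
print(model.layers[0].get_weights()[0].shape)  # (1000, 16)
```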


Speaker bio
PySpark - Data intensive processing in practice by Marcin Szymaniuk

Would you like to see Big Data use-cases implemented on Spark? Are you working with Big Data projects already and you are considering introducing Spark to your technology stack? Would you like to know what Spark is good at and what parts of Spark are tricky? First I would like to provide an overview of multiple Spark use cases in various areas. The number of use cases described will be broad enough so it is likely that the audience will be able to find similarities to projects they are working on and see how they can use Spark to solve problems and bring value to the company. The second part of the presentation will be focused on technical challenges which need to be solved when introducing Spark to your ecosystem. Spark has a nice and relatively intuitive API. It also promises high performance for crunching large datasets. It’s really easy to write an app in Spark. Unfortunately, the nice API might be misleading and make us forget that we are implementing a distributed application. For that reason it’s easy to write one which doesn’t perform the way you would expect or just fails for no obvious reason. I will show in a nutshell all the lessons I have learned over 3 years of experience with Spark. It will give you an overview of what to expect and help you to avoid making mistakes typically made by Spark newbies. We will emphasise what you should know about your data in order to write efficient Spark jobs and what the most important configuration tweaks and optimisation techniques are which will come in handy when implementing Spark-based solutions.
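
One example of the "know your data" lessons the talk covers: when joining a large table to a small lookup table, broadcasting the small side avoids a full shuffle. A sketch with placeholder paths and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-demo").getOrCreate()

events = spark.read.parquet("s3://bucket/events/")        # large
countries = spark.read.parquet("s3://bucket/countries/")  # tiny lookup

# Broadcasting the small side avoids shuffling the large table,
# a common fix for slow or skewed joins.
joined = events.join(broadcast(countries), "country_code")
joined.groupBy("country_name").count().show()
```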


Speaker bio
Visualizing machine learning - from confusion matrices to a game by Piotr Migdał

Data visualizations provide insight into machine learning models. Some are crucial for deep learning models (architecture diagrams, log-loss charts), while others are model-agnostic (confusion matrices, misclassified examples). During this talk, I will dive into practical tools for visualizing machine learning in Python (especially: Keras and PyTorch). Moreover, I will present an educational game based on drawing 2d-classifiers, in which a player separates points, and is benchmarked against Scikit-learn algorithms. Relevant projects:
  • (pip install) livelossplot https://github.com/stared/livelossplot (Live training loss plot in Jupyter Notebook for Keras, PyTorch and others)
  • "Draw a classifier - a game" https://github.com/stared/which-ml-are-you (WIP)
  • "Simple diagrams of convoluted neural networks" https://medium.com/inbrowserai/simple-diagrams-of-convoluted-neural-networks-39c097d2925b
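
As a small example of the model-agnostic side, a confusion matrix rendered with scikit-learn and matplotlib (the labels are invented):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Model-agnostic: any classifier's true vs. predicted labels will do.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
plt.imshow(cm, cmap="Blues")
plt.xlabel("predicted")
plt.ylabel("true")
plt.colorbar()
plt.show()
```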


Speaker bio
System Administrators: to Python or not to Python by Eivydas Vilčinskas

For a very long time now, System Administrators have used Bash as the primary tool for their tasks. But technology keeps evolving and there are other options available; one of them, notably, is Python. What does it do better to help administrators supervise their share of servers? What does it lack compared to Bash? What are the extreme examples of their respective strengths and weaknesses? This talk is about the compromises and competences an administrator has to consider when selecting between the two alternatives.
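
A tiny illustration of the trade-off: a disk-usage check that in Bash would be a df/awk one-liner, written with Python's standard library instead:

```python
import shutil

# Equivalent in spirit to: df -h / | awk 'NR==2 {print $5}'
usage = shutil.disk_usage("/")
percent = usage.used / usage.total * 100
print(f"/ is {percent:.1f}% full")
if percent > 90:
    raise SystemExit("disk almost full")  # non-zero exit for cron/alerting
```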


System administration Bash

Speaker bio
Geodata processing using Python and JupyterHub by Martin Christen

Geospatial data is data containing a spatial component, describing objects with a reference to the planet's surface. This data usually consists of a spatial component, of various attributes, and sometimes of a time reference (where, what, and when). Efficient processing and visualization of small to large-scale spatial data is a challenging task. This talk describes how to process and visualize geospatial vector and raster data using Python and the Jupyter Notebook. There are numerous modules available which help with using geospatial data through low- and high-level interfaces. We will look at shapely, which is used for the manipulation and analysis of geometric objects. Then we go further to Fiona, a module which handles geospatial vector data in a very pythonic way. We move on to raster data processing using the rasterio module and briefly look at the pyproj module, which is used for transforming spatial reference systems. After that we look at GeoPandas, which is basically an extended pandas module with support for geodata. Then we will see how maps are created using the cartopy and folium modules. At the end of the talk, some examples are shown of how to use deep learning for raster analysis on a GPU cluster.
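
A short sketch of the shapely/GeoPandas portion of the talk; the coordinates and target projection below are illustrative:

```python
import geopandas as gpd
from shapely.geometry import Point

# Two points (lon, lat) and a 0.1-degree buffer around the first.
vilnius = Point(25.28, 54.69)
kaunas = Point(23.90, 54.90)
print(vilnius.distance(kaunas))              # planar distance in degrees
print(vilnius.buffer(0.1).contains(kaunas))  # False

# GeoPandas: a pandas DataFrame with a geometry column.
gdf = gpd.GeoDataFrame({"city": ["Vilnius", "Kaunas"]},
                       geometry=[vilnius, kaunas], crs="EPSG:4326")
print(gdf.to_crs(epsg=3346).geometry.x)      # reproject to the Lithuanian grid
```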


Speaker bio
How to deploy your python microservices with Kubernetes by Vladimir Puzakov

I want to talk about CI/CD for your Python microservices with GitLab CI, Docker and Kubernetes. I will show you how to create Python projects which are ready to deploy within GitLab CI pipelines. You will learn how to dockerize your microservices, how to create Helm charts from them and how to deploy them to a Kubernetes cluster. We will also implement a canary deployment pattern.


Speaker bio
Machine learning project anatomy by Joonatan Samuel

Machine learning has grown drastically in popularity in recent years. Rightfully so: if something a computer does feels like arcane sorcery, it more often than not ends up being a machine learning system. Companies everywhere are coming to the understanding that they should build their own internal teams; developers move into data science positions and managers are tasked with leading a completely unknown field. How do you set yourself and your team up for success if you want to apply machine learning? Let's go from theory to practice to examples. We will cover the theory of choosing metrics, development cycles and data sets. We will take the example of Veriff, where these principles have been applied, and finish up with a concrete project's source code to drive the learnings home.


Speaker bio
Porting Python to Lithuanian syntax in order to teach your 7-year-old kid to code by Mantas Urbonas

As my 7-year-old kid kept asking what I was doing all that time on my PC, and because he couldn't read English, I decided it was easier to modify Python's syntax and let him have it. Everything you always wanted to know about fixing Python syntax (but were afraid to ask).


education localization

Speaker bio
Data Science at PMI - The Tools of The Trade by Maciej Marek

Data Science is not a one-man show. It is a team effort that requires every team member to master the tools of the trade. This is extremely important for effectively putting data science to work in a global organization. In this talk we would like to share with you how we start, develop and ship data science products inside PMI: the best practices and tools currently in use by 40+ data scientists across the four locations where PMI's data science labs were established in 2017.


Data science Tools Best Practices

Speaker bio
Anvil: Full-stack Web Apps with Nothing but Python by Ian Davies

Building for the modern web is complicated and error-prone: a typical web app requires five different languages and four or five frameworks just to get started. Wouldn't it be nicer if we could do it all in Python? With Anvil, you can design your page visually, write your browser-side and server-side code in pure Python, and even set up a built-in database in seconds. In this talk, Ian will walk us through how Anvil works under the bonnet, and the challenges of building a programming system that's easy to use without sacrificing power.


Speaker bio
Introduction to DevOps for Data Scientists and ML Engineers by Tomasz Chabinka

Workshop participants will learn how to automate machine learning model preparation with popular open-source tools: Airflow, Clipper and (in very basic scope) Kubeflow. After the workshop you will know (a short Airflow sketch follows the list):
  • Airflow:
    • how to automate your data workflows
    • what DAGs, Operators, Tasks and Pipelines are
    • what scheduling and triggering options are available
    • how to create your own plugin
  • Clipper:
    • how to create a REST API for your machine learning model
    • how to version your models in an easy way
  • Kubeflow (basics):
    • what Kubernetes and Kubeflow are
    • what basic features/use cases Kubeflow supports
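
The Airflow sketch promised above: a minimal daily DAG with two Python tasks, written against the Airflow 1.x-era API (the task contents are placeholders):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract():
    print("pull raw data")

def train():
    print("fit the model")

# A daily two-step pipeline: extract >> train.
dag = DAG("ml_prep", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

t1 = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
t2 = PythonOperator(task_id="train", python_callable=train, dag=dag)
t1 >> t2
```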


Speaker bio
Agent-based Modelling and Reinforcement Learning for Route Scheduling: Integrating Open Source Software by Valentas Gružauskas

The presentation is based on a generic e-grocery business model focusing on last-mile logistics. Households generate orders with two-hour delivery windows, with demand based on historical sales data. The routing algorithm's goal function is defined as the food quality level, whose effectiveness is reduced by a real-world environment with traffic flow, traffic accidents and traffic lights. The route scheduling approach is implemented with simulated annealing and Q-learning algorithms; the goal of the reinforcement learning approach is to estimate traffic congestion and recommend alternative routes. The agent-based model was implemented with NetLogo. Python and R were used for input data preparation, implementing K-means and DBSCAN clustering algorithms for pattern recognition, after which probabilities were estimated. The simulated annealing and Q-learning algorithms were implemented in Java, although Python could have been used likewise. The simulation was implemented with a PostgreSQL database, where Python (the psycopg2 module) was used to create the integration between NetLogo and PostgreSQL. The goal of the presentation is to show how open-source tools can be integrated to develop agent-based simulations for research, personal learning or business decision analysis.
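
A minimal sketch of the psycopg2 glue described above; the connection details and table schema are invented placeholders:

```python
import psycopg2

# Connection details are placeholders.
conn = psycopg2.connect(host="localhost", dbname="simulation",
                        user="sim", password="secret")
cur = conn.cursor()

# Write one simulated order; NetLogo reads the same table each tick.
cur.execute(
    "INSERT INTO orders (household_id, window_start, window_end) "
    "VALUES (%s, %s, %s)",
    (42, "2019-05-01 10:00", "2019-05-01 12:00"),
)
conn.commit()
cur.close()
conn.close()
```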


Speaker bio
Auto Mock External Dependencies in Python Tests by Albertas Gimbutas

Python is the mother tongue for scripting and programming among infrastructure engineers at Danske Bank. Our Python programs usually consume data provided by external tools over APIs, databases and command-line interfaces. Sometimes multiple external tools are consumed by a single program, and these tools might even be chained together. I will present how much effort it takes to mock each external tool separately using the standard Python `unittest.mock` library. I have developed a Python library and will present its capabilities. This library automatically patches external tools, drastically reducing the time and effort needed to cover Python code with tests. The Python code under test can be seen as a black box: simply call your code in a test and check if the result is correct. All the mocking and patching is taken care of for you automatically. Simply run your tests once, when external dependencies are available, and the responses will be memorized for each external call and stored in a JSON file. All subsequent test runs will use the memorized responses without calling the external tools again, unless explicitly stated. All credentials and sensitive data are also removed from the memorized calls/responses, so the JSON file can be saved as part of the Python code. This library is not open-source yet; however, the general ideas should still be a valuable resource.
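
For context, the manual approach the talk measures against: patching a single external call by hand with the standard library (the function and its external API below are invented, and requests must be installed):

```python
from unittest import TestCase, main
from unittest.mock import patch

def get_server_status(host):
    # Imagine this calls an external API over the network.
    import requests
    return requests.get(f"https://{host}/status").json()["state"]

class StatusTest(TestCase):
    @patch("requests.get")
    def test_status(self, mock_get):
        # Every external response must be hand-crafted per test.
        mock_get.return_value.json.return_value = {"state": "up"}
        self.assertEqual(get_server_status("db01"), "up")

if __name__ == "__main__":
    main()
```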


Testing Mocking Patching

Speaker bio
How to easily set up and version your Machine Learning pipelines, using DVC and MLV-tools by Stephanie Bracaloni

Have you ever heard about Machine Learning versioning solutions? Have you ever tried one of them? And what about automation? Come with us and learn how to easily build versionable pipelines! This tutorial explains, through small exercises, how to set up a project using DVC and MLV-tools.

Abstract: You're a data scientist. You have a bunch of analyses you performed in Jupyter Notebooks, but anything older than 2 months is totally useless because it's never working right when you open the notebook again. Also, you cannot remember the dropout rate on the second-to-last layer of that convolutional neural network which gave really great results 2 weeks ago and that you now want to deploy into production. Does that ring a bell? You're a software engineer in a data science team. You can't imagine life without Git. Reviews on readable files, tests, code analysis and CI belong to your daily routine. You used to think of Jupyter Notebooks only as a demo tool. You need reproducibility for every step of your work, even if you lose a server. And last but not least, you want to be able to deliver to production something usable by anyone. Is there a magical solution? No! But we can find a compromise to satisfy these 2 worlds... We had these kinds of issues at PeopleDoc. Building on open-source solutions, we have developed a set of open-source tools and designed a process that works for us. To promote our workflow and tools, we have created some tutorials. We will cover:
  • Machine Learning pipeline creation and usage
  • Pipeline versioning & reproducibility
This tutorial is mainly based on Data Version Control and MLV-tools (https://github.com/peopledoc/ml-versioning-tools).

Requirements:
  • be familiar with code versioning (basic usage of Git is enough)
  • install Git, Virtualenv, Docker, Docker Compose, Jupyter, and an editor
  • some familiarity with Machine Learning workflows

Overview: The whole tutorial will be available on GitHub. We will guide attendees step by step, and they can experiment on their own computers. After completing this tutorial, attendees will be able to set up their own project using DVC and MLV-tools, to version their experiments and to reproduce them easily.
  1 - Introduction: versioning, automation and reproducibility issues with Machine Learning projects
  2 - What DVC is and how it works
  3 - How to handle a Machine Learning project with DVC and MLV-tools
  4 - Going further (a set of tutorials with realistic use cases is available on our GitHub)
See https://github.com/peopledoc/mlv-tools-tutorial


Speaker bio
Computer Aided Innovation (CAI): what's state of the art now and expectations for the future by Andrius Žilėnas

Is it just YAH (yet another hype) or the real thing? INNOVATION: what are we talking about? Innovation Management Models (IMS) compared: what's on the input, in the process, and on the output? 100+ creativity techniques compared: main features, principles, supported processes. 150+ CAI tools compared: does anybody know one that covers the full IM cycle? A very personal and subjective view on the 9 systems evolution laws (trends), framing expectations for the future.


Innovation Management

Speaker bio