Posts

Showing posts from 2021

Turning Python Scripts into Working Web Apps Quickly with Streamlit

I just realized that I have been using Streamlit for almost a year now and have posted about it on Twitter and LinkedIn several times, but I have never written a blog post about it. Communication is key in Data Science and Machine Learning. Being able to showcase work in progress and share results with the business makes the difference. Verbal and non-verbal communication skills are important. A tool that supports this kind of conversation with a mixed audience, one that may not have a technical background or would rather hear about results and business value, is of great help. I found that Streamlit fits this scenario well. Streamlit is an Open Source (Apache License 2.0) Python framework that turns data or ML scripts into shareable web apps in minutes (no kidding). Python only: no front‑end experience required. To start with Streamlit, just install it through pip (it is available in Anaconda too): pip install streamlit and you are ready to execute the working de...

The Codex Paper Has Been Published: the Idea Behind GitHub Copilot

The Codex paper was published yesterday. Codex is a GPT language model fine-tuned on publicly available code from GitHub, with Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. This paper focuses on the work leading to the early Codex models. The main task is the generation of standalone Python functions from docstrings, and the automated evaluation of the correctness of code samples through unit tests (in contrast to natural language generation, where samples are typically evaluated by heuristics or by human evaluators). To solve a problem in the test set, the authors generate multiple samples from the models and check whether any of them passes the unit tests. The raw training dataset was collected in May 2020 from 54 million public software repositories hosted on GitHub, containing 179 GB of unique Python files under 1 MB. It was then filtered by removing files which were likely auto-generated, had average line l...
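
The evaluation idea in this excerpt (draw several candidate functions and count the problem as solved if any of them passes the unit tests) can be illustrated with a minimal sketch. This is not the paper's harness: the helper names `passes_tests` and `solved`, the toy `add` task and the hand-written "samples" are all assumptions made up for illustration, and a real evaluation would run untrusted model output in a sandbox rather than call `exec` directly.

```python
from typing import Callable, Iterable


def passes_tests(source: str, func_name: str,
                 tests: Callable[[Callable], None]) -> bool:
    """Return True if the function defined in `source` passes `tests`."""
    namespace: dict = {}
    try:
        exec(source, namespace)        # define the candidate (unsafe outside a sandbox)
        tests(namespace[func_name])    # the tests raise AssertionError on failure
    except Exception:
        # Syntax errors, missing names and failed assertions all count as a failure.
        return False
    return True


def solved(samples: Iterable[str], func_name: str,
           tests: Callable[[Callable], None]) -> bool:
    """A problem counts as solved if at least one sample passes the unit tests."""
    return any(passes_tests(s, func_name, tests) for s in samples)


# Hypothetical model outputs for the docstring "Return the sum of a and b."
samples = [
    "def add(a, b):\n    return a - b\n",   # runs, but gives the wrong result
    "def add(a, b):\n    return a +\n",     # does not even compile
    "def add(a, b):\n    return a + b\n",   # correct
]


def add_tests(add: Callable) -> None:
    """Unit tests for the toy task."""
    assert add(1, 2) == 3
    assert add(-1, 1) == 0


results = [passes_tests(s, "add", add_tests) for s in samples]
print(results)                              # one boolean per sample
print(solved(samples, "add", add_tests))    # True if any sample passed
```

Judging correctness by executing tests, rather than by comparing text against a reference solution, is what allows several differently written samples to be checked automatically.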

Python Calculations in Jupyter with Handcalcs

Jupyter notebooks allow LaTeX rendering inside Markdown cells. This way you can write complex math equations within a notebook. While LaTeX is the de facto standard for scientific documents, its syntax isn't very friendly or intuitive. handcalcs is an Open Source library for converting Python calculations into rendered LaTeX: just write the symbolic formula, followed by numeric substitutions, and that's it. After installing it (it is available through PyPI), in the simplest case you just need to import the render module and use the %%render magic command to render the content of a cell: Here is another example of equation rendering and numeric substitution: It is also possible to render just the symbolic equation: or generate the corresponding LaTeX code: By default handcalcs renders code vertically, but it is possible to use the %%render params magic to save space by rendering in a single line or to show just the result of a calculation: handcalcs allows you to adjust precision, use Gr...

Generating Meaningful Mock Data with Faker

Faker is an Open Source Python package that generates synthetic data that can be used for many things, such as populating a database, load testing, or anonymizing production data for development or ML purposes. Generating fully random data isn't a good choice: with Faker you can drive the generation process and tailor the generated data to your specific needs, which is the greatest value it provides. The package comes with 23 built-in data providers, and other providers are available from the community. The available data providers cover the majority of data types and cases, but it is still possible to make the generated data more meaningful by implementing a custom provider. Faker supports Python 3.6+ and is available for installation through PyPI or Anaconda. Here's a code example that shows how to implement a custom provider to generate synthetic data following the structure and constraints of this Kaggle dataset related to a restaurant data with consumer...

Diagrams as Code with Python

In my career I have noticed that organizations are often reluctant to provide Engineering teams with licenses for diagramming software. In the best case, MS Visio is usually the only option available, which isn't the best experience when drawing modern software architectures. Several online options are available, but they require sharing project details that cannot leave your organization's network, so they often can't be considered. Also, when treating everything as code, it would be nice to have diagrams as code too. All these needs can be satisfied by adopting Diagrams. It is an Open Source Python package that allows you to draw cloud system architecture diagrams programmatically and then put them under version control (in the end they are regular Python files). It supports cloud (AWS, Azure, GCP, Alibaba, Oracle) and on-prem system architecture diagrams. The Diagrams nodes also include Kubernetes, programming languages and frameworks. ...

TagUI: an Excellent Open Source Option for RPA - Introduction

Photo by Dinu J Nair on Unsplash. Today I want to introduce TagUI, an Open Source RPA (Robotic Process Automation) tool I am using to automate test scenarios for web applications. It is developed and maintained by the AI Singapore national programme. It allows writing flows to automate repetitive tasks, such as regression testing of web applications. Flows are written in natural language: English and 20 other languages are currently supported. It works on Windows, Linux and macOS. The TagUI official documentation can be found here. The tool doesn't require installation: just go to the official GitHub repository and download the archive for your specific OS (ZIP for Windows, tar.gz for Linux or macOS). After the download is completed, unpack its content to the local hard drive. The executable to use is named tagui (.cmd on Windows, .sh on other OSes) and it is located in the <destination_folder>/tagui/src directory. In order to ...

Googlielmo's Blog 2.0: a Fresh Restart

After a 7-month hiatus I have decided to go back to posting on this blog. A lot of things happened across 2020 and 2021 that left me with little or no time at all to share my thoughts and findings. During this long period I have been involved in challenging ML/AI projects, managing them and interacting with people 100% remotely because of the COVID-19 pandemic; I had the chance to experiment, in some cases successfully, with many applications of new DL architectures and Python Open Source libraries, but I also had to adjust my own and my family's personal life to all the lifestyle changes imposed by the pandemic. The reasons that led me to restart the blog are mostly the following: I have accumulated tons of technical topics that are worth sharing with a wider audience. During the past months I have shared some through social networks such as LinkedIn and Twitter or in a few virtual meetups or conferences, but they need a deeper dive. This week I gave a workshop at the ODSC Europe 2021 conference and I...