In this post I want to share a little of what I have learned so far about using the version control tool Git.
Git was probably the most abstract and misunderstood topic when I was studying software development. In my experience as a teaching assistant at the coding bootcamp Ironhack in Berlin, it was also really hard to keep things simple when using Git, or when teaching the basics of its usage, without adding some layers of abstraction.
I am not going to cover installing and setting up Git, nor cloning or pulling a .git repository and a moderate…
I have a busy mind and I like to build my own tools to help me remember my ideas. I also forget my ideas really fast, so I need a way to write them down somewhere ASAP and without too much effort. Because I know a lot of people with the same issue, I wanted to share my latest tool: a CLI application that is really helpful since, as a software developer, I spend my whole day on the command line. I needed something fast, accessible everywhere and natural to use. …
For those of us interested in artificial intelligence, the idea of a human-level AI that could understand and interact with the world as we do is charming and fascinating. For others, it is a really creepy idea, and everyone can understand why. But I wonder what “human-level” AI means. Which aspect(s) of our intelligence are we trying to reproduce artificially? And when I think about intelligence, I also think about consciousness, which is even more mysterious. What does it mean to be conscious, and do we need to be conscious to be intelligent? On…
In this article I will explain how a basic anomaly detection algorithm works. For educational purposes I will use a simple n-dimensional training dataset:
There is no correlation between the features x_1, …, x_n here. In a future post I will show how to deal with correlations and multivariate Gaussian distributions, which involve matrix inversion. I will also give an example of an application in Python.
We can create a probabilistic model that fits our training data. …
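Because the features are assumed independent, one common way to build such a model is to fit a separate Gaussian to each feature and multiply the per-feature densities. Below is a minimal pure-Python sketch under that assumption; the function names, the tiny 2-dimensional training set and the threshold epsilon are all hypothetical, chosen only for illustration.

```python
import math

def fit_gaussian(X):
    """Estimate mean mu_j and variance sigma_j^2 for each feature j independently."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    # Maximum-likelihood variance (divides by m)
    var = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, var

def density(x, mu, var):
    """p(x) = product over j of N(x_j; mu_j, sigma_j^2); only valid if features are independent."""
    p = 1.0
    for xj, mj, vj in zip(x, mu, var):
        p *= math.exp(-((xj - mj) ** 2) / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return p

def is_anomaly(x, mu, var, epsilon):
    """Flag x as an anomaly when its density falls below the threshold epsilon."""
    return density(x, mu, var) < epsilon

# Hypothetical 2-dimensional training set of "normal" examples
X_train = [[5.0, 3.0], [5.2, 2.9], [4.8, 3.1], [5.1, 3.2], [4.9, 2.8]]
mu, var = fit_gaussian(X_train)
epsilon = 0.01  # hypothetical threshold; in practice it would be tuned on labelled validation data

print(is_anomaly([5.0, 3.0], mu, var, epsilon))  # point at the mean of the data
print(is_anomaly([8.0, 1.0], mu, var, epsilon))  # point far from the data
```

The first call returns False and the second True. Correlated features would need the multivariate Gaussian with a full covariance matrix instead of this per-feature product.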
(Physics): The measure of chaos or disorder in a system.
The lower the order, the higher the entropy.
(Information theory): Measure of information in terms of uncertainty.
The higher the uncertainty, the higher the entropy. The higher the entropy, the more information the system contains.
To understand what information and entropy are, let’s start with an example: we flip a coin and want to know which side it landed on. What is the amount of information? Or, how many questions do we have to ask before we know the state of the system…
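The coin example can be made concrete with Shannon's formula H = -Σ p·log2(p), measured in bits. Here is a minimal Python sketch; the function name entropy and the example probabilities are my own choices. For equally likely outcomes, the number of bits matches the number of yes/no questions needed: one question ("Was it heads?") for a fair coin, two for two fair coins.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: sum of -p * log2(p); outcomes with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1 bit
print(entropy([0.25] * 4))   # two fair coins (4 equally likely states): 2 bits
print(entropy([0.9, 0.1]))   # biased coin: less uncertainty, less than 1 bit
print(entropy([1.0, 0.0]))   # outcome already known: 0 bits
```

The biased coin shows the "higher uncertainty, higher entropy" rule: the more predictable the outcome, the less information we gain by observing it.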
The term asymptotic means approaching a value or curve arbitrarily closely (i.e., as some sort of limit is taken). A line or curve a that is asymptotic to given curve c is called the asymptote of c. (http://mathworld.wolfram.com/Asymptotic.html)
When studying algorithms, even at a beginner level, we need to understand what asymptotic (or limiting) behaviour means. Asymptotic analysis is a way to classify algorithms according to how their running time or space requirements grow as the input size grows. It lets us predict how an algorithm will behave with a very large amount of data.
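To see why growth rate matters more than exact counts, here is a small Python sketch that counts the basic steps of two hypothetical routines: a single pass over n items (linear, O(n)) and a nested loop visiting every pair (i, j) (quadratic, O(n²)). The function names are illustrative only.

```python
def count_linear(n):
    """Count the basic steps of a single pass over n items."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def count_quadratic(n):
    """Count the basic steps of a nested loop visiting every pair (i, j)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

for n in (10, 100, 1000):
    print(n, count_linear(n), count_quadratic(n))
```

For n = 10, 100 and 1000 the linear counts are 10, 100 and 1000, while the quadratic counts are 100, 10000 and 1000000: multiplying n by 10 multiplies the first count by 10 and the second by 100.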
The graph below represents the function f(x)= 1/x. What’s important to…
Once we have a model ready to perform forward propagation and a function that returns a cost J, we can use backward propagation to compute the gradients and gradient descent to update our parameters W and b and minimise our cost function.
To perform backpropagation we need to compute the partial derivatives of our cost function J with respect to our parameters, because we need to understand how a change in their values will affect our cost. To do so, we can first compute the error for each layer:
With the subscript j representing the j-th neuron in the l-th layer.
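The per-layer errors and gradient updates can be sketched in pure Python for a tiny network. This is a minimal sketch, not the exact formulation of the post: it assumes sigmoid activations in both layers, a binary cross-entropy cost for a single example, 2 inputs, 2 hidden units and 1 output. The function names (forward, backward, gradient_step), the learning rate and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Hidden layer: z1_j = sum_k W1[j][k] * x[k] + b1[j], a1_j = sigmoid(z1_j)
    z1 = [sum(w * xk for w, xk in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [sigmoid(z) for z in z1]
    # Output layer: a single sigmoid unit giving y_hat
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    return z1, a1, sigmoid(z2)

def cost(y_hat, y):
    # Binary cross-entropy for one example (assumed cost function)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backward(x, y, W1, b1, W2, b2):
    z1, a1, y_hat = forward(x, W1, b1, W2, b2)
    # Output error: with sigmoid + cross-entropy, dJ/dz2 simplifies to y_hat - y
    delta2 = y_hat - y
    # Hidden errors: send delta2 back through W2, times sigmoid'(z1_j) = a1_j * (1 - a1_j)
    delta1 = [W2[j] * delta2 * a1[j] * (1 - a1[j]) for j in range(len(a1))]
    # Gradients of J with respect to each parameter
    dW2 = [delta2 * a for a in a1]
    db2 = delta2
    dW1 = [[d * xk for xk in x] for d in delta1]
    db1 = delta1[:]
    return dW1, db1, dW2, db2

def gradient_step(x, y, W1, b1, W2, b2, lr=0.5):
    # One gradient descent update: parameter := parameter - lr * gradient
    dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
    b1 = [b - lr * g for b, g in zip(b1, db1)]
    W2 = [w - lr * g for w, g in zip(W2, dW2)]
    b2 = b2 - lr * db2
    return W1, b1, W2, b2

# Hypothetical parameters and a single training example
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.0]
W2 = [0.5, -0.5]
b2 = 0.0
x, y = [1.0, 2.0], 1

before = cost(forward(x, W1, b1, W2, b2)[2], y)
for _ in range(20):
    W1, b1, W2, b2 = gradient_step(x, y, W1, b1, W2, b2)
after = cost(forward(x, W1, b1, W2, b2)[2], y)
print(before, after)
```

The cost printed after 20 steps is lower than the initial one. A good way to trust such code is a gradient check: nudge one parameter by a tiny amount, measure the change in J, and compare with the value returned by backward.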
To better understand what’s going on underneath gradient descent, it has been useful for me to really understand what the derivative of a function is and how to calculate it.
So we know that the derivative is the rate of change of a function at a particular point. Basically, it’s the slope of a function calculated with two points really, really close to each other.
Here are the steps of the calculation.
Let’s take the equation y = f(x).
When x increases by Δx, y increases by Δy:
$$y + \Delta y = f(x + \Delta x)$$
If we subtract the first equation from the second:
$$\Delta y = f(x + \Delta x) - f(x)$$
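This calculation can be checked numerically. Subtracting y = f(x) from y + Δy = f(x + Δx) gives Δy, and dividing Δy by Δx gives the slope between the two points. Here is a minimal Python sketch, using f(x) = x² at x = 3 as a hypothetical example function (its derivative there is 2·3 = 6).

```python
def difference_quotient(f, x, dx):
    """Slope between (x, f(x)) and (x + dx, f(x + dx)), i.e. dy / dx."""
    dy = f(x + dx) - f(x)  # the increase in y when x increases by dx
    return dy / dx

def f(x):
    return x ** 2

# As dx shrinks, the slope approaches the derivative f'(3) = 6
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, difference_quotient(f, 3.0, dx))
```

For this f, the quotient is exactly (x + Δx)² − x² divided by Δx, which equals 2x + Δx, so the printed values are roughly 7, 6.1, 6.01 and 6.001 (up to floating-point rounding). Letting Δx tend to 0 leaves the derivative 2x.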
In a previous note I explained how we can build an L=2 neural network for binary classification. Now that we have the structure of the model, we can feed it with our training dataset and compute the error between our predicted value ŷ and the correct value y.
The formula for forward propagation in an L=2 network is the following:
In the previous post we built the model using only two input variables, x1 and x2, as an example. In reality we will feed the network with a set of m training examples. …
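As an illustration of running forward propagation over all m examples, here is a minimal pure-Python sketch that computes ŷ for each (x1, x2) pair and averages the per-example error into a cost J. The exact formulas of the post are not shown here, so this sketch assumes sigmoid activations in both layers and the binary cross-entropy cost commonly used for binary classification. The dataset, the 3 hidden units and all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass for one example x = [x1, x2] through an L=2 network; returns y_hat."""
    a1 = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sigmoid(sum(w * a for w, a in zip(W2, a1)) + b2)

def compute_cost(X, Y, W1, b1, W2, b2):
    """Average binary cross-entropy J over the m training examples (assumed cost)."""
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        y_hat = forward(x, W1, b1, W2, b2)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / m

# Hypothetical training set: m = 4 examples with features (x1, x2) and labels y
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0, 1, 1, 0]
# Hypothetical parameters: 3 hidden units, 1 output unit
W1 = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [0.7, -0.2, 0.4]
b2 = 0.05
print(compute_cost(X, Y, W1, b1, W2, b2))
```

A quick sanity check: with all parameters set to zero, every ŷ equals 0.5, so J equals ln 2 ≈ 0.693 whatever the labels are.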