Kohei Suzuki

email address: kohei.suzuki808@gmail.com

  • Developed DeepColors, an iOS app that helps users colorize sketches and grayscale images using deep learning.
  • More than 3 years of machine learning experience, including a 6-month internship as an ML engineer.
  • Looking for a full-time job as a Machine Learning Engineer in the Vancouver area.
  • Hold a valid one-year work permit for Canada.

Main skills

  • Programming Languages:
  • Python (TensorFlow, PyTorch, Pandas, NumPy, Matplotlib, OpenCV, scikit-learn, Jupyter Notebook, Tensorforce, SciPy, Flask, unittest),
    Swift, SwiftUI, Kotlin, C/C++, SQL, Java, Bash
  • Machine Learning:
  • Computer Vision, NLP, Signal Processing, Time Series, Reinforcement Learning, Classical Machine Learning
  • OS:
  • Ubuntu, macOS
  • Others:
  • AWS (EC2, DynamoDB, CloudWatch), GCP, GitHub, Docker, CircleCI, Apache, Scrum, Agile, Algorithms and Data Structures, Quantum Computing (D-Wave Ocean SDK)


Jan 2019 - Jun 2019

Machine Learning Developer Internship

Singular Software Inc. [See reference from the CEO]

About the company

Singular Software has developed HeardThat, a phone-based hearing-assistance app that debuted at CES 2020 and uses deep learning to help people hear speech in noisy environments. The company was named to the top 10 of about 200 entrants in New Ventures BC 2019 and won the 2020 What's Next Innovation Challenge.

I worked as a machine learning developer intern and was involved in building the HeardThat app during its early development stage.


I joined the team early in development and took the initiative in building several Flask and JavaScript platforms for model evaluation, setting up an SQLite database to store the evaluation results, and implementing and training machine learning models.

- One of the evaluation tools I developed was an internal Flask-based visualization tool that let team members interactively inspect the spectrogram and waveform of any audio file. The GIF below, which I found on Twitter, is very similar to my tool, except that mine could also perform arithmetic between audio files and plot the spectrogram and waveform of the result.

Responsive image
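As a rough illustration of what such a tool computes (this is not the original Flask tool, whose code isn't shown here), the sketch below combines two waveforms sample by sample and computes a magnitude spectrogram with a Hann-windowed short-time Fourier transform in NumPy. The function names and the `n_fft`/`hop` defaults are assumptions chosen for illustration; plotting and the web interface are left out.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=512, hop=128):
    """Hann-windowed STFT magnitude with shape (n_fft // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # frequency bins as rows, frames as columns


def combine(a, b, op="subtract"):
    """Element-wise arithmetic between two waveforms, trimmed to their common length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    ops = {"add": np.add, "subtract": np.subtract, "multiply": np.multiply}
    return ops[op](a[:n], b[:n])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 1000 * t)           # 1 kHz tone
    noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(sr)
    residual = combine(noisy, clean)              # what the "noise" actually was
    print(magnitude_spectrogram(residual).shape)
```

Subtracting a clean reference from a processed or noisy file, then viewing the spectrogram of the difference, is one way such arithmetic makes artifacts easy to spot.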


- I also developed an external platform for Mechanical Turk, through which we collected ratings and transcriptions from people. This let us calculate metrics such as Mean Opinion Score (MOS) and Word Error Rate (WER), two of the metrics we used to evaluate our deep learning models.
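For reference, the two metrics can be computed as follows. This is a minimal sketch, not the code used at Singular Software: WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words, and MOS is the average of the listener ratings. Whitespace tokenization and the function names are assumptions.

```python
from statistics import mean


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[j] is the edit distance between the reference prefix processed so far and hyp[:j]
    dist = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,       # deletion: reference word missing from the hypothesis
                dist[j - 1] + 1,   # insertion: extra word in the hypothesis
                prev_diag + cost,  # match (cost 0) or substitution (cost 1)
            )
    return dist[-1] / len(ref)


def mean_opinion_score(ratings):
    """MOS: the arithmetic mean of listener ratings, e.g. on a 1-5 scale."""
    return mean(ratings)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
    print(mean_opinion_score([4, 5, 3, 4]))
```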

I worked with the team in an agile way, adapting plans as needed, and learned how Scrum works. Since the internship was remote, I also built practical skills such as working over SSH on Ubuntu, communicating clearly and concisely in spoken and written English, and writing easy-to-understand documentation.

Skills used

  • Python:
  • TensorFlow, NumPy, SciPy, Matplotlib, Seaborn, Bokeh, Flask, Jupyter Notebook, Pandas
  • Others:
  • Ubuntu, Git, SQLite, Scrum, SSH, Anaconda, JavaScript


Aug 2020 - Current

DeepColors app for iOS [website]

About this project
A mobile app that helps users colorize their own sketches and grayscale photos using deep learning.
Responsive image

- Colorize sketches
Responsive image

- Colorize grayscale photos
Responsive image

Application outline
Building the app took about three months, starting from learning how to display "Hello World" in a SwiftUI View.
  • Client Side
  • Built mainly with Swift and SwiftUI. Because SwiftUI is a new framework, I used UIKit alongside it to cover what SwiftUI could not yet do.
  • Machine Learning
  • Read many papers to find models I could use in the app, weighing output quality against processing speed.
    Trained GANs on GCP on datasets I preprocessed, following the architectures and hyperparameters described in the papers.
  • Server Side
  • Used Flask to receive POST requests from the client and process the submitted data.
    Set up Apache on Ubuntu to handle concurrent requests.
  • Monetisation
  • Set up AdMob and created mediation groups to maximize revenue, and added in-app purchases. Currently there is only one purchasable item, but I have been working on new features.
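The server side described above could be sketched as a single Flask endpoint. This is a hypothetical minimal version, not the DeepColors server: the `/colorize` route, the `image` form field, and the `colorize` placeholder (which echoes the upload back instead of running a GAN) are all assumptions, and the Apache configuration is not shown.

```python
import io

from flask import Flask, Response, request

app = Flask(__name__)


def colorize(image_bytes):
    """Hypothetical placeholder for GAN inference; returns the input unchanged."""
    return image_bytes


@app.route("/colorize", methods=["POST"])
def colorize_endpoint():
    uploaded = request.files.get("image")  # hypothetical multipart form field name
    if uploaded is None:
        return {"error": "no image uploaded"}, 400
    result = colorize(uploaded.read())
    return Response(result, mimetype="image/png")


if __name__ == "__main__":
    # Exercise the endpoint in-process with Flask's test client instead of a real phone.
    client = app.test_client()
    resp = client.post(
        "/colorize",
        data={"image": (io.BytesIO(b"fake-png-bytes"), "sketch.png")},
        content_type="multipart/form-data",
    )
    print(resp.status_code, resp.data)
```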

Jul 2020 - Aug 2020

Speech-To-Text app with Flask [github]

About this project
A speech-to-text Flask app: upload a video or audio file and get a transcript of the speech it contains.

How it works
When a video file is uploaded, the app extracts its audio track, together with file information such as the sampling rate, using ffmpeg-python, a Python wrapper for FFmpeg. Based on that information, it converts the audio to a 1-D NumPy array and feeds it into the DeepSpeech model, which is based on Baidu's Deep Speech research paper. The DeepSpeech output is then passed to a language model to improve prediction accuracy.

Application outline

Responsive image

Responsive image

After a video or audio file is uploaded, the app uses ffmpeg-python to collect information about the file, such as its sampling rate, and to extract the audio track if the file is a video. DeepSpeech expects audio sampled at 16,000 Hz, so the audio is resampled when necessary. The audio is then converted to a 1-D NumPy array and fed into the DeepSpeech model.
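The extraction and resampling step could be sketched with ffmpeg-python by asking FFmpeg to decode the audio track straight to 16 kHz, mono, 16-bit little-endian PCM and reading the bytes into NumPy. This is an assumption about how it might be done, not the project's code, and the function names are hypothetical. Because the `ar` output option always resamples, the probe is only needed if you want to check or report the original rate. In recent DeepSpeech releases, `Model.stt` accepts a 16-bit integer NumPy array like the one returned here.

```python
import numpy as np

TARGET_SR = 16000  # DeepSpeech expects 16 kHz, mono, 16-bit PCM


def probe_sample_rate(path):
    """Sample rate of the first audio stream, as reported by ffprobe."""
    import ffmpeg  # ffmpeg-python, imported lazily so pcm_to_array works without it

    info = ffmpeg.probe(path)
    audio = next(s for s in info["streams"] if s["codec_type"] == "audio")
    return int(audio["sample_rate"])


def extract_audio(path):
    """Decode the audio track of a video or audio file to raw 16 kHz mono PCM bytes."""
    import ffmpeg

    out, _ = (
        ffmpeg.input(path)
        .output("pipe:", format="s16le", acodec="pcm_s16le", ac=1, ar=TARGET_SR)
        .run(capture_stdout=True, capture_stderr=True)
    )
    return out


def pcm_to_array(raw):
    """Raw little-endian 16-bit PCM bytes -> 1-D int16 NumPy array."""
    return np.frombuffer(raw, dtype="<i2")
```

Usage would be along the lines of `audio = pcm_to_array(extract_audio("talk.mp4"))`, where the file name is only an example.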

Responsive image

The output of the DeepSpeech model is then fed into a language model to improve prediction accuracy.
From the language model's output, the app creates a JSON file containing a list of words with their start times and durations. From the JSON file it also creates a text file holding the words joined into a sentence with spaces. Both files are bundled into a downloadable zip file.
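DeepSpeech's Python API can return character-level tokens with start times (via `sttWithMetadata`). This sketch assumes those tokens have already been turned into `(character, start_time)` pairs and groups them into words, then builds the JSON file, the text file, and the zip in memory with the standard library. Approximating a word's duration as the gap between its first and last character start times is an assumption, as are the field and file names.

```python
import io
import json
import zipfile


def tokens_to_words(tokens):
    """Group (character, start_time) tokens into words with start time and duration."""
    words, chars, start, last = [], [], None, None
    for char, t in list(tokens) + [(" ", None)]:  # trailing space flushes the last word
        if char == " ":
            if chars:
                words.append({
                    "word": "".join(chars),
                    "start_time": round(start, 3),
                    "duration": round(last - start, 3),  # first-to-last character start
                })
            chars, start = [], None
        else:
            if start is None:
                start = t
            chars.append(char)
            last = t
    return words


def build_zip(words):
    """Bytes of a zip holding transcript.json and transcript.txt."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("transcript.json", json.dumps({"words": words}, indent=2))
        zf.writestr("transcript.txt", " ".join(w["word"] for w in words))
    return buffer.getvalue()
```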

Further work
  • Improve the UI.
  • Add a feature that detects specific motions of the user and marks the corresponding frames, so that users can easily find where they want to cut.
  • Port the app to C++, since I want to build software in C++.
Skills used
  • Python:
  • TensorFlow, NumPy, ffmpeg-python, Flask
  • Others:
  • GitHub, Docker, CircleCI, HTML, CSS, Bootstrap

Jul 2020 - Jul 2020

Twitter Real Time Financial Sentiment Analysis  [github]

About this project
A Flask application where you enter hashtags and keywords for the tweets you want to stream. FinBERT, a pre-trained NLP model for analyzing the sentiment of financial text, then scores the sentiment of each tweet in real time. The collected tweets containing the hashtags or keywords, together with their FinBERT sentiment scores, are displayed in a Pandas DataFrame.

FinBERT is built by further training the BERT language model on the finance domain with a large financial corpus and then fine-tuning it for financial sentiment classification on the Financial PhraseBank from Malo et al. (2014), which can be downloaded from here. For details, see FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.

Application outline

Responsive image

It starts with entering the hashtags and keywords to stream, such as $TSLA, $GOOGL, #CAD, #USDJPY, and so on.
Since FinBERT produces the sentiment scores, the hashtags and keywords should be related to finance.
The image below was taken after some keywords were entered. The app is now ready to start streaming with Tweepy, a Python library for accessing the Twitter API.

Responsive image

Once streaming starts, each collected tweet that contains at least one of the defined keywords is preprocessed with NLTK and then given a sentiment score by FinBERT. The information for each tweet, including its score, is stored in a CSV file and displayed in a Pandas DataFrame, as in the example below.

Responsive image

Of course, instead of a local CSV file, the data could be stored in a database such as DynamoDB on AWS, accessed through the boto3 library.
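A simplified sketch of the per-tweet steps between streaming and display: clean the text, keep it only if it contains a tracked keyword, and append a row to CSV. To stay self-contained, it substitutes a few regular expressions for the NLTK preprocessing and takes the sentiment score as an input instead of calling FinBERT; the regexes, field names, and function names are assumptions.

```python
import csv
import io
import re

RT_RE = re.compile(r"^RT @\w+:\s*")   # retweet prefix, e.g. "RT @user: "
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
FIELDS = ["id", "text", "sentiment"]


def clean_tweet(text):
    """Drop the retweet prefix, URLs, and @mentions, and collapse whitespace."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


def matches(text, keywords):
    """True if the tweet contains at least one tracked keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def write_rows(fileobj, rows):
    """Append tweet rows (dicts with FIELDS keys) to an open CSV file."""
    csv.DictWriter(fileobj, fieldnames=FIELDS).writerows(rows)


if __name__ == "__main__":
    text = clean_tweet("RT @trader: $TSLA breaking out https://t.co/abc")
    if matches(text, ["$TSLA", "#USDJPY"]):
        out = io.StringIO()
        write_rows(out, [{"id": 1, "text": text, "sentiment": 0.0}])  # score would come from FinBERT
        print(out.getvalue())
```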

Further work
  • Some of the functionality in this application could be reused to create new features for the automated forex trading strategy I have been working on.
  • To switch to a different domain, we can replace FinBERT with another NLP model trained on a dataset from that domain.
Skills used
  • Python:
  • PyTorch, NLTK, Tweepy, Flask, Pandas, NumPy
  • Others:
  • GitHub, Docker, CircleCI, HTML, CSS, JavaScript, jQuery, Bootstrap

Jun 2020 - Jul 2020

Sudoku Solver on Quantum Computers [github]

About the Sudoku solver
This is a Flask app that detects a Sudoku puzzle shown to the webcam using OpenCV and recognizes each digit with a CNN model.
Once a puzzle is detected, we formulate it as a Binary Quadratic Model (BQM) with an objective function to minimize, and find the solution using D-Wave's quantum computers.

The purpose of this project:
  • Learn about annealing-based quantum computers, which are good at solving particular kinds of problems such as optimization problems.
  • Get hands-on experience with OpenCV.
  • Learn testing and how to use CI tools such as CircleCI.
  • Learn how to use Docker.

Preprocessing steps
Here, I describe the workflow from capturing a Sudoku puzzle to finding its solution.
The images go from left to right and top to bottom, so you can easily follow what is happening inside.

Responsive image

1. Convert a frame to gray scale.

Responsive image

2. Apply an adaptive threshold.

Responsive image

3. Blur the image with a median filter.

Responsive image

4. Detect the puzzle.

Responsive image

5. Create a mask.

Responsive image

6. Capture the grid.

Responsive image

7. Detect the vertical lines.

Responsive image

8. Detect the horizontal lines.

Responsive image

9. Calculate the points where the vertical and horizontal lines cross.
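The text does not say how the detected lines are represented, so this sketch assumes the (rho, theta) normal form returned by OpenCV's `cv2.HoughLines`, in which each line satisfies x cos(theta) + y sin(theta) = rho. Each crossing is then the solution of a 2x2 linear system. For a 9x9 puzzle, 10 vertical and 10 horizontal lines give a 10x10 grid of corners, and cell (r, c) lies between `points[r][c]` and `points[r + 1][c + 1]`. The helper names are hypothetical.

```python
import numpy as np


def intersection(line1, line2):
    """Crossing point of two lines in Hough normal form: x*cos(theta) + y*sin(theta) = rho."""
    (rho1, theta1), (rho2, theta2) = line1, line2
    a = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    b = np.array([rho1, rho2])
    x, y = np.linalg.solve(a, b)  # raises LinAlgError for parallel lines
    return float(x), float(y)


def grid_points(vertical, horizontal):
    """All crossings, one row per horizontal line, ordered top-to-bottom and left-to-right.

    Assumes rho >= 0 for every line, so sorting by rho sorts vertical lines by x
    and horizontal lines by y.
    """
    vertical = sorted(vertical, key=lambda line: line[0])
    horizontal = sorted(horizontal, key=lambda line: line[0])
    return [[intersection(v, h) for v in vertical] for h in horizontal]
```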

Samples of the cropped digits that are fed into the CNN model:
Responsive image Responsive image Responsive image
Responsive image Responsive image Responsive image
Responsive image Responsive image Responsive image

We then create a 2-D NumPy array representing the Sudoku puzzle the user showed, based on the CNN model's prediction for each cell in the grid. The CNN model is trained on Chars74K.

Finally, the user corrects any digits the CNN model misclassified by entering the correct number in the corresponding text box.

Responsive image

Formulating the problem for D-Wave's quantum computers
  • The problem we want to solve must be formulated as a Binary Quadratic Model (BQM).
  • To solve a BQM, we define an objective function in either QUBO (Quadratic Unconstrained Binary Optimization) or Ising form. Finding the variable values that minimize the objective function solves the BQM.
  • A BQM has two parts:
    • Objective: what we are trying to minimize
    • Constraints: rules we need to satisfy

BQM Development Process
  • Convert the objective and constraints into math expressions using binary (0/1) variables for QUBO, or -1/+1 variables for Ising.
  • Make the objective and constraints "QUBO appropriate":
    • The objective is a function to minimize
    • The constraints are satisfied at their minimum values
Binary Quadratic Model

$$\sum_{i} a_i x_i + \sum_{i<j} b_{i,j} x_i x_j + c$$

The coefficients $a_i$ and $b_{i,j}$ are constant numbers we choose to define our problem, as is the constant term $c$.
The binary variables $x_i$ and $x_j$ are the values we are looking for to solve our problem.
The best solution for these variables is the value for each that produces the smallest value for the overall expression.
Searching for the variables that minimize an expression is called an "argmin" in mathematics.

  • Linear Terms:
    The first summation, $\sum_{i} a_i x_i$, contains linear terms, each having just one binary variable.
  • Quadratic Terms:
    The second summation, $\sum_{i<j} b_{i,j} x_i x_j$, contains quadratic terms, each containing a product of two variables.
  • Constants:
    In the general BQM form we may or may not include a constant. Since we are looking for an argmin, constant terms do not affect the final answer, but they can be useful when interpreting the output of D-Wave solvers and samplers.
For more information about BQM, QUBO, and Ising, please visit here.
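As a worked example of these steps (a sketch, not necessarily the formulation used in this project), each Sudoku rule "exactly one of these variables is 1" becomes the penalty $(\sum x - 1)^2 = -\sum x + 2\sum_{i<j} x_i x_j + 1$, using $x^2 = x$ for binary variables and dropping the constant. The binary variable `(row, col, digit)` is 1 when that cell holds that digit. A 4x4 puzzle is used so the numbers stay small; a valid grid then has energy equal to minus the number of rules, and any violated rule raises it.

```python
from collections import defaultdict
from itertools import combinations


def one_hot_groups(n=4, box=2):
    """Every set of variables (row, col, digit) that must contain exactly one 1."""
    groups = [[(r, c, d) for d in range(n)] for r in range(n) for c in range(n)]  # one digit per cell
    for d in range(n):
        groups += [[(r, c, d) for c in range(n)] for r in range(n)]  # digit once per row
        groups += [[(r, c, d) for r in range(n)] for c in range(n)]  # digit once per column
        groups += [
            [(br + i, bc + j, d) for i in range(box) for j in range(box)]  # digit once per box
            for br in range(0, n, box)
            for bc in range(0, n, box)
        ]
    return groups


def sudoku_qubo(n=4, box=2):
    """QUBO dict {(u, v): bias} built from one (sum(x) - 1)^2 penalty per group."""
    Q = defaultdict(float)
    groups = one_hot_groups(n, box)
    for group in groups:
        for v in group:
            Q[(v, v)] -= 1.0           # linear term: -x
        for u, v in combinations(group, 2):
            Q[(u, v)] += 2.0           # quadratic term: +2 x_u x_v
    return dict(Q), len(groups)


def energy(Q, ones):
    """QUBO energy of an assignment given as the set of variables equal to 1."""
    return sum(bias for (u, v), bias in Q.items() if u in ones and v in ones)


def grid_to_ones(grid):
    """Filled grid with digits 1..n -> set of (row, col, digit index) variables that are 1."""
    return {(r, c, value - 1) for r, row in enumerate(grid) for c, value in enumerate(row)}
```

With the Ocean SDK, a dictionary like `Q` can be converted with `dimod.BinaryQuadraticModel.from_qubo(Q)` and handed to a sampler; the digits already printed in the puzzle would additionally need to be fixed to 1.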

  • D-Wave's quantum computers compute remarkably fast, but because many people share them, requests wait in a queue. The whole process therefore takes some time, though it usually finishes within a minute.
  • Sometimes the quantum computer cannot find the solution to a given Sudoku puzzle, especially a difficult one. The annealer runs the computation several times and returns the best solution it found, and in those cases none of the runs reached the optimum of the objective function.

Jan 2020 - Present

Forex Trading System with Deep Reinforcement Learning


This is an automated forex trading strategy that uses reinforcement learning, a type of machine learning. The goal is to optimize a forex trading strategy and make a profit with it in the real financial market while I am sleeping.

This project can be separated into two sections, MVP and version 2.0.
  • MVP section:
  • What I did and the results I got from the first deep reinforcement learning model.
  • Version 2.0 section:
  • What I changed, especially in preprocessing, to build a better model, and the improved results.


First, I will list the key parts of the MVP, which I finished in about a month, and then explain each of them.
  1. SureFireStrategy
  2. Gramian Angular Field
  3. Data
  4. Result
  5. Deployment
1. SureFireStrategy

I loosely followed this paper, Deep Reinforcement Learning for Foreign Exchange Trading.

In this paper, the authors optimized the SureFireStrategy, a variant of the Martingale, using a ConvNet as the reinforcement learning agent to find patterns in heatmap images encoded from time series data with the Gramian Angular Field (GAF), which I describe later.

The Sure-Fire strategy

Responsive image
Responsive image
Responsive image
First, as illustrated in Fig. 2, we purchase one unit at any price and set a stop-gain price of +k and a stop-loss price of −2k. At the same time, we select a price with a difference of −k to the buy price and +k to the stop-loss price and set a backhand limit order for three units. Backhand refers to engaging in the opposite behavior. The backhand of buying is selling and the backhand of selling is buying. A limit order refers to the automatic acquisition of corresponding units.
As illustrated in Fig. 3, when a limit order is triggered, and three units are successfully sold backhand, we place an additional backhand limit order, where the buy price is +k to the sell price and −k to the stop-loss price. We set the stop-gain point as the difference of +k and the stop-loss point as the difference of −2k, after which an additional six units are bought.
As illustrated in Fig. 4, the limit order is triggered in the third transaction. The final price exceeded the stop-gain price of the first transaction, the stop-loss price of the second transaction, and the stop-gain price of the third transaction. In this instance, the transaction is complete. The calculation in the right block shows that the profit is +1k.
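The arithmetic in Fig. 2 to 4 can be checked in a few lines. In this sketch, prices are integers in pips, `p` is the first entry price and `k` the stop distance. The rally scenario replays Fig. 4; the drop scenario (price falls to p - 2k right after the second leg fills) assumes the sell leg uses the same +k/-2k stops mirrored, which is an assumption. Both end at +k, and the rally case shows how the position grows to 1 + 3 + 6 = 10 units, which matters for bankroll size.

```python
def position_pnl(side, units, entry, exit_price):
    """Profit of one position; side is +1 for a buy and -1 for a sell."""
    return side * units * (exit_price - entry)


def rally_scenario(p, k):
    """Fig. 4: the third leg fills, then price rises past p + k."""
    return [
        (+1, 1, p, p + k),      # buy 1 at p, closed at its stop-gain p + k
        (-1, 3, p - k, p + k),  # sell 3 at p - k, closed at its stop-loss p + k
        (+1, 6, p, p + k),      # buy 6 at p, closed at its stop-gain p + k
    ]


def drop_scenario(p, k):
    """Price falls to p - 2k right after the second leg fills."""
    return [
        (+1, 1, p, p - 2 * k),      # buy 1 at p, closed at its stop-loss p - 2k
        (-1, 3, p - k, p - 2 * k),  # sell 3 at p - k, closed at its stop-gain p - 2k
    ]


def total_pnl(positions):
    return sum(position_pnl(*pos) for pos in positions)


if __name__ == "__main__":
    p, k = 11000, 10  # hypothetical prices expressed in pips
    print(total_pnl(rally_scenario(p, k)), total_pnl(drop_scenario(p, k)))
```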

2. Gramian Angular Field (GAF)

The left image shows price movement on a 5-minute timeframe with a window size of 12. The right image is its GAF encoding; images like this were fed into the ConvNet and served as the states in reinforcement learning. Each image had 4 channels, corresponding to the Open, High, Low, and Close prices in each timeframe.

Responsive image
Responsive image
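A minimal NumPy sketch of the encoding, assuming the summation variant (GASF) of the Gramian Angular Field, since the text does not say which variant was used: each window is min-max rescaled to [-1, 1], mapped to angles phi = arccos(x), and turned into the matrix cos(phi_i + phi_j). Stacking one matrix per OHLC column gives a 4-channel image like the ones described above. The function names are hypothetical.

```python
import numpy as np


def gasf(window):
    """Gramian Angular Summation Field of one 1-D window."""
    x = np.asarray(window, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        scaled = np.zeros_like(x)  # flat window: every point at the middle of the range
    else:
        scaled = (2 * x - hi - lo) / (hi - lo)  # min-max rescale to [-1, 1]
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))  # clip guards against rounding outside [-1, 1]
    return np.cos(phi[:, None] + phi[None, :])


def ohlc_to_image(ohlc):
    """(window, 4) Open/High/Low/Close array -> (4, window, window) image."""
    ohlc = np.asarray(ohlc, dtype=float)
    return np.stack([gasf(ohlc[:, channel]) for channel in range(ohlc.shape[1])])
```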

3. Data
I collected forex data with a Python API provided by OANDA, the broker I use, which let me gather data in any major timeframe I wanted.

4. Result
The image below is a plot of the training results. From this plot, I concluded that:
  • the model had clearly not been trained well
  • there was an exploration-exploitation problem
  • the SureFireStrategy might not fit
  • the data quality might not be good enough
How I addressed these problems is described in the version 2.0 section.
Responsive image
5. Deployment
Although I did not yet have a model that could make a profit in the real market, I deployed the model on AWS EC2 and automated the whole process by defining the operations in a bash script. I also set up CloudWatch to turn the server on and off, so as not to waste money on weekends when the forex market is closed.

Version 2.0

I have been working on version 2.0, which differs from the MVP in the following points:
  1. Trading strategy
  2. Definition of the state
  3. Data
  4. Result
  5. Further work
1. Trading strategy
The SureFireStrategy only shows its strength if I can keep doubling the bet over and over again, and my bankroll was not large enough to place orders like that. So instead of using the SureFireStrategy, I exit the market at a fixed pip size.

2. Definition of the state
In the MVP, I used encoded heatmap images as the state, which may be why the model had not been trained well: the ConvNet could not find any patterns in the images. So I switched to describing the state with technical indicators as features.
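The text does not say which indicators were used, so the following only illustrates the idea with two common ones: a simple moving average and an RSI computed from simple (Cutler-style) averages of gains and losses rather than Wilder's smoothing. The window lengths and function names are assumptions; in practice, each indicator value at time t would be one entry of the state vector.

```python
import numpy as np


def sma(values, window):
    """Simple moving average; the output has len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")


def rsi(prices, window=14):
    """RSI from simple averages of gains and losses (Cutler's variant)."""
    deltas = np.diff(np.asarray(prices, dtype=float))
    avg_gain = sma(np.clip(deltas, 0, None), window)
    avg_loss = sma(np.clip(-deltas, 0, None), window)
    total = avg_gain + avg_loss
    # 100 * G / (G + L) equals 100 - 100 / (1 + G / L); a flat window is reported as 50
    return np.where(total > 0, 100.0 * avg_gain / np.where(total > 0, total, 1.0), 50.0)
```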

3. Data
In the MVP, I used data provided by OANDA, but it actually contained a considerable number of NaNs that I had to fill in. Also, the paper I followed trained on timeframe-based (OHLC) data rather than bid-ask prices, so I could not accurately reproduce the price movement that happens in the real market. I therefore started collecting bid-ask data in real time, which is used to train the models in version 2.0.
Responsive image
Responsive image

4. Result of backtesting
Entry points for short positions, 2020/May/18 to 2020/May/23 (green: profit, red: loss)
Responsive image
Entry points for long positions, 2020/May/18 to 2020/May/23 (green: profit, red: loss)
Responsive image

Entry points for short positions, 2020/May/25 to 2020/May/30 (green: profit, red: loss)
Responsive image
Entry points for long positions, 2020/May/25 to 2020/May/30 (green: profit, red: loss)
Responsive image

Metric              2020/May/18 - 2020/May/23    2020/May/25 - 2020/May/30
Profit              (image)                      (image)
Number of trades    117                          105
Number of wins      86                           79
Number of losses    31                           26
Winning ratio       0.735                        0.752
Profit factor       1.515                        1.820
Max drawdown        -40 pips                     -50 pips
Net profit          217.7 pips                   225.0 pips
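The statistics in the table can be computed from a list of per-trade results. The sketch below assumes each trade's result is given in pips, defines profit factor as gross profit divided by gross loss, and measures max drawdown as the largest peak-to-trough drop of cumulative profit. The function name and the example trades are made up for illustration and are not the backtest data.

```python
def backtest_metrics(trades):
    """Summary statistics for a list of per-trade results in pips."""
    if not trades:
        raise ValueError("no trades")
    wins = [t for t in trades if t > 0]
    losses = [t for t in trades if t <= 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    equity, peak, max_drawdown = 0.0, 0.0, 0.0
    for t in trades:
        equity += t                                        # cumulative profit after this trade
        peak = max(peak, equity)                           # best cumulative profit so far
        max_drawdown = min(max_drawdown, equity - peak)    # deepest fall below that peak
    return {
        "trades": len(trades),
        "wins": len(wins),
        "losses": len(losses),
        "win_ratio": len(wins) / len(trades),
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "max_drawdown": max_drawdown,
        "net_profit": sum(trades),
    }


if __name__ == "__main__":
    print(backtest_metrics([10, -5, 20, -15, -10, 5]))  # hypothetical trades
```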

Further work
  1. Solve overfitting:
     As the entry-point plots above show, there are far more short entries than long entries. This is because the training data contains several downtrends, which could cause overfitting. We can address this by increasing the amount of training data, especially data with uptrends.
  2. Use heatmap images as extra features.
  3. Use a Fourier transform to approximate the price movement and calculate derivatives to use as features that may reflect the strength of the current trend.
  4. Implement algorithmic trading strategies and use their outputs as one-hot encoded features.
  5. Update the reward function, which is one of the most crucial parts of reinforcement learning.
  6. Tune hyperparameters and perform feature selection.

Aug 2019 - Sep 2019

Ticket-Dodger [link]

This was the final team project of the Machine Learning Bootcamp at 7 Gate Academy: an application that predicts the likelihood of getting a parking ticket in the Vancouver area based on the user's geolocation and the time. When users tap the location where they plan to park, or where they are currently parked, the app calls AWS Lambda, where our machine learning model runs to predict the likelihood.

Here is how Paul and I built this application within a month.
Responsive image
We found a dataset in the Vancouver Open Data Catalogue. The original dataset contained information about the parking tickets issued, such as the date and time, address (including block), infraction, and status. However, it did not contain a target variable we could use, in our case the likelihood or probability of getting a parking ticket. I explain how we solved this in the Target variable creation section below, but the short answer is that we created one using the traffic counts on each street.
We estimated the probability for each street and thresholded it into three categories, Low, Medium, and High, which were the likelihoods we predicted. We therefore treated this as a classification problem, because a category is more user-friendly than a raw probability.

While working on feature engineering, we found that time was definitely a factor. As you can see below, the chance of getting a ticket is high around 3 PM.
Responsive image
Responsive image
Training Machine Learning Models
As mentioned, this was a classification problem, so we started by training a simple logistic regression model because it was easy to implement.
Afterwards, we trained different kinds of models, such as Random Forest, XGBoost, and neural networks. First, we made sure the models had the capacity to learn something from our data by letting them overfit the training data.
Then we iteratively built more complicated models; for the Multi-Layer Perceptron (MLP), for instance, we varied the number of neurons in each layer, the number of layers, the optimizer, and so on.
Here are some of the results we got from the MLP and XGBoost after hyperparameter search with Hyperas and Optuna, two Python frameworks for hyperparameter optimization.
Responsive image
Responsive image
Model Evaluation
Evaluating our models was difficult, because the target variable was one we had constructed ourselves. The best we could say was that we did a pretty good job of identifying a low risk of getting a ticket, and accurate LOW-risk predictions matter most to us. For example, if you park expecting a low risk and end up getting a ticket, that is a much worse user experience than expecting a ticket and getting none!

We also tracked down a parking enforcement officer and asked for his opinion. He named some streets where tickets are commonly issued, and our XGBoost predictions for them were pretty good. Given its performance and its inference time of 28.19 ms, we chose the XGBoost model.

Application Architecture
  • Server:
  • Flask running behind Gunicorn and Nginx
  • Custom-built location-to-street matching engine
  • Model:
  • XGBoost
  • Deployment: DigitalOcean Droplet
    • 1 vCPU
    • 1 GB RAM
    • 24 GB SSD
Client Side
  • Website:
  • HTML and JavaScript
  • Map:
  • LeafletJS serving OpenStreetMap (no Google!)
  • Deployment:
  • GitHub Pages
How we worked as a team

Since we lived too far apart to work together in person, it was important to have a good system for collaborating.
We started by working together on sourcing our data, evaluating what we had, and creating a merged base dataset.
To streamline our approach, we then split our roles to focus on primary areas: I focused on building machine learning models, and Paul worked on development.
Afterwards, we did a knowledge transfer to fill each other in on anything we might have missed.

We loosely followed an agile approach, adapting as needed, and reviewed each other's work against the standards we had set for ourselves. We used Trello to manage our tasks; here are some of the lists we had on our Trello board:

  • Product backlog
  • Current sprint
  • Doing
  • Review
  • Blocked
  • Done
We also held a daily meeting to catch up on what everyone had done.

Target variable creation
As mentioned above, we did not have a target variable, i.e., the probability or likelihood of getting a parking ticket. We created one from three datasets: one with the parking tickets issued, a second with the traffic counts on each street (including some private streets), and a third with almost all of the street names in the Vancouver area.
It was important for us to define what we meant by "risk".
It was a fairly arbitrary term. We decided to use the number of tickets issued divided by the amount of traffic on the street. In this way, we defined risk RELATIVE to the risk of other streets. The formula for estimating the probability for each block on each street was as follows:

$$\text{risk} = \frac{\text{number of tickets issued}}{\text{traffic count}}$$

To do the calculation, we needed the street names in the parking ticket dataset and the traffic count dataset to be in the same format, so that we could merge the two datasets using the street as the key.
Here is an example of a street name we needed to clean up: "WEST GEORGIA" and "GEORGIA W".
So we used a Python library, fuzzywuzzy, to clean up the street names.
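A sketch of the matching and risk steps, with two substitutions stated plainly: it uses Python's standard-library `difflib` instead of fuzzywuzzy, and a small hand-written abbreviation table (an assumption) so that "WEST GEORGIA" and "GEORGIA W" normalize to the same key. Risk is tickets divided by traffic, as defined above, and the Low/Medium/High cut-offs are parameters because the actual thresholds are not given. All function names are hypothetical.

```python
import difflib

# Hypothetical abbreviation table; a real one would need many more entries.
ABBREVIATIONS = {"W": "WEST", "E": "EAST", "N": "NORTH", "S": "SOUTH", "ST": "STREET", "AVE": "AVENUE"}


def normalize_street(name):
    """Upper-case, expand abbreviations, and sort tokens so word order is ignored."""
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in name.upper().replace(".", "").split()]
    return " ".join(sorted(tokens))


def match_street(name, known_streets, cutoff=0.8):
    """Map a street name onto the closest name from another dataset, or None."""
    candidates = {normalize_street(s): s for s in known_streets}
    hits = difflib.get_close_matches(normalize_street(name), list(candidates), n=1, cutoff=cutoff)
    return candidates[hits[0]] if hits else None


def risk_levels(tickets, traffic, low_cut, high_cut):
    """tickets / traffic per street, bucketed into Low / Medium / High."""
    levels = {}
    for street, count in tickets.items():
        if traffic.get(street):  # skip streets with no (or zero) traffic count
            risk = count / traffic[street]
            levels[street] = "Low" if risk < low_cut else "Medium" if risk < high_cut else "High"
    return levels
```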


Dec 2019 - Mar 2020

The University of Tokyo

Data Science Certificate [link]  [See certificate]

I passed coding examinations to enter this course, and I am one of about 400 students who successfully completed it out of the roughly 900 who started.

  • 1st:
    Introduction to Data Science and Python
  • 2nd:
    Numpy and Pandas
  • 3rd:
  • 4th:
    Probability and Statistics
  • 5th:
    Supervised Learning
  • 6th:
    Unsupervised Learning
  • 7th:
    Model evaluation and Hyperparameter tuning
  • 8th:
    Final Project
Aug 2019 - Sep 2019

7 Gate Academy

Machine Learning Bootcamp [link]
Out of about 120 applicants, I was one of the 7 people selected to take the course.

8 weeks, 3.5 hours * 4 days/week.
  • 1st week:
  • Data Engineering, Modeling, Big Data (ETL, DWH, Airflow, Spark)
  • 2nd week:
  • Data Visualization (Matplotlib), Data Processing (Duplicated rows, Missing Values, Outliers, Multiple Value Ranges, Non-numerical Data)
  • 3rd week:
  • AutoML (Google Cloud, Microsoft Azure), ML Library (sklearn)
  • 4th week:
  • MVP, Interpretability, Problem-solving, ML Techniques (Bias, Variance, Regularization, etc.)
  • 5th week:
  • Planning and estimating the work, Data Science Scrum
  • 6th week:
  • Team Project
  • 7th week:
  • Team Project
  • 8th week:
  • Presentation about the team project, Ticket-Dodger

Jul 2017 - Sep 2019

Institute of Technology Development of Canada

Computer Science Diploma [link]  [See transcript]

The course was a 2-year diploma in Open Source Programming, consisting of one year of classes and a second year in the Co-op program, during which I worked as a Machine Learning Developer at Singular Software Inc.

Apr 2018 - Jun 2018

BrainStation Vancouver

Data Science Bootcamp [link]  [See certificate]
  • UNIT 1 Python Programming
  • Programming Fundamentals, Pandas, Python Packages
  • UNIT 2 Working with Data
  • Importing, Cleaning, Sampling
  • UNIT 3 Data Visualization
  • Matplotlib, Bokeh, Model Visualizations
  • UNIT 4 Numerical Models
  • Linear Regression, Polynomial Regression
  • UNIT 5 Classification Models
  • Logistic Regression, Naive Bayes, Decision Trees
  • UNIT 6 Model Validation
  • Distribution Fitting, Testing Goodness of Fit, Training Models
  • UNIT 7 Machine Learning
  • Intro to Neural Networks, Intro to Random Forests
  • UNIT 8 Presenting Data
  • Storytelling with Data, Project Presentation

Apr 2012 - Apr 2013

Tokyo City University

Bachelor's Degree in Computer Science [link]