IDO GREENBERG
I am a researcher in the areas of Algorithms, Data Science and Mathematical Modelling, looking to face challenges of both intellectual and practical importance alongside world-class research scientists. My great passion is to gain and simplify knowledge, spread it and apply it to new fields, pushing the limits of both understanding and capability. Currently I am particularly interested in fundamental research toward a better understanding of Machine Intelligence and Deep Neural Networks.
Talpiot graduate
MSc from TAU in applied math (summa cum laude, excellence award of the math school, 2 journal publications)
Experience in ML and data science (Istra), operations research (MAFAT) and low-level research (Intelligence)

Application field
Title 
Year 
Language 
Code 
Description 
Traffic analysis
Traffic Analysis in Original Video Data of Ayalon Road
2019
Python
Figures: the recording smartphone in action; out-of-the-box SSD vs. the dedicated detector; probabilistic field of the expected next location of a tracked vehicle; traffic density is indeed high in known rush hours; the fundamental traffic diagram (density, speed and flux): practice vs. theory; lane transitions mostly occur on the right lanes; left-lane transition is followed by an average speed increase of 4.1 km/h.

News
Scraping and Analysis of Hebrew Newspapers
2019
Python
Figures: summary of data; parties and politicians count; graph embedding of words by context similarity.

Elevators
Elevators Simulator
2018
Python
Continuous-time visual simulator of elevators, intended for testing various elevator-management algorithms.
Figures: visual simulation; single-scenario summary; multi-scenario summary.

Earthquakes
[Kaggle] Earthquake Prediction
2019
Python
Prediction of the remaining time until the next earthquake from seismic measurements. The work was done within a Kaggle competition with $50K in prizes, as part of a silver-medal-winning team that included Zahar Chikishev (full time) and myself (part time). The repo includes only my work, which concentrates mainly on two neural-network solutions based on the raw signal.
Figures: seismic signal and the corresponding time until the next quake; a sample of mel-spectrogram images.

Public transportation
[Hackathon] Matching of Recorded Bus Trips to Planned Routes
2019
Python, C++
This project was developed in collaboration with Oded Shimon during the two-day Civil Hackathon of the Israeli Workshop of Public Knowledge.
Both the planned bus routes and the actual bus trips are publicly available as data sets in Israel, allowing diagnosis of trips and comparison of their timings and routes to the plans. However, it is suspected that the reported matching between the planned routes and the actual trips is inaccurate, which would make any such diagnosis ineffective.
This project shows that within limited subsets of the data, all the observed trips correctly correspond to their reported route. It also provides general tools for detecting uncertainly classified trips (with more than one plausible planned route) and anomalous trips (with no plausible planned route).
The match of a trip to a planned route is essentially calculated through the distances of the observed trip locations from the route.
Figures: a trip vs. several potential routes; best route fits vs. 2nd and 3rd best fits; anomaly and certainty distributions of trips.
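The distance-based matching idea can be sketched as follows. This is a minimal illustration and not the project's code: it assumes planar (x, y) coordinates, scores a trip by the mean distance of its points from a route polyline, and the names `point_segment_distance`, `trip_route_distance` and `rank_routes` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def trip_route_distance(trip, route):
    """Mean distance of the observed trip points from the route polyline."""
    per_point = [
        min(point_segment_distance(p, a, b) for a, b in zip(route, route[1:]))
        for p in trip
    ]
    return sum(per_point) / len(per_point)


def rank_routes(trip, routes):
    """Return (route_id, score) pairs sorted from best to worst fit."""
    scores = {rid: trip_route_distance(trip, r) for rid, r in routes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Toy data (made up for illustration): two parallel routes and one noisy trip.
routes = {
    "A": [(0.0, 0.0), (10.0, 0.0)],
    "B": [(0.0, 5.0), (10.0, 5.0)],
}
trip = [(1.0, 0.5), (5.0, -0.5), (9.0, 0.5)]
ranking = rank_routes(trip, routes)
print(ranking)
```

Under this kind of scoring, a small gap between the best and the second-best score would suggest an uncertainly classified trip, and a large best score would suggest an anomalous one; the thresholds for both are left open here.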

Knesset committees protocols
[Hackathon] Analysis and Clustering of Hebrew Knesset Committees Protocols
2019
Python
This project was carried out as part of a single-day Talpiot-alumni hackathon, by a group of 6 members under the supervision of the Israeli Workshop of Public Knowledge.
In order to improve public access to the extensive parliamentary activity carried out in the Knesset committees, the corresponding protocols were studied, and relevant information was extracted regarding both the maturity of the data and the information it carries. In addition, several topic-based clustering approaches were tried; they showed promising potential and yielded several results by the end of the hackathon.
Figures: quantification of the parliamentary activity of a sample of Knesset members; visualization of LDA-based clustering (circle = cluster of protocols).

Statistics
Numeric Calculation of Fisher Information for Quantification of Information Loss in Non-Parametric Tests in Two-Populations Comparison
2018
Python
Non-parametric statistical tests can avoid the data-normality assumption at the cost of losing some of the information in the data. The NumericFI module quantifies the loss of information in non-parametric tests (signed-rank and rank-sum, both often named the Wilcoxon test) compared to parametric tests (t-test) in the case of two paired datasets of normal i.i.d. data, using numeric calculation of the Fisher Information of each test statistic.
The calculations show that while the t-test does have some advantage, it is quite minor (less than 10% in terms of Fisher Information, which roughly means it can be compensated for by using 10% more data). In addition, the signed-rank test looks slightly better than the rank-sum test. These results are consistent with literature that studied the efficiency of such tests using different approaches.
Figures: output example.
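As a rough cross-check of the "less than 10%" figure, the information carried by the signed-rank statistic can be approximated in closed form through its efficacy (squared slope of the mean divided by the variance, evaluated at zero shift). This is not the NumericFI method, only a sketch under stated assumptions: paired differences are i.i.d. N(theta, 1), the sample mean with known variance stands in for the t-test, and the function names are hypothetical.

```python
import math


def mean_information(n):
    """Information about the shift theta in the mean of n N(theta, 1) differences."""
    return float(n)


def signed_rank_information(n):
    """Efficacy-based approximation, at theta = 0, for the signed-rank statistic W+."""
    # W+ equals the number of pairs i <= j with d_i + d_j > 0, hence
    #   E[W+] = n * P(d > 0) + C(n, 2) * P(d_1 + d_2 > 0),
    # and for d ~ N(theta, 1):
    #   P(d > 0) = Phi(theta),  P(d_1 + d_2 > 0) = Phi(sqrt(2) * theta).
    # Differentiating at theta = 0 gives the slope below.
    slope = n / math.sqrt(2 * math.pi) + n * (n - 1) / (2 * math.sqrt(math.pi))
    variance = n * (n + 1) * (2 * n + 1) / 24  # Var[W+] at theta = 0
    return slope ** 2 / variance


def relative_information(n):
    """Signed-rank information relative to the sample mean (1.0 means no loss)."""
    return signed_rank_information(n) / mean_information(n)


for n in (20, 100, 10**6):
    print(n, relative_information(n))
```

For n = 20 the ratio is roughly 0.92, and for large n it approaches 3/pi (about 0.955), the classical asymptotic relative efficiency of the Wilcoxon tests under normality, which is in line with the loss of less than 10% reported above.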

Signal processing
Spectrum reconstruction for signals with dropped samples
2019
Python
Given a signal sampled at discrete uniform times, except for certain missing points, this module compares various methods for finding the Fourier spectrum of the original signal and reconstructing the missing points. It is shown that ignoring the missing points causes major disturbances to the Fourier transform, and in the limited tests that were tried, linear reconstruction of the signal seems to be the best way to prevent these disturbances.
Figures: demonstration of the interpolation-based methods.
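The effect of dropped samples on the spectrum can be illustrated with a small standard-library sketch. It is not the module's code: "ignoring" the missing points is modelled here as zero-filling them (an assumption), the test signal and the missing indices are made up, and `dft`, `linear_fill` and `spectrum_error` are hypothetical names.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def linear_fill(x, missing):
    """Replace missing samples by linear interpolation between known neighbours."""
    y = list(x)
    known = [i for i in range(len(x)) if i not in missing]
    for i in sorted(missing):
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            y[i] = x[right]
        elif right is None:
            y[i] = x[left]
        else:
            w = (i - left) / (right - left)
            y[i] = (1 - w) * x[left] + w * x[right]
    return y


def spectrum_error(a, b):
    """Euclidean distance between the DFTs of two equal-length signals."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(dft(a), dft(b))))


n = 64
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
missing = {10, 11, 25, 40, 41, 42}

zero_filled = [0.0 if t in missing else v for t, v in enumerate(signal)]
interpolated = linear_fill(zero_filled, missing)

err_zero = spectrum_error(signal, zero_filled)
err_linear = spectrum_error(signal, interpolated)
print(err_zero, err_linear)
```

On this smooth low-frequency signal the interpolated version stays much closer to the original spectrum than the zero-filled one; noisier or higher-frequency signals may behave differently.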

Signal processing
ICA-based Sound Decomposition
2019
Python
Synchronization of the start times of audio recordings, and their decomposition into hopefully independent source signals using FastICA. Under most setups, the process failed to cleanly reconstruct the original components of which the recorded sounds consisted.
Figures: a sample of convolutions used for signal synchronization.
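The synchronization step can be sketched as a search for the lag that maximizes the cross-correlation of two recordings (a convolution with one of them reversed). This is an illustrative assumption about how such a synchronization may work, not the project's code; `best_lag` and the toy signals are hypothetical.

```python
def best_lag(a, b, max_lag):
    """Lag (in samples) by which b is delayed relative to a.

    The lag is chosen to maximize the cross-correlation sum of a[t] * b[t + lag]
    over the overlapping samples.
    """
    def corr(lag):
        return sum(
            a[t] * b[t + lag] for t in range(len(a)) if 0 <= t + lag < len(b)
        )

    return max(range(-max_lag, max_lag + 1), key=corr)


# Toy recordings: b starts 3 samples later than a.
a = [0, 0, 1, 3, -2, 1, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0] + a[:-3]

lag = best_lag(a, b, max_lag=6)
aligned_b = b[lag:]  # drop the leading samples so both recordings start together
print(lag, aligned_b)
```

Once the start times are aligned, the synchronized recordings can be passed as mixed channels to an ICA routine.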

Finance
Long-Term Savings Calculator
2018
Python
Main functions:
Generic tools:
WARNING: nothing is guaranteed to be (even approximately) correct.
WARNING: validity of calculations is probably restricted to Israel.
Figures: output example.

Plotting infrastructure
Interactive Plotter
2018
Python
Infrastructure for interactive figures, in which the axis limits change dynamically as the user scrolls and drags the mouse.
Figures: zoom out vs. zoom in.
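The core arithmetic behind scroll-zoom and drag-pan can be sketched independently of any plotting library. The functions `zoom_limits` and `pan_limits` are hypothetical names, and the convention that a factor below 1 zooms in is an assumption of this sketch.

```python
def zoom_limits(lo, hi, anchor, factor):
    """Scale the interval [lo, hi] around `anchor`.

    A factor below 1 zooms in and a factor above 1 zooms out. The anchor
    (for example, the cursor position) keeps its relative position on the axis.
    """
    return anchor - (anchor - lo) * factor, anchor + (hi - anchor) * factor


def pan_limits(lo, hi, delta):
    """Shift the interval so that the data appears to follow a drag by `delta`."""
    return lo - delta, hi - delta


xlim = (0.0, 10.0)
xlim = zoom_limits(*xlim, anchor=2.0, factor=0.5)  # scroll in around x = 2
print(xlim)
xlim = pan_limits(*xlim, delta=1.0)                # drag the view by one unit
print(xlim)
```

In a Matplotlib-based implementation (an assumption; the source does not name the plotting library), such functions would typically be called from handlers registered with `fig.canvas.mpl_connect('scroll_event', ...)`, with the result passed to `ax.set_xlim` and `ax.set_ylim`.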

Graphical interface infrastructure
Window Controller
2018
Shell
Move and resize the active window programmatically (Linux only; intended to be used along with corresponding customized keyboard shortcuts).
All the materials below are free to use for any purpose as long as proper attribution is given. The writer has no official education or certification in most topics appearing below. None of the documents is meant to recommend any action to the reader, and no responsibility for such actions will be taken.



Field 
Topic 
Framework 
Year 
Documents 
Language 
Pages 
Short summary included
Comments 


Mathematical Modeling
Elevators Waiting-Time Optimization
MSc course
2014 
HE 
 
 



Simulator: Liquid within Moving Container
High school
2007
HE
49
Yes
In collaboration with Jonathan Cederbaum


Machine Learning
Web Servers Classification
High school
2008
HE
42
Yes
In collaboration with Jonathan Cederbaum and CheckPoint Security


Illustrated List of Basic Methods in ML
Independent 
2018 
EN 
 
 
Illustrations were collected from various sources (none were made by me)


Finance 
Basic Concepts in Finance and Investments
Independent 
2017 
HE 
12 
Yes 



Pension Tutorial 
Independent 
2018 
HE 
20 
Yes 



Human Resources
Disturbing Factors for Technological Employees in Permanent Military Service
Independent 
2017 
HE 
4 
No 
Based on a survey among 21 subjects

*PPTX files are partially corrupted until I find an alternative to Google Drive for hosting.
All the materials below are free to use for any purpose as long as proper attribution is given. Most summaries have undergone little to no review, and probably contain inaccuracies.


Field 
Topic 
Framework 
Year 
Documents 
Language 
Pages 
Short summary included
Main source

Algorithms 
Basic algorithms, graphs, DP, LP, spatial search, etc.
Independent
2018
EN
14
No
Algorithms 1, Technion, 2013, and analogous courses in TAU, HUJI and Udacity


Information 
Information Theory 
Independent 
2019 
EN 
22 
Yes 
Information, Physics and Computation, Stanford, 2009; Elements of Information Theory, 2006


Signal processing
Digital Signal Processing
Independent 
2018 
EN 
13 
No 


Graphs 
Intro to graph theory, basic algorithms, spectral graph theory and SNA
Independent
2019
EN
16
No
Distributed sources (see references within the PDF)


Statistics 
Intro to Statistical Theory
Independent 
2018 
EN 
11 
No 


Cointegration 
MSc course
2015 
HE 
 
 
Distributed sources 


Advanced Statistical Theory
Independent 
2019 
EN 
25 
Yes 


Experimental Design and Analysis of Variance
Independent
2018
HE
16
Yes
Notes by Prof. David Steinberg, TAU (syllabus)


Probability in High Dimension
PhD course
2020
EN
11
Yes
Distributed sources (see references within the PDF)


Statistical Learning Theory
Independent
2021
EN
4
No
Distributed sources (see references within the PDF)


Optimization 
Convex optimization 
Independent 
2019 
Was not summarized
 
 
 


Artificial Intelligence and Machine Learning
Theoretical Intro to ML
Independent
2019
Was not summarized
 
 
 
67577: Introduction to Machine Learning, Shai Shalev-Shwartz, HUJI

Intro to Artificial Intelligence
Independent
2016
EN
47
Chapter summaries + summarizing table
Udacity (Sebastian Thrun, Stanford & Google)


Intro to Supervised Learning through Linear Regression
Independent 
2018 
EN 
 
Summarizing table on last slide
Distributed sources 


Supervised Learning 
Independent 
2016 
EN 
17 
No 
Udacity (Georgia Tech) 


Unsupervised Learning 
Independent 
2017 
EN 
14 
No 
Udacity (Georgia Tech)


Reinforcement Learning
Independent
2018
EN
16
Yes
Udacity (Georgia Tech)


Machine Learning (only complementary materials on top of previous courses)
Independent
2017
EN
10
No
Coursera (Andrew Ng, Stanford)


Intro to Machine Learning (only complementary materials on top of previous courses)
Independent 
2018 
EN 
3 
 
Udacity (Sebastian Thrun, Stanford & Google)


Intro to NLP 
Independent 
2017 
EN 
23 
Chatbot-oriented summary
NLTK book (O'Reilly Media Inc.)


Intro to Deep Learning
Independent
2019
Was not summarized
 
 
 
Udacity (Facebook); CS231n: CNNs (Stanford)

*PPTX files are partially corrupted until I find an alternative to Google Drive for hosting.
· Common Lines Modeling for Reference Free Ab-Initio Reconstruction in Cryo-EM
Ido Greenberg, Yoel Shkolnisky
Journal of Structural Biology, 2017
Abstract (informal): Reconstruction of molecular structures (e.g. proteins) is essential for understanding their biological function. Such reconstruction from electron-microscope images requires estimation of the unknown viewing directions of the images. Common lines between the images reveal the relative viewing direction between any pair of images, but extremely low SNR often leads to errors in the detection of the common lines. This research attempts to identify the reliable estimates of the common lines, in order to increase their weight in the estimation of the images' viewing directions. This new feature, incorporated into an existing reconstruction algorithm, is shown to achieve an improvement of ~40% in the resolution of the reconstructed map of a ribosome's subunit.
· A Graph Partitioning Approach to Simultaneous Angular Reconstitution
Gabi Pragier, Ido Greenberg, Xiuyuan Cheng, Yoel Shkolnisky
IEEE Transactions on Computational Imaging, 2016
Abstract (informal): Reconstruction of molecular structures (e.g. proteins) is essential for understanding their biological function. Such reconstruction from electron-microscope images requires estimation of the unknown viewing directions of the images. The viewing directions can be restored from estimates of the relative directions between pairs of images. Unfortunately, some of the relative-direction estimates are distorted (in addition to being noisy) by some kind of reflection. While it is proven impossible to tell which ones are distorted and which ones are not, it is vital to at least partition all estimates into 2 homogeneous groups, distorted and not distorted (without knowing which is which), so that one group can be "fixed" and all estimates become consistent (either all distorted or all not distorted, where the former results in reconstruction of a reflected variant of the molecule). In this paper we locally estimate the consistency of distortion between certain relative directions, form these estimated relations as a graph, and use spectral analysis of the graph to aggregate all the local estimates jointly into a global partition of the estimates into 2 groups. We use simulations of electron-microscope images to demonstrate that using this method as part of the reconstruction algorithm significantly improves the reconstruction.
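The idea of turning noisy local "same group / different group" relations into a global 2-way partition through spectral analysis can be illustrated generically. The sketch below is not the paper's algorithm: it assumes a full matrix of pairwise relations with entries +1 (same group) or -1 (different groups), a few of which are wrong, and reads the partition from the signs of the leading eigenvector found by power iteration. All names and the toy data are hypothetical.

```python
import random


def spectral_partition(z, iters=200, seed=0):
    """Split items into two groups from a symmetric matrix of +1/-1 relations.

    The group labels are the signs of the leading eigenvector of z, computed by
    power iteration. Labels are determined only up to a global sign flip.
    """
    n = len(z)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(z[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [1 if x >= 0 else -1 for x in v]


# Hidden ground truth (toy data): +1 and -1 mark the two groups.
s = [1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1]
n = len(s)

# Pairwise relations s[i] * s[j], with three of them deliberately corrupted.
z = [[s[i] * s[j] for j in range(n)] for i in range(n)]
for i, j in [(0, 1), (2, 3), (4, 5)]:
    z[i][j] = z[j][i] = -z[i][j]

groups = spectral_partition(z)
print(groups)
```

The result matches the hidden labels up to a global sign, which mirrors the statement above that the two groups can be separated without knowing which is which.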
· Winds of Winter: all public sample chapters in one printable document.
· Track & Field: my personal page in the Israeli Athletic Association (all 800m results were mysteriously lost ☹).