IDO GREENBERG

(updated: July 2021)

I am a researcher in the areas of Algorithms, Data Science and Mathematical Modelling, looking to tackle challenges of both intellectual and practical importance alongside world-class research scientists.

My great passion is to gain and simplify knowledge, spread it and apply it to new fields, pushing the limits of both understanding and capability.

Currently I am particularly interested in fundamental research and better understanding of Machine Intelligence and Deep Neural Networks.

• Talpiot graduate

• MSc from TAU in applied math (summa cum laude, excellence award of the math school, 2 journal publications)

• Experience in ML and data science (Google, Istra, Technion) and operations research (MAFAT)

GitHub

Scholar

LinkedIn


Software

Application Field	Title	Year	Language	Code	Description
Traffic analysis	Traffic Analysis in Original Video Data of Ayalon Road	2019	Python	GitHub, LinkedIn	Recording of 81 8-minute videos of Ayalon Road, filmed from a window using a smartphone. A dedicated CNN (based on transfer learning from ResNet34) for detecting vehicles in large aerial images. Tracking of vehicles across frames using a probabilistic model implemented through a Kalman filter. Analysis of the fundamental traffic diagram (speed, density and flux), daily patterns, effects of lane transitions and more. Figures: the recording smartphone in action; out-of-the-box SSD vs. the dedicated detector; probabilistic field of the expected next location of a tracked vehicle; traffic density is indeed high in known rush hours; the fundamental traffic diagram (density, speed and flux): practice vs. theory; lane transitions mostly occur on the right lanes; a left lane transition is followed by an average speed increase of 4.1 km/h.
News	Scraping and Analysis of Hebrew Newspapers	2019	Python	GitHub	Crawling of ~700 articles from 3 Hebrew news websites. Analysis of the appearance of parties and politicians before the 2019 elections. Classification of articles and stand-alone paragraphs into sections. Context-based similarity: embedding of words either in a graph or with word2vec. Figures: summary of data; parties and politicians count; graph embedding of words by context similarity.
Elevators	Elevators Simulator	2018	Python	GitHub	A continuous-time visual simulator of elevators, intended for testing various elevator-management algorithms. Figures: visual simulation; single-scenario summary; multi-scenario summary.
Earthquakes	[Kaggle] Earthquake Prediction	2019	Python	GitHub	Prediction of the remaining time until the next earthquake from seismic measurements. Developed in the framework of a Kaggle competition with $50K in prizes, as part of a silver-medal-winning team that included Zahar Chikishev (full time) and myself (part time). The repo includes only my work, mainly two neural-network solutions based on the raw signal: a Convolutional Neural Network applied to spectrogram images of the seismic signal, and an attention-based Transformer network applied to the raw seismic signal. Figures: seismic signal and corresponding time until the next quake; a sample of mel-spectrogram images.
Public transportation	[Hackathon] Matching of Recorded Bus Trips to Planned Routes	2019	Python, C++	GitHub, PPTX	This project was developed in collaboration with Oded Shimon during the two-day Civil Hackathon of the Israeli Workshop of Public Knowledge. Data sets of both planned bus routes and actual bus trips are publicly available in Israel, allowing diagnosis of trips and comparison of their timings and routes to the plans. However, the reported matching between the planned routes and the actual trips is suspected to be inaccurate, which would make any such diagnosis ineffective. This project shows that within limited subsets of the data, all the observed trips correctly correspond to their reported routes. It also provides general tools for detecting uncertainly-classified trips (with more than one plausible planned route) and anomalous trips (with no plausible planned route). The match of a trip to a planned route is essentially calculated from the distances of the observed trip locations from the route. Figures: a trip vs. several potential routes; best route fits vs. 2nd and 3rd best fits; anomaly and certainty distributions of trips.
Knesset committees protocols	[Hackathon] Analysis and Clustering of Hebrew Knesset Committees Protocols	2019	Python	GitHub	This project was carried out during a single-day Talpiot-alumni hackathon, in a group of 6 members, under the supervision of the Israeli Workshop of Public Knowledge. In order to improve public access to the extensive parliamentary activity carried out in the Knesset committees, the committee protocols were studied, and relevant information was extracted regarding both the maturity of the data and the information it carries. In addition, several topic-based clustering approaches were tried; they showed promising potential and yielded several results by the end of the hackathon. Figures: quantification of the parliamentary activity of a sample of Knesset members; visualization of LDA-based clustering (circle = cluster of protocols).
Statistics	Numeric Calculation of Fisher Information for Quantification of Information Loss in Non-Parametric Tests in Two-Populations-Comparison	2018	Python	GitHub	Non-parametric statistical tests can avoid the data-normality assumption at the cost of losing some of the information in the data. The NumericFI module quantifies the information loss of non-parametric tests (signed-rank and rank-sum, both often called the Wilcoxon test) compared to a parametric test (t-test), in the case of two paired datasets of normal i.i.d. data, using numeric calculation of the Fisher Information of each test statistic. The calculations show that while the t-test does have some advantage, it is quite minor (less than 10% in terms of Fisher Information, which roughly means it can be compensated for by using 10% more data). In addition, the signed-rank test looks slightly better than the rank-sum test. These results are consistent with the literature that studied the efficiency of such tests using different approaches. Figures: output example.
Signal processing	Spectrum reconstruction for signals with dropped samples	2019	Python	GitHub	Given a signal sampled at uniform discrete times except for certain missing points, this module compares various methods for finding the Fourier spectrum of the original signal and reconstructing the missing points. It is shown that ignoring the missing points causes major disturbances to the Fourier transform, and in the limited tests that were tried, linear reconstruction of the signal seems like the best way to prevent these disturbances. Figures: demonstration of the interpolation-based methods.
Signal processing	ICA-based Sound Decomposition	2019	Python	GitHub	Synchronization of the start times of audio recordings, and their decomposition into hopefully-independent source signals using FastICA. Under most setups, the process failed to cleanly reconstruct the original components of which the recorded sounds consisted. Figures: a sample of the convolutions used for signal synchronization.
Finance	Long-Term Savings Calculator	2018	Python	GitHub	Main functions: pension_payment() – estimate the monthly pension payment at retirement; mortgage_vs_rent() – estimate the time required to buy a house (both with and without a mortgage). Generic tools: deposit_and_invest() – estimate future savings from an initial sum, deposits and returns; time_to_target() – estimate the time required to reach a certain amount of savings. WARNING: nothing is guaranteed to be (even approximately) correct. WARNING: the validity of the calculations is probably restricted to Israel. Figures: output example.
Plotting infrastructure	Interactive Plotter	2018	Python	GitHub	Infrastructure for interactive figures, in which the axis limits change dynamically as the user scrolls and drags the mouse. Figures: zoom out vs. zoom in.
Graphical interface infrastructure	Window Controller	2018	Shell	GitHub	Move and resize active window programmatically (Linux only; intended to be used along with corresponding customized keyboard shortcuts).
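
For the Ayalon traffic project, the tracking step and the "probabilistic field of the expected next location" can be illustrated with a constant-velocity Kalman filter. This is a minimal sketch under assumed settings (state [x, y, vx, vy] in pixels, one frame per step, hand-picked noise levels `q` and `r`), not the repository's code; the function names are illustrative. After the prediction step, the mean `x[:2]` and the covariance block `P[:2, :2]` define a Gaussian over the vehicle's next location.

```python
import numpy as np

# Illustrative sketch: the model, names and noise levels are assumptions.
DT = 1.0  # one frame per step
# Constant-velocity transition for the state [x, y, vx, vy].
F = np.array([[1, 0, DT, 0],
              [0, 1, 0, DT],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# The detector observes position only.
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)


def predict(x, P, q=0.01):
    """Propagate the state; (x[:2], P[:2, :2]) is the Gaussian over the next location."""
    return F @ x, F @ P @ F.T + q * np.eye(4)


def update(x, P, z, r=1.0):
    """Correct the prediction with a detected position z = (x, y)."""
    S = H @ P @ H.T + r * np.eye(2)          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P


# Usage: a vehicle moving 1 px/frame to the right, with noiseless detections.
detections = [np.array([float(t), 2.0]) for t in range(20)]
x = np.array([*detections[0], 0.0, 0.0])     # start at the first detection, zero velocity
P = np.diag([1.0, 1.0, 100.0, 100.0])        # velocity is initially very uncertain
for z in detections[1:]:
    x, P = predict(x, P)
    x, P = update(x, P, z)
x_next, P_next = predict(x, P)
print(x_next[:2], P_next[:2, :2])
```

Matching new detections to existing tracks (for example by the likelihood of each detection under each track's predicted Gaussian) would sit between `predict` and `update`; that association step is omitted here.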
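
The bus-trip matching project scores a trip against a planned route through the distances of the observed trip locations from the route. A minimal sketch of that idea follows, assuming planar coordinates (e.g. projected meters), routes given as polylines, and the mean point-to-route distance as the score; the aggregation actually used in the repository may differ, and all function names here are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating route matching; not the repository's API.


def point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    ab, ap = b - a, p - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else np.clip((ap @ ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(ap - t * ab))


def distance_to_route(p, route):
    """Distance from p to the closest segment of a polyline route (k x 2 array)."""
    return min(point_to_segment(p, route[i], route[i + 1]) for i in range(len(route) - 1))


def route_score(trip, route):
    """Mean distance of the observed trip locations from the route (lower = better fit)."""
    return float(np.mean([distance_to_route(p, route) for p in trip]))


def rank_routes(trip, routes):
    """Return (route_id, score) pairs sorted from best to worst fit."""
    scores = {rid: route_score(trip, r) for rid, r in routes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Usage: a trip driving along y = 0 with small GPS noise.
routes = {
    "straight": np.array([[0.0, 0.0], [100.0, 0.0]]),
    "detour": np.array([[0.0, 0.0], [50.0, 40.0], [100.0, 0.0]]),
}
trip = np.array([[10.0, 1.0], [40.0, -2.0], [70.0, 0.5], [95.0, -1.0]])
ranking = rank_routes(trip, routes)
print(ranking)
```

With such scores, an uncertainly-classified trip could be flagged when the best and second-best scores are close, and an anomalous trip when even the best score exceeds some threshold; both rules and their thresholds are assumptions of this sketch.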
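
The NumericFI comparison can be approximated with a cruder, swapped-in technique: if a test statistic T is approximately normal with mean μ(θ) and variance σ², its Fisher information about a location shift θ is roughly μ'(θ)²/σ² (its "efficacy"). The sketch below estimates this by Monte Carlo for the mean of paired differences (the statistic of a known-variance t-test) and for the Wilcoxon signed-rank statistic, with differences drawn from N(θ, 1). It is not the NumericFI implementation, and the sample size, shift step and repetition count are arbitrary choices. For reference, the classical asymptotic relative efficiency of the Wilcoxon tests against the t-test under normality is 3/π ≈ 0.955.

```python
import numpy as np

# Monte Carlo efficacy comparison; an approximation, not the NumericFI method.


def mean_stat(d):
    """Statistic of the (known-variance) t-test: the mean of the paired differences."""
    return d.mean(axis=-1)


def signed_rank_stat(d):
    """Wilcoxon signed-rank statistic: sum of the ranks of |d| over positive differences."""
    ranks = np.abs(d).argsort(axis=-1).argsort(axis=-1) + 1  # no ties for continuous data
    return (ranks * (d > 0)).sum(axis=-1)


def efficacy(stat, n=50, h=0.05, reps=20000, seed=0):
    """Estimate mu'(0)^2 / var(T) for data d ~ N(theta, 1) at theta = 0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((reps, n))
    # Common random numbers: the same noise is shifted by +h and -h.
    slope = (stat(z + h).mean() - stat(z - h).mean()) / (2 * h)
    return slope**2 / stat(z).var()


ratio = efficacy(signed_rank_stat) / efficacy(mean_stat)
print(f"relative information of signed-rank vs. mean: {ratio:.3f}")
```

For finite n the ratio sits slightly below 3/π, since the signed-rank statistic's slope and variance only reach their asymptotic form as n grows.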
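
To illustrate why dropped samples matter before a Fourier transform, the sketch below compares two simple options on a synthetic tone: zero-filling the missing samples (one way of "ignoring" them, assumed here) and linear interpolation with `np.interp`. The signal, the ~10% drop rate and the error measure are illustrative assumptions, not the module's actual experiments.

```python
import numpy as np

# Illustrative comparison; signal and drop rate are assumptions.
rng = np.random.default_rng(0)
N = 256
t = np.arange(N)
signal = np.sin(2 * np.pi * 16 * t / N)      # 16 cycles over the window
missing = rng.random(N) < 0.1               # drop ~10% of the samples
missing[[0, -1]] = False                     # keep the endpoints so interpolation is well defined


def zero_fill(x, missing):
    """Treat missing samples as zeros."""
    y = x.copy()
    y[missing] = 0.0
    return y


def linear_fill(x, missing):
    """Reconstruct missing samples by linear interpolation between valid neighbors."""
    idx = np.arange(len(x))
    y = x.copy()
    y[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return y


def spectrum_error(x_hat):
    """Relative L2 error of the FFT compared with the FFT of the full signal."""
    ref = np.fft.rfft(signal)
    return np.linalg.norm(np.fft.rfft(x_hat) - ref) / np.linalg.norm(ref)


err_zero = spectrum_error(zero_fill(signal, missing))
err_linear = spectrum_error(linear_fill(signal, missing))
print(f"zero-fill: {err_zero:.3f}, linear interpolation: {err_linear:.3f}")
```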
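
In the ICA-based sound project, the recordings must share a common start time before FastICA is applied. One standard way to estimate the offset between two recordings, assumed here, is the peak of their cross-correlation (a convolution with one signal time-reversed, which is the same computation). The function names are hypothetical and the code is a sketch, not the repository's implementation.

```python
import numpy as np

# Hypothetical helpers for start-time synchronization via cross-correlation.


def estimate_lag(a, b):
    """Number of samples by which recording `a` lags behind recording `b`.

    np.correlate in "full" mode returns sum_n a[n + k] * b[n] for
    k = -(len(b) - 1), ..., len(a) - 1; the best-matching shift is its argmax.
    """
    corr = np.correlate(a, b, mode="full")
    return int(np.argmax(corr)) - (len(b) - 1)


def align(a, b):
    """Trim the leading samples of whichever recording started earlier."""
    lag = estimate_lag(a, b)
    if lag > 0:
        a = a[lag:]
    elif lag < 0:
        b = b[-lag:]
    n = min(len(a), len(b))
    return a[:n], b[:n]


# Usage: the same noise-like sound, recorded with a 37-sample delay.
rng = np.random.default_rng(0)
b = rng.standard_normal(1000)
a = np.concatenate([np.zeros(37), b])[:1000]   # a[n] = b[n - 37]
print(estimate_lag(a, b))
```

`np.correlate` is O(n·m); for long audio recordings an FFT-based correlation would be the usual choice.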
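
The generic tools of the savings calculator rest on simple compounding. The sketch below uses hypothetical function names and signatures (`future_value`, `months_to_target`, `future_value_closed_form`); it is not the repository's `deposit_and_invest()` / `time_to_target()` API. It assumes a fixed monthly return and a fixed deposit at the end of each month, with no fees, taxes or inflation.

```python
# Hypothetical helpers; not the signatures of deposit_and_invest() / time_to_target().


def future_value(initial, monthly_deposit, monthly_return, months):
    """Savings after `months` months: compound the balance, then add the deposit."""
    balance = initial
    for _ in range(months):
        balance = balance * (1 + monthly_return) + monthly_deposit
    return balance


def months_to_target(initial, monthly_deposit, monthly_return, target, max_months=1200):
    """First month count at which savings reach `target` (None if not within max_months)."""
    balance = initial
    for month in range(max_months + 1):
        if balance >= target:
            return month
        balance = balance * (1 + monthly_return) + monthly_deposit
    return None


def future_value_closed_form(initial, monthly_deposit, monthly_return, months):
    """Same quantity via the annuity formula (requires monthly_return != 0)."""
    g = (1 + monthly_return) ** months
    return initial * g + monthly_deposit * (g - 1) / monthly_return


print(round(future_value(10_000, 1_000, 0.004, 120)))
```

The loop and the closed form agree; the loop is kept because it extends naturally to varying deposits or returns.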
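
The scroll-to-zoom behavior of the interactive plotter can be sketched with matplotlib's event API (`canvas.mpl_connect("scroll_event", ...)`). This is a minimal illustration under assumed conventions (zooming about the cursor by a fixed factor per wheel step), not the repository's implementation, and the function names are hypothetical. The limit arithmetic is kept in a pure function so it can be checked without a display.

```python
# Hypothetical helpers; a sketch of scroll-to-zoom, not the repository's code.


def zoom_limits(lo, hi, center, factor):
    """Scale the interval [lo, hi] about `center`; factor < 1 zooms in, > 1 zooms out."""
    return center - (center - lo) * factor, center + (hi - center) * factor


def attach_scroll_zoom(ax, step=1.2):
    """Zoom a matplotlib Axes about the mouse cursor on every scroll-wheel step."""

    def on_scroll(event):
        if event.inaxes is not ax or event.xdata is None:
            return
        factor = 1 / step if event.button == "up" else step
        ax.set_xlim(*zoom_limits(*ax.get_xlim(), event.xdata, factor))
        ax.set_ylim(*zoom_limits(*ax.get_ylim(), event.ydata, factor))
        ax.figure.canvas.draw_idle()

    return ax.figure.canvas.mpl_connect("scroll_event", on_scroll)
```

Usage: create a figure with `fig, ax = plt.subplots()`, plot into `ax`, call `attach_scroll_zoom(ax)`, then `plt.show()`. Dragging to pan could be wired the same way through `button_press_event` and `motion_notify_event`.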

Academic Publications

· Noise Estimation Is Not Optimal: How to Use Kalman Filter the Right Way

Ido Greenberg, Netanel Yannay, Shie Mannor

See publication

See repository

Abstract:

Determining the noise parameters of a Kalman Filter (KF) has been studied for decades. A huge body of research focuses on estimating the noise under various conditions, since precise noise estimation is considered equivalent to minimizing the filtering errors. However, we show that even a small violation of the KF assumptions can significantly modify the effective noise, breaking the equivalence between the tasks and making noise estimation an inferior strategy. We show that such violations are very common, and are often not trivial to handle or even notice. Consequently, we argue that a robust solution is needed, rather than choosing a dedicated model per problem.

To that end, we apply gradient-based optimization to the filtering errors directly, using a simple and efficient parameterization of the symmetric positive-definite parameters of the KF. In radar tracking and video tracking, we show that the optimization improves both the accuracy of the KF and its robustness to design decisions. In addition, we demonstrate how an optimized neural network model can seem to reduce the errors significantly compared to a KF, and how this reduction vanishes once the KF is optimized similarly. This indicates how complicated models can be wrongly identified as superior to the KF, while in fact they were merely better optimized.
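
The idea of tuning KF parameters by minimizing the filtering errors directly, rather than by estimating the noise, can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's method or code: a 1D random-walk model, squared error against known ground truth as the loss, finite-difference gradients instead of automatic differentiation, and a Cholesky factor with an exponentiated diagonal as the (assumed) parameterization that keeps the noise covariances symmetric and positive definite. In 1D that parameterization reduces to q = exp(2θ_q) and r = exp(2θ_r). All function names are hypothetical.

```python
import numpy as np

# Toy sketch; the parameterization and optimizer are assumptions, not the paper's code.


def spd_from_params(params, d):
    """Map d*(d+1)/2 unconstrained reals to a symmetric positive-definite d x d matrix.

    Assumed parameterization: a lower-triangular Cholesky factor whose diagonal is
    exponentiated, so L @ L.T is symmetric positive definite for any input.
    """
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = params
    L[np.diag_indices(d)] = np.exp(np.diag(L))
    return L @ L.T


def kf_mse(theta, z, x_true):
    """Filtering MSE of a 1D random-walk KF whose noise variances come from theta."""
    q = spd_from_params(theta[:1], 1)[0, 0]   # 1x1 case: q = exp(2 * theta[0])
    r = spd_from_params(theta[1:], 1)[0, 0]   # 1x1 case: r = exp(2 * theta[1])
    x_hat, p = z[0], r                        # initialize from the first measurement
    sq_err = []
    for t in range(1, len(z)):
        p = p + q                             # predict (random-walk model)
        k = p / (p + r)                       # Kalman gain
        x_hat = x_hat + k * (z[t] - x_hat)    # update with measurement z[t]
        p = (1 - k) * p
        sq_err.append((x_hat - x_true[t]) ** 2)
    return float(np.mean(sq_err))


def minimize(loss, theta, lr=0.5, iters=60, eps=1e-4):
    """Gradient descent with central finite differences; keeps a step only if it lowers the loss."""
    theta = np.asarray(theta, dtype=float)
    best = loss(theta)
    for _ in range(iters):
        grad = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                         for e in np.eye(len(theta))])
        candidate = theta - lr * grad
        cand_loss = loss(candidate)
        if cand_loss < best:
            theta, best = candidate, cand_loss
        else:
            lr *= 0.5                         # backtrack
    return theta, best


# Toy data: random walk with process std 0.3, observed with measurement std 1.
rng = np.random.default_rng(0)
x_true = np.cumsum(rng.normal(0.0, 0.3, 300))
z = x_true + rng.normal(0.0, 1.0, 300)


def loss(th):
    return kf_mse(th, z, x_true)


theta0 = np.zeros(2)                          # q = r = 1 as an arbitrary starting point
theta_opt, mse_opt = minimize(loss, theta0)
print(f"MSE at start: {loss(theta0):.3f}, after optimization: {mse_opt:.3f}")
print("optimized q, r:", np.exp(2 * theta_opt))
```

Here the loss is evaluated on the same trajectory it is tuned on, which only demonstrates the mechanics; a fair comparison would tune on training trajectories and evaluate on held-out ones.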

· Detecting Rewards Deterioration in Episodic Reinforcement Learning

Ido Greenberg, Shie Mannor

ICML, 2021

See publication

See repository

Abstract (informal):

In Reinforcement Learning (RL), where an agent is trained to perform tasks in a certain environment (e.g. autonomous driving, medical devices, etc.), the agent's performance may deteriorate for various reasons (e.g. changes in the environment). Certain works address training under changing conditions, but in production, training is often forbidden: when your car behaves suspiciously with a passenger sitting inside, you don’t begin to retrain the car! Rather, you wish to detect the performance deterioration ASAP and activate safety and fallback mechanisms.

Our work addresses the detection of performance degradation in episodic RL. This is essentially a statistical problem of mean-shift detection in the agent's rewards. However, the non-i.i.d. nature of the rewards makes many common statistical tests irrelevant to the problem. Instead, we use the episodic structure of the signal to formulate the problem as multivariate mean-shift detection. That is, given K episodes of T time-steps each, we replace the 1D sequence of length KT with a T-dimensional sequence of length K, whose elements (the episodes) are i.i.d. Then we suggest a concrete deterioration model, which allows us to derive an optimal test for detection of deterioration. We also show how to control the False Alarm Rate of the test, even when running it sequentially on non-i.i.d. data.

Finally, we show experimentally that our test detects deterioration significantly faster and with higher probability than alternative tests.
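
The reshaping into episodes, and one simple test built on it, can be sketched as follows. Assume a Gaussian model in which the T-dimensional episode rewards have reference mean μ and covariance Σ, and deterioration lowers every time-step's mean by the same ε > 0 (a simplified "uniform degradation" model chosen here for illustration). The log-likelihood ratio is then linear in the episode rewards with weights Σ⁻¹1, which gives the statistic below. This is a sketch, not the paper's exact test or code; in particular it ignores the sequential False Alarm Rate control, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a weighted mean-shift test on episodic rewards.


def to_episodes(rewards, T):
    """Reshape a 1D reward stream of length K*T into a K x T matrix (one row per episode)."""
    r = np.asarray(rewards, dtype=float)
    K = len(r) // T
    return r[: K * T].reshape(K, T)


def fit_reference(ref_episodes, ridge=1e-6):
    """Estimate mu and Sigma from reference episodes, and the weights w = Sigma^{-1} 1."""
    T = ref_episodes.shape[1]
    mu = ref_episodes.mean(axis=0)
    sigma = np.cov(ref_episodes, rowvar=False) + ridge * np.eye(T)
    w = np.linalg.solve(sigma, np.ones(T))
    return mu, sigma, w


def degradation_z(episodes, mu, sigma, w):
    """Standardized weighted mean shift: about N(0, 1) without change, negative under deterioration."""
    n = len(episodes)
    s = w @ (episodes - mu).mean(axis=0)
    return float(s / np.sqrt(w @ sigma @ w / n))


def ar1_episodes(rng, K, T, phi=0.5, shift=0.0):
    """Simulated rewards, correlated within each episode (AR(1)) and i.i.d. across episodes."""
    e = rng.standard_normal((K, T))
    x = np.empty((K, T))
    x[:, 0] = e[:, 0]
    for t in range(1, T):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x + shift


rng = np.random.default_rng(1)
T = 20
reference = to_episodes(ar1_episodes(rng, 500, T).ravel(), T)  # 1D stream -> K x T
mu, sigma, w = fit_reference(reference)
z_same = degradation_z(ar1_episodes(rng, 50, T), mu, sigma, w)
z_worse = degradation_z(ar1_episodes(rng, 50, T, shift=-0.5), mu, sigma, w)
print(f"no change: {z_same:.2f}, deteriorated: {z_worse:.2f}")
```

The weights Σ⁻¹1 account for the correlations within an episode, which a test treating the KT rewards as i.i.d. would ignore.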

· Common Lines Modeling for Reference Free Ab-Initio Reconstruction in Cryo-EM

Ido Greenberg, Yoel Shkolnisky

Journal of Structural Biology, 2017

See publication

See related presentation

Abstract (informal):

This research attempts to detect the reliable estimates of the common lines, in order to increase their weight in the estimation of the images' viewing directions. This new feature, incorporated into an existing reconstruction algorithm, is shown to achieve an improvement of ~40% in the resolution of the reconstructed map of a ribosomal subunit.

· A Graph Partitioning Approach to Simultaneous Angular Reconstitution

Gabi Pragier, Ido Greenberg, Xiuyuan Cheng, Yoel Shkolnisky

IEEE Transactions on Computational Imaging, 2016

See publication

Abstract (informal):

Reconstruction of molecular structures (e.g. proteins) is essential for understanding their biological function. Such reconstruction from electron-microscope images requires estimating the unknown viewing directions of the images. The viewing directions can be restored from estimates of the relative directions between pairs of images. Unfortunately, some of the relative-direction estimates are distorted by a reflection (in addition to being noisy). While it is provably impossible to tell which ones are distorted and which are not, it is vital to at least partition all the estimates into 2 homogeneous groups, distorted and not distorted (without knowing which is which), so that one group can be "fixed" and all estimates become consistent (either all distorted or all undistorted; the former results in reconstruction of a reflected variant of the molecule).

In this paper, we locally estimate the consistency of distortion between certain pairs of relative directions, represent these estimated relations as a graph, and use spectral analysis of the graph to aggregate all the local estimates simultaneously into a global partition of the relative-direction estimates into 2 groups.

We use simulations of electron-microscope images to demonstrate that using this method as part of the reconstruction algorithm significantly improves the reconstruction.
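
The spectral aggregation step can be illustrated with a generic Z2-synchronization sketch. It assumes the local estimates are already given as a symmetric matrix C with C[i, j] = +1 when estimates i and j are judged consistent, -1 when judged inconsistent, and 0 when unknown; how the paper builds this graph from the relative directions is not reproduced here, and the function name is hypothetical. The signs of the leading eigenvector split the estimates into 2 groups, identified only up to swapping the groups, which mirrors the fact that one cannot tell which group is the distorted one.

```python
import numpy as np

# Generic spectral Z2-synchronization sketch; not the paper's exact construction.


def spectral_partition(C):
    """Split nodes into 2 groups (+1 / -1) using the leading eigenvector of C."""
    _, vecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    leading = vecs[:, -1]
    return np.where(leading >= 0, 1, -1)


# Simulated example: hidden labels s, and pairwise relations s_i * s_j, 20% of them flipped.
rng = np.random.default_rng(0)
n = 200
s = rng.choice([-1, 1], size=n)
flips = np.where(rng.random((n, n)) < 0.2, -1, 1)
flips = np.triu(flips, 1)
flips = flips + flips.T                  # symmetric noise pattern, zero diagonal
C = np.outer(s, s) * flips

labels = spectral_partition(C)
# Agreement is measured up to a global sign, since the two groups are interchangeable.
agreement = max(np.mean(labels == s), np.mean(labels == -s))
print(f"fraction correctly grouped: {agreement:.3f}")
```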

Articles and Presentations

All the materials below are free to use for any purpose as long as proper attribution is given. The writer has no official education or certification in most of the topics below. None of the documents is meant to recommend any action to the reader, and no responsibility is taken for such actions.
	Field	Topic	Framework	Year	Documents	Language	Pages	Short summary included	Comments
	Mathematical Modeling	Elevators Waiting-Time Optimization	MSc course	2014	PPTX*, PDF	HE	-	-
	Mathematical Modeling	Simulator: Liquid within Moving Container	High school	2007	PDF	HE	49	Yes	In collaboration with Jonathan Cederbaum
	Machine Learning	Web Servers Classification	High school	2008	PDF	HE	42	Yes	In collaboration with Jonathan Cederbaum and CheckPoint Security
		Illustrated List of Basic Methods in ML	Independent	2018	PNG	EN	-	-	Illustrations were collected from various sources (none was made by me)
		Intro to Deep Reinforcement Learning	Independent	2021	PPTX*	EN	13	-
	Finance	Basic Concepts in Finance and Investments	Independent	2017	PDF, PPTX*	HE	12	Yes
	Finance	Pension Tutorial	Independent	2018	PDF	HE	20	Yes
	Human Resources	Disturbing Factors for Technological Employees in Permanent Military Service	Independent	2017	PDF	HE	4	No	Based on a survey of 21 subjects

*PPTX files are partially corrupted until I find an alternative to Google Drive for hosting them.

Self-studied Courses Summaries

All the materials below are free to use for any purpose as long as proper attribution is given. Most summaries received little or no review and probably contain inaccuracies.
	Field	Topic	Framework	Year	Documents	Language	Pages	Short summary included	Main source
	Algorithms	Basic algorithms, graphs, DP, LP, spatial search, etc.	Independent	2018	PDF	EN	14	No	Algorithms 1, Technion, 2013, and analogous courses at TAU, HUJI and Udacity
	Information	Information Theory	Independent	2019	PDF	EN	22	Yes	Information, Physics and Computation, Stanford, 2009; Elements of Information Theory, 2006
	Signal processing	Digital Signal Processing	Independent	2018	PDF	EN	13	No	DSP, Technion, 2012
	Graphs	Intro to graph theory, basic algorithms, spectral graph theory and SNA	Independent	2019	PDF	EN	16	No	Distributed sources (see references within the PDF)
	Graphs	Random Graphs	PhD course	2021	PDF	EN		No	Random Graphs and Hypergraphs (049014), O. Bobrowski, Technion, 2021
	Statistics	Intro to Statistical Theory	Independent	2018	PDF	EN	11	No	Intro to Statistical Theory, Technion, 2012
		Cointegration	MSc course	2015	PPTX*, PDF	HE	-	-	Distributed sources
		Advanced Statistical Theory	Independent	2019	PDF, PPTX*	EN	25	Yes	Notes by Ryan Martin, North Carolina, 2017
		Experimental Design and Analysis of Variance	Independent	2018	PDF, PPTX*	HE	16	Yes	Notes by Prof. David Steinberg, TAU (syllabus)
		Probability in High Dimension	PhD course	2020	PDF	EN	11	Yes	Distributed sources (see references within the PDF)
		Statistical Learning Theory	Independent	2021	PDF	EN	4	No	Distributed sources (see references within the PDF)
	Optimization	Convex optimization	Independent	2019	Was not summarized	-	-	-	EE364a: Convex Optimization, Stephen Boyd, Stanford
	Artificial Intelligence and Machine Learning	Theoretical Intro to ML	Independent	2019	Was not summarized	-	-	-	67577: Introduction to Machine Learning, Shai Shalev-Shwartz, HUJI
		Intro to Artificial Intelligence	Independent	2016	PDF	EN	47	Chapter summaries + summarizing table	Udacity (Sebastian Thrun, Stanford & Google)
		Intro to Supervised Learning through Linear Regression	Independent	2018	PPTX*, PDF	EN	-	Summarizing table on last slide	Distributed sources
		Supervised Learning	Independent	2016	PDF	EN	17	No	Udacity (Georgia Tech)
		Unsupervised Learning	Independent	2017	PDF	EN	14	No	Udacity (Georgia Tech)
		Reinforcement Learning	Independent	2018	PDF	EN	16	Yes	Udacity (Georgia Tech)
		Machine Learning (only complementary materials on top of previous courses)	Independent	2017	PDF	EN	10	No	Coursera (Andrew Ng, Stanford)
		Intro to Machine Learning (only complementary materials on top of previous courses)	Independent	2018	PDF	EN	3	-	Udacity (Sebastian Thrun, Stanford & Google)
		Intro to NLP	Independent	2017	PDF	EN	23	Chatbot-oriented summary	NLTK book (O’Reilly Media Inc.)
		Intro to Deep Learning	Independent	2019	Was not summarized	-	-	-	Udacity (Facebook); CS231n: CNNs (Stanford)

*PPTX files are partially corrupted until I find an alternative to Google Drive for hosting them.

Other Stuff

· Winds of Winter: all public sample chapters in one printable document.

· Track & Field: my personal page in the Israeli Athletic Association (all 800m results were mysteriously lost ☹).