import logging
from typing import Optional

import pandas as pd

import config
import models
from buckets import DataBucket
from detectors import Detector
from models import ModelCache

logger = logging.getLogger('PATTERN_DETECTOR')


def resolve_model_by_pattern(pattern: str) -> models.Model:
    pattern_to_model = {
        'GENERAL': models.GeneralModel,
        'PEAK': models.PeakModel,
        'TROUGH': models.TroughModel,
        'DROP': models.DropModel,
        'JUMP': models.JumpModel,
        'CUSTOM': models.CustomModel,
    }
    if pattern not in pattern_to_model:
        raise ValueError('Unknown pattern "%s"' % pattern)
    return pattern_to_model[pattern]()
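

# A minimal usage sketch (assuming the model classes imported above):
# resolving a pattern name yields a fresh, untrained model instance.
#
#   model = resolve_model_by_pattern('PEAK')
#   assert isinstance(model, models.PeakModel)

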
AnalyticUnitId = str


class PatternDetector(Detector):
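    """Detects pattern segments: buffers incoming data for one analytic unit
    and runs the unit's pattern model over the buffered window.
    """
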
    def __init__(self, pattern_type: str, analytic_unit_id: AnalyticUnitId):
        self.analytic_unit_id = analytic_unit_id
        self.pattern_type = pattern_type
        self.model = resolve_model_by_pattern(self.pattern_type)
        # Minimum number of buffered points before detection runs (see receive_data).
        self.window_size = 150
        self.bucket = DataBucket()
        self.bucket_full_reported = False

    def train(self, dataframe: pd.DataFrame, segments: list, cache: Optional[models.ModelCache]) -> dict:
        # TODO: pass only the part of the dataframe that has segments
        new_cache = self.model.fit(dataframe, segments, cache)
        return {
            'cache': new_cache
        }
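
    # A hedged call sketch; `detector`, `dataframe` and `labeled_segments` are
    # placeholders. The returned dict wraps the new model cache, which is the
    # same object detect() and receive_data() expect as their cache argument:
    #
    #   new_cache = detector.train(dataframe, labeled_segments, None)['cache']
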
    def detect(self, dataframe: pd.DataFrame, cache: Optional[models.ModelCache]) -> dict:
        logger.debug('Unit {} got {} data points for detection'.format(self.analytic_unit_id, len(dataframe)))
        # TODO: split and sleep (https://github.com/hastic/hastic-server/pull/124#discussion_r214085643)
        detected = self.model.detect(dataframe, cache)

        segments = [{'from': segment[0], 'to': segment[1]} for segment in detected['segments']]
        new_cache = detected['cache']

        last_dataframe_time = dataframe.iloc[-1]['timestamp']
        # TODO: convert from nanoseconds to milliseconds in a better way: not by dividing by 10^6
        last_detection_time = last_dataframe_time.value / 1000000
        return {
            'cache': new_cache,
            'segments': segments,
            'lastDetectionTime': last_detection_time
        }
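
    # Note: pandas Timestamp.value is nanoseconds since the epoch, so dividing
    # by 10**6 yields milliseconds (presumably what the consuming side expects).
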
    def receive_data(self, data: pd.DataFrame, cache: Optional[ModelCache]) -> Optional[dict]:
        self.bucket.receive_data(data.dropna())

        if len(self.bucket.data) >= self.window_size and cache is not None:
            if not self.bucket_full_reported:
                logger.debug("{} unit's bucket is full, running detect".format(self.analytic_unit_id))
                self.bucket_full_reported = True

            res = self.detect(self.bucket.data, cache)

            # Trim the buffer back to window_size points for the next iteration.
            excess_data = len(self.bucket.data) - self.window_size
            self.bucket.drop_data(excess_data)
            return res
        else:
            filling = len(self.bucket.data) * 100 / self.window_size
            logger.debug('bucket for {} is {}% full'.format(self.analytic_unit_id, filling))

        return None
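

# A minimal end-to-end sketch (hedged: the unit id, dataframes and segment
# payload are illustrative; in the real server these arrive via the analytics
# service). The detector only reports detections once its bucket holds at
# least window_size points and a trained cache is available:
#
#   detector = PatternDetector('PEAK', 'my-analytic-unit')
#   cache = detector.train(dataframe, labeled_segments, None)['cache']
#   result = detector.receive_data(new_dataframe, cache)
#   if result is not None:
#       print(result['segments'], result['lastDetectionTime'])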