Listing by keyword "Feature engineering"
1 - 3 of 3
- Journal article: Enabling data-centric AI through data quality management and data literacy
  (it - Information Technology: Vol. 64, No. 1-2, 2022)
  Abedjan, Ziawasch
  Data is being produced at an intractable pace. At the same time, there is an insatiable interest in using such data for use cases that span all imaginable domains, including health, climate, business, and gaming. Beyond the novel socio-technical challenges that surround data-driven innovations, there are still open data processing challenges that impede the usability of data-driven techniques. It is commonly acknowledged that overcoming heterogeneity of data with regard to syntax and semantics, in order to combine various sources for a common goal, is a major bottleneck. Furthermore, the quality of such data is always in question, as today's data science pipelines are highly ad hoc and lack the necessary care for provenance. Finally, quality criteria that go beyond the syntactic and semantic correctness of individual values and also incorporate population-level constraints, such as equal parity and opportunity with regard to protected groups, play an increasingly important role in this process. Traditional research on data integration focused on post-merger integration of companies, where customer or product databases had to be integrated. While this is often hard enough, today the challenges are aggravated because more stakeholders are using data analytics tools to derive domain-specific insights. I call this phenomenon the democratization of data science, a process that is both challenging and necessary. Novel systems need to be user-friendly enough that not only trained database administrators but also stakeholders with less computer science expertise can handle them. Thus, our research focuses on scalable example-driven techniques for data preparation and curation. Furthermore, we believe that it is important to educate the breadth of society on the implications of a data-driven world and to actively promote the concept of data literacy as a fundamental competence.
- Journal article: Feature Engineering Techniques and Spatio-Temporal Data Processing
  (Datenbank-Spektrum: Vol. 21, No. 3, 2021)
  Forke, Chris-Marian; Tropmann-Frick, Marina
  More and more applications nowadays use spatio-temporal data for different purposes. In order to be processed and used efficiently, this unique type of data requires special handling. This paper summarizes methods and approaches for feature selection of spatio-temporal data and machine learning algorithms for spatio-temporal data engineering. Furthermore, it highlights relevant work in specific domains. The range of possible approaches for data processing is quite wide. However, in order to use these approaches with spatio-temporal data in a meaningful and practical way, individual data processing steps need to be adapted. One of the most important steps is feature engineering.
- Journal article: Predictive analytics for data driven decision support in health and care
  (it - Information Technology: Vol. 60, No. 4, 2018)
  Hayn, Dieter; Veeranki, Sai; Kropf, Martin; Eggerth, Alphons; Kreiner, Karl; Kramer, Diether; Schreier, Günter
  Due to an ever-increasing amount of data generated in healthcare each day, healthcare professionals are increasingly challenged with information. Predictive models based on machine learning algorithms can help to quickly identify patterns in clinical data. Requirements for data-driven decision support systems for health and care (DS4H) are similar in many ways to applications in other domains. However, there are also various challenges which are specific to health and care settings. The present paper describes a) healthcare-specific requirements for DS4H and b) how they were addressed in our Predictive Analytics Toolset for Health and care (PATH). PATH supports the following process: objective definition, data cleaning and pre-processing, feature engineering, evaluation, result visualization, interpretation and validation, and deployment. The current state of the toolset already allows the user to switch between the various levels involved, i.e. raw data (ECG), pre-processed data (averaged heartbeat), extracted features (QT time), built models (to classify the ECG into a certain rhythm abnormality class) and outcome evaluation (e.g. a false positive case), and to assess the relevance of a given feature both in the currently evaluated model as a whole and for the individual decision. This allows us to gain insights as a basis for improvements in the various steps from raw data to decisions.