FI PPP ArchB Task Force
Data and Information Uncertainty
Draft Version V2.0

Contributors: Andreas Metzger (U Duisburg-Essen, FInest, caretaker), Adrie Beulens (U Wageningen, SmartAgriFood), Federico Facca (CREATE-NET, Infinity), Fabiana Fournier (IBM, FInest), Denis Havlik (AIT, ENVIROFI), Fano Ramparany (Orange, OutSmart), Zoheir Sabeur (IT Innovation, ENVIROFI)

The aim of this document is to collect material, input and results of discussions on the issue of data uncertainty (sometimes also referred to as Quality of Information, QoI). Data and information are key inputs for systems operation and optimization. Considering trends such as sensor networks and business intelligence, the volume and frequency of data will increase radically, bringing the opportunities and challenges of Big Data. The Big Data issue (cf. [6]) is exemplified by emerging FI systems, which will be highly distributed, decentralized, multi-stakeholder systems composed from third-party entities (IoS, IoT, IoC) exploiting heterogeneous sources of data and information [4, 5, 6]. Managing the uncertainty of data and information from third-party sources will thus become essential for reliable systems operation.

The aims of this working document are to:
(1) Collect and analyse in more detail the various factors that may characterize data and information uncertainty, and support them with examples from the FI PPP use case (UC) projects;
(2) Identify whether (existing or planned) FI-WARE Generic Enablers (GEs) may help address the challenges related to data and information uncertainty management, and thus identify potential missing capabilities in FI-WARE;
(3) Scrutinize the state of the art to identify possible solutions that provide those missing capabilities;
(4) Propose features for FI-WARE (and possibly open call topics) to develop those missing capabilities.

Aims (2)-(4) are sub-structured as follows:
- QoI definition and specifications (e.g., models and standards)
- Quantifying QoI (e.g., levels of reliability and trust of sources of information)
- Reasoning in the presence of QoI (e.g., statistical analysis of data and information)
1. QoI Characterizing Factors

Below we list and motivate the different factors that may characterize QoI:

Factor: Trustworthiness / credibility / reliability of the information and data source
Example(s): Crowd-sourced waste detection (OutSmart): trust in the person entering data and information. Community observation of events (ENVIROFI): trust in, and reliability of, the information and data entered by volunteers.
Root cause / sources: Trust depends on the user type (e.g., new, unknown, city employee). Trust may evolve, as it is subjective; it depends on the number of times reliable information and data have been provided by each data provider. Trust varies with time, user type, the measurement type under consideration and the context (trust estimates will also vary depending on the level of criticality for which we want to use the data and information). A minimal sketch of such an evolving trust score is given at the end of this section.
Comments: But: sources of data may have to be protected against unauthorized disclosure, for privacy protection.

Factor: Accuracy of sensors (measurement errors, human errors in data entry)
Example(s): The temperature of cargo is measured imprecisely, e.g., the measurement is 5 degrees too high for a sensitive good (FInest). A customer states a wrong amount of cargo that he would like to ship (FInest). (1) Air temperature for weather conditions: cheap sensors are more likely to deliver low accuracy than more expensive, high-accuracy weather stations (ENVIROFI). Multi-resolution and multi-coverage cameras with various vision background information (image recognition) in a city (SafeCity).
Root cause / sources: Ad (1): overbooking to be "on the safe side"; last-minute changes in plans/production.
Comments: But: inaccuracy may also be a means to ensure privacy; e.g., adding uncertainty to data is one potential policy for disclosing part of private data. It is extremely useful and important for the FI to bring in sensor measurements with their measurement errors as a prerequisite piece of information available to the community. Distinguishing between sensors and their respective systematic errors will guarantee a more reliable propagation of estimated uncertainties across systems in the FI.

Factor: Timeliness of information (the delay between the effective creation of data and its arrival in IT systems and services)
Example(s): Time lag between the creation of paper documents for transport and the registration of the event in the IT systems (FInest). Data are made available only as of a certain moment in time (time lag); what does that mean for fitness for use? (SmartAgriFood)
Root cause / sources: Break between the physical and virtual worlds; time lags between IT systems (e.g., batch-mode upload of data).

Factor: Uncertainty introduced as the result of an abductive (or other) inference and reasoning process
Example(s): A future traffic jam is predicted based on the car flow at a road intersection (InstantMobility & FInest); of course, the prediction is not possible with 100% certainty.
Comments: The car-flow sensor measurement gives an indication (and a possible trend), but not a definitive conclusion/observation.

Factor: Absence of data
Example(s): Faulty or non-responding sensor; no deployed sensors in the area of interest.
Root cause / sources: The battery might be empty; harsh environmental conditions for sensing; end of the sensor life cycle; sensor bio-fouling.
Comments: It is important to know the specification of sensors. Standards on sensor capabilities (for example, OGC Sensor Web Enablement) can be adopted; in this manner, sensor specifications, including their reliability and the duration of that reliability, are known. Note also that sensor accuracy is only guaranteed under specific environmental conditions; some sensors do not work under harsh conditions (example: a sensor measuring humidity, but only at temperatures not exceeding a certain limit).

Factor: Discrepancy in information representation (data vs. information)
Example(s): Geographical information may be highly accurate, but may be distorted by the "wrong" choice of representation, i.e., the projection from the 3D surface of the earth to a 2D map (SmartAgriFood).
Root cause / sources: The geographic projection reference may be wrongly adopted; data sources use different projection methods and/or geographic referencing.

Factor: Fit for purpose
Example(s): <example?>
Root cause / sources: Quite often, data have been captured with a (set of) purpose(s) in mind. As a consequence, they may be fit for use for those purposes but not obviously for others; also, other observations of the object systems may thus have been omitted (SmartAgriFood).
Comments: In which regard is this related to "Quality of Experience"?

Factor: Chain and propagation of uncertainties / "noisiness" of processing steps
Example(s): Each processing step adds uncertainties (ENVIROFI). Models for derived data that can be trusted: e.g., if data about crop growth, soil conditions, etc. are gathered, then standard models may be used to derive accepted derived data (SmartAgriFood).
Root cause / sources: In various data fusion components, uncertainties are taken on board to establish/quantify values, or even to predict an event, for decision support in the Future Internet.
Comments: This is a prerequisite for achieving awareness of the state of processes and environments in the Future Internet. It applies across all usage areas, and therefore FI-WARE should be aware of it.

Factor: Belief and plausibility of measurements and information
Example(s): The air temperature today is 50 degrees in London... does this make sense? Is this possible? (ENVIROFI) Can this be right? The sensor, although highly accurate and reliable because it is made by a reputable manufacturer, displays a wrong temperature, perhaps due to a badly calibrated setup in the first place. Or the rounding errors of a numerical model build up to the point where the model no longer converges to the right physical solution, and its predictions need to be ignored altogether.
Comments: The plausibility of data values and information can only be tested once we have a critical mass of information and data. A buffering method for overcoming this problem is to introduce plausibility as a measure for testing the validity of information and data in time and space, as well as during specific events. These events can be critical, so the decisions made will greatly depend on the plausibility of the information and data.

Broadly speaking, there are three clusters of uncertainty sources: imprecise measurements, human error, and uncertainty introduced by data processing.
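To make the evolving trust score referenced in the first factor concrete, the following is a minimal sketch of one common approach, a Beta-Bernoulli reputation model. It is purely illustrative (the class and method names are our own); none of the UC projects prescribes this particular mechanism.

```python
# Illustrative sketch (not prescribed by any UC project): a Beta-Bernoulli
# reputation model in which trust in a data provider evolves with the
# number of reliable vs. unreliable reports that provider has contributed.

class ProviderTrust:
    """Trust in a data provider, modelled as a Beta(alpha, beta) distribution."""

    def __init__(self, prior_good: float = 1.0, prior_bad: float = 1.0):
        # Beta(1, 1) is a uniform prior: a new, unknown provider.
        # A known city employee could start from a stronger prior, e.g. (10, 1).
        self.alpha = prior_good
        self.beta = prior_bad

    def record_report(self, was_reliable: bool) -> None:
        """Update the trust estimate once a report has been verified."""
        if was_reliable:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def trust(self) -> float:
        """Expected probability that the provider's next report is reliable."""
        return self.alpha / (self.alpha + self.beta)


# Example: an unknown volunteer whose first five reports are mostly confirmed.
volunteer = ProviderTrust()
for outcome in [True, True, False, True, True]:
    volunteer.record_report(outcome)
print(f"trust = {volunteer.trust:.2f}")  # -> trust = 0.71
```

Context-dependent trust (different trust per measurement type, as noted in the table) could be modelled by keeping one such distribution per (provider, measurement type) pair.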
2. Analysis of FI-WARE GEs

((to be updated after the discussions during the Vienna ArchB))

(Q1) Can the metadata capabilities be used to facilitate the documentation of data uncertainty (see Section 2.1)? If so, how can this be done?

Metadata capabilities, particularly those concerning standard observation and measurement models, will be key for achieving interoperability across the ever-growing volume and heterogeneity of data sources generated in the future internet of things, services, people and content. If such models are adopted, it will then be important to provide channels for representing uncertainty through observation uncertainty modelling. Activities evolving in open geospatial research need to be taken on board.
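As an illustration of what such metadata capabilities would need to carry, here is a minimal sketch of an observation record that keeps uncertainty and provenance metadata next to the measured value. The field names are our own, loosely inspired by OGC O&M concepts (observed property, phenomenon time, result time, procedure); this is emphatically not the actual O&M or UncertML schema.

```python
# Illustrative sketch only: a simplified observation record that carries
# uncertainty metadata alongside the value. Field names are our own,
# loosely inspired by OGC O&M concepts; NOT the actual O&M/UncertML schema.

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class UncertainObservation:
    observed_property: str     # what was measured, e.g. "air_temperature"
    value: float               # the measured value
    unit: str                  # unit of measure
    std_error: float           # estimated measurement error (one sigma)
    phenomenon_time: datetime  # when the phenomenon actually occurred
    result_time: datetime      # when the result entered the IT system
    sensor_id: str             # the procedure/sensor that produced the value
    provider_trust: float      # trust score of the data provider, in [0, 1]


obs = UncertainObservation(
    observed_property="air_temperature",
    value=21.4,
    unit="Cel",
    std_error=0.5,
    phenomenon_time=datetime(2012, 6, 1, 11, 55, tzinfo=timezone.utc),
    result_time=datetime(2012, 6, 1, 12, 10, tzinfo=timezone.utc),
    sensor_id="urn:example:sensor:ws-0042",
    provider_trust=0.71,
)

# Timeliness (factor 3 in Section 1) falls out of the two timestamps:
lag = obs.result_time - obs.phenomenon_time  # 15 minutes
```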
(Q2) As it seems "massive data gathering" is outside the scope of the chapter, how can we ensure that data can be collected while measuring and attributing the uncertainty aspects? (See Q1.)

(Q3) Do Big Data analysis and CEP provide any mechanisms for handling uncertainty when processing "uncertain" data and events? If so, what concrete mechanisms are available?

There is a distinction between the automated handling of uncertainty information at the source and its processing downstream. From the word go, at the source level of data and information, there will be a need to manage the given observation error, reliability, trust and, to some extent, uncertainty. (We would prefer to reserve the word "uncertainty" for information processed downstream of the original source of information/data.) Since many sources may provide observation data and information about the same, overlapping, or compatible phenomena or processes, each with its associated errors, there will be a need to aggregate such errors in an objective and still automated way. It is this process which needs to be put in place in FI-WARE. The aggregation of errors from the generating sources of data and information needs to be achieved in order to push FI information and data downstream for use by services and applications. The most relevant candidate processing services which badly need such "uncertainty management" are data fusion and modelling-oriented services. What the users of the FI require is to make sense of, and retrieve intelligent knowledge from, an ever-increasing volume of data and information. Fusion services, aided by uncertainty management services, will be well placed to do this job.
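One standard, objective way to fuse overlapping observations and aggregate their errors automatically is inverse-variance weighting. The sketch below is our illustration of the kind of aggregation argued for above, under the assumption of independent, unbiased sources with known standard errors; it is not an existing FI-WARE mechanism.

```python
# Illustrative sketch: inverse-variance weighting, a standard way to fuse
# overlapping observations of the same phenomenon and to aggregate their
# errors automatically. Assumes independent, unbiased sources with known
# standard errors; this is not an existing FI-WARE mechanism.

import math


def fuse(observations: list[tuple[float, float]]) -> tuple[float, float]:
    """Fuse (value, std_error) pairs into a single (value, std_error)."""
    weights = [1.0 / (err ** 2) for _, err in observations]
    total = sum(weights)
    fused_value = sum(w * v for w, (v, _) in zip(weights, observations)) / total
    # The fused error is never worse than that of the best single source.
    fused_error = math.sqrt(1.0 / total)
    return fused_value, fused_error


# Example: a cheap sensor (error 2.0) and a weather station (error 0.5)
# observing the same air temperature.
value, error = fuse([(23.1, 2.0), (21.8, 0.5)])
print(f"{value:.2f} +/- {error:.2f}")  # -> 21.88 +/- 0.49
```

Note how the accurate weather station dominates the result without the cheap sensor being discarded entirely, which matches the goal of exploiting heterogeneous sources rather than filtering them out.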
2.1 Documenting / expressing QoI

As articulated in the previous sections, it is important to reiterate that data and context information should be brought side by side when connecting sources together in the Future Internet. Processing data without context is a recipe for disaster. It is important to provide context for observation data in order to process them together and derive important new knowledge, at both temporal and spatial levels, for any user on demand. Every user may ask for that knowledge differently, and therefore the (geospatial) context of the data to be used needs to be addressed.

Example: for a given database containing observations of a particular environmental parameter during a particular period of time, it is important to know whether the same sensor was used throughout, or whether that sensor was replaced by other sensors with different specifications (reliability, operational cycle, accuracy, etc.). Any downstream processing of such data should then take this context information and the history of sensor usage over time into account. Another example: a phenomenological model providing data should come with context information about how valid the model is, whether it has been validated against observations, and what its limitations for usage are. We really need to enrich data with context information.

If data sources do not provide such context information, one way of proceeding is to develop intermediate "semantic enrichment" services in FI-WARE. These may serve as a buffer zone for educating the communities who provide data: they learn how to add context information to their data so that their sources of information are prioritised for usage downstream, and therefore "sell" better than those of providers who do not. In this way, it may become a worthwhile incentive for data providers to supply not only data but also supporting context information. Again, some generic context information model, perhaps focused on the context in which observations are made, could be initiated as an experiment under FI-WARE. It would certainly be welcomed by those who want to derive new knowledge from data and context and deliver it to a large community of users on demand.

2.2 Determining / measuring QoI

Establishing metrics at a generic level for quantifying QoI is paramount. In this manner, flows of information and data with processed uncertainties can propagate across the information value chain in the Future Internet. Such metrics need to be developed carefully and tested across the various FI usage areas. There are enough dimensions across the usage areas to establish a working set of metrics specialising in observation errors, reliability, plausibility, trust, accuracy, etc. These will then need to be validated through test beds by the usage area projects: the more successful the tests, the more refined and validated these metrics become.

2.3 Reasoning in the presence of QoI

Reasoners in the presence of QoI become essential once metrics for quantifying QoI are established. The combined usage of the various QoI metrics will require some generic logical model to aggregate them. This is a prelude to establishing the processed uncertainties generated by the various sources of data and information. A funnelling effect needs to be put in place in order to manage such large volumes of data and information, instead of leaving them as messy as originally generated. A QoI reasoner should be able to take on board the dynamic states of data generation across the whole landscape of the Future Internet to which data providers subscribe.
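To show the shape of such an aggregation in its very simplest form, the sketch below combines per-dimension QoI scores with a weighted geometric mean. The dimensions and weights are purely our illustration; the generic logical model called for above would be considerably richer (e.g., ontology-driven, see Section 3.3).

```python
# Illustrative sketch: the simplest possible aggregation of QoI metrics,
# a weighted geometric mean over per-dimension scores in [0, 1]. Dimensions
# and weights here are purely illustrative; the generic logical model called
# for above would be considerably richer (e.g., ontology-driven).

import math


def composite_qoi(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate per-dimension QoI scores into one score in [0, 1].

    A geometric mean punishes any single very poor dimension, which suits
    the "funnelling" idea above: bad accuracy cannot be masked by high trust.
    """
    total_weight = sum(weights[d] for d in scores)
    log_sum = sum(weights[d] * math.log(max(s, 1e-9)) for d, s in scores.items())
    return math.exp(log_sum / total_weight)


scores = {"accuracy": 0.90, "timeliness": 0.60, "trust": 0.71, "plausibility": 0.95}
weights = {"accuracy": 2.0, "timeliness": 1.0, "trust": 1.0, "plausibility": 1.0}
print(f"QoI = {composite_qoi(scores, weights):.2f}")  # -> QoI = 0.80
```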
3. State of the Art

3.1 Documenting / expressing QoI

Standards related to services under this epic include OGC SWE SensorML for sensing and processing history, OGC SWE O&M for the quality associated with observation data, and UncertML for complex uncertainty representations, including probability distributions. In LarKC (http://www.larkc.eu/) there may be some work related to data uncertainty, for example related to traffic prediction. As stated in previous sections, it is important to adopt open geospatial standards, at least for the description of sensor observations and measurements as well as of uncertainty. The challenge will be to extend them with concepts of observation data/information error, reliability, plausibility, trust, etc. In any case, the representation of uncertainty under UncertML is attractive. Nevertheless, it should be designed in a way that assists interoperation with the services which specialise in the quantification of QoI. In other words, if we adopt UncertML, we should spread its usage across all services which will take an interest in using uncertainty, such as the reasoners in the presence of QoI.

3.2 Determining / measuring QoI

Representing the various types of uncertainty using standards will make life easier for those who want to develop metrics for the quantification of QoI. A statistical approach to this problem is proposed. Nevertheless, one should NOT assume that the distributions of error, trust, plausibility, reliability, etc. in information and data all follow the same kind of statistics. The assumed type of distribution should be carefully studied for usage downstream. Safely speaking, these distributions should be offered as a set of candidates for the expert to choose from, rather than being imposed without any prior justification. One thing is certain: users of such services may report back on the validity of their assumptions, which would enable learning capabilities on the measurement of QoI (although this latter capability may be too futuristic).

3.3 Reasoning in the presence of QoI

Techniques for processing uncertain data include Bayesian approaches, Dempster-Shafer theory (theory of evidence), fuzzy logics, truth maintenance systems and possibility theory. Techniques for modelling trust include trust networks. Considering the issue of data delays in the setting of Big Data / Complex Event Processing (CEP), some work related to this problem exists in the CEP field. Several authors have dealt with the more general problem of uncertain events, focusing on the effect that uncertain input events have on the accuracy of the situation that is detected [1, 3]. The combined usage of the various QoI metrics, as stated earlier, will require some generic logical model to aggregate them. This may be done by developing an ontology model (using OWL, perhaps) to process the aggregation of such metrics in a logical way. Only then could we trigger reasoners, accommodating fuzziness and evidential techniques, to process uncertainty towards its next propagation journey downstream.
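Of the evidential techniques listed above, Dempster-Shafer theory is perhaps the least familiar, so the following is a minimal sketch of Dempster's rule of combination over a small frame of discernment. The scenario and names are our own illustration.

```python
# Illustrative sketch: Dempster's rule of combination (theory of evidence).
# Mass functions map focal elements (frozensets of hypotheses) to masses.

def combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Combine two independent bodies of evidence with Dempster's rule."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2  # mass falling on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources are irreconcilable")
    # Renormalise the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Example: two sensors judging whether a cargo container is "ok" or "spoiled".
OK, SPOILED = frozenset({"ok"}), frozenset({"spoiled"})
EITHER = OK | SPOILED  # ignorance: mass assigned to the whole frame

m_sensor1 = {OK: 0.6, EITHER: 0.4}               # confident but partly ignorant
m_sensor2 = {OK: 0.7, SPOILED: 0.2, EITHER: 0.1}

result = combine(m_sensor1, m_sensor2)
# -> {ok: 0.86, spoiled: 0.09, ok|spoiled: 0.05} (rounded)
```

Unlike a purely Bayesian update, the residual mass on EITHER makes the sources' remaining ignorance explicit, which is exactly the kind of information a downstream QoI reasoner would want to propagate.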
4. (Initial) Conclusions / Feature Requests

In this document we have established the pressing uncertainty-management capabilities which need to be addressed as complementary GEs to the FI-WARE enablers. They specialise as follows:
- Representation of metadata and information (including, for instance, using the geospatial Sensor Web Enablement standards), incorporating the various elements representing observation and measurement errors, reliability, plausibility, trust, etc.
- Quantification of Quality of Information using metric-based approaches.
- Representation of Quality of Information and its context using standards.
- Context information on observations and measurements, incentivising data providers to have their data prioritised (and therefore more sellable) in the information value chain.
- Representation of uncertainties under various statistical distributions, to be chosen by domain expert users; also, experimentation for learning about these distributions from domain expert feedback.
- Management of the different types of uncertainty so that they can be aggregated when multiple metrics are used.
- Ontology model(s) to support reasoners, accommodating fuzzy logic and evidential statistics principles, for combining uncertainties generically prior to propagation down the information value chain of the Future Internet.

References
[1] S. Wasserkrug, A. Gal, and O. Etzion, "A model for reasoning with uncertain rules in event composition," in UAI, 2005.
[2] S. Wasserkrug, A. Gal, O. Etzion, and Y. Turchin, "Efficient processing of uncertain events in rule-based systems," IEEE Transactions on Knowledge and Data Engineering, 2010.
[3] S. Wasserkrug, A. Gal, and O. Etzion, "Inference of security hazards from event composition based on incomplete or uncertain information," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 8, pp. 1111-1114, 2008.
[4] L. Baresi, N. Georgantas, K. Hamann, V. Issarny, W. Lamersdorf, A. Metzger, and B. Pernici, "Emerging research themes in services-oriented systems," in SRII Global Conference, 2012.
[5] A. Metzger, C.-H. Chi, Y. Engel, and A. Marconi, "Research challenges on online service quality prediction for proactive adaptation," in Proceedings of the ICSE 2012 Workshop on European Software Services and Systems Research - Results and Challenges (S-Cube), 2012.
[6] A. Artikis, O. Etzion, Z. Feldman, and F. Fournier, "Event processing under uncertainty," in DEBS, 2012.