Invited Talks
-
Location: Athens, Greece (University of Athens, Greece)
Date: July 23rd, 2010.
Abstract:
In this talk I will present a family of algorithms for Top-K ranking of query results in a distributed environment. A Top-K query focuses on the subset of most relevant answers for two reasons: i) to minimize the cost metric that is associated with the retrieval of all answers; and ii) to improve the quality of the answer set such that the user is not overwhelmed with irrelevant results. I will start out by providing an overview of Top-K query processing algorithms for centralized and middleware systems. I will then highlight the limitations of these algorithms and focus on three novel algorithms we developed for networked environments (i.e., Peer-to-Peer Networks, Wireless Sensor Networks and Smartphone Networks). I will also present evaluation studies of these algorithms on: i) a Wireless Sensor Network testbed of 54 sensor devices; ii) a Peer-to-Peer testbed of 1000 peers deployed on 75 Linux workstations; and iii) a smartphone network deployed on Android-based smartphone devices. The talk will conclude with an overview of related research problems that I am currently working on and an outlook on future work.
-
Panel Chair: Anastasia Ailamaki (EPFL, Switzerland)
Location: Ayia Napa, Cyprus (HDMS'10 Symposium)
Date: July 3rd, 2010.
-
Location: Hawthorne, NY, USA (IBM T.J. Watson Research Center)
Date: May 27th, 2010.
Abstract:
In this talk I present the fundamental concepts behind distributed
Top-K query processing algorithms. A Top-K query focuses on the subset
of most relevant answers for two reasons: i) to minimize the cost
metric that is associated with the retrieval of all answers; and ii) to
improve the quality of the answer set such that the user is not
overwhelmed with irrelevant results. I will start out by providing an
overview of state-of-the-art Top-K query processing algorithms for
centralized and middleware systems. I will then highlight the
limitations of these algorithms and focus on two novel algorithms we
developed for networked environments (e.g., Wireless Sensor
Networks, Peer-to-Peer Networks and Vehicular Networks). I will also
present evaluation studies conducted on: i) a Peer-to-Peer testbed of
1000 peers deployed on 75 workstations; ii) a Wireless Sensor Network
testbed of 54 sensor devices; and iii) a Smartphone Network deployed on
a number of Android-based smartphone devices. The talk will conclude
with an overview of related research problems that I am currently
working on and an outlook to future applications of the presented
ideas.
-
Location: Kansas City, MO, USA (SSPC-WAN Workshop, 11th International Conference on Mobile Data Management (MDM 2010))
Date: May 23rd, 2010.
Abstract:
In this presentation I will describe a powerful, distributed
spatio-temporal query processing framework, coined HUB-K.
Our framework can be utilized to promptly answer queries of the form:
"Report the objects (i.e., trajectories) that follow a similar spatio-temporal
motion to Q, where Q is some query trajectory."
HUB-K relies on an in-situ data storage model, where spatio-temporal data remains
on the smartphone that generated it, as well as on state-of-the-art
top-k query processing algorithms, which exploit distributed
trajectory similarity measures in order to identify the correct answers promptly.
We present preliminary design choices, an outline of our preliminary implementation
and an outlook on future challenges.
-
Location: Dagstuhl, Germany (Seminar 10042: Semantic Challenges in Sensor Networks)
Date: January 26th, 2010.
Abstract:
The widespread deployment of mobile phones along with the massive
production of sensors for every aspect of modern life provides evidence
that Computer Science research and education will evolve dramatically
over the next few years. The boundaries between Mobile Devices and Sensor
Devices are nowadays blurring, as the former devices are already
equipped with a multitude of sensing capabilities, including GPS (which
enables the derivation of geospatial coordinates), accelerometers
(which enable the derivation of orientation, vibration and shock) and
an exciting set of other sensors (e.g., proximity sensors and ambient
light sensors), while more traditional sensors such as temperature,
acoustic, magnetometers and others will be integrated in these devices
very soon. That creates the notion of Mobile Sensor Devices that will
become even more ubiquitous than their predecessor "smart-phone"
devices.
In this talk, I will provide an overview and definitions of
Mobile-Sensor-Network (MSN) related platforms and applications. In
particular, I will show how applications in environmental monitoring,
body sensor networks, vehicular sensor networks and intelligent
transportation systems have brought a dramatic shift in how
spatio-temporal data is nowadays generated. I will then outline some
semantic challenges that arise in this context including: vastness,
uncertainty, data integration, query processing and privacy. I will
also address some more general challenges that currently hinder the
evolution and uptake of semantic MSNs.
-
Location: Barcelona, Spain (DAMA Group, Polytechnic University of Catalonia (UPC))
Date: December 15th 2008.
Abstract: In this talk I will present the fundamental concepts of
distributed Top-K query processing algorithms. A Top-K query focuses on
a subset of most relevant answers for two reasons: i) to minimize the
cost metric that is associated with the retrieval of all answers; and
ii) to improve the quality of the answer set such that the user is not
overwhelmed with irrelevant results. I will start out by providing an
overview of state-of-the-art Top-K query processing algorithms for
centralized DBMS systems. I will then highlight the limitations of
these algorithms and focus on the Threshold Join Algorithm (TJA), our
distributed top-k query processing algorithm designed for distributed
computing networks (e.g., Wireless Sensor Networks, Peer-to-Peer
Networks and Vehicular Networks). I will finally present an
evaluation study conducted with our middleware system deployed over a
network of 1000 peers on 75 workstations.
-
Location: Zurich, Switzerland (IBM Research, Zurich)
Date: December 12th 2008.
Abstract:
Wireless Sensor Networks offer a non-intrusive and non-disruptive
technology that enables users to monitor the physical world at an
extremely high fidelity. Research in this area has to this day
primarily focused on the trade-off between local computation and
communication in order to minimize the transfer of data over the
fundamentally expensive wireless link. In contrast, we focus on the
challenges of storing sensor readings locally at each node. This
In-Situ storage paradigm offers a novel perspective for conserving
energy in Wireless Sensor Networks as the communication channel is only
accessed for answering on-demand queries rather than for percolating
each and every event to a centralized database. Storing large
quantities of data locally at each sensor has to be complemented by
efficient access methods that will speed up the execution of queries
when required.
In this talk I will present MicroHash, an external memory index
structure that is tailored to the distinct characteristics of the most
prevalent type of non-volatile memory used in sensor systems, namely
flash memory. MicroHash exploits the asymmetric read/write
characteristics of flash memory in order to offer high performance
indexing and searching capabilities in the presence of energy and
storage media lifetime constraints.
-
Location: Zurich, Switzerland (ETH Zurich, Communication Systems Group (CSG))
Date: December 12th 2008.
Abstract: In this talk I will present the fundamental concepts of
distributed Top-K query processing algorithms. A Top-K query focuses on
a subset of most relevant answers for two reasons: i) to minimize the
cost metric that is associated with the retrieval of all answers; and
ii) to improve the quality of the answer set such that the user is not
overwhelmed with irrelevant results. I will start out by providing an
overview of state-of-the-art Top-K query processing algorithms for
centralized DBMS systems. I will then highlight the limitations of
these algorithms and focus on the Threshold Join Algorithm (TJA), our
distributed top-k query processing algorithm designed for distributed
computing networks (e.g., Wireless Sensor Networks, Peer-to-Peer
Networks and Vehicular Networks). I will finally present an
evaluation study conducted with our middleware system deployed over a
network of 1000 peers on 75 workstations.
-
Location: Beijing, China (The 9th International Conference on Mobile Data Management (MDM'08))
Date: April 27-30, 2008.
Abstract: Wireless Sensor Networks create an innovative technology that
enables users to monitor and study the physical world at an extremely
high resolution. Query processing in such ad-hoc environments is a
challenging task due to the complexities imposed by the inherent energy
and communication constraints. To this end, the research community has
proposed to take into account user-defined parameters in order to
derive the K most relevant (or Top-K) answers quickly and efficiently.
A Top-K query returns the subset of most relevant answers, in place of
all answers, for two reasons: i) to minimize the cost metric that is
associated with the retrieval of all answers; and ii) to improve the
recall and the precision of the answer set, such that the user is not
overwhelmed with irrelevant results.
This tutorial presents the fundamental concepts behind distributed
Top-K query processing and the adaptations of these algorithms to
distributed and wireless sensor networks. It additionally provides a
gentle overview of rudimentary and advanced techniques covering a
significant body of research in this domain. The tutorial will start
out with an overview of the most influential centralized and middleware
Top-K query processing algorithms and then proceed with an elaborate
description of distributed Top-K ranking algorithms for One-time Top-K
Queries, Continuous Top-K Queries and Approximate Top-K Queries.
Finally, it will provide an outlook to compelling future applications
that can be constructed on the foundation of these algorithms. Although
the tutorial is specifically geared towards Wireless Sensor Networks,
many of the presented ideas find extensions in other mobile
environments such as Ad-hoc Networks, Vehicular Networks and the Mobile
Web.
-
Location: Cambridge, UK (Microsoft Research Cambridge, Systems and Networking Group)
Date: January 11th 2008.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor the physical
world at an extremely high fidelity. Research in this area has to this
day primarily focused on the trade-off between local computation and
communication in order to minimize the transfer of data over the
fundamentally expensive wireless link. In contrast, we focus on the
challenges of storing sensor readings locally at each node. This
In-Situ storage paradigm offers a novel perspective for conserving
energy in Wireless Sensor Networks as the communication channel is only
accessed for answering on-demand queries rather than for percolating
each and every event to a centralized database. Storing large
quantities of data locally at each sensor has to be complemented by
efficient access methods that will speed up the execution of queries
when required. In this talk I will present MicroHash, an external
memory index structure that is tailored to the distinct characteristics
of the most prevalent type of non-volatile memory used in sensor
systems, namely flash memory. MicroHash exploits the asymmetric
read/write characteristics of flash memory in order to offer high
performance indexing and searching capabilities in the presence of
energy and storage media lifetime constraints.
-
Location:
Stockholm, Sweden (Department of Electronic, Computer and Software
Systems (ECS), KTH - Royal Institute of Technology).
Date: December 28th, 2006.
Abstract:
The emerging Peer-to-Peer (P2P) model has become a very powerful and
attractive paradigm for developing Internet-scale services for sharing
resources, including files and documents. The distributed nature of these
systems, where nodes are typically located across different networks and
domains, inherently hinders the efficient retrieval of information. In this
talk I will present techniques to perform content-based search over data
repositories that are geographically scattered over peers of different
networks. Data repositories in this context contain documents of text,
audio, video or other semi-structured data and the task is to locate a
certain set of keywords or multimedia features. We present the components of
the pFusion architecture, an open source system that builds on work in
unstructured P2P systems and topologically-aware overlay construction
techniques. Our empirical results using datasets from AKAMAI, NLANR and
TREC show that the architecture we propose is both efficient and practical.
In this talk I will also overview other related research activities in Grid,
P2P and Sensor systems that we are currently involved in.
-
Location: Crete, Greece (Institute of Computer Science (ICS) of the Foundation for Research and Technology - Hellas (FORTH)).
Date: June 8th, 2006.
Abstract:
Emerging applications in Sensor and Peer-to-Peer networks make the concept
of data integration without centralization nowadays more meaningful than
ever. In these environments, data is generated continuously and potentially
automatically across geographically diverse locations. Organizing data in
centralized repositories is becoming prohibitively expensive and on many
occasions impractical. Storing data in-situ, however, complicates query
processing because data relations are fragmented over a number of remote
sites. Furthermore, accessing these fragmented relations is only feasible by
traversing a network of other nodes. This makes the execution of a query an
even more complex task. We claim that on many occasions it might be more
beneficial to find the K highest ranked (or Top-K) answers, for some
user-defined parameter K, if this can minimize the query execution cost.
In this talk, I will present techniques to efficiently answer Top-K queries
in a distributed environment. A Top-K query returns the K highest ranked
answers according to a user-defined similarity function. At the same time it also
minimizes some cost metric, such as the utilization of the communication
medium, which is associated with the retrieval of the desired answer set. I
will provide an overview of state-of-the-art algorithms that solve the Top-K
problem in a centralized setting and show why these are not applicable to
the distributed case. I will then focus on the Threshold Join Algorithm
(TJA), which is a novel solution for executing Top-K queries in a
distributed environment. I will also present results from our performance
study with a real middleware testbed deployed over a network of 75
workstations.
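As an illustration of the kind of centralized/middleware Top-K processing the abstract above refers to, the following is a minimal Python sketch of the classic Threshold Algorithm (TA). It is not TJA, which the abstract does not describe in detail. The function name `threshold_top_k`, the dictionary-based sources and the choice of a sum as the scoring function are assumptions made only for this example; scores are assumed to be non-negative.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_top_k(sources: List[Dict[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k objects with the highest summed score over all sources.

    Hypothetical helper for illustration; assumes at least one source and
    non-negative scores (an object missing from a source scores 0 there).
    """
    # Sorted access: every source exposes its objects in descending score order.
    sorted_lists = [sorted(s.items(), key=lambda kv: -kv[1]) for s in sources]
    seen: Dict[str, float] = {}  # aggregated score of every object seen so far
    max_depth = max(len(lst) for lst in sorted_lists)

    for depth in range(max_depth):
        threshold = 0.0  # best total score any still-unseen object could reach
        for lst in sorted_lists:
            if depth < len(lst):
                obj, score = lst[depth]
                threshold += score
                if obj not in seen:
                    # Random access: fetch the object's score from every source.
                    seen[obj] = sum(s.get(obj, 0.0) for s in sources)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop early once k seen objects are at least as good as the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return top

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

The early stop is what saves cost: the lists are read only as deep as needed, rather than retrieving all answers. In a distributed setting each sorted or random access would be a network message, which is one reason centralized algorithms of this kind do not carry over directly.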
Talks in English
-
Location:
Indianapolis, Indiana, USA (The 9th International ACM Workshop on Data
Engineering for Wireless and Mobile Access (MobiDE'10), in conjunction
with ACM SIGMOD/PODS'10)
Date: Jun. 6th 2010.
Abstract: We present a novel distributed
algorithm (MHS) that constructs a query routing tree that minimizes
collisions during query execution. It was shown in previous work that
minimizing collisions during query execution saves a significant amount
of energy [1]. The same paper shows that balancing the node
degrees of a query routing tree significantly reduces collisions during
query execution.
We address the inefficiencies of the previously proposed algorithm and
propose a simpler, purely distributed, parameter-free, cheaper and more
efficient algorithm. Our resulting query trees are optimally balanced,
guarantee minimum collisions and minimum latency for query execution
and allow for opportunistic in-network processing. MHS imposes the
minimum possible communication overhead on the network and is
parameter-free as opposed to previously proposed algorithms. Our
proposed algorithm can be used for acquiring data from the nodes of any
distributed system where the main objective is to minimize the
communication cost.
-
Location:
Kansas City, USA (The 11th International Conference on Mobile Data
Management (MDM 2010), held in Kansas City, Missouri,
from May 23rd, 2010 to May 26th, 2010).
Date: May 25th 2010.
-
Location:
Lyon, France (The 6th Intl. Workshop on Data Management for Sensor
Networks (DMSN'09), in conjunction with VLDB'09, Lyon, France, 2009)
Date: Aug 24th 2009.
Abstract: In long-term deployments of
Wireless Sensor Networks, it is often more efficient to store sensor
readings locally at each device and transmit those readings to the user
only when requested (i.e., in response to a user query). Many of the
techniques that collect information from a sensor network require that
the data is sorted on some attribute (e.g., range queries, top-k
queries, join queries, etc.). Yet, the underlying storage medium of
these devices (i.e., Flash media) presents some unique characteristics
which render traditional disk-based sorting algorithms inefficient in
this context. In this paper we devise the FSort algorithm, an efficient
external sorting algorithm for flash-based sensor devices with a small
memory footprint. FSort minimizes the expensive write/delete operations
of flash memory, thereby minimizing the consumption of energy. In
particular, FSort uses a top-down replacement selection algorithm in
order to produce sorted runs on flash media in a log-based manner.
Sorted runs are then recursively merged in order to yield the sorted
result. Our experimentation with real traces from Intel Research
Berkeley shows that FSort greatly outperforms the traditional External
Mergesort Algorithm with regard to both time and energy consumption. We
found similar advantages with regard to the wearability constraints of
flash media.
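The run-generation and merge steps mentioned in the abstract above can be illustrated with the textbook, heap-based form of replacement selection. The sketch below is written in Python under stated assumptions: it is not FSort's top-down, flash-aware variant, and the name `replacement_selection_runs` and the `memory_slots` parameter (assumed to be at least 1) are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, List

_END = object()  # sentinel marking the end of the input stream


def replacement_selection_runs(readings: Iterable[int], memory_slots: int) -> List[List[int]]:
    """Split a stream into sorted runs while holding only memory_slots items."""
    stream = iter(readings)
    heap = list(islice(stream, memory_slots))  # fill the available memory
    heapq.heapify(heap)
    runs: List[List[int]] = []
    current: List[int] = []
    frozen: List[int] = []  # items too small for the current run

    while heap:
        smallest = heapq.heappop(heap)
        current.append(smallest)  # appended in order, like a log-style write
        nxt = next(stream, _END)
        if nxt is not _END:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)  # still fits into the current run
            else:
                frozen.append(nxt)  # must wait for the next run
        if not heap:
            # Current run is complete; start the next one from the frozen items.
            runs.append(current)
            current = []
            heap, frozen = frozen, []
            heapq.heapify(heap)
    return runs


def merge_runs(runs: List[List[int]]) -> List[int]:
    """Merge sorted runs into a single sorted result."""
    return list(heapq.merge(*runs))
```

Replacement selection tends to produce runs longer than the memory size, so fewer runs have to be written and merged, which is relevant when writes are the expensive operation.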
-
Location: Taipei, Taiwan (The 10th International Conference on Mobile Data Management (MDM'09))
Date: May 20th 2009.
Abstract: This paper assumes a set of n mobile sensors that move in the
Euclidean plane as a swarm. Our objectives are to explore a given
geographic region by detecting spatio-temporal events of interest and
to store these events in the network until the user requests them. Such
a setting finds applications in mobile environments where the user
(i.e., the sink) is infrequently within communication range from the
field deployment. Our framework, coined SenseSwarm, dynamically
partitions the sensing devices into perimeter and core nodes. Data
acquisition is scheduled at the perimeter, in order to minimize energy
consumption, while storage and replication take place at the core
nodes, which are physically and logically shielded from threats and
obstacles. To efficiently identify the nodes lying on the perimeter of
the swarm we devise the Perimeter Algorithm (PA), an efficient
distributed algorithm with a low communication complexity. For storage
and fault-tolerance we devise the Data Replication Algorithm (DRA), a
voting-based replication scheme that enables the exact retrieval of
events from the network in cases of failures. Our trace-driven
experimentation shows that our framework can offer significant energy
reductions while maintaining high data availability rates. In
particular, we found that when the failure rate is below 60%,
we can recover over 80% of the generated events exactly.
-
Location: Taipei, Taiwan (SenTIE'09 workshop - collocated with MDM'09)
Date: May 20th 2009.
Abstract: Continuous queries in Wireless Sensor Networks (WSNs) are
founded on the premise of Query Routing Tree structures (denoted as T),
which provide sensors with a path to the querying node. Predominant
data acquisition systems for WSNs construct such structures in an
ad-hoc manner and therefore there is no guarantee that a given query
workload will be distributed equally among all sensors. That leads to
data collisions which represent a major source of energy waste. In this
paper we present the Energy-driven Tree Construction (ETC) algorithm,
which balances the workload among nodes and minimizes data collisions,
thus reducing energy consumption, during data acquisition in WSNs. We
show through real micro-benchmarks on the CC2420 radio chip and
trace-driven experimentation with real datasets from Intel Research and
UC-Berkeley that ETC can provide significant energy reductions under a
variety of conditions, prolonging the longevity of a wireless sensor
network.
-
Location: HPCL, Department of Computer Science, University of Cyprus
Date: February 14th 2008.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor the physical
world at an extremely high fidelity. Research in this area has to this
day primarily focused on the trade-off between local computation and
communication in order to minimize the transfer of data over the
fundamentally expensive wireless link. In contrast, we focus on the
challenges of storing sensor readings locally at each node. This
In-Situ storage paradigm offers a novel perspective for conserving
energy in Wireless Sensor Networks as the communication channel is only
accessed for answering on-demand queries rather than for percolating
each and every event to a centralized database. Storing large
quantities of data locally at each sensor has to be complemented by
efficient access methods that will speed up the execution of queries
when required. In this talk I will provide an overview of recent
developments in Wireless Sensor Network Technology and highlight some
important data indexing and searching challenges that arise in this
context. In particular, I will present MicroHash which is an external
memory index structure that is tailored to the distinct characteristics
of flash memory, the most prevalent type of non-volatile memory used in
sensor systems.
-
Location: Hilton, Nicosia, Cyprus
Date: January 21st 2008.
Abstract: ICGrid (Intensive Care Grid), which is currently under
development by our group, is a distributed platform that enables the
seamless integration, correlation and retrieval of clinically
interesting episodes across Intensive Care Units. Such a task requires huge
processing and data storage capabilities, which are common attributes
of Grid infrastructures. ICGrid is based on a hybrid architecture that
combines i) a heterogeneous set of monitors that sense the inpatients
and ii) Grid technology that enables the storage, processing and
information sharing tasks between Intensive Care Units.
-
Location: Paris, France (CoreGRID Network of Excellence)
Date: January 15th 2008.
Abstract: The objective of Grid computing is to make processing power
as accessible and easy to use as electricity and water. The last decade
has seen an unprecedented growth in Grid infrastructures, which nowadays
enable large-scale deployment of applications in the scientific
computation domain. One of the main challenges in realizing the full
potential of Grids is to make these systems dependable. In this
presentation we introduce FailRank, a novel framework for
integrating and ranking information sources that characterize failures
in a Grid system. After the failing sites have been ranked, they can
be eliminated from the job scheduling resource pool, yielding
a more predictable and dependable infrastructure. We also present
the tools we developed for evaluating the FailRank framework. In
particular, we present the FailBase Repository, which is a 38GB
corpus of state information that characterizes the EGEE Grid for one
month in 2007. Such a corpus paves the way for the community to
systematically uncover new, previously unknown patterns and rules
between the multitude of parameters that can contribute to failures in
a Grid environment.
-
Location: Vienna, Austria (4th Intl. Workshop on Data Management for Sensor Networks DMSN'07 (with VLDB'07))
Date: Sep. 24th, 2007.
Abstract:
This paper assumes a set of n mobile sensors that move in the Euclidean plane
as a swarm. Our objectives are to explore a given geographic region by detecting and
aggregating spatio-temporal events of interest and to store these events in
the network until the user requests them. Such a setting finds applications
in environments where the user (i.e., the sink) is infrequently
within communication range from the field deployment.
Our framework, coined SenseSwarm, dynamically partitions the
sensing devices into perimeter and core nodes. Data acquisition
is scheduled at the perimeter in order to minimize energy consumption while
storage and replication take place at the core nodes, which are physically and logically
shielded from threats and obstacles. To efficiently identify the perimeter
of the swarm we devise the Perimeter Algorithm (PA), an efficient distributed
algorithm with a message complexity of O(p + n), where p denotes the number
of nodes on the perimeter and n the overall number of nodes. For storage and
replication we devise a spatio-temporal in-network aggregation scheme based on
minimum bounding rectangles and minimum bounding cuboids. Our trace-driven experimentation
shows that our framework can offer significant energy reductions while maintaining
high data availability rates.
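To make the aggregation primitive named in the abstract above concrete, here is a minimal Python sketch of minimum bounding boxes. It only shows how a bounding rectangle (x, y) or cuboid (x, y, t) is computed and how two summaries are merged; it is an assumption-laden illustration, not the SenseSwarm aggregation scheme itself, and the names `bounding_box` and `merge_boxes` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]   # (x, y) for a rectangle, (x, y, t) for a cuboid
Box = Tuple[Point, Point]   # (per-dimension minima, per-dimension maxima)


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned box that contains every point (non-empty input)."""
    dims = list(zip(*points))  # regroup the coordinates by dimension
    return tuple(min(d) for d in dims), tuple(max(d) for d in dims)


def merge_boxes(a: Box, b: Box) -> Box:
    """Smallest box covering both a and b, computed without the raw points."""
    mins = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    maxs = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return mins, maxs
```

Because `merge_boxes` needs only two fixed-size summaries, a node could combine the boxes received from other nodes without ever seeing the underlying events, which is the property that makes such summaries attractive for in-network aggregation.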
-
Location: Nicosia, Cyprus (Cyprus Summer School on Intelligent Systems)
Date: July 4th 2007.
Abstract:
In this talk I will introduce the distributed spatio-temporal similarity search problem: given
a query trajectory Q, we want to find the trajectories that follow a motion similar to Q,
when each of the target trajectories is segmented across a number of distributed nodes. We
propose two novel algorithms, UB-K and UBLB-K, which combine local computations of lower
and upper bounds on the matching between the distributed subsequences and Q. Such an operation
generates the desired result without pulling together all the distributed subsequences
over the fundamentally expensive communication medium. Our solutions find applications
in a wide array of domains, such as cellular networks, wildlife monitoring and video
surveillance. Our experimental evaluation using realistic data demonstrates that our
framework is both efficient and robust to a variety of conditions.
In this talk, I will also present techniques to efficiently answer Top-K queries
in a distributed environment. A Top-K query returns the K highest ranked
answers according to a user-defined similarity function. At the same time it also
minimizes some cost metric, such as the utilization of the communication
medium, which is associated with the retrieval of the desired answer set. I
will provide an overview of state-of-the-art algorithms that solve the Top-K
problem in a centralized setting and show why these are not applicable to
the distributed case. I will then focus on the Threshold Join Algorithm
(TJA), which is a novel solution for executing Top-K queries in a
distributed environment. I will also present results from our performance
study with a real middleware testbed deployed over a network of 75
workstations.
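The idea of combining lower and upper bounds, mentioned in the first part of the abstract above, can be illustrated with a generic pruning rule: a candidate whose upper bound is below the K-th largest lower bound cannot be among the K most similar trajectories. The Python sketch below shows only this rule. It is not UB-K or UBLB-K, and the function name `prune_by_bounds` and the input layout are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def prune_by_bounds(bounds: Dict[str, Tuple[float, float]], k: int) -> List[str]:
    """Return the ids of trajectories that may still belong to the top-k.

    bounds maps a trajectory id to (lower_bound, upper_bound) on its
    similarity to the query Q; higher means more similar.
    """
    if k <= 0:
        return []
    lower_bounds = sorted((lb for lb, _ in bounds.values()), reverse=True)
    if len(lower_bounds) <= k:
        return sorted(bounds)  # too few candidates, nothing can be pruned
    kth_lower = lower_bounds[k - 1]
    # At least k candidates score kth_lower or more, so anything that cannot
    # reach kth_lower is safely discarded without fetching its full data.
    return sorted(tid for tid, (_, ub) in bounds.items() if ub >= kth_lower)
```

Only the surviving candidates would need their exact similarity computed, which avoids pulling all distributed subsequences over the network.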
-
Location:
Crete, Greece (CoreGRID Workshop on Grid Programming Model, Grid and P2P
Systems Architecture, Grid Systems, Tools and Environments)
Date: June 12th 2007.
Abstract: The objective of Grid
computing is to make processing power as accessible and easy to use as
electricity and water. The last decade has seen an unprecedented growth
in Grid infrastructures, which nowadays enable large-scale deployment
of applications in the scientific computation domain. One of the main
challenges in realizing the full potential of Grids is to make these
systems dependable. In this paper we present FailRank, a
novel framework for integrating and ranking information sources that
characterize failures in a Grid system. After the failing sites have
been ranked, they can be eliminated from the job scheduling resource
pool, yielding a more predictable and dependable
infrastructure. We also present the tools we developed for
evaluating the FailRank framework. In particular, we present the
FailBase Repository, which is a 38GB corpus of state information that
characterizes the EGEE Grid for one month in 2007. Such a corpus paves
the way for the community to systematically uncover new, previously
unknown patterns and rules between the multitude of parameters that can
contribute to failures in a Grid environment.
-
Location:
Mannheim, Germany (IEEE First International Workshop on Data Intensive
Sensor Networks 2007, in conjunction with MDM 2007)
Date: May 11th 2007.
Abstract:
In this paper we present MicroPulse, a novel framework for adapting the waking window of a
sensing device S based on the data workload incurred by a query Q. Assuming a typical
tree-based aggregation scenario, the waking window is defined as the time interval t
during which S enables its transceiver in order to collect the results from its children.
Minimizing the length of t enables S to conserve energy that can be used to prolong the
longevity of the network and hence the quality of results. Our method is established on
profiling recent data acquisition activity and on identifying the bottlenecks using an
in-network execution of the Critical Path Method. We show through trace-driven experimentation
with a real dataset that MicroPulse can reduce the energy cost of the waking window by three
orders of magnitude.
-
Location: Mannheim, Germany (The 8th International Conference on Mobile Data Management (MDM'07))
Date: May 10th 2007.
Abstract:
In this paper we introduce MINT (Materialized In-Network Top-k) Views, a novel framework
for optimizing the execution of continuous monitoring queries in sensor networks.
A typical materialized view V maintains the complete results of a query Q in order
to minimize the cost of future query executions. In a sensor network context,
maintaining consistency between V and the underlying and distributed base relation
R is very expensive in terms of communication. Thus, our approach focuses on a
subset V' (V' ⊆ V) that unveils only the k highest-ranked answers at the
sink for some user defined parameter k. We additionally provide an elaborate
description of energy-conscious algorithms for constructing, pruning and maintaining
such recursively-defined in-network views. Our trace-driven experimentation with real
datasets shows that MINT offers significant energy reductions compared to other
predominant data acquisition models.
-
Location: Nicosia, Cyprus (EPL 671 - Computer Science: Research and Technology)
Date: March 20th, 2007.
Abstract:
In this talk, I will present techniques to efficiently answer Top-K queries
in a distributed environment. A Top-K query returns the K highest ranked
answers according to a user-defined similarity function. At the same time it also
minimizes some cost metric, such as the utilization of the communication
medium, which is associated with the retrieval of the desired answer set. I
will provide an overview of state-of-the-art algorithms that solve the Top-K
problem in a centralized setting and show why these are not applicable to
the distributed case. I will then focus on the Threshold Join Algorithm
(TJA), which is a novel solution for executing Top-K queries in a
distributed environment. I will also present results from our performance
study with a real middleware testbed deployed over a network of 75
workstations.
-
Location: Nicosia, Cyprus (EPL651 - Data Management for Mobile Computing, Department of Computer Science (UCY)).
Date: April 26th 2007.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor and understand
the physical world at an extremely high fidelity. Research to this day
has primarily focused on the trade-off between local computation and
communication, in order to offset the expensive transfer of data over
the fundamentally unreliable wireless link. In contrast, we focus
on the challenges of storing sensor readings locally at each node. This
In-Situ storage paradigm offers a novel perspective for conserving
energy, as we access the communication channel to answer on-demand
queries rather than for percolating each and every event to a
centralized database. Storing large quantities of data locally at each
node has to be complemented by efficient index structures that will
enable access to data when required.
In this talk we present MicroHash, an external memory index
structure which is tailored to the distinct characteristics of the most
prevalent type of non-volatile memory used in sensor systems, namely
flash memory. Our index structure exploits the asymmetric read/write
and wear characteristics of flash memory in order to offer high
performance indexing and searching capabilities in the presence of a
low energy budget.
-
Location: Sophia-Antipolis, France (CoreGRID Industrial Conference).
Date: December 1st, 2006. (Best Demo Award)
Abstract:
Intensive Care Units (ICUs) at hospitals utilize cutting edge
technology in order to acquire, at an extremely high fidelity, the
physiological state of inpatients who are in a critical
(life-threatening) condition. In particular, ICUs utilize a very large
number of monitoring and sensing devices that are continuously attached
to inpatients in order to uncover their physiological state. Such
measurements can then be utilized for i) education, ii) early diagnosis
and iii) defining early warning systems that identify when a human life
is in jeopardy. A problem with the current
setting is that individual ICUs are limited to the locally acquired
measurements. As a result, the number of clinically "interesting"
episodes available to doctors is also very limited.
ICGrid (Intensive Care Grid), which is currently under development by
our group, is a distributed platform that enables the seamless
integration, correlation and retrieval of clinically interesting
episodes across Intensive Care Units. Such a task requires huge processing
and data storage capabilities, which are common attributes of Grid
infrastructures. ICGrid is based on a hybrid architecture that combines
i) a heterogeneous set of monitors that sense the inpatients and ii)
Grid technology that enables the storage, processing and information
sharing task between Intensive Care Units. Our demonstration aims at
presenting the first part of the hybrid architecture of ICGrid (i.e.
the acquisition of real signals from inpatients and their storage on
the Grid). Our demonstration platform operates on a standalone laptop.
In a real setting, this software is able to extract the physiological
parameters from monitoring devices installed at ICUs.
``Business Processes: Behavior Prediction and Capturing Reasons for Evolution''
Location: Paphos, Cyprus ("8th International Conference on Enterprise Information Systems")
Date: May 24th, 2006
Location: Nicosia, Cyprus (Computer Science Colloquium Series, University of Cyprus)
Date: March 31st, 2006.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor and understand
the physical world at an extremely high fidelity. Research to this day
has primarily focused on the trade-off between local computation and
communication, in order to offset the expensive transfer of data over
the fundamentally unreliable wireless link. In contrast, we focus
on the challenges of storing sensor readings locally at each node. This
In-Situ storage paradigm offers a novel perspective for conserving
energy, as we access the communication channel to answer on-demand
queries rather than for percolating each and every event to a
centralized database. Storing large quantities of data locally at each
node has to be complemented by efficient index structures that will
enable access to data when required.
In this talk we present MicroHash, an external memory index
structure which is tailored to the distinct characteristics of the most
prevalent type of non-volatile memory used in sensor systems, namely
flash memory. Our index structure exploits the asymmetric read/write
and wear characteristics of flash memory in order to offer high
performance indexing and searching capabilities in the presence of a
low energy budget.
Location: Nicosia, Cyprus (eNEXT Workshop on Sensor and Ad-hoc Networks)
Date: March 14th, 2006.
``Global Internet Content Delivery''
Location:
Nicosia, Cyprus ("EPL602 - Programming of Internet Systems and
Services", Department of Computer Science, University of Cyprus)
Date: November 22nd, 2005.
Location: Nicosia, Cyprus (Computer Science Colloquium Series, University of Cyprus)
Date: November 16th, 2005.
Abstract:
Modern Sensor and Peer-to-Peer data management systems have to cope
with data that is generated automatically and continuously across
distributed and potentially geographically diverse locations.
Organizing data in centralized repositories is becoming increasingly
expensive and on many occasions impractical. Additionally, users are
usually only interested in finding the highest ranked answers to their
queries rather than the complete range of answers.
In this talk, I will present efficient techniques to answer Top-K
queries in a distributed environment. A Top-K query returns the K
highest-ranked answers according to a user-defined similarity function. At the
same time it also minimizes some cost metric which is associated with
the retrieval of the desired answer set. My talk focuses on the
Threshold Join Algorithm (TJA), which is a novel distributed Top-K
query processing algorithm that combines local similarity scores
available at each computing site. I will also present the LB-K and
UBLB-K algorithms which utilize lower and upper bounds, when exact
scores are not available. An extensive experimental evaluation with our
distributed middleware testbed reveals that the proposed methods are
orders of magnitude more efficient than their competitors.
Location: Toronto, Canada (DBISP2P'04 (VLDB Conference))
Date: September 2004
Abstract: We initiate a study on the effect of the network topology on
the performance of Peer-to-Peer (P2P) information retrieval systems.
The emerging P2P model has become a very powerful and attractive
paradigm for developing Internet-scale systems for sharing resources,
including files, or documents. We show that the performance of
Information Retrieval algorithms can be significantly improved through
the use of fully distributed topologically aware overlay network
construction techniques. Our empirical results, using the Peerware
middleware infrastructure, show that the approach we propose is both
efficient and practical.
Location: McLean VA, USA (The ACM Conference on Information and Knowledge Management)
Date: November 2002.
Abstract: One important problem in peer-to-peer (P2P) networks is
searching and retrieving the correct information. However, existing
searching mechanisms in pure peer-to-peer networks are inefficient due
to the decentralized nature of such networks. We propose two mechanisms
for information retrieval in pure peer-to-peer networks. The first, the
modified Breadth-First-Search (BFS) mechanism, is an extension of the
current Gnutella protocol, allows searching with keywords, and is
designed to minimize the number of messages that are needed to search
the network. The second, the Intelligent Search mechanism, uses the
past behavior of the P2P network to further improve the scalability of
the search procedure. In this algorithm, each peer autonomously decides
which of its peers are most likely to answer a given query. The
algorithm is entirely distributed, and therefore scales well with the
size of the network. We implemented our mechanisms as middleware
platforms. To show the advantages of our mechanisms we present
experimental results using the middleware implementation.
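The idea behind the Intelligent Search mechanism, where each peer uses past behavior to decide which of its neighbors are most likely to answer a query, can be illustrated with a small sketch. Everything here is a hypothetical simplification: the `Peer` class, its method names and the keyword-overlap score are assumptions chosen for brevity, not the ranking measure used in the talk.

```python
from collections import defaultdict


class Peer:
    """A peer that remembers which neighbors answered which past queries."""

    def __init__(self):
        # neighbor id -> keyword sets of past queries that neighbor answered
        self.history = defaultdict(list)

    def record_answer(self, neighbor, query_keywords):
        """Remember that `neighbor` returned results for this query."""
        self.history[neighbor].append(set(query_keywords))

    def rank_neighbors(self, query_keywords, fanout):
        """Return up to `fanout` neighbors, most promising first.

        Assumption: a neighbor's score is the total keyword overlap between
        the new query and the past queries it answered. Ties are broken by
        neighbor id so that the result is deterministic.
        """
        query = set(query_keywords)

        def score(neighbor):
            return sum(len(query & past) for past in self.history[neighbor])

        ranked = sorted(self.history, key=lambda n: (-score(n), n))
        return ranked[:fanout]


peer = Peer()
peer.record_answer("n1", ["jazz", "music"])
peer.record_answer("n2", ["linux", "kernel"])
peer.record_answer("n3", ["jazz"])
targets = peer.rank_neighbors(["jazz", "music"], fanout=2)
```

Forwarding the query only to `targets` rather than to every neighbor is what reduces the number of messages; because each peer ranks its neighbors using only its own local history, no central coordination is needed.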
``A Quantitative Analysis of the Gnutella Network Traffic''
Location: Nicosia, Cyprus (Computer Science Colloquium Series, University of Cyprus)
Date: July 2002.
Talks in Greek
-
Location: Ayia Napa, Cyprus (The 9th Hellenic Data Management Symposium (HDMS'10))
Date: July 2nd, 2010.
Abstract:
In this paper we present a distributed algorithm for constructing a balanced communication tree intended for collecting data from a wireless sensor network. The algorithm has minimal execution cost, and the resulting communication tree has near-optimal balance. During data collection, every collision between packets causes them to be retransmitted. Distributing node degrees evenly across the communication tree minimizes these collisions and consequently saves energy and extends the lifetime of the wireless sensor network. We compare our algorithm with an existing algorithm and with a centralized algorithm. The results show that our algorithm outperforms the competition for the majority of network topologies and achieves near-optimal balance in the tree. It also has the minimum possible execution cost, thereby contributing even further to energy savings in the network.
-
Nicosia, Cyprus (Informatics Workshop for Lyceum and Technical School Students, Saturday, 6/5/2010, University Campus).
Date: March 6th, 2010.
-
Nicosia, Cyprus ("EPL202 - Explorations in Computer Science",
Undergraduate Course, Department of Computer Science, University of
Cyprus)
Date: Sept 30th, 2009.
-
Location: Heraklion, Crete (The 7th Hellenic Data Management Symposium (HDMS'08))
Date: July 29th, 2008.
Abstract: Continuous queries in wireless sensor networks are
established on the premise of a routing tree that provides each sensor
with a path over which answers can be transmitted to the query
processor. The number of tuples received by a sensor S in a given epoch e
denotes the workload of S. Since the exact workload of a node is not
known ahead of time, a node has to enable its transceiver for a
sub-optimal amount of time in order to collect the results from its
children. We found that this leads to an enormous waste of energy in
predominant data acquisition frameworks such as TAG and Cougar.
In this paper we present MicroPulse, a workload-aware
optimization algorithm for query routing trees in wireless sensor
networks. Our algorithm is established on profiling recent data
acquisition activity and on identifying the bottlenecks using an
in-network execution of the critical path method. A node S utilizes
this information in order to locally derive the time instance during
which it should wake up, the interval during which it should deliver
its workload and the workload increase tolerance of its parent node. We
additionally provide an elaborate description of energy-conscious
algorithms for disseminating and maintaining the critical path cost in
a distributed manner. Our trace-driven experimentation with real sensor
traces from Intel Research Berkeley shows that MicroPulse can reduce
the data acquisition costs by many orders of magnitude.
-
Location:
Nicosia, Cyprus ("EPL601 - Distributed Systems", Graduate Course,
Department of Computer Science, University of Cyprus)
Date: November 16th, 2007.
-
Location: Athens, Greece (The 6th Hellenic Data Management Symposium (HDMS'07))
Date: July 5th, 2007.
Abstract:
In this paper we introduce the distributed spatio-temporal similarity search problem: given
a query trajectory Q, we want to find the trajectories that follow a motion similar to Q,
when each of the target trajectories is segmented across a number of distributed nodes. We
propose two novel algorithms, UB-K and UBLB-K, which combine local computations of lower
and upper bounds on the matching between the distributed subsequences and Q. Such an operation
generates the desired result without pulling together all the distributed subsequences
over the fundamentally expensive communication medium. Our solutions find applications
in a wide array of domains, such as cellular networks, wildlife monitoring and video
surveillance. Our experimental evaluation using realistic data demonstrates that our
framework is both efficient and robust to a variety of conditions.
-
Location: Athens, Greece (The 6th Hellenic Data Management Symposium (HDMS'07))
Date: July 5th, 2007.
Abstract:
In this paper we introduce MINT (Materialized In-Network Top-k) Views, a novel framework
for optimizing the execution of continuous monitoring queries in sensor networks.
A typical materialized view V maintains the complete results of a query Q in order
to minimize the cost of future query executions. In a sensor network context,
maintaining consistency between V and the underlying and distributed base relation
R is very expensive in terms of communication. Thus, our approach focuses on a
subset V' (⊆ V) that unveils only the k highest-ranked answers at the
sink for some user-defined parameter k. We additionally provide an elaborate
description of energy-conscious algorithms for constructing, pruning and maintaining
such recursively-defined in-network views. Our trace-driven experimentation with real
datasets shows that MINT offers significant energy reductions compared to other
predominant data acquisition models.
-
Location:
Nicosia, Cyprus ("EPL601 - Distributed Systems", Graduate Course,
Department of Computer Science, University of Cyprus)
Date: October 12th, 2006.