Data Profiling mit Eclipse. Von den Grundlagen zum Prototypen (German Edition)

Diplomica Verlag, Paperback.

Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs.

We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.
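As a rough illustration of how link significance could be assessed, the sketch below computes a p-value for each observed multi-edge under a plain hypergeometric urn model. It assumes the simplest setting, in which the number of possible edges between two nodes is the product of the source's out-degree and the target's in-degree, with no further link propensities. That setting, and the function names `hypergeom_sf` and `link_p_values`, are assumptions made for this example and are not taken from the text.

```python
from collections import Counter
from math import comb


def hypergeom_sf(a, M, K, m):
    """P(X >= a) when m balls are drawn without replacement from an urn
    of M balls, K of which belong to the node pair of interest."""
    total = comb(M, m)
    upper = min(K, m)
    return sum(comb(K, x) * comb(M - K, m - x) for x in range(a, upper + 1)) / total


def link_p_values(edges):
    """edges: list of (source, target) pairs, repeated for multi-edges.
    Returns {(source, target): p-value of seeing at least the observed count}."""
    counts = Counter(edges)
    out_deg = Counter(s for s, _ in edges)
    in_deg = Counter(t for _, t in edges)
    m = len(edges)
    # Assumed urn size: sum over all ordered pairs of out_deg * in_deg (= m * m).
    M = sum(out_deg.values()) * sum(in_deg.values())
    return {
        (s, t): hypergeom_sf(a, M, out_deg[s] * in_deg[t], m)
        for (s, t), a in counts.items()
    }


# Hypothetical toy data: a strongly repeated link a->b and a few single links.
toy_edges = [("a", "b")] * 5 + [("a", "c"), ("b", "c"), ("c", "a")]
for link, p in sorted(link_p_values(toy_edges).items()):
    print(link, round(p, 3))
```

A small p-value marks a link that occurs more often than node degrees alone would suggest. The exact summation is only practical for small examples.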

Two node variables determine the evolution of cascades in random networks: Correlations between both fundamentally change the robustness of a network, yet they are disregarded in standard analytic methods such as local tree or heterogeneous mean field approximations because order statistics are hard to treat analytically.

We show how they become tractable in the thermodynamic limit of infinite network size. This enables the analytic description of node attacks that are characterized by threshold allocations based on node degree. Using two examples, we discuss possible implications of irregular phase transitions and different speeds of cascade evolution for the control of cascades. We analyze large-scale data sets about collaborations from two different domains: Considering the different domains of the data sets, we address two questions: In our data-driven modeling approach we use aggregated network data to calibrate the probabilities at which agents establish collaborations with either newcomers or established agents.

The model is then validated by its ability to reproduce network features not used for calibration, including distributions of degrees, path lengths, local clustering coefficients and sizes of disconnected components. Emphasis is put on comparing domains, but also sub-domains (economic sectors, scientific specializations).

Our results shed new light on the long-standing question about the role of endogenous and exogenous factors i. It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age since older papers had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count.

We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. Our statistical framework to assess ranking bias allows us to exactly quantify the contributions of each individual field to the overall bias of a given ranking. We propose a general normalization procedure motivated by the z-score which produces much less biased rankings when applied to citation count and PageRank score.
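A z-score style rescaling can be sketched as follows. The sketch groups papers by exact (field, year) pairs and standardizes each score within its group. This grouping rule and the record layout are assumptions made for illustration, and the procedure in the text may define comparison groups differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rescale_scores(papers):
    """papers: list of dicts with keys 'id', 'field', 'year', 'score'.
    Returns {id: z-score of the paper within its (field, year) group}."""
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["score"])

    rescaled = {}
    for p in papers:
        scores = groups[(p["field"], p["year"])]
        mu, sigma = mean(scores), pstdev(scores)
        # A group with no spread carries no ranking information; map it to 0.
        rescaled[p["id"]] = (p["score"] - mu) / sigma if sigma > 0 else 0.0
    return rescaled


# Hypothetical records: two fields with very different citation levels.
toy_papers = [
    {"id": "b1", "field": "biology", "year": 2010, "score": 10},
    {"id": "b2", "field": "biology", "year": 2010, "score": 20},
    {"id": "b3", "field": "biology", "year": 2010, "score": 30},
    {"id": "m1", "field": "math", "year": 2010, "score": 1},
    {"id": "m2", "field": "math", "year": 2010, "score": 2},
    {"id": "m3", "field": "math", "year": 2010, "score": 3},
]
z = rescale_scores(toy_papers)
print(z)
```

In the toy data, the top paper of each field receives the same rescaled score even though the raw counts differ by a factor of ten, which is the kind of field bias the normalization is meant to suppress.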

In this manuscript, we show how higher-order graphical models can be applied to study the controllability of networked systems with dynamic topologies. Studying empirical data on temporal networks, we specifically show that the order correlations in the activation sequence of links can both increase or reduce the time needed to achieve full controllability.

We then demonstrate how spectral properties of higher-order graphical models can be used to analytically explain the effect of order correlations on controllability in temporal networks. We introduce a framework for the modeling of sequential data capturing pathways of varying lengths observed in a network.

Such data are important, e. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: Addressing this open question, we propose a framework which combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms previously used Markov order detection techniques.

An application to eight real-world data sets on pathways and temporal networks shows that it allows us to infer graphical models that capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. Generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms.

We introduce a statistical method to investigate the impact of dyadic relations on complex networks generated from repeated interactions. It is based on generalised hypergeometric ensembles, a class of statistical network ensembles developed recently. We represent different types of known relations between system elements by weighted graphs, separated in the different layers of a multiplex network. With our method we can regress the influence of each relational layer, the independent variables, on the interaction counts, the dependent variables.

Moreover, we can test the statistical significance of the relations as explanatory variables for the observed interactions. To demonstrate the power of our approach and its broad applicability, we will present examples based on synthetic and empirical data. In this article we study to what extent the academic peer review process is influenced by social relations between the authors of a manuscript and the editor handling the manuscript.

Taking the Open Access journal PlosOne as a case study, our analysis is based on a data set of more than , articles published between and Using available data on handling editor, submission and acceptance time of manuscripts, we study the question whether co-authorship relations between authors and the handling editor affect the manuscript handling time, i. Our analysis reveals (i) that editors handle papers co-authored by previous collaborators significantly more often than expected at random, and (ii) that such prior co-author relations are significantly related to faster manuscript handling.

Addressing the question whether these shorter manuscript handling times can be explained by the quality of publications, we study the number of citations and downloads which accepted papers eventually accumulate.

by Björn Knebel

Our findings show that, even when correcting for other factors like time, experience, and performance, prior co-authorship relations have a large and significant influence on manuscript handling times, speeding up the editorial decision on average by 19 days. Full understanding of the underlying mechanisms, however, remains a challenging task. Using agent-based computer simulations, in this work we study the dynamics of emotional communications in online social networks. The rules that guide how the agents interact are motivated by actual online social systems. The realistic network structure and some key parameters are inferred from the empirical dataset compiled from the MySpace social network.

Our results indicate that group behavior may arise from individual emotional actions of agents; that collective states appear which are characterized by temporal correlations and predominantly positive emotions, in analogy to the empirical system; and that the driving signal, the rate at which users step into the online world, has a profound effect on building the coherent behaviors that are observed in online social networks.

Moreover, our simulations suggest that spreading patterns may differ between emotions with entirely different positive and negative emotional content. Online communication takes a variety of shapes in the different technological media that allow users to interact with each other, with their friends, or with arbitrarily large groups. These serve as breeding grounds for collective emotions, in which large numbers of users share emotional states through time. We present our modeling framework for collective emotions in online communities, which can be adapted for the different kinds of online interaction present in cyberspace.

This approach aims at a unification of modeling efforts, connecting the sentiment analysis of big data with psychological experiments, through tractable agent-based models. We illustrate the applications of this framework to different online communities, including product reviews, chatrooms, virtual realities, and social networking sites. We show how our model reproduces properties of collective emotions in the reviews of Amazon, and the group discussions of IRC channels.

We comment on the applications of this framework for data-driven simulation of emotions, and on how we formulate testable hypotheses of emotion dynamics for future research in the field. Computer-mediated communication between humans is at the center of the formation of collective emotions on the Internet. This chapter presents how interactive affective systems can be applied in order to study the role of emotion in online communication at the micro-scale, i. Based on these findings, we propose applications for such systems focused on supporting different e-communities with real-time information and discuss ethical implications of such systems.

Using a large patent dataset, we demonstrate that coreness values strongly correlate with the number of patents of a firm. Analyzing coreness differences between firms and their partners, we identify a change in selecting partners: After that, well integrated firms with low coreness choose preferably partners with high coreness, either newcomers or firms from the periphery.
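For readers unfamiliar with coreness, the sketch below computes it with the standard peeling procedure on an unweighted, undirected graph. Whether the analysis in the text uses exactly this unweighted variant is an assumption, and the firm names in the toy graph are hypothetical.

```python
def coreness(adj):
    """adj: dict mapping each node to the set of its neighbours (undirected).
    Returns {node: core number} by repeatedly peeling a minimum-degree node."""
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Hypothetical alliance network: a triangle of firms plus one peripheral partner.
toy_network = {
    "firm_a": {"firm_b", "firm_c", "firm_d"},
    "firm_b": {"firm_a", "firm_c"},
    "firm_c": {"firm_a", "firm_b"},
    "firm_d": {"firm_a"},
}
print(coreness(toy_network))
```

The three mutually connected firms end up in the 2-core, while the peripheral partner has coreness 1. This simple version runs in quadratic time; bucket-based implementations are preferable for large networks.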

We use the agent-based model to test whether this change in behavior needs to be explained by means of strategic considerations, i. We find that the observed behavior can be well reproduced without such strategic considerations, this way challenging the role of strategies in explaining macro patterns of collaborations. We highlight the existence of a stronger positive FDI relationship in pairs of countries that are more central in the migration network. Both intensive and extensive forms of centrality are FDI enhancing.

We test the existence of anticipated shocks in online activity, a class of collective dynamics that does not fit in the state-of-the-art theory on social response functions. We use data on shares and views of YouTube videos, measuring their time series to classify them according to their dynamical class. We find evidence of the existence of anticipated shocks, and that they are more likely to appear in word-of-mouth interaction than in attention dynamics. Our results show that not all exogenous events in online activity are unexpected, calling for new models that differentiate social interaction and attention dynamics.

The pervasive presence of online media in our society has transferred a significant part of political deliberation to online forums and social networking sites. This article examines popularity, reputation, and social influence on Twitter using large-scale digital traces from to We process network information on more than 40 million users, calculating new global measures of reputation that build on the D-core decomposition and the bow-tie structure of the Twitter follower network.

We integrate our measurements of popularity, reputation, and social influence to evaluate what keeps users active, what makes them more popular, and what determines their influence. We find that there is a range of values in which the risk of a user becoming inactive grows with popularity and reputation. Popularity in Twitter resembles a proportional growth process that is faster in its strongly connected component, and that can be accelerated by reputation when users are already popular. We find that social influence on Twitter is mainly related to popularity rather than reputation, but that this growth of influence with popularity is sublinear.

The explanatory and predictive power of our method shows that global network metrics are better predictors of inactivity and social influence, calling for analyses that go beyond local metrics like the number of followers. We study the changes in emotional states induced by reading and participating in online discussions, empirically testing a computational model of online emotional interaction. Using principles of dynamical systems, we quantify changes in valence and arousal through subjective reports, as recorded in three independent studies including participants female.

In the context of online discussions, the dynamics of valence and arousal are composed of two forces: The dynamics of valence show the existence of positive and negative tendencies, while arousal increases when reading emotional content regardless of its polarity.

The tendency of participants to take part in the discussion increases with positive arousal. When participating in an online discussion, the content of participants' expression depends on their valence, and their arousal significantly decreases afterwards as a regulation mechanism. We illustrate how these results allow the design of agent-based models to reproduce and analyze emotions in online communities. Our work empirically validates the microdynamics of a model of online collective emotions, bridging online data analysis with research in the laboratory. Statistical ensembles define probability spaces of all networks consistent with given aggregate statistics and have become instrumental in the analysis of relational data on networked systems.

Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis testing. Contributing to the foundation of these important data science techniques, in this article we introduce generalized hypergeometric ensembles, a framework of analytically tractable statistical ensembles of finite, directed and weighted networks.

This framework can be interpreted as a generalization of the classical configuration model, which is commonly used to randomly generate networks with a given degree sequence or distribution. Our generalization rests on the introduction of dyadic link propensities, which capture the degree-corrected tendencies of pairs of nodes to form edges between each other.

Studying empirical and synthetic data, we show that our approach provides broad perspectives for community detection, model selection and statistical hypothesis testing. We study the influence of risk diversification on cascading failures in weighted complex networks, where weighted directed links represent exposures between nodes.

These weights result from different diversification strategies and their adjustment allows us to reduce systemic risk significantly by topological means. As an example, we contrast a classical exposure diversification (ED) approach with a damage diversification (DD) variant. The latter reduces the loss that the failure of high-degree nodes generally inflicts on their network neighbors and thus hampers the cascade amplification. To quantify the final cascade size and obtain our results, we develop a branching process approximation taking into account that inflicted losses can depend not only on properties of the exposed node, but also on those of the failing node.

This analytic extension is a natural consequence of the paradigm shift from individual to system safety. To deepen our understanding of the cascade process, we complement this systemic perspective by a mesoscopic one: Additionally, we ask for the role of these failures in the cascade amplification.

Contributing to the writing of history has never been as easy as it is today thanks to Wikipedia, a community-created encyclopedia that aims to document the world's knowledge from a neutral point of view. Though everyone can participate it is well known that the editor community has a narrow diversity, with a majority of white male editors.

While this participatory gender gap has been studied extensively in the literature, this work sets out to assess potential gender inequalities in Wikipedia articles along different dimensions: We find that (i) women in Wikipedia are more notable than men, which we interpret as the outcome of a subtle glass ceiling effect; (ii) family-, gender- and relationship-related topics are more present in biographies about women; (iii) linguistic biases manifest in Wikipedia since abstract terms tend to be used to describe positive aspects in the biographies of men and negative aspects in the biographies of women; and (iv) there are structural differences in terms of meta-data and hyperlinks, which have consequences for information-seeking activities.

While some differences are expected, due to historical and social contexts, other differences are attributable to Wikipedia editors. The implications of such differences are discussed, especially with Wikipedia contribution policies in mind. We hope that our work helps to increase awareness about, first, gender issues in the content of Wikipedia, and second, the different levels on which gender biases can manifest on the Web.

We find that most network properties are not only invariant across sectors, but also independent of the scale of aggregation at which they are observed, and we highlight the presence of core-periphery architectures in explaining some properties emphasized in previous empirical studies e. We find that such dynamics is driven by mechanisms of accumulative advantage, structural homophily and multiconnectivity.

In particular, the change from the "rise" to the "fall" phase is associated with a structural break in the importance of multiconnectivity. Extant research has pointed out that firms select alliance partners considering both network-related and network-unrelated features e. In our agent-based model, firms are located in a metric knowledge space.

The interaction rules incorporate an exploration phase and a knowledge transfer phase, during which firms search for a new partner and then evaluate whether they can establish an alliance to exchange their knowledge stocks. The model parameters determining the overall system properties are the rate at which alliances form and dissolve and the agents' interaction radius. Next, we define a novel indicator of performance, based on the distance traveled by the firms in the knowledge space.

Remarkably, we find that - depending on the alliance formation rate and the interaction radius - firms tend to cluster around one or more attractors in the knowledge space, whose position is an emergent property of the system. And, more importantly, we find that there exists an inverted U-shaped dependence of the network performance on both model parameters.

Through the analysis of collective upvotes and downvotes in multiple social media, we discover the bimodal regime of collective evaluations. When online content surpasses the local social context by reaching a threshold of collective attention, negativity grows faster with positivity, which serves as a trace of the burst of a filter bubble. We show that emotions expressed in online content have a significant effect on attaining a global audience and also play a key role in creating polarized opinions.

Urban structures encompass settlements, characterized by the spatial distribution of built-up areas, and also transportation structures, to connect these built-up areas. These two structures are very different in their origin and function, fulfilling complementary needs: Their evolution cannot be understood by looking at the dynamics of urban aggregations and transportation systems separately.

Instead, existing built-up areas feed back on the further development of transportation structures, and the availability of the latter feeds back on the future growth of urban aggregations.

To model this co-evolution, we propose an agent-based approach that builds on existing agent-based models for the evolution of trail systems and urban settlements. The key element in these separate approaches is a generalized communication of agents by means of an adaptive landscape. This landscape is only generated by the agents, but once it exists, it feeds back on their further actions.

The emerging trail system or urban aggregation results as a self-organized structure from these collective interactions.

In our co-evolutionary approach, we couple these two separate models by means of meta-agents that represent humans with their different demands for housing and mobility. We characterize our approach as a statistical ensemble approach, which allows us to capture the potential of urban evolution in a bottom-up manner, but can be validated against empirical observations. We study cascades on a two-layer multiplex network, with asymmetric feedback that depends on the coupling strength between the layers.

Based on an analytical branching process approximation, we calculate the systemic risk measured by the final fraction of failed nodes on a reference layer. The results are compared with the case of a single layer network that is an aggregated representation of the two layers.

We find that systemic risk in the two-layer network is smaller than in the aggregated one only if the coupling strength between the two layers is small. Above a critical coupling strength, systemic risk is increased because of the mutual amplification of cascades in the two layers. We even observe sharp phase transitions in the cascade size that are less pronounced on the aggregated layer. Our insights can be applied to a scenario where firms decide whether they want to split their business into a less risky core business and a more risky subsidiary business.
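The quantity used above as systemic risk, the final fraction of failed nodes, can be illustrated with a small simulation. The sketch assumes a single-layer fractional threshold model in which a node fails once the failed share of its neighbours reaches its threshold. It does not reproduce the two-layer coupling or the branching process approximation described in the text, and all names in it are hypothetical.

```python
def cascade_fraction(adj, threshold, seeds):
    """adj: dict node -> list of neighbours; threshold: dict node -> value in [0, 1];
    seeds: iterable of initially failed nodes.
    Returns the final fraction of failed nodes once no further node fails."""
    failed = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in adj.items():
            if node in failed or not neighbours:
                continue
            failed_share = sum(n in failed for n in neighbours) / len(neighbours)
            if failed_share >= threshold[node]:
                failed.add(node)
                changed = True
    return len(failed) / len(adj)


# Hypothetical chain of four nodes, with node 0 failing initially.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
fragile = {node: 0.5 for node in chain}
robust = {node: 0.6 for node in chain}
print(cascade_fraction(chain, fragile, seeds=[0]))
print(cascade_fraction(chain, robust, seeds=[0]))
```

Because failures are irreversible, the order in which nodes are checked does not change the final set of failed nodes. The two threshold settings show how a small change in robustness separates a contained failure from a full cascade.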

In most cases, this may lead to a drastic increase of systemic risk, which is underestimated in an aggregated approach. The QWERTY effect postulates that the keyboard layout influences word meanings by linking positivity to the use of the right hand and negativity to the use of the left hand. For example, previous research has established that words with more right hand letters are rated more positively than words with more left hand letters by human subjects in small scale experiments.

Using data from eleven web platforms related to products, movies, books, and videos, we conduct observational tests of whether a hand-meaning relationship can be found in text interpretations by web users. Furthermore, we investigate whether writing text on the web exhibits the QWERTY effect as well, by analyzing the relationship between the text of online reviews and their star ratings in four additional datasets. Overall, we find robust evidence for the QWERTY effect both at the point of text interpretation (decoding) and at the point of text creation (encoding).

We also find under which conditions the effect might not hold. Our findings have implications for any algorithmic method aiming to evaluate the meaning of words on the web, including for example semantic or sentiment analysis, and show the existence of "dactilar onomatopoeias" that shape the dynamics of word-meaning associations. To the best of our knowledge, this is the first work to reveal the extent to which the QWERTY effect exists in large scale human-computer interaction on the web.

Recent research on temporal networks has highlighted the limitations of a static network perspective for our understanding of complex systems with dynamic topologies. In particular, recent works have shown that (i) the specific order in which links occur in real-world temporal networks affects causality structures and thus the evolution of dynamical processes, and (ii) higher-order aggregate representations of temporal networks can be used to analytically study the effect of these order correlations on dynamical processes. In this article we analyze the effect of order correlations on path-based centrality measures in real-world temporal networks.

Analyzing temporal equivalents of betweenness, closeness and reach centrality in six empirical temporal networks, we first show that an analysis of the commonly used static, time-aggregated representation can give misleading results about the actual importance of nodes.
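Why the static, time-aggregated view can mislead is easiest to see on a toy example. The sketch below compares reachability in the aggregated graph with reachability along time-respecting paths, where each step must happen strictly later than the previous one. The event format `(time, source, target)` and both function names are assumptions made for this illustration.

```python
from collections import defaultdict, deque


def static_reachable(events, source):
    """Nodes reachable from source when timestamps are ignored."""
    adj = defaultdict(set)
    for _, u, v in events:
        adj[u].add(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen - {source}


def temporal_reachable(events, source):
    """Nodes reachable from source via paths whose timestamps strictly increase."""
    arrival = {source: float("-inf")}
    # Processing events in time order makes the first arrival the earliest one.
    for t, u, v in sorted(events):
        if u in arrival and arrival[u] < t and v not in arrival:
            arrival[v] = t
    return set(arrival) - {source}


# Hypothetical contacts: b reaches c at time 1, before a reaches b at time 2.
toy_events = [(1, "b", "c"), (2, "a", "b")]
print(static_reachable(toy_events, "a"))
print(temporal_reachable(toy_events, "a"))
```

In the aggregated graph, node a appears to reach c through b, but the contact from b to c has already happened when a reaches b, so no time-respecting path exists. Path-based centralities computed on the static graph inherit this kind of error.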

We further study higher-order time-aggregated networks, a recently proposed generalization of the commonly applied static, time-aggregated representation of temporal networks. Here, we particularly define path-based centrality measures based on second-order aggregate networks, empirically validating that node centralities calculated in this way better capture the true temporal centralities of nodes than node centralities calculated based on the commonly used static first-order representation.

Apart from providing a simple and practical method for the approximation of path-based centralities in temporal networks, our results highlight interesting perspectives for the use of higher-order aggregate networks in the analysis of time-stamped network data. We analyze the controllability of a two-layer network, where driver nodes can be chosen randomly only from one layer. Each layer contains a scale-free network with directed links and the node dynamics depends on the incoming links from other nodes.

We combine the in-degree and out-degree values to assign an importance value w to each node, and distinguish between peripheral nodes with low w and central nodes with high w. Based on numerical simulations, we find that the controllable part of the network is larger when choosing low w nodes to connect the two layers. The control is as efficient when peripheral nodes are driver nodes as it is for the case of more central nodes. However, if we assume a cost to utilize nodes that is proportional to their overall degree, utilizing peripheral nodes to connect the two layers or to act as driver nodes is not only the most cost-efficient solution, it is also the one that performs best in controlling the two-layer network among the different interconnecting strategies we have tested.

The social connections, or ties, individuals create affect their life outcomes, for example, by providing novel information that leads to new jobs or career opportunities. A host of socioeconomic and cognitive factors are believed to affect social interactions, but few of these factors have been empirically validated. In this research work, we extracted a large corpus of data from a popular social media platform that consists of geo-referenced messages, or tweets, posted from a major US metropolitan area.

We linked these tweets to US Census data through their locations. This allowed us to measure emotions expressed in tweets posted from a specific area, and also use that area's socioeconomic and demographic characteristics in the analysis. We extracted the structure of social interactions from the people mentioned in tweets from that area.


We find that at an aggregate level, areas where social media users engage in stronger, less diverse online social interactions are those where they express more negative emotions, like sadness and anger. With respect to demographics, these areas have larger numbers of Hispanic residents, lower mean household income, and lower education levels. Conversely, areas with weaker, more diverse online interactions are associated with happier, more positive feelings and also have better educated, younger and higher-earning residents.

Our work highlights the value of linking social media data to traditional data sources, such as US Census, to drive novel analysis of online behavior.

Complex software development projects rely on the contribution of teams of developers, who are required to collaborate and coordinate their efforts. The productivity of such development teams, i. The majority of studies in empirical software engineering suggest that - due to coordination overhead - teams of collaborating developers become less productive as they grow in size.

Outside software engineering, the non-additive scaling of productivity in teams is often referred to as the Ringelmann effect, which is studied extensively in social psychology and organizational theory. Conversely, a recent study suggested that in Open Source Software (OSS) projects, the productivity of developers increases as the team grows in size. Using a data set of 58 OSS projects with more than , commits contributed by more than 30, developers, in this article we provide a large-scale analysis of the relation between size and productivity of software development teams.

Our findings confirm the negative relation between team size and productivity previously suggested by empirical software engineering research, thus providing quantitative evidence for the presence of a strong Ringelmann effect. Using fine-grained data on the association between developers and source code files, we investigate possible explanations for the observed relations between team size and productivity.

In particular, we take a network perspective on developer-code associations in software development teams and show that the magnitude of the decrease in productivity is likely to be related to the growth dynamics of co-editing networks, which can be interpreted as a first-order approximation of coordination requirements.
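One common way to quantify how team output scales with team size is to fit a power law on a log-log scale. The sketch below assumes total output grows as `size ** alpha` and estimates `alpha` by ordinary least squares. The function name and the choice of output measure are hypothetical and not taken from the study. An exponent below 1 would indicate that output grows more slowly than team size, which is the pattern a Ringelmann effect predicts.

```python
import math

def scaling_exponent(team_sizes, outputs):
    """Estimate alpha in output ~ size**alpha via least squares on logs.

    team_sizes, outputs: equal-length sequences of positive numbers
    (assumed inputs, e.g. active developers and contributions per period).
    """
    xs = [math.log(s) for s in team_sizes]
    ys = [math.log(o) for o in outputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line of log(output) on log(size).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```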

Getting started on Data profiling

We study properties of multi-layered, interconnected networks from an ensemble perspective, i. Using a diffusive process that evolves on a multi-layer network, we analyze how the speed of diffusion depends on the aggregate characteristics of both intra- and inter-layer connectivity. Through a block-matrix model representing the distinct layers, we construct transition matrices of random walkers on multi-layer networks, and estimate expected properties of multi-layer networks using a mean-field approach.
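The block-matrix construction described above can be sketched for a simple case. This example assumes every layer has the same node set and that each node is linked to its own copy in every other layer with a uniform weight `coupling`. The inter-layer blocks in the paper are more general, so this is an illustration under stated assumptions, and both function names are hypothetical.

```python
def supra_adjacency(layers, coupling):
    """Assemble a block matrix from per-layer adjacency matrices.

    layers: list of n x n adjacency matrices given as lists of lists.
    coupling: weight of the link between copies of a node in two layers.
    """
    num_layers = len(layers)
    n = len(layers[0])
    size = num_layers * n
    A = [[0.0] * size for _ in range(size)]
    for a in range(num_layers):
        for i in range(n):
            # Diagonal block: intra-layer links of layer a.
            for j in range(n):
                A[a * n + i][a * n + j] = float(layers[a][i][j])
            # Off-diagonal blocks: node i coupled to its copy in layer b.
            for b in range(num_layers):
                if b != a:
                    A[a * n + i][b * n + i] = float(coupling)
    return A

def transition_matrix(A):
    """Row-normalize A to get random-walk transition probabilities."""
    T = []
    for row in A:
        total = sum(row)
        T.append([x / total if total > 0 else 0.0 for x in row])
    return T
```

The spectrum of the resulting transition matrix governs how quickly a random walk, and hence a diffusion process, relaxes on the multi-layer network.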

In addition, we quantify and explore conditions on the link topology that allow the ensemble average to be estimated from aggregate statistics of the layers alone. Our approach can be used when only partial information is available, as is usually the case for real-world multi-layer complex systems.

Location-sharing services were built upon people's desire to share their activities and locations with others.

By "checking-in" to a place, such as a restaurant, a park, gym, or train station, people disclose where they are, thereby providing valuable information about land use and utilization of services in urban areas. This information may, in turn, be used to design smarter, happier, more equitable cities. We use data from the Foursquare location-sharing service to identify areas within a major US metropolitan area with many check-ins, i. We then use data from the Twitter microblogging platform to analyze the properties of these areas. Specifically, we have extracted a large corpus of geo-tagged messages, called tweets, from a major metropolitan area and linked them to US Census data through their locations.

This allows us to measure the sentiment expressed in tweets that are posted from a specific area, and also use that area's demographic properties in analysis. Our results reveal that areas with many check-ins are different from other areas within the metropolitan region. In particular, these areas have happier tweets, which also encourage people from other areas to commute longer distances to these places. These findings shed light on human mobility patterns, as well as how physical environment influences human emotions.

How are economic activities linked to geographic locations? To answer this question, we use a data-driven approach that builds on the information about location, ownership and economic activities of the world's 3, largest firms and their almost one million subsidiaries. From this information we generate a bipartite network of cities linked to economic activities.

Analysing the structure of this network, we find striking similarities with nested networks observed in ecology, where links represent mutualistic interactions between species. This motivates us to apply ecological indicators to identify the unbalanced deployment of economic activities. Such deployment can lead to an over-representation of specific economic sectors in a given city, and poses a significant threat to the city's future, especially in times when the over-represented activities face economic uncertainties.

If we compare our analysis with external rankings about the quality of life in a city, we find that the nested structure of the city-firm network also reflects such information about the quality of life, which can usually be assessed only via dedicated survey-based indicators.

However, the narrow diversity of the Wikipedia editor community has the potential to introduce systemic biases such as gender biases into the content of Wikipedia. In this paper we aim to tackle a subproblem of this larger challenge by presenting and applying a computational method for assessing gender bias on Wikipedia along multiple dimensions.

We find that while women on Wikipedia are covered and featured well in many Wikipedia language editions, the way women are portrayed starkly differs from the way men are portrayed. We hope our work contributes to increasing awareness about gender biases online, and in particular to raising attention to the different levels in which gender biases can manifest themselves on the web.

In our model, agents form links based on their network features, i. Furthermore, we validate the model against real data using a two-step approach.

The underlying knowledge space that we consider in our real example is defined by IPC patent classes, allowing for a precise quantification of every firm's knowledge position. Our novel data-driven approach allows us to unveil the complex interdependencies between the firms' network embeddedness and their technological positions.

Most of the alliances, indeed, have no consequence on the partners' knowledge positions: Finally, we propose an indicator of collaboration performance for the whole network. Our study shows that there exist configurations that can be both realistic and optimized with respect to the collaboration performance.

Metastasizing tumor cells migrate through the surrounding tissue and extracellular matrix toward the blood vessels, in order to colonize distant organs.

They typically move in a dense environment, filled with other cells. In this work we study cooperative effects between neighboring cells of different types, migrating in a maze-like environment with a directional cue. Wall degradation by mesenchymal cells, as well as the motility of both types of cells, is coupled to a metabolic, energy-like resource level. We find that indirect cooperation emerges at mid-level energy, as mesenchymal cells create paths that are used by amoeboids.

Therefore, we expect to see a small population of mesenchymals kept in a mostly-amoeboid population. We also study different forms of direct interaction between the cells, and show that energy-dependent interaction strength is optimal for the migration of both mesenchymals and amoeboids. The obtained characteristics of cellular cluster size are in agreement with experimental results.

We therefore predict that hybrid states, e.

Building relationships is crucial for satisfaction and success, especially when entering new social contexts. In Study 2, linguistic analysis of the tweets from over Twitter users from formation of their accounts revealed that use of IER predicted greater popularity in terms of the number of followers gained.

However, not all types of IER had positive effects.